CSAIL Technical Reports (July 1, 2003 - present)
http://hdl.handle.net/1721.1/29807
2017-11-15T19:44:41Z
http://hdl.handle.net/1721.1/112172
Typesafety for Explicitly-Coded Probabilistic Inference Procedures
Atkinson, Eric; Carbin, Michael
Researchers have recently proposed several systems that ease the process of developing Bayesian probabilistic inference algorithms. These include systems for automatic inference algorithm synthesis as well as stronger abstractions for manual algorithm development. However, existing systems whose performance relies on the developer manually constructing a part of the inference algorithm have limited support for reasoning about the correctness of the resulting algorithm. In this paper, we present Shuffle, a programming language for developing manual inference algorithms that enforces 1) the basic rules of probability theory and 2) statistical dependencies of the algorithm's corresponding probabilistic model. We have used Shuffle to develop inference algorithms for several standard probabilistic models. Our results demonstrate that Shuffle enables a developer to deliver performant implementations of these algorithms with the added benefit of Shuffle's correctness guarantees.
2017-11-09T00:00:00Z
http://hdl.handle.net/1721.1/111117
The Interval Programming Model Solution Algorithm Experimentation Tools and Results
Benjamin, Michael R.
Interval programming (IvP) is a model for representing multi-objective optimization problems along with a set of solution algorithms. This paper describes a set of IvP solution experiments run over randomly generated problem instances, using five different versions of the Recursive Interval Programming ALgorithm (RIPAL). The final version is the algorithm used most extensively in practice, with the first four provided mostly for comparison as the final version is built up in complexity. The full details of the algorithms are outside the scope of this paper, with the focus here being the experimental results and the software tools and techniques used in generating the problem instances. Additional tools are described for facilitating the experiments, including visualization tools and tools for generating the plots and tables shown in this document. All software tools are available under an open source license, and all problem instances reported here are also available online. This document is meant to supplement other discussions of the IvP model, algorithm, and applications by providing a level of reporting detail that the length restrictions of other papers do not allow.
2017-09-01T00:00:00Z
http://hdl.handle.net/1721.1/111067
Inference and Regeneration of Programs that Manipulate Relational Databases
Shen, Jiasi; Rinard, Martin
We present a new technique that infers models of programs that manipulate relational databases. This technique generates test databases and input commands, runs the program, then observes the resulting outputs and updated databases to infer the model. Because the technique works only with the externally observable inputs, outputs, and databases, it can infer the behavior of programs written in arbitrary languages using arbitrary coding styles and patterns. We also present a technique for automatically regenerating an implementation of the program based on the inferred model. The regenerator can produce a translated implementation in a different language and systematically include relevant security and error checks. We present results that illustrate the use of the technique to eliminate SQL injection vulnerabilities and the translation of applications from Java and Ruby on Rails to Python.
2017-08-29T00:00:00Z
http://hdl.handle.net/1721.1/109792
An Efficient Fill Estimation Algorithm for Sparse Matrices and Tensors in Blocked Formats
Ahrens, Peter; Schiefer, Nicholas; Xu, Helen
Tensors, linear-algebraic extensions of matrices in arbitrary dimensions, have numerous applications in computer science and computational science. Many tensors are sparse, containing more than 90% zero entries. Efficient algorithms can leverage sparsity to do less work, but the irregular locations of the nonzero entries pose challenges to performance engineers. Many tensor operations such as tensor-vector multiplications can be sped up substantially by breaking the tensor into equally sized blocks (only storing blocks which contain nonzeros) and performing operations in each block using carefully tuned code. However, selecting the best block size is computationally challenging. Previously, Vuduc et al. defined the fill of a sparse tensor to be the number of stored entries in the blocked format divided by the number of nonzero entries, and showed that the fill can be used as an effective heuristic to choose a good block size. However, they gave no accuracy bounds for their method for estimating the fill, and it is vulnerable to adversarial examples. In this paper, we present a sampling-based method for finding a (1 + epsilon)-approximation to the fill of an order N tensor for all block sizes less than B, with probability at least 1 - delta, using O(N B^N log(B / delta) / epsilon^2) samples for each block size. We introduce an efficient routine to sample for all B^N block sizes at once in O(N B^N) time. We extend our concentration bounds to a more efficient bound based on sampling without replacement, using the recent Hoeffding-Serfling inequality. We then implement our algorithm and compare our scheme to that of Vuduc et al., as implemented in the Optimized Sparse Kernel Interface (OSKI) library. We find that our algorithm provides faster estimates of the fill at all accuracy levels, providing evidence that this is both a theoretical and practical improvement. Our code is available under the BSD 3-clause license at https://github.com/peterahrens/FillEstimation.
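To make the definition of fill in the abstract above concrete, the following is a minimal sketch that computes the exact fill of a two-dimensional sparse matrix for a single block shape. It is not the sampling-based estimator the report describes, and it is not taken from the FillEstimation repository. The function name `fill`, the coordinate-list input, and the assumption that blocks are aligned to a grid anchored at index (0, 0) are all illustrative choices.

```python
def fill(nonzeros, block_shape):
    """Exact fill of a sparse matrix for one block shape (illustrative sketch).

    nonzeros:    iterable of (row, col) coordinates of the nonzero entries.
    block_shape: (block_rows, block_cols); blocks are assumed to be aligned
                 to a grid anchored at (0, 0), which is an assumption here.
    """
    block_rows, block_cols = block_shape
    coords = set(nonzeros)
    if not coords:
        raise ValueError("fill is undefined for a matrix with no nonzeros")

    # A block is stored only if it contains at least one nonzero entry.
    stored_blocks = {(r // block_rows, c // block_cols) for r, c in coords}

    # Each stored block keeps all of its entries, explicit zeros included.
    stored_entries = len(stored_blocks) * block_rows * block_cols

    # Fill = stored entries in the blocked format / number of nonzeros.
    return stored_entries / len(coords)


# Hypothetical example: a dense 2x2 corner plus one isolated nonzero.
example = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)]
fill_1x1 = fill(example, (1, 1))  # no padding: every stored entry is nonzero
fill_2x2 = fill(example, (2, 2))  # the isolated entry forces a padded block
```

A fill of 1 means the blocked format stores no explicit zeros, while larger values indicate more padding. Computing this exactly for every candidate block shape requires a pass over all nonzeros per shape, which is the cost that estimating the fill from samples is meant to avoid.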
2017-06-09T00:00:00Z