DNA sequence design of non-orthogonal binding networks, and application to DNA data storage
Author(s)
Berleant, Joseph Don
DownloadThesis PDF (55.20Mb)
Advisor
Bathe, Mark
Terms of use
Metadata
Show full item recordAbstract
DNA has proven itself a powerful tool in a diverse array of nanotechnology-related domains, including molecular computation, nanostructure fabrication, and data storage. Most DNA-based systems focus on using sets of DNA sequences that are orthogonal to each other, such that each DNA sequence has a dedicated binding partner, its complementary sequence. This design approach reduces the number of interactions that must be considered when predicting how a system will behave, at the cost of reducing the information-gathering ability of each molecular unit.
Relatively little research has attempted to solve the problem of designing promiscuous, or non-orthogonal, DNA sequences, which are characterized by their ability to bind to several distinct partners with variable binding affinities. Yet there are many situations in which this type of dense interaction network can be useful. For example, in neural networks, a node will often take inputs from hundreds or thousands of upstream nodes, allowing it to condense large amounts of information into a single output value. While naturally occurring biological networks often make use of promiscuous binding behavior, the field of molecular computing currently lacks a general-purpose and efficient method for non-orthogonal DNA sequence design.
In this thesis, I describe a novel, robust, and broadly applicable method for designing small or large sets of non-orthogonal DNA sequences. This method takes an arbitrary matrix of pairwise binding affinities, and attempts to design DNA sequences such that the differential binding affinity between any two pairs of sequences is proportional to the difference in the corresponding elements of the matrix. The key innovation of this method is the reformulation of the matrix via a binary embedding, which reduces the design specification to a set of binary strings that permit relatively straightforward sequence design.
Not all matrices permit a binary embedding and I consider three cases here: when a binary embedding exists, when it is unknown if it exists, and when it does not exist. When it exists, I show through both simulation and experiment that DNA sequences can be designed with high precision. When it is unknown if a binary embedding exists, I give novel conditions for determining existence via representation of the matrix in a weighted graph. Finally, when an exact binary embedding does not exist, I develop an alternative method using approximate binary embeddings. To demonstrate the power of this method, I apply to the task of similarity searching in a large, simulated DNA databases, where I show that it outperforms the existing state of the art. I hope that this work opens the door to further innovations in designing and applying non-orthogonal DNA sequences to DNA nanotechnology.
Date issued
2023-06Department
Massachusetts Institute of Technology. Department of Biological EngineeringPublisher
Massachusetts Institute of Technology