DNA sequence design of non-orthogonal binding networks, and application to DNA data storage

Berleant, Joseph Don

Advisor

Bathe, Mark

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/

Abstract

DNA has proven itself a powerful tool in a diverse array of nanotechnology-related domains, including molecular computation, nanostructure fabrication, and data storage. Most DNA-based systems focus on using sets of DNA sequences that are orthogonal to each other, such that each DNA sequence has a dedicated binding partner, its complementary sequence. This design approach reduces the number of interactions that must be considered when predicting how a system will behave, at the cost of reducing the information-gathering ability of each molecular unit. Relatively little research has attempted to solve the problem of designing promiscuous, or non-orthogonal, DNA sequences, which are characterized by their ability to bind to several distinct partners with variable binding affinities. Yet there are many situations in which this type of dense interaction network can be useful. For example, in neural networks, a node will often take inputs from hundreds or thousands of upstream nodes, allowing it to condense large amounts of information into a single output value. While naturally occurring biological networks often make use of promiscuous binding behavior, the field of molecular computing currently lacks a general-purpose and efficient method for non-orthogonal DNA sequence design. In this thesis, I describe a novel, robust, and broadly applicable method for designing small or large sets of non-orthogonal DNA sequences. This method takes an arbitrary matrix of pairwise binding affinities, and attempts to design DNA sequences such that the differential binding affinity between any two pairs of sequences is proportional to the difference in the corresponding elements of the matrix. The key innovation of this method is the reformulation of the matrix via a binary embedding, which reduces the design specification to a set of binary strings that permit relatively straightforward sequence design. 
Not all matrices permit a binary embedding, and I consider three cases here: when a binary embedding exists, when it is unknown whether one exists, and when one does not exist. When it exists, I show through both simulation and experiment that DNA sequences can be designed with high precision. When it is unknown whether a binary embedding exists, I give novel conditions for determining existence by representing the matrix as a weighted graph. Finally, when an exact binary embedding does not exist, I develop an alternative method using approximate binary embeddings. To demonstrate the power of this method, I apply it to the task of similarity searching in a large, simulated DNA database, where I show that it outperforms the existing state of the art. I hope that this work opens the door to further innovations in designing and applying non-orthogonal DNA sequences to DNA nanotechnology.
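As a rough illustration of the proportionality condition described in the abstract, the sketch below checks whether a candidate set of binary strings reproduces a target affinity matrix up to a linear scale. It is not the thesis's algorithm: the use of Hamming distance as the quantity compared between binary strings is an assumption made here for illustration, and the names `hamming` and `is_proportional_embedding` are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("binary strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_proportional_embedding(matrix, codes, tol=1e-9):
    """Check M[i][j] - M[k][l] == scale * (d(i,j) - d(k,l)) for all pairs.

    `matrix` is a symmetric matrix of target affinities (diagonal ignored).
    `codes` is one binary string per row of `matrix`.
    ASSUMPTION: d is the Hamming distance between the binary strings.
    """
    if len(codes) < 2 or len(matrix) != len(codes):
        raise ValueError("need one code per matrix row and at least two codes")

    pairs = list(combinations(range(len(codes)), 2))
    dists = [hamming(codes[i], codes[j]) for i, j in pairs]
    targets = [matrix[i][j] for i, j in pairs]

    # Use the first pair as a reference; the condition over all pairs of
    # pairs is equivalent to a single linear relation against this reference.
    d0, m0 = dists[0], targets[0]
    scale = None
    for d, m in zip(dists, targets):
        if d != d0:
            scale = (m - m0) / (d - d0)
            break

    if scale is None:
        # All distances are equal, so all target entries must be equal too.
        return all(abs(m - m0) <= tol for m in targets)
    if abs(scale) <= tol:
        # Distances differ but targets do not: no meaningful proportionality.
        return False
    return all(abs((m - m0) - scale * (d - d0)) <= tol
               for d, m in zip(dists, targets))


# Toy example: affinities constructed as 10 - 2 * (Hamming distance).
codes = ["0000", "0001", "0111"]
target = [[0, 8, 4],
          [8, 0, 6],
          [4, 6, 0]]
print(is_proportional_embedding(target, codes))
```

In this toy case the pairwise distances are 1, 3, and 2, so the target entries 8, 4, and 6 fit a single scale of -2 and the check succeeds; changing any one off-diagonal entry breaks the relation. How binary strings are then turned into DNA sequences is not specified in the abstract and is left out of the sketch.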

Date issued

2023-06

URI

https://hdl.handle.net/1721.1/153069

Department

Massachusetts Institute of Technology. Department of Biological Engineering

Publisher

Massachusetts Institute of Technology

Collections

Doctoral Theses