Overcoming Data Scarcity in Deep Learning of Scientific Problems

Loh, Charlotte Chang Le

Advisor

Soljačić, Marin

Terms of use

In Copyright - Educational Use Permitted. Copyright MIT. http://rightsstatements.org/page/InC-EDU/1.0/

Abstract

Data-driven approaches such as machine learning have been increasingly applied to the natural sciences, e.g., for property prediction and optimization, or for materials discovery. An essential requirement for the success of such methods is an extensive amount of labeled data, which makes them infeasible for data-scarce problems where generating labeled data is computationally expensive or labour- and time-intensive. Here, I introduce surrogate- and invariance-boosted contrastive learning (SIB-CL), a deep learning framework that overcomes data scarcity by incorporating three types of “inexpensive” and easily obtainable auxiliary information. Specifically, these are: 1) abundant unlabeled data, 2) prior knowledge of known symmetries or invariances of the problem, and 3) a surrogate dataset obtained at near-zero cost, either by simplification or by approximation. I demonstrate the effectiveness and generality of SIB-CL on various scientific problems, for example, predicting the density-of-states of 2D photonic crystals and solving the time-independent Schrödinger equation for 3D random potentials. SIB-CL is shown to provide orders-of-magnitude savings in the amount of labeled data needed compared with conventional deep learning techniques, opening the way to applying data-driven methods even to data-scarce problems.
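The abstract does not spell out SIB-CL's training objective. As a rough illustration of how the first two ingredients (unlabeled data and known invariances) can combine in contrastive learning, the sketch below assumes a SimCLR-style NT-Xent loss. For a square-lattice 2D photonic-crystal unit cell, it also assumes that the eight D4 rotations/mirrors and periodic translations of the cell leave the density-of-states unchanged. The function names `d4_views`, `random_invariant_view` and `nt_xent` are hypothetical. The encoder network and the surrogate-dataset stage are omitted.

```python
import numpy as np


def d4_views(cell: np.ndarray) -> list:
    """Return the 8 images of a square 2D unit cell under the dihedral group D4:
    4 rotations by multiples of 90 degrees, each with and without a mirror flip."""
    views = []
    for k in range(4):
        rotated = np.rot90(cell, k)
        views.append(rotated)
        views.append(np.fliplr(rotated))
    return views


def random_invariant_view(cell: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical augmentation: a random D4 operation followed by a random
    periodic translation of the unit cell. Both are ASSUMED to leave the
    target property (e.g. the density-of-states) unchanged."""
    views = d4_views(cell)
    view = views[rng.integers(len(views))]
    shift = (int(rng.integers(cell.shape[0])), int(rng.integers(cell.shape[1])))
    return np.roll(view, shift=shift, axis=(0, 1))


def nt_xent(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """SimCLR-style normalized temperature-scaled cross-entropy loss.
    Row i of z1 and row i of z2 are embeddings of two views of the same sample;
    every other row in the batch acts as a negative."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a view is never its own positive or negative
    # Row i's positive partner sits n rows away (modulo 2n).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob.mean())
```

In a full pipeline, each unlabeled cell would be passed through `random_invariant_view` twice. Both views would be encoded by the same network, and `nt_xent` would be minimized over a batch. How the surrogate dataset enters, for instance as an extra pretraining source before fine-tuning on the scarce labeled data, is not specified in the abstract and is left out here.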

Date issued

2021-09

URI

https://hdl.handle.net/1721.1/140165

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses