Overcoming Data Scarcity in Deep Learning of Scientific Problems
Author(s)
Loh, Charlotte Chang Le
Advisor
Soljačić, Marin
Abstract
Data-driven approaches such as machine learning have been increasingly applied to the natural sciences, e.g. for property prediction and optimization or material discovery. An essential criterion for the success of such methods is the availability of extensive amounts of labeled data, which makes them infeasible for data-scarce problems where labeled data generation is computationally expensive, or labour and time intensive. Here, I introduce surrogate and invariance-boosted contrastive learning (SIB-CL), a deep learning framework which overcomes data scarcity by incorporating three “inexpensive” and easily obtainable sources of auxiliary information. Specifically, these are: 1) abundant unlabeled data, 2) prior knowledge of known symmetries or invariances of the problem and 3) a surrogate dataset obtained at near-zero cost either from simplification or approximation. I demonstrate the effectiveness and generality of SIB-CL on various scientific problems, for example, the prediction of the density-of-states of 2D photonic crystals and solving the time-independent Schrödinger equation of 3D random potentials. SIB-CL is shown to provide orders of magnitude savings on the amount of labeled data needed when compared to conventional deep learning techniques, offering opportunities to apply data-driven methods even to data-scarce problems.
Date issued
2021-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology