| dc.contributor.advisor | Cafarella, Michael J. | |
| dc.contributor.author | Yang, Steven | |
| dc.date.accessioned | 2022-08-29T16:36:12Z | |
| dc.date.available | 2022-08-29T16:36:12Z | |
| dc.date.issued | 2022-05 | |
| dc.date.submitted | 2022-05-27T16:19:39.096Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/145143 | |
| dc.description.abstract | We aim to build a knowledge-graph-based provenance system for data objects across institutions and teams. The world of data objects and systems is complex and heterogeneous, so effective collaboration requires a shared data model. Specifically, this work examines the problem of provenance subgraph classification: given a coarse, low-level provenance subgraph that is not easily digestible by humans, we want to annotate the subgraph with human-readable labels describing the operations performed on each data object. This work first creates the infrastructure needed to select and label subgraphs. It then focuses on producing table embedding techniques using the pretrain-and-finetune paradigm, with an emphasis on the downstream task of Operator Classification. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright MIT | |
| dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Pretraining Table Embeddings for Knowledge Graph Based Provenance Systems | |
| dc.type | Thesis | |
| dc.description.degree | M.Eng. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |