dc.contributor.author | Olivetti, Elsa A | |
dc.contributor.author | Cole, Jacqueline M | |
dc.contributor.author | Kim, Edward | |
dc.contributor.author | Kononova, Olga | |
dc.contributor.author | Ceder, Gerbrand | |
dc.contributor.author | Han, Thomas Yong-Jin | |
dc.contributor.author | Hiszpanski, Anna M | |
dc.date.accessioned | 2022-05-18T15:41:15Z | |
dc.date.available | 2022-05-18T15:41:15Z | |
dc.date.issued | 2020 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/142580 | |
dc.description.abstract | © 2020 Author(s). Given the emergence of data science and machine learning throughout all aspects of society, but particularly in the scientific domain, there is increased importance placed on obtaining data. Data in materials science are particularly heterogeneous, based on the significant range in materials classes that are explored and the variety of materials properties that are of interest. This leads to data that range over many orders of magnitude, and these data may manifest as numerical, text, or image-based information, which requires quantitative interpretation. The ability to automatically consume and codify the scientific literature across domains - enabled by techniques adapted from the field of natural language processing - therefore has immense potential to unlock and generate the rich datasets necessary for data science and machine learning. This review focuses on the progress and practices of natural language processing and text mining of materials science literature and highlights opportunities for extracting additional information, beyond text, contained in figures and tables in articles. We discuss and provide examples of several reasons for pursuing natural language processing for materials, including data compilation, hypothesis development, and understanding the trends within and across fields. Current and emerging natural language processing methods, along with their applications to materials science, are detailed. We then discuss natural language processing and data challenges within the materials science domain where future directions may prove valuable. | en_US |
dc.language.iso | en | |
dc.publisher | AIP Publishing | en_US |
dc.relation.isversionof | 10.1063/5.0021106 | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | DOE repository | en_US |
dc.title | Data-driven materials research enabled by natural language processing and information extraction | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Olivetti, Elsa A, Cole, Jacqueline M, Kim, Edward, Kononova, Olga, Ceder, Gerbrand et al. 2020. "Data-driven materials research enabled by natural language processing and information extraction." Applied Physics Reviews, 7 (4). | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Materials Science and Engineering | |
dc.relation.journal | Applied Physics Reviews | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
dc.date.updated | 2022-05-18T15:35:50Z | |
dspace.orderedauthors | Olivetti, EA; Cole, JM; Kim, E; Kononova, O; Ceder, G; Han, TY-J; Hiszpanski, AM | en_US |
dspace.date.submission | 2022-05-18T15:35:52Z | |
mit.journal.volume | 7 | en_US |
mit.journal.issue | 4 | en_US |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |