Integrating Machine Learning and Large Language Models to Advance Exploration of Electrochemical Reactions

Zheng, Zhiling; Florit, Federico; Jin, Brooke; Wu, Haoyang; Li, Shih‐Cheng; Nandiwale, Kakasaheb Y; Salazar, Chase A; Mustakis, Jason G; Green, William H; Jensen, Klavs F

dc.contributor.author	Zheng, Zhiling
dc.contributor.author	Florit, Federico
dc.contributor.author	Jin, Brooke
dc.contributor.author	Wu, Haoyang
dc.contributor.author	Li, Shih‐Cheng
dc.contributor.author	Nandiwale, Kakasaheb Y
dc.contributor.author	Salazar, Chase A
dc.contributor.author	Mustakis, Jason G
dc.contributor.author	Green, William H
dc.contributor.author	Jensen, Klavs F
dc.date.accessioned	2025-07-07T19:22:07Z
dc.date.available	2025-07-07T19:22:07Z
dc.date.issued	2024-12-03
dc.identifier.uri	https://hdl.handle.net/1721.1/159962
dc.description.abstract	Electrochemical C−H oxidation reactions offer a sustainable route to functionalize hydrocarbons, yet identifying suitable substrates and optimizing synthesis remain challenging. Here, we report an integrated approach combining machine learning (ML) and large language models (LLMs) to streamline the exploration of electrochemical C−H oxidation reactions. Using a batch rapid-screening electrochemical platform, we evaluated a wide range of reactions and initially classified substrates by their reactivity, while LLMs text‐mined literature data to augment the training set. The resulting ML models for reactivity prediction achieved high accuracy (>90 %) and enabled virtual screening of a large set of commercially available molecules. To optimize reaction conditions for selected substrates, LLMs were prompted to generate code that iteratively improved yields. This human‐AI collaboration proved effective, efficiently identifying high‐yield conditions for 8 drug‐like substances or intermediates. Notably, we benchmarked the accuracy and reliability of 12 different LLMs, including the LLaMA series, the Claude series, OpenAI o1, and GPT‐4, on ML-related code generation and function calling from natural language prompts given by chemists, showcasing their potential to accelerate research across four diverse tasks. In addition, we collected an experimental benchmark dataset comprising 1071 reaction conditions and yields for electrochemical C−H oxidation reactions.	en_US
dc.language.iso	en
dc.publisher	Wiley	en_US
dc.relation.isversionof	10.1002/anie.202418074	en_US
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivatives	en_US
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/4.0/	en_US
dc.source	Wiley	en_US
dc.title	Integrating Machine Learning and Large Language Models to Advance Exploration of Electrochemical Reactions	en_US
dc.type	Article	en_US
dc.identifier.citation	Zheng, Zhiling, Florit, Federico, Jin, Brooke, Wu, Haoyang, Li, Shih‐Cheng et al. 2024. "Integrating Machine Learning and Large Language Models to Advance Exploration of Electrochemical Reactions." Angewandte Chemie International Edition, 64 (6).
dc.contributor.department	Massachusetts Institute of Technology. Department of Chemical Engineering	en_US
dc.relation.journal	Angewandte Chemie International Edition	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2025-07-07T19:12:42Z
dspace.orderedauthors	Zheng, Z; Florit, F; Jin, B; Wu, H; Li, S; Nandiwale, KY; Salazar, CA; Mustakis, JG; Green, WH; Jensen, KF	en_US
dspace.date.submission	2025-07-07T19:12:44Z
mit.journal.volume	64	en_US
mit.journal.issue	6	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: Angew Chem Int Ed - 2024 - Zheng ...
Size: 5.327Mb
Format: PDF
Description: Published version

This item appears in the following Collection(s)

MIT Open Access Articles
