Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code

Michaud, Eric J.; Liao, Isaac; Lad, Vedang; Liu, Ziming; Mudide, Anish; Loughridge, Chloe; Guo, Zifan Carl; Kheirkhah, Tara Rezaei; Vukelić, Mateja; Tegmark, Max

dc.contributor.author	Michaud, Eric J.
dc.contributor.author	Liao, Isaac
dc.contributor.author	Lad, Vedang
dc.contributor.author	Liu, Ziming
dc.contributor.author	Mudide, Anish
dc.contributor.author	Loughridge, Chloe
dc.contributor.author	Guo, Zifan Carl
dc.contributor.author	Kheirkhah, Tara Rezaei
dc.contributor.author	Vukelić, Mateja
dc.contributor.author	Tegmark, Max
dc.date.accessioned	2025-01-02T22:45:16Z
dc.date.available	2025-01-02T22:45:16Z
dc.date.issued	2024-12-02
dc.identifier.uri	https://hdl.handle.net/1721.1/157939
dc.description.abstract	Can we turn AI black boxes into code? Although this mission sounds extremely challenging, we show that it is not entirely impossible by presenting a proof-of-concept method, MIPS, that can synthesize programs based on the automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm. As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub. We discuss opportunities and challenges for scaling up this approach to make machine-learned models more interpretable and trustworthy.	en_US
dc.publisher	Multidisciplinary Digital Publishing Institute	en_US
dc.relation.isversionof	http://dx.doi.org/10.3390/e26121046	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Multidisciplinary Digital Publishing Institute	en_US
dc.title	Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code	en_US
dc.type	Article	en_US
dc.identifier.citation	Michaud, E.J.; Liao, I.; Lad, V.; Liu, Z.; Mudide, A.; Loughridge, C.; Guo, Z.C.; Kheirkhah, T.R.; Vukelić, M.; Tegmark, M. Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code. Entropy 2024, 26, 1046.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Physics
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.relation.journal	Entropy	en_US
dc.identifier.mitlicense	PUBLISHER_CC
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2024-12-27T14:02:40Z
dspace.date.submission	2024-12-27T14:02:40Z
mit.journal.volume	26	en_US
mit.journal.issue	12	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: entropy-26-01046.pdf
Size: 685.6Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
