From concept to manufacturing: evaluating vision-language models for engineering design

Picard, Cyril; Edwards, Kristen M.; Doris, Anna C.; Man, Brandon; Giannone, Giorgio; Alam, Md F.; Ahmed, Faez

dc.contributor.author	Picard, Cyril
dc.contributor.author	Edwards, Kristen M.
dc.contributor.author	Doris, Anna C.
dc.contributor.author	Man, Brandon
dc.contributor.author	Giannone, Giorgio
dc.contributor.author	Alam, Md F.
dc.contributor.author	Ahmed, Faez
dc.date.accessioned	2025-11-18T17:07:59Z
dc.date.available	2025-11-18T17:07:59Z
dc.date.issued	2025-07-01
dc.identifier.uri	https://hdl.handle.net/1721.1/163750
dc.description.abstract	Engineering design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision-language models (VLMs), such as GPT-4V, enabling AI to impact many more types of tasks. Our work presents a comprehensive evaluation of VLMs across a spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Specifically in this paper, we assess the capabilities of two VLMs, GPT-4V and LLaVA 1.6 34B, in design tasks such as sketch similarity analysis, CAD generation, topology optimization, manufacturability assessment, and engineering textbook problems. Through this structured evaluation, we not only explore VLMs’ proficiency in handling complex design challenges but also identify their limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.	en_US
dc.publisher	Springer Netherlands	en_US
dc.relation.isversionof	https://doi.org/10.1007/s10462-025-11290-y	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Springer Netherlands	en_US
dc.title	From concept to manufacturing: evaluating vision-language models for engineering design	en_US
dc.type	Article	en_US
dc.identifier.citation	Picard, C., Edwards, K.M., Doris, A.C. et al. From concept to manufacturing: evaluating vision-language models for engineering design. Artif Intell Rev 58, 288 (2025).	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Mechanical Engineering	en_US
dc.relation.journal	Artificial Intelligence Review	en_US
dc.identifier.mitlicense	PUBLISHER_CC
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2025-07-18T15:31:32Z
dc.language.rfc3066	en
dc.rights.holder	The Author(s)
dspace.embargo.terms	N
dspace.date.submission	2025-07-18T15:31:32Z
mit.journal.volume	58	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 10462_2025_Article_11290.pdf
Size: 12.17 MB
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
