Enabling Perspective-Aware Ai with Contextual Scene Graph Generation

Platnick, Daniel; Alirezaie, Marjan; Rahnama, Hossein

dc.contributor.author	Platnick, Daniel
dc.contributor.author	Alirezaie, Marjan
dc.contributor.author	Rahnama, Hossein
dc.date.accessioned	2025-01-10T20:56:01Z
dc.date.available	2025-01-10T20:56:01Z
dc.date.issued	2024-12-02
dc.identifier.uri	https://hdl.handle.net/1721.1/157953
dc.description.abstract	This paper advances contextual image understanding within perspective-aware Ai (PAi), an emerging paradigm in human–computer interaction that enables users to perceive and interact through each other’s perspectives. While PAi relies on multimodal data—such as text, audio, and images—challenges in data collection, alignment, and privacy have led us to focus on enabling the contextual understanding of images. To achieve this, we developed perspective-aware scene graph generation with LLM post-processing (PASGG-LM). This framework extends traditional scene graph generation (SGG) by incorporating large language models (LLMs) to enhance contextual understanding. PASGG-LM integrates classical scene graph outputs with LLM post-processing to infer richer contextual information, such as emotions, activities, and social contexts. To test PASGG-LM, we introduce the context-aware scene graph generation task, where the goal is to generate a context-aware situation graph describing the input image. We evaluated PASGG-LM pipelines using state-of-the-art SGG models, including Motifs, Motifs-TDE, and RelTR, and showed that fine-tuning LLMs, particularly GPT-4o-mini and Llama-3.1-8B, improves performance in terms of R@K, mR@K, and mAP. Our method is capable of generating scene graphs that capture complex contextual aspects, advancing human–machine interaction by enhancing the representation of diverse perspectives. Future directions include refining contextual scene graph models and expanding multimodal data integration for PAi applications in domains such as healthcare, education, and social robotics.	en_US
dc.publisher	Multidisciplinary Digital Publishing Institute	en_US
dc.relation.isversionof	http://dx.doi.org/10.3390/info15120766	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Multidisciplinary Digital Publishing Institute	en_US
dc.title	Enabling Perspective-Aware Ai with Contextual Scene Graph Generation	en_US
dc.type	Article	en_US
dc.identifier.citation	Platnick, D.; Alirezaie, M.; Rahnama, H. Enabling Perspective-Aware Ai with Contextual Scene Graph Generation. Information 2024, 15, 766.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Media Laboratory	en_US
dc.relation.journal	Information	en_US
dc.identifier.mitlicense	PUBLISHER_CC
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2024-12-27T14:02:43Z
dspace.date.submission	2024-12-27T14:02:43Z
mit.journal.volume	15	en_US
mit.journal.issue	12	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name:: information-15-00766.pdf
Size:: 5.656Mb
Format:: PDF


This item appears in the following Collection(s)

MIT Open Access Articles