MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Enabling Perspective-Aware Ai with Contextual Scene Graph Generation

Author(s)
Platnick, Daniel; Alirezaie, Marjan; Rahnama, Hossein
Thumbnail
Downloadinformation-15-00766.pdf (5.656Mb)
Publisher with Creative Commons License

Publisher with Creative Commons License

Creative Commons Attribution

Terms of use
Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/
Metadata
Show full item record
Abstract
This paper advances contextual image understanding within perspective-aware Ai (PAi), an emerging paradigm in human–computer interaction that enables users to perceive and interact through each other’s perspectives. While PAi relies on multimodal data—such as text, audio, and images—challenges in data collection, alignment, and privacy have led us to focus on enabling the contextual understanding of images. To achieve this, we developed perspective-aware scene graph generation with LLM post-processing (PASGG-LM). This framework extends traditional scene graph generation (SGG) by incorporating large language models (LLMs) to enhance contextual understanding. PASGG-LM integrates classical scene graph outputs with LLM post-processing to infer richer contextual information, such as emotions, activities, and social contexts. To test PASGG-LM, we introduce the context-aware scene graph generation task, where the goal is to generate a context-aware situation graph describing the input image. We evaluated PASGG-LM pipelines using state-of-the-art SGG models, including Motifs, Motifs-TDE, and RelTR, and showed that fine-tuning LLMs, particularly GPT-4o-mini and Llama-3.1-8B, improves performance in terms of R@K, mR@K, and mAP. Our method is capable of generating scene graphs that capture complex contextual aspects, advancing human–machine interaction by enhancing the representation of diverse perspectives. Future directions include refining contextual scene graph models and expanding multi-modal data integration for PAi applications in domains such as healthcare, education, and social robotics.
Date issued
2024-12-02
URI
https://hdl.handle.net/1721.1/157953
Department
Massachusetts Institute of Technology. Media Laboratory
Journal
Information
Publisher
Multidisciplinary Digital Publishing Institute
Citation
Platnick, D.; Alirezaie, M.; Rahnama, H. Enabling Perspective-Aware Ai with Contextual Scene Graph Generation. Information 2024, 15, 766.
Version: Final published version

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.