A Pedagogical Multimodal System for Mathematical Problem-Solving and Visual Reasoning
Author(s)
Lee, Jimin
Advisor
Liang, Paul Pu
Abstract
Effective reasoning often requires more than text or language: for both humans and artificial intelligence (AI), it involves visualizing, drawing, gesturing, and interacting. In educational subjects such as geometry and graphing in particular, visual tools like auxiliary annotations and drawings can greatly help students understand abstract concepts. This thesis explores how multimodal interaction between humans and AI helps people engage with a system more naturally and effectively, leading to improved problem-solving in mathematical settings. Recent large multimodal models (LMMs) can facilitate collaborative reasoning by supporting textual, visual, and interactive inputs, diversifying how humans and AI communicate. Building on these advancements, this thesis presents the development of Interactive Sketchpad, a tutoring system that combines language-based explanations with interactive visualizations to enhance learning. It also reports findings from user studies with Interactive Sketchpad, demonstrating that multimodality improves users' task comprehension and engagement. Together, these contributions reframe the role of AI in education as a visual and interactive collaborator that supports deeper reasoning rather than simply providing answers. Furthermore, this work demonstrates the potential of multimodal human-AI systems to foster engagement and scale personalized, visual learning across domains.
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology