dc.contributor.advisor: Torous, Walter
dc.contributor.author: Hahmann, Luca
dc.contributor.author: Xie, Richard
dc.date.accessioned: 2026-04-21T18:12:43Z
dc.date.available: 2026-04-21T18:12:43Z
dc.date.issued: 2026-02
dc.date.submitted: 2026-02-10T15:39:14.430Z
dc.identifier.uri: https://hdl.handle.net/1721.1/165539
dc.description.abstract: Real estate market analysis at early analytical stages relies on evidence drawn from heterogeneous public and semi-public sources, including brokerage research, administrative datasets, planning documents, and narrative commentary. These inputs differ systematically in scope, definition, temporal framing, and institutional construction. Market indicators are frequently consumed through static reports and summary tables that obscure provenance, comparability constraints, and evidentiary gaps. As a result, analytical conclusions often depend on implicit assumptions about how market information is constructed and aligned before formal underwriting or quantitative modeling begins. This thesis develops an evidence-centric framework for structuring and inspecting real estate market information prior to inference, using large language models (LLMs) to translate unstructured report artifacts into structured evidence objects. The framework treats market indicators, narrative claims, and source dependencies as constructed analytical objects. Each observation is represented together with explicit metadata describing geographic scope, temporal reference, definitional disclosure, and upstream data dependencies. This representation supports disciplined comparison across sources and makes uncertainty, non-equivalence, and missing context explicit. The methodological contribution consists of a KPI taxonomy, a context-aware data model, and a layered processing architecture. Observations are preserved with their original construction context and aligned only when comparability conditions are satisfied. Uncertainty is encoded through coverage, recency, and dispersion indicators. Visualization functions as an interface for evidentiary inspection, enabling users to navigate indicators, examine parallel representations, and trace reported values to their sources. The framework is demonstrated through an application to U.S. multifamily market reports.
A single-source case study illustrates how brokerage reports combine headline KPIs, narrative claims, submarket tables, time-series elements, and transaction summaries within a single artifact. A multi-source case study examines contemporaneous reports for the same market and shows how differences in segmentation logic, measurement conventions, and temporal aggregation shape apparent agreement and disagreement. In both cases, structural non-equivalence remains visible as an analytical feature. The results show that explicit representation of evidentiary structure supports clearer interpretation of market claims independent of predictive modeling. The thesis positions LLM-enabled structuring as foundational infrastructure for real estate market analysis and as a prerequisite for downstream quantitative and causal research. Future work may extend the framework to additional data sources, longitudinal analysis of reporting behavior, and institutional deployment across investment workflows.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Structuring Heterogeneous Real Estate Market Evidence Using LLMs: A Provenance-Aware Analytical Framework
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Center for Real Estate. Program in Real Estate Development.
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Real Estate Development
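The abstract describes observations that carry their construction context (geographic scope, temporal reference, definitional disclosure, upstream dependencies), are aligned only when comparability conditions hold, and are summarized by coverage, recency, and dispersion indicators. The Python sketch below illustrates one possible shape for such an evidence object. Everything in it is an assumption for illustration and not the thesis's actual data model: the `Observation` fields, the `comparable` rule (same KPI, same geography, identical disclosed definition, period ends within 92 days), the `align` and `uncertainty` helpers, the indicator formulas, and the sample values and source names, which are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from statistics import pstdev
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical evidence object: a reported KPI value plus its construction context."""
    kpi: str                    # KPI name from an assumed taxonomy, e.g. "occupancy_rate"
    value: float                # value as reported by the source
    source: str                 # publisher of the report
    geography: str              # geographic scope as the source defines it
    period_end: date            # temporal reference of the reported value
    definition: Optional[str]   # disclosed definition; None means undisclosed
    dependencies: tuple[str, ...] = ()  # upstream datasets the source relied on


def comparable(a: Observation, b: Observation, max_gap_days: int = 92) -> bool:
    """Assumed comparability rule: align only if KPI, geography and a disclosed
    definition all match and the reference periods are close enough."""
    return (
        a.kpi == b.kpi
        and a.geography == b.geography
        and a.definition is not None
        and a.definition == b.definition
        and abs((a.period_end - b.period_end).days) <= max_gap_days
    )


def align(observations: list[Observation], anchor: Observation) -> list[Observation]:
    """Keep only observations comparable to the anchor; the rest stay separate."""
    return [o for o in observations if comparable(anchor, o)]


def uncertainty(group: list[Observation], expected_sources: int, as_of: date) -> dict:
    """Assumed indicator formulas for an aligned group of observations."""
    values = [o.value for o in group]
    return {
        # share of expected sources that contributed a comparable value
        "coverage": len({o.source for o in group}) / expected_sources,
        # age in days of the most recent comparable observation
        "recency_days": min((as_of - o.period_end).days for o in group),
        # population standard deviation of the aligned values
        "dispersion": pstdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical inputs: two sources share a definition, one uses a different one.
obs = [
    Observation("occupancy_rate", 95.0, "Broker A", "Metro X", date(2025, 12, 31), "physical occupancy"),
    Observation("occupancy_rate", 94.0, "Broker B", "Metro X", date(2025, 12, 31), "physical occupancy"),
    Observation("occupancy_rate", 92.0, "Broker C", "Metro X", date(2025, 12, 31), "economic occupancy"),
]
group = align(obs, obs[0])
summary = uncertainty(group, expected_sources=3, as_of=date(2026, 1, 31))
```

In this sketch the value from "Broker C" is not averaged in; it remains a separate, visibly non-equivalent observation, which mirrors the abstract's point that structural non-equivalence should stay visible rather than be smoothed away.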
