Structuring Heterogeneous Real Estate Market Evidence Using LLMs: A Provenance-Aware Analytical Framework

Hahmann, Luca; Xie, Richard

dc.contributor.advisor	Torous, Walter
dc.contributor.author	Hahmann, Luca
dc.contributor.author	Xie, Richard
dc.date.accessioned	2026-04-21T18:12:43Z
dc.date.available	2026-04-21T18:12:43Z
dc.date.issued	2026-02
dc.date.submitted	2026-02-10T15:39:14.430Z
dc.identifier.uri	https://hdl.handle.net/1721.1/165539
dc.description.abstract	Real estate market analysis at early analytical stages relies on evidence drawn from heterogeneous public and semi-public sources, including brokerage research, administrative datasets, planning documents, and narrative commentary. These inputs differ systematically in scope, definition, temporal framing, and institutional construction. Market indicators are frequently consumed through static reports and summary tables that obscure provenance, comparability constraints, and evidentiary gaps. As a result, analytical conclusions often depend on implicit assumptions about how market information is constructed and aligned before formal underwriting or quantitative modeling begins. This thesis develops an evidence-centric framework for structuring and inspecting real estate market information prior to inference, using large language models (LLMs) to translate unstructured report artifacts into structured evidence objects. The framework treats market indicators, narrative claims, and source dependencies as constructed analytical objects. Each observation is represented together with explicit metadata describing geographic scope, temporal reference, definitional disclosure, and upstream data dependencies. This representation supports disciplined comparison across sources and makes uncertainty, non-equivalence, and missing context explicit. The methodological contribution consists of a KPI taxonomy, a context-aware data model, and a layered processing architecture. Observations are preserved with their original construction context and aligned only when comparability conditions are satisfied. Uncertainty is encoded through coverage, recency, and dispersion indicators. Visualization functions as an interface for evidentiary inspection, enabling users to navigate indicators, examine parallel representations, and trace reported values to their sources. The framework is demonstrated through an application to U.S. multifamily market reports. 
A single-source case study illustrates how brokerage reports combine headline KPIs, narrative claims, submarket tables, time-series elements, and transaction summaries within a single artifact. A multi-source case study examines contemporaneous reports for the same market and shows how differences in segmentation logic, measurement conventions, and temporal aggregation shape apparent agreement and disagreement. In both cases, structural non-equivalence remains visible as an analytical feature. The results show that explicit representation of evidentiary structure supports clearer interpretation of market claims independent of predictive modeling. The thesis positions LLM-enabled structuring as foundational infrastructure for real estate market analysis and as a prerequisite for downstream quantitative and causal research. Future work may extend the framework to additional data sources, longitudinal analysis of reporting behavior, and institutional deployment across investment workflows.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Structuring Heterogeneous Real Estate Market Evidence Using LLMs: A Provenance-Aware Analytical Framework
dc.type	Thesis
dc.description.degree	S.M.
dc.contributor.department	Massachusetts Institute of Technology. Center for Real Estate. Program in Real Estate Development.
mit.thesis.degree	Master
thesis.degree.name	Master of Science in Real Estate Development
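
The abstract describes observations that carry explicit metadata on geographic scope, temporal reference, definitional disclosure, and upstream data dependencies, and that are aligned only when comparability conditions are satisfied. The following is a minimal Python sketch of that idea, not the authors' implementation: the class name `EvidenceObject`, the function `comparability_issues`, all field names, and the sample values ("Report A", "Metro X", and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EvidenceObject:
    """One reported observation plus the context it was constructed in.

    All field names are illustrative assumptions, not the thesis schema.
    """
    kpi: str                       # indicator name, e.g. "vacancy_rate"
    value: float                   # value as reported by the source
    source: str                    # report or dataset the value came from
    geographic_scope: str          # market or submarket label as reported
    period_start: date             # start of the temporal reference
    period_end: date               # end of the temporal reference
    definition: Optional[str]      # disclosed definition, None if undisclosed
    upstream_sources: Tuple[str, ...] = ()  # upstream data dependencies


def comparability_issues(a: EvidenceObject, b: EvidenceObject) -> List[str]:
    """Return the reasons two observations should not be aligned directly.

    An empty list means none of the checks below found a mismatch.
    """
    issues = []
    if a.kpi != b.kpi:
        issues.append("different KPI")
    if a.geographic_scope != b.geographic_scope:
        issues.append("different geographic scope")
    if (a.period_start, a.period_end) != (b.period_start, b.period_end):
        issues.append("different temporal reference")
    if a.definition is None or b.definition is None:
        issues.append("definition not disclosed")
    elif a.definition != b.definition:
        issues.append("different definitions")
    return issues


# Hypothetical observations from two placeholder reports.
obs_a = EvidenceObject(
    kpi="vacancy_rate", value=5.0, source="Report A",
    geographic_scope="Metro X",
    period_start=date(2025, 1, 1), period_end=date(2025, 3, 31),
    definition="stabilized properties only",
)
obs_b = EvidenceObject(
    kpi="vacancy_rate", value=5.5, source="Report B",
    geographic_scope="Metro X",
    period_start=date(2025, 1, 1), period_end=date(2025, 3, 31),
    definition=None,
)

print(comparability_issues(obs_a, obs_b))
```

In this sketch, a mismatch is returned as a named reason instead of being silently resolved, which mirrors the abstract's point that non-equivalence and missing context should stay visible to the analyst.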

Files in this item

Name: hahmann_lucaha_xie_xrichard_ms ...
Size: 2.757Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses