An AI middleware that bridges the gap between physical and digital data in construction — linking unstructured site evidence to the correct BIM elements, down to the GUID level.
Tech Stack: Python · Gemini 2.5 Flash · Qwen2.5-VL-7B · LoRA (Unsloth) · LangGraph · FastMCP · Neo4j · IfcOpenShell · Pydantic · Modal (A100)
Live demo: The system ingests multimodal site data (left/center) and deterministically retrieves the exact element GUID in the 3D BIM viewer (right).
AEC Interpreter is an AI middleware I built for my thesis that bridges the gap between physical and digital data in construction. Specifically, it links unstructured site evidence — photos, chat messages, floorplans — to the correct elements in a structured BIM model, down to the GUID level.
In Southeast Asian AEC, this linking is part of government submission procedures. Today it is done entirely by hand and updated perhaps once a month, causing delayed updates, information loss, and wasted manual effort. Before AI can analyze what an issue is, it must conclusively solve where it is. When a floor contains 46 geometrically identical windows, attribute-only matching faces a ~2.2% (1/46) baseline accuracy: a mathematical deadlock that attribute-based methods cannot break.
A neuro-symbolic architecture in two stages: neural perception (fine-tuned VLM extracts structured spatial cues from multimodal input) → symbolic retrieval (deterministic Cypher queries against an IFC knowledge graph in Neo4j). No LLM involvement in the retrieval path. The system can only return elements that exist in the IFC model — it cannot hallucinate a GUID.
The Stakeholder Disconnect
Construction execution is fundamentally disjointed. Developers track progress via spreadsheets, architects rely on pristine 3D models, and subcontractors report issues using messy photos and fragmented chats (e.g., a circled floorplan with "hinge issue").
The Solution
To bridge this gap, the AEC Interpreter acts as an intelligent middleware, using multimodal grounding to anchor unstructured site data into strict IFC schemas before any data entry occurs.
The Mathematical Deadlock (Attribute Entropy)
When a floor contains 46 geometrically identical windows, relying on standard Vision AI for visual matching yields a mathematical deadlock (~2.2% baseline accuracy). Before AI can analyze what an issue is, it must conclusively solve where it is. Otherwise, any digital twin update is inherently hallucinated.
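The arithmetic is unforgiving: with 46 indistinguishable candidates, attribute-only selection reduces to a uniform guess, and the residual uncertainty is exactly the information that spatial topology must supply:

$$
P(\text{hit}) = \frac{1}{46} \approx 2.2\%, \qquad H = \log_2 46 \approx 5.5 \ \text{bits}
$$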
The Visual vs. Geometric Semantics Gap
Generic Vision-Language Models rely purely on visual semantics—they look for pixels resembling a window. However, AEC operates on geometric semantics. A 6th-floor window looks identical to a 3rd-floor window. To succeed, the AI must comprehend underlying 3D architectural topology, translating 2D visual evidence into precise spatial relationships.
The Impact of Multimodal Grounding & Schema Alignment:
Ultimately, the system does not learn to probabilistically "guess" the right element; it learns to comprehend the underlying 3D building topology and its rigid architectural relationships.
To eliminate the hallucination risks inherent in parsing highly subjective visual inputs, the architecture evolved from a baseline ReAct agent (V1) to a deterministic Neuro-Symbolic pipeline (V2).
V1 was deployed as a LangGraph ReAct agent using FastMCP for dynamic IFC tool calling. While functional out of the box, it proved non-deterministic, suffered high latency (~4.5 s per API call), and remained highly vulnerable to open-domain hallucinations.
The final architecture (V2) replaces free-form LLM reasoning with an explicit constraint extraction → deterministic query planning pipeline. This bridges the semantic gap between probabilistic visual perception and rigid geometric databases.
A highly optimized Vision-Language Model (Qwen2.5-VL-7B fine-tuned via LoRA) digests noisy, multimodal site inputs and extracts precise structural constraints, deciding between predicates such as ADJACENT_TO or CONTINUOUS based on actual visual evidence.
The Pydantic Contract (The Bridge)
The Neuro layer is not allowed to generate free text. It must output a strictly typed Pydantic schema. This acts as the impenetrable boundary between the probabilistic AI perception and the deterministic backend.
```json
{
  "storey_name": "6 - Sixth Floor",
  "ifc_class": "IfcWindow",
  "spatial_relation": "ADJACENT_TO",
  "spatial_relation_neighbor_type": "IfcColumn"
}
```
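A minimal sketch of that contract in Pydantic (field names follow the payload above; the enum members and class name are illustrative assumptions):

```python
from enum import Enum
from pydantic import BaseModel

class SpatialPredicate(str, Enum):
    # Assumed to mirror the enriched graph's edge types
    ADJACENT_TO = "ADJACENT_TO"
    CONNECTS_TO = "CONNECTS_TO"
    NEXT_TO = "NEXT_TO"

class ElementConstraint(BaseModel):
    """Typed boundary between probabilistic perception and the symbolic backend."""
    storey_name: str
    ifc_class: str
    spatial_relation: SpatialPredicate
    spatial_relation_neighbor_type: str

# Malformed or free-text VLM output fails here, before it reaches the compiler.
constraint = ElementConstraint.model_validate_json(
    '{"storey_name": "6 - Sixth Floor", "ifc_class": "IfcWindow", '
    '"spatial_relation": "ADJACENT_TO", "spatial_relation_neighbor_type": "IfcColumn"}'
)
```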
The extracted JSON schema passes through a Python template compiler to generate a Cypher query. Crucially, there is zero LLM involvement in this final step. The deterministic graph traversal runs directly against the Neo4j/IFC graph database, guaranteeing 100% ontological compliance and zero hallucination.
```cypher
// Auto-compiled Cypher query executed against Neo4j
MATCH (window:IfcWindow)-[:CONTAINED_IN]->(wall:IfcWall),
      (column:IfcColumn)-[:ADJACENT_TO]-(wall),
      (window)-[:ADJACENT_TO]-(column)
RETURN window.GlobalId AS TargetWindowID
```
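A sketch of what that template compiler could look like (illustrative, reusing the `ElementConstraint` model above; the storey-containment pattern is an assumption):

```python
# Illustrative constraint -> Cypher compilation: plain string templating,
# no LLM anywhere in this step. Cypher cannot parameterize labels, so classes
# and predicates are interpolated from validated enums/whitelists, while
# free-text values like the storey name are bound as query parameters.
CYPHER_TEMPLATE = """
MATCH (target:{ifc_class})-[:CONTAINED_IN*]->(storey:IfcBuildingStorey {{Name: $storey}}),
      (target)-[:{predicate}]-(neighbor:{neighbor_type})
RETURN target.GlobalId AS TargetID
"""

def compile_query(c: ElementConstraint) -> tuple[str, dict]:
    query = CYPHER_TEMPLATE.format(
        ifc_class=c.ifc_class,                       # validated IFC class
        predicate=c.spatial_relation.value,          # enum: only real edge types
        neighbor_type=c.spatial_relation_neighbor_type,
    )
    return query, {"storey": c.storey_name}
```

Because only whitelisted labels and bound parameters can appear, a wrong extraction yields an empty result set, never an invented GUID.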
IFC Knowledge Graph Enrichment: The base IFC graph (left) contains only vertical containment and fill relations. The enriched graph (right) adds lateral topology edges — CONNECTS_TO, ADJACENT_TO, NEXT_TO — enabling multi-hop spatial queries that disambiguate geometrically identical elements.
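For illustration, lateral edges like ADJACENT_TO could be materialized with a bounding-box tolerance test (the node property names and the distance heuristic here are assumptions; the actual enrichment derives topology from IFC geometry):

```python
from neo4j import GraphDatabase

# Sketch: connect same-storey elements whose pre-computed axis-aligned
# bounding boxes overlap within a tolerance. The minx..maxy properties are
# assumed to be written onto element nodes during IFC import.
ENRICH_ADJACENT = """
MATCH (a)-[:CONTAINED_IN]->(:IfcBuildingStorey)<-[:CONTAINED_IN]-(b)
WHERE id(a) < id(b)
  AND a.minx < b.maxx + $tol AND b.minx < a.maxx + $tol
  AND a.miny < b.maxy + $tol AND b.miny < a.maxy + $tol
MERGE (a)-[:ADJACENT_TO]-(b)
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo4j"))
with driver.session() as session:
    session.run(ENRICH_ADJACENT, tol=0.05)  # 5 cm tolerance
```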
Because real construction site data is highly confidential and lacks structured spatial annotations, I engineered a fully automated synthetic data pipeline—the "Skeleton-Skin" architecture—to train the Neuro layer with zero manual labeling.
The "Skeleton-Skin" Concept
The pipeline uses deterministic IFC geometry (the "skeleton") to mine ground-truth topological triplets. It then uses headless Blender and Gemini to wrap this geometry in a noisy, multimodal "skin" (photoreal site photos, floorplan patches, and fragmented chat logs) to simulate real-world entropy.
The automated multimodal data generation pipeline (steps 1–2 are sketched in code after this list):
1. IFC Parsing: Extract element index (attributes, storey, class) via IfcOpenShell.
2. Skeleton Mining: Extract ground-truth spatial triplets (e.g., Window FILLS Wall).
3. Spatial Cropping: Render relation-crops via headless Blender (Subject + Anchor).
4. Skin Generation: Gemini 2.5 Flash converts wireframes → photorealistic site photos.
5. Context Assembly: Generate matplotlib floorplan patches from 2D IFC geometries.
6. ChatML Formatting: Compile multimodal inputs into Qwen2.5-VL LoRA training data.
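Steps 1–2 condensed into code (a sketch using the public IfcOpenShell API; the real pipeline covers more element classes and relation types):

```python
import ifcopenshell
import ifcopenshell.util.element

model = ifcopenshell.open("building.ifc")

# Step 1 — element index: GUID, class, and containing storey.
index = []
for window in model.by_type("IfcWindow"):
    storey = ifcopenshell.util.element.get_container(window)
    index.append({
        "guid": window.GlobalId,
        "ifc_class": window.is_a(),
        "storey_name": storey.Name if storey else None,
    })

# Step 2 — skeleton mining: ground-truth FILLS triplets.
# A window fills an opening (IfcRelFillsElement); the opening voids a wall
# (IfcRelVoidsElement, reached via the inverse attribute VoidsElements).
triplets = []
for rel in model.by_type("IfcRelFillsElement"):
    window = rel.RelatedBuildingElement
    for void in rel.RelatingOpeningElement.VoidsElements:
        wall = void.RelatingBuildingElement
        triplets.append((window.GlobalId, "FILLS", wall.GlobalId))
```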
LoRA Fine-Tuning Details
The model was fine-tuned to map the noisy "skin" inputs strictly back to the topological "skeleton" constraints.
- Model: Qwen2.5-VL-7B-Instruct (4-bit)
- Adapter: LoRA (r=16, alpha=32)
- Dataset: 1,377 multimodal cases
- Hyperparameters: 5 epochs, LR 2e-4
- Batch size: 16 (effective)
- Hardware: Modal A100 (40 GB)
- Latency: ~1 s (local LoRA) vs. ~4.5 s (API)
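The corresponding Unsloth setup, roughly (a sketch reconstructed from the numbers above; exact flags and the trainer loop are omitted and may differ):

```python
from unsloth import FastVisionModel

# Load Qwen2.5-VL-7B in 4-bit and attach the LoRA adapter (r=16, alpha=32).
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen2.5-VL-7B-Instruct",
    load_in_4bit=True,
)
model = FastVisionModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    finetune_vision_layers=True,    # grounding depends on the vision tower
    finetune_language_layers=True,  # strict JSON emission depends on the LM
)
# Training (per the table): 5 epochs, LR 2e-4, effective batch size 16
# (e.g., per-device batch 4 x gradient accumulation 4) on a Modal A100 40GB.
```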
The Schema Contract (Output)
The fine-tuned VLM's output is strictly constrained to this JSON format, acting as the deterministic blueprint for the Cypher graph query.
```json
{
  "storey_name": "3 - Third Floor",
  "ifc_class": "IfcWindow",
  "spatial_relations": [
    {
      "predicate": "ADJACENT_TO",
      "object_type": "IfcRailing",
      "confidence": 1.0
    }
  ]
}
```
Evaluation is designed across two tiers matching the pipeline stages: Neuro (VLM extraction quality) → Symbolic (graph retrieval accuracy). The unified evaluation spans 116 test cases across 3 IFC models (AP: 1,233 elements, BH: 53 elements, DXA: 258 elements).
The most important result. The GT-Reverse Oracle looks up each GT element's actual attributes and outgoing edges in Neo4j, then runs Cypher queries using those ground-truth properties. This establishes the theoretical upper bound: under perfect extraction, the symbolic layer achieves 100% GT-in-Pool at all spatial stages, compressing the candidate pool from 917 to 31 — a 96.6% reduction with perfect recall.
| Stage | n | Avg Pool | Reduction | GT-in-Pool |
|---|---|---|---|---|
| Full elements | 100 | 1044 | — | 100% |
| P1 (storey+type) | 100 | 57 | −95% | 99% |
| 1-hop spatial | 78 | 39 | −31% | 100% |
| 2-hop spatial | 71 | 33 | −16% | 100% |
| 2-hop + material | 69 | 31 | −6% | 100% |
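Conceptually, the Oracle is the live pipeline with perfect perception: it reads the ground-truth element's real properties and edges from the graph and feeds them to the same query compiler (a pseudocode-level sketch; `build_constraint` and the returned fields are hypothetical):

```python
def oracle_pool(gt_guid: str, session) -> list[str]:
    # 1. Look up the GT element's actual attributes and outgoing edges in Neo4j.
    record = session.run(
        "MATCH (e {GlobalId: $guid})-[r]-(n) "
        "RETURN e, collect({rel: type(r), neighbor: labels(n)}) AS edges",
        guid=gt_guid,
    ).single()
    # 2. Re-run the symbolic layer with these ground-truth constraints
    #    instead of VLM predictions (hypothetical helper).
    constraint = build_constraint(record["e"], record["edges"])
    query, params = compile_query(constraint)  # same compiler as the live path
    return [rec["TargetID"] for rec in session.run(query, params)]
```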
This establishes that the symbolic layer is correct by construction. The gap between the Oracle (100%) and the best learned model (42.2%) is entirely due to VLM extraction errors, a solvable problem rather than an architectural one.
Four extraction systems compared on the same unified test set with the optimal p0∪p1 (union) query strategy:
| System | GT-in-Pool | Top-1 | MRR@10 | Avg Pool |
|---|---|---|---|---|
| LoRA₅-r32 (fine-tuned, spatial) | 42.2% | 3.4% | 0.057 | 70 |
| LoRA₅-r16 (fine-tuned, spatial) | 41.4% | 4.3% | 0.074 | 68 |
| Gemini 2.5 Flash (zero-shot prompt) | 36.2% | 4.3% | 0.075 | 56 |
| LoRA₂ (attribute-only, no spatial) | 28.4% | 1.7% | 0.050 | 60 |
Key takeaways: the best model retains the correct element in its candidate pool in 42% of cases, within an average pool of ~70 candidates (down from over 900 total elements). Fine-tuned spatial extraction outperforms zero-shot Gemini by +6pp and attribute-only LoRA₂ by +13.8pp, confirming that topology-based retrieval adds meaningful value beyond attribute filtering alone.
Qualitative Example — LoRA vs. Zero-Shot:
| LoRA (Fine-Tuned) → CORRECT ✓ | Gemini (Zero-Shot) → HALLUCINATION ✗ |
|---|---|
| ![]() | ![]() |

Same multimodal input, processed by the fine-tuned LoRA model (left) and the zero-shot Gemini model (right).
Every failure is traceable to a specific pipeline stage — a key advantage of the neuro-symbolic decomposition over black-box neural retrieval:
| Failure Category | % | Root Cause | Improvement Path |
|---|---|---|---|
| ifc_class wrong | 39% | VLM predicts wrong element type → Cypher filters incorrectly | Class-balanced training, confusion-aware hard negatives |
| GT in pool, not Top-1 | 24% | Retrieval works but no learned reranker | Attribute-matching scorer (storey +1, type +1, keyword +1; sketched after this table) |
| Storey wrong | 22% | Wrong floor → wrong candidate pool | Floor-indicator visual training, metadata fusion |
| Large pool, no discrimination | 10% | Correct predicate but pool too large for ranking | Tighter spatial constraints, multi-hop refinement |
| Top-1 success ✓ | 5% | — | — |
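The attribute-matching scorer named in the table could be as small as this sketch (reusing the `ElementConstraint` model above; `keywords` would come from the chat text, and all names are illustrative):

```python
def rerank(candidates: list[dict], constraint: ElementConstraint,
           keywords: list[str]) -> list[dict]:
    """Order a candidate pool by cheap attribute agreement: +1 per signal."""
    def score(cand: dict) -> int:
        s = 0
        if cand.get("storey_name") == constraint.storey_name:
            s += 1                                   # storey +1
        if cand.get("ifc_class") == constraint.ifc_class:
            s += 1                                   # type +1
        name = (cand.get("name") or "").lower()
        if any(kw in name for kw in keywords):
            s += 1                                   # keyword +1
        return s
    return sorted(candidates, key=score, reverse=True)
```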
Layer 1 — Schema Alignment (VLM output): the Pydantic contract rejects any response that is not a strictly typed constraint object, so free-text hallucinations never reach the query compiler.
Layer 2 — Symbolic Guardrail (Neo4j execution): every returned GUID must exist in the IFC graph; constraints that reference non-existent topology return an empty result set instead of a fabricated element.
The system does not depend on VLM self-restraint for hallucination resistance. The symbolic layer acts as a post-hoc validator: wrong constraints either produce empty results (caught) or wrong pools (addressable via confidence gating).
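Confidence gating could be as simple as thresholding the `confidence` field already present in the output schema (the threshold and planner helpers here are hypothetical):

```python
TAU = 0.8  # hypothetical gating threshold

# Trust spatial triplets only when the VLM is confident; otherwise fall back
# to the storey+type plan (p1), the safety net described in the next section.
trusted = [r for r in prediction["spatial_relations"] if r["confidence"] >= TAU]
plan = spatial_plan(prediction, trusted) if trusted else storey_type_plan(prediction)
```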
A critical engineering finding: when spatial extraction accuracy is imperfect, the query strategy matters as much as extraction quality.
| Strategy | LoRA₅-r32 GT-in-Pool | Description |
|---|---|---|
| p0 ∩ p1 (intersection) | 25.9% | Spatial AND storey+type — over-prunes when spatial is wrong |
| p0 ∪ p1 (union) | 42.2% | Spatial OR storey+type — preserves safety net, +16.3pp recovery |
Key insight: spatial triplets at current VLM accuracy should be used for pool-size compression (avg 40 vs. 68 candidates), not GT discovery. Storey+type remains the safety net for GT retention. When one retrieval signal is less mature, use union, not intersection.
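In code, the union strategy is a set union over the two compiled queries' result pools (a sketch; `run` and the two compilers are hypothetical wrappers):

```python
pool_p0 = set(run(compile_spatial(prediction)))      # spatial triplets
pool_p1 = set(run(compile_storey_type(prediction)))  # storey + type safety net

pool = pool_p0 | pool_p1   # union: +16.3pp GT-in-Pool over intersection,
                           # since p0 & p1 over-prunes when any triplet is wrong
```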
Finally, the comparison surfaced a fine-tuning trade-off: gains in spatial-triplet extraction came with a drop in attribute accuracy (ifc_class, the largest failure category above). Because the compiled Cypher filters on exactly those attributes (e.g., `WHERE node.ifc_type = $type`), this creates a paradox: spatial capability gained at the cost of the attribute accuracy it depends on. Addressable via higher rank (r=32), staged training, or separate extractors.