An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance
Delivering intelligent and adaptive navigation assistance in augmented reality (AR) requires more than visual cues; it demands systems capable of interpreting flexible user intent and reasoning over both spatial and semantic context. Prior AR navigation systems often rely on rigid input schemes or predefined commands, which limit the utility of rich building data and hinder natural interaction. In this work, we propose an embodied AR navigation system that integrates Building Information Modeling (BIM) with a multi-agent retrieval-augmented generation (RAG) framework to support flexible, language-driven goal retrieval and route planning. The system orchestrates three language agents, Triage, Search, and Response, built on large language models (LLMs), enabling robust interpretation of open-ended queries and spatial reasoning over BIM data. Navigation guidance is delivered through an embodied AR agent equipped with voice interaction and locomotion to enhance user experience. A real-world user study yields a System Usability Scale (SUS) score of 80.5, indicating excellent usability, and comparative evaluations show that the embodied interface can significantly improve users' perception of system intelligence. These results underscore the importance and potential of language-grounded reasoning and embodiment in the design of user-centered AR navigation systems.
Our approach combines Building Information Modeling (BIM) with a multi-agent retrieval-augmented generation (RAG) framework. The system preprocesses BIM data into a vector database using sentence transformers, enabling semantic similarity search. Three specialized agents orchestrate the navigation process: the Triage Agent classifies user queries and extracts semantic targets, the Search Agent performs vector similarity search and candidate selection using LLM reasoning, and the Response Agent generates contextually appropriate navigation instructions. An embodied AR agent delivers guidance through voice interaction, natural locomotion, and adaptive synchronization with user movement.
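To make this pipeline concrete, the sketch below shows one way the pieces could fit together: BIM elements flattened to text and embedded with a sentence transformer for similarity search, and the Triage, Search, and Response agents chained over the retrieval results. It is illustrative only; the embedding model name (`all-MiniLM-L6-v2`), the `call_llm` placeholder, the example BIM entries, and the prompt wording are assumptions rather than the system's actual implementation.

```python
# Illustrative sketch (not the authors' implementation) of the BIM + RAG
# pipeline: BIM elements are flattened to text, embedded with a sentence
# transformer, and the three LLM agents are chained over retrieval results.
from sentence_transformers import SentenceTransformer, util

# --- Offline preprocessing: BIM elements -> text -> vector database -------
# Hypothetical entries; in practice these are derived from BIM data.
bim_entries = [
    "Meeting Room A, floor 2, capacity 12 seats, projector",
    "Meeting Room B, floor 2, capacity 6 seats",
    "Cafeteria, floor 1, hot food served 11:30-14:00",
    "Vending machines, floor 1, snacks and drinks",
]
# Model name is an assumption; the text only specifies "sentence transformers".
encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_embeddings = encoder.encode(bim_entries, convert_to_tensor=True)


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for whatever LLM backend drives the agents."""
    raise NotImplementedError


# --- Triage Agent: classify the query and extract the semantic target -----
def triage(user_query: str) -> dict:
    reply = call_llm(
        "Decide whether this is a navigation request and, if so, state the "
        f"target in a few words: {user_query}"
    )
    return {"intent": "navigation", "target": reply}  # parsed in practice


# --- Search Agent: vector similarity search + LLM candidate selection -----
def search(target: str, top_k: int = 3) -> str:
    query_embedding = encoder.encode(target, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, corpus_embeddings)[0]
    top = scores.argsort(descending=True)[:top_k]
    candidates = [bim_entries[int(i)] for i in top]
    return call_llm(
        "Pick the single destination that best satisfies the request.\n"
        f"Request: {target}\nCandidates: {candidates}"
    )


# --- Response Agent: turn the chosen destination into spoken guidance -----
def respond(destination: str, user_query: str) -> str:
    return call_llm(
        f"Give concise walking guidance to '{destination}' "
        f"in reply to: {user_query}"
    )


def handle_query(user_query: str) -> str:
    parsed = triage(user_query)
    if parsed["intent"] != "navigation":
        return call_llm(user_query)  # open-ended dialogue fallback
    destination = search(parsed["target"])
    return respond(destination, user_query)


# Example of an indirect query like those shown later on this page:
# handle_query("I'm hungry, where can I find food?")
```

Keeping the vector search separate from the LLM calls mirrors the division of labor described above: the Search Agent first narrows candidates cheaply by embedding similarity, then asks the LLM to reason over a short list rather than the full BIM database.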
| Measure | Median Score | p-value | Effect Size (r) | Preference Direction |
|---|---|---|---|---|
| Clarity & Usability | 4.0 | .002 ** | 0.81 | Agent > Arrow |
| Engagement & Enjoyment | 5.0 | < .001 *** | 1.00 | Agent > Arrow |
| Perceived Intelligence | 5.0 | < .001 *** | 1.00 | Agent > Arrow |
| Trustworthiness | 4.0 | .038 * | 0.61 | Agent > Arrow |
| Cognitive Load | 4.0 | .040 * | 0.57 | Agent > Arrow |
User study results (N = 20): Wilcoxon signed-rank tests comparing the embodied agent with the arrow-only interface (neutral midpoint = 3). Significance levels: * p < .05, ** p < .01, *** p < .001.
The system achieved a mean SUS score of 80.5 (SD = 11.5), indicating excellent usability. Participants gave high ratings for ease of use (4.35), learnability (4.65), and system integration (4.25). Labels marked with an asterisk (*) indicate reverse-scored items.
Examples of successful goal retrieval and navigation route generation, demonstrating the system's ability to interpret natural language queries in context, including indirect queries such as "I'm hungry, where can I find food?" and context-rich queries such as "meeting room with more than 10 seats".
Visualizations of user trajectories and navigation paths traversed by the embodied agent.
Purple lines represent navigation routes; aqua blue lines indicate actual user walking trajectories.
@inproceedings{embodied_ar_navigation_2025,
title={An Embodied AR Navigation Agent: Integrating BIM with Retrieval-Augmented Generation for Language Guidance},
author={Yang, Hsuan-Kung and Hsiao, Tsu-Ching and Oka, Ryoichiro and Nishino, Ryuya and Tofukuji, Satoko and Kobori, Norimasa},
booktitle={Proceedings of the IEEE International Symposium on Mixed and Augmented Reality (ISMAR)},
year={2025},
}