I recently finished my PhD at the University of Toronto, where I worked on robot control using RL, diffusion, and multi-modal language models with Dr. Goldie Nejat.
If you would like to chat about my work, feel free to book an open slot here.
We introduce a novel Hand-drawn Map Navigation (HAMNav) architecture that leverages pre-trained vision language models for robot navigation across diverse environments, hand-drawing styles, and robot embodiments, even in the presence of map inaccuracies.
4CNet: A Diffusion Approach to Map Prediction for Decentralized Multi-Robot Exploration
Aaron Hao Tan, Siddarth Narasimhan, Goldie Nejat
Under Review at T-RO, 2024
Paper / Video
We present a novel robot exploration map prediction method, the Confidence-Aware Contrastive Conditional Consistency Model (4CNet), which predicts (foresees) unknown spatial configurations in unstructured multi-robot environments with irregularly shaped obstacles.
We present MLLM-Search, a novel multimodal language model approach to the robotic person search problem in event-driven scenarios where user schedules are incomplete or unavailable. Our method enables zero-shot person search by using language models for spatial reasoning.
OLiVia-Nav: An Online Lifelong Vision Language Approach for Mobile Robot Social Navigation
Siddarth Narasimhan, Aaron Hao Tan, Daniel Choi, Goldie Nejat
CoRL Workshop: Lifelong Learning for Home Robots (Spotlight Presentation), 2024; Under Review at ICRA 2025
Paper / Poster / Video / Talk
We introduce OLiVia-Nav, an online lifelong vision language architecture for mobile robot social navigation. By leveraging large vision-language models and a novel distillation process called SC-CLIP, OLiVia-Nav efficiently encodes social and environmental contexts, adapting to dynamic human environments.
Find Everything: A General Vision Language Model Approach to Multi-Object Search
Daniel Choi, Angus Fung, Haitong Wang, Aaron Hao Tan
CoRL Workshop: Language and Robot Learning, 2024; Under Review at ICRA 2025
Paper / Website / Video / Code / Poster
We present Finder, a novel approach to the multi-object search problem that leverages vision language models to efficiently locate multiple objects in diverse unknown environments. Our method combines semantic mapping with spatio-probabilistic reasoning and adaptive planning, improving object recognition and scene understanding through VLMs.
NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments
Haitong Wang, Aaron Hao Tan, Goldie Nejat
IEEE Robotics and Automation Letters, 2024
Paper / Video
We propose NavFormer, a novel end-to-end deep learning architecture consisting of a dual-visual-encoder module and a transformer-based navigation network, addressing for the first time the problem of target-driven navigation (TDN) in unknown and dynamic environments.
We present the first Macro Action Decentralized Exploration Network (MADE-Net), which uses multi-agent deep reinforcement learning to address communication dropouts during multi-robot exploration in unseen, unstructured, and cluttered environments.
We introduce a robotic pillow placement system using a static 6-DOF manipulator, leveraging YOLOv4-tiny, image transformations, and PCA to infer pillow poses and execute macro-actions.
Enhancing Robot Task Completion Through Environment and Task Inference: A Survey from the Mobile Robot Perspective
Aaron Hao Tan, Goldie Nejat
Journal of Intelligent and Robotic Systems, 2022
Paper
We present the first extensive investigation of mobile robot inference problems in unknown environments with limited sensing and communication range, and propose a new taxonomy that classifies environment and task inference methods for single- and multi-robot systems.