Aaron Hao Tan

I recently completed my PhD at the University of Toronto, where I worked on robot control using reinforcement learning, diffusion models, and multi-modal language models, advised by Dr. Goldie Nejat.

If you would like to chat about my work, feel free to book an open slot here.

Updated: 12/24

Email  /  X  /  Scholar

profile photo
Selected Works

Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach
Aaron Hao Tan, Angus Fung, Haitong Wang, Goldie Nejat
Under Review at RAL, 2024
Paper / Video

We introduce a novel Hand-drawn Map Navigation (HAMNav) architecture that leverages pre-trained vision language models for robot navigation across diverse environments, hand-drawing styles, and robot embodiments, even in the presence of map inaccuracies.

4CNet: A Diffusion Approach to Map Prediction for Decentralized Multi-Robot Exploration
Aaron Hao Tan, Siddarth Narasimhan, Goldie Nejat
Under Review at T-RO, 2024
Paper / Video

We present 4CNet, the Confidence-Aware Contrastive Conditional Consistency Model, a novel map prediction method for robot exploration that foresees unknown spatial configurations in unstructured multi-robot environments with irregularly shaped obstacles.

MLLM-Search: A Zero-Shot Approach to Finding People using Multimodal Large Language Models
Angus Fung, Aaron Hao Tan, Haitong Wang, Beno Benhabib, Goldie Nejat
Under Review at RAL, 2024
Paper / Video

We present MLLM-Search, a novel multimodal language model approach to address the robotic person search problem under event-driven scenarios with incomplete or unavailable user schedules. Our method introduces zero-shot person search using language models for spatial reasoning.

OLiVia-Nav: An Online Lifelong Vision Language Approach for Mobile Robot Social Navigation
Siddarth Narasimhan, Aaron Hao Tan, Daniel Choi, Goldie Nejat
CoRL Workshop: Lifelong Learning for Home Robots (Spotlight Presentation), 2024
Under Review at ICRA 2025
Paper / Poster / Video / Talk

We introduce OLiVia-Nav, an online lifelong vision language architecture for mobile robot social navigation. By leveraging large vision-language models and a novel distillation process called SC-CLIP, OLiVia-Nav efficiently encodes social and environmental contexts, adapting to dynamic human environments.

Find Everything: A General Vision Language Model Approach to Multi-Object Search
Daniel Choi, Angus Fung, Haitong Wang, Aaron Hao Tan
CoRL Workshop: Language and Robot Learning, 2024
Under Review at ICRA 2025
Paper / Website / Video / Code / Poster

We present Finder, a novel approach to the multi-object search problem that leverages vision language models to efficiently locate multiple objects in diverse unknown environments. Our method combines semantic mapping with spatio-probabilistic reasoning and adaptive planning, improving object recognition and scene understanding through VLMs.

NavFormer: A Transformer Architecture for Robot Target-Driven Navigation in Unknown and Dynamic Environments
Haitong Wang, Aaron Hao Tan, Goldie Nejat
IEEE Robotics and Automation Letters, 2024
Paper / Video

We propose NavFormer, a novel end-to-end deep learning architecture consisting of a dual-visual-encoder module and a transformer-based navigation network, addressing for the first time the problem of target-driven navigation (TDN) in unknown and dynamic environments.

Deep Reinforcement Learning for Decentralized Multi-Robot Exploration with Macro Actions
Aaron Hao Tan, Federico Pizarro Bejarano, Yuhan Zhu, Richard Ren, Goldie Nejat
IEEE Robotics and Automation Letters + ICRA, 2023
Paper / Video / Talk / Poster

We present MADE-Net, the first Macro Action Decentralized Exploration Network, which uses multi-agent deep reinforcement learning to address communication dropouts during multi-robot exploration in unseen, unstructured, and cluttered environments.

Development of a Pillow Placement Process for Robotic Bed-Making
Chi-Hong Cheung, Aaron Hao Tan, Andrew Goldenberg
IEEE/ASME MESA, 2023  
Paper (Access)

We introduce a robotic pillow placement system using a static 6-DOF manipulator, leveraging YOLOv4-tiny, image transformations, and PCA to infer pillow poses and execute macro-actions.

Enhancing Robot Task Completion Through Environment and Task Inference: A Survey from the Mobile Robot Perspective
Aaron Hao Tan, Goldie Nejat
Journal of Intelligent and Robotic Systems, 2022
Paper

We present the first extensive investigation of mobile robot inference problems in unknown environments with limited sensing and communication range, and propose a new taxonomy that classifies environment and task inference methods for single- and multi-robot systems.

A Sim-to-Real Pipeline for Deep Reinforcement Learning for Autonomous Robot Navigation in Cluttered Rough Terrain
Han Hu, Kaicheng Zhang, Aaron Hao Tan, Michael Ruan, Christopher Agia, Goldie Nejat
IEEE Robotics and Automation Letters + IROS, 2021  
Paper / Talk / Video / Baseline Demo

We develop a novel sim-to-real pipeline that enables a mobile robot to effectively learn to navigate real-world 3D rough-terrain environments.

Design and development of a novel autonomous scaled multiwheeled vehicle
Aaron Hao Tan, Michael Peiris, Moustafa El-Gindy, Haoxiang Lang
Robotica, 2021  
ASME IDETC/CIE, 2019  
Journal (Access) / Conference (Access) / Slides / Video

A 1:6-scale, multi-wheeled mobile robotic platform with independent suspension, steering, and actuation for off-road operations.


8/24 Forever