Ruohan Gao

Assistant Professor
Department of Computer Science, University of Maryland, College Park

Office: IRB-4248
Email: rhgao[AT]umd.edu

I am an assistant professor in the Department of Computer Science at the University of Maryland, College Park, where I lead the UMD Multisensory Machine Intelligence Lab. I am also affiliated with the University of Maryland Institute for Advanced Computer Studies (UMIACS), the Maryland Robotics Center (MRC), and the Artificial Intelligence Interdisciplinary Institute at Maryland (AIM).

I received my Ph.D. in Computer Science from The University of Texas at Austin, advised by Kristen Grauman, and then spent two years as a postdoc at the Stanford Vision and Learning Lab, working with Fei-Fei Li, Jiajun Wu, and Silvio Savarese.

My research primarily focuses on computer vision and machine learning with a particular emphasis on multisensory machine intelligence involving sight, sound, and touch. The overarching goal of my research is to empower machines to emulate and enhance human capabilities in seeing, hearing, and feeling, ultimately enabling them to comprehensively perceive, understand, and interact with the multisensory world.

Prospective Students: I am always seeking self-motivated students to join my group. If you are interested, here is some more information.

Selected Publications [full list]

2025

Differentiable Room Acoustic Rendering with Multi-View Vision Priors

Derong Jin and Ruohan Gao

International Conference on Computer Vision (ICCV), 2025

Bib PDF Video Project Page

@inproceedings{jin2025avdar,
  title = {Differentiable Room Acoustic Rendering with Multi-View Vision Priors},
  author = {Jin, Derong and Gao, Ruohan},
  booktitle = {International Conference on Computer Vision (ICCV)},
  year = {2025},
}

Hearing Anywhere in Any Environment

Xiulong Liu, Anurag Kumar, Paul Calamia, Sebastià V. Amengual Garí, Calvin Murdock, Ishwarya Ananthabhotla, Philip Robinson, Eli Shlizerman, Vamsi Krishna Ithapu, and Ruohan Gao

Conference on Computer Vision and Pattern Recognition (CVPR), 2025

Bib PDF Dataset Project Page

@inproceedings{liu2025haae,
  title = {Hearing Anywhere in Any Environment},
  author = {Liu, Xiulong and Kumar, Anurag and Calamia, Paul and Amengual Garí, Sebastià V. and Murdock, Calvin and Ananthabhotla, Ishwarya and Robinson, Philip and Shlizerman, Eli and Ithapu, Vamsi Krishna and Gao, Ruohan},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2025},
}

2024

Hearing Anything Anywhere

Mason L. Wang*, Ryosuke Sawata*, Samuel Clarke, Ruohan Gao, Shangzhe Wu, and Jiajun Wu

Conference on Computer Vision and Pattern Recognition (CVPR), 2024

Bib PDF Code Dataset Video Project Page

@inproceedings{wang2024haa,
  title = {Hearing Anything Anywhere},
  author = {Wang, Mason L. and Sawata, Ryosuke and Clarke, Samuel and Gao, Ruohan and Wu, Shangzhe and Wu, Jiajun},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2024},
}

2023

The ObjectFolder Benchmark: Multisensory Object-Centric Learning with Neural and Real Objects

Ruohan Gao*, Yiming Dou*, Hao Li*, Tanmay Agarwal, Jeannette Bohg, Yunzhu Li, Li Fei-Fei, and Jiajun Wu

Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Bib PDF Code Video Project Page Interactive Demo

@inproceedings{gao2023ObjectFolderBM,
  title = {The ObjectFolder Benchmark: Multisensory Object-Centric Learning with Neural and Real Objects},
  author = {Gao, Ruohan and Dou, Yiming and Li, Hao and Agarwal, Tanmay and Bohg, Jeannette and Li, Yunzhu and Fei-Fei, Li and Wu, Jiajun},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023},
}

RealImpact: A Dataset of Impact Sound Fields for Real Objects

Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug James, and Jiajun Wu

Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Highlight Paper
Bib PDF Supp Code Video Project Page

@inproceedings{clarke2023realimpact,
  title = {RealImpact: A Dataset of Impact Sound Fields for Real Objects},
  author = {Clarke, Samuel and Gao, Ruohan and Wang, Mason and Rau, Mark and Xu, Julia and Wang, Jui-Hsien and James, Doug and Wu, Jiajun},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2023},
}

2022

See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation

Hao Li*, Yizhi Zhang*, Junzhe Zhu, Shaoxiong Wang, Michelle A. Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao†, and Jiajun Wu†

Conference on Robot Learning (CoRL), 2022

Bib PDF Supp Video Project Page

@inproceedings{li2022seehearfeel,
  title = {See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation},
  author = {Li, Hao and Zhang, Yizhi and Zhu, Junzhe and Wang, Shaoxiong and Lee, Michelle A. and Xu, Huazhe and Adelson, Edward and Fei-Fei, Li and Gao, Ruohan and Wu, Jiajun},
  booktitle = {Conference on Robot Learning (CoRL)},
  year = {2022},
}

ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer

Ruohan Gao*, Zilin Si*, Yen-Yu Chang*, Samuel Clarke, Jeannette Bohg, Li Fei-Fei, Wenzhen Yuan, and Jiajun Wu

Conference on Computer Vision and Pattern Recognition (CVPR), 2022

Bib PDF Supp Dataset Project Page

@inproceedings{gao2022ObjectFolderV2,
  title = {ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer},
  author = {Gao, Ruohan and Si, Zilin and Chang, Yen-Yu and Clarke, Samuel and Bohg, Jeannette and Fei-Fei, Li and Yuan, Wenzhen and Wu, Jiajun},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022},
}

Visual Acoustic Matching

Changan Chen, Ruohan Gao, Paul Calamia, and Kristen Grauman

Conference on Computer Vision and Pattern Recognition (CVPR), 2022

Oral Presentation
Bib PDF Code Project Page Media Coverage

@inproceedings{chen2022visual,
  title = {Visual Acoustic Matching},
  author = {Chen, Changan and Gao, Ruohan and Calamia, Paul and Grauman, Kristen},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2022},
}

2021

Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

Rishabh Garg, Ruohan Gao, and Kristen Grauman

British Machine Vision Conference (BMVC), 2021

Best Paper Award Runner-Up
Bib PDF Supp Dataset Project Page

@inproceedings{garg2021geometry,
  title = {Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video},
  author = {Garg, Rishabh and Gao, Ruohan and Grauman, Kristen},
  booktitle = {British Machine Vision Conference (BMVC)},
  year = {2021},
}

Look and Listen: From Semantic to Spatial Audio-Visual Perception

Ruohan Gao

Ph.D. Dissertation, 2021

Michael H. Granof Award, UT Austin’s Top Doctoral Dissertation
Bib PDF Media Coverage

2019

2.5D Visual Sound

Ruohan Gao and Kristen Grauman

Conference on Computer Vision and Pattern Recognition (CVPR), 2019

Best Paper Award Finalist
Bib PDF Supp Code Dataset Video Project Page Media Coverage

@inproceedings{gao2019visual-sound,
  title = {2.5D Visual Sound},
  author = {Gao, Ruohan and Grauman, Kristen},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2019},
}

2018

Learning to Separate Object Sounds by Watching Unlabeled Video

Ruohan Gao, Rogerio Feris, and Kristen Grauman

European Conference on Computer Vision (ECCV), 2018

Oral Presentation
Bib PDF Supp Code Video Poster Project Page

@inproceedings{gao2018object-sounds,
  title = {Learning to Separate Object Sounds by Watching Unlabeled Video},
  author = {Gao, Ruohan and Feris, Rogerio and Grauman, Kristen},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2018},
}