
Ruohan Gao

Research Scientist
Meta Reality Labs

Incoming Assistant Professor
Department of Computer Science, University of Maryland, College Park

Email: rhgao[AT]

I am currently a Research Scientist at Meta Reality Labs. Previously, I received my Ph.D. in Computer Science from The University of Texas at Austin, advised by Kristen Grauman, and then spent two years as a postdoc in the Stanford Vision and Learning Lab, working with Fei-Fei Li, Jiajun Wu, and Silvio Savarese.

My research primarily focuses on computer vision and machine learning, with a particular emphasis on multisensory learning involving sight, sound, and touch. The overarching goal of my research is to empower machines to emulate and enhance human capabilities in seeing, hearing, and feeling, ultimately enabling them to comprehensively perceive, understand, and engage with the intricacies of the multisensory world.

Prospective Students: I am always seeking self-motivated students to join my UMD Multisensory Machine Intelligence Group. If you are interested, here is some more information.


  • I will be joining the Department of Computer Science at the University of Maryland, College Park (UMD) as an Assistant Professor in late 2024.
  • We are organizing the Sight and Sound Workshop at CVPR 2024.
  • I serve as an Area Chair for ICCV 2023 and as a Senior Program Committee (SPC) member for AAAI 2023 and 2024.
  • We are organizing the AV4D Workshop at ICCV 2023.
  • We are organizing the Creative AI Across Modalities Workshop at AAAI 2023.
  • We are organizing the Embodied Multimodal Learning Workshop at ICLR 2021.
  • I am very honored to have received the Michael H. Granof Award, which recognizes UT Austin’s top doctoral dissertation of 2021.

    Selected Publications [full list]


    1. Hearing Anything Anywhere
      Mason L. Wang*, Ryosuke Sawata*, Samuel Clarke, Ruohan Gao, Shangzhe Wu, and Jiajun Wu
      Conference on Computer Vision and Pattern Recognition (CVPR), 2024
    2. The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
      Conference on Computer Vision and Pattern Recognition (CVPR), 2024


    1. The ObjectFolder Benchmark: Multisensory Object-Centric Learning with Neural and Real Objects
      Conference on Computer Vision and Pattern Recognition (CVPR), 2023
    2. RealImpact: A Dataset of Impact Sound Fields for Real Objects
      Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug James, and Jiajun Wu
      Conference on Computer Vision and Pattern Recognition (CVPR), 2023


    1. See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
      Hao Li*, Yizhi Zhang*, Junzhe Zhu, Shaoxiong Wang, Michelle A. Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao†, and Jiajun Wu†
      Conference on Robot Learning (CoRL), 2022
    2. ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
      Conference on Computer Vision and Pattern Recognition (CVPR), 2022
    3. Visual Acoustic Matching
      Changan Chen, Ruohan Gao, Paul Calamia, and Kristen Grauman
      Conference on Computer Vision and Pattern Recognition (CVPR), 2022


    1. Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
      Rishabh Garg, Ruohan Gao, and Kristen Grauman
      British Machine Vision Conference (BMVC), 2021
    2. Look and Listen: From Semantic to Spatial Audio-Visual Perception
      Ruohan Gao
      Ph.D. Dissertation, 2021


    1. 2.5D Visual Sound
      Ruohan Gao and Kristen Grauman
      Conference on Computer Vision and Pattern Recognition (CVPR), 2019


    1. Learning to Separate Object Sounds by Watching Unlabeled Video
      Ruohan Gao, Rogerio Feris, and Kristen Grauman
      European Conference on Computer Vision (ECCV), 2018