
*equal contribution, †equal advising


  1. Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
    Heeseung Yun, Ruohan Gao, Ishwarya Ananthabhotla, Anurag Kumar, Jacob Donley, Chao Li, Gunhee Kim, Vamsi Krishna Ithapu, and Calvin Murdock
    European Conference on Computer Vision (ECCV), 2024
  2. Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
    European Conference on Computer Vision (ECCV), 2024
  3. DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks
    Xutong Jin*, Chenxi Xu*, Ruohan Gao, Jiajun Wu, Guoping Wang, and Sheng Li
    ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference (SIGGRAPH), 2024
  4. Hearing Anything Anywhere
    Mason L. Wang*, Ryosuke Sawata*, Samuel Clarke, Ruohan Gao, Shangzhe Wu, and Jiajun Wu
    Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  5. The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
    Conference on Computer Vision and Pattern Recognition (CVPR), 2024


  1. SoundCam: A Dataset for Tasks in Tracking and Identifying Humans from Real Room Acoustics
    Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS), 2023
  2. NOIR: Neural Signal Operated Intelligent Robot for Everyday Activities
    Sharon Lee*, Ruohan Zhang*, Minjune Hwang*, Ayano Hiranaka*, Chen Wang, Wensi Ai, Jin Jie Ryan Tan, Shreya Gupta, Yilun Hao, Gabrael Levine, and 4 more authors
    Conference on Robot Learning (CoRL), 2023
  3. Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning
    Rishabh Garg, Ruohan Gao, and Kristen Grauman
    International Journal of Computer Vision (IJCV), 2023
  4. The ObjectFolder Benchmark: Multisensory Object-Centric Learning with Neural and Real Objects
    Conference on Computer Vision and Pattern Recognition (CVPR), 2023
  5. RealImpact: A Dataset of Impact Sound Fields for Real Objects
    Samuel Clarke, Ruohan Gao, Mason Wang, Mark Rau, Julia Xu, Jui-Hsien Wang, Doug James, and Jiajun Wu
    Conference on Computer Vision and Pattern Recognition (CVPR), 2023
  6. Learning Object-Centric Neural Scattering Functions for Free-Viewpoint Relighting and Scene Composition
    Transactions on Machine Learning Research (TMLR), 2023
  7. Differentiable Physics Simulation of Dynamics-Augmented Neural Objects
    Simon Le Cleac’h, Hong-Xing Yu, Michelle Guo, Taylor A. Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, and Mac Schwager
    Robotics and Automation Letters (RA-L), 2023
  8. Sonicverse: A Multisensory Simulation Platform for Training Household Agents that See and Hear
    Ruohan Gao*, Hao Li*, Gokul Dharan, Zhuzhu Wang, Chengshu Li, Fei Xia, Silvio Savarese, Li Fei-Fei, and Jiajun Wu
    International Conference on Robotics and Automation (ICRA), 2023
  9. An Extensible Multi-modal Multi-task Object Dataset with Materials
    Trevor Scott Standley, Ruohan Gao, Dawn Chen, Jiajun Wu, and Silvio Savarese
    International Conference on Learning Representations (ICLR), 2023


  1. See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation
    Hao Li*, Yizhi Zhang*, Junzhe Zhu, Shaoxiong Wang, Michelle A. Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao†, and Jiajun Wu†
    Conference on Robot Learning (CoRL), 2022
  2. ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
    Conference on Computer Vision and Pattern Recognition (CVPR), 2022
  3. Visual Acoustic Matching
    Changan Chen, Ruohan Gao, Paul Calamia, and Kristen Grauman
    Conference on Computer Vision and Pattern Recognition (CVPR), 2022


  1. ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations
    Ruohan Gao, Yen-Yu Chang, Shivani Mall, Li Fei-Fei, and Jiajun Wu
    Conference on Robot Learning (CoRL), 2021
  2. DiffImpact: Differentiable Rendering and Identification of Impact Sounds
    Conference on Robot Learning (CoRL), 2021
  3. Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
    Rishabh Garg, Ruohan Gao, and Kristen Grauman
    British Machine Vision Conference (BMVC), 2021
  4. Look and Listen: From Semantic to Spatial Audio-Visual Perception
    Ruohan Gao
    Ph.D. Dissertation, 2021
  5. VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
    Ruohan Gao and Kristen Grauman
    Conference on Computer Vision and Pattern Recognition (CVPR), 2021
  6. Learning to Set Waypoints for Audio-Visual Navigation
    International Conference on Learning Representations (ICLR), 2021


  1. VisualEchoes: Spatial Visual Representation Learning through Echolocation
    European Conference on Computer Vision (ECCV), 2020
  2. Listen to Look: Action Recognition by Previewing Audio
    Conference on Computer Vision and Pattern Recognition (CVPR), 2020


  1. Co-Separating Sounds of Visual Objects
    Ruohan Gao and Kristen Grauman
    International Conference on Computer Vision (ICCV), 2019
  2. 2.5D Visual Sound
    Ruohan Gao and Kristen Grauman
    Conference on Computer Vision and Pattern Recognition (CVPR), 2019


  1. Learning to Separate Object Sounds by Watching Unlabeled Video
    Ruohan Gao, Rogerio Feris, and Kristen Grauman
    European Conference on Computer Vision (ECCV), 2018
  2. ShapeCodes: Self-Supervised Feature Learning by Lifting Views to Viewgrids
    Dinesh Jayaraman, Ruohan Gao, and Kristen Grauman
    European Conference on Computer Vision (ECCV), 2018
  3. Im2Flow: Motion Hallucination from Static Images for Action Recognition
    Ruohan Gao, Bo Xiong, and Kristen Grauman
    Conference on Computer Vision and Pattern Recognition (CVPR), 2018


  1. On-Demand Learning for Deep Image Restoration
    Ruohan Gao and Kristen Grauman
    International Conference on Computer Vision (ICCV), 2017


  1. Object-Centric Representation Learning from Unlabeled Videos
    Ruohan Gao, Dinesh Jayaraman, and Kristen Grauman
    Asian Conference on Computer Vision (ACCV), 2016
  2. Accelerating Graph Mining Algorithms via Uniform Random Edge Sampling
    Ruohan Gao, Huanle Xu, Pili Hu, and Wing Cheong Lau
    IEEE International Conference on Communications (ICC), 2016


  1. Graph Property Preservation under Community-Based Sampling
    Ruohan Gao, Pili Hu, and Wing Cheong Lau
    IEEE Global Communications Conference (GLOBECOM), 2015