Publications
*equal contribution, †equal advising
2025
-   Differentiable Room Acoustic Rendering with Multi-View Vision Priors. International Conference on Computer Vision (ICCV), 2025
-   EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception. International Conference on Computer Vision (ICCV), 2025
-   GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning. International Conference on Computer Vision (ICCV), 2025
-   AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs. International Conference on Computer Vision (ICCV), 2025
-   Aurelia: Test-time Reasoning Distillation in Audio-Visual LLMs. International Conference on Computer Vision (ICCV), 2025
-   Hearing Anywhere in Any Environment. Conference on Computer Vision and Pattern Recognition (CVPR), 2025
-   Learning to Highlight Audio by Watching Movies. Conference on Computer Vision and Pattern Recognition (CVPR), 2025
2024
-   Spherical World-Locking for Audio-Visual Localization in Egocentric Videos. European Conference on Computer Vision (ECCV), 2024
-   DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks. ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference (SIGGRAPH), 2024
-   The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective. Conference on Computer Vision and Pattern Recognition (CVPR), 2024
2023
-   SoundCam: A Dataset for Tasks in Tracking and Identifying Humans from Real Room Acoustics. Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS), 2023
-   NOIR: Neural Signal Operated Intelligent Robot for Everyday Activities. Conference on Robot Learning (CoRL), 2023
-   Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning. International Journal of Computer Vision (IJCV), 2023
-   The ObjectFolder Benchmark: Multisensory Object-Centric Learning with Neural and Real Objects. Conference on Computer Vision and Pattern Recognition (CVPR), 2023
-   Sonicverse: A Multisensory Simulation Platform for Training Household Agents that See and Hear. International Conference on Robotics and Automation (ICRA), 2023
-   An Extensible Multi-modal Multi-task Object Dataset with Materials. International Conference on Learning Representations (ICLR), 2023
2022
-   See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation. Conference on Robot Learning (CoRL), 2022
-   ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer. Conference on Computer Vision and Pattern Recognition (CVPR), 2022
-   Visual Acoustic Matching. Conference on Computer Vision and Pattern Recognition (CVPR), 2022
2021
-   ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations. Conference on Robot Learning (CoRL), 2021
-   Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video. British Machine Vision Conference (BMVC), 2021
-   Look and Listen: From Semantic to Spatial Audio-Visual Perception. Ph.D. Dissertation, 2021
-   VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency. Conference on Computer Vision and Pattern Recognition (CVPR), 2021
-   Learning to Set Waypoints for Audio-Visual Navigation. International Conference on Learning Representations (ICLR), 2021
2020
-   VisualEchoes: Spatial Visual Representation Learning through Echolocation. European Conference on Computer Vision (ECCV), 2020
2019
-   2.5D Visual Sound. Conference on Computer Vision and Pattern Recognition (CVPR), 2019
2018
2017
2016
-   Object-Centric Representation Learning from Unlabeled Videos. Asian Conference on Computer Vision (ACCV), 2016