Publications
*equal contribution, †equal advising
2024
- Spherical World-Locking for Audio-Visual Localization in Egocentric Videos. European Conference on Computer Vision (ECCV), 2024
- DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks. ACM Special Interest Group on Computer Graphics and Interactive Techniques Conference (SIGGRAPH), 2024
- The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective. Conference on Computer Vision and Pattern Recognition (CVPR), 2024
2023
- SoundCam: A Dataset for Tasks in Tracking and Identifying Humans from Real Room Acoustics. Conference on Neural Information Processing Systems Datasets and Benchmarks Track (NeurIPS), 2023
- NOIR: Neural Signal Operated Intelligent Robot for Everyday Activities. Conference on Robot Learning (CoRL), 2023
- Visually-Guided Audio Spatialization in Video with Geometry-Aware Multi-task Learning. International Journal of Computer Vision (IJCV), 2023
- The ObjectFolder Benchmark: Multisensory Object-Centric Learning with Neural and Real Objects. Conference on Computer Vision and Pattern Recognition (CVPR), 2023
- Sonicverse: A Multisensory Simulation Platform for Training Household Agents that See and Hear. International Conference on Robotics and Automation (ICRA), 2023
- An Extensible Multi-modal Multi-task Object Dataset with Materials. International Conference on Learning Representations (ICLR), 2023
2022
- See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation. Conference on Robot Learning (CoRL), 2022
- ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer. Conference on Computer Vision and Pattern Recognition (CVPR), 2022
- Visual Acoustic Matching. Conference on Computer Vision and Pattern Recognition (CVPR), 2022
2021
- ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations. Conference on Robot Learning (CoRL), 2021
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video. British Machine Vision Conference (BMVC), 2021
- Look and Listen: From Semantic to Spatial Audio-Visual Perception. Ph.D. Dissertation, 2021
- VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency. Conference on Computer Vision and Pattern Recognition (CVPR), 2021
- Learning to Set Waypoints for Audio-Visual Navigation. International Conference on Learning Representations (ICLR), 2021
2020
- VisualEchoes: Spatial Visual Representation Learning through Echolocation. European Conference on Computer Vision (ECCV), 2020
2019
- 2.5D Visual Sound. Conference on Computer Vision and Pattern Recognition (CVPR), 2019
2018
2017
2016
- Object-Centric Representation Learning from Unlabeled Videos. Asian Conference on Computer Vision (ACCV), 2016