[02.2026] Two papers accepted to CVPR 2026 (Ego-edit and AuralSAM)! Ego-edit introduces all you need (dataset, method, benchmarks) for egocentric editing!
[01.2026] Two papers accepted to ICLR 2026: Learning to see before seeing (LSBS, oral presentation) and Kaleido! In LSBS, we show how visual priors emerge from LLM pre-training and demonstrate how we can deliberately leverage them to build more powerful MLLM capabilities.
[11.2025] Joining FAIR to start working with Mike Lewis on unified MLLM pre-training! We pursue the next paradigm of pre-training.
[09.2025] Hallucination at a Glance, a new dataset for benchmarking and training MLLMs for fine-grained visual reasoning, was accepted to NeurIPS 2025!
[07.2025] 3D2EP, a new method for 3D shape decomposition that represents objects as parametric primitives, was accepted to SIGGRAPH 2025.
[05.2025] Flex3D, a new two-stage pipeline for high-quality 3D generation and reconstruction, was accepted to ICML 2025!
[11.2024] Two papers (3D-GPT and DreamBeast) were accepted to 3DV 2025. Congratulations to Chunyi and Runjia! 3D-GPT is pioneering work in large-scale 3D scene generation, and DreamBeast is a fascinating project generating fantastical 3D animals.
[07.2024] VFusion3D and Unicorns were accepted to ECCV 2024! VFusion3D is the first work exploring scalable 3D generative/reconstruction models as a step towards a 3D foundation model. Check it out if you are interested in 3D generation.
[07.2023] Hyperbolic Audio-visual Zero-shot Learning was accepted to ICCV 2023!
[07.2023] Graduated from ANU with First-Class Honours.
[10.2022] Recognized as a top reviewer at NeurIPS 2022!
[07.2022] Blind Image Decomposition (BID) was accepted to ECCV 2022! BID is a new low-level vision task that better adapts to complex real-world scenarios. Check our project page for more.
[05.2022] You Only Cut Once (YOCO) was accepted to ICML 2022! YOCO is our work on how to perform data augmentation.
My research focuses on multimodal foundation models, including multimodal large language models and world generation models, as well as their interactions and integration.
I approach them from a vision-first perspective, aiming to make vision a central component of general intelligence.
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), International Journal of Computer Vision (IJCV), Transactions on Image Processing (TIP), Transactions on Geoscience and Remote Sensing (TGRS)