My current research investigates the role of multimodal language models in understanding and reasoning about the world through rich visual representations, especially by leveraging language as a tool to decode and reason about the underlying physical rules governing the world. I also study the limits of systematic language understanding - the ability of neural systems to understand language in a human-like way - by evaluating the extent of their capabilities in semantics, syntax and generalization.
I am also involved in improving and enabling reproducible research in Machine Learning - I’m the lead organizer of the annual Machine Learning Reproducibility Challenge (V1, V2, V3, V4, V5, V6, V7), and I serve as an associate editor at ReScience C, a peer-reviewed journal promoting reproducible research. My work has been covered by several news outlets, including Nature, VentureBeat, InfoQ, DailyMail and Hindustan Times.
Featured Publications
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning; Mido Assran*, Adrien Bardes*, David Fan*, Quentin Garrido*, Russell Howes*, Mojtaba Komeili*, Matthew Muckley*, Ammar Rizvi*, Claire Roberts*, Koustuv Sinha*, Artem Zholus*, Sergio Arnaud*, Abha Gejji*, Ada Martin*, Francois Robert Hogan*, Daniel Dugas*, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, Xiaodong Ma, Sarath Chandar, Franziska Meier*, Yann LeCun*, Michael Rabbat*, Nicolas Ballas*; Huggingface | Code | Blog
For a full and up-to-date list of my publications, please visit my Google Scholar profile.
News
[06/11/25] Excited to announce the release of our frontier video understanding model, V-JEPA 2! Check out our model code, weights and cool demos!
[06/11/25] Happy to release the Physical World Reasoning leaderboard, along with a new dataset, MVPBench (Paper), for assessing the video understanding capabilities of modern VLMs.
[01/06/24] Serving as a Senior Area Chair at ACL 2024.
I’m fortunate to supervise, and to have supervised, exceptionally strong interns, and I’m always looking to support more students!
Peter Tong, Research Intern @ FAIR, Meta AI, 2025-26; Peter previously worked with me as a Research Intern @ Meta AI, 2024, co-supervised with Mike Rabbat and Zhuang Liu
Evaluating Logical Generalization with Graph Neural Networks, Weights and Biases Salon, (Online), May 2020
ML Reproducibility - From Theory to Practice, DL4Science Seminar, Lawrence Berkeley National Laboratory, Berkeley, (Online), August 2020; MICCAI Hackathon, Peru, October 2020; Bielefeld University, Germany, hosted by Malte Schilling, October 2021
Featured Awards
Outstanding Paper Award, ACL 2023, Language model acceptability judgements are not always robust to context; Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, Roger Levy, Adina Williams, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Outstanding Paper Award, ACL 2021, UnNatural Language Inference, Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, Adina Williams