Today we’re joined by Sherry Yang, senior research scientist at Google DeepMind and a PhD student at UC Berkeley. In this interview, we discuss her new paper, “Video as the New Language for Real-World Decision Making,” which explores how generative video models can play a role similar to that of language models in solving real-world tasks. Sherry draws an analogy to language modeling, where natural language serves as a unified representation of information and text prediction as a common task interface, and demonstrates how video as a medium and generative video as a task exhibit similar properties. This formulation enables video generation models to play a variety of real-world roles as planners, agents, compute engines, and environment simulators. Finally, we explore UniSim, an interactive demo of Sherry's work and a preview of her vision for interacting with AI-generated environments.
The complete show notes for this episode can be found at twimlai.com/go/676.
Can Language Models Be Too Big? 🦜 with Emily Bender and Margaret Mitchell - #467
Applying RL to Real-World Robotics with Abhishek Gupta - #466
Accelerating Innovation with AI at Scale with David Carmona - #465
Complexity and Intelligence with Melanie Mitchell - #464
Robust Visual Reasoning with Adriana Kovashka - #463
Architectural and Organizational Patterns in Machine Learning with Nishan Subedi - #462
Common Sense Reasoning in NLP with Vered Shwartz - #461
How to Be Human in the Age of AI with Ayanna Howard - #460
Evolution and Intelligence with Penousal Machado - #459
Innovating Neural Machine Translation with Arul Menezes - #458
Building the Product Knowledge Graph at Amazon with Luna Dong - #457
Towards a Systems-Level Approach to Fair ML with Sarah M. Brown - #456
AI for Digital Health Innovation with Andrew Trister - #455
System Design for Autonomous Vehicles with Drago Anguelov - #454
Building, Adopting, and Maturing LinkedIn's Machine Learning Platform with Ya Xu - #453
Expressive Deep Learning with Magenta DDSP w/ Jesse Engel - #452
Semantic Folding for Natural Language Understanding with Francisco Weber - #451
The Future of Autonomous Systems with Gurdeep Pall - #450
AI for Ecology and Ecosystem Preservation with Bryan Carstens - #449