Arxiv paper - Token-Efficient Long Video Understanding for Multimodal LLMs
AI Breakdown

2025-06-18
In this episode, we discuss Token-Efficient Long Video Understanding for Multimodal LLMs by Jindong Jiang, Xiuyu Li, Zhijian Liu, Muyang Li, Guo Chen, Zhiqi Li, De-An Huang, Guilin Liu, Zhiding Yu, Kurt Keutzer, Sungjin Ahn, Jan Kautz, Hongxu Yin, Yao Lu, Song Han, Wonmin Byeon. The paper introduces STORM, a new architecture that incorporates a temporal encoder using the Mamba State Space Model to better capture temporal dynamics in video-based multimodal large language models. This approach...