In this episode we discuss MM-VID: Advancing Video Understanding with GPT-4V(ision)
by Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang. The paper introduces MM-VID, a system that incorporates GPT-4V with vision, audio, and speech experts to enhance video understanding. It focuses on handling complex tasks like tracking character storylines across multiple episodes. The paper showcases...
In this episode we discuss MM-VID: Advancing Video Understanding with GPT-4V(ision)
by Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang. The paper introduces MM-VID, a system that incorporates GPT-4V with vision, audio, and speech experts to enhance video understanding. It focuses on handling complex tasks like tracking character storylines across multiple episodes. The paper showcases the capabilities of MM-VID through detailed responses and demonstrations in various figures.
View more