Today we're joined by Armineh Nourbakhsh of JP Morgan AI Research to discuss the development and capabilities of DocLLM, a layout-aware large language model for multimodal document understanding. Armineh gives a historical overview of the challenges of document AI and introduces the DocLLM model. She explains how DocLLM, unlike both traditional LLMs and earlier document AI models, incorporates both textual semantics and spatial layout when processing enterprise documents such as reports and complex contracts. We dig into her team’s approach to training DocLLM, their choice of a generative model over an encoder-based approach, the datasets they used to build the model, how they incorporated layout information, and the various ways they evaluated the model’s performance.
The complete show notes for this episode can be found at twimlai.com/go/672.
Enabling Clinical Automation: From Research to Deployment with Devin Singh - #428
Pixels to Concepts with Backpropagation w/ Roland Memisevic - #427
Fighting Global Health Disparities with AI w/ Jon Wang - #426
Accessibility and Computer Vision - #425
NLP for Equity Investing with Frank Zhao - #424
The Future of Education and AI with Salman Khan - #423
Why AI Innovation and Social Impact Go Hand in Hand with Milind Tambe - #422
What's Next for Fast.ai? w/ Jeremy Howard - #421
Feature Stores for MLOps with Mike del Balso - #420
Exploring Causality and Community with Suzana Ilić - #419
Decolonizing AI with Shakir Mohamed - #418
Spatial Analysis for Real-Time Video Processing with Adina Trufinescu
How Deep Learning has Revolutionized OCR with Cha Zhang - #416
Machine Learning for Food Delivery at Global Scale - #415
Open Source at Qualcomm AI Research with Jeff Gehlhaar and Zahra Koochak - #414
Visualizing Climate Impact with GANs w/ Sasha Luccioni - #413
ML-Powered Language Learning at Duolingo with Burr Settles - #412
Bridging The Gap Between Machine Learning and the Life Sciences with Artur Yakimovich - #411
Understanding Cultural Style Trends with Computer Vision w/ Kavita Bala - #410
That's a VIBE: ML for Human Pose and Shape Estimation with Nikos Athanasiou, Muhammed Kocabas, Michael Black - #409