Today we're joined by Armineh Nourbakhsh of JP Morgan AI Research to discuss the development and capabilities of DocLLM, a layout-aware large language model for multimodal document understanding. Armineh gives a historical overview of the challenges of document AI and introduces the DocLLM model. She explains how the model, distinct from both traditional LLMs and document AI models, incorporates both textual semantics and spatial layout when processing enterprise documents such as reports and complex contracts. We dig into her team’s approach to training DocLLM, their choice of a generative model over an encoder-based approach, the datasets they used to build the model, how they incorporated layout information, and the various ways they evaluated the model’s performance.
The complete show notes for this episode can be found at twimlai.com/go/672.
GraphRAG: Knowledge Graphs for AI Applications with Kirk Marple - #681
Teaching Large Language Models to Reason with Reinforcement Learning with Alex Havrilla - #680
Localizing and Editing Knowledge in LLMs with Peter Hase - #679
Coercing LLMs to Do and Reveal (Almost) Anything with Jonas Geiping - #678
V-JEPA, AI Reasoning from a Non-Generative Architecture with Mido Assran - #677
Video as a Universal Interface for AI Reasoning with Sherry Yang - #676
Assessing the Risks of Open AI Models with Sayash Kapoor - #675
OLMo: Everything You Need to Train an Open Source LLM with Akshita Bhagia - #674
Training Data Locality and Chain-of-Thought Reasoning in LLMs with Ben Prystawski - #673
Are Emergent Behaviors in LLMs an Illusion? with Sanmi Koyejo - #671
AI Trends 2024: Reinforcement Learning in the Age of LLMs with Kamyar Azizzadenesheli - #670
Building and Deploying Real-World RAG Applications with Ram Sriharsha - #669
Nightshade: Data Poisoning to Fight Generative AI with Ben Zhao - #668
Learning Transformer Programs with Dan Friedman - #667
AI Trends 2024: Machine Learning & Deep Learning with Thomas Dietterich - #666
AI Trends 2024: Computer Vision with Naila Murray - #665
Are Vector DBs the Future Data Platform for AI? with Ed Anuff - #664
Quantizing Transformers by Helping Attention Heads Do Nothing with Markus Nagel - #663
Responsible AI in the Generative Era with Michael Kearns - #662