How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
Self-Rewarding Language Models
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering
Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Large Language Models for Generative Information Extraction: A Survey
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Soaring from 4K to 400K: Extending LLM’s Context with Activation Beacon
Parameter-Efficient Transfer Learning for NLP
Mixtral of Experts
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Video Understanding with Large Language Models: A Survey
GPT-4V(ision) is a Generalist Web Agent, if Grounded
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
AnyText: Multilingual Visual Text Generation And Editing
KwaiAgents: Generalized Information-seeking Agent System with Large Language Models