MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Sequential Modeling Enables Scalable Learning for Large Vision Models
Magicoder: Source Code Is All You Need
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Adversarial Diffusion Distillation
Instruction Tuning with Human Curriculum
Initializing Models with Larger Ones
Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
TaskWeaver: A Code-First Agent Framework
Efficient LLM Inference on CPUs
Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents
STaR: Bootstrapping Reasoning With Reasoning
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion
Exponentially Faster Language Modelling
Orca 2: Teaching Small Language Models How to Reason
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
A Survey on Language Models for Code