Large models on CPUs

Practical AI: Machine Learning, Data Science

Technology

2023-05-02

Model sizes are crazy these days with billions and billions of parameters. As Mark Kurtz explains in this episode, this makes inference slow and expensive despite the fact that up to 90%+ of the parameters don’t influence t...

Mark helps us understand all of the practicalities and progress that is being made in model optimization and CPU inference, including the increasing opportunities to run LLMs and other Generative AI models on commodity hardware.
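For readers new to the idea, the premise that many parameters contribute little is what motivates pruning. Below is a minimal sketch of unstructured magnitude pruning in plain Python. It is an illustration only, not the method used by Neural Magic, SparseML, or SparseGPT; the function names `magnitude_prune` and `sparsity_of` and the sample weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero.

    `sparsity` is the fraction of weights to remove, between 0.0 and 1.0.
    This is a toy, one-shot version; real tools typically prune gradually
    during training or use calibration data to limit accuracy loss.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0.0 and 1.0")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def sparsity_of(weights):
    """Fraction of entries that are exactly zero."""
    if not weights:
        return 0.0
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    # Hypothetical weights for one small layer.
    layer = [0.5, -0.01, 0.02, -1.2, 0.003, 0.9, -0.04, 0.0005, 0.07, -0.3]
    sparse_layer = magnitude_prune(layer, 0.5)
    print(sparse_layer)
    print(sparsity_of(sparse_layer))
```

Zeroed weights only speed up inference when the runtime can skip them, which is why sparsity-aware engines matter for CPU deployment.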

Changelog++ members save 1 minute on this episode because they made the ads disappear. Join today!

Sponsors:

Fastly – Our bandwidth partner. Fastly powers fast, secure, and scalable digital experiences. Move beyond your content delivery network to their powerful edge cloud platform. Learn more at fastly.com
Fly.io – The home of Changelog.com — Deploy your apps and databases close to your users. In minutes you can run your Ruby, Go, Node, Deno, Python, or Elixir app (and databases!) all over the world. No ops required. Learn more at fly.io/changelog and check out the speedrun in their docs.

Featuring:

Mark Kurtz – Twitter, LinkedIn
Daniel Whitenack – Twitter, GitHub, Website

Show Notes:

Neural Magic
SparseML
SparseZoo
Neural Magic Scales up MLPerf™ Inference v3.0 Performance With Demonstrated Power Efficiency; No GPUs Needed
Deploy Optimized Hugging Face Models With DeepSparse and SparseZoo
SparseGPT: Remove 100 Billion Parameters for Free

Timestamps:

(00:44) - Neural Magic Mark Kurtz
(03:24) - Why does LLM size matter?
(06:15) - GPUs vs. CPUs
(08:45) - Overcoming perception
(10:54) - Most parameters don't affect results
(16:01) - Balancing space & sparsity
(17:47) - Tackling performance hits
(20:38) - Aware optimization vs not?
(23:52) - Community tools
(26:11) - Neural Magic tools
(29:56) - Supporting new architecture
(31:40) - Exciting research trends
(34:52) - Looking forward in this space
(37:05) - Outro