Podcasting
Advertisers
Enterprise
Pricing
Resources
Discover Discover

Log in
Sign up free

Papers Read on AI

Efficient LLM Inference on CPUs

2023-12-02

Download 196

Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the astronomical amount of model parameters, which requires a demand for large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that can make the deployment of LLMs more efficiently. We support an automatic INT4 weight-only quantization flow and design a special LLM runtime with...

Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the astronomical amount of model parameters, which requires a demand for large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that can make the deployment of LLMs more efficiently. We support an automatic INT4 weight-only quantization flow and design a special LLM runtime with highly-optimized kernels to accelerate the LLM inference on CPUs. We demonstrate the general applicability of our approach on popular LLMs including Llama2, Llama, GPT-NeoX, and showcase the extreme inference efficiency on CPUs. The code is publicly available at: https://github.com/intel/intel-extension-for-transformers.

2023: Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng

https://arxiv.org/pdf/2311.00502v1.pdf

View more

Comments (3)

More Episodes

You may also like

Babbage from The Economist

Big Technology Podcast

Software Engineering Daily

Cyber Security Headlines

Techmeme Ride Home

Noticias de la mañana

Windows Weekly (Audio)

Get this podcast on your phone, Free

Create Your Podcast In Minutes

Full-featured podcast site
Unlimited storage and bandwidth
Comprehensive podcast stats
Distribute to Apple Podcasts, Spotify, and more
Make money with your podcast

It is Free

Podcast Services
MONETIZATION & MORE
KNOWLEDGE BASE
Support
Podbean

Privacy Policy
Cookie Policy
Terms of Use
Consent Preferences
Copyright © 2015-2025 Podbean.com