Quantization and Fast Inference : A Practitioner's Guide to Efficient AI

Name: Quantization and Fast Inference
SKU: 9781633433915
Price: 59.99 USD
Availability: PreOrder

by Vivek Kalyanarangan

Preorder. This item will be available on December 29, 2026.

Overview

Get the eBook free when you register your print book at Manning.

Today's AI models demand a lot of memory, compute, and server horsepower, which quickly translates into cost. This book shows you how to optimize AI models without architectural redesigns or task-specific compression. It presents practical techniques for quantization: systematically reducing numerical precision to achieve faster inference, lower memory usage, and cheaper deployment, all with minimal accuracy loss.

From quantization fundamentals to runtime packaging, the book covers the full quantization pipeline. It starts by deriving the quantization mapping from first principles, then builds your knowledge and skill through production-tested post-training quantization (PTQ) and quantization-aware training (QAT) workflows and a fully compressed deployment. You'll learn to apply PTQ to production models, run QAT using fake quantization and straight-through estimators, and handle subtle tradeoffs like activation outliers in LLMs, KV cache pressure, and sub-8-bit formats like NF4 and FP4.

What's inside

- Applying post-training quantization to production models
- Deploying efficiently on CPUs, edge devices, and mobile
- Framework-agnostic techniques and real cross-framework parity testing
- Flowcharts and checklists for efficient decision making

About the reader

For ML engineers and researchers experienced in Python.

About the author

Vivek Kalyanarangan is an AI/ML architect, researcher, and educator with over twelve years of experience designing and deploying large-scale machine learning systems.

Details

ISBN-13: 9781633433915
ISBN-10: 1633433919
Publisher: Manning Publications
Publish Date: December 2026
Shipping Weight: 0.92 pounds
Page Count: 350
