The LLM Engineer: Building Transformer Language Models with Python and PyTorch

Name: The LLM Engineer
SKU: 9798255965939
Price: 49.67 USD
Availability: InStock

by Lachlan James

Overview

Most developers working with large language models today are flying blind. They understand the interface, but they don't really understand the machine. The gap between knowing how to use a model and knowing how to build, debug, and adapt one is where real engineering capability lives.

The LLM Engineer is a hands-on implementation guide that closes that gap completely. Starting from a single sentence - predict the next token - and ending with a quantized model serving requests through an OpenAI-compatible API, this book walks you through every layer of the modern LLM stack. No black boxes. No magic. Every concept is introduced once, explained precisely, and immediately followed by complete, runnable code.

The architecture you build matches the design of Llama 3, Mistral, and Gemma at the blueprint level - rotary position embeddings, grouped-query attention, SwiGLU activations, RMSNorm - not a toy approximation, but the real thing at teachable scale.

Inside this book, you'll learn how to:

Implement byte-pair encoding from scratch and understand the tokenizer quirks that cause real production bugs
Build scaled dot-product attention, multi-head attention, and grouped-query attention (GQA) - the memory-efficient variant used by many major open-weight models, including Llama 3 and Mistral
Construct a complete transformer block using pre-norm RMSNorm, SwiGLU feed-forward layers, RoPE positional encodings, and residual connections
Design and run a full training pipeline: packed sequences, AdamW with parameter-group weight decay, cosine warmup scheduling, gradient clipping, mixed-precision bfloat16, and distributed data parallelism
Fine-tune models efficiently using LoRA and QLoRA - implemented entirely from scratch, not just called from a library
Train for human preference alignment using Direct Preference Optimization (DPO), the technique that replaced PPO-based RLHF in most production pipelines
Quantize models to INT8 and 4-bit precision using GPTQ and AWQ, and export them to GGUF for CPU deployment with llama.cpp
Serve models at scale using vLLM with PagedAttention and continuous batching, and expose an OpenAI-compatible API

Along the way, you'll build:

A byte-pair encoding tokenizer that handles Unicode, byte-level encoding, and the edge cases that break naive implementations
A complete GPT-style transformer language model - architecturally identical to Llama 3 - trained from scratch on real text data
A full training loop with Weights & Biases experiment tracking, checkpointing, and distributed GPU support via PyTorch DDP
An inference engine with greedy decoding, temperature sampling, top-k, top-p, repetition penalties, speculative decoding, and structured output generation
LoRA and QLoRA adapters injected and merged into a pre-trained model, reducing trainable parameters by over 99%
A DPO-aligned instruct model trained on preference pairs, starting from an SFT checkpoint
A production-ready serving stack: quantized model exported to GGUF, served locally via Ollama, and deployed at scale with vLLM

Every chapter includes working, runnable code, common bug sections drawn from real implementation failures, and exercises that push your understanding beyond what the text alone can teach.

This item is Non-Returnable

Details

ISBN-13: 9798255965939
Publisher: Independently Published
Publish Date: April 2026
Dimensions: 9.25 x 7.5 x 0.52 inches
Shipping Weight: 0.95 pounds
Page Count: 246
