Observability for LLM Applications: Tracing, Evals, and Shipping AI You Can Trust
by Gabriel Anhaia
Price: $24.99
Overview
Your LLM feature went out at 2 a.m. last Thursday. Latency is fine. Error rate is zero. And somewhere, quietly, it is lying to a customer.
Traditional observability cannot see this. CPU graphs, HTTP status codes, and p99 dashboards were built for systems that either work or crash. LLMs do neither. They return a confident sentence, the span closes green, and the failure lives in the content.

If you ship LLM features in production - as a backend engineer, a platform engineer, or an SRE who inherited someone else's prompt - this book is the operational handbook you have been missing. It is not a theory book about transformers. It is not a prompt engineering tour. It is the stack you actually need on Monday morning to know your AI works.

What you get: the three new pillars (traces, evals, and cost and drift metrics), built first on vendor-neutral OpenTelemetry GenAI semantic conventions, then layered with the tools that matter in 2026 - Langfuse, LangSmith, Arize Phoenix, Braintrust, DeepEval, Helicone, and a roll-your-own OTel Collector + ClickHouse + Grafana stack for teams that want everything in-house. Every tool gets an honest verdict: what it is best at, what it is bad at, when to pick it, and what it costs.

You will learn how to capture a full LLM decision path as a trace, run evals continuously in CI and in production, track token cost per user and per feature, detect drift before your users do, and write incident response runbooks for a failure mode your pager has never seen. Real code in Python, Go, and TypeScript. Real dashboards. Real traces.

Complementary to Hamel Husain's Evals for AI Engineers (O'Reilly, 2026): where that book goes deep on eval methodology for ML engineers, this one covers the wider operational stack - tracing, tooling, cost, drift, and on-call - for the platform-engineer reader.

By the end, you will have a production-readiness checklist you can run against your own system and mean it when you tell your boss the answer is yes. The first chapter starts with a real incident. Monday morning, you will have something to do.

Book 1 of The AI Engineer's Library.

This item is Non-Returnable.
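To illustrate the vendor-neutral starting point the description mentions, here is a minimal sketch (not taken from the book) of recording one chat completion as an OpenTelemetry span tagged with GenAI semantic-convention attributes. The traced_chat helper and tracer name are hypothetical, an OpenAI-style client is assumed, and a configured TracerProvider with an exporter is required for the span to be shipped anywhere.

    # Illustrative sketch: one LLM call captured as an OTel span with
    # gen_ai.* semantic-convention attributes (assumptions noted above).
    from opentelemetry import trace

    tracer = trace.get_tracer("example-llm-app")  # hypothetical scope name

    def traced_chat(client, model, messages):
        # Span name "chat <model>" follows the GenAI semconv naming pattern.
        with tracer.start_as_current_span(f"chat {model}") as span:
            span.set_attribute("gen_ai.operation.name", "chat")
            span.set_attribute("gen_ai.system", "openai")
            span.set_attribute("gen_ai.request.model", model)
            response = client.chat.completions.create(model=model, messages=messages)
            # Token usage is what feeds cost-per-user / cost-per-feature metrics.
            span.set_attribute("gen_ai.usage.input_tokens", response.usage.prompt_tokens)
            span.set_attribute("gen_ai.usage.output_tokens", response.usage.completion_tokens)
            span.set_attribute("gen_ai.response.model", response.model)
            return response.choices[0].message.content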
Details
- ISBN-13: 9798257519970
- Publisher: Independently Published
- Publish Date: April 2026
- Dimensions: 9 x 6 x 0.69 inches
- Shipping Weight: 0.97 pounds
- Page Count: 330