Observability for LLM Applications: Tracing, Evals, and Shipping AI You Can Trust
by Gabriel Anhaia

Overview

Your LLM feature went out at 2 a.m. last Thursday. Latency is fine. Error rate is zero. And somewhere, quietly, it is lying to a customer.

Traditional observability cannot see this. CPU graphs, HTTP status codes, and p99 dashboards were built for systems that either work or crash. LLMs do neither. They return a confident sentence, the span closes green, and the failure lives in the content.

If you ship LLM features in production - as a backend engineer, a platform engineer, or an SRE who inherited someone else's prompt - this book is the operational handbook you have been missing. It is not a theory book about transformers. It is not a prompt engineering tour. It is the stack you actually need on Monday morning to know your AI works.

What you get: the three new pillars (traces, evals, cost and drift metrics), built first on vendor-neutral OpenTelemetry GenAI semantic conventions, then layered with the tools that matter in 2026 - Langfuse, LangSmith, Arize Phoenix, Braintrust, DeepEval, Helicone, and a roll-your-own OTel Collector + ClickHouse + Grafana stack for teams that want everything in-house. Every tool gets an honest verdict: what it is best at, what it is bad at, when to pick it, what it costs.
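
To make the OpenTelemetry piece concrete, here is a minimal sketch of the kind of span that foundation implies, assuming the opentelemetry-api package, with attribute keys taken from the GenAI semantic conventions. The tracer name, the model, and fake_chat() are illustrative stand-ins, not code from the book:

    # Sketch only: attribute keys follow the OTel GenAI semantic
    # conventions; fake_chat() is a hypothetical stand-in for a
    # real provider client.
    from opentelemetry import trace

    tracer = trace.get_tracer("support-bot")  # illustrative name

    def fake_chat(prompt: str):
        # Pretend provider call: returns (text, input_tokens, output_tokens).
        return "Sure, I can help with that.", 42, 9

    def answer(prompt: str) -> str:
        # Span name convention from the spec: "{operation} {model}".
        with tracer.start_as_current_span("chat gpt-4o") as span:
            span.set_attribute("gen_ai.operation.name", "chat")
            span.set_attribute("gen_ai.request.model", "gpt-4o")
            text, tokens_in, tokens_out = fake_chat(prompt)
            span.set_attribute("gen_ai.usage.input_tokens", tokens_in)
            span.set_attribute("gen_ai.usage.output_tokens", tokens_out)
            return text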

You will learn how to capture a full LLM decision path as a trace, run evals continuously in CI and in production, track token cost per user and per feature, detect drift before your users do, and write incident response runbooks for a failure mode your pager has never seen. Real code in Python, Go, and TypeScript. Real dashboards. Real traces.
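
Token-cost accounting, for instance, reduces to a few lines once usage counts land on your spans. A sketch with made-up prices (check your provider's current rate card; nothing here is from the book):

    # Sketch only: prices are illustrative placeholders.
    PRICE_PER_1M_TOKENS = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
    }

    def request_cost_usd(model: str, tokens_in: int, tokens_out: int) -> float:
        p = PRICE_PER_1M_TOKENS[model]
        return (tokens_in * p["input"] + tokens_out * p["output"]) / 1_000_000

    # Sum over spans, grouped by a user or feature attribute, to get
    # cost per user and per feature.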

Complementary to Hamel Husain's Evals for AI Engineers (O'Reilly, 2026): where that book goes deep on eval methodology for ML engineers, this one covers the wider operational stack - tracing, tooling, cost, drift, and on-call - for the platform-engineer reader.

By the end, you will have a production-readiness checklist you can run against your own system and mean it when you tell your boss the answer is yes. The first chapter starts with a real incident. Monday morning, you will have something to do.

Book 1 of The AI Engineer's Library.

This item is Non-Returnable

Details

  • ISBN-13: 9798257519970
  • Publisher: Independently Published
  • Publish Date: April 2026
  • Dimensions: 9 x 6 x 0.69 inches
  • Shipping Weight: 0.97 pounds
  • Page Count: 330
