Overview
AI Engineering: Building Multi-Modal Intelligent Systems with Vision, Language, and Audio
From LLM Fine-Tuning to Voice Agents, AR Interfaces, and Real-World Deployment
Unlock the future of artificial intelligence with practical, production-ready multi-modal engineering.
This hands-on guide is built for developers, researchers, and AI professionals who want to go beyond chatbots and dive into building intelligent systems that understand text, images, audio, and human intent - all in one pipeline.
Whether you're fine-tuning large language models (LLMs) or creating voice-driven AR interfaces, this book walks you through the real engineering decisions, tools, and architectures needed to bring multi-modal AI to life.
What You'll Learn:
- Fine-tuning Large Language Models (LLMs): Train and adapt models like GPT-2, LLaMA, and Mistral for custom tasks using Hugging Face, LoRA, QLoRA, and PEFT.
- Voice Interfaces: Combine Whisper, LLMs, and Bark/Tortoise TTS to build interactive speech-driven assistants.
- Computer Vision + Language: Use models like BLIP, CLIP, and DETR to connect what systems see to what they say and understand.
- Instruction Tuning & Hyperparameter Optimization: Build smarter, domain-specific models with efficient training workflows.
- Multi-Modal Pipelines: Chain audio, image, and text inputs for question answering, summarization, tutoring, and AR/robotic control.
- Real-Time Interfaces: Deploy intelligent agents using FastAPI, Streamlit, Gradio, Docker, and Hugging Face Spaces.
- Edge & Offline Deployment: Optimize models with ONNX, quantization (4-bit, 8-bit), and TensorRT for low-latency inference on CPU/GPU.
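To give a flavor of the LoRA technique mentioned above: instead of updating a full weight matrix, LoRA trains a small low-rank correction alongside the frozen weights. The sketch below is a pure-NumPy illustration of that idea, not code from the book and not the PEFT library's API.

```python
import numpy as np

# LoRA replaces a full weight update dW with a low-rank product B @ A,
# so only r * (d_in + d_out) parameters are trained instead of d_in * d_out.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 64, 4, 8

W = rng.standard_normal((d_out, d_in))       # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-initialized

def lora_forward(x):
    # Base path plus scaled low-rank path; with B = 0 at init,
    # the adapted model's output matches the frozen model exactly.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
```

The zero initialization of B is the key design choice: training starts from the pretrained model's behavior, while the trainable parameter count (A plus B) stays far below the size of W.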
Use Cases Covered:
- Smart document summarizers with OCR + TTS
- Voice-enabled image assistants
- Emotion-aware agents
- Virtual tutors
- AR-enhanced AI interfaces
- Robotic perception + control from voice/image input
- Secure, multilingual, and privacy-conscious AI systems
Tools & Frameworks Inside:
- Python, PyTorch, Hugging Face Transformers
- LangChain, OpenCV, Whisper, TTS, BLIP
- ROS, Unity (AR/VR), Gradio, Streamlit
- Docker, FastAPI, gRPC, TorchServe
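The edge-deployment workflow listed earlier leans on quantization for low-latency inference. As a rough, self-contained illustration (pure NumPy, not tied to ONNX or TensorRT, and not code from the book), here is symmetric 8-bit post-training quantization of a weight tensor:

```python
import numpy as np

# Symmetric per-tensor INT8 quantization: map floats to int8 with one
# shared scale factor, then dequantize back to an approximation.
rng = np.random.default_rng(1)
w = rng.standard_normal(1000).astype(np.float32)   # stand-in for a weight tensor

scale = np.abs(w).max() / 127.0                    # one scale for the whole tensor
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
w_hat = q.astype(np.float32) * scale               # dequantized approximation

# Rounding error is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
```

Real toolchains add per-channel scales, zero-points for asymmetric ranges, and calibration data, but the storage win is the same: each weight shrinks from 32 bits to 8.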
Built for engineers. Written with depth. Designed for real-world impact.
If you're ready to build intelligent multi-modal agents that understand the world like humans do - across speech, vision, and language - this book gives you the complete roadmap.
Perfect for:
Machine learning engineers, data scientists, AI product developers, researchers, robotics engineers, and anyone building cutting-edge AI systems.
Details
- ISBN-13: 9798296089038
- Publisher: Independently Published
- Publish Date: August 2025
- Dimensions: 9 x 6 x 0.62 inches
- Shipping Weight: 0.88 pounds
- Page Count: 296