Information Theory for Machine Learning : Theorems, Proofs, and Python Implementations
by Yehuda Setnik
Overview
The complete graduate-level reference for entropy, divergence, and mutual information in modern machine learning, rigorously developed from measure theory to contemporary estimators and algorithms.
- Measure-theoretic foundations: sigma-algebras, Radon-Nikodym, conditional expectation, change of measure.
- Core measures: entropy, cross-entropy, KL, mutual information; f-divergences and Rényi divergences with variational dualities (Fenchel, Donsker-Varadhan); a discrete-case sketch follows this list.
- Data processing and fundamental inequalities: log-sum, Pinsker, Csiszár-Kullback-Pinsker, Fano, Le Cam, Assouad; equality conditions and sufficiency.
- Gaussian tools: entropy power inequality, de Bruijn identity, Fisher information, I-MMSE, Gaussian extremality.
- Maximum entropy and exponential families; log-partition convexity, Bregman geometry, Pythagorean theorems.
- Fisher information and asymptotics: score, Cramér-Rao bounds, LAN, Bernstein-von Mises, asymptotic efficiency.
- Information geometry and natural gradients: Fisher-Rao metric, dual connections, mirror descent.
- Source coding and MDL: Kraft-McMillan, NML, universal coding, compression-generalization links.
- Generalization: PAC-Bayes bounds, mutual information bounds I(W;S), stability of SGD.
- Concentration via information: the Donsker-Varadhan method, log-Sobolev and Poincaré inequalities, transportation T1/T2, hypercontractivity.
- Variational inference and divergence minimization: ELBO, alpha-divergences, EP, black-box VI with reparameterization; a toy ELBO sketch follows this list.
- Estimating entropy and MI: plug-in, kNN, KDE, Kraskov, MINE, InfoNCE; minimax rates and consistency; a Kraskov-style estimator sketch follows this list.
- Rate-distortion and information bottleneck: Blahut-Arimoto, optimal encoders, sufficiency-compression trade-offs; a Blahut-Arimoto sketch follows this list.
- Contrastive representation learning under augmentations: alignment vs uniformity, identifiability, sample complexity.
- Generative modeling: VAEs, bits-back coding, beta-VAE, TCVAE; likelihood calibration and posterior collapse.
- Score matching and Stein: Fisher divergence, kernel Stein discrepancies; diffusion models as score-based SDEs with likelihood estimation.
- Optimal transport with entropic regularization: Kantorovich duality, Sinkhorn, Schrödinger bridges; OT vs f-divergence objectives; a log-domain Sinkhorn sketch follows this list.
- Distributed and federated learning under communication limits: quantization, gradient coding, lower bounds via information.
- Privacy and leakage: differential privacy, Rényi DP, moments accountant; accuracy-privacy trade-offs and inference risks.
- Active learning and Bayesian experimental design: expected information gain, submodularity, scalable estimators.
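Since the subtitle promises Python implementations, the overview's core measures are worth a concrete flavor. Below is a minimal discrete-case sketch, not the book's code; the function names and the eps guard are illustrative assumptions.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy H(p) in nats; eps guards log(0)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + eps))

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) log q(x)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return -np.sum(p * np.log(q + eps))

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) = H(p, q) - H(p), nonnegative by Gibbs' inequality."""
    return cross_entropy(p, q, eps) - entropy(p, eps)

def mutual_information(joint, eps=1e-12):
    """I(X; Y) = D(P_XY || P_X P_Y) from a joint table joint[x, y]."""
    joint = np.asarray(joint, float)
    px = joint.sum(axis=1, keepdims=True)   # marginal P_X as a column
    py = joint.sum(axis=0, keepdims=True)   # marginal P_Y as a row
    return np.sum(joint * (np.log(joint + eps) - np.log(px @ py + eps)))

# Example: a noisy binary channel; I(X; Y) is about 0.19 nats here.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))
```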
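The variational-inference item is equally sketchable on a toy model where everything is checkable by hand. The sketch assumes x ~ N(z, 1) with prior z ~ N(0, 1) and a Gaussian variational family; it illustrates the ELBO and the reparameterization trick and is not the book's implementation.

```python
import numpy as np

def elbo(x, mu, s, n_mc=100_000, seed=0):
    """Monte-Carlo ELBO for q = N(mu, s^2) under x ~ N(z, 1), z ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = mu + s * rng.standard_normal(n_mc)            # reparameterization trick
    log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (x - z) ** 2
    kl = 0.5 * (mu**2 + s**2 - 1.0 - np.log(s**2))    # KL(q || N(0, 1)), closed form
    return log_lik.mean() - kl

# At the exact posterior N(x/2, 1/2) the ELBO equals the log evidence
# log N(x; 0, 2), since the posterior KL gap vanishes.
x = 1.3
print(elbo(x, x / 2, np.sqrt(0.5)))             # ~ -1.688
print(-0.5 * np.log(2 * np.pi * 2) - x**2 / 4)  # -1.688...
```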
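For the estimation item, the Kraskov (KSG) k-nearest-neighbour estimator of mutual information fits in a dozen lines. The sketch follows estimator (1) of Kraskov, Stögbauer, and Grassberger (2004) under the usual continuous-marginals assumption; cKDTree and digamma are standard SciPy APIs, while ksg_mi and the choice k=3 are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG estimator (1) of I(X; Y) in nats from paired samples."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)
    xy = np.hstack([x, y])
    # Distance to the k-th neighbour in the joint space (Chebyshev metric);
    # k + 1 because each query point is returned at distance zero.
    eps = cKDTree(xy).query(xy, k=k + 1, p=np.inf)[0][:, -1]
    # Count marginal points strictly inside each radius, excluding the point itself.
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Sanity check against the Gaussian ground truth I = -0.5 * log(1 - rho^2).
rng = np.random.default_rng(0)
rho = 0.8
x = rng.standard_normal((4000, 1))
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal((4000, 1))
print(ksg_mi(x, y), -0.5 * np.log(1 - rho**2))  # both close to 0.51 nats
```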
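The Blahut-Arimoto alternation from the rate-distortion item is a classic two-step fixed point. This is the generic textbook recursion for the rate-distortion function, not code from the book; beta, the iteration budget, and the names are assumptions.

```python
import numpy as np

def blahut_arimoto_rd(p_x, dist, beta, iters=200):
    """Return (rate, distortion) in nats at Lagrange multiplier beta.

    p_x  : source distribution, shape (nx,)
    dist : distortion matrix d(x, xhat), shape (nx, nxhat)
    """
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])  # reproduction marginal q(xhat)
    for _ in range(iters):
        Q = q * np.exp(-beta * dist)                 # optimal test channel Q(xhat | x)
        Q /= Q.sum(axis=1, keepdims=True)
        q = p_x @ Q                                  # re-induced marginal
    rate = np.sum(p_x[:, None] * Q * np.log(Q / q))
    distortion = np.sum(p_x[:, None] * Q * dist)
    return rate, distortion

# Binary source, Hamming distortion: R(D) = log 2 - H_b(D) for D <= 1/2.
p_x = np.array([0.5, 0.5])
dist = 1.0 - np.eye(2)
print(blahut_arimoto_rd(p_x, dist, beta=3.0))
```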
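Finally, the entropic-OT item reduces to Sinkhorn's matrix scaling, shown here in its common log-domain stabilization. The cost matrix, regularizer epsilon, and histogram choices are illustrative assumptions, not the book's settings.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn(a, b, C, epsilon=0.05, iters=500):
    """Entropic-OT plan between histograms a, b for cost matrix C."""
    f = np.zeros_like(a)                  # dual potential on rows
    g = np.zeros_like(b)                  # dual potential on columns
    logK = -C / epsilon                   # Gibbs kernel, log domain
    for _ in range(iters):
        f = epsilon * (np.log(a) - logsumexp(logK + g / epsilon, axis=1))
        g = epsilon * (np.log(b) - logsumexp(logK + f[:, None] / epsilon, axis=0))
    return np.exp(logK + f[:, None] / epsilon + g / epsilon)

# Example: transport between uniform histograms on a 1-D grid.
n = 50
grid = np.linspace(0.0, 1.0, n)
C = (grid[:, None] - grid[None, :]) ** 2  # squared-distance cost
a = b = np.ones(n) / n
P = sinkhorn(a, b, C)
print(P.sum(axis=1)[:3], a[:3])           # row marginals match a at convergence
```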
This item is Non-Returnable
Details
- ISBN-13: 9798273722620
- Price: $79.99
- Publisher: Independently Published
- Publish Date: November 2025
- Dimensions: 11 x 8.5 x 0.77 inches
- Shipping Weight: 1.91 pounds
- Page Count: 374