MLmetrics: Machine Learning for Econometricians
This open textbook introduces modern machine-learning methods for graduate students in econometrics. The aim is not to replace econometric reasoning with black-box prediction, but to show how flexible ML tools can be used carefully in the settings econometricians actually face: forecasting, financial risk, macroeconomic data, firm-level panels, and distributional uncertainty.
The book assumes a strong background in econometrics and statistics, but no prior exposure to machine learning. Each method is introduced from first principles and connected to familiar econometric ideas such as likelihood, loss functions, forecast evaluation, time-series dependence, and model selection.
Why Does This Book Exist?
This book grew out of two related motivations. The first was my own curiosity about how modern machine-learning methods should be adapted to the econometric problems that arise in research on forecasting, macro-finance, and predictive distributions. The second was teaching the MSc course Machine Learning at the Econometrics Institute at Erasmus University Rotterdam.
In that course, I kept running into the same problem: standard ML texts are usually written for a different audience. They often assume little prior statistical training, work mostly with independent and identically distributed data, and motivate ideas with image or text classification. The econometric issues that arise naturally in economic and financial applications - serial dependence, distributional forecasting, real-time information sets, and leakage from invalid validation designs - usually receive much less attention. At the same time, econometrics texts that discuss prediction or regularization rarely connect those topics to the broader ML toolkit, and seldom explain how methods such as neural networks or gradient boosting should be adapted when the data have the structure of a macroeconomic or financial time series.
The aim of these notes is to bridge that gap. They are not a substitute for the many strong ML resources that already exist. Instead, they frame modern machine learning in a language that econometricians can use immediately, with the guiding question: how can we use flexible methods while respecting the information sets, dependence structures, and evaluation problems that arise in econometric work?
What You Will Learn
Machine-learning examples are often presented in clean i.i.d. settings. Econometric applications rarely look like that. Across the book, the emphasis is on the parts of machine learning that matter when that assumption breaks down:
- evaluating predictions when dependence, nonstationarity, and information sets matter
- using flexible models without contaminating inference through leakage or invalid validation schemes
- thinking about predictive distributions, scoring rules, tail risk, and uncertainty quantification
- understanding how neural networks and tree-based methods behave in economic and financial applications
- tuning, comparing, and diagnosing models in a way that remains statistically defensible
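To make the validation point concrete, here is a minimal sketch of an expanding-window split, in which every training observation precedes every test observation. The helper name `expanding_window_splits` and its parameters are hypothetical, chosen for illustration only; this is not the book's own implementation. In practice, scikit-learn's `TimeSeriesSplit` offers a similar scheme.

```python
def expanding_window_splits(n_obs, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs for time-ordered data.

    Hypothetical helper for illustration: the training window always
    starts at observation 0 and grows, and each test block lies
    strictly after the training window, so no future data leaks
    into model fitting.
    """
    fold_size = (n_obs - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any leftover observations.
        test_end = train_end + fold_size if k < n_folds - 1 else n_obs
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(expanding_window_splits(n_obs=20, n_folds=3, min_train=8))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  ->  test {test_idx[0]}..{test_idx[-1]}")
```

By contrast, a randomly shuffled K-fold split would place some test observations before the training data in time, which is one of the leakage problems flagged in the list above.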
Book Roadmap
The chapters are organized into four parts.
| Part | Chapters | Focus |
|---|---|---|
| Background | Information Theory, Cross Validation, Evaluating Predictive Distributions, Optimization | Core tools for measuring uncertainty, selecting models, evaluating forecasts, and training flexible models. |
| Neural Networks | Feed-Forward Networks, Recurrent Networks, LSTM Networks, Empirical Time-Series Exercise, Distribution Modeling | Neural-network models for nonlinear prediction, sequential data, and distributional forecasting. |
| Tree-Based Methods | Decision Trees, Random Forests, Gradient Boosting, Advanced Tree-Based Methods | Interpretable trees, variance reduction by forests, stagewise boosting, and distributional tree methods. |
| Further Topics | Hyperparameter Optimization, Conformal Prediction, Foundation Models for Economic Text, Data Sets Used in This Book | Modern tools for tuning, uncertainty quantification, text as an econometric input, and the data sources that anchor the empirical examples. |
How To Use This Book
Each chapter combines formal notation, econometric interpretation, and Python examples. The code chunks are there to make the ideas concrete, but the mathematical arguments are written so the central logic can be followed without running code.
The exercises are designed as preparation for handwritten exams. They emphasize derivations, forecast-evaluation logic, validation pitfalls, and conceptual distinctions that matter when machine-learning methods are applied to economic data.
Why Python?
Many econometricians work primarily in R, and R has excellent support for statistical modeling and time-series analysis. This book uses Python because the modern ML ecosystem - PyTorch, scikit-learn, Optuna, Hugging Face - is centered on it, and students who go on to work with deep learning or foundation models will encounter Python as the default. The focus throughout remains on model choice, loss functions, validation design, and forecast evaluation rather than on software for its own sake.
What This Book Does Not Cover
This book is not a general introduction to econometrics, and it is not a full treatment of time-series econometrics. It does not replace a dedicated treatment of ARIMA, GARCH, state-space models, causal inference, or asymptotic theory. Those topics are assumed as background when needed.
It is also not a software-engineering manual. The code examples are there to make the statistical ideas concrete, not to provide production-ready ML pipelines.
How This Book Relates to Other Resources
Several good resources cover adjacent territory, and it helps to say briefly how this book fits in.
Coqueret and Guida (2020) provide a practitioner-oriented guide to ML in asset pricing, with a focus on return prediction and portfolio formation. This book covers less on portfolio construction but more on time-series validation, distributional scoring rules, conformal prediction under dependence, and foundation models for economic text.
Standard ML textbooks such as James et al. (2013) or Hands-On Machine Learning are excellent starting points. This book picks up where they leave off: it assumes readers already understand regression and likelihood, and focuses on adapting ML methods to non-i.i.d. econometric settings.
Feedback and Updates
This book is an ongoing project. Its public home is mlmetrics.org. If you spot a typo, find an unclear explanation, or have suggestions for additional material, please open an issue on GitHub: github.com/onnokleen/mlmetrics/issues. Instructors who use the book in their courses are especially welcome to get in touch.
Acknowledgements
I am grateful to my colleagues for numerous discussions that helped shape the ideas and direction of this book. I would also like to thank the team behind Tidy Finance for providing an example of what an open, carefully structured, and practically useful textbook project can look like; their work inspired me to push this project forward. Finally, I thank my students for their many comments and suggestions on earlier versions of these notes.
How To Cite This Book
If you use this book in research, teaching materials, or course syllabi, please cite it as:
Kleen, Onno. 2026. MLmetrics: Machine Learning for Econometricians. Open textbook. https://mlmetrics.org.
BibTeX:
@book{Kleen2026MLmetrics,
  author = {Kleen, Onno},
  title  = {MLmetrics: Machine Learning for Econometricians},
  year   = {2026},
  url    = {https://mlmetrics.org},
  note   = {Open textbook}
}