1 Deep Research Review of MLmetrics
This review covers the public MLmetrics website and its public source repository, onnokleen/mlmetrics, both authored by Onno Kleen. The live site is a Quarto-based open textbook aimed at graduate students in econometrics, organized into four top-level sections with 18 live pages in the public navigation: one landing page plus 17 chapter/appendix pages. The public repository also exposes several additional .qmd files not surfaced in the live navigation, including draft material on autoencoders and reinforcement learning. The site is active and improving quickly, but it is still explicitly marked “under active development,” and the repository history shows rapid changes in late April and early May 2026.
1.1 Executive summary
MLmetrics is already strong where many machine-learning teaching sites are weak: it consistently frames machine learning through econometric information sets, real-time forecasting discipline, validation design, scoring rules, and pseudo-out-of-sample evaluation. The best material is in cross-validation, predictive distributions, conformal prediction, the tree-based chapters, and the “foundation models for economic text” chapter, which unusually emphasizes retrieval leakage, version drift, and forecast-origin admissibility. Against the benchmark of standard texts by Christopher M. Bishop, Ian Goodfellow, Kevin P. Murphy, Trevor Hastie, and Robert Tibshirani, the site is sharper than average on forecasting hygiene, but materially thinner on core baseline material, unsupervised learning, regularized linear methods, interpretability, and modern sequence/tabular practice.
The highest-confidence corrective findings are all actionable. The most important factual inconsistency is internal: the dataset appendix says the local SPY realized-volatility coverage is 321 trading days from 2015-01-02 to 2016-04-12, while the empirical neural-network chapter prints 2,451 observations running from 2015-01-02 to 2024-12-31. That should be fixed immediately because it undermines trust in the replication path. A second concrete issue is citation integrity: the Diebold–Mariano reference in the predictive-distributions chapter contains an obviously truncated DOI URL. A third issue is reproducibility: the public repo exposes code and scripts but no visible environment file, while the example code assumes Keras 3 plus a TensorFlow backend, Optuna, and scikit-learn functionality that was added only in version 1.4. A fourth issue is licensing: the repo README places the book under CC BY-NC-SA 4.0, but Creative Commons itself does not recommend CC licenses for software, so code snippets and scripts should be given a software-native license.
The broader editorial conclusion is that MLmetrics is promising and often technically thoughtful, but it is not yet publish-ready as a stable, comprehensive course text. The priority should not be polishing prose first. The priority should be restoring internal consistency, making the empirical material reproducible, separating text and code licensing, and filling the most important “missing middle” topics: regularized linear baselines, modern sequence models beyond LSTM, broader distributional forecasting methods, and practical model interpretation.
1.2 Site structure and indexed content
The live site is structured as a hierarchical Quarto website with floating sidebar navigation, search enabled, and page-to-page navigation enabled. The four public content sections are Background, Neural Networks, Tree-Based Methods, and Further Topics. The site-wide configuration also enables folded code blocks with the label “Show the code,” which means code is intended as embedded supporting material rather than as downloadable notebooks.
flowchart TD
A[MLmetrics]
A --> B[About This Book]
A --> C[Background]
C --> C1[Information Theory]
C --> C2[Cross Validation]
C --> C3[Evaluating Predictive Distributions]
C --> C4[Optimization for Machine Learning]
A --> D[Neural Networks]
D --> D1[Feed-Forward Neural Networks]
D --> D2[Recurrent Neural Networks]
D --> D3[LSTM Networks]
D --> D4[Empirical Exercise: Networks for Time Series]
D --> D5[Distribution Modeling with Neural Networks]
A --> E[Tree-Based Methods]
E --> E1[Decision Trees]
E --> E2[Random Forests]
E --> E3[Gradient Boosting]
E --> E4[Advanced Tree-Based Methods]
A --> F[Further Topics]
F --> F1[Advanced Hyperparameter Optimization]
F --> F2[Conformal Prediction]
F --> F3[Foundation Models for Economic Text]
F --> F4[Data Sets Used in This Book]
A -. public repo only .-> G[Unpublished or auxiliary qmd files]
G --> G1[Autoencoders]
G --> G2[Reinforcement Learning]
G --> G3[Older Decision Trees draft]
G --> G4[Intro stub]
G --> G5[References stub]
G --> G6[Hidden scratch file]
The live structure comes from the homepage and Quarto configuration; the repo-only nodes come from the public repository listing and exposed .qmd files.
1.2.1 Site map table
The table below consolidates the live navigation and the public repo-only lecture-note sources.
| Status | Page / file | Section | Dominant content types | Notes |
|---|---|---|---|---|
| Live | About This Book | Root | Lecture text, roadmap, references, license | Audience and scope statement |
| Live | Information Theory | Background | Lecture text, equations, exercises, references | Strong link to scoring and likelihood |
| Live | Cross Validation | Background | Lecture text, examples, exercises, references | Strong anti-leakage framing |
| Live | Evaluating Predictive Distributions | Background | Lecture text, derivations, exercises, references | Strong probabilistic-forecast evaluation |
| Live | Optimization for Machine Learning | Background | Lecture text, code, exercises, references | Covers GD, Momentum, AdaGrad, RMSprop, Adam, AdamW |
| Live | Feed-Forward Neural Networks | Neural Networks | Lecture text, code, figures, exercises, references | Good econometric interpretation |
| Live | Recurrent Neural Networks | Neural Networks | Lecture text, exercises, references | Short and conceptually focused |
| Live | LSTM Networks | Neural Networks | Lecture text, exercises, references | Includes transition to attention |
| Live | Empirical Exercise: Networks for Time Series | Neural Networks | Empirical workflow, code, figures, dataset dependence | No visible references section |
| Live | Distribution Modeling with Neural Networks | Neural Networks | Lecture text, niche case study, exercises, references | Focuses on HNN and MDN |
| Live | Decision Trees | Tree-Based Methods | Lecture text, code/examples, exercises, references | Good strengths/limits framing |
| Live | Random Forests | Tree-Based Methods | Lecture text, exercises, references | Good variance-reduction intuition |
| Live | Gradient Boosting | Tree-Based Methods | Lecture text, code, figures, exercises, references | Includes XGBoost note, monotonicity constraints |
| Live | Advanced Tree-Based Methods | Tree-Based Methods | Lecture text, exercises, references | QRF and NGBoost |
| Live | Advanced Hyperparameter Optimization | Further Topics | Lecture text, exercises, references | Covers Bayesian optimization and multi-fidelity |
| Live | Conformal Prediction | Further Topics | Lecture text, code, figures, exercises, references | Strongest advanced chapter |
| Live | Foundation Models for Economic Text | Further Topics | Lecture text, figures, exercises, references | Experimental but unusually thoughtful |
| Live | Data Sets Used in This Book | Further Topics | Dataset docs, code, references | FRED-QD and WRDS/TAQ appendix |
| Repo-only | autoencoders.qmd | Unpublished | Draft lecture text, exercises | Real substantive chapter, not yet live |
| Repo-only | reinforcement_learning.qmd | Unpublished | Draft lecture text, exercises | Real substantive chapter, not yet live |
| Repo-only | decision_trees_full.qmd | Unpublished | Older extended draft | Appears superseded by live tree chapter |
| Repo-only | intro.qmd | Unpublished | Minimal stub | Not a developed page |
| Repo-only | references.qmd | Auxiliary | References stub | Boilerplate only |
| Repo-only | hidden.qmd | Auxiliary | Scratch/code material | Looks like working material, not teaching-ready |
1.2.2 Content types
MLmetrics clearly supports these public content types: lecture text, embedded equations, figures, foldable code, examples, exercises, references, and dataset documentation. It also exposes repo-level scripts and a public FRED-QD CSV. What it does not visibly expose, at least in the live navigation and public root repo listing, are standalone slide decks, downloadable notebooks, or a visible environment specification for code execution. That matters because the pedagogical style is “book plus embedded code,” not “course pack with lecture slides and reproducible labs.”
1.3 Coverage against benchmark texts
Benchmark sources used here were Probabilistic Machine Learning: An Introduction, Deep Learning, An Introduction to Statistical Learning, Statistical Learning with Sparsity, and Pattern Recognition and Machine Learning, supplemented by recent survey/review material on deep learning for time-series forecasting, conformal prediction, foundation models for time series, LLM/NLP work in economics and finance, and modern HPO.
1.3.1 Covered versus expected topics
| Topic family expected from standard ML texts and recent surveys | Coverage on MLmetrics | Assessment |
|---|---|---|
| Probabilistic foundations, entropy, KL, scoring rules | Chapters 1 and 3 | Strong |
| Validation, resampling, leakage, forecast-origin discipline | Chapters 2 and 14 | Strong |
| Optimization fundamentals for ML | Chapter 4 | Strong |
| Feed-forward neural networks and regularization basics | Chapter 5 | Strong but introductory |
| Sequence models | Chapters 6 and 7 | Partial |
| Empirical deep-learning workflow for time series | Chapter 8 | Partial because replication is not public-ready |
| Distributional neural forecasting | Chapter 9 | Partial and too narrow |
| Trees, forests, boosting | Chapters 10–13 | Strong |
| Hyperparameter optimization | Chapter 14 | Strong |
| Conformal prediction and time series issues | Chapter 15 | Strong |
| Foundation models / economic text | Chapter 16 | Strong |
| Dataset documentation | Chapter 17 | Partial because appendix is inconsistent with chapter 8 |
| Linear/logistic baselines and regularized linear models | No real chapter | Missing |
| Unsupervised learning / representation learning beyond repo drafts | Not live | Missing |
| Modern sequence practice beyond LSTM (GRU, transformers, time-series foundation models) | Barely present | Missing/partial |
| Interpretability, calibration practice, SHAP/PDP/ICE, causal ML | No dedicated treatment | Missing |
This synthesis is based on the live chapter set and the benchmark scope in standard texts and recent surveys.
pie showData
title Coverage against a 16-topic benchmark
"Strong" : 7
"Partial" : 5
"Missing on live site" : 4
The chart is a synthesis of the table above rather than a direct site metric. It is intended to show curricular balance, not page counts.
1.3.2 What the site does especially well
The site’s comparative advantage is that it translates ML topics into econometric workflow language. The cross-validation material explicitly warns against invalid random folds for time-series data; the HPO chapter extends that idea to honest model search; the conformal chapter clearly distinguishes marginal from regime-specific coverage; and the foundation-model chapter treats retrieval, labeling, model-version drift, and pretraining contamination as information-set problems. That combination is genuinely distinctive and stronger than many generic ML textbooks for the target audience.
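To make the warning about random folds concrete, here is a minimal sketch of an expanding-window split in which every test observation comes strictly after every training observation. It is not taken from the MLmetrics source; the function name `expanding_window_splits` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_obs: int, n_folds: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs that respect time order.

    Hypothetical helper: the training window always starts at index 0 and
    grows, and each test block sits immediately after its training window.
    """
    if n_folds < 1 or min_train < 1 or min_train >= n_obs:
        raise ValueError("need n_folds >= 1 and 1 <= min_train < n_obs")
    test_size = (n_obs - min_train) // n_folds
    if test_size < 1:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        test_end = train_end + test_size
        yield list(range(train_end)), list(range(train_end, test_end))


splits = list(expanding_window_splits(n_obs=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

With a random K-fold split, some training indices would fall after the test block, which is the leakage the cross-validation chapter warns about. Here the ordering constraint holds by construction.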
1.3.3 What is most notably missing
Relative to the benchmark texts, the largest curricular omission is not a fashionable topic but the baseline toolkit: linear regression/classification baselines, regularization, shrinkage, and model selection. Standard texts treat linear/logistic models, regularization, resampling, tree methods, deep learning, and unsupervised learning as central. MLmetrics currently jumps from foundations into nonlinear models without first giving the reader a robust baseline chapter against which all later methods can be judged. The live site also lacks a serious unsupervised-learning component on PCA, clustering, dimension reduction, or representation learning, despite the public repo already containing a draft autoencoders chapter. For sequence modeling, the live material stops at RNNs and LSTMs, while recent surveys treat transformers and time-series foundation models as essential parts of the current landscape.
1.4 Detailed findings
1.4.1 Technical accuracy, consistency, and currency
The strongest hard error is the mismatch between the dataset appendix and the empirical time-series chapter. The appendix says the local SPY realized-measures coverage is 321 trading days from 2015-01-02 through 2016-04-12, whereas chapter 8 prints 2,451 observations from 2015-01-02 through 2024-12-31 from the same nominal local file path. At least one of those statements is stale or wrong, and because the empirical chapter is the site’s main end-to-end example, this inconsistency should be treated as a high-priority correction.
The chapter on evaluating predictive distributions contains a broken reference entry: the Diebold–Mariano citation displays a truncated DOI URL rather than a full DOI link. That is minor in isolation, but for a textbook centered on forecast evaluation it is exactly the kind of trust-eroding reference problem that students notice and remember.
The site is current in the sense that it is actively maintained: the live pages repeatedly label themselves “under active development,” foundation models and other chapters cite 2025–2026 sources, and the public commit history shows substantial work in late April and early May 2026. But that same evidence also means readers should interpret the site as a moving draft, not as a stable textbook edition yet.
1.4.2 Completeness relative to the literature
The live site is unusually good on probabilistic forecast evaluation, conformal prediction, and econometric leakage control, but it is not yet a rounded ML curriculum for econometricians. The tree chapters are solid and pedagogically coherent; the boosting chapter even includes an implementation note on XGBoost and a practical discussion of monotonicity constraints. But the sequence-model coverage is narrow: there is no visible treatment of GRUs, transformers, or time-series foundation models, even though the LSTM chapter itself frames attention as the bridge to later methods and recent surveys identify transformer-style models and time-series foundation models as major parts of the modern forecasting landscape.
The distributional neural-network chapter is pedagogically interesting but too narrow. It gives real value by connecting NLL to scoring rules and by explaining why joint mean/variance modeling can destabilize training, but it concentrates on Hemisphere Neural Networks and Mixture Density Networks while omitting other now-standard families such as conformalized quantile approaches, deep ensembles, normalizing-flow-based density models, and broader calibrated uncertainty workflows. That makes the chapter more idiosyncratic than a reference text should be.
For the target audience, the most important missing chapter is still regularized tabular baselines. Standard texts place shrinkage and regularization near the center of the curriculum, and econometric practice depends heavily on disciplined linear baselines for comparison. MLmetrics currently mentions ridge and lasso only incidentally, for example as hyperparameters in the HPO chapter or as conceptual analogies in the neural-network chapter, not as a first-class modeling toolbox.
The public repo suggests the author already knows some of these gaps. Autoencoders and reinforcement learning exist as public draft .qmd files, but they are not wired into the live navigation. That creates an awkward usability state: some missing topics already exist in public source form, but the live site does not help readers discover them, and some repo-only files are obviously auxiliary rather than teaching-ready.
1.4.3 Pedagogical quality
Pedagogically, the writing is strongest when it gives the econometric interpretation first and the ML mechanism second. Examples include the feed-forward chapter’s framing of networks as learned basis expansions, the forecast-origin emphasis in the empirical volatility chapter, the conformal chapter’s careful warnings about dependent data, and the foundation-model chapter’s treatment of text retrieval as an information-set problem. Those are exactly the kinds of bridges a graduate econometrics audience needs.
The pedagogical weaknesses cluster around sequencing and scaffolding. Students are asked to absorb advanced nonlinear methods before they have been given a live baseline chapter on regularized linear models, generalized linear models, or calibration and interpretation. The distributional-NN chapter also leans early into a niche case study rather than beginning with a broader menu of common practical options. Finally, the empirical chapter is conceptually careful but not course-ready as a reproducible lab because it relies on licensed WRDS/TAQ data and local paths excluded from git.
1.4.4 Code, licensing, accessibility, and usability
Code support is present but fragile. The public repo shows scripts and source material, but there is no visible requirements.txt, pyproject.toml, environment.yml, renv.lock, or similar top-level environment spec in the root listing. Meanwhile, the empirical chapter imports Optuna, Keras, and root_mean_squared_error, sets KERAS_BACKEND="tensorflow", and uses APIs that rely on Keras 3 plus a backend and on scikit-learn 1.4 or newer. Without an environment file, students will predictably hit installation and version errors.
Licensing is another real issue. The public repo README says the book is licensed under CC BY-NC-SA 4.0. That is fine for text, but Creative Commons itself says its licenses are not recommended for software. Because the repo includes scripts and executable examples, the site should separate text/media licensing from code licensing and give the latter a standard software license such as MIT or Apache-2.0.
Usability is mixed. On the positive side, the Quarto configuration enables search, page navigation, and folded code examples, all of which improve readability. On the negative side, at least some pages have heading duplication at the top, chapter 8 lacks a visible references section, and the presence of public-but-unwired draft files in the repo creates ambiguity about what is canonical. I also did not find standalone slide decks in the live navigation or public root repo listing, which means the site currently serves better as a self-study book than as a full teaching package for lectures.
1.5 Prioritized recommendations
The action table below prioritizes corrections by impact on reader trust, curricular value, and maintenance burden. The recommendations synthesize the concrete site evidence above with the benchmark texts and recent surveys.
| Priority | Recommendation | Why it matters | Effort | Difficulty |
|---|---|---|---|---|
| Highest | Reconcile the SPY dataset appendix with chapter 8 and update both pages together | This is the clearest internal factual inconsistency and weakens confidence in the empirical material | Low | Low |
| Highest | Add a visible reproducibility spec | Environment failures will block student use before any conceptual learning happens | Medium | Low |
| Highest | Separate code licensing from book licensing | CC licenses are not recommended for software; current setup is legally awkward for scripts/snippets | Low | Medium |
| Highest | Repair the references layer | Broken DOI, missing chapter-8 references, and inconsistent bibliography quality are easy trust killers | Low | Low |
| High | Add a new chapter on linear/logistic baselines and regularization | This is the largest curricular omission relative to standard ML texts and econometric practice | High | Medium |
| High | Expand sequence-model coverage to GRU, transformers, and time-series foundation models | Live sequence material stops too early relative to current literature | High | High |
| High | Broaden the probabilistic forecasting chapter family | Add quantile methods, calibrated interval methods, deep ensembles, and broader density-model choices | High | Medium |
| Medium | Publish, hide, or clearly label repo-only drafts | Public draft files create ambiguity about canonical content | Low | Low |
| Medium | Add interpretability and model-diagnostics content | Econometrics students need SHAP/PDP/ICE, calibration, and model-comparison diagnostics | Medium | Medium |
| Medium | Improve empirical accessibility with a public sample dataset and one fully runnable notebook | The current flagship empirical chapter is conceptually strong but not broadly reproducible | Medium | Medium |
| Medium | Create a stronger teaching package | Add lecture slides, downloadable lab notebooks, and chapter summaries for classroom use | Medium | Medium |
| Medium | Clean up heading duplication and page-level QA | Low-cost polish with SEO, accessibility, and perceived-quality benefits | Low | Low |
1.6 Suggested edits and timeline
1.6.1 Concrete edit snippets for key problem areas
Dataset consistency note for chapter 17 and chapter 8
Use one canonical statement in both places. For example:
Replication note. The local file data/taq_spy/SPY_daily_measures.csv used in the volatility illustration currently contains [insert verified count] trading-day observations from [insert verified start date] through [insert verified end date]. If your local file differs, regenerate the appendix summary and the chapter-8 console output together so that the documented coverage and the empirical code remain synchronized.
This edit is justified by the current contradiction between the appendix and the empirical chapter.
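One way to keep the two pages synchronized is to generate the coverage statement from the file itself instead of typing it by hand. The sketch below is only an illustration: the function `summarize_coverage` is hypothetical, and it assumes the CSV has an ISO-formatted date column named `date`, which this review did not verify. The toy in-memory data stands in for the licensed file.

```python
import csv
import io
from datetime import date
from typing import Dict, TextIO


def summarize_coverage(handle: TextIO, date_column: str = "date") -> Dict[str, object]:
    """Return observation count and first/last date of a daily CSV.

    Assumption: `date_column` holds ISO dates (YYYY-MM-DD). The real column
    name in the book's data file may differ.
    """
    dates = [date.fromisoformat(row[date_column]) for row in csv.DictReader(handle)]
    if not dates:
        raise ValueError("no observations found")
    return {
        "n_obs": len(dates),
        "start": min(dates).isoformat(),
        "end": max(dates).isoformat(),
    }


# Toy stand-in for the local file; values are made up for illustration.
sample = io.StringIO(
    "date,rv\n2015-01-02,0.00012\n2015-01-05,0.00015\n2015-01-06,0.00011\n"
)
coverage = summarize_coverage(sample)
print(coverage)
```

If both the appendix and chapter 8 print the result of the same helper on the same file, the documented coverage cannot drift from the console output.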
Environment note for code blocks
Add a short standard preface before all multi-package examples:
Environment. The code below assumes Python 3.10+, keras 3 with a TensorFlow backend, scikit-learn 1.4+, and optuna installed. A pinned environment file for the book is available in the repository.
This is directly motivated by the current example imports and the official Keras and scikit-learn documentation.
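Until a pinned environment file exists, a short preflight check can turn confusing import errors into a readable message. The following is a minimal sketch using only the standard library. The minimum versions mirror the environment note above, while the helper names (`parse_version`, `is_older`, `check_environment`) are hypothetical and the version parsing is deliberately simplified.

```python
import sys
from importlib import metadata
from typing import Dict, List, Optional, Tuple

# Minimums taken from the environment note; None means "any installed version".
REQUIRED: Dict[str, Optional[str]] = {"keras": "3.0", "scikit-learn": "1.4", "optuna": None}


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep leading numeric components, e.g. '1.4.2.post1' -> (1, 4, 2)."""
    parts: List[int] = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older(installed: str, minimum: str) -> bool:
    """Compare two version strings after zero-padding to equal length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) < b + (0,) * (width - len(b))


def check_environment(
    required: Dict[str, Optional[str]] = REQUIRED,
    min_python: Tuple[int, int] = (3, 10),
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    for name, minimum in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if minimum is not None and is_older(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


problems = check_environment()
print("environment OK" if not problems else "\n".join(problems))
```

This does not verify that a TensorFlow backend is installed or that KERAS_BACKEND is set, and it is not a substitute for a pinned environment file. It only reports the most likely causes of the version errors described above.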
License clarification
Add a visible license split on the homepage and in the repo README:
License. Unless otherwise noted, book text and figures are licensed under CC BY-NC-SA 4.0. Code snippets and scripts are licensed separately under the MIT License.
This follows the repo’s current book-license statement and Creative Commons’ own guidance that CC licenses are not recommended for software.
Reference repair example
In chapter 3, replace the current malformed Diebold–Mariano entry with a fully formatted publisher citation and verified DOI URL, and add a references block to chapter 8 with at least the dataset/paper sources used there. The issue is visible now because chapter 3 contains a truncated DOI and chapter 8 has no references section at all.
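Truncated DOI links of this kind can be caught mechanically before publication. The sketch below checks only the shape of a DOI URL (registrant prefix followed by a non-empty suffix). It does not confirm that the DOI resolves. The pattern and the function name `looks_complete` are assumptions for illustration, and the example URLs are placeholders, not the actual Diebold–Mariano DOI.

```python
import re

# Shape check only: "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)$")


def looks_complete(doi_url: str) -> bool:
    """Return True if the URL has the general shape of a full DOI link."""
    return DOI_URL.match(doi_url.strip()) is not None


# Placeholder examples (hypothetical DOIs, not real references).
examples = [
    "https://doi.org/10.1234/example.5678",  # prefix and suffix present
    "https://doi.org/10.1234/",              # suffix missing: truncated
    "https://doi.org/10.",                   # prefix itself truncated
]
for url in examples:
    print(url, "->", "ok" if looks_complete(url) else "looks truncated")
```

Running a check like this over each chapter's rendered reference list would flag the chapter 3 entry, which could then be verified by hand against the publisher record.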
1.6.2 Suggested timeline
Immediate pass
In the next one to two weeks, fix internal contradictions, repair broken citations, add the missing references section to chapter 8, and separate text/code licensing. Also decide whether repo-only draft files should be published, hidden from the repo root, or explicitly labeled as drafts. These are low-effort, high-trust fixes.
Stabilization pass
In the next two to four weeks, add a pinned environment file, a public sample dataset or synthetic fallback for chapter 8, and a minimal fully runnable notebook that reproduces at least one end-to-end experiment without licensed WRDS access. This is the minimum needed to turn the site from “excellent reading notes” into “teachable and runnable.”
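As a rough illustration of what a synthetic fallback could look like, the sketch below simulates a daily realized-variance series as a log-AR(1) process on weekdays. Everything here is an assumption made for illustration: the function name, the parameter values, and the choice of process are not taken from the book, and the output is not calibrated to SPY data.

```python
import math
import random
from datetime import date, timedelta
from typing import List, Tuple


def synthetic_realized_variance(
    n_days: int,
    start: date = date(2015, 1, 2),
    phi: float = 0.95,          # hypothetical persistence of log variance
    sigma: float = 0.3,         # hypothetical innovation scale
    mean_log_rv: float = -9.0,  # hypothetical long-run mean of log variance
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Simulate (ISO date, realized variance) rows on Monday-Friday dates.

    Exchange holidays are ignored, so the calendar is only approximate.
    """
    rng = random.Random(seed)
    rows: List[Tuple[str, float]] = []
    log_rv = mean_log_rv
    day = start
    while len(rows) < n_days:
        if day.weekday() < 5:
            shock = sigma * rng.gauss(0.0, 1.0)
            log_rv = mean_log_rv + phi * (log_rv - mean_log_rv) + shock
            rows.append((day.isoformat(), math.exp(log_rv)))
        day += timedelta(days=1)
    return rows


rows = synthetic_realized_variance(250)
print(rows[0][0], rows[-1][0], len(rows))
```

A file produced this way would let students run the chapter-8 pipeline end to end without WRDS access, with the clear caveat that any results are about the simulated process and not about actual market volatility.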
Curriculum expansion pass
Over the next one to three months, add a chapter on linear/logistic baselines plus regularization; widen sequence-model coverage to GRU, transformers, and time-series foundation models; broaden uncertainty modeling beyond HNN/MDN; and add a chapter or appendix on interpretation, calibration, and practical diagnostics for predictive models. This would bring the site much closer to the curricular expectations set by the benchmark texts and recent surveys while preserving its distinctive econometric identity.
1.7 Open questions and limitations
This review is based on the public HTML site and the public repository state visible on May 7, 2026. I did not execute the code, inspect private branches, or review any non-public course materials. Claims about absent slide decks, notebooks, and environment specs are therefore limited to the live navigation and the public repo tree that was discoverable from the homepage and repository listing. If private teaching assets exist elsewhere, they would not change the main conclusions about the public site’s current accuracy, completeness, and reproducibility.