Under active development. A stable version is expected in November 2026. Feedback is welcome by email or via GitHub issues.

1 Deep Research Review of MLmetrics

This review covers the public website MLmetrics and its public source repository, onnokleen/mlmetrics, both authored by Onno Kleen. The live site is a Quarto-based open textbook aimed at graduate students in econometrics, organized into four top-level sections with 18 live pages in the public navigation: one landing page plus 17 chapter/appendix pages. The public repository also exposes several additional .qmd files not surfaced in the live navigation, including draft material on autoencoders and reinforcement learning. The site is clearly active and improving quickly, but it is still explicitly marked “under active development,” and the repository history shows rapid changes in late April and early May 2026.

1.1 Executive summary

MLmetrics is already strong where many machine-learning teaching sites are weak: it consistently frames machine learning through econometric information sets, real-time forecasting discipline, validation design, scoring rules, and pseudo-out-of-sample evaluation. The best material is in cross-validation, predictive distributions, conformal prediction, the tree-based chapters, and the “foundation models for economic text” chapter, which unusually emphasizes retrieval leakage, version drift, and forecast-origin admissibility. Against the benchmark of standard texts by Christopher M. Bishop, Ian Goodfellow, Kevin P. Murphy, Trevor Hastie, and Robert Tibshirani, the site is sharper than average on forecasting hygiene, but materially thinner on core baseline material, unsupervised learning, regularized linear methods, interpretability, and modern sequence/tabular practice.

The highest-confidence corrective findings are all actionable. The most important factual inconsistency is internal: the dataset appendix says the local SPY realized-volatility coverage is 321 trading days from 2015-01-02 to 2016-04-12, while the empirical neural-network chapter prints 2,451 observations running from 2015-01-02 to 2024-12-31. That should be fixed immediately because it undermines trust in the replication path. A second concrete issue is citation integrity: the Diebold–Mariano reference in the predictive-distributions chapter contains an obviously truncated DOI URL. A third issue is reproducibility: the public repo exposes code and scripts but no visible environment file, while the example code assumes Keras 3 plus a TensorFlow backend, Optuna, and scikit-learn functionality that was added only in version 1.4. A fourth issue is licensing: the repo README places the book under CC BY-NC-SA 4.0, but Creative Commons itself does not recommend CC licenses for software, so code snippets and scripts should be given a software-native license.

The broader editorial conclusion is that MLmetrics is promising and often technically thoughtful, but it is not yet publish-ready as a stable, comprehensive course text. The first priority is not polishing prose. It is restoring internal consistency, making the empirical material reproducible, separating text and code licensing, and filling the most important “missing middle” topics: regularized linear baselines, modern sequence models beyond LSTM, broader distributional forecasting methods, and practical model interpretation.

1.2 Site structure and indexed content

The live site is structured as a hierarchical Quarto website with floating sidebar navigation, search enabled, and page-to-page navigation enabled. The four public content sections are Background, Neural Networks, Tree-Based Methods, and Further Topics. The site-wide configuration also enables folded code blocks with the label “Show the code,” which means code is intended as embedded supporting material rather than as downloadable notebooks.

flowchart TD
    A[MLmetrics]
    A --> B[About This Book]

    A --> C[Background]
    C --> C1[Information Theory]
    C --> C2[Cross Validation]
    C --> C3[Evaluating Predictive Distributions]
    C --> C4[Optimization for Machine Learning]

    A --> D[Neural Networks]
    D --> D1[Feed-Forward Neural Networks]
    D --> D2[Recurrent Neural Networks]
    D --> D3[LSTM Networks]
    D --> D4[Empirical Exercise: Networks for Time Series]
    D --> D5[Distribution Modeling with Neural Networks]

    A --> E[Tree-Based Methods]
    E --> E1[Decision Trees]
    E --> E2[Random Forests]
    E --> E3[Gradient Boosting]
    E --> E4[Advanced Tree-Based Methods]

    A --> F[Further Topics]
    F --> F1[Advanced Hyperparameter Optimization]
    F --> F2[Conformal Prediction]
    F --> F3[Foundation Models for Economic Text]
    F --> F4[Data Sets Used in This Book]

    A -. public repo only .-> G[Unpublished or auxiliary qmd files]
    G --> G1[Autoencoders]
    G --> G2[Reinforcement Learning]
    G --> G3[Older Decision Trees draft]
    G --> G4[Intro stub]
    G --> G5[References stub]
    G --> G6[Hidden scratch file]

The live structure comes from the homepage and Quarto configuration; the repo-only nodes come from the public repository listing and exposed .qmd files.

1.2.1 Site map table

The table below consolidates the live navigation and the public repo-only lecture-note sources.

Status	Page / file	Section	Dominant content types	Notes
Live	About This Book	Root	Lecture text, roadmap, references, license	Audience and scope statement
Live	Information Theory	Background	Lecture text, equations, exercises, references	Strong link to scoring and likelihood
Live	Cross Validation	Background	Lecture text, examples, exercises, references	Strong anti-leakage framing
Live	Evaluating Predictive Distributions	Background	Lecture text, derivations, exercises, references	Strong probabilistic-forecast evaluation
Live	Optimization for Machine Learning	Background	Lecture text, code, exercises, references	Covers GD, Momentum, AdaGrad, RMSprop, Adam, AdamW
Live	Feed-Forward Neural Networks	Neural Networks	Lecture text, code, figures, exercises, references	Good econometric interpretation
Live	Recurrent Neural Networks	Neural Networks	Lecture text, exercises, references	Short and conceptually focused
Live	LSTM Networks	Neural Networks	Lecture text, exercises, references	Includes transition to attention
Live	Empirical Exercise: Networks for Time Series	Neural Networks	Empirical workflow, code, figures, dataset dependence	No visible references section
Live	Distribution Modeling with Neural Networks	Neural Networks	Lecture text, niche case study, exercises, references	Focuses on HNN and MDN
Live	Decision Trees	Tree-Based Methods	Lecture text, code/examples, exercises, references	Good strengths/limits framing
Live	Random Forests	Tree-Based Methods	Lecture text, exercises, references	Good variance-reduction intuition
Live	Gradient Boosting	Tree-Based Methods	Lecture text, code, figures, exercises, references	Includes XGBoost note, monotonicity constraints
Live	Advanced Tree-Based Methods	Tree-Based Methods	Lecture text, exercises, references	QRF and NGBoost
Live	Advanced Hyperparameter Optimization	Further Topics	Lecture text, exercises, references	Covers Bayesian optimization and multi-fidelity
Live	Conformal Prediction	Further Topics	Lecture text, code, figures, exercises, references	Strongest advanced chapter
Live	Foundation Models for Economic Text	Further Topics	Lecture text, figures, exercises, references	Experimental but unusually thoughtful
Live	Data Sets Used in This Book	Further Topics	Dataset docs, code, references	FRED-QD and WRDS/TAQ appendix
Repo-only	`autoencoders.qmd`	Unpublished	Draft lecture text, exercises	Real substantive chapter, not yet live
Repo-only	`reinforcement_learning.qmd`	Unpublished	Draft lecture text, exercises	Real substantive chapter, not yet live
Repo-only	`decision_trees_full.qmd`	Unpublished	Older extended draft	Appears superseded by live tree chapter
Repo-only	`intro.qmd`	Unpublished	Minimal stub	Not a developed page
Repo-only	`references.qmd`	Auxiliary	References stub	Boilerplate only
Repo-only	`hidden.qmd`	Auxiliary	Scratch/code material	Looks like working material, not teaching-ready

1.2.2 Content types

MLmetrics clearly supports these public content types: lecture text, embedded equations, figures, foldable code, examples, exercises, references, and dataset documentation. It also exposes repo-level scripts and a public FRED-QD CSV. What it does not visibly expose, at least in the live navigation and public root repo listing, are standalone slide decks, downloadable notebooks, or a visible environment specification for code execution. That matters because the pedagogical style is “book plus embedded code,” not “course pack with lecture slides and reproducible labs.”

1.3 Coverage against benchmark texts

Benchmark sources used here were Probabilistic Machine Learning: An Introduction, Deep Learning, An Introduction to Statistical Learning, Statistical Learning with Sparsity, and Pattern Recognition and Machine Learning, supplemented by recent survey/review material on deep learning for time-series forecasting, conformal prediction, foundation models for time series, LLM/NLP work in economics and finance, and modern HPO.

1.3.1 Covered versus expected topics

Topic family expected from standard ML texts and recent surveys	Coverage on MLmetrics	Assessment
Probabilistic foundations, entropy, KL, scoring rules	Chapters 1 and 3	Strong
Validation, resampling, leakage, forecast-origin discipline	Chapters 2 and 14	Strong
Optimization fundamentals for ML	Chapter 4	Strong
Feed-forward neural networks and regularization basics	Chapter 5	Strong but introductory
Sequence models	Chapters 6 and 7	Partial
Empirical deep-learning workflow for time series	Chapter 8	Partial because replication is not public-ready
Distributional neural forecasting	Chapter 9	Partial and too narrow
Trees, forests, boosting	Chapters 10–13	Strong
Hyperparameter optimization	Chapter 14	Strong
Conformal prediction and time series issues	Chapter 15	Strong
Foundation models / economic text	Chapter 16	Strong
Dataset documentation	Chapter 17	Partial because appendix is inconsistent with chapter 8
Linear/logistic baselines and regularized linear models	No real chapter	Missing
Unsupervised learning / representation learning beyond repo drafts	Not live	Missing
Modern sequence practice beyond LSTM (GRU, transformers, time-series foundation models)	Barely present	Missing/partial
Interpretability, calibration practice, SHAP/PDP/ICE, causal ML	No dedicated treatment	Missing

This synthesis is based on the live chapter set and the benchmark scope in standard texts and recent surveys.

pie showData
    title Coverage against a 16-topic benchmark
    "Strong" : 7
    "Partial" : 5
    "Missing on live site" : 4

The chart is a synthesis of the table above rather than a direct site metric. It is intended to show curricular balance, not page counts.
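The chart's three counts can be recovered from the Assessment column of the table under a mapping that the review does not state and that is assumed here: only an unqualified "Strong" counts as Strong, any label beginning with "Missing" counts as missing, and everything else (including "Strong but introductory") counts as Partial. A minimal sketch of that tally, with the labels shortened to their leading words:

```python
from collections import Counter

# Assessment labels from the coverage table, in row order (qualifiers shortened).
assessments = [
    "Strong", "Strong", "Strong", "Strong but introductory",
    "Partial", "Partial", "Partial",
    "Strong", "Strong", "Strong", "Strong",
    "Partial",
    "Missing", "Missing", "Missing/partial", "Missing",
]


def bucket(label: str) -> str:
    """ASSUMED mapping from table labels to chart slices; not stated in the review."""
    text = label.lower()
    if text.startswith("missing"):
        return "Missing on live site"
    if text == "strong":
        return "Strong"
    # Qualified labels such as "Strong but introductory" fall through to Partial.
    return "Partial"


counts = Counter(bucket(label) for label in assessments)
for name in ("Strong", "Partial", "Missing on live site"):
    print(f"{name}: {counts[name]}")
```

Under a different mapping, for example counting "Strong but introductory" as Strong, the split would be 8/4/4 rather than 7/5/4.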

1.3.2 What the site does especially well

The site’s comparative advantage is that it translates ML topics into econometric workflow language. The cross-validation material explicitly warns against invalid random folds for time-series data; the HPO chapter extends that idea to honest model search; the conformal chapter clearly distinguishes marginal from regime-specific coverage; and the foundation-model chapter treats retrieval, labeling, model-version drift, and pretraining contamination as information-set problems. That combination is genuinely distinctive and stronger than many generic ML textbooks for the target audience.
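To make the point about random folds concrete, here is a minimal sketch of an expanding-window split in which every test block lies strictly after its training block. It is an illustration written for this review, not code taken from the site; the function name and parameters are hypothetical.

```python
from typing import Iterator


def expanding_window_splits(
    n_obs: int, n_folds: int, min_train: int
) -> Iterator[tuple[range, range]]:
    """Yield (train, test) index ranges; each test block follows its training block in time."""
    if min_train >= n_obs:
        raise ValueError("min_train must be smaller than n_obs")
    fold_size = (n_obs - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough observations for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size  # training window grows with each fold
        test_end = train_end + fold_size
        yield range(0, train_end), range(train_end, test_end)


splits = list(expanding_window_splits(n_obs=20, n_folds=3, min_train=8))
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

A shuffled K-fold split would instead place future observations in the training set of earlier test points, which is the leakage the cross-validation chapter warns about.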

1.3.3 What is most notably missing

Relative to the benchmark texts, the largest curricular omission is not a fashionable topic but the baseline toolkit: linear regression/classification baselines, regularization, shrinkage, and model selection. Standard texts treat linear/logistic models, regularization, resampling, tree methods, deep learning, and unsupervised learning as central. MLmetrics currently jumps from foundations into nonlinear models without first giving the reader a robust baseline chapter against which all later methods can be judged. The live site also lacks a serious unsupervised-learning component on PCA, clustering, dimension reduction, or representation learning, despite the public repo already containing a draft autoencoders chapter. For sequence modeling, the live material stops at RNNs and LSTMs, while recent surveys treat transformers and time-series foundation models as essential parts of the current landscape.

1.4 Detailed findings

1.4.1 Technical accuracy, consistency, and currency

The clearest factual error is the mismatch between the dataset appendix and the empirical time-series chapter. The appendix says the local SPY realized-measures coverage is 321 trading days from 2015-01-02 through 2016-04-12, whereas chapter 8 prints 2,451 observations from 2015-01-02 through 2024-12-31 from the same nominal local file path. At least one of those statements is stale or wrong, and because the empirical chapter is the site’s main end-to-end example, this inconsistency should be treated as a high-priority correction.
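One way to keep the two pages in step is to generate the coverage statement from the file itself and paste the same output into both. The sketch below uses only the standard library and a tiny synthetic stand-in for the data; the column names `date` and `rv` are assumptions, since the real file is licensed and was not inspected for this review.

```python
import csv
import io
from datetime import date


def summarize_coverage(csv_text: str, date_column: str = "date") -> dict:
    """Return the observation count and first/last date of a daily CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    dates = sorted(date.fromisoformat(row[date_column]) for row in reader)
    if not dates:
        raise ValueError("no rows found")
    return {
        "n_obs": len(dates),
        "start": dates[0].isoformat(),
        "end": dates[-1].isoformat(),
    }


# Synthetic three-row example; the column names are hypothetical.
sample = "date,rv\n2015-01-02,0.8\n2015-01-05,1.1\n2015-01-06,0.9\n"
summary = summarize_coverage(sample)
print(summary)
```

Run against the real local file, the returned count and dates would settle which of the two published statements is stale.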

The chapter on evaluating predictive distributions contains a broken reference entry: the Diebold–Mariano citation displays a truncated DOI URL rather than a full DOI link. That is minor in isolation, but for a textbook centered on forecast evaluation it is exactly the kind of trust-eroding reference problem that students notice and remember.
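Truncated DOI links of this kind can be caught mechanically before publication. The following is a rough shape check, not a resolver: it assumes the common modern DOI form (`10.`, a registrant code of four to nine digits, a slash, and a non-empty suffix) and says nothing about whether a DOI actually exists. The example URLs are hypothetical placeholders, not the entry from the site.

```python
import re

# Loose pattern for a DOI URL: "10.", 4-9 digit registrant code, "/", non-empty suffix.
DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/10\.\d{4,9}/\S+$")


def looks_complete(doi_url: str) -> bool:
    """Return True if the URL has the overall shape of a complete DOI link."""
    return DOI_URL.match(doi_url.strip()) is not None


examples = [
    "https://doi.org/10.1234/example.suffix",  # hypothetical, well-formed
    "https://doi.org/10.1234/",                # truncated: suffix missing
    "https://doi.org/10.",                     # truncated: registrant code missing
]
results = [looks_complete(url) for url in examples]
for url, ok in zip(examples, results):
    print(f"{'ok     ' if ok else 'suspect'}  {url}")
```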

The site is current in the sense that it is actively maintained: the live pages repeatedly label themselves “under active development,” foundation models and other chapters cite 2025–2026 sources, and the public commit history shows substantial work in late April and early May 2026. But that same evidence also means readers should interpret the site as a moving draft, not as a stable textbook edition yet.

1.4.2 Completeness relative to the literature

The live site is unusually good on probabilistic forecast evaluation, conformal prediction, and econometric leakage control, but it is not yet a rounded ML curriculum for econometricians. The tree chapters are solid and pedagogically coherent; the boosting chapter even includes an implementation note on XGBoost and a practical discussion of monotonicity constraints. But the sequence-model coverage is narrow: there is no visible treatment of GRUs, transformers, or time-series foundation models, even though the LSTM chapter itself frames attention as the bridge to later methods and recent surveys identify transformer-style models and time-series foundation models as major parts of the modern forecasting landscape.

The distributional neural-network chapter is pedagogically interesting but too narrow. It gives real value by connecting NLL to scoring rules and by explaining why joint mean/variance modeling can destabilize training, but it concentrates on Hemisphere Neural Networks and Mixture Density Networks while omitting other now-standard families such as conformalized quantile approaches, deep ensembles, normalizing-flow-based density models, and broader calibrated uncertainty workflows. That makes the chapter more idiosyncratic than a reference text should be.
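As a reminder of the link between NLL and scoring rules, the Gaussian negative log-likelihood of one observation is the logarithmic score in negative orientation. The sketch below is a generic illustration and is not necessarily how the chapter presents it. It also shows one reason the mean and variance interact: the squared error is divided by the variance, so an overconfident variance inflates the penalty on any mean error.

```python
import math


def gaussian_nll(y: float, mu: float, sigma: float) -> float:
    """Negative log density of N(mu, sigma^2) at y (the negatively oriented log score)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (y - mu) / sigma
    return 0.5 * math.log(2 * math.pi) + math.log(sigma) + 0.5 * z * z


# Same forecast error (y - mu = 1) scored under three predicted standard deviations.
for sigma in (0.1, 1.0, 10.0):
    print(f"sigma={sigma:>4}: NLL={gaussian_nll(1.0, 0.0, sigma):.3f}")
```

For a fixed error, the score is lowest when sigma matches the size of the error and rises sharply when sigma is too small.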

For the target audience, the most important missing chapter is still regularized tabular baselines. Standard texts place shrinkage and regularization near the center of the curriculum, and econometric practice depends heavily on disciplined linear baselines for comparison. MLmetrics currently mentions ridge and lasso only incidentally, for example as hyperparameters in the HPO chapter or as conceptual analogies in the neural-network chapter, not as a first-class modeling toolbox.

The public repo suggests the author already knows some of these gaps. Autoencoders and reinforcement learning exist as public draft .qmd files, but they are not wired into the live navigation. That creates an awkward usability state: some missing topics already exist in public source form, but the live site does not help readers discover them, and some repo-only files are obviously auxiliary rather than teaching-ready.

1.4.3 Pedagogical quality

Pedagogically, the writing is strongest when it gives the econometric interpretation first and the ML mechanism second. Examples include the feed-forward chapter’s framing of networks as learned basis expansions, the forecast-origin emphasis in the empirical volatility chapter, the conformal chapter’s careful warnings about dependent data, and the foundation-model chapter’s treatment of text retrieval as an information-set problem. Those are exactly the kinds of bridges a graduate econometrics audience needs.

The pedagogical weaknesses cluster around sequencing and scaffolding. Students are asked to absorb advanced nonlinear methods before they have been given a live baseline chapter on regularized linear models, generalized linear models, or calibration and interpretation. The distributional-NN chapter also leans early into a niche case study rather than beginning with a broader menu of common practical options. Finally, the empirical chapter is conceptually careful but not course-ready as a reproducible lab because it relies on licensed WRDS/TAQ data and local paths excluded from git.

1.4.4 Code, licensing, accessibility, and usability

Code support is present but fragile. The public repo shows scripts and source material, but there is no visible `requirements.txt`, `pyproject.toml`, `environment.yml`, `renv.lock`, or similar top-level environment spec in the root listing. Meanwhile, the empirical chapter imports Optuna, Keras, and `root_mean_squared_error`, sets `KERAS_BACKEND="tensorflow"`, and uses APIs that rely on Keras 3 plus a backend and on scikit-learn 1.4 or newer. Without an environment file, students will predictably hit installation and version errors.
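Until a pinned environment file exists, a small preflight check can at least tell students which requirement they are missing. The sketch below uses only the standard library; the minimum versions are assumptions taken from the requirements described in this review (Keras 3, scikit-learn 1.4 or newer), and the version parser is deliberately crude.

```python
from importlib import metadata

# ASSUMED minimums, taken from this review's reading of the chapter-8 imports.
MINIMUMS = {"keras": (3,), "scikit-learn": (1, 4), "optuna": (0,), "tensorflow": (0,)}


def parse_version(text: str) -> tuple[int, ...]:
    """Keep leading numeric components: '1.4.2' -> (1, 4, 2), '3.0.0rc1' -> (3, 0, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_environment(minimums: dict = MINIMUMS) -> dict[str, str]:
    """Report each package as missing, too old, or ok."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= minimum
        report[name] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


for package, status in check_environment().items():
    print(f"{package}: {status}")
```

This is a stopgap rather than a replacement for a pinned environment file, which also fixes transitive dependencies.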

Licensing is another real issue. The public repo README says the book is licensed under CC BY-NC-SA 4.0. That is fine for text, but Creative Commons itself says its licenses are not recommended for software. Because the repo includes scripts and executable examples, the site should separate text/media licensing from code licensing and give the latter a standard software license such as MIT or Apache-2.0.

Usability is mixed. On the positive side, the Quarto configuration enables search, page navigation, and folded code examples, all of which improve readability. On the negative side, at least some pages have heading duplication at the top, chapter 8 lacks a visible references section, and the presence of public-but-unwired draft files in the repo creates ambiguity about what is canonical. I also did not find standalone slide decks in the live navigation or public root repo listing, which means the site currently serves better as a self-study book than as a full teaching package for lectures.

1.5 Prioritized recommendations

The action table below prioritizes corrections by impact on reader trust, curricular value, and maintenance burden. The recommendations synthesize the concrete site evidence above with the benchmark texts and recent surveys.

Priority	Recommendation	Why it matters	Effort	Difficulty
Highest	Reconcile the SPY dataset appendix with chapter 8 and update both pages together	This is the clearest internal factual inconsistency and weakens confidence in the empirical material	Low	Low
Highest	Add a visible reproducibility spec	Environment failures will block student use before any conceptual learning happens	Medium	Low
Highest	Separate code licensing from book licensing	CC licenses are not recommended for software; current setup is legally awkward for scripts/snippets	Low	Medium
Highest	Repair the references layer	Broken DOI, missing chapter-8 references, and inconsistent bibliography quality are easy trust killers	Low	Low
High	Add a new chapter on linear/logistic baselines and regularization	This is the largest curricular omission relative to standard ML texts and econometric practice	High	Medium
High	Expand sequence-model coverage to GRU, transformers, and time-series foundation models	Live sequence material stops too early relative to current literature	High	High
High	Broaden the probabilistic forecasting chapter family	Add quantile methods, calibrated interval methods, deep ensembles, and broader density-model choices	High	Medium
Medium	Publish, hide, or clearly label repo-only drafts	Public draft files create ambiguity about canonical content	Low	Low
Medium	Add interpretability and model-diagnostics content	Econometrics students need SHAP/PDP/ICE, calibration, and model-comparison diagnostics	Medium	Medium
Medium	Improve empirical accessibility with a public sample dataset and one fully runnable notebook	The current flagship empirical chapter is conceptually strong but not broadly reproducible	Medium	Medium
Medium	Create a stronger teaching package	Add lecture slides, downloadable lab notebooks, and chapter summaries for classroom use	Medium	Medium
Medium	Clean up heading duplication and page-level QA	Low-cost polish with SEO, accessibility, and perceived-quality benefits	Low	Low

1.6 Suggested edits and timeline

1.6.1 Concrete edit snippets for key problem areas

Dataset consistency note for chapter 17 and chapter 8

Use one canonical statement in both places. For example:

Replication note. The local file data/taq_spy/SPY_daily_measures.csv used in the volatility illustration currently contains [insert verified count] trading-day observations from [insert verified start date] through [insert verified end date]. If your local file differs, regenerate the appendix summary and the chapter-8 console output together so that the documented coverage and the empirical code remain synchronized.

This edit is justified by the current contradiction between the appendix and the empirical chapter.

Environment note for code blocks

Add a short standard preface before all multi-package examples:

Environment. The code below assumes Python 3.10+, keras 3 with a TensorFlow backend, scikit-learn 1.4+, and optuna installed. A pinned environment file for the book is available in the repository.

This is directly motivated by the current example imports and the official Keras and scikit-learn documentation.

License clarification

Add a visible license split on the homepage and in the repo README:

License. Unless otherwise noted, book text and figures are licensed under CC BY-NC-SA 4.0. Code snippets and scripts are licensed separately under the MIT License.

This follows the repo’s current book-license statement and Creative Commons’ own guidance that CC licenses are not recommended for software.

Reference repair example

In chapter 3, replace the current malformed Diebold–Mariano entry with a fully formatted publisher citation and verified DOI URL. In chapter 8, add a references block with at least the dataset and paper sources used there. Both issues are visible now: chapter 3 contains a truncated DOI, and chapter 8 has no references section at all.

1.6.2 Suggested timeline

Immediate pass

In the next one to two weeks, fix internal contradictions, repair broken citations, add the missing references section to chapter 8, and separate text/code licensing. Also decide whether repo-only draft files should be published, hidden from the repo root, or explicitly labeled as drafts. These are low-effort, high-trust fixes.

Stabilization pass

In the next two to four weeks, add a pinned environment file, a public sample dataset or synthetic fallback for chapter 8, and a minimal fully runnable notebook that reproduces at least one end-to-end experiment without licensed WRDS access. This is the minimum needed to turn the site from “excellent reading notes” into “teachable and runnable.”

Curriculum expansion pass

Over the next one to three months, add a chapter on linear/logistic baselines plus regularization; widen sequence-model coverage to GRU, transformers, and time-series foundation models; broaden uncertainty modeling beyond HNN/MDN; and add a chapter or appendix on interpretation, calibration, and practical diagnostics for predictive models. This would bring the site much closer to the curricular expectations set by the benchmark texts and recent surveys while preserving its distinctive econometric identity.

1.7 Open questions and limitations

This review is based on the public HTML site and the public repository state visible on May 7, 2026. I did not execute the code, inspect private branches, or review any non-public course materials. Claims about absent slide decks, notebooks, and environment specs are therefore limited to the live navigation and the public repo tree that was discoverable from the homepage and repository listing. If private teaching assets exist elsewhere, they would not change the main conclusions about the public site’s current accuracy, completeness, and reproducibility.