# 1. The Golem of Prague

## 1.1. Statistical golems
Compare Statistical hypothesis test.
The author claims:
> Even a procedure like ordinary linear regression, which is quite flexible in many ways, being able to encode a large diversity of interesting hypotheses, is sometimes fragile. For example, if there is substantial measurement error on prediction variables, then the procedure can fail in spectacular ways. But more importantly, it is nearly always possible to do better than ordinary linear regression, largely because of a phenomenon known as OVERFITTING (Chapter 7).
He likely means to refer to Ordinary least squares, which is a form of Linear regression (📌).
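The measurement-error failure mode is easy to see in simulation. The following is a minimal sketch (not from the book) of the classical errors-in-variables result: noise on a predictor attenuates the OLS slope toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_slope = 2.0

x = rng.normal(0.0, 1.0, n)                 # true predictor
y = true_slope * x + rng.normal(0.0, 0.5, n)

# OLS slope with the true predictor recovers roughly 2.0
slope_clean = np.polyfit(x, y, 1)[0]

# Substantial measurement error on the predictor
x_noisy = x + rng.normal(0.0, 1.0, n)
slope_noisy = np.polyfit(x_noisy, y, 1)[0]

# Classical attenuation: the slope shrinks by roughly
# var(x) / (var(x) + var(error)) = 1 / (1 + 1) = 0.5
print(slope_clean, slope_noisy)
```

With equal predictor and error variances the fitted slope lands near half its true value, which is one concrete sense in which the procedure "can fail in spectacular ways."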
## 1.2. Statistical rethinking
Compare Falsifiability, which actually focuses on the black swan example the author dislikes. The author never says this example is strictly wrong, just unrealistically simple.
### 1.2.1. Hypotheses are not models
The author introduces the term “Process Model” almost as a definition, but not in a particularly clear way.
## 1.3. Tools for golem engineering
A footnote references Philosophy of Mathematics (Stanford Encyclopedia of Philosophy), which is a somewhat shorter introduction to the area than Philosophy of mathematics.
### 1.3.1. Bayesian data analysis
See Bayesian Data Analysis (BDA) for the book that coined this term. The author has already referred to Gelman’s material in footnotes and will continue to do so.
Compare Probability interpretations, which generally provides a much clearer introduction to this topic. Somewhat confusingly, the author is not referring to Resampling (statistics) in this section. He also uses the term Sampling distribution without providing any definition or background.
The author defines Bayesianism as a normative description of rational belief, though it’s not clear that this is how the word is always used on e.g. Wikipedia (it appears to often simply mean Bayesian statistics).
The author’s final point (that BDA is more intuitive) is among the strongest arguments for it. This may be because the Bayesian interpretation is simpler than the frequentist interpretation. Compare the lengths of the descriptions in Credible interval and Confidence interval; although the latter article does a great job of boiling down what a confidence interval is (especially with the Normal distribution 50% CI illustration), the result is just not as succinct. If someone can skip past all the misuses of p-values and simply never need to deal with them, their life could be much easier (unlikely for most people in our generation).
Another strong argument for the Bayesian perspective is that the “data products” (referred to as “statistical propositions” in Statistical inference § Introduction, and elsewhere as probabilistic propositions) are richer. A Highest posterior density interval lets you conclude something about where the parameter sits even for values outside the 95% interval (values just outside it are still relatively plausible, and the interval concentrates on the highest-density region); the same cannot be said of a confidence interval.
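The difference between these data products can be illustrated with posterior samples. This is a hypothetical sketch: `hpdi` is a naive shortest-interval implementation written for this note, not any particular library's function, and the Beta draws stand in for a skewed posterior.

```python
import numpy as np

def hpdi(samples, prob=0.95):
    """Highest posterior density interval: the narrowest interval
    containing `prob` of the sorted samples."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.ceil(prob * n))            # samples the interval must cover
    widths = s[k - 1:] - s[: n - k + 1]   # width of every candidate interval
    i = np.argmin(widths)                 # index of the narrowest one
    return s[i], s[i + k - 1]

rng = np.random.default_rng(1)
samples = rng.beta(2, 8, 100_000)         # a skewed "posterior"

lo, hi = np.quantile(samples, [0.025, 0.975])  # equal-tailed interval
h_lo, h_hi = hpdi(samples)

# For a skewed posterior, the HPDI is narrower than the
# equal-tailed percentile interval and hugs the high-density region
print(hi - lo, h_hi - h_lo)
```

Both intervals are read directly off the posterior samples, which is the sense in which the Bayesian data product is richer: the same samples also answer any other question about the parameter.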
### 1.3.2. Model comparison and prediction
Compare Model selection; information criteria and cross-validation are mentioned in Model selection § Criteria.
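The overfitting problem that motivates information criteria and cross-validation can be sketched by scoring polynomial fits on held-out data. This is a hypothetical illustration, not an example from the book: the true relation is linear, so higher-degree fits chase noise.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(n=50):
    """Draw (x, y) where the true relation is linear with unit noise."""
    x = rng.uniform(-2, 2, n)
    y = x + rng.normal(0, 1, n)
    return x, y

x_train, y_train = simulate()
x_test, y_test = simulate()      # held-out data, as in cross-validation

results = {}
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    results[degree] = (
        np.mean((np.polyval(coefs, x_train) - y_train) ** 2),  # train MSE
        np.mean((np.polyval(coefs, x_test) - y_test) ** 2),    # test MSE
    )
    print(degree, results[degree])
```

Training error can only improve as the degree grows (the models are nested), so in-sample fit says nothing about which model predicts best; that is the gap that held-out scoring and information criteria are meant to close.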
### 1.3.3. Multilevel models
Compare Multilevel model.
### 1.3.4. Graphical causal models
Several of the footnotes in this section directly or indirectly point to the popular science book The Book of Why, a major influence on the author’s thoughts in this area. Compare the author’s terms to Path analysis (statistics), Causal inference, and Causal model.