2. Small Worlds and Large Worlds
2.1 The garden of forking data
2.1.1 Counting possibilities
The author’s footnote refers to Cox’s theorem; other justifications are given in Bayesian probability § Justification.
As discussed in Cox’s theorem § Interpretation and further discussion, there’s plenty of reason to doubt this justification.
An arguably better justification is the Dutch book theorems; see Dutch Book Arguments (Stanford Encyclopedia of Philosophy) and Notes on the Dutch Book Argument (by David A. Freedman) for more rigorous mathematics going back to de Finetti, who originated the argument. This justification remains completely finite, which seems desirable not only if you have prior commitments to finitism but also in light of the following research (quoting from David A. Freedman):
In particular, the 1965 paper with the innocent title “On the asymptotic behaviour of Bayes estimates in the discrete case II” finds the rather disappointing answer that when sampling from a countably infinite population the Bayesian procedure fails almost everywhere, i.e., one does not obtain the true distribution asymptotically. This situation is quite different from the finite case when the (discrete) random variable takes only finite many values and the Bayesian method is consistent in agreement with earlier findings of Doob (1948).
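To make the Dutch book idea concrete, here is a minimal sketch in R. The prices and the helper name `net_payoff` are hypothetical and chosen only for illustration: an agent who is willing to pay 0.6 for a unit bet on A and also 0.6 for a unit bet on not-A has prices that sum to more than 1, and so loses money whichever way A turns out.

```r
# Hypothetical betting prices (illustrative numbers, not from the source).
# Each is what the agent pays for a bet that returns 1 if it wins.
price_A     <- 0.6
price_not_A <- 0.6   # incoherent: price_A + price_not_A > 1

# Net payoff to the agent after buying both bets, for a given outcome.
net_payoff <- function(a_happens) {
  win_A     <- as.numeric(a_happens)    # bet on A pays 1 if A happens
  win_not_A <- as.numeric(!a_happens)   # bet on not-A pays 1 otherwise
  (win_A + win_not_A) - (price_A + price_not_A)
}

outcomes <- c(A = TRUE, not_A = FALSE)
sapply(outcomes, net_payoff)   # negative in both cases: a sure loss
```

Setting `price_not_A <- 1 - price_A` makes the net payoff zero in both cases, which is the coherence condition that the probability axioms impose.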
From Bayesian inference § Alternatives to Bayesian updating:
Ian Hacking noted that traditional “Dutch book” arguments did not specify Bayesian updating: they left open the possibility that non-Bayesian updating rules could avoid Dutch books. Hacking wrote: “And neither the Dutch book argument nor any other in the personalist arsenal of proofs of the probability axioms entails the dynamic assumption. Not one entails Bayesianism. So the personalist requires the dynamic assumption to be Bayesian. It is true that in consistency a personalist could abandon the Bayesian model of learning from experience. Salt could lose its savour.”
Indeed, there are non-Bayesian updating rules that also avoid Dutch books (as discussed in the literature on “probability kinematics”) following the publication of Richard C. Jeffrey’s rule, which applies Bayes’ rule to the case where the evidence itself is assigned a probability. The additional hypotheses needed to uniquely require Bayesian updating have been deemed to be substantial, complicated, and unsatisfactory.
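A small sketch of Jeffrey's rule may help here. In its simplest form, with a single piece of evidence E that is now believed with probability q, the updated probability of a hypothesis H is q · P(H | E) + (1 − q) · P(H | not E). All numbers and the function name `jeffrey_update` below are hypothetical, chosen only for illustration.

```r
# Hypothetical numbers, chosen only for illustration.
prior_H        <- 0.3   # P(H)
p_E_given_H    <- 0.8   # P(E | H)
p_E_given_notH <- 0.2   # P(E | not H)

# Ordinary conditional probabilities from Bayes' rule.
p_E            <- p_E_given_H * prior_H + p_E_given_notH * (1 - prior_H)
p_H_given_E    <- p_E_given_H * prior_H / p_E
p_H_given_notE <- (1 - p_E_given_H) * prior_H / (1 - p_E)

# Jeffrey's rule: the evidence E is only believed with probability q.
jeffrey_update <- function(q) {
  q * p_H_given_E + (1 - q) * p_H_given_notE
}

jeffrey_update(1)     # equals p_H_given_E: ordinary Bayesian updating
jeffrey_update(p_E)   # equals prior_H: belief in E unchanged, so no update
jeffrey_update(0.9)   # uncertain evidence gives an intermediate value
```

Ordinary Bayesian conditioning is the special case q = 1, which is one way to see that it is not the only coherent updating rule.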
2.1.2 Combining other information
2.1.3 From counts to probability
The author uses the term “plausibility” as a synonym for probability; it’s not clear why the word is being introduced.
ways <- c(0, 3, 8, 9, 0)
ways / sum(ways)
- 0
- 0.15
- 0.4
- 0.45
- 0
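The `ways` vector above can be reproduced by counting paths. The sketch below assumes the book's setup, which these notes do not restate: a bag of four marbles, each blue or white, and an observed sequence of blue, white, blue drawn with replacement. The variable names are illustrative.

```r
# Assumption (the book's example, not stated in these notes):
# 4 marbles per bag; observed draws are blue, white, blue, with replacement.
n_marbles <- 4
n_blue    <- 0:n_marbles           # one conjecture per possible blue count
n_white   <- n_marbles - n_blue

# Ways to produce blue, white, blue = (#blue) * (#white) * (#blue).
ways <- n_blue * n_white * n_blue
ways

# Normalizing the counts gives the plausibilities.
ways / sum(ways)
```

Under these assumptions the counts come out as 0, 3, 8, 9, 0, matching the vector typed in by hand above.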
The following sentence seems to accidentally italicize the word “they”:
These plausibilities are also probabilities--they are …
2.2. Building a model
The three steps introduced in this section almost surely come directly from the author’s reading of BDA3, in the section “The three steps of Bayesian data analysis”.
2.2.1. A data story
2.2.2. Bayesian updating
2.2.3. Evaluate
2.3. Components of the model
2.3.1. Variables
2.3.2. Definitions
By far the most confusing definition given here is for the likelihood. In this book, a “likelihood” refers to a distribution function assigned to an observed variable. In this section, for example, the “likelihood” is the binomial distribution. According to the author, this is the language used in “conventional” statistics as well. The author often calls this simply the “likelihood”, but it is of course a function, because any probability distribution is a function.
In non-Bayesian statistics, and in particular on Wikipedia, the likelihood function is defined differently: it is a function of the parameters with the observed data held fixed, and it is denoted \(\mathcal{L}\). See the author’s footnote and Likelihood function.
dbinom(6, size = 9, prob = 0.5)
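As a quick check on what the `dbinom` call above computes, the sketch below evaluates the binomial formula by hand and then reads the same expression as a function of `p` with the data held fixed, which is the non-Bayesian sense of "likelihood function". The names `by_hand` and `lik` are illustrative.

```r
# Binomial probability of W = 6 successes in n = 9 trials with p = 0.5,
# computed from the formula choose(n, W) * p^W * (1 - p)^(n - W).
W <- 6; n <- 9; p <- 0.5
by_hand <- choose(n, W) * p^W * (1 - p)^(n - W)
by_hand                                             # 84 / 512 = 0.1640625
all.equal(by_hand, dbinom(W, size = n, prob = p))   # TRUE

# Same expression viewed as a function of p, with the data (W, n) fixed.
lik <- function(p) dbinom(W, size = n, prob = p)
lik(c(0.25, 0.5, 0.75))
```

The two usages differ only in which argument is treated as varying: the outcome (a probability distribution) or the parameter (a likelihood function).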
2.4. Making the model go
2.4.3 Grid Approximation
p_grid <- seq(from=0, to=1, length.out=20)
p_grid
- 0
- 0.0526315789473684
- 0.105263157894737
- 0.157894736842105
- 0.210526315789474
- 0.263157894736842
- 0.315789473684211
- 0.368421052631579
- 0.421052631578947
- 0.473684210526316
- 0.526315789473684
- 0.578947368421053
- 0.631578947368421
- 0.684210526315789
- 0.736842105263158
- 0.789473684210526
- 0.842105263157895
- 0.894736842105263
- 0.947368421052632
- 1
prior <- rep(1, 20)
prior
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
- 1
# compute likelihood at each value in grid
likelihood <- dbinom(6, size=9, prob=p_grid)
plot(p_grid, likelihood, type="b", xlab="probability of water", ylab="likelihood")
# compute product of likelihood and prior
unstd.posterior <- likelihood * prior
plot(p_grid, unstd.posterior, type="b", xlab="probability of water", ylab="unstandardized posterior")
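The cells above stop at the unstandardized posterior. A minimal sketch of the remaining step, dividing by the sum so the grid values form a proper probability distribution, might look like the following. The grid, prior, and likelihood are rebuilt so the block runs on its own; the name `posterior` is an assumption.

```r
# Rebuild the grid, prior, and likelihood so the block runs on its own.
p_grid     <- seq(from = 0, to = 1, length.out = 20)
prior      <- rep(1, 20)
likelihood <- dbinom(6, size = 9, prob = p_grid)
unstd.posterior <- likelihood * prior

# Standardize so the posterior sums to 1 over the grid.
posterior <- unstd.posterior / sum(unstd.posterior)
sum(posterior)

# Grid value with the highest posterior probability, and the posterior mean.
p_grid[which.max(posterior)]
sum(p_grid * posterior)
```

With a flat prior the exact posterior is Beta(7, 4), whose mean is 7/11, so the grid mean should land close to 0.636 even with only 20 points.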
2.4.4 Quadratic Approximation
library(rethinking)
Loading required package: rstan
Loading required package: StanHeaders
rstan version 2.32.6 (Stan version 2.32.2)
For execution on a local, multicore CPU with excess RAM we recommend calling
options(mc.cores = parallel::detectCores()).
To avoid recompilation of unchanged Stan programs, we recommend calling
rstan_options(auto_write = TRUE)
For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
change `threads_per_chain` option:
rstan_options(threads_per_chain = 1)
Loading required package: parallel
rethinking (Version 2.13)
Attaching package: ‘rethinking’
The following object is masked from ‘package:stats’:
rstudent
globe.qa <- quap(
    alist(
        W ~ dbinom(W+L, p),  # binomial likelihood
        p ~ dunif(0, 1)      # uniform prior
    ),
    data=list(W=6, L=3) )
precis(globe.qa)
| | mean | sd | 5.5% | 94.5% |
|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <dbl> |
| p | 0.6666669 | 0.1571337 | 0.4155368 | 0.9177969 |
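The `precis` numbers above can be roughly reproduced in base R, which may help show what a quadratic approximation does: find the posterior mode, then use the curvature of the log posterior at the mode to set the standard deviation of a normal distribution. This is a sketch under that reading, not necessarily how `quap` is implemented; the names `log_post`, `p_mode`, and `p_sd` are assumptions.

```r
W <- 6; L <- 3

# Log posterior up to a constant: binomial likelihood with a flat prior on p.
log_post <- function(p) dbinom(W, size = W + L, prob = p, log = TRUE)

# Posterior mode found numerically (the analytic answer is W / (W + L)).
p_mode <- optimize(log_post, interval = c(0.001, 0.999), maximum = TRUE)$maximum

# Second derivative of the log posterior: -W / p^2 - L / (1 - p)^2.
# Its value at the mode sets the sd of the approximating normal.
curvature <- -W / p_mode^2 - L / (1 - p_mode)^2
p_sd <- sqrt(-1 / curvature)

# Mode, sd, and the 5.5% / 94.5% quantiles of the normal approximation.
c(mean = p_mode, sd = p_sd,
  lower = qnorm(0.055, p_mode, p_sd), upper = qnorm(0.945, p_mode, p_sd))
```

The result should agree with the table to about three decimal places; small differences come from the tolerance of the numerical optimizer.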
2.4.5 Markov chain Monte Carlo
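n_samples <- 1000
p <- rep(NA, n_samples)
p[1] <- 0.5    # starting value of the chain
W <- 6
L <- 3
for (i in 2:n_samples) {
    # propose a new value near the current one
    p_new <- rnorm( 1 , p[i-1] , 0.1 )
    # reflect proposals that fall outside [0, 1] back into the interval
    if ( p_new < 0 ) p_new <- abs( p_new )
    if ( p_new > 1 ) p_new <- 2 - p_new
    # likelihood at the current and proposed values (the prior is flat)
    q0 <- dbinom( W , W+L , p[i-1] )
    q1 <- dbinom( W , W+L , p_new )
    # accept the proposal with probability min(1, q1/q0)
    p[i] <- ifelse( runif(1) < q1/q0 , p_new , p[i-1] )
}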
dens( p , xlim=c(0,1) )
curve( dbeta( x , W+1 , L+1 ) , lty=2 , add=TRUE )
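Besides overlaying the density, one can compare numerical summaries of the chain with the exact Beta(W + 1, L + 1) posterior. The sketch below reruns the same Metropolis loop in base R with a fixed seed and a longer chain; the seed and the chain length of 10000 are arbitrary choices for illustration, not values from the source.

```r
set.seed(1)   # arbitrary seed so the run is repeatable
W <- 6; L <- 3
n_samples <- 10000
p <- rep(NA, n_samples)
p[1] <- 0.5
for (i in 2:n_samples) {
  p_new <- rnorm(1, p[i - 1], 0.1)
  if (p_new < 0) p_new <- abs(p_new)   # reflect at 0
  if (p_new > 1) p_new <- 2 - p_new    # reflect at 1
  ratio <- dbinom(W, W + L, p_new) / dbinom(W, W + L, p[i - 1])
  p[i] <- if (runif(1) < ratio) p_new else p[i - 1]
}

# Compare sample summaries with the exact Beta(W + 1, L + 1) posterior.
a <- W + 1; b <- L + 1
c(sample_mean = mean(p), exact_mean = a / (a + b))
c(sample_sd = sd(p), exact_sd = sqrt(a * b / ((a + b)^2 * (a + b + 1))))
```

Because successive draws are correlated, the sample summaries only approximate the exact values, and the agreement generally improves with longer chains.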