# Practice: Chp. 11#

source("iplot.R")
suppressPackageStartupMessages(library(rethinking))


11E1. If an event has probability 0.35, what are the log-odds of this event?

Answer. The odds are $$0.35 / (1 - 0.35) = 0.35/0.65$$ or about a half. The odds (Odds) in favor of the event are about one to two. On the log2 scale we should expect the log-odds to be about negative one:

$log2\left(\frac{p}{1-p}\right) = log2(0.35 / (1 - 0.35)) = -0.893$

For a logarithmic base of e contract the log scale by log(2) or about 0.7:

$log\left(\frac{p}{1-p}\right) = log(0.35 / (1 - 0.35)) = -0.619$

11E2. If an event has log-odds 3.2, what is the probability of the event?

Answer. Expect a log2-odds of about 3.2/0.7 or around 4.6. That is, the odds in favor of the event should be about 2**4.6 to one, perhaps around 25:1.

$\frac{1}{1+e^{-x}} = \frac{1}{1+e^{-3.2}} = 0.961$

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

$q = e^{\beta} = e^{1.7} = 5.47$

The ‘proportional odds’ is the change in odds for a unit increase in the predictor variable. So when the predictor variable increases by one, we can expect the odds $$p/(1-p)$$ to increase by about a factor of five.

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

Answer. We use an offset in a Poisson regression (see section 11.2.3) when we have counts that were collected at varying densities. For example, counts collected over varying:

• Lengths of observation

• Area of sampling

• Intensity of sampling

The offset lets us include all the data in the same model while still accounting for the differences in measurement.

A specific example would be collecting the number of visitors to a restaurant. Some counts may be collected on a daily basis, and others on a weekly basis.

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

Answer. The likelihood changes when you move from the disaggregated to aggregated format because now there is no longer exactly one trial associated with every event.

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in outcome?

Answer. Assuming the Poisson regression uses a log link, for a unit increase in the predictor variable we should see an increase in the rate (or mean) $$\lambda$$ of about five:

$q = \frac{exp(\alpha + \beta (x + 1))}{exp(\alpha + \beta x)} = exp(\beta) = exp(1.7) \approx 5.47$

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

Answer. The binomial distribution takes a probability parameter (between zero and one) and the logit link constrains a parameter to this range.

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

Answer. The Poisson distribution takes a positive real-valued parameter and the log link constrains a parameter to this range.

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

Answer. It would imply the mean $$\lambda$$ is between zero and one, rather than simply positive and real-valued. If the researcher knows the maximum number of times (e.g. one) an event can occur, it seems like it would be more appropriate to use the binomial distribution in many cases and infer p instead.

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

Answer. The maximum entropy constraint for both these distributions is that the expected value is a constant. See Maximum entropy probability distribution.

These constraints are the same because the Poisson distribution is an approximation of the binomial when p is small and N is large. See Poisson approximation.

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0, 10). Re-estimate the posterior using both ulam and quap. Do the differences increase or decrease? Why?

Answer. The quadratic approximation assumes a symmetric Gaussian posterior, so it may produce posterior inferences that are more symmetric (unlike MCMC). In this case, the algorithms produce marginally different results.

When we move to the flat prior, quap struggles to produce a decent result because it is starting with a prior that is completely non-symmetric. It may pick either the left or the right peak of the prior to optimize from.

source('practice11/load-chimp-models.R')
flush.console()

m11.4_flat_prior <- ulam(
alist(
pulled_left ~ dbinom(1, p),
logit(p) <- a[actor] + b[treatment],
a[actor] ~ dnorm(0, 10),
b[treatment] ~ dnorm(0, 0.5)
),
data = dat_list, chains = 4, log_lik = TRUE
)

m11.4_quap_flat_prior <- quap(
m11.4_flat_prior@formula,
data = dat_list
)
flush.console()

iplot(function() {
plot(compare(m11.4, m11.4_quap, m11.4_flat_prior, m11.4_quap_flat_prior))
}, ar=3)

SAMPLING FOR MODEL '82480dff1a626a42c2ca9de938d65b9d' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 7e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.7 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 0.560287 seconds (Warm-up)
Chain 1:                0.529413 seconds (Sampling)
Chain 1:                1.0897 seconds (Total)
Chain 1:

SAMPLING FOR MODEL '82480dff1a626a42c2ca9de938d65b9d' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 6.6e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.66 seconds.
Chain 2:
Chain 2:
Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 0.625587 seconds (Warm-up)
Chain 2:                0.595378 seconds (Sampling)
Chain 2:                1.22097 seconds (Total)
Chain 2:

SAMPLING FOR MODEL '82480dff1a626a42c2ca9de938d65b9d' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 5.7e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.57 seconds.
Chain 3:
Chain 3:
Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 3:
Chain 3:  Elapsed Time: 0.594808 seconds (Warm-up)
Chain 3:                0.584607 seconds (Sampling)
Chain 3:                1.17942 seconds (Total)
Chain 3:

SAMPLING FOR MODEL '82480dff1a626a42c2ca9de938d65b9d' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 8.3e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.83 seconds.
Chain 4:
Chain 4:
Chain 4: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 4: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 4: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 4: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 4: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 4: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 4: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 4: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 4: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 4: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 4: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 4: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 4:
Chain 4:  Elapsed Time: 0.569952 seconds (Warm-up)
Chain 4:                0.535347 seconds (Sampling)
Chain 4:                1.1053 seconds (Total)
Chain 4:

SAMPLING FOR MODEL '4758af77bea457d7564d7c3909a35b52' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 7e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.7 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 0.804382 seconds (Warm-up)
Chain 1:                0.692694 seconds (Sampling)
Chain 1:                1.49708 seconds (Total)
Chain 1:

SAMPLING FOR MODEL '4758af77bea457d7564d7c3909a35b52' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 6.3e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.63 seconds.
Chain 2:
Chain 2:
Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 0.91432 seconds (Warm-up)
Chain 2:                0.724335 seconds (Sampling)
Chain 2:                1.63865 seconds (Total)
Chain 2:

SAMPLING FOR MODEL '4758af77bea457d7564d7c3909a35b52' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 6.2e-05 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.62 seconds.
Chain 3:
Chain 3:
Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 3:
Chain 3:  Elapsed Time: 0.832641 seconds (Warm-up)
Chain 3:                0.677285 seconds (Sampling)
Chain 3:                1.50993 seconds (Total)
Chain 3:

SAMPLING FOR MODEL '4758af77bea457d7564d7c3909a35b52' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 5.8e-05 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.58 seconds.
Chain 4:
Chain 4:
Chain 4: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 4: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 4: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 4: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 4: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 4: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 4: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 4: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 4: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 4: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 4: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 4: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 4:
Chain 4:  Elapsed Time: 0.764801 seconds (Warm-up)
Chain 4:                0.671788 seconds (Sampling)
Chain 4:                1.43659 seconds (Total)
Chain 4:

Warning message:
“There were 12 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.”

Warning message:
“Examine the pairs() plot to diagnose sampling problems
”

Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess”

Warning message in compare(m11.4, m11.4_quap, m11.4_flat_prior, m11.4_quap_flat_prior):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.” 11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

Answer. The results of inference are significantly different, essentially changing the nature of the answer. The ‘outlier’ is relevant to what we conclude for both low and high contact cultures.

## R code 11.36
library(rethinking)
data(Kline)
d <- Kline

## R code 11.37
d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)

## R code 11.45
dat <- list(
T = d$total_tools, P = d$P,
cid = d$contact_id ) # interaction model m11.10 <- ulam( alist( T ~ dpois(lambda), log(lambda) <- a[cid] + b[cid] * P, a[cid] ~ dnorm(3, 0.5), b[cid] ~ dnorm(0, 0.2) ), data = dat, chains = 4, log_lik = TRUE ) flush.console() k <- PSIS(m11.10, pointwise = TRUE)$k

iplot(function() {
par(mfrow=c(1,2))
## R code 11.47
plot(dat$P, dat$T,
xlab = "log population (std)", ylab = "total tools",
col = rangi2, pch = ifelse(dat$cid == 1, 1, 16), lwd = 2, ylim = c(0, 75), cex = 1 + normalize(k) ) # set up the horizontal axis values to compute predictions at ns <- 100 P_seq <- seq(from = -1.4, to = 3, length.out = ns) # predictions for cid=1 (low contact) lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 1)) lmu <- apply(lambda, 2, mean) lci <- apply(lambda, 2, PI) lines(P_seq, lmu, lty = 2, lwd = 1.5) shade(lci, P_seq, xpd = TRUE) # predictions for cid=2 (high contact) lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 2)) lmu <- apply(lambda, 2, mean) lci <- apply(lambda, 2, PI) lines(P_seq, lmu, lty = 1, lwd = 1.5) shade(lci, P_seq, xpd = TRUE) ## R code 11.48 plot(d$population, d$total_tools, xlab = "population", ylab = "total tools", col = rangi2, pch = ifelse(dat$cid == 1, 1, 16), lwd = 2,
ylim = c(0, 75), cex = 1 + normalize(k)
)

ns <- 100
P_seq <- seq(from = -5, to = 3, length.out = ns)
# 1.53 is sd of log(population)
# 9 is mean of log(population)
pop_seq <- exp(P_seq * 1.53 + 9)

lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 1))
lmu <- apply(lambda, 2, mean)
lci <- apply(lambda, 2, PI)
lines(pop_seq, lmu, lty = 2, lwd = 1.5)

lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 2))
lmu <- apply(lambda, 2, mean)
lci <- apply(lambda, 2, PI)
lines(pop_seq, lmu, lty = 1, lwd = 1.5)
}, ar=1.8)

d_drop_hawaii <- subset(d, culture!='Hawaii')

# For some reason, we need to rescale and log to include the 'scale' attributes in the call to ulam.
d_drop_hawaii$P <- scale(log(d_drop_hawaii$population))
d_drop_hawaii$contact_id <- ifelse(d_drop_hawaii$contact == "high", 2, 1)

dat_drop_hawaii <- list(
T = d_drop_hawaii$total_tools, P = d_drop_hawaii$P,
cid = d_drop_hawaii$contact_id[1:9] ) # interaction model m_drop_hawaii <- ulam( alist( T ~ dpois(lambda), log(lambda) <- a[cid] + b[cid] * P, a[cid] ~ dnorm(3, 0.5), b[cid] ~ dnorm(0, 0.2) ), data = dat_drop_hawaii, chains = 4, log_lik = TRUE ) flush.console() k <- PSIS(m_drop_hawaii, pointwise = TRUE)$k

iplot(function() {
par(mfrow=c(1,2))
## R code 11.47
plot(dat_drop_hawaii$P, dat_drop_hawaii$T,
xlab = "log population (std)", ylab = "total tools",
col = rangi2, pch = ifelse(dat_drop_hawaii$cid == 1, 1, 16), lwd = 2, ylim = c(0, 75), cex = 1 + normalize(k) ) # set up the horizontal axis values to compute predictions at ns <- 100 P_seq <- seq(from = -1.4, to = 3, length.out = ns) # predictions for cid=1 (low contact) lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 1)) lmu <- apply(lambda, 2, mean) lci <- apply(lambda, 2, PI) lines(P_seq, lmu, lty = 2, lwd = 1.5) shade(lci, P_seq, xpd = TRUE) # predictions for cid=2 (high contact) lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 2)) lmu <- apply(lambda, 2, mean) lci <- apply(lambda, 2, PI) lines(P_seq, lmu, lty = 1, lwd = 1.5) shade(lci, P_seq, xpd = TRUE) ## R code 11.48 plot(d_drop_hawaii$population, d_drop_hawaii$total_tools, xlab = "population", ylab = "total tools", col = rangi2, pch = ifelse(dat_drop_hawaii$cid == 1, 1, 16), lwd = 2,
ylim = c(0, 75), cex = 1 + normalize(k)
)

ns <- 100
P_seq <- seq(from = -5, to = 3, length.out = ns)
# 1.53 is sd of log(population)
# 9 is mean of log(population)
pop_seq <- exp(P_seq * 1.53 + 9)

lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 1))
lmu <- apply(lambda, 2, mean)
lci <- apply(lambda, 2, PI)
lines(pop_seq, lmu, lty = 2, lwd = 1.5)

lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 2))
lmu <- apply(lambda, 2, mean)
lci <- apply(lambda, 2, PI)
lines(pop_seq, lmu, lty = 1, lwd = 1.5)
}, ar=1.8)

SAMPLING FOR MODEL '05013cdfcc1a8f52ceb5601ee475a98d' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1.5e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.15 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 0.016788 seconds (Warm-up)
Chain 1:                0.015525 seconds (Sampling)
Chain 1:                0.032313 seconds (Total)
Chain 1:

SAMPLING FOR MODEL '05013cdfcc1a8f52ceb5601ee475a98d' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 1.1e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
Chain 2:
Chain 2:
Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 0.018063 seconds (Warm-up)
Chain 2:                0.012588 seconds (Sampling)
Chain 2:                0.030651 seconds (Total)
Chain 2:

SAMPLING FOR MODEL '05013cdfcc1a8f52ceb5601ee475a98d' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 8e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
Chain 3:
Chain 3:
Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 3:
Chain 3:  Elapsed Time: 0.0164 seconds (Warm-up)
Chain 3:                0.017066 seconds (Sampling)
Chain 3:                0.033466 seconds (Total)
Chain 3:

SAMPLING FOR MODEL '05013cdfcc1a8f52ceb5601ee475a98d' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 7e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
Chain 4:
Chain 4:
Chain 4: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 4: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 4: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 4: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 4: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 4: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 4: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 4: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 4: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 4: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 4: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 4: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 4:
Chain 4:  Elapsed Time: 0.017309 seconds (Warm-up)
Chain 4:                0.015 seconds (Sampling)
Chain 4:                0.032309 seconds (Total)
Chain 4:

Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points. SAMPLING FOR MODEL 'c5bf3fcd8c4a2bf16731479ec5cc9e70' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 1.6e-05 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.16 seconds.
Chain 1:
Chain 1:
Chain 1: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 1: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 1: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 1: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 1: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 1: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 1: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 1: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 1: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 1: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 1: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 1: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 1:
Chain 1:  Elapsed Time: 0.019058 seconds (Warm-up)
Chain 1:                0.019513 seconds (Sampling)
Chain 1:                0.038571 seconds (Total)
Chain 1:

SAMPLING FOR MODEL 'c5bf3fcd8c4a2bf16731479ec5cc9e70' NOW (CHAIN 2).
Chain 2:
Chain 2: Gradient evaluation took 1.1e-05 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
Chain 2:
Chain 2:
Chain 2: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 2: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 2: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 2: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 2: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 2: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 2: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 2: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 2: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 2: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 2: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 2: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 2:
Chain 2:  Elapsed Time: 0.018274 seconds (Warm-up)
Chain 2:                0.017154 seconds (Sampling)
Chain 2:                0.035428 seconds (Total)
Chain 2:

SAMPLING FOR MODEL 'c5bf3fcd8c4a2bf16731479ec5cc9e70' NOW (CHAIN 3).
Chain 3:
Chain 3: Gradient evaluation took 8e-06 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
Chain 3:
Chain 3:
Chain 3: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 3: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 3: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 3: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 3: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 3: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 3: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 3: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 3: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 3: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 3: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 3: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 3:
Chain 3:  Elapsed Time: 0.017649 seconds (Warm-up)
Chain 3:                0.016271 seconds (Sampling)
Chain 3:                0.03392 seconds (Total)
Chain 3:

SAMPLING FOR MODEL 'c5bf3fcd8c4a2bf16731479ec5cc9e70' NOW (CHAIN 4).
Chain 4:
Chain 4: Gradient evaluation took 8e-06 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
Chain 4:
Chain 4:
Chain 4: Iteration:   1 / 1000 [  0%]  (Warmup)
Chain 4: Iteration: 100 / 1000 [ 10%]  (Warmup)
Chain 4: Iteration: 200 / 1000 [ 20%]  (Warmup)
Chain 4: Iteration: 300 / 1000 [ 30%]  (Warmup)
Chain 4: Iteration: 400 / 1000 [ 40%]  (Warmup)
Chain 4: Iteration: 500 / 1000 [ 50%]  (Warmup)
Chain 4: Iteration: 501 / 1000 [ 50%]  (Sampling)
Chain 4: Iteration: 600 / 1000 [ 60%]  (Sampling)
Chain 4: Iteration: 700 / 1000 [ 70%]  (Sampling)
Chain 4: Iteration: 800 / 1000 [ 80%]  (Sampling)
Chain 4: Iteration: 900 / 1000 [ 90%]  (Sampling)
Chain 4: Iteration: 1000 / 1000 [100%]  (Sampling)
Chain 4:
Chain 4:  Elapsed Time: 0.017945 seconds (Warm-up)
Chain 4:                0.018228 seconds (Sampling)
Chain 4:                0.036173 seconds (Total)
Chain 4:

Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points. 11H1. Use WAIC to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

Answer. The last model (m11.4) does the best by far because it includes the actor predictor. The actor variable is highly predictive of the output; the big story of this experiment is handedness.

iplot(function() {
plot(compare(m11.1, m11.2, m11.3, m11.4))
}, ar=3)

Warning message in compare(m11.1, m11.2, m11.3, m11.4):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.” 11H2. The data contained in library(MASS);data(eagles) are records of salmon pirating attempts by Bald Eagles in Washington State. See ?eagles for details. While one eagle feeds, sometimes another will swoop in and try to steal the salmon from it. Call the feeding eagle the “victim” and the thief the “pirate.” Use the available data to build a binomial GLM of successful pirating attempts.

(a) Consider the following model: \begin{align} \\ y_i & \sim Binomial(n_i, p_i) \\ logit(p_i) & = \alpha + \beta_P P_i + \beta_V V_i + \beta_A A_i \\ α & \sim Normal(0, 1.5) \\ \beta_P, \beta_V, \beta_A & \sim Normal(0, 0.5) \\ \end{align}

where $$y$$ is the number of successful attempts, $$n$$ is the total number of attempts, $$P$$ is a dummy variable indicating whether or not the pirate had large body size, $$V$$ is a dummy variable indicating whether or not the victim had large body size, and finally $$A$$ is a dummy variable indicating whether or not the pirate was an adult. Fit the model above to the eagles data, using both quap and ulam. Is the quadratic approximation okay?

Answer. The quadratic approximation is OK because this model includes pre-selected reasonable priors.

library(rethinking)
library(MASS)
data(eagles)
eag <- eagles
eag$Pi <- ifelse(eagles$P == "L", 1, 0)
eag$Vi <- ifelse(eagles$V == "L", 1, 0)
eag$Ai <- ifelse(eagles$A == "A", 1, 0)

mq_pirate_success <- quap(
alist(
y ~ dbinom(n, p),
logit(p) <- a + b_P*Pi + b_V*Vi + b_A*Ai,
a ~ dnorm(0, 1.5),
b_P ~ dnorm(0, 0.5),
b_V ~ dnorm(0, 0.5),
b_A ~ dnorm(0, 0.5)
),
data = eag
)

dat_eag <- list(
n = eag$n, y = eag$y,
Pi = eag$Pi, Vi = eag$Vi,
Ai = eag$Ai ) mu_pirate_success <- ulam( mq_pirate_success@formula, data = dat_eag, chains=4, cores=4, log_lik=TRUE ) iplot(function() { plot(compare(mq_pirate_success, mu_pirate_success)) }, ar=4.5)  Warning message in compare(mq_pirate_success, mu_pirate_success): “Not all model fits of same class. This is usually a bad idea, because it implies they were fit by different algorithms. Check yourself, before you wreck yourself.” (b) Now interpret the estimates. If the quadratic approximation turned out okay, then it’s okay to use the quap estimates. Otherwise stick to ulam estimates. Then plot the posterior predictions. Compute and display both (1) the predicted probability of success and its 89% interval for each row (i) in the data, as well as (2) the predicted success count and its 89% interval. What different information does each type of posterior prediction provide? Answer. The eagles data frame: display(eagles)  A data.frame: 8 × 5 ynPAV <int><int><fct><fct><fct> 11724LAL 22929LAS 31727LIL 42020LIS 5 112SAL 61516SAS 7 028SIL 8 1 4SIS This graph is from postcheck, as mentioned in the text. The documentation on this function isn’t too helpful; it’s easier to read the source code. In the following figure, the open circles ($mean in the raw data) represent the predicted probability of success and the vertical lines the 89% interval ($PI). The crosshairs represent the 89% interval of the predicted success count ($outPI). The purple dots are observed outcomes from the eagles data frame above. Notice outcomes are scaled to between zero and one; in case #1 the purple dot is at $$17/24 \approx 0.7083$$.

iplot(function() {
result <- postcheck(mu_pirate_success)
display_markdown("The raw data:")
display(result)
})


The raw data:

$mean 1. 0.707852485246756 2. 0.929201436946226 3. 0.559932119633125 4. 0.871499770934422 5. 0.327330970165867 6. 0.721970579910955 7. 0.202542445705932 8. 0.577003969544917$PI
 5% 94% 0.605532 0.888611 0.446378 0.798585 0.222418 0.607278 0.129328 0.426391 0.803036 0.960127 0.666826 0.928493 0.44702 0.825853 0.284903 0.716812
$outPI  5% 94% 0.541667 0.827586 0.37037 0.75 0.0833333 0.5 0.0714286 0.25 0.875 1 0.740741 1 0.583333 0.9375 0.357143 1 In part (1) we’re visualizing model parameters (including uncertainty). In part (2) we’re visualizing implied predictions on the outcome scale (including uncertainty) which indirectly includes the uncertainty in model parameters from the first part. (c) Now try to improve the model. Consider an interaction between the pirate’s size and age (immature or adult). Compare this model to the previous one, using WAIC. Interpret. Answer. The non-interaction model appears to do slightly better, but generally speaking the performance is almost the same. The interaction model is more complex (has more parameters) and that might slightly hurt its score. eag$PirateTreatment <- 1 + eag$Pi + eag$Ai

dat_eag_inter <- list(
n = eag$n, y = eag$y,
PirateTreatment = eag$PirateTreatment, Vi = eag$Vi
)

mu_pirate_success_inter <- ulam(
alist(
y ~ dbinom(n, p),
logit(p) <- a + b_t[PirateTreatment] + b_V*Vi,
a ~ dnorm(0, 1.5),
b_t[PirateTreatment] ~ dnorm(0, 0.5),
b_V ~ dnorm(0, 0.5)
),
data = dat_eag_inter, chains=4, cores=4, log_lik=TRUE
)

iplot(function() {
plot(compare(mu_pirate_success_inter, mu_pirate_success))
}, ar=4.5) 11H4. The data contained in data(salamanders) are counts of salamanders (Plethodon elongatus) from 47 different 49-m² plots in northern California. The column SALAMAN is the count in each plot, and the columns PCTCOVER and FORESTAGE are percent of ground cover and age of trees in the plot, respectively. You will model SALAMAN as a Poisson variable.

(a) Model the relationship between density and percent cover, using a log-link (same as the example in the book and lecture). Use weakly informative priors of your choosing. Check the quadratic approximation again, by comparing quap to ulam. Then plot the expected counts and their 89% interval against percent cover. In which ways does the model do a good job? In which ways does it do a bad job?

Answer. The salamanders data frame:

data(salamanders)
display(salamanders)
sal <- salamanders
sal$StdPctCover <- scale(sal$PCTCOVER)

sal_dat <- list(
S = sal$SALAMAN, StdPctCover = sal$StdPctCover
)

mq_sal <- quap(
alist(
S ~ dpois(lambda),
log(lambda) <- a + b*StdPctCover,
a ~ dnorm(3, 0.5),
b ~ dnorm(0, 0.2)
), data = sal_dat
)

mu_sal <- ulam(
mq_sal@formula,
data = sal_dat, chains=4, cores=4, log_lik=TRUE
)

A data.frame: 47 × 4
SITESALAMANPCTCOVERFORESTAGE
<int><int><int><int>
11385316
21186 88
31190548
4 988 64
5 889 43
6 783368
7 683200
8 691 71
9 588 42
10 590551
11 487675
12 383217
13 387212
14 389398
15 392357
16 393478
17 2 2 5
18 287 30
19 293551
20 1 7 3
21 116 15
22 119 31
23 129 10
24 134 49
25 146 30
26 180215
27 186586
28 188105
29 192210
30 0 0 0
31 0 1 4
32 0 3 3
33 0 5 2
34 0 8 10
35 0 9 8
36 011 6
37 014 49
38 017 29
39 024 57
40 044 59
41 052 78
42 077 50
43 078320
44 080411
45 086133
46 089 60
47 091187

There doesn’t seem to be a difference between quap and ulam in inference:

iplot(function() {
plot(compare(mu_sal, mq_sal))
}, ar=4.5)

pctc <- list(StdPctCover=scale(seq(0, 100)))
sims <- sim(mq_sal, data=pctc)

expectedSalCount.mu <- apply(sims, 2, mean)
expectedSalCount.PI <- apply(sims, 2, PI)
display_markdown("The raw data:")
display(expectedSalCount.mu)
display(expectedSalCount.PI)

iplot(function() {
plot(pctc$StdPctCover, expectedSalCount.mu, ylim=c(0,12)) shade(expectedSalCount.PI, pctc$StdPctCover)
title("Expected salamander count")
})

Warning message in compare(mu_sal, mq_sal):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.” The raw data:

1. 0.747
2. 0.807
3. 0.746
4. 0.773
5. 0.83
6. 0.843
7. 0.877
8. 0.879
9. 0.915
10. 0.928
11. 0.974
12. 1.046
13. 0.982
14. 1.021
15. 1.05
16. 1.047
17. 1.072
18. 1.118
19. 1.135
20. 1.116
21. 1.203
22. 1.242
23. 1.269
24. 1.218
25. 1.254
26. 1.338
27. 1.402
28. 1.351
29. 1.375
30. 1.435
31. 1.43
32. 1.58
33. 1.481
34. 1.524
35. 1.565
36. 1.604
37. 1.629
38. 1.745
39. 1.745
40. 1.73
41. 1.742
42. 1.871
43. 1.837
44. 1.942
45. 1.948
46. 2.016
47. 2
48. 2.08
49. 2.235
50. 2.199
51. 2.21
52. 2.328
53. 2.38
54. 2.419
55. 2.472
56. 2.496
57. 2.521
58. 2.635
59. 2.66
60. 2.755
61. 2.765
62. 2.806
63. 2.942
64. 3.004
65. 3.145
66. 3
67. 3.185
68. 3.245
69. 3.254
70. 3.352
71. 3.489
72. 3.614
73. 3.635
74. 3.723
75. 3.772
76. 3.873
77. 4.029
78. 4.006
79. 4.176
80. 4.301
81. 4.417
82. 4.648
83. 4.531
84. 4.676
85. 4.837
86. 4.977
87. 4.998
88. 5.197
89. 5.216
90. 5.464
91. 5.393
92. 5.529
93. 5.681
94. 5.777
95. 6.018
96. 6.171
97. 6.272
98. 6.32
99. 6.566
100. 6.749
101. 6.735
 5% 94% 0 0 0 0 0 0 0 0 0 0 ⋯ 2 2 2 2 2 2 2 3 2 3 2 2 2 2 2.055 3 3 3 3 3 ⋯ 10 10 10 10 10 11 11 11 12 11 The model does well (is more certain) when it needs to guess lower numbers; when there are fewer salamanders when the ground cover is low. In general, though, it is uncertain about most of its predictions.

(b) Can you improve the model by using the other predictor, FORESTAGE? Try any models you think useful. Can you explain why FORESTAGE helps or does not help with prediction?

Answer. It doesn’t seem forest age helps much with prediction; it may be that forest age only predicts ground cover and so adds no extra information.

sal$StdForestAge <- scale(sal$FORESTAGE)
sal_dat <- list(
S = sal$SALAMAN, StdPctCover = sal$StdPctCover,
StdForestAge = sal$StdForestAge ) mq_sal_forest_age <- quap( alist( S ~ dpois(lambda), log(lambda) <- a + b_c*StdPctCover + b_f*StdForestAge, a ~ dnorm(3, 0.5), b_c ~ dnorm(0, 0.2), b_f ~ dnorm(0, 0.2) ), data = sal_dat ) iplot(function() { plot(compare(mq_sal, mq_sal_forest_age)) }, ar=4.5) library(dagitty)  11H4. The data in data(NWOgrants) are outcomes for scientific funding applications for the Netherlands Organization for Scientific Research (NWO) from 2010-2012 (see van der Lee and Ellemers (2015) for data and context). These data have a very similar structure to the UCBAdmit data discussed in the chapter. I want you to consider a similar question: What are the total and indirect causal effects of gender on grant awards? Consider a mediation path (a pipe) through discipline. Draw the corresponding DAG and then use one or more binomial GLMs to answer the question. What is your causal interpretation? If NWO’s goal is to equalize rates of funding between men and women, what type of intervention would be most effective? Answer. The NWOGrants data.frame: data(NWOGrants) display(NWOGrants) nwo <- NWOGrants  A data.frame: 18 × 4 disciplinegenderapplicationsawards <fct><fct><int><int> Chemical sciences m 8322 Chemical sciences f 3910 Physical sciences m13526 Physical sciences f 39 9 Physics m 6718 Physics f 9 2 Humanities m23033 Humanities f16632 Technical sciences m18930 Technical sciences f 6213 Interdisciplinary m10512 Interdisciplinary f 7817 Earth/life sciencesm15638 Earth/life sciencesf12618 Social sciences m42565 Social sciences f40947 Medical sciences m24546 Medical sciences f26029 Reproduce the DAG from the text, where D now means discipline rather than department: nwo_dag <- dagitty(' dag { bb="0,0,1,1" A [pos="0.500,0.400"] D [pos="0.300,0.300"] G [pos="0.100,0.400"] D -> A G -> A G -> D }') iplot(function() plot(nwo_dag), scale=10) For the total causal effect of gender on grant awards, do not condition on discipline: nwo_dat <- list( awards = nwo$awards,
applications = nwo$applications, gid = ifelse(as.character(nwo$gender) == "m", 1, 2)
)
m_nwo_tot <- quap(
alist(
awards ~ dbinom(applications, p),
logit(p) <- a[gid],
a[gid] ~ dnorm(0, 1.5)
),
data = nwo_dat
)
iplot(function() {
display_markdown("Raw data:")
x <- precis(m_nwo_tot, depth = 2)
display(x)
plot(x, main="m_nwo_tot")
}, ar=4.5)


Raw data:

A precis: 2 × 4
meansd5.5%94.5%
<dbl><dbl><dbl><dbl>
a-1.5314180.06462435-1.634700-1.428136
a-1.7374280.08121368-1.867223-1.607633 These two coefficients are on the log-odds (parameter) scale, so the odds of a male getting an award from an application is about $$exp(-1.53) = 0.217$$ and the odds of a female about $$exp(-1.74) = 0.176$$. If it helps you can convert these absolute odds to probabilities with $$p = o/(1+o)$$.

Lets also look at the relative performance of males to females (see Contrast (statistics)). As in the text we’ll do so on the logit and probability scale:

post <- extract.samples(m_nwo_tot)
diff_a <- post$a[,1] - post$a[,2]
diff_p <- inv_logit(post$a[,1]) - inv_logit(post$a[,2])
iplot(function() {
display_markdown("Raw data:")
x <- precis(list(diff_a=diff_a, diff_p=diff_p))
display(x)
plot(x, main="m_nwo_tot: contrast")
}, ar=4.5)


Raw data:

A precis: 2 × 5
meansd5.5%94.5%histogram
<dbl><dbl><dbl><dbl><chr>
diff_a0.206345890.103935910.0392241460.37274865▁▁▂▇▇▃▁▁
diff_p0.028100650.014043150.0053384120.05057845▁▁▁▂▅▇▇▃▁▁▁ For the direct causal effect of gender on grant awards, condition on discipline:

nwo_dat$disc_id <- rep(1:9, each = 2) m_nwo_dir <- quap( alist( awards ~ dbinom(applications, p), logit(p) <- a[gid] + delta[disc_id], a[gid] ~ dnorm(0, 1.5), delta[disc_id] ~ dnorm(0, 1.5) ), data = nwo_dat ) iplot(function() { display_markdown("Raw data:") x <- precis(m_nwo_dir, depth = 2) display(x) plot(x, main="m_nwo_dir") }, ar=3) post <- extract.samples(m_nwo_dir) diff_a <- post$a[,1] - post$a[,2] diff_p <- inv_logit(post$a[,1]) - inv_logit(post$a[,2]) iplot(function() { display_markdown("Raw data:") x <- precis(list(diff_a=diff_a, diff_p=diff_p)) display(x) plot(x, main="m_nwo_dir: contrast") }, ar=4.5)  Raw data: A precis: 11 × 4 meansd5.5%94.5% <dbl><dbl><dbl><dbl> a-1.15762150.4565098-1.8872123-0.4280307 a-1.29528070.4597880-2.0301107-0.5604507 delta 0.16348100.4908968-0.6210669 0.9480288 delta-0.18864040.4856755-0.9648435 0.5875628 delta 0.13960960.5114927-0.6778545 0.9570737 delta-0.41059420.4709093-1.1631982 0.3420098 delta-0.38111930.4792305-1.1470223 0.3847836 delta-0.44681810.4895117-1.2291523 0.3355161 delta-0.17572420.4742531-0.9336722 0.5822238 delta-0.63720700.4640406-1.3788335 0.1044195 delta-0.51588020.4687249-1.2649931 0.2332328 Raw data: A precis: 2 × 5 meansd5.5%94.5%histogram <dbl><dbl><dbl><dbl><chr> diff_a0.137597940.10750150-0.035753930.31013063▁▁▂▅▇▃▁▁▁ diff_p0.023791040.01962432-0.006132650.05668736▁▁▂▇▇▃▁▁▁ Although the advantage for males has decreased, there is still some evidence for it. One possible intervention is to increase funding to topics considered more valuable by females, assuming they prefer to do research in the areas they are applying. Another is to encourage females to get more involved (probably at a younger age) in the topics with more funding. 11H5. Suppose that the NWO Grants sample has an unobserved confound that influences both choice of discipline and the probability of an award. One example of such a confound could be the career stage of each applicant. Suppose that in some disciplines, junior scholars apply for most of the grants. In other disciplines, scholars from all career stages compete. As a result, career stage influences discipline as well as the probability of being awarded a grant. Add these influences to your DAG from the previous problem. What happens now when you condition on discipline? Does it provide an un-confounded estimate of the direct path from gender to an award? Why or why not? Justify your answer with the backdoor criterion. If you have trouble thinking this though, try simulating fake data, assuming your DAG is true. Then analyze it using the model from the previous problem. What do you conclude? Is it possible for gender to have a real direct causal influence but for a regression conditioning on both gender and discipline to suggest zero influence? Answer. Reproducing the DAG from the text (where U is now career stage): nwo_career_stage_dag <- dagitty(' dag { bb="0,0,1,1" A [pos="0.500,0.400"] D [pos="0.300,0.300"] G [pos="0.100,0.400"] U [latent,pos="0.500,0.300"] D -> A G -> A G -> D U -> D U -> A }') iplot(function() plot(nwo_career_stage_dag), scale=10) If we condition on D then we open the backdoor path ‘G -> D <- U -> A’, which could confound any causal inferences we make about the effect of gender on award. That is, we will induce a statistical association between G and A. This association could cancel out any direct causal influence of G on A. # set.seed(145) # n <- 100 # U <- rnorm(n) # gid <- rbinom(n, 1, 0.5) + 1 # gender_to_disc_log_odds <- c(-0.2, 0.2) # p_disc <- inv_logit(gender_to_disc_log_odds[gid] + U) # disc_id <- rbinom(n, 1, p_disc) + 1 # gender_to_award_log_odds <- c(-1.5, -2) # disc_to_award_log_odds <- c(0, 0) # p_award <- inv_logit(gender_to_award_log_odds[gid] + U + disc_to_award_log_odds[disc_id]) # awards <- rbinom(n, 1, p_award) # # display_markdown(" # Imagine there are only two disciplines. Researchers with more scholarly experience tend to work in # the second discipline, and females tend to work in the second discipline. If we know you're in the # second discipline, and you got an award, then it's likely you got the award because you're # experienced. # # # We're assuming that discipline has no influence on award in this case; this assumption is not # necessary for conditioning on discipline to connect (d-connect) gender and scholarly experience. # # G D U # m 1 2 # f 1 # m 2 2 # f 2  11H6. The data in data(Primates301) are 301 primate species and associated measures. In this problem, you will consider how brain size is associated with social learning. There are three parts. (a) Model the number of observations of social_learning for each species as a function of the log brain size. Use a Poisson distribution for the social_learning outcome variable. Interpret the resulting posterior. Answer. The head of the Primates301 data.frame: data(Primates301) display(head(Primates301, n = 20L)) prim_inc <- Primates301 prim <- prim_inc[complete.cases(prim_inc$social_learning, prim_inc$brain, prim_inc$research_effort), ]

A data.frame: 20 × 16
namegenusspeciessubspeciesspp_idgenus_idsocial_learningresearch_effortbrainbodygroup_sizegestationweaninglongevitysex_maturitymaternal_investment
<fct><fct><fct><fct><int><int><int><int><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
1Allenopithecus_nigroviridis AllenopithecusnigroviridisNA 11 0 658.024655.0040.00 NA106.15276.0 NA NA
2Allocebus_trichotis Allocebus trichotis NA 22 0 6 NA 78.09 1.00 NA NA NA NA NA
3Alouatta_belzebul Alouatta belzebul NA 33 01552.846395.00 7.40 NA NA NA NA NA
4Alouatta_caraya Alouatta caraya NA 43 04552.635383.00 8.90185.92323.16243.61276.72509.08
5Alouatta_guariba Alouatta guariba NA 53 03751.705175.00 7.40 NA NA NA NA NA
6Alouatta_palliata Alouatta palliata NA 63 37949.886250.0013.10185.42495.60300.01578.42681.02
7Alouatta_pigra Alouatta pigra NA 73 02551.138915.00 5.50185.92 NA240.0 NA NA
8Alouatta_sara Alouatta sara NA 83 0 459.086611.04 NA NA NA NA NA NA
9Alouatta_seniculus Alouatta seniculus NA 93 08255.225950.00 7.90189.90370.04300.01690.22559.94
10Aotus_azarai Aotus azarai NA 104 02220.671205.00 4.10 NA229.69 NA NA NA
11Aotus_azarai_boliviensis Aotus azarai boliviensis 114NANA NA NA NA NA NA NA NA NA
12Aotus_brumbacki Aotus brumbacki NA 124 0NA NA NA NA NA NA NA NA NA
13Aotus_infulatus Aotus infulatus NA 134 0 6 NA NA NA NA NA NA NA NA
14Aotus_lemurinus Aotus lemurinus NA 144 01616.30 734.00 NA132.23 74.57216.0 755.15206.80
15Aotus_lemurinus_griseimembraAotus lemurinus griseimembra154NANA NA NA NA NA NA NA NA NA
16Aotus_nancymaae Aotus nancymaae NA 164 0 5 NA 791.03 4.00 NA NA NA NA NA
17Aotus_nigriceps Aotus nigriceps NA 174 0 1 NA 958.00 3.30 NA NA NA NA NA
18Aotus_trivirgatus Aotus trivirgatus NA 184 05816.85 989.00 3.15133.47 76.21303.6 736.60209.68
19Aotus_vociferans Aotus vociferans NA 194 012 NA 703.00 3.30 NA NA NA NA NA
20Archaeolemur_majori Archaeolemur majori NA 205NANA NA NA NA NA NA NA NA NA

After filtering for complete cases:

display(head(prim, n = 20L))

brain_dat <- list(
SocialLearning = prim$social_learning, LogBrain = log(prim$brain)
)

mq_brain_size <- quap(
alist(
SocialLearning ~ dpois(lambda),
log(lambda) <- a + b * LogBrain,
a ~ dnorm(3, 0.5),
b ~ dnorm(0, 0.2)
),
data = brain_dat
)

mu_brain_size <- ulam(
mq_brain_size@formula,
data = brain_dat, chains = 4, cores = 4, log_lik = TRUE
)

A data.frame: 20 × 16
namegenusspeciessubspeciesspp_idgenus_idsocial_learningresearch_effortbrainbodygroup_sizegestationweaninglongevitysex_maturitymaternal_investment
<fct><fct><fct><fct><int><int><int><int><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
1Allenopithecus_nigroviridisAllenopithecusnigroviridisNA 1 10 6 58.024655.0040.00 NA106.15276.0 NA NA
3Alouatta_belzebul Alouatta belzebul NA 3 3015 52.846395.00 7.40 NA NA NA NA NA
4Alouatta_caraya Alouatta caraya NA 4 3045 52.635383.00 8.90185.92323.16243.61276.72 509.08
5Alouatta_guariba Alouatta guariba NA 5 3037 51.705175.00 7.40 NA NA NA NA NA
6Alouatta_palliata Alouatta palliata NA 6 3379 49.886250.0013.10185.42495.60300.01578.42 681.02
7Alouatta_pigra Alouatta pigra NA 7 3025 51.138915.00 5.50185.92 NA240.0 NA NA
8Alouatta_sara Alouatta sara NA 8 30 4 59.086611.04 NA NA NA NA NA NA
9Alouatta_seniculus Alouatta seniculus NA 9 3082 55.225950.00 7.90189.90370.04300.01690.22 559.94
10Aotus_azarai Aotus azarai NA10 4022 20.671205.00 4.10 NA229.69 NA NA NA
14Aotus_lemurinus Aotus lemurinus NA14 4016 16.30 734.00 NA132.23 74.57216.0 755.15 206.80
18Aotus_trivirgatus Aotus trivirgatus NA18 4058 16.85 989.00 3.15133.47 76.21303.6 736.60 209.68
22Arctocebus_calabarensis Arctocebus calabarensisNA22 60 1 6.92 309.00 1.00133.74109.26156.0 298.91 243.00
23Ateles_belzebuth Ateles belzebuth NA23 7012117.028167.0014.50138.20 NA336.0 NA NA
24Ateles_fusciceps Ateles fusciceps NA24 70 4114.249025.00 NA224.70482.70288.01799.68 707.40
25Ateles_geoffroyi Ateles geoffroyi NA25 7258105.097535.0042.00226.37816.35327.62104.571042.72
26Ateles_paniscus Ateles paniscus NA26 7030103.858280.0020.00228.18805.41453.62104.571033.59
28Avahi_laniger Avahi laniger NA28 8010 9.861207.00 2.00136.15149.15 NA NA 285.30
29Avahi_occidentalis Avahi occidentalisNA29 80 6 7.95 801.00 3.00 NA NA NA NA NA
32Bunopithecus_hoolock Bunopithecus hoolock NA3210024110.686728.00 3.20232.50635.13 NA2689.08 867.63
33Cacajao_calvus Cacajao calvus NA3311011 76.003165.0023.70180.00339.29324.01262.74 519.29
Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess”

Warning message:
“Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess”


Use func=PSIS to generate warnings for high Pareto k values.

If you look at the data, you’ll see some classes (e.g. Gorillas and Chimpanzees) have orders of magnitude more study and more social learning examples than other species.

iplot(function()
plot(compare(mq_brain_size, mu_brain_size, func=PSIS))
, ar=4.5)

iplot(function() {
display_markdown("<br/>Raw data:")
x <- precis(mq_brain_size)
display(x)
plot(x, main="mq_brain_size")
}, ar=4.5)

Warning message in compare(mq_brain_size, mu_brain_size, func = PSIS):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”

Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.

Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points. Raw data:

A precis: 2 × 4
meansd5.5%94.5%
<dbl><dbl><dbl><dbl>
a-5.6876380.24026505-6.071628-5.303648
b 1.5635200.04787612 1.487005 1.640035 Although the model is confident in the b coefficient, the wide error bars on the WAIC scores indicate the out-of-sample performance may be much worse.

Still, in the context of this model it appears there is an association between brain size and social learning. For every unit increase in LogBrain i.e. for every order of magnitude increase in brain size we should see an exponential increase in examples of social learning. If we let $$b = 1.5$$ then for every order of magnitude we’ll see about $$exp(1.5) \approx 4.5$$ more examples of social learning.

(b) Some species are studied much more than others. So the number of reported instances of social_learning could be a product of research effort. Use the research_effort variable, specifically its logarithm, as an additional predictor variable. Interpret the coefficient for log research_effort. How does this model differ from the previous one?

brain_dat$LogResearchEffort <- log(prim$research_effort)

mq_brain_research <- quap(
alist(
SocialLearning ~ dpois(lambda),
log(lambda) <- a + b * LogBrain + d * LogResearchEffort,
a ~ dnorm(3, 0.5),
b ~ dnorm(0, 0.2),
d ~ dnorm(0, 0.2)
),
data = brain_dat
)

mu_brain_research <- ulam(
mq_brain_research@formula,
data = brain_dat, chains = 4, cores = 4, log_lik = TRUE
)


Again, use func=PSIS to generate warnings for high Pareto k values:

iplot(function()
plot(compare(mq_brain_research, mu_brain_research, func=PSIS))
, ar=4.5)

iplot(function() {
display_markdown("<br/>Raw data:")
x <- precis(mq_brain_research)
display(x)
plot(x, main="mq_brain_research")
}, ar=4.5)

Warning message in compare(mq_brain_research, mu_brain_research, func = PSIS):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”

Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.

Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points. Raw data:

A precis: 3 × 4
meansd5.5%94.5%
<dbl><dbl><dbl><dbl>
a-5.26935330.20355854-5.5946792-4.9440274
b 0.23263230.05079226 0.1514565 0.3138081
d 1.30945160.05033078 1.2290133 1.3898899 Answer. When we start to condition on research effort, the influence of brain size drops dramatically (to about 0.2 from 1.5).

(c) Draw a DAG to represent how you think the variables social_learning, brain, and research_effort interact. Justify the DAG with the measured associations in the two models above (and any other models you used).

Answer. It seems likely we were looking at the total causal effect of LogBrain in the first model, and the direct causal effect in the second.

brain_social_learning_dag <- dagitty('
dag {
bb="0,0,1,1"
LogBrain [pos="0.100,0.400"]
LogResearchEffort [pos="0.300,0.300"]
SocialLearning [pos="0.500,0.400"]
LogBrain -> SocialLearning
LogBrain -> LogResearchEffort
LogResearchEffort -> SocialLearning
}')
iplot(function() plot(brain_social_learning_dag), scale=10) 