Practice: Chp. 11

source("iplot.R")
suppressPackageStartupMessages(library(rethinking))

11E1. If an event has probability 0.35, what are the log-odds of this event?

Answer. The odds are \(0.35 / (1 - 0.35) = 0.35/0.65 \approx 0.538\), or about a half; that is, the odds in favor of the event are about one to two. On the log2 scale we should therefore expect the log-odds to be about negative one:

\[ \log_2\left(\frac{p}{1-p}\right) = \log_2(0.35 / 0.65) \approx -0.893 \]

For a logarithmic base of \(e\), contract the log2 value by a factor of \(\log(2) \approx 0.693\):

\[ \log\left(\frac{p}{1-p}\right) = \log(0.35 / 0.65) \approx -0.619 \]
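Checking both bases in R:

log2(0.35 / 0.65) # -0.893
log(0.35 / 0.65)  # -0.619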

11E2. If an event has log-odds 3.2, what is the probability of the event?

Answer. Expect log2-odds of about \(3.2/0.7 \approx 4.6\). That is, the odds in favor of the event should be about \(2^{4.6}\) to one, around 25:1.

\[ \frac{1}{1+e^{-x}} = \frac{1}{1+e^{-3.2}} \approx 0.961 \]
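The same check in R, using rethinking's inv_logit:

inv_logit(3.2) # 0.961; the implied odds are exp(3.2), about 24.5 to one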

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

Answer. The ‘proportional odds’ is about five:

\[ q = e^{\beta} = e^{1.7} \approx 5.47 \]

The ‘proportional odds’ is the change in odds for a unit increase in the predictor variable. So when the predictor variable increases by one, we can expect the odds \(p/(1-p)\) to increase by about a factor of five.
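We can verify this on the odds scale; the intercept below is arbitrary, since it cancels out of the ratio:

a <- 0.3 # arbitrary intercept
p0 <- inv_logit(a)
p1 <- inv_logit(a + 1.7)
(p1 / (1 - p1)) / (p0 / (1 - p0)) # exp(1.7), about 5.47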

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

Answer. We use an offset in a Poisson regression (see section 11.2.3) when the counts were collected over varying exposures. For example, counts collected over varying:

  • Lengths of observation

  • Area of sampling

  • Intensity of sampling

The offset lets us include all the data in the same model while still accounting for the differences in measurement.

A specific example would be collecting the number of visitors to a restaurant. Some counts may be collected on a daily basis, and others on a weekly basis.
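A minimal sketch of this scenario with simulated data (the variable names and the quap fit below are illustrative assumptions, not from the text):

## Simulate a constant true rate of 15 visitors/day, recorded daily at one
## restaurant and weekly at another.
set.seed(11)
num_days <- 30
num_weeks <- 4
dat_visits <- list(
  visitors = c(rpois(num_days, 15), rpois(num_weeks, 7 * 15)),
  log_exposure = c(rep(log(1), num_days), rep(log(7), num_weeks))
)

m_visits <- quap(
  alist(
    visitors ~ dpois(lambda),
    # The offset enters the linear model without a free parameter,
    # rescaling the daily rate to each observation's exposure.
    log(lambda) <- log_exposure + a,
    a ~ dnorm(3, 0.5)
  ),
  data = dat_visits
)
precis(m_visits) # a should land near log(15), about 2.7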

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

Answer. The likelihood changes because the aggregated format includes a multiplicity term, the binomial coefficient \(\binom{n}{y}\), which counts all the orderings in which the \(y\) successes could have occurred across the \(n\) trials. The disaggregated format describes one particular sequence of 0/1 outcomes and so omits this factor. Because the multiplicity does not depend on \(p\), inference is unaffected.
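A quick numerical illustration, using 6 successes in 9 trials with \(p = 0.5\):

## aggregated: includes the multiplicity choose(9, 6) = 84
dbinom(6, size = 9, prob = 0.5) # 0.164
## disaggregated: one particular sequence of six 1's and three 0's
0.5^9 # 0.00195
## the two differ exactly by the number of orderings
choose(9, 6) * 0.5^9 # 0.164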

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in outcome?

Answer. Assuming the Poisson regression uses a log link, for a unit increase in the predictor variable we should see an increase in the rate (or mean) \(\lambda\) of about five:

\[ q = \frac{\exp(\alpha + \beta (x + 1))}{\exp(\alpha + \beta x)} = \exp(\beta) = \exp(1.7) \approx 5.47 \]

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

Answer. The binomial distribution takes a probability parameter that must lie between zero and one. The logit link maps this (0, 1) range onto the whole real line, so its inverse constrains any real-valued linear model to a valid probability.

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

Answer. The Poisson distribution takes a rate parameter that must be positive and real-valued. The log link's inverse, the exponential, constrains any real-valued linear model to this range.
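Both inverse links map any real-valued linear model into the required range:

x <- c(-10, -1, 0, 1, 10)
inv_logit(x) # always in (0, 1): valid probabilities for the binomial
exp(x)       # always positive: valid rates for the Poisson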

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

Answer. It would imply the mean \(\lambda\) is between zero and one, rather than simply positive and real-valued. If the researcher knows the maximum number of times (e.g. one) an event can occur, it seems like it would be more appropriate to use the binomial distribution in many cases and infer p instead.

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

Answer. Both distributions have maximum entropy under the same constraints: each trial produces one of two un-ordered events, and the expected value is constant (see the Wikipedia entry on maximum entropy probability distributions).

The constraints match because the Poisson is a limiting case of the binomial: it approximates the binomial when \(p\) is small and \(N\) is large (see the Wikipedia entry on the Poisson approximation).
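A quick illustration of the approximation:

## binomial counts with large N and small p (lambda = N * p = 5)
dbinom(0:3, size = 1000, prob = 0.005) # 0.0067 0.0335 0.0841 0.1405
dpois(0:3, lambda = 5)                 # 0.0067 0.0337 0.0842 0.1404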

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0, 10). Re-estimate the posterior using both ulam and quap. Do the differences increase or decrease? Why?

Answer. The quadratic approximation assumes a Gaussian, and therefore symmetric, posterior, so its marginal distributions are always symmetric even when the true posterior (which MCMC samples faithfully) is skewed. With the original Normal(0, 1.5) priors the posterior is close to Gaussian, and the two algorithms produce only marginally different results.

When we relax the prior on the actor intercepts to Normal(0, 10), the differences increase. Actor 2 always pulled the left lever, so the likelihood pushes that actor's intercept toward ever larger values; the flat prior no longer reins this in, and the posterior for a[2] develops a long right tail. A skewed posterior is exactly what a Gaussian approximation cannot represent, so quap struggles while MCMC (with some divergent-transition warnings) still characterizes it.

source('practice11/load-chimp-models.R')
flush.console()

m11.4_flat_prior <- ulam(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + b[treatment],
    a[actor] ~ dnorm(0, 10),
    b[treatment] ~ dnorm(0, 0.5)
  ),
  data = dat_list, chains = 4, log_lik = TRUE
)

m11.4_quap_flat_prior <- quap(
  m11.4_flat_prior@formula,
  data = dat_list
)
flush.console()

iplot(function() {
  plot(compare(m11.4, m11.4_quap, m11.4_flat_prior, m11.4_quap_flat_prior))
}, ar=3)
[Stan sampling progress omitted: 4 chains × 1000 iterations (500 warmup) for each of the two ulam models, roughly 1-1.5 seconds per chain.]
Warning message:
“There were 12 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.”
Warning message:
“Examine the pairs() plot to diagnose sampling problems
”
Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess”
Warning message in compare(m11.4, m11.4_quap, m11.4_flat_prior, m11.4_quap_flat_prior):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
../_images/practice-11_3_6.png

11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

Answer. The inferences change substantially; Hawaii is a highly influential point (note its large Pareto k value), and dropping it changes what we conclude about both low and high contact cultures.

## R code 11.36
library(rethinking)
data(Kline)
d <- Kline

## R code 11.37
d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)

## R code 11.45
dat <- list(
  T = d$total_tools,
  P = d$P,
  cid = d$contact_id
)

# interaction model
m11.10 <- ulam(
  alist(
    T ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * P,
    a[cid] ~ dnorm(3, 0.5),
    b[cid] ~ dnorm(0, 0.2)
  ),
  data = dat, chains = 4, log_lik = TRUE
)
flush.console()

k <- PSIS(m11.10, pointwise = TRUE)$k

iplot(function() {
  par(mfrow=c(1,2))
  ## R code 11.47
  plot(dat$P, dat$T,
    xlab = "log population (std)", ylab = "total tools",
    col = rangi2, pch = ifelse(dat$cid == 1, 1, 16), lwd = 2,
    ylim = c(0, 75), cex = 1 + normalize(k)
  )

  # set up the horizontal axis values to compute predictions at
  ns <- 100
  P_seq <- seq(from = -1.4, to = 3, length.out = ns)

  # predictions for cid=1 (low contact)
  lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 1))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(P_seq, lmu, lty = 2, lwd = 1.5)
  shade(lci, P_seq, xpd = TRUE)

  # predictions for cid=2 (high contact)
  lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 2))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(P_seq, lmu, lty = 1, lwd = 1.5)
  shade(lci, P_seq, xpd = TRUE)

  ## R code 11.48
  plot(d$population, d$total_tools,
    xlab = "population", ylab = "total tools",
    col = rangi2, pch = ifelse(dat$cid == 1, 1, 16), lwd = 2,
    ylim = c(0, 75), cex = 1 + normalize(k)
  )

  ns <- 100
  P_seq <- seq(from = -5, to = 3, length.out = ns)
  # 1.53 is sd of log(population)
  # 9 is mean of log(population)
  pop_seq <- exp(P_seq * 1.53 + 9)

  lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 1))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(pop_seq, lmu, lty = 2, lwd = 1.5)
  shade(lci, pop_seq, xpd = TRUE)

  lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 2))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(pop_seq, lmu, lty = 1, lwd = 1.5)
  shade(lci, pop_seq, xpd = TRUE)
}, ar=1.8)

d_drop_hawaii <- subset(d, culture!='Hawaii')

# Re-apply log() and scale() to the reduced sample so the standardized values
# (and their 'scale' attributes) reflect the data actually passed to ulam.
d_drop_hawaii$P <- scale(log(d_drop_hawaii$population))
d_drop_hawaii$contact_id <- ifelse(d_drop_hawaii$contact == "high", 2, 1)

dat_drop_hawaii <- list(
  T = d_drop_hawaii$total_tools,
  P = d_drop_hawaii$P,
  cid = d_drop_hawaii$contact_id
)

# interaction model
m_drop_hawaii <- ulam(
  alist(
    T ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * P,
    a[cid] ~ dnorm(3, 0.5),
    b[cid] ~ dnorm(0, 0.2)
  ),
  data = dat_drop_hawaii, chains = 4, log_lik = TRUE
)
flush.console()

k <- PSIS(m_drop_hawaii, pointwise = TRUE)$k

iplot(function() {
  par(mfrow=c(1,2))
  ## R code 11.47
  plot(dat_drop_hawaii$P, dat_drop_hawaii$T,
    xlab = "log population (std)", ylab = "total tools",
    col = rangi2, pch = ifelse(dat_drop_hawaii$cid == 1, 1, 16), lwd = 2,
    ylim = c(0, 75), cex = 1 + normalize(k)
  )

  # set up the horizontal axis values to compute predictions at
  ns <- 100
  P_seq <- seq(from = -1.4, to = 3, length.out = ns)

  # predictions for cid=1 (low contact)
  lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 1))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(P_seq, lmu, lty = 2, lwd = 1.5)
  shade(lci, P_seq, xpd = TRUE)

  # predictions for cid=2 (high contact)
  lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 2))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(P_seq, lmu, lty = 1, lwd = 1.5)
  shade(lci, P_seq, xpd = TRUE)

  ## R code 11.48
  plot(d_drop_hawaii$population, d_drop_hawaii$total_tools,
    xlab = "population", ylab = "total tools",
    col = rangi2, pch = ifelse(dat_drop_hawaii$cid == 1, 1, 16), lwd = 2,
    ylim = c(0, 75), cex = 1 + normalize(k)
  )

  ns <- 100
  P_seq <- seq(from = -5, to = 3, length.out = ns)
  # 1.53 is sd of log(population)
  # 9 is mean of log(population)
  pop_seq <- exp(P_seq * 1.53 + 9)

  lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 1))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(pop_seq, lmu, lty = 2, lwd = 1.5)
  shade(lci, pop_seq, xpd = TRUE)

  lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 2))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(pop_seq, lmu, lty = 1, lwd = 1.5)
  shade(lci, pop_seq, xpd = TRUE)
}, ar=1.8)
[Stan sampling progress omitted: 4 chains × 1000 iterations for m11.10, about 0.03 seconds per chain.]
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
../_images/practice-11_5_2.png
[Stan sampling progress omitted: 4 chains × 1000 iterations for m_drop_hawaii, about 0.04 seconds per chain.]
Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
../_images/practice-11_5_5.png

11H1. Use WAIC to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

Answer. The last model (m11.4) does best by far because it includes the actor predictor. The actor variable is highly predictive of the outcome; the big story of this experiment is handedness.

iplot(function() {
  plot(compare(m11.1, m11.2, m11.3, m11.4))
}, ar=3)
Warning message in compare(m11.1, m11.2, m11.3, m11.4):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
../_images/practice-11_7_1.png

11H2. The data contained in library(MASS);data(eagles) are records of salmon pirating attempts by Bald Eagles in Washington State. See ?eagles for details. While one eagle feeds, sometimes another will swoop in and try to steal the salmon from it. Call the feeding eagle the “victim” and the thief the “pirate.” Use the available data to build a binomial GLM of successful pirating attempts.

(a) Consider the following model:

\[
\begin{align}
y_i & \sim \text{Binomial}(n_i, p_i) \\
\text{logit}(p_i) & = \alpha + \beta_P P_i + \beta_V V_i + \beta_A A_i \\
\alpha & \sim \text{Normal}(0, 1.5) \\
\beta_P, \beta_V, \beta_A & \sim \text{Normal}(0, 0.5)
\end{align}
\]

where \(y\) is the number of successful attempts, \(n\) is the total number of attempts, \(P\) is a dummy variable indicating whether or not the pirate had large body size, \(V\) is a dummy variable indicating whether or not the victim had large body size, and finally \(A\) is a dummy variable indicating whether or not the pirate was an adult. Fit the model above to the eagles data, using both quap and ulam. Is the quadratic approximation okay?

Answer. The quadratic approximation is OK: with these regularizing priors the posterior is close to Gaussian, so quap and ulam give essentially the same results (compare below).

library(rethinking)
library(MASS)
data(eagles)
eag <- eagles
eag$Pi <- ifelse(eagles$P == "L", 1, 0)
eag$Vi <- ifelse(eagles$V == "L", 1, 0)
eag$Ai <- ifelse(eagles$A == "A", 1, 0)

mq_pirate_success <- quap(
  alist(
    y ~ dbinom(n, p),
    logit(p) <- a + b_P*Pi + b_V*Vi + b_A*Ai,
    a ~ dnorm(0, 1.5),
    b_P ~ dnorm(0, 0.5),
    b_V ~ dnorm(0, 0.5),
    b_A ~ dnorm(0, 0.5)
  ),
  data = eag
)

dat_eag <- list(
  n = eag$n,
  y = eag$y,
  Pi = eag$Pi,
  Vi = eag$Vi,
  Ai = eag$Ai
)

mu_pirate_success <- ulam(
  mq_pirate_success@formula,
  data = dat_eag, chains=4, cores=4, log_lik=TRUE
)

iplot(function() {
  plot(compare(mq_pirate_success, mu_pirate_success))
}, ar=4.5)
Warning message in compare(mq_pirate_success, mu_pirate_success):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
../_images/practice-11_9_1.png

(b) Now interpret the estimates. If the quadratic approximation turned out okay, then it’s okay to use the quap estimates. Otherwise stick to ulam estimates. Then plot the posterior predictions. Compute and display both (1) the predicted probability of success and its 89% interval for each row (i) in the data, as well as (2) the predicted success count and its 89% interval. What different information does each type of posterior prediction provide?

Answer. The eagles data frame:

display(eagles)
A data.frame: 8 × 5

      y     n     P     A     V
  <int> <int> <fct> <fct> <fct>
1    17    24     L     A     L
2    29    29     L     A     S
3    17    27     L     I     L
4    20    20     L     I     S
5     1    12     S     A     L
6    15    16     S     A     S
7     0    28     S     I     L
8     1     4     S     I     S

This graph is from postcheck, as mentioned in the text. The documentation on this function isn’t too helpful; it’s easier to read the source code. In the following figure, the open circles ($mean in the raw data) represent the predicted probability of success and the vertical lines the 89% interval ($PI). The crosshairs represent the 89% interval of the predicted success count ($outPI). The purple dots are observed outcomes from the eagles data frame above. Notice outcomes are scaled to between zero and one; in case #1 the purple dot is at \(17/24 \approx 0.7083\).

iplot(function() {
  result <- postcheck(mu_pirate_success)
  display_markdown("The raw data:")
  display(result)
})

The raw data:

$mean
  1. 0.707852485246756
  2. 0.929201436946226
  3. 0.559932119633125
  4. 0.871499770934422
  5. 0.327330970165867
  6. 0.721970579910955
  7. 0.202542445705932
  8. 0.577003969544917
$PI
A matrix: 2 × 8 of type dbl

          1      2      3      4      5      6      7      8
 5%  0.6055 0.8886 0.4464 0.7986 0.2224 0.6073 0.1293 0.4264
94%  0.8030 0.9601 0.6668 0.9285 0.4470 0.8259 0.2849 0.7168

$outPI
A matrix: 2 × 8 of type dbl

          1      2      3      4      5      6      7      8
 5%  0.5417 0.8276 0.3704 0.7500 0.0833 0.5000 0.0714 0.2500
94%  0.8750 1.0000 0.7407 1.0000 0.5833 0.9375 0.3571 1.0000
../_images/practice-11_13_2.png

In part (1) we’re visualizing model parameters (including uncertainty). In part (2) we’re visualizing implied predictions on the outcome scale (including uncertainty) which indirectly includes the uncertainty in model parameters from the first part.

(c) Now try to improve the model. Consider an interaction between the pirate’s size and age (immature or adult). Compare this model to the previous one, using WAIC. Interpret.

Answer. The non-interaction model appears to do slightly better, but generally speaking the performance is almost the same. The interaction model is more complex (has more parameters) and that might slightly hurt its score.

# Index the pirate size/age combinations; note this coding gives
# 1 = small immature, 3 = large adult, and conflates large immature
# with small adult at 2.
eag$PirateTreatment <- 1 + eag$Pi + eag$Ai

dat_eag_inter <- list(
  n = eag$n,
  y = eag$y,
  PirateTreatment = eag$PirateTreatment,
  Vi = eag$Vi
)

mu_pirate_success_inter <- ulam(
  alist(
    y ~ dbinom(n, p),
    logit(p) <- a + b_t[PirateTreatment] + b_V*Vi,
    a ~ dnorm(0, 1.5),
    b_t[PirateTreatment] ~ dnorm(0, 0.5),
    b_V ~ dnorm(0, 0.5)
  ),
  data = dat_eag_inter, chains=4, cores=4, log_lik=TRUE
)

iplot(function() {
  plot(compare(mu_pirate_success_inter, mu_pirate_success))
}, ar=4.5)
../_images/practice-11_15_0.png

11H3. The data contained in data(salamanders) are counts of salamanders (Plethodon elongatus) from 47 different 49-m² plots in northern California. The column SALAMAN is the count in each plot, and the columns PCTCOVER and FORESTAGE are percent of ground cover and age of trees in the plot, respectively. You will model SALAMAN as a Poisson variable.

(a) Model the relationship between density and percent cover, using a log-link (same as the example in the book and lecture). Use weakly informative priors of your choosing. Check the quadratic approximation again, by comparing quap to ulam. Then plot the expected counts and their 89% interval against percent cover. In which ways does the model do a good job? In which ways does it do a bad job?

Answer. The salamanders data frame:

data(salamanders)
display(salamanders)
sal <- salamanders
sal$StdPctCover <- scale(sal$PCTCOVER)

sal_dat <- list(
  S = sal$SALAMAN,
  StdPctCover = sal$StdPctCover
)

mq_sal <- quap(
  alist(
    S ~ dpois(lambda),
    log(lambda) <- a + b*StdPctCover,
    a ~ dnorm(3, 0.5),
    b ~ dnorm(0, 0.2)
  ), data = sal_dat
)

mu_sal <- ulam(
  mq_sal@formula,
  data = sal_dat, chains=4, cores=4, log_lik=TRUE
)
A data.frame: 47 × 4

  SITE SALAMAN PCTCOVER FORESTAGE
 <int>   <int>    <int>     <int>
     1      13       85       316
     2      11       86        88
     3      11       90       548
     4       9       88        64
     5       8       89        43
     6       7       83       368
     7       6       83       200
     8       6       91        71
     9       5       88        42
    10       5       90       551
    11       4       87       675
    12       3       83       217
    13       3       87       212
    14       3       89       398
    15       3       92       357
    16       3       93       478
    17       2        2         5
    18       2       87        30
    19       2       93       551
    20       1        7         3
    21       1       16        15
    22       1       19        31
    23       1       29        10
    24       1       34        49
    25       1       46        30
    26       1       80       215
    27       1       86       586
    28       1       88       105
    29       1       92       210
    30       0        0         0
    31       0        1         4
    32       0        3         3
    33       0        5         2
    34       0        8        10
    35       0        9         8
    36       0       11         6
    37       0       14        49
    38       0       17        29
    39       0       24        57
    40       0       44        59
    41       0       52        78
    42       0       77        50
    43       0       78       320
    44       0       80       411
    45       0       86       133
    46       0       89        60
    47       0       91       187

There doesn’t seem to be a difference between quap and ulam in inference:

iplot(function() {
  plot(compare(mu_sal, mq_sal))
}, ar=4.5)

# Prediction grid over 0-100% cover. Note scale() here standardizes the grid
# with its own mean/sd, which only approximates the scaling used on the data.
pctc <- list(StdPctCover=scale(seq(0, 100)))
sims <- sim(mq_sal, data=pctc)

expectedSalCount.mu <- apply(sims, 2, mean)
expectedSalCount.PI <- apply(sims, 2, PI)
display_markdown("The raw data:")
display(expectedSalCount.mu)
display(expectedSalCount.PI)

iplot(function() {
  plot(pctc$StdPctCover, expectedSalCount.mu, ylim=c(0,12))
  shade(expectedSalCount.PI, pctc$StdPctCover)
  title("Expected salamander count")
})
Warning message in compare(mu_sal, mq_sal):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
../_images/practice-11_19_1.png

The raw data:

  1. 0.747
  2. 0.807
  3. 0.746
  4. 0.773
  5. 0.83
  6. 0.843
  7. 0.877
  8. 0.879
  9. 0.915
  10. 0.928
  11. 0.974
  12. 1.046
  13. 0.982
  14. 1.021
  15. 1.05
  16. 1.047
  17. 1.072
  18. 1.118
  19. 1.135
  20. 1.116
  21. 1.203
  22. 1.242
  23. 1.269
  24. 1.218
  25. 1.254
  26. 1.338
  27. 1.402
  28. 1.351
  29. 1.375
  30. 1.435
  31. 1.43
  32. 1.58
  33. 1.481
  34. 1.524
  35. 1.565
  36. 1.604
  37. 1.629
  38. 1.745
  39. 1.745
  40. 1.73
  41. 1.742
  42. 1.871
  43. 1.837
  44. 1.942
  45. 1.948
  46. 2.016
  47. 2
  48. 2.08
  49. 2.235
  50. 2.199
  51. 2.21
  52. 2.328
  53. 2.38
  54. 2.419
  55. 2.472
  56. 2.496
  57. 2.521
  58. 2.635
  59. 2.66
  60. 2.755
  61. 2.765
  62. 2.806
  63. 2.942
  64. 3.004
  65. 3.145
  66. 3
  67. 3.185
  68. 3.245
  69. 3.254
  70. 3.352
  71. 3.489
  72. 3.614
  73. 3.635
  74. 3.723
  75. 3.772
  76. 3.873
  77. 4.029
  78. 4.006
  79. 4.176
  80. 4.301
  81. 4.417
  82. 4.648
  83. 4.531
  84. 4.676
  85. 4.837
  86. 4.977
  87. 4.998
  88. 5.197
  89. 5.216
  90. 5.464
  91. 5.393
  92. 5.529
  93. 5.681
  94. 5.777
  95. 6.018
  96. 6.171
  97. 6.272
  98. 6.32
  99. 6.566
  100. 6.749
  101. 6.735
A matrix: 2 × 101 of type dbl, holding the 5% and 94% interval bounds at each of the 101 grid points (display truncated).
../_images/practice-11_19_5.png

The model does well (is more certain) when predicting low counts, i.e. in plots with little ground cover and few salamanders. In general, though, its predictions are quite uncertain.

(b) Can you improve the model by using the other predictor, FORESTAGE? Try any models you think useful. Can you explain why FORESTAGE helps or does not help with prediction?

Answer. It doesn’t seem forest age helps much with prediction; it may be that forest age only predicts ground cover and so adds no extra information.

sal$StdForestAge <- scale(sal$FORESTAGE)
sal_dat <- list(
  S = sal$SALAMAN,
  StdPctCover = sal$StdPctCover,
  StdForestAge = sal$StdForestAge
)

mq_sal_forest_age <- quap(
  alist(
    S ~ dpois(lambda),
    log(lambda) <- a + b_c*StdPctCover + b_f*StdForestAge,
    a ~ dnorm(3, 0.5),
    b_c ~ dnorm(0, 0.2),
    b_f ~ dnorm(0, 0.2)
  ), data = sal_dat
)

iplot(function() {
  plot(compare(mq_sal, mq_sal_forest_age))
}, ar=4.5)
../_images/practice-11_21_0.png
library(dagitty)

11H4. The data in data(NWOGrants) are outcomes for scientific funding applications for the Netherlands Organization for Scientific Research (NWO) from 2010-2012 (see van der Lee and Ellemers (2015) for data and context). These data have a very similar structure to the UCBadmit data discussed in the chapter. I want you to consider a similar question: What are the total and indirect causal effects of gender on grant awards? Consider a mediation path (a pipe) through discipline. Draw the corresponding DAG and then use one or more binomial GLMs to answer the question. What is your causal interpretation? If NWO’s goal is to equalize rates of funding between men and women, what type of intervention would be most effective?

Answer. The NWOGrants data.frame:

data(NWOGrants)
display(NWOGrants)
nwo <- NWOGrants
A data.frame: 18 × 4

           discipline gender applications awards
                <fct>  <fct>        <int>  <int>
    Chemical sciences      m           83     22
    Chemical sciences      f           39     10
    Physical sciences      m          135     26
    Physical sciences      f           39      9
    Physics                m           67     18
    Physics                f            9      2
    Humanities             m          230     33
    Humanities             f          166     32
    Technical sciences     m          189     30
    Technical sciences     f           62     13
    Interdisciplinary      m          105     12
    Interdisciplinary      f           78     17
    Earth/life sciences    m          156     38
    Earth/life sciences    f          126     18
    Social sciences        m          425     65
    Social sciences        f          409     47
    Medical sciences       m          245     46
    Medical sciences       f          260     29

Reproduce the DAG from the text, where D now means discipline rather than department:

nwo_dag <- dagitty('
dag {
    bb="0,0,1,1"
    A [pos="0.500,0.400"]
    D [pos="0.300,0.300"]
    G [pos="0.100,0.400"]
    D -> A
    G -> A
    G -> D
}')
iplot(function() plot(nwo_dag), scale=10)
../_images/practice-11_26_0.png

For the total causal effect of gender on grant awards, do not condition on discipline:

nwo_dat <- list(
  awards = nwo$awards,
  applications = nwo$applications,
  gid = ifelse(as.character(nwo$gender) == "m", 1, 2)
)
m_nwo_tot <- quap(
  alist(
    awards ~ dbinom(applications, p),
    logit(p) <- a[gid],
    a[gid] ~ dnorm(0, 1.5)
  ),
  data = nwo_dat
)
iplot(function() {
  display_markdown("Raw data:")
  x <- precis(m_nwo_tot, depth = 2)
  display(x)
  plot(x, main="m_nwo_tot")
}, ar=4.5)

Raw data:

A precis: 2 × 4

           mean         sd      5.5%     94.5%
a[1] -1.531418 0.06462435 -1.634700 -1.428136
a[2] -1.737428 0.08121368 -1.867223 -1.607633
../_images/practice-11_28_2.png

These two coefficients are on the log-odds (parameter) scale, so the odds of a male receiving an award for an application are about \(\exp(-1.53) \approx 0.217\) and the odds for a female about \(\exp(-1.74) \approx 0.176\). If it helps, you can convert these absolute odds to probabilities with \(p = o/(1+o)\).
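For example:

o_male <- exp(-1.53)
o_male / (1 + o_male)     # 0.178, the same as inv_logit(-1.53)
o_female <- exp(-1.74)
o_female / (1 + o_female) # 0.149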

Let's also look at the relative performance of males to females (see the Wikipedia entry Contrast (statistics)). As in the text, we'll do so on the logit and probability scales:

post <- extract.samples(m_nwo_tot)
diff_a <- post$a[,1] - post$a[,2]
diff_p <- inv_logit(post$a[,1]) - inv_logit(post$a[,2])
iplot(function() {
  display_markdown("Raw data:")
  x <- precis(list(diff_a=diff_a, diff_p=diff_p))
  display(x)
  plot(x, main="m_nwo_tot: contrast")
}, ar=4.5)

Raw data:

A precis: 2 × 5

             mean         sd        5.5%      94.5%   histogram
diff_a 0.20634589 0.10393591 0.039224146 0.37274865 ▁▁▂▇▇▃▁▁
diff_p 0.02810065 0.01404315 0.005338412 0.05057845 ▁▁▁▂▅▇▇▃▁▁▁
../_images/practice-11_30_2.png

For the direct causal effect of gender on grant awards, condition on discipline:

nwo_dat$disc_id <- rep(1:9, each = 2)
m_nwo_dir <- quap(
  alist(
    awards ~ dbinom(applications, p),
    logit(p) <- a[gid] + delta[disc_id],
    a[gid] ~ dnorm(0, 1.5),
    delta[disc_id] ~ dnorm(0, 1.5)
  ),
  data = nwo_dat
)
iplot(function() {
  display_markdown("Raw data:")
  x <- precis(m_nwo_dir, depth = 2)
  display(x)
  plot(x, main="m_nwo_dir")
}, ar=3)

post <- extract.samples(m_nwo_dir)
diff_a <- post$a[,1] - post$a[,2]
diff_p <- inv_logit(post$a[,1]) - inv_logit(post$a[,2])
iplot(function() {
  display_markdown("Raw data:")
  x <- precis(list(diff_a=diff_a, diff_p=diff_p))
  display(x)
  plot(x, main="m_nwo_dir: contrast")
}, ar=4.5)

Raw data:

A precis: 11 × 4

                mean        sd       5.5%      94.5%
a[1]      -1.1576215 0.4565098 -1.8872123 -0.4280307
a[2]      -1.2952807 0.4597880 -2.0301107 -0.5604507
delta[1]   0.1634810 0.4908968 -0.6210669  0.9480288
delta[2]  -0.1886404 0.4856755 -0.9648435  0.5875628
delta[3]   0.1396096 0.5114927 -0.6778545  0.9570737
delta[4]  -0.4105942 0.4709093 -1.1631982  0.3420098
delta[5]  -0.3811193 0.4792305 -1.1470223  0.3847836
delta[6]  -0.4468181 0.4895117 -1.2291523  0.3355161
delta[7]  -0.1757242 0.4742531 -0.9336722  0.5822238
delta[8]  -0.6372070 0.4640406 -1.3788335  0.1044195
delta[9]  -0.5158802 0.4687249 -1.2649931  0.2332328
../_images/practice-11_32_2.png

Raw data:

A precis: 2 × 5

              mean         sd        5.5%      94.5%   histogram
diff_a  0.13759794 0.10750150 -0.03575393 0.31013063 ▁▁▂▅▇▃▁▁▁
diff_p  0.02379104 0.01962432 -0.00613265 0.05668736 ▁▁▂▇▇▃▁▁▁
../_images/practice-11_32_5.png

Although the advantage for males has decreased, there is still some evidence for it.

One possible intervention is to increase funding for the disciplines preferred by female applicants, assuming they prefer to do research in the areas to which they are applying. Another is to encourage women to get involved (probably at a younger age) in the disciplines with more funding.

11H5. Suppose that the NWO Grants sample has an unobserved confound that influences both choice of discipline and the probability of an award. One example of such a confound could be the career stage of each applicant. Suppose that in some disciplines, junior scholars apply for most of the grants. In other disciplines, scholars from all career stages compete. As a result, career stage influences discipline as well as the probability of being awarded a grant. Add these influences to your DAG from the previous problem. What happens now when you condition on discipline? Does it provide an un-confounded estimate of the direct path from gender to an award? Why or why not? Justify your answer with the backdoor criterion. If you have trouble thinking this through, try simulating fake data, assuming your DAG is true. Then analyze it using the model from the previous problem. What do you conclude? Is it possible for gender to have a real direct causal influence but for a regression conditioning on both gender and discipline to suggest zero influence?

Answer. Reproducing the DAG from the text (where U is now career stage):

nwo_career_stage_dag <- dagitty('
dag {
    bb="0,0,1,1"
    A [pos="0.500,0.400"]
    D [pos="0.300,0.300"]
    G [pos="0.100,0.400"]
    U [latent,pos="0.500,0.300"]
    D -> A
    G -> A
    G -> D
    U -> D
    U -> A
}')
iplot(function() plot(nwo_career_stage_dag), scale=10)
../_images/practice-11_34_0.png

If we condition on D, we open the backdoor path G -> D <- U -> A: D is a collider on this path, so conditioning on it d-connects G and U, and because U influences A this induces a non-causal statistical association between G and A. Conditioning on discipline therefore does not give an un-confounded estimate of the direct path, and the induced association could even cancel out a real direct causal influence of G on A.

To make this concrete, imagine there are only two disciplines. Researchers with more scholarly experience tend to work in the second discipline, and females also tend to work in the second discipline. If we know you are in the second discipline and you got an award, then it is likely you got the award because you are experienced. The simulation below assumes discipline itself has no influence on awards; that assumption is not necessary for conditioning on discipline to d-connect gender and scholarly experience. The model fit at the end (one way to complete the analysis) reuses the structure of m_nwo_dir with single-trial outcomes:

set.seed(145)
n <- 100
U <- rnorm(n)                            # unobserved career stage
gid <- rbinom(n, 1, 0.5) + 1             # 1 = male, 2 = female
gender_to_disc_log_odds <- c(-0.2, 0.2)  # females prefer discipline 2
p_disc <- inv_logit(gender_to_disc_log_odds[gid] + U)
disc_id <- rbinom(n, 1, p_disc) + 1
gender_to_award_log_odds <- c(-1.5, -2)  # a real direct effect of gender
disc_to_award_log_odds <- c(0, 0)        # discipline has no direct effect
p_award <- inv_logit(gender_to_award_log_odds[gid] + U +
  disc_to_award_log_odds[disc_id])
awards <- rbinom(n, 1, p_award)

m_sim_confound <- quap(
  alist(
    awards ~ dbinom(1, p),
    logit(p) <- a[gid] + delta[disc_id],
    a[gid] ~ dnorm(0, 1.5),
    delta[disc_id] ~ dnorm(0, 1.5)
  ),
  data = list(awards = awards, gid = gid, disc_id = disc_id)
)
# Conditioning on disc_id opens the backdoor through U, so the contrast
# a[1] - a[2] need not recover the true direct effect of 0.5.
precis(m_sim_confound, depth = 2)

11H6. The data in data(Primates301) are 301 primate species and associated measures. In this problem, you will consider how brain size is associated with social learning. There are three parts.

(a) Model the number of observations of social_learning for each species as a function of the log brain size. Use a Poisson distribution for the social_learning outcome variable. Interpret the resulting posterior.

Answer. The head of the Primates301 data.frame:

data(Primates301)
display(head(Primates301, n = 20L))
prim_inc <- Primates301
prim <- prim_inc[complete.cases(prim_inc$social_learning, prim_inc$brain, prim_inc$research_effort), ]
A data.frame: 20 × 16 — the first 20 rows of Primates301. The 16 columns are name, genus, species, subspecies, spp_id, genus_id, social_learning, research_effort, brain, body, group_size, gestation, weaning, longevity, sex_maturity, and maternal_investment; many entries are NA. (Wide display omitted.)

After filtering for complete cases:

display(head(prim, n = 20L))

brain_dat <- list(
  SocialLearning = prim$social_learning,
  LogBrain = log(prim$brain)
)

mq_brain_size <- quap(
  alist(
    SocialLearning ~ dpois(lambda),
    log(lambda) <- a + b * LogBrain,
    a ~ dnorm(3, 0.5),
    b ~ dnorm(0, 0.2)
  ),
  data = brain_dat
)

mu_brain_size <- ulam(
  mq_brain_size@formula,
  data = brain_dat, chains = 4, cores = 4, log_lik = TRUE
)
A data.frame: 20 × 16 — the first 20 rows after filtering, with the same 16 columns as above but no missing values in social_learning, brain, or research_effort. (Wide display omitted.)
Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess”
Warning message:
“Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess”

Use func=PSIS to generate warnings for high Pareto k values.

If you look at the data, you’ll see some genera (e.g. gorillas and chimpanzees) have orders of magnitude more study, and more social learning examples, than other species.

iplot(function()
  plot(compare(mq_brain_size, mu_brain_size, func=PSIS))
, ar=4.5)

iplot(function() {
  display_markdown("<br/>Raw data:")
  x <- precis(mq_brain_size)
  display(x)
  plot(x, main="mq_brain_size")
}, ar=4.5)
Warning message in compare(mq_brain_size, mu_brain_size, func = PSIS):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
../_images/practice-11_42_3.png


Raw data:

A precis: 2 × 4

        mean         sd      5.5%     94.5%
a  -5.687638 0.24026505 -6.071628 -5.303648
b   1.563520 0.04787612  1.487005  1.640035
../_images/practice-11_42_6.png

Although the model is confident in the b coefficient, the wide error bars on the WAIC scores indicate the out-of-sample performance may be much worse.

Still, in the context of this model there is a clear association between brain size and social learning. For every unit increase in LogBrain, i.e. every factor-of-\(e\) (about 2.7×) increase in brain size, the expected number of observed social learning instances is multiplied by \(\exp(b)\). With \(b \approx 1.5\), that is about \(\exp(1.5) \approx 4.5\) times as many examples of social learning.

(b) Some species are studied much more than others. So the number of reported instances of social_learning could be a product of research effort. Use the research_effort variable, specifically its logarithm, as an additional predictor variable. Interpret the coefficient for log research_effort. How does this model differ from the previous one?

brain_dat$LogResearchEffort <- log(prim$research_effort)

mq_brain_research <- quap(
  alist(
    SocialLearning ~ dpois(lambda),
    log(lambda) <- a + b * LogBrain + d * LogResearchEffort,
    a ~ dnorm(3, 0.5),
    b ~ dnorm(0, 0.2),
    d ~ dnorm(0, 0.2)
  ),
  data = brain_dat
)

mu_brain_research <- ulam(
  mq_brain_research@formula,
  data = brain_dat, chains = 4, cores = 4, log_lik = TRUE
)

Again, use func=PSIS to generate warnings for high Pareto k values:

iplot(function()
  plot(compare(mq_brain_research, mu_brain_research, func=PSIS))
, ar=4.5)

iplot(function() {
  display_markdown("<br/>Raw data:")
  x <- precis(mq_brain_research)
  display(x)
  plot(x, main="mq_brain_research")
}, ar=4.5)
Warning message in compare(mq_brain_research, mu_brain_research, func = PSIS):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
../_images/practice-11_46_3.png


Raw data:

A precis: 3 × 4

         mean         sd       5.5%      94.5%
a  -5.2693533 0.20355854 -5.5946792 -4.9440274
b   0.2326323 0.05079226  0.1514565  0.3138081
d   1.3094516 0.05033078  1.2290133  1.3898899
../_images/practice-11_46_6.png

Answer. When we condition on research effort, the influence of brain size drops dramatically (from about 1.5 to about 0.2): much of the apparent effect of brain size in the first model was evidently carried by research effort.

(c) Draw a DAG to represent how you think the variables social_learning, brain, and research_effort interact. Justify the DAG with the measured associations in the two models above (and any other models you used).

Answer. It seems likely we were looking at the total causal effect of LogBrain in the first model, and the direct causal effect in the second.

brain_social_learning_dag <- dagitty('
dag {
    bb="0,0,1,1"
    LogBrain [pos="0.100,0.400"]
    LogResearchEffort [pos="0.300,0.300"]
    SocialLearning [pos="0.500,0.400"]
    LogBrain -> SocialLearning
    LogBrain -> LogResearchEffort
    LogResearchEffort -> SocialLearning
}')
iplot(function() plot(brain_social_learning_dag), scale=10)
../_images/practice-11_48_0.png