Practice: Chp. 11

source("iplot.R")
suppressPackageStartupMessages(library(rethinking))

11E1. If an event has probability 0.35, what are the log-odds of this event?

Answer. The odds are \(0.35 / (1 - 0.35) = 0.35/0.65 \approx 0.54\), or about a half; the odds in favor of the event are about one to two. On the log2 scale we should therefore expect the log-odds to be about negative one:

\[ \log_2\left(\frac{p}{1-p}\right) = \log_2(0.35 / (1 - 0.35)) = -0.893 \]

For a logarithmic base of \(e\), contract the log2 scale by a factor of \(\ln(2)\), or about 0.7:

\[ \log\left(\frac{p}{1-p}\right) = \log(0.35 / (1 - 0.35)) = -0.619 \]
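A quick check in base R (`qlogis` is the base-R logit function):

log2(0.35 / 0.65) # -0.893, log-odds in base 2
log(0.35 / 0.65)  # -0.619, natural log-odds
qlogis(0.35)      # identical to log(0.35/0.65)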

11E2. If an event has log-odds 3.2, what is the probability of the event?

Answer. As a mental check, expect a log2-odds of about \(3.2/0.7\), or around 4.6. That is, the odds in favor of the event should be about \(2^{4.6}\) to one, perhaps around 25:1, so the probability should be close to one:

\[ \frac{1}{1+e^{-x}} = \frac{1}{1+e^{-3.2}} = 0.961 \]
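To confirm in R (`plogis` is the base-R inverse logit; `inv_logit` from rethinking gives the same answer):

plogis(3.2)       # 0.961
exp(3.2)          # 24.5, the odds in favor
24.5 / (1 + 24.5) # converting those odds back to a probability, ~0.96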

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

Answer. The ‘proportional odds’ is about five:

\[ q = e^{\beta} = e^{1.7} = 5.47 \]

The ‘proportional odds’ is the change in odds for a unit increase in the predictor variable. So when the predictor variable increases by one, we can expect the odds \(p/(1-p)\) to increase by about a factor of five.
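A small sketch confirming that the odds multiply by \(e^{1.7}\) for any unit increase in the predictor; the intercept `a = -1` is an arbitrary value chosen for illustration:

a <- -1
b <- 1.7
odds <- function(x) exp(a + b * x) # odds = exp(log-odds)
odds(1) / odds(0)                  # 5.47 = exp(b), regardless of a or x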

11E4. Why do Poisson regressions sometimes require the use of an offset? Provide an example.

Answer. We use an offset in a Poisson regression (see section 11.2.3) when we have counts that were collected at varying densities. For example, counts collected over varying:

  • Lengths of observation

  • Area of sampling

  • Intensity of sampling

The offset lets us include all the data in the same model while still accounting for the differences in measurement.

A specific example would be collecting the number of visitors to a restaurant. Some counts may be collected on a daily basis, and others on a weekly basis.
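Below is a minimal sketch of that restaurant example, in the style of the monastery model from section 11.2.3. The data are simulated (a true rate of 30 visitors per day is an arbitrary assumption), and the exposure enters as the log of the observation length with its coefficient fixed at one:

# simulate visitor counts recorded daily at one restaurant, weekly at another
num_days <- 30
y_daily <- rpois(num_days, 30)
num_weeks <- 4
y_weekly <- rpois(num_weeks, 30 * 7)

dat_visits <- list(
  y = c(y_daily, y_weekly),
  log_exposure = c(rep(log(1), num_days), rep(log(7), num_weeks))
)

m_visits <- quap(
  alist(
    y ~ dpois(lambda),
    log(lambda) <- log_exposure + a, # the offset: log exposure, coefficient fixed at 1
    a ~ dnorm(3, 0.5)
  ),
  data = dat_visits
)
exp(coef(m_visits)) # recovers the daily rate, about 30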

11M1. As explained in the chapter, binomial data can be organized in aggregated and disaggregated forms, without any impact on inference. But the likelihood of the data does change when the data are converted between the two formats. Can you explain why?

Answer. The likelihood changes because the aggregated format adds a multiplicity term: the binomial likelihood includes the \(\binom{n}{y}\) factor counting all the orderings of successes and failures that produce the same count. In the disaggregated (0/1) format there is exactly one trial associated with every event, so no such factor appears. Because the multiplicity term is constant with respect to the parameters, inference is unaffected.
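A quick numeric illustration: the two log-likelihoods differ by exactly \(\log\binom{9}{6}\), a constant that does not involve \(p\):

p <- 0.5
dbinom(6, size = 9, prob = p, log = TRUE) # aggregated: -1.81
sum(dbinom(c(1, 1, 1, 1, 1, 1, 0, 0, 0),
           size = 1, prob = p, log = TRUE)) # disaggregated: -6.24
lchoose(9, 6) # the difference: 4.43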

11M2. If a coefficient in a Poisson regression has value 1.7, what does this imply about the change in outcome?

Answer. Assuming the Poisson regression uses a log link, for a unit increase in the predictor variable we should see an increase in the rate (or mean) \(\lambda\) of about five:

\[ q = \frac{exp(\alpha + \beta (x + 1))}{exp(\alpha + \beta x)} = exp(\beta) = exp(1.7) \approx 5.47 \]

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

Answer. The binomial distribution takes a probability parameter, constrained to lie between zero and one. The logit link maps this \((0, 1)\) interval onto the whole real line, so any value the linear model produces corresponds to a valid probability.

11M4. Explain why the log link is appropriate for a Poisson generalized linear model.

Answer. The Poisson distribution takes a positive real-valued rate parameter, and the log link maps the positive reals onto the whole real line, so any value the linear model produces corresponds to a valid rate.
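A short sketch of both inverse links mapping an unconstrained linear-model value back into each parameter's legal range:

x <- c(-10, -1, 0, 1, 10) # any real value a linear model might produce
plogis(x) # inverse logit: squeezed into (0, 1), a valid binomial probability
exp(x)    # inverse log: squeezed into (0, Inf), a valid Poisson rate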

11M5. What would it imply to use a logit link for the mean of a Poisson generalized linear model? Can you think of a real research problem for which this would make sense?

Answer. It would imply the mean \(\lambda\) is between zero and one, rather than simply positive and real-valued. If the researcher knows the maximum number of times (e.g. one) an event can occur, it seems like it would be more appropriate to use the binomial distribution in many cases and infer p instead.

11M6. State the constraints for which the binomial and Poisson distributions have maximum entropy. Are the constraints different at all for binomial and Poisson? Why or why not?

Answer. The maximum entropy constraint for both these distributions is that the expected value is a constant. See Maximum entropy probability distribution.

These constraints are the same because the Poisson distribution is an approximation of the binomial when p is small and N is large. See Poisson approximation.
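The connection is easy to see numerically: with \(N\) large and \(p\) small, binomial probabilities are close to Poisson probabilities with \(\lambda = Np\):

N <- 1000
p <- 0.002
dbinom(0:4, size = N, prob = p) # binomial probabilities, mean Np = 2
dpois(0:4, lambda = N * p)      # Poisson approximation, nearly identical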

11M7. Use quap to construct a quadratic approximate posterior distribution for the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330). Compare the quadratic approximation to the posterior distribution produced instead from MCMC. Can you explain both the differences and the similarities between the approximate and the MCMC distributions? Relax the prior on the actor intercepts to Normal(0, 10). Re-estimate the posterior using both ulam and quap. Do the differences increase or decrease? Why?

Answer. The quadratic approximation assumes a symmetric Gaussian posterior, so it may produce posterior inferences that are more symmetric than MCMC's. With the original regularizing priors, the two algorithms produce only marginally different results.

When we move to the flatter Normal(0, 10) prior, quap struggles to produce a decent result. Actor 2 always pulled the left lever, so with little regularization the likelihood pushes that actor's intercept toward ever larger values and the posterior for a[2] becomes strongly skewed, which a symmetric Gaussian approximation cannot represent. The differences therefore increase.

source('practice11/load-chimp-models.R')
flush.console()

m11.4_flat_prior <- ulam(
  alist(
    pulled_left ~ dbinom(1, p),
    logit(p) <- a[actor] + b[treatment],
    a[actor] ~ dnorm(0, 10),
    b[treatment] ~ dnorm(0, 0.5)
  ),
  data = dat_list, chains = 4, log_lik = TRUE
)

m11.4_quap_flat_prior <- quap(
  m11.4_flat_prior@formula,
  data = dat_list
)
flush.console()

iplot(function() {
  plot(compare(m11.4, m11.4_quap, m11.4_flat_prior, m11.4_quap_flat_prior))
}, ar=3)
(Stan sampling progress trimmed: four chains per model, 1000 iterations each, roughly one second per chain.)
Warning message:
“There were 5 divergent transitions after warmup. See
https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
to find out why this is a problem and how to eliminate them.”
Warning message:
“Examine the pairs() plot to diagnose sampling problems
”
Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess”
Warning message in compare(m11.4, m11.4_quap, m11.4_flat_prior, m11.4_quap_flat_prior):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
[Figure: WAIC comparison of m11.4, m11.4_quap, m11.4_flat_prior, and m11.4_quap_flat_prior]

11M8. Revisit the data(Kline) islands example. This time drop Hawaii from the sample and refit the models. What changes do you observe?

Answer. The results of inference are significantly different, essentially changing the nature of the answer. The ‘outlier’ is relevant to what we conclude for both low and high contact cultures.

## R code 11.36
library(rethinking)
data(Kline)
d <- Kline

## R code 11.37
d$P <- scale(log(d$population))
d$contact_id <- ifelse(d$contact == "high", 2, 1)

## R code 11.45
dat <- list(
  T = d$total_tools,
  P = d$P,
  cid = d$contact_id
)

# interaction model
m11.10 <- ulam(
  alist(
    T ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * P,
    a[cid] ~ dnorm(3, 0.5),
    b[cid] ~ dnorm(0, 0.2)
  ),
  data = dat, chains = 4, log_lik = TRUE
)
flush.console()

k <- PSIS(m11.10, pointwise = TRUE)$k

iplot(function() {
  par(mfrow=c(1,2))
  ## R code 11.47
  plot(dat$P, dat$T,
    xlab = "log population (std)", ylab = "total tools",
    col = rangi2, pch = ifelse(dat$cid == 1, 1, 16), lwd = 2,
    ylim = c(0, 75), cex = 1 + normalize(k)
  )

  # set up the horizontal axis values to compute predictions at
  ns <- 100
  P_seq <- seq(from = -1.4, to = 3, length.out = ns)

  # predictions for cid=1 (low contact)
  lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 1))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(P_seq, lmu, lty = 2, lwd = 1.5)
  shade(lci, P_seq, xpd = TRUE)

  # predictions for cid=2 (high contact)
  lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 2))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(P_seq, lmu, lty = 1, lwd = 1.5)
  shade(lci, P_seq, xpd = TRUE)

  ## R code 11.48
  plot(d$population, d$total_tools,
    xlab = "population", ylab = "total tools",
    col = rangi2, pch = ifelse(dat$cid == 1, 1, 16), lwd = 2,
    ylim = c(0, 75), cex = 1 + normalize(k)
  )

  ns <- 100
  P_seq <- seq(from = -5, to = 3, length.out = ns)
  # 1.53 is sd of log(population)
  # 9 is mean of log(population)
  pop_seq <- exp(P_seq * 1.53 + 9)

  lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 1))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(pop_seq, lmu, lty = 2, lwd = 1.5)
  shade(lci, pop_seq, xpd = TRUE)

  lambda <- link(m11.10, data = data.frame(P = P_seq, cid = 2))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(pop_seq, lmu, lty = 1, lwd = 1.5)
  shade(lci, pop_seq, xpd = TRUE)
}, ar=1.8)

d_drop_hawaii <- subset(d, culture!='Hawaii')

# Re-log and re-standardize on the reduced sample: dropping Hawaii changes the mean
# and sd of log(population), and scale() attaches the attributes ulam expects.
d_drop_hawaii$P <- scale(log(d_drop_hawaii$population))
d_drop_hawaii$contact_id <- ifelse(d_drop_hawaii$contact == "high", 2, 1)

dat_drop_hawaii <- list(
  T = d_drop_hawaii$total_tools,
  P = d_drop_hawaii$P,
  cid = d_drop_hawaii$contact_id
)

# interaction model
m_drop_hawaii <- ulam(
  alist(
    T ~ dpois(lambda),
    log(lambda) <- a[cid] + b[cid] * P,
    a[cid] ~ dnorm(3, 0.5),
    b[cid] ~ dnorm(0, 0.2)
  ),
  data = dat_drop_hawaii, chains = 4, log_lik = TRUE
)
flush.console()

k <- PSIS(m_drop_hawaii, pointwise = TRUE)$k

iplot(function() {
  par(mfrow=c(1,2))
  ## R code 11.47
  plot(dat_drop_hawaii$P, dat_drop_hawaii$T,
    xlab = "log population (std)", ylab = "total tools",
    col = rangi2, pch = ifelse(dat_drop_hawaii$cid == 1, 1, 16), lwd = 2,
    ylim = c(0, 75), cex = 1 + normalize(k)
  )

  # set up the horizontal axis values to compute predictions at
  ns <- 100
  P_seq <- seq(from = -1.4, to = 3, length.out = ns)

  # predictions for cid=1 (low contact)
  lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 1))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(P_seq, lmu, lty = 2, lwd = 1.5)
  shade(lci, P_seq, xpd = TRUE)

  # predictions for cid=2 (high contact)
  lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 2))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(P_seq, lmu, lty = 1, lwd = 1.5)
  shade(lci, P_seq, xpd = TRUE)

  ## R code 11.48
  plot(d_drop_hawaii$population, d_drop_hawaii$total_tools,
    xlab = "population", ylab = "total tools",
    col = rangi2, pch = ifelse(dat_drop_hawaii$cid == 1, 1, 16), lwd = 2,
    ylim = c(0, 75), cex = 1 + normalize(k)
  )

  ns <- 100
  P_seq <- seq(from = -5, to = 3, length.out = ns)
  # 1.53 is sd of log(population)
  # 9 is mean of log(population)
  pop_seq <- exp(P_seq * 1.53 + 9)

  lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 1))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(pop_seq, lmu, lty = 2, lwd = 1.5)
  shade(lci, pop_seq, xpd = TRUE)

  lambda <- link(m_drop_hawaii, data = data.frame(P = P_seq, cid = 2))
  lmu <- apply(lambda, 2, mean)
  lci <- apply(lambda, 2, PI)
  lines(pop_seq, lmu, lty = 1, lwd = 1.5)
  shade(lci, pop_seq, xpd = TRUE)
}, ar=1.8)
(Stan sampling progress for m11.10 trimmed: four chains, 1000 iterations each, about 0.03 seconds per chain.)
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
[Figure: posterior predictions of m11.10 on the full Kline data, plotted against standardized log population and against natural population]
(Stan sampling progress for m_drop_hawaii trimmed: four chains, 1000 iterations each.)
Some Pareto k values are high (>0.5). Set pointwise=TRUE to inspect individual points.
[Figure: posterior predictions of m_drop_hawaii, on the same two scales]

11H1. Use WAIC to compare the chimpanzee model that includes a unique intercept for each actor, m11.4 (page 330), to the simpler models fit in the same section. Interpret the results.

Answer. The last model (m11.4) does the best by far because it includes the actor predictor. The actor variable is highly predictive of the output; the big story of this experiment is handedness.

iplot(function() {
  plot(compare(m11.1, m11.2, m11.3, m11.4))
}, ar=3)
Warning message in compare(m11.1, m11.2, m11.3, m11.4):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
[Figure: WAIC comparison of m11.1, m11.2, m11.3, and m11.4]

11H2. The data contained in library(MASS);data(eagles) are records of salmon pirating attempts by Bald Eagles in Washington State. See ?eagles for details. While one eagle feeds, sometimes another will swoop in and try to steal the salmon from it. Call the feeding eagle the “victim” and the thief the “pirate.” Use the available data to build a binomial GLM of successful pirating attempts.

(a) Consider the following model:

\[
\begin{align}
y_i & \sim Binomial(n_i, p_i) \\
logit(p_i) & = \alpha + \beta_P P_i + \beta_V V_i + \beta_A A_i \\
\alpha & \sim Normal(0, 1.5) \\
\beta_P, \beta_V, \beta_A & \sim Normal(0, 0.5)
\end{align}
\]

where \(y\) is the number of successful attempts, \(n\) is the total number of attempts, \(P\) is a dummy variable indicating whether or not the pirate had large body size, \(V\) is a dummy variable indicating whether or not the victim had large body size, and finally \(A\) is a dummy variable indicating whether or not the pirate was an adult. Fit the model above to the eagles data, using both quap and ulam. Is the quadratic approximation okay?

Answer. The quadratic approximation is OK: the regularizing priors and the well-behaved likelihood keep the posterior close to multivariate Gaussian, so quap and ulam produce nearly identical results.

library(rethinking)
library(MASS)
data(eagles)
eag <- eagles
eag$Pi <- ifelse(eagles$P == "L", 1, 0)
eag$Vi <- ifelse(eagles$V == "L", 1, 0)
eag$Ai <- ifelse(eagles$A == "A", 1, 0)

mq_pirate_success <- quap(
  alist(
    y ~ dbinom(n, p),
    logit(p) <- a + b_P*Pi + b_V*Vi + b_A*Ai,
    a ~ dnorm(0, 1.5),
    b_P ~ dnorm(0, 0.5),
    b_V ~ dnorm(0, 0.5),
    b_A ~ dnorm(0, 0.5)
  ),
  data = eag
)

dat_eag <- list(
  n = eag$n,
  y = eag$y,
  Pi = eag$Pi,
  Vi = eag$Vi,
  Ai = eag$Ai
)

mu_pirate_success <- ulam(
  mq_pirate_success@formula,
  data = dat_eag, chains=4, cores=4, log_lik=TRUE
)

iplot(function() {
  plot(compare(mq_pirate_success, mu_pirate_success))
}, ar=4.5)
Warning message in compare(mq_pirate_success, mu_pirate_success):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
[Figure: WAIC comparison of mq_pirate_success and mu_pirate_success]

(b) Now interpret the estimates. If the quadratic approximation turned out okay, then it’s okay to use the quap estimates. Otherwise stick to ulam estimates. Then plot the posterior predictions. Compute and display both (1) the predicted probability of success and its 89% interval for each row (i) in the data, as well as (2) the predicted success count and its 89% interval. What different information does each type of posterior prediction provide?

Answer. The eagles data frame:

display(eagles)
A data.frame: 8 × 5

|   | y  | n  | P | A | V |
|---|----|----|---|---|---|
| 1 | 17 | 24 | L | A | L |
| 2 | 29 | 29 | L | A | S |
| 3 | 17 | 27 | L | I | L |
| 4 | 20 | 20 | L | I | S |
| 5 | 1  | 12 | S | A | L |
| 6 | 15 | 16 | S | A | S |
| 7 | 0  | 28 | S | I | L |
| 8 | 1  | 4  | S | I | S |

This graph is from postcheck, as mentioned in the text. The documentation on this function isn’t too helpful; it’s easier to read the source code. In the following figure, the open circles ($mean in the raw data) represent the predicted probability of success and the vertical lines the 89% interval ($PI). The crosshairs represent the 89% interval of the predicted success count ($outPI). The purple dots are observed outcomes from the eagles data frame above. Notice outcomes are scaled to between zero and one; in case #1 the purple dot is at \(17/24 \approx 0.7083\).

iplot(function() {
  result <- postcheck(mu_pirate_success)
  display_markdown("The raw data:")
  display(result)
})

The raw data:

$mean
  1. 0.710157399257729
  2. 0.927927363862875
  3. 0.562201465031564
  4. 0.86989774940961
  5. 0.328546090480074
  6. 0.716971297887675
  7. 0.203425159304023
  8. 0.571254516921082
$PI (a matrix: 2 × 8 of type dbl)

|     | 1         | 2         | 3         | 4         | 5         | 6         | 7         | 8         |
|-----|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| 5%  | 0.6109664 | 0.8873907 | 0.4534979 | 0.7992962 | 0.2183685 | 0.5978318 | 0.1289645 | 0.4277729 |
| 94% | 0.8054732 | 0.9597742 | 0.6682283 | 0.9248639 | 0.4505562 | 0.8248680 | 0.2894805 | 0.7078469 |

$outPI (a matrix: 2 × 8 of type dbl)

|     | 1         | 2         | 3         | 4   | 5          | 6      | 7          | 8    |
|-----|-----------|-----------|-----------|-----|------------|--------|------------|------|
| 5%  | 0.5416667 | 0.8275862 | 0.3703704 | 0.7 | 0.08333333 | 0.5000 | 0.07142857 | 0.25 |
| 94% | 0.8750000 | 1.0000000 | 0.7407407 | 1.0 | 0.58333333 | 0.9375 | 0.35714286 | 1.00 |
[Figure: postcheck plot for mu_pirate_success]

In part (1) we’re visualizing model parameters (including uncertainty). In part (2) we’re visualizing implied predictions on the outcome scale (including uncertainty) which indirectly includes the uncertainty in model parameters from the first part.

(c) Now try to improve the model. Consider an interaction between the pirate’s size and age (immature or adult). Compare this model to the previous one, using WAIC. Interpret.

Answer. The non-interaction model appears to do slightly better, but generally speaking the performance is almost the same. The interaction model is more complex (has more parameters) and that might slightly hurt its score.

# NOTE: this index conflates small-adult with large-immature pirates (both map to 2);
# a full size-by-age interaction would use four categories, e.g. 1 + Pi + 2 * Ai.
eag$PirateTreatment <- 1 + eag$Pi + eag$Ai

dat_eag_inter <- list(
  n = eag$n,
  y = eag$y,
  PirateTreatment = eag$PirateTreatment,
  Vi = eag$Vi
)

mu_pirate_success_inter <- ulam(
  alist(
    y ~ dbinom(n, p),
    logit(p) <- a + b_t[PirateTreatment] + b_V*Vi,
    a ~ dnorm(0, 1.5),
    b_t[PirateTreatment] ~ dnorm(0, 0.5),
    b_V ~ dnorm(0, 0.5)
  ),
  data = dat_eag_inter, chains=4, cores=4, log_lik=TRUE
)

iplot(function() {
  plot(compare(mu_pirate_success_inter, mu_pirate_success))
}, ar=4.5)
[Figure: WAIC comparison of mu_pirate_success_inter and mu_pirate_success]

11H3. The data contained in data(salamanders) are counts of salamanders (Plethodon elongatus) from 47 different 49-m² plots in northern California. The column SALAMAN is the count in each plot, and the columns PCTCOVER and FORESTAGE are percent of ground cover and age of trees in the plot, respectively. You will model SALAMAN as a Poisson variable.

(a) Model the relationship between density and percent cover, using a log-link (same as the example in the book and lecture). Use weakly informative priors of your choosing. Check the quadratic approximation again, by comparing quap to ulam. Then plot the expected counts and their 89% interval against percent cover. In which ways does the model do a good job? In which ways does it do a bad job?

Answer. The salamanders data frame:

data(salamanders)
display(salamanders)
sal <- salamanders
sal$StdPctCover <- scale(sal$PCTCOVER)

sal_dat <- list(
  S = sal$SALAMAN,
  StdPctCover = sal$StdPctCover
)

mq_sal <- quap(
  alist(
    S ~ dpois(lambda),
    log(lambda) <- a + b*StdPctCover,
    a ~ dnorm(3, 0.5),
    b ~ dnorm(0, 0.2)
  ), data = sal_dat
)

mu_sal <- ulam(
  mq_sal@formula,
  data = sal_dat, chains=4, cores=4, log_lik=TRUE
)
A data.frame: 47 × 4

| SITE | SALAMAN | PCTCOVER | FORESTAGE |
|------|---------|----------|-----------|
| 1    | 13      | 85       | 316       |
| 2    | 11      | 86       | 88        |
| 3    | 11      | 90       | 548       |
| 4    | 9       | 88       | 64        |
| 5    | 8       | 89       | 43        |
| 6    | 7       | 83       | 368       |
| 7    | 6       | 83       | 200       |
| 8    | 6       | 91       | 71        |
| 9    | 5       | 88       | 42        |
| 10   | 5       | 90       | 551       |
| 11   | 4       | 87       | 675       |
| 12   | 3       | 83       | 217       |
| 13   | 3       | 87       | 212       |
| 14   | 3       | 89       | 398       |
| 15   | 3       | 92       | 357       |
| 16   | 3       | 93       | 478       |
| 17   | 2       | 2        | 5         |
| 18   | 2       | 87       | 30        |
| 19   | 2       | 93       | 551       |
| 20   | 1       | 7        | 3         |
| 21   | 1       | 16       | 15        |
| 22   | 1       | 19       | 31        |
| 23   | 1       | 29       | 10        |
| 24   | 1       | 34       | 49        |
| 25   | 1       | 46       | 30        |
| 26   | 1       | 80       | 215       |
| 27   | 1       | 86       | 586       |
| 28   | 1       | 88       | 105       |
| 29   | 1       | 92       | 210       |
| 30   | 0       | 0        | 0         |
| 31   | 0       | 1        | 4         |
| 32   | 0       | 3        | 3         |
| 33   | 0       | 5        | 2         |
| 34   | 0       | 8        | 10        |
| 35   | 0       | 9        | 8         |
| 36   | 0       | 11       | 6         |
| 37   | 0       | 14       | 49        |
| 38   | 0       | 17       | 29        |
| 39   | 0       | 24       | 57        |
| 40   | 0       | 44       | 59        |
| 41   | 0       | 52       | 78        |
| 42   | 0       | 77       | 50        |
| 43   | 0       | 78       | 320       |
| 44   | 0       | 80       | 411       |
| 45   | 0       | 86       | 133       |
| 46   | 0       | 89       | 60        |
| 47   | 0       | 91       | 187       |

There doesn’t seem to be a difference between quap and ulam in inference:

iplot(function() {
  plot(compare(mu_sal, mq_sal))
}, ar=4.5)

# Standardize the prediction grid using the data's own mean and sd,
# not the mean and sd of the grid itself
pctc <- list(StdPctCover = (seq(0, 100) - mean(sal$PCTCOVER)) / sd(sal$PCTCOVER))
sims <- sim(mq_sal, data=pctc)

expectedSalCount.mu <- apply(sims, 2, mean)
expectedSalCount.PI <- apply(sims, 2, PI)
display_markdown("The raw data:")
display(expectedSalCount.mu)
display(expectedSalCount.PI)

iplot(function() {
  plot(pctc$StdPctCover, expectedSalCount.mu, ylim=c(0,12))
  shade(expectedSalCount.PI, pctc$StdPctCover)
  title("Expected salamander count")
})
Warning message in compare(mu_sal, mq_sal):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
[Figure: WAIC comparison of mu_sal and mq_sal]

The raw data:

(Output trimmed: the 101 posterior-mean expected counts, which rise steadily with percent cover, followed by a 2 × 101 matrix of 89% interval bounds.)
[Figure: expected salamander count against standardized percent cover, with 89% interval]

The model does well (is more certain) when it needs to guess lower counts, i.e. when ground cover is low and there are few salamanders. In general, though, it is uncertain about most of its predictions.

(b) Can you improve the model by using the other predictor, FORESTAGE? Try any models you think useful. Can you explain why FORESTAGE helps or does not help with prediction?

Answer. It doesn’t seem forest age helps much with prediction; it may be that forest age only predicts ground cover and so adds no extra information.

sal$StdForestAge <- scale(sal$FORESTAGE)
sal_dat <- list(
  S = sal$SALAMAN,
  StdPctCover = sal$StdPctCover,
  StdForestAge = sal$StdForestAge
)

mq_sal_forest_age <- quap(
  alist(
    S ~ dpois(lambda),
    log(lambda) <- a + b_c*StdPctCover + b_f*StdForestAge,
    a ~ dnorm(3, 0.5),
    b_c ~ dnorm(0, 0.2),
    b_f ~ dnorm(0, 0.2)
  ), data = sal_dat
)

iplot(function() {
  plot(compare(mq_sal, mq_sal_forest_age))
}, ar=4.5)
[Figure: WAIC comparison of mq_sal and mq_sal_forest_age]
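One hedged way to probe the mediation idea is to regress standardized cover on standardized forest age; a strong association here is at least consistent with FORESTAGE acting on SALAMAN only through PCTCOVER, though it cannot establish the direction of the arrow. The model name `m_cover_age` is ours, not from the text:

m_cover_age <- quap(
  alist(
    PC ~ dnorm(mu, sigma),
    mu <- a + b * FA,
    a ~ dnorm(0, 0.5),
    b ~ dnorm(0, 0.5),
    sigma ~ dexp(1)
  ),
  data = list(
    PC = as.numeric(sal$StdPctCover),
    FA = as.numeric(sal$StdForestAge)
  )
)
precis(m_cover_age) # a clearly positive b supports the pipe interpretation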
library(dagitty)

11H4. The data in data(NWOGrants) are outcomes for scientific funding applications for the Netherlands Organization for Scientific Research (NWO) from 2010-2012 (see van der Lee and Ellemers (2015) for data and context). These data have a very similar structure to the UCBadmit data discussed in the chapter. I want you to consider a similar question: What are the total and indirect causal effects of gender on grant awards? Consider a mediation path (a pipe) through discipline. Draw the corresponding DAG and then use one or more binomial GLMs to answer the question. What is your causal interpretation? If NWO’s goal is to equalize rates of funding between men and women, what type of intervention would be most effective?

Answer. The NWOGrants data.frame:

data(NWOGrants)
display(NWOGrants)
nwo <- NWOGrants
A data.frame: 18 × 4

| discipline          | gender | applications | awards |
|---------------------|--------|--------------|--------|
| Chemical sciences   | m      | 83           | 22     |
| Chemical sciences   | f      | 39           | 10     |
| Physical sciences   | m      | 135          | 26     |
| Physical sciences   | f      | 39           | 9      |
| Physics             | m      | 67           | 18     |
| Physics             | f      | 9            | 2      |
| Humanities          | m      | 230          | 33     |
| Humanities          | f      | 166          | 32     |
| Technical sciences  | m      | 189          | 30     |
| Technical sciences  | f      | 62           | 13     |
| Interdisciplinary   | m      | 105          | 12     |
| Interdisciplinary   | f      | 78           | 17     |
| Earth/life sciences | m      | 156          | 38     |
| Earth/life sciences | f      | 126          | 18     |
| Social sciences     | m      | 425          | 65     |
| Social sciences     | f      | 409          | 47     |
| Medical sciences    | m      | 245          | 46     |
| Medical sciences    | f      | 260          | 29     |

Reproduce the DAG from the text, where D now means discipline rather than department:

nwo_dag <- dagitty('
dag {
    bb="0,0,1,1"
    A [pos="0.500,0.400"]
    D [pos="0.300,0.300"]
    G [pos="0.100,0.400"]
    D -> A
    G -> A
    G -> D
}')
iplot(function() plot(nwo_dag), scale=10)
[Figure: DAG with G → D, D → A, and G → A]

For the total causal effect of gender on grant awards, do not condition on discipline:

nwo_dat <- list(
  awards = nwo$awards,
  applications = nwo$applications,
  gid = ifelse(as.character(nwo$gender) == "m", 1, 2)
)
m_nwo_tot <- quap(
  alist(
    awards ~ dbinom(applications, p),
    logit(p) <- a[gid],
    a[gid] ~ dnorm(0, 1.5)
  ),
  data = nwo_dat
)
iplot(function() {
  display_markdown("Raw data:")
  x <- precis(m_nwo_tot, depth = 2)
  display(x)
  plot(x, main="m_nwo_tot")
}, ar=4.5)

Raw data:

A precis: 2 × 4

|      | mean      | sd         | 5.5%      | 94.5%     |
|------|-----------|------------|-----------|-----------|
| a[1] | -1.531415 | 0.06462428 | -1.634697 | -1.428133 |
| a[2] | -1.737434 | 0.08121384 | -1.867230 | -1.607639 |
[Figure: precis plot for m_nwo_tot]

These two coefficients are on the log-odds (parameter) scale, so the odds of a male application winning an award are about \(exp(-1.53) = 0.217\) and of a female application about \(exp(-1.74) = 0.176\). If it helps, you can convert these absolute odds to probabilities with \(p = o/(1+o)\).
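For example, converting each gender's odds back to an award probability:

o_m <- exp(-1.53)
o_m / (1 + o_m) # ~0.18 probability of an award per male application
o_f <- exp(-1.74)
o_f / (1 + o_f) # ~0.15 per female application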

Let's also look at the relative performance of males to females (see Contrast (statistics)). As in the text, we'll do so on both the logit and probability scales:

post <- extract.samples(m_nwo_tot)
diff_a <- post$a[,1] - post$a[,2]
diff_p <- inv_logit(post$a[,1]) - inv_logit(post$a[,2])
iplot(function() {
  display_markdown("Raw data:")
  x <- precis(list(diff_a=diff_a, diff_p=diff_p))
  display(x)
  plot(x, main="m_nwo_tot: contrast")
}, ar=4.5)

Raw data:

A precis: 2 × 5

|        | mean       | sd        | 5.5%        | 94.5%      | histogram   |
|--------|------------|-----------|-------------|------------|-------------|
| diff_a | 0.20699102 | 0.1033998 | 0.039146118 | 0.37194581 | ▁▁▂▇▇▃▁▁    |
| diff_p | 0.02818477 | 0.0139731 | 0.005452507 | 0.05017562 | ▁▁▁▂▅▇▇▃▁▁▁ |
[Figure: contrast precis plot for m_nwo_tot]

For the direct causal effect of gender on grant awards, condition on discipline:

nwo_dat$disc_id <- rep(1:9, each = 2)
m_nwo_dir <- quap(
  alist(
    awards ~ dbinom(applications, p),
    logit(p) <- a[gid] + delta[disc_id],
    a[gid] ~ dnorm(0, 1.5),
    delta[disc_id] ~ dnorm(0, 1.5)
  ),
  data = nwo_dat
)
iplot(function() {
  display_markdown("Raw data:")
  x <- precis(m_nwo_dir, depth = 2)
  display(x)
  plot(x, main="m_nwo_dir")
}, ar=3)

post <- extract.samples(m_nwo_dir)
diff_a <- post$a[,1] - post$a[,2]
diff_p <- inv_logit(post$a[,1]) - inv_logit(post$a[,2])
iplot(function() {
  display_markdown("Raw data:")
  x <- precis(list(diff_a=diff_a, diff_p=diff_p))
  display(x)
  plot(x, main="m_nwo_dir: contrast")
}, ar=4.5)

Raw data:

A precis: 11 × 4

|          | mean       | sd        | 5.5%       | 94.5%      |
|----------|------------|-----------|------------|------------|
| a[1]     | -1.1567197 | 0.4565097 | -1.8863105 | -0.4271290 |
| a[2]     | -1.2944656 | 0.4597883 | -2.0292961 | -0.5596351 |
| delta[1] | 0.1624986  | 0.4908985 | -0.6220521 | 0.9470492  |
| delta[2] | -0.1894857 | 0.4856748 | -0.9656879 | 0.5867165  |
| delta[3] | 0.1389274  | 0.5114875 | -0.6785285 | 0.9563833  |
| delta[4] | -0.4114458 | 0.4709091 | -1.1640496 | 0.3411579  |
| delta[5] | -0.3820280 | 0.4792310 | -1.1479317 | 0.3838756  |
| delta[6] | -0.4478024 | 0.4895143 | -1.2301409 | 0.3345361  |
| delta[7] | -0.1766539 | 0.4742538 | -0.9346031 | 0.5812954  |
| delta[8] | -0.6381023 | 0.4640408 | -1.3797292 | 0.1035246  |
| delta[9] | -0.5167944 | 0.4687254 | -1.2659081 | 0.2323194  |
[Figure: precis plot for m_nwo_dir]

Raw data:

A precis: 2 × 5

|        | mean       | sd         | 5.5%         | 94.5%      | histogram |
|--------|------------|------------|--------------|------------|-----------|
| diff_a | 0.13732755 | 0.10669033 | -0.033914612 | 0.30750179 | ▁▁▂▅▇▅▁▁▁ |
| diff_p | 0.02370068 | 0.01961388 | -0.005735084 | 0.05587873 | ▁▁▂▇▇▃▁▁▁ |
[Figure: contrast precis plot for m_nwo_dir]

Although the advantage for males has decreased, there is still some evidence for it.

One possible intervention is to increase funding in the disciplines women apply to most, assuming applicants prefer to do research in the areas to which they are applying. Another is to encourage women to get involved (probably at a younger age) in the disciplines with more funding.

11H5. Suppose that the NWO Grants sample has an unobserved confound that influences both choice of discipline and the probability of an award. One example of such a confound could be the career stage of each applicant. Suppose that in some disciplines, junior scholars apply for most of the grants. In other disciplines, scholars from all career stages compete. As a result, career stage influences discipline as well as the probability of being awarded a grant. Add these influences to your DAG from the previous problem. What happens now when you condition on discipline? Does it provide an un-confounded estimate of the direct path from gender to an award? Why or why not? Justify your answer with the backdoor criterion. If you have trouble thinking this through, try simulating fake data, assuming your DAG is true. Then analyze it using the model from the previous problem. What do you conclude? Is it possible for gender to have a real direct causal influence but for a regression conditioning on both gender and discipline to suggest zero influence?

Answer. Reproducing the DAG from the text (where U is now career stage):

nwo_career_stage_dag <- dagitty('
dag {
    bb="0,0,1,1"
    A [pos="0.500,0.400"]
    D [pos="0.300,0.300"]
    G [pos="0.100,0.400"]
    U [latent,pos="0.500,0.300"]
    D -> A
    G -> A
    G -> D
    U -> D
    U -> A
}')
iplot(function() plot(nwo_career_stage_dag), scale=10)
[Figure: DAG with the latent confound U added: U → D and U → A]

If we condition on D, we open the backdoor path G → D ← U → A, because D is a collider on that path. Conditioning on a collider d-connects G and A through the unobserved U, so the estimate of the direct path from G to A is confounded. By the backdoor criterion there is no way to get an un-confounded estimate: blocking the pipe through D requires conditioning on D, but that opens the collider path, and with U unobserved we cannot condition on it to close the path again. The induced association could even cancel out a real direct causal influence of G on A, so a regression conditioning on both gender and discipline could suggest zero influence.
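We can put the backdoor reasoning to dagitty directly. This is a sketch assuming adjustmentSets behaves as documented; with U marked latent it should report that no valid adjustment set exists for the direct effect:

adjustmentSets(nwo_career_stage_dag, exposure = "G", outcome = "A",
               effect = "total")  # the empty set: nothing needs adjusting
adjustmentSets(nwo_career_stage_dag, exposure = "G", outcome = "A",
               effect = "direct") # no output: no valid adjustment set exists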

Imagine there are only two disciplines. Researchers with more scholarly experience tend to work in the second discipline, and females tend to work in the second discipline. If we know you're in the second discipline and you got an award, then it's likely you got the award because you're experienced. We assume here that discipline has no direct influence on award; this assumption is not necessary for conditioning on discipline to connect (d-connect) gender and scholarly experience.

set.seed(145)
n <- 100
U <- rnorm(n)                        # unobserved career stage
gid <- rbinom(n, 1, 0.5) + 1         # 1 = male, 2 = female
gender_to_disc_log_odds <- c(-0.2, 0.2)
p_disc <- inv_logit(gender_to_disc_log_odds[gid] + U)
disc_id <- rbinom(n, 1, p_disc) + 1  # career stage and gender both drive discipline
gender_to_award_log_odds <- c(-1.5, -2)
disc_to_award_log_odds <- c(0, 0)    # discipline has no direct effect on award
p_award <- inv_logit(gender_to_award_log_odds[gid] + U +
                     disc_to_award_log_odds[disc_id])
awards <- rbinom(n, 1, p_award)

11H6. The data in data(Primates301) are 301 primate species and associated measures. In this problem, you will consider how brain size is associated with social learning. There are three parts.

(a) Model the number of observations of social_learning for each species as a function of the log brain size. Use a Poisson distribution for the social_learning outcome variable. Interpret the resulting posterior.

Answer. The head of the Primates301 data.frame:

data(Primates301)
display(head(Primates301, n = 20L))
prim_inc <- Primates301
prim <- prim_inc[complete.cases(prim_inc$social_learning, prim_inc$brain, prim_inc$research_effort), ]
(Table trimmed: the first 20 rows of Primates301, with 16 columns: name, genus, species, subspecies, spp_id, genus_id, social_learning, research_effort, brain, body, group_size, gestation, weaning, longevity, sex_maturity, maternal_investment. Many entries are NA.)

After filtering for complete cases:

display(head(prim, n = 20L))

brain_dat <- list(
  SocialLearning = prim$social_learning,
  LogBrain = log(prim$brain)
)

mq_brain_size <- quap(
  alist(
    SocialLearning ~ dpois(lambda),
    log(lambda) <- a + b * LogBrain,
    a ~ dnorm(3, 0.5),
    b ~ dnorm(0, 0.2)
  ),
  data = brain_dat
)

mu_brain_size <- ulam(
  mq_brain_size@formula,
  data = brain_dat, chains = 4, cores = 4, log_lik = TRUE
)
(Table trimmed: the first 20 complete-case rows, same 16 columns; species missing social_learning, brain, or research_effort have been dropped.)
Warning message:
“Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess”
Warning message:
“Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess”

Use func=PSIS to generate warnings for high Pareto k values.

If you look at the data, you'll see some genera (e.g. gorillas and chimpanzees) have orders of magnitude more research effort and more recorded social learning examples than other species.

iplot(function()
  plot(compare(mq_brain_size, mu_brain_size, func=PSIS))
, ar=4.5)

iplot(function() {
  display_markdown("<br/>Raw data:")
  x <- precis(mq_brain_size)
  display(x)
  plot(x, main="mq_brain_size")
}, ar=4.5)
Warning message in compare(mq_brain_size, mu_brain_size, func = PSIS):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
[Figure: PSIS comparison of mq_brain_size and mu_brain_size]


Raw data:

A precis: 2 × 4

|   | mean      | sd         | 5.5%      | 94.5%     |
|---|-----------|------------|-----------|-----------|
| a | -5.687636 | 0.24026506 | -6.071626 | -5.303646 |
| b | 1.563520  | 0.04787613 | 1.487005  | 1.640035  |
[Figure: precis plot for mq_brain_size]

Although the model is confident in the b coefficient, the wide error bars on the WAIC scores indicate the out-of-sample performance may be much worse.

Still, in the context of this model there is an association between brain size and social learning. LogBrain is a natural logarithm, so a unit increase corresponds to a factor-of-\(e\) (about 2.7×) increase in brain size. With \(b \approx 1.5\), every unit increase in LogBrain multiplies the expected number of social learning observations by \(exp(1.5) \approx 4.5\).

(b) Some species are studied much more than others. So the number of reported instances of social_learning could be a product of research effort. Use the research_effort variable, specifically its logarithm, as an additional predictor variable. Interpret the coefficient for log research_effort. How does this model differ from the previous one?

brain_dat$LogResearchEffort <- log(prim$research_effort)

mq_brain_research <- quap(
  alist(
    SocialLearning ~ dpois(lambda),
    log(lambda) <- a + b * LogBrain + d * LogResearchEffort,
    a ~ dnorm(3, 0.5),
    b ~ dnorm(0, 0.2),
    d ~ dnorm(0, 0.2)
  ),
  data = brain_dat
)

mu_brain_research <- ulam(
  mq_brain_research@formula,
  data = brain_dat, chains = 4, cores = 4, log_lik = TRUE
)

Again, use func=PSIS to generate warnings for high Pareto k values:

iplot(function()
  plot(compare(mq_brain_research, mu_brain_research, func=PSIS))
, ar=4.5)

iplot(function() {
  display_markdown("<br/>Raw data:")
  x <- precis(mq_brain_research)
  display(x)
  plot(x, main="mq_brain_research")
}, ar=4.5)
Warning message in compare(mq_brain_research, mu_brain_research, func = PSIS):
“Not all model fits of same class.
This is usually a bad idea, because it implies they were fit by different algorithms.
Check yourself, before you wreck yourself.”
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
Some Pareto k values are very high (>1). Set pointwise=TRUE to inspect individual points.
[Figure: PSIS comparison of mq_brain_research and mu_brain_research]


Raw data:

A precis: 3 × 4

|   | mean       | sd         | 5.5%       | 94.5%      |
|---|------------|------------|------------|------------|
| a | -5.2692633 | 0.20355690 | -5.5945865 | -4.9439400 |
| b | 0.2326161  | 0.05079163 | 0.1514412  | 0.3137909  |
| d | 1.3094503  | 0.05033045 | 1.2290125  | 1.3898881  |
[Figure: precis plot for mq_brain_research]

Answer. When we condition on research effort, the influence of brain size drops dramatically (to about 0.2 from 1.5). The coefficient for log research effort, \(d \approx 1.3\), means that better-studied species have far more recorded instances of social learning: every factor-of-\(e\) increase in research effort multiplies the expected count by \(exp(1.3) \approx 3.7\). Much of what the first model attributed to brain size was apparently research effort.

(c) Draw a DAG to represent how you think the variables social_learning, brain, and research_effort interact. Justify the DAG with the measured associations in the two models above (and any other models you used).

Answer. It seems likely we were looking at the total causal effect of LogBrain in the first model, and the direct causal effect in the second.

brain_social_learning_dag <- dagitty('
dag {
    bb="0,0,1,1"
    LogBrain [pos="0.100,0.400"]
    LogResearchEffort [pos="0.300,0.300"]
    SocialLearning [pos="0.500,0.400"]
    LogBrain -> SocialLearning
    LogBrain -> LogResearchEffort
    LogResearchEffort -> SocialLearning
}')
iplot(function() plot(brain_social_learning_dag), scale=10)
[Figure: DAG with LogBrain → SocialLearning, LogBrain → LogResearchEffort, and LogResearchEffort → SocialLearning]