17th September, 2019

Joint work with

Don van den Bergh, Alexander Ly, Eric-Jan Wagenmakers

The Second Moment’s First Moment

  • Upon closer inspection, many scientific questions from various fields concern variances

  • Economics & Archeology:
    • Do products become more homogeneous with increased specialization? (Kvamme et al., 1996)

  • Legal Studies:
    • Which interventions best reduce unwanted variability in civil damage awards? (Saks et al., 1997)

  • Genetics:
    • What are the genetic effects on the variance of quantitative traits? (Pare et al., 2010)

  • Psychology:
    • Do males show higher variance in intelligence? (Johnson et al., 2008)
    • Do males show higher variance in personality traits? (Borkenau et al., 2013)
    • Does the variance of mathematical ability increase across school grades? (Aunola et al., 2004)

Problem Setup

  • Let group \(k\) consist of \(n_k\) observations \(\mathbf{x}_k = \{x_{k1}, \ldots, x_{kn_k}\}\). We assume that

\[ x_{ki} \overset{\tiny{\text{i.i.d.}}}{\sim} \mathcal{N}\left(\mu_k, \sigma^2_k\right) \enspace , \]

  • for all \(k \in \{1, \ldots, K\}\) and \(i \in \{1, \ldots, n_k\}\)
  • We restrict our focus to the \(K = 2\) case. Our aim is to test the hypotheses

\[ \begin{aligned} \mathcal{H}_0 &: \sigma^2_1 = \sigma^2_2 \\[.5em] \mathcal{H}_1 &: \sigma^2_1 \neq \sigma^2_2 \enspace . \end{aligned} \]

  • Bayesian inference provides a predictive perspective
  • We require models that instantiate these hypotheses and that make predictions
  • Without priors, no predictions!
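
As a minimal sketch of this setup (the group sizes, means, and variances below are made up for illustration), data for the two groups can be simulated as follows:

```python
import numpy as np

rng = np.random.default_rng(2019)
n1, n2 = 100, 120

# Under H0 both groups share one variance; under H1 the variances may differ.
x1 = rng.normal(loc=1.0, scale=np.sqrt(2.0), size=n1)  # sigma_1^2 = 2
x2 = rng.normal(loc=0.5, scale=np.sqrt(4.0), size=n2)  # sigma_2^2 = 4 (an H1 scenario)

print(np.var(x1, ddof=1), np.var(x2, ddof=1))          # sample variances s_1^2, s_2^2
```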

Two Perspectives

\[ p(\mathbf{y} \mid \mathcal{M}) = \int_{\Theta} p(\mathbf{y}\mid \theta, \mathcal{M}) \, p(\theta \mid \mathcal{M}) \, \mathrm{d}\theta \]

\[ \widehat{\text{elpd}}^{\mathcal{M}}_{\text{loo}} = \frac{1}{n} \sum_{i=1}^n \text{log}\int_{\Theta} p(\mathbf{y}_i \mid \mathbf{y}_{-i}, \theta) \, p(\theta \mid \mathbf{y}_{-i}) \, \mathrm{d}\theta \]

From https://fabiandablander.com/r/Law-of-Practice.html
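
As a rough illustration of the prior predictive perspective, the sketch below approximates \(p(\mathbf{y} \mid \mathcal{M})\) by averaging the likelihood over draws from the prior; the model (a unit-variance normal with a standard normal prior on its mean) and the toy data are assumptions for illustration, not the variance model discussed here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(loc=0.3, scale=1.0, size=20)      # toy data, made up

# Prior predictive: p(y | M) = ∫ p(y | theta, M) p(theta | M) dtheta.
# Here theta is the mean of a unit-variance normal with prior theta ~ N(0, 1);
# the integral is approximated by averaging the likelihood over prior draws.
theta = rng.normal(loc=0.0, scale=1.0, size=100_000)
lik = stats.norm.pdf(y[None, :], loc=theta[:, None], scale=1.0).prod(axis=1)
print(lik.mean())                                 # Monte Carlo estimate of p(y | M)
```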

Bayes factor

  • Let \(\mathbf{d} = (\mathbf{x}_1, \mathbf{x}_2)\)
  • We focus here on the prior predictive perspective
  • The ratio of two marginal likelihoods is called the Bayes factor
  • Quantifies how much better one model predicts the data compared to another model

\[ \underbrace{\frac{p(\mathcal{M}_1 \mid \mathbf{d})}{p(\mathcal{M}_0 \mid \mathbf{d})}}_{\text{Posterior odds}} = \underbrace{\frac{p(\mathbf{d} \mid \mathcal{M}_1)}{p(\mathbf{d} \mid \mathcal{M}_0)}}_{\text{Bayes factor}} \hspace{.25em} \underbrace{\frac{p(\mathcal{M}_1)}{p( \mathcal{M}_0)}}_{\text{Prior odds}} \]

  • We focus on the \(K = 2\) case and derive a default Bayes factor
  • Next steps: Assigning priors to all parameters and integrating them out
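  • For illustration (made-up numbers): with equal prior odds, \(\text{BF}_{10} = 5\) yields posterior odds of 5, i.e. \(p(\mathcal{M}_1 \mid \mathbf{d}) = \frac{5}{6} \approx 0.83\)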

A Butchered Bayes factor

  • Let \(\phi = (\mu_1, \mu_2)\) denote the test-irrelevant parameters
  • Under \(\mathcal{H}_1\) we specify

\[ \begin{aligned} \pi(\phi) &\propto 1 \\[.5em] \sigma_1^2 &\sim \text{Inverse-Gamma}(\alpha, \beta) \\[.5em] \sigma_2^2 &\sim \text{Inverse-Gamma}(\alpha, \beta) \end{aligned} \]

  • Under \(\mathcal{H}_0\) we specify

\[ \begin{aligned} \pi(\phi) &\propto 1 \\[.5em] \sigma^2 &\sim \text{Inverse-Gamma}(\alpha, \beta) \end{aligned} \]

  • where \(\sigma^2 = \sigma_1^2 = \sigma_2^2\)

A Butchered Bayes factor

  • The resulting Bayes factor is given by

\[ \begin{aligned} \text{BF}_{10} &= \frac{p(\mathbf{d} \mid \mathcal{M}_1)}{p(\mathbf{d} \mid \mathcal{M}_0)} \\[.5em] &= \frac{\int_{\sigma_1}\int_{\sigma_2}\int_{\phi} f(\mathbf{d} \mid \phi, \sigma_1, \sigma_2) \, \pi(\phi, \sigma_1, \sigma_2) \, \mathrm{d}\phi \, \mathrm{d}\sigma_1 \mathrm{d} \sigma_2}{\int_{\sigma}\int_{\phi}f(\mathbf{d} \mid \phi, \sigma) \, \pi(\phi, \sigma) \, \mathrm{d}\phi \, \mathrm{d}\sigma} \\[.5em] &= \frac{\frac{1}{2} \frac{\beta^\alpha}{\Gamma(\alpha)} \Gamma\left(\frac{n_1}{2} + \alpha\right) \Gamma\left(\frac{n_2}{2} + \alpha\right) \left(\beta + \frac{n_1 s_1^2}{2} \right)^{-\frac{n_1}{2} - \alpha} \left(\beta + \frac{n_2 s_2^2}{2} \right)^{-\frac{n_2}{2} - \alpha}}{\Gamma\left(\frac{n}{2} + \alpha\right) \left(\beta + \frac{n_1 s_1^2 + n_2 s_2^2}{2} \right)^{-\frac{n}{2} - \alpha}} \enspace , \end{aligned} \]

  • where \(n = n_1 + n_2\) and \(n_k s_k^2 = \sum_{i=1}^{n_k} (x_{ki} - \bar{x}_k)^2\)
  • This Bayes factor is neither measurement invariant nor predictively matched
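
To make this concrete, here is a minimal Python sketch that evaluates the expression above exactly as printed, in log-space for numerical stability; the values of \(\alpha\), \(\beta\), and the data summaries are made up:

```python
import numpy as np
from scipy.special import gammaln

def log_bf10_ig(n1, s1_sq, n2, s2_sq, alpha=1.0, beta=1.0):
    """Log BF_10 under the inverse-gamma setup, transcribed from the expression above."""
    n = n1 + n2
    log_num = (np.log(0.5) + alpha * np.log(beta) - gammaln(alpha)
               + gammaln(n1 / 2 + alpha) + gammaln(n2 / 2 + alpha)
               - (n1 / 2 + alpha) * np.log(beta + n1 * s1_sq / 2)
               - (n2 / 2 + alpha) * np.log(beta + n2 * s2_sq / 2))
    log_den = (gammaln(n / 2 + alpha)
               - (n / 2 + alpha) * np.log(beta + (n1 * s1_sq + n2 * s2_sq) / 2))
    return log_num - log_den

# Rescaling the data (e.g. multiplying both sample variances by 100) changes
# the result, illustrating the lack of measurement invariance noted above.
print(log_bf10_ig(50, 1.2, 50, 2.4), log_bf10_ig(50, 120.0, 50, 240.0))
```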

Principled Selection of Priors

  • Select priors such that certain desiderata are fulfilled:
  • Measurement invariance:
    • Bayes factor does not depend on the unit of measurement

  • Predictive matching:
    • Bayes factor equals 1 for uninformative data

  • Model selection consistency:
    • Bayes factor in favour of true model goes to \(\infty\) as \((n_1, n_2) \rightarrow \infty\)

  • Information consistency:
    • Bayes factor in favour of true model goes to \(\infty\) as (finite) data provide infinite support

  • Limit consistency:
    • If \(n_2\) is fixed but \(n_1 \rightarrow \infty\), the Bayes factor remains bounded
  • We call the resulting Bayes factor a default Bayes factor (Jeffreys, 1961; Bayarri et al., 2012; Ly, 2018)

Intuition behind Desiderata

Derivation of Default Bayes factor

  • We borrow an idea from Jeffreys (1939) and write for \(\rho \in [0, 1]\)

\[ \begin{aligned} \sigma_1^2 &= \rho \sigma^2 \\[.5em] \sigma_2^2 &= (1 - \rho) \sigma^2 \end{aligned} \]

  • such that \(\sigma_1^2 + \sigma_2^2 = \rho \sigma^2 + (1 - \rho)\sigma^2 = \sigma^2\)
  • Observe that \(\rho = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\)
  • We can now rewrite the hypotheses in terms of \(\rho\):

\[ \begin{aligned} \mathcal{H}_0 &: \rho = 0.50 \\[.5em] \mathcal{H}_1 &: \rho \neq 0.50 \end{aligned} \]

  • This leads to \(p(\mathbf{d} \mid \mathcal{M}_0) = p(\mathbf{d} \mid \rho = 0.50, \mathcal{M}_1)\)

Derivation of Default Bayes factor

  • Let \(\phi = (\mu_1, \mu_2, \sigma)\) be the set of test-irrelevant parameters. Then

\[ \text{BF}_{10} = \frac{p(\mathbf{d} \mid \mathcal{M}_1)}{p(\mathbf{d} \mid \mathcal{M}_0)} = \frac{\int_{\rho} \int_{\phi} f(\mathbf{d} \mid \phi, \rho) \, \pi(\phi) \, \pi(\rho) \, \mathrm{d}\phi \, \mathrm{d}\rho}{\int_{\phi} f(\mathbf{d} \mid \phi, \rho = 0.50) \, \pi(\phi) \, \mathrm{d}\phi}\enspace . \]

  • We can use the improper prior \(\pi(\phi) \propto 1 \cdot 1 \cdot \sigma^{-1}\) because its arbitrary constants cancel between numerator and denominator
  • Key question: What prior do we assign the test-relevant parameter \(\rho\)?

Prior on \(\rho\)

  • Principled approach to choosing priors
    • Measurement invariance \(\checkmark\)
    • Predictive matching \(\checkmark\)
    • Model selection consistency \(\checkmark\)
    • Information consistency \(\checkmark\)
    • Limit consistency \(\checkmark\)

  • A class of Beta priors fulfills these criteria

\[ \rho = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} \sim \text{Beta}(\alpha, \alpha) \]

  • This induces a Generalized Beta prime distribution on

\[ \delta \equiv \frac{\sigma_1}{\sigma_2} \sim \text{GeneralizedBetaPrime}(\alpha, 2, 1) \]

  • \(\alpha = 4.50\) implies that \(p(\delta \in [0.50, 2]) = 0.95\)
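
A quick Monte Carlo sanity check of this claim, using \(\delta = \sqrt{\rho / (1 - \rho)}\), which follows from the definitions above:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 4.50
rho = rng.beta(alpha, alpha, size=1_000_000)    # rho ~ Beta(alpha, alpha)
delta = np.sqrt(rho / (1 - rho))                # delta = sigma_1 / sigma_2
print(np.mean((delta >= 0.5) & (delta <= 2)))   # should be close to 0.95
```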

Result

  • The Bayes factor in favour of \(\mathcal{H}_1\) is given by: \[ \text{BF}_{10} = \frac{\text{B}\left(\frac{n_2 - 1}{2} + \alpha,\ \frac{n_1 - 1}{2} + \alpha\right) \,_2F_1\left(\frac{n - 2}{2},\ \frac{n_2 - 1}{2} + \alpha;\ \frac{n - 2}{2} + 2\alpha;\ 1 - \frac{n_2 s_2^2}{n_1 s_1^2}\right)}{\text{B}\left(\alpha,\ \alpha\right) \left(1 + \frac{n_2 s_2^2}{n_1 s_1^2}\right)^{\frac{2 - n}{2}}} \]

  • where \(n_k s_k^2 = \sum_{i=1}^{n_k} (x_{ki} - \bar{x}_k)^2\) and \(n = n_1 + n_2\)
  • Note that the data enter only through the ratio \(\frac{n_2 s_2^2}{n_1 s_1^2}\)

Example Application I

  • Borkenau et al. (2013) study the variability in personality traits in four countries
  • We re-analyze the Estonian sample: \(s_f^2 = 15.6\), \(s_m^2 = 19.9\), \(n_f = 969\), \(n_m = 716\)
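
A minimal Python sketch that plugs these summary statistics into the Bayes factor from the previous slide, transcribed as printed; treating the female sample as group 1 and setting \(\alpha = 0.50\) are illustrative choices here, not necessarily those of the original analysis:

```python
import numpy as np
from scipy.special import betaln, hyp2f1

def bf10(n1, s1_sq, n2, s2_sq, alpha=0.50):
    """BF_10 for H1: sigma_1^2 != sigma_2^2, transcribed from the result slide."""
    n = n1 + n2
    ratio = (n2 * s2_sq) / (n1 * s1_sq)          # data enter only through this ratio
    log_num = (betaln((n2 - 1) / 2 + alpha, (n1 - 1) / 2 + alpha)
               + np.log(hyp2f1((n - 2) / 2, (n2 - 1) / 2 + alpha,
                               (n - 2) / 2 + 2 * alpha, 1 - ratio)))
    log_den = betaln(alpha, alpha) + ((2 - n) / 2) * np.log(1 + ratio)
    return np.exp(log_num - log_den)

# Estonian sample from Borkenau et al. (2013): females as group 1, males as group 2.
print(bf10(n1=969, s1_sq=15.6, n2=716, s2_sq=19.9))
```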

Comparison with Fractional Bayes factor I

  • There are alternative ways of specifying a prior distribution for testing
  • One might sacrifice part of the data to specify minimally informative priors
    • Partial Bayes factors (O’Hagan, 1991)
    • Intrinsic Bayes factors (Berger & Pericchi, 1996)
    • Fractional Bayes factors (O’Hagan, 1995)

  • Böing-Messing & Mulder (2018) propose an Automatic Fractional Bayes factor (AFBF) for testing hypotheses on variances
  • It turns out that this is a special case of our Bayes factor with \(\alpha = 0.50\)

Comparison with Fractional Bayes factor II

Extension to \(K > 2\) groups

  • We again write

\[ x_{ki} \overset{\tiny{\text{i.i.d.}}}{\sim} \mathcal{N}(\mu_k, \rho_k \sigma^2) \]

  • for all \(k \in \{1, \ldots, K\}\) and \(i \in \{1, \ldots, n_k\}\) where \(\sum_{k=1}^K \rho_k = 1\)
  • Standard \(\mathcal{H}_0\) and \(\mathcal{H}_1\) are

\[ \begin{align*} \mathcal{H}_0&: \rho_k = \frac{1}{K} \hspace{1em} \forall k \in \{1, \ldots, K\} \\ \mathcal{H}_1&: \boldsymbol{\rho} \sim \text{Dirichlet}(\alpha_1, \ldots, \alpha_K) \end{align*} \]

  • However, we can flexibly test any hypotheses with mixed constraints such as

\[ \mathcal{H}_r: \rho_1 = \rho_2 > \rho_3, \rho_4, \rho_5 = \rho_6 > \rho_7 \]

  • by constraining the prior and computing the marginal likelihood in Stan using bridgesampling (Meng & Wong, 1996; Gronau et al., 2017)
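
As a small illustration of constraining the prior, the sketch below estimates how much mass a symmetric Dirichlet prior places on one full ordering of the \(\rho_k\) (exactly \(1/K!\) by exchangeability); the values of \(K\) and \(\alpha\) are arbitrary:

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(7)
K, alpha = 4, 1.0
rho = rng.dirichlet(np.full(K, alpha), size=200_000)  # rho ~ Dirichlet(alpha, ..., alpha)
ordered = np.all(np.diff(rho, axis=1) > 0, axis=1)    # constraint rho_1 < ... < rho_K
print(ordered.mean(), 1 / factorial(K))               # Monte Carlo estimate vs. exact 1/K!
```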

Example Application II

  • Aunola et al. (2004) find that the variance in mathematical ability increases across school grades
  • We replicate this finding using large-scale data from MathGarden (\(n = 41801\))

\[ \begin{align*} \mathcal{H}_0&: \rho_i = \rho_j \hspace{1em} \forall (i, j) \hspace{5em} p(\mathcal{M}_0 \mid y) = 0.00\\ \mathcal{H}_f&: \rho_i \neq \rho_j \hspace{1em} \forall (i, j) \hspace{5em} p(\mathcal{M}_f \mid y) = 0.00\\ \mathcal{H}_r&: \rho_i < \rho_j \hspace{1em} \forall (i < j) \hspace{4.25em} p(\mathcal{M}_r \mid y) = 1.00\\ \end{align*} \]

Wrap-Up

  • Many scientific questions from various fields concern variances

  • Derived a default Bayes factor for the \(K = 2\) case
    • Fulfills a number of desiderata
    • Limit consistency yields \(K = 1\) group Bayes factor
    • Can extend point null hypothesis to a null-region

  • Extended it to \(K > 2\) groups
    • Compute marginal likelihood for mixed hypotheses using bridgesampling

  • Generalizes the ‘automatic’ Bayes factor proposed by Böing-Messing & Mulder (2018)
    • Allows sensitivity analyses and incorporation of prior information

Short Advertisement

  • You can find me online @fdabl and fabiandablander.com where I write about
    • Love affairs and linear differential equations
    • Spurious correlations and random walks
    • Bayesian modeling in Stan: A case study

Thank you for your attention!

Literature

  • Kvamme, K. L., Stark, M. T., & Longacre, W. A. (1996). Alternative procedures for assessing standardization in ceramic assemblages. American Antiquity, 61(1), 116-126.
  • Saks, M. J., Hollinger, L. A., Wissler, R. L., Evans, D. L., & Hart, A. J. (1997). Reducing variability in civil jury awards. Law and Human Behavior, 21(3), 243-256.
  • Paré, G., Cook, N. R., Ridker, P. M., & Chasman, D. I. (2010). On the use of variance per genotype as a tool to identify quantitative trait interaction effects: a report from the Women’s Genome Health Study. PLoS Genetics, 6(6), e1000981.
  • Johnson, W., Carothers, A., & Deary, I. J. (2008). Sex differences in variability in general intelligence: A new look at the old question. Perspectives on Psychological Science, 3(6), 518-531.
  • Borkenau, P., Hřebíčková, M., Kuppens, P., Realo, A., & Allik, J. (2013). Sex differences in variability in personality: A study in four samples. Journal of Personality, 81(1), 49-60.
  • Aunola, K., Leskinen, E., Lerkkanen, M. K., & Nurmi, J. E. (2004). Developmental dynamics of math performance from preschool to grade 2. Journal of Educational Psychology, 96(4), 699-713.
  • Jeffreys, H. (1939). Theory of Probability (1st Ed.). Oxford, UK: Oxford University Press.
  • Jeffreys, H. (1961). Theory of Probability (3rd Ed.). Oxford, UK: Oxford University Press.

Literature

  • Bayarri, M. J., Berger, J. O., Forte, A., & García-Donato, G. (2012). Criteria for Bayesian model choice with application to variable selection. The Annals of Statistics, 40(3), 1550-1577.
  • Ly, A. (2018). Bayes factors for research workers. Unpublished PhD thesis.
  • O’Hagan, A. (1991). Discussion on posterior Bayes factors (by M. Aitkin). Journal of the Royal Statistical Society: Series B (Methodological), 53, 136.
  • O’Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 99-118.
  • Berger, J. O., & Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and prediction. Journal of the American Statistical Association, 91(433), 109-122.
  • Böing-Messing, F., & Mulder, J. (2018). Automatic Bayes factors for testing equality- and inequality-constrained hypotheses on variances. Psychometrika, 83(3), 586-617.
  • Meng, X. L., & Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica, 6(4), 831-860.
  • Gronau, Q. F., Sarafoglou, A., Matzke, D., Ly, A., Boehm, U., Marsman, M., … & Steingroever, H. (2017). A tutorial on bridge sampling. Journal of Mathematical Psychology, 81, 80-97.