Bayesian Inference in Survey Research: Applications to Confirmatory Factor Analysis with Small Sample Sizes
David Kaplan
Department of Educational Psychology
Invited Talk to the Joint Research Center of the European Commission
Outline
- Introduction
- Bayes’ Theorem
- Sample Size Issues
- MCMC
- Summarizing the Posterior Distribution
- Bayesian Factor Analysis
- Example
- Wrap-Up: Some Philosophical Issues
This talk is drawn from my book Bayesian Statistics for the Social Sciences, The Guilford Press, 2014.
Introduction

Bayesian statistics has long been overlooked in the quantitative methods training of social scientists.
Typically, the only introduction that a student might have to Bayesian ideas is a brief overview of Bayes’ theorem while studying probability in an introductory statistics class.
Until recently, it was not feasible to conduct statistical modeling from a Bayesian perspective owing to its complexity and the lack of available software. Moreover, Bayesian statistics represents a powerful alternative to frequentist (classical) statistics and is, therefore, controversial.
Recently, there has been a renaissance in the development and application of Bayesian statistical methods, owing mostly to developments of powerful statistical software tools that render the specification and estimation of complex models feasible from a Bayesian perspective.
Paradigm Differences
For frequentists, the basic idea is that probability is represented as long-run frequency. Frequentist probability underlies the Fisher and Neyman-Pearson schools of statistics – the conventional methods of statistics we most often use. The frequentist formulation rests on the idea of equally probable and stochastically independent events.
The physical representation is the coin toss, which relates to the idea of a very large (actually infinite) number of repeated experiments.
The entire structure of Neyman-Pearson hypothesis testing and Fisherian statistics (together referred to as the frequentist school) is based on frequentist probability. Our conclusions regarding the null and alternative hypotheses presuppose the idea that we could conduct the same experiment an infinite number of times. Our interpretation of confidence intervals also assumes a fixed parameter and CIs that vary over an infinitely large number of identical experiments.
But there is another view of probability as subjective belief. The physical model in this case is that of the “bet”.
Consider the situation of betting on who will win the World Cup (or the World Series). Here, probability is not based on an infinite number of repeatable and stochastically independent events, but rather on how much knowledge you have and how much you are willing to bet. Subjective probability allows one to address questions such as “What is the probability that my team will win the World Cup?” Relative frequency supplies information, but it is not the same as probability and can be quite different. This notion of subjective probability underlies Bayesian statistics.
Bayes’ Theorem
Consider the joint probability of two events, Y and X – for example, observing lung cancer and smoking jointly. The joint probability can be written as
p(cancer, smoking) = p(cancer|smoking)p(smoking).   (1)

Similarly,

p(smoking, cancer) = p(smoking|cancer)p(cancer).   (2)

Because these are symmetric, we can set them equal to each other to obtain the following:

p(cancer|smoking)p(smoking) = p(smoking|cancer)p(cancer),   (3)

so that

p(cancer|smoking) = p(smoking|cancer)p(cancer) / p(smoking).   (4)

The inverse probability theorem (Bayes’ theorem) states

p(smoking|cancer) = p(cancer|smoking)p(smoking) / p(cancer).   (5)
Why do we care?
Because this is how you could go from the probability of having cancer given that the patient smokes, to the probability that the patient smokes given that he/she has cancer. We simply need the marginal probability of smoking and the marginal probability of cancer (“base rates”, or what we will call prior probabilities).
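As a quick numeric sketch of equation (5), with made-up base rates purely for illustration, the inversion can be computed directly in R:

    # Hypothetical base rates, for illustration only
    p_cancer_given_smoking <- 0.15  # assumed p(cancer | smoking)
    p_smoking              <- 0.20  # assumed marginal probability of smoking
    p_cancer               <- 0.05  # assumed marginal probability of cancer

    # Equation (5): invert the conditional probability
    p_smoking_given_cancer <- p_cancer_given_smoking * p_smoking / p_cancer
    p_smoking_given_cancer  # 0.6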
Statistical Elements of Bayes’ Theorem
What is the role of Bayes’ theorem for statistical inference?
Denote by Y a random variable that takes on a realized value y. For example, a person’s socioeconomic status could be considered a random variable taking on a very large set of possible values. This is the random variable Y. Once the person identifies his/her socioeconomic status, the random variable Y is now realized as y.
Because Y is unobserved and random, we need to specify a probability model to explain how we obtained the actual data values y.
Next, denote by θ a parameter that we believe characterizes the probability model of interest.
The parameter θ can be a scalar, such as the mean or the variance of a distribution, or it can be vector-valued, such as a set of regression coefficients in regression analysis or factor loadings in factor analysis. We are concerned with determining the probability of observing y given the unknown parameters θ, which we write as p(y|θ). In statistical inference, the goal is to obtain estimates of the unknown parameters given the data. This is expressed as the likelihood of the parameters given the data, often denoted as L(θ|y).
The key difference between Bayesian statistical inference and frequentist statistical inference concerns the nature of the unknown parameters θ.
In the frequentist tradition, the assumption is that θ is unknown, but no attempt is made to account for our uncertainty about θ.
In Bayesian statistical inference, θ is also unknown but we reflect our uncertainty about the true value of θ by specifying a probability distribution to describe it. Because both the observed data y and the parameters θ are assumed random, we can model the joint probability of the parameters and the data as a function of the conditional distribution of the data given the parameters, and the prior distribution of the parameters.
More formally,

p(θ, y) = p(y|θ)p(θ),   (6)

where p(θ, y) is the joint distribution of the parameters and the data. Following Bayes’ theorem described earlier, we obtain

p(θ|y) = p(θ, y)/p(y) = p(y|θ)p(θ)/p(y),   (7)

where p(θ|y) is referred to as the posterior distribution of the parameters θ given the observed data y.
From equation (7), the posterior distribution of θ given y is equal to the data distribution p(y|θ) times the prior distribution of the parameters p(θ), normalized by p(y) so that the posterior distribution sums (or integrates) to one.
For discrete variables,

p(y) = Σ_θ p(y|θ)p(θ),   (8)

and for continuous variables,

p(y) = ∫_θ p(y|θ)p(θ) dθ.   (9)
Notice that p(y) does not involve model parameters, so we can omit the term and obtain the unnormalized posterior distribution
p(θ|y) ∝ p(y|θ)p(θ).   (10)
When expressed in terms of the unknown parameters θ for fixed values of y, the term p(y|θ) is the likelihood L(θ|y), which we defined earlier. Thus, equation (10) can be re-written as
p(θ|y) ∝ L(θ|y)p(θ).   (11)
Equations (10) and (11) represent the core of Bayesian statistical inference and are what separate Bayesian statistics from frequentist statistics. Equation (11) states that our uncertainty regarding the parameters of our model, as expressed by the prior density p(θ), is weighted by the actual data p(y|θ) (or equivalently, L(θ|y)), yielding an updated estimate of our uncertainty, as expressed in the posterior density p(θ|y).
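As a minimal sketch of this updating, assume a made-up data set (7 successes in 10 Bernoulli trials) and a uniform prior on a proportion θ; the posterior can then be formed on a grid:

    # Grid approximation of equation (11): posterior ∝ likelihood × prior
    theta <- seq(0.001, 0.999, length.out = 999)       # grid over the parameter
    prior <- dunif(theta, 0, 1)                        # non-informative prior p(theta)
    like  <- dbinom(7, size = 10, prob = theta)        # likelihood L(theta | y)
    post  <- like * prior                              # unnormalized posterior, eq. (10)
    post  <- post / sum(post * (theta[2] - theta[1]))  # normalize to integrate to ~1
    plot(theta, post, type = "l", ylab = "Density")    # updated uncertainty about theta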
The Prior Distribution

Why do we specify a prior distribution on the parameters?
The key philosophical reason concerns our view that progress in science generally comes about by learning from previous research findings and incorporating information from these findings into our present studies. The information gleaned from previous research is almost always incorporated into our choice of designs, variables to be measured, or conceptual diagrams to be drawn. Bayesian statistical inference simply requires that our prior beliefs be made explicit, but then moderates our prior beliefs by the actual data in hand. Moderation of our prior beliefs by the data in hand is the key meaning behind equations (10) and (11).
But how do we choose a prior?
The choice of a prior is based on how much information we believe we have prior to the data collection and how accurate we believe that information to be. This issue has also been discussed by Leamer (1983), who orders priors on the basis of degree of confidence. Leamer’s hierarchy of confidence is as follows: truths (e.g. axioms) > facts (data) > opinions (e.g. expert judgement) > conventions (e.g. pre-set alpha levels).
The strength of Bayesian inference lies precisely in its ability to incorporate existing knowledge into statistical specifications.
Non-informative priors

In some cases we may not be in possession of enough prior information to aid in drawing posterior inferences.
From a Bayesian perspective, this lack of information is still important to consider and incorporate into our statistical specifications.
In other words, it is equally important to quantify our ignorance as it is to quantify our cumulative understanding of a problem at hand. The standard approach to quantifying our ignorance is to incorporate a non-informative prior into our specification. Non-informative priors are also referred to as vague or diffuse priors.
Perhaps the most sensible non-informative prior distribution to use in this case is the uniform distribution U(α, β) over some sensible range of values from α to β. In this case, the uniform distribution essentially indicates that we believe the value of our parameter of interest lies in the range [α, β] and that all values in that range have equal probability. Care must be taken in the choice of the range of values over the uniform distribution. For example, a U[−∞, ∞] prior is an improper prior distribution insofar as it does not integrate to 1.0, as required of probability distributions.
Informative-Conjugate Priors
It may be the case that some information can be brought to bear on a problem and be systematically incorporated into the prior distribution. Such “subjective” priors are called informative.
One type of informative prior is based on the notion of a conjugate distribution. A conjugate prior distribution is one that, when combined with the likelihood function, yields a posterior that is in the same distributional family as the prior distribution.
Conjugate Priors for Some Common Distributions

Data Distribution              Conjugate Prior
The normal distribution        The normal distribution or the uniform distribution
The Poisson distribution       The gamma distribution
The binomial distribution      The beta distribution
The multinomial distribution   The Dirichlet distribution
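To see what conjugacy buys us, take the binomial/beta pair from the table: a Beta(a, b) prior combined with y successes in n trials yields a Beta(a + y, b + n − y) posterior in closed form, with no numerical work required. A minimal sketch with made-up numbers:

    a <- 2; b <- 2   # assumed prior hyperparameters
    y <- 7; n <- 10  # assumed data: successes and trials
    curve(dbeta(x, a, b), from = 0, to = 1, lty = 2, ylab = "Density")  # prior
    curve(dbeta(x, a + y, b + n - y), add = TRUE)                       # posterior
    legend("topleft", legend = c("prior", "posterior"), lty = c(2, 1))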
[Figure: Normal distribution, mean unknown/variance known, with varying conjugate priors. Four panels overlay the prior, likelihood, and posterior densities for priors N(0, 0.3), N(0, 0.5), N(0, 1.2), and N(0, 3).]
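A brief sketch of how one such panel can be generated, using the conjugate normal update given later in equations (14)-(16); the data here are simulated, and the prior values (κ = 0, τ² = 0.5) are taken from one panel title:

    set.seed(123)
    sigma <- 1; n <- 10
    y     <- rnorm(n, mean = 1, sd = sigma)   # hypothetical sample
    kappa <- 0; tau2 <- 0.5                   # prior mean and variance
    post_var  <- 1 / (1 / tau2 + n / sigma^2)                       # eq. (16)
    post_mean <- post_var * (kappa / tau2 + n * mean(y) / sigma^2)  # eq. (15)
    curve(dnorm(x, kappa, sqrt(tau2)), -4, 4, lty = 2,
          ylim = c(0, 1.5), ylab = "Density")                       # prior
    curve(dnorm(x, mean(y), sigma / sqrt(n)), add = TRUE, lty = 3)  # likelihood, viewed as a density in mu
    curve(dnorm(x, post_mean, sqrt(post_var)), add = TRUE)          # posterior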
[Figure: Poisson distribution with varying gamma-density priors. Four panels overlay the prior, likelihood, and posterior for priors Gamma(10, 0.2), Gamma(8, 0.5), Gamma(3, 1), and Gamma(2.1, 3).]
Sample Size in Bayesian Statistics
In classical statistics, a recurring concern is whether a sample size is large enough to trust estimates and standard errors. This problem stems from the fact that the desirable properties of estimators such as OLS and ML exhibit themselves in the limit as N approaches infinity. The question is whether a finite sample size is large enough for these desirable properties to “kick in”.
Considerable methodological work focuses on examining the effects of small sample sizes on estimators in the context of different modeling frameworks.
What is the role of sample size in Bayesian methods?
Because appeals to asymptotic properties are not part of the Bayesian toolbox, can Bayesian statistics be used for small sample sizes?
The answer depends on the interaction of the sample size with the specification of the prior distribution. This, in turn, can be seen from a discussion of the Bayesian Central Limit Theorem.
Bayesian Central Limit Theorem and Bayesian Shrinkage
As an example, assume that y follows a normal distribution written as
p(y|µ, σ²) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²)).   (12)

Specify a normal prior with mean and variance hyperparameters κ and τ², respectively, which for this example are known:

p(µ|κ, τ²) = (1/√(2πτ²)) exp(−(µ − κ)²/(2τ²)).   (13)
The posterior distribution can be obtained as
p(µ|y) ∼ N( (κ/τ² + nȳ/σ²) / (1/τ² + n/σ²), τ²σ²/(σ² + nτ²) ).   (14)
The posterior distribution of µ is normal with mean
µ̂ = (κ/τ² + nȳ/σ²) / (1/τ² + n/σ²),   (15)

and variance

σ̂²_µ = τ²σ²/(σ² + nτ²).   (16)
Notice that as the sample size approaches infinity,
lim_{n→∞} µ̂ = lim_{n→∞} (κ/τ² + nȳ/σ²) / (1/τ² + n/σ²)
            = lim_{n→∞} (κσ²/(nτ²) + ȳ) / (σ²/(nτ²) + 1)
            = ȳ.   (17)
Thus, as the sample size increases to infinity, the expected a posteriori (EAP) estimate µ̂ converges to the maximum likelihood estimate ȳ. With N very large, there is little information in the prior distribution that is relevant to estimating the mean and variance of the posterior distribution.
In terms of the variance, let 1/τ² and n/σ² refer to the prior precision and data precision, respectively.
Letting n approach infinity, we obtain
lim_{n→∞} σ̂²_µ = lim_{n→∞} 1 / (1/τ² + n/σ²) = lim_{n→∞} σ² / (σ²/τ² + n) = σ²/n,   (18)
which we recognize as the maximum likelihood estimator of the variance of the mean; the square root of which yields the standard error of the mean. A similar result emerges if we consider the case where we have very little information regarding the prior precision.
Another interesting result is that the posterior mean µ̂ can be seen as a compromise between the prior mean κ and the observed data mean ȳ.
Notice that we can rewrite equation (15) as
µ̂ = [σ²/(σ² + nτ²)] κ + [nτ²/(σ² + nτ²)] ȳ.   (19)
Thus, the posterior mean is a weighted combination of the prior mean and observed data mean. These weights are bounded by 0 and 1 and together are referred to as the shrinkage factor.
The shrinkage factor represents the proportional distance that the posterior mean has shrunk back to the prior mean κ and away from the maximum likelihood estimator y¯.
If the sample size is large, the weight associated with κ will approach zero and the weight associated with ȳ will approach one. Thus µ̂ will approach ȳ.
Similarly, if the data variance σ² is very large relative to the prior variance τ², this suggests little precision in the data relative to the prior, and therefore the posterior mean will approach the prior mean κ. Conversely, if the prior variance is very large relative to the data variance, this suggests greater precision in the data compared to the prior, and therefore the posterior mean will approach ȳ.
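A small numeric sketch of equations (16) and (19), with assumed inputs, makes the shrinkage weight concrete:

    # Posterior mean, variance, and shrinkage weight for the normal-normal model
    posterior_mean_var <- function(ybar, n, sigma2, kappa, tau2) {
      w <- sigma2 / (sigma2 + n * tau2)                      # weight on the prior mean
      list(mean      = w * kappa + (1 - w) * ybar,           # equation (19)
           var       = tau2 * sigma2 / (sigma2 + n * tau2),  # equation (16)
           shrinkage = w)
    }
    posterior_mean_var(ybar = 2, n = 5,    sigma2 = 4, kappa = 0, tau2 = 1)  # small n: shrinks toward kappa
    posterior_mean_var(ybar = 2, n = 5000, sigma2 = 4, kappa = 0, tau2 = 1)  # large n: approaches ybar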
So, can Bayesian methods be used for very small sample sizes?
Yes, but the role of priors becomes crucial. If the sample size is small, the precision of the priors and the precision of the data matter. If the prior mean is way off and specified with high precision, that is a problem. Elicitation and model comparison are very important. There is no free lunch!
Markov Chain Monte Carlo Sampling
The key reason for the increased popularity of Bayesian methods in the social and behavioral sciences has been the (re)discovery of numerical algorithms for estimating the posterior distribution of the model parameters given the data.

Prior to these developments, it was virtually impossible to analytically derive summary measures of the posterior distribution, particularly for complex models with many parameters.

Rather than attempting the impossible task of analytically solving for estimates of a complex posterior distribution, we can instead draw samples from p(θ|y) and summarize the distribution formed by those samples. This is referred to as Monte Carlo integration. The two most popular methods of MCMC are the Gibbs sampler and the Metropolis-Hastings algorithm.
Formally, a Markov chain is a sequence of dependent random variables {θ_s},

θ_0, θ_1, …, θ_s, …   (20)

such that the conditional probability of θ_s given all of the past variables depends only on θ_{s−1}.
A property of the Markov chain is that after a long sequence, the chain will “forget” its initial state θ_0 and converge to the stationary distribution p(θ|y). The number of iterations prior to stability is referred to as the “burn-in” samples. Let m be the number of burn-in samples. Then the ergodic average of the posterior distribution is given as

p̄(θ|y) = (1/(T − m)) Σ_{t=m+1}^{T} p(θ_t|y).   (21)
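A minimal random-walk Metropolis sketch (a special case of Metropolis-Hastings), with simulated data and assumed settings, showing the burn-in being discarded before the ergodic average is formed:

    set.seed(1)
    y <- rnorm(20, mean = 1.5, sd = 1)  # hypothetical data, sigma = 1 known
    # Log of the unnormalized posterior, eq. (10): likelihood × vague normal prior
    log_post <- function(mu) {
      sum(dnorm(y, mu, 1, log = TRUE)) + dnorm(mu, 0, sqrt(10), log = TRUE)
    }
    n_iter <- 25000; burnin <- 5000     # total iterations T and burn-in m
    draws  <- numeric(n_iter); mu <- 0  # initial state theta_0
    for (t in 1:n_iter) {
      prop <- mu + rnorm(1, sd = 0.5)   # symmetric random-walk proposal
      if (log(runif(1)) < log_post(prop) - log_post(mu)) mu <- prop
      draws[t] <- mu
    }
    mean(draws[(burnin + 1):n_iter])    # ergodic average over post-burn-in draws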
Point Summaries of the Posterior Distribution
Hypothesis testing begins first by obtaining summaries of relevant distributions.
The difference between Bayesian and frequentist statistics is that with Bayesian statistics we wish to obtain summaries of the posterior distribution. The expressions for the mean and variance of the posterior distribution follow from the general expressions for the mean and variance of conditional distributions. Another common summary measure is the mode of the posterior distribution – referred to as the maximum a posteriori (MAP) estimate.
Posterior Probability Intervals
In addition to point summary measures, it may also be desirable to provide interval summaries of the posterior distribution.
Recall that the frequentist confidence interval requires that we imagine an infinite number of repeated samples from the population characterized by µ. For any given sample, we can obtain the sample mean x̄ and then form a 100(1 − α)% confidence interval. The correct frequentist interpretation is that 100(1 − α)% of the confidence intervals formed this way capture the true parameter µ under the null hypothesis. Notice that the probability that the parameter is in the interval is either zero or one.
Posterior Probability Intervals (cont’d)
In contrast, the Bayesian framework assumes that a parameter has a probability distribution. Sampling from the posterior distribution of the model parameters, we can obtain its quantiles. From the quantiles, we can directly obtain the probability that a parameter lies within a particular interval.
So, a 95% posterior probability interval would mean that the probability that the parameter lies in the interval is 0.95.
Notice that this is entirely different from the frequentist interpretation, and arguably aligns with common sense.
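A sketch in R of these summaries, applied to a stand-in vector of posterior draws (in practice these would be the retained MCMC samples):

    set.seed(2)
    draws <- rnorm(15000, mean = 0.8, sd = 0.1)  # stand-in posterior sample
    mean(draws)                                  # EAP (posterior mean)
    median(draws)                                # posterior median
    d <- density(draws)
    d$x[which.max(d$y)]                          # approximate MAP (posterior mode)
    quantile(draws, c(0.025, 0.975))             # 95% posterior probability interval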
Bayesian Factor Analysis
We write the confirmatory factor analysis (CFA) model as

y = α + Λη + ε.   (22)
Under conventional assumptions we obtain the model expressed in terms of the population covariance matrix Σ as
Σ = ΛΦΛ′ + Ψ.   (23)
The distinction between the CFA model in equation (22) and exploratory factor analysis typically lies in the number and location of restrictions placed on the factor loading matrix Λ.
Conjugate Priors for Factor Analysis Parameters

Let θ_norm = {α, Λ} be the set of free model parameters that are assumed to follow a normal distribution, and let θ_IW = {Φ, Ψ} be the set of free model parameters that are assumed to follow an inverse-Wishart distribution. Thus,

θ_norm ∼ N(µ, Ω).   (24)

The uniqueness covariance matrix Ψ is assumed to follow an inverse-Wishart distribution. Specifically,

θ_IW ∼ IW(R, δ).   (25)

Different choices for R and δ will yield different degrees of “informativeness” for the inverse-Wishart distribution.
Example
We present a small example of Bayesian factor analysis to illustrate the issue of sample size and precise priors. Data come from a sample of 3500 10th grade students who participated in the National Educational Longitudinal Study of 1988. Students were asked to respond to a series of questions that tap into their perceptions of the climate of the school.
A random sample of 100 respondents was drawn to demonstrate Bayesian CFA for small sample sizes. Smaller sample sizes are possible, but in this case severe convergence problems were encountered.
A subset of items was chosen:

1. GETALONG: Students get along well with teachers
2. TCHGOOD: The teaching is good
3. TCHINT: Teachers are interested in students
4. TCHPRAIS: When I work hard on schoolwork, my teachers praise my effort
5. TCHDOWN: In class I often feel “put down” by my teachers
6. STRICT: Rules for behavior are strict
7. STUDOWN: In school I often feel “put down” by other students
8. NOTSAFE: I don’t feel safe at this school
A 4-category Likert response scale was used, ranging from “strongly agree” to “strongly disagree”. Exploratory factor analyses suggested a 2-factor solution.
We use MCMCfactanal from the MCMCpack package within the R programming environment.
We specify 5000 burn-in iterations and 20,000 post-burn-in iterations and a thinning interval of 20.
Summary statistics are based on 15,000 draws from the posterior distribution. Using a thinning interval of 20, the trace and ACF plots indicate convergence for the model parameters for the large sample size case. Some convergence problems were noted for the small sample size case.
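A sketch of the kind of call involved, following the MCMCpack interface; the identification constraints and the data-frame name (nels) below are illustrative assumptions, not the exact specification used for these analyses:

    library(MCMCpack)
    post <- MCMCfactanal(~ GETALONG + TCHGOOD + TCHINT + TCHPRAIS +
                           TCHDOWN + STRICT + STUDOWN + NOTSAFE,
                         factors = 2,
                         lambda.constraints = list(GETALONG = c(2, 0),   # assumed: loads on factor 1 only
                                                   TCHDOWN  = c(1, 0)),  # assumed: loads on factor 2 only
                         data = nels,                                    # hypothetical data frame
                         burnin = 5000, mcmc = 20000, thin = 20,
                         verbose = 0, std.var = TRUE)
    summary(post)  # EAPs, SDs, and quantile-based PPIs
    plot(post)     # trace and density plots for convergence checks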
Table: Results of Bayesian CFA: N=3500 and Noninformative Priors.

Parameter               EAP     SD     95% PPI
Loadings: POSCLIM by
  TCHGOOD               0.81    0.02   0.76, 0.85
  TCHERINT              0.89    0.02   0.85, 0.93
  TCHPRAIS              0.59    0.02   0.55, 0.63
Loadings: NEGCLIM by
  STRICT                0.09    0.01   0.06, 0.13
  SPUTDOWN              0.32    0.01   0.28, 0.35
  NOTSAFE               0.30    0.02   0.26, 0.35
Table: Results of Bayesian CFA: N=100 and Noninformative Priors.

Parameter               EAP     SD     95% PPI
Loadings: POSCLIM by
  TCHGOOD               0.61    0.14   0.44, 0.98
  TCHERINT              1.01    0.14   0.75, 1.30
  TCHPRAIS              0.67    0.14   0.42, 0.94
Loadings: NEGCLIM by
  STRICT                0.07    0.06   0.00, 0.22
  SPUTDOWN              0.27    0.11   0.05, 0.51
  NOTSAFE               0.33    0.11   0.14, 0.56
Table: Results of Bayesian CFA: N=3500 and Informative Priors.

Parameter               EAP     SD     95% PPI
Loadings: POSCLIM by
  TCHGOOD               0.81    0.02   0.77, 0.86
  TCHERINT              0.90    0.02   0.86, 0.93
  TCHPRAIS              0.61    0.02   0.56, 0.65
Loadings: NEGCLIM by
  STRICT                0.11    0.02   0.08, 0.15
  SPUTDOWN              0.33    0.02   0.30, 0.38
  NOTSAFE               0.32    0.02   0.28, 0.36
Table: Results of Bayesian CFA: N=100 and Informative Priors.

Parameter               EAP     SD     95% PPI
Loadings: POSCLIM by
  TCHGOOD               0.87    0.10   0.69, 1.06
  TCHERINT              1.00    0.10   0.81, 1.20
  TCHPRAIS              0.87    0.10   0.68, 1.05
Loadings: NEGCLIM by
  STRICT                0.59    0.12   0.37, 0.83
  SPUTDOWN              0.82    0.10   0.61, 1.03
  NOTSAFE               0.84    0.10   0.65, 1.04
Wrap-Up: Some Philosophical Issues
Bayesian statistics represents a powerful alternative to frequentist (classical) statistics and is, therefore, controversial.
The controversy lies in differing perspectives regarding the nature of probability, and the implications for statistical practice that arise from those perspectives. The frequentist framework views probability as synonymous with long-run frequency, with the infinitely repeating coin toss as the canonical example. In contrast, the Bayesian viewpoint regarding probability was, perhaps, most succinctly expressed by de Finetti:
“Probability does not exist.” – Bruno de Finetti
That is, probability does not have an objective status, but rather represents the quantification of our experience of uncertainty.
For de Finetti, probability is only to be considered in relation to our subjective experience of uncertainty, and, for de Finetti, uncertainty is all that matters.
“The only relevant thing is uncertainty – the extent of our own knowledge and ignorance. The actual fact that events considered are, in some sense, determined, or known by other people, and so on, is of no consequence.” (pg. xi)
The only requirement then is that our beliefs be coherent, consistent, and have a reasonable relationship to any observable data that might be collected.
Subjective v. Objective Bayes
There are controversies within the Bayesian school between “subjectivists” and “objectivists”.
Subjective Bayesian practice attempts to bring prior knowledge directly into an analysis. This prior knowledge represents the analyst’s (or others’) degree of uncertainty. An analyst’s degree of uncertainty is encoded directly into the specification of the prior distribution, and in particular into the degree of precision around the parameter of interest.
The advantages include:
1. Priors can be based on factual prior knowledge.
2. Small sample sizes can be handled.
For objectivists, the goal is to have the data speak as much as possible, but to allow priors that serve as objective “referents”.
Specifically, there is a large class of so-called reference priors (Kass and Wasserman, 1996).
An important viewpoint regarding the notion of objectivity in the Bayesian context comes from Jaynes (1968).
For Jaynes, the “personalistic” (subjective) school of probability is to be reserved for
“...the field of psychology and has no place in applied statistics. Or, to state this more constructively, objectivity requires that a statistical analysis should make use, not of anybody’s personal opinions, but rather the specific factual data on which those opinions are based.” (pg. 228)
Evidence-based Subjective Bayes
The subjectivist school, advocated by de Finetti and others, allows for personal opinion to be elicited and incorporated into a Bayesian analysis. In the extreme, the subjectivist school would place no restriction on the source, reliability, or validity of the elicited opinion.
The objectivist school advocated by Jeffreys, Jaynes, Berger, Bernardo, and others, views personal opinion as the realm of psychology with no place in a statistical analysis. In their extreme form, the objectivist school would require formal rules for choosing reference priors.
The difficulty with these positions lies with the everyday usage of terms such as “subjective” and “belief”. Without careful definitions of these terms, their everyday usage might be misunderstood among those who might otherwise consider adopting the Bayesian perspective.
“Subjectivism” within the Bayesian framework runs the gamut from the elicitation of personal beliefs to making use of the best available historical data to inform priors. I argue along the lines of Jaynes (1968) – namely, that the requirements of science demand reference to “specific, factual data on which those opinions are based” (pg. 228). This view is also consistent with Leamer’s hierarchy of confidence on which priors should be ordered.
We may refer to this view as an evidence-based form of subjective Bayes which acknowledges (1) the subjectivity that lies in the choice of historical data; (2) the encoding of historical data into hyperparameters of the prior distribution; and (3) the choice among competing models to be used to analyze the data.
What if factual historical data are not available?
Berger (2006) states that reference priors should be used “in scenarios in which a subjective analysis is not tenable”, although such scenarios are probably rare.
The goal, nevertheless, is to shift the practice of Bayesian statistics away from the elicitation of personal opinion (expert or otherwise), which could, in principle, bias results toward a specific outcome, and instead move Bayesian practice toward the warranted use of prior objective empirical data for the specification of priors. The specification of any prior should be explicitly warranted against observable, empirical data and available for critique by the relevant scholarly community.
To conclude, the Bayesian school of statistical inference is, arguably, superior to the frequentist school as a means of creating and updating new knowledge in the social sciences. An evidence-based focus that ties the specification of priors to objective empirical data provides stronger warrants for conclusions drawn from a Bayesian analysis. In addition, predictive criteria should always be used as a means of testing and choosing among Bayesian models. As always, the full benefit of the Bayesian approach to research in the social sciences will be realized when it is more widely adopted and yields reliable predictions that advance knowledge.
GRAZIE MILLE (Thank you very much!)