Bayesian Inference in Survey Research: Applications to Confirmatory Factor Analysis with Small Sample Sizes


David Kaplan, Department of Educational Psychology

Invited Talk to the Joint Research Centre of the European Commission

This talk is drawn from my book Bayesian Statistics for the Social Sciences, The Guilford Press, 2014.

Introduction

Bayesian statistics has long been overlooked in the quantitative methods training of social scientists.

Typically, the only introduction that a student might have to Bayesian ideas is a brief overview of Bayes’ theorem while studying probability in an introductory statistics class.

1. Until recently, it was not feasible to conduct statistical modeling from a Bayesian perspective, owing to its complexity and the lack of available software.

2. Bayesian statistics represents a powerful alternative to frequentist (classical) statistics and is therefore controversial.

Recently, there has been a renaissance in the development and application of Bayesian statistical methods, owing mostly to powerful statistical software tools that render the specification and estimation of complex models feasible from a Bayesian perspective.

Paradigm Differences


For frequentists, the basic idea is that probability is represented as long-run frequency. Frequentist probability underlies the Fisher and Neyman-Pearson schools of statistics, the conventional methods of statistics we most often use. The frequentist formulation rests on the idea of equally probable and stochastically independent events.


The physical representation is the coin toss, which relates to the idea of a very large (actually infinite) number of repeated experiments.


The entire structure of Neyman-Pearson hypothesis testing and Fisherian statistics (together referred to as the frequentist school) is based on frequentist probability. Our conclusions regarding the null and alternative hypotheses presuppose the idea that we could conduct the same experiment an infinite number of times. Our interpretation of confidence intervals also assumes a fixed parameter, with confidence intervals varying over an infinitely large number of identical experiments.

But there is another view of probability as subjective belief. The physical model in this case is that of the “bet”.

Consider the situation of betting on who will win the World Cup (or the World Series). Here, probability is not based on an infinite number of repeatable and stochastically independent events, but rather on how much knowledge you have and how much you are willing to bet. Subjective probability allows one to address questions such as “what is the probability that my team will win the World Cup?” Relative frequency supplies information, but it is not the same as probability and can be quite different. This notion of subjective probability underlies Bayesian statistics.


Bayes’ Theorem


Consider the joint probability of two events, Y and X; for example, observing lung cancer and smoking jointly. The joint probability can be written as

p(\text{cancer}, \text{smoking}) = p(\text{cancer} \mid \text{smoking})\, p(\text{smoking}). \quad (1)

Similarly,

p(\text{smoking}, \text{cancer}) = p(\text{smoking} \mid \text{cancer})\, p(\text{cancer}). \quad (2)

Because these are symmetric, we can set them equal to each other to obtain

p(\text{cancer} \mid \text{smoking})\, p(\text{smoking}) = p(\text{smoking} \mid \text{cancer})\, p(\text{cancer}), \quad (3)

and therefore

p(\text{cancer} \mid \text{smoking}) = \frac{p(\text{smoking} \mid \text{cancer})\, p(\text{cancer})}{p(\text{smoking})}. \quad (4)

The inverse probability theorem (Bayes’ theorem) states

p(\text{smoking} \mid \text{cancer}) = \frac{p(\text{cancer} \mid \text{smoking})\, p(\text{smoking})}{p(\text{cancer})}. \quad (5)

Why do we care?

Because this is how you can go from the probability of having cancer given that the patient smokes to the probability that the patient smokes given that he or she has cancer. We simply need the marginal probability of smoking and the marginal probability of cancer (“base rates”, or what we will call prior probabilities).
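As a minimal numerical sketch of equation (5), with hypothetical base rates chosen only for illustration:

```r
# Hypothetical illustration of Bayes' theorem (equation 5).
# All probabilities below are made-up values for demonstration only.
p_smoking        <- 0.20   # marginal ("base rate") probability of smoking
p_cancer         <- 0.02   # marginal probability of lung cancer
p_cancer_smoking <- 0.08   # p(cancer | smoking)

# Inverse probability: p(smoking | cancer)
p_smoking_cancer <- p_cancer_smoking * p_smoking / p_cancer
p_smoking_cancer   # 0.8
```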

Statistical Elements of Bayes’ Theorem

What is the role of Bayes’ theorem for statistical inference?

Denote by Y a random variable that takes on a realized value y. For example, a person’s socioeconomic status could be considered a random variable taking on a very large set of possible values. This is the random variable Y. Once the person identifies his or her socioeconomic status, the random variable Y is realized as y.


Because Y is unobserved and random, we need to specify a probability model to explain how we obtained the actual data values y.

Next, denote by θ a parameter that we believe characterizes the probability model of interest.

The parameter θ can be a scalar, such as the mean or the variance of a distribution, or it can be vector-valued, such as a set of regression coefficients in regression analysis or factor loadings in factor analysis. We are concerned with determining the probability of observing y given the unknown parameters θ, which we write as p(y|θ). In statistical inference, the goal is to obtain estimates of the unknown parameters given the data. This is expressed as the likelihood of the parameters given the data, often denoted as L(θ|y).


The key difference between Bayesian statistical inference and frequentist statistical inference concerns the nature of the unknown parameters θ.


In the frequentist tradition, the assumption is that θ is unknown but fixed; no attempt is made to account for our uncertainty about θ.


In Bayesian statistical inference, θ is also unknown but we reflect our uncertainty about the true value of θ by specifying a probability distribution to describe it. Because both the observed data y and the parameters θ are assumed random, we can model the joint probability of the parameters and the data as a function of the conditional distribution of the data given the parameters, and the prior distribution of the parameters.

More formally,

p(\theta, y) = p(y \mid \theta)\, p(\theta), \quad (6)

where p(θ, y) is the joint distribution of the parameters and the data. Following Bayes’ theorem described earlier, we obtain

p(\theta \mid y) = \frac{p(\theta, y)}{p(y)} = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}, \quad (7)

where p(θ|y) is referred to as the posterior distribution of the parameters θ given the observed data y.


From equation (7), the posterior distribution of θ given y is equal to the data distribution p(y|θ) times the prior distribution of the parameters p(θ), normalized by p(y) so that the posterior distribution sums (or integrates) to one.


For discrete variables,

p(y) = \sum_{\theta} p(y \mid \theta)\, p(\theta), \quad (8)

and for continuous variables,

p(y) = \int_{\theta} p(y \mid \theta)\, p(\theta)\, d\theta. \quad (9)

Notice that p(y) does not involve model parameters, so we can omit the term and obtain the unnormalized posterior distribution

p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta). \quad (10)

When expressed in terms of the unknown parameters θ for fixed values of y, the term p(y|θ) is the likelihood L(θ|y), which we defined earlier. Thus, equation (10) can be rewritten as

p(\theta \mid y) \propto L(\theta \mid y)\, p(\theta). \quad (11)

Equations (10) and (11) represent the core of Bayesian statistical inference and are what separate Bayesian statistics from frequentist statistics. Equation (11) states that our uncertainty regarding the parameters of our model, as expressed by the prior density p(θ), is weighted by the actual data p(y|θ) (or equivalently, L(θ|y)), yielding an updated estimate of our uncertainty, as expressed in the posterior density p(θ|y).
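To make equation (11) concrete, here is a small grid-approximation sketch in R (the data and prior below are hypothetical): the prior is multiplied by the likelihood pointwise and renormalized.

```r
# Grid approximation of equation (11): posterior proportional to likelihood x prior.
# Hypothetical setup: normal likelihood for a mean, normal prior.
theta <- seq(-4, 4, length.out = 1001)        # grid of candidate parameter values
y     <- c(0.8, 1.2, 0.5, 1.0)                # made-up data

prior      <- dnorm(theta, mean = 0, sd = 1)  # p(theta)
likelihood <- sapply(theta, function(t) prod(dnorm(y, mean = t, sd = 1)))
posterior  <- likelihood * prior
posterior  <- posterior / sum(posterior)      # normalize so it sums to one

theta[which.max(posterior)]                   # posterior mode (MAP)
sum(theta * posterior)                        # posterior mean
```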

The Prior Distribution

Why do we specify a prior distribution on the parameters?

The key philosophical reason concerns our view that progress in science generally comes about by learning from previous research findings and incorporating information from these findings into our present studies. The information gleaned from previous research is almost always incorporated into our choice of designs, variables to be measured, or conceptual diagrams to be drawn. Bayesian statistical inference simply requires that our prior beliefs be made explicit, but then moderates our prior beliefs by the actual data in hand. Moderation of our prior beliefs by the data in hand is the key meaning behind equations (10) and (11).


But how do we choose a prior?

The choice of a prior is based on how much information we believe we have prior to data collection and how accurate we believe that information to be. This issue has also been discussed by Leamer (1983), who orders priors on the basis of degree of confidence. Leamer’s hierarchy of confidence is as follows: truths (e.g., axioms) > facts (data) > opinions (e.g., expert judgment) > conventions (e.g., pre-set alpha levels).


The strength of Bayesian inference lies precisely in its ability to incorporate existing knowledge into statistical specifications.

Non-informative Priors

In some cases we may not be in possession of enough prior information to aid in drawing posterior inferences.

From a Bayesian perspective, this lack of information is still important to consider and incorporate into our statistical specifications.


In other words, it is equally important to quantify our ignorance as it is to quantify our cumulative understanding of a problem at hand. The standard approach to quantifying our ignorance is to incorporate a non-informative prior into our specification. Non-informative priors are also referred to as vague or diffuse priors.


Perhaps the most sensible non-informative prior distribution to use in this case is the uniform distribution U(α, β) over some sensible range of values from α to β. The uniform distribution essentially indicates that we believe the value of our parameter of interest lies in the range α to β and that all values in that range have equal probability. Care must be taken in the choice of the range of values of the uniform distribution. For example, U(−∞, ∞) is an improper prior distribution insofar as it does not integrate to 1.0, as required of probability distributions.

Informative-Conjugate Priors


It may be the case that some information can be brought to bear on a problem and be systematically incorporated into the prior distribution. Such “subjective” priors are called informative.


One type of informative prior is based on the notion of a conjugate distribution. A conjugate prior distribution is one that, when combined with the likelihood function, yields a posterior that is in the same distributional family as the prior distribution.

Conjugate Priors for Some Common Distributions

Data Distribution              Conjugate Prior
The normal distribution        The normal distribution or the uniform distribution
The Poisson distribution       The gamma distribution
The binomial distribution      The beta distribution
The multinomial distribution   The Dirichlet distribution
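As a sketch of conjugacy with hypothetical numbers: a binomial likelihood combined with a Beta(a, b) prior yields a Beta(a + successes, b + failures) posterior, so the update is available in closed form.

```r
# Beta-binomial conjugacy: Beta prior + binomial likelihood -> Beta posterior.
# Hypothetical data: 7 successes in 10 trials.
a <- 2; b <- 2                   # Beta(2, 2) prior, mildly informative around 0.5
successes <- 7; trials <- 10

a_post <- a + successes          # posterior shape parameters
b_post <- b + (trials - successes)

a_post / (a_post + b_post)                 # posterior mean: 9/14, about 0.643
qbeta(c(0.025, 0.975), a_post, b_post)     # central 95% posterior interval
```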

[Figure: Normal distribution, mean unknown/variance known, with varying conjugate priors. Four panels plot the prior, likelihood, and posterior densities for normal priors N(0, 0.3), N(0, 0.5), N(0, 1.2), and N(0, 3).]

[Figure: Poisson distribution with varying gamma-density priors. Four panels plot the prior, likelihood, and posterior densities for gamma priors Gamma(10, 0.2), Gamma(8, 0.5), Gamma(3, 1), and Gamma(2.1, 3).]

Sample Size in Bayesian Statistics


In classical statistics, a recurring concern is whether the sample size is large enough to trust estimates and standard errors. This concern stems from the fact that the desirable properties of estimators such as OLS and ML exhibit themselves in the limit, as N approaches infinity. The question is whether a finite sample size is large enough for these desirable properties to “kick in”.


Considerable methodological work focuses on examining the effects of small sample sizes on estimators in the context of different modeling frameworks.


What is the role of sample size in Bayesian methods?


Because appeals to asymptotic properties are not part of the Bayesian toolbox, can Bayesian statistics be used for small sample sizes?


The answer depends on the interaction of the sample size with the specification of the prior distribution. This, in turn, can be seen from a discussion of the Bayesian Central Limit Theorem.

Bayesian Central Limit Theorem and Bayesian Shrinkage

As an example, assume that y follows a normal distribution written as

p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right). \quad (12)

Specify a normal prior for µ with mean and variance hyperparameters κ and τ², respectively, which for this example are known:

p(\mu \mid \kappa, \tau^2) = \frac{1}{\sqrt{2\pi\tau^2}} \exp\left(-\frac{(\mu-\kappa)^2}{2\tau^2}\right). \quad (13)

The posterior distribution can be obtained as

p(\mu \mid y) \sim N\left( \frac{\frac{\kappa}{\tau^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}},\; \frac{\tau^2\sigma^2}{\sigma^2 + n\tau^2} \right). \quad (14)

The posterior distribution of µ is normal with mean

\hat{\mu} = \frac{\frac{\kappa}{\tau^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}}, \quad (15)

and variance

\hat{\sigma}^2_{\mu} = \frac{\tau^2\sigma^2}{\sigma^2 + n\tau^2}. \quad (16)

Notice that as the sample size approaches infinity,

\lim_{n\to\infty} \hat{\mu} = \lim_{n\to\infty} \frac{\frac{\kappa}{\tau^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} = \lim_{n\to\infty} \frac{\frac{\kappa\sigma^2}{n\tau^2} + \bar{y}}{\frac{\sigma^2}{n\tau^2} + 1} = \bar{y}. \quad (17)

Thus, as the sample size increases to infinity, the expected a posteriori (EAP) estimate µ̂ converges to the maximum likelihood estimate ȳ. With N very large, there is little information in the prior distribution that is relevant to estimating the mean and variance of the posterior distribution.

In terms of the variance, let 1/τ² and n/σ² refer to the prior precision and data precision, respectively.

Letting n approach infinity, we obtain

\lim_{n\to\infty} \hat{\sigma}^2_{\mu} = \lim_{n\to\infty} \frac{1}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} = \lim_{n\to\infty} \frac{\sigma^2}{\frac{\sigma^2}{\tau^2} + n} = \frac{\sigma^2}{n}, \quad (18)

which we recognize as the maximum likelihood estimator of the variance of the mean; the square root of which yields the standard error of the mean. A similar result emerges if we consider the case where we have very little information regarding the prior precision.


Another interesting result is that the posterior mean µ̂ can be seen as a compromise between the prior mean κ and the observed data mean ȳ.


Notice that we can rewrite equation (15) as

\hat{\mu} = \frac{\sigma^2}{\sigma^2 + n\tau^2}\,\kappa + \frac{n\tau^2}{\sigma^2 + n\tau^2}\,\bar{y}. \quad (19)

Thus, the posterior mean is a weighted combination of the prior mean and observed data mean. These weights are bounded by 0 and 1 and together are referred to as the shrinkage factor.


The shrinkage factor represents the proportional distance that the posterior mean has shrunk back to the prior mean κ and away from the maximum likelihood estimator ȳ.


If the sample size is large, the weight associated with κ will approach zero and the weight associated with ȳ will approach one. Thus µ̂ will approach ȳ.


Similarly, if the data variance σ² is very large relative to the prior variance τ², this suggests little precision in the data relative to the prior, and therefore the posterior mean will approach the prior mean κ. Conversely, if the prior variance is very large relative to the data variance, this suggests greater precision in the data compared to the prior, and therefore the posterior mean will approach ȳ.
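A small base-R sketch of equations (15), (16), and (19), with hypothetical inputs, makes this shrinkage behavior concrete:

```r
# Posterior mean and variance for a normal mean with known data variance
# (equations 15-16), written in the shrinkage form of equation (19).
posterior_mu <- function(n, ybar, sigma2, kappa, tau2) {
  w_prior <- sigma2 / (sigma2 + n * tau2)    # weight on the prior mean
  w_data  <- n * tau2 / (sigma2 + n * tau2)  # weight on the sample mean
  c(mean      = w_prior * kappa + w_data * ybar,
    variance  = tau2 * sigma2 / (sigma2 + n * tau2),
    shrinkage = w_prior)
}

# Hypothetical values: prior mean 0, prior variance 1; sample mean 2, data variance 4.
posterior_mu(n = 5,    ybar = 2, sigma2 = 4, kappa = 0, tau2 = 1)  # prior still matters
posterior_mu(n = 5000, ybar = 2, sigma2 = 4, kappa = 0, tau2 = 1)  # essentially ybar
```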


So, can Bayesian methods be used for very small sample sizes?


Yes, but the role of priors becomes crucial. If the sample size is small, the precision of the prior and the precision of the data matter. If the prior mean is far off and is given high precision, that is a problem. Elicitation and model comparison are very important. There is no free lunch!

Markov Chain Monte Carlo Sampling


The key reason for the increased popularity of Bayesian methods in the social and behavioral sciences has been the (re)-discovery of numerical algorithms for estimating the posterior distribution of the model parameters given the data. Prior to these developments, it was virtually impossible to analytically derive summary measures of the posterior distribution, particularly for complex models with many parameters. Rather than attempting the impossible task of analytically solving for estimates of a complex posterior distribution, we can instead draw samples from p(θ|y) and summarize the distribution formed by those samples. This is referred to as Monte Carlo integration. The two most popular methods of MCMC are the Gibbs sampler and the Metropolis-Hastings algorithm.


Formally, a Markov chain is a sequence of dependent random variables {θ^(s)},

\theta^{(0)}, \theta^{(1)}, \ldots, \theta^{(s)}, \ldots \quad (20)

such that the conditional probability of θ^(s) given all of the past variables depends only on θ^(s−1).

A property of the Markov chain is that after a long sequence, the chain will “forget” its initial state θ^(0) and converge to the stationary distribution p(θ|y). The number of iterations prior to stability is referred to as the “burn-in” samples. Let m be the number of burn-in samples and T the total number of iterations. Then the ergodic average of the posterior distribution is given as

\bar{p}(\theta \mid y) = \frac{1}{T - m} \sum_{t=m+1}^{T} p(\theta^{(t)} \mid y). \quad (21)
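As a minimal illustration of the idea, here is a random-walk Metropolis sampler for the posterior of a normal mean, written in base R with made-up data. This sketches the algorithm itself; it is not the sampler used in the factor analysis example later.

```r
# Random-walk Metropolis sampler for p(mu | y) under a normal likelihood
# (known sigma) and a normal prior -- a minimal sketch with made-up data.
set.seed(123)
y     <- rnorm(25, mean = 1.5, sd = 2)   # hypothetical data
sigma <- 2                                # treated as known
kappa <- 0; tau <- 1                      # prior hyperparameters

log_post <- function(mu) {
  sum(dnorm(y, mu, sigma, log = TRUE)) + dnorm(mu, kappa, tau, log = TRUE)
}

S     <- 25000
draws <- numeric(S)
mu    <- 0                                # initial state theta^(0)
for (s in 1:S) {
  proposal <- mu + rnorm(1, 0, 0.5)       # symmetric random-walk proposal
  if (log(runif(1)) < log_post(proposal) - log_post(mu)) mu <- proposal
  draws[s] <- mu
}

burnin <- 5000                            # discard burn-in samples
mean(draws[-(1:burnin)])                  # ergodic average for mu
```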

Point Summaries of the Posterior Distribution


Hypothesis testing begins by obtaining summaries of relevant distributions.


The difference between Bayesian and frequentist statistics is that with Bayesian statistics we wish to obtain summaries of the posterior distribution. The expressions for the mean and variance of the posterior distribution come from expressions for the mean and variance of conditional distributions generally. Another common summary measure would be the mode of the posterior distribution – referred to as the maximum a posteriori (MAP) estimate.

Posterior Probability Intervals

In addition to point summary measures, it may also be desirable to provide interval summaries of the posterior distribution.


Recall that the frequentist confidence interval requires that we imagine an infinite number of repeated samples from the population characterized by µ. For any given sample, we can obtain the sample mean x̄ and then form a 100(1 − α)% confidence interval. The correct frequentist interpretation is that 100(1 − α)% of the confidence intervals formed this way capture the true parameter µ under the null hypothesis. Notice that the probability that the parameter is in any given interval is either zero or one.


In contrast, the Bayesian framework assumes that a parameter has a probability distribution. Sampling from the posterior distribution of the model parameters, we can obtain its quantiles. From the quantiles, we can directly obtain the probability that a parameter lies within a particular interval.


So, a 95% posterior probability interval would mean that the probability that the parameter lies in the interval is 0.95.


Notice that this is entirely different from the frequentist interpretation, and arguably aligns with common sense.
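Given draws from the posterior, point and interval summaries are direct to compute. A sketch reusing the hypothetical `draws` and `burnin` from the Metropolis example above:

```r
# Point and interval summaries from posterior draws (hypothetical vector
# `draws`, produced by the Metropolis sketch above, after burn-in).
post <- draws[-(1:burnin)]

mean(post)                          # EAP: expected a posteriori estimate
median(post)                        # posterior median
quantile(post, c(0.025, 0.975))     # 95% posterior probability interval

# Direct probability statements are also available, e.g. p(mu > 1 | y):
mean(post > 1)
```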

Bayesian Factor Analysis

We write the confirmatory factor analysis (CFA) model as

y = \alpha + \Lambda\eta + \epsilon, \quad (22)

where y is a vector of observed indicators, α is a vector of measurement intercepts, Λ is a matrix of factor loadings, η is a vector of latent factors, and ε is a vector of uniquenesses. Under conventional assumptions we obtain the model expressed in terms of the population covariance matrix Σ as

\Sigma = \Lambda\Phi\Lambda' + \Psi, \quad (23)

where Φ is the covariance matrix of the factors and Ψ is the covariance matrix of the uniquenesses. The distinction between the CFA model in equation (22) and exploratory factor analysis typically lies in the number and location of restrictions placed on the factor loading matrix Λ.

Conjugate Priors for Factor Analysis Parameters

Let θ_norm = {α, Λ} be the set of free model parameters that are assumed to follow a normal distribution, and let θ_IW = {Φ, Ψ} be the set of free model parameters that are assumed to follow an inverse-Wishart distribution. Thus,

\theta_{\text{norm}} \sim N(\mu, \Omega). \quad (24)

The uniqueness covariance matrix Ψ is assumed to follow an inverse-Wishart distribution. Specifically,

\theta_{\text{IW}} \sim IW(R, \delta), \quad (25)

where different choices for R and δ will yield different degrees of “informativeness” of the inverse-Wishart distribution.
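For intuition about how R and δ control the informativeness of the inverse-Wishart prior, one can simulate draws from it. A minimal sketch using MCMCpack’s riwish(), with hypothetical hyperparameter values:

```r
# Drawing from an inverse-Wishart prior with MCMCpack's riwish().
# Hyperparameter values here are hypothetical, for illustration only.
library(MCMCpack)

R_scale <- diag(2)   # 2x2 identity scale matrix
delta   <- 5         # degrees of freedom; larger values give a tighter prior

set.seed(1)
riwish(v = delta, S = R_scale)   # one random covariance matrix from IW(R, delta)

# Averaging many draws shows the prior's center and spread:
draws <- replicate(2000, riwish(delta, R_scale))
apply(draws, c(1, 2), mean)
```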

Example


We present a small example of Bayesian factor analysis to illustrate the issue of sample size and precise priors. Data come from a sample of 3500 10th grade students who participated in the National Educational Longitudinal Study of 1988. Students were asked to respond to a series of questions that tap into their perceptions of the climate of the school.


A random sample of 100 respondents was obtained to demonstrate Bayesian CFA for small sample sizes. Smaller sample sizes are possible, but in this case severe convergence problems were encountered.


A subset of items was chosen:

1. GETALONG: Students get along well with teachers
2. TCHGOOD: The teaching is good
3. TCHINT: Teachers are interested in students
4. TCHPRAIS: When I work hard on schoolwork, my teachers praise my effort
5. TCHDOWN: In class I often feel “put down” by my teachers
6. STRICT: Rules for behavior are strict
7. STUDOWN: In school I often feel “put down” by other students
8. NOTSAFE: I don’t feel safe at this school


A 4-category Likert response scale was used, ranging from “strongly agree” to “strongly disagree”. Exploratory factor analyses suggested a 2-factor solution.


We use MCMCfactanal() from the MCMCpack package within the R programming environment.

We specify 5000 burn-in iterations, 20,000 post-burn-in iterations, and a thinning interval of 20.

Summary statistics are based on 15,000 draws from the posterior distribution. Using a thinning interval of 20, the trace and ACF plots indicate convergence for the model parameters for the large sample size case. Some convergence problems were noted for the small sample size case.
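A sketch of what the call might look like. The data frame name and the loading constraints are assumptions for illustration, not the talk’s actual script, and the item names follow the results tables below (which report six of the eight items):

```r
# Sketch of a two-factor Bayesian CFA with MCMCpack::MCMCfactanal().
# `school` is a hypothetical data frame holding the Likert items; the
# constraints are illustrative: each item is anchored to one factor by
# fixing its cross-loading to zero.
library(MCMCpack)

posterior <- MCMCfactanal(
  ~ TCHGOOD + TCHERINT + TCHPRAIS + STRICT + SPUTDOWN + NOTSAFE,
  factors = 2,
  lambda.constraints = list(
    TCHGOOD  = list(2, 0),   # positive-climate items: factor 2 loading fixed at 0
    TCHERINT = list(2, 0),
    TCHPRAIS = list(2, 0),
    STRICT   = list(1, 0),   # negative-climate items: factor 1 loading fixed at 0
    SPUTDOWN = list(1, 0),
    NOTSAFE  = list(1, 0)
  ),
  data = school,
  burnin = 5000, mcmc = 20000, thin = 20
)

summary(posterior)   # EAPs, SDs, and quantile-based posterior intervals
```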


Table: Results of Bayesian CFA: N=3500 and Noninformative Priors.

Parameter               EAP     SD      95% PPI
Loadings: POSCLIM by
  TCHGOOD               0.81    0.02    0.76, 0.85
  TCHERINT              0.89    0.02    0.85, 0.93
  TCHPRAIS              0.59    0.02    0.55, 0.63
Loadings: NEGCLIM by
  STRICT                0.09    0.01    0.06, 0.13
  SPUTDOWN              0.32    0.01    0.28, 0.35
  NOTSAFE               0.30    0.02    0.26, 0.35

Table: Results of Bayesian CFA: N=100 and Noninformative Priors.

Parameter               EAP     SD      95% PPI
Loadings: POSCLIM by
  TCHGOOD               0.61    0.14    0.44, 0.98
  TCHERINT              1.01    0.14    0.75, 1.30
  TCHPRAIS              0.67    0.14    0.42, 0.94
Loadings: NEGCLIM by
  STRICT                0.07    0.06    0.00, 0.22
  SPUTDOWN              0.27    0.11    0.05, 0.51
  NOTSAFE               0.33    0.11    0.14, 0.56

Table: Results of Bayesian CFA: N=3500 and Informative Priors.

Parameter               EAP     SD      95% PPI
Loadings: POSCLIM by
  TCHGOOD               0.81    0.02    0.77, 0.86
  TCHERINT              0.90    0.02    0.86, 0.93
  TCHPRAIS              0.61    0.02    0.56, 0.65
Loadings: NEGCLIM by
  STRICT                0.11    0.02    0.08, 0.15
  SPUTDOWN              0.33    0.02    0.30, 0.38
  NOTSAFE               0.32    0.02    0.28, 0.36

Table: Results of Bayesian CFA: N=100 and Informative Priors.

Parameter               EAP     SD      95% PPI
Loadings: POSCLIM by
  TCHGOOD               0.87    0.10    0.69, 1.06
  TCHERINT              1.00    0.10    0.81, 1.20
  TCHPRAIS              0.87    0.10    0.68, 1.05
Loadings: NEGCLIM by
  STRICT                0.59    0.12    0.37, 0.83
  SPUTDOWN              0.82    0.10    0.61, 1.03
  NOTSAFE               0.84    0.10    0.65, 1.04

Wrap-Up: Some Philosophical Issues


Bayesian statistics represents a powerful alternative to frequentist (classical) statistics and is therefore controversial.


The controversy lies in differing perspectives regarding the nature of probability, and in the implications for statistical practice that arise from those perspectives. The frequentist framework views probability as synonymous with long-run frequency, with the infinitely repeating coin toss as the canonical example. In contrast, the Bayesian viewpoint regarding probability was, perhaps, most succinctly expressed by de Finetti:

“Probability does not exist.” – Bruno de Finetti

That is, probability does not have an objective status, but rather represents the quantification of our experience of uncertainty.


For de Finetti, probability is only to be considered in relation to our subjective experience of uncertainty, and, for de Finetti, uncertainty is all that matters.


“The only relevant thing is uncertainty – the extent of our own knowledge and ignorance. The actual fact that events considered are, in some sense, determined, or known by other people, and so on, is of no consequence.” (pg. xi)

The only requirement then is that our beliefs be coherent, consistent, and have a reasonable relationship to any observable data that might be collected.

Subjective v. Objective Bayes

There are controversies within the Bayesian school between “subjectivists” and “objectivists”.

Subjective Bayesian practice attempts to bring prior knowledge directly into an analysis. This prior knowledge represents the analyst’s (or others’) degree of uncertainty. An analyst’s degree of uncertainty is encoded directly into the specification of the prior distribution, and in particular into the degree of precision around the parameter of interest.


The advantages include:

1. Priors can be based on factual prior knowledge.

2. Small sample sizes can be handled.

For objectivists, the goal is to have the data speak as much as possible, but to allow priors that serve as objective “referents”.

Specifically, there is a large class of so-called reference priors (Kass and Wasserman, 1996).


An important viewpoint regarding the notion of objectivity in the Bayesian context comes from Jaynes (1968).


For Jaynes, the “personalistic” (subjective) school of probability is to be reserved for

“...the field of psychology and has no place in applied statistics. Or, to state this more constructively, objectivity requires that a statistical analysis should make use, not of anybody’s personal opinions, but rather the specific factual data on which those opinions are based.” (pg. 228)

Evidence-based Subjective Bayes


The subjectivist school, advocated by de Finetti and others, allows for personal opinion to be elicited and incorporated into a Bayesian analysis. In the extreme, the subjectivist school would place no restriction on the source, reliability, or validity of the elicited opinion.


The objectivist school advocated by Jeffreys, Jaynes, Berger, Bernardo, and others, views personal opinion as the realm of psychology with no place in a statistical analysis. In their extreme form, the objectivist school would require formal rules for choosing reference priors.


The difficulty with these positions lies with the everyday usage of terms such as “subjective” and “belief”. Without careful definitions of these terms, their everyday usage might be misunderstood among those who might otherwise consider adopting the Bayesian perspective.


“Subjectivism” within the Bayesian framework runs the gamut from the elicitation of personal beliefs to making use of the best historical data available to inform priors. I argue along the lines of Jaynes (1968) – namely, that the requirements of science demand reference to “specific, factual data on which those opinions are based” (pg. 228). This view is also consistent with Leamer’s hierarchy of confidence on which priors should be ordered.


We may refer to this view as an evidence-based form of subjective Bayes which acknowledges (1) the subjectivity that lies in the choice of historical data; (2) the encoding of historical data into hyperparameters of the prior distribution; and (3) the choice among competing models to be used to analyze the data.

What if factual historical data are not available?

Berger (2006) states that reference priors should be used “in scenarios in which a subjective analysis is not tenable”, although such scenarios are probably rare.


The goal, nevertheless, is to shift the practice of Bayesian statistics away from the elicitation of personal opinion (expert or otherwise), which could, in principle, bias results toward a specific outcome, and instead move Bayesian practice toward the warranted use of prior objective empirical data for the specification of priors. The specification of any prior should be explicitly warranted against observable, empirical data and available for critique by the relevant scholarly community.


To conclude, the Bayesian school of statistical inference is, arguably, superior to the frequentist school as a means of creating and updating new knowledge in the social sciences. An evidence-based focus that ties the specification of priors to objective empirical data provides stronger warrants for conclusions drawn from a Bayesian analysis. In addition, predictive criteria should always be used as a means of testing and choosing among Bayesian models. As always, the full benefit of the Bayesian approach to research in the social sciences will be realized when it is more widely adopted and yields reliable predictions that advance knowledge.


GRAZIE MILLE
