A Hierarchical Bayesian Arms Race Model

Christopher K. Wikle∗

September 2, 2010

∗Department of Statistics, University of Missouri; Columbia, MO 65211 USA; [email protected]

Abstract

There is a long history in the social sciences of building models to describe processes related to arms races. These include both theoretical models, such as the action-reaction Richardson model, and empirical models. In addition, there is a substantial literature on estimating the parameters of these models from observations of military expenditures, or various proxies. Such methods should account for uncertainties in data, process understanding, and parameters. Relatively recent advances in hierarchical formulations for such models can be utilized to account for these sources of uncertainty and can incorporate scientific judgement in a probabilistically consistent manner. For example, the deterministic Richardson model can be used to motivate the structure of a hierarchical statistical model for a bilateral arms race. The purpose of this paper is to illustrate such an approach and apply it to previously studied arms races between Greece and Turkey, as well as between India and Pakistan.

Keywords: vector autoregressive, Markov chain Monte Carlo, state-space

1 Introduction

The study of arms races is of interest across a broad range of social sciences. One of the classic models for these conflicts is that of Richardson (1960). His model is a coupled system of differential equations, and is one of the most famous differential equation models in the social sciences (Brown, 2007, p. 60). The essence of this system (described in detail below) is that a country reacts to increased military spending by its rivals by increasing its own military spending. Such spending, however, is held in check to some degree by the fact that it is economically burdensome. Finally, there is a component of military spending decisions that has to do with the "grievances and ambitions" of each country.

These models have some theoretical properties that allow for analytical and empirical study (e.g., see Brown 2007; Dunne et al. 2003). However, the use of theoretical arms race models on "real-world" military spending data has shown mixed results (e.g., Sandler and Hartley 1995). For example, Dunne et al. (2003) consider the application of stochastic versions of the Richardson model to the Greece-Turkey and to the India-Pakistan arms races. They find that the Richardson model is much more applicable to the India-Pakistan scenario than to the Greece-Turkey scenario. Possible reasons for these differences have to do with the fact that the basic theoretical Richardson model is just too simple for the realities of some arms races; it leaves out other processes and interactions that could be playing a role; e.g., the lack of a budget constraint in the model can lead to biased results (Sandler and Hartley, 1995; Dunne et al. 2003). Another possible issue is that the data on military expenditures are too uncertain to provide much in the way of a "signal". That is, the data suffer from sampling variation, measurement errors, biases, and possible deliberate misinformation and manipulation by the countries in question.

Given the uncertainties in process model and data discussed above, it is clear that arms race models are inherently statistical (e.g., Schrodt, 1978). A consequence of this is that we cannot expect to be able to predict the process (arms spending) directly, but must instead focus on prediction of its aggregate statistical properties (i.e., distributions). By utilizing a statistical approach that attempts to account for these uncertainties, such distributions, at the very least, can show how uncertain we are about the process, and at best, can provide realistic measures of prediction error. In practice, prediction (forecasting) is not the primary purpose of these models. Rather, practitioners typically seek to perform inference on the model parameters to evaluate the possible influences on arms expenditures. Such inference is closely tied to the proper characterization of uncertainty in data, process, and parameters. In particular, in situations where data quantity and quality are limited, standard asymptotic results for parameter uncertainties are often not applicable.

Traditionally, econometric-based statistical implementations of the Richardson model have focused on time discretizations of a two-country differential equation system, with an additive error process, leading to a vector autoregressive (VAR) structure with fixed (but unknown) parameters that are then estimated from the available data (e.g., Smith et al. 2000; Dunne et al. 2003). These models are very valuable for studying relationships between two countries, such as the presence of "causality" and "cointegration," as well as model "stability." However, most implementations of such models have not included the data uncertainty explicitly; that is, they do not account for the fact that the true process cannot actually be observed without error. Econometricians and time-series analysts reconcile this by formulating a "state-space" model that contains an explicit observation equation for the data conditioned on the true process. A process model is then specified assuming that if one knows the distribution of the process at the current time given the previous time(s), all other earlier times provide no additional information. An example of this so-called Markovian assumption on the true (but hidden) process is the VAR model mentioned above (e.g., for an overview of state-space time series models, see West and Harrison 1997; Shumway and Stoffer 2006).

State-space models with Markovian formulations of the process are ideal for modeling dynamical systems such as an arms race because data come at discrete times (e.g., yearly) and the parameters that describe the dynamic evolution correspond (more or less) to the parameters of the original differential equations. As mentioned above, these parameters are typically estimated from the data. However, such estimation does not necessarily account for the uncertainty in the model specification and the possibility of additional structure or processes affecting the parameters over time. Recent (Bayesian) statistical formulations of hierarchical models, in which the parameters in the observation and process models are assumed to be random processes in their own right, have been shown to be much more flexible in the context of merging relatively simple (and, thus, incomplete) deterministic process models with data (e.g., Royle et al. 1999; Wikle et al. 2001; Wikle 2003a; Hooten and Wikle 2008).

In general, it is seldom the case in science that we know the true dynamical model for a given process, and the associated observations have varying degrees of uncertainty. However, it is possible that inexact but useful scientific theory, such as that suggested by deterministic differential equation models, can be incorporated into the conditional framework described above (e.g., see the review in Cressie and Wikle, 2011, Chap. 7). That is, the deterministic model may reflect our prior scientific understanding of the dynamical propagation (often analytical), but we doubt that the model, by itself, can adequately describe the true process. Thus, we reconfigure the deterministic model into a stochastic framework, formally making it part of our prior assumptions about the process; specifically, we use the deterministic model to motivate a series of conditional models. The important point is that within the hierarchical framework, one accounts for the uncertainty associated with this knowledge (the deterministic model), so that the final inference on the process and associated parameters accounts for uncertainties not only in the data, but also in theory, model specification, and parameters. This hierarchical framework is typically best considered from a Bayesian perspective, at least when the parameters are assumed to follow random processes (e.g., see the discussion in Cressie et al. 2009). Such models are not new to the social sciences (e.g., see the overview in Gelman and Hill 2007); however, they remain underutilized.

The purpose of this paper is to provide a simple illustration of how hierarchical Bayesian methodology can be applied to a relatively simple deterministic system of equations of historical interest in the social sciences – the arms race. In particular, the methodology is illustrated by applying a system of differential equations motivated by the Richardson arms race model to Greece-Turkey and India-Pakistan military expenditure data. This example is not meant to be a comprehensive study of these arms races, but rather a simple illustration of the hierarchical Bayesian methodology on a common system of practical interest.

Section 2 provides a summary of the data used in the analysis, the general background of the Richardson arms race model, and a brief overview of hierarchical Bayesian models. The specific Bayesian hierarchical arms race model considered here is given in Section 3, followed by its application to data in Section 4. Section 5 concludes the paper with a brief discussion.


2 Background

This section provides background on the data used in this study, the Richardson arms race model, and the hierarchical Bayesian modeling methodology.

2.1 Data

Military expenditure data for Greece, Turkey, Pakistan and India are available from the Stockholm International Peace Research Institute (SIPRI) Military Expenditure Database (http://www.sipri.org/databases/milex) for the years 1988-2008. The data used are in U.S. dollars, based on constant prices and exchange rates from 2005. Figure 1 shows the Greece/Turkey and the India/Pakistan expenditure series. These plots suggest that each country shows a generally increasing trend in military expenditures, although Turkey's expenditures peaked in 1999 and have been steadily declining since. In addition, although each series shows yearly variation, the Pakistan series appears to have the least variability on an inter-annual time scale. The inter-relationship between the series in each pair is not clear from visual examination.
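As a concrete starting point, the following is a minimal sketch (not part of the original analysis) of reading and plotting such series in R. It assumes the SIPRI figures have been exported to a local file named milex.csv with columns year, greece, turkey, india, and pakistan in constant 2005 million US dollars; the file name and layout are assumptions made only for illustration.

```r
## Minimal sketch: read and plot the expenditure series (assumed file and layout).
milex <- read.csv("milex.csv")   # columns: year, greece, turkey, india, pakistan

op <- par(mfrow = c(2, 1))
matplot(milex$year, milex[, c("greece", "turkey")], type = "l", lty = 1,
        xlab = "Year", ylab = "Military expenditures (10^6 US$)")
legend("topleft", legend = c("Greece", "Turkey"), lty = 1, col = 1:2)
matplot(milex$year, milex[, c("india", "pakistan")], type = "l", lty = 1,
        xlab = "Year", ylab = "Military expenditures (10^6 US$)")
legend("topleft", legend = c("India", "Pakistan"), lty = 1, col = 1:2)
par(op)
```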

2.2 The Richardson Model

The Richardson (1960) action-reaction arms race model is well-known in the literature. Assume that Yi(t), i = 1, 2, corresponds to the military (arms) spending for the i-th country at time t. Richardson proposed the following system of differential equations to describe the time rate of change in such spending:

\[
\frac{dY_1}{dt} = \alpha_1 + \beta_1 Y_2 + \gamma_1 Y_1, \qquad
\frac{dY_2}{dt} = \alpha_2 + \beta_2 Y_1 + \gamma_2 Y_2, \qquad (1)
\]

where the explicit time dependence of the Yi has been suppressed for notational simplicity. In this case, the β terms indicate that each country's spending is related to the level of spending of the other country (the so-called "reaction" terms); the γ terms indicate that there is a potential inhibition (if γi < 0) or increase (γi > 0) of spending based on economic burden or prosperity, respectively (the so-called "fatigue" terms); and the α terms correspond to "grievances" and "ambitions" attributed to each country or, rather, its leaders.
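To see the qualitative behavior implied by the model, the following short R sketch integrates system (1) with a simple forward-Euler scheme. The parameter values are arbitrary choices made only to illustrate positive reaction (β) and negative fatigue (γ) terms; they are not estimates from any data.

```r
## Forward-Euler integration of the Richardson system (1); illustrative values only.
richardson <- function(Y0, alpha, beta, gamma, n_steps = 50, dt = 0.1) {
  Y <- matrix(NA, n_steps + 1, 2)
  Y[1, ] <- Y0
  for (s in 1:n_steps) {
    dY1 <- alpha[1] + beta[1] * Y[s, 2] + gamma[1] * Y[s, 1]
    dY2 <- alpha[2] + beta[2] * Y[s, 1] + gamma[2] * Y[s, 2]
    Y[s + 1, ] <- Y[s, ] + dt * c(dY1, dY2)
  }
  Y
}

## Positive reaction terms, negative fatigue terms (arbitrary choices)
path <- richardson(Y0 = c(1, 2), alpha = c(0.5, 0.3),
                   beta = c(0.4, 0.6), gamma = c(-0.3, -0.2))
matplot(path, type = "l", xlab = "Time step", ylab = "Spending")
```

With these values the product of the reaction terms (0.24) exceeds the product of the fatigue magnitudes (0.06), so the simulated spending trajectories grow together without bound, the classic "runaway" action-reaction behavior.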

2.3 Hierarchical Bayesian Models

Hierarchical models are derived from a basic fact of probability that states that the joint distribution of a collection of random variables can be decomposed into a series of conditional models (for general overviews, see Wikle 2003b; Cressie et al. 2009). For example, if A, B, C are random variables, then the joint distribution [A, B, C] can be factored as [A, B, C] = [A | B, C][B | C][C], where the brackets around a random variable (say, z), [z], represent a short-hand notation for the probability distribution of z; similarly, [z | y] corresponds to the conditional distribution of z given y. This basic decomposition is the essence of hierarchical modeling.

For example, assume one is interested in a time series that cannot be observed without error, and one has associated observations and parameters that describe models for both the process and the data. The joint distribution of all of these may be very difficult (if not impossible) to obtain directly. However, if we recognize that the joint distribution can be represented as the product of relatively simple conditional (and marginal) distributions of quantities about which we have some knowledge, then the joint distribution can be obtained by construction. For modeling complicated processes in the presence of data, Berliner (1996) showed that it is conceptually helpful to write the hierarchical model for the joint distribution of data (e.g., represented by A above), process (e.g., B above) and parameters (e.g., C above) in three primary stages:

Stage 1. Data model: [data | process, data parameters]  (2)

Stage 2. Process model: [process | process parameters]  (3)

Stage 3. Parameter model: [data and process parameters].  (4)

In this case, the purpose is just to address the complexity of the problem by decomposing it into tractable subproblems. The first stage gives the distribution of the data (e.g., military expenditures) given the process of interest (e.g., the true military arms spending) and the parameters that make up the data model. The second stage then assigns a model for the process (e.g., true arms spending) given the parameters that control the process. In our case, this might correspond to a discretized version of the Richardson model, conditional on the α, β, and γ parameters, as well as other parameters that describe the process error. The last stage then describes distributions for the parameters that make up the data and process models. Of particular interest might be the variances of observation errors in the data model, as well as the possibility that the process parameters are themselves time-varying and/or dependent on exogenous variables. In practice, each of these stages can be decomposed into substages if necessary (e.g., see Wikle et al. 1998).

Ultimately, we are interested in the distribution of the process and parameters updated by the data. This "posterior" distribution can be obtained by Bayes' Rule, in which

[process, parameters | data] ∝ [data | process, parameters] × [process | parameters][parameters].  (5)

This formula provides the basis of Bayesian hierarchical modeling. Although the formula looks simple, in practice the component distributions on the right-hand side of (5) can be quite complicated, such that one cannot obtain analytically the normalizing constant that turns the product of these component distributions into a valid probability distribution. Thus, as is typically the case, one cannot obtain a closed form for the posterior distribution. Fortunately, with the advent of Markov chain Monte Carlo (MCMC) methods in Bayesian statistics (e.g., Gelfand and Smith 1990), one can obtain Monte Carlo samples from this distribution in a relatively simple, yet computationally intensive, manner (e.g., see the overviews in Robert and Casella 2004; Gelman et al. 2004). Indeed, the development of these computational methods has made it feasible to implement very complicated hierarchical models across a wide variety of scientific disciplines. Of course, it remains to specify the distributions on the right-hand side of (5) explicitly. Examples will be provided in the Richardson arms race context in Section 3.
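As a toy illustration (not the arms race model itself), the following R sketch generates data from a simple three-stage hierarchy in exactly the order of stages (2)-(4): parameters are drawn first, then a Markovian process given the parameters, then noisy observations given the process. All distributions and numeric values are arbitrary.

```r
## Toy three-stage generative hierarchy (illustration only; arbitrary choices).
set.seed(1)
n_time <- 20

## Stage 3: parameter model
phi    <- rnorm(1, 0.8, 0.1)     # dynamics parameter
sigma2 <- 1 / rgamma(1, 2, 1)    # process variance
tau2   <- 1 / rgamma(1, 2, 1)    # observation-error variance

## Stage 2: process model (Markovian in time)
y <- numeric(n_time)
y[1] <- rnorm(1, 0, sqrt(sigma2))
for (t in 2:n_time) y[t] <- rnorm(1, phi * y[t - 1], sqrt(sigma2))

## Stage 1: data model (noisy observations of the hidden process)
z <- rnorm(n_time, y, sqrt(tau2))
```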


3 Hierarchical Modeling of an Arms Race

The purpose of this section is to develop a model that can be applied to arms race expenditures. We describe the data, process and parameter models in turn.

3.1 Data Model

Let Zi(t) correspond to the SIPRI military expenditure data for country i = 1, 2, and year t = 1, ..., T, where year 1 corresponds to 1988 and year T corresponds to 2008. We define the vector of observations Zt = (Z1(t), Z2(t))′ for time t, and the associated vector of "true" expenditures as Yt = (Y1(t), Y2(t))′. Note, the prime in these formulae represents a vector (or matrix) transpose operation. Our data model is then given in vector notation as follows:

\[
Z_t = \mu_t + Y_t + \epsilon_t, \qquad \epsilon_t \sim N(0, R), \qquad (6)
\]

for t = 1, ..., T, where µt = (µ1(t), µ2(t))′ corresponds to a vector of means (or offsets) for each time series, εt = (ε1(t), ε2(t))′ corresponds to a vector of observation errors that are normally distributed with mean zero and variance-covariance matrix R, and N(a, C) represents a multivariate normal distribution with mean a and variance/covariance matrix C. These errors are assumed to be independent in time and independent of the process Yt and the mean µt. For reasons mentioned below in Section 3.2, we assume µt = 0 for all t in this application. In this case, the data model can be written in distributional form as

\[
Z_t \mid Y_t, R \sim \text{indep. } N(Y_t, R), \qquad (7)
\]

for t = 1, ..., T. This observational model could be made much more general to accommodate transformations, time-varying errors, and/or errors that are correlated across time or with the process. Such models are beyond the scope of this paper (see Cressie and Wikle 2011, Chap. 7, for more detail). We note that researchers often consider the natural logarithm of the military expenditure data before fitting arms race models (e.g., Dunne et al. 2005). Although this is likely to make the observations more Gaussian (i.e., normally distributed) and is reasonable, it also can make it more difficult to interpret the parameters relative to the original model. For that reason, and because the conditional normality assumption in the hierarchical framework is not unreasonable, we consider the observations on their original scale in this illustrative example.
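The following brief R sketch illustrates the data model (6)-(7) generatively: given a placeholder "true" bivariate series Y and an assumed observation covariance R, each observation vector is an independent normal draw centered on the corresponding true value. The values of Y and R here are arbitrary.

```r
## Generative illustration of the data model (6)-(7); Y and R are placeholders.
library(MASS)   # for mvrnorm()
set.seed(2)
n_years <- 21                                                # 1988-2008
Y <- apply(matrix(rnorm(2 * n_years, sd = 300), n_years, 2), 2, cumsum) + 5000
R <- diag(c(4e5, 6e5))                                       # assumed observation covariance
Z <- t(apply(Y, 1, function(yt) mvrnorm(1, mu = yt, Sigma = R)))   # Z_t | Y_t, R ~ N(Y_t, R)
```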

3.2 Process Model

We consider the deterministic Richardson differential equation model (1) as motivation for our stochastic process model. Similar to Smith et al. (2000) and Dunne et al. (2003), we can use a finite-difference approximation to discretize the Richardson model in the following manner:

\[
\begin{aligned}
Y_1(t + \delta_t) &\approx Y_1(t) + \delta_t \left(\alpha_1 + \beta_1 Y_2(t) + \gamma_1 Y_1(t)\right), \\
Y_2(t + \delta_t) &\approx Y_2(t) + \delta_t \left(\alpha_2 + \beta_2 Y_1(t) + \gamma_2 Y_2(t)\right),
\end{aligned} \qquad (8)
\]

where δt corresponds to the discretization interval. In this application, we will assume for simplicity that δt = 1 (i.e., one year). As Dunne et al. (2005) discuss, one could discretize this system such that each country is conditioned on the other at the same time (e.g., time t) rather than the lagged (time t − 1) value shown here. The lagged version that we employ here fits into the traditional vector autoregressive (VAR) model framework common in econometrics and statistics. In that case, we can rewrite the model in the following vector/matrix form:

\[
Y_t = \alpha + M Y_{t-1} + \eta_t, \qquad (9)
\]

for t = 1, ..., T, where α = (α1, α2)′ and M is a 2 × 2 matrix given by

\[
M \equiv \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}
= \begin{pmatrix} 1 + \gamma_1 & \beta_1 \\ \beta_2 & 1 + \gamma_2 \end{pmatrix}. \qquad (10)
\]

Thus, inference on the elements of the matrix given by (10) will provide inference on the parameters that are interpretable in the context of the Richardson model. Furthermore, ηt is an additive error process used to account for the discretization error, as well as the additive effects of unresolved process components. We assume that this error process is independent in time and independent of Yt, and that it has a normal distribution with mean zero and variance/covariance matrix Q; that is, ηt ∼ indep. N(0, Q), t = 1, ..., T. Note, the "grievance/ambition" parameters {α1, α2} act as the mean of the process in this case, thus eliminating the need for a separate mean term in the data model stage. Finally, we can write the conditional process distribution as

\[
Y_t \mid \alpha, Y_{t-1}, M, Q \sim \text{indep. } N(\alpha + M Y_{t-1}, Q), \qquad (11)
\]

for time t = 1, ..., T. It is clear from this formulation that we must specify an initial condition, Y0. We typically specify a prior distribution for this as well, e.g., Y0 ∼ N(Ỹ0, Σ0), where the parameters Ỹ0 and Σ0 are specified by the modeler.
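A minimal R sketch of simulating from the process model (9)-(11) is given below; the transition matrix M is assembled from reaction (β) and fatigue (γ) parameters as in (10). All numeric values, including the initial condition Y0, are arbitrary illustrative choices.

```r
## Simulation from the process model (9)-(11); all values are illustrative.
library(MASS)   # for mvrnorm()
set.seed(3)
n_years <- 21
alpha <- c(500, 300)                       # "grievance/ambition" terms
beta  <- c(0.10, 0.15)                     # reaction terms
gamma <- c(-0.20, -0.10)                   # fatigue terms
M <- matrix(c(1 + gamma[1], beta[1],
              beta[2],      1 + gamma[2]), 2, 2, byrow = TRUE)   # as in (10)
Q <- diag(c(2e5, 2e5))                     # process error covariance

Y  <- matrix(NA, n_years, 2)
Y0 <- c(4000, 7000)                        # assumed initial condition
Y[1, ] <- alpha + M %*% Y0 + mvrnorm(1, c(0, 0), Q)
for (t in 2:n_years) Y[t, ] <- alpha + M %*% Y[t - 1, ] + mvrnorm(1, c(0, 0), Q)
```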

3.3 Parameter Models

We now specify distributions for the parameters in the data and process models. The specification of prior information at this stage is, essentially, what places a hierarchical model in the Bayesian framework. In some cases, we may have additional information that can be used to inform these prior distributions. The parameters that control these prior distributions may be modeled at yet a lower stage of the model hierarchy (typically, if more information is available), or we may simply specify them. In the latter case, with little prior information available, we typically specify relatively non-informative distributions (e.g., large variances). Critically, we should test the sensitivity of our posterior distributions of the process and parameters of interest to the choice of these so-called hyperparameters. For more details on prior distribution selection, see Gelman et al. (2004).

3.3.1 Data Model Parameters

First, we make the assumption that, conditional on the true process, the observation errors are independent. That is,

\[
R = \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}, \qquad (12)
\]

and we then put inverse gamma distributions on the variance parameters σ₁² and σ₂² that are associated with countries 1 and 2, respectively. In particular, we assign these inverse-gamma prior distributions, as is typical in Bayesian analysis (e.g., Gelman et al. 2004):

\[
\sigma_1^2 \sim IG(q_1, r_1), \qquad \sigma_2^2 \sim IG(q_2, r_2), \qquad (13)
\]

where q1, r1, q2, r2 are hyperparameters that we specify as given in Table 1.

3.3.2 Process Model Parameters

Recall that our process model for Yt includes parameters α, M, and Q. We specify a normal prior distribution for the α vector:

\[
\alpha \sim N(\alpha_0, \Sigma_\alpha), \qquad (14)
\]

where we assume the 2 × 1 vector α0 and the 2 × 2 matrix Σα are fixed and specify them as given in Table 1. Without additional information regarding the dynamic model parameters M and Q, we assign them general prior distribution forms that facilitate computation. In this case, we vectorize the parameters in M and give them a normal prior distribution. That is,

\[
m \sim N(m_0, \Sigma_m), \qquad (15)
\]

where m = (m11, m21, m12, m22)′, and m0 is a 4 × 1 vector and Σm a 4 × 4 matrix of mean and variance parameters, respectively, that we assume to be fixed and specify in Table 1. Similarly, the prior distribution for the variance/covariance matrix Q is given by an inverse-Wishart distribution (e.g., see Gelman et al. 2004). Equivalently, the inverse of the matrix Q is given a Wishart distribution, i.e.,

\[
Q^{-1} \sim \text{Wishart}\left((\nu_q C_q)^{-1}, \nu_q\right), \qquad (16)
\]

where νq is a scalar (degrees of freedom) and Cq a 2 × 2 prior covariance matrix, both of which we assume to be fixed and specified as in Table 1. In general, as with the data model parameters, the parameters that make up this portion of the model can certainly be given more complicated prior distributions, which then depend on parameters that have their own distributions, and so on. Such specification should depend on one's scientific knowledge and/or the amount and type of data that are available.
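The following R sketch draws once from priors of the form (14)-(16), using the α, m, and Q hyperparameter settings listed in Table 1 (the inverse-gamma priors on the observation variances are handled analogously). The base-R function rWishart is used for the Wishart draw.

```r
## One draw from priors (14)-(16), using the Table 1 settings for alpha, m, and Q.
library(MASS)   # for mvrnorm()
set.seed(4)
alpha0 <- c(0, 0);   Sigma_alpha <- diag(2) * 1e7
m0     <- rep(0, 4); Sigma_m     <- diag(4) * 1e2
nu_q   <- 2;         C_q         <- diag(2) * 1e7

alpha_draw <- mvrnorm(1, alpha0, Sigma_alpha)    # alpha ~ N(alpha0, Sigma_alpha)
m_draw     <- mvrnorm(1, m0, Sigma_m)            # m = vec(M) ~ N(m0, Sigma_m)
M_draw     <- matrix(m_draw, 2, 2)               # fills by column: (m11, m21, m12, m22)'
Qinv_draw  <- rWishart(1, df = nu_q, Sigma = solve(nu_q * C_q))[, , 1]
Q_draw     <- solve(Qinv_draw)                   # Q^{-1} ~ Wishart((nu_q C_q)^{-1}, nu_q)
```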

3.4 Bayesian Computation

Ultimately, we are interested in the posterior distribution of all of the unknowns (processes and parameters) given the observations. Formally, the posterior distribution in this case can be written as proportional to the product of the hierarchical model stages given above:

\[
[Y_0, Y_1, \ldots, Y_T, M, Q, \alpha, \sigma_1^2, \sigma_2^2 \mid Z_1, \ldots, Z_T]
\propto \prod_{t=1}^{T} [Z_t \mid Y_t, \sigma_1^2, \sigma_2^2]\,[Y_t \mid Y_{t-1}, \alpha, M, Q]
\times [Y_0][M][Q][\alpha][\sigma_1^2][\sigma_2^2]. \qquad (17)
\]

Thus, in principle, this seems fairly simple since we just multiply all of the hierarchical model stages together to get the right-hand side of (17). However, as discussed above, the critical component is that this product is only proportional to the posterior distribution. To find the normalizing constant, one must integrate out all of the random quantities on the right hand side. Such a high-dimensional integration is typically not possible analytically. However, we can use Markov chain Monte Carlo (MCMC) methods to obtain Monte Carlo samples from the posterior distribution (e.g., see Robert and Casella 2004 for a comprehensive overview.) The details for such calculations are somewhat tedious but also fairly standard in the literature now. Thus, we omit the details here but refer the interested reader to one of the many textbooks that describe modern Bayesian computation. In particular, Shumway and Stoffer (2006, Chap. 6) and Cressie and Wikle (2011, Chap. 8) describes the algorithm for a model similar to the one presented here. The point is that the MCMC algorithms give samples from the distribution of the process and parameters given the data, and we can perform inference directly on these samples from the posterior distributions, as shown in Section 4. The reader who wishes to implement these models directly, without programming their own MCMC algorithm, can find code in the freely available “R” statistical computing language (e.g., http://cran.r-projct.org). In particular, the R-packages dlm (Petris, 2010) and MSBVAR (Brandt, 2009) allow one to implement Bayesian dynamic linear state-space models of the type described here. The use of these packages requires a basic understanding of the “R” statistical software as well as the basics of vector autogres12

sive and state-space models.
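As a hedged sketch of how the dlm package could be used for one building block of such an analysis, the code below performs a single forward-filtering, backward-sampling draw of the latent spending process Y1, ..., YT, holding M, Q, and R fixed at trial values. A full MCMC implementation would alternate this step with updates of α, M, Q, and the observation variances; the intercept α is omitted here for simplicity, and the data Z are simulated placeholders rather than the SIPRI series.

```r
## Hedged sketch: one FFBS draw of the latent states with the "dlm" package,
## conditioning on trial values of M, Q, and R (not the full MCMC of Section 3).
library(dlm)
set.seed(5)

Z <- apply(matrix(rnorm(42, sd = 300), 21, 2), 2, cumsum) + 5000   # placeholder data (T x 2)

M <- matrix(c( 0.9, -0.1,
              -0.1,  0.8), 2, 2, byrow = TRUE)  # trial transition matrix
Q <- diag(c(3e5, 3e5))                          # trial process covariance
R <- diag(c(4e5, 6e5))                          # trial observation covariance

mod <- dlm(FF = diag(2), V = R,                 # observation eq.: Z_t = Y_t + eps_t
           GG = M,       W = Q,                 # process eq.:     Y_t = M Y_{t-1} + eta_t
           m0 = c(4000, 7000), C0 = diag(2) * 1e7)   # prior on the initial state Y_0

filt   <- dlmFilter(Z, mod)   # forward Kalman filter
Y_draw <- dlmBSample(filt)    # one posterior draw of the hidden states given M, Q, R
```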

4 Results

We now consider the model from Section 3 with the SIPRI military spending data for two separate cases: Greece-Turkey and India-Pakistan. In both cases, we seek the posterior distribution of the α, β and γ parameters corresponding to the discretized Richardson arms race model given the data. We select relatively non-informative hyperparameters for the parameter models, as given in Table 1. Note that because there is little prior information about the observation-error variance distributions, in this example we select the prior mean of each observation-error variance (σ₁² and σ₂²) to be equal to one-half the mean of the observations for the corresponding country. Sensitivity analysis showed that the results presented here for the parameters of interest were not sensitive to the choices of the hyperparameters used in the analysis.
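One simple way (an illustration, not necessarily the exact procedure used here) to turn a chosen prior mean and variance into inverse-gamma hyperparameters is moment matching, taking q as the shape and r as the scale: for σ² ∼ IG(q, r) the mean is r/(q − 1) and the variance is r²/((q − 1)²(q − 2)), so q = mean²/var + 2 and r = mean × (q − 1). The example numbers below correspond to the Greece entry in Table 1.

```r
## Moment-matching inverse-gamma hyperparameters from a prior mean and variance
## (q = shape, r = scale). Example numbers are the Greece entry in Table 1.
ig_hyper <- function(prior_mean, prior_var) {
  q <- prior_mean^2 / prior_var + 2
  r <- prior_mean * (q - 1)
  c(q = q, r = r)
}

ig_hyper(prior_mean = 3.74e3, prior_var = 3.74e6)
```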

4.1 Inference on Model Parameters

The summaries of the posterior distributions of the model parameters for the Greece-Turkey and India-Pakistan cases are given in Table 2 and Table 3, respectively. In particular, the posterior mean, standard deviation, 2.5%-tile, and 97.5%-tile are shown. Note that the interval given by the 2.5%-tile and 97.5%-tile is known as the 95% credible interval, which is the Bayesian analog to the 95% confidence interval (e.g., see Gelman et al. 2004 for details).

First, we note that the α ("grievance/ambition") parameters have posterior 95% credible intervals that suggest positive values for Greece, Turkey, and India, but the interval for Pakistan covers (i.e., includes) zero, suggesting that there is too much uncertainty to declare that this component of the model is significant with regard to Pakistan's arms expenditures. This is most likely a result of the fact that the data considered here have a high degree of uncertainty, which is also reflected in the relatively wide posterior distributions for the measurement-error variances.

The primary focus here is on the parameters that make up the transition matrix, M, given by equation (10), and their corresponding Richardson model interpretation. Specifically, we consider the posterior distributions for the β parameters and the implied γ parameters. Figure 2 shows histograms of samples from the posterior distribution for each of these parameters. In the context of the Greece-Turkey analysis, each of the parameters βi, γi, i = 1, 2, has a 95% credible interval that covers zero. However, there does seem to be a preference for each of these parameters to be negative (see the histogram plots), especially β1 and γ2. That is, the posterior mean, median and mode, and most of the posterior mass, correspond to negative values of the parameters. Furthermore, the posterior summary of the covariance matrix, Q, suggests that on average, there is positive dependence between the military spending of the two countries (in some unspecified way). However, we note that although these covariance parameters (i.e., Q(1, 2), Q(2, 1)) tend to be positive, their posterior credible intervals do cover zero, suggesting that there is too much uncertainty to say (statistically) that there is dependence between these military spending processes.

Similarly, the 95% credible intervals of the dynamic parameters (β's, γ's) for the India-Pakistan data all cover zero as well. However, in this case, the posterior means (and medians/modes) for β2 and γ1 are positive. Again, β1 and γ2 have posterior means, medians and modes that are negative. As with the Greece-Turkey case, β1 and γ2 are shifted such that much more of their posterior mass is associated with parameter values that are less than zero. In addition, the covariance parameters have 95% credible intervals that cover zero in this case and, unlike the Greece-Turkey case, are only moderately supportive of a preference for positive dependence.
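Summaries like those in Tables 2 and 3, as well as posterior probability statements such as P(β1 < 0 | data), are straightforward to compute from MCMC output. The sketch below assumes draws is a matrix of posterior samples with one named column per parameter; it is illustrative only.

```r
## Posterior summaries and probability statements from MCMC output ("draws" is
## assumed to be a matrix of samples with named columns, e.g. "beta1", "gamma2").
summarize_draws <- function(draws) {
  t(apply(draws, 2, function(x)
    c(mean = mean(x), sd = sd(x), quantile(x, c(0.025, 0.975)))))
}

## Example usage (commented out because `draws` is not defined here):
## summarize_draws(draws)        # posterior mean, sd, and 95% credible interval
## mean(draws[, "beta1"] < 0)    # posterior probability that beta_1 is negative
## hist(draws[, "beta1"])        # histogram of posterior samples, as in Figure 2
```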

5 Discussion

In summary, although there is too much posterior uncertainty to declare either the Greece-Turkey or India-Pakistan models "significant" relative to their posterior 95% credible intervals, there are intriguing similarities and differences between the two analyses. In particular, in both cases, β1 and γ2 appear to be the "most significant" (in the sense that more of the posterior mass is shifted away from zero), and these parameters are shifted negatively in each case. Thus, there is a tendency for Greece's (India's) arms race spending to be negatively related to the spending of Turkey (Pakistan), and Turkey's (Pakistan's) spending tends to be negatively related to how much it spent the previous year. Similarly, given that the posterior means, medians, and modes of β2 and γ1 are negative for Greece-Turkey and positive for India-Pakistan, there is a tendency for a different interpretation between the two arms races. Specifically, Turkey's (Pakistan's) arms expenditures tend to be negatively (positively) related to Greece's (India's), and Greece's (India's) arms expenditures tend to be negatively (positively) related to the previous year's spending.

Although the data, process, and parameter uncertainty preclude formal declaration of statistical significance, this example illustrates the advantage of having the entire posterior distribution available, as opposed to just interval summaries. In particular, one can make qualitative judgments about the influence of the parameters relative to the distribution of their probability mass over the parameter space. In fact, one can make formal probability statements in this regard as well (e.g., see Gelman et al. 2004). Indeed, this is the primary advantage of the Bayesian paradigm – one can obtain posterior distributions without reliance on asymptotic theory (which surely would not be appropriate for the small sample sizes available here).

Although this example does illustrate how one can specify various sources of uncertainty in data, process and parameters within the hierarchical framework, it does not really illustrate the true power of the hierarchical approach. That power comes from the inclusion of complicated model structures on the process errors and parameters. For example, the β and γ parameters could have been made time-dependent in this example without too much additional effort. This would be particularly important if one believed there was a need for a change-point in time and/or if exogenous variables were thought to influence the parameters (as opposed to the process directly). Such approaches are discussed, for example, in Wikle (2003b), Cressie et al. (2009), and Cressie and Wikle (2011).

In conclusion, this example is meant to be a simple illustration of how one can place deterministic, scientifically motivated models in a statistical framework that accommodates various sources of uncertainty. Simple interpretations are given to the resultant posterior distributions to show how one can take advantage of the full-distributional inference made possible by this approach. This illustration is not meant to be a comprehensive analysis of these particular arms races; the referenced literature contains much more detail about those. However, the modeling ideas presented here could be used for detailed studies of arms races as well as other processes of interest in the social sciences.

References

Berliner, L.M. (1996) Hierarchical Bayesian time-series models. In Maximum Entropy and Bayesian Methods, 15–22. Kluwer Academic Publishers, Dordrecht.

Brandt, P.T. (2009) The MSBVAR package. http://cran.r-project.org/

Brown, C. (2007) Differential Equations: A Modeling Approach. Sage Publications, Los Angeles.

Cressie, N. and Wikle, C.K. (2011) Statistics for Spatio-Temporal Data. John Wiley & Sons, Hoboken.

Cressie, N., Calder, C.A., Clark, J.S., Ver Hoef, J.M., and Wikle, C.K. (2009) Accounting for uncertainty in ecological analysis: The strengths and limitations of hierarchical statistical modeling (with discussion). Ecological Applications, 19:553–570.

Dunne, J.P., Nikolaidou, E. and Smith, R. (2003) Arms race models and econometric applications. In Arms Trade, Security and Conflict, P. Levine and R. Smith, eds., 178–187. Routledge.

Dunne, J.P., Nikolaidou, E. and Smith, R. (2005) Is there an Arms Race between Greece and Turkey? Peace Economics, Peace Science and Public Policy, 11:1–35.

Gelfand, A.E. and Smith, A.F.M. (1990) Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85:398–409.

Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2004) Bayesian Data Analysis: Second Edition. Chapman & Hall/CRC, Boca Raton.

Gelman, A. and Hill, J. (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, Cambridge.

Hooten, M.B. and Wikle, C.K. (2008) A hierarchical Bayesian non-linear spatio-temporal model for the spread of invasive species with application to the Eurasian Collared-Dove. Environmental and Ecological Statistics, 15:59–70.

Petris, G. (2010) The dlm package. http://cran.r-project.org/

Richardson, L.F. (1960) Arms and Insecurity. Quadrangle Books, Chicago.

Robert, C.P. and Casella, G. (2004) Monte Carlo Statistical Methods: Second Edition. Springer, New York.

Royle, J.A., Berliner, L.M., Wikle, C.K. and Milliff, R. (1999) A hierarchical spatial model for constructing wind fields from scatterometer data in the Labrador Sea. In Case Studies in Bayesian Statistics IV, C. Gatsonis, B. Carlin, A. Gelman, M. West, R.E. Kass, A. Carriquiry, and I. Verdinelli, eds., 367–382. Springer-Verlag, New York.

Sandler, T. and Hartley, K. (1995) The Economics of Defense. Cambridge University Press, Cambridge.

Schrodt, P. (1978) Statistical problems associated with the Richardson arms race model. Conflict Management and Peace Studies, 3:159–172.

Shumway, R.H. and Stoffer, D.S. (2006) Time Series Analysis and Its Applications: With R Examples, Second Edition. Springer, New York.

Smith, R.P., Dunne, J.P., and Nikolaidou, E. (2000) The econometrics of arms races. Defence and Peace Economics, 11:31–43.

West, M. and Harrison, J. (1997) Bayesian Forecasting and Dynamic Models, 2nd ed. Springer-Verlag, New York.

Wikle, C.K. (2003a) Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology, 84:1382–1394.

Wikle, C.K. (2003b) Hierarchical models in environmental science. International Statistical Review, 71:181–199.

Wikle, C.K., Berliner, L.M., and Cressie, N. (1998) Hierarchical Bayesian space-time models. Environmental and Ecological Statistics, 5:117–154.

Wikle, C.K., Milliff, R., Nychka, D. and Berliner, L.M. (2001) Spatiotemporal hierarchical Bayesian modeling: tropical ocean surface winds. Journal of the American Statistical Association, 96:382–397.


Parameter        Prior Distribution               Prior Hyperparameters
σ₁² (Greece)     IG(q1, r1)                       q1, r1: mean = 3.74 × 10³, var = 3.74 × 10⁶
σ₂² (Turkey)     IG(q2, r2)                       q2, r2: mean = 6.26 × 10³, var = 6.26 × 10⁶
σ₁² (India)      IG(q1, r1)                       q1, r1: mean = 8.15 × 10³, var = 8.15 × 10⁶
σ₂² (Pakistan)   IG(q2, r2)                       q2, r2: mean = 1.80 × 10³, var = 1.80 × 10⁶
α                α ∼ N(α0, Σα)                    α0 = [0 0]′, Σα = 10⁷ I
m                m ∼ N(m0, Σm)                    m0 = 0, Σm = 10² I
Q                Q⁻¹ ∼ Wishart((νq Cq)⁻¹, νq)     νq = 2, Cq = 10⁷ I

Table 1: Prior distributions used in the hierarchical Bayesian arms race model (see text for details).

Parameter        Mean     Std. Dev.   2.5%-tile   97.5%-tile
M(1,1)            0.810    0.423       0.101        1.673
M(2,1) (β2)      -0.223    0.517      -1.144        0.839
M(1,2) (β1)      -0.167    0.194      -0.513        0.174
M(2,2)            0.611    0.223       0.175        1.032
Q(1,1) × 10⁻⁶     2.900    3.066       0.837        9.114
Q(2,1) × 10⁻⁶     1.243    1.737      -0.384        5.097
Q(1,2) × 10⁻⁶     1.243    1.737      -0.384        5.097
Q(2,2) × 10⁻⁶     4.106    4.824       1.138        1.421
σ₁² × 10⁻³        3.598    4.609       0.824       13.465
σ₂² × 10⁻³        6.090    5.354       1.536       21.081
α1 × 10⁻³         3.714    1.408       0.753        6.222
α2 × 10⁻³         6.798    1.736       3.066        9.648
γ1               -0.190    0.423      -0.899        0.673
γ2               -0.389    0.223      -0.826        0.032

Table 2: Posterior distribution summary of model parameters for the Greece/Turkey data.


Parameter        Mean     Std. Dev.   2.5%-tile   97.5%-tile
M(1,1)            1.162    0.252       0.613        1.621
M(2,1) (β2)       0.054    0.120      -0.184        0.287
M(1,2) (β1)      -2.714    1.504      -5.328        0.534
M(2,2)            0.203    0.740      -1.161        1.776
Q(1,1) × 10⁻⁶     8.432    9.594       2.215       29.45
Q(2,1) × 10⁻⁶     0.900    1.616      -1.313        5.315
Q(1,2) × 10⁻⁶     0.900    1.616      -1.313        5.315
Q(2,2) × 10⁻⁶     1.716    1.462       0.518        5.800
σ₁² × 10⁻³        7.886    7.870       2.142       25.98
σ₂² × 10⁻³        1.754    2.316       0.358        6.853
α1 × 10⁻³         7.889    2.244       2.995       11.69
α2 × 10⁻³         2.101    1.175      -0.377        4.201
γ1                0.162    0.252      -0.388        0.621
γ2               -0.797    0.740      -2.161        0.776

Table 3: Posterior distribution summary of model parameters for the India/Pakistan data.



Figure 1: SIPRI military expenditure data for 1988-2008 in constant (2005) millions of US dollars. Top panel: military expenditures for Greece and Turkey. Bottom panel: military expenditures for India and Pakistan.



Figure 2: Histograms of posterior samples for the Richardson arms race dynamic parameters. Specifically, the left panels show, from top to bottom, the histograms of β1, β2, γ1, and γ2 for the Greece/Turkey data. The right panels show the corresponding parameters for the India/Pakistan data.

