Statistical Analysis of Quantitative Data


Arkadiusz M. Kowalski Tomasz M. Napiórkowski


Warsaw 2014

This textbook was prepared for the purposes of International Doctoral Programme in Management and Economics organized within the Collegium of World Economy at Warsaw School of Economics.

The textbook is co-financed by the European Union from the European Social Fund.

This textbook is distributed free of charge.

Table of Contents

INTRODUCTION .......... 7
1. BASELINE .......... 9
1.1. Equations Explained .......... 9
1.2. Parameter Estimation Methods .......... 10
1.3. Hypothesis Testing .......... 11
1.4. Using Statistical Tables .......... 11
1.5. Econometrics Software .......... 12
2. DATA TYPES AND STRUCTURAL EQUATIONS DESCRIPTION .......... 13
2.1. Cross-section, Time-series and Panel data defined .......... 13
2.2. Structural Equation Description .......... 14
2.2.1. Cross-Section structural equation .......... 14
2.2.2. Time-Series structural equation .......... 16
2.2.3. Panel structural equation .......... 16
3. VARIABLES AND DATASET WORK .......... 19
3.1. Naming Variables .......... 19
3.2. Examining Variables .......... 20
3.3. Stationarity Test .......... 21
3.4. Correlation Matrix Analysis .......... 24
3.5. Descriptive Statistics .......... 25
3.6. Hypotheses Formulation .......... 26
3.7. Dummy Variables .......... 27
3.7.1. Dummy Variables: Example .......... 27
3.7.2. Dummy Variables: Pitfalls .......... 29
3.8. Data Cleaning .......... 30
3.9. Data Description .......... 32
4. MODEL DETERMINATION .......... 33
4.1. Model Estimation with a Forward Stepwise Method .......... 34
5. MODEL TESTING .......... 39
5.1. Multicollinearity .......... 39
5.2. Autocorrelation .......... 40
5.3. Heteroscedasticity .......... 44
6. MODEL'S RESULTS INTERPRETATION .......... 47
6.1. Interpreting and Testing Variables and Coefficients .......... 48
6.2. Interpreting Model's Statistics .......... 53
7. FORECASTING .......... 57
7.1. Forecasting as a Model Testing Tool .......... 57
7.2. Forecasting with ARIMA .......... 59
7.3. Forecast Evaluation .......... 60
8. CONCLUSIONS .......... 63
A. TRANSITION .......... 65
EXAMPLE .......... 67
Setup .......... 67
Descriptive Statistics .......... 69
Hypothesis Statements .......... 70
Correlation matrix .......... 71
Unit Root Test .......... 72
Model Estimation .......... 73
FINAL REMARKS .......... 83
STATISTICAL TABLES .......... 85
z-table .......... 86
t-table .......... 87
F-table at 0.01 level of significance .......... 88
F-table at 0.025 level of significance .......... 91
F-table at 0.05 level of significance .......... 94
F-table at 0.1 level of significance .......... 97
χ2 distribution table .......... 100
BIBLIOGRAPHY .......... 103
LIST OF FIGURES .......... 105
NOTES .......... 111

INTRODUCTION

Dr Arkadiusz M. Kowalski, Tomasz M. Napiórkowski

Is this book for you? If you are connected with econometrics in any way, this book is for you. If you are just starting the subject, this book will provide you with the basic theory and show you how to use it effectively through the employment of econometric software (EViews,1 for example). On the other hand, if you already have some experience, then this book will be a useful bring-it-all-together place that you may want to visit every time you have a question about statistical tests, degrees of freedom or other questions about econometrics and its uses. So, now that we know that this book is for you, welcome to the place where some of the most common econometric theories and methodologies are explained using a step-by-step look at the blueprint of econometric research, starting with the raw data and ending with the final ready-to-submit model, its testing and interpretation. Each chapter uses everyday-language explanations, so there is no worry that you will be overwhelmed by pages of equations or words that would require you to carry a dictionary with you. Every theory and every statistical test is clearly defined and supported by a real-world example with software outputs and their full interpretation. Take note that this book is example- and hands-on heavy, with only the essential theory explained. At the end of the book you will find a full-length example of a research project that ties directly to the book by following its chapters and subsections. The use of a full, real-life example that takes the reader from the beginning to the end adds additional strength to the reader's understanding of the methodologies used. Clear references to specific sections of the book will provide a deep understanding of the workflow associated with performing an econometric study.

1 To find out more about this software, please visit: http://www.eviews.com/home.html.


Book Sections
The book is designed around eight chapters and the example.
1. Baseline: this section explains basic econometrics and associated notation as well as what data is used in the examples.
2. Data Types and Structural Equations Description: a comprehensive introduction to the three most common dataset types (cross-section, time-series and panel) and a how-to regarding the construction of the structural equation.
3. Variables and Dataset Work: a look at how to efficiently name, examine, test, adjust and record variables, what they are, how to create and use dummy variables, as well as how to clean the dataset.
4. Model Determination: development of the model from the initial stage to its final version by employing the LM test for additional information in residuals.
5. Model Testing: a detailed look at detection, consequences and solutions to the problems of multicollinearity, autocorrelation and heteroscedasticity.
6. Model Results Interpretation: interpretation of the estimated and corrected model, its coefficients (including coefficient testing) and model descriptive statistics like R-squared.
7. Forecasting: a look at forecasting as a model verification tool, ex-ante and ex-post forecasting using the plug-in method and ARIMA models.
8. Conclusion: crafting finishing remarks.


CHAPTER ONE

Baseline
In order to make sure that all readers, regardless of their level of advancement in the subject, can use this book to their full benefit, this section covers basic topics and terms needed when working with econometrics and using this book to its full potential. Econometrics is the science of crafting mathematical models based on economic sets of data. As one will come to realize, data can be found on almost anything that is happening in the world; be it human behavior like consumer spending or decisions of Federal Reserve Banks on interest rates, all those decisions are recorded and used by others for analysis. Popular sources of data are federal banks (e.g., for the U.S., the Federal Reserve Bank of St. Louis Federal Reserve Economic Data, or FRED, is such an example1), special government agencies like the U.S. Bureau of Labor Statistics2 or world organizations in the vein of the World Bank,3 the Organisation for Economic Co-operation and Development (OECD),4 EuroStat5 and the International Monetary Fund (IMF).6

1.1. Equations Explained
In this book, all models will be built based on the "skeleton" with n independent variables, as can be seen in Equation 1:

1 For more information see: http://research.stlouisfed.org/fred2/.
2 For more information see: http://bls.gov/.
3 For more information see: http://data.worldbank.org/.
4 For more information see: http://stats.oecd.org/.
5 For more information see: http://epp.eurostat.ec.europa.eu/portal/page/portal/statistics/search_database.
6 For more information see: http://www.imf.org/external/data.htm.


Equation 1. Basic structural equation, i.e., the skeleton

Y = β0 + β1X1 + β2X2 + … + βnXn + ε
Source: Authors' own equation

Here, the following symbols are used:

Y – the dependent variable. It is called the dependent variable because its value depends on other variables. This is the variable that the model is aiming to explain.
β0 – a parameter called the constant term. Its presence is required in all estimated models.
β1 – a parameter called the coefficient of the first independent (or explanatory) variable. When we talk about estimating the model, we are referring to estimating these coefficients as well as the estimation of the constant term. After the estimation is completed, especially if all of the explanatory variables have identical units (dollars, for example), it is useful to list them in the estimated model in the order of magnitude. This benefits the reader of the report by immediately informing him or her which of the used variables have the greatest impact on the dependent variable.
X1 – the first independent or explanatory variable. This is one of the variables used in explaining the movement or the value of the dependent variable. It is also called the independent variable because it comes from the dataset and, ideally, does not directly depend on any other variables in the model.
ε – the error term. This accounts for any inaccuracies in the model. Since there is no such thing as a perfect model, the gap between the estimated model and the "perfect model" that predicts values of the dependent variable equal to its actual values is called the residual.

1.2. Parameter Estimation Methods
There are many methods that allow the researcher to obtain a model and its parameters, each suited for specific situations. The word "method" really refers to the way the parameters, βn, are estimated. Such approaches include: Ordinary Least Squares (OLS), Generalized Least Squares (GLS), Weighted Least Squares (WLS), Two- and Three-Stage Least Squares (2SLS and 3SLS) and the Generalized Method of Moments (GMM). This book will mostly employ Ordinary Least Squares, which can be modified by, for example, adding cross-section and time fixed or random effects, as it is the most common and the easiest one to use, and it adequately serves the purpose of estimating models like the ones talked about in this book.

For detailed explanations of OLS and other approaches, as well as detailed mathematical explanations of concepts covered in this work, I suggest you refer to books that are theory heavy. One book worth recommending is Econometric Models and Economic Forecasts by Robert S. Pindyck and Daniel L. Rubinfeld.7
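While this book carries out all estimations in EViews, the same OLS mechanics can be replicated in code. Below is a minimal, hedged sketch in Python using the statsmodels package; the variable names and the simulated data are assumptions made purely for illustration, not an example from this book.

```python
# A minimal OLS sketch (Python / statsmodels); all data are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100
gdp = rng.normal(100, 10, n)      # hypothetical independent variable X1
exports = rng.normal(50, 5, n)    # hypothetical independent variable X2
imports = 2.0 + 0.5 * gdp + 0.3 * exports + rng.normal(0, 1, n)  # Y

X = sm.add_constant(np.column_stack([gdp, exports]))  # adds the constant term (beta_0)
model = sm.OLS(imports, X).fit()                      # Ordinary Least Squares
print(model.summary())            # coefficients, t-statistics, R-squared, etc.
```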

1.3. Hypothesis Testing
Hypothesis testing is used to statistically test models and their parts as well as other statistics-based questions, e.g., the difference of means of two collected data sets. The hardest step in performing a statistical test is correctly setting up the two hypotheses. The first one is called the null hypothesis (H0) and the second one, adequately, is referred to as the alternative hypothesis (HA or H1), which, as can be expected, states the opposite of the null. When performing a test, the decision is made whether to reject or fail to reject the null hypothesis. The decision depends on the decision rule, which states that the null hypothesis is to be rejected if the observed value exceeds the critical value or, equivalently, if the p-value is less than the established level of significance. The p-value represents the probability that, given the random sample, the difference between sample means is as large as, or larger than, the one being observed. The level of significance is how much error we are allowing to exist in the model: at 5% a test is much more restrictive than at 10%. Depending on the area of research, different levels can be implemented.8 For example, in marketing the levels will be greater than when performing research in the medical field.
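As a small illustration of the decision rule, the sketch below runs a difference-of-means test (one of the statistics-based questions mentioned above) on simulated samples; the data and the 5% level of significance are assumptions made for demonstration only.

```python
# Difference-of-means test with the reject / fail-to-reject decision rule.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample_a = rng.normal(10.0, 2.0, 50)   # simulated data set 1
sample_b = rng.normal(10.8, 2.0, 50)   # simulated data set 2

alpha = 0.05                           # established level of significance
t_obs, p_value = stats.ttest_ind(sample_a, sample_b)

# H0: the two means are equal; H1: the means differ.
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```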

1.4. Using Statistical Tables
When performing a statistical test, the researcher, through the use of appropriate formulas, arrives at the observed value, which he or she then compares with the critical value obtained from statistical tables – tables that list values for different distributions based on degrees of freedom and the level of significance (more on this in later parts of the book). The main distributions include the t-distribution, the F-distribution and the Chi-square (χ2) distribution. The way of using these tables is explained in the book on the first occasion each is used. All tables are included at the end of the book.
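Readers without the printed tables at hand can reproduce the critical values with any statistical package. A minimal sketch using scipy's inverse-CDF (percent-point) functions follows; the degrees of freedom chosen here are hypothetical.

```python
# Reproducing statistical-table critical values with scipy.
from scipy import stats

alpha = 0.05                                     # level of significance
t_crit = stats.t.ppf(1 - alpha / 2, df=120)      # two-tailed t critical value
f_crit = stats.f.ppf(1 - alpha, dfn=3, dfd=120)  # F critical value
chi2_crit = stats.chi2.ppf(1 - alpha, df=15)     # Chi-square critical value
print(t_crit, f_crit, chi2_crit)
```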

7 Pindyck, Rubinfeld (1998).
8 Do not worry, all this will be clear when we move to examples.



1.5. Econometrics Software
There are many different econometrics software packages available, each with unique strengths, limitations and designations (e.g., SPSS is used in social studies). EViews, SAS,9 STATA10 and SPSS,11 just to name a few, are the most common ones used. In this book, all of the outputs, models, graphs and estimations will be done using the EViews package, which can be acquired at http://www.eviews.com/. One of the main advantages of this software is that it is very easy to use as well as visual in providing econometric solutions. Another benefit is that it allows working with the most common types of datasets (each of which is explained in detail in the Data Types and Structural Equations Description chapter). Lastly, it also allows using many statistical methods, some of which were mentioned in this chapter's section titled Parameter Estimation Methods.

9 For more information see: http://www.sas.com/.
10 For more information see: http://www.stata.com/.
11 For more information see: http://www-01.ibm.com/software/analytics/spss/.


CHAPTER TWO

Data Types and Structural Equations Description
Depending on the type of research, datasets – the way the data is arranged – can be divided into three main categories: time-series, cross-section and panel data. Each econometrics project that aims at estimating the model requires setting up a structural equation that can be viewed as a skeleton on which the final model will be constructed. Such equations have their unique specifics that depend on the data set that is being used.

2.1. Cross-section, Time-series and Panel data defined
Cross-section data looks at many variables at one particular moment in time, a snapshot of the entire situation. That is why we say that it is a one-dimensional dataset. An example of such data would be an attempt to estimate the price of a house by looking at individual factors of the house sold (number of rooms, size of the house, and presence of a pool, for example). To perform such research, a dataset would consist of many observations (houses), each with a sale price as well as the above-suggested data points.

Time-series data observes a particular variable (or a few variables) through a set time period (for example, the U.S. imports from the first quarter of 1960 to the last quarter of 2010); the letter t is usually used to depict the period in which the measurement has been taken. Time-series data is used in most macroeconomic models (the U.S. GDP as a function of consumer spending, trade deficit, government spending and whether the country is in a recession or not, for example).

Panel data consists of observations of subjects (that are our dependent variables) over a specified period of time; a combination of cross-section and time-series sets. A good example is a set of data looking at profits of the top 10 transportation companies in the U.S. over 10 years. Of course, each of the firms comes with its own set of explanatory data points – an example of such a set is included in Table 1.

Table 1. An example of panel data with averages per firm and per year listed in the last row and the last column, respectively

Year / Firm    | Firm 1 | Firm 2 | ... | Firm 10 | Industry's Average
2000           | 10     | 42     | --- | 44      | 32.00
2001           | 22     | 15     | --- | 21      | 19.33
2002           | 53     | 62     | --- | 37      | 50.67
...            | ---    | ---    | --- | ---     | ---
2008           | 17     | 20     | --- | 52      | 29.67
2009           | 9      | 11     | --- | 5       | 8.33
Firm's Average | 22.2   | 30     | --- | 31.8    |

Source: Authors' own table on original theoretical data.

Some of the advantages of panel data are: 1) a large number of data points (allowing for increased accuracy and additional degrees of freedom), 2) a combination of time-series and cross-section approaches that minimizes the probability of the omitted-variables problem. In the firm example, the use of panel data allows the researcher not only to measure the variation in profits of a single company over time but also to measure the variation in profits between companies. The richness of panel data is also the source of its problems, as it brings together issues from both cross-section and time-series sets. This book focuses on research done using cross-section and time-series sets of data. For a detailed look at working with panel data, there is no better place than Econometric Analysis of Cross Section and Panel Data by J.M. Wooldridge.1

1 Wooldridge (2010).

2.2. Structural Equation Description
The structural equation is the basic representation of the model to be estimated. It provides the reader with a quick mathematical view of what it is that the research is going to do and what it is trying to achieve.

2.2.1. Cross-Section structural equation
For a simple cross-section dataset, a structural equation in linear form that attempts, for example, to model house sale price (SalePrice) as a linear function of the area of the house (Area), number of bedrooms (Beds) and the existence of an in-ground pool (Pool) will look like Equation 2:



Equation 2. Simple linear form structural equation for working with a cross-section data set, with i representing a specific observation

SalePricei = β0 + β1Areai + β2Bedsi + β3Pooli + εi

Source: Authors' own equation.

The interpretation of coefficients obtained with a model in a simple linear form is very straightforward – a one-unit increase in the independent variable X will impact the dependent variable Y by βX of its units, where βX is the coefficient of the X independent variable. For example, referring to Equation 2, assume that the sale price of a house (SalePrice) is reported in U.S. dollars, the area of a house (Area) is reported in square meters and that the value of β1 is 1520. In this case the interpretation of β1 is as follows: An increase in the area of the house by one square meter will increase the price of the house by 1,520 U.S. dollars. A semi-log form (Equation 3 and Equation 4) of the same model would have at least one variable (dependent or independent) in the model logged – common practice is to log either the entire right-hand (linear-log form) or left-hand (log-linear form) side and to use the natural logarithm, ln.

Equation 3. Simple semi-log form structural equation for working with a cross-section data set, with i representing a specific observation – log-linear form

lnSalePricei = β0 + β1Areai + β2Bedsi + β3Pooli + εi

Source: Authors' own equation.

Equation 4. Simple semi-log form structural equation for working with a cross-section data set, with i representing a specific observation – linear-log form

SalePricei = β0 + β1lnAreai + β2lnBedsi + β3lnPooli + εi

Source: Authors' own equation.

In the case of semi-log forms, the interpretation of the coefficient is a bit more complicated. Starting with the log-linear form (Equation 3), a one-unit increase in the independent variable X will impact the dependent variable Y by 100βX %, where βX is the coefficient of the X independent variable. Holding all the assumptions from the linear-form example, let us assume now that β1 equals 0.20. In this case the interpretation of β1 is as follows: An increase in the area of the house by one square meter will increase the price of the house by 20%.

Moving to the linear-log form (Equation 4), a one-percent increase in the independent variable X will impact the dependent variable Y by 0.01βX of its units, where βX is the coefficient of the X independent variable. Given a β1 value of 3000, its interpretation is as follows: An increase in the area of the house by 1% will increase the price of the house by 30 U.S. dollars. A log-log form (also known as a full-log form, Equation 5) has all variables in logs.

Equation 5. Simple full-log form structural equation for working with a cross-section data set, with i representing a specific observation

lnSalePricei = β0 + β1lnAreai + β2lnBedsi + β3lnPooli + εi

Source: Authors' own equation.

In the case of a full-log form, the interpretation is simpler: a one-percent increase in the independent variable X will impact the dependent variable Y by βX %, where βX is the coefficient of the X independent variable. Assigning β1 the value of 4, its interpretation in Equation 5 is as follows: An increase in the area of the house by 1% will increase the price of the house by 4%.
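To make the contrast between functional forms concrete, here is a hedged sketch (Python, statsmodels) that estimates the linear and log-linear forms of Equation 2 on simulated house data; the data-generating coefficients are assumptions, not results from this book.

```python
# Linear vs. log-linear functional forms on simulated house data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "Area": rng.uniform(50, 250, n),   # square meters (simulated)
    "Beds": rng.integers(1, 6, n),
    "Pool": rng.integers(0, 2, n),
})
df["SalePrice"] = np.exp(11 + 0.004 * df["Area"] + 0.05 * df["Beds"]
                         + 0.10 * df["Pool"] + rng.normal(0, 0.1, n))

linear = smf.ols("SalePrice ~ Area + Beds + Pool", data=df).fit()
loglin = smf.ols("np.log(SalePrice) ~ Area + Beds + Pool", data=df).fit()

# Linear form: beta_Area is U.S. dollars per extra square meter.
# Log-linear form: 100 * beta_Area is the percent change per extra square meter.
print(linear.params["Area"], 100 * loglin.params["Area"])
```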

2.2.2. Time-Series structural equation
When presenting the reader with a functional form of a model based on time-series data (one with a time factor), a subscript to represent the time period should be added. The equation (Equation 6) that regresses the U.S. Imports (IM) on the U.S. GDP (GDP), the U.S. Exports (EX) and Change in Inventory (chg_inv) has the following structural form.

IMt = β0 + β1GDPt + β2EXt + β3chg_invt + εt

Source: Authors' own equation.

This structural equation, just like the model presented in Equation 2, can also be transformed into its semi-log and full-log forms.

2.2.3. Panel structural equation
As can be expected, the structural equation of the model that is to be estimated based on panel data will combine the features of Equation 2 and Equation 6. For example, when attempting to model inward foreign direct investment from the U.S. (IFDI) to six countries (i = 1, 2… 6) over ten years (from the year 2000 to the year 2009, that is, t = 2000, 2001… 2009) as a function

of hosts' gross domestic products (GDP), their exports (X) and costs of labor (LCOST), Equation 7 can be used.

Equation 7. Simple linear form structural equation for working with a panel data set, with i representing cross-section elements, i.e., host countries, and t representing time-series elements, i.e., a specific year

IFDIit = β0 + β1GDPit + β2Xit + β3LCOSTit + εit

Source: Authors' own equation.

This structural equation, just like the model presented in Equation 2, can also be transformed into its semi-log and full-log forms. Notice that in all of the above-shown cases, the constant, β0, does not have a cross-section or a time subscript, unlike the error term, εit.
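As a rough illustration only (the book itself works in EViews), an Equation 7-style panel can be estimated with pooled OLS in Python; the six host countries, the ten years and all values below are simulated assumptions.

```python
# Pooled OLS on a simulated panel shaped like Equation 7.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
rows = [(c, t) for c in "ABCDEF" for t in range(2000, 2010)]  # i = 6 countries, t = 10 years
df = pd.DataFrame(rows, columns=["country", "year"])
df["GDP"] = rng.normal(100, 20, len(df))
df["X"] = rng.normal(40, 8, len(df))
df["LCOST"] = rng.normal(15, 3, len(df))
df["IFDI"] = (1 + 0.3 * df["GDP"] + 0.2 * df["X"] - 0.5 * df["LCOST"]
              + rng.normal(0, 2, len(df)))

# Pooled OLS; adding C(country) and/or C(year) to the formula would
# introduce the cross-section and time fixed effects mentioned in 1.2.
pooled = smf.ols("IFDI ~ GDP + X + LCOST", data=df).fit()
print(pooled.params)
```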


CHAPTER THREE

Variables and Dataset Work
Proper treatment of variables is probably one of the most crucial steps in setting up a successful project. Indistinct names, errors, missing values and other mistakes are bound to occur and can invalidate the entire research. This section aims to show how to avoid such pitfalls. Additionally, it is very important and useful for reference and further research that, as work is conducted, all steps and changes are documented.

3.1. Naming Variables
After the literature review and establishing which variables are going to play a role in obtaining the model, the next step is to properly name them all. There are two rules to doing so properly: 1) keep it short – it is very likely that the names will have to be entered multiple times while using the software package, 2) be sure that you can recognize the name. For example, if one of your variables is Disposable Income, naming the variable disposable_income is not efficient (rule 1 violation) and naming it Yd, if you are not familiar with using the letter Y to represent income, infringes on the second rule. Again, writing everything down is crucial. Table 2 represents an example of a good way of keeping track of your variables.


Table 2. Variables Info Table

Name                   | Symbol in the model | Unit                                 | Source of data | Transformations
Gross Domestic Product | GDP                 | Constant 2000 United States Dollars  | World Bank     | NA

Source: Authors' own table.

3.2. Examining Variables
When dealing with time-series data, it is a good practice to examine how the variable moves over time. For example, the gross domestic product (GDP) data for the United States in its graphical representation is shown in Graph 1.

Graph 1. The U.S. gross domestic product (left-hand axis, in billion USD)

Source: Authors’ own graph of data from International Monetary Fund.

A simple analysis of the variable presented in Graph 1 should follow for each of the variables. An example of such an analysis is: As expected, as time progresses, GDP increases; therefore, it has an upward trend and it appears to be a non-stationary variable. When looking at time as the only component, it is useful to add a trendline (this can be done in Microsoft Excel, for example1).2

1 This can be done by right-clicking the line on the graph and choosing "Add Trendline…" Here, a regression can be fitted in various forms (exponential, linear, logarithmic, power, polynomial and moving average). It is also useful to check the "Display Equation on chart" box and the "Display R-squared value on chart" box – more on these topics later in the book.
2 Be very careful when analyzing and falling back on these results. As much as these tools are helpful in providing some insight, these insights are very limited, as the presented model uses the horizontal axis' variable, in this case time, only to model the vertical axis' variable.


Graph 2. The U.S. gross domestic product (left-hand axis, in billion USD) with a linear trendline

Source: Authors’ own graph of data from International Monetary Fund.

A visual analysis can also give an indication of whether we should use a linear, square (a parabolic shape – think of the capital letter U upright or upside down), cube (wave-like) or a log form (a half-parabola on its side with the tip being the intercept term) of the variable in the model.

3.3. Stationarity Test
In the case of cross-section data, since there is no time factor, this analysis can be skipped. When dealing with time-series data, a variable needs to be tested and corrected for nonstationarity. By definition, a stationary variable has its mean, variance and autocorrelation constant over time. There are three general tests to see if the variable is stationary: 1) the visual test (also known as the ocular test, which is the easiest), 2) the correlogram, 3) the Augmented Dickey-Fuller test. The ocular test can be done by plotting the data in levels – without any adjustments – as presented in Graph 1. If an average line drawn (in this case the linear trend line, Graph 2) is not close to a horizontal line, the data is considered to be nonstationary.


Table 3. An example of a correlogram of data with a unit root present

Lag | Autocorrelation | Partial Correlation
1   | .|*******       | .|*******
2   | .|******        | .|.
3   | .|*****         | .|.
4   | .|****          | .|.
5   | .|***           | .|.
6   | .|**            | .|.
7   | .|*             | .|.

Source: Authors' own graph based on results obtained with EViews software.

The correlogram (again, in levels), here presented in Table 3, in the case of nonstationary data will have the Autocorrelation bars decreasing only slightly, and the Partial Correlation will have one bar that represents a unit root. More on autocorrelation in Chapter 5: Model Testing. In this type of output, the extent of the bar is represented by the number of stars; the longer the bar, the more stars are used to represent it.

Table 4. Output of the Augmented Dickey-Fuller test

                                       | t-Statistic | Prob.*
Augmented Dickey-Fuller test statistic | 0.885655    | 0.9952
Test critical values: 1% level         | -3.464280   |
                      5% level         | -2.876356   |
                      10% level        | -2.574746   |

Source: Authors' own table based on results obtained with EViews software.

The hypotheses setup for the Augmented Dickey-Fuller test used to detect the presence of a unit root (data being nonstationary) is:

H0: the variable is nonstationary
H1: the variable is stationary

The analysis of the Augmented Dickey-Fuller output (presented in Table 4) looks first at the test t-statistic (0.885655) and compares it with the test critical value (negative 2.876356) at a chosen level of significance, i.e., 5%. Also, Prob. (the p-value associated with the test) equals 0.9952, which is greater than the established 5% level of significance (0.05). Based on the test's results, we fail to reject the null hypothesis and therefore conclude that the variable in question is not stationary. This conclusion is a result of the test t-statistic being greater than the test's critical value and Prob. being more than 0.05.

3.3. Stationarity Test To solve the problem of nonstationarity, differencing is applied; Yt – Y(t-1). Differencing takes the observation of the past period’s value (Y(t-1)) and subtracts it from the observation from the current period (Yt). Graph 3. A graphical representation of the U.S. GDP after it has been transformed into a stationary variable via first-order differencing; D (GDP)

Source: Authors’ own graph based on results obtained with EViews software.

Table 5. A correlogram of the U.S. GDP after it has been transformed into a stationary variable

Lag | Autocorrelation | Partial Correlation
1   | .|**            | .|**
2   | .|**            | .|*
3   | .|*             | .|.
4   | .|*             | .|.
5   | .|.             | .|.

Source: Authors' own table based on results obtained with EViews software.

Table 6. The Augmented Dickey-Fuller test output testing the 1st difference of the U.S. GDP for stationarity (only relevant information included)

                                       | t-Statistic | Prob.*
Augmented Dickey-Fuller test statistic | -5.477582   | 0.0000
Test critical values: 5% level         | -2.876356   |

Source: Authors' own table based on results obtained with EViews software.

Stationary data will have a graph with the overall linear trend being nearly horizontal (Graph 3), a correlogram like the one in Table 5, and an Augmented Dickey-Fuller output (Table 6) with the test t-statistic (-5.477582) being less than (greater in absolute value than) the critical value at the desired confidence level (-2.876356), with Prob. = 0.00. Sometimes taking the first difference is not enough. If that is the case, differencing should be repeated until the data is proved to be stationary. This should be done within the realm of reason; the second degree is usually the highest degree of differencing.
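For readers working outside EViews, the same unit-root test is available in code. The hedged sketch below applies statsmodels' adfuller function to a simulated random walk, in levels and in first differences; the series is an assumption standing in for a variable like GDP.

```python
# Augmented Dickey-Fuller test in levels and after first differencing.
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
series = np.cumsum(rng.normal(1.0, 5.0, 200))   # simulated trending (nonstationary) series

for name, data in [("levels", series), ("1st difference", np.diff(series))]:
    adf_stat, p_value, *_ = adfuller(data)
    verdict = "stationary" if p_value < 0.05 else "nonstationary"
    print(f"{name}: t-statistic = {adf_stat:.3f}, Prob. = {p_value:.4f} -> {verdict}")
```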

3.4. Correlation Matrix Analysis
Next should be the analysis of the Correlation Matrix (a table of correlation coefficients between variables). This has the following goals: 1) to see if there is a linear relationship between the dependent variable and the chosen independent variables, 2) to see the relative strength of the relationship, 3) to see the sign of the relationship, 4) to assess the possibility of multicollinearity.

Table 7. A correlation matrix for the number of the U.S. FDI firms and the GDP in two regions in Poland

Region             | Pearson Correlation | Sig. (2-tailed)
DOLNOŚLĄSKIE       | -0.246              | 0.639
KUJAWSKO-POMORSKIE | 0.909               | 0.012

Source: Authors' own table based on results obtained with SPSS software.

Let us go over points 1 through 4 by looking at the example data shown in Table 7. The null hypothesis states that the coefficient of correlation is equal to zero, therefore stating that there is no linear correlation between the two tested variables. In the example, the p-value for the correlation coefficient of -0.246 between the number of the U.S. FDI firms in the Dolnośląskie region and that region's GDP is equal to 0.639. Since this value is significantly above any logical and practical level of significance, the conclusion is that there is no evidence of a linear relationship between the two tested variables. When looking at the Kujawsko-Pomorskie region, since the p-value is equal to 0.012, i.e., less than the one set at a 5% level of significance (0.05), a statement can be made that there is a high, positive and statistically significant linear correlation between the two tested variables for this region. When describing correlation

between two variables, it is important to make a note of three facts: one, the strength of the correlation; two, the direction (is the correlation coefficient positive/negative, suggesting that as one variable increases the other increases/decreases); and three, the statistical significance. The correlation matrix should also be used to look at correlation coefficients between independent variables in order to detect multicollinearity, which occurs when one explanatory variable is highly correlated with another. For example, if a model were to use a household's income and a household's taxes, where the latter is a derivative of the former, there would be a strong suspicion of multicollinearity. The rule-of-thumb is that if the correlation coefficient (which suggests the strength of a linear relationship between two variables) is greater than 0.8, then we can expect multicollinearity (which is also suspected when the model has a significantly high R-squared and very small, in absolute value, t-statistics). More on this problem, its consequences and its solutions in the Model Testing chapter. It is important to note that just because two or more variables are highly correlated with each other, it does not mean that one causes another. For example, the U.S. imports and the U.S. GDP are highly correlated, but it does not automatically mean that one causes another. Consider the same example: does a high correlation coefficient between the U.S. imports and the U.S. GDP signify that changes in the U.S. imports cause changes in the U.S. GDP, or do changes in the U.S. GDP cause changes in the U.S. imports? This question can be answered by falling back on the theory, but on the basis of the correlation coefficient alone such a question is impossible to answer.3

3 A hint into the cause-and-effect relationship can also be given by the Granger Causality test; see: Pindyck, Rubinfeld (1998), pp. 242–245.
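A Table 7-style check can be reproduced outside SPSS as well; the sketch below uses scipy's pearsonr function, which returns both the Pearson coefficient and its two-tailed p-value. The two series are simulated stand-ins for a region's GDP and its count of FDI firms.

```python
# Pearson correlation with its two-tailed significance level.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
gdp = rng.normal(100, 10, 30)                   # simulated regional GDP
fdi_firms = 0.9 * gdp + rng.normal(0, 4, 30)    # simulated, related series

r, p = pearsonr(fdi_firms, gdp)
# H0: the correlation coefficient equals zero; reject when p < 0.05.
print(f"Pearson Correlation = {r:.3f}, Sig. (2-tailed) = {p:.3f}")
```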

3.5. Descriptive Statistics
The next-to-last step in the analysis of variables is to look at the statistical summary (Table 8), usually provided within the econometrics software, of all the variables. The mean (the average value), median (the value in the middle of the set), mode (the most common value), extreme values (the minimum and maximum) and the number of observations should be examined – it is important that all variables have the same number of observations (196 in this case), as missing values will significantly distort the estimated model's coefficients. When looking at dummy variables, the mean will represent the percentage of observations that were coded with 1 (for example, 18.3673% of all observations took place during a recession).


To see if the variable has a normal distribution, the researcher can use three statistics. First, Skewness shows the distribution of the mass of the variable: to the left with a long tail on the right (positive value), or to the right with a long tail on the left (negative value). Kurtosis, on the other hand, measures how flat or how tall the distribution is, with an ideal value of 3; the lower/higher the value, the flatter/more peaked the distribution is. Third, the Jarque-Bera statistic can be used to test for normal distribution, with the null hypothesis stating that the variable is normally distributed. In this example (Table 8), as the p-values (Probability) are essentially zero (well below 0.05), we reject the null in favor of the alternative. Still, it needs to be remembered that the assumption of normal distribution is an "ideal" one and very often does not work in the real world.4

4 For more see: Wooldridge (2010).

Table 8. Descriptive statistics of the U.S. imports, the U.S. exports and a dummy variable for recession

             | IM       | EX       | RECES
Mean         | 735.4768 | 561.9069 | 0.183673
Median       | 490.3720 | 355.4060 | 0.000000
Maximum      | 2208.336 | 1670.431 | 1.000000
Minimum      | 108.4540 | 94.75800 | 0.000000
Std. Dev.    | 635.9210 | 441.4533 | 0.388209
Skewness     | 1.049200 | 0.827283 | 1.633843
Kurtosis     | 2.767814 | 2.426459 | 3.669444
Jarque-Bera  | 36.40038 | 25.04337 | 90.86179
Probability  | 0.000000 | 0.000004 | 0.000000
Sum          | 144153.4 | 110133.7 | 36.00000
Sum Sq. Dev. | 78857124 | 38001795 | 29.38776
Observations | 196      | 196      | 196

Source: Authors' own table based on results obtained with EViews software.
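A comparable summary can be produced outside EViews. The sketch below, on simulated import and export series, combines pandas' describe() with scipy's skewness, kurtosis and Jarque-Bera functions; none of the numbers it prints are the Table 8 values.

```python
# Descriptive statistics and a Jarque-Bera normality check (simulated data).
import numpy as np
import pandas as pd
from scipy.stats import jarque_bera, kurtosis, skew

rng = np.random.default_rng(5)
df = pd.DataFrame({"IM": rng.lognormal(6.0, 0.8, 196),
                   "EX": rng.lognormal(5.8, 0.7, 196)})

print(df.describe())                 # mean, std. dev., min, max, quartiles
for col in df:
    jb_stat, p = jarque_bera(df[col])                # H0: normally distributed
    print(col, "Skewness:", round(skew(df[col]), 3),
          "Kurtosis:", round(kurtosis(df[col], fisher=False), 3),  # 3 = normal
          "Jarque-Bera Prob.:", round(p, 6))
```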

3.6. Hypotheses Formulation
The next step, taken after performing all of the analytical steps presented in the previous section, is to construct hypothesis tests for each variable based on economic theory and the literature review. For example, for GDP in relation to imports, the hypotheses regarding the sign of the coefficient of the GDP explanatory variable are as follows: H0: βGDP ≤ 0 and H1: βGDP > 0, where we want to statistically reject the null hypothesis, therefore


allowing for a statement that GDP has a positive and a statistically significant impact on the dependent variable, i.e., the U.S. imports. A summary of the information for all variables can be presented in the form of a table (e.g., Table 9).

Table 9. A summary of information for the U.S. GDP variable

Variable | Name in the model | Alternative Hypothesis
U.S. GDP | GDP               | H1: βGDP > 0

Source: Authors' own table.

3.7. Dummy Variables
It is often the case that some information cannot be directly input into the model. Variables like sex (male, female), race (white, black, for example), location (Washington, Richmond, for example) and many more need to be transformed prior to their use. Another important use of dummy variables is to distinguish between two periods. When looking at any variables around an economic or a social event, a researcher may want to designate those observations that took place prior to the event versus those that followed. For example, Poland joined the European Union in the year 2004; as a result, a dummy variable (coded EUDV) can be created that takes the value of zero for the years prior to the year 2004 and one for the years 2004 and after (see Table 10).

3.7.1. Dummy Variables: Example

Table 10. Dummy variable creation: European Union membership example

Year | EUDV
2002 | 0
2003 | 0
2004 | 1
2005 | 1
2006 | 1

Source: Authors' own table.

Here is another example. Suppose the researcher wants to see whether the sale price of a specific car – the left-hand-side variable of the original structural equation (Equation 8) – depends on the sex of the buyer. First, the setup.


Equation 8. Dummy variable creation: Sale price example, original equation (no dummy variable)

SalePricei = β0 + β1X1i + … + βnXni + εi

where:
SalePricei – the dependent variable; the sale price of the ith car
βn – the coefficient of the nth independent variable Xn
εi – the error term

Source: Authors' own equation.

Let us assume that we have the original data set as presented in Table 11. The first purchase was done by a male, the second by a female and the third by a male; the data coded this way cannot be effectively used in model determination. The solution is to simply assign the value of 1 if the buyer was a female, and the value of 0 if the buyer was a male.5 In this case, the original data set will be transformed to look like the one presented in Table 12.

Table 11. Dummy variable creation: Sale price example, original data set

Sale Price | Sex
$120,000   | M
$67,450    | F
$87,090    | M

Source: Authors' own table on original data.

Table 12. Dummy variable creation: Sale price example, transformed data set

Sale Price | SexDV
$120,000   | 0
$67,450    | 1
$87,090    | 0

Notice that when a variable is a dummy variable, it is very useful to mark that fact by, for example, adding capital DV at the end of its name.
Source: Authors' own table on original data.

This introduces one dummy variable to the original structural equation (Equation 8), which results in a new one (Equation 9).

5 It does not matter which sex takes which value as long as you have it clearly noted for interpretation purposes.


3.7. Dummy Variables Equation 9. Dummy variable creation: Sale price example, original equation (with a dummy variable)

SalePricei = β0 + β1X1i + … + βnXni + βn+1SEXDVi + εi

Source: Authors' own equation.

Given how SEXDV is coded (0 for male and 1 for female), the interpretation of its coefficient, βn+1, is as follows:
1) if the coefficient of the dummy variable SEXDV is positive, then a statement can be made that if the buyer is a female, the dependent variable, i.e., the price for which the car is sold, will be higher than in the case of a male buyer,
2) if the coefficient of the dummy variable SEXDV is negative, then a statement can be made that if the buyer is a female, the dependent variable, i.e., the price for which the car is sold, will be lower than in the case of a male buyer.

3.7.2. Dummy Variables: Pitfalls
One may be tempted to solve the example from the previous section by creating two dummy variables; namely, Male (MDV) and Female (FDV). The first taking the value of zero if the buyer was a female and one if the buyer was male; the second, the value of zero for a male buyer and the value of one for a female buyer. In this case, the transformed data set will look as presented in Table 13 and the structural equation will take the form shown in Equation 10.

Table 13. Dummy variable creation: Sale price example, transformed, version 2, data set

Sale Price | MDV | FDV
$120,000   | 1   | 0
$67,450    | 0   | 1
$87,090    | 1   | 0

Source: Authors' own table on original data.

Equation 10. Dummy variable creation: Sale price example, original equation (with two dummy variables)

SalePricei = β0 + β1X1i + … + βnXni + βn+1MDVi + βn+2FDVi + εi

Source: Authors' own equation.


This procedure shows the first and most common pitfall when employing dummy variables; namely, including all categories in the model. This creates the problem of multicollinearity (more on this later on), as MDV and FDV are perfectly correlated with each other. That is, as one increases from zero to one, the other, for the same ith observation, decreases from one to zero. Obviously, one can include just one of the two new variables: MDV showing if the buyer was male (value of one) or not (value of zero), or FDV showing if the buyer was a female (value of one) or not (value of zero). The interpretation will be parallel to the one made in the example from the previous section. The second common pitfall is when the researcher decides to base the model on too many dummy variables. The rule of thumb is that the model should not contain more than two, maximum three, dummy variables – this of course being subject to the fact that the model does not suffer from the problem of underspecification (too few explanatory variables) or the problem of overspecification (too many explanatory variables).6

6 The number of independent variables depends first and foremost on the number of observations and on the literature review.
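In code, the all-categories pitfall is easy to avoid by construction. A minimal pandas sketch follows, using the Table 11 data; get_dummies with drop_first=True keeps one category out, so the resulting dummy is not perfectly collinear with the constant term.

```python
# Avoiding the dummy variable trap: encode Sex with one dummy, not two.
import pandas as pd

df = pd.DataFrame({"SalePrice": [120000, 67450, 87090],
                   "Sex": ["M", "F", "M"]})

# drop_first=True drops one category (here F), leaving a single Sex_M column;
# keeping both Sex_M and Sex_F would reproduce the MDV/FDV multicollinearity.
dummies = pd.get_dummies(df["Sex"], prefix="Sex", drop_first=True)
df = pd.concat([df.drop(columns="Sex"), dummies], axis=1)
print(df)
```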

3.8. Data Cleaning
One of the main reasons for inspecting the data, in addition to getting a feel for the links between variables (i.e., correlations) as well as how variables change over time, is to determine whether there are any inconsistencies. Looking at extreme values, for instance, allows the researcher to identify miscoded entries. An example would be a house with 0 square footage, 22 bathrooms and 2 bedrooms, or an average minimum labor cost of 15 dollars an hour with a maximum value of 115 – all of which are clearly illogical and an error. Another way of finding such values or finding missing entries is to sort the data according to each variable to see which cells were left empty in the spreadsheet. Importantly, this should be done one variable at a time to avoid distorting the data. Identifying the problems is straightforward, whereas amending the issue can be as easy as deleting observations or as complicated as finding alternative ways of acquiring the missing data. Prior to deciding on the solution, it is important to note that retrieving the missing data is the preferred approach – this way the size of the data set, and therefore the number of degrees of freedom, is not decreased. If the researcher decides to look for an alternative source of data, say to complement the unit labor cost for Poland for the year 2004 (values for other years are known), it is crucial to look at the methodology behind the data collection of the first source and make sure that the data point that is being supplemented comes


from a source that employs the same methodology. Due to differences in methodology, differences can reach 30% – this issue is evident when looking at data on Foreign Direct Investment, for example. When dealing with cross-section data, where there is no continuity between observations, deletion of a single or even of a few observations usually does not cause concern; that is, as long as the sample size stays large enough. Deletion, though tempting, is not a good solution when working with time-series data, data that has a "flow" to it. Removing an observation from a time-series set (for example, for the first quarter of 1998, when looking at quarterly data from the year 1990 to the year 2010) creates a hole. If deletion is the only option while working with time-series data, it has to be done on a variable basis; that is, an entire variable for which data is missing is deleted. As can be expected, when put into a corner, that is, when deletion of an entire variable is not possible, it is possible to employ algorithms that will methodically supplement the missing data, e.g., supplementing the data according to its simplified, linear trend. A simpler alternative is an averaging method, see Equation 11.

Vt = (Vt+1 + Vt–1) / 2

Source: Authors' own equation.

Say the situation is as presented in Table 14, where the observation of GDP for the year 2004 (GDP2004) is missing and there is no possibility of obtaining it from another source. Deletion, as has been explained, is not an option as it distorts continuity.

Table 14. Supplementing the missing data example, original data set

Year | GDP (in billion)
2002 | 4
2003 | 6
2004 | ???
2005 | 9

Source: Authors' own table based on original data.

In this case, the missing value will be 7.5 = (9 + 6)/2 = (GDP2005 + GDP2003)/2. This method can be employed under the following conditions (a short code sketch of the calculation follows the list):
1) values of the missing variable continue to grow at a more-or-less steady pace; in other words, the value preceding the missing data point is not, for example, 5 while the subsequent value is 134,

2) the number of supplemented observations is minimal relative to the entire number of observations in the series,
3) the researcher employing this method – or, for that matter, any other method, regardless of its mathematical advancement – is aware of its limitations.
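A minimal sketch of the Equation 11 averaging on the Table 14 figures follows; for a single gap between two known points, pandas' linear interpolation reduces to exactly the simple average.

```python
# Filling the missing 2004 value with the simple average (Equation 11).
import numpy as np
import pandas as pd

gdp = pd.Series([4.0, 6.0, np.nan, 9.0],
                index=[2002, 2003, 2004, 2005], name="GDP")

filled = gdp.interpolate(method="linear")  # (6 + 9) / 2 for the single gap
print(filled[2004])                        # 7.5
```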

3.9. Data Description
The purpose of describing the data is to explain to the reader everything that he or she needs to know. It will include things like the sources of the data (International Monetary Fund, for example), its frequency and range (for time-series), the number of observations, any transformations done to the data and the methods used (for example, converting monthly data into quarterly data), assumptions (if using one kind of data to represent another, i.e., a proxy; for example, using daily market closing numbers to reflect customers' wealth) and the creation of dummy variables.


CHAPTER FOUR

Model Determination
Now that the variables have been examined and the structural equation is defined, the next step is model estimation. This part of the book outlines step-by-step the procedure of moving from raw data obtained from data sources to arriving at, correcting and interpreting the final model. There are many options when it comes to deciding which variables should be included in the model. Usually, the explanatory variables are decided on based on the literature review and then simply put into the model. The problem arises when there is the issue of oversaturation of the literature with possible determinants.1 In this situation, when there is no empirical research that dictates which independent variables should be used, the researcher is usually forced to rely on his or her subjective judgment, which, due to its nature, can be questioned by others and usually should be avoided in any research. Other solutions include, but are not limited to, stepwise approaches that add explanatory factors to an initial, very limited model based on some statistical property. This property is usually the maximization of the F-statistic or R-squared. Addressing some shortcomings of the stepwise method, three main issues should be understood. First, this method is not a substitute for a literature review. What this means is that it picks the variables from a given evoked set, regardless of their theoretical connection, or lack thereof, with the dependent variable. As a result, the set of possible explanatory factors should include only those variables that have a strong backing in the theory and in the literature on the topic being researched. Second, new variables are added based on their statistical importance, not their theoretical importance. As a result, the order in which the variables are entered is not necessarily the order of importance

1 For an article that shows the extent of this issue in research on foreign direct investment, see: B.A. Blonigen, J. Piger (2011), Determinants of Foreign Direct Investment, NBER Working Paper 16704.


from the point of view of theory and/or the impact a change in the independent variable will have on the dependent variable. Third, which variables are added depends on which variables are already in the model. Therefore, at least some variables should be forced into the model – based on their most common occurrence in the literature on the subject – to create the initial model.2

4.1. Model Estimation with a Forward Stepwise Method
In the forward stepwise method,3 the starting point is an initial model that consists of a small number of independent variables that were decided on based on commonalities in the articles read during the literature review. This section looks at the procedure from the manual point of view, that is, with all the steps carried out by the researcher. Despite the fact that this can be done automatically in such software packages as SPSS, other econometric programs (e.g., EViews) do not have the automatic option and require the following procedure to be conducted "by hand." All estimations are done with the Ordinary Least Squares method of estimation. Holding all other variables constant, the initial structural equation is presented in Equation 12. Notice that in this example, subscripts that would designate either cross-section (i) or time-series (t) modeling are substituted for simplicity with a.

Ya = β0 + β1X1a + β2X2a + β3X3a + β4X4a + β5X5a + εa Source: Authors’ own equation.

After the structural equation is estimated with econometric software, it becomes a model. The estimated model is shown in Equation 13. Since new explanatory factors are going to be added to this model, it is called the restricted model. Notice that, now that we are talking about an estimated model, all the parameters that have been estimated have a hat (^) on top of them and the error term becomes known as the residuals.

2 For more information on the stepwise approach and its limitations see: 1) B. Thompson (1989), Why Won't Stepwise Methods Die?, "Measurement and Evaluation in Counseling and Development," Vol. 21, pp. 146–149; 2) C.J. Hubert (1989), Problems with Stepwise Methods – Better Alternatives, "Advances in Social Science Methodology," Vol. 1, pp. 43–70; 3) J.S. Whitaker (1997), Use of Stepwise Methodology in Discriminant Analysis, paper presented at the annual meeting of the Southwest Educational Research Association, Austin, Texas, January 23, 1997.
3 The reason why this method is called "forward" is that the researcher starts with a small, restricted initial model and then adds new variables to it. If the opposite were the case, that is, if an unrestricted model with many explanatory variables were the starting point and the objective were to statistically drop independent variables, the method would be referred to as a backward stepwise method.

34

4.1. Model Estimation with a Forward Stepwise Method model, all the parameters that have been estimated have a hat (^) on top of them and the error term becomes known as the residuals. Equation 13. Model estimation with forward stepwise method example – initial structural, restricted model

Ya = β0 + β1X1a + β2X2a + β3X3a + β4X4a + β5X5a + εa Source: Authors’ own equation.

For the sale price of the house example that has been mentioned previously, the initial model would, for example, consist of the area of the house, its location (that is, a city or a state), its age, the number of rooms and the number of baths. In order to determine whether the initial model is sufficient or not, a statistical test, the Lagrange Multiplier (LM) test, should be implemented to check for the presence of additional information hidden in the residuals (the estimates of the errors). The reason why residuals are expected to hold additional information is that the restricted model, or any other model, only extracts the information relating to the independent variables used. As a result, there is always some information that is not accounted for. The mentioned test requires an auxiliary regression. Such an equation has the residuals (ε̂a) from the initial model (Equation 13) as the dependent variable, which is regressed on all explanatory variables collected by the researcher. In the example, there are overall 20 possible independent variables suggested by the literature, X1–X20, for which the data has been collected. In Equation 14, the structural equation has alphas (α) that designate the parameters to be estimated, and γa represents the error of the auxiliary regression.

Equation 14. Model estimation with forward stepwise method example – auxiliary structural equation

ε̂a = α0 + α1X1a + α2X2a + α3X3a + . . . + α19X19a + α20X20a + γa

Source: Authors' own equation.

The estimated auxiliary regression is shown in Equation 15.

Equation 15. Model estimation with forward stepwise method example – auxiliary structural model

ε̂a = α̂0 + α̂1X1a + α̂2X2a + α̂3X3a + . . . + α̂19X19a + α̂20X20a + γ̂a

Source: Authors' own equation.

When looking at the output of the auxiliary regression, it is important to note that the variables already included in the model (X1–X5) will have low (in absolute value) t-statistics and high p-values. The null hypothesis states that all of the coefficients on the newly considered variables in the auxiliary model are equal to zero, and therefore, there is no further information to be extracted. The alternative hypothesis states that at least one of the referred-to coefficients is not equal to zero.

H0: αk+1 = αk+2 = … = αk+m = 0 (no more information to be extracted)

H1: αk+i ≠ 0 for at least some i (some information that can be added)

The LM statistic has a Chi-square distribution and is shown in Equation 16, where n represents the number of observations and R²aux is the R-squared statistic from the auxiliary model (R-squared is described in detail in the Model's Results Interpretation chapter of this work).

Equation 16. Lagrange Multiplier formula

LM = nR²aux

Source: Pindyck, Rubinfeld (1998), p. 282.

The degrees of freedom would be the number of all available variables minus the number of variables used in the model being tested, that is, the initial (restricted, Equation 13) model (20 – 5 in this example).

Table 15. A section of the Chi-square table with error levels in the first row and degrees of freedom in the first column

Right tail areas for the Chi-square Distribution

df\area    0.25        0.1         0.05
1          1.3233      2.70554     3.84146
14         17.11693    21.06414    23.68479
15         18.24509    22.30713    24.99579
16         19.36886    23.54183    26.29623

Source: Authors' own table.

The first step after the calculation of the LM statistic is completed is to find the critical value. In our example, χ²critical for 15 degrees of freedom (the size of the set of possible explanatory variables net of the number of independent variables already used) at 5% (0.05) is 24.99579 and can be read from a Chi-square distribution table (a part of which is shown in Table 15). This value is compared with χ²observed from the LM formula.
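For readers who prefer to compute critical values in code rather than read them from a printed table, a minimal sketch in Python (using the scipy library, an assumption on our part, as the textbook itself works with EViews) could look as follows:

from scipy.stats import chi2

# Right-tail critical value at the 5% level with 15 degrees of freedom;
# ppf is the inverse CDF, so we ask for the 95th percentile.
critical_value = chi2.ppf(1 - 0.05, df=15)
print(round(critical_value, 5))  # approximately 24.99579, matching Table 15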

If the number of observations (n) is, for example, 900 and R²aux is 0.257, the LM statistic would be (900 · 0.257) = 231.3. As χ²critical is less than χ²observed (24.99579 < 231.3), the null hypothesis is rejected and a statement can be made that there is still some information to be added to the model. In order to determine which variables ought to be added to the model, an examination of the auxiliary regression's output should follow. The possible explanatory variables that have the highest (again, in absolute value) t-statistics, and therefore, the lowest p-values, should be added, as they are the most statistically significant. It is wise to add no more than two variables at a time; the safest course of action is to add one new independent variable at a time. Let us say that one new variable, X6, has been added to the original restricted model's right-hand side. After this addition, the new model looks as presented in Equation 17.

Equation 17. Model estimation with forward stepwise method example – unrestricted model

Ya = β̂0 + β̂1X1a + β̂2X2a + β̂3X3a + β̂4X4a + β̂5X5a + β̂6X6a + ε̂a

Source: Authors' own equation.

Notice that the model, after it has been expanded with the addition of a new explanatory element, is referred to as an unrestricted model. At this point, the new model should be tested again with the LM test. The procedure is to be repeated until we fail to reject the null hypothesis (in other words, until χ²critical > χ²observed). At that point, a statement can be made that the final model has been achieved; it should now be tested for multicollinearity, autocorrelation and heteroscedasticity, as described in the next chapter, titled Model Testing.
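As a rough illustration of the whole procedure, the following Python sketch (using pandas and statsmodels; the data frame df and the Y/X1..X20 column names are hypothetical assumptions, not part of the textbook's example) automates the LM-based forward stepwise loop described above:

import statsmodels.api as sm
from scipy.stats import chi2

def forward_stepwise_lm(df, dependent, forced, candidates, alpha=0.05):
    """Add candidate regressors one at a time until the LM test
    finds no further information hidden in the residuals."""
    included = list(forced)
    while True:
        # Estimate the current (restricted) model with OLS.
        res = sm.OLS(df[dependent], sm.add_constant(df[included])).fit()
        remaining = [v for v in candidates if v not in included]
        if not remaining:
            return res
        # Auxiliary regression: residuals on ALL collected regressors.
        aux = sm.OLS(res.resid, sm.add_constant(df[candidates])).fit()
        lm = len(df) * aux.rsquared              # LM = n * R^2_aux (Equation 16)
        if lm < chi2.ppf(1 - alpha, len(remaining)):
            return res                           # fail to reject H0: model is final
        # Add the candidate with the largest |t| in the auxiliary regression.
        included.append(aux.tvalues[remaining].abs().idxmax())

# Hypothetical usage: X1..X5 forced in, X1..X20 collected overall.
# final = forward_stepwise_lm(df, "Y", [f"X{i}" for i in range(1, 6)],
#                             [f"X{i}" for i in range(1, 21)])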


CHAPTER FIVE

Model Testing

After the model is estimated, it needs to be checked. The three most common and major problems are: multicollinearity, autocorrelation and heteroscedasticity. This chapter provides the definition of each of these three issues and the ways of detecting and remedying them.

5.1. Multicollinearity

Multicollinearity exists when two or more of the explanatory variables (for example, the U.S. GDP and the U.S. Exports in the U.S. Imports estimation example) are highly correlated with each other. Another cause of multicollinearity is overfitting or overspecification, which suggests that the researcher was adding independent variables simply to maximize R-squared without regard for their statistical significance. As mentioned earlier, another common cause of multicollinearity is associated with dummy variables. When using dummy variables, it is important to always leave one of the categories out. If, for example, the explained variable is believed to be dependent on the seasons of the year, four dummy variables would be created to reflect whether the observation took place in summer, autumn, winter or spring. But, when estimating the model, only three of the four dummy variables would be included to avoid the multicollinearity problem. If perfect multicollinearity is present, the software will not be able to estimate the model at all, as the matrix inversion required by the estimation method cannot be performed. There are two main ways of detecting this problem: one, the correlation matrix (shown in the 3.4. Correlation Matrix Analysis section, in Table 7) and, two, the examination of the regression output (discussed in detail in the next chapter).

If the correlation coefficient between any two independent variables is high (0.8 and above – again, a rule of thumb), multicollinearity can be a problem. Also, if the model has a very high R-squared statistic but the coefficients are not statistically significant (low t-statistics and high p-values), multicollinearity is to be expected. The most common remedies are to either increase the sample size (get more observations) or drop the variables that are the least significant (highest p-values) and/or are the main suspects of causing the problem. The latter solution needs to be performed with caution, as deletion of too many variables can lead to the problem of underspecification.
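Beyond the correlation matrix, one additional numerical diagnostic (not covered in the text above) is the variance inflation factor (VIF). The sketch below (Python with statsmodels; the data frame df and its column names are assumptions for illustration) flags suspect regressors; a VIF above roughly 10 is a frequently used rule of thumb:

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical design matrix: the candidate explanatory variables.
X = sm.add_constant(df[["GDP", "EX", "YD"]])

# VIF of each regressor: how much its variance is inflated by its
# correlation with the other regressors (1.0 = no correlation at all).
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))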

5.2. Autocorrelation

Autocorrelation (Serial Correlation) exists when a variable is a time function of itself (today is affected by yesterday, for example) and is a problem only when dealing with time-series and panel sets of data. If the problem occurs in a cross-section set, it can be either ignored or, preferably, the order of observations can be changed to solve the problem. The presence of autocorrelation causes the estimated coefficients of independent variables to be inefficient (though still unbiased). In addition, standard errors are biased and any individual hypothesis testing is invalid. Autocorrelation of the dependent variable can be detected by the ocular test of the residuals, the correlogram, the Breusch-Godfrey Serial Correlation LM test and the examination of the Durbin-Watson statistic.

Graph 4. Graph of residuals of a model with the U.S. imports (IM) as the dependent variable

Source: Authors' own graph based on calculations conducted with EViews software.

The residuals graph may look like the one in Graph 4, in which a pattern that suggests the presence of the problem of autocorrelation is visible. Here, one observation appears to be dictated by the one before it. These quick changes in the trend create the sharp tips of the graph. The main benefit of this approach is that it is quick, as it does not require any calculations. At the same time, its disadvantage comes in the form of the subjectivism of the researcher. As much as this method can be a good indicator, conclusions on the presence of autocorrelation should not be made solely based on it.

Table 16. An example of a correlogram output for the U.S. imports model

Autocorrelation    Partial Correlation    Lag    AC       PAC      Q-Stat    Prob.
.|*****            .|*****                1      0.751    0.751    114.36    0.000
.|****             .|*                    2      0.598    0.080    187.40    0.000
.|***              **|.                   3      0.375    -0.227   216.18    0.000
.|**               .|.                    4      0.228    -0.020   226.93    0.000
.|*                .|*                    5      0.169    0.147    232.85    0.000

Significant bars in the Partial Correlation column of the correlogram (Table 16) suggest that there is a problem of autocorrelation. The placement of these bars serves as an indicator of which order of autocorrelation is present in the model. In this case, the first and possibly the third order of autocorrelation can be expected, as those orders have the longest bars in the Partial Correlation column. The Prob. column provides p-values for each of the autocorrelation orders. It is important to note that initially a large number of autocorrelation orders will be found statistically significant, which is a big disadvantage of this approach. The reason for this is that the third order, for example, may be caused by the first and/or the second order of autocorrelation. On the plus side, this method of detecting autocorrelation provides more information than the ocular examination, as it suggests which orders of autocorrelation can be expected. The next possibility of detecting autocorrelation involves the use of the Breusch-Godfrey Serial Correlation Lagrange Multiplier test, for which the hypotheses are set up as follows:

H0: No Autocorrelation
H1: Autocorrelation exists


Table 17. An example of the Breusch-Godfrey Serial Correlation LM test output for the U.S. imports model

Breusch-Godfrey Serial Correlation LM Test:
F-statistic       425.9017    Prob. F(2,192)         0.0000
Obs*R-squared     163.2115    Prob. Chi-Square(2)    0.0000

Source: Authors’ own table based on calculations conducted with EViews software.
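For researchers working outside EViews, statsmodels exposes the same test; a minimal sketch (assuming res is an OLS results object fitted beforehand, as in the earlier examples):

from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Breusch-Godfrey test against autocorrelation up to order 2;
# returns the LM statistic, its Chi-square p-value, and an F variant.
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(f"Obs*R-squared = {lm_stat:.4f}, Prob. Chi-Square(2) = {lm_pvalue:.4f}")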

The LM statistic (Equation 16), as mentioned earlier, has the Chi-square distribution. For the U.S. imports example, the degrees of freedom equal the number of autocorrelation orders tested – 2, as reported in Table 17. From the Chi-square table χ²critical = 5.99, while χ²observed = 163.2115 (which we can either calculate or read from the Breusch-Godfrey Serial Correlation LM test output – Table 17); hence, we reject the null hypothesis and conclude that autocorrelation is present. This is the preferred way of approaching the issue of testing residuals for the presence of autocorrelation as, due to its mathematical nature, it removes all subjectivism and its interpretation is clear. The last way of determining the presence of autocorrelation is to examine the Durbin-Watson statistic (explained in more detail in the following chapter). The ideal value is 2.00. Anything below 2.00 suggests a positive autocorrelation, and a reading above 2.00 indicates the presence of a negative autocorrelation.

There are a number of ways to correct for autocorrelation (the Generalized Least Squares method or adding more significant variables, to name just two). The easiest two to implement are the introduction of an autoregressive (AR(p)) term, where the letter p indicates the order of the serial correlation, and the introduction of a lagged dependent variable as one of the explanatory variables. When using the AR(p) approach (Equation 18), it is important that AR(1) through AR(p) terms are introduced. For example, if there is third order autocorrelation (as suggested in Table 16), terms AR(1), AR(2) and AR(3) should be added to the model (Equation 19).

Equation 18. Structural equation with an AR(p) term

Yt = β0 + β1X1t + . . . + βnXnt + δ1AR(p) + εt

Source: Authors' own equation.

Equation 19. Structural equation with AR(p) terms 1 through 3

Yt = β0 + β1X1t + . . . + βnXnt + δ1AR(1) + δ2AR(2) + δ3AR(3) + εt

Source: Authors' own equation.


AR terms are subject to the same statistical significance tests as other coefficients (more on that topic in the next chapter). As much as they are easy to implement, their biggest drawback is that they are very hard, if at all possible, to interpret. When applying the second solution, after introducing the lagged dependent variable term (Yt–1) into the equation (as shown in Equation 20), all of the original coefficients (including the constant term) need to be adjusted to properly reflect their values, which have changed due to the correction.

Equation 20. Structural equation with lagged dependent variable as an additional explanatory variable

Yt = β0 + β1X1t + . . . + βnXnt + λ1Yt–1 + εt

Source: Authors' own equation.

The adjustment (Equation 21) requires dividing the original coefficient's estimated value (β̂n) by 1 minus the sum of all coefficients associated with lagged dependent variables used as explanatory variables.

Equation 21. Adjustment of the nth coefficient with r lagged dependent variables used as independent factors

β̂'n = β̂n / (1 – (λ̂1 + λ̂2 + . . . + λ̂r))

where:
β̂'n – the adjusted value of the original coefficient β̂n
λ̂m – the coefficient of the mth lagged dependent variable
r – the number of lagged dependent variables used as explanatory variables

Source: Authors' own equation.

For Equation 20, the adjustment of the coefficient of the first independent variable takes the form presented in Equation 22.

Equation 22. Adjustment of the 1st coefficient with one lagged dependent variable used as an independent factor

β̂'1 = β̂1 / (1 – λ̂1)

Source: Authors' own equation.

Analogously to using AR(p) terms, if higher orders of autocorrelation are expected (for example, the 3rd order), all of the lower orders should also be included in the model (1 through 3). The advantage of this method is that, despite the need for adjustment, it provides coefficients that are easy to incorporate in the interpretation and description of the estimated coefficients assigned to the explanatory variables used.
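A small sketch of this correction in Python (the data frame df, the column names and the single-lag setup are illustrative assumptions):

import statsmodels.api as sm

# Re-estimate the model with one lag of the dependent variable added.
data = df[["X1", "X2"]].copy()
data["Y_lag1"] = df["Y"].shift(1)          # Y(t-1)
res = sm.OLS(df["Y"], sm.add_constant(data), missing="drop").fit()

# Equation 21 with r = 1: divide each original coefficient (including
# the constant) by (1 - lambda_hat).
lam = res.params["Y_lag1"]
adjusted = res.params.drop("Y_lag1") / (1 - lam)
print(adjusted)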

5.3. Heteroscedasticity

Heteroscedasticity is the existence of different variances among random variables. A good example of this problem would be the variance of consumer spending – lower income earners will have a smaller variance, while people in the upper income bracket will have a higher variance. It causes the same problems as autocorrelation. To detect this problem, tests like the ocular test of residuals (at one end the spread of residuals will be small and it will increase as the residuals are plotted – a megaphone or cone-shaped graph), or any of the White, Goldfeld-Quandt or Breusch-Pagan LM tests can be implemented. Just like with autocorrelation or stationarity, the ocular examination of the graph should be used only as an indicator of the presence, or the lack, of the problem. Statistical tests, like the ones mentioned above, are the preferred option. For the LM White test, for example, the hypotheses will look as follows:

H0: No Heteroscedasticity
H1: Heteroscedasticity exists

Table 18. An example of a heteroscedasticity LM White test for the U.S. imports model

Heteroskedasticity Test: White
F-statistic            40.04048    Prob. F(20,179)         0.0000
Obs*R-squared          163.4623    Prob. Chi-Square(20)    0.0000
Scaled explained SS    271.0250    Prob. Chi-Square(20)    0.0000

Source: Authors’ own table based on calculations conducted with EViews software.

An example of a statistical test for heteroscedasticity, the LM White test (shown in Table 18), suggests that the tested model suffers from the presence of heteroscedasticity and needs to be corrected for it. We make such a conclusion as the LM test statistic, also known as χ²observed, 163.4623 (in Table 18 reflected as Obs*R-squared – the number of observations multiplied by the R-squared of the auxiliary regression), is greater than χ²critical at the 5% level of significance with 20 degrees of freedom (31.41). The decision to reject the null of no heteroscedasticity is supported by the fact that the p-value of Prob. Chi-Square (20), read from the output, is less than 0.05. One of the popular remedies is the Weighted Least Squares method of estimating the parameters of a model, where weights are assigned to observations to adjust for the difference in variance. The key problem with Weighted Least Squares is assigning proper weights to specific observations in such a way as not to distort the results of the research. An easy way out is provided by many software packages (EViews, for example) that offer an automatic option that cures the problem of heteroscedasticity.
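In Python, the White test and a simple Weighted Least Squares correction might be sketched as follows (res is again an assumed, previously fitted OLS results object; the 1/fitted-values weighting is one common illustrative choice, not a universal prescription):

import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

# White test on the residuals of a fitted OLS model.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, res.model.exog)
print(f"Obs*R-squared = {lm_stat:.4f}, Prob. Chi-Square = {lm_pvalue:.4f}")

# If heteroscedasticity is found, re-estimate with Weighted Least Squares;
# here observations with larger fitted values get proportionally less weight.
weights = 1.0 / res.fittedvalues**2
wls_res = sm.WLS(res.model.endog, res.model.exog, weights=weights).fit()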


CHAPTER SIX

Model's Results Interpretation

Model interpretation consists of analyzing two parts of the output received after estimating the designed model using econometric software: one, the output regarding the estimation of the model's parameters (Table 19) and, two, the statistics describing the model as a whole (Table 22). Each of the mentioned outputs plays a key role in assessing the estimated model. This process will give the researcher hints as to whether or not the chosen independent variables, and the model as a whole, statistically do a good job of representing the data. For this chapter, the model used as an example is estimated based on the following linear structural equation, Equation 23.

Equation 23. Linear structural equation of the model used in the Model's Results Interpretation chapter

IMt = β0 + β1YDt + β2POPt + β3Wt + β4GDPt + β5EXt + εt Source: Authors’ own equation.



6.1. Interpreting and Testing Variables and Coefficients

Table 19. Coefficient estimation output from software after estimating the U.S. imports

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           3376.174       278.8045      12.10947       0.0000
YD          0.142595       0.065082      2.191002       0.0296
POP         -0.024214      0.001978      -12.24395      0.0000
W           0.032785       0.007324      4.476295       0.0000
GDP         0.299346       0.067465      4.437042       0.0000
EX          0.21193        0.06036       3.51112        0.0006

Where the dependent variable, the U.S. imports (IM) is being regressed on the constant term (C), Disposable Income (YD), the U.S. Population (POP), Wealth (W), the U.S. GDP (GDP) and the U.S. Exports (EX). Source: Authors’ own table based on calculations conducted with EViews software.

In Table 19, the Variable column lists all the explanatory variables entered into the model as well as the constant term; the Coefficient column lists the estimated values of the coefficients of the independent variables as well as the constant term; Std. Error represents the standard errors of the coefficients and the constant term; t-Statistic is the parameter's value divided by its standard error; and the Prob. column shows the p-value associated with each of the estimated coefficients and the constant term. The estimated version of Equation 23, based on the results presented in Table 19, is shown in Equation 24.

Equation 24. Estimated version of the linear structural equation of the model used in the Model's Results Interpretation chapter

IMt = 3,376.174 + 0.143YDt – 0.024POPt + 0.033Wt + 0.299GDPt + 0.212EXt + ε̂t

Source: Authors' own equation.

When dealing with non-probability models (ones that do not involve estimating the probability that an occurrence will take place based on, for example, given characteristics of the object making the decision), the coefficients are easy to interpret – as has been shown in section 2.2. Structural Equation Description. When interpreting the coefficients, it is very important to realize that all other variables are held constant (ceteris paribus). The reason why such a statement is necessary is best seen with an example. Moving away from economics, let us say that an overweight person decides to go on a diet and start an exercise program. After two months, the weight of this person has decreased by 10 kilograms. The questions are: was it due to the decrease in calories eaten, the exercise program, or maybe both, and if both, which of the two had a greater impact on the achieved weight loss? A parallel example can of course be found in any discipline. To bring the discussion back to economics, consider the unemployment rate, which, as we know, depends on many economic conditions; or the gross domestic product – is it consumer spending, investment in capital, government spending or net exports that drives it?

If none of the variables in the model are logged, like in the example in Table 19, the coefficients represent something called marginals. The marginal is interpreted as follows: a one unit increase in the disposable income (YD) will increase the dependent variable, the U.S. imports, by 0.142595 units; this is why it is crucial for the model interpretation to have the units specified clearly. In the U.S. imports example, the measurements are done in billions of U.S. 2005 dollars, unless stated otherwise. Using that information, the above analysis of the coefficient of the disposable income can be improved upon by stating that: if the disposable income of U.S. customers increases by 1 billion U.S. 2005 dollars, the model suggests that the U.S. imports will increase by 0.142595 billion U.S. 2005 dollars, or by 142,595,000 U.S. 2005 dollars. Analogously, if the population of the U.S. increases by one unit of measurement, the U.S. imports will decrease by 0.024214 billion U.S. 2005 dollars; and so on.

The above-presented interpretation of the estimated parameters of the model is only meaningful if the estimated coefficients are found to be statistically significant. To determine this, the output shown in Table 19 is examined again. By looking at the assigned value of Prob. (that is, the p-value), a statement regarding the statistical significance of an individual variable can be made. At the 5% level of significance, 0.05 serves as the cutoff point. The hypotheses for this test are as presented below:

H0: Xn is not statistically significant
H1: Xn is statistically significant

If the p-value of the estimated coefficient of the nth variable is less than 0.05, we reject the null hypothesis and state that the nth variable is statistically significant. Any Prob. reading above the cutoff point fails to reject the null because, as will be shown in a bit, its coefficient is not significantly different from zero. In addition to looking at the p-value, each of the coefficients can be tested for its significance with a t-test. If, for example, it is expected that the coefficient of the U.S. GDP variable (βGDP) will be positive (in other words, it is expected that as GDP increases, the imports of the U.S. will also increase), it is important to test whether the estimated coefficient, β̂GDP, actually is, as expected, greater than zero.

The test for the significance of the calculated coefficient of the nth variable should have the following steps: 1) set up the null and the alternative hypothesis statements, 2) select the critical value based on the confidence interval, 3) compute tobserved and compare it against tcritical, 4) make a statement about the rejection of, or the failure to reject, the null hypothesis. This is called the one-tail t-test, as there is some expectation regarding the sign of the tested coefficient.

Table 20. Summary of the coefficient testing procedure for one-tail tests

If the variable is expected to have    Hypothesis statement    Formula
a negative coefficient                 H0: βn ≥ 0              tβ̂n = (β̂n – βtest) / Sβ̂n
                                       H1: βn < 0
a positive coefficient                 H0: βn ≤ 0
                                       H1: βn > 0

Where Sβ̂n is the standard error of the coefficient of the nth variable and βtest is the value β̂n is compared to, in this case βtest = 0.

Source: Authors' own table with formula from Pindyck, Rubinfeld (1998), p. 112.

The degrees of freedom are equal to n – k (the number of observations less the number of explanatory variables in the model). If, for example, the number of observations is 50 and the number of independent variables in the model is 20, then the degrees of freedom are 30 and the critical t-statistic at 5% is 1.697. If tcritical is less than tobserved, we reject the null and confirm that the coefficient of the nth variable is consistent with the assumptions made based on the economic theory. It is also possible to test whether the coefficient of the nth variable is statistically greater than, less than, or equal to a specific value. For the first two tests, simply choose the appropriate hypothesis statement from Table 20 and substitute the value that β̂n is being tested against for βtest in the formula. For example, to test if the coefficient of disposable income (YD in Table 19, with 200 observations) is statistically greater than 0.01, the test would look as follows:

Hypothesis setup:
H0: βn ≤ 0.01
H1: βn > 0.01

The t-test is shown in Equation 25.

Equation 25. Example of the t-test

tβ̂n = (0.142595 – 0.01) / 0.065082 = 2.0373

Source: Authors' own equation.

Conclusion: since tobserved (2.0373) is greater than tcritical (1.6448), we reject the null hypothesis in favor of the alternative, and therefore state that, statistically, the coefficient of the disposable income variable is greater than 0.01.

To test whether the coefficient is statistically different from a given value, the two-tail t-test is used. This test is also used when there is no indication of whether a positive change in a tested independent variable will have a positive or a negative impact on the dependent variable. The two-tail t-test mimics the one-tail t-test, with the adjustments listed in Table 21.

Table 21. Summary of the coefficient testing procedure for two-tail tests

Hypothesis statement    Formula                       t-statistic
H0: βn = 0              tβ̂n = (β̂n – βtest) / Sβ̂n    Example: given d.f. = 30,
H1: βn ≠ 0                                            at 5% the t-statistic = 2.042

Source: Authors' own table with formula from Pindyck, Rubinfeld (1998), p. 112.
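A sketch of both tests in Python (scipy; the numbers reuse the YD example above, and the 194 degrees of freedom are an assumption following the n – k rule with 200 observations and 6 estimated parameters; note the exact t critical value, about 1.653, differs slightly from the 1.6448 normal approximation used above):

from scipy.stats import t

beta_hat, beta_test, se, dof = 0.142595, 0.01, 0.065082, 194

t_observed = (beta_hat - beta_test) / se          # 2.0373

# One-tail test at 5%: H1 says the coefficient is greater than beta_test.
t_critical_one = t.ppf(1 - 0.05, dof)             # about 1.653
print(t_observed > t_critical_one)                # True -> reject H0

# Two-tail test at 5%: H1 says the coefficient differs from beta_test.
t_critical_two = t.ppf(1 - 0.025, dof)            # about 1.972
print(abs(t_observed) > t_critical_two)           # True -> reject H0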

The decision rule for the null's rejection (tcritical < tobserved) stays the same.

Sometimes the one- or two-tail t-test, which is used for a single variable at a time, will find that variable statistically insignificant. Yet there are times when the same variable, combined with another one (or two, or three), will be found statistically significant as a group, and therefore the model will be improved by their addition. To test for the combined significance (or joint significance) of two or more variables at the same time, the F-test with the F distribution is used. In principle, the F-test compares the restricted model (the one to which new variables are to be added, Equation 26) with the unrestricted model (Equation 27) containing the new independent variables.

Equation 26. Joint significance test – structural model, restricted

Yt = β̂0 + β̂1X1t + β̂2X2t + β̂3X3t + ε̂t

Source: Authors' own equation.

Equation 27. Joint significance test – structural model, unrestricted

Yt = β̂0 + β̂1X1t + β̂2X2t + β̂3X3t + β̂4X4t + β̂5X5t + ε̂t

Source: Authors' own equation.

The hypothesis statement for the joint test is as follows:

H0: β4 = β5 = 0
H1: at least one of β4, β5 ≠ 0

Similarly to the LM procedure for adding new explanatory variables to a restricted model, the null hypothesis assumes that the coefficients of the newly inserted independent variables are both equal to zero. The alternative hypothesis states that at least one of them is different from zero. The F-test formula is shown in Equation 28.

Equation 28. F-test formula with Error Sum of Squares

Fq,(n–k) = [(ESSR – ESSUR) / q] / [ESSUR / (n – k)]

Source: Pindyck, Rubinfeld (1998), p. 129.

The UR and R subscripts designate the use of the unrestricted or restricted model respectively, q is the number of variables tested (in this case 2, Equation 27), n is the number of observations and k is the number of explanatory variables in the unrestricted model. The principle of this test is that the Error Sum of Squares (covered later) is smaller in the unrestricted than in the restricted model if the added variables, combined, are truly significant explanatory contributors. After a few transformations (shown in Equation 29 and Equation 30), the formula can be rewritten using R-squared (R²) as presented in Equation 31.

Equation 29. R² of the unrestricted model as a function of its Error Sum of Squares and Total Sum of Squares

R²UR = 1 – ESSUR / TSSUR

Source: Pindyck, Rubinfeld (1998), p. 130.


Equation 30. R² of the restricted model as a function of its Error Sum of Squares and Total Sum of Squares

R²R = 1 – ESSR / TSSR

Source: Pindyck, Rubinfeld (1998), p. 130.

Equation 31. F-test formula with R-squared

Fq,(n–k) = [(R²UR – R²R) / q] / [(1 – R²UR) / (n – k)]

Source: Pindyck, Rubinfeld (1998), p. 130.

This formula assumes that, if the alternative hypothesis is correct, the unrestricted model explains a greater amount of the variation in the dependent variable than the restricted model. The decision rule is similar to that of other tests: if Fcritical is less than Fobserved, the null hypothesis is rejected and, in the example above, the combined coefficients of X4 and X5 are statistically different from zero; therefore, they add additional information to the model.
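A short Python sketch of the R-squared version of the F-test (Equation 31); the two fitted results objects res_r and res_ur are assumed to come from earlier sm.OLS calls on the restricted and unrestricted specifications:

from scipy.stats import f

q = 2                                            # number of variables being tested
n = int(res_ur.nobs)
k = res_ur.df_model + 1                          # regressors plus the constant term

f_observed = ((res_ur.rsquared - res_r.rsquared) / q) / \
             ((1 - res_ur.rsquared) / (n - k))
f_critical = f.ppf(1 - 0.05, q, n - k)

# Reject H0 (joint insignificance) when the observed F exceeds the critical F.
print(f_observed > f_critical)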

6.2. Interpreting Model's Statistics

After conducting a detailed analysis of the estimated coefficients, the model as a whole has to be examined using its descriptive statistics (Table 22).

Table 22. Model's statistics output from the software after estimating the U.S. imports by regressing them on the constant term (C), disposable income (YD), the U.S. population (POP), wealth (W), the U.S. GDP and the U.S. exports

R-squared             0.99315      Mean dependent var       757.2164
Adjusted R-squared    0.992974     S.D. dependent var       647.7554
S.E. of regression    54.29724     Akaike info criterion    10.85636
Sum squared resid     571948.9     Schwarz criterion        10.95531
Log likelihood        -1079.636    Hannan-Quinn criter.     10.89641
F-statistic           5625.545     Durbin-Watson stat       0.185468
Prob(F-statistic)     0

Source: Authors' own table based on calculations conducted with EViews software.

R-squared (Equation 32) represents the percentage of the variation in the dependent variable explained by the model. For example, in Table 22, R-squared equals 0.99315; therefore, a statement can be made that 99.31% of the variation in the dependent variable is explained by its regression on the independent variables. It ranges from 0 to 1. As often as this statistic is quoted, it suffers from a serious problem: it will increase whenever explanatory variables are added to the model, regardless of their true significance. This issue arises because there is no adjustment for the changing degrees of freedom. To solve this issue, the Adjusted R-squared statistic was developed.

Equation 32. R-squared formula

R² = Σ(Ŷi – Ȳ)² / Σ(Yi – Ȳ)²

Source: Pindyck, Rubinfeld (1998), pp. 112–113.

Adjusted R-squared (Equation 33) is interpreted similarly to R-squared, but it is the preferred measurement as it is adjusted for the degrees of freedom. With regular R-squared, the addition of variables, regardless of their statistical significance to the model, will always increase it; when insignificant variables are added, the Adjusted R-squared will decrease, and it can even become negative, while its maximum remains 1. Similarly to R-squared, the higher the value of the Adjusted R-squared, the better the job the model does in explaining the variation of the dependent variable.

Equation 33. Adjusted R-squared formula

Adjusted R² = 1 – [Σε̂t² / (n – k)] / [Σ(Yi – Ȳ)² / (n – 1)]

Source: Pindyck, Rubinfeld (1998), pp. 112–113.

Sum squared resid – the Error Sum of Squares (ESS, Equation 35) is a measure of the discrepancy between the original data (Yi) and the fitted values from the estimated model (Ŷi); in other words, the variation in the residuals, or the unexplained variation. In the estimated model, ESS is equal to the Total Sum of Squares (TSS) net of the Regression Sum of Squares (RSS); this relationship is shown in Equation 34. TSS is the total variation in the dependent variable (Equation 36) and RSS is the explained variation (Equation 37).


Equation 34. Total Sum of Squares

TSS = RSS + ESS, or

Σ(Yi – Ȳ)² = Σ(Ŷi – Ȳ)² + Σ(Yi – Ŷi)²

Source: Pindyck, Rubinfeld (1998), p. 89.

Equation 35. Error (Residual) Sum of Squares

ESS = Σ(Yi – Ŷi)²

Source: Pindyck, Rubinfeld (1998), p. 89.

Equation 36. Total Sum of Squares

TSS = Σ(Yi – Ȳ)²

Source: Pindyck, Rubinfeld (1998), p. 89.

Equation 37. Regression Sum of Squares

RSS = Σ(Ŷi – Ȳ)²

Source: Pindyck, Rubinfeld (1998), p. 89.
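These identities are easy to verify numerically; a minimal sketch in Python with numpy and statsmodels (the data here is randomly generated purely for illustration):

import numpy as np
import statsmodels.api as sm

# Hypothetical data; fitting by OLS guarantees TSS = RSS + ESS (Equation 34).
x = np.arange(10, dtype=float)
y = 2.0 + 0.5 * x + np.random.default_rng(0).normal(0, 0.3, 10)
res = sm.OLS(y, sm.add_constant(x)).fit()

tss = np.sum((y - y.mean())**2)                    # Equation 36
ess = np.sum(res.resid**2)                         # Equation 35
rss = np.sum((res.fittedvalues - y.mean())**2)     # Equation 37

print(np.isclose(tss, rss + ess))                  # True
print(np.isclose(res.rsquared, rss / tss))         # True (Equation 32)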

F-statistic – the statistic used to measure the overall statistical significance of the model. Prob(F-statistic) – the probability of the F-statistic. The null hypothesis states that the model as a whole is statistically insignificant, while the alternative hypothesis says that the model as a whole is statistically significant. If the probability of the F-statistic is less than the level of significance, the null hypothesis is rejected and a conclusion can be made that the model as a whole is statistically significant.

Mean dependent var – the mean of the dependent variable.

S.D. dependent var – the standard deviation of the dependent variable.

Akaike, Schwarz and Hannan-Quinn criteria – information criteria that measure the goodness of fit of an estimated statistical model. The smaller the value assigned to these criteria, the better the fit.

Durbin-Watson statistic – a guide to detecting autocorrelation, with the ideal value being 2.00 (suggesting no autocorrelation). A reading below suggests the presence of a positive autocorrelation and a reading above hints at a negative autocorrelation. As described in the section on autocorrelation, this statistic comes with drawbacks that should be noted.

In addition to R-squared and the other statistics, plotting the actual data, the fitted data and the residuals on one graph (Graph 5) provides a good representation of how the model fits the original data and how the residuals are behaving. For example, the presented graph shows that the model does a good job of fitting the data, as the fitted and the actual plots are hard to distinguish. The residuals, with the exception of the year 2009, appear to show no signs of heteroscedasticity and suggest minimal, if any, signs of autocorrelation.1 The residuals graph is also a good place to look for signs of seasonality.

Graph 5. Graph of actual values (Actual) and the fitted model (Fitted), both on the left-hand side axis, and the resulting residuals (Residuals) on the right-hand axis

Source: Authors’ own graph based on calculations conducted with EViews software.

1 Of course, these are just ocular observations and, as has been mentioned when discussing each of these problems, additional tests should be conducted before making final claims about the residuals of the model.


CHAPTER SEVEN

Forecasting

The purpose of econometric models can be seen from two perspectives: one, to look at what took place, and two, to look into the future (time-series) or to predict values (cross-section). In other words, given certain conditions, what should be the value of the dependent variable? Looking into the past allows the researcher to see which variables, and with what magnitude, have contributed to the value of the researched (explained) variable. An example would be a hedonic housing price model that, given the provided characteristics, estimates their direct effect on the price for which a house was sold. This allows one, with a certain degree of error, to estimate what a house with a set of given descriptive characteristics should sell for. The same applies to time-series models: given the estimated parameters of the explanatory variables, the model can be used to simulate what the value of the dependent, researched, variable will be for given values of the independent, explanatory variables.

7.1. Forecasting as a Model Testing Tool

There is no testing like testing in the field. In addition to testing the model by looking at its statistics (R-squared, for example), another form of testing the model is using it as a forecasting tool. The problem with testing the model using conventional forecasting is that it cannot be done immediately; it has to be done at a future (ex ante, or out-of-sample) time, when the model's forecasts can be compared with actual numbers. For example, if we were to estimate a model with Poland's imports as the dependent variable based on data from 1990 to 2010, and the estimation itself took place in the year 2010, we would have to wait until another set of observations of the independent variables could be collected (if the data is annual, then the year 2011), plug them into the model and compare the value obtained from the model with the actual record of Poland's imports. The solution to this problem is ex post forecasting, or forecasting within the dataset available. In order to do this, prior to estimating the model, the sample has to be properly set up.

Figure 1. Division of the original data set into Estimation Period, Ex post and Ex ante sections (a timeline running from T1 through T2 and T3 to the present time: T1–T2 is the estimation period, T2–T3 the ex post forecast period, and T3 onward the ex ante forecast period)

Source: Authors’ own graphic based on Pindyck, Rubinfeld (1998), p. 203.

Usually, when the model is being estimated, it is done on all the data available at the moment the research is conducted (in Figure 1, T1 to T3). There are many good reasons why; two big ones are: one, to have the biggest data set possible, which in turn increases precision as well as allows for the use of more explanatory variables due to the increased number of degrees of freedom; two, it allows capturing the most recent trends (for obvious reasons, a model estimated on data from the years 1960 to 1970 and meant to reflect current trends misses the point completely). Data permitting, some observations, ideally the most recent ones as they are the closest to what comes next, should be left out of the data used to estimate the model. That is, the model should be estimated on data from T1 to T2 (Figure 1), leaving the observations from T2 to T3 for testing the model via ex post forecasting. To perform a forecast or test the model ex post, the easiest way is to simply plug the values of the explanatory variables from T2 to T3 (Figure 1) into the model with estimated parameters, and then to compare the results with the values of the dependent variable from that data frame. The plug-in method can also be used to forecast ex ante. The only difference is that the question is not how close our estimated values of the dependent variable are to their corresponding actual values, but, given the values of the independent variables, what the value of the dependent variable would be in, for example, T3+1.
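A sketch of this sample split in Python (pandas/statsmodels; the data frame df, the column names and the size of the holdout are illustrative assumptions):

import statsmodels.api as sm

# Hold out the most recent observations (T2 to T3) for ex post testing.
holdout = 8
train, test = df.iloc[:-holdout], df.iloc[-holdout:]

# Estimate on T1..T2 only.
res = sm.OLS(train["Y"], sm.add_constant(train[["X1", "X2"]])).fit()

# Plug in the explanatory variables from T2..T3 and compare with actuals.
forecast = res.predict(sm.add_constant(test[["X1", "X2"]]))
errors = test["Y"] - forecast
print(errors.abs().mean())   # mean absolute ex post forecast error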



7.2. Forecasting with ARIMA

When dealing with time-series data, a very popular way of forecasting variables is through the use of ARIMA (p, d, q), an Autoregressive Integrated Moving Average model. ARIMA models consist of the autoregressive (AR, p) and moving average (MA, q) terms, where p and q are the respective orders of those processes. The I in the model comes from differencing the data to make the variable stationary, with d being the order of integration. When the forecasted variable is stationary to begin with (d = 0), ARIMA becomes ARMA (p, q). ARIMA as a tool has significant advantages. First, it does not require any explanatory variables; all one needs is the dependent variable itself. Second, this makes the analysis quick to conduct and, when needed, not time consuming to repeat (which comes in handy, as will be shown a bit later). The obvious drawback of ARIMA is that, as it depends on the order of observations, it can only be used with time-series data. Also, as no independent variables are used, this analysis does not provide any information on the determinants of changes in the variable of interest. There are four steps that need to be followed in order to arrive at an effective ARIMA model:

1) test and correct the variable for nonstationarity,1

2) identify the AR and MA terms. The correlogram of the U.S. imports variable (shown in Table 23) is used to identify p and q, with the Partial Correlation column representing the orders of autocorrelation (AR) and the Autocorrelation column representing the orders of moving averages (MA).

Table 23. Correlogram of the U.S. imports variable after it has been differenced

Autocorrelation    Partial Correlation    Lag
.|****             .|****                 1
.|***              .|*                    2
.|*                *|.                    3
.|*                .|.                    4

Source: Authors’ own table based on calculations conducted with EViews software.

Looking at the above results (which are for the differenced, I(1), series), p = 1 and q = 2.

3) finalize the ARIMA model.

1 Described in detail in section 3.3. Stationarity Test.

Based on the correlogram (Table 23), the ARIMA (1, 1, 2) model is estimated (the results of which are posted in Table 24). Note that since the first difference had to be taken in order to achieve stationarity, the dependent variable (the U.S. imports in this case) enters the model in its first difference.

Table 24. ARIMA (1, 1, 2) model output for the U.S. imports variable. Note that the dependent variable is not IM but d(IM) – the first difference of the original variable

Variable    Coefficient    Std. Error    t-Statistic    Prob.
C           10.95103       2.640449      4.147412       0.0001
AR(1)       0.41178        0.146435      2.812033       0.0055
MA(1)       0.042426       0.143773      0.29509        0.7683
MA(2)       0.308006       0.089616      3.436974       0.0007

Source: Authors’ own table based on calculations conducted with EViews software.

It is hard to interpret the coefficients, though their statistical significance can be tested using p-values. Similarly, model statistics like R-squared can be used to evaluate the model's fit to the original data, although, since fit is not the main purpose of using ARIMA models, high R-squared values are unlikely.

4) forecast and test the model; adjust when needed. It can be hard to estimate the best ARIMA model on the first attempt, as the reading of the correlogram is subject to the researcher's interpretation. That is why using this approach is sometimes referred to as an art or a skill. If the initial model proves to be unsatisfactory, adjustments to the number of AR and MA terms can be made. Also, it is always worth checking the neighboring models, that is, ±1 AR and ±1 MA orders, when looking for the best fit and the best forecast (the word "best" being used relatively, of course). It is useful to first test the estimated model ex post, in order to evaluate it, and only then forecast ex ante.
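The whole cycle fits in a few lines of Python with statsmodels (the series im is an assumed pandas Series of the U.S. imports; the (1, 1, 2) order follows the correlogram reading above):

from statsmodels.tsa.arima.model import ARIMA

# Fit ARIMA(p=1, d=1, q=2); differencing is handled internally via d=1.
res = ARIMA(im, order=(1, 1, 2)).fit()
print(res.summary())          # AR/MA coefficients with p-values, as in Table 24

# Ex ante forecast for the next 8 periods, with interval bounds that play
# the same role as the UP/DOWN limits discussed below.
forecast = res.get_forecast(steps=8)
print(forecast.predicted_mean)
print(forecast.conf_int())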

7.3. Forecast Evaluation

The researcher has many tools to evaluate the forecast. Two common ones are descriptive statistics, such as the Proportions and the Root Mean Square Error (provided by the software, and which the process aims to minimize), and the ocular test with the introduction of upper and lower limits. Starting with the latter: as with any ocular test, a lot is left open to the interpretation of the examiner. As a result, setting the limits is a subjective procedure.

One common approach is to take the forecasted value and then add double the standard error of the forecast to create the upper limit, and to subtract it to create the lower limit. Plotting the original and the forecasted values together with the limits (an example is shown in Graph 6) over the ex post period shows how well the forecast fits the actual occurrences within the set boundaries. If the forecast is expected to meet more restrictive requirements, the above-mentioned limits can be created with, for example, just one standard error; the opposite holds for more liberal requirements. The rule of thumb is that as long as the original values stay within the limits of the forecast, the model does a good job of forecasting the dependent variable. The same evaluation method can be applied to the plug-in method.

Graph 6. A plot of the original U.S. imports data (IM) versus the forecast (IMF) and the upper (UP) and lower (DOWN) limits

Source: Authors’ own graph based on calculations conducted with EViews software.

From the ocular examination of the forecasted values, the ARIMA (1, 1, 2) model used performs well over the first year; its values are nearly indistinguishable from the actual ones. But at the end of the year 2008 the forecast loses its validity, as the actual values cross the set lower limit. It is very likely that a better ARIMA model should be used. Moreover, this shows that the longer the forecasted period, the greater the allowance for its error.

Moving to statistics as tools of evaluation, the first one is the Root Mean Squared Error (in the shown example it is equal to 267.5270), which is useful when comparing forecasts carried out with different models; the better the forecast, the lower the value of the discussed statistic. The catch is that this is a comparative statistic, i.e., it is used to compare between forecasts performed with different models, not to evaluate a single forecast on its own. Three other statistics that should be examined are the Bias Proportion, the Variance Proportion and the Covariance Proportion. The first statistic shows the spread between the mean of the forecast and the mean of the actual data. The second one does the same but for the variation of the forecast and the actual data. The last one measures what is left, that is, the unsystematic forecasting error. For a forecast to be considered a good one, the bias and the variance proportions should be as close to zero as possible, with all the noise being collected in the covariance proportion.2 In the example, the bias proportion is equal to 0.449675, the variance proportion to 0.237572 and the covariance proportion to 0.312753; again suggesting that a better ARIMA model should be sought.
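A numpy sketch of these evaluation statistics (actual and forecast are assumed arrays covering the ex post period; the proportion formulas follow the decomposition described in Pindyck and Rubinfeld (1998), and are stated here under the assumption that the software computes them the same way):

import numpy as np

def evaluate_forecast(actual, forecast):
    mse = np.mean((forecast - actual)**2)
    rmse = np.sqrt(mse)                       # Root Mean Squared Error
    sf, sa = forecast.std(), actual.std()
    r = np.corrcoef(forecast, actual)[0, 1]
    bias = (forecast.mean() - actual.mean())**2 / mse
    variance = (sf - sa)**2 / mse
    covariance = 2 * (1 - r) * sf * sa / mse  # the three proportions sum to 1
    return rmse, bias, variance, covariance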

2 For more information see: Pindyck, Rubinfeld (1998), pp. 210–214.

CHAPTER EIGHT

Conclusions

After completing the research and describing it at an appropriate length and level of detail, a Conclusions section consisting of closing remarks is written. The conclusion is not the same as the abstract, which talks about what took place from the beginning to the end; the conclusion focuses more on end results and future actions. A brief summary of the results and their comparison with conclusions drawn from the literature review and economic theory are a good starting point. Another common topic to be included in this segment is the discussion of any problems encountered in the work, their sources and a list of possible solutions. One person cannot cover the researched topic in its entirety. Therefore, the researcher should suggest areas related to the topic in which further studies should be conducted, or parts of his or her own work that can be improved upon.


A. Transition

At this point you have a good understanding of what it takes to get raw data and transform it, using econometrics software packages, into meaningful information. To further see how this is done, it is a good idea to take a look at one example that carries you through all the steps.


Example

Let us work with data regarding the U.S., more specifically, its macroeconomic conditions. Prior to starting, it is important to note three things: one, this example is a full-length one, but it is made as short as possible by omitting some descriptions; two, this example focuses on the econometric part of a study, as a result of which the descriptive parts of the study as well as the literature review have been omitted; and three, as this is a real-world example, that is, the data is not staged or edited, some of the results may not look as pretty as they should.

Setup

The aim of this study is to look at what factors should be taken into consideration when explaining changes in the imports of the U.S. Therefore, the structural equation will take the form shown in Equation 38.

Equation 38. Structural equation for the U.S. imports as the dependent variable

IMt = β0 + β1X1t + β2X2t + . . . + βnXnt + εt

Source: Authors' own equation.

IMt represents the U.S. imports in period t, and it will be explained with the set of potential independent variables listed in Table 25. These variables were found through the process of the literature review.


Table 25. Potential independent variables (symbol in the model – name; unit; source of data; note)

IM – U.S. imports (Imports of Goods and Services); Billions of Chained 2005 Dollars; U.S. Department of Commerce: Bureau of Economic Analysis; Seasonally Adjusted, Annual Rate.

Independent Variables:

YD – Real Disposable Personal Income; Billions of Chained 2005 Dollars; U.S. Department of Commerce: Bureau of Economic Analysis; Seasonally Adjusted, Annual Rate.
POP – Total Population: All Ages including Armed Forces Overseas; Thousands; U.S. Department of Commerce: Census Bureau; Reported monthly, transformed into quarterly to match other data.
DJ – Dow Jones Index; Index; finance.yahoo.com*; Reported monthly, transformed into quarterly to match other data.
CPI – Consumer Price Index For All Urban Consumers: All Items; Index, 1982–84 = 100; U.S. Department of Labor: Bureau of Labor Statistics; Seasonally Adjusted.
EX – Exports of Goods and Services; Billions of U.S. Dollars**; U.S. Department of Commerce: Bureau of Economic Analysis; Seasonally Adjusted, Annual Rate.
GDP – Real Gross Domestic Product; Billions of Chained 2005 Dollars; U.S. Department of Commerce: Bureau of Economic Analysis; Seasonally Adjusted, Annual Rate.
CHG.INV – Real Change in Private Inventories; Billions of Chained 2005 Dollars; U.S. Department of Commerce: Bureau of Economic Analysis.
NAFTA – Presence of NAFTA; Dummy Variable (1 – Yes, 0 – No).
GOLD – Presence of the Gold Standard; Dummy Variable (1 – Yes, 0 – No).
RECES – Presence of the recession; Dummy Variable (1 – Yes, 0 – No); FRED***.

* Source: http://finance.yahoo.com/q/hp?s=^DJI&a=00&b=1&c=1960&d=02&e=2&f=2010&g=m&z=66&y=594.
** Ideally, all data would be in the same constant units, but such data was not available for the U.S. exports.
*** http://research.stlouisfed.org/fred2/help-faq/.

Source: Authors' own table.

Descriptive Statistics

Now that the set of variables to work with has been selected and the data for them has been collected, it is time to look at the descriptive statistics presented at the end of the text. The most important observation is that there is an equal number of observations (200) for all variables. As expected, none of the variables have a normal distribution but, as discussed earlier and in the suggested reference, this is not an issue.



Hypothesis Statements

Hypothesis statements, which are based on the examined literature, are presented in Table 26.

Table 26. Hypothesis statements for all independent variables

Variable                                                       Name in the model    Alternative Hypothesis
Real Disposable Personal Income                                YD                   H1: βYD > 0
Total Population: All Ages including Armed Forces Overseas    POP                  H1: βPOP > 0
Dow Jones Index                                                DJ                   H1: βDJ > 0
Consumer Price Index For All Urban Consumers: All Items        CPI                  H1: βCPI < 0
Exports of Goods and Services                                  EX                   H1: βEX ≠ 0
Real Gross Domestic Product                                    GDP                  H1: βGDP > 0
Real Change in Private Inventories                             CHG.INV              H1: βCHG.INV > 0
Presence of NAFTA                                              NAFTA                H1: βNAFTA > 0
Presence of the Gold Standard                                  GOLD                 H1: βGOLD ≠ 0
Presence of the recession                                      RECES                H1: βRECES < 0

Source: Authors’ own table.



Correlation matrix

The next step is to look at the correlation matrix (see Table 27) for high correlation coefficients between the dependent variable and the possible independent variables, as well as for signs of multicollinearity.

Table 27. Correlation Matrix for all variables

         IM      YD      POP     DJ      CPI     EX      GDP     CHGINV  NAFTA   GOLD    RECES
IM       1.00    0.97    0.94    0.98    0.94    0.98    0.97    -0.02   0.85    -0.49   0.01
YD       0.97    1.00    0.99    0.94    0.99    0.98    1.00    -0.05   0.86    -0.64   0.02
POP      0.94    0.99    1.00    0.91    0.99    0.97    0.99    -0.04   0.86    -0.68   0.01
DJ       0.98    0.94    0.91    1.00    0.91    0.97    0.95    -0.08   0.87    -0.41   0.03
CPI      0.94    0.99    0.99    0.91    1.00    0.96    0.99    -0.05   0.87    -0.64   0.01
EX       0.98    0.98    0.97    0.97    0.96    1.00    0.98    -0.05   0.89    -0.53   0.04
GDP      0.97    1.00    0.99    0.95    0.99    0.98    1.00    -0.03   0.87    -0.62   0.00
CHGINV   -0.02   -0.05   -0.04   -0.08   -0.05   -0.05   -0.03   1.00    0.03    -0.01   -0.50
NAFTA    0.85    0.86    0.86    0.87    0.87    0.89    0.87    0.03    1.00    -0.42   -0.05
GOLD     -0.49   -0.64   -0.68   -0.41   -0.64   -0.53   -0.62   -0.01   -0.42   1.00    -0.02
RECES    0.01    0.02    0.01    0.03    0.01    0.04    0.00    -0.50   -0.05   -0.02   1.00

Source: Authors’ own table based on calculations conducted with EViews software.

From the above-presented correlation table it is clear that all but three of the independent variables (change in inventories, presence of the Gold Standard and presence of the recession) are highly, positively and statistically significantly correlated with the dependent variable. Unfortunately, the correlation coefficients between the independent variables themselves indicate a high probability of multicollinearity. As a result, attention should be paid to variables that are derived from other variables (for example, exports and gross domestic product): only one of them, theoretically the one with the highest correlation coefficient with the dependent variable, should be included in the model. Additionally, the R-squared of the model and the p-values of the coefficients of the included explanatory variables will be monitored for signs of multicollinearity.



Unit Root Test

As the literature shows that the most important explanatory variables are disposable income (the more money people have, the more they will buy) and population (the higher the number of customers, the higher the number of purchases), the test for stationarity is first carried out only for those two variables and the dependent variable. The hypothesis statements for the unit root test for each of the three variables are shown in Table 28.

Table 28. Hypothesis statements for the Unit Root tests

Variable    Null Hypothesis                      Alternative Hypothesis
IM          H0: the variable is nonstationary    H1: the variable is stationary
YD          H0: the variable is nonstationary    H1: the variable is stationary
POP         H0: the variable is nonstationary    H1: the variable is stationary

Source: Authors’ own table.

The results of the Augmented Dickey-Fuller tests1 are shown in Table 29. None of the variables was stationary in levels. Taking the first difference solved the problem for U.S. imports and disposable income, but the variable representing the U.S. population had to be differenced twice in order to achieve stationarity.

Table 29. Results of the Augmented Dickey-Fuller test for the presence of a unit root

Variable    ADF t-Statistic   Prob.
IM          0.493             0.986
D(IM)       -7.628            0.000
YD          3.455             1.000
D(YD)       -8.902            0.000
POP         1.776             1.000
D(POP,2)    -4.427            0.000

Test critical values: 1% level -3.463 to -3.465; 5% level -2.876 to -2.877; 10% level -2.575 (varying slightly with the effective sample of each test).

Source: Authors' own table based on calculations conducted with EViews software.

1 There are other tests for the presence of a unit root, but the Augmented Dickey-Fuller test is administered here as it does not suffer from the subjectivity of other approaches, for example, the analysis of a graph.
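For reference, the same ADF tests can be sketched with statsmodels; adfuller returns the test statistic, p-value and critical values, and the series transformations below mirror the differencing just described.

```python
# A sketch of the ADF tests behind Table 29; adfuller() returns
# (statistic, p-value, lags used, nobs, critical values, icbest).
from statsmodels.tsa.stattools import adfuller

def adf_report(series, name):
    stat, pvalue, _, _, crit, _ = adfuller(series.dropna())
    crit = {k: round(v, 3) for k, v in crit.items()}
    print(f"{name}: ADF = {stat:.3f}, p-value = {pvalue:.3f}, criticals = {crit}")

adf_report(df["IM"], "IM")                       # level: nonstationary
adf_report(df["IM"].diff(), "D(IM)")             # stationary after 1st difference
adf_report(df["POP"].diff().diff(), "D(POP,2)")  # stationary after 2nd difference
```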

Model Estimation

As mentioned earlier, the literature review has put forward two independent variables that are the most often cross-quoted in previous works, allowing for the construction of the restricted structural equation shown in Equation 39.2

Equation 39. Restricted structural equation

D(IMt) = β0 + β1D(YDt) + β2D(POPt, 2) + εt

Source: Authors' own equation.

Now that the restricted equation is properly specified, the estimation procedure can begin. The Ordinary Least Squares method is employed to estimate the parameters of the model. The results of the estimation are presented in Table 30, the model's statistics are shown in Table 31, and the resulting structural model is shown in Equation 40 (note that the sign of the population coefficient is negative, as reported in Table 30).

Equation 40. Restricted structural model

D(IMt) = 5.073 + 0.142D(YDt) - 0.008D(POPt, 2)

Source: Authors' own equation based on calculations conducted with EViews software.

2 Note that if the final model can be constructed based on the literature review alone, that is the preferred way to proceed.
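A sketch of the OLS estimation of Equation 39, again under the hypothetical DataFrame df; the printed summary contains the same quantities as Tables 30 and 31 (coefficients, R-squared, F-statistic, Durbin-Watson statistic and the information criteria).

```python
# Estimating the restricted model (Equation 39) by OLS with statsmodels.
import statsmodels.api as sm

y = df["IM"].diff()
X = sm.add_constant(pd.DataFrame({
    "D(YD)": df["YD"].diff(),
    "D(POP,2)": df["POP"].diff().diff(),
}))
restricted = sm.OLS(y, X, missing="drop").fit()
print(restricted.summary())
```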

Table 30. Values of the restricted model's parameters

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           5.073         1.741        2.913         0.004
D(YD)       0.142         0.027        5.210         0.000
D(POP,2)    -0.008        0.015        -0.513        0.609

Source: Authors' own table based on calculations conducted with EViews software.

Table 31. Values of the restricted model's statistics

R-squared             0.130        Mean dependent var.     11.023
Adjusted R-squared    0.120        S.D. dependent var.     19.094
S.E. of regression    17.908       Akaike info criterion   8.624
Sum squared resid.    58686.390    Schwarz criterion       8.676
Log likelihood        -799.065     Hannan-Quinn criter.    8.645
F-statistic           13.662       Durbin-Watson stat.     1.168
Prob. (F-statistic)   0.000

Source: Authors' own table based on calculations conducted with EViews software.

Examining the model's statistics first, the model as a whole is statistically significant – Prob. (F-statistic) = 0.000 – but it is a poor model, as it explains only 13% of the variation in the dependent variable according to the R-squared statistic, and even less, 12%, according to the Adjusted R-squared statistic. In addition, the model suffers from autocorrelation, which is suggested by the Durbin-Watson statistic (1.168) and confirmed by the Breusch-Godfrey Serial Correlation Lagrange Multiplier test, whose null hypothesis of no autocorrelation is rejected with a p-value of 0.000 (Table 32).

Table 32. Results of the Breusch-Godfrey Serial Correlation Lagrange Multiplier test

F-statistic     23.221   Prob. F (2,181)        0.000
Obs*R-squared   37.980   Prob. Chi-Square (2)   0.000

Source: Authors' own table based on calculations conducted with EViews software.
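The corresponding Breusch-Godfrey test can be sketched as follows, assuming the fitted restricted model from the earlier snippet; nlags=2 matches the order-2 test reported in Table 32.

```python
# Breusch-Godfrey serial correlation LM test on the fitted OLS model;
# returns (LM statistic, LM p-value, F statistic, F p-value).
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(restricted, nlags=2)
print(f"Obs*R-squared = {lm_stat:.3f} (p = {lm_pvalue:.4f}), "
      f"F = {f_stat:.3f} (p = {f_pvalue:.4f})")
```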

As for the coefficients of the independent variables forced in based on the literature (Table 30), only the one assigned to disposable income is statistically significant (p-value = 0.000), with a sign in line with the stated hypothesis (0.142). The coefficient of population is highly statistically insignificant (p-value = 0.609) at the 5% level of significance, and its sign is the opposite of what was expected (-0.008).3 Obviously, the model needs to be improved upon. To do so, the auxiliary regression (Equation 41) is first estimated, with the residuals from Equation 40 as the dependent variable and all possible explanatory variables from Table 25 as independent factors.

Equation 41. Structural auxiliary equation

εa = α0 + α1YD + α2POP + α3DJ + α4CPI + α5EX + α6GDP + α7CHG.INV + α8NAFTA + α9GOLD + α10RECES + γa

Source: Authors' own equation.

The results of the regression (Table 33) are as expected when looking at the p-values of the already included independent variables: the p-value of disposable income is very high (0.791), and the p-value for population is low, which is expected given its lack of statistical significance in the restricted model.

Table 33. Results of the auxiliary regression (1)

Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -195.817      107.739      -1.818        0.071
YD         -0.005        0.017        -0.266        0.791
POP        0.001         0.001        1.907         0.058
DJ         0.005         0.002        2.088         0.038
CPI        0.154         0.175        0.878         0.381
EX         -0.014        0.034        -0.411        0.681
GDP        -0.016        0.015        -1.041        0.299
CHGINV     0.192         0.043        4.492         0.000
NAFTA      -0.756        7.061        -0.107        0.915
GOLD       1.819         6.596        0.276         0.783
RECES      -10.658       3.416        -3.120        0.002

Source: Authors' own table based on calculations conducted with EViews software.

Prior to adding any new explanatory variables to the restricted model, a statistical test based on a Lagrange Multiplier (the number of observations times the R-squared of the auxiliary model; 200 • 0.360679 = 72.1358) is carried 3 This may happen. Some variables work for some test subjects, in this case countries, and some, even the ones most often used in the literature, may be found to be highly statistically insignificant. Still, as both of the used explanatory factors are the ones used most in the literature, they stay in the model. At the same time, if none or only a few of the staple independent variables work, it is wise to use other ones.


out with the null hypothesis of no more information to be extracted (H0: αk+1 = αk+2 = … = αk+m = 0) and the alternative that some more information can be added (H1: αk+i ≠ 0 at least for some i). Since at the 5% level of significance and 10 − 2 degrees of freedom χ2critical (15.50731) is less than χ2observed (72.1358), the null hypothesis is rejected and it can be stated that there is still some information to be extracted. From the output presented in Table 33, the obvious choices for addition to the unrestricted model are the variable representing changes in inventory (p-value = 0.000) and the presence of recession (p-value = 0.002). First, the two variables are tested for stationarity (Table 34). Both prove to be stationary in levels – they do not have a unit root – as the observed test statistics are more negative than the critical values and the p-values are effectively zero.
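The arithmetic of this screening test is easy to verify; the snippet below reproduces the comparison using scipy's chi-square quantile function.

```python
# Lagrange Multiplier screening test: n * R-squared of the auxiliary
# regression against the chi-square critical value with 10 - 2 = 8 df.
from scipy.stats import chi2

lm_observed = 200 * 0.360679     # n * R-squared of the auxiliary model
lm_critical = chi2.ppf(0.95, 8)  # 15.50731
print(lm_observed, lm_critical, lm_observed > lm_critical)  # True -> reject H0
```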

Table 34. Stationarity test for CHG.INV and RECES variables

                                           t-Statistic   Prob.
CHG.INV: Augmented Dickey-Fuller statistic   -7.2201     0.000
RECES: Augmented Dickey-Fuller statistic     -5.3811     0.000

Test critical values (both tests): 1% level -3.4654; 5% level -2.8768; 10% level -2.5750

Source: Authors' own table based on calculations conducted with EViews software.

After adding the newly selected independent variables to the model, the structural equation takes the form shown in Equation 42; the estimated parameters have the values shown in Table 35, with the model's statistics presented in Table 36.

Equation 42. Unrestricted structural model

D(IMt) = β0 + β1D(YDt) + β2D(POPt, 2) + β3CHGINVt + β4RECESt + εt

Source: Authors' own equation.

Table 35. Values of the unrestricted model's parameters

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           3.064         2.105        1.456         0.147
D(YD)       0.094         0.024        3.844         0.000
D(POP,2)    -0.002        0.013        -0.178        0.859
CHGINV      0.209         0.041        5.142         0.000
RECES       -12.254       3.457        -3.544        0.001

Source: Authors' own table based on calculations conducted with EViews software.

Table 36. Values of the unrestricted model's statistics

R-squared            0.355         Mean dependent var      11.023
Adjusted R-squared   0.340         S.D. dependent var      19.094
S.E. of regression   15.507        Akaike info criterion   8.347
Sum squared resid    43,526.250    Schwarz criterion       8.434
Log likelihood       -771.272      Hannan-Quinn criter.    8.382
F-statistic          24.870        Durbin-Watson stat      1.413
Prob(F-statistic)    0.000

Source: Authors' own table based on calculations conducted with EViews software.

As expected, both the R-squared and the Adjusted R-squared have increased, confirming that the addition of the new explanatory variables was a good decision. The model as a whole is still statistically significant, with a higher F-statistic whose probability equals 0.000. Another round of tests is run to see if there is still more information to be extracted. As it turns out, for 10 − 4 degrees of freedom at the 5% level of significance, χ2critical (12.59159) is less than χ2observed (200 • 0.195209 = 39.0418); hence the output of the second auxiliary model should be examined for new independent variables. Still, for the purpose of this example, let us assume that the unrestricted model based on the structural equation shown in Equation 42 is the final model and proceed with tests. Starting with the test of the model for multicollinearity, the correlation matrix (Table 27) strongly suggests that it may prove to be an issue. Still, given that the independent variables in the model are statistically significant (with the exception of population) and the R-squared is not excessively high, multicollinearity is not expected to be a problem. As for autocorrelation, the Breusch-Godfrey Serial Correlation Lagrange Multiplier test shows that it is an issue of both the 1st (Table 37) and the 2nd (Table 38) order.

Table 37. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (1)

F-statistic     15.49961   Prob. F (2,179)        0.0000
Obs*R-squared   27.45655   Prob. Chi-Square (2)   0.0000

Source: Authors' own table based on calculations conducted with EViews software.

Table 38. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (2)

F-statistic     6.178035   Prob. F (2,177)        0.0025
Obs*R-squared   12.07182   Prob. Chi-Square (2)   0.0024

Source: Authors' own table based on calculations conducted with EViews software.

Because there are few independent variables in the model, using lags of the dependent variable as additional regressors is not advised, since the ratio of the former to the latter would be 2:1. A solution to this issue is the inclusion of AR(p) terms. After AR(1) was introduced, the Breusch-Godfrey Serial Correlation Lagrange Multiplier test still showed that autocorrelation was an issue. Therefore, a second term, AR(2), was added; the test's results, shown in Table 39, suggest failing to reject the null of no autocorrelation, eventually yielding the output of the final model shown in Table 40 and its statistics in Table 41. This result is supported by the Durbin-Watson statistic (1.942) being very close to its ideal value of 2.00.

Table 39. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (3)

F-statistic     0.867717   Prob. F (2,175)        0.4217
Obs*R-squared   1.806768   Prob. Chi-Square (2)   0.4052

Source: Authors' own table based on calculations conducted with EViews software.

Table 40. Values of the corrected unrestricted model's parameters

Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           6.478         2.810        2.305         0.022
D(YD)       0.055         0.022        2.558         0.011
D(POP,2)    -0.001        0.010        -0.150        0.881
CHGINV      0.156         0.045        3.485         0.001
RECES       -15.311       3.981        -3.846        0.000
AR(1)       0.262         0.077        3.401         0.001
AR(2)       0.243         0.075        3.240         0.001

Source: Authors' own table based on calculations conducted with EViews software.

Table 41. Values of the corrected unrestricted model's statistics

R-squared            0.448        Mean dependent var      11.191
Adjusted R-squared   0.429        S.D. dependent var      19.129
S.E. of regression   14.457       Akaike info criterion   8.217
Sum squared resid    36991.520    Schwarz criterion       8.340
Log likelihood       -749.007     Hannan-Quinn criter.    8.267
F-statistic          23.903       Durbin-Watson stat      1.942
Prob(F-statistic)    0.000

Source: Authors' own table based on calculations conducted with EViews software.
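EViews' AR(1)/AR(2) specification amounts to a regression with autoregressive errors. A close statsmodels analogue is SARIMAX with order (2, 0, 0), sketched below under the same hypothetical DataFrame; the estimates need not match the EViews output exactly.

```python
# Re-estimating the unrestricted model with AR(1) and AR(2) error terms,
# in the spirit of Table 40.
from statsmodels.tsa.statespace.sarimax import SARIMAX

data = pd.DataFrame({
    "D(IM)": df["IM"].diff(),
    "D(YD)": df["YD"].diff(),
    "D(POP,2)": df["POP"].diff().diff(),
    "CHGINV": df["CHGINV"],
    "RECES": df["RECES"],
}).dropna()

ar_fit = SARIMAX(data["D(IM)"],
                 exog=data[["D(YD)", "D(POP,2)", "CHGINV", "RECES"]],
                 order=(2, 0, 0), trend="c").fit(disp=False)
print(ar_fit.summary())  # exog coefficients plus the ar.L1 and ar.L2 terms
```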

Another test is to see whether the residuals have a normal distribution.4 This is done by examining the Jarque-Bera statistic (5.569), with the null hypothesis of a normal distribution. Since the p-value associated with the statistic equals 0.059051 and is just above the 5% significance threshold (0.05), it is possible to say that, at this level of significance, the residuals are normally distributed. The last test is the White test for heteroscedasticity, with the null hypothesis of no heteroscedasticity. Since the p-value of the test (0.086, shown in Table 42) is above the 5% cut-off point, it can be concluded that the model does not suffer from heteroscedasticity.

Table 42. White heteroscedasticity test for the final model

F-statistic           1.444    Prob. F (27,156)        0.086
Obs*R-squared         36.788   Prob. Chi-Square (27)   0.099
Scaled explained SS   44.504   Prob. Chi-Square (27)   0.018

Source: Authors' own table based on calculations conducted with EViews software.
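Both residual diagnostics have direct statsmodels counterparts; the sketch below applies Jarque-Bera to the residuals of the AR-corrected fit and White's test to the earlier OLS fit, whose design matrix is readily available.

```python
# Jarque-Bera normality test and White's heteroscedasticity test.
from statsmodels.stats.stattools import jarque_bera
from statsmodels.stats.diagnostic import het_white

jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(ar_fit.resid)
print(f"Jarque-Bera = {jb_stat:.3f}, p = {jb_pvalue:.4f}")

# het_white() needs the residuals and a regressor matrix with a constant;
# here the design matrix of the earlier OLS sketch is reused.
white_lm, white_lm_p, white_f, white_f_p = het_white(restricted.resid,
                                                     restricted.model.exog)
print(f"White LM = {white_lm:.3f}, p = {white_lm_p:.4f}")
```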

Moving to the model's assessment, the estimated model explains 44.8% of the variation in the dependent variable (R-squared = 0.448). In its entirety, the model is statistically significant – Prob. (F-statistic) = 0.000. As for the coefficients, all but the one assigned to population (p-value = 0.881) are statistically significant at the 5% level of significance (this includes both autoregressive terms) – the highest p-value among them, 0.011, is associated with disposable income. The interpretation of the statistically significant coefficients is as follows:
1) YD: If the difference in real disposable income increases by one billion (Chained 2005) USD, the difference5 in U.S. imports will increase by 0.055 billion (Chained 2005) USD, or 55,491,000 Chained 2005 USD,
2) CHGINV: If the real change in private inventories increases by one billion (Chained 2005) USD, the difference in U.S. imports will increase by 0.156 billion (Chained 2005) USD, or 156,196,000 Chained 2005 USD,
3) RECES: If the U.S. is in a recession, the difference in U.S. imports will decrease by 15.311 billion (Chained 2005) USD, or 15,311,240,000 Chained 2005 USD.
All of the hypothesis statements regarding the signs of the incorporated independent variables (as listed in Table 26) have been statistically confirmed at the 5% level of significance.6 Additionally, let us examine the graph (Graph 7) that shows how the fitted data looks when set against the actual data, with the residuals incorporated.

4 Remembering that, as presented earlier, this is an idealized assumption.
5 Remember that, for stationarity reasons, the dependent variable had to be differenced.
6 This statement is made based on the examination of the p-values of those coefficients. Of course, t-tests can be carried out to prove the statement manually, but in practice this is omitted to avoid repetition.

Graph 7. Actual, fitted data and residuals of the final model

Source: Authors' own graph based on calculations conducted with EViews software.

The fitted data is still off the actual data, which is seen in the discrepancy between the two series and the high jumps in the values of the residuals.7 Lastly, the model is tested ex post over the data from the first quarter of 2007 to the fourth quarter of 2009. This is done in two ways. First, the forecast (IMF) is evaluated visually (Graph 8) by comparing it to the actual data (IM), with the upper/lower boundaries set by adding/subtracting twice the standard error of the forecast to/from the IMF value.

7 This is expected, as the low quality of the fit is suggested by the value of the R-squared statistic and is due to the assumption that the analyzed model is the final model.

Graph 8. Ex post forecast of the final model

Source: Authors’ own graph based on calculations conducted with EViews software.
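The band in Graph 8 is simple to construct; in the sketch below, imf and se_f are hypothetical pandas Series holding the ex post forecast and its standard error.

```python
# Forecast bounds at +/- `width` standard errors: a rough 95% interval
# under normally distributed forecast errors. `imf` and `se_f` are
# hypothetical inputs, not produced by the sketches above.
def forecast_band(imf, se_f, width=2.0):
    """Return (lower, upper) bounds around the forecast series."""
    return imf - width * se_f, imf + width * se_f
```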

The graph shows that until the third quarter of 2008, the model did a very good job of forecasting the values of U.S. imports. After that, the actual data begins to deviate significantly from the forecast, which was nevertheless able to detect the incoming downward trend and the recovery at the end. This discrepancy between the two series could be corrected by adding new explanatory variables (for example, a variable controlling for the 2007 economic crisis, which occurred at this time). Examining the forecast's statistics – the three key proportions (bias 0.3411, variance 0.5972 and covariance 0.0617)8 – it is possible to say that the bulk of the error is associated with the variation of the forecast being far from the variation of the actual series, followed by the mean of the forecast being far from the mean of the actual series, with the smallest share attributed to the covariance proportion.9

8 The Root Mean Squared Error, although very important, is not evaluated here as it is used to compare between forecasts.
9 In a good forecast, the bias and variance proportions ought to be very low, with the bulk of the error attributed to the covariance proportion.
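The quoted proportions come from the standard Theil decomposition of the mean squared forecast error; a sketch of the computation, with actual and forecast as the ex post series, is given below.

```python
# Theil decomposition of the mean squared forecast error into bias,
# variance and covariance proportions (which sum to one by construction).
import numpy as np

def theil_proportions(actual, forecast):
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mse = np.mean((forecast - actual) ** 2)
    bias = (forecast.mean() - actual.mean()) ** 2 / mse
    variance = (forecast.std() - actual.std()) ** 2 / mse
    covariance = 1.0 - bias - variance  # the remainder
    return bias, variance, covariance
```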


Table 43. Descriptive Statistics

Variable   Mean        Median      Max.        Min.        Std. Dev.   Skewness   Kurtosis   J-B      Prob.   Obs.
IM         757.22      505.55      2208.34     108.45      647.76      0.97       2.54       32.94    0.00    200
YD         5402.41     5078.25     10095.10    1955.50     2415.92     0.43       2.03       13.94    0.00    200
POP        240896.80   237375.50   308413.30   179590.30   36955.19    0.18       1.85       12.07    0.00    200
DJ         3770.61     1262.77     13379.36    573.47      3940.41     1.01       2.41       37.21    0.00    200
CPI        106.52      105.78      218.91      29.40       61.10       0.20       1.66       16.29    0.00    200
EX         580.11      357.77      1670.43     94.76       455.34      0.78       2.29       24.47    0.00    200
GDP        7339.93     6708.77     13415.27    2802.62     3202.17     0.43       1.95       15.35    0.00    200
CHGINV     24.62       25.01       117.20      -160.22     37.41       -1.21      7.82       242.21   0.00    200
NAFTA      0.38        0.00        1.00        0.00        0.49        0.49       1.24       33.83    0.00    200
GOLD       0.22        0.00        1.00        0.00        0.42        1.35       2.83       61.16    0.00    200
RECES      0.20        0.00        1.00        0.00        0.40        1.54       3.37       80.16    0.00    200

Source: Authors' own table based on calculations conducted with EViews software.

Final Remarks

Performing econometric research is a science; writing a clear description of it is an art. The purpose of this book was to guide you through bringing both of those skills together. Be it describing variables or using the LM test to detect the presence of autocorrelation, the important thing is to understand that conducting research is a step-by-step process. Yet, just like any other highly structured process, even this one sometimes requires adjustments. Think, make a plan, take notes, and you will be fine. And always remember: if the p-value is low, the null has to go.


Statistical Tables

z-table: Area between 0 and z

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

t-table

         probability, one-tail test
df       0.4        0.25       0.1        0.05       0.025     0.01      0.005     0.0005
         probability, two-tail test
         0.8        0.5        0.2        0.1        0.05      0.02      0.01      0.001
1        0.32492    1.000000   3.077684   6.313752   12.7062   31.82052  63.65674  636.6192
2        0.288675   0.816497   1.885618   2.919986   4.30265   6.96456   9.92484   31.5991
3        0.276671   0.764892   1.637744   2.353363   3.18245   4.5407    5.84091   12.924
4        0.270722   0.740697   1.533206   2.131847   2.77645   3.74695   4.60409   8.6103
5        0.267181   0.726687   1.475884   2.015048   2.57058   3.36493   4.03214   6.8688
6        0.264835   0.717558   1.439756   1.94318    2.44691   3.14267   3.70743   5.9588
7        0.263167   0.711142   1.414924   1.894579   2.36462   2.99795   3.49948   5.4079
8        0.261921   0.706387   1.396815   1.859548   2.306     2.89646   3.35539   5.0413
9        0.260955   0.702722   1.383029   1.833113   2.26216   2.82144   3.24984   4.7809
10       0.260185   0.699812   1.372184   1.812461   2.22814   2.76377   3.16927   4.5869
11       0.259556   0.697445   1.36343    1.795885   2.20099   2.71808   3.10581   4.437
12       0.259033   0.695483   1.356217   1.782288   2.17881   2.681     3.05454   4.3178
13       0.258591   0.693829   1.350171   1.770933   2.16037   2.65031   3.01228   4.2208
14       0.258213   0.692417   1.34503    1.76131    2.14479   2.62449   2.97684   4.1405
15       0.257885   0.691197   1.340606   1.75305    2.13145   2.60248   2.94671   4.0728
16       0.257599   0.690132   1.336757   1.745884   2.11991   2.58349   2.92078   4.015
17       0.257347   0.689195   1.333379   1.739607   2.10982   2.56693   2.89823   3.9651
18       0.257123   0.688364   1.330391   1.734064   2.10092   2.55238   2.87844   3.9216
19       0.256923   0.687621   1.327728   1.729133   2.09302   2.53948   2.86093   3.8834
20       0.256743   0.686954   1.325341   1.724718   2.08596   2.52798   2.84534   3.8495
21       0.25658    0.686352   1.323188   1.720743   2.07961   2.51765   2.83136   3.8193
22       0.256432   0.685805   1.321237   1.717144   2.07387   2.50832   2.81876   3.7921
23       0.256297   0.685306   1.31946    1.713872   2.06866   2.49987   2.80734   3.7676
24       0.256173   0.68485    1.317836   1.710882   2.0639    2.49216   2.79694   3.7454
25       0.25606    0.68443    1.316345   1.708141   2.05954   2.48511   2.78744   3.7251
26       0.255955   0.684043   1.314972   1.705618   2.05553   2.47863   2.77871   3.7066
27       0.255858   0.683685   1.313703   1.703288   2.05183   2.47266   2.77068   3.6896
28       0.255768   0.683353   1.312527   1.701131   2.04841   2.46714   2.76326   3.6739
29       0.255684   0.683044   1.311434   1.699127   2.04523   2.46202   2.75639   3.6594
30       0.255605   0.682756   1.310415   1.697261   2.04227   2.45726   2.75      3.646
inf      0.253347   0.67449    1.281552   1.644854   1.95996   2.32635   2.57583   3.2905

F-table at 0.01 level of significance
(rows: df in denominator; columns: df in numerator)

df     1         2         3         4         5         6
1      4052.181  4999.5    5403.352  5624.583  5763.65   5858.986
2      98.503    99        99.166    99.249    99.299    99.333
3      34.116    30.817    29.457    28.71     28.237    27.911
4      21.198    18        16.694    15.977    15.522    15.207
5      16.258    13.274    12.06     11.392    10.967    10.672
6      13.745    10.925    9.78      9.148     8.746     8.466
7      12.246    9.547     8.451     7.847     7.46      7.191
8      11.259    8.649     7.591     7.006     6.632     6.371
9      10.561    8.022     6.992     6.422     6.057     5.802
10     10.044    7.559     6.552     5.994     5.636     5.386
11     9.646     7.206     6.217     5.668     5.316     5.069
12     9.33      6.927     5.953     5.412     5.064     4.821
13     9.074     6.701     5.739     5.205     4.862     4.62
14     8.862     6.515     5.564     5.035     4.695     4.456
15     8.683     6.359     5.417     4.893     4.556     4.318
16     8.531     6.226     5.292     4.773     4.437     4.202
17     8.4       6.112     5.185     4.669     4.336     4.102
18     8.285     6.013     5.092     4.579     4.248     4.015
19     8.185     5.926     5.01      4.5       4.171     3.939
20     8.096     5.849     4.938     4.431     4.103     3.871
21     8.017     5.78      4.874     4.369     4.042     3.812
22     7.945     5.719     4.817     4.313     3.988     3.758
23     7.881     5.664     4.765     4.264     3.939     3.71
24     7.823     5.614     4.718     4.218     3.895     3.667
25     7.77      5.568     4.675     4.177     3.855     3.627
26     7.721     5.526     4.637     4.14      3.818     3.591
27     7.677     5.488     4.601     4.106     3.785     3.558
28     7.636     5.453     4.568     4.074     3.754     3.528
29     7.598     5.42      4.538     4.045     3.725     3.499
30     7.562     5.39      4.51      4.018     3.699     3.473
40     7.314     5.179     4.313     3.828     3.514     3.291
60     7.077     4.977     4.126     3.649     3.339     3.119
120    6.851     4.787     3.949     3.48      3.174     2.956
inf    6.635     4.605     3.782     3.319     3.017     2.802

df     7         8         9         10        12        15
1      5928.356  5981.07   6022.473  6055.847  6106.321  6157.285
2      99.356    99.374    99.388    99.399    99.416    99.433
3      27.672    27.489    27.345    27.229    27.052    26.872
4      14.976    14.799    14.659    14.546    14.374    14.198
5      10.456    10.289    10.158    10.051    9.888     9.722
6      8.26      8.102     7.976     7.874     7.718     7.559
7      6.993     6.84      6.719     6.62      6.469     6.314
8      6.178     6.029     5.911     5.814     5.667     5.515
9      5.613     5.467     5.351     5.257     5.111     4.962
10     5.2       5.057     4.942     4.849     4.706     4.558
11     4.886     4.744     4.632     4.539     4.397     4.251
12     4.64      4.499     4.388     4.296     4.155     4.01
13     4.441     4.302     4.191     4.1       3.96      3.815
14     4.278     4.14      4.03      3.939     3.8       3.656
15     4.142     4.004     3.895     3.805     3.666     3.522
16     4.026     3.89      3.78      3.691     3.553     3.409
17     3.927     3.791     3.682     3.593     3.455     3.312
18     3.841     3.705     3.597     3.508     3.371     3.227
19     3.765     3.631     3.523     3.434     3.297     3.153
20     3.699     3.564     3.457     3.368     3.231     3.088
21     3.64      3.506     3.398     3.31      3.173     3.03
22     3.587     3.453     3.346     3.258     3.121     2.978
23     3.539     3.406     3.299     3.211     3.074     2.931
24     3.496     3.363     3.256     3.168     3.032     2.889
25     3.457     3.324     3.217     3.129     2.993     2.85
26     3.421     3.288     3.182     3.094     2.958     2.815
27     3.388     3.256     3.149     3.062     2.926     2.783
28     3.358     3.226     3.12      3.032     2.896     2.753
29     3.33      3.198     3.092     3.005     2.868     2.726
30     3.304     3.173     3.067     2.979     2.843     2.7
40     3.124     2.993     2.888     2.801     2.665     2.522
60     2.953     2.823     2.718     2.632     2.496     2.352
120    2.792     2.663     2.559     2.472     2.336     2.192
inf    2.639     2.511     2.407     2.321     2.185     2.039

df     20        30        40        60        120       inf
1      6208.73   6260.649  6286.782  6313.03   6339.391  6365.864
2      99.449    99.466    99.474    99.482    99.491    99.499
3      26.69     26.505    26.411    26.316    26.221    26.125
4      14.02     13.838    13.745    13.652    13.558    13.463
5      9.553     9.379     9.291     9.202     9.112     9.02
6      7.396     7.229     7.143     7.057     6.969     6.88
7      6.155     5.992     5.908     5.824     5.737     5.65
8      5.359     5.198     5.116     5.032     4.946     4.859
9      4.808     4.649     4.567     4.483     4.398     4.311
10     4.405     4.247     4.165     4.082     3.996     3.909
11     4.099     3.941     3.86      3.776     3.69      3.602
12     3.858     3.701     3.619     3.535     3.449     3.361
13     3.665     3.507     3.425     3.341     3.255     3.165
14     3.505     3.348     3.266     3.181     3.094     3.004
15     3.372     3.214     3.132     3.047     2.959     2.868
16     3.259     3.101     3.018     2.933     2.845     2.753
17     3.162     3.003     2.92      2.835     2.746     2.653
18     3.077     2.919     2.835     2.749     2.66      2.566
19     3.003     2.844     2.761     2.674     2.584     2.489
20     2.938     2.778     2.695     2.608     2.517     2.421
21     2.88      2.72      2.636     2.548     2.457     2.36
22     2.827     2.667     2.583     2.495     2.403     2.305
23     2.781     2.62      2.535     2.447     2.354     2.256
24     2.738     2.577     2.492     2.403     2.31      2.211
25     2.699     2.538     2.453     2.364     2.27      2.169
26     2.664     2.503     2.417     2.327     2.233     2.131
27     2.632     2.47      2.384     2.294     2.198     2.097
28     2.602     2.44      2.354     2.263     2.167     2.064
29     2.574     2.412     2.325     2.234     2.138     2.034
30     2.549     2.386     2.299     2.208     2.111     2.006
40     2.369     2.203     2.114     2.019     1.917     1.805
60     2.198     2.028     1.936     1.836     1.726     1.601
120    2.035     1.86      1.763     1.656     1.533     1.381
inf    1.878     1.696     1.592     1.473     1.325     1

F-table at 0.025 level of significance
(rows: df in denominator; columns: df in numerator)

df     1         2         3         4         5         6
1      647.789   799.5     864.163   899.5833  921.8479  937.1111
2      38.5063   39        39.1655   39.2484   39.2982   39.3315
3      17.4434   16.0441   15.4392   15.101    14.8848   14.7347
4      12.2179   10.6491   9.9792    9.6045    9.3645    9.1973
5      10.007    8.4336    7.7636    7.3879    7.1464    6.9777
6      8.8131    7.2599    6.5988    6.2272    5.9876    5.8198
7      8.0727    6.5415    5.8898    5.5226    5.2852    5.1186
8      7.5709    6.0595    5.416     5.0526    4.8173    4.6517
9      7.2093    5.7147    5.0781    4.7181    4.4844    4.3197
10     6.9367    5.4564    4.8256    4.4683    4.2361    4.0721
11     6.7241    5.2559    4.63      4.2751    4.044     3.8807
12     6.5538    5.0959    4.4742    4.1212    3.8911    3.7283
13     6.4143    4.9653    4.3472    3.9959    3.7667    3.6043
14     6.2979    4.8567    4.2417    3.8919    3.6634    3.5014
15     6.1995    4.765     4.1528    3.8043    3.5764    3.4147
16     6.1151    4.6867    4.0768    3.7294    3.5021    3.3406
17     6.042     4.6189    4.0112    3.6648    3.4379    3.2767
18     5.9781    4.5597    3.9539    3.6083    3.382     3.2209
19     5.9216    4.5075    3.9034    3.5587    3.3327    3.1718
20     5.8715    4.4613    3.8587    3.5147    3.2891    3.1283
21     5.8266    4.4199    3.8188    3.4754    3.2501    3.0895
22     5.7863    4.3828    3.7829    3.4401    3.2151    3.0546
23     5.7498    4.3492    3.7505    3.4083    3.1835    3.0232
24     5.7166    4.3187    3.7211    3.3794    3.1548    2.9946
25     5.6864    4.2909    3.6943    3.353     3.1287    2.9685
26     5.6586    4.2655    3.6697    3.3289    3.1048    2.9447
27     5.6331    4.2421    3.6472    3.3067    3.0828    2.9228
28     5.6096    4.2205    3.6264    3.2863    3.0626    2.9027
29     5.5878    4.2006    3.6072    3.2674    3.0438    2.884
30     5.5675    4.1821    3.5894    3.2499    3.0265    2.8667
40     5.4239    4.051     3.4633    3.1261    2.9037    2.7444
60     5.2856    3.9253    3.3425    3.0077    2.7863    2.6274
120    5.1523    3.8046    3.2269    2.8943    2.674     2.5154
inf    5.0239    3.6889    3.1161    2.7858    2.5665    2.4082

df     7         8         9         10        12        15
1      948.2169  956.6562  963.2846  968.6274  976.7079  984.8668
2      39.3552   39.373    39.3869   39.398    39.4146   39.4313
3      14.6244   14.5399   14.4731   14.4189   14.3366   14.2527
4      9.0741    8.9796    8.9047    8.8439    8.7512    8.6565
5      6.8531    6.7572    6.6811    6.6192    6.5245    6.4277
6      5.6955    5.5996    5.5234    5.4613    5.3662    5.2687
7      4.9949    4.8993    4.8232    4.7611    4.6658    4.5678
8      4.5286    4.4333    4.3572    4.2951    4.1997    4.1012
9      4.197     4.102     4.026     3.9639    3.8682    3.7694
10     3.9498    3.8549    3.779     3.7168    3.6209    3.5217
11     3.7586    3.6638    3.5879    3.5257    3.4296    3.3299
12     3.6065    3.5118    3.4358    3.3736    3.2773    3.1772
13     3.4827    3.388     3.312     3.2497    3.1532    3.0527
14     3.3799    3.2853    3.2093    3.1469    3.0502    2.9493
15     3.2934    3.1987    3.1227    3.0602    2.9633    2.8621
16     3.2194    3.1248    3.0488    2.9862    2.889     2.7875
17     3.1556    3.061     2.9849    2.9222    2.8249    2.723
18     3.0999    3.0053    2.9291    2.8664    2.7689    2.6667
19     3.0509    2.9563    2.8801    2.8172    2.7196    2.6171
20     3.0074    2.9128    2.8365    2.7737    2.6758    2.5731
21     2.9686    2.874     2.7977    2.7348    2.6368    2.5338
22     2.9338    2.8392    2.7628    2.6998    2.6017    2.4984
23     2.9023    2.8077    2.7313    2.6682    2.5699    2.4665
24     2.8738    2.7791    2.7027    2.6396    2.5411    2.4374
25     2.8478    2.7531    2.6766    2.6135    2.5149    2.411
26     2.824     2.7293    2.6528    2.5896    2.4908    2.3867
27     2.8021    2.7074    2.6309    2.5676    2.4688    2.3644
28     2.782     2.6872    2.6106    2.5473    2.4484    2.3438
29     2.7633    2.6686    2.5919    2.5286    2.4295    2.3248
30     2.746     2.6513    2.5746    2.5112    2.412     2.3072
40     2.6238    2.5289    2.4519    2.3882    2.2882    2.1819
60     2.5068    2.4117    2.3344    2.2702    2.1692    2.0613
120    2.3948    2.2994    2.2217    2.157     2.0548    1.945
inf    2.2875    2.1918    2.1136    2.0483    1.9447    1.8326

df     20        30        40        60        120       inf
1      993.1028  1001.414  1005.598  1009.8    1014.02   1018.258
2      39.4479   39.465    39.473    39.481    39.49     39.498
3      14.1674   14.081    14.037    13.992    13.947    13.902
4      8.5599    8.461     8.411     8.36      8.309     8.257
5      6.3286    6.227     6.175     6.123     6.069     6.015
6      5.1684    5.065     5.012     4.959     4.904     4.849
7      4.4667    4.362     4.309     4.254     4.199     4.142
8      3.9995    3.894     3.84      3.784     3.728     3.67
9      3.6669    3.56      3.505     3.449     3.392     3.333
10     3.4185    3.311     3.255     3.198     3.14      3.08
11     3.2261    3.118     3.061     3.004     2.944     2.883
12     3.0728    2.963     2.906     2.848     2.787     2.725
13     2.9477    2.837     2.78      2.72      2.659     2.595
14     2.8437    2.732     2.674     2.614     2.552     2.487
15     2.7559    2.644     2.585     2.524     2.461     2.395
16     2.6808    2.568     2.509     2.447     2.383     2.316
17     2.6158    2.502     2.442     2.38      2.315     2.247
18     2.559     2.445     2.384     2.321     2.256     2.187
19     2.5089    2.394     2.333     2.27      2.203     2.133
20     2.4645    2.349     2.287     2.223     2.156     2.085
21     2.4247    2.308     2.246     2.182     2.114     2.042
22     2.389     2.272     2.21      2.145     2.076     2.003
23     2.3567    2.239     2.176     2.111     2.041     1.968
24     2.3273    2.209     2.146     2.08      2.01      1.935
25     2.3005    2.182     2.118     2.052     1.981     1.906
26     2.2759    2.157     2.093     2.026     1.954     1.878
27     2.2533    2.133     2.069     2.002     1.93      1.853
28     2.2324    2.112     2.048     1.98      1.907     1.829
29     2.2131    2.092     2.028     1.959     1.886     1.807
30     2.1952    2.074     2.009     1.94      1.866     1.787
40     2.0677    1.943     1.875     1.803     1.724     1.637
60     1.9445    1.815     1.744     1.667     1.581     1.482
120    1.8249    1.69      1.614     1.53      1.433     1.31
inf    1.7085    1.566     1.484     1.388     1.268     1

F-table at 0.05 level of significance
(rows: df in denominator; columns: df in numerator)

df     1         2         3         4         5         6
1      161.4476  199.5     215.7073  224.5832  230.1619  233.986
2      18.5128   19        19.1643   19.2468   19.2964   19.3295
3      10.128    9.5521    9.2766    9.1172    9.0135    8.9406
4      7.7086    6.9443    6.5914    6.3882    6.2561    6.1631
5      6.6079    5.7861    5.4095    5.1922    5.0503    4.9503
6      5.9874    5.1433    4.7571    4.5337    4.3874    4.2839
7      5.5914    4.7374    4.3468    4.1203    3.9715    3.866
8      5.3177    4.459     4.0662    3.8379    3.6875    3.5806
9      5.1174    4.2565    3.8625    3.6331    3.4817    3.3738
10     4.9646    4.1028    3.7083    3.478     3.3258    3.2172
11     4.8443    3.9823    3.5874    3.3567    3.2039    3.0946
12     4.7472    3.8853    3.4903    3.2592    3.1059    2.9961
13     4.6672    3.8056    3.4105    3.1791    3.0254    2.9153
14     4.6001    3.7389    3.3439    3.1122    2.9582    2.8477
15     4.5431    3.6823    3.2874    3.0556    2.9013    2.7905
16     4.494     3.6337    3.2389    3.0069    2.8524    2.7413
17     4.4513    3.5915    3.1968    2.9647    2.81      2.6987
18     4.4139    3.5546    3.1599    2.9277    2.7729    2.6613
19     4.3807    3.5219    3.1274    2.8951    2.7401    2.6283
20     4.3512    3.4928    3.0984    2.8661    2.7109    2.599
21     4.3248    3.4668    3.0725    2.8401    2.6848    2.5727
22     4.3009    3.4434    3.0491    2.8167    2.6613    2.5491
23     4.2793    3.4221    3.028     2.7955    2.64      2.5277
24     4.2597    3.4028    3.0088    2.7763    2.6207    2.5082
25     4.2417    3.3852    2.9912    2.7587    2.603     2.4904
26     4.2252    3.369     2.9752    2.7426    2.5868    2.4741
27     4.21      3.3541    2.9604    2.7278    2.5719    2.4591
28     4.196     3.3404    2.9467    2.7141    2.5581    2.4453
29     4.183     3.3277    2.934     2.7014    2.5454    2.4324
30     4.1709    3.3158    2.9223    2.6896    2.5336    2.4205
40     4.0847    3.2317    2.8387    2.606     2.4495    2.3359
60     4.0012    3.1504    2.7581    2.5252    2.3683    2.2541
120    3.9201    3.0718    2.6802    2.4472    2.2899    2.175
inf    3.8415    2.9957    2.6049    2.3719    2.2141    2.0986

df     7         8         9         10        12        15
1      236.7684  238.8827  240.5433  241.8817  243.906   245.9499
2      19.3532   19.371    19.3848   19.3959   19.4125   19.4291
3      8.8867    8.8452    8.8123    8.7855    8.7446    8.7029
4      6.0942    6.041     5.9988    5.9644    5.9117    5.8578
5      4.8759    4.8183    4.7725    4.7351    4.6777    4.6188
6      4.2067    4.1468    4.099     4.06      3.9999    3.9381
7      3.787     3.7257    3.6767    3.6365    3.5747    3.5107
8      3.5005    3.4381    3.3881    3.3472    3.2839    3.2184
9      3.2927    3.2296    3.1789    3.1373    3.0729    3.0061
10     3.1355    3.0717    3.0204    2.9782    2.913     2.845
11     3.0123    2.948     2.8962    2.8536    2.7876    2.7186
12     2.9134    2.8486    2.7964    2.7534    2.6866    2.6169
13     2.8321    2.7669    2.7144    2.671     2.6037    2.5331
14     2.7642    2.6987    2.6458    2.6022    2.5342    2.463
15     2.7066    2.6408    2.5876    2.5437    2.4753    2.4034
16     2.6572    2.5911    2.5377    2.4935    2.4247    2.3522
17     2.6143    2.548     2.4943    2.4499    2.3807    2.3077
18     2.5767    2.5102    2.4563    2.4117    2.3421    2.2686
19     2.5435    2.4768    2.4227    2.3779    2.308     2.2341
20     2.514     2.4471    2.3928    2.3479    2.2776    2.2033
21     2.4876    2.4205    2.366     2.321     2.2504    2.1757
22     2.4638    2.3965    2.3419    2.2967    2.2258    2.1508
23     2.4422    2.3748    2.3201    2.2747    2.2036    2.1282
24     2.4226    2.3551    2.3002    2.2547    2.1834    2.1077
25     2.4047    2.3371    2.2821    2.2365    2.1649    2.0889
26     2.3883    2.3205    2.2655    2.2197    2.1479    2.0716
27     2.3732    2.3053    2.2501    2.2043    2.1323    2.0558
28     2.3593    2.2913    2.236     2.19      2.1179    2.0411
29     2.3463    2.2783    2.2229    2.1768    2.1045    2.0275
30     2.3343    2.2662    2.2107    2.1646    2.0921    2.0148
40     2.249     2.1802    2.124     2.0772    2.0035    1.9245
60     2.1665    2.097     2.0401    1.9926    1.9174    1.8364
120    2.0868    2.0164    1.9588    1.9105    1.8337    1.7505
inf    2.0096    1.9384    1.8799    1.8307    1.7522    1.6664

df     20        30        40        60        120       inf
1      248.0131  250.0951  251.1432  252.1957  253.2529  254.3144
2      19.4458   19.4624   19.4707   19.4791   19.4874   19.4957
3      8.6602    8.6166    8.5944    8.572     8.5494    8.5264
4      5.8025    5.7459    5.717     5.6877    5.6581    5.6281
5      4.5581    4.4957    4.4638    4.4314    4.3985    4.365
6      3.8742    3.8082    3.7743    3.7398    3.7047    3.6689
7      3.4445    3.3758    3.3404    3.3043    3.2674    3.2298
8      3.1503    3.0794    3.0428    3.0053    2.9669    2.9276
9      2.9365    2.8637    2.8259    2.7872    2.7475    2.7067
10     2.774     2.6996    2.6609    2.6211    2.5801    2.5379
11     2.6464    2.5705    2.5309    2.4901    2.448     2.4045
12     2.5436    2.4663    2.4259    2.3842    2.341     2.2962
13     2.4589    2.3803    2.3392    2.2966    2.2524    2.2064
14     2.3879    2.3082    2.2664    2.2229    2.1778    2.1307
15     2.3275    2.2468    2.2043    2.1601    2.1141    2.0658
16     2.2756    2.1938    2.1507    2.1058    2.0589    2.0096
17     2.2304    2.1477    2.104     2.0584    2.0107    1.9604
18     2.1906    2.1071    2.0629    2.0166    1.9681    1.9168
19     2.1555    2.0712    2.0264    1.9795    1.9302    1.878
20     2.1242    2.0391    1.9938    1.9464    1.8963    1.8432
21     2.096     2.0102    1.9645    1.9165    1.8657    1.8117
22     2.0707    1.9842    1.938     1.8894    1.838     1.7831
23     2.0476    1.9605    1.9139    1.8648    1.8128    1.757
24     2.0267    1.939     1.892     1.8424    1.7896    1.733
25     2.0075    1.9192    1.8718    1.8217    1.7684    1.711
26     1.9898    1.901     1.8533    1.8027    1.7488    1.6906
27     1.9736    1.8842    1.8361    1.7851    1.7306    1.6717
28     1.9586    1.8687    1.8203    1.7689    1.7138    1.6541
29     1.9446    1.8543    1.8055    1.7537    1.6981    1.6376
30     1.9317    1.8409    1.7918    1.7396    1.6835    1.6223
40     1.8389    1.7444    1.6928    1.6373    1.5766    1.5089
60     1.748     1.6491    1.5943    1.5343    1.4673    1.3893
120    1.6587    1.5543    1.4952    1.429     1.3519    1.2539
inf    1.5705    1.4591    1.394     1.318     1.2214    1

F-table at 0.1 level of significance
(rows: df in denominator; columns: df in numerator)

df     1         2         3         4         5         6
1      39.86346  49.5      53.59324  55.83296  57.24008  58.20442
2      8.52632   9         9.16179   9.24342   9.29263   9.32553
3      5.53832   5.46238   5.39077   5.34264   5.30916   5.28473
4      4.54477   4.32456   4.19086   4.10725   4.05058   4.00975
5      4.06042   3.77972   3.61948   3.5202    3.45298   3.40451
6      3.77595   3.4633    3.28876   3.18076   3.10751   3.05455
7      3.58943   3.25744   3.07407   2.96053   2.88334   2.82739
8      3.45792   3.11312   2.9238    2.80643   2.72645   2.66833
9      3.3603    3.00645   2.81286   2.69268   2.61061   2.55086
10     3.28502   2.92447   2.72767   2.60534   2.52164   2.46058
11     3.2252    2.85951   2.66023   2.53619   2.45118   2.38907
12     3.17655   2.8068    2.60552   2.4801    2.39402   2.33102
13     3.13621   2.76317   2.56027   2.43371   2.34672   2.28298
14     3.10221   2.72647   2.52222   2.39469   2.30694   2.24256
15     3.07319   2.69517   2.48979   2.36143   2.27302   2.20808
16     3.04811   2.66817   2.46181   2.33274   2.24376   2.17833
17     3.02623   2.64464   2.43743   2.30775   2.21825   2.15239
18     3.00698   2.62395   2.41601   2.28577   2.19583   2.12958
19     2.9899    2.60561   2.39702   2.2663    2.17596   2.10936
20     2.97465   2.58925   2.38009   2.24893   2.15823   2.09132
21     2.96096   2.57457   2.36489   2.23334   2.14231   2.07512
22     2.94858   2.56131   2.35117   2.21927   2.12794   2.0605
23     2.93736   2.54929   2.33873   2.20651   2.11491   2.04723
24     2.92712   2.53833   2.32739   2.19488   2.10303   2.03513
25     2.91774   2.52831   2.31702   2.18424   2.09216   2.02406
26     2.90913   2.5191    2.30749   2.17447   2.08218   2.01389
27     2.90119   2.51061   2.29871   2.16546   2.07298   2.00452
28     2.89385   2.50276   2.2906    2.15714   2.06447   1.99585
29     2.88703   2.49548   2.28307   2.14941   2.05658   1.98781
30     2.88069   2.48872   2.27607   2.14223   2.04925   1.98033
40     2.83535   2.44037   2.22609   2.09095   1.99682   1.92688
60     2.79107   2.39325   2.17741   2.04099   1.94571   1.87472
120    2.74781   2.34734   2.12999   1.9923    1.89587   1.82381
inf    2.70554   2.30259   2.0838    1.94486   1.84727   1.77411

df     7         8         9         10        12        15
1      58.90595  59.43898  59.85759  60.19498  60.70521  61.22034
2      9.34908   9.36677   9.38054   9.39157   9.40813   9.42471
3      5.26619   5.25167   5.24      5.23041   5.21562   5.20031
4      3.97897   3.95494   3.93567   3.91988   3.89553   3.87036
5      3.3679    3.33928   3.31628   3.2974    3.26824   3.23801
6      3.01446   2.98304   2.95774   2.93693   2.90472   2.87122
7      2.78493   2.75158   2.72468   2.70251   2.66811   2.63223
8      2.62413   2.58935   2.56124   2.53804   2.50196   2.46422
9      2.50531   2.46941   2.44034   2.41632   2.37888   2.33962
10     2.41397   2.37715   2.34731   2.3226    2.28405   2.24351
11     2.34157   2.304     2.2735    2.24823   2.20873   2.16709
12     2.28278   2.24457   2.21352   2.18776   2.14744   2.10485
13     2.2341    2.19535   2.16382   2.13763   2.09659   2.05316
14     2.19313   2.1539    2.12195   2.0954    2.05371   2.00953
15     2.15818   2.11853   2.08621   2.05932   2.01707   1.97222
16     2.128     2.08798   2.05533   2.02815   1.98539   1.93992
17     2.10169   2.06134   2.02839   2.00094   1.95772   1.91169
18     2.07854   2.03789   2.00467   1.97698   1.93334   1.88681
19     2.05802   2.0171    1.98364   1.95573   1.9117    1.86471
20     2.0397    1.99853   1.96485   1.93674   1.89236   1.84494
21     2.02325   1.98186   1.94797   1.91967   1.87497   1.82715
22     2.0084    1.9668    1.93273   1.90425   1.85925   1.81106
23     1.99492   1.95312   1.91888   1.89025   1.84497   1.79643
24     1.98263   1.94066   1.90625   1.87748   1.83194   1.78308
25     1.97138   1.92925   1.89469   1.86578   1.82      1.77083
26     1.96104   1.91876   1.88407   1.85503   1.80902   1.75957
27     1.95151   1.90909   1.87427   1.84511   1.79889   1.74917
28     1.9427    1.90014   1.8652    1.83593   1.78951   1.73954
29     1.93452   1.89184   1.85679   1.82741   1.78081   1.7306
30     1.92692   1.88412   1.84896   1.81949   1.7727    1.72227
40     1.87252   1.82886   1.7929    1.76269   1.71456   1.66241
60     1.81939   1.77483   1.73802   1.70701   1.65743   1.60337
120    1.76748   1.72196   1.68425   1.65238   1.6012    1.545
inf    1.71672   1.6702    1.63152   1.59872   1.54578   1.48714

df     20        30        40        60        120       inf
1      61.74029  62.26497  62.52905  62.79428  63.06064  63.32812
2      9.44131   9.45793   9.46624   9.47456   9.48289   9.49122
3      5.18448   5.16811   5.15972   5.15119   5.14251   5.1337
4      3.84434   3.81742   3.80361   3.78957   3.77527   3.76073
5      3.20665   3.17408   3.15732   3.14023   3.12279   3.105
6      2.83634   2.79996   2.78117   2.76195   2.74229   2.72216
7      2.59473   2.55546   2.5351    2.51422   2.49279   2.47079
8      2.42464   2.38302   2.36136   2.3391    2.31618   2.29257
9      2.29832   2.25472   2.23196   2.20849   2.18427   2.15923
10     2.20074   2.15543   2.13169   2.10716   2.08176   2.05542
11     2.12305   2.07621   2.05161   2.02612   1.99965   1.97211
12     2.05968   2.01149   1.9861    1.95973   1.93228   1.90361
13     2.00698   1.95757   1.93147   1.90429   1.87591   1.8462
14     1.96245   1.91193   1.88516   1.85723   1.828     1.79728
15     1.92431   1.87277   1.84539   1.81676   1.78672   1.75505
16     1.89127   1.83879   1.81084   1.78156   1.75075   1.71817
17     1.86236   1.80901   1.78053   1.75063   1.71909   1.68564
18     1.83685   1.78269   1.75371   1.72322   1.69099   1.65671
19     1.81416   1.75924   1.72979   1.69876   1.66587   1.63077
20     1.79384   1.73822   1.70833   1.67678   1.64326   1.60738
21     1.77555   1.71927   1.68896   1.65691   1.62278   1.58615
22     1.75899   1.70208   1.67138   1.63885   1.60415   1.56678
23     1.74392   1.68643   1.65535   1.62237   1.58711   1.54903
24     1.73015   1.6721    1.64067   1.60726   1.57146   1.5327
25     1.71752   1.65895   1.62718   1.59335   1.55703   1.5176
26     1.70589   1.64682   1.61472   1.5805    1.54368   1.5036
27     1.69514   1.6356    1.6032    1.56859   1.53129   1.49057
28     1.68519   1.62519   1.5925    1.55753   1.51976   1.47841
29     1.67593   1.61551   1.58253   1.54721   1.50899   1.46704
30     1.66731   1.60648   1.57323   1.53757   1.49891   1.45636
40     1.60515   1.54108   1.50562   1.46716   1.42476   1.37691
60     1.54349   1.47554   1.43734   1.3952    1.34757   1.29146
120    1.48207   1.40938   1.3676    1.32034   1.26457   1.19256
inf    1.4206    1.34187   1.29513   1.23995   1.1686    1

χ2 distribution table
(rows: degrees of freedom; columns: probability of exceeding the tabulated value)

df     0.995      0.99       0.975      0.95       0.9        0.75       0.5
1      0.00004    0.00016    0.00098    0.00393    0.01579    0.10153    0.45494
2      0.01003    0.0201     0.05064    0.10259    0.21072    0.57536    1.38629
3      0.07172    0.11483    0.2158     0.35185    0.58437    1.21253    2.36597
4      0.20699    0.29711    0.48442    0.71072    1.06362    1.92256    3.35669
5      0.41174    0.5543     0.83121    1.14548    1.61031    2.6746     4.35146
6      0.67573    0.87209    1.23734    1.63538    2.20413    3.4546     5.34812
7      0.98926    1.23904    1.68987    2.16735    2.83311    4.25485    6.34581
8      1.34441    1.6465     2.17973    2.73264    3.48954    5.07064    7.34412
9      1.73493    2.0879     2.70039    3.32511    4.16816    5.89883    8.34283
10     2.15586    2.55821    3.24697    3.9403     4.86518    6.7372     9.34182
11     2.60322    3.05348    3.81575    4.57481    5.57778    7.58414    10.341
12     3.07382    3.57057    4.40379    5.22603    6.3038     8.43842    11.34032
13     3.56503    4.10692    5.00875    5.89186    7.0415     9.29907    12.33976
14     4.07467    4.66043    5.62873    6.57063    7.78953    10.16531   13.33927
15     4.60092    5.22935    6.26214    7.26094    8.54676    11.03654   14.33886
16     5.14221    5.81221    6.90766    7.96165    9.31224    11.91222   15.3385
17     5.69722    6.40776    7.56419    8.67176    10.08519   12.79193   16.33818
18     6.2648     7.01491    8.23075    9.39046    10.86494   13.67529   17.3379
19     6.84397    7.63273    8.90652    10.11701   11.65091   14.562     18.33765
20     7.43384    8.2604     9.59078    10.85081   12.44261   15.45177   19.33743
21     8.03365    8.8972     10.2829    11.59131   13.2396    16.34438   20.33723
22     8.64272    9.54249    10.98232   12.33801   14.04149   17.23962   21.33704
23     9.26042    10.19572   11.68855   13.09051   14.84796   18.1373    22.33688
24     9.88623    10.85636   12.40115   13.84843   15.65868   19.03725   23.33673
25     10.51965   11.52398   13.11972   14.61141   16.47341   19.93934   24.33659
26     11.16024   12.19815   13.8439    15.37916   17.29188   20.84343   25.33646
27     11.80759   12.8785    14.57338   16.1514    18.1139    21.7494    26.33634
28     12.46134   13.56471   15.30786   16.92788   18.93924   22.65716   27.33623
29     13.12115   14.25645   16.04707   17.70837   19.76774   23.56659   28.33613
30     13.78672   14.95346   16.79077   18.49266   20.59923   24.47761   29.33603

df     0.25       0.1        0.05       0.025      0.01       0.005
1      1.3233     2.70554    3.84146    5.02389    6.6349     7.87944
2      2.77259    4.60517    5.99146    7.37776    9.21034    10.59663
3      4.10834    6.25139    7.81473    9.3484     11.34487   12.83816
4      5.38527    7.77944    9.48773    11.14329   13.2767    14.86026
5      6.62568    9.23636    11.0705    12.8325    15.08627   16.7496
6      7.8408     10.64464   12.59159   14.44938   16.81189   18.54758
7      9.03715    12.01704   14.06714   16.01276   18.47531   20.27774
8      10.21885   13.36157   15.50731   17.53455   20.09024   21.95495
9      11.38875   14.68366   16.91898   19.02277   21.66599   23.58935
10     12.54886   15.98718   18.30704   20.48318   23.20925   25.18818
11     13.70069   17.27501   19.67514   21.92005   24.72497   26.75685
12     14.8454    18.54935   21.02607   23.33666   26.21697   28.29952
13     15.98391   19.81193   22.36203   24.7356    27.68825   29.81947
14     17.11693   21.06414   23.68479   26.11895   29.14124   31.31935
15     18.24509   22.30713   24.99579   27.48839   30.57791   32.80132
16     19.36886   23.54183   26.29623   28.84535   31.99993   34.26719
17     20.48868   24.76904   27.58711   30.19101   33.40866   35.71847
18     21.60489   25.98942   28.8693    31.52638   34.80531   37.15645
19     22.71781   27.20357   30.14353   32.85233   36.19087   38.58226
20     23.82769   28.41198   31.41043   34.16961   37.56623   39.99685
21     24.93478   29.61509   32.67057   35.47888   38.93217   41.40106
22     26.03927   30.81328   33.92444   36.78071   40.28936   42.79565
23     27.14134   32.0069    35.17246   38.07563   41.6384    44.18128
24     28.24115   33.19624   36.41503   39.36408   42.97982   45.55851
25     29.33885   34.38159   37.65248   40.64647   44.3141    46.92789
26     30.43457   35.56317   38.88514   41.92317   45.64168   48.28988
27     31.52841   36.74122   40.11327   43.19451   46.96294   49.64492
28     32.62049   37.91592   41.33714   44.46079   48.27824   50.99338
29     33.71091   39.08747   42.55697   45.72229   49.58788   52.33562
30     34.79974   40.25602   43.77297   46.97924   50.89218   53.67196


List of Figures

List of Figures Equation 1. Basic structural equation, i.e., the skeleton . . . . . . . . . . . . . . . .10 Equation 2. Simple, linear form structural equation for working with a cross-section data set, with i representing a specific observation . . . . .15 Equation 3. Simple, semi-log form structural equation for working with a cross-section data set, with i representing a specific observation – log-linear form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15 Equation 4. Simple, semi-log form structural equation for working with a cross-section data set, with i representing a specific observation – linear-log form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .15 Equation 5. Simple, full-log form structural equation for working with a cross-section data set, with i representing a specific observation . . . . .16 Equation 6. Simple, linear form structural equation for working with a time-series data set, with t representing a specific year . . . . . . . . . . . .16 Equation 7. Simple, linear form structural equation for working with a panel data set, with i representing cross-section elements, i.e., host countries, and t representing time-series elements, i.e., a specific year. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .17 Equation 8. Dummy variable creation: Sale price example, original equation (no dummy variable) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .28 Equation 9. Dummy variable creation: Sale price example, original equation (with a dummy variable) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29 Equation 10. Dummy variable creation: Sale price example, original equation (with two dummy variables) . . . . . . . . . . . . . . . . . . . . . . . . . . .29 Equation 11. Simple averaging method . . . . . . . . . . . . . . . . . . . . . . . . . . . .31 Equation 12. Model estimation with forward stepwise method example – initial structural, restricted equation . . . . . . . . . . . . . . . . . . . .34 Equation 13. Model estimation with forward stepwise method example – initial structural, restricted model . . . . . . . . . . . . . . . . . . . . . .35 105

Equation 14. Model estimation with forward stepwise method example – auxiliary structural equation . . . 35
Equation 15. Model estimation with forward stepwise method example – auxiliary structural model . . . 35
Equation 16. Lagrange Multiplier formula . . . 36
Equation 17. Model estimation with forward stepwise method example – initial structural, unrestricted model . . . 37
Equation 18. Structural equation with an AR(p) term . . . 42
Equation 19. Structural equation with AR(p) terms 1 through 3 . . . 42
Equation 20. Structural equation with lagged dependent variable as an additional explanatory variable . . . 43
Equation 21. Adjustment of the nth coefficient with r lagged dependent variables used as independent factors . . . 43
Equation 22. Adjustment of the 1st coefficient with one lagged dependent variable used as an independent factor . . . 43
Equation 23. Linear structural equation of the model used in the Model’s Results Interpretation chapter . . . 47
Equation 24. Estimated version of the linear structural equation of the model used in the Model’s Results Interpretation chapter . . . 48
Equation 25. Example of the t-test . . . 51
Equation 26. Joint significance test – structural model, restricted . . . 51
Equation 27. Joint significance test – structural model, unrestricted . . . 52
Equation 28. F-test formula with Error Sum of Squares . . . 52
Equation 29. R2 of the unrestricted model as a function of its Error Sum of Squares and Total Sum of Squares . . . 52
Equation 30. R2 of the restricted model as a function of its Error Sum of Squares and Total Sum of Squares . . . 53
Equation 31. F-test formula with R-squared . . . 53
Equation 32. R-squared formula . . . 54
Equation 33. Adjusted R-squared formula . . . 54
Equation 34. Total Sum of Squares . . . 55
Equation 35. Error (Residual) sum of squares . . . 55
Equation 36. Total sum of squares . . . 55
Equation 37. Regression sum of squares . . . 55

Equation 38. Structural equation for U.S. imports as the dependent variable . . . 67
Equation 39. Restricted structural equation . . . 73
Equation 40. Restricted structural model . . . 73
Equation 41. Structural auxiliary equation . . . 75
Equation 42. Unrestricted structural model . . . 76

Figure 1. Division of the original data set into Estimation Period, Ex post and Ex ante sections . . . 58

Graph 1. U.S. gross domestic product (left-hand axis in billion USD) . . . 20
Graph 2. U.S. gross domestic product (left-hand axis in billion USD) with a linear trendline . . . 21
Graph 3. A graphical representation of U.S. GDP after it has been transformed into a stationary variable via first-order differencing; D(GDP) . . . 23
Graph 4. Graph of residuals of a model with U.S. imports (IM) as the dependent variable . . . 40
Graph 5. Graph of actual values (Actual) and the fitted model (Fitted), both on the left-hand axis, and the resulting residuals (Residuals) on the right-hand axis . . . 56
Graph 6. A plot of the original U.S. imports data (IM) versus the forecast (IMF) and the upper (UP) and lower (DOWN) limits . . . 61
Graph 7. Actual, fitted data and residuals of the final model . . . 80
Graph 8. Ex post forecast of the final model . . . 81

Table 1. An example of panel data with averages per firm and per year listed in the last row and the last column respectively . . . 14
Table 2. Variables Info Table . . . 20
Table 3. An example of a correlogram of data with a unit root present . . . 22
Table 4. Output of the Augmented Dickey-Fuller test . . . 22
Table 5. A correlogram of U.S. GDP after it has been transformed into a stationary variable . . . 23
Table 6. The Augmented Dickey-Fuller test output testing the 1st difference of U.S. GDP for stationarity (only relevant information included) . . . 23

Table 7. A correlation matrix for the number of U.S. FDI firms and the GDP in two regions in Poland . . . 24
Table 8. Descriptive statistics of U.S. imports, U.S. exports and a dummy variable for recession . . . 26
Table 9. A summary of information for the U.S. GDP variable . . . 27
Table 10. Dummy variable creation: European Union membership example . . . 27
Table 11. Dummy variable creation: Sale price example, original data set . . . 28
Table 12. Dummy variable creation: Sale price example, transformed data set . . . 28
Table 13. Dummy variable creation: Sale price example, transformed, version 2, data set . . . 29
Table 14. Supplementing the missing data example, original data set . . . 31
Table 15. A section of the Chi-square table with error levels in the first row and degrees of freedom in the first column . . . 36
Table 16. An example of a correlogram output for the U.S. imports model . . . 41
Table 17. An example of the Breusch-Godfrey Serial Correlation LM test output for the U.S. imports model . . . 42
Table 18. An example of a heteroscedasticity LM White test for the U.S. imports model . . . 44
Table 19. Coefficient estimation output from software after estimating the U.S. imports . . . 48
Table 20. Summary of the coefficient testing procedure for one-tail tests . . . 50
Table 21. Summary of the coefficient testing procedure for two-tail tests . . . 51
Table 22. Model’s statistics output from the software after estimating the U.S. imports by regressing them on the constant term (C), disposable income (YD), U.S. population (POP), wealth (W), U.S. GDP and U.S. exports . . . 53
Table 23. Correlogram of U.S. imports variable after it has been differenced . . . 59
Table 24. ARIMA (1, 1, 2) model output for U.S. imports variable. Note that the dependent variable is not IM but d(IM) – the first difference of the original variable . . . 60
Table 25. Potential independent variables . . . 68
Table 26. Hypothesis statements for all independent variables . . . 70

Table 27. Correlation Matrix for all variables . . . 71
Table 28. Hypothesis statements for the Unit Root tests . . . 72
Table 29. Results of the Augmented Dickey-Fuller test for the presence of a Unit Root . . . 72
Table 30. Values of the restricted model’s parameters . . . 74
Table 31. Values of the restricted model’s statistics . . . 74
Table 32. Results of the Breusch-Godfrey Serial Correlation Lagrange Multiplier test . . . 74
Table 33. Results of the auxiliary regression (1) . . . 75
Table 34. Stationarity test for CHG.INV and RECES variables . . . 76
Table 35. Values of the unrestricted model’s parameters . . . 77
Table 36. Values of the unrestricted model’s statistics . . . 77
Table 37. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (1) . . . 78
Table 38. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (2) . . . 78
Table 39. Breusch-Godfrey Serial Correlation Lagrange Multiplier test for the final model (3) . . . 78
Table 40. Values of the corrected unrestricted model’s parameters . . . 78
Table 41. Values of the corrected unrestricted model’s statistics . . . 79
Table 42. White heteroscedasticity test for the final model . . . 79
Table 43. Descriptive Statistics . . . 82


