Monte Carlo Methods using Matlab

Roberto Casarin
University Ca' Foscari, Venice
Summer School of Bayesian Econometrics, Perugia 2013

September 9, 2013

Contents

1 A Matlab Primer
  1.1 Programming Languages
  1.2 Fourth Generation Languages (4GPL)
  1.3 Matlab
      1.3.1 Operators
      1.3.2 Logical Operators
      1.3.3 Creating Matrices
      1.3.4 Matrix Description
      1.3.5 Other Functions
      1.3.6 Loops and If Statements
      1.3.7 Procedures
  1.4 Examples
      1.4.1 Input, Output and Graphics
      1.4.2 Ordinary Least Square
      1.4.3 A Bayesian Linear Regression Model
  1.5 From Matlab to Scilab and R

2 Monte Carlo Integration
  2.1 Integration
  2.2 A Monte Carlo Estimator
  2.3 Asymptotic Properties
  2.4 Optimal Number of MC Samples
  2.5 Appendix - Matlab Code

3 Importance Sampling
  3.1 Importance Sampling
  3.2 Properties of the IS Estimators
  3.3 Generating Student-t Variables


Chapter 1: A Matlab Primer

Aim: Learn some basic facts about Matlab programming.

Contents:
1. Programming Languages
2. Fourth Generation Languages (4GPL)
3. Matlab
4. Examples
5. From Matlab to Scilab and R

1.1 Programming Languages

If you need to carry out an econometric analysis, before starting to write code you may want to have a look at the following link: http://www.feweb.vu.nl/econometriclinks/software.html, where many of the most widely used econometric software packages and their contributed libraries are linked.

In the following we give a brief description of the software packages listed at the econometriclinks webpage maintained by the Royal Economic Society:

• A+, ACML, ADMB, AIMMS, ALOGIT, Alyuda, AMOS, AMPL, APL, Apophenia, Arc, AREMOS, AutoBox, Autometrics, AutoSignal

• B34S, BACC, BATS, BETA, BIOGEME, BMDP, Brodgar, BUGS, BV4

BACC: Bayesian Analysis, Computation and Communication. Free, high-quality generic software developed for different operating systems (Windows, Unix) and different front-ends, with specific model procedures as well. Supported by the US NSF. Developed by Bill McCausland under the supervision of John Geweke.

BUGS: Bayesian inference Using Gibbs Sampling (MCMC: Markov Chain Monte Carlo).

• C(++), CART, Census X12, Caterpillar-SSA, CPLEX, ConfortS, CVar

• DataDesk, Dataplore, Dataplot, DATAVIEW, DEA-Solver, DEMETRA, Draco, DYALOG, DYNARE

DYNARE: a program for the resolution and simulation of dynamic models with forward variables through the use of a relaxation algorithm. Computes k-th order approximations of dynamic stochastic general equilibrium (DSGE) models. Also allows Bayesian estimation of DSGEs.

• EasyFit, EasyReg, EcoWin, ECTS, EQS, Eviews, Excel, EXPO

• FAME, ForecastPro, Fortran, FreeFore, FSQP

• GAMS, GARCH, GAUSS, GAUSSX, GiveWin, Gempack, GeoDa, Genstat, GLIM, GLIMMIX, GQOPT, graphpad, Gnuplot, GSL, GRETL

GAMS: Generic Algebraic Modeling System, for large-scale optimization problems.

GAUSS: a programming language designed to operate with and on matrices. It is a general-purpose tool and, as such, a long way from more specialised econometric packages. On a spectrum which runs from the computer language C at one end to, say, the menu-driven econometric program EViews at the other, GAUSS is very much at the programming end.

GRETL: a cross-platform software package for econometric analysis, written in the C programming language. It is free, open-source software.

• HLM

• ICRFS-Plus, ILOG, IDAMS, IMSL, INSTAT, ITSM

• J, JMP, JMulti, JStatCom, JWAVE

• KNITRO

• MacAnova, Maple, Mendeley, MARS, Mathcad, Mathematica, MathPlayer, MathML, MathType, MATLAB, Matrixer, M@ximize, MetrixND, MHTS, Microfit, MiKTeX, Minitab, MINOS, MIXOR, MLE, MLwiN, Modeleasy, ModelQED, Modler, MOSEK, Mplus, Modula, MuPAD, Mx

MATLAB: a high-level language and, more specifically, a 4GPL (such as SAS, SPSS, Stata, GAUSS) which allows matrix manipulation for numerical computing.

• NAG Mark 22 Numerical Libraries (2009), Genstat, MLP (ML estimation)

• Octave, O-Matrix, Omegahat, OpenDX, Ox, OxEdit, OxGauss, OxMetrics

Octave: a high-level language, primarily intended for numerical computations. It provides a convenient command line interface for solving linear and nonlinear problems numerically, and for performing other numerical experiments, using a language that is mostly compatible with Matlab. It may also be used as a batch-oriented language.

Ox: an object-oriented matrix programming language for statistics and econometrics, developed by Jurgen Doornik.

• PASS, PASW, PcFiml, PcGets, PcGive, PcNaive, Python

Python: a free, open-source, dynamic, object-oriented programming language that can be used for many kinds of software development. It offers strong support for integration with other languages and tools and comes with extensive standard libraries.

• R, RATS, REG-X, ReSampling Stats, Rlab, Rlab+

R: a 4GPL; it is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.

RATS: developed by Estima, RATS (Regression Analysis of Time Series) is an econometrics and time-series analysis software package.

• S+, SAS, SCA, Scilab, SciPy, SciViews, Sciword, SCP, Shazam, Sigmaplot, SIMSTAT, SOLAS, SOL, Soritec, SpaceStat, SQlite, SPAD, Speakeasy, IBM SPSS, SsfPack, STAMP, Stata, StatCrunch, Statgraphics, Statistica, Stat/Transfer, StatsDirect, STL, Statview, SUDAAN, SVAR, SYSTAT

SAS: a 4GPL which allows the user to define a sequence of operations (statistical analysis and data management) to be performed on data.

Scilab: a free and open-source 4GPL for numerical computation, similar to Matlab.

• TSM, TISEAN, TRAMO/SEATS, TSP, TVAR

TRAMO/SEATS: programs for time-series regression with ARIMA noise and for ARIMA-model-based signal extraction (seasonal adjustment), developed at the Bank of Spain.

• UNISTAT, VassarStats, ViSta


• Web Decomp, WebStat, WEKA, WinIDAMS, WINKS, Windows KWIKSTAT, XploRe, Winsolve, X-12-ARIMA, XLisp-Stat, Xtremes, X(G)PL

1.2 Fourth Generation Languages (4GPL)

Each step in the development of computer languages has aimed to reduce the amount of time and the amount of skill required to write programs. In a 1GPL, programs are written in binary code and access binary digits; writing 1GPL programs is a very skilled job, and testing and debugging them is very time consuming. In a 2GPL, programs are written in symbolic assembly code; they access bytes and are slightly less time demanding. In a 3GPL, programs are written in a high-level language (e.g. COBOL, Pascal, C, Fortran); they can access records, and programming requires less time and skill. In a 4GPL, programs perform Boolean operations on mathematical sets and require still less time and skill. A well known example of a 4GPL is SQL.

Scilab, Matlab, Gauss and R, see
http://www.scilab.org/
http://www.mathworks.it/
http://www.aptech.com/
http://www.r-project.org/
are 4GPLs and have some common features. They are a long way from more specialised econometric packages, are not menu-driven programs (such as EViews), and are very much at the programming end of the spectrum. Thus all of them require a certain degree of familiarity with programming methods and structures.

Another common feature is that they are extremely powerful for matrix manipulation, and in this sense they are more useful for economists than 3GPL languages (such as C or Fortran), where the basic data units are all scalars. At the same time they are very flexible and allow more expert users to interface with procedures written in other languages such as C, C++, or Fortran. An important feature of Scilab and R is that the source code of their libraries is available, which is not generally the case for Matlab and Gauss. Finally, note that Matlab, Gauss and R have a lot of proprietary and contributed libraries oriented to statistics and econometrics.

1.3 Matlab

1.3.1 Operators

• Select a submatrix from a matrix: x( startrow : endrow, startcolumn : endcolumn )

• Transposition operator: '

• Matrix operators: + - * / \ ^

• Element-by-element operators: .* ./ .\ .^

• Concatenating operators: [leftmatrix, rightmatrix] and [uppermatrix; bottommatrix]

• Relational operators: < > <= >= == ~=
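A minimal sketch (not from the original notes) illustrating these operators at the Matlab prompt; the matrix A and its values are chosen arbitrarily:

% Build a matrix by concatenation and manipulate it
A = [1 2 3; 4 5 6];      % 2x3 matrix
B = A';                  % transposition: B is 3x2
C = A(1:2, 2:3);         % submatrix: rows 1-2, columns 2-3
D = A .* A;              % element-by-element product (squares each entry)
E = [A; 7 8 9];          % vertical concatenation: 3x3 matrix
mask = (A >= 3);         % relational operator, returns a logical matrix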

1.4 Examples

1.4.1 Input, Output and Graphics

%*************************************************
% basic in I/O, graphical, statistical procedures
%*************************************************
% Load UK/EU exchange rate data
clc; clear all;
yy = load('C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/pound.txt');
%*************************************************
n = size(yy,1);    % evaluate the number of rows
xx = (1:n)';
%*************************************************
% for ... end, if ... end
% (1) Evaluate sequentially the variance
% (2) Build a dummy variable, based on the value
%     of the variance estimated recursively
%*************************************************
wn = 10;           % set the value of a variable
s = zeros(n,1);    % define an n-dim null vector
d = zeros(n,1);
for j=(wn+1):n;
    s(j,1) = var(yy((j-wn+1):j,1));
    if (s(j,1)>0.45); d(j,1)=1;
    end;
end;
%*************************************************
% Some Pictures...
%*************************************************
% figure(1) to have distinct graphs
figure(1);
plot(xx,yy);
title('Time series data'); ylabel('Data'); xlabel('Time');
figure(2);
plot(xx,s,'color',[1 0 0]);   % [red green blue], the rgb convention
title('Time-varying log-volatility');
axis([1 n min(s) max(s)]);    % set tics
figure(3);
plot(xx,d,'color',[1 0 0]);
title('Dummy');
axis([1 n -0.1 1.1]);         % set tics
%*************************************************
% All charts in one picture...
%*************************************************
figure(4);
subplot(3,1,1);
plot(xx,yy);
title('Time series data'); ylabel('Data'); xlabel('Time');
subplot(3,1,2);
plot(xx,s,'color',[1 0 0]);
title('Time-varying log-volatility');
axis([1 n min(s) max(s)]);
subplot(3,1,3);
plot(xx,d,'color',[1 0 0]);
title('Dummy');
axis([1 n -0.1 1.1]);
%*************************************************
% histogram
%*************************************************
figure(5);
hist(yy,50);
%*************************************************
% Save the results in an output file
%*************************************************
fid = fopen(['C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/' ...
    'TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/OutPound.txt'], 'w');
fprintf(fid, '%5.2f\n', yy);
fclose(fid);


%*************************************************

1.4.2 Ordinary Least Square

We learn how to use structures in Matlab.

function results=ols(y,x)
% PURPOSE: least-squares regression
%---------------------------------------------------
% USAGE: results = ols(y,x)
%  where: y = dependent variable vector    (nobs x 1)
%         x = independent variables matrix (nobs x nvar)
%---------------------------------------------------
% RETURNS: a structure
%   results.meth  = 'ols'
%   results.beta  = bhat
%   results.tstat = t-stats
%   results.yhat  = yhat
%   results.resid = residuals
%   results.sige  = e'*e/(n-k)
%   results.rsqr  = rsquared
%   results.rbar  = rbar-squared
%   results.dw    = Durbin-Watson Statistic
%   results.nobs  = nobs
%   results.nvar  = nvars
%   results.y     = y data vector

Check for the correct number of input arguments and that the number of rows of x equals the number of rows of y:

if (nargin ~= 2); error('Wrong # of arguments to ols');
else
  [nobs nvar]  = size(x);
  [nobs2 ndep] = size(y);
  if (nobs ~= nobs2); error('x and y must have same # obs in ols'); end;
end;
k = nvar;

Evaluate all the statistics that are usually involved in an OLS estimation:

results.y    = y;
results.nobs = nobs;
results.nvar = nvar;
xpxi = (x'*x)\eye(k);
results.beta  = xpxi*(x'*y);
results.yhat  = x*results.beta;
results.resid = y - results.yhat;
sigu = results.resid'*results.resid;
results.sige  = sigu/(nobs-nvar);
tmp = (results.sige)*(diag(xpxi));
results.tstat = results.beta./(sqrt(tmp));


ym = y - mean(y);
rsqr1 = sigu;
rsqr2 = ym'*ym;
results.rsqr = 1.0 - rsqr1/rsqr2;      % r-squared
rsqr1 = rsqr1/(nobs-nvar);
rsqr2 = rsqr2/(nobs-1.0);
results.rbar = 1 - (rsqr1/rsqr2);      % rbar-squared
ediff = results.resid(2:nobs) - results.resid(1:nobs-1);
results.dw = (ediff'*ediff)/sigu;      % durbin-watson
end;

We save the code as the function ols.m and run the following simulation example:

nob = 100;
x1 = ones(nob,1);
x2 = randn(nob,1).*((1:nob)'/10);
x = [x1 x2];
sig = 2;
y = x*[10; 0.9] + sig*randn(nob,1);
res = ols(y,x);
res.beta
%%
figure(1)
plot([res.yhat y]);
figure(2)
plot(res.resid);

1.4.3 A Bayesian Linear Regression Model

Let $y \in \mathbb{R}^n$, $X \in \mathbb{R}^n \times \mathbb{R}^k$ and $\beta \in \mathbb{R}^k$. Consider the simple regression model
$$y = X\beta + \varepsilon \tag{1.1}$$
$$\varepsilon \sim \mathcal{N}_n(0_n, \sigma^2 I_n) \tag{1.2}$$
with the following prior specification
$$R\beta \sim \mathcal{N}(r, T) \tag{1.3}$$
or equivalently
$$Q\beta \sim \mathcal{N}(q, I_k) \tag{1.4}$$
where $Q'Q = T^{-1}$ and $q = Qr$.


The joint posterior distribution of $\beta$ and $\sigma^2$ is
$$\pi(\beta, \sigma^2 \mid y) \propto \left(\frac{1}{\sigma^2}\right)^{\frac{n+1}{2}} \exp\left\{-\frac{1}{2}(\beta - \bar{\beta})'\bar{V}^{-1}(\beta - \bar{\beta})\right\} \tag{1.5}$$
where $\bar{V} = \sigma^2\left((X'X) + \sigma^2 Q'Q\right)^{-1}$ and $\bar{\beta} = (\bar{V}/\sigma^2)(X'y + \sigma^2 Q'q)$.

Theil and Goldberger (1961) observed that the conditional posterior distribution of $\beta$ given $\sigma^2$ and $y$ is
$$\pi(\beta \mid \sigma^2, y) \propto \mathcal{N}(\bar{\beta}, \bar{V}) \tag{1.6}$$
and proposed to replace the unknown quantity $\sigma^2$ with the estimated value $\hat{\sigma}^2 = (y - X\hat{\beta})'(y - X\hat{\beta})/(n - k)$.

In a Gibbs sampling framework it is possible to simulate from the posterior by iteratively simulating (Gelfand and Smith (1990) gave a proof of the convergence to the true posterior distribution) from the posterior conditional distribution of $\beta$ given $\sigma^2$ and $y$,
$$\pi(\beta \mid \sigma^2, y) \propto \mathcal{N}(\bar{\beta}, \bar{V}) \tag{1.7}$$
and from the posterior conditional distribution of $\sigma^2$ given $\beta$ and $y$,
$$\pi(\sigma^2 \mid \beta, y) \propto \mathcal{IG}\left(\frac{n}{2}, \frac{1}{2}(y - X\beta)'(y - X\beta)\right) \tag{1.8}$$
or equivalently
$$\pi\left((y - X\beta)'(y - X\beta)\,\sigma^{-2} \mid \beta, y\right) \propto \chi^2_n \tag{1.9}$$

% Simulate a dataset
n = 100;
k = 3;
x = randn(n,k);
b = ones(k,1);
y = x*b + randn(n,1);
% Set prior



r = [1.0 1.0 1.0]';
R = eye(k);
T = eye(k);
Q = chol(inv(T));
q = Q*r;

% Initial values for the Gibbs sampler
b0 = (x'*x)\(x'*y);
sige = (y-x*b0)'*(y-x*b0)/(n-k);
xpx = x'*x; xpy = x'*y;
qpq = Q'*Q; qpv = Q'*q;
ndraw = 1100; nomit = 100;
bsave = zeros(ndraw,k);
ssave = zeros(ndraw,1);
for i=1:ndraw;
    xpxi = inv(xpx + sige*qpq);
    % update b
    b = xpxi*(xpy + sige*qpv);
    b = chol((sige*xpxi))*randn(k,1) + b;
    bsave(i,:) = b';
    % update sige
    e = y - x*b;
    ssr = e'*e;
    chi = random('gamma',n/2,2);
    sige = ssr/chi;
    ssave(i,1) = sige;
end;
% Evaluate statistics
bhat = mean(bsave(nomit+1:ndraw,:));
bstd = std(bsave(nomit+1:ndraw,:));
tstat = bhat./bstd;
sighat = mean(ssave(nomit+1:ndraw,1));
tout = 1-cdf('t',abs(tstat'),n);
% Display Gibbs results
disp('Gibbs sampling estimates')
disp(['Coefficient   ','t-statistic   ','t-probability']);
disp([bhat' tstat' tout]);
% Calculate and display Theil-Goldberger results
tgsige = (y - x*b0)'*(y - x*b0)/(n-k);
tgbhat = inv(xpx + tgsige*qpq)*(xpy + tgsige*qpv);   % Theil-Goldberger estimates
tgbstd = diag(chol(tgsige*(xpx + tgsige*qpq)));
tgtstat = tgbhat./tgbstd;
disp('Theil-Goldberger estimates')
disp(tgbhat);

A typical result from this code is

Gibbs sampling estimates
Coefficient   t-statistic   t-probability
  1.0045        10.2286         0
  0.9610        11.2966         0
  0.9200        11.2740         0

Theil-Goldberger estimates
  1.0037
  0.9569
  0.9198

We now apply the inference procedure to a financial dataset. We consider monthly data on the short-term interest rate (the three-month Treasury Bill rate) and on the AAA corporate bond yield in the USA. As Treasury Bill notes and AAA bonds are low-risk securities, one could expect that there is a relationship between their interest rates. We consider data from January 1950 to December 1999. Let $y_i$ be the monthly change in the Treasury Bill rate and $z_i$ the monthly change in the AAA bond rate. We fit to this set of data the linear regression model presented above with $y_i = \beta_1 + \beta_2 z_i + \varepsilon_i$, which corresponds to setting $x_i = (1, z_i)'$ and $\beta = (\beta_1, \beta_2)'$ in the regression model given above. The results of the estimation procedure are

Gibbs sampling estimates
Coefficient   t-statistic   t-probability
  0.0053         0.7805        0.2177
  0.2751        19.8628        0

Theil-Goldberger estimates
  0.0057
  0.2747


[Figure 1.1: Actual and fitted data (top) and residuals (bottom) using the Bayesian estimates of the linear regression model.]

The estimates of $\sigma^2$ are 0.0283 for the Gibbs sampler and 0.0282 for the Theil-Goldberger procedure. The actual and fitted data and the residuals are given in Fig. 1.1. The plot of the residuals shows that in the second half of the sample (say, after 1975) the variance is underestimated. More precisely, one should account in the model for the time variation in the variance of the data. This calls for heteroscedastic linear regression models (see Chapter ??) or for nonlinear models such as stochastic volatility models (see Chapters ?? and ??).

References

• Gelfand, Alan E., and A.F.M. Smith. 1990. Sampling-Based Approaches to Calculating Marginal Densities, Journal of the American Statistical Association, Vol. 85, pp. 398-409.


• Theil, Henri, and Arthur S. Goldberger. 1961. On Pure and Mixed Statistical Estimation in Economics, International Economic Review, Vol. 2, pp. 65-78.

1.5 From Matlab to Scilab and R

We present here an example of the translation of a Matlab code into a Scilab and an R code. This exercise demonstrates the similarity between the three programming languages. In particular, we find that Scilab is, in terms of syntax, the most similar to Matlab.

Scilab

//*************************************************
// basic in I/O, graphical, statistical procedures
//*************************************************
// Load UK/EU exchange rate data
clc; clear all;
yy = fscanfMat('C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/...
TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/pound.txt');
//*************************************************
n = size(yy,1);   // evaluate the number of rows
xx = (1:n)';
//*************************************************
// for ... end, if ... end
// (1) Evaluate sequentially the variance
// (2) Build a dummy variable, based on the value
//     of the variance estimated recursively
//*************************************************
wn = 10;          // set the value of a variable
s = zeros(n,1);   // define an n-dim null vector
d = zeros(n,1);
for j=(wn+1):n;
    s(j,1) = variance(yy((j-wn+1):j,1));
    if (s(j,1)>0.45); d(j,1)=1;
    end;
end;


//*************************************************
// Some Pictures...
//*************************************************
// figure(1) to have distinct graphs
figure(1);
title("Time series data"); ylabel("Data"); xlabel("Time");
plot(xx,yy);
figure(2);
title("Time-varying log-volatility");
plot(xx,s,'color',[1 0 0]);   // [red green blue], the rgb convention
a = gca();
a.data_bounds = [1,min(s); n,max(s)];   // set tics
figure(3);
title("Dummy");
plot(xx,d,'color',[1 0 0]);
a = gca();
a.data_bounds = [1,-0.1; n,1.1];   // set tics
//*************************************************
// All charts in one picture...
//*************************************************
figure(4);
subplot(3,1,1);
title("Time series data"); ylabel("Data"); xlabel("Time");
plot(xx,yy);
subplot(3,1,2);
title("Time-varying log-volatility");
plot(xx,s,'color',[1 0 0]);
a = gca();
a.data_bounds = [1,min(s); n,max(s)];
subplot(3,1,3);
title("Dummy");
plot(xx,d,'color',[1 0 0]);
a = gca();
a.data_bounds = [1,-0.1; n,1.1];
//*************************************************
// histogram

//*************************************************
figure(5);
histplot(100,yy);
//*************************************************
// Save the results in an output file
//*************************************************
fprintfMat('C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/...
TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/OutPound.txt', yy, '%5.2f');
// attention: this overwrites the existing file

R

#*************************************************
# basic in I/O, graphical, statistical procedures
#*************************************************
# Load UK/EU exchange rate data
yy = scan("C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/...
TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/pound.txt",
          sep="\t", skip=0, na.strings=".")
dim(yy) = c(1006,1);
#*************************************************
n = dim(yy);   # evaluate the number of rows
n = n[1];
xx = (1:n);
#*************************************************
# for ... end, if ... end
# (1) Evaluate sequentially the variance
# (2) Build a dummy variable, based on the value
#     of the variance estimated recursively
#*************************************************
wn = 10;         # set the value of a variable
s = array(0,n);  # define an n-dim null vector
d = array(0,n);
for (j in ((wn+1):n)){
    s[j] = var(yy[(j-wn+1):j]);
    if (s[j]>0.45){
        d[j] = 1;
    }
}
#*************************************************
# Some Pictures...
#*************************************************
# figure(1) to have distinct graphs


dev.new();
plot(xx,yy,main="Time series data",xlab="Time",ylab="Data",type="l");
dev.new();
plot(xx,s,main="Time-varying log-volatility",xlab="Time",ylab="Data",type="l");
dev.new();
plot(xx,d,main="Dummy",xlab="Time",ylab="Data",type="l");
#*************************************************
# All charts in one picture...
#*************************************************
par(mfrow=c(3,1), pin=c(5,1.5));
plot(xx,yy,main="Time series data",xlab="Time",ylab="Data",type="l");
plot(xx,s,main="Time-varying log-volatility",xlab="Time",ylab="Data",type="l");
plot(xx,d,main="Dummy",xlab="Time",ylab="Data",type="l");
#*************************************************
# histogram
#*************************************************
dev.new();
hist(yy,50);
#*************************************************
# Save the results in an output file
#*************************************************
save(yy, file = "C:/Dottorato/Teaching/SummerSchoolBertinoro/TutorialAntonietta/...
TutorialRobAnt/AllLab/MatlabCode/ChapterMatlab/OutPound.txt");

Chapter 2: Monte Carlo Integration

Aim: Apply basic Monte Carlo principles to solve some basic integration problems. Discuss the choice of the number of samples in a Monte Carlo estimation.

Contents:
1. Integration
2. A Monte Carlo Estimator
3. Asymptotic Properties
4. Optimal Number of MC Samples
5. Appendix - Matlab Code

2.1 Integration

• Our aim is to approximate the integral
$$\mu(f) = \int_0^1 f(x)\,dx \tag{2.1}$$
for the following integrand functions f:

1. $f(x) = x$

2. $f(x) = x^2$

3. $f(x) = \cos(\pi x)$

• We apply a Monte Carlo approach and re-write the integration problem in statistical terms as follows:
$$\int_0^1 f(x)\,dx = \int_{-\infty}^{+\infty} f(x)\,\mathbb{I}_{[0,1]}(x)\,dx = \mathbb{E}(f(X)) \tag{2.2}$$
where $\mathbb{I}_A(x)$ is the indicator function, which equals 1 if $x \in A$ and 0 otherwise, and $X \sim \mathcal{U}_{[0,1]}$ is a random variable with a standard uniform distribution.

2.2 A Monte Carlo Estimator

• Let $X_1, \ldots, X_n$ be a set of n i.i.d. samples from a uniform distribution. The integral $\mu = \mathbb{E}(f(X))$ is approximated as follows:
$$\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^n f(X_i) \tag{2.3}$$
which is called a Monte Carlo estimator of $\mathbb{E}(f(X))$. The results of the Monte Carlo estimates for different sample sizes $n = 1, \ldots, 50$ and different integrand functions f are given in Fig. 2.1.

• Find the mean and the variance of the estimator and give a Monte Carlo approximation for the expression of the variance.
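As a quick illustration of (2.3), the following lines (a minimal sketch, not part of the original appendix code) compute the estimator for the three integrands with a fixed n:

% Monte Carlo estimates of the three integrals with n uniform draws
n = 10000;
x = rand(n,1);            % X_1,...,X_n ~ U[0,1]
mu1 = mean(x);            % estimates 1/2
mu2 = mean(x.^2);         % estimates 1/3
mu3 = mean(cos(pi*x));    % estimates 0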


[Figure 2.1: Monte Carlo estimates $\hat{\mu}_n$ (solid lines) for different sample sizes $n = 1, \ldots, 50$ and true values of $\mu$ (horizontal dotted lines), for $f(x) = x$, $f(x) = x^2$ and $f(x) = \cos(\pi x)$.]

Due to the i.i.d. assumption we have:
$$\mathbb{E}(\hat{\mu}_n) = \frac{1}{n}\sum_{i=1}^n \mathbb{E}(f(X_i)) = \mu \tag{2.4}$$
$$\mathbb{V}(\hat{\mu}_n) = \frac{1}{n^2}\sum_{i=1}^n \mathbb{V}(f(X_i)) = \frac{1}{n}\sigma^2(f) \tag{2.5}$$
where
$$\sigma^2(f) = \mathbb{V}(f(X_1)) = \int_{-\infty}^{+\infty} (f(x) - \mu)^2\,\mathbb{I}_{[0,1]}(x)\,dx$$

For the different f we find the analytical solution of the integral $\mu(f)$ (see also the horizontal dotted lines in Fig. 2.1):

1. For $f(x) = x$:
$$\mathbb{E}(f(X_1)) = \int_0^1 x\,dx = \left.\tfrac{1}{2}x^2\right|_0^1 = 1/2 \tag{2.6}$$

2. For $f(x) = x^2$:
$$\mathbb{E}(f(X_1)) = \int_0^1 x^2\,dx = \left.\tfrac{1}{3}x^3\right|_0^1 = 1/3 \tag{2.7}$$

3. For $f(x) = \cos(\pi x)$:
$$\mathbb{E}(f(X_1)) = \int_0^1 \cos(\pi x)\,dx = \left.\tfrac{1}{\pi}\sin(\pi x)\right|_0^1 = 0 \tag{2.8}$$

0 1 cos(πx)dx = sin(πx) = 0 π 1

Asymptotic Properties

Under the i.i.d. and finite variance assumptions we have a.s.

µ ˆn −→ µ

(2.9)

(2.10)

n→∞



D

n (ˆ µn − µ) −→ N (0, σ 2 (f ))

For the different f we have

n→∞


1. For $f(x) = x$:
$$\mathbb{V}(f(X_1)) = \mathbb{E}(f(X_1)^2) - (\mathbb{E}(f(X_1)))^2 = \int_0^1 x^2\,dx - \left(\int_0^1 x\,dx\right)^2 = 1/3 - 1/4 = 1/12$$

2. For $f(x) = x^2$: $\mathbb{V}(f(X_1)) = 1/5 - 1/9 = 4/45$

3. For $f(x) = \cos(\pi x)$: $\mathbb{V}(f(X_1)) = 1/2 - 0 = 1/2$

When the variance $\mathbb{V}(f(X_1))$ is unknown one can use the Monte Carlo estimator
$$\hat{\sigma}^2(f) = \frac{1}{n-1}\sum_{i=1}^n \left(f(X_i) - \hat{\mu}_n\right)^2 \tag{2.11}$$

The empirical approximations of the asymptotic variances are given in Fig. 2.2.

• Exercise: use the asymptotic distribution and the approximation of the asymptotic variance to find the confidence intervals, at the 5% significance level, of the MC estimator of µ.
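A minimal sketch of this exercise (my own illustration, not code from the notes), using the normal approximation $\hat{\mu}_n \pm z_{1-\alpha/2}\,\hat{\sigma}(f)/\sqrt{n}$:

% Asymptotic confidence interval for the MC estimate of E(f(X)), f(x)=x^2
n = 1000; alpha = 0.05;
x = rand(n,1);
fx = x.^2;
muhat = mean(fx);                 % MC estimate
sig2hat = var(fx);                % estimate of sigma^2(f)
z = norminv(1-alpha/2,0,1);       % standard normal quantile
ci = [muhat - z*sqrt(sig2hat/n), muhat + z*sqrt(sig2hat/n)]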

2.4 Optimal Number of MC Samples

• It is possible to use the asymptotic properties of a MC estimator to find the optimal number n of samples that are necessary to reach an accuracy level ε, for a given confidence level α, in the Monte Carlo estimation of µ.


[Figure 2.2: Monte Carlo variance estimates $\hat{\sigma}^2_n$ (solid lines) for different sample sizes $n = 1, \ldots, 50$ and the true value $\sigma^2$ (horizontal dotted lines), for $f(x) = x$, $f(x) = x^2$ and $f(x) = \cos(\pi x)$.]

The asymptotic results allow us to find n such that
$$\Pr\left(|\hat{\mu}_n - \mu| \leq \varepsilon\right) = 1 - \alpha \tag{2.12}$$
that is, since $(\hat{\mu}_n - \mu)\big/\sqrt{\sigma^2(f)/n}$ is approximately standard normal,
$$X_\alpha = \varepsilon\sqrt{\frac{n}{\sigma^2(f)}} \iff n = \left(\frac{X_\alpha}{\varepsilon}\right)^2 \sigma^2(f) \tag{2.13}$$


where $X_\alpha = \Phi^{-1}(1 - \alpha/2)$, with $\Phi^{-1}$ the inverse cumulative distribution function of a standard normal.

When the variance $\sigma^2(f)$ is unknown one can use the Monte Carlo estimator $\hat{\sigma}^2_n(f)$ and then apply a similar asymptotic argument. In this case the optimal number of simulations should satisfy the following relationship:
$$\hat{\sigma}^2_n(f) \leq \frac{n\varepsilon^2}{X_\alpha^2} \tag{2.14}$$

One can check the condition iteratively:

1. Start with $n_1$ MC samples $X_1, \ldots, X_{n_1}$.

2. If $\hat{\sigma}^2_n(f) \leq \frac{n\varepsilon^2}{X_\alpha^2}$ then stop; otherwise

3. evaluate $k_1 = \left\lfloor \hat{\sigma}^2_n(f)\,X_\alpha^2/\varepsilon^2 - n \right\rfloor$ and generate $k_1$ further samples $X_{n_1+1}, \ldots, X_{n_1+k_1}$ ($\lfloor x \rfloor$ indicates the integer part of x).

Exercise: write a Matlab code for computing the optimal number of samples that are needed to estimate $\mu(f)$ for the different integrand functions f given in Section 2.1 and for the accuracy level ε = 0.001.
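A minimal sketch of this iterative rule (my own illustration, not the authors' solution), for $f(x) = x^2$, $\varepsilon = 0.001$ and $\alpha = 0.05$:

% Iteratively grow the sample until sigma2hat <= n*epsi^2/Xalpha^2
epsi = 0.001; alpha = 0.05;
Xalpha = norminv(1-alpha/2,0,1);
fx = rand(100,1).^2;                       % start with n1 = 100 samples of f(X)=X^2
while var(fx) > length(fx)*epsi^2/Xalpha^2
    k = floor(var(fx)*Xalpha^2/epsi^2 - length(fx));  % additional samples needed
    fx = [fx; rand(max(k,1),1).^2];
end
n_opt = length(fx)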

2.5 Appendix - Matlab Code

% Uniform Random Number
% Monte Carlo method as an approximated integration technique
% integrate f(x) on the [0,1] interval
% solution: 1/2, 1/3, and 0
clc;
n = 50;
x = rand(n,1);
gav = zeros(n,3);
gavvar = NaN(n,3);
gav(1,1) = x(1,1);
gav(1,2) = x(1,1)^2;
gav(1,3) = cos(pi*x(1,1));
for i=2:n
    gav(i,1) = sum(x(1:i))/i;
    gav(i,2) = sum(x(1:i).^2)/i;
    gav(i,3) = sum(cos(pi*x(1:i)))/i;
    gavvar(i,1) = var(x(1:i));

    gavvar(i,2) = var(x(1:i).^2);
    gavvar(i,3) = var(cos(pi*x(1:i)));
end
%
%%%%%%%%% Graphics (mean) %%%%%%%%%%
figure(1);
subplot(3,1,1);
plot(gav(:,1));
line((1:n),ones(n,1)/2,'color','red');
legend('Empirical Average','Theoretical Mean','Location','NorthEastOutside');
title('f(x)=x');
%
subplot(3,1,2);
plot(gav(:,2));
line((1:n),ones(n,1)/3,'color','red');
legend('Empirical Average','Theoretical Mean','Location','NorthEastOutside');
title('f(x)=x^2');
%
subplot(3,1,3);
plot(gav(:,3));
line((1:n),ones(n,1)*0,'color','red');
legend('Empirical Average','Theoretical Mean','Location','NorthEastOutside');
title('f(x)=cos(\pi x)');

To export a picture to a .eps file one can use

%%%%%%%%% Export a picture %%%%%%%%%%%%%
dire = 'C:\Dottorato\Teaching\SummerSchoolBertinoro';
figu = '\TutorialAntonietta\TutorialRobAnt\Figure\';
figname = strvcat([strcat(dire,figu,'MC1.eps')]);
print(gcf,'-depsc2',figname);
%
%%%%%%%%% Graphics (variance) %%%%%%%%%%
figure(2);
subplot(3,1,1);
plot(gavvar(:,1));
line((1:n),ones(n,1)/12,'color','red');
legend('Empirical Variance','Theoretical Variance','Location','NorthEastOutside');
title('f(x)=x');
%
subplot(3,1,2);
plot(gavvar(:,2));
line((1:n),ones(n,1)*4/45,'color','red');
legend('Empirical Variance','Theoretical Variance','Location','NorthEastOutside');
title('f(x)=x^2');
%
subplot(3,1,3);
plot(gavvar(:,3));
line((1:n),ones(n,1)*1/2,'color','red');
legend('Empirical Variance','Theoretical Variance','Location','NorthEastOutside');
title('f(x)=cos(\pi x)');


Chapter 3: Importance Sampling

Aim: Define and apply the importance sampling method and study its properties.

Contents:
1. Importance Sampling (IS)
2. Properties of the IS Estimators
3. Generating Student-t Variables

3.1 Importance Sampling

Let π be a probability density function, f a measurable function and
$$\mu = \mathbb{E}_\pi(f(X)) = \int f(x)\pi(x)\,dx \tag{3.1}$$
the integral of interest. In importance sampling (see Section 3.3 in Robert and Casella (2004)) a distribution g (called the importance distribution or instrumental distribution)

is used to apply a change of measure:
$$\mu = \int \frac{\pi(x)}{g(x)}\,f(x)\,g(x)\,dx \tag{3.2}$$
The resulting integral is then evaluated numerically by using an i.i.d. sample $X_1, \ldots, X_n$ from g:
$$\hat{\mu}_n^{IS} = \frac{1}{n}\sum_{i=1}^n w(X_i)\,f(X_i) \tag{3.3}$$
where
$$w(X_i) = \frac{\pi(X_i)}{g(X_i)}, \quad i = 1, \ldots, n$$
are called the importance weights.
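As a compact illustration of (3.3) (a sketch of my own, anticipating the full example of Section 3.3), an IS estimator in Matlab is just a weighted average; the target t(5), proposal N(0,4) and integrand below are assumptions chosen only for the example:

% Generic importance sampling estimator of E_pi(f(X)) using draws from g
n = 10000;
x = randn(n,1)*2;                          % sample from g = N(0,4), the proposal
w = pdf('t',x,5)./pdf('normal',x,0,2);     % importance weights pi/g, pi = Student-t(5)
f = x.^2;                                  % example integrand f(x) = x^2
muIS = mean(w.*f)                          % IS estimate of E_pi(X^2)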

3.2 Properties of the IS Estimators

The Monte Carlo estimator $\hat{\mu}_n^{IS}$ of µ is unbiased:
$$\mathbb{E}_g(\hat{\mu}_n^{IS}) = \int \cdots \int \left(\frac{1}{n}\sum_{i=1}^n w(x_i) f(x_i)\right) \prod_{i=1}^n g(x_i)\,dx_i = \int \frac{\pi(x_1)}{g(x_1)}\,f(x_1)\,g(x_1)\,dx_1 = \int f(x_1)\pi(x_1)\,dx_1 = \mu$$

3.2. PROPERTIES OF THE IS ESTIMATORS

33

moment of the IS estimator, that is 2 Eg ((ˆ µIS n ) )

=

Z

=

Z

π(x1 )2 f (x1 )2 g(x1 )dx1 = g(x1 )2

π(x1 ) f (x1 )2 π(x1 )dx1 < ∞ g(x1 )

Note that if the tails of the importance density are lighter than those of the π the importance weight π(x)/g(x) is not a.e. bounded and the variance of the estimator will be infinite for many functions f .

The following is an example of a set of sufficient conditions for the IS estimators to have finite variance

• π(x)/g(x) < M ∀x ∈ X and Vg (f ) < ∞ • X is compact, g(x) < F and g(x) > ε ∀x ∈ X The condition π(x)/g(x) < M, ∀x ∈ X implies that the distribution f has thicker tails than π.

An alternative way to address the issue of the finite variance is to consider the self-normalized importance sampling (SNIS) estimator (3.4)

IS µ ˆSN n

Pn w(Xi )f (Xi ) Pn = i=1 i=1 w(Xi )

It is biased on a finite sample, but it converges to µ by the strong law of large number. It has been proved both theoretically and numerically that this estimator may perform better, in terms of mean square error, than the simple importance sampling estimator.

34

CHAPTER 3. IMPORTANCE SAMPLING

3.3

Generating Student-t Variables

Consider a Student-t distribution T (ν, θ, σ 2 ) with density (3.5)

Γ((ν + 1)/2) π(x) = √ σ νπΓ(ν/2)



(x − θ)2 1+ νσ 2

−(ν+1)/2

IR (x)

w.l.o.g. take θ = 0, σ = 1 and ν = 12. We choose the quantities of interest to be 1. f (x) = 2.



sin(x) x

5

I(x)(2.1,+∞)

s x f (x) = 1 − x

3.

x5 f (x) = I[0,+∞) (x) 1 + (x − 3)2

We study the performance of the importance sampling estimator µ ˆIS n when the following instrumental distributions are used 1. T (ν ∗ , 0, 1) with ν ∗ < ν (e.g. ν ∗ = 7) 2. N (0, ν/(ν − 2)) 3. C(0, 1) We shall note that the Cauchy distribution C(α, β) has density function π(x) =

1 IR (x) πβ(1 + ((x − α)/β)2 )

3.3. GENERATING STUDENT-T VARIABLES

35

2 Student−t 1 0 0

1

2

3

4

5 4

x 10 10 Normal 5 0 0

1

2

3

4

5 4

x 10 2 Cauchy 1 0 0

1

2

3

4

5 4

x 10

[Figure 3.1: Importance sampling weights for the proposal distributions $\mathcal{T}(\nu^*, 0, 1)$, $\mathcal{N}(0, \nu/(\nu-2))$ and $\mathcal{C}(0, 1)$.]

The cumulative distribution function of the Cauchy is
$$F(x) = \int_{-\infty}^{x} \frac{du}{\pi\beta\left(1 + ((u-\alpha)/\beta)^2\right)} = \left[\frac{1}{2} + \frac{1}{\pi}\arctan\left(\frac{x-\alpha}{\beta}\right)\right]\mathbb{I}_{\mathbb{R}}(x)$$
The inverse c.d.f. method can be applied in order to generate from the Cauchy: if $X = F^{-1}(U)$, where $U \sim \mathcal{U}_{[0,1]}$, then $X \sim \mathcal{C}(\alpha, \beta)$.
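Inverting F gives $F^{-1}(u) = \alpha + \beta\tan(\pi(u - 1/2))$, which is the transformation used for the third proposal in the code below; as a stand-alone sketch:

% Inverse c.d.f. sampling from the Cauchy C(alpha,beta)
alpha = 0; beta = 1;
u = rand(10000,1);
x = alpha + beta*tan(pi*(u-0.5));   % x ~ C(alpha,beta)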

From the results in Fig. 3.1 one can see that the importance weights for the Student-t and Cauchy proposals are stable, while the importance weights associated with the normal exhibit some large jumps. For all the functions, the results in Fig. 3.2 show that the normal proposal produces jumps in the progressive averages (green lines) that are due to the unbounded variance of the estimator. However, for the first function the normal proposal behaves quite well when compared with the Cauchy and Student-t proposals. For the second and third functions the Cauchy proposal seems to converge faster than the Student-t. In all the pictures we also plotted (black lines) the approximation obtained with an exact simulation from a Student-t with ν = 12.

Exercise: use repeated Monte Carlo experiments to find the distribution of the estimator $\hat{\mu}_n(f)$. Plot the 95% and 5% quantiles and the mean of the estimator for $n = 1, \ldots, 50000$.

The Matlab code is

%%%%%%% Importance weight for T(nustar,0,1)
function w=w1(x,nu,nustar)
    w = pdf('t',x,nu)/pdf('t',x,nustar);
end
%
%%%%%%% Importance weight for N(0,nu/(nu-2))
function w=w2(x,nu)
    w = pdf('t',x,nu)/pdf('normal',x,0,sqrt(nu/(nu-2)));
end
%
%%%%%%% Importance weight for C(0,1)
function w=w3(x,nu)
    w = pdf('t',x,nu)/pdfcauchy(x,0,1);
end
%
clc;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Importance sampling
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
nu = 12; nustar = 7;
nIS = 50000;
mu1IS = zeros(nIS,4); mu2IS = zeros(nIS,4); mu3IS = zeros(nIS,4);
%
mu1IScum = zeros(nIS,4); mu2IScum = zeros(nIS,4); mu3IScum = zeros(nIS,4);
%
wIS = zeros(nIS,3);
for i=1:nIS
    % Proposal 1
    x1 = random('t',nustar);
    % Proposal 2
    x2 = random('normal',0,sqrt(nu/(nu-2)));
    % Proposal 3
    x3 = tan((rand(1,1)-0.5)*pi);
    %x3 = random('normal',0,1)/random('normal',0,1);
    % Exact
    x4 = random('t',nu);
    %%%%%%%%%%%%%
    % IS
    %%%%%%%%%%%%%
    % Importance weights
    wIS(i,1) = w1(x1,nu,nustar);
    wIS(i,2) = w2(x2,nu);
    wIS(i,3) = w3(x3,nu);
    % f_1
    mu1IS(i,1) = ((sin(x1)/x1)^5)*(x1>2.1);
    mu1IS(i,2) = ((sin(x2)/x2)^5)*(x2>2.1);
    mu1IS(i,3) = ((sin(x3)/x3)^5)*(x3>2.1);
    mu1IS(i,4) = ((sin(x4)/x4)^5)*(x4>2.1);   % note: the original assigned column 3 twice here, a bug
    % f_2
    mu2IS(i,1) = sqrt(abs(x1/(1-x1)));
    mu2IS(i,2) = sqrt(abs(x2/(1-x2)));
    mu2IS(i,3) = sqrt(abs(x3/(1-x3)));
    mu2IS(i,4) = sqrt(abs(x4/(1-x4)));
    % f_3
    mu3IS(i,1) = (x1^5/(1+(x1-3)^2))*(x1>0);
    mu3IS(i,2) = (x2^5/(1+(x2-3)^2))*(x2>0);
    mu3IS(i,3) = (x3^5/(1+(x3-3)^2))*(x3>0);
    mu3IS(i,4) = (x4^5/(1+(x4-3)^2))*(x4>0);
    %
    if ((1000*floor(i/1000))==i) disp(i); end
end

mu1IScum(:,1) = cumsum(mu1IS(:,1).*wIS(:,1))./(1:nIS)';
mu1IScum(:,2) = cumsum(mu1IS(:,2).*wIS(:,2))./(1:nIS)';
mu1IScum(:,3) = cumsum(mu1IS(:,3).*wIS(:,3))./(1:nIS)';
mu1IScum(:,4) = cumsum(mu1IS(:,4))./(1:nIS)';
%
mu2IScum(:,1) = cumsum(mu2IS(:,1).*wIS(:,1))./(1:nIS)';
mu2IScum(:,2) = cumsum(mu2IS(:,2).*wIS(:,2))./(1:nIS)';
mu2IScum(:,3) = cumsum(mu2IS(:,3).*wIS(:,3))./(1:nIS)';
mu2IScum(:,4) = cumsum(mu2IS(:,4))./(1:nIS)';
%
mu3IScum(:,1) = cumsum(mu3IS(:,1).*wIS(:,1))./(1:nIS)';
mu3IScum(:,2) = cumsum(mu3IS(:,2).*wIS(:,2))./(1:nIS)';
mu3IScum(:,3) = cumsum(mu3IS(:,3).*wIS(:,3))./(1:nIS)';
mu3IScum(:,4) = cumsum(mu3IS(:,4))./(1:nIS)';
%%
fs = 14;
%%%%%%%%%%%%%%%
figure(1)
subplot(3,1,1);
plot((1:nIS)',wIS(:,1));
legend('Student-t','Location','NorthEast');
set(gca,'FontSize',fs);
subplot(3,1,2);
plot((1:nIS)',wIS(:,2));
legend('Normal','Location','NorthEast');
set(gca,'FontSize',fs);
subplot(3,1,3);


plot((1:nIS)',wIS(:,3));
legend('Cauchy','Location','NorthEast');
set(gca,'FontSize',fs);
figure(2)
plot((1:nIS)',mu1IScum(:,1:3));
hold on; plot((1:nIS)',mu1IScum(:,4),'-k'); hold off;
legend('Student-t','Normal','Cauchy','Exact','Location','NorthEast');
ylim([0.00001 0.00015]);
set(gca,'FontSize',fs);
figure(3)
plot((1:nIS)',mu2IScum(:,1:3));
hold on; plot((1:nIS)',mu2IScum(:,4),'-k'); hold off;
legend('Student-t','Normal','Cauchy','Exact','Location','NorthEast');
ylim([1 1.4]);
set(gca,'FontSize',fs);
figure(4)
plot((1:nIS)',mu3IScum(:,1:3));
hold on; plot((1:nIS)',mu3IScum(:,4),'-k'); hold off;
legend('Student-t','Normal','Cauchy','Exact','Location','NorthEast');
ylim([3 9]);
set(gca,'FontSize',fs);

This code calls the following function, defined by the user:

%%%%%%% Cauchy probability density function
function f=pdfcauchy(x,a,b)
    f = 1/(pi*b*(1+((x-a)/b)^2));
end
%


[Figure 3.2: Charts one to three: $\hat{\mu}^{IS}$ for the different functions f. In each chart, the IS estimators for the different proposals (colored lines) and the Monte Carlo estimator with exact simulation from the $\mathcal{T}(12, 0, 1)$ (black lines).]


Exercise: Importance Sampling

Consider a Student-t distribution $\mathcal{T}(\nu, \theta, \sigma^2)$ with density
$$\pi(x) = \frac{\Gamma((\nu+1)/2)}{\sigma\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left(1 + \frac{(x-\theta)^2}{\nu\sigma^2}\right)^{-(\nu+1)/2} \mathbb{I}_{\mathbb{R}}(x) \tag{3.6}$$
W.l.o.g. take θ = 0, σ = 1 and ν = 12. Study the performance of the importance sampling estimator $\hat{\mu}_n^{IS}$ of
$$\mu = \mathbb{E}_\pi(f(X)) = \int f(x)\pi(x)\,dx = \int \frac{\pi(x)}{g(x)}\,f(x)\,g(x)\,dx \tag{3.7}$$
when the following instrumental distributions g(x) are used:

1. $\mathcal{T}(\nu^*, 0, 1)$ with $\nu^* < \nu$ (e.g. $\nu^* = 7$)

2. $\mathcal{N}(0, \nu/(\nu-2))$

3. $\mathcal{C}(0, 1)$

for the following test functions:

1. $f(x) = \left(\frac{\sin(x)}{x}\right)^5 \mathbb{I}_{(2.1,+\infty)}(x)$

2. $f(x) = \sqrt{\left|\frac{x}{1-x}\right|}$

3. $f(x) = \frac{x^5}{1+(x-3)^2}\,\mathbb{I}_{[0,+\infty)}(x)$