Charles University in Prague Faculty of Social Sciences Institute of Economic Studies

MASTER THESIS

Forecasting Term Structure of Crude Oil Markets Using Neural Networks

Author: Bc. Barbora Malinská
Supervisor: PhDr. Jozef Baruník, PhD.
Academic Year: 2014/2015

Declaration of Authorship The author hereby declares that she compiled this thesis independently, using only the listed resources and literature, and the thesis has not been used to obtain a different or the same degree. The author grants to Charles University permission to reproduce and to distribute copies of this thesis document in whole or in part.

Prague, January 4, 2015 Signature

Acknowledgments I would like to express my sincere gratitude to my supervisor PhDr. Jozef Barun´ık, PhD. for his patience, support and guidance. Further, I am eternally grateful to my family. Thank you for your love, support and encouragement in everything I do throughout my life.

Abstract

This thesis enhances the rather scarce literature on modeling and forecasting the term structure of crude oil markets. Using the dynamic Nelson-Siegel model, the crude oil term structure is decomposed into three latent factors, which are then forecasted using both parametric models and a dynamic neural network. The in-sample fit of the Nelson-Siegel model brings encouraging results and proves its applicability to crude oil futures prices. Forecasts obtained by the focused time-delay neural network are in general more accurate than those of the benchmark models. Moreover, the forecast error decreases with increasing time to maturity.

JEL Classification: C14, C32, C45, G02, G17
Keywords: term structure, Nelson-Siegel model, dynamic neural networks, crude oil futures
Author's e-mail: [email protected]
Supervisor's e-mail: [email protected]

Abstrakt

Tato práce obohacuje ne příliš početnou literaturu zabývající se modelováním a předpovídáním výnosové křivky na ropných trzích. Za použití dynamického Nelson-Siegelova modelu je ropná výnosová křivka dekomponována na tři latentní faktory, které jsou dále použity k předpovídání pomocí parametrických metod a neuronových sítí. Odhad výnosové křivky pomocí Nelson-Siegelova modelu přináší povzbudivé výsledky a dokazuje svou aplikovatelnost na ceny termínovaných kontraktů na ropných trzích. Předpovědi získané pomocí dynamické neuronové sítě se ukázaly být obecně přesnější než u ostatních uvažovaných metod. Chyba předpovědi klesá s rostoucí dobou do maturity.

Klasifikace JEL: C14, C32, C45, G02, G17
Klíčová slova: výnosová křivka, Nelson-Siegel model, dynamická neuronová síť, ropné termínované kontrakty
E-mail autora: [email protected]
E-mail vedoucího práce: [email protected]

Contents

List of Tables
List of Figures
Acronyms
Thesis Proposal
1 Introduction
2 Crude oil markets
  2.1 Specific features of crude oil markets
    2.1.1 Crude oil future contracts
3 Theory of term structure modeling
  3.1 Term structure modeling
    3.1.1 Dynamic asset pricing method
    3.1.2 Curve fitting using standard statistical methods
    3.1.3 Review of commodity market term structure modeling
4 Artificial neural networks
  4.1 Architecture of neural networks
    4.1.1 Neurons
    4.1.2 Activation functions
    4.1.3 Layers
  4.2 Learning algorithms
    4.2.1 Data splitting
    4.2.2 Backpropagation
    4.2.3 Genetic algorithm
  4.3 Advantages of artificial neural networks
  4.4 Criticism and limitations of artificial neural networks
  4.5 Artificial neural networks and finance
5 Research methodology
  5.1 Data
    5.1.1 Data organization
    5.1.2 Data adjustment
    5.1.3 Final dataset
  5.2 Stylized facts about term structure
  5.3 Dynamic Nelson-Siegel model
    5.3.1 Decay parameter λt
    5.3.2 In-sample fit: Ordinary least squares
  5.4 Forecasting
    5.4.1 Parametric models
    5.4.2 Dynamic neural network
6 Discussion of results
7 Conclusion
Bibliography
A Figures and Tables
B Content of Enclosed DVD

List of Tables

5.1 Example of actual monthly data
5.2 Example of reorganized data to constant time to maturity
5.3 Traded contracts on May 31, 2000
5.4 Results of Augmented Dickey-Fuller test
6.1 Average RMSE across all constant maturities
A.1 DNS coefficients statistics
A.2 Value of Diebold-Mariano test statistics (* = 5%, ** = 10% levels of significance)
A.3 RMSE of forecasts for individual maturities
A.4 ME of forecasts for individual maturities
A.5 RMSE as function of number of lags fed to FTDNN

List of Figures

2.1 WTI spot price 1983 - 2014 (quarterly)
2.2 WTI spot price (grey area) and CLA comdty (line) one-month future contract 06/2011 - 06/2014
4.1 Neuron and synapses
4.2 Architecture of feed-forward (a) and feedback (b) neural network
5.1 WTI futures prices term structure: 1990 - 2004
5.2 WTI futures prices term structure: 2005 - 2009
5.3 WTI futures prices term structure: 2008 - 2009
5.4 WTI futures prices term structure: 2010 - 2014
5.5 Average term structure over period 1984 - 2014
5.6 Term structure shapes
5.7 Standard deviation of futures prices for different maturities
5.8 Time series of optimal λt
5.9 Loadings of Nelson-Siegel latent factors of term structure
5.10 Values of β coefficients from 1990 to 2014
5.11 Sample autocorrelation functions of level, slope and curvature coefficients
5.12 Fitted term structures by Dynamic Nelson-Siegel model
5.13 Mechanism of focused time-delay neural network
6.1 One-month-ahead forecast for 02/29/2014
A.1 Sample cross correlations of β coefficients time series
A.2 Plot of regression of outputs and targets

Acronyms

AI     Artificial Intelligence
ACF    Autocorrelation function
ADF    Augmented Dickey-Fuller test
ANN    Artificial Neural Networks
BIS    Bank for International Settlements
CIR    Cox-Ingersoll-Ross model
CME    Chicago Mercantile Exchange
DNN    Dynamic neural network
EIA    U.S. Energy Information Administration
FTDNN  Focused time-delay neural network
HJM    Heath-Jarrow-Morton model
ME     Mean error
NYMEX  New York Mercantile Exchange
OPEC   Organization of the Petroleum Exporting Countries
PCA    Principal Components Analysis
RMSE   Root mean square error
SDE    Stochastic Differential Equation
WTI    West Texas Intermediate

Master Thesis Proposal
Institute of Economic Studies, Faculty of Social Sciences, Charles University in Prague

Author: Bc. Barbora Malinská
E-mail: [email protected]
Phone: +420 774 340 676
Specialization: FFMaB

Supervisor: PhDr. Jozef Baruník, PhD.
E-mail: [email protected]
Phone: +420 776 259 273
Defense Planned: January 2015


Proposed Topic:
Explaining Yield Curve of Crude Oil Markets Using Neural Networks
(Note: Title changed to "Forecasting Term Structure of Crude Oil Markets Using Neural Networks" on December 22, 2014.)

Motivation:
Neural networks are a popular alternative tool for forecasting prices on financial markets. They apply knowledge from brain science about information processing to finance. With the development of computing technology, the employment of artificial intelligence such as neural network models becomes more relevant on the markets. This thesis will focus on yield curve modeling using neural networks. There is some rather rare literature elaborating on interest rate term structure modeling using neural networks. Furthermore, the application of neural networks to commodity markets is an almost unexplored field of yield curve modeling with neural networks. In the literature, neural network models usually outperform standard parametric models and result in better yield curve models.
Why is the topic economically interesting? a) There is no clear consensus in the academic literature on how to model yield curves. However, understanding yield curves is extremely important, since they illustrate expectations about the future of the relevant area. b) The thesis is going to focus on commodity market yield curves - more concretely on crude oil. The author has chosen crude oil yield curve modeling because crude oil is the commodity of highest importance. As already mentioned, some literature concentrating on yield curve modeling of financial instruments is present, but there is no or only very rare literature about commodity yield curve modeling using neural networks. It is relevant to apply neural networks to commodity yield curves, since yield curves of commodities and of financial instruments do behave differently.

Hypotheses:
1. Neural networks outperform a benchmark model (probably the Nelson-Siegel model) in explaining the yield curve of crude oil.

Methodology:
The ultimate goal of this thesis is to extend the empirical evidence of the outperformance of neural networks in explaining yield curves to commodity markets - in this case crude oil has been selected for the analysis. In the first part, the author will explain the theory and current state of the art regarding neural networks, their mechanics and their use on financial and commodity market data. Further, characteristic features of commodity markets (in comparison to financial instrument markets, where neural networks for yield curve modeling have already been applied) will be stressed. In the empirical part, using futures contracts for crude oil price data, the author will examine whether neural networks outperform "standard" parametric models of yield curves (as the empirical evidence described in many papers suggests for financial instruments) also in the case of commodities (crude oil in this particular case). Technically, the neural network model will be evaluated against a standard parametric model (probably Nelson-Siegel) using loss functions (such as MAE, RMSE, etc.). A small proportion of the empirical part might deal with a discussion of proper optimizing tools.

Expected Contribution:
Since the literature provides very few publications joining term structure forecasting by neural networks and commodity markets, the thesis aims at expanding the discussion to this particular (and economically very interesting) application.

Outline:
1. Theory behind neural networks
   a. Basic principles and design
   b. Applications in finance and their development
2. Crude oil market
   a. Crude oil market and its specifics
   b. Factors determining crude oil market development
3. Empirical part
   a. Description of the data
   b. Neural networks and benchmark model
   c. Evaluation of the models
   d. Discussion of results
4. Conclusion and implications

Core Bibliography:
1. McNelis, P. D. (2005). Neural Networks in Finance: Gaining Predictive Edge in the Market. Elsevier Academic Press.
2. Diebold, F. X. & Rudebusch, G. D. (2011). The Dynamic Nelson-Siegel Approach to Yield Curve Modeling and Forecasting.
3. Kulkarni, S. & Haidar, I. (2009). Forecasting Model for Crude Oil Price Using Artificial Neural Networks and Commodity Futures Prices. International Journal of Computer Science and Information Security, Vol. 2, No. 1.
4. Trippi, R. R. & Turban, E. (1992). Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real World Performance. McGraw-Hill.
5. Dunis, C. L., Laws, J. & Naïm, P. (2003). Applied Quantitative Methods for Trading and Investment. John Wiley & Sons.
6. Filipović, D. (2009). Term Structure Models - A Graduate Course. Springer.
7. Gabillon, J. (1991). The Term Structures of Oil Futures Prices. Oxford Institute for Energy Studies.
8. Kohzadi, N. et al. (1995). A Comparison of Artificial Neural Network and Time Series Models for Forecasting Commodity Prices. Neurocomputing, Vol. 10.
9. Nelson, C. R. & Siegel, A. F. (1987). Parsimonious Modeling of Yield Curves. The Journal of Business, Vol. 60, No. 4.
10. Rozenberg, G. et al. (2012). Handbook of Natural Computing. Springer.

On 12th March 2014

Author

Supervisor

Chapter 1
Introduction

Modeling and forecasting the term structure of commodity markets is attractive from both the academic and the practical perspective. Generally, the term structure illustrates expectations about the future development of the corresponding market. Although its understanding is of high importance, there is no clear consensus in the relevant literature on how to model, let alone forecast, the commodity term structure. Regarding practical usage, predicting commodity futures prices is valuable for producers, risk managers, and many others. This thesis focuses on the term structure of crude oil futures prices, since crude oil is the world's most traded commodity and indicates the overall condition of the global economy. Moreover, with respect to recent developments on global crude oil markets, a proper review and analysis of term structure modeling and forecasting methods is even more appealing.

The contribution of this thesis is threefold. First, the conceptual similarity of crude oil and fixed income markets is analyzed. Properties of the crude oil term structure are compared to the stylized facts about bond markets summarized in Diebold & Li (2006). Second, the thesis enhances a very scarce literature and shows the applicability of term structure models primarily developed for interest rates to futures prices of crude oil. Concretely, the parsimonious model proposed by Nelson & Siegel (1987) is used to extract three factors, which are able to describe the term structure of crude oil markets. Finally, the crude oil term structure is forecasted using both parametric models (as pioneered by Diebold & Li (2006) for the government bond market) and artificial neural networks.

The good performance of the Nelson-Siegel model and its ability to replicate the true behaviour of yield curves have been confirmed in the literature many times. Diebold & Li (2006) reported an outstanding in-sample fit to the bond yield curve. However, its application to crude oil markets is still very scarce in the literature. Recently, Hansen & Lunde (2013) applied the model and confirmed the conclusions of Diebold & Li (2006) also for crude oil markets. Even less numerous is the literature dealing with crude oil term structure forecasting, and all the reviewed papers employ parametric methods. However, one of the main hypotheses of this thesis is that involving artificial neural network methods in crude oil term structure forecasting is meaningful and generates more accurate results compared to standard parametric methods. The advantage of ANN stems primarily from the absence of assumptions about the underlying dataset and from their ability to model even nonlinear or ex ante hardly definable relationships. There are several works joining term structure modeling using Nelson-Siegel and forecasting involving neural networks (Vela (2013) or Aljinović & Poklepović (2013)). However, this literature deals with fixed income markets and employs static neural networks only.

A brief outline of the methodology is as follows. For modeling and forecasting, this thesis uses monthly data on futures prices for the period from 1990 to 2014. First, the term structure curve for each month is fitted by the Nelson-Siegel model, resulting in three time series of factors. Next, these factors are forecasted, and their substitution back into the Nelson-Siegel model generates the predicted term structures. Benchmark forecasts are made by AR(1), VAR(1), and random walk models. These are compared to a focused time-delay neural network, which is well suited for time series forecasting.

The rest of this thesis is structured as follows. Chapter 2 introduces key specifics of crude oil markets, both theoretical and empirical. Chapter 3 reviews the significant literature focusing on term structure modeling, discusses the basic construction of the models, their contributions and threats, and motivates the selection of the Nelson-Siegel model for the following empirical analysis. Chapter 4 provides a general introduction to artificial neural networks, best practice, advantages and drawbacks, and finally their applications in finance. The research methodology is presented in Chapter 5. It explains the work with the data and the approaches used both for the in-sample fit and the forecasting tasks. Chapter 6 is devoted to a discussion of the final results and the last chapter concludes.

Chapter 2
Crude oil markets

Environmentally responsible policies and growing research into alternatively fueled engines might have been expected to cause a gradual decay of demand for and production of crude oil, but recent studies refute such a conclusion (Yu et al. (2008), Pan et al. (2009)). We may argue about the true reason for the non-decreasing demand for crude oil, but the fact is that crude oil represents almost two thirds of global energy demand, and that crude oil is the world's most traded commodity.

Before starting a detailed analysis of crude oil markets, it should be defined what the text refers to when speaking about crude oil. There are three most important benchmark crudes¹: West Texas Intermediate (WTI), Brent Crude, and Dubai Crude. WTI is a low-sulphur, light and sweet crude oil used mainly in the USA. When analyzing spot prices of crude oil further in the text, I refer to West Texas Intermediate. Brent Crude is used as a marker in OPEC countries and Europe and is extracted in the North Sea. Dubai Crude, as the name suggests, is a light sour crude oil produced in Dubai.

1 Benchmark crude or marker crude refers to crude oil playing the role of a reference price for market agents.

2.1

Specific features of crude oil markets

Because the demand for crude oil, which depends not so much on price as on income (Hamilton 2009), continues to rise while supply is likely to decline (because of the nature of crude oil as a limited resource), the literature agrees that prices will rise to unprecedented levels (Pan et al. 2009). Another fact introducing uncertainty into the future development of the crude oil market is its substantial volatility. Pan et al. (2009) highlight the main reasons making the crude oil market one of the most volatile in the world. First, rising demand and a supply strongly dependent on the behaviour of often politically and economically unstable countries (especially in the Middle East) bring a significant portion of uncertainty to the market. Second, crude oil demand and production are traditionally heavily correlated with the occurrence of exogenous events such as military conflicts and natural catastrophes. And finally, speculators represent the third key factor of high volatility (see the comprehensive study on this topic by Büyüksahin & Harris (2011)).

The longest available time series of the crude oil (WTI) mid spot price available from Bloomberg is plotted in Figure 2.1. Obviously, the overall long-term upward trend in spot prices mentioned above holds. However, many jumps are observable during the period. As mentioned above, the crude oil price is very sensitive to the behaviour and decisions of oil producing countries. For instance, in 1986 we can observe a sudden fall in prices caused by the political decision of Saudi Arabia to significantly increase production. The early 1990's were a period of political unrest in the Middle East; the Iraqi invasion of Kuwait impacted the WTI spot price and we can see a steep jump upwards on the curve.

Source: Bloomberg

Figure 2.1: WTI spot price 1983 - 2014 (quarterly)

Naturally, the situation in the Middle East is not the only factor influencing crude oil prices. For example, the slower and longer-term decline of prices in the late 1990's has been attributed to the Asian economic crisis. The subsequent increase in prices was implied by the OPEC decision to cut production. The Iraq invasion brought prices up again. In the following period starting around 2005, we can observe a stronger upward trend attributed to increasing Asian demand fueling Asian economic growth. Rising demand in Asia was not the only reason for the massive upward trend in WTI prices. The 2003 - 2008 period is often referred to as the low spare production capacity period². Under spare capacity the EIA understands the amount of crude oil production that can be brought online within 30 days and sustained for 90 days or more. In other words, spare production capacity plays the role of a cushion with the ability to absorb global supply fluctuations. The sharp decrease and slower recovery in the late 2000's are connected to the global financial crisis and its aftermath. The recent fall in prices can be attributed to the conflict in Libya (2011) and decreasing demand in China.

2 See the official website of the U.S. Energy Information Administration (EIA), http://www.eia.gov/finance/markets/supply-opec.cfm.

2.1.1

Crude oil future contracts

Until the early 1980's, world crude oil production was dominated by OPEC. However, since that period new oil fields in the North Sea have emerged and their owners have started to compete with OPEC. In the "OPEC period", the price of crude oil was determined by long-term contracts between OPEC and oil companies. Such a setting implied stability of prices, which changed only with rearrangements of the contracts. New players from the North Sea oil fields had to undercut OPEC long-term contract prices in order to gain customers. As Haubrich et al. (2004) say, new suppliers offered discounts reaching USD 8 per barrel compared to the contract prices. After less than one year, nearly 50% of global crude oil production was traded on spot markets rather than under long-term contracts.

The shift from long-term contracts to the spot market resulted in price fluctuations. In order to hedge themselves against potential losses, market agents started to trade crude oil futures. Crude oil future contracts have been traded on the New York Mercantile Exchange (NYMEX) since 1983. A clever hedging strategy is a good insurance against price fluctuations, but it is still risky. In case of a higher spot price than the agreed price on the expiration day, the seller of the contract loses and the buyer gains, and vice versa. Such a situation also implies credit risk: the counterparty which is about to lose on the expiration day may default on its obligations. This risk is reduced by daily marking to market. NYMEX crude oil future contracts are contracts trading light sweet crude oil, the contract unit is 1,000 barrels and the minimum fluctuation value is USD 0.01³. Mostly WTI is traded using this contract, because it meets the NYMEX criteria for crude oil for this type of contract. Therefore, as Haubrich et al. (2004) conclude, the WTI spot price almost equals the NYMEX one-month crude oil future price. An alternative platform for crude oil trading in the USA is the International Petroleum Exchange, where rather Brent Crude is traded. Lautier (2005) argues that the American crude oil futures market is the most developed commodity futures market in the world in terms of volumes and maturity of transactions.

3 See http://www.cmegroup.com/trading/energy/crude-oil/light-sweet-crude_contract_specifications.html

Backwardation on the crude oil market

The stable occurrence of backwardation is probably the most important specific feature of crude oil markets. Backwardation refers to the situation when futures prices are lower than spot prices. This phenomenon is observable in Figure 2.2: during the last three years backwardation was present for approximately 80% of the time. In order to understand the occurrence of backwardation on the crude oil market properly, I will start in the 1930's. As Litzenberger & Rabinowitz (1995) suggest, the theory developed by Hotelling (1931) is inconsistent with the empirical occurrence of backwardation on the crude oil market. Hotelling (1931) postulates that the equilibrium price of non-renewable resources (like crude oil), which equals the net marginal revenue⁴, increases over time at the rate of interest. However, the key differentiating factor between Hotelling's theory and theories of backwardation on the crude oil market is uncertainty. Hotelling developed his theory under certainty; the issue and implications of uncertainty are discussed further in the text.

4 Hotelling assumes constant returns to scale.

It must be said that backwardation on commodity markets is not a recently discovered issue. Already in the early 1930's J. M. Keynes dealt with the risk aspect and introduced his theory of normal backwardation for forwards: "... in normal conditions, the spot price exceeds the forward price i.e. there is a backwardation. In other words, the normal supply price on the spot includes remuneration for the risk of price fluctuation during period of production, whilst the forward price excludes this." (Keynes 1930).

Source: Bloomberg
Figure 2.2: WTI spot price (grey area) and CLA comdty (line) one-month future contract 06/2011 - 06/2014

As Haubrich et al. (2004) correctly argue, according to common sense the opposite situation on the market - contango - should be the case: futures prices should be above spot prices of crude oil. The reasoning is as follows - the opportunity cost equal to the interest rate, together with storage costs, makes crude oil stocks disadvantageous. This conclusion is disproved by the convenience yield - a concept defined in the following section.

Convenience yield

As mentioned above, the convenience yield justifies the occurrence of backwardation on commodity markets. Storing a commodity implies not only costs but also benefits. As Lautier (2005) remarks, the idea of the convenience yield was developed in the 1930's as well, like the two theories discussed above. Kaldor (1940) builds his concept as follows: "... stocks of all goods possess a yield, measured in terms of themselves, and this yield which is a compensation to the holder of stocks, must be deducted from carrying costs proper in calculating net carrying cost ..."⁵. An example situation to better illustrate the holder's compensation is the following: if there is an immediate need for crude oil from the holder's customer or from the holder himself, he will immediately benefit from having crude oil stocks rather than from going to the market and buying crude oil ad hoc. More recently the convenience yield was defined by Brennan & Schwartz (1985) as the "... flow of services that accrues to an owner of the physical commodity but not to an owner of a contract for future delivery of the commodity ...". The authors further state that the marginal convenience yields discounted to the present value equal the backwardation appearing on the market. As Litzenberger & Rabinowitz (1995) emphasize, such an approach implies exogenously determined backwardation (because the convenience yield is exogenous).

Litzenberger & Rabinowitz (1995) suggest an endogenous approach to backwardation: they model oil production as a call option. A producer can opt either for leaving crude oil in the ground if prices are low, waiting for higher prices, or, vice versa, extracting the oil and selling it for the currently high spot price. The authors came to three interesting conclusions. First, weak backwardation⁶ is a necessary condition for producers to extract oil. Second, risky futures prices have an adverse impact on production. Third, strong backwardation⁷ occurs in case futures prices are risky enough. Litzenberger & Rabinowitz (1995) argue that a rational producer, in case of extraction costs growing at no higher rate than the rate of interest, will not produce oil if discounted futures prices are above spot prices. In other words, it is more beneficial to defer the extraction and take advantage of the interest rate and its differential compared to extraction costs. In order to have some crude oil extracted, weak backwardation is a necessary condition. However, as Litzenberger & Rabinowitz (1995) point out, in case of high uncertainty about futures prices, the necessary condition tightens to strong backwardation. An alternative explanation of the concept was proposed by Lautier (2005), who points out the analogy between the convenience yield and the coupons or dividends linked to bonds and stocks, respectively.

5 The cost of carry comprises the storage cost and the financial cost (interest rate).
6 Weak backwardation occurs if discounted futures prices are lower than spot prices.
7 Strong backwardation refers to a situation on a market when the spot price is higher than the futures price.
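The link between the convenience yield and backwardation can be made concrete through the standard cost-of-carry relation, which the thesis does not spell out explicitly; the following is a textbook illustration with arbitrary numbers, not a result derived from the data used later:

F_t(\tau) = S_t \, e^{(r + c - y)\tau},

where S_t is the spot price, r the interest rate, c the storage cost and y the convenience yield (all continuously compounded, per annum). With, say, r = 5%, c = 2% and y = 10%, a one-year futures price is F_t(1) = S_t e^{-0.03} ≈ 0.97 S_t, i.e. below the spot price - the backwardation discussed above. If the convenience yield falls below r + c, the curve flips into contango.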


Chapter 2 introduced crude oil markets, their development and specific theoretical concepts. Having a good understanding of the market whose term structure will be modeled and forecasted, we can proceed to the means by which this challenging task might be accomplished.

Chapter 3
Theory of term structure modeling

For an informed selection of the model used for crude oil term structure modeling and forecasting, it is convenient first to introduce the variety of models and discuss both their advantages and drawbacks together with their implications for the actual empirical analysis. Under term structure we understand the functional relationship between the futures price and the time to maturity. The relationship comprises and expresses market agents' expectations regarding the future of the relevant market. Later in the text, the similarity between commodity and interest rate term structure modeling is explained.

3.1

Term structure modeling

Regardless of what market is considered, the primary task of term structure models is to reproduce futures prices as accurately as possible. As claimed earlier in the text, commodity markets are similar to fixed income securities markets. Therefore, approaches to commodity term structure modeling are conceptually the same and stem from yield curve modeling methods. There are several simplifying assumptions valid for the commodity term structure models - there are no frictions, taxes, or transaction costs on the market, trading is continuous, lending and borrowing rates are equal, short sales are unconstrained and markets are complete¹ (Lautier 2005). However, it must be noted that a simple application of traditional yield curve models to commodity markets is not so straightforward, also because the assumption of market completeness is with high probability violated, as many arbitrage opportunities exist (such opportunities are much more limited in the case of financial assets).

1 A market is complete if it is possible to replicate a derivative asset by a combination of other assets.

Generally, obtaining the dynamic behaviour of the term structure has several steps. First, a proper choice of the state variables, time, time to maturity and their functional relationship. Next, an application of Itô's lemma² generates the dynamic relationship and then (depending on the concrete model), using the other model assumptions, the final equation is obtained (if feasible). Further in the text the dynamic behaviour is stressed and intermediate steps of the derivations are omitted. All the methods presented in the literature can be classified into two streams - the dynamic asset pricing method and curve fitting using standard statistical methods.

2 Itô's lemma is a mathematical identity used to find the differential of a function of a time-dependent stochastic process. The lemma was formulated in Itô (1944).

3.1.1

Dynamic asset pricing method

Affine general equilibrium models

The oldest stochastic interest rate models are affine general equilibrium models. These approaches consist of deriving a theoretical zero-coupon interest rate curve and calibrating the model to market bond prices in order to obtain the values of the coefficients of interest. Subsequently, the estimated parameters are further used for pricing derivative securities. The models are often referred to as single-factor (or one-factor) models and try to explain the yield curve as a function of one state variable. A common approach is to take the short-term interest rate as the state variable, because bond price movements are correlated and correspond to changes of the short-term interest rate (the relationship is valid as a first-order approximation). This means that changes in the short-term rate³ approximate changes to the level (i.e. parallel shifts) of the term structure. Martellini et al. (2003) also claim that such variations in the level of the term structure account for a significant fraction of yield curve dynamics.

3 The short-term rate is the rate of return on a default-free bond with short maturity and very short residual time to maturity - for example a T-bill.

The first representative of the class is the model formulated by Merton (1973). Merton assumed the market price of interest rate risk to be equal to zero. The dynamic behaviour was described using a stochastic differential equation (SDE):

dr_t = \mu \, dt + \sigma \, dz \qquad (3.1)

where r_t is the interest rate, µ is the expected value of the instantaneous interest rate variation, σ is the standard deviation of the variation and z is a Wiener process. A Wiener process is a standard Brownian motion with the following properties:

- the process is initialized at zero, i.e. z_0 = 0
- the Wiener process is, as a function of time t, continuous with probability equal to 1
- increments ∆z are independent random variables following the normal distribution N(0, ∆t)

However, as Martellini et al. (2003) conclude, this model has various drawbacks. It can describe only a low portion of the term structure shapes observable in the market, and the short rate can reach negative values. A few years later, Vasicek (1977) presented a modified approach. He employs an Ornstein-Uhlenbeck process⁴ and formulates the SDE:

dr_t = (\mu_1 + \mu_2 r_t)\, dt + \sigma_1 \, dW_t = a(b - r_t)\, dt + \sigma \, dz

(3.2)

where b plays the role of the equilibrium value of the short-term interest rate r_t. The inclusion of the mean-reverting property is in line with empirical observations in markets. The equation implies that if r_t is under its equilibrium value, the expected move given by a(b − r_t) has an upward direction (i.e. is positive). The speed of adjustment back to the equilibrium is specified by the parameter a: the higher its value, the faster the convergence to b. Moreover, Vasicek assumes a potentially non-zero constant market price of interest rate risk. The Vasicek model corresponds better to real observations made in markets compared to the former model. It allows for increasing, decreasing or flat yield curves, but inverted yield curves - such as U-shaped or humped curves - are still not considered. Further, the Vasicek model also captures the higher volatility of short-term rates and the lower volatility of long-term rates, a phenomenon often observed in the market. Vasicek succeeded in solving part of the issues imposed by Merton's model, but his model can still produce negative interest rates.

4 The Ornstein-Uhlenbeck process is a stationary, Markovian and Gaussian mean-reverting process generally specified as dp_t = \theta(\mu - p_t)\, dt + \sigma\, dz, where θ, µ, σ > 0.

The permissibility of negative interest rates was removed by Cox et al. (1985), who dealt with the problem by including a square root process for r_t. The SDE of the Cox-Ingersoll-Ross (CIR) model is represented by

dr_t = (\mu_1 + \mu_2 r_t)\, dt + \sigma \sqrt{r_t}\, dz

(3.3)

The CIR model dealt with the problem of negative interest rates, but the humped or U-shaped yield curve modeling issues remain unsolved. Other models were also formulated, such as Brennan & Schwartz (1982) or Shaeffer & Schwartz (1984), which however still experience similar problems as the former models. Generally, affine models do not fit actual market data very well and even imply that changes of zero-coupon rates for different maturities are perfectly correlated, which is contrary to the behaviour observed in markets (Martellini et al. 2003).

No-arbitrage models

A much better fit to market data is achieved by no-arbitrage models. Generally, this type of model objects to the fact that short-rate models price zero-coupon bonds according to endogenously obtained parameters. Such an endogenous term structure is with high probability different from the real one observed in the markets. The implied deviation biases the computed bond prices away from real market values, and the empirical fit of such models is suboptimal. Conceptually, no-arbitrage models consider the actual market yield curve to be the underlying asset. Therefore, they work all the time with real market values (compared to short-rate models working with a state variable and with exogenous assumptions about its process).

The oldest representative is the model formulated by Ho & Lee (1986). They model bond prices driven by a single source of uncertainty in discrete time according to a binomial process illustrated by a recombining binomial tree. However, this model has various drawbacks. Again, it might produce negative interest rates with positive probability. Further, similarly as in the case of short-rate models, the existence of a single explanatory variable implies perfect correlation of bond prices of all maturities. Moreover, the Ho & Lee model does not allow for varying volatility - the volatility of interest rates is constant over time and also over maturity. Finally, all possible future zero-coupon curves are parallel (Martellini et al. 2003).

Later authors tried to deal with the shortcomings of the Ho & Lee model and extended it to continuous time. The general framework was presented in Heath et al. (1992). Again, the observed yield curve is regarded as the underlying asset and the dynamics of discount bond prices behave according to the process:

\frac{dB(t, T)}{B(t, T)} = \mu(t, T)\, dt + \sigma(t, T)\, dz

(3.4)

where B(t, T) is the price of a bond with maturity date T at time t, µ(t, T) is the expected instantaneous rate of return on the bond at time t, σ(t, T) is the instantaneous volatility of the bond at time t and z is a Wiener process. It is worth noting that the equation is very similar to Equation 3.1. However, the substantial difference is in the left-hand sides of the equations. The equation formulated by Heath, Jarrow & Morton (HJM) describes relative changes of bond prices, whereas the former works with absolute changes in the level of interest rates. This reformulation removes the positive probability of negative interest rates produced by the model. The solution of the stochastic differential equation in the HJM approach is

B(t, T) = \frac{B(0, T)}{B(0, t)} \exp\left( \int_0^t \big(\sigma(s, T) - \sigma(s, t)\big)\, d\hat{z}_s - \frac{1}{2} \int_0^t \big(\sigma^2(s, T) - \sigma^2(s, t)\big)\, ds \right) \qquad (3.5)

where \hat{z} is also a Wiener process but under a new probability measure (for a more detailed explanation see Martellini et al. (2003)). From Equation 3.5 it is evident that the bond price depends on the current term structure and the current volatility structure. The HJM model is however experiencing different issues than the previously presented models. The short-rate and Ho & Lee models often had implications or outputs contradictory to empirically observed behaviour in markets; the HJM issues are rather computational. Martellini et al. (2003) alert that in their setting the r_t process need not be Markovian⁵. The absence of the Markov property makes the computational task much more complicated and therefore the user-friendliness of the HJM model becomes limited.

5 A Markov process is a path-independent process, i.e. the expected value of the process given the information available at time t is equal to the expected value of the process given the information available up to (before) time t.

As mentioned above, the HJM approach is too general, allowing also for non-Markovian processes unsuitable for straightforward computations. Therefore, some authors tried to develop their models as special cases of the general model by Heath et al. (1992). Hull & White (1990) merged the HJM and the short-rate models proposed by Vasicek (1977) and Cox et al. (1985). The model by Hull & White is a one-factor special case of the HJM general model and assumes time-varying volatility of instantaneous forward rates. Further, Hull & White claim that the parameters from Equation 3.2 should be time-dependent in order to capture the cyclical behaviour of the economy and varying expectations. The resulting short-term interest rate follows the Ornstein-Uhlenbeck process

dr(t) = [\theta(t) + a(t)(b - r)]\, dt + \sigma(t)\, r^{\beta}\, dz

(3.6)

where θ(t) is the (time-dependent) drift, a(t) determines the (time-dependent) speed of adjustment, b is the long-term equilibrium of the interest rate, and σ(t) is the time-dependent volatility factor. It is worth noting that if the time-dependence of the reversion rate a and of the drift is removed and β = 0, we get Equation 3.2 formulated by Vasicek. If β = 1/2, we get the Cox-Ingersoll-Ross formulation as in Equation 3.3.⁶

6 If we allow for the drift term, we get with similar modifications the extended Vasicek and extended CIR models.

Hull & White proved that the classic short rate models by Vasicek and Cox-Ingersoll-Ross are in line with the no-arbitrage assumption. Moreover, the conclusions made by Hull & White facilitate the actual computation of the short-rate model coefficients. Therefore, their no-arbitrage model belongs to the favourite models used by practitioners (Martellini et al. 2003).
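To make the behaviour of the one-factor short-rate models discussed above more tangible, the following sketch simulates discretized paths of the Vasicek dynamics (Equation 3.2) and the CIR dynamics (Equation 3.3) with the Euler-Maruyama scheme. This is an illustration only - it is not part of the thesis' empirical work, and the parameter values (a, b, σ, r0) are arbitrary choices.

```python
import numpy as np

def simulate_short_rate(r0=0.03, a=0.5, b=0.04, sigma=0.02,
                        years=10.0, steps=2500, model="vasicek", seed=0):
    """Euler-Maruyama paths of a one-factor short-rate model.

    model="vasicek": dr = a(b - r) dt + sigma dz           (cf. Equation 3.2)
    model="cir":     dr = a(b - r) dt + sigma sqrt(r) dz   (cf. Equation 3.3)
    """
    rng = np.random.default_rng(seed)
    dt = years / steps
    r = np.empty(steps + 1)
    r[0] = r0
    for i in range(steps):
        dz = rng.normal(0.0, np.sqrt(dt))         # Wiener increment ~ N(0, dt)
        drift = a * (b - r[i]) * dt                # mean reversion towards b
        if model == "vasicek":
            nxt = r[i] + drift + sigma * dz
        else:                                      # CIR: diffusion shrinks near zero
            nxt = r[i] + drift + sigma * np.sqrt(max(r[i], 0.0)) * dz
            nxt = max(nxt, 0.0)                    # full-truncation scheme
        r[i + 1] = nxt
    return r

vasicek = simulate_short_rate(model="vasicek")
cir = simulate_short_rate(model="cir")
print("share of negative rates, Vasicek: %.3f" % np.mean(vasicek < 0))
print("share of negative rates, CIR:     %.3f" % np.mean(cir < 0))
```

Comparing the two paths illustrates the point made in the text: the Vasicek dynamics can dip below zero (more often for larger σ), while the square-root diffusion of CIR, here combined with a truncated discretization, keeps the simulated rate non-negative.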

3.1.2

Curve fitting using standard statistical methods

Many analyses using principal components analysis (PCA) suggest that the term structure should rather be described as a combination of multiple factors (for a PCA analysis see Martellini et al. (2003)). The basic idea of such methods is to fit term structures using a low-dimensional approximation function with a small number of parameters.

This approach was pioneered by McCulloch (1975), who approximated the yield curve of US government bonds using cubic splines. Another model, by Vasicek & Fong (1982),

proposed taking two state variables - next to the short rate also the variance of short-rate changes - and curve fitting using exponential splines in order to model the exponential decay of the curves. Contrary to affine one-factor models, a multi-factor model such as Vasicek & Fong does not assume a single source of risk. Therefore, the yield curve should properly be described by more factors than just the short-term interest rate, as proposed by short-rate models. But still, in the case of a multi-factor model, the short-term interest rate is the key driver and the yield curve level can still be approximated by the short-term interest rate. The inclusion of two state variables removed the perfect correlation of zero-coupon curves, but negative interest rates still have positive probability. A closed form for the price of a discount bond does not exist (this was also the case of the original one-factor Vasicek model). A closed form for the discount bond price was derived by Longstaff & Schwartz (1992), who used the same two factors as Vasicek & Fong, but with different specifications (see Martellini et al. (2003)). Their model allows for both increasing and decreasing volatility and for increasing, decreasing or humped term structures.

Next to the models using splines for yield curve approximation, exponential functions of time to maturity were also applied in order to model the exponential decay of term structures. A parsimonious three-factor model employing such a type of function was introduced by Nelson & Siegel (1987). The authors claim that, based on the expectations theory, if spot rates are determined by a differential equation, then forward rates (as they represent forecasts of spot rates) will be solutions of that differential equation. They take the forward rate at maturity τ as the solution to a second-order differential equation and get

r(\tau) = \beta_0 + \beta_1 e^{-\tau/\lambda_1} + \beta_2 e^{-\tau/\lambda_2} \qquad (3.7)

where r(τ) is the forward rate at maturity τ, λ1, λ2 are time constants and β0, β1, β2 are determined by initial conditions. The time constants determine the decay of the coefficients β1, β2 to zero. The average of the forward rates is the yield to maturity y(τ):

y(\tau) = \frac{1}{\tau} \int_0^{\tau} r(x)\, dx \qquad (3.8)

Nelson & Siegel performed empirical testing and concluded that their original model was overparameterized. Therefore they reformulated the model including a single time constant:

r(\tau) = \beta_0 + \beta_1 e^{-\tau/\lambda} + \beta_2 \left[ \frac{\tau}{\lambda}\, e^{-\tau/\lambda} \right] \qquad (3.9)

which has the form of a constant plus a Laguerre function⁷, and

y(\tau) = \beta_0 + \beta_1 \frac{1 - e^{-\lambda\tau}}{\lambda\tau} + \beta_2 \left( \frac{1 - e^{-\lambda\tau}}{\lambda\tau} - e^{-\lambda\tau} \right) \qquad (3.10)

7 Laguerre functions consist of the product of a polynomial and an exponential decay term and belong to the approximating functions in mathematics.

The model proposed by Nelson & Siegel (1987) performs very well in practice. The authors found that their model explains 96% of the variation in yields to maturity across all maturities, implying that the three factors included in the model describe the key attributes of yield curves, i.e. of the relation between yield and time to maturity. For the linkage to the previously described model by Hull & White (1990) see Björk & Christensen (1999); the authors proved that the yield curve modeled by Hull & White is actually a member of the Nelson-Siegel family of yield curves.

The original model by Nelson & Siegel was extended by Svensson (1994). He included a second curvature factor, which allows the term structure to have an additional hump:

y(\tau) = \beta_0 + \beta_1 \frac{1 - e^{-\lambda_1\tau}}{\lambda_1\tau} + \beta_2 \left( \frac{1 - e^{-\lambda_1\tau}}{\lambda_1\tau} - e^{-\lambda_1\tau} \right) + \beta_3 \left( \frac{1 - e^{-\lambda_2\tau}}{\lambda_2\tau} - e^{-\lambda_2\tau} \right) \qquad (3.11)

where λ1, λ2 are two decay parameters. However, this specification can induce multicollinearity between the curvature factors. Pooter (2007) proposed the following modification in order to avoid multicollinearity problems during estimation:

y(\tau) = \beta_0 + \beta_1 \frac{1 - e^{-\lambda_1\tau}}{\lambda_1\tau} + \beta_2 \left( \frac{1 - e^{-\lambda_1\tau}}{\lambda_1\tau} - e^{-\lambda_1\tau} \right) + \beta_3 \left( \frac{1 - e^{-\lambda_2\tau}}{\lambda_2\tau} - e^{-2\lambda_2\tau} \right) \qquad (3.12)

The model formulated by Svensson has generally better goodness-of-fit compared to the original Nelson & Siegel model (Pooter 2007).



As far as term structure forecasting is concerned, recent literature often replicates paper by Diebold & Li (2006), who introduced dynamic reinterpre7

Laguerre functions are functions consisting of product of polynomial and exponential decay term and belong to approximating functions in mathematics.

3. Theory of term structure modeling

18

tation of Nelson-Siegel model  yt (τ ) = β0t + β1t

1 − e−λt τ λt τ



 + β2t

1 − e−λt τ − e−λt τ λt τ

 (3.13)

Diebold & Li have shown that forecasting task can be successfully accomplished by forecasting time series of fitted β coefficients using parametric models like AR(1). Dynamic Nelson-Siegel model will be applied on crude oil futures prices forecasts later in this thesis. Therefore, more detailed analysis of the model will be presented in Chapter 5. Contrary to affine general equilibrium models, which assume concrete functional relationship for yield curve, this class of models does not stem from any theoretical grounds and is based only on mathematical parametrization of curve shapes. Generally, models of curve fitting using standard statistical methods perform better in curve fitting and forecasting compared to affine models (Steeley 2008). Moreover, Bank of International Settlements informs in its report that Nelson-Siegel model is the most often used model among central banks (BIS 2005).

3.1.3

Review of commodity market term structure modeling

As far as crude oil market term structure is concerned, four key state variables occur in term structure models - spot price of oil, convenience yield, interest rate and long-term (equilibrium) price of oil (Lautier 2005). One-factor models As Lautier (2005) reminds, futures price is in literature defined as expectation at time t of future price of an asset given information available at time t. Therefore, spot price is natural candidate for state variable in one-factor models. Similarly as in case of interest rates models, there are two approaches - spot price plus geometric Brownian motion, and mean-reverting process for spot price. Model proposed by Brennan & Schwartz (1985) is simple and therefore probably most popular. It was used and modified in many later works (see Lautier (2005) for review). Spot price of a commodity has dynamics specified

3. Theory of term structure modeling

19

analogously to interest rate models as dS(t) = µS(t)dt + σS(t)dz

(3.14)

where S(t) is spot price, µ is spot price drift, σ is spot price volatility and dz is increment to Wiener process. Lautier (2005) further explains that uncertainty about spot price development is proportional to its level. When there is lack of a commodity on the market, every change in demand is affecting spot price level as long as inventories (in this case very thin) are not capable to equilibrate the situation. With regard to specific concepts for commodity markets such as theory of storage, it might be more appropriate to apply mean-reverting process to spot price dynamics. Such type of model was formulated by Schwartz (1997). He assumes spot price to have following dynamics with mean-reverting property S = Sκ(µ − lnS)dt + σP dz

(3.15)

where κ is reversion rate. Schwartz further substitutes X = lnS and applies Ito’s lemma and gets following Ornstein-Uhlenbeck stochastic process dX = κ(α − X)dt + σdz

(3.16)

2

where α = µ − σ2κ . Mean-reverting process can better give incentives to players on physical markets. For instance, if price is for longer period under its long-term equilibrium, actors anticipating increase in spot price enlarge their commodity stocsks and producers shrink their production. Both scenarios push price back toward its equilibrium level. As Lautier (2005) stresses, theory of storage implies that behaviour and reactions of spot price are significantly different in backwardation and contango settings, but this aspect is not reflected in the model at all. Two-factor models Most frequently convenience yield is picked as the second state variable, alternative models use long-term price or spot price volatility. Again, Schwartz (1997) proposed the most widely used model. He assumes two state variables - spot price and convenience yield - to have dynamics described by following

3. Theory of term structure modeling

20

equations dS = (µ − C)Sdt + σS dzS dC = (κ(α − C))dt + σC dzC

(3.17)

where µ is spot price drift, σS and σC are spot price and spot price and convenience yield volatility, respectively. κ is reversion rate of convenience yield, α is long-term equilibrium convenience yield, and dzS and dzC are Wiener processes associated with spot price and convenience yield, respectively. In this case mean-reverting process is applied for convenience yield and is only intermediated to spot price. Such setting reflects hypothesis of existence of equilibrium level of stock satisfying economy’s needs under normal conditions (Lautier 2005). For instance, if convenience yield is above its equilibrium value, it is rational to increase stocks and vice versa. These actions will intermediate to spot price changes. For other two-factor models see for example Gabillon (1991), who employs long-term price as the second state variable. However, as Lautier (2005) claims, superiority of choice of particular variable as second state variable over another was not determined in literature. Three-factor models One-factor and two-factor models assumed constant interest rate, which implies that future spot price and forward prices are the same, which was rejected by Cox et al. (1985)8 . Moreover including stochastic development of interest rates to the model is well reflecting the theory of storage implications. Cortazar & Schwartz (2003) developed thre-factor model with following dynamics of state variables dS = (ν − y)Sdt + σS SdzS dy = −κydt + σy dzy

(3.18)

dν = a(¯ ν − ν)dt + σν dzν where y = C − α is demanded convenience yield, ν = µ − α is expected longterm return on spot price, κ, a are reversion rates of y and ν respectively and ν¯ is long-run mean of ν. 8

Cox claims that in certainty, future spot rates must be equal to equilibrium forward rates (Cox et al. 1985)

3. Theory of term structure modeling

21

Despite the fact that three-factor models capture more state variables and seem more accurate, researcher must keep parsimony in his models. Therefore proper analysis of performance improvement associated with including more factors is appropriate.

After compact review of most famous term structure models proposed in literature, dynamic version of Nelson-Siegel model was selected for further empirical analysis. However, orginally suggested approach to forecasting employing parametric models will be challenged by nonparamteric methods, concretely artificial neural networks.

Chapter 4 Artificial neural networks In today’s world characteristic by accelerating technological progress and computational capacities, artificial intelligence is gaining more and more attention. Artificial intelligence is one of the most appealing subjects to explore in computer science. Future rise of computing power was firstly and admirably accurately forecasted by Moore et al. (1965). The author suggests that the power of computers will approximately double every two years. Real development of computers has confirmed his vision and has opened the door for human brain simulating algorithms to develop and improve. One of such models inspired by human intelligence are artificial neural networks. As mentioned earlier in text, standard term structure forecasting approach employing parametric models will be challenged exactly by artificial neural networks. As long as ANN are still not widely used and generally known phenomenon, profound analysis of their logic, construction, best-practice, benefits and limitations is appropriate. There are various definitions of this phenomena. According to Sarker et al. (2006), neural networks are “... data-driven nonparametric methods that do not require many restrictive assumptions on the underlying stochastic process from which data are generated.” Zhang et al. (1998) provides with more general definition saying that “... ANNs are thus a parallel, distributed information processing structure consisting of processing elements interconnected via unidirectional signal channels called connection weights.” Artificial neural networks are considered to be a data-driven nonparametric method, because they can learn how to map input to output only from the data without apriori

4. Artificial neural networks

23

assumptions about statistical distribution of the dataset. This capability is the most important advantage over linear regression, but there are also costs such as significantly larger computational requirements and much less straightforward (or according to some authors almost impossible) interpretation. First artificial neuron was created by Warren McCulloch and Walter Pitts in 1943, first perceptron 1 was created by Frank Rosenblatt in 1957. ANNs have been gaining popularity since mid 1980’s, when various researchers and research groups had applied artificial neural networks successfully in many areas such as pattern recognition, optimization problems, financial modeling, manufacturing, etc. (Sarker et al. 2006)

4.1

Architecture of neural networks

Artificial neural networks try to simulate connections of input and output signals in human brain. Before starting the explanation and application of a neural network itself, some key facts about functionality of a neuron in human neural system should be summarized, because exactly those features represent basic building blocks of artificial neural networks. Simplified scheme of two neurons and their interaction is provided on Figure 4.1.

Figure 4.1: Neuron and synapses (Source: http://www.helcohi.com/sse/body/nervous.html)


First, input signals are received by synapses, to which dendrites of adjacent neurons are connected. After receiving the signals, the neuron sums them up and decides whether the sum exceeds some threshold value. If and only if the answer is positive, the neuron creates and sends voltage to the following neuron using an axon. The key logic of input and output signal processing remains unchanged in the case of artificial neural networks. An ANN also includes neurons, input and output signals, threshold values and interconnections among single neurons. The following figure shows examples of the two basic types of artificial neural networks.

Figure 4.2: Architecture of feed-forward (a) and feedback (b) neural network (Source: Sarker et al. 2006)

There is a large number of possible arrangements of an ANN; however, there is one basic feature dividing all the architectures into two main subgroups. The basic classification criterion is the direction of transmission of the signal. We distinguish:

(a) Feed-forward ANN, where the signal flows in a single direction, from input neurons to output neurons, without any cycling.

(b) Feedback ANN, which also includes signal transmissions in the opposite direction.

Financial applications of artificial neural networks are dominated by the feed-forward design.


Kaastra & Boyd (1996) published a manual for artificial neural network design and the suggested procedure is widely applied in the literature. The authors propose eight steps for the successful design and implementation of an ANN:

Step 1: Proper variable selection - an informed selection based on a theoretical model depending on the problem to be solved
Step 2: Data collection - neural networks, as a data-driven method, rely on data quality even more than standard methods
Step 3: Data preprocessing - standardization and scaling of the data eases training of the network
Step 4: Training, testing and validation sets - the network learns on the training data and is tested on the testing subset; the validation set is dedicated to validation of the final selected setting of the network (the process will be described further in the text)
Step 5: Network architecture - number of hidden layers, number of hidden neurons, activation function
Step 6: Evaluation criteria - mean squared error, sum of squared errors, etc.
Step 7: Neural network training - training algorithm (such as backpropagation or genetic algorithms, introduced further in the text)
Step 8: Implementation

4.1.1 Neurons

It is probably clear from the previous text that the neuron (in the literature also called a processing element or unit) is the most basic element of a network. Each neuron consists of:

1. inputs x1, x2, ..., xn
2. weights w1, w2, ..., wn corresponding to each input xi
3. bias b
4. activation function f(x)
5. output y


The value of the output y is determined uniquely and is given by the functional value of the weighted sum of the inputs and the bias; mathematically:

$$ y = f\left(\sum_{i=1}^{n} x_i \cdot w_i + b\right) \qquad (4.1) $$
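The computation in Equation 4.1 can be illustrated with a few lines of code. The following is a minimal sketch (not code from the thesis); the inputs, weights, bias and logistic activation are arbitrary illustrative values.

```python
import numpy as np

def neuron_output(x, w, b, activation=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Forward pass of a single neuron: activation of the weighted input plus bias."""
    return activation(np.dot(x, w) + b)

# Illustrative values only
x = np.array([0.5, -1.2, 3.0])   # inputs x1..xn
w = np.array([0.8, 0.1, -0.4])   # weights w1..wn
b = 0.2                          # bias
print(neuron_output(x, w, b))    # output y from Equation 4.1
```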

As Sarker et al. (2006) summarize, there are three basic features defining a particular artificial neural network:

1. computational process, i.e. the activation function
2. network architecture, i.e. feed-forward vs. feedback, number of layers, number of neurons in each layer, etc.
3. learning algorithm, i.e. the algorithm used for training the network in order to find the set of weights resulting in an output as close to the global optimum as possible

4.1.2 Activation functions

There are several types of activation functions, each of them with its unique specifics. Practically all of the activation functions have one common feature: they represent a relationship which maps the weighted input (see Equation 4.1) to the interval ⟨−1; 1⟩ or ⟨0; 1⟩ (Sarker et al. 2006). According to Zhang et al. (1998) and Sarker et al. (2006), there are five activation functions most frequently found in the literature:

• Logistic sigmoid function: $f(x) = (1 + \exp(-x))^{-1}$
• Hyperbolic tangent sigmoid function: $f(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
• Sine or cosine function: $f(x) = \sin(x)$ or $f(x) = \cos(x)$
• Gaussian function: $f(x) = \exp\left(\frac{-x^2}{2\sigma^2}\right)$
• Linear function: $f(x) = x$
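For reference, the five activation functions listed above can be written down directly. The following sketch is illustrative only (not from the thesis) and assumes a fixed σ = 1 for the Gaussian.

```python
import numpy as np

def logistic(x):              # maps to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh_sigmoid(x):          # maps to (-1, 1)
    return np.tanh(x)

def sine(x):
    return np.sin(x)

def gaussian(x, sigma=1.0):   # assumed sigma = 1 for illustration
    return np.exp(-x**2 / (2.0 * sigma**2))

def linear(x):
    return x

z = np.linspace(-3, 3, 7)
print(logistic(z), tanh_sigmoid(z), gaussian(z), sep="\n")
```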

The logistic sigmoid is the most widely used activation function in the literature (Zhang et al. 1998). Kaastra & Boyd (1996) also point out that the sigmoid function is the most popular in financial applications because of its favourable properties, including the nonlinearity and differentiability useful for the training stage.

4.1.3 Layers

As mentioned in the previous text, there are basically three types of layers present in an artificial neural network. First, there is one input layer. The number of neurons in the input layer is determined ad hoc according to the concrete problem to be solved. The input layer can be seen as the set of inputs or independent variables. In order to build an efficient neural network, the researcher should make an informed selection of independent variables, and it must be well reasoned in theory (Kaastra & Boyd 1996).

Second, a neural network can include one or more hidden layers with various counts of hidden neurons. The hidden layer is essential for modeling non-linear relationships in the data. An adequate number of hidden neurons enables the network to approximate any continuous function. However, it is not true that a "richer" hidden part means a better fit to non-linear problems. In fact, a large number of hidden layers and neurons results in overfitting and excessive computation time (Kaastra & Boyd 1996). There have been many attempts to find a rule for determining the proper number of hidden layers and hidden neurons; a literature review on this topic is provided in Sarker et al. (2006). The author also points out that "a network with fewer than required number of hidden units will be unable to learn the input-output mapping, whereas too many hidden units will generalize poorly of any unseen data". In order to keep a model parsimonious, all the reviewed literature on neural network design agrees that an architecture with one or two hidden layers is optimal (Kaastra & Boyd 1996).

Finally, output neurons represent dependent variables. In theory, we can include more than one output neuron in the network, but it results in more problems than benefits. The basic mechanism of neural networks is that weights are optimized in such a way that some error function (a measure of the difference between the real output of the network and the desired output) gets minimized. For example, in the case of forecasting we might be interested in, let us say, a five-steps-ahead forecast. Therefore, it might seem useful to have five output neurons in the architecture. The suboptimality of this design stems from the nature of the optimization: the network tries to minimize the sum of error measures over all output neurons, which probably leads to a larger deviation of each output from its desired counterpart. Therefore, best practice says that the one-output-neuron approach is more appropriate.

4.2 Learning algorithms

The performance of a network is determined by measures of deviation between the network output and the desired output, i.e. the value of the dependent variable. Zhang et al. (1998) summarize the five measures most frequently used in the literature:

• mean absolute deviation: $MAD = \frac{\sum |e_i|}{N}$
• sum of squared errors: $SSE = \sum e_i^2$
• mean squared error: $MSE = \frac{\sum e_i^2}{N}$
• root mean squared error: $RMSE = \sqrt{MSE}$
• mean absolute percentage error: $MAPE = \frac{1}{N} \sum \left|\frac{e_i}{y_i}\right| \cdot 100$

where $e_i$ is an error, $y_i$ is the desired (actual) value and $N$ is the number of errors.
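These measures are simple to compute; the following sketch (illustrative, not from the thesis) implements them with NumPy, taking vectors of actual and predicted values.

```python
import numpy as np

def error_measures(y_actual, y_pred):
    """Return the five error measures listed above as a dictionary."""
    y_actual = np.asarray(y_actual, dtype=float)
    e = y_actual - np.asarray(y_pred, dtype=float)   # errors e_i
    n = e.size
    mad = np.sum(np.abs(e)) / n
    sse = np.sum(e**2)
    mse = sse / n
    rmse = np.sqrt(mse)
    mape = np.sum(np.abs(e / y_actual)) / n * 100.0
    return {"MAD": mad, "SSE": sse, "MSE": mse, "RMSE": rmse, "MAPE": mape}

print(error_measures([100.0, 102.0, 98.0], [101.0, 101.5, 97.0]))
```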

4.2.1 Data splitting

In order to train an artificial neural network properly and avoid overfitting, a researcher has to decide how to split the data. Data are usually split into two subsets - in-sample and out-of-sample data. The learning process is typically based on cross-validation, where the in-sample data are used for the proper setting of the network and are further split into training and testing subsets. Neural network specifications are set on the training subset and tested on the testing subset. The out-of-sample dataset is used to evaluate the performance of the final model with the selected specifications. There are many studies trying to find an optimal ratio for splitting the data (see Zhang (2012)). In the literature, the most frequently used in-sample to out-of-sample ratios are 70:30 %, 80:20 % or 90:10 %. In the case of non-time-series data, the dataset should be split randomly (which is not appropriate for time series, where autocorrelation patterns must be kept).
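For time series, the split therefore has to be chronological rather than random. A minimal sketch of such a split follows; the 80:20 in-sample/out-of-sample ratio and the further 75:25 training/testing split are illustrative choices, not the thesis settings.

```python
import numpy as np

def chronological_split(series, out_of_sample_ratio=0.2, test_ratio=0.25):
    """Split a time series into training, testing and out-of-sample parts
    without shuffling, so that autocorrelation patterns are preserved."""
    series = np.asarray(series)
    n = len(series)
    split_oos = int(n * (1.0 - out_of_sample_ratio))
    in_sample, out_of_sample = series[:split_oos], series[split_oos:]
    split_test = int(len(in_sample) * (1.0 - test_ratio))
    train, test = in_sample[:split_test], in_sample[split_test:]
    return train, test, out_of_sample

train, test, oos = chronological_split(np.arange(100))
print(len(train), len(test), len(oos))  # 60 20 20
```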

4.2.2 Backpropagation

Sarker et al. (2006) claim that 95% of neural network applications use backpropagation as the training algorithm. There are various subtypes of the backpropagation algorithm, two of which will be presented here.

The gradient descent backpropagation algorithm is most often used in the ANN literature (Ghaffari et al. 2006). Its procedure was proposed by Rumelhart et al. (1985) and consists of measuring the output error of the network, calculating its gradient and adapting the vector of weights according to the descending gradient. There are various modifications of gradient descent backpropagation, differing in the frequency of weight adjustment (once per iteration, after processing the whole learning dataset, etc.). Weights are adjusted using the following equation

$$ \Delta w_t = -\eta \nabla err(w)\big|_{w = w(t)} + \alpha \Delta w_{t-1} \qquad (4.2) $$

where $w$ is the vector of weights, $\eta$ is the learning rate, $\alpha$ is the momentum factor, and $err$ is the cost function. The speed of training using this type of algorithm depends heavily on the user's choice of the learning rate and momentum parameters (Sarker et al. 2006).

Compared to the gradient descent algorithm, which works only with the gradient of the error surface, the Levenberg-Marquardt algorithm also reflects the curvature of the surface. This modification improves the weight adjustment and makes convergence to the minimum of the cost function faster. However, both algorithms are endangered by getting stuck in a local minimum.
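The update rule in Equation 4.2 can be sketched in a few lines. The following is an illustrative gradient descent step with momentum applied to an arbitrary quadratic cost function; the learning rate and momentum values are assumptions, not the thesis settings.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """One weight update according to Equation 4.2:
    delta_w_t = -eta * grad(err)(w) + alpha * delta_w_{t-1}."""
    delta = -eta * grad(w) + alpha * prev_delta
    return w + delta, delta

# Illustrative cost err(w) = ||w||^2 with gradient 2w
grad = lambda w: 2.0 * w
w = np.array([1.0, -2.0])
delta = np.zeros_like(w)
for _ in range(50):
    w, delta = momentum_step(w, grad, delta)
print(w)  # converges towards the minimum at the origin
```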

4.2.3 Genetic algorithm

Alternatively, an artificial neural network can be trained using an algorithm imitating natural selection. The basic idea is to select the best individual from a population, which changes over time according to a pre-specified fitness function (in the ANN setting, the difference between the target value and the output value in some functional form, e.g. SSE or MSE). Each individual in the population has an n-bit chromosome. For the search for the vector of optimal weights, it is appropriate to define the chromosome as an n-dimensional vector of weights (the chromosome includes all the weights from the input and hidden layers). Training using a genetic algorithm can be split into the following steps (a code sketch of the whole loop is given at the end of this subsection):

1. Randomly generate a pre-specified number of individuals in the population.
2. Compute each individual's fitness function and remember the best individual (in the ANN setting, the one with the lowest value of the fitness function defined as some measure of error).
3. Generate offspring of the current population. First, select the individuals who become parents, using some rule reflecting their fitness function (i.e. a better value of the fitness function increases the probability of selecting a particular individual; various algorithms such as roulette wheel selection or tournament selection are used for selecting the individuals who pass their genetic information to the next generation).
4. Use the crossover genetic operator, which means that two selected individuals transmit part of their own genetic information to the offspring.
5. Mutate the offspring with a pre-defined probability of mutation and add the resulting individuals to the new population.
6. Remove the old population and replace it by the resulting population.
7. Go to Step 2.

During a pre-specified number of epochs the best individual is always remembered and the fitness function of all new individuals is compared to the best one. If a better fit is found, the best individual, i.e. the vector of optimal weights, is replaced. Sexton et al. (1998) point out that the speed of convergence favours backpropagation algorithms. However, genetic algorithms perform better in finding the global minimum and are less vulnerable to getting stuck in a local optimum, because, compared to backpropagation algorithms, genetic algorithms search in multiple directions at once, which increases the probability of finding the global minimum (Sexton et al. 1998).
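As a compact illustration of the steps above, the following sketch evolves a population of weight vectors to minimize an arbitrary SSE-style fitness function. All parameters (population size, mutation rate, number of generations) are illustrative assumptions, and tournament selection is used in Step 3.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(w):
    """Illustrative fitness: SSE between a weight vector and a fixed target."""
    target = np.array([0.3, -0.7, 1.2])
    return np.sum((w - target) ** 2)

def tournament(pop, scores, k=3):
    """Tournament selection: the fittest of k randomly drawn individuals."""
    idx = rng.choice(len(pop), size=k, replace=False)
    return pop[idx[np.argmin(scores[idx])]]

def evolve(dim=3, pop_size=30, generations=200, mut_rate=0.1):
    pop = rng.normal(size=(pop_size, dim))        # Step 1: random initial population
    best = min(pop, key=fitness)                  # Step 2: remember the best individual
    for _ in range(generations):
        scores = np.array([fitness(w) for w in pop])
        children = []
        for _ in range(pop_size):
            p1 = tournament(pop, scores)          # Step 3: select parents
            p2 = tournament(pop, scores)
            cut = rng.integers(1, dim)
            child = np.concatenate([p1[:cut], p2[cut:]])               # Step 4: crossover
            mutate = rng.random(dim) < mut_rate
            child[mutate] += rng.normal(scale=0.3, size=mutate.sum())  # Step 5: mutation
            children.append(child)
        pop = np.array(children)                  # Step 6: replace the old population
        generation_best = min(pop, key=fitness)   # Step 7: compare with the best so far
        if fitness(generation_best) < fitness(best):
            best = generation_best
    return best

print(evolve())  # should approach the target vector [0.3, -0.7, 1.2]
```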

4.3 Advantages of artificial neural networks

As White (1992) compares, neural networks are in fact a concept very similar to least squares solutions of regression problems. If we remove the hidden layers from a feed-forward network and choose the linear function as the activation function, we obtain essentially a linear regression model. Inputs represent independent variables; the output is just another word for the dependent variable. Weights could also be called coefficients, and the bias plays the role of an intercept. As Kaastra & Boyd (1996) point out, if we allow for one hidden layer, we obtain a model resembling non-linear least squares regression, where the weights determine the regression curve parameters. If we set the inputs to be lagged values of the dependent variable, we obtain in principle a model equivalent to a nonlinear autoregressive model. As will be demonstrated on a concrete application in the empirical part of this thesis, this setting is often useful for time-series forecasting, where the exact nature of the relationship between lagged values and the current value is uncertain.

According to Bahrammirzaee (2010), artificial neural networks benefit mainly from requiring no assumptions about the statistical distribution of the data, which allows them to be applied to a wider range of problems than standard statistical methods. Further, the author points out that the adaptability of artificial neural networks allows new observations to be added to the dataset and trained on without reprocessing the original data in order to update the model. Neural networks also do not require a concrete prior idea about the nature of the relationship among variables (linear, quadratic, etc.) - these interactions get modeled in the hidden layers. Therefore, as Zhang (2012) also concludes, ANNs are much less vulnerable to the misspecification problem than parametric methods. Aljinović & Poklepović (2013) highlight the ability of artificial neural networks to deal with missing, erroneous or noisy data; in such cases ANNs easily outperform standard statistical methods (Sarker et al. 2006).

4.4 Criticism and limitations of artificial neural networks

In spite of their significant benefits, artificial neural networks are still not a mainstream method for the above mentioned problems. Kaastra & Boyd (1996) and many others see their major drawback in their black-box character. What happens in the hidden layers is often unknown and very difficult to interpret. Compared to, for example, linear regression, no interpretation of coefficients as a measure of the influence of a particular independent variable on the dependent one is feasible. Another important aspect to be aware of is the computing time of the training stage, which is in fact the heart of the neural network process. Even if a researcher sacrifices the time to train the network, another large amount of time has to be dedicated to the proper setting of a large number of parameters (such as the number of hidden layers and the number of contained neurons, etc.). Unfortunately, there are no procedures for the right setting and parameter selection of the network; this has to be done simply by trial and error. Therefore, many opponents claim that proper setting is driven by chance and luck. In fact, there are multiple rules of thumb and best practices, but the literature on neural network design agrees that the most important factor is patience. However, even if the researcher deals with the network setting and finds an efficient model giving satisfactory results, there is a problem in replicating the solution (Kaastra & Boyd 1996). Neural networks are also vulnerable to overfitting (which occurs in the case of an excessive ratio of input neurons to sample size: it results in a "perfect" fit to the data whose output has no implications and obstructs generalization, as the network "remembers" single data points rather than general patterns in the data) and to deadlock in a local optimum, which can be avoided mainly by the proper choice and implementation of the learning algorithm.

4.5 Artificial neural networks and finance

Artificial neural networks are a very appealing concept in finance. Their indisputable benefits, which are especially attractive for financial market applications, have drawn financial institutions to support research on this topic. As Trippi & Turban (1992) say, financial institutions have been the second largest sponsor of research activities focusing on neural networks. Sarker et al. (2006) summarize the most frequent and notable applications of artificial neural networks in finance:

• derivatives pricing, trading and forecasting
• future price estimation
• stock performance and portfolio selection
• foreign exchange rate prediction
• bankruptcy forecasting
• detection of fraud

Bahrammirzaee (2010) points out that today's financial applications exhibit non-linear and uncertain behaviour changing over time. Such an increasing need for solutions to non-linear and time-variant tasks gives neural networks an opportunity to show results superior to the standardly used statistical methods.

Recall that the core task of this thesis is to model and forecast the term structure of crude oil markets using alternative approaches - concretely artificial neural networks. After the description of the theory and practice of the crude oil market, Chapter 3 proceeded to a detailed review of possible approaches to this type of task and the Nelson-Siegel model was selected for further research. I have also pronounced my intention to challenge standard forecasting methods by a nonparametric approach. The last chapter of the theoretical part of this thesis focused on the explanation of and prerequisites for the proper usage of artificial neural networks. Having defined the market environment, theoretical approaches and special tools, we can proceed to the actual empirical part of this thesis in the following chapters.

Chapter 5

Research methodology

5.1 Data

I have chosen two types of crude oil futures prices data, which will be presented later in the text. Recall that WTI futures contracts are traded in US dollars and cents per barrel. The contract unit is 1000 barrels and the minimum fluctuation is 1 cent per barrel. A contract expires three trading days prior to the 25th calendar day in the month preceding the month of delivery.¹ Understanding the concrete structure of the data is crucial for further estimation of the models. I have chosen to work both with time series of prices of concrete historical contracts (monthly frequency) and with front contracts (daily frequency) data; however, each of the datasets has a different purpose in the analysis.²

First, I will present the historical contracts dataset. I have analyzed in total 396 historical (already delivered) and to-date undelivered contracts - 12 contracts per year with delivery months in the period starting in 1984. The undelivered contracts represented in the dataset are the contracts with delivery in November and December 2014 and 24 contracts with delivery in the two subsequent years 2015 and 2016. I work with the monthly (end-of-month) frequency of WTI futures prices for all 396 contracts. Naturally, higher frequency data are also available, but there are multiple reasons why I am going to work with monthly data in the actual modeling.

¹ Further specification of WTI futures contracts is available at http://www.cmegroup.com/trading/energy/files/en-153_wti_brochure_sr.pdf
² Time series have been downloaded from the online database at https://www.quandl.com/c/futures/cme-wti-crude-oil-futures.


First, the vast majority of the yield curve modeling literature employs monthly data. Second, I want to use data covering multiple decades of WTI futures trading in order to capture the widest variety of term structure shapes. Higher frequency data would yield extensive datasets implying cumbersome data manipulation, computational complexity, and excessive processing time. Furthermore, the reviewed literature indicates that high frequency data do not outperform monthly data in oil price predictions (Baumeister et al. 2013).

As mentioned earlier in the text, older observations (concretely, end-of-month observations in the period 1984 to 1990) suffer from a low number of contracts traded at once. To preserve the predicative value of the data, I removed the months where fewer than 10 different contracts (i.e. 10 different numbers of days to maturity) were traded. Moreover, the maximum time to maturity for these contracts is about 9 months. The situation improves in the 1990s and 2000s, where the maximum time to maturity reaches even more than 6 years. The problem comes again in the most recent contracts (mainly the to-date undelivered contracts), which again show a lower maximum time to maturity. These significant differences in maximum time to maturity in particular periods of the dataset make modeling of the far end of the term structure unfeasible.

In principle, there are three options for how to proceed further in the analysis. First, I might give up modeling longer term structures and use data for the longest period available. To be accurate, I would have to determine the maximum time to maturity for each observed point in time, i.e. the time to maturity of the most distant contract traded on the particular day, and consequently find the minimum of these values over the whole period. This minimum value would represent the furthermost end of the axis representing time to maturity. At each observed point in time, contracts with a longer time to maturity than the above defined threshold would have to be removed from the dataset, which implies a significant reduction of the original dataset.³

If I insist on modeling a longer term structure (I have decided to model the term structure up to 720 days to maturity; throughout the analysis I use the simplified convention 1 month = 30 days and 1 year = 360 days), there are two other possibilities for how to proceed. First, I decide on the maximum time to maturity to be analyzed (720 days) and exclude observations at points in time where the time to maturity of the most distant contract does not exceed 720 days.

³ For example, if in January 1985 the most distant contract expires in 300 days and on all other days the most distant contract has a longer expiry, I have to pick 300 days as the maximum time to maturity analyzed and drop all contracts with a longer time to maturity.


There is also another option, which is inferior to all the preceding ones: I could keep all the observations and simply extrapolate the futures prices up to the desired maximum time to maturity. Such an approach introduces many risks, as extrapolation is generally not recommended with regard to maintaining accuracy and the actual substance of the data. Therefore, I have decided to exclude all observations up to 1990 in order to avoid the risks and inaccuracy of data extrapolation and to make modeling of longer term structures legitimate.

Before proceeding to the organization of the data, I would like to point out my motivation for the decisions regarding my dataset. The benefits of the methods which I am going to employ in my analysis, in comparison to other approaches to yield curve modeling, are mainly associated with their ability to model various types and shapes of term structure curves - increasing, decreasing, humped, etc. In my opinion, the broadest palette of term structure shapes is best observable over a long time period - in our case observations collected over the past 30 years, which is the longest time period available for WTI futures data. On the other hand, a longer term structure (in terms of a broader range of days-to-maturity values) is a both theoretically and practically appealing area of research. With respect to the time-to-maturity vs. length-of-period trade-off, I have decided to keep the maximum number of points in time and the maximum time to maturity feasible in my dataset.

The front contracts dataset experiences far fewer issues compared to the historical contracts. In this thesis, front contracts are not used for the actual term structure modeling; they are used only for verification of the stylized facts about the yield curve as formulated in Diebold & Li (2006) and of their validity for WTI futures prices data. In order not to confuse the reader, their description is presented in the relevant Section 5.2.

5.1.1 Data organization

This and the following sections focus on the historical contracts dataset used for the actual term structure modeling. As follows from the definition of the term structure stated above, I essentially need two sets of observations - futures prices and days to maturity.


As far as crude oil futures (and commodity futures in general) are concerned, the data collection process is in general much less straightforward than in the case of interest rates. Table 5.1 provides examples of actual data to better illustrate the features and dimensions of the dataset and also the general logic of futures data. Publicly available WTI futures contracts data consist of observations of open, high, low and settle futures prices, traded volume and previous day open interest. Time to maturity cannot be downloaded compactly together with the price and volume observations. Therefore, it is inevitable to work with the concrete specifics of WTI crude oil futures contracts. As mentioned above in the text, contracts mature three business days prior to the 25th calendar day of the month preceding the month of delivery. For example, the CLQ2003 contract (CL is the CME product code for the WTI futures contract, Q symbolizes delivery in August and 2003 stands for the delivery year 2003) matures on July 22nd, 2003. In order to associate each observation of a futures price with the corresponding time to maturity, it is necessary first to find the exact expiry date of each contract. Then, the difference between the expiry date and the date of observation gives the remaining days to maturity.
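The expiry-date rule and the resulting days-to-maturity computation can be sketched as follows. This is an illustrative approximation only: the helper names are hypothetical, exchange holidays are ignored (only weekends are excluded), and the count is therefore only roughly comparable to the τ values in Table 5.1.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import BDay

# CME month codes appearing in contract names such as CLQ2003 (Q = August)
MONTH_CODES = {"F": 1, "G": 2, "H": 3, "J": 4, "K": 5, "M": 6,
               "N": 7, "Q": 8, "U": 9, "V": 10, "X": 11, "Z": 12}

def expiry_date(contract):
    """Approximate expiry: three business days before the 25th calendar day of the
    month preceding the delivery month (exchange holidays ignored)."""
    month, year = MONTH_CODES[contract[2]], int(contract[3:])
    day_25 = (pd.Timestamp(year=year, month=month, day=1)
              - pd.DateOffset(months=1) + pd.DateOffset(days=24))
    return day_25 - 3 * BDay()

def days_to_maturity(contract, observation_date):
    """Weekday count between the observation date and the contract expiry."""
    return int(np.busday_count(pd.Timestamp(observation_date).date(),
                               expiry_date(contract).date()))

print(expiry_date("CLQ2003").date())              # 2003-07-22, as in the text
print(days_to_maturity("CLQ2003", "2001-02-28"))  # roughly the tau shown in Table 5.1
```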

| Date      | ... | CLQ2003 Settle | τ   | CLU2003 Settle | τ   | CLV2003 Settle | τ   | ... |
|-----------|-----|----------------|-----|----------------|-----|----------------|-----|-----|
| 28.2.2001 | ... | 21,72          | 625 | 21,62          | 646 |                |     | ... |
| 31.3.2001 | ... | 22,76          | 602 | 22,70          | 623 |                |     | ... |
| 30.4.2001 | ... | 23,46          | 582 | 23,35          | 603 |                |     | ... |
| 31.5.2001 | ... | 23,57          | 559 | 23,45          | 580 | 23,33          | 603 | ... |

Table 5.1: Example of actual monthly data

Table 5.1 captures end-of-month futures prices of three different (in this case consecutive) contracts with delivery in August, September and October 2003. For example, at the end of February 2001, the CLQ2003 and CLU2003 contracts were traded: on February 28, 2001 it was possible to enter into a contract with delivery in August 2003 at a futures price of USD 21,72 per barrel. The respective time to maturity (τ) was 625 trading days.

5.1.2 Data adjustment

After computation of the days to maturity for each observed quotation of the futures price, the dataset has the form of a matrix with the number of rows equal to the number of observed days and the number of columns equal to the number of analyzed contracts. As far as term structure modeling and forecasting is concerned, Hansen & Lunde (2013) summarize that there are basically two ways to work with futures prices data; the chosen way also determines the optimal organization of the researcher's dataset. First, it is possible to work with the time series of futures prices and days to maturity of concrete contracts and employ autoregression. Such an approach requires no reorganization of the dataset explained above. However, the dataset resembles a "diagonal" matrix and a decision must be made about the dates where a particular contract had not yet been, or was no longer, traded. The alternative approach works with time series of futures prices associated with the same time to maturity. This method requires a reorganization of the data inspired by Diebold & Li (2006). The desired form of the dataset is a matrix with the number of rows equal to the number of days included in the analysis and the number of columns equal to the number of analyzed maturities. See the example of the reorganized dataset in Table 5.2.

| Date      | τ = 30 | 60    | 90    | 120   | 150   | 180   | 210   | ... |
|-----------|--------|-------|-------|-------|-------|-------|-------|-----|
| 28.2.2001 | 27,48  | 27,36 | 26,99 | 26,60 | 26,21 | 25,84 | 25,48 | ... |
| 31.3.2001 | 26,50  | 26,59 | 26,43 | 26,20 | 25,94 | 25,68 | 25,43 | ... |
| 30.4.2001 | 28,74  | 28,89 | 28,53 | 28,07 | 27,60 | 27,19 | 26,78 | ... |
| 31.5.2001 | 28,49  | 28,42 | 28,14 | 27,78 | 27,38 | 27,00 | 26,59 | ... |

Table 5.2: Example of reorganized data to constant time to maturity (columns are days to maturity τ)

The time series captured in Table 5.2 are generally not observable. Furthermore, WTI crude oil futures are delivered and expire with one-month regularity. Therefore, futures prices with exactly 30, 60, or 90 days to maturity are not traded every day, and the dataset must be artificially completed using interpolation.

Cubic splines interpolation method

The form of the dataset presented in Table 5.2 is referred to as constant-maturity futures prices. Table 5.3 shows futures prices and the corresponding times to maturity for May 31, 2000. As we can see, no contract with exactly 30 or 90 days to maturity was traded. In order to get the futures price corresponding to 30 or 90 days to maturity, interpolation will be applied.

| Date      | CLN2000 Settle | τ  | CLQ2000 Settle | τ  | CLU2000 Settle | τ  | CLV2000 Settle | τ  | CLX2000 Settle | τ   | ... |
|-----------|----------------|----|----------------|----|----------------|----|----------------|----|----------------|-----|-----|
| 31.5.2000 | 28,61          | 15 | 29,01          | 37 | 28,42          | 60 | 27,89          | 81 | 27,4           | 103 | ... |

Table 5.3: Traded contracts on May 31, 2000

There are various interpolation methods used in the financial markets literature. Diebold & Li (2006) use linear interpolation for the constant maturity transformation; Holton (2003) prefers cubic splines interpolation. For a detailed discussion of interpolation methods for curve construction with applications to yield curve modeling see Hagan & West (2006). I have decided to interpolate my data by cubic splines according to the algorithm presented in Holton (2003). He defines a cubic spline as a function f : R → R constructed by splicing cubic polynomial functions over a set of intervals such that:

$$ f(x) = \begin{cases} p_1(x) & x_{(1)} \le x < x_{(2)} \\ p_2(x) & x_{(2)} \le x < x_{(3)} \\ \vdots & \\ p_{n-1}(x) & x_{(n-1)} \le x < x_{(n)} \end{cases} \qquad (5.1) $$

Consider two points in a plane - the i-th and (i+1)-th known pairs $(x_{(i)}, y_{(i)})$ and $(x_{(i+1)}, y_{(i+1)})$. To be concrete, let us make i equal to 1. Using the data from Table 5.3, we have (15; 28,61) and (37; 29,01). The conditions of the cubic spline construction are the following:

1. Each polynomial goes through the endpoints of the spliced intervals:

$$ p(x_{(i)}) = y_{(i)} \;\wedge\; p(x_{(i+1)}) = y_{(i+1)} \qquad (5.2) $$


2. The first and second derivatives match at the points inside the intervals:

$$ \frac{d}{dx} p_i(x_{(i+1)}) = \frac{d}{dx} p_{i+1}(x_{(i+1)}) \;\wedge\; \frac{d^2}{dx^2} p_i(x_{(i+1)}) = \frac{d^2}{dx^2} p_{i+1}(x_{(i+1)}) \qquad (5.3) $$

3. The second derivatives at the endpoints are zero:

$$ \frac{d^2}{dx^2} p_1(x_{(1)}) = \frac{d^2}{dx^2} p_{m-1}(x_{(m)}) = 0 \qquad (5.4) $$

The presented restrictions result in a system of linear equations. Equations of the following form are solved:

$$ p_i(x) = \alpha_i x^3 + \beta_i x^2 + \gamma_i x + \delta_i \qquad (5.5) $$

where $i \in \langle 0, m \rangle$ and m is the number of control points known prior to the interpolation. The system is solved for $(\alpha_i, \beta_i, \gamma_i, \delta_i)$, $i \in \langle 0, m \rangle$.
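In practice the system above does not have to be assembled by hand; a natural cubic spline (second derivatives equal to zero at the endpoints, as in condition 3) is available in standard libraries. A minimal sketch, assuming SciPy's CubicSpline with natural boundary conditions is an acceptable stand-in for the Holton (2003) construction, using the quotes from Table 5.3:

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Observed (time to maturity, settle price) pairs for May 31, 2000 (Table 5.3)
tau_observed = np.array([15, 37, 60, 81, 103])
price_observed = np.array([28.61, 29.01, 28.42, 27.89, 27.40])

# Natural cubic spline: second derivative equal to zero at both endpoints
spline = CubicSpline(tau_observed, price_observed, bc_type="natural")

# Prices at constant maturities on the grid used in the thesis (30, 60, 90 days shown)
constant_maturities = np.array([30, 60, 90])
print(spline(constant_maturities))
```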

5.1.3 Final dataset

Before proceeding to the modeling of the term structure, a short illustration and description of the final adjusted data further used for the actual research is appropriate. The following figures show the development of the term structure of WTI futures prices over time. I have decided to present the term structure in four separate figures, which enables better observation of its behaviour over the different periods.

Figure 5.1: WTI futures prices term structure: 1990 - 2004

The term structure development in the period 1990 to 2004 is plotted in Figure 5.1. In the 1990s we can observe relatively steady development with a slight shift downwards at the time of the Asian crisis and lower demand. However, after 2000 the term structure development changes its dynamics and many horizontal shifts upward are observable. The literature speaks about the energy crisis of the 2000s; however, there is no clear consensus regarding its triggers. Some authors attribute the increase in futures prices to speculators and a sudden shrinkage of oil reserves, other researchers disprove their arguments (for a detailed discussion of the arguments of both sides see Kilian & Murphy (2014) or Fattouh et al. (2012)). In the period between 2005 and 2009, the relatively still series of term structures is interrupted by turbulences starting in 2008.

Figure 5.2: WTI futures prices term structure: 2005 - 2009

Before the turmoil, we can observe steady development of the term structure with a decent increase of futures prices across all maturities and without visible jumps. Let us inspect the in many ways turbulent period of 2008 and 2009. Immediately at the beginning of the period, crude oil futures prices exceeded USD 100 per barrel. Military conflicts in Nigeria (including attacks on oil pipelines) pushed the prices up even more. Tension between Iran and Israel and the consequent fear of an oil crisis like the one known from the 1970s accelerated a further rise to unprecedented levels. Political unrest in the Middle East and a sharp depreciation of the U.S. dollar resulted in frequent and significant horizontal shifts of the term structure. A USD 11 per barrel jump in futures prices within 24 hours pulled the term structure to values around USD 140 per barrel.⁶ The global financial crisis returned the WTI term structure back under USD 100 per barrel. The most recent data in Figure 5.3 exhibit a horizontal shift of the term structure upwards at the end of 2009, caused again by the complicated political environment in the Middle East - this time the conflicts in the Gaza Strip.

⁶ New York Times, June 8, 2008: http://www.nytimes.com/2008/06/07/business/07oil.htmlem&ex=1212897600&en=3877f96aee464d82&ei=5087%0A&_r=0


Figure 5.3: WTI futures prices term structure: 2008 - 2009

The most recent five-year period exhibits many shapes of the term structure.

Figure 5.4: WTI futures prices term structure: 2010 - 2014

Increasing, decreasing and humped curves are observable. Obviously, the WTI futures term structure experienced a horizontal shift upward in 2011. As I have mentioned earlier in the text, this can be attributed both to the political unrest in Egypt and Libya and to the weak U.S. dollar. Another steep shift upwards in 2012 also had a political reason - concretely, the danger of Iran closing the Strait of Hormuz in answer to the sanctions against Iran's nuclear programme. The U.S. Energy Information Administration claims that approximately 20% of worldwide traded crude oil passes through the Strait.⁷


The Greek bailout and the Chinese economy stimulated by an increased money supply also contributed to the rise of crude oil prices, reflected also in the term structure.

⁷ See the official EIA report on world oil transit chokepoints at http://www.eia.gov/countries/analysisbriefs/World_Oil_Transit_Chokepoints/wotc.pdf.

5.2 Stylized facts about term structure

As mentioned earlier in the text, I will try to model the term structure of crude oil prices by means broadly used for modeling the yield curves of bonds. Naturally, crude oil futures data exhibit many differences, but we suggest that, on the whole, the crude oil term structure has the same (or very similar) features as the standard yield curve of government bonds. Diebold & Li (2006) present five stylized facts about government bond yield curves. First, I present their list and explanation; further in the text I try to demonstrate that crude oil monthly futures prices data are in line with the stylized facts as presented by Diebold & Li (2006) for yields to maturity. This conclusion is a justification⁸ for the use of the dynamic Nelson-Siegel approach as the benchmark model for further estimations. The stylized facts presented by the authors are the following:

1. The average yield curve (in terms of the average values of the estimated coefficients in their model) is increasing and concave.
2. The yield curve takes various shapes through time - upward or downward sloping, humped, and inverted humped.
3. Yield dynamics are persistent, while the dynamics of spreads are much less persistent.
4. The "near" end is much more volatile than the "far" end of the yield curve.
5. Long rates are more persistent than short rates.

In order to have the most demonstrative and conclusive answers, I examine the stylized facts on the longest WTI futures time series available - over 30 years of daily observations.

⁸ Verification of the stylized facts for the case of crude oil futures prices is not a prerequisite for proper usage of the Nelson-Siegel model.


The first observations were made on March 30, 1984 and the last on November 17, 2014. In contrast to the actual modeling of the term structure by the Nelson-Siegel model, the above formulated stylized facts about yield curves will be confirmed (or rejected) using non-adjusted daily WTI front futures contracts data. The number of front contracts per day is limited to the 60th nearest contract. The roll-over day is the first day of the month of delivery. I will proceed similarly to Hansen & Lunde (2013), who divided the dataset into 100-day clusters (for example, the first cluster included contracts with time to delivery up to 100 days, the second contracts with delivery from 100 to 200 days, etc.). In order to achieve the most meaningful results, I decided to employ smaller clusters with a period of 30 days. Such a clustering takes advantage of the front contracts' characteristics, because, in general, the time to delivery of the k-th nearest contract falls within the interval

$$ (30(k-1),\, 30k\rangle, \quad k \in \mathbb{Z}^{+} $$

Therefore, each front contract per day in the sample falls within the respective cluster: the first nearest contract into the first cluster, etc. Such reasoning allows for a proper verification of the stylized facts. It must be noted that fulfillment of the stylized facts about the term structure is not a prerequisite for a valid application of the dynamic Nelson-Siegel approach to term structure modeling. A high-level verification of the stylized facts is therefore done only visually, in order to present the interesting similarity between the term structure of government bonds and crude oil futures.

Fulfillment of stylized fact 1) is illustrated in Figure 5.5. The average term structure is indeed increasing (or at least non-decreasing). The concavity is not very obvious; however, it cannot be immediately rejected. As I have demonstrated in the preceding section, the crude oil term structure is vulnerable to political decisions and conflicts and it often changes not only in the sense of horizontal shifts, but also in its actual shape. Its ability to exhibit a wide variety of shapes is shown in Figure 5.6. I have chosen four days with different shapes of the analyzed curve. At the end of November 1990 we can observe a smooth decreasing term structure. In May 1999 the curve does not show any smoothness and its behaviour is very unclear.


Figure 5.5: Average term structure over the period 1984 - 2014

The bottom left panel shows a nicely increasing curve and the most recent example proves the presence of humped curves in the data as well.

Figure 5.6: Term structure shapes

Stylized fact 4) can be checked via an analysis of the standard deviation of prices for different maturities over the observed period. In spite of the fact that the shape of the standard deviation is humped, the lower volatility of the long end of the term structure is still observable. A high standard deviation for medium maturities was also detected by Diebold & Li (2006) in the case of the government bonds yield curve and by Hansen & Lunde (2013) for the crude oil term structure. The remaining two stylized facts - 3) and 5) - speak about persistence, the former about the higher persistence of yield dynamics over spread dynamics.


Figure 5.7: Standard deviation of futures prices for different maturities

To remind the reader once again, the stylized facts were formulated for the yield curve of government bonds. However, as we have shown by verification of the other stylized facts, the yield curve and the futures prices term structure exhibit common features. Therefore, it is acceptable to slightly reformulate the statement and replace "yield" by "futures price". Similar reasoning applies to the latter fact, where we again replace "rates" by "futures prices". Verifying both propositions is unfortunately not so straightforward and cannot be performed by a simple statistical analysis of the raw data.

5.3 Dynamic Nelson-Siegel model

As the starting point for my forecasting task I am going to use the dynamic version of the Nelson-Siegel model as formulated by Diebold & Li (2006). The choice of a Nelson-Siegel type of model is motivated by various aspects. First, other classes of models, such as no-arbitrage or affine general equilibrium models, fail in forecasting. As Sarker et al. (2006) point out, no-arbitrage models focus on the cross-sectional fit of the yield curve at a particular point in time, which implies a lack of capturing the yield curve dynamics by the model. Affine models do capture the time-series dynamics, but omit a proper cross-sectional fit at a given time; still, their forecasting results are very poor. Second, the functional specification of yield curves provided by Nelson & Siegel (1987) is able to model the diverse shapes observable on markets, such as increasing, decreasing or humped curves. Third, the model provides intuitive parameters, which are straightforward to explain and interpret. Further, Bliss (1996) has shown that the Nelson-Siegel model outperforms other methods in yield curve estimation. And finally, Diebold & Li (2006) proved the Nelson-Siegel model to be able to replicate the stylized facts about the yield curve. On the contrary, Duffie & Kan (1996) concluded that yield curves estimated by affine general equilibrium models, such as Vasicek or CIR, do not conform to the behaviour observed on markets.

Diebold & Li (2006) succeeded in yield curve forecasting using the time series of the coefficients on the three yield curve components formulated in the Nelson-Siegel model. Compared to previous research on term structure forecasting, they do not apply no-arbitrage or equilibrium approaches. Instead, they forecast U.S. government bond yields by predicting the individual exponential factors of the yield curve as defined by the multifactor model of Nelson & Siegel (1987). Diebold & Li (2006) described the functional form of the yield curve as follows:

$$ y_t(\tau) = \beta_{0t} + \beta_{1t} \left( \frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau} \right) + \beta_{2t} \left( \frac{1 - e^{-\lambda_t \tau}}{\lambda_t \tau} - e^{-\lambda_t \tau} \right) \qquad (5.6) $$

where $y_t(\tau)$ is the yield at time t on a bond with time to maturity τ, and $\beta_{0t}$, $\beta_{1t}$, $\beta_{2t}$ are interpreted as the coefficients on the level, slope and curvature factors, respectively. Recall that the level factor is the long-term component, as the values of its loading are constant over the whole period and over maturities. The slope factor is the short-term component, as it decays exponentially at rate $\lambda_t$. Finally, the curvature factor is referred to as the medium-term component, because its loading increases for medium-term maturities and then decays for the longest maturities. As I have stated earlier in the text, I will model prices instead of yields using the Nelson-Siegel approach. Therefore, in the case of my WTI crude oil futures dataset, T = 289 is the number of monthly observations, t = 1, 2, ..., T, and τ = 30, 60, 90, ..., 720 is the time to maturity measured in days.
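Equation 5.6 is easy to evaluate once the three factor loadings are written down explicitly. The following is an illustrative helper (not code from the thesis) that returns the loading matrix for a vector of maturities and evaluates the curve for given factor values; the β values in the example are arbitrary.

```python
import numpy as np

def ns_loadings(tau, lam):
    """Nelson-Siegel factor loadings for maturities tau (in days) and decay lambda.
    Columns: level (constant 1), slope, curvature - see Equation 5.6."""
    tau = np.asarray(tau, dtype=float)
    slope = (1.0 - np.exp(-lam * tau)) / (lam * tau)
    curvature = slope - np.exp(-lam * tau)
    return np.column_stack([np.ones_like(tau), slope, curvature])

def ns_curve(tau, beta, lam):
    """Evaluate the Nelson-Siegel curve for factor values beta = (b0, b1, b2)."""
    return ns_loadings(tau, lam) @ np.asarray(beta)

maturities = np.arange(30, 750, 30)                               # 30, 60, ..., 720 days
print(ns_curve(maturities, beta=[90.0, -5.0, 3.0], lam=0.0058))   # illustrative betas
```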

5.3.1 Decay parameter λt

The most controversial element of the Nelson-Siegel class of models is the parameter $\lambda_t$ determining the exponential decay (some literature uses its alternative form $\frac{1}{\tau}$). Low values of the parameter imply a slower decay of the resulting curve and vice versa. Empirically, the choice of the value of $\lambda_t$ represents a trade-off between fitting the close and the far end of the term structure: higher values of the parameter result in a better fit of the functional form for short maturities, whereas lower values improve the fit for longer maturities (Diebold & Li 2006). If we analyze the equation mathematically, we find out that $\lambda_t$ also defines the maturity at which the loading on the medium-term curvature factor ($\beta_{2t}$) is maximized.

Moreover, the handling of $\lambda_t$ governs the actual nature of the above defined relationship. If we allow $\lambda_t$ to evolve dynamically over time, we obtain a non-linear problem, which is computationally much more demanding than a linear one. In order to obtain a linear relationship, it is useful to fix $\lambda_t$. If the value is fixed, it is possible to consider the explanatory variables as three latent factors determining the dependent variable linearly. This raises the question of the proper choice of the fixed value. In the literature four approaches are observable. First, some authors try to estimate $\lambda_t$ ex ante using assumptions regarding time to maturity. They often presume that a 2-year or 3-year time to maturity is commonly considered as a medium-term maturity on bond markets. For example, Diebold & Li (2006) set 30 months as the medium-term maturity for government bonds. Consequently, they solve the following maximization problem:

$$ \lambda_{fixed} = \underset{\lambda}{\operatorname{argmax}} \left( \frac{1 - e^{-\lambda \tau}}{\lambda \tau} - e^{-\lambda \tau} \right) \qquad (5.7) $$

where τ is the author's defined medium-term maturity, in Diebold's case 30 months. According to my review of the yield curve modeling literature, this approach is very common. A probably even more common approach is simply to adopt a fixed value of λ (with more or less elaborate reasoning) from another paper, usually precisely from Diebold & Li (2006). The majority of the papers which I have reviewed model the U.S. yield curve and use a very similar dataset; in that case the simple adoption of the λ value is acceptable.⁹ However, both of the above described methods of fixing the decay parameter are infeasible in my case. The literature on the crude oil term structure does not provide any well reasoned suggestions about medium-term maturities on oil markets. Moreover, there is almost no reference for the proper choice of λ, as very little literature focuses on modeling the term structure of crude oil markets using Nelson-Siegel family models.

⁹ Moreover, a researcher must be aware of the fact that the value of λ is by its logic sensitive to the units used for time to maturity, i.e. maturity measured in months implies manifold higher values of λ, while maturity measured in days is associated with values much closer to 0.
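The maximization in Equation 5.7 has to be solved numerically. A minimal sketch, assuming SciPy's scalar minimizer applied to the negative curvature loading; the medium-term maturity of 173 days and the search bounds are chosen here only for illustration:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def curvature_loading(lam, tau):
    """Loading on the curvature factor, as in Equations 5.6 and 5.7."""
    return (1.0 - np.exp(-lam * tau)) / (lam * tau) - np.exp(-lam * tau)

def lambda_fixed(tau_medium):
    """Solve Equation 5.7: lambda maximizing the curvature loading at tau_medium."""
    result = minimize_scalar(lambda lam: -curvature_loading(lam, tau_medium),
                             bounds=(1e-5, 1.0), method="bounded")
    return result.x

print(lambda_fixed(173))   # decay parameter implied by a 173-day medium maturity
```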


The third approach to the search for a proper λ employs non-linear least squares estimation of all four parameters in Equation 5.6, i.e. $\beta_{0t}$, $\beta_{1t}$, $\beta_{2t}$ and $\lambda_t$, for each t in the interval ⟨1, 289⟩.¹⁰ However, there are multiple issues. It is true that by allowing for a dynamic development of λ I obtained a very good fit, but the resulting four time series of coefficients completely lose their predicative power. Vela (2013) made a similar observation: he analyzed Latin American yield curves and found out that, compared to U.S. yield curves, the optimal $\lambda_t$ is very unstable with unexpected jumps, implying a significantly worse performance of the forecasting models. The optimal values of $\lambda_t$ for each t are plotted in Figure 5.8.

Figure 5.8: Time series of optimal λt

The optimal lambda was found as the minimizing argument of the sum of squared errors of the Nelson-Siegel approximations of the real WTI futures term structure for each observed point in time. To ease the optimization, I restricted its values to correspond to maturities between 0 and 1000 days. Recall that λ determines the reciprocal of the number of days to maturity at which the medium-term (i.e. curvature) factor is maximized and, therefore, a search for the optimal $1/\lambda_t$ outside this interval - let me call the respective set of allowable values of $\lambda_t$ Θ - is meaningless.¹¹ We can observe that $\lambda_t$ is very unstable and is really used to minimize the errors in the approximations of the term structure.

¹⁰ Let me just remark that, from a purely mathematical point of view, three traded contracts (three different values of time to maturity) would be sufficient for the estimation.
¹¹ To set a maturity exceeding 1000 days as the medium maturity is highly improbable, as the longest maturity observed on the WTI futures market is ca. 1800 days and such long-term contracts were traded only on a small portion of the observed days. Trivially, negative values of time to maturity are not allowed.


Obviously, there is also no pattern observable in its series. A similar lack of pattern is observable also in the corresponding values of the coefficients $\beta_{0t}$, $\beta_{1t}$ and $\beta_{2t}$. Consequently, allowing for a dynamic $\lambda_t$ makes successful predictions hardly possible. Therefore, I have decided to find a single optimal value of λ minimizing the sum of squared errors of the Nelson-Siegel approximation of the WTI term structure over the whole period. Formally,

$$ \lambda^{*} = \underset{\lambda \in \Theta}{\operatorname{argmin}} \sum_{t=1}^{289} \sum_{i=1}^{24} \left( y_t(\tau_i) - \hat{y}_t(\tau_i, \beta_t, \lambda) \right)^2 \qquad (5.8) $$

where 289 is the total number of observed points in time and 24 is the number of analyzed constant maturities (from 30 to 720 days). The resulting value of λ* is 0,0058, implying a reciprocal value 1/λ* equal to 173,4551. The result of the optimization is in line with the reviewed literature: Hansen & Lunde (2013), who analyzed oil futures (although in a different period), arrived at a λ equal to 0,005. Even if I compare my result to the interpretation of 1/λ as the medium maturity, 173 days seems an acceptable result.¹²

¹² Recall that the maximum time to maturity in the observed period reached only less than 2000 days, i.e. approximately 6 years, which is much less than the 30 years in the case of the U.S. yield curve. In such a case the authors claim 2 - 3 years to be the medium maturity.
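The search in Equation 5.8 can be implemented as a simple profile search: for each candidate λ in Θ, fit the betas by OLS date by date and accumulate the squared errors. An illustrative sketch with synthetic prices follows; the grid of candidate λ values and the data are placeholders, not the thesis inputs.

```python
import numpy as np

def ns_loadings(tau, lam):
    slope = (1.0 - np.exp(-lam * tau)) / (lam * tau)
    return np.column_stack([np.ones_like(tau), slope, slope - np.exp(-lam * tau)])

def total_sse(prices, tau, lam):
    """Sum over all dates of the squared OLS residuals for a given lambda (Eq. 5.8)."""
    X = ns_loadings(tau, lam)
    sse = 0.0
    for p in prices:                       # one row of prices per observed date
        beta, *_ = np.linalg.lstsq(X, p, rcond=None)
        sse += np.sum((p - X @ beta) ** 2)
    return sse

tau = np.arange(30.0, 750.0, 30.0)         # 24 constant maturities
rng = np.random.default_rng(1)
prices = 90.0 + rng.normal(scale=0.5, size=(289, tau.size))   # placeholder data

grid = np.linspace(1.0 / 1000.0, 1.0 / 30.0, 200)             # candidate lambdas (assumed grid)
lam_star = min(grid, key=lambda lam: total_sse(prices, tau, lam))
print(lam_star)
```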

5.3.2 In-sample fit: Ordinary least squares

Once the optimal λ* has been found according to the algorithm presented above, I can proceed to the actual in-sample estimation of the β coefficients on the latent factors as formulated by Nelson & Siegel (1987). Figure 5.9 presents the loadings of the factors as a function of time to maturity. From Equation 5.6 it is obvious that the loading on β0, i.e. the level parameter, is constant for all maturities and equal to 1. Compared to the other two loadings, it does not decay for longer maturities. In other words, β0 has an impact on futures prices at all maturities (even for an infinitely long maturity). Therefore, a change in the level factor means a horizontal shift of the term structure, as it does not depend on time to maturity and thus affects prices at all maturities in the same way. The loading on the slope factor decreases to zero as maturity goes to infinity. For zero time to maturity it is equal to 1 (please note that Figure 5.9 plots maturities starting from 30 days). Compared to the curvature factor, the slope loading is higher for shorter maturities, which confirms β1 to be a rather short-term factor, i.e. one affecting much more the prices associated with shorter maturities.


Figure 5.9: Loadings of Nelson-Siegel latent factors of term structure

On the contrary, the curvature loading goes to zero for time to maturity close to zero; then it increases and afterwards decreases, also converging to zero as maturity goes to infinity. β2 has the highest loadings for medium maturities. Moreover, it achieves its maximum at a time to maturity equal to 1/λ.

Just recall that after fixing λ, the non-linear in-sample fitting problem becomes linear and we can estimate the coefficients by ordinary least squares. In each OLS model we have 24 observations corresponding to the 24 analyzed constant maturities. Equation 5.9 formalizes the problem as

$$ \min_{\beta_0, \beta_1, \beta_2} \sum_{i=1}^{24} \left[ p_t(\tau_i) - \beta_0 - \beta_1 \left( \frac{1 - e^{-\lambda \tau_i}}{\lambda \tau_i} \right) - \beta_2 \left( \frac{1 - e^{-\lambda \tau_i}}{\lambda \tau_i} - e^{-\lambda \tau_i} \right) \right]^2 \qquad (5.9) $$

where $p_t(\tau_i)$ is the WTI futures price at time t with time to maturity $\tau_i$. This procedure results in time series of the three β coefficients, i.e. 289 values for each β. Statistics of the estimated coefficients are presented in Appendix A.

The time series of the β coefficients are plotted in Figure 5.10. At first glance the behaviour of β0 - the level factor - attracts attention: its time series exhibits similar characteristics to the WTI spot price development (see Figure 2.1). The increasing level factor over the whole observed period corresponds to the general increase of crude oil prices. The slope and curvature coefficients seem to be in general more unstable.
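For completeness, the per-date OLS estimation in Equation 5.9 amounts to regressing each cross-section of prices on the fixed loading matrix. An illustrative sketch (synthetic prices again; λ* = 0,0058 is taken from the result reported above, all other values are placeholders):

```python
import numpy as np

LAMBDA_STAR = 0.0058                      # fixed decay parameter from Equation 5.8

def ns_loadings(tau, lam):
    slope = (1.0 - np.exp(-lam * tau)) / (lam * tau)
    return np.column_stack([np.ones_like(tau), slope, slope - np.exp(-lam * tau)])

def estimate_betas(prices, tau, lam=LAMBDA_STAR):
    """OLS estimates of (beta0, beta1, beta2) for every date; prices is T x 24."""
    X = ns_loadings(tau, lam)
    betas, *_ = np.linalg.lstsq(X, np.asarray(prices).T, rcond=None)
    return betas.T                        # T x 3 matrix of factor time series

tau = np.arange(30.0, 750.0, 30.0)
rng = np.random.default_rng(2)
prices = 90.0 + rng.normal(scale=0.5, size=(289, tau.size))   # placeholder data
print(estimate_betas(prices, tau)[:3])    # first three rows of the beta time series
```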


As far as the slope factor is concerned, it is quite steady in the first part of the sample, i.e. until 1999. Afterwards, until 2008, the slope factor is predominantly positive, meaning that the resulting term structure was downward sloping. After 2008, the slope coefficient jumps to large negative values and remains negative for the following two years, implying an upward sloping term structure. The most recent period, from 2011 on, is characterized by positive values of β1 and a decreasing term structure. The observed trends of the estimated β coefficients are in line with the findings of Hansen & Lunde (2013).

Figure 5.10: Values of β coefficients from 1990 to 2014

The time series of the level factor seems to be non-stationary and therefore it is appropriate to inspect the sample autocorrelation functions of the β coefficients plotted in Figure 5.11. For β0 the sample autocorrelation is significant up to the 80th lag, which suggests the possibility of a unit root in the series. The behaviour of the sample autocorrelation functions is very similar to that examined by Diebold & Li (2006).


Figure 5.11: Sample autocorrelation functions of level, slope and curvature coefficients

Results of the performed ADF test with the corresponding p-values are provided in Table 5.4. The null hypothesis of a unit root was not rejected for the level parameter β0. In the case of the slope and curvature coefficients, the null hypothesis is rejected even at the 1% level of significance. A unit root in the β coefficients is not rare in the literature applying the Nelson-Siegel model; Diebold & Li (2006) detected a possible unit root both in the level and in the slope factor.

|                    | β0    | β1    | β2    |
|--------------------|-------|-------|-------|
| Unit root rejected | 0     | 1     | 1     |
| p-value            | 0,799 | 0,001 | 0,001 |

Table 5.4: Results of the Augmented Dickey-Fuller test

Further in the text, autoregressive and vector-autoregressive models are applied as benchmarks to forecasting using neural networks. Therefore, a researcher must be aware of the estimation and forecasting issues caused by a unit root. A proper discussion of the unit root implications for AR and VAR models is provided in Section 5.4.

In order to illustrate the fit of the dynamic Nelson-Siegel model, I will use the term structures of the same days as in Figure 5.6. It can be observed that the well-behaved term structures are fitted with a high degree of accuracy.


On the contrary, the non-smooth term structure of May 1999 is approximated not so successfully. Referring to the problems of other term structure models (described in Chapter 3), it must be granted to the Nelson-Siegel model that it can approximate all of the usual shapes of the term structure - increasing, decreasing or humped. The approximation of the WTI futures term structure confirms the conclusions made by Diebold & Li (2006): they also proved a very good performance of the Nelson-Siegel model in the case of smooth functions taking various shapes, but in the case of yield curves with multiple local extremes the approximations were very inaccurate. A term structure with multiple humps might be fitted better by the model proposed by Svensson (1994), which was briefly presented in Chapter 3. Generally, the Nelson-Siegel model provides a very accurate fit except for multi-humped and unsmooth term structures, which are rather rare in the sample.

Figure 5.12: Fitted term structures by Dynamic Nelson-Siegel model

5.4 Forecasting

Recall that futures prices, as combinations of the three latent factors formulated by Nelson & Siegel (1987) for interest rates, have the following representation

$$ P_t = X_t \beta_t + \epsilon_t \qquad (5.10) $$

where $P_t$ is the vector of futures prices for each constant time to maturity, $X_t$ is the matrix of factor loadings, $\beta_t$ is the vector of coefficients and $\epsilon_t$ is an error vector following the standard normal distribution.


Therefore, the task of forecasting the future term structure is the same as for the government bond yield curve forecasting performed by Diebold & Li (2006). In their setting, forecasting of the future yield curve is done by forecasting the individual β coefficients. Diebold & Li employ parametric methods - AR(1) and VAR(1) with a random walk model as a benchmark - and they conclude that the forecasts are most accurate using the AR(1) model. This thesis provides a comparison of forecasts using parametric models (AR and VAR) and artificial neural networks. Not only the results are presented, but also the motivation and potential threats of each approach are discussed.

5.4.1 Parametric models

Autoregressive model

First, the β coefficients are predicted using an AR(1) model. Similarly to Diebold & Li (2006), the forecasted futures price with forecast horizon h will be specified as

$$ \hat{p}_{t+h}(\tau) = \hat{\beta}_{0,t+h} + \hat{\beta}_{1,t+h} \left( \frac{1 - e^{-\lambda \tau}}{\lambda \tau} \right) + \hat{\beta}_{2,t+h} \left( \frac{1 - e^{-\lambda \tau}}{\lambda \tau} - e^{-\lambda \tau} \right) \qquad (5.11) $$

where βˆi,t+h = cˆi + γˆi βˆit , i = 1, 2, 3. Coefficients cˆi and γˆi are obtained by regression of βˆi,t on βˆi,t−h and an intercept. However, as tested above one of the time series to be forecasted using AR model might have unit root. Consider general example of AR(1) model: yt = φyt−1 + t

(5.12)

Generally, the presence of a unit root has two implications with offsetting impacts. As far as the consistency of the estimator is concerned, the least squares estimator of the AR model is said to be superconsistent. This means that if φ = 1, the least squares estimate φ_LS converges to its true value (i.e. to 1 in the unit root case) as the sample grows faster than in the covariance stationary case |φ| < 1. The second implication of unit root presence is the Dickey-Fuller bias of the estimator. The least squares estimator is biased downwards, i.e. φ_LS < φ, and the larger the true value of φ, the larger the bias. This means that in the case of φ = 1, i.e. a unit root, the bias is worst. Moreover, the bias gets even larger if an intercept or time trend is included (Diebold 2014).


Superconsistency causes the bias to vanish more quickly as the sample grows, but in smaller samples it can still result in poor forecasts. Diebold & Li (2006) and later works replicating their research often omit a discussion of the threats posed by the unit root detected in the β coefficient time series. However, the sample of Diebold & Li consists of roughly 180 observations, which, with regard to their conclusion about the superiority of AR(1) in yield curve forecasting, seems to be sufficient.

Vector autoregressive model

Benchmark term structure forecasts for comparison with the neural networks will also be produced using a vector autoregressive model. Forecasts will again be generated analogously to Diebold & Li (2006) using the following equation

p̂_{t+h}(τ) = β̂_{0,t+h} + β̂_{1,t+h} ((1 − e^{−λτ})/(λτ)) + β̂_{2,t+h} ((1 − e^{−λτ})/(λτ) − e^{−λτ})    (5.13)

where, contrary to the AR(1) model, β̂_{t+h} = ĉ + Γ̂ β̂_t. Compared to the autoregressive model, the issues implied by the potential presence of a unit root in one of the series are not so severe here. However, VAR models suffer from other problems. First, unrestricted VAR models generally perform quite poorly in forecasting tasks (Diebold 2014). The poor performance is caused mainly by the danger of overparametrization due to the large number of parameters to be estimated. Further, in the concrete case of yield curve modeling, Diebold & Li (2006) concluded that the AR(1) model also significantly outperforms VAR(1). However, it must be noted that Diebold & Li detected only small cross correlations among the β coefficient time series. I have found significant cross correlation between the level and curvature factors and also between the slope and curvature factors (see Figure A.1 in Appendix A), which might promise better performance of the VAR model than in the analysis by Diebold & Li.
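A minimal MATLAB sketch of the direct h-step AR(1) forecasts of the β coefficients and the implied term-structure forecast of Equation 5.11 is given below; the horizon, maturity grid, λ value and placeholder factor series are assumptions, and the VAR counterpart would simply replace the three univariate regressions with one regression of the β vector on its h-th lag.

    % Illustrative sketch of the direct h-step AR(1) forecasts behind Equation 5.11.
    beta   = randn(289, 3);                       % placeholder for the estimated [beta0 beta1 beta2] series
    h      = 21;                                  % assumed horizon (roughly one month of trading days)
    T      = size(beta, 1);
    beta_f = zeros(1, 3);
    for i = 1:3
        y = beta(h+1:T, i);                       % beta_{i,t}
        x = [ones(T-h, 1), beta(1:T-h, i)];       % intercept and beta_{i,t-h}
        b = x \ y;                                % [c_i; gamma_i] by least squares
        beta_f(i) = b(1) + b(2) * beta(T, i);     % h-step-ahead forecast of beta_i
    end
    tau    = (30:30:720)';                        % constant maturities in days (assumption)
    lambda = 0.0609;                              % assumed decay parameter
    X = [ones(size(tau)), ...
         (1 - exp(-lambda*tau))./(lambda*tau), ...
         (1 - exp(-lambda*tau))./(lambda*tau) - exp(-lambda*tau)];
    p_forecast = X * beta_f';                     % forecasted futures prices for horizon h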

5.4.2 Dynamic neural network

As mentioned earlier in the text, artificial neural networks are also suitable for time series forecasting tasks. AR and VAR models assume that the underlying series are generated by a linear process. This is not the case for ANN, as they do not require any assumptions about the statistical properties of the underlying series for their proper application.


Recall that neural networks are so-called data-driven methods capable of learning from the data relationships which are a priori hard to define. In other words, if there is doubt about the functional relationship between the explanatory and explained variables and the relationship may be non-linear, which is considerably more probable than linearity of the examined relationship (Zhang et al. 1998), it makes sense to employ nonparametric methods such as artificial neural networks.

An ANN is used for forecasting the β coefficient time series. This task performs the following functional mapping

β̂_t = f(β_{t−1}, β_{t−2}, ..., β_{t−k}, W) + ε_t    (5.14)

where k is the furthest lag included (the choice of k is properly discussed later in the text), W is the k-dimensional vector of weights and f is the function defining the relationship of the input and output nodes using the respective weights. As presented earlier in the text, opinions about ANN are diverse, which also applies to the concrete case of forecasting using ANN. Hill et al. (1996) concluded that artificial neural networks outperform standard statistical approaches when applied to lower-frequency data (monthly and quarterly). Other authors reject their findings (see Vela (2013) for a review). However, Hamzaçebi et al. (2009) conclude that the best method for time series forecasting depends significantly on the concrete underlying dataset. Generally, time series forecasting using artificial neural networks should follow this procedure:

1. Inspect the underlying time series, plot the autocorrelation function and detect the number of lags correlated with the current value.

2. Split the data into in-sample training data and out-of-sample testing data.

3. Take k significant lags as input nodes and the current value as the target value for the output node, and include some hidden nodes (rather fewer than more; see the discussion of the hidden layer in Chapter 4).


4. Create training patterns from the data of the training set. The first training pattern consists of the first k values as inputs and the (k + 1)-th value as the target value. The second pattern is formed by shifting the preceding pattern one observation forward, and so on. The last pattern has the last value of the training set as its target value and the preceding k values as inputs (a minimal sketch of this windowing is given after this list).

5. Train the network according to the specified criteria.

6. Test the trained network on the out-of-sample data. Feed k consecutive values of the testing set and predict the (k + 1)-th.

7. Compute and report forecast errors.
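The windowing in step 4 could be implemented in MATLAB as in the following sketch; the series and the number of lags k are placeholders for illustration only.

    % Illustrative sketch of step 4: sliding-window training patterns from one series.
    series = randn(231, 1);              % placeholder standing in for an in-sample beta series
    k = 3;                               % assumed number of lags used as inputs
    n = numel(series) - k;               % number of training patterns
    inputs  = zeros(k, n);
    targets = zeros(1, n);
    for j = 1:n
        inputs(:, j) = series(j:j+k-1);  % k consecutive values as inputs
        targets(j)   = series(j+k);      % the (k+1)-th value as the target
    end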

There is a large body of literature focusing on financial time series forecasting by means of ANN. The majority of the literature found was published before 2000, which is, with regard to the development of ANN and the related software, somewhat outdated. These works employ static, predominantly feed-forward neural networks, which are universal across ANN tasks such as classification, pattern recognition and also forecasting. However, more advanced methods, better suited to time series forecasting, exist nowadays. One such method is the focused time-delay neural network (FTDNN), belonging to the family of dynamic neural networks.

Focused time-delay neural network

Next to the standard parametric models, the β coefficients will also be forecasted by a dynamic neural network, concretely by a focused time-delay neural network. DiPiazza et al. (2013) summarize the key advantages of dynamic neural networks for time series forecasting. DNN are capable of learning the dynamics of time series relationships more effectively than static feedforward networks. This is achieved by the fact that not only do current inputs impact the current output, but the output of a DNN also depends on previous inputs, outputs and network states. The FTDNN is referred to as the most straightforward dynamic neural network (Demuth et al. 2008). The structure of this type of network is shown in Figure 5.13. The network consists of a static feedforward network combined with a tapped delay line capturing the autoregressive property of the inspected series; the dynamic feature is implemented in the input layer. Figure 5.13 shows only the structure for a one-step-ahead forecast.


For multi-step-ahead forecasts, the outputs are connected back to the input layer and serve as additional inputs for the forecasts of further steps.

Source: DiPiazza et al. (2014)

Figure 5.13: Mechanism of focused time-delay neural network

Construction of the FTDNN

In Chapter 4, the procedure of neural network design recommended by Kaastra & Boyd (1996) was presented. First, variables have to be defined and data must be collected. In my task of forecasting three time series, the variables are the current and lagged values of the series to be forecasted, and the total dataset consists of the three time series of β coefficients estimated by the Nelson-Siegel model. Each β coefficient will be forecasted by a separate network. Further, the data are pre-processed. In my case, the only pre-processing consists of normalizing all three time series to the interval [−1, 1]. This mapping makes the training task easier and enables faster convergence to the global optimum (Demuth et al. 2008). In-sample (training and validation) and out-of-sample (testing) datasets have to be defined. In the case of the parametric models, the ratio 80:20 was used. Therefore, the splitting rule was determined as 60:20:20 (training:validation:testing).


The last 20% of the term structure observations will be used for the comparison of the forecasts made by the different models. The next step is setting the concrete structure, including the number of layers and corresponding nodes. The input layer consists of the k lags relevant for the forecast, where k can be determined by inspecting the respective sample autocorrelation function. In order to retain comparability of the forecast results with the AR(1) and VAR(1) models, I start by involving only one lag. As Kaastra & Boyd (1996) point out, involving multiple hidden layers does not generate any additional benefit. Therefore, only one hidden layer will be included. This single hidden layer will be rich and consist of up to 20 hidden neurons. As far as the optimal number of hidden neurons is concerned, there is no clear consensus in the relevant literature. Generally, the higher the number of hidden units, the higher the probability of convergence to the global optimum, but also the higher the risk of overparametrization, implying worse out-of-sample performance and excess computation time (Kaastra & Boyd 1996). It is recommended to find the optimal number of hidden neurons by trial and error. The output neuron is the l-step-ahead forecast of the particular β coefficient; 1-month, 3-month, 6-month and 12-month-ahead forecasts have been examined. The final decision about the network structure was made according to the Hannan-Quinn information criterion, as recommended by McNelis (2005):

hqic = ln[ (1/N) Σ_{t=1}^{N} (β_t − β̂_t)² ] + k ln(ln(N)) / N    (5.15)

The Hannan-Quinn criterion is a compromise between the Akaike and Schwarz information criteria: HQIC punishes networks with an excess number of parameters more than AIC but less than SIC. For the forecast evaluation itself, this thesis again follows McNelis (2005) and uses the most widely used measure, RMSE. As far as the transfer functions inside the network are concerned, a hyperbolic tangent sigmoid transfer function and a linear transfer function were used for the hidden and output layer, respectively. The network is trained with the Levenberg-Marquardt backpropagation algorithm. Training stops when one of the following criteria is reached, whichever comes first:


• 1000 epochs were reached.
• Target performance (mean squared error) of 1,00E-26 was achieved.
• The performance gradient is lower than 1,00E-10.
• µ is higher than 1,00E+10 (µ controls by how much the weights change in each iteration and regulates a suitable speed of convergence).
• The validation performance has increased more than 6 times in a row since the last time it decreased.

The final step in Kaastra's recommended procedure is the network implementation. The focused time-delay neural network with the specifications described above was implemented using the newfftd function from the Neural Network Toolbox of the MATLAB software.
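The construction described above could look roughly as follows in MATLAB; the function names and parameter fields follow the Neural Network Toolbox version cited in the text, the data are placeholders, and the exact configuration used for the reported results may differ.

    % Illustrative FTDNN setup sketch for one beta coefficient series.
    beta0 = randn(289, 1);                         % placeholder for the estimated level factor
    x = mapminmax(beta0');                         % normalize the series to [-1, 1]
    p = con2seq(x);                                % input sequence
    t = con2seq(x);                                % target sequence (same series, predicted from its lag)
    net = newfftd(p, t, [1], 20);                  % one input delay, 20 hidden neurons
    net.trainFcn = 'trainlm';                      % Levenberg-Marquardt backpropagation
    net.divideFcn = 'divideblock';                 % contiguous 60:20:20 split
    net.divideParam.trainRatio = 0.6;
    net.divideParam.valRatio   = 0.2;
    net.divideParam.testRatio  = 0.2;
    net.trainParam.epochs   = 1000;
    net.trainParam.goal     = 1e-26;               % target mean squared error
    net.trainParam.min_grad = 1e-10;               % minimum performance gradient
    net.trainParam.mu_max   = 1e10;                % maximum mu
    net.trainParam.max_fail = 6;                   % validation stop criterion
    [net, tr] = train(net, p, t);
    y = sim(net, p);                               % one-step-ahead network outputs
    e = cell2mat(t) - cell2mat(y);                 % in-sample errors
    N = numel(e);
    hqic = log(mean(e.^2)) + numel(getwb(net))*log(log(N))/N;   % Equation 5.15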

Chapter 6

Discussion of results

As already mentioned, forecasting was performed by four approaches: a focused time-delay neural network, AR(1), VAR(1) and random walk models. Both one-step-ahead and multi-step-ahead predictions have been inspected. The average forecast RMSE over all maturities for each horizon and model is summarized in Table 6.1. Bold figures indicate the lowest value for the given forecasting horizon. The lowest average RMSE was achieved by the FTDNN for every forecasting horizon. The second-best forecasting model is AR(1), which confirms the conclusions made by Diebold & Li (2006) about the superiority of the AR(1) model over other parametric models. The authors also reported poor results of the VAR(1) model, which is confirmed here as well; even the random walk model outperforms the unrestricted vector autoregression.

For completeness of the analysis, the forecast errors across all maturities were subjected to the Diebold-Mariano test. The test proposed by Diebold & Mariano (2002) examines the difference between the forecasts made by two competing models. The test has three inputs, namely two vectors of forecast errors and the forecasting horizon, and the null hypothesis is equivalence of the forecasts. The resulting test statistics are summarized in Table A.2 in Appendix A.

Tables A.3 and A.4 in Appendix A provide a summary of the forecast performance for individual maturities. For each model, RMSE decreases as time to maturity increases. This observation is in line with the stylized facts about the term structure, concretely that the near end of the term structure is much more volatile than the far end. Further, there is a general trend of increasing RMSE for more distant forecasts, which is also in line with what would be expected.
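As an illustration of how the Diebold-Mariano statistic reported in Table A.2 could be computed for an h-step horizon, the sketch below uses a squared-error loss differential and a truncated long-run variance estimate; the error vectors and horizon are placeholders, and this is not necessarily the exact routine behind the reported values.

    % Illustrative Diebold-Mariano test sketch (squared-error loss, h-step horizon).
    e1 = randn(58, 1); e2 = randn(58, 1);          % placeholders for two models' forecast errors
    h  = 1;                                        % assumed forecasting horizon
    d  = e1.^2 - e2.^2;                            % loss differential
    T  = numel(d);
    dbar   = mean(d);
    gamma0 = mean((d - dbar).^2);                  % variance of the loss differential
    lrv = gamma0;
    for k = 1:h-1                                  % add autocovariances up to lag h-1
        gk  = mean((d(1+k:T) - dbar) .* (d(1:T-k) - dbar));
        lrv = lrv + 2*gk;
    end
    dm = dbar / sqrt(lrv / T);                     % DM statistic, approximately N(0,1) under H0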

Note: the reported errors of the FTDNN are averages over 50 simulations.


Horizon        FTDNN    AR(1)    VAR(1)    RW
1 month        4,398    4,708    4,971     4,772
3 months       6,077    7,572    8,060     7,952
6 months       6,425    8,868    10,362    10,140
12 months      7,881    7,947    11,487    9,841

Table 6.1: Average RMSE across all constant maturities

As regards the comparison of the individual models' performance, in the case of the one-month-ahead forecast the FTDNN outperforms all other models for each maturity. The same conclusion applies to the 3-month-ahead forecast, where the difference in RMSE is even more visible, and an even stronger superiority of the FTDNN shows in the 6-month-ahead forecasts. This forecasting horizon exhibits a considerable increase in RMSE for the parametric models, especially for shorter maturities. However, in the case of the 1-year-ahead forecasts, the AR(1) model slightly outperforms the FTDNN in medium-term maturities (concretely in the range from 180 to 510 days). Further, an interesting observation can be made when comparing the performance of AR(1) and the random walk between the 6-month and 12-month predictions. For both models, the RMSE of the 6-month forecast is higher than that of the 12-month forecast for short maturities, while for longer maturities it is the other way around, which would generally be expected. The RMSE of the FTDNN forecasts increases monotonically with the forecasting horizon.

Recall that the reported figures are errors of the FTDNN with only one delayed input. Originally, the network was constructed according to the procedure presented in Section 5.4.2, which would suggest using 80 lagged values as inputs and yields extremely accurate forecasts. An overview of RMSE as a function of the number of lags involved in the network is provided in Table A.5 in Appendix A. Although including dozens of lagged values as network inputs results in almost zero errors, the final network includes only a single lag. Refraining from the structure reflecting all significant lags detected in the ACF has the following motivation. First, an excess number of network parameters is punished by the Hannan-Quinn information criterion used for model selection. Second, because this thesis uses the methodology of Diebold & Li (2006), who use AR(1) and VAR(1) models, as its benchmark, it would be “unfair” to compare a neural network with 80 lags to single-lag parametric models. Nevertheless, the FTDNN with only a single lagged value at its input on average outperformed all benchmark models.


The capability of the FTDNN to explain the variance in the target values of the individual β coefficients is documented in Figure A.2 in Appendix A. All R-values are above 90%, indicating that all the network outputs track the respective targets very well. Table A.4 summarizes the mean errors of the individual forecast models and maturities. This indicator makes it possible to inspect each model's propensity to over- or under-estimate the future term structure. The FTDNN generally slightly underestimates the future term structure except for the 12-month-ahead forecasts. On the contrary, the AR(1) model on average overestimates it, and the tendencies of the VAR and RW models are not so clearly interpretable. An example of a 1-month-ahead forecast is provided in Figure 6.1. Clearly, the FTDNN predicts the future term structure most accurately. In medium-term maturities, the forecasts by AR(1) and the FTDNN are very similar. However, the best forecast for the shortest maturity was made by the VAR(1) model, but its forecast error increases quickly with increasing time to maturity.

Figure 6.1: One-month-ahead forecast for 02/29/2014

Chapter 7

Conclusion

The purpose of this thesis was to investigate the properties of the term structure of crude oil markets and its forecasting using alternative approaches, concretely dynamic neural networks. The analysis of crude oil futures prices has shown that their term structure exhibits behaviour very similar to the government bond yield curve, as described by the stylized facts summarized in Diebold & Li (2006). The commodity term structure modeling task is conceptually the same as in the case of interest rates. Therefore, the literature has been reviewed in order to select a model that performs well in yield curve forecasting.

The term structure was modeled using the three-factor approach proposed by Nelson & Siegel (1987) and dynamically re-interpreted by Diebold & Li (2006). The results of the in-sample fit are very encouraging and prove the ability of the model to replicate the true behaviour of the term structure also in the case of crude oil markets.

Forecasting has been conducted using both parametric and non-parametric approaches. The performance of the models has been inspected at the 1-month, 3-month, 6-month and 12-month forecasting horizons. The AR(1) model surpassed the other parametric models, which again confirmed the findings in the relevant literature. However, the focused time-delay neural network generally outperformed all parametric models, except for the 12-month-ahead forecasts of medium-term maturities, which were dominated by the AR(1) model.

In the case of the FTDNN, the forecasting errors specified as RMSE have traceable patterns. For a fixed forecasting horizon, the deviation between the forecasted and


observed futures price decreases as time to maturity increases. Furthermore, for more distant forecast horizons the deviation on average increases, as expected.

In summary, this thesis has shown that the crude oil term structure can be successfully modeled by the parsimonious Nelson-Siegel model originally developed for interest rates. Moreover, neural networks surpassed parametric methods in the term structure forecasting task, which confirmed the core hypothesis of this thesis.

Bibliography

Aljinović, Z. & T. Poklepović (2013): “Neural networks and vector autoregressive model in forecasting yield curve.” In “ICIT 2013 The 6th International Conference on Information Technology.”

Bahrammirzaee, A. (2010): “A comparative survey of artificial intelligence applications in finance: artificial neural networks, expert system and hybrid intelligent systems.” Neural Computing and Applications 19(8): pp. 1165–1195.

Baumeister, C., P. Guérin, & L. Kilian (2013): “Do high-frequency financial data help forecast oil prices? The MIDAS touch at work.” Technical report, CFS Working Paper.

BIS (2005): Zero-coupon yield curves: technical documentation. Bank for International Settlements.

Björk, T. & B. J. Christensen (1999): “Interest rate dynamics and consistent forward rate curves.” Mathematical Finance 9(4): pp. 323–348.

Bliss, R. R. (1996): “Testing term structure estimation methods.” Technical report, Federal Reserve Bank of Atlanta.

Brennan, M. J. & E. S. Schwartz (1982): “An equilibrium model of bond pricing and a test of market efficiency.” Journal of Financial and Quantitative Analysis 17(03): pp. 301–329.

Brennan, M. J. & E. S. Schwartz (1985): “Evaluating natural resource investments.” Journal of Business 58(2): pp. 135–157.

Büyükşahin, B. & J. H. Harris (2011): “Do speculators drive crude oil futures prices?” Energy Journal 32(2): pp. 167–202.


Cortazar, G. & E. S. Schwartz (2003): “Implementing a stochastic model for oil futures prices.” Energy Economics 25(3): pp. 215–238. Cox, J. C., J. E. Ingersoll Jr, & S. A. Ross (1985): “A theory of the term structure of interest rates.” Econometrica: Journal of the Econometric Society 53: pp. 385–407. Demuth, H., M. Beale, & M. Hagan (2008): Neural network toolbox— 6 User’s Guide. The MathWorks, Inc. Diebold, F. X. (2014): Econometrics. University of Pennsylvania. Diebold, F. X. & C. Li (2006): “Forecasting the term structure of government bond yields.” Journal of econometrics 130(2): pp. 337–364. Diebold, F. X. & R. S. Mariano (2002): “Comparing predictive accuracy.” Journal of Business & economic statistics 20(1): pp. 134–144. DiPiazza, A., M. C. DiPiazza, & G. Vitale (2013): “Solar radiation estimate and forecasting by neural networks-based approach.” In “XIII SpanishPortuguese Conference on Electrical Engineering (XIII CHLIE), Valencia,” . DiPiazza, A., M. C. DiPiazza, & G. Vitale (2014): “Estimation and forecast of wind power generation by ftdnn and narx-net based models for energy management purpose in smart grids.” In “Renewable Energy and Power Quality Journal,” . Duffie, D. & R. Kan (1996): “A yield-factor model of interest rates.” Mathematical finance 6(4): pp. 379–406. Fattouh, B., L. Kilian, & L. Mahadeva (2012): “The role of speculation in oil markets: What have we learned so far?” . Gabillon, J. (1991): “The term structures of oil futures prices.” Oxford Institute for Energy Studies. Working paper . Ghaffari, A., H. Abdollahi, M. Khoshayand, I. S. Bozchalooi, A. Dadgar, & M. Rafiee-Tehrani (2006): “Performance comparison of neural network training algorithms in modeling of bimodal drug delivery.” International journal of pharmaceutics 327(1): pp. 126–138.


Hagan, P. S. & G. West (2006): “Interpolation methods for curve construction.” Applied Mathematical Finance 13(2): pp. 89–129.

Hamilton, J. D. (2009): “Causes and consequences of the oil shock of 2007-08.” Technical report, National Bureau of Economic Research.

Hamzaçebi, C., D. Akay, & F. Kutay (2009): “Comparison of direct and iterative artificial neural network forecast approaches in multi-periodic time series forecasting.” Expert Systems with Applications 36(2): pp. 3839–3844.

Hansen, N. S. & A. Lunde (2013): “Analyzing oil futures with a dynamic nelson-siegel model.” Creates research papers, School of Economics and Management, University of Aarhus.

Haubrich, J. G., P. Higgins, & J. Miller (2004): “Oil prices: backward to the future?” Economic Commentary (Dec).

Heath, D., R. Jarrow, & A. Morton (1992): “Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation.” Econometrica: Journal of the Econometric Society pp. 77–105.

Hill, T., M. O’Connor, & W. Remus (1996): “Neural network models for time series forecasts.” Management Science 42(7): pp. 1082–1092.

Ho, T. S. & S.-B. Lee (1986): “Term structure movements and pricing interest rate contingent claims.” The Journal of Finance 41(5): pp. 1011–1029.

Holton, G. A. (2003): Value-at-risk: theory and practice. Academic Press.

Hotelling, H. (1931): “The economics of exhaustible resources.” The Journal of Political Economy pp. 137–175.

Hull, J. & A. White (1990): “Pricing interest-rate-derivative securities.” Review of Financial Studies 3(4): pp. 573–592.

Itô, K. (1944): “Stochastic integral.” Proc. Imp. Acad. 20(8): pp. 519–524.

Kaastra, I. & M. Boyd (1996): “Designing a neural network for forecasting financial and economic time series.” Neurocomputing 10(3): pp. 215–236.

Kaldor, N. (1940): “A note on the theory of the forward market.” The Review of Economic Studies 7(3): pp. 196–201.


Keynes, J. M. (1930): A Treatise on Money: The Applied Theory of Money, Vol. 2. London: Macmillan. Kilian, L. & D. P. Murphy (2014): “The role of inventories and speculative trading in the global market for crude oil.” Journal of Applied Econometrics 29(3): pp. 454–478. Lautier, D. (2005): “Term structure models of commodity prices: a review.” Technical report, CEREG. Litzenberger, R. H. & N. Rabinowitz (1995): “Backwardation in oil futures markets: Theory and empirical evidence.” The journal of Finance 50(5): pp. 1517–1545. Longstaff, F. A. & E. S. Schwartz (1992): “Interest rate volatility and the term structure: A two-factor general equilibrium model.” The Journal of Finance 47(4): pp. 1259–1282. Martellini, L., P. Priaulet, & S. Priaulet (2003): Fixed-Income Securities: Valuation, Risk Management and Portfolio Strategies. The Wiley Finance Series. Wiley. McCulloch, J. H. (1975): “The tax-adjusted yield curve.” The Journal of Finance 30(3): pp. 811–830. McNelis, P. (2005): Neural Networks In Finance: Gaining Predictive Edge in the Market. Academic Press Advanced Finance Series. Elsevier Acad. Press. Merton, R. C. (1973): “Theory of rational option pricing.” The Bell Journal of economics and management science pp. 141–183. Moore, G. E. et al. (1965): “Cramming more components onto integrated circuits.” Nelson, C. R. & A. F. Siegel (1987): “Parsimonious modeling of yield curves.” Journal of business 60(4): pp. 473–489. Pan, H., I. Haidar, & S. Kulkarni (2009): “Daily prediction of short-term trends of crude oil prices using neural networks exploiting multimarket dynamics.” Frontiers of Computer Science in China 3(2): pp. 177–191. Pooter, M. D. (2007): “Examining the nelson-siegel class of term structure models.” Technical report, Tinbergen Institute Discussion Paper.


Rumelhart, D. E., G. E. Hinton, & R. J. Williams (1985): “Learning internal representations by error propagation.” Technical report, DTIC Document. Sarker, R. A., J. Kamruzzaman, & R. Begg (2006): Artificial Neural Networks in Finance and Manufacturing. IGI Global. Schwartz, E. S. (1997): “The stochastic behavior of commodity prices: Implications for valuation and hedging.” The Journal of Finance 52(3): pp. 923–973. Sexton, R. S., R. E. Dorsey, & J. D. Johnson (1998): “Toward global optimization of neural networks: a comparison of the genetic algorithm and backpropagation.” Decision Support Systems 22(2): pp. 171–185. Shaeffer, S. M. & E. S. Schwartz (1984): “A two-factor model of the term structure: An approximate analytical solution.” Journal of Financial and Quantitative analysis 19(04): pp. 413–424. Steeley, J. M. (2008): “Testing term structure estimation methods: Evidence from the uk strips market.” Journal of Money, Credit and Banking 40(7): pp. 1489–1512. Svensson, L. E. (1994): “Estimating and interpreting forward interest rates: Sweden 1992-1994.” Technical report, National Bureau of Economic Research. Trippi, R. R. & E. Turban (1992): Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real World Performance. McGraw-Hill, Inc. Vasicek, O. (1977): “An equilibrium characterization of the term structure.” Journal of financial economics 5(2): pp. 177–188. Vasicek, O. A. & H. G. Fong (1982): “Term structure modeling using exponential splines.” The Journal of Finance 37(2): pp. 339–348. Vela, D. (2013): “Forecasting latin-american yield curves: An artificial neural network approach.” Borradores de Economia 761, Banco de la Republica de Colombia.


White, H. (1992): Artificial neural networks: approximation and learning theory. Blackwell Publishers, Inc. Yu, L., S. Wang, & K. K. Lai (2008): “Forecasting crude oil price with an emd-based neural network ensemble learning paradigm.” Energy Economics 30(5): pp. 2623 – 2635. Zhang, G., B. Eddy Patuwo, & M. Y Hu (1998): “Forecasting with artificial neural networks: The state of the art.” International Journal of Forecasting 14(1): pp. 35–62. Zhang, G. P. (2012): “Neural networks for time-series forecasting.” In G. e. a. Rozenberg (editor), “Handbook of Natural Computing,” chapter 14, pp. 461–477. Springer Berlin Heidelberg.

Appendix A

Figures and Tables

             β0         β1         β2
N            289        289        289
Min          11,356     -36,377    -22,234
Max          130,531    34,801     40,221
Mean         44,524     3,373      3,282
Q1           19,636     -1,315     -3,660
Median       22,678     2,075      0,633
Q3           76,548     8,507      7,190
Std Dev      30,493     10,011     10,031
Variance     929,824    100,225    100,630
Std Error    1,794      0,589      0,590

Table A.1: DNS coefficients statistics

Figure A.1: Sample cross correlations of β coefficients time series


FTDNN versus:

Horizon      AR(1)        VAR(1)       RW
1-month      -5,4135**    1,9187**     -6,4373**
3-month      2,4505**     2,1143**     -0,7493
6-month      -1,3262*     1,0110       -5,011**
12-month     -7,7514**    -7,7963**    -7,5536**

Table A.2: Values of the Diebold-Mariano test statistics (* = 5%, ** = 10% levels of significance)

Figure A.2: Plot of regression of outputs and targets

Table A.3: RMSE of forecasts for individual maturities (FTDNN, AR(1), VAR(1) and RW models; 1-, 3-, 6- and 12-month horizons; constant maturities from 30 to 720 days)

Table A.4: ME of forecasts for individual maturities (FTDNN, AR(1), VAR(1) and RW models; 1-, 3-, 6- and 12-month horizons; constant maturities from 30 to 720 days)


β0 forecast

Lags    RMSE
1       2,255
10      1,858
20      0,9424
30      0,8623
40      0,2574
50      2,32E-13
60      1,58E-13
70      3,92E-13
80      5,76E-14

Table A.5: RMSE as a function of the number of lags fed to the FTDNN

Appendix B

Content of Enclosed DVD

There is a DVD enclosed with this thesis which contains the empirical data and the Matlab source codes.

• Folder 1: Data
  – Folder 1.1: For final use
  – Folder 1.2: Raw data
• Folder 2: Matlab source codes
  – Folder 2.1: Forecasting
  – Folder 2.2: Nelson-Siegel
