Development and Testing of a Method for Forecasting Prices of Multi-Storey Buildings during the Early Design Stage: the Storey Enclosure Method Revisited

Franco Kai Tak Cheung

Doctor of Philosophy
School of Construction Management and Property
Queensland University of Technology
2005

To Ritz and my parents


Statement of Original Authorship

The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher education institution.

To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signature:

Date:


Abstract

Although design decisions that are made in the preliminary design stages of a building are more cost sensitive than those that are made at later stages, previous research suggests that there is only a slight improvement in the accuracy of building price forecasts as the design develops. Moreover, established conventional forecasting methods lack measures of their own performance, which has inhibited the development of simpler early-stage techniques. One early-stage price forecasting model, the Storey Enclosure Method, which was developed by James in 1954, uses the basic physical measurements of buildings to estimate building prices. Although James’ Storey Enclosure Model (JSEM) is not widely used in practice, it has been proved empirically, if rather crudely, to be a better model than other commonly used models. This research aims firstly to advance JSEM by using regression techniques and secondly to develop an objective approach for the assessment of model performance. To accomplish the first research aim, this research uses data from 148 completed Hong Kong projects for four types of building: offices, private housing, nursing homes, and primary and secondary schools. Sophisticated features of the modelling exercise include the use of leave-one-out cross validation to simulate the way in which forecasts are produced in practice and a dual stepwise selection strategy that enhances the chance of identifying the best model. Two types of regressed models from different candidate sets, the Regressed Model for James’ Storey Enclosure Method (RJSEM) and the Regressed Model for the Advanced Storey Enclosure Method (RASEM), are developed accordingly. In considering the RJSEM, RASEM, and the most commonly used alternative early-stage floor area and cube models, all of the models except JSEM are found to be unbiased. The RJSEM and RASEM models are also examined for their consistency using a structured approach that involves the use of both parametric and non-parametric inference tests. This shows that although the RASEMs for different building types are generally more consistent, they are not significantly better than the other models. Finally, the combination of the forecasts that are generated from different models to capture the different aspects of information from the models is suggested as an alternative strategy for improving forecasting performance.


Acknowledgements

I am indebted to the following people for their time, help and contribution to the production of this thesis. Great appreciation is due to my former team mates at Levett & Bailey Ltd., including Hon Kong Yu, who allowed me to access the data; See Ping Wong, who provided me with a valuable insight into estimating practices; and Anselm Chow, who gave detailed explanations of the company’s recording system and answered all of my queries. Gratitude is expressed to Dr. Derek Drew for suggesting the research topic and for supervising this research project. Without his crucial suggestions, this research would never have been started. Thanks are due to all my colleagues at the City University of Hong Kong for their support, and in particular to Professor Andrew Leung, Dr. S. M. Lo and Dr. S. O. Cheung for their encouragement and advice, to Dr. Raymond Lee for introducing me to the mathematical software Mathcad, and to Dr. Eric Lee for giving me private lessons on resampling methods. Special acknowledgment is given to Dr. H. P. Lo, my local supervisor in Hong Kong, who pointed me in the right direction for many of the statistical problems encountered. His advice on the choice of techniques and proper mathematical interpretation was particularly helpful. His patience in correcting my thinking is greatly appreciated. I am indebted to Professor Martin Skitmore for many things, such as his extensive assistance, superb guidance, sharp advice, incredible patience and prompt responses to my queries throughout this study. The time and effort that he spent discussing my research during his visit to Hong Kong and my stays at QUT are highly appreciated. Without his guidance and advice, I would not have been able to proceed and bring the research to completion.


Table of Contents

STATEMENT OF ORIGINAL AUTHORSHIP ..... I
ABSTRACT ..... II
ACKNOWLEDGEMENTS ..... IV
TABLE OF CONTENTS ..... VI
LIST OF FIGURES ..... IX
LIST OF TABLES ..... X
CHAPTER 1 INTRODUCTION ..... 1
CHAPTER 2 COST FORECASTING IN PRACTICE: A REVIEW ..... 8
2.1 INTRODUCTION ..... 8
2.2 BUILDING ECONOMICS ..... 10
2.3 COST PLANNING AND CONTROL ..... 10
2.4 COST FORECASTING IN THE COST PLANNING AND CONTROL PROCESS ..... 11
2.5 DESIGN PROCESS AND DESIGNERS’ FORECASTS ..... 14
2.6 EARLY STAGE FORECASTING IN PRACTICE ..... 17
2.7 PROBLEMS OF EXISTING FORECASTING PRACTICE ..... 20
  2.7.1 Misconception of the relationship between level of detail and forecasting accuracy ..... 20
  2.7.2 Lack of theoretical background ..... 21
  2.7.3 Lack of performance evaluation ..... 22
  2.7.4 Inexplicability, unrelatedness and determinism ..... 23
2.8 SUMMARY ..... 24
CHAPTER 3 DEVELOPMENT OF FORECASTING MODELS ..... 26
3.1 INTRODUCTION ..... 26
3.2 DEFINITION OF COST MODEL ..... 27
3.3 BRANDON’S “PARADIGM SHIFT” ..... 29
  3.3.1 Black box versus realistic models ..... 31
  3.3.2 Deterministic versus stochastic models ..... 32
  3.3.3 Deductive versus inductive models ..... 33
3.4 MAJOR DIRECTIONS OF MODEL DEVELOPMENT ..... 34
3.5 LIMITATIONS OF COST MODELS ..... 42
  3.5.1 Model assumptions ..... 42
  3.5.2 Reliance on historical data for prediction ..... 43
  3.5.3 Insufficiency of information and preparation time ..... 44
  3.5.4 Reliance on expert judgment ..... 44
3.6 REVIEW OF COST MODELS IN USE ..... 45
3.7 SIGNIFICANT ITEMS ESTIMATION ..... 47
3.8 DISCUSSIONS ON RESEARCH OPPORTUNITIES ..... 49
3.9 STOREY ENCLOSURE METHOD ..... 52
3.10 REGRESSION ANALYSIS ..... 56
3.11 REVIEW OF MODEL PREDICTORS ..... 57
3.12 OCCAM’S RAZOR: PARSIMONY OF VARIABLES ..... 66
3.13 SUMMARY ..... 70
CHAPTER 4 PERFORMANCE OF FORECASTING MODELS ..... 73
4.1 INTRODUCTION ..... 73
4.2 MEASURES OF FORECASTING ACCURACY ..... 74
4.3 BASE TARGET FOR FORECASTING ACCURACY ..... 82
4.4 OVERVIEW OF MODEL PERFORMANCE AT VARIOUS DESIGN STAGES ..... 83
4.5 SUMMARY ..... 87
CHAPTER 5 METHODOLOGY ..... 89
5.1 INTRODUCTION ..... 89
5.2 RESEARCH FRAMEWORK ..... 90
5.3 TYPES OF QUANTITY MEASURED IN SINGLE-RATE FORECASTING MODELS ..... 92
5.4 SIMPLIFICATION OF JSEM ..... 93
5.5 IDENTIFICATION OF A PROBLEM ..... 97
5.6 DATA PREPARATION AND ENTRY ..... 99
  5.6.1 Data sample ..... 100
  5.6.2 Definition and classification of building types ..... 101
  5.6.3 Treating of outliers ..... 104
5.7 MODEL BUILDING ..... 105
  5.7.1 Dependent Variables ..... 105
    5.7.1.1 Price Index Adjustment ..... 106
    5.7.1.2 Other Adjustments ..... 107
  5.7.2 Candidate variables ..... 107
  5.7.3 Fitting Criterion ..... 109
    5.7.3.1 Matrix Notation for Calculation of MSQ ..... 110
  5.7.4 Reliability analysis ..... 112
    5.7.4.1 Matrix Notation for Calculation of MSQ by Leave-one-out Method ..... 114
  5.7.5 Selection Strategies ..... 115
5.8 MODEL ADJUSTMENT ..... 119
  5.8.1 Exclusion of candidates ..... 119
  5.8.2 Transformation of variables ..... 121
5.9 COMPARISON OF BEST MODEL WITH OTHER MODELS ..... 122
  5.9.1 Choice of parametric and non-parametric inference ..... 124
  5.9.2 Statistical inference for bias ..... 126
  5.9.3 Statistical inference for consistency ..... 127
5.10 TOOLS FOR COMPUTATION ..... 132
5.11 SUMMARY ..... 133
CHAPTER 6 ANALYSIS ..... 137
6.1 INTRODUCTION ..... 137
6.2 MODEL DEVELOPMENT ..... 138
  6.2.1 Data Collected ..... 138
  6.2.2 Candidates for Regression Models ..... 139
  6.2.3 Response for Regression Models ..... 139
  6.2.4 Selection of Predictors ..... 142
    6.2.4.1 Selected Predictors for RJSEMs and RASEMs ..... 144
  6.2.5 Model Transformation ..... 164
6.3 PERFORMANCE VALIDATION ..... 164
  6.3.1 Forecasting Results ..... 164
  6.3.2 Normality Testing ..... 168
  6.3.3 Significance of Variable Transformation ..... 174
  6.3.4 Comparisons of Models ..... 175
    6.3.4.1 Models for Offices ..... 178
    6.3.4.2 Models for Private Housing ..... 179
    6.3.4.3 Models for Nursing Homes ..... 180
    6.3.4.4 Models for Schools ..... 180
    6.3.4.5 Discussions on model comparisons ..... 181
6.4 COMBINING FORECASTS ..... 183
6.5 SUMMARY ..... 188
CHAPTER 7 CONCLUSIONS ..... 193
7.1 INTRODUCTION ..... 193
7.2 MODEL DEVELOPMENT ..... 194
7.3 PERFORMANCE VALIDATION ..... 196
7.4 COMBINING FORECASTS ..... 198
7.5 IMPLICATIONS FOR PRACTICE ..... 199
7.6 MODEL LIMITATIONS ..... 200
7.7 OPPORTUNITIES FOR FURTHER RESEARCH ..... 202
BIBLIOGRAPHY ..... 205
APPENDIX A: APPROVAL LETTER FOR ACCESS OF COST ANALYSES ..... 218
APPENDIX B: TENDER PRICE INDICES AND COST TRENDS IN HONG KONG, MARCH 2004 (PUBLISHED BY LEVETT AND BAILEY CHARTERED QUANTITY SURVEYORS LTD.) ..... 218
APPENDIX C: ORIGINAL DATA ..... 218
APPENDIX D: FORECASTS BY CROSS VALIDATION USING CONVENTIONAL MODELS ..... 218
APPENDIX E: ERRORS AND PERCENTAGE ERRORS OF FORECASTS ..... 218
APPENDIX F: RESULTS OF COMBINING FORECASTS ..... 218


List of Figures

FIGURE 2-1: MODEL OF DESIGN PROCESS (SOURCE: MAVER 1970 P.200) ..... 16
FIGURE 2-2: LEVEL OF INFLUENCE ON PROJECT COST (IN PER CENT) (SOURCE: BARRIE AND PAULSON 1978 P. 154) ..... 16
FIGURE 2-3: DESIGNERS’ COMMITMENT TO EXPENDITURE (SOURCE: FERRY ET AL. 1999 P. 96) ..... 17
FIGURE 5-1: RESEARCH FRAMEWORK FOR IDENTIFICATION, SELECTION AND VALIDATION OF PRICE MODELS ..... 91
FIGURE 5-2: ALGORITHM FOR DUAL STEPWISE SELECTION ..... 118
FIGURE 5-3: ALGORITHM FOR COMPARISONS OF VARIANCES OF PERCENTAGE ERRORS ..... 128
FIGURE 6-1: BOX-COX PLOT OF PERCENTAGE ERRORS FOR THE FLOOR AREA MODEL FOR OFFICES ..... 170
FIGURE 6-2: BOX-COX PLOT OF PERCENTAGE ERRORS FOR THE LRASEM FOR OFFICES ..... 170
FIGURE 6-3: BOX-COX PLOT OF PERCENTAGE ERRORS FOR THE JSEM FOR PRIVATE HOUSING ..... 171
FIGURE 6-4: BOX-COX PLOT OF PERCENTAGE ERRORS FOR THE FLOOR AREA MODEL FOR PRIVATE HOUSING ..... 171
FIGURE 6-5: BOX-COX PLOT OF PERCENTAGE ERRORS FOR THE CUBE MODEL FOR PRIVATE HOUSING ..... 172
FIGURE 6-6: BOX-COX PLOT OF PERCENTAGE ERRORS FOR THE RJSEM FOR NURSING HOMES ..... 173
FIGURE 6-7: BOX-COX PLOT OF PERCENTAGE ERRORS FOR THE RASEM FOR NURSING HOMES ..... 173
FIGURE 6-8: TESTS OF HOMOGENEITY OF VARIANCES USING BARTLETT’S TESTS, KRUSKAL WALLIS TESTS AND MANN-WHITNEY U TESTS ..... 177


List of Tables

TABLE 2-1: MODEL SELECTION CRITERIA (EXTRACTED AND MODIFIED FROM FORTUNE AND HINKS 1998) ..... 19
TABLE 3-1: CLASSIFICATION OF THIS RESEARCH ACCORDING TO NEWTON’S DESCRIPTIVE PRIMITIVES ..... 37
TABLE 3-2: PREVIOUS STUDIES ON MODELLING TECHNIQUES AND APPLICATIONS ACCORDING TO NEWTON’S CLASSIFICATION ..... 38
TABLE 3-3: SUMMARY OF ESTIMATING TECHNIQUES (EXTRACTED FROM SKITMORE & PATCHELL 1990) ..... 40
TABLE 3-4: ADJUSTMENT FOR THE FACTORS AFFECTING THE ESTIMATES IN THE STOREY ENCLOSURE METHOD ..... 53
TABLE 3-5: WEIGHTINGS AND INCLUSIONS FOR INDIVIDUAL COMPONENTS IN THE STOREY ENCLOSURE METHOD ..... 54
TABLE 3-6: THE RESULTS OF TESTS FOR THE CUBE, FLOOR AREA AND STOREY ENCLOSURE METHODS IN JAMES’ STUDY (SOURCE: JAMES (1954)) ..... 56
TABLE 3-7: SUMMARY OF THE MODELS DEVELOPED BY THE POST-GRADUATE STUDENTS OF THE DEPARTMENT OF CIVIL ENGINEERING AT LOUGHBOROUGH UNIVERSITY OF TECHNOLOGY (EXTRACTED FROM MCCAFFER 1975) ..... 68
TABLE 3-8: SUMMARY OF FORECASTING TARGETS AND INFLUENCING VARIABLES IN PREVIOUS EMPIRICAL STUDIES ..... 69
TABLE 4-1: MEASURES OF PERFORMANCE OF FORECASTS (SOURCE: SKITMORE ET AL. 1990 P. 22) ..... 77
TABLE 4-2: FACTORS AFFECTING QUALITY OF FORECASTS – SUMMARY OF EMPIRICAL EVIDENCE (EXTENDED FROM THE SIMILAR TABLE IN SKITMORE ET AL. (1990, P. 20-21)) ..... 78
TABLE 4-3: PERFORMANCE OF DESIGNERS’ FORECASTS REVIEWED BY ASHWORTH AND SKITMORE (1983) ..... 87
TABLE 5-1: COEFFICIENTS AND VARIABLES DESIGNATED IN JSEM ..... 99
TABLE 5-2: CLASSIFICATION OF BUILDING PROJECTS ACCORDING TO BUILDING TYPES ..... 103
TABLE 5-3: LIST OF CANDIDATE VARIABLES ..... 108
TABLE 6-2: INCLUDED CANDIDATES, EXCLUDED CANDIDATES AND SELECTED PREDICTORS FOR RJSEMS AND RASEMS ..... 144
TABLE 6-3: STEP-BY-STEP SELECTION RESULTS OF PREDICTORS FOR THE RJSEM FOR OFFICES ..... 147
TABLE 6-4: STEP-BY-STEP SELECTION RESULTS OF PREDICTORS FOR THE RJSEM FOR PRIVATE HOUSING ..... 148
TABLE 6-5: STEP-BY-STEP SELECTION RESULTS OF PREDICTORS FOR THE RJSEM FOR NURSING HOMES ..... 148
TABLE 6-6: STEP-BY-STEP SELECTION RESULTS OF PREDICTORS FOR THE RJSEM FOR SCHOOLS ..... 149
TABLE 6-7: STEP-BY-STEP SELECTION RESULTS OF PREDICTORS FOR THE RASEM FOR OFFICES ..... 150
TABLE 6-8: STEP-BY-STEP SELECTION RESULTS OF PREDICTORS FOR THE RASEM FOR PRIVATE HOUSING ..... 151
TABLE 6-9: STEP-BY-STEP SELECTION RESULTS OF PREDICTORS FOR THE RASEM FOR NURSING HOMES ..... 152
TABLE 6-10: STEP-BY-STEP SELECTION RESULTS OF PREDICTORS FOR THE RASEM FOR SCHOOLS ..... 153
TABLE 6-11: COEFFICIENTS, FORECASTS AND MSQS DETERMINED BY LEAVE-ONE-OUT METHOD FOR THE RJSEM FOR OFFICES ..... 154
TABLE 6-12: COEFFICIENTS, FORECASTS AND MSQS DETERMINED BY LEAVE-ONE-OUT METHOD FOR THE RJSEM FOR PRIVATE HOUSING ..... 155
TABLE 6-13: COEFFICIENTS, FORECASTS AND MSQS DETERMINED BY LEAVE-ONE-OUT METHOD FOR THE RJSEM FOR NURSING HOMES ..... 156
TABLE 6-14: COEFFICIENTS, FORECASTS AND MSQS DETERMINED BY LEAVE-ONE-OUT METHOD FOR THE RJSEM FOR SCHOOLS ..... 157
TABLE 6-15: COEFFICIENTS, FORECASTS AND MSQS DETERMINED BY LEAVE-ONE-OUT METHOD FOR THE RASEM FOR OFFICES ..... 158
TABLE 6-16: COEFFICIENTS, FORECASTS AND MSQS DETERMINED BY LEAVE-ONE-OUT METHOD FOR THE RASEM FOR PRIVATE HOUSING ..... 159
TABLE 6-17: COEFFICIENTS, FORECASTS AND MSQS DETERMINED BY LEAVE-ONE-OUT METHOD FOR THE RASEM FOR NURSING HOMES ..... 160
TABLE 6-18: COEFFICIENTS, FORECASTS AND MSQS DETERMINED BY LEAVE-ONE-OUT METHOD FOR THE RASEM FOR SCHOOLS ..... 161
TABLE 6-19: SIGNS OF COEFFICIENTS FOR SELECTED PREDICTORS ..... 161
TABLE 6-20: CONTRIBUTIONS OF FLOOR AREA RELATED PREDICTOR TO RESPONSE ..... 162
TABLE 6-21: CONTRIBUTION OF NON-FLOOR AREA RELATED PREDICTORS TO RESPONSES ..... 163
TABLE 6-22: SUMMARY OF MEANS AND STANDARD DEVIATIONS OF PERCENTAGE ERRORS ..... 167
TABLE 6-23: RESULTS OF NORMALITY TESTS FOR PERCENTAGE ERRORS ACCORDING TO BUILDING AND MODEL TYPES ..... 169
TABLE 6-24: ESTIMATED LAMBDA VALUES ACCORDING TO BUILDING AND MODEL TYPES (FOR MODELS NOT SATISFYING NORMALITY ASSUMPTION ONLY) ..... 173
TABLE 6-25: TWO-SAMPLE F-TESTS AND MANN-WHITNEY U TEST BETWEEN REGRESSED MODELS WITH UNTRANSFORMED VARIABLES AND WITH LOGARITHMIC TRANSFORMED VARIABLES ..... 175
TABLE 6-26: TWO-SAMPLE MANN-WHITNEY U-TESTS BETWEEN MODELS FOR OFFICE AND PRIVATE HOUSING ..... 176
TABLE 6-27: ACCURACY FOR COMBINED, MODEL AVERAGE, MINIMUM AND MAXIMUM FORECASTS FOR GROUP 1 MODELS ..... 185
TABLE 6-28: ACCURACY FOR COMBINED, MODEL AVERAGE, MINIMUM AND MAXIMUM FORECASTS FOR GROUP 2 MODELS ..... 186
TABLE 6-29: ACCURACY FOR COMBINED, MODEL AVERAGE, MINIMUM AND MAXIMUM FORECASTS FOR GROUP 3 MODELS ..... 186
TABLE 6-30: ACCURACY FOR COMBINED, MODEL AVERAGE, MINIMUM AND MAXIMUM FORECASTS FOR GROUP 4 MODELS ..... 187
TABLE 6-31: ACCURACY FOR COMBINED, MODEL AVERAGE, MINIMUM AND MAXIMUM FORECASTS FOR GROUP 5 MODELS ..... 187
TABLE 6-32: ACCURACY FOR COMBINED, MODEL AVERAGE, MINIMUM AND MAXIMUM FORECASTS FOR GROUP 6 MODELS ..... 187
TABLE 6-33: ACCURACY FOR COMBINED, MODEL AVERAGE, MINIMUM AND MAXIMUM FORECASTS FOR GROUP 7 MODELS ..... 188
TABLE 6-34: ACCURACY FOR COMBINED, MODEL AVERAGE, MINIMUM AND MAXIMUM FORECASTS FOR GROUP 8 MODELS ..... 188
TABLE D-1: FORECASTS BY CROSS VALIDATION USING THE CONVENTIONAL MODELS FOR OFFICES ..... 218
TABLE D-2: FORECASTS BY CROSS VALIDATION USING THE CONVENTIONAL MODELS FOR PRIVATE HOUSING ..... 218
TABLE D-3: FORECASTS BY CROSS VALIDATION USING THE CONVENTIONAL MODELS FOR NURSING HOMES ..... 218
TABLE D-4: FORECASTS BY CROSS VALIDATION USING THE CONVENTIONAL MODELS FOR SCHOOLS ..... 218
TABLE E-1: ERRORS AND PERCENTAGE ERRORS OF FORECASTS FOR THE CONVENTIONAL MODELS FOR OFFICES ..... 218
TABLE E-2: ERRORS AND PERCENTAGE ERRORS OF FORECASTS FOR THE CONVENTIONAL MODELS FOR PRIVATE HOUSING ..... 218
TABLE E-3: ERRORS AND PERCENTAGE ERRORS OF FORECASTS FOR THE CONVENTIONAL MODELS FOR NURSING HOMES ..... 218
TABLE E-4: ERRORS AND PERCENTAGE ERRORS OF FORECASTS FOR THE CONVENTIONAL MODELS FOR SCHOOLS ..... 218
TABLE E-5: ERRORS AND PERCENTAGE ERRORS OF FORECASTS FOR THE REGRESSED MODELS FOR OFFICES ..... 218
TABLE E-6: ERRORS AND PERCENTAGE ERRORS OF FORECASTS FOR THE REGRESSED MODELS FOR PRIVATE HOUSING ..... 218
TABLE E-7: ERRORS AND PERCENTAGE ERRORS OF FORECASTS FOR THE REGRESSED MODELS FOR NURSING HOMES ..... 218
TABLE E-8: ERRORS AND PERCENTAGE ERRORS OF FORECASTS FOR THE REGRESSED MODELS FOR SCHOOLS ..... 218
TABLE F-1: COMBINED FORECASTS FOR GROUP 1 MODELS ..... 218
TABLE F-2: COMBINED FORECASTS FOR GROUP 2 MODELS ..... 218
TABLE F-3: COMBINED FORECASTS FOR GROUP 3 MODELS ..... 218
TABLE F-4: COMBINED FORECASTS FOR GROUP 4 MODELS ..... 218
TABLE F-5: COMBINED FORECASTS FOR GROUP 5 MODELS ..... 218
TABLE F-6: COMBINED FORECASTS FOR GROUP 6 MODELS ..... 218
TABLE F-7: COMBINED FORECASTS FOR GROUP 7 MODELS ..... 218
TABLE F-8: COMBINED FORECASTS FOR GROUP 8 MODELS ..... 218


Chapter 1

Introduction

Philosophy is a game with objectives and no rules. Mathematics is a game with rules and no objectives. Anonymous

The forecasting approach for the prediction of building prices that is used in practice has been criticized for misconstruing the relationship between level of detail and forecasting accuracy (Bennett et al. 1979), for lacking solid theoretical support (Brandon 1982; Skitmore 1988; Bon 2001), for lacking performance evaluation (Morrison 1983; Raftery 1984a; Fortune and Lees 1996), and for being inexplicable, unrelated, and deterministic (Bowen et al. 1987).

Although many alternative approaches and new models have been developed, solid evidence from surveys that have been conducted in different countries suggests that they are rarely put into practice (Akintoye et al. 1992; Fortune and Lees 1996; Bowen and Edwards 1998). The majority of studies of model development have chosen to focus on the uniqueness of a new model and the way in which it is different from other models (Raftery 1984a; Newton 1990).

In the early design stage of a building project, the freedom to modify the scopes, requirements, standards and designs is very high. This alone creates high uncertainty in building price, despite the fact that later decisions on tendering arrangements, procurement methods, the number of tenderers to be invited and so on, together with possible changes in market conditions as the design develops, will also have serious price implications.

Although the design information available is very coarse and limited in the early design stage, construction clients are generally eager to know the likely building price. Very often, this price refers to the lowest tender price.

Conventionally, practicing forecasters measure the total floor area from a few sketch drawings and make a forecast using the floor area method (or the cube method, before the floor area method gained popularity). To make full use of the information extracted from sketches, James proposed a rule-of-thumb method, called the Storey Enclosure Method, which he claimed takes into account the effects on building prices of physical shape, the total floor area, the vertical positioning of the floor area, the storey heights and the sinking of usable floor area below ground level (e.g. basements).

Like the floor area and cube methods, James’ method is a single-rate method which uses the storey enclosure area as the quantity for measurement. To determine this area, the area of each floor, the external wall area, the basement wall area and the roof area are first measured. Then, these measured areas are multiplied by their associated weightings. Finally, the products of these areas and weightings are summed, and the total is the storey enclosure area. Although James’ Storey Enclosure Model (JSEM) has not been developed empirically, its forecasting performance, together with that of two other conventional models, the floor area and cube models, has been calculated with empirical data for comparison.
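To make the weighted-area arithmetic concrete, the short sketch below (in Python, and not part of the original method) computes a storey enclosure area and turns it into a single-rate forecast. The weightings and measured areas are illustrative placeholders only; James’s actual weightings and inclusion rules are reproduced in Table 3-5.

# Illustrative sketch only: the weightings below are NOT James's published
# values (see Table 3-5); they merely show the structure of the calculation.
WEIGHTS = {
    "floor": 2.0,           # applied to the measured area of each storey
    "external_wall": 1.0,   # external wall area above ground
    "basement_wall": 2.0,   # wall area enclosing sunken floors
    "roof": 1.0,            # roof area
}

def storey_enclosure_area(floor_areas, external_wall_area,
                          basement_wall_area, roof_area):
    """Weighted sum of the measured areas (all in m2)."""
    weighted_floors = sum(WEIGHTS["floor"] * a for a in floor_areas)
    return (weighted_floors
            + WEIGHTS["external_wall"] * external_wall_area
            + WEIGHTS["basement_wall"] * basement_wall_area
            + WEIGHTS["roof"] * roof_area)

# Single-rate forecast: storey enclosure area multiplied by a rate taken from
# analyses of past projects (the rate here is invented for the example).
area = storey_enclosure_area([820, 820, 820, 790], 2600, 0, 830)
forecast = area * 4650   # HK$ per m2 of storey enclosure area, illustrative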

James’ study of 1954 is a pioneering study in model exploration that attempts to show model advancement empirically. It is able to show that forecasts that are produced by his model are nearer to actual tender prices than those that are produced by the other two models, and that the range of price variation is reduced accordingly. Despite the better performance demonstrated by James, JSEM serves primarily as a textbook method for forecasting, rather than a method that is used in practice. Clearly, JSEM is more complicated than the floor area and cube models in terms of measurement and ease of understanding.

Moreover, a major criticism is that the weightings are based purely on experience (Wilderness Group 1964; Seeley 1996 pp.161-162; Ashworth 1999 p.251).

JSEM, which is considered to be the most sophisticated of all of the single-rate models, has been chosen for further development in this research. The idea of using the areas of different parts of a building as variables allows for model exploration using regression analysis. The major problem of JSEM, that of a lack of rigor, can be solved by using advanced modelling techniques for model development, and statistical inference for performance validation. By following a rigorous approach of cross validation to the further development of JSEM and the subsequent examination of the developed model by statistical testing, it is expected that the model will achieve a balance between the requirements of theory (science) and practicability (technology) for forecasting building prices.

With reference to the variables identified in JSEM, the primary aim of this research is to develop regressed models for forecasting the lowest tender prices of multi-storey buildings in the early design stage using a systematic and logical approach. To achieve this aim, this research adopts the cross validation approach for modelling using regression analysis, as it has been shown to be markedly superior for small data sets (Goutte 1997).

The accuracy of statistical inference in cross validation is preserved by dividing a sample of data at random into two sub-samples: an exploratory sub-sample, which is used to select a statistical model for the data, and a validatory sub-sample, which is used for formal statistical inference (Fox 1997).

The cross validation algorithm developed in this study for modelling JSEM’s variables is a significant contribution because of its advancement of the model building process. Although the data, i.e. the observed values for the candidates and the response, used in this study cover only four different types of building project, the developed methodology for modelling is also applicable to data from other types of buildings and other types of data.
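A minimal sketch of the leave-one-out idea, written in Python rather than the Mathcad used in this study, is given below. Each project is withheld in turn, the model is refitted on the remaining projects, and the withheld project’s price is forecast, which mimics the way a forecast is produced in practice for a new job; the mean of the squared out-of-sample errors is taken here as the MSQ fitting criterion described in Chapter 5. The function and variable names are illustrative only.

import numpy as np

def loo_msq(X, y):
    """Mean squared leave-one-out forecast error for a least-squares model.

    X : (n, p) array of candidate variables (e.g. weighted building areas)
    y : (n,) array of observed lowest tender prices
    """
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                     # withhold project i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors[i] = y[i] - X[i] @ coef               # out-of-sample error
    return np.mean(errors ** 2)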

In revisiting James’ study, the specific objectives of this research are: (1) to collect project data for multi-storey buildings; (2) to classify the data according to the type of project; (3) to develop a cross-validated regression algorithm for model selection; (4) to generate regressed models for different project types by the cross validation method using the variables in JSEM as candidates; and (5) to repeat (4) based on another set of variables that are modified from the variables in JSEM.

It is hypothesised that the new regressed models will outperform the conventional forecasting models, i.e. the JSEM, floor area and cube models. The secondary aim is to prove this hypothesis. To accomplish this, the forecasting accuracy of the developed models has to be tested against that of the conventional models. An algorithm for selecting the appropriate tests for the comparisons is designed. The specific objectives concerning the statistical inference are: (1) to measure the forecasting accuracy in terms of bias and consistency; (2) to compare the forecasting accuracy of these models by the use of different parametric and non-parametric tests; and (3) to group the models that show the same potency together if the developed models do not perform significantly better than the conventional models.
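As a hedged illustration of these objectives, the fragment below computes the two accuracy measures (bias as the mean percentage error of the cross-validated forecasts, consistency as their standard deviation) and shows one possible non-parametric comparison between two models; the tests actually used, and the rules for choosing between them, are set out in Chapter 5.

import numpy as np
from scipy import stats

def percentage_errors(forecasts, actual):
    return 100.0 * (np.asarray(forecasts) - np.asarray(actual)) / np.asarray(actual)

def bias_and_consistency(forecasts, actual):
    pe = percentage_errors(forecasts, actual)
    return pe.mean(), pe.std(ddof=1)   # bias (mean % error), consistency (s.d.)

def compare_consistency(pe_model_a, pe_model_b):
    # Two-sided Mann-Whitney U test on absolute percentage errors: one of the
    # non-parametric options when the errors are not normally distributed.
    return stats.mannwhitneyu(np.abs(pe_model_a), np.abs(pe_model_b),
                              alternative="two-sided")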

The thesis is divided into three parts. The background to this research is presented in Chapters 2 to 4, the empirical work is contained in Chapters 5 and 6, and the conclusions are presented in Chapter 7.

Because of the difference in cost sensitivity, early design decisions, which are strongly influenced by forecasting accuracy, have a stronger impact on the final value of a building. In Chapter 2, the significance of early stage forecasting and forecasting in practice are reviewed. Despite this probable strong impact, it is found that accuracy is rarely monitored in real-life forecasting. There is also a lack of theoretical support for the widely adopted forecasting methods; by contrast, a variety of forecasting cost models has been developed, mainly by academia, arguably for the sake of publication only.

The development, use, classification and limitations of cost models are summarised in Chapter 3.

In particular, JSEM, as the model for further development in this study, is extensively explained. This research applies regression techniques to the variables in JSEM, and previous studies on the application of similar techniques and on the variables selected are also discussed. In the process of model development, modellers always face a dilemma between choosing a slightly complex model that is general but may be unrealistic, and choosing a very complex model that is specific but may be unreliable. To resolve this dilemma, the principle of parsimony in scientific theory and model development is addressed. Due to the limited information that is available for early-stage forecasting, models are mainly operated in ‘black box’ mode. In the case of models that are developed by regression, as in this research, performance validation is essential.

The different ways of measuring forecasting performance, and previous empirical work on forecasting accuracy, are reviewed in Chapter 4.

The methodology is described in Chapter 5. The cost analyses for four types of building were collected from a large quantity surveying practice in Hong Kong. To employ the regression methodology for multi-storey buildings, the number of variables in the original JSEM is trimmed down to a manageable level by making an assumption about the variations in floor size. Some advanced features of this methodology include the use of cross validation for reliability analysis, which simulates the practical production of forecasts, and a dual stepwise selection strategy that enhances the chance of identifying the best model.
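The exact algorithm is given as Figure 5-2; purely as an illustration of the general idea, the sketch below alternates forward (add) and backward (drop) steps, each accepted only if it lowers the leave-one-out MSQ. It assumes the loo_msq helper from the earlier sketch, and is a plausible reading of the strategy rather than a transcription of the thesis algorithm.

import numpy as np

def dual_stepwise(X, y, candidates):
    """Greedy add/drop search over candidate column indices of X."""
    selected, best = [], np.inf
    improved = True
    while improved:
        improved = False
        # Forward step: add the single candidate that most reduces the MSQ.
        best_add = None
        for c in candidates:
            if c in selected:
                continue
            msq = loo_msq(X[:, selected + [c]], y)
            if msq < best:
                best, best_add, improved = msq, c, True
        if best_add is not None:
            selected.append(best_add)
        # Backward step: drop any selected variable whose removal helps further.
        for c in list(selected):
            rest = [s for s in selected if s != c]
            if rest:
                msq = loo_msq(X[:, rest], y)
                if msq < best:
                    best, selected, improved = msq, rest, True
    return selected, best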

In the section concerning the comparison of the models, two commonly used measures – bias and consistency – are described, and statistical inference using parametric and non-parametric tests is compared. To assist in the making of a proper statistical inference, a framework for choosing the appropriate tests is also proposed.

The analysis in Chapter 6 contains three sections: model development, performance validation, and combining forecasts.

Eight regressed models were developed according to two sets of candidates (one set from JSEM and one set that was modified from JSEM) for the four types of building. The selected variables were also transformed to seek a further improvement in forecasting accuracy. Each regressed model was compared separately with the conventional models, and the models possessing the same potency were grouped together. Finally, an approach to combining forecasts to improve forecasting performance is demonstrated with empirical data.
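A minimal sketch of the combining step, under the assumption of simple equally weighted (or user-supplied) weights, is shown below; the actual groupings and the accuracy achieved by the combined forecasts are reported in Tables 6-27 to 6-34.

import numpy as np

def combine_forecasts(forecast_matrix, weights=None):
    """Combine forecasts from several models for the same projects.

    forecast_matrix : (n_projects, n_models) array, one column per model
    weights         : optional model weights; defaults to an equal-weight average
    """
    f = np.asarray(forecast_matrix, dtype=float)
    if weights is None:
        weights = np.full(f.shape[1], 1.0 / f.shape[1])
    return f @ np.asarray(weights, dtype=float)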

An overall summary, conclusions, and suggested further research are presented in Chapter 7.


Chapter 2

Cost Forecasting in Practice: A Review

In science, it doesn't matter if you're wrong, as long as you're not stupid. In business, it doesn't matter if you're stupid, so long as you're not wrong. Anonymous

2.1 Introduction

The total cost of a development includes the cost of land, building costs, finance charges, legal charges, consultants’ fees, and so forth. In a broader sense, it also includes the running, marketing, maintenance and repair costs.

To manage the economic aspect of a building development effectively, clients often employ professionals of various disciplines, such as accountants, general practice surveyors and quantity surveyors. Quantity surveyors, whose profession originates in the measurement of the quantities of buildings, are responsible for giving advice on building costs.

The economic aspects of building procurement play a very significant role, because building cost is one of the major components of the total development cost, next to the cost of land. Unlike the cost of land, which reflects the cost of ownership and usage, building cost is determined by the building market through the cost approach. In a market economy, it is the traded value to a contractor for the procurement of a building.

At the design development stage, building cost planning and control is an iterative process that is used to forecast the unknown building price based on available drawings and specifications (i.e., costing a design) and the revision of drawings and specifications to ensure that the building price falls within a predetermined sum (i.e., designing to a cost) (Jaggar et al. 2002 pp.10-11).

Design decisions that are made during this process are crucial to the success of a project. As the decisions that are made in the early design stages, especially before a detailed design has been worked out, are more cost sensitive than those that are made in the later stages, changes to design decisions in the later design stages or execution stage may lead to serious redundancy. Thus, it is essential to produce an accurate cost forecast, especially in the early design stage. The use of the right cost model is therefore a fundamental concern.

The task of forecasting the cost of buildings is especially difficult because of the heterogeneity of the design, procurement, and contractual arrangements; the complexity of the resources and production methods that are involved; and the lengthy cycle of building projects.

The task of forecasting is, however, very important in the design process, as design decisions are always made with reference to the forecasts (outputs of the task) of building costs. An incorrect forecast will inevitably lead to the ineffective use of resources.

In a typical building project, the quantity surveyor is held responsible for giving strategic cost advice. The science and art of this function is what distinguishes quantity surveying as a professional discipline (James 1954; Male 1990; Connaughton and Meikle 1991; RICS 1992), and forecasting forms a core part of this function. This chapter reviews the significance of early stage forecasting, conventional forecasting practice, and the underlying problems of the traditional forecasting approach.

2.2 Building Economics

To give professional cost advice, quantity surveyors should be well equipped with knowledge of building economics. Building economics is the study of economising the use of scarce resources throughout the development life cycle, from conception to demolition (Bon 1989 p.5). It involves a combination of technical skills, informal optimisations, cost accounting, cost control, price forecasting and resource allocation (Raftery 1991 pp.4-5). In a broader sense, it can be considered as a branch of general economics that involves the application of the techniques and expertise of economics to the study of the construction firm, the construction process and the construction industry (Hillebrandt 1985 p.3). The objective of seeking an optimal allocation of resources for building clients distinguishes building economics from cost and management accounting.

2.3 Cost Planning and Control

Practitioners refer to the process of applying economic principles to building projects as cost planning and control. The purposes of cost planning and control are to provide clients with a good value-for-money project, to achieve a balance of design expenditure on the various building components, and to keep expenditure within the amount that is allowed by the client (Maver 1979; Karashenas 1984; Seeley 1996 p.6; Flanagan and Tate 1997 p.13; Ashworth 1999 pp. 9-10). In practice, it may involve the study of the client’s requirements, the possible effects on the surrounding areas if the development is carried out, the relationship between space and shape, the assessment of the initial cost, the reasons for, and methods of, controlling costs, and the estimation of the life of the building and materials (Ashworth 1999 pp.9-10; Ferry et al. 1999 pp.26-28).

2.4 Cost Forecasting in the Cost Planning and Control Process

To avoid ambiguities in the understanding of commonly used terms such as ‘cost planning’ and ‘cost forecasting’ for describing the activities involved, and ‘estimate’, ‘forecast’ and ‘prediction’ for describing the output produced, their corresponding definitions are reviewed. The terms ‘cost planning’ and ‘cost control’ are defined by Seeley (1996).

Cost planning – a systematic application of cost criteria to the design process, so as to maintain in the first place a sensible and economic relation between cost, quality and utility and appearance, and in the second place, such overall control of proposed expenditure as circumstances may dictate. (p. 22)

Cost control – all methods of controlling the cost of building projects within the limits of a predetermined sum, throughout the design and construction stage. (p. 23)

With reference to his definitions, there are four key elements in the process of cost planning and control. First, it is necessary to produce a base figure as a cost target or a cost limit. Second, the analysis of cost and the production of a probable building cost is an iterative process. Third, the cost study requires the application of knowledge of how to relate building design to building economics. Fourth, the cost target, or cost limit, is used to monitor the probable building cost. In short, the process is an iterative one that forecasts the building cost based on available information such as drawings and specifications (costing a design) and revises the drawings and specifications to ensure that the building cost falls within the limit of a predetermined sum (designing to a cost) (Jaggar et al. 2002 p.11).

The terms ‘cost’ and ‘cost forecasting’ are defined in the first chapter of the book Cost Modelling, edited by Skitmore and Marston (1999):

Cost – the cost of the contract to the client. This is the value of the lowest bid received for the contract, or the contract sum. (p. 18)

Cost forecasting – the process of forecasting the client’s cost. Cost forecasting is a part of the cost evaluation (planning and control) process. (p. 18)

Obviously, the cost of a building under a building contract formed between a building owner and a contractor is different from the cost of production of that building. Their relationship varies according to many unquantifiable factors, such as market conditions and project risks. As one person’s price is another person’s cost, the terms ‘price’ and ‘cost’ of a building refer to the amount received by a contractor and the amount paid by an owner respectively (Raftery 1991, pp. 30-32). To avoid ambiguity, the terms ‘price’ and ‘cost’ are used synonymously throughout this thesis in the sense of the cost to building owners.

According to the definitions of cost planning and cost forecasting, the latter determines what the future cost will be, whereas the former determines what it should be.

Cost forecasting is an input to the cost planning process, or a sub-process of cost planning. The importance of cost forecasting is often understated compared with cost planning. Armstrong (1985 p.6) argues that alternative plans can be compared only if reasonable forecasts can be made, and that forecasting should be considered as being as important as planning.

The term ‘forecast’ is distinguished from ‘estimate’ and ‘prediction’. An estimate is made of quantities that may exist before, during or after the event under consideration.

Forecasting requires a prior estimate, and is a subset of the estimating task (Skitmore et al. 1990 p.3). An estimate of a future event must by definition be a forecast, whereas an estimate of an event that is based on information that contains the event itself is a prediction (Skitmore and Marston 1999 p. 19). In statistical parlance, a prediction is an estimate of a value within the database that was used to derive the formulae. A forecast, however, is an estimate of a similar value that is outside of the database (Skitmore et al. 1990 p.4).

As this research concerns the development of cost models for the estimation of future events, the term ‘forecast’ is used throughout the thesis.


2.5 Design Process and Designers’ Forecasts

Making appropriate design decisions is crucial to the success of a project, because design changes that involve vast expenditure, future changes or variations in decisions, especially after the commencement of construction, often lead to redundancy and waste in terms of work completed and resources deployed. Some decisions on design may have long-term consequences or may be unrecoverable. Design decisions are solutions to problems of function, form, time and economy for buildings (Peña and Parshell 2001).

Referring to Figure 2-1, which exhibits the process of the search for design solutions as an iterative process of analysis, synthesis and appraisal (Maver 1970), it can be seen that building cost forecasts are used at the appraisal stage to assist in the making of decisions to achieve an economic objective. These forecasts are also referred to as ‘designers’ forecasts’ (Ashworth and Skitmore 1983), as it is the building design that gives the information for forecasting and determines whether value can be achieved at an acceptable cost (Morton and Jaggar 1995 p. 9). As clients need reliable cost advice to enable the assessment of the viability of a project as soon as possible (Fortune and Lees 1994), designers’ forecasts help to make them aware of their probable financial commitments before any extensive design work is undertaken (Seeley 1996 p. 54).

The outline plan of work of the Royal Institute of British Architects (RIBA) (RIBA 1991) divides the construction process into 12 stages.

It gives a comprehensive picture of the information that is required and the tasks that need to be completed at each stage of work. There are four stages in between the appointment of various professionals and the production of tender information. They are the feasibility, outline proposal, sketch design and detailed design stages. During these stages, quantity surveyors are responsible for producing designers’ forecasts according to the information that is provided, and the most important goal of these forecasts is to give a forecasted value of the work that is as close to the unknown value of the lowest tender as possible. Although designers’ forecasts at different stages share the same goal, the levels of influence differ according to the qualities of these forecasts.

Figure 2-2 illustrates the level of influence of the different project stages on project cost. It shows that the level of influence drops drastically from the planning and design stage to the procurement and construction stage, even though the percentage of actual cost spent in relation to the overall building cost is small in the former stage. This is also reinforced by more recent studies, which suggest that the commitment of construction cost before a sketch design is formalised may amount to over 80% of the final potential cost (Skitmore 1985; Ferry et al. 1999 pp.95-96). Figure 2-3 shows the suggested accumulated commitment of expenditure against design time. As demonstrated, early decisions are more cost sensitive, and thus the quality of early stage forecasts plays a more influential role in the final value of buildings than the quality of later forecasts.

Skitmore et al. (1990 p.5) suggested five primary determinants that affect the quality of forecasting: the nature of the target, the information used, the forecasting technique used, the feedback mechanism used and the person providing the forecast. The forecasting technique for early stage forecasting is identified as the study area for this research.


This figure is not available online. Please consult the hardcopy thesis available from the QUT Library

Figure 2-1: Model of design process (Source: Maver 1970 p.200)

This figure is not available online. Please consult the hardcopy thesis available from the QUT Library

Figure 2-2: Level of influence on project cost (in per cent) (Source: Barrie and Paulson 1978 p. 154)


This figure is not available online. Please consult the hardcopy thesis available from the QUT Library

Figure 2-3: Designers’ commitment to expenditure (Source: Ferry et al. 1999 p. 96)

2.6 Early Stage Forecasting in Practice

Bennett et al. (1979) classified conventional designers’ forecasting techniques into eight categories: cost limit calculation, floor area method, functional unit method, elemental cost estimation, lump sum estimation, cost per meter squared for functional use, approximate quantities, and pricing the bill of quantities.

None of these techniques takes into account how building cost is actually incurred by contractors. In traditional procurement, the responsibilities for design and construction are carried out separately by two groups of professionals. Designers’ forecasts are usually prepared by professional quantity surveyors, who do not normally have access to the cost data in contractors’ accounts; these data show how the actual cost of construction is incurred by a contractor. Due to the lack of these data, forecasters can only refer to the prices and unit rates from returned tenders.

Of the eight types of technique involved, the most frequently used before the preparation of a tender are the floor area method, elemental cost estimation and approximate quantities. The first method assumes that the building price is proportional to its floor area. The second method divides a building into a set of elements and assumes that the cost of an element is proportional to the unit of measurement that is defined for that element. The third method requires the calculation of quantities for the major items of a building and the pricing of them by means of composite unit rates.
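To make the contrast concrete, the three methods can be sketched as simple calculations. The sketch below is illustrative only; the rates, element breakdown and item quantities are hypothetical and are not drawn from the data used in this research.

```python
# Illustrative sketch of the three conventional techniques; all figures are hypothetical.

def floor_area_forecast(gross_floor_area_m2, rate_per_m2):
    """Floor area method: price assumed proportional to gross floor area."""
    return gross_floor_area_m2 * rate_per_m2

def elemental_forecast(element_quantities, element_rates):
    """Elemental estimation: each element priced on its own unit of measurement."""
    return sum(qty * element_rates[name] for name, qty in element_quantities.items())

def approximate_quantities_forecast(major_items):
    """Approximate quantities: measured major items priced by composite unit rates."""
    return sum(qty * composite_rate for qty, composite_rate in major_items)

# Hypothetical figures for a small building
print(floor_area_forecast(12_000, 9_500))
print(elemental_forecast({"substructure": 12_000, "frame": 12_000},
                         {"substructure": 900, "frame": 2_400}))
print(approximate_quantities_forecast([(450, 1_800), (5_200, 650)]))
```

Each method prices a different unit of measurement: the whole floor area, each element's own unit, or a measured quantity of a major item.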

The popularity of these methods is unrelated to their efficacy. Moreover, there is a major criticism about the lack of a theoretical relationship for the application of these methods at different design stages. Conventional forecasting techniques are applied when there is a trade-off between estimation accuracy and the time that is available for forecasting, or between forecasting accuracy and the adequacy of the available forecasting information (Taylor 1984). According to the results of surveys on the forecasting techniques that are employed by practitioners in Nigeria (Akintoye et al. 1992), South Africa (Bowen and Edwards 1998) and the United Kingdom (Fortune and Lees 1996; Fortune and Hinks 1998), conventional forecasting methods are still in widespread use, and their applications outnumber those of the newer models. The UK survey, which was conducted by Fortune and Hinks (1998), also prioritised the model selection criteria for practising forecasters. Table 2-1 shows the identified model selection criteria in descending order of importance. The two highest-ranked criteria are the amount of project data available and the data that is needed for a model.


Table 2-1: Model selection criteria (extracted and modified from Fortune and Hinks 1998)

Criteria identified by Fortune and Hinks (1998) (UK), in descending order of importance:
1. Amount of project data available
2. Data needed for model
3. Forecaster's understanding of the model
4. Time available for forecast preparation
5. Project type
6. Accuracy of model output
7. Forecaster's experience of the model in use
8. Amount of risk in project decisions
9. Ease of model application
10. Feedback from previous forecasts
11. Complexity of the project
12. Speed of the model in use
13. Human resources required to operate the model
14. Site characteristics of the project
15. Level of awareness of new models
16. Flexibility of the model in use
17. Project size
18. Nature of the client
19. Market conditions
20. Cost of using the model
21. Design consultants for the project
22. Quality levels required in the project
23. Availability of computers for use with the model
24. Relationships between the forecaster and manager
25. Anticipated height of project
26. Geographic location of the project

The original table also marks, against this list, the criteria identified by Akintoye et al. (1992) in Nigeria and by Bowen and Edwards (1998) in South Africa. Other criterion found in Nigeria: (1) expected frequency of model use. Other criteria found in South Africa: (1) cost of project; (2) client sophistication.

Although the approximate quantities method is generally thought to be more accurate (Fortune and Lees 1996), as it utilises more data, a recent study surprisingly found the opposite result: that the floor area method is significantly more accurate than the approximate quantities method (Skitmore and Drew 2003). Moreover, the approximate quantities technique requires more detailed design information and more time to prepare. By contrast, the floor area method is very rough, and requires much less information and effort to produce. As design decisions that are made before the completion of the sketch design stage are far more important than those that are made afterwards (refer to Figures 2-2 and 2-3), it would be more worthwhile, from a cost and benefit perspective, to spend time on improving the early stage forecast.

2.7 Problems of Existing Forecasting Practice

2.7.1 Misconception of the relationship between level of detail and forecasting accuracy

Bennett et al. (1979) found that some forecasters applied quite detailed estimating techniques at a very early stage of the design planning process without taking into account the corresponding accuracy. Practising forecasters generally believe that forecasts that are produced from more detailed quantities are more accurate. Thus, forecasters usually attempt to measure quantities in as much detail as possible within the limitations of the available data and the allowable time. This explains why forecasters ranked the three model selection criteria of “amount of project data available” (most important), “data needed for model” (second most important) and “time available for forecast preparation” (fourth most important) so highly in the model selection criteria survey. The conviction of practising forecasters that the more detailed a forecast, the higher its accuracy, remains a proposition only.

Skitmore (1991) highlights the need for the assessment of the performance of individual techniques: “the standard construction price forecasting texts all assert that more detailed forecasting techniques such as those using approximate quantities are ipso facto necessarily of better quality than coarser techniques such as the floor area method . . . very little research seems to have been attempted in establishing the validity of this assertion or of the relative quality of individual techniques” (p. 12). Ironically, empirical evidence of forecasting accuracy reveals that very little improvement can be made to overall accuracy simply by increasing the level of detail and the complexity of quantity-based methods (Ashworth and Skitmore 1983; Ross 1983; Morrison 1984; Beeston 1987). This could be because factors such as the type, size and shape of buildings, which are not counted in quantity-based methods, have a greater significance, and because prices are closely related to market forces and therefore, to an extent, are divorced from actual costs (Skitmore 1995). More empirical studies on forecasting accuracy are reviewed in Chapter 4.

2.7.2 Lack of theoretical background

In the preface of the book, Building as an Economic Process: An Introduction to Building Economics, Bon (1989) raises the question “why then is building economics developing at such a sluggish pace, and what are the reasons for its lack of professional recognition?”

He opines that it is because the field lacks a theoretical foundation. Although effort has been expended in the development of advanced forecasting systems, theoretical development has not been forthcoming (Skitmore 1988). With the assistance of information technology today, forecasting researchers are now faced with an unmanageable amount of data but no theoretical basis for analysis (Skitmore and Patchell 1990).

2.7.3 Lack of performance evaluation

Forecasters generally assume that a forecast is correct, and that the error is in the difference between the forecast and the tender price (Morrison 1983; Fortune and Lees 1996). In his study of cost planning and the forecasting techniques that are used in practice, Morrison (1983) finds that no forecaster in practice monitors their own forecasting performance against received tenders. Forecasters are too optimistic about their own forecasting performance, and pay very little explicit attention to the confidence limits that are attached to the forecasted range of prices within which the eventual outcome is expected to fall (Bowen and Edwards 1985a). Practitioners often neglect the importance of producing accurate forecasts. An opinion survey that was conducted amongst architects and quantity surveyors found that a significant number of respondents expected a great degree of accuracy from the price forecasts that are produced by quantity surveyors (Bowen and Edwards 1985a). Empirical studies also show that clients are generally dissatisfied with the quality of the strategic cost advice that is provided by their professional advisors (Ellis and Turner 1986; Proctor et al. 1993). These studies reveal that there is room for forecasters to improve, that forecasters have traditionally had no awareness of their own performance, and that forecasters should monitor and find ways to improve the quality of their cost advice to satisfy the needs of their clients.

2.7.4 Inexplicability, unrelatedness and determinism

The use of forecasting methods in practice is subjective, although research studies on the formalisation of the model selection process have been carried out (Fortune and Hinks 1997, 1998). Forecasters sometimes use a mixture of different techniques to manage the forecasting task without a clear rationale. For example, a forecaster may use the floor area method to forecast a part of the work for which there is little data to refer to, and use the approximate quantities method for the rest of the work, for which more detailed data exist. These conventional methods were mainly developed by rule of thumb without any attention being paid to the theory behind them, and their use in combination is theoretically baseless. The reliability of the forecasts that are produced by conventional methods depends on the reliability of each quantity value, the reliability of each unit price rate value, the number of items and the collinearity of the quantity and rate values (Skitmore and Patchell 1990).
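One way to see why these four factors matter is to write the conventional forecast as a sum of quantity-rate products and to approximate its variance. The expression below is a first-order illustration under simplifying assumptions (quantities independent of rates, and of one another); it is not Skitmore and Patchell's own formulation.

```latex
% Illustrative first-order (delta-method) approximation; q_i and r_i are the
% quantity and unit rate of item i, and bars denote their expected values.
\[
P=\sum_{i=1}^{N} q_i r_i,
\qquad
\operatorname{Var}(P)\;\approx\;
\sum_{i=1}^{N}\Bigl(\bar r_i^{\,2}\operatorname{Var}(q_i)+\bar q_i^{\,2}\operatorname{Var}(r_i)\Bigr)
+2\sum_{i<j}\bar q_i\,\bar q_j\operatorname{Cov}(r_i,r_j).
\]
```

The first sum grows with the number of items and with the unreliability of each quantity and rate value, while the covariance term shows how collinearity between the rate values inflates, or dampens, the overall variability.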

It is doubtful, however, that unit price rates that are derived deterministically from a number of historical projects can produce accurate forecasts. Moreover, to use process-biased data (e.g. historical price rates, which tend to reflect the utilisation of available resources) for design-biased forecasting models would be to imply either that production methods do not differ, or that differing production methods do not significantly affect cost, both of which are patently untrue (Bowen and Edwards 1985a). Furthermore, the supposition that forecasts will be accurate only if the quantities and unit price rates can be determined ignores the variability of unit price rates (Flanagan and Norman 1983). There is no explicit qualification with regard to the inherent variability and uncertainty of the conventional models.

To conclude, conventional forecasting methods and approaches suffer from their inexplicability, unrelatedness and determinism (Brandon 1982; Wilson 1982; Taylor 1984; Bowen and Edwards 1985a, 1985b; Bowen et al. 1987). In short, these approaches fail to explain the systems they purport to represent, fail to show the relationship and interdependency between the variables and fail to consider the variability and uncertainty of forecasting.

2.8 Summary

Building price forecasting is a sub-process of cost planning. It helps decision makers to be aware of their probable financial commitments before extensive design work is undertaken. After all, a decision to build can be put forth, or alternative plans compared, only if a reasonable forecast can be made. Although the forecasts that are made at different stages have similar functions, their levels of influence are different, because a design decision that is made in the early stages is more cost sensitive than the same decision made later. Thus, early stage forecasts play an influential role in the final value of buildings. At the early design stage, the information that is available for forecasting is usually very rough, and practising forecasters use simple single unit methods, such as the floor area method, to accomplish the forecasting task. Practitioners generally believe that accuracy is proportional to the level of detail of the forecast. This perception is reflected in their choice of forecasting model and in their view that the amount of data available is the most significant selection criterion.

Paradoxically, the simple floor area method is found to be more accurate than the detailed approximate quantities method. A few empirical surveys on forecasting practices have been undertaken in different countries. They all show that conventional forecasting methods, such as the floor area and approximate quantities methods, still dominate, despite the fact that plenty of new alternative models have been developed. Several problems of existing forecasting practices are identified. They include the misconception of the relationship between level of detail and forecasting accuracy, the lack of theoretical background, the lack of performance evaluation, and the inexplicability, unrelatedness and determinism that are rooted in the forecasting approach. Therefore, the direction of development for new models should focus on the features of logical transparency (i.e. being theoretically supported), interdependence (i.e. showing the relationships between variables) and stochastic variability (i.e. allowing the output to be expressed in probability terms). The performance of new models should also be measured empirically to demonstrate their forecasting ability.


Chapter 3

Development of Forecasting Models

If the moon's face is red, of water she speaks. Saying of the Zuni Indians of the Southwest

3.1 Introduction

The first recorded forecasting method was the cube method, which was invented about 200 years ago (Skitmore et al. 1990 p. xix). The more widely used floor area method was developed around 1920 (Skitmore et al. 1990 p. xix). Starting from the mid-1950s, more and more research has focused on the development of alternative forecasting cost models.

One of the pioneers, James, developed the storey enclosure method in 1954 as an alternative to the floor area and cube methods for early stage forecasting. As a method developed 50 years ago, it possesses the inherent problems that are explained in Chapter 2. However, James identifies some possible variables other than total floor area and building volume that might influence building cost. These variables attempt to explain the variability of building shapes, the vertical positioning of the floor areas, storey heights and the presence of basements in the design of a building. The author also demonstrates (although only through a very crude comparison) that the accuracy of his proposed storey enclosure method is greater than that of the floor area and cube methods. The storey enclosure model is considered to be the most sophisticated of all the single price-rate models (as elaborated in Section 3.8) that are used for forecasting in the early design stage, but despite the empirical evidence of its performance, it has not been widely used by practising forecasters.

The conventional methods, such as the approximate quantities and elemental cost methods, are cost models that express building costs as a function of quantities and unit rates. Extensive studies on the subject of cost modelling were conducted in the mid-1970s, when researchers started to apply statistical techniques to modelling. A wider variety of cost modelling techniques in the categories of simulation, generation and optimisation have been developed in the past 30 years. However, much of the research on model development focuses on the ways in which new alternative models differ from other models, and stresses their uniqueness. There is a lack of clear demonstration of the applicability of these models, which is considered to be the biggest obstacle to their practical application.

3.2 Definition of Cost Model

The English word, ‘model’, comes from the Latin word, ‘exemplum’, which means the manner, fashion, or example to be followed, a precedent and an example of what may happen.

A model is a representation of a structure, or an “organised body of mutually connected and dependent parts” (Holes 1987). The etymology suggests that a model only represents the general picture of what may occur. It is clear enough from this definition that uncertainties exist within a model. A model that is developed from historical information or experience can represent reality, but it does not thereby become reality (Beeston 1974; Bowen 1984). Seeley (1996 pp. 202-203) defines the word ‘model’ as “a procedure developed to reflect, by means of derived processes, adequately acceptable output for an established series of input data”. Therefore, a building cost forecasting model is a system that produces forecasted prices (output) from historical data (input). Beeston (1987 p. 46) considers that all forecasting methods can be described as cost models, which are classified as in-place quantity-based, descriptive or realistic, and that their task may be to forecast the cost of a whole design or of an element of it, or to calculate the cost effect of a design change. Cost models are technical models that are used to assist in the evaluation of the financial implications of building design decisions (Maver 1979).

Skitmore and Marston (1999 pp. 2-4) differentiate technical models from isomorphic models. The former type features an important step in the abstraction of the most significant influencing elements at the beginning of the model development process, whereas the latter type involves the mapping of every influencing element within the results, which is expensive and is not cost effective. As buildings are composed of thousands of items, involve hundreds of companies in their production and take years to complete, the number of elements that influence building costs is huge. Building a cost model requires the selection of a sub-set of major influencing elements, which is an exercise in cost-benefit trade-off. Even if the resources were available, it is impossible to construct an isomorphic model for building costs due to individual variation between projects (Kenley and Wilson 1986).

The purposes of cost models are to forecast the total cost that the client will have to pay for the building at any stage in the design evaluation, to compare a range of actual design alternatives at any stage in the design evolution, to compare a range of possible design alternatives at any stage in the design evolution, and to forecast the economic effects upon society of changes in design codes and regulations (Skitmore and Marston 1999 p. 9).

3.3 Brandon’s “Paradigm Shift”

Although many experimental cost models were generated in the 1970s (for example, Buchanan (1972), Regdon (1972), Kouskoulas and Koehn (1974), Braby (1975), McCaffer (1975), Wilson and Templeman (1976) and Flanagan and Norman (1978)), few were able to challenge the existing forecasting approaches. Nobody had probed the possibility that the existing forecasting models might actually be wrong until Brandon (1982) addressed the need for a paradigm shift in building cost research. He doubted the reliability of existing forecasting models, and urged the development of a cost model that is founded on solid theory. With the assistance of computer technology, which makes complicated calculation much easier than before, simulation is suggested as the direction for further research investigation, because it gives a better understanding of why certain costs arise. This new approach sets out more explicit and sound criteria for model development. Brandon’s view is inspiring and visionary. In response to Brandon’s suggestion, Bowen and Edwards (1985a) review the existing paradigm, and address who needs a new paradigm and why it must be a new one.

The authors believed that the new approach to cost modelling and price forecasting after the shift would entail the recognition both of the continuing need for historically derived data in the exploration of cost trends and relationships, and the recognition of the importance of the building process by the incorporation of significant aspects of resource utilisation into the estimation methods. They also believed that the new approach would insist on inferential statements backed by statistically reliable data, that the approach would be stochastic in creatively dealing with future uncertainty through the use of probabilistic techniques, and that it would simulate reality and be capable of demonstrating the strength and associative characteristics of the relationships that exist between the factors involved. Forecasters would then profit from the knowledge base that would be gained through their expert understanding of the field, and be capable of using this systematically to provide logically coherent solutions to cost modelling and price forecasting problems.

Beeston (1987) does not rule out the use of descriptive methods (that is, those that contain variables that describe the design and its environment by measurements of such factors as size, shape, type of construction and location), despite their inherent deficiencies. He considers that they would be suitable both for forecasting at the early planning stage and for forecasting the maintenance costs of estates. Both Beeston (1987 p. 18) and Bowen et al. (1987) suggest that the development of modelling systems for the purpose of design economics should attempt to represent as closely as possible the way in which costs are actually incurred. As is highlighted in Chapter 2, the conventional approach is ill equipped because of its inexplicability, unrelatedness and determinism, and thus the development of new cost models should shift towards logical transparency, interdependence and stochastic variability (Bowen et al. 1987).

3.3.1 Black box versus realistic models

There are two distinct ways of representing costs – the realistic approach and the “black box” approach (Beeston 1983). The realistic approach attempts to represent the ways in which costs arise, whereas the black box approach does not. The former approach identifies all of the direct causes of cost, and measures them directly. This involves the detailed comparison of methods and prototype structures, and thus this approach has the best potential accuracy. However, the data that is required for the realistic approach is extremely difficult, if not impossible, for forecasters who represent clients to acquire (Hardcastle 1984). Although it is possible, even at the early design stage when information is scant, to use the realistic approach through the simulation of production operations (such as CPS (Bennett and Ormerod 1984) and CASPAR (Thompson and Willmer 1985)), forecasters still prefer to use black box models. This is partly because the way that cost is incurred is not a perfect function of the building design, and thus forecasters have to make additional assumptions to convert design information into production information if the realistic approach is used. These additional assumptions will inevitably create extra complications.

Thus, models for very early stage forecasting are unavoidably inexplicable, but their performance can still be judged; indeed, the justification for the black box approach rests on its actual performance. This is measured by comparing the output of a model that is based on the black box approach in response to certain stimuli with the output of the prototype under the same stimuli.
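A minimal sketch of such a performance test, using hypothetical figures rather than the data of this research, is to compare the model's forecasts with the lowest tenders eventually received and to summarise the percentage errors as a bias (mean error) and a consistency (spread of errors) measure:

```python
# Minimal sketch (assumed data, not from the thesis): judging a black box
# forecasting model purely by its output against the received lowest tenders.
import statistics

forecasts = [52.0, 61.5, 47.2, 88.0, 39.4]       # model output, HK$ million (hypothetical)
lowest_tenders = [55.1, 60.2, 50.0, 83.5, 41.0]  # eventual lowest tenders (hypothetical)

# Percentage errors of each forecast relative to its lowest tender
errors = [(f - t) / t * 100 for f, t in zip(forecasts, lowest_tenders)]

bias = statistics.mean(errors)          # systematic over- or under-forecasting
consistency = statistics.stdev(errors)  # spread of errors around that bias

print(f"bias = {bias:+.1f}%, consistency (sd of errors) = {consistency:.1f}%")
```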

Choosing which of them to use depends on the purpose of the model

(Skitmore and Patchell 1990).

The realistic approach needs structural validation to

test its soundness, but it has the benefit of being explanatory. However, the black box approach uses model performance in model testing.

3.3.2 Deterministic versus stochastic models

A model without a formal measure of uncertainty is, by definition, a deterministic model. Conventional models generally give only a single-figure estimate as their output without recognising the reality of the inherent variability and uncertainty, and are thus deterministic models. The variability and uncertainty are not formally assessed, but are more often dealt with intuitively by forecasters. By contrast, if the duration and cost of activities or groups of activities are recognised as being uncertain, then they will be modelled as stochastic variables using a probabilistic approach (Bowen and Edwards 1985a). Formal measures of uncertainty may be articulated as the associated coefficient of variation (as in regression) or the cumulative frequency distribution (as in the Monte Carlo simulation) (Newton 1990). The application of probabilistic approaches to the problems of building economics has been demonstrated through various studies, such as those of Spooner (1974), Mathur (1982), Wilson (1982) and Diekmann (1983). Despite the different considerations of uncertainty that are discussed, the earlier studies do not challenge the validity of the hidden assumptions, for example, that the events that are simulated are independent events, and that the use of normal and rectangular frequency distributions is appropriate in the application of the Monte Carlo simulation (Raftery 1984b). More recent works by Chau (1995a; 1995b) and Wall (1997) validate these assumptions in their application of the Monte Carlo simulation. The test of underlying assumptions in the modelling process is an indication of the sophistication of the simulation techniques.
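The flavour of the probabilistic approach can be sketched as follows; the elemental cost ranges and the triangular distributions are assumptions made purely for illustration and are not taken from the studies cited above.

```python
# Illustrative Monte Carlo sketch (hypothetical elements and ranges): each
# elemental cost is treated as a stochastic variable and the total price is
# reported as a cumulative distribution rather than a single figure.
import random

# (low, most likely, high) elemental costs in HK$ million, purely hypothetical
elements = {"substructure": (8, 10, 14), "frame": (25, 30, 38), "finishes": (12, 15, 21)}

def simulate_total():
    return sum(random.triangular(lo, hi, mode) for lo, mode, hi in elements.values())

totals = sorted(simulate_total() for _ in range(10_000))
for p in (0.10, 0.50, 0.90):
    print(f"{int(p * 100)}th percentile: {totals[int(p * len(totals)) - 1]:.1f}")
```

Instead of a single figure, the output is a cumulative distribution from which confidence limits can be read directly.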

3.3.3 Deductive versus inductive models

Approaches to modelling cost in construction can also be classified as deductive or inductive (Wilson 1982; Raftery 1984b). Models that are developed from the former approach involve the analysis of cost data over the design variables (whichever are being considered) with the objective of deriving formal mathematical expressions that succinctly relate a wide range of design-variable values to price. This approach draws heavily upon the techniques of statistics, and of correlation and least-squares regression in particular. Deductive models arise largely from the following equation:

P = f_1(V_1, V_2, V_3, ..., V_n),    (3.1)

where P is the forecasted price, which is a function, f_1, of the design variables V_1, V_2, V_3, ..., V_n. The crucial constraints of the deductive approach include the not inconsiderable limitations of the statistical techniques that are available for modelling, and the total dependence upon the suitability of the cost data used.

Inductive models do not involve the analysis of a set of given cost data, but rather the synthesis of the costs of individual discrete design solutions from the constituent components of the design. Inductive methods require the summation of cost over some suitably defined set of subsystems that are appropriate to the building design. The most detailed level of subsystem definition would be the individual resources themselves, but several other levels of aggregation are in common use, for example, operational activities and constructional elements. Inductive models arise largely from the equation:

P′ = Σ_{j=1}^{n} f_j(C_j),    (3.2)

where P′ is the forecasted price, which is the summation of each cost function f_j of the resources committed, C_j, for j equal to 1 to n, where n is the total number of subsystems that represent the prices. In deductive models, the techniques of statistical inference are used to deduce the relationships between building features or design models, whereas in inductive models the resource implications of design decisions are calculated and aggregated to measure economic performance. Thus, the former models are more relevant to the early design stages and the latter models to the later design stages.
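As a hedged illustration of the two routes in equations (3.1) and (3.2), the sketch below fits a deductive model by least squares on two invented design variables and then prices the same scheme inductively by summing invented subsystem costs; none of the figures come from the data used in this research.

```python
import numpy as np

# Deductive route (equation 3.1): derive f1 from historical projects, here by
# ordinary least squares on two design variables (gross floor area, storeys).
V = np.array([[10_000, 20], [15_000, 30], [8_000, 12], [22_000, 40]], dtype=float)
P = np.array([95.0, 150.0, 70.0, 230.0])          # historical prices, hypothetical
X = np.column_stack([np.ones(len(V)), V])
coef, *_ = np.linalg.lstsq(X, P, rcond=None)
new_design = np.array([1.0, 12_000, 25])
print("deductive forecast:", new_design @ coef)

# Inductive route (equation 3.2): sum cost functions over defined subsystems.
subsystem_costs = {"substructure": 11.0, "frame": 34.0, "services": 27.0, "finishes": 18.0}
print("inductive forecast:", sum(subsystem_costs.values()))
```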

3.4 Major Directions of Model Development

Newton (1990) classifies nine descriptive primitives for cost modelling studies: data, units, usage, approach, application, model, technique, assumptions and uncertainty. Table 3-1 briefly explains the meaning of each primitive and its corresponding classification criteria. The descriptive primitives for this research are also exhibited in the table. Table 3-2 shows a summary of the reviewed research studies on modelling techniques and applications according to Newton’s classification. The number of studies on modelling techniques for the early design stages (feasibility and sketch design) exceeds the number for the later design stages. This circumstance seems reasonable because, as is discussed in Chapter 2, design decisions are more cost sensitive at the early stage than they are at the later stage, and the potential benefit of developing a good model for the early design stage is therefore greater. Thus, the development of designers’ forecast models focuses on their application in the early stages of design.

Skitmore and Patchell (1990) review all of the modelling techniques that have been developed in the building and process plant industries. The authors differentiate the various techniques one by one according to their characteristics or primitives, which include the mathematical model, relevant contract type, general accuracy, whether the technique itself is deterministic or probabilistic, the number of variables, the type of variables, the characteristics of the quantities (derivation, deterministic or probabilistic (quantity model), derivation database) and the characteristics of the rates (weighting, current, quantity trend and deterministic or probabilistic (rate model)). A summary of the characteristics of the various techniques identified is shown in Table 3-3. To summarise the development of cost models, Skitmore and Patchell conclude that research has developed with differing emphasis on all four of the factors that influence estimation reliability, although much system development has been centred at the item level, involving the search for the best set of predictors of tender price (regression analysis), the homogenisation of database contracts by weighting or proximity measures (the BCIS and Lu Qian systems), the generation of items and quantities from contract characteristics (Holes, Calculix and expert systems such as ELSIE) and the quantification of overall estimate reliability from assumed item reliability (probabilistic models (PERT-COST) and simulation).

Since the 1990s, a new class of tools, neural networks, has offered an alternative approach to cost forecasting (Li 1995; Adeli and Wu 1998; Bode 1998; Emsley et al. 2002; Kim et al. 2004). Neural network models are black-box in nature and usually involve complicated algorithms.

The superiority of neural network models over other mathematical models lies in their ability to learn and adapt their own representation during the model training process. Although many studies have demonstrated their outstanding performance, especially in terms of error reduction, it is doubtful that practising forecasters understand these models, or have even heard of them. As suggested by Fortune and Lees’ report on the relative performance of new and traditional cost models in providing strategic advice for clients (as addressed in Section 2.6), the possibility that many practising forecasters are not well equipped enough to understand and use these models could be a big hurdle to their real-life application.

Table 3-1: Classification of this research according to Newton’s descriptive primitives

Data
  Explanation: whether the data specifically relates to a type of design proposal or not
  Suggested classification: specific or non-specific
  This research: specific

Units
  Explanation: whether the unit is in abstract form, a unit of finished works or a unit of as-built works
  Suggested classification: abstracted, finished or as-built
  This research: finished works (floor area, external wall (building perimeter and storey height) and roof area of the final product)

Usage
  Explanation: whether the purpose is designers’ price estimation or builders’ bidding
  Suggested classification: cost or price
  This research: price

Approach
  Explanation: whether it is implemented for estimation of the whole building cost or of a particular component or part
  Suggested classification: macro or micro
  This research: macro approach

Application
  Explanation: when the model is applied in the design process
  Suggested classification: feasibility, sketch, detailed, tender, throughout, non-construction
  This research: feasibility (or very early sketch)

Model
  Explanation: common classification of techniques
  Suggested classification: simulation, generation or optimisation
  This research: simulation

Technique (see also Table 3-2)
  Explanation: type of technique used
  Suggested classification: dynamic programming, expert system, functional dependency, linear programming, manual, Monte Carlo simulation, networks, parametric modelling, probability analysis, regression analysis
  This research: regression analysis

Assumptions
  Explanation: whether assumptions can be accessed or not
  Suggested classification: explicit or implicit
  This research: explicit

Uncertainty
  Explanation: whether there is a formal measure of uncertainty or not
  Suggested classification: stochastic or deterministic
  This research: stochastic

Table 3-2: Previous studies on modelling techniques and applications according to Newton’s classification

Application

Dynamic programming

Feasibility Sketch Detailed Tender Throughout Non-construction

Expert system

Functional dependency

Feasibility Sketch Detailed Tender Throughout Non-construction

Previous works

Atkin (1987)

Brandon (1988), Lu (1988)

Feasibility

Wilderness (1964), Thomsen (1965), Bathurst and Butler (1977), Flanagan and Norman (1978) , Pegg (1984), Meijer (1987), Tan (1999)

Sketch

DOE (1971), Townsend (1978), Moore and Brandon (1979), Powell and Chisnall (1981), Scholfield et al. (1982), Langston (1983), Newton (1983), Weight (1987), Boussabaine and Elhag (1999)

Detailed Tender Throughout

Holes and Thomas (1982), Sidwell and Wottoon (1984), Berny and Howes (1987), Holes (1987), Woodhead et al. (1987)

Non-construction Linear programming

Manual

Feasibility Sketch Detailed Tender Throughout Non-construction

Russell and Choudhary (1980), Cusack (1985)

Feasibility Sketch Detailed Tender Throughout Non-construction

James (1954) Dunican (1960), RICS (1964), Barrett (1970) Gray (1982), PSA (1987), Munns and Al-Haimus (2000) Kiiras (1987), Dreger (1988)


Table 3-2: Previous studies on modelling techniques and application according to Newton’s classification (Cont’d)

Techniques

Application

Monte Carlo simulation

Feasibility Sketch Detailed Tender Throughout Non-construction

Networks

Parametric modelling

Feasibility Sketch Detailed Tender Throughout Non-construction

Regression analysis

Mathur (1982), Pitt (1982), Wilson (1982), Bennett and Ormerod (1984) Walker (1988) Gehring and Narula (1986)

Bowen et al. (1987), Brown (1987)

Feasibility

Tregenza (1972), Selinger (1988)

Sketch

Nadel (1967), Meyrat (1969), Southwell (1971), Tregenza (1972),Brandon (1978) , Selinger (1988), Warszawski (2003)

Detailed Tender Throughout Non-construction Probability analysis

Previous works

Park (1988)

Feasibility Sketch Detailed Tender Throughout Non-construction

Zahry (1982), Cusack (1987), Pegg (1987)

Feasibility

Buchanan (1972), Regdon (1972), Kouskoulas and Koehn (1974), Braby (1975), McCaffer (1975), Wilson and Templeman (1976), Bathurst and Butler (1977), McCaffer et al (1984), Karshenas (1984) Gould (1970), Buchanan (1972), Sierra (1982), Yokoyama and Tomiya (1988), Skitmore and Patchell (1990)

Sketch Detailed Tender Throughout Non-construction

Fine (1980) Skitmore (1982)

Khosrowshahi (1988)


Table 3-3: Summary of estimating techniques (extracted from Skitmore and Patchell 1990)

(For each technique: model; relevant contract type; general accuracy (c.v.); deterministic or probabilistic; number of variables; type of variables.)

Unit: P = qr; all contract types; 25-30%; deterministic; single variable; any comparable unit, e.g. tonne of steelwork, metre of pipeline.

Graphical: P = f_r(q); process plant; 15-30%; deterministic; few variables; ditto.

Functional Unit: P = qr; buildings; 25-30%; deterministic; single variable; ditto, e.g. number of beds, number of pupils.

Parametric: P = f_r(q_1, q_2, q_3, ...); process plant; 15-30%; deterministic; few variables; process parameters, e.g. capacity, pressure, temperature, material, cost index.

Exponent: P_2 = P_1 (q_2 / q_1)^r; process plant; 15-30%; deterministic; single variable; size of plant or equipment, e.g. capacity.

Factor: P = Σ_{i=1}^{m} fact_i Σ_{i=1}^{N} q_i r_i, with (a) m = 1 (Lang method), (b) m > 1, fact_1 ≠ fact_2, etc. (Hand method), (c) fact_i = U(α_i, β_i) (Chiltern method); process plant; 10-15%; deterministic; few variables; any.

Comparative: P_2 = P_1 + Σ_{i=1}^{N} (p_{2i} − p_{1i}); all contract types; 25-30%; deterministic; few variables; depends on the differences.

Interpolation: P = qr; buildings; 25-30%; deterministic; single variable; gross floor area.

Conference: P = f(P_1, P_2, ...); process plant; accuracy unknown; deterministic; any number of variables; any.

Floor Area: P = qr; buildings; 20-30%; deterministic; single variable; gross floor area.

Cube: P = qr; buildings; 20-45% (based on 86 cases); deterministic; single variable; volume.

Storey Enclosure: P = qr; buildings; 15-30% (based on 86 cases); deterministic; single variable; floor area, external wall area, basement wall area and roof area.

BQ Pricing (i) Conventional: P = Σ_{i=1}^{N} q_i r_i; construction; 10-20% (5-8% for builders); deterministic; many variables (number varies); quantities required under SMM.

BQ Pricing (ii) B Fine: P = Σ_{i=1}^{N} q_i r_i; buildings; 15-20%; deterministic; many variables (number varies); ditto.

Significant Item Estimating: P = Σ_{i=1}^{N} q_i r_i; PSA buildings; 10-20%; deterministic; medium number of variables; quantities required under SMM.

Approximate Quantities (i) Conventional: P = Σ_{i=1}^{N} q_i r_i; construction; 15-25%; deterministic; medium to many variables; combining quantities and items required under SMM.


Table 3-3 (Cont’d): Summary of estimating techniques (Extracted from Skitmore & Patchell 1990) Estimate Technique

Model

Relevant Contract Type

General Accuracy (c.v.)

Deterministic / Probabilistic

Number of variables

Type of variables

Buildings

15-25%

Deterministic

Few to medium

Ditto

Buildings

15-25%

Deterministic

Few to medium

Ditto

Buildings

25% (based on 17 cases)

Deterministic / Probabilistic

Few to medium

Ditto

Buildings

50% (based on 17 cases)

Deterministic / Probabilistic

Few to medium

Ditto

Buildings

30% (based on 17 cases)

Deterministic / Probabilistic

Few to medium

Ditto

Buildings

20-25%

Deterministic

Medium

BCIS/Cl afb entities (UK), individual company manual (HK)

Buildings

20-25%

Deterministic

Medium

Similar

Deterministic

Medium

DBE

Approximate Quantities: N

(ii) Gleeda

P = ∑ q i ri

(iii) Gilmore

P = ∑ q i ri

(iv) Ross 1

P = ∑ q i ri

i =1 N

i =1 N

i =1

N

(v) Ross 2

p + ∑ q i ri i =1

N

p + ∑ q i ri i =1

(vi) Ross 3 (Pi = a + bqi + e, e = N(0,σ2) N

Elemental

P = ∑ q i ri

CPU

P = ∑ q i ri

Elsie

P 2 = ∑ q i ri

Norms (schedule)

P 2 = ∑ q i ri

i =1 N

i =1

N

i =1

Offices

N

i =1

Buildings

Many (number SMM type, e.g. PSA schedule of variables varies)

10-20%

Deterministic

15-25%

Deterministic / Probabilistic

Few

Usually contract characteristics, e.g. floor area, no. of storey

Buildings

?

Deterministic

Few

Usually contract characteristics, e.g. floor area, no. of storey

All

5-8%

Deterministic

Many (number of variables varies)

Resources, e.g. man hours, materials, plant

All

N/A

Probabilistic

Number of variables varies

Usually time resources, e.g. man hours

Buildings

6.50%

Probabilistic

Usually few

Resources, e.g. man hours, materials, plant

Construction

N/A

Probabilistic

Usually few

Any

Buildings

N/A

Deterministic

Any

Any

N

Regression

P = a + ∑ qi bi + e i =1

e = N(0,σ ) 2

N

Lu Qian

P = ∑ q i ri

Resource (Scheduling Activity, Operational)

P = ∑ q i ri

i =1

N

i =1

All

N

P = ∑ pi

PERT-COST

CPS

i =1

where pi = N(qiri,σ i2) N

N

i =1

i =1

P = ∑ t i ri +∑ ni r

where ti = F(µi,σ i2) N

Risk Estimating

P = ∑ q i ri

Homogenised Estimating (BCIS on line) (BICPE etc.)

P = ∑ q i ri

i =1

N

i =1


3.5 Limitations of Cost Models

3.5.1 Model assumptions

Models are only ever a representation of reality, and forecasting models are always non-isomorphic models that are simplifications of an actual system. Every model has a set of inherent assumptions about problem boundaries, about what is or is not significant and about how the user might best conceptualise a problem (Newton 1990). Regardless of whether the assumptions of a model are explicit or implicit, it is always possible to devise tests that show models to be deficient in some way or other. This implies that models should be used with care, and should not be pushed beyond the limits of their validity (Skitmore and Marston 1999).

Designers’ forecast models are structured to represent completed buildings or their components. However, the origin of the price of a building or a component should be based on the construction process and the resources that are employed. To modify this kind of price data to suit a designers’ forecast model, an implicit assumption must be made that the actual buildings in the data pool are so similar that their production methods do not differ, or that differing production methods do not significantly affect cost. Obviously, these assumptions are untrue (Bowen and Edwards 1985a).

3.5.2 Reliance on historical data for prediction

All forecasting models demand historical data as inputs for prediction. The Wilderness Group (1964, pp. 254-255) point out two limitations to using historical data: “it is almost impossible to find the actual buildings which are sufficiently similar for their differences in cost to be related to particular factors and . . . it was found impracticable merely with historical data to isolate with any certainty and the effect upon buildings cost of certain design factors individual to those buildings of which the costs were examined”. Moreover, Bowen and Edwards (1985a) criticise the use of mathematics in historical data for modelling, because it fails to reflect the change in technology over time. It is also debateable whether backward-looking concepts that are based on historical price data should be used for forecasting. Bon (1989 pp. 61-62) explains the problem:

“Ex ante or forward-looking concepts predominate in economics, while ex post or backward-looking concepts are more prevalent in accounting . . . Cost is often treated as the pre-eminent ex post concept. However, no matter how accurate and exhaustive our historical records, backward-looking concepts of cost are inadequate for two reasons. First, at the moment of decision one is perforce considering future costs. Second, the valuation of costs is impossible without explicit account of opportunity cost - the satisfaction forgone . . . The difficulties with cost forecasting based on historical data are exacerbated in the case of long-lived capital goods, such as buildings.”

3.5.3 Insufficiency of information and preparation time

The preparation of a forecast relies heavily on information input from external organisations, such as the client’s brief and the designers’ layout plans, and on the information that is available within the organisation, such as historical price data. It is quite common that the design information that is given at the early design stage is ambiguous and contradictory. The very limited information and allowable time for producing forecasts may force forecasters to make assumptions according to their own subjective judgements. For instance, forecasters will usually rely on price data that is derived from a sample of buildings that do not perfectly match the characteristics of the proposed building or works if appropriate historical price data are unavailable (Flanagan and Norman 1983).

3.5.4 Reliance on expert judgment

Forecasting is partly an art and partly a science. The science part involves the use of modelling techniques and mathematics. The art part comes with the exercising of professional judgement. Tversky and Kahneman (1974) suggest that in making judgements in uncertain conditions, people in general do not follow the calculus of chance or the statistical theory of prediction, but instead rely on a number of simplifying strategies or heuristics that direct their judgements. Such heuristics (rules of thumb) can sometimes lead to reasonable judgements and sometimes to severe and systemic errors. The exercise of judgement therefore rests with the cost forecaster, rather than with the forecasting model itself (Skitmore et al. 1990). Raftery (1995) and Birnie (1995) point out that humans make mistakes when making judgements, and state that more work is needed to understand the behavioural processes that are involved. Empirical evidence shows that judgement has a significant role within the formulation and transmission of early cost advice to clients (Fortune and Lees 1996). As the exercise of judgement is a human cognitive process, it can be subject to error, bias and heuristics.

3.6 Review of Cost Models in Use

Fortune and Lees (1996) study the incidence of the use of certain techniques, and the extent to which lack of understanding is a factor that influences that incidence, in their research on the relative performance of new and traditional cost models. The studied techniques are classified into seven categories: traditional (conventional) techniques, statistical techniques, knowledge-based techniques, life cycle costing, resource- and process-based techniques, risk analysis and value-related techniques. The authors reveal that the use of conventional techniques outweighs the use of all of the other techniques, and that these other techniques were not well understood by respondents.

A more recent study by Fortune and Hinks on the models that are used by UK quantity surveying practices also reinforces the notion that practitioners have not yet answered the call of academia to adopt the new computer-based stochastic models that are available for the assessment of project risk and uncertainty (Fortune and Hinks 1998). The study also indicates that, in the period 1993 to 1997, conventional models that provide single-figure deterministic price forecasts had only a slightly reduced incidence of use, whereas newer computer-based models had only a slightly increased incidence of use, which suggests that the paradigm shift in the formulation of reliable early cost advice has not yet been achieved in practice (Fortune and Hinks 1998). A similar survey that studies the forecasting models that are used in South Africa also indicates that conventional models remain firmly in the mainstream in application (Bowen and Edwards 1998).

The demand for a move to a more scientific basis for forecasting appears to come mainly from academia, rather than from practice (Bowen and Edwards 1985, Raftery 1987 p. 53).

For the sake of producing publications, academics (modellers) have focused on the demonstration of how a newly developed model is different from other models. However, the conservative attitude and the ignorance of practitioners (forecasters) towards change and new knowledge create another hurdle (Brandon 1982). To initiate a paradigm shift, academia will have to convince practitioners by establishing and advertising the benefits that forecasters will enjoy from these alternative approaches. This could include educating forecasters and managers about how these new approaches can be applied and about how much better they are than the conventional approaches, and the heightening of their awareness of the inadequacy of the conventional approaches (Fortune and Lees 1996). A model that is new and mathematically sound for forecasting may not necessarily be appropriate for implementation. Thorough studies on the benefits of, and strategies for, putting a new model into practice are crucial to the acceptance of new models or forecasting approaches.


3.7 Significant Items Estimation

Surveys that have been conducted in the UK and South Africa have reinforced the fact that newer models are not popular in practice. More than 20 years after the proposal of a paradigm shift, the idea remains a pipe dream, and the popularity of conventional models remains unchanged. Perhaps the only new model that has been put into practice (although it is still not well recognised) is the significant items estimating model that was developed by the Property Services Agency (PSA) in the UK.

Barnes (1971) investigates the implication of the proposition that different values of rates have different degrees of reliability, and, specifically, that the reliability of a product of quantity and rate is an increasing function of its value. By assuming a constant coefficient of variation for each item, he shows that a selective reduction in the number of low-valued items has a trivial effect on the overall estimate reliability. The empirical evidence that backs up Barnes’ assumption is quite strong, and its essence has therefore been used to develop the significant items method. According to the outline that was published by the Department of the Environment of the UK government in 1987, the statement that “some 80% of the value of measured work on building projects is contained within 20% of the items in the bills of quantities” was tested by analysing the prices in 40 bills of quantities. It was found that 78% of the value was contained within the top 20% of items, which broadly confirmed the 80/20 relationship. By restricting measurement and pricing to the most significant items (the top 20% of items), and by using data with a reasonable sample size, it should be possible to minimise the unreliability of the rates from the bills of quantities (PSA 1987).
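A toy sketch of this selection rule, with an invented bill of quantities rather than the PSA's own procedure or data, ranks the items by value and keeps the smallest set that carries roughly 80% of the bill total:

```python
# Illustrative sketch of the 80/20 idea behind significant items estimating;
# the bill items and their values are invented.
bill_items = {f"item_{i}": v for i, v in enumerate(
    [420_000, 310_000, 150_000, 90_000, 60_000, 30_000, 18_000, 9_000, 6_000, 4_000])}

total = sum(bill_items.values())
significant, running = [], 0.0
for name, value in sorted(bill_items.items(), key=lambda kv: kv[1], reverse=True):
    significant.append(name)
    running += value
    if running >= 0.8 * total:
        break

share_of_items = len(significant) / len(bill_items)
print(f"{len(significant)} items ({share_of_items:.0%}) carry {running / total:.0%} of value")
```

On these invented figures, three of the ten items carry about 80% of the value, which is the effect that the 80/20 statement describes.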

The major benefits of the significant items estimating model include the shorter forecasting time that is required due to its concentration on fewer items, the improvement in accuracy, the improvement in reliability (because its outputs are derived from data from a large sample) and the flexibility it affords in allowing a move away from average rates towards varying percentage additions for each trade (Allman 1988). Munns and Al-Haimus (2000), in their work to refine the significant items model, reveal that there is a lack of formal rules for the selection of work packages to be used within the original significant items model, and therefore a potential to overestimate the cost of projects. By using their new methodology for selecting work packages and the refined technique that is known as the cost significant global cost model, they demonstrate that there is a significant improvement in performance over the original significant items model.

Although the use of the significant items model has shown significant improvements in performance, the actual contribution to the overall value of a building is limited, because it is a forecasting model for the later design stage. At this stage, the design information is quite sophisticated, and there is little room for cost saving or value enhancement. However, studies on the significant items model give an empirically supported demonstration of how conventional models, such as the approximate quantities method, can be further advanced by the use of statistical techniques.


3.8 Discussions on Research Opportunities

Having reviewed the development of, and limitations to, forecasting models, it seems reasonable to conclude that there is no universally agreed approach to modelling building costs. There is no general agreement on the most useful set of elements and functions for each of the model types, nor on how the models themselves and their values should be derived, nor is there any agreement on the nature of the functions that connect the cost with the various elements (Skitmore and Marston 1999, p. 19). In contrast, it appears that practising forecasters need commonly agreed models. Since the existing models in use have developed into a convention, forecasters can strictly follow them to prepare estimates (even though many of them were not developed strictly) without the need to worry that their choice of models will be challenged by other practitioners. However, the fact that every forecaster is using a model does not automatically validate that model. The conventional models deserve rigorous tests to justify their existence.

This research focuses on the study of conventional models used in the early design stage.

As explained in Section 2.5, the contribution of earlier forecasts is greater, because the earlier decisions that forecasts mainly influence are more cost sensitive. The successful experience of applying the significant items model in practice provides insights into the potential for developing a new model that is applicable to the early design stage. According to the RIBA outline plan of work (RIBA 1991), this early design stage corresponds to the period between the beginning of the feasibility stage and midway through the sketch design stage. Before the beginning of this period, referred to by the RIBA outline plan of work as the inception stage, there are no drawings available. Forecasters have to make their best guess by discretion, producing what is sometimes known in forecasters’ slang as “guestimates”. After the end of this period, when more information is available, such as formal sketch layout plans (as compared with the sketches produced during the early design stage), a few sketch elevation plans, draft specifications, and perhaps the schedules of finishes, doors and windows, forecasters can use more detailed methods (for example, the elemental cost estimating method) for forecasting.

Because of this, all the conventional methods used

by forecasters follow a single price-rate system.

Differing from the elemental

method, which is applied at a later stage, and the significant item method or the approximate quantities method, which is applied at an even later stage, the lack of information inevitably imposes higher uncertainty on forecasts prepared by single price-rate methods.

However, the two early conventional methods, the cube and

floor area methods, appear unable to extract all the information available from the sketches. The subsequent storey enclosure method, as described in Section 3.9, shows sophistication in attempting to extract some further, arguably all the major, information from the sketches, i.e. the area of each floor and the envelop area of a building. Although also following a single price-rate system, the storey enclosure method takes into account a additional aspects of building design economics. To avoid falling into the trap of developing something new without any theoretical base, as criticised by Raftery (1984b) and Newton (1990), the storey enclosure model that was developed by James in 1954 is chosen for further development.

Although the model shares many of the flaws of the single-unit deterministic models, it is considered to be the most sophisticated model, and is worth refining. Skitmore and Marston (1999, p. 164) suggest that the model has considerable potential for further development by statistical means. Ashworth (1999, p. 251) also suggests that in the past, credibility was a factor to be taken into account, but that it might be more acceptable today to apply the storey enclosure model.

After all, what is needed is an approach that will harness the strengths and minimise the weaknesses of all that has been developed to date (Bowen and Edwards 1985a). Patchell (1987) suggests four criteria to be observed for cost advice at the schematic or feasibility stage: cost accuracy from very preliminary information, a flexible and quick response to various options, economy of production in man and machine hours, and estimation and analysis on the same basis. These criteria are very practical, and are used as the requirements for the forecasting models that are developed in this research. The models that are developed in this research share similar justifications with the significant items model, but they are designed to be applied in the early design stage.

They should be understandable, easy to use, fairly accurate and

relatively reliable compared with the conventional forecasting models. The accomplishment of this research relies on the use of empirical data for both the model development and the assessment of model performance.

The

emphasis here is on the purely empirical nature of the model, which is thought to be the best way to avoid subjectivity, as the essence of good empirical research is to minimise the role of the researcher in interpreting the results of the study (Skitmore 1988). Like much of the research on empirical models, the modelling of prices in this research follows mainstream research by using statistical techniques such as regression analysis.

Hypothetical models that contain various groups of variables

are derived by multiple regression. The machinery of the approach is described in Chapter 5.

3.9 Storey Enclosure Method

Generally, all of the conventional methods for forecasting building prices at the early design stage are single-rate methods. Amongst them, the most commonly used method is the floor area method.

The floor area method is simple in

application, easily understood and produces a forecast quickly.

However, it is also

considered to be too simple to take into account the different characteristics of buildings. James (1954) criticised both the floor area method and the cube method. First, neither method is satisfactory for universal application. Second, neither method reflects the cost implications of building shape, building height and the number of storeys. Third, both single price-rate methods have to account separately for basements. Fourth, the cube method is sensitive to changing unit rates.

He proposed an alternative single-rate method, the storey enclosure

method, to overcome these limitations.

This method takes into account various

important aspects of design in building price forecasting, whilst leaving the type of structure and standard of finishes to be assessed in the price rate.

The factors to be

considered and the adjustments in the method to reflect those factors are shown in Table 3-4. The method involves multiplying each measured quantity by the weighting assigned to the corresponding adjustment in the table. The weightings suggested by James and the inclusions for each component are shown in Table 3-5.

The summation of the products of each measured value of the adjustment and its corresponding assigned weighting will form the storey enclosure area, which is the unit quantity of the storey enclosure method.

The product of an appropriate

single unit rate and the storey enclosure area will produce a forecasted price. Equations (3.3) and (3.4) represent the forecasting method:

P = S·R,   (3.3)

P = ( Σ_{i=0}^{n} (2 + 0.15i)·f_i + Σ_{i=0}^{n} p_i·s_i + 2·Σ_{j=0}^{m} f′_j + 2.5·Σ_{j=0}^{m} p′_j·s′_j + r )·R,   (3.4)

where P is the forecasted price, S is the storey enclosure area, R is the unit rate, fi is the floor area at i storeys above ground, pi is the perimeter of the external wall at i storeys above ground, si is the storey height at i storeys above ground, n is the total number of storeys above ground level, m is the total number of floors below ground level,

f’j is the floor area at

j floors below ground level, p’j is the

perimeter of the external wall at j storeys below ground level, s′j is the storey height at j storeys below ground level and r is the roof area.

Table 3-4: Adjustment for the factors affecting the estimates in the storey enclosure method

Shape of building: by measuring the external wall area.
Total floor area: by measuring the area of each floor.
Vertical positioning of the floor area in a building: by using a greater multiplier for the floor area of a suspended floor positioned higher in the building.
Storey heights of building: by the proportion of floor and roof areas to the external wall area.
Overall building height: by the ratio of roof area to external wall area.
Extra cost of sinking usable floor area below ground level: by using an increased multiplier for work below ground level.


Table 3-5: Weightings and inclusions for individual components in the storey enclosure method

Above ground components

Ground floor (weighting 2): internal partitioning, finishings, fitments, doors, etc., on the floor; a non-suspended floor; finishings on one side of it; and normal foundations to all vertical structural members in a single-storey building, including those of its external walls.

Upper floors (weighting 2 + 0.15 × number of floors above ground): internal partitioning, finishings, fitments, doors, etc., on the floor; a suspended load-carrying floor; finishings on both sides of it; vertical structural supports to it; and the further cost which arises, in the case of vertical structural floor supports to the lower floors of multi-storey buildings, from the need to support the additional transmitted load of all superimposed floors and the roof above them.

Roof (weighting 1): a suspended roof and its (lighter-than-floor) load; finishings on both sides of it (one weatherproof); horizontal structural supports to it (such as beams and trusses); and vertical structural supports to it (such as walls and columns).

External walls (weighting 1): a wall with weatherproof qualities; finishings on both sides of it; windows and external doors, etc.; and normal architectural features.

Below ground components

Floors below ground (weighting 2): displacement and disposal of earth; waterproof tanking and the loading skins to keep it in position; members of heavier construction than those required in equivalent positions above ground; finishings on one side of these members; internal partitioning, finishes, fitments, doors, etc.; and normal (in the basement sense) foundations to all vertical structural members in a single basement-storey building.

External walls below ground (weighting 2.5).
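To make the mechanics of equations (3.3) and (3.4) concrete, the following is a minimal sketch (in Python) of the storey enclosure calculation using the weightings from Table 3-5. The building dimensions and the unit rate below are hypothetical and serve only to illustrate the arithmetic; this is not part of James' original presentation.

```python
# Minimal sketch of the storey enclosure calculation in equation (3.4), using
# the weightings in Table 3-5. All dimensions and the unit rate are fabricated.

def storey_enclosure_price(floors, basements, roof_area, unit_rate):
    """floors:    list of (floor area, perimeter, storey height), index 0 = ground floor.
       basements: list of (floor area, perimeter, storey height) below ground.
       roof_area: plan area of the roof (weighting 1).
       unit_rate: price per unit of storey enclosure area."""
    s = 0.0
    for i, (f, p, h) in enumerate(floors):
        s += (2 + 0.15 * i) * f      # floor weighting: 2 at ground, plus 0.15 per storey above
        s += p * h                   # external wall area above ground, weighting 1
    for f, p, h in basements:
        s += 2.0 * f                 # floors below ground, weighting 2
        s += 2.5 * p * h             # external walls below ground, weighting 2.5
    s += roof_area                   # roof, weighting 1
    return s, s * unit_rate

# hypothetical five-storey block with one basement (areas in m2, lengths in m)
floors = [(500.0, 90.0, 3.5)] * 5
basements = [(500.0, 90.0, 3.0)]
area, price = storey_enclosure_price(floors, basements, roof_area=500.0, unit_rate=1200.0)
print(f"storey enclosure area = {area:.0f} m2, forecast price = {price:,.0f}")
```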

In James’ study, the proposed storey enclosure method is applied to 86 tenders for different building types.

The storey enclosure method is compared with

two other early stage methods – the superficial floor area method and the cube method.

James' results of the tests for the cube, floor area and storey enclosure methods are shown in Table 3-6. The estimates that are produced by the storey enclosure method are nearer to the tender figures, and the range of price variation is reduced accordingly.

These results turn out to be statistically significant

(chi-square 5.99, 2df), with the storey enclosure and floor area methods being better than the cube method (Skitmore 1991).

There are some examples that show the use

of the storey enclosure method in the textbooks of cost planning (Cartlidge and Mehrtens 1982; Seeley 1996 pp.160-162; Ashworth 1999 pp.250-251; Ferry et al. 1999).

However, despite the many benefits that are demonstrated by James, the use

of the storey enclosure method remains very limited in practice.

Survey results on

the use of conventional cost forecasting models in the UK reveal that less than 2% of respondents made use of the storey enclosure method to provide strategic cost advice to clients (Fortune and Lees 1989).

However, another survey that was conducted

more recently in South Africa indicates that 27% of the respondents had used the storey enclosure method in practice (Bowen and Edwards 1998). Like any of the single-unit deterministic models, the storey enclosure method suffers from the deficiencies of being inexplicable, unrelated and deterministic.

Its

unpopularity is probably due to the fact that the weightings are not derived empirically from proven data, but are based on experience (Wilderness Group 1964; Ashworth 1999 p.251), that there is insufficient historical data support (Wilderness Group 1964), that there are difficulties in obtaining an appropriate rate (Seeley 1996 pp.161-162), that the calculations that are involved are relatively complex (Seeley 1996 pp.161-162), and that the method provides no link with other forecasting methods, such as the elemental or approximate quantity method that would be used subsequently as the design develops.

Table 3-6: The results of tests for the cube, floor area and storey enclosure methods in James' study (Source: James (1954))

This figure is not available online. Please consult the hardcopy thesis available from the QUT Library

3.10 Regression Analysis

As there is no universal set of elements or variables for forecasting models, the purpose of reviewing previous empirical research on the influencing variables and forecasting targets is to consolidate a list of them for later use in model selection. A review of the surrounding literature shows that the technique of regression analysis has been widely used in the modelling of building prices.

The technique of

regression analysis is statistically able to demonstrate the strength of the relationship between two or more variables, for example, height and unit price. A variety of applications of regression analysis in the forecasting of building cost have been developed since the mid 1970s. Regression analysis has been used for modelling the prices of building at three levels: the overall price, the price of building elements and the price of components.
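As a minimal illustration of the technique itself (not a model from any of the studies cited in this review), the following sketch fits a least-squares line relating a hypothetical unit price to building height and reports the strength of the relationship through the coefficient of determination. All figures and variable names are invented for the example.

```python
# Illustrative only: a least-squares fit of unit price against building height
# on fabricated numbers, showing how regression expresses the strength of a
# relationship through the coefficient of determination (R-squared).
import numpy as np

height = np.array([10, 20, 35, 50, 80, 120], dtype=float)       # storeys (hypothetical)
unit_price = np.array([900, 950, 1020, 1100, 1260, 1500.0])     # price per m2 (hypothetical)

slope, intercept = np.polyfit(height, unit_price, 1)
fitted = slope * height + intercept
r_squared = 1 - np.sum((unit_price - fitted) ** 2) / np.sum((unit_price - unit_price.mean()) ** 2)
print(f"unit price ~ {intercept:.1f} + {slope:.2f} x height, R^2 = {r_squared:.2f}")
```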

Regression analysis was first used to model building

prices for offices (Department of Environment 1971; Tregenza 1972; Flanagan and

Norman 1978; Karshenas 1984; Skitmore and Patchell 1990), schools (Moyles 1973), houses (Neale 1973; Braby 1975; Khosrowshahi and Kaka 1996), homes for old people (Baker 1974), lifts (Blackhall 1974), electrical services (Blackhall 1974), motorway drainage (Coates 1974) and a few other types of building (Kouskoulas and Koehn 1974).

It was then used to model the prices of reinforced concrete frames

(Buchanan 1972; Singh 1990) and building services (Gould, 1970).

It has also been

used to model the prices of components such as the beams of suspended-roof steel structures (Southwell, 1971).

This research concerns the modelling of overall

building prices.

3.11 Review of Model Predictors

Of the conventional methods of forecasting for the early design stage, the floor area method is the most widely used (Akintoye et al. 1992; Bowen and Edwards 1998; Fortune and Lees 1996). In this method, the floor area is presumed to be the only variable that is directly proportional to the building price.

Another frequently

addressed variable is the height of a building, and previous studies have expressed this in different measures, such as the overall building height (Kouskoulas and Koehn 1974; Karshenas 1984; Pegg 1987), number of storeys (Clark and Kingston 1930; Wilderness Group 1964; Buchanan 1969; Department of Environment 1971; Buchanan 1972; Tregenza 1972; Steyert 1972; Neale 1973; Braby 1975; Flanagan and Norman 1978; Singh 1990) and storey height (Wilderness Group 1964; Buchanan 1969; Buchanan 1972; Moyles 1973).

High-rise buildings are generally

more expensive to build than low-rise buildings, because the former require extra cost for the special arrangements for servicing the building, particularly the upper floors,

and because the lower part of high rises is designed to carry the weight of the upper floors and the extra wind load. The additional cost of working at a great height from the ground when erecting the building, and the increasing area that is occupied by the service core and circulation are also factors that increase the cost of high rises (Ferry et al. 1999 p. 293). The earliest work on the identification of the variable of building height was undertaken in the United States. Clark and Kingston (1930) analyse the relative costs of the major components of eight office buildings that range from 8 to 75 storeys on a hypothetical site.

In general, they find that the unit building cost tends

to rise moderately with the building height. In the UK, Stone (1963) reports a moderate rise in the unit building cost with the building height for blocks of flats and maisonettes in London and other parts of the UK.

The Wilderness Group (1964) produced a series of schedules that detail

the costs of a steel frame for a structure. The spans, storey heights and number of storeys vary and are manually priced. Their study is the first serious attempt within the UK building industry to isolate the cost effects of fundamental design variables, such as the number of storeys, storey height, the superimposed loading of suspended floors, column spacing in the direction of the slab span and column spacing across the slab span, taking into account the interacting cost effects of each variable upon the others. Tan (1999) cites a report that was prepared by Thomsen (1966) from the United States that states that, except for the lower floors, the unit office building cost is almost constant when the building height is varied.

However, as details of the

simple simulation study are not given in Thomsen's report, Tan warns that the results should be interpreted with care. A study that was conducted by the Department of Environment (1971) of the UK government reports that the cost of local authority office blocks rises fairly constantly by two per cent per floor as the height increases above four storeys. Tregenza (1972) analyses the price per square meter of ten office buildings that range from one to eighteen storeys high. The prices were rebased to January 1971 prices.

A linear regression line was fitted and the result agrees with the

findings of earlier works that tall buildings tend to be more expensive than low buildings with the same internal floor area. However, the sample was too small, and the fitting was done by pure observation. Thus, it is doubtful whether it is appropriate to interpret the relationship as being linear. Buchanan (1972) uses the multiple regression technique for the development of a model that represents the total cost of a reinforced concrete structure.

The

model was developed from 38 reinforced concrete frame buildings that were constructed by the Ministry of Public Building and Works of the UK between 1960 and 1968. The independent variables that are identified are the gross floor area, storey height, number of storeys, average superimposed loading, shortest span, longest span, slab concrete thickness and number of lifts. Kouskoulas and Koehn (1974) represent the pre-design estimation of building prices per square foot (price per area) as a function of six variables: building locality, price index, building type, building height, building quality and building technology. Karshenas (1984) regards the resulting pre-design estimation technique that is devised by Kouskoulas and Koehn as being simple, fast and applicable to forecasts

for a wide variety of building construction projects, and opines that the methodology might be generalised in a global sense.

Kouskoulas and Koehn’s use of raw

dependent variables with the inflation index as the independent variable is also particularly interesting. They use a multiple regression methodology to derive the single cost-estimation function from 40 sets of data on building contracts in the US. Disregarding the possibility of obtaining a better-performing model by the elimination of some of the variables, they insist on keeping all of the variables, as they believe that the better result that is obtained by omitting some of the variables is due to a bias in the data sample.

This supposition is rather subjective.

The final

model is tested on only two ex ante projects, and shows little forecasting bias.

This

test sample is also considered to be too small to draw a reasonable conclusion from. Unfortunately, no results on the performance of the reduced model are shown as a comparison in their paper. In Australia, Braby (1975) studied the relationship between the height of buildings, as represented by the number of storeys, and the building price per floor area in eighty buildings in Melbourne.

Instead of classifying the data according to

the building type, as is typical in other studies, Braby divides the data according to its location relative to the central business district (i.e., whether it is inside or outside of the central business district).

The results of the linear regression indicate that

building prices generally increase with the number of storeys.

However, the results

are not conclusive due to the poor determination of the correlations. McCaffer (1975) summarises research work that was produced by post-graduate students (Buchanan (1969); Gould (1970); Moyles (1973); Neale (1973); Baker (1974); Blackhall (1974) and Coat (1974)) of the Department of Civil Engineering at Loughborough University of Technology on the use of regression

analysis for forecasting.

A summary of the models that were developed by the

post-graduate students is shown in Table 3-7.

The paper raises an important

statistical concern about the deterioration of the performance of regression models when they are moved from validation on ex post data to actual forecasting on ex ante data.

The experience of the author indicates that the coefficient of variation (as a

measure for model performance) increases by 25% to 50% when the derived model is applied to data outside of its own database. Thus, a model with a coefficient of variation of 10% in its validation will deteriorate by 15% to 20% when used for other cases of a similar type. This study, although it does not show detailed calculations as evidence, is particularly important to studies in the area of cost modelling, as it is the first in the field to address the difference between ex post performance and ex ante performance. In fact, except when using a more advanced approach to resampling validation, such as the cross validation (as is applied in this research), or bootstrapping, it is crucial to measure both the ex post and ex ante performance in the validation of a model to give a full picture of its performance. Based on the theoretical study that was undertaken by Steyert (1972), who suggests that the cost of the various elements of a building respond differently to changes in the number of storeys, Flanagan and Norman (1978) further elaborate his idea by suggesting that the cost components of a building can be split into four categories: those that fall as the number of storeys increases, those that rise as the number of storeys increases, those that are unaffected by height and those that fall initially and then rise as the number of storeys increases. They use the learning curve that was produced by the Committee of Housing in New York to illustrate that every time the number of repetitions doubles, the output time declines by a fixed percentage.

Fifteen office buildings of more than two storeys that were built

between 1964 and 1975, including the ten that were used by Tregenza (1972), are selected for curve fitting.

They apply the regression analysis technique to model

the relationship between the building height and building price.

By making the

assumption that other influencing variables, such as the quality of building, geographical location, size of project, site characteristics and so forth, are constant, the results show that the relationship between the price per square meter and the number of storeys in an office is projected to be U-shaped. Karshenas (1984) uses data from 24 historical multi-storey office buildings in the US to derive the mathematical relationship between price, overall building height and average floor area (termed “typical floor area” in Karshenas’ paper).

By

merely observing the points that are distributed on the chart of the average floor area against the height, a set of contours that represents the constant per area price for different heights and average floor areas is constructed on the chart. Based on the shape of the contours, the author hypothesises that building price is a function of the average floor area and overall building height: C = α·A^β·H^γ,

(3.5)

where C is the building price, A is the average floor area, H is the overall building height and α, β and γ are the constants. By transforming both sides with a natural logarithm, the equation becomes:

ln C = ln α + β·ln A + γ·ln H.

(3.6)

This transformed hypothetical equation suits the methodology of the multiple linear regression.
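A minimal sketch of how such a log-transformed model can be fitted by ordinary least squares is given below. The floor areas, heights and prices are fabricated stand-ins for Karshenas' data; only the transformation and fitting mechanics are illustrative.

```python
# Sketch of fitting equation (3.6), ln C = ln(alpha) + beta*ln(A) + gamma*ln(H),
# by ordinary least squares. The project data are fabricated.
import numpy as np

A = np.array([800, 1200, 1500, 2000, 2600, 3200], dtype=float)   # average floor area (hypothetical)
H = np.array([20, 35, 45, 60, 90, 120], dtype=float)             # overall height (hypothetical)
C = np.array([4.1e6, 8.9e6, 1.3e7, 2.2e7, 4.0e7, 6.3e7])         # building price (hypothetical)

X = np.column_stack([np.ones_like(A), np.log(A), np.log(H)])     # design matrix for the ln-model
coef, *_ = np.linalg.lstsq(X, np.log(C), rcond=None)
ln_alpha, beta, gamma = coef
print(f"alpha = {np.exp(ln_alpha):.3f}, beta = {beta:.3f}, gamma = {gamma:.3f}")
```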

To make building prices comparable, Karshenas updates all

prices to the base of March 1982, according to the price index.

His derived model

of building price with the average area and overall height as variables is compared with the floor area model using floor area unit rates from a published price book. Unfortunately, the comparison does not pay attention to the deterioration problems that are addressed by McCaffer. Thus, the conclusion that a better method has been developed is not persuasive enough. Based on a large sample of 1188 projects, Pegg (1984) identifies ten variables that have a statistically significant effect on the building price level: building price date, location, selection of contractor, contract sum (≤£20,000 or >£20,000), building function, measurement of structural steelwork, building height, form of contract, site conditions and type of work.

Within these significant variables, the only

quantitative variable is the building height. Skitmore and Marston (1999, p.252) criticise the study for not giving a clear description of the method of analysis or of levels of accuracy. Apart from summarising the model development in the résumé that was cited earlier in this review, Skitmore and Patchell (1990) also demonstrate the use of the multiple regression analysis technique in the development of a forecasting model of building price per gross floor area (GFA) based on six raw independent variables, including the number of storeys.

Data was extracted from 28 office buildings in the

UK for the period 1982 to 1988.

The final model is a natural logarithmic

transformed model that is derived by forward stepwise regression. It contains three chosen variables: the number of bidders, GFA and the contract period.
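Forward stepwise selection of this kind can be sketched as follows. This is a simplified illustration of the general idea rather than Skitmore and Patchell's actual procedure: the data, the candidate variable names and the stopping rule (a simple relative improvement threshold instead of the usual F-tests) are all assumptions made for the example.

```python
# Hedged sketch of forward stepwise regression: variables are added one at a
# time, keeping at each step the candidate that most reduces the residual sum
# of squares, until no candidate improves the fit by more than a small
# threshold. Data and variable names are hypothetical.
import numpy as np

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

def forward_stepwise(candidates, y, tol=1e-3):
    chosen, remaining = [], dict(candidates)
    n = len(y)
    current = rss(np.ones((n, 1)), y)                 # intercept-only model
    while remaining:
        scores = {}
        for name, col in remaining.items():
            X = np.column_stack([np.ones(n)] + [candidates[c] for c in chosen] + [col])
            scores[name] = rss(X, y)
        best = min(scores, key=scores.get)
        if current - scores[best] < tol * current:    # stop when the improvement is negligible
            break
        chosen.append(best)
        current = scores[best]
        del remaining[best]
    return chosen

rng = np.random.default_rng(0)
gfa = rng.uniform(1000, 20000, 30)
bidders = rng.integers(3, 12, 30).astype(float)
storeys = rng.integers(2, 40, 30).astype(float)
price = 1500 * gfa + 2e5 * bidders + rng.normal(0, 5e4, 30)      # storeys deliberately irrelevant
print(forward_stepwise({"GFA": gfa, "bidders": bidders, "storeys": storeys}, price))
```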

Very

detailed empirical work is incorporated in this study, but no ex ante performance validation is included. Khosrowshahi and Kaka (1996) use multivariate regression analysis with an improvised iterative method to develop forecasting models for the cost and duration of housing projects.

The objective of the paper is to develop building price and

duration forecasting models for both the contractor and the client.

Fifteen variables

are taken into account, including the number of storeys, which is divided into three groups (low, medium and high). Data from 54 housing projects in the UK in the period 1981 to 1991 are used. Six of the fifteen candidate variables are selected by multivariate regression analysis. These include one scale variable, ‘unit’, and five

categorical variables, ‘project operation’ (which comprises refurbishment, extension, alteration and new); ‘project sub-type’ (whether sheltered, public or bungalow); ‘abnormality’ (which comprises access to site, poor communication, repeating stoppages, sudden speed ups, transportation problems, time and cost yardsticks, keeping occupation and unknown factor, contractor’s mistakes, various delays, resource shortages, repeating variations, lack of presence and others); ‘starting month’ (January, February, March, April, May, June, July, August, September, October, November or December); and ‘horizontal access’ (whether good, fair or poor).

The final model may be problematic in application. In real life, abnormalities cannot be assumed to be mutually exclusive and independent of each other. The presence of more than one abnormality at the same time, and the interdependence of those abnormalities, could easily undermine the model.

Also, there is

no actual performance validation shown in Khosrowshahi and Kaka's paper. An interesting commonality between Skitmore and Patchell's final office price model and Khosrowshahi and Kaka's house model is that the variable that

represents the height of building was eliminated during the process of selecting the variables. A summary of the forecasting targets (dependent variables) and the influencing variables (independent variables) for the building price forecasting models as reviewed in this section is shown in Table 3-8. The influencing variables in the table are classified according to whether they are quantitative (measurable) or qualitative (intangible, normally divided into various levels) in nature. Most of the studies put quantitative and qualitative measures into their models.

This approach is

acceptable, because these models all belong to the category of ‘black-box’ forecasting tools, which are validated solely on the performance accuracy of the models.

However, the ways in which the qualitative variables are chosen, defined

and divided into various levels, are mainly based on the experience of the modellers. By defining the variables or the levels or scales differently, a rather different final model may be produced.

Thus, these models must be used with extra care.

Alternatively, this possible flaw can be avoided by employing models that use only quantitative variables. Instead of putting the qualitative variables into the model, an alternative approach is to group the data with similar qualitative characteristics together to derive a model that explains only a particular set of qualitative characteristics.

This approach, however, produces more models than a generalised

approach does, and the grouping criteria are subjective. Armstrong (2001 p.342-345) reviews the general principles for using forecasting methods in published research. He concludes that a quantitative method should be used if there is enough data. In consideration of the limited amount of data that is available in the early design stage (because many of the qualitative characteristics of a project are yet to be determined), the quantitative variables of floor

area, roof area, basement wall area and external wall area, as identified in JSEM, are used in the model development in this research.

3.12 Occam’s Razor: Parsimony of Variables

For a given set of data, there are always an unlimited number of possible explanatory models.

If a model is too simple, then the model and its predictions

will be unrealistic, whereas if it is too complex, then the model will be specific but its predictions unreliable (Edwards 2001 p.129).

It has long been advocated that

“economists should follow the advice of natural scientists and others to keep their models sophisticatedly simple, especially as simple models seem to work well in practice” (Zellner 2001 p.4). In the world of scientific modelling and theory development, scientists should adopt the underlying principle of parsimony to distinguish a better model from others. The principle of parsimony, also known as Occam’s razor, is attributed to the mediaeval philosopher William of Occam, who suggested that “pluralitas non est ponenda sine necessitate” (another version is “entia non sunt multiplicanda praeter necessitatem”), which means entities should not be multiplied unnecessarily.

Thus,

if there are two competing theories (or models in the context of this study) that both describe the same characteristics of observed fact (data set), then the simpler of the two should be adopted until more evidence comes along (Stangl 1997).

Occam’s

razor is particularly important in the development of universal models, as the subject domain of these models is of an unlimited complexity. Because of this complexity, the chance of obtaining a manageable model is very slight if the modelling process

starts with a very complicated theoretical foundation.

The same principle also

applies to the development and selection of early stage forecasting models, because the data that are obtained at the early stage are highly abstract and uncertain, and the use of a complicated model will inevitably add unnecessary assumptions.

The

discourse on scientific theory suggests that no theory can be totally validated, but that any theory can be falsified by facts (Popper 1959 pp.78-92). Thus, science operates according to the principle of parsimony. To apply the principle of parsimony to model selection, Simon (2001 p.35) suggests expressing parsimony as a measure in the ratio of the complexity of the data set to the complexity of the formula set. In the context of the competition between two models, the parsimony of the relationship of the data set to the simpler model (e.g., a linear model that contains one explanatory variable) is greater than the parsimony of its relationship with the more complex model (e.g., a linear model containing two explanatory variables) if they both describe a data set equally.

By

the same token, the parsimony of the relationship of a model with a larger data set is greater than the parsimony of the relation of the model with a smaller data set, if the same model equally describes the two data sets. To implement Occam’s razor, regression techniques can be applied to achieve the parsimony of variables. This involves the development of a model through the least-squares error method for a given domain (data for a particular type of building). The goal of the final model is to produce accurate forecasts, and the criterion for the selection of that model is the forecasting accuracy. Forecasting accuracy is an objective measure of the success of a model, and is also the expected fit of unseen data in a domain.

It plays a very important role in

the judgment of models, as models themselves can never give error-free forecasts. Further review of forecasting accuracy is contained in Chapter 4.
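Regression combined with an out-of-sample accuracy criterion gives one practical way of applying this principle. The sketch below, which uses fabricated data and a leave-one-out procedure of the kind used later in this research, compares a simpler and a more complex linear model by their expected fit to unseen data; the variable names and figures are assumptions made for illustration only.

```python
# Sketch of using out-of-sample forecasting accuracy (leave-one-out cross
# validation) as the criterion for choosing between a simpler and a more
# complex regression model, in the spirit of Occam's razor. Data are fabricated.
import numpy as np

def loo_rmse(X, y):
    """Root-mean-square leave-one-out prediction error for a linear model."""
    n = len(y)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        errors.append(y[i] - X[i] @ beta)
    return float(np.sqrt(np.mean(np.square(errors))))

rng = np.random.default_rng(1)
floor_area = rng.uniform(500, 5000, 40)
wall_area = 0.6 * floor_area + rng.normal(0, 50, 40)       # nearly collinear extra variable
price = 1500 * floor_area + rng.normal(0, 2e5, 40)

simple = np.column_stack([np.ones(40), floor_area])
complex_ = np.column_stack([np.ones(40), floor_area, wall_area])
print("simple :", loo_rmse(simple, price))
print("complex:", loo_rmse(complex_, price))                # often no better despite the extra variable
```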

Table 3-7: Summary of the models developed by the post-graduate students of the Department of Civil Engineering at Loughborough University of Technology (extracted from McCaffer 1975)

Buchanan, J. S. (1969). Subject: reinforced concrete frames in buildings. Variables: gross floor area, average load, shortest span, longest span, no. of floors, height between floors, slab concrete thickness and no. of lifts. Performance: more accurate for medium and high cost schemes than for low cost schemes.

Gould, P. R. (1970). Subject: heating, ventilating and air conditioning services in buildings. Variables: functions which described the heat and air flow through the building, the heat source and the distance over which it has to be ducted, and shape. Performance: high accuracies, which fell at higher cost.

Moyles, B. F. (1973). Subject: system built school buildings. Variables: floor area, area of external and internal walls, no. of rooms and functional units, area of corridors, storey height and no. of sanitary fittings. Performance: generally high accuracy.

Neale, R. H. (1973). Subject: houses for private sale. Variables: floor area, area of roof, area of garage, number of storeys, slope of site, unit cost of external finishes and cost of sanitary fittings, area and volume of kitchen units, site densities, regional factors, number of doors, area of walls, number of angles on plan, construction date and duration of development, and type of central heating. Performance: only two cases outside ±10%.

Baker, J. (1974). Subject: residential apartment schemes for old people. Variables: area of single units, double units, triple units, common rooms, warden's flat, laundry and access corridors, number of lifts and garages, and duration of contract. Performance: coefficient of variation (c.v.) of 9.16%.

Blackhall, J. D. (1974). Subject: passenger lifts in office buildings. Variables: contract date, dimensions of the car, no. of landings, length of travel, operating speed, type of control system and location of installation. Performance: c.v. of 20.9%.

Blackhall, J. D. (1974). Subject: electrical services in buildings. Variables: no. of distribution boards, fused load, number of active ways, no. of socket and other outlets, voltage, contract date and whether the building was for commercial or domestic use. Performance: c.v. of 20.0%.

Coates, D. (1974). Subject: motorway drainage (three models: (1) porous pipes, (2) helpline pipes and (3) asbestos pipes). Variables: internal diameter, average depths and cost of pipes. Performance: (1) porous pipe, c.v. of 12.8%; (2) helpline pipe, c.v. of 9.2%; (3) asbestos pipe, c.v. of 6.9%.

Table 3-8: Summary of Forecasting Targets and Influencing Variables in Previous Empirical Studies

Forecasting targets (dependent variables used):
Overall building price / cost of reinforced concrete structure: James (1954); Buchanan (1972); Moyles (1973); Neale (1973); Baker (1974); Karshenas (1984); Singh (1990); Khosrowshahi and Kaka (1996).
Overall building price per square meter of floor area: Department of Environment (1971); Tregenza (1972); Kouskoula and Koehn (1974); Braby (1975); Flanagan and Norman (1978); Skitmore and Patchell (1990).

Influencing factors (independent variables used):
Building type: Kouskoula and Koehn (1974); Khosrowshahi and Kaka (1996).
Gross floor area: Buchanan (1972); Moyles (1973); Neale (1973); Skitmore and Patchell (1990).
Typical floor area: Karshenas (1984).
Number of storeys: Department of Environment (1971); Buchanan (1972); Tregenza (1972); Neale (1973); Braby (1975); Flanagan and Norman (1978); Singh (1990).
Overall height: Kouskoula and Koehn (1974); Karshenas (1984).
Storey height: Buchanan (1972); Moyles (1973).
External wall area: James (1954); Moyles (1973).
Location index (no location index in Hong Kong): Kouskoula and Koehn (1974); Neale (1973).
Roof area: James (1954); Neale (1973).
Starting date: Neale (1973); Khosrowshahi and Kaka (1996).
Contract duration: Neale (1973); Baker (1974); Skitmore and Patchell (1990).
Area of garage: Neale (1973); Baker (1974).
Area of corridors: Buchanan (1972); Baker (1974).
Number of lifts: Buchanan (1972); Baker (1974).
Basement wall area: James (1954).
Average superimposed loading, shortest span, longest span, slab concrete thickness: Buchanan (1972).
Internal wall area, number of rooms and functional units, number of sanitary fittings: Moyles (1973).
Slope of site, unit cost of external finishes, cost of sanitary fittings, area and volume of kitchen units, site densities, number of doors, area of walls, number of angles on plan, duration of development, type of central heating: Neale (1973).
Area of single units, double units, triple units, common rooms, warden's flat, laundry: Baker (1974).
Price index, quality, building technology: Kouskoula and Koehn (1974).
Quantities of constituents of concrete construction, structural scheme, section of beams, grade of concrete, grid location, grid size: Singh (1990).
Number of bidders: Skitmore and Patchell (1990).
Project operation, abnormality, and horizontal access: Khosrowshahi and Kaka (1996).

Note: Bold typed variables are measurable.

3.13 Summary

A building price forecasting model is a system that produces forecasted prices from historical data. It is a type of technical model that attempts to identify the variables that have the most influence on building prices. Forecasting models can be distinguished according to whether they are black box or realistic, deterministic or stochastic, and deductive or inductive.

A more detailed classification was prepared

by Newton using descriptive primitives. According to his classification, the final models in this research are specific for individual types of building (Data); applicable

to finished works, i.e. equating price to a function of identified variables comprising floor and external wall areas and so forth (Units); represent the designer's price forecast (Usage); follow the macro approach, i.e., producing forecasts for the whole building (Approach); are applied at the feasibility and sketch design stage (Application); are simulation models in terms of the problem boundary, the variables considered and the inter-relationships between the variables (Model); are generated by regression analysis (Techniques); are based on explicit assumptions about defined problem boundaries (Assumptions) and are stochastic in terms of their performance assessment (Uncertainty). The characteristics of different types of forecasting cost models are summarised in Skitmore and Patchell's study.

The application of cost models is

highly restricted by the assumptions that lie beneath the models, their reliance on historical data for predicting future events, the insufficiency of information and preparation time, and their reliance on expert judgment.

Many studies on the

development of new models are criticised for their overemphasis on the uniqueness and innovativeness of the model, their neglect of the practicability of the model and the lack of a clear demonstration of the benefit of the model, especially in terms of forecasting performance relative to the conventional models.

To put more advanced models into practice, both their statistical significance and their practical significance are crucial issues that should be addressed. James' storey enclosure model (JSEM), proposed in 1954, has been chosen for further development.

The original model uses some physical measurements,

such as floor area, roof area and elevation area of buildings to estimate building prices. Although JSEM is not a widely used model in practice, and suffers from the same inherent shortcomings as other early stage conventional forecasting models,

JSEM has been proved empirically to outperform other models. As the simplified equation for JSEM for multi-storey buildings (as elaborated in Chapter 5) shows, it can be considered as a problem of determining the best set of predictors, and regression analysis is therefore employed to improve JSEM further. The regressed models are developed empirically, and are expected to be understandable, easy to use, fairly accurate and reliable. Predictors for regression models that have been used in previous studies are reviewed.

The two most commonly studied variables are the floor area, as

represented by the gross floor area or typical floor area, and the building height, as represented by the number of storeys, overall height and storey height. The former variable represents the costs of the horizontal elements of a building, whereas the latter variable represents the costs of the vertical elements.

In JSEM, the identified

variables include the areas of the floor, roof, basement walls and external walls. The price of buildings can be expressed in an unlimited number of ways with different mathematical functions and variables. Occam’s razor is addressed at the end of this chapter because it is considered to be the most important principle for model development.

Taking this into account, the regression technique that is used

in this research is considered to be the means to achieve the necessary parsimony.


Chapter 4

Performance of Forecasting Models

The more unpredictable the world is the more we rely on predictions. Steve Rivkin

4.1 Introduction

It is essential for modellers to demonstrate the benefits of a new forecasting model or approach to practising forecasters before its launch.

The fundamental

benefit that a new model should show is an improvement in forecasting performance. For instance, this study hypothesises that the new regressed models outperform the conventional models in terms of forecasting accuracy. Much research has been conducted in the past on the subject of forecasting performance, and some of this research has studied the determinants of forecasting performance. The measures for forecasting performance include bias, consistency and accuracy.

The bias in forecasting that is produced by a model is generally

represented either by the average percentage difference between the designers’ forecast and the lowest tender sum, or the average ratio between them. Bias is the most popular measure of performance.

Consistency refers to the degree of variation

around the average that is represented by standard deviations, and accuracy is the combination of bias and consistency into a single quantity (Skitmore 1991 p.2).

4.2 Measures of Forecasting Accuracy

A naive definition of accuracy would be the absence of error, or the assertion that the smaller the error, the higher the accuracy and vice versa (Flanagan and Norman 1983).

Accuracy measures are usually defined in terms of the ratio of the

lowest bid to a forecast, the ratio of a forecast to the lowest bid (the reciprocal of the ratio of the lowest bid to a forecast), the percentage by which the lowest bid exceeds a forecast, the percentage by which a forecast exceeds the lowest bid (the converse of the percentage by which the lowest bid exceeds a forecast), the difference between the lowest bid and a forecast, and the total number of “serious” errors.

As the

percentage by which a forecast exceeds the lowest bid is a widely accepted expression of error in practice and is a unit-free measure, it is used to measure accuracy in this study. To properly interpret accuracy measures, there are two major components: bias and consistency.

Bias can be measured by the arithmetic mean, median,

Pearson r, Spearman’s rho and the coefficient of regression of the errors, the percentage errors or the ratios described above.

The first measure of bias uses

forecasts as the base of reference, which is suitable for the evaluation of the forecasting performance of an individual forecaster or an individual company. The second and third measures are statistically the same.

The fourth measure does not

take into account the scale of the data, and data with large values might easily dominate the comparisons.

While

they both represent the degree of variation around the mean, the latter measure adjusts for the differences in the magnitudes of the means of the data sets. Instead of measuring bias and consistency, accuracy can alternatively be measured by a single quantity. The common combined measures, found mainly in research on modelling, are the mean square error, the root mean square error and the mean modulus (absolute) percentage error. Skitmore et al. (1990 pp. 5-23) extensively review the measures of performance of forecasts in the literature.

The different representations of bias,

consistency and accuracy (combined accuracy measures) in previous research are summarised in Table 4-1. The authors found that the consistency measures in terms of the coefficient of variation of forecasts and the overall accuracy measures are far less frequently used than bias measures. Since all the models in comparison in this study are generated and tested on the same set of data, and the use of cross validation for modelling would likely produce mean percentage errors that are close to zero, the effect of the magnitude differences mentioned earlier is likely to be small, which lessens the benefit of using the coefficient of variation. To compare models deterministically, a forecasting model that is less biased (e.g. a smaller mean error) and more consistent (e.g. a smaller standard deviation of error), or more accurate (e.g. a smaller mean square error) than other models is preferable. However, the more sophisticated probabilistic

approach suggests that statistical inference should be used to conclude whether one model is significantly better than the others.

There are far more statistical inference

methods available for the measures using mean and standard deviation (or variance, i.e. the square of standard deviation). Therefore, this study adopts the mean and standard deviation of percentage error as the measures of forecasting performance. Skitmore et al. (1990 pp. 5-23) also suggest that there are five primary determinants that affect forecasting performance: the nature of the target, the information used, the forecasting technique used, the feedback mechanism used and the person who is providing the forecast. Except for the feedback mechanism, the other factors have been well explored by many researchers.

A summary of the

empirical evidence on the factors that affect forecasting quality is shown in Table 4-2. The table is an extended version of a similar table that was prepared by Skitmore et al. (1990 p.20-21), but more recent empirical studies are incorporated. One of the major inadequacies found in a review of the literature on forecasting accuracy is that some of the evidence is not strong enough because of a lack of tests for the significance of forecasting errors (Skitmore and Drew 2003). According to Table 4-2, there are a few contradictory results, but these contradictions might have occurred by chance and may not represent the true population (Gunner 1997 p.30-31).
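To make the adopted measures and this kind of significance testing concrete, the following minimal sketch computes the bias, consistency and a combined accuracy figure for a set of forecasts, together with a one-sample t statistic for the bias. The forecast and tender figures are fabricated, and the sketch is illustrative only rather than the full structured inference procedure used later in this research.

```python
# Sketch of the adopted performance measures: bias (mean percentage error),
# consistency (standard deviation of percentage error), a combined accuracy
# figure (mean square percentage error), and a one-sample t statistic testing
# whether the bias differs significantly from zero. Figures are fabricated.
import numpy as np

forecasts     = np.array([10.2, 8.7, 15.1, 22.0, 5.4, 30.8]) * 1e6
lowest_tender = np.array([ 9.8, 9.1, 14.2, 23.5, 5.0, 29.9]) * 1e6

pct_error = 100 * (forecasts - lowest_tender) / lowest_tender   # % by which forecast exceeds lowest bid

bias = pct_error.mean()
consistency = pct_error.std(ddof=1)
accuracy = np.mean(pct_error ** 2)                              # mean square percentage error
t_stat = bias / (consistency / np.sqrt(len(pct_error)))         # one-sample t statistic, H0: bias = 0

print(f"bias = {bias:.2f}%, consistency (sd) = {consistency:.2f}%, "
      f"mean square error = {accuracy:.1f}, t = {t_stat:.2f} on {len(pct_error) - 1} df")
```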


Table 4-1: Measures of Performance of Forecasts (Source: Skitmore et al. 1990 p. 22)

This figure is not available online. Please consult the hardcopy thesis available from the QUT Library

Table 4-2: Factors affecting quality of forecasts – summary of empirical evidence (extended from the similar table in Skitmore et al. (1990, p. 20-21))

(1) Nature of target

Contract works type:
McCaffer (1975): Buildings more biased and more consistent than roads.
Harvey (1979): Different biases for buildings, non buildings, special trades, and others.
Morrison & Stevens (1980): Different bias and consistency for schools, new housing, housing modifications, and others.
Flanagan & Norman (1983): No bias differences between schools, new housing, housing modifications, and others.
Skitmore (1985): Different bias and consistency for school, housing, factory, health centre and offices.
Skitmore & Tan (1988): No bias or consistency differences for libraries, schools, council houses, offices and other buildings.
Skitmore et al (1990, pp. 79-87): No bias or consistency differences for primary school, sheltered housing, offices, unit factories, health centres and other buildings.
Quah, L. K. (1992): New works more consistent than refurbishment.
Gunner and Skitmore (1999): No bias or consistency differences for commercial, non-commercial and residential buildings. Renovation works more biased and more consistent than new works.
Skitmore and Drew (2003): No bias or consistency differences for commercial, health, apartment, education and other. No bias or consistency differences for new and alterations works.

Contract size:
McCaffer (1975): No bias trend.
Harvey (1979): Bias reduces with size.
Morrison & Stevens (1980): Modulus error reduces with size. Consistency improves with size.
Flanagan & Norman (1983): Bias trend reversed between samples.
Wilson et al (1987): No linear bias trend.
Skitmore & Tan (1988): Bias reduces and consistency improves with size.
Skitmore (1988): No consistency trend.
Ogunlana and Thrope (1991): Consistency reduces with larger contract size.
Cheong (1991, p.106): No consistency trend.
Thng (1989): Ditto.
Gunner and Skitmore (1999): Bias reduces and consistency improves with size.

Skitmore (2002): No bias or consistency trend.
Skitmore and Drew (2003): No bias or consistency trend.

Project size (area):
Skitmore and Drew (2003): No bias or consistency trend.

Contract conditions type:
Wilson et al (1987): More bias for bill of quantities contracts.
Gunner and Skitmore (1999): (1) Bias difference between conditions of contract issued by Singapore Institute of Architects (SIA) and standard form (RHLB form). (2) Consistency difference between contract with a fluctuation provision and contract without.

Geographical location:
Harvey (1979): Bias differences between Canadian regions.
Wilson et al (1987): No bias trend between Australian regions.
Ogunlana and Thrope (1991): No conclusion although bias and consistency difference between regions of United Kingdom.

Nature of competition:
Harvey (1979): Bias differences for individual bidders.
McCaffer (1975): Estimates higher with more bidders.
de Neufville et al (1977): Ditto.
Harvey (1979): Ditto. Inverse number of bidders gives best model.
Flanagan & Norman (1983): Estimates higher with more bidders.
Runeson & Bennett (1983): Ditto.
Hanscomb Association (1984): Estimates higher with more bidders. Non linear relationship.
Wilson et al (1987): Ditto.
Tan (1988): Ditto but not with UK data.
Ogunlana and Thrope (1991): No bias and consistency trend.
Skitmore (2002): No consistency trend.

Prevailing economic climate:
de Neufville et al (1977): Estimates higher in ‘bad’ years with lagged response rate.
Harvey (1979): Ditto.
Flanagan & Norman (1983): Ditto.
Morrison & Stevens (1980): Estimates lower in uncertain economic climate.
Ogunlana and Thrope (1991): No significant relationship.
Gunner and Skitmore (1999): Estimates higher in ‘bad’ years with lagged response rate.

Price intensity:
Skitmore et al (1990, p.191): High value contracts were underestimated and low value contracts overestimated.
Gunner and Skitmore (1999): Ditto.

Contract period:
Skitmore (1988): No difference between groups of contract period.
Gunner and Skitmore (1999): No conclusion due to different results obtained from using contract sum as the base for measurement of bias against contract sum minus provisional sums as the same.

Other project characteristics:
Skitmore & Tan (1988): Bias reduces and consistency trend with contract period and basic plan shape.
Ogunlana and Thrope (1991): Bias and consistency differences between design offices.
Gunner and Skitmore (1999): (1) Bias and consistency better for foreign than local (Singapore) contractors. (2) No conclusion although bias difference between foreign and local architects. (3) Consistency improves with increasing area. (4) Consistency better for private sector than public sector.
Skitmore and Drew (2003): No bias or consistency trend with client type.

(2) Level of information

Number of priced items:
Jupp & McMillan (1981): Slight bias reduction with price data.
Bennett (1987): Consistency differences between price data sources.
Skitmore (1985): No bias or consistency trend with price data. Increased bias and consistency with project information.
Gunner and Skitmore (1999): Consistency reduces as the number of items reduces.

Preliminaries percentage:
Gunner and Skitmore (1999): No conclusion due to different results obtained from using contract sum as the base for measurement of bias against contract sum minus provisional sums as the same.

(3) Forecasting technique

James (1954): Consistency differences between cube, floor area and storey enclosure methods.
McCaffer (1975): Consistency better for regression methods than conventional.
Morrison & Stevens (1981): Simulation model has less bias and more consistency than conventional.
Ross (1983): Consistency better with simpler techniques.
McCaffer et al (1984): Consistency of regression comparable with conventional.
Brandon et al (1988): Expert system has less bias and more consistency than conventional.
Munns and Al-Haimus (2000): Cost significant global model has less bias and more consistency than conventional.
Skitmore and Drew (2003): Bias and consistency differences between approximate quantities and floor area methods. Consistency better for floor area method.

(4) Use of feedback

No evidence available.

(5) Ability of forecasters

Forecasters:
Jupp & McMillan (1981): Bias and consistency differences between subjects.
Morrison & Stevens (1980): Bias and consistency differences between offices.
Skitmore (1985): Bias and consistency differences between subjects.
Skitmore et al (1990): Bias and consistency differences between subjects.
Gunner and Skitmore (1999): No bias but consistency differences between subjects.

Number of price forecasts:
Gunner and Skitmore (1999): Bias reduces in proportion to number of price forecasts.

4.3 Base Target for Forecasting Accuracy

Generally speaking, contractors derive a tender price by summing the estimated total costs of production (including head office overheads and the cost of finance) and their mark up.

For the traditional procurement method, where the task

of design is separated from that of construction, design team members gain no access to the details of the estimated costs of production or the allowed mark up in tenders. The target of forecasts at the early design stage is the returned tender price, rather than the final contract price, as the latter presents far too many unforeseeable

factors and uncertainties, such as the possibility of contractual claims, that would frustrate the forecasting of the final contract sum in the early design stage.

There is,

however, a controversy between practice and academia about the use of returned tenders as the forecasting target. Some suggest that the target should be the lowest returned tender price (Morrison 1994; Ogunlana and Thorpe 1987) whilst others suggest employing the mean (McCaffer 1976) or median of the returned tender prices. The proposal for using the mean is based on the argument that the mean is less variable, and is therefore more likely to be more accurate. The proposed use of the median simply derives from a conservative notion, though one that is widely accepted by practicing forecasters, that the possibility of underestimation should be avoided. As price models are used to forecast the market price (i.e. the unknown value of the contract to contractors buying on the contract market) (Skitmore and Marston 1999, p.20), the lowest returned tender price is chosen to be the forecasting target in this study.

After all, the major interest of the forecasting exercise is to

predict the probable market price, and the use of the mean or median is ill defined. Moreover, the effect of using the lowest or the mean tender on the assessment of accuracy is found to be small (Beeston 1983).
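As a small numerical illustration of the alternative targets discussed above, the following sketch computes a forecast's percentage error against the lowest, mean and median returned tenders. The tender list and the forecast figure are fabricated; this study uses the lowest tender as the target.

```python
# Illustration of the alternative forecasting targets: the lowest, mean and
# median of the returned tenders. All figures are fabricated.
import numpy as np

tenders = np.array([10.4, 10.9, 11.2, 11.5, 12.3]) * 1e6   # returned tender prices (hypothetical)
forecast = 10.8e6                                           # designer's early stage forecast (hypothetical)

for name, target in [("lowest", tenders.min()),
                     ("mean", tenders.mean()),
                     ("median", float(np.median(tenders)))]:
    print(f"{name:6s}: error = {100 * (forecast - target) / target:+.1f}%")
```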

4.4 Overview of Model Performance at Various Design Stages

The field of forecasting techniques has been studied for more than four decades.

The popularity of research on this subject is due to the inherent

shortcomings of the conventional models of forecasting, as is described in Chapter 2. As is detailed in section 3.6 of Chapter 3, modellers should demonstrate the benefits of a model before attempting to implement it in practice, and thus the assessment of the performance of a newly developed model is essential. However, only a few empirical studies demonstrate the performance of new models, in a relative sense, compared with that of conventional models, and even fewer deal with performance measurement seriously by the use of statistical inference. Barnes (1971) suggests that the performance of designers' forecasts at the commencement of feasibility studies is between 20% and 40% coefficient of variation (cv), which improves to 10% to 20% cv at the commencement of the detailed design stage. Beeston (1974) uses a hypothetical example to show that the performance of designers' forecasts can only be reduced to close to the contractor's estimate. Based on the assumption that the variability of the designers' forecast can be reduced to 5%, this would lead to a figure of 6% of the coefficient of the variation of differences between the forecast and the lowest tender, and there would be no further reduction. If 6% of the coefficient of variation could be achieved, then 60% of the designers' forecasts would fall within 5% of the lowest tender, 90% would fall within 10% and all of the forecasts would fall within 20%. Marr (1974) divides designers' price forecasting into four stages: planning, budget, schematics and preliminaries.

Their corresponding adequate degrees of accuracy are stated as 20-40% for planning, 15-30% for budget, 10-20% for schematics and 8-15% for preliminaries, reducing further to 5-10%. McCaffery (1978), in his assessment of the forecasting accuracy for 15 schools, also makes a similar division of stages, i.e., forecast, brief, sketch plan and detailed design. Their corresponding cv are 17%, 10%, 9% and 6%. McCaffer (1975) compares the quality of eight multiple regression statistical models with that of other unspecified (conventional) models that are used by practicing forecasters.

Table 3-7 shows the performance of the eight methods. Based on the assumption that the coefficient of variation of the forecast is likely to be 25% to 50% greater than the coefficient of variation of the prediction, as suggested by the author, the multiple regression approach is shown to produce better quality forecasts than the other (unspecified) methods that are adopted in practice.

Ross (1983) reveals some surprising results on the relationship between the sophistication of models and their performance. Three models are compared in his study. The first uses the simple average of the value of sections of construction work from a set of bills of quantities for previous contracts. The second model uses a regression procedure to predict the total value from the sectional values, and the third model uses a regression on the unit value of items. The models are therefore arranged in order of increasing use of information. However, the results reveal the first model to be the most accurate, with a coefficient of variation of 24.5%, followed by the second model with a cv of 30.49%, and the third method with a cv of 52.66%, which suggests, controversially, that the more sophisticated methods that utilise more of the available data produce less accurate results.

Ashworth and Skitmore (1983) review the literature that is concerned with the forecasting accuracy of conventional models. Their cited references are shown in Table 4-3. They draw the important conclusions that certain types of project are associated with higher degrees of accuracy, and that the estimation accuracy is found to be 15% to 20% cv in the early design stages, which only improves to 13% to 18% at the tender stage.

McCaffer et al. (1984) suggest a more sophisticated approach to forecasting based on the element unit rate method. This approach involves the use of 32 different models together with a criterion for selecting the most appropriate model to match the characteristics of the target. The reported consistency of this method is between 10% and 19% cv, which is at least comparable to that of conventional methods. The research of Brandon et al. (1988) suggests the use of a developed expert system for forecasting. The performance of the expert system for early stage forecasting for office projects is reported to be within 5% of that predicted by the expert forecaster, and the system provides a forecast within 10% of the lowest bid, which is much better than that achieved by the average forecaster. Skitmore and Drew (2003) reveal significant differences in both bias (ANOVA test, p=0.021) and consistency (Bartlett's test, p=0.030) between the approximate quantities and the floor area methods. Similar to the surprising result found by Ross, the approximate quantities method (with 14.27% cv), which utilises more data, is found to be less accurate than the floor area method (with 10.87% cv).

Table 4-3: Performance of designers' forecasts reviewed by Ashworth and Skitmore (1983)

This figure is not available online. Please consult the hardcopy thesis available from the QUT Library.

4.5 Summary

The bias of the forecasts that are produced by a model is calculated by the arithmetic mean of the percentage difference between the designers' forecast and the lowest tender sum. The use of a percentage mean is a unit-free measure for forecasting errors. Consistency is the degree of variation around the average, which is represented by the standard deviation of percentage errors. Both bias and consistency are chosen to be the accuracy measures in this study.

There are different options as to the choice of forecasting target. The lowest returned tender price is considered to be a more appropriate forecasting target than the mean or median of returned tender prices, as the major concern of the forecasting exercise is to predict the probable market price to be paid by clients, which is often the lowest bid in a tendering exercise.

Results from studies on bias and consistency tend to be contradictory, rather than conclusive. Although significance tests can provide strong evidence to show whether one model prevails over the others, and as a result can demonstrate the major benefit of using the model in terms of its accurate performance, a review of forecasting studies finds that sufficient significance testing is lacking. Empirical studies on forecasting accuracy at different stages suggest that there is little improvement in accuracy as a building project proceeds from the early design stage to the detailed design stage. Paradoxically, there are two studies, one by Ross and the other by Skitmore and Drew, which provide evidence that rougher models are more accurate than more sophisticated models.

Chapter 5

Methodology

By three methods we may learn wisdom: first, by reflection, which is noblest; second, by imitation, which is the easiest; and third, by experience, which is the bitterest.
Confucius

5.1 Introduction

James' Storey Enclosure Model (JSEM) has been chosen for further development because it is considered to be more sophisticated than other conventional models in terms of the number of variables contained therein (such as the floor area for each floor, basement area, external wall area and roof area of a building) and the rationale behind the use of these variables (i.e. the consideration of certain design factors, such as the shape of buildings, the vertical positioning of floor areas, the storey heights and the cost of sinking storeys below ground, in estimating building prices). However, JSEM lacks appropriate support for its assigned weightings and selection of variables. Disregarding this deficiency, JSEM has been judged, if rather roughly, to be a more accurate method than the floor area and the cube models. The floor area model is still a very popular model that is widely employed in practice, whereas JSEM is still only found in the textbooks of building price studies. Because JSEM is as simple in application as other conventional models, and has been proved to be relatively more accurate, it has been chosen for further development in this research. The development of JSEM that is undertaken comprises the simplification of JSEM for multi-storey buildings by making reasonable assumptions, the use of regression techniques for modelling empirical data of building projects in Hong Kong and the assessment of forecasting performance by statistical inference.

5.2 Research Framework

The further development of JSEM involves a purpose-designed modelling approach that uses different regression techniques. Figure 5-1 shows the framework for the identification, selection and validation of the price models in this research. The framework comprises seven major steps: (1) the simplification of JSEM, (2) data collection, (3) model building, (4) reliability analysis, (5) model selection, (6) model adjustment and (7) performance assessment. The final price models are developed through the identification of candidates in JSEM (Step 1) and through the selection of predictors by regression techniques (Steps 2 to 6). The use of regression techniques overcomes the major criticism of the irrationality of the assigned weightings in JSEM. The forecasting performance of the final price models (i.e. the best subset-regressed models) is then assessed by using the known measures of bias and consistency (Step 7). Finally, the best subset-regressed models are compared with other conventional models to classify the models according to their forecasting performance.

[Flowchart summarising the seven steps: simplification of JSEM (identification of candidate variables and formation of a base model containing all candidates); data collection (classification and entry of historical data); model building (generation of subset models by the least-squares method); reliability analysis (calculation of the average MSQ for each subset model by leave-one-out cross validation, i.e. constructing sub-samples that each omit one unique case, fitting each sub-sample by least squares, calculating the forecasting error for each omitted case and determining the average MSQ for each subset of predictors); model selection (forward stepwise and backward stepwise procedures, the best subset model being the one with the smallest average MSQ); model adjustment (exclusion of offending variables); and performance assessment (accuracy testing against the JSEM, floor area and cube models).]

Figure 5-1: Research Framework for Identification, Selection and Validation of Price Models

5.3 Types of Quantity Measured in Single-Rate Forecasting Models

The traditional models that are used for comparison in this study, including JSEM, the floor area model and the cube model, are all single-rate forecasting models. JSEM is the most complicated of the models because it demands more measured variables, including the area of each floor, the perimeter of each floor, the storey height of each floor and the roof area. Since the introduction of these traditional models, further models have been proposed, such as those reviewed in Chapter 3. However, most of them demand far more information than can be extracted from sketch drawings. In other words, these models are to be used at a later design stage. As described in Chapter 3, JSEM can be represented by Equation (5.1):

\[
P = \left( \sum_{i=0}^{n} (2 + 0.15i) f_i + \sum_{i=0}^{n} p_i s_i + 2 \sum_{j=0}^{m} f'_j + 2.5 \sum_{j=0}^{m} p'_j s'_j + r \right) \cdot R ,
\tag{5.1}
\]

where P is the forecasted price, f_i is the floor area at i storeys above ground, p_i is the perimeter of the external wall at i storeys above ground, s_i is the storey height at i storeys above ground, n is the total number of storeys above ground level, m is the total number of storeys below ground, f'_j is the floor area at j storeys below ground level, p'_j is the perimeter of the external wall at j storeys below ground level, s'_j is the storey height at j storeys below ground level, r is the roof area and R is the unit rate (determined by historical data).

The floor area model and the cube model can also be represented mathematically by Equations (5.2) and (5.3), respectively:

\[
P = \left( \sum_{i=0}^{m+n} f_i \right) \cdot R
\tag{5.2}
\]

\[
P = \left( \sum_{i=0}^{m+n} f_i \, s_i \right) \cdot R
\tag{5.3}
\]

According to Equations (5.1) to (5.3), there are common variables amongst the three models (e.g., f_i, m and n).
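To make the three single-rate formulas concrete, a minimal Python sketch of Equations (5.1) to (5.3) is given below. The function and argument names are the writer's own illustrative choices, and are not part of JSEM or of any published implementation.

```python
from typing import Sequence

def jsem_price(f, p, s, f_b, p_b, s_b, roof_area, rate):
    """Equation (5.1): James' Storey Enclosure Method.

    f, p, s        -- floor areas, perimeters and storey heights above ground
                      (index 0 = ground floor)
    f_b, p_b, s_b  -- the same quantities for storeys below ground
    roof_area      -- roof area r
    rate           -- unit rate R from historical data
    """
    weighted_floors = sum((2 + 0.15 * i) * fi for i, fi in enumerate(f))
    walls_above = sum(pi * si for pi, si in zip(p, s))
    below_ground = 2 * sum(f_b) + 2.5 * sum(pb * sb for pb, sb in zip(p_b, s_b))
    return (weighted_floors + walls_above + below_ground + roof_area) * rate

def floor_area_price(all_floor_areas: Sequence[float], rate: float) -> float:
    """Equation (5.2): floor area model -- total floor area times a unit rate."""
    return sum(all_floor_areas) * rate

def cube_price(all_floor_areas, storey_heights, rate) -> float:
    """Equation (5.3): cube model -- total enclosed volume times a unit rate."""
    return sum(fi * si for fi, si in zip(all_floor_areas, storey_heights)) * rate
```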

5.4 Simplification of JSEM

To make a price model useful, it must be general enough to accommodate variations without violating the original assumptions of the model, and specific enough to reflect cost-significant factors. It must also be simple enough to be understood by practicing forecasters, and intricate enough to explain real situations. Although the data that are used in James' study are mainly from low-rise buildings (fewer than three storeys), such as houses, and medium-rise buildings (3 to 10 storeys), such as schools and industrial buildings, JSEM can also be applied to high-rise buildings (higher than 10 storeys). Moreover, it can be applied to building projects that contain more than one building, by adding another set of variables, \( \sum_{l=0}^{t} (2 + 0.15l) f''_l + r + \sum_{l=0}^{t} p''_l s''_l \). However, the higher the building, the more variables have to be created.

If Equation (5.1) is used to estimate the price of a 40-storey building without a basement (a typical number of storeys for high-rise buildings in Hong Kong), then one has to measure the floor area, the perimeter and the storey height once for each level, together with the number of levels and the roof area. With JSEM, 81 variables (e.g., p_j s_j and p'_j s'_j) have to be created, which are calculated from 161 items of measurement (e.g., p_j, s_j, p'_j and s'_j). The rationale behind JSEM is that the areas of different parts of a building affect the building price differently. The huge number of variables generated for modelling the price of high-rise buildings would place a heavy burden on the size of the data set required. However, the rationale can be sustained and the number of variables can be significantly reduced if the assumption is made that the floor areas at different levels are approximately the same. This assumption is supported by the fact that high-rise buildings generally comprise repeating floors. Very often, only typical layout plans, instead of the layout plans for every floor, are provided for forecasting in the early design stage. Although layout plans for every floor are more readily available at the later stages, other development restrictions, such as those laid down in land leases (e.g. the site coverage and the plot ratio), leave little room for designers or decision makers to change the distribution of areas and the number of storeys drastically. With this assumption, the number of variables is reduced to four: the total number of storeys, the total elevation area (which can be easily measured by multiplying the average perimeter by the overall building height), the average floor area and the roof area. Equation (5.7) represents the simplified JSEM for use with high-rise buildings.

Care has to be taken to avoid applying the simplified equation to buildings with significantly different floor sizes at different levels, or the assumption will be violated. It is possible that the presence of a podium in a typical large development may also violate the assumption, as floors that are located at podium level will generally contribute a much larger average floor area than those in the tower or towers above the podium. To avoid a probable violation, the variables in JSEM that represent the floor area above ground level have to be divided into two parts: one for the podium and the other for the towers. This leads to Equation (5.6), which represents JSEM for buildings with a podium design. The steps for deriving Equations (5.6) and (5.7) are as follows.

Let \( \sum_{i=0}^{n} p_i s_i = n p_{pt} s_{pt} \), where p_pt is the average perimeter of the superstructure and s_pt is the average storey height of the superstructure. Let \( \sum_{j=0}^{m} p'_j s'_j = m p_b s_b \), where p_b is the average perimeter of the basement and s_b is the average storey height of the basement. Let \( \sum_{j=0}^{m} f'_j = m f_b \), where f_b is the average floor area per storey for floors at basement level and f'_0 ≈ f'_1 ≈ … ≈ f'_m ≈ f_b (the floor area for each level of the basement is more or less the same, and is approximately equal to f_b). Equation (5.1) for JSEM becomes:

\[
P = \left( \sum_{i=0}^{n} (2 + 0.15i) f_i + r + n p_{pt} s_{pt} + 2 m f_b + 2.5 m p_b s_b \right) \cdot R .
\tag{5.4}
\]

Consider a building that comprises a podium section and a tower section. Let n = a + b, where a is the number of storeys of the podium and b is the number of storeys of the tower. Let f_0 ≈ f_1 ≈ … ≈ f_a ≈ f_p (the floor area for each level of the podium is more or less the same, and is approximately equal to f_p), where f_p is the average storey area for floors at podium level, and let f_{a+1} ≈ f_{a+2} ≈ … ≈ f_{a+b} ≈ f_t (the floor area for each level of the tower is more or less the same, and is approximately equal to f_t), where f_t is the average storey area for floors at tower level. Then,

\[
\begin{aligned}
\sum_{i=0}^{n} (2 + 0.15i) f_i &= \sum_{i=0}^{a+b} (2 + 0.15i) f_i = \sum_{i=0}^{a} (2 + 0.15i) f_p + \sum_{i=a+1}^{a+b} (2 + 0.15i) f_t \\
&= 2 a f_p + 0.15 (0 + 1 + \cdots + a) f_p + 2 b f_t + 0.15 \left[ (a+1) + (a+2) + \cdots + (a+b) \right] f_t \\
&= 2 a f_p + 0.15 (0 + 1 + \cdots + a) f_p + 2 b f_t + 0.15 a b f_t + 0.15 (1 + 2 + \cdots + b) f_t \\
&= 2 a f_p + 0.15 \cdot \frac{a(a-1)}{2} f_p + 2 b f_t + 0.15 a b f_t + 0.15 \cdot \frac{b(b-1)}{2} f_t \\
&= \left(2 - \frac{0.15}{2}\right) a f_p + \frac{0.15}{2} a^2 f_p + \left(2 - \frac{0.15}{2}\right) b f_t + \frac{0.15}{2} b^2 f_t + 0.15 a b f_t
\end{aligned}
\tag{5.5}
\]

The simplified equation for JSEM becomes:

\[
\begin{aligned}
P &= \left[ \left(2 - \frac{0.15}{2}\right) a f_p + \frac{0.15}{2} a^2 f_p + \left(2 - \frac{0.15}{2}\right) b f_t + \frac{0.15}{2} b^2 f_t + 0.15 a b f_t + r + n p_{pt} s_{pt} + 2 m f_b + 2.5 m p_b s_b \right] \cdot R \\
&= \left(2 - \frac{0.15}{2}\right) a f_p R + \frac{0.15}{2} a^2 f_p R + \left(2 - \frac{0.15}{2}\right) b f_t R + \frac{0.15}{2} b^2 f_t R + 0.15 a b f_t R \\
&\qquad + r R + n p_{pt} s_{pt} R + 2 m f_b R + 2.5 m p_b s_b R
\end{aligned}
\tag{5.6}
\]

Consider a building that has no podium, or for which the average storey area of the podium is approximately equal to that of the tower, i.e., f_p ≈ f_t ≈ f_pt, where f_pt is the average storey area for floors above ground level, and a + b = n. The simplified equation becomes:

\[
\begin{aligned}
P &= \left[ \left(2 - \frac{0.15}{2}\right) a f_{pt} + \frac{0.15}{2} a^2 f_{pt} + \left(2 - \frac{0.15}{2}\right) b f_{pt} + \frac{0.15}{2} b^2 f_{pt} + 0.15 a b f_{pt} + r + n p_{pt} s_{pt} + 2 m f_b + 2.5 m p_b s_b \right] \cdot R \\
&= \left[ \left(2 - \frac{0.15}{2}\right) (a+b) f_{pt} + \frac{0.15}{2} (a+b)^2 f_{pt} + r + n p_{pt} s_{pt} + 2 m f_b + 2.5 m p_b s_b \right] \cdot R \\
&= \left[ \left(2 - \frac{0.15}{2}\right) n f_{pt} + \frac{0.15}{2} n^2 f_{pt} + r + n p_{pt} s_{pt} + 2 m f_b + 2.5 m p_b s_b \right] \cdot R \\
&= \left(2 - \frac{0.15}{2}\right) n f_{pt} R + \frac{0.15}{2} n^2 f_{pt} R + r R + n p_{pt} s_{pt} R + 2 m f_b R + 2.5 m p_b s_b R
\end{aligned}
\tag{5.7}
\]
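A short sketch of Equation (5.7) follows, evaluating the simplified JSEM for a building without a podium. The function name, argument names and the example figures are illustrative assumptions only, not values taken from the thesis.

```python
def simplified_jsem_price(n, m, avg_floor_area, avg_basement_area,
                          avg_perimeter, avg_storey_height,
                          avg_basement_perimeter, avg_basement_height,
                          roof_area, rate):
    """Equation (5.7): simplified JSEM for a high-rise building without a podium.

    n, m               -- number of storeys above / below ground
    avg_floor_area     -- f_pt, average floor area above ground
    avg_basement_area  -- f_b, average basement floor area
    avg_perimeter      -- p_pt, average perimeter of the superstructure
    avg_storey_height  -- s_pt, average storey height of the superstructure
    roof_area, rate    -- r and the unit rate R
    """
    weighted_floor = (2 - 0.15 / 2) * n * avg_floor_area + (0.15 / 2) * n ** 2 * avg_floor_area
    elevation = n * avg_perimeter * avg_storey_height
    basement = 2 * m * avg_basement_area + 2.5 * m * avg_basement_perimeter * avg_basement_height
    return (weighted_floor + roof_area + elevation + basement) * rate

# Hypothetical example: a 40-storey tower with a one-level basement.
price = simplified_jsem_price(n=40, m=1, avg_floor_area=750.0, avg_basement_area=900.0,
                              avg_perimeter=110.0, avg_storey_height=3.2,
                              avg_basement_perimeter=120.0, avg_basement_height=4.0,
                              roof_area=750.0, rate=8000.0)
```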

5.5 Identification of a Problem

In JSEM, building prices are assumed to be proportional to the floor area, roof area and elevation area. However, their exact relationships have not been properly studied. As JSEM was determined by rule of thumb, or by a very coarse method, it is possible that JSEM may include some irrelevant predicting variables, or may have excluded some significant predicting variables, and that the relationships between building prices and the predicting variables are not the same as has been proposed. As suggested, JSEM can be represented by Equations (5.6) and (5.7). These equations actually fit hypothetical models in which each equation contains one dependent variable (response), P, and some independent variables (predictors), including n f_pt R and rR, which can be statistically developed by regression techniques. The question in hand can therefore be considered as a typical multiple linear regression problem. Regression techniques can be used to determine the subset of variables and the corresponding coefficients that give the best forecast of the building prices. The developed regressed models and the employed modelling approach in this research are both advancements of JSEM. Let all of the possible predictors be V_i, where i = 1, 2, …, k; the building price model can then be represented as

\[
P = \beta_0 + \beta_1 V_1 + \beta_2 V_2 + \cdots + \beta_k V_k = \beta_0 + \sum_{i=1}^{k} \beta_i V_i ,
\tag{5.8}
\]

where β_0 and the β_i are constant coefficients and the V_i are independent variables. Table 5-1 shows the coefficients and the variables that are designated in JSEM with reference to Equations (5.6) and (5.7).

There are other available techniques for modelling the variables other than multiple regression analysis.

Perhaps the closest alternative approach that serves the same purpose is structural equation modelling. This takes into account the modelling of interactions, nonlinearities, correlated independents, measurement errors, correlated error terms, multiple latent independents, and one or more latent dependents (independents and dependents each being measured by multiple indicators). Compared with multiple regression, structural equation modelling includes more flexible assumptions (particularly in allowing interpretation even in the face of multicollinearity), uses confirmatory factor analysis to reduce measurement error by having multiple indicators for each latent variable, provides a graphical modelling interface, has the ability to test models with multiple dependents, can model mediating variables and error terms, and can test coefficients across multiple between-subject groups (Garson 2004). Although structural equation modelling has many advantages over the multiple regression method and is considerably more powerful, multiple regression is more suitable for this research because of the possible violation of the assumptions and of the multivariate normality of the indicators (Jaccard and Wan 1996, p. 80). More importantly, the cross validation approach to the multiple regression method that is used in this research provides a more direct means for the measurement of reliability for small samples.

Table 5-1: Coefficients and Variables Designated in JSEM

For Equation (5.6): β_0 = 0; β_1 = (2 − 0.15/2) with V_1 = a f_p R; β_2 = 0.15/2 with V_2 = a² f_p R; β_3 = (2 − 0.15/2) with V_3 = b f_t R; β_4 = 0.15/2 with V_4 = b² f_t R; β_5 = 0.15 with V_5 = a b f_t R; β_6 = 1 with V_6 = r R; β_7 = 1 with V_7 = n p_pt s_pt R; β_8 = 2 with V_8 = m f_b R; β_9 = 2.5 with V_9 = m p_b s_b R.

For Equation (5.7): β_0 = 0; β_1 = (2 − 0.15/2) with V_1 = n f_pt R; β_2 = 0.15/2 with V_2 = n² f_pt R; β_3 = 1 with V_6 = r R; β_4 = 1 with V_7 = n p_pt s_pt R; β_5 = 2 with V_8 = m f_b R; β_6 = 2.5 with V_9 = m p_b s_b R.

5.6 Data Preparation and Entry

Cost analyses that were prepared by forecasters are chosen as the data source, as they contain all of the information that is required for this study, such as the tender price, floor area, roof area, building height and external wall area. The cost analyses that are used in this research were provided by one of the two dominant quantity surveying practices in Hong Kong (see Appendix A). Since quantity surveying consultants in Hong Kong rarely focus their business on providing services to projects of particular types or with particular characteristics, project data obtainable from the dominant practices are considered to be sufficiently representative of price behaviour.

5.6.1 Data sample

The data sample consists of the values of the identified candidate variables and the tender prices from 148 completed projects in Hong Kong. The tenders for these projects were received in the ten-year period between the third quarter of 1988 and the second quarter of 1997.

Hong Kong is a former British colony. Both the structure of the construction industry and the professional practices within the industry are very similar to those in the UK. In 1929, the Royal Institution of Chartered Surveyors (RICS) established a branch office in Hong Kong. Before the local surveying institution, the Hong Kong Institute of Surveyors (HKIS), was founded in 1984, the Hong Kong branch of the RICS was the only institution that recognised and provided local support to surveyors. However, neither the HKIS nor the RICS has attempted to formalise forecasting practice in Hong Kong.

Unlike forecasts that are produced in the UK, which are generally presented in the format of the Building Cost Information Service (BCIS) (BCIS 1969), there is no standardised definition or classification of building elements in Hong Kong. Using data from a single source avoids the unnecessary complications that arise from differences in the classification of building elements or in the format and breakdown of building costs, which may differ across firms. Moreover, there is no BCIS-type organisation that provides online cost advice services in Hong Kong, and a practice will not provide its own historical project data to a competitor. Thus, it is almost impossible for the forecaster of one company to gain access to cost data from a third party such as the BCIS or another practice. That the data are collected from a single source also ensures that the models that are generated in this research are applicable to practical forecasting, because the cross validation approach, as described in section 5.7.4, is very similar to the manner in which forecasts are prepared in practice.

5.6.2 Definition and classification of building types

James' study is based on a sample of 86 tenders in the categories of flats, schools, industrial buildings and let houses in the 1950s in the UK. In accordance with James' study, all of the data from the 148 projects in this research are grouped according to their building types. The data are grouped for analysis into different building types according to the Construction Index/Samarbetskommittén för Byggnadsfrågor (CI/SfB) system, which is published by the Royal Institute of British Architects (RIBA) (Ray-Jones and Clegg 1976). Five types of building were identified: (1) code no. 32 – office facilities, offices; (2) code no. 442 – nursing homes, convalescent homes, sanatoria; (3) code no. 712 – primary schools; (4) code no. 713 – secondary schools; and (5) code no. 816 – flats (apartments). Because the number of available projects was small, and the provisions for primary and secondary schools were very similar, these two sub-types were grouped together. For ease of reference, the four groups are known as offices, nursing homes, schools and private housing.

Table 5-2 shows the distribution of the building projects that were used for the development of the price models, according to their building types. It should be noted that a few projects contain a mixture of more than one type of building. For example, a 50-storey office tower project may have a few shops at ground floor level. As all of the projects selected are dominated by one particular type of building, the effect of the presence of another type or types of building is considered to be insignificant.

Table 5-2: Classification of building projects according to building types

CI/SfB code 32 – Office. Inclusions: offices, such as design offices, professional offices and executive offices, that are not associated with a particular facility. Exclusions: official administrative facilities, law courts, commercial facilities, trading facilities, shops, protective service facilities, banks, shopping arcades, industrial and office buildings. Samples collected: 45; discarded cases: 3; samples used: 42.

CI/SfB code 442 – Nursing home. Inclusions: nursing homes, convalescent homes and sanatoria. Exclusions: hospital facilities, hospitals, medical facilities and animal welfare facilities. Samples collected: 23; discarded cases: none; samples used: 23.

CI/SfB codes 712 & 713 – School. Inclusions: primary and secondary schools, including infants schools, secondary modern, secondary technical and community schools. Exclusions: universities, colleges, nursery schools, kindergartens, scientific facilities, private schools, exhibition and display facilities, information facilities, libraries and other education facilities. Samples collected: 23; discarded cases: none; samples used: 23.

CI/SfB code 816 – Private housing. Inclusions: multi-storey flats (apartments). Exclusions: low-rise housing, one-off housing units, houses, public housing, special housing facilities, hotels, hostels, historical residential facilities, quasi-private housing and service apartments. Samples collected: 57; discarded cases: 7; samples used: 50.

Total: 148 samples collected, 10 discarded, 138 used.

5.6.3 Treatment of outliers

Outliers (extreme cases) are especially troublesome when the goal is to select from a set of forecasting models, but are less of a problem for model calibration (Armstrong and Collopy 1992). The presence of outliers can seriously affect the least-squares fitting of a regressed model. These outliers may possess different characteristics from the rest of the data. Some regression diagnostics, such as the jack-knife residual and leverage, assist in the identification of outliers. However, pure reliance on the results of these statistical techniques (e.g., when cases lie three or more standard deviations from the mean of the residuals) for excluding extreme cases, without studying the plausibility of the exclusion, may lead to a favourable model being produced from biased data. Therefore, unless there is strong evidence to indicate that a case is not a member of the intended sample, it should not be discarded. All of the inputs and outputs of the regression are evaluated according to three criteria: reasonableness given knowledge of the variable, response extremeness, and predictor extremeness (Kleinbaum et al. 1998, p. 228).

The residuals from the regressed models were analysed, and three office and seven private housing cases were discarded. All of the discarded office cases had comparatively lower response values. Further investigation revealed that the three office cases were for industrial and office (I-O) purposes [1]. Moreover, five of the seven discarded private housing cases had comparatively lower response values, and the other two had higher response values. The five lower response cases were discarded, as they were quasi-private housing developments (housing completed under the Private Sector Participation Scheme (PSPS) [2]), which were not solely developed by private developers and thus were not part of the intended sample data. The two higher response cases were found to be service apartment buildings, which are generally better furnished than ordinary private housing, and were therefore also discarded. To sum up, the differences in response values for the discarded cases may be caused by differences in the building provisions (industrial and office buildings, service apartments and quasi-private housing), contractual arrangements (quasi-private housing) and technology of fabrication (quasi-private housing). Finally, 138 building projects in four categories were used for the modelling (see Table 5-2).

[1] "An I-O Building is defined as a dual-purpose building in which every unit of the building, other than that in the purpose-designed non-industrial portion, can be used flexibly for both industrial and office purposes. In terms of building construction, the building must comply with all relevant building and fire regulations applicable to both industrial and office buildings, including floor loading, compartmentation, lighting, ventilation, provision of means of escape and sanitary fitments." (Town Planning Board 2003)

[2] Under the Private Sector Participation Scheme (PSPS), private sector developers bid for the right to build according to a given design. The finished flats are purchased by the Housing Authority of the Hong Kong Government at a pre-agreed price for onward sale to buyers who are selected by the Housing Authority.
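As a rough illustration of the residual-based screening described above, the following sketch flags cases whose least-squares residuals lie three or more standard deviations from the mean residual. It is a generic illustration of that rule of thumb only, with names and threshold chosen by the writer; in this research the flagged cases were additionally examined for plausibility before any were discarded.

```python
import numpy as np

def flag_outliers(V: np.ndarray, P: np.ndarray, n_sd: float = 3.0) -> np.ndarray:
    """Return indices of cases whose least-squares residuals lie n_sd or more
    standard deviations from the mean residual.

    V -- design matrix (n cases x k predictors), without a constant column
    P -- response vector (n tender prices)
    """
    X = np.column_stack([np.ones(len(P)), V])        # add intercept column
    beta, *_ = np.linalg.lstsq(X, P, rcond=None)      # least-squares fit
    residuals = P - X @ beta
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)
    return np.where(np.abs(z) >= n_sd)[0]
```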

5.7 Model Building

5.7.1 Dependent Variables

As reviewed in section 4.4 of Chapter 4, the lowest tender price is set as the target of the forecast. In accordance with James' paper, the lowest tender prices that are used in the modelling exclude the price of the foundations, building services, external works, preliminaries and contingencies.

When tender prices are used as the response for modelling, there is a risk of producing poorly performing models in terms of their percentage errors, i.e. the ratio of the error (the forecasted tender price minus the actual or lowest tender price) to the actual tender price. It is also found that the magnitude of error that is produced from forecasts over a wide range of tender prices (e.g., for offices, the tender prices range from HK$24 million to HK$1,477 million) varies significantly. As the performance of the forecasts is measured according to their percentage errors, the minimisation of the total squared errors in the least-squares method is not necessarily an effective means of obtaining a good model unless the tender prices in all of the cases in a model are fairly close to each other. To reduce the influence of a wide tender price range, the tender price per total floor area is adopted as the response. The tender price per total floor area is a sensible alternative because forecasters usually present building prices as unit prices, especially at the early budget stage, and the calculation of forecasted prices from the unit price model is straightforward. The unit price model can be directly compared with other conventional models despite their responses being different, because performance is measured on the basis of percentage errors.

5.7.1.1 Price Index Adjustment

The tender prices were rebased to the prices of the second quarter of 1997 by means of the tender price index that is published by the quantity surveying practice that provided the data for this study. A copy of this tender price index is attached in Appendix B.
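The rebasing step can be sketched as follows; this is the standard index-ratio adjustment, with index values invented purely for illustration (the actual index is the one reproduced in Appendix B).

```python
def rebase_price(tender_price: float, index_at_tender: float, index_at_base: float) -> float:
    """Rebase a tender price to the base period (here, 1997 Q2) using a tender price index."""
    return tender_price * index_at_base / index_at_tender

# Hypothetical example: a price tendered when the index stood at 820,
# rebased to an assumed 1997 Q2 index value of 1,000.
rebased = rebase_price(250_000_000, index_at_tender=820, index_at_base=1000)
```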

5.7.1.2 Other Adjustments

Apart from incorporating inflationary effects using the tender price index, there may be some other characteristics that need to be adjusted using indices (Kouskoulas and Koehn 1974; Pegg 1984). However, there is a lack of indices other than the tender price index in Hong Kong. For instance, the location index, while popular in many countries, is not in use at all. As the overall area of Hong Kong is only slightly more than 400 square miles, projects that are undertaken anywhere in Hong Kong are interpreted as being in the same geographical region (Drew 1995). Except for projects that are located in remote areas, such as outlying islands and hillsides, the location effect is not significant, and no buildings located in remote areas are included in the data pool. Other possible adjustments by indices, such as for the quality and technology of buildings, are considered to be either irrelevant or inapplicable. First, there are no quality and technology indices in use. Second, detailed specifications and method statements for buildings are yet to be defined at the early design stage. Instead of using indices for adjustment, only project data with similar characteristics, such as project type, are grouped together for modelling.

5.7.2 Candidate variables

To identify the predictors for the best subset models, the modelling process started off with the variables that are used in JSEM. The actual measurements of quantities (e.g. perimeter and storey height) for the variables in JSEM (e.g. elevation area) were extracted to form the primary candidate variables for the regression analysis. With reference to the variables in JSEM, a few candidate variables, such as the number of storeys, the square of the number of storeys and their interactions with storey height, were also added to form another set of candidate variables for regression analysis. The unit rate 'R' was excluded, because the tender price is not measured on a unit area basis in the regressed models. Table 5-3 shows a full list of the candidate variables for the regressed models for buildings with and without basements.

Table 5-3: List of Candidate Variables

Primary model
- All identified variables (without higher-degree and interaction effects): no. of storeys for podium (a), no. of storeys for tower (b), no. of storeys for basement (m), square of no. of storeys for podium (a²), square of no. of storeys for tower (b²), average floor area for podium (fp), average floor area for tower (ft), average floor area for basement (fb), average storey height for podium (sp), average storey height for tower (st), average storey height for basement (sb), average perimeter for tower and podium (ppt), average perimeter for basement (pb), roof area (r)
- All subsets model (with basement): n, m, n², fpt, fb, spt, sb, ppt, pb, nfpt, n²fpt, mfb, nspt, msb, n²spt, nsptppt, msbpb, n²sptppt, r
- All subsets model (without basement): n, n², fpt, spt, ppt, nfpt, n²fpt, nspt, n²spt, nsptppt, n²sptppt, r

JSEM model
- Reduced version of all identified variables (without higher-degree and interaction effects): no. of storeys for superstructure (n), no. of storeys for basement (m), square of no. of storeys for superstructure (n²), average floor area for superstructure (fpt), average floor area for basement (fb), average storey height for superstructure (spt), average storey height for basement (sb), average perimeter for tower and podium (ppt), average perimeter for basement (pb), roof area (r)
- Candidate set separating podium and tower: afp, a²fp, bft, b²ft, abft, mfb, (asp + bst)ppt, msbpb, r
- Candidate set combining podium and tower: nfpt, n²fpt, mfb, nsptppt, msbpb, r

5.7.3 Fitting Criterion

There are two approaches to the selection of predictors based on the errors of forecasts: parametric and non-parametric. For a linear model, the former approach demands the satisfaction of some statistical assumptions, including the following (Kleinbaum et al. 1998, pp. 43-46): (1) for any fixed value of the variable V, P is a random variable with a certain probability distribution, e.g., a normal distribution \( (\sigma^2_{P|V}, \mu_{P|V}) \), that has a finite mean and variance; (2) the P values are statistically independent of one another; (3) the mean value of P, \( \mu_{P|V} \), is a straight-line function of V; (4) the variance of P is the same for any V \( (\sigma^2_{P|V_a} = \sigma^2_{P|V_b}) \); and (5) for any fixed value of V, P has a normal distribution. If assumptions (1) to (4) are satisfied and assumption (5) is not badly violated, then the conclusions that are reached by a regression analysis remain reliable and accurate. This approach allows the use of multiple partial F statistics and p-values to select variables for the best models. These parametric procedures are suitable for routine problems, but not for the problems that are identified in this research. First, the sample sizes for the various types of building are small, at around 25 to 50. This would easily cause bias in the estimation of the coefficients. Second, the use of parametric techniques such as the least-squares method is known to be robust even if the normality assumption (the fifth assumption) is not fully satisfied; however, the parametric estimates of the error rates may not be correspondingly robust (McLachlan 1987). Although transformations can be applied to variables to fulfil the requirement of normality, they may cause the violation of other assumptions.

Instead of relying on the multiple partial F statistics and p-values for the selection of variables, a non-parametric approach that is based on the mean square error (MSQ) is adopted. There are two main advantages of using the MSQ rather than the actual errors or the absolute errors. The first is that positive differences do not cancel negative differences, and the second is that differentiation is not difficult (Fausett 2002). Previous regressed price models that have been developed by researchers use either the least-squares approach or the minimum variance approach for the model fitting. In a linear fitting, both approaches produce the same solution (Kleinbaum et al. 1998, p. 118). According to the non-parametric approach that is adopted in this study, the termination criterion is to minimise the MSQ, and therefore the least-squares approach is preferred.

5.7.3.1 Matrix Notation for Calculation of MSQ

Recall that Equation (5.8) can be presented in matrix notation. Let P be a column vector containing n rows of observed values for the response, {P_1, P_2, … , P_n}^T, and let V be a matrix that contains the n × (k + 1) observed values for a subset of variables, such that:

\[
V =
\begin{bmatrix}
V_1 \\ V_2 \\ \vdots \\ V_n
\end{bmatrix}
=
\begin{bmatrix}
1 & V_{1,1} & V_{1,2} & \cdots & V_{1,k} \\
1 & V_{2,1} & V_{2,2} & \cdots & V_{2,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & V_{n,1} & V_{n,2} & \cdots & V_{n,k}
\end{bmatrix} .
\tag{5.9}
\]

Corresponding to P_i is V_i, a row vector that contains the observed values for the variables (a constant term and k predictors), {1, V_{i,1}, V_{i,2}, … , V_{i,k}}, where i = 1, 2, … , n. In a regressed model, the price is represented by:

\[
P = V \beta + e ,
\tag{5.10}
\]

where β is a column vector of the coefficients {β_0, β_1, β_2, … , β_k}^T and e is a column vector of the forecasting errors {e_1, e_2, … , e_n}^T. The mean square error then becomes:

\[
MSQ = \frac{1}{n} \sum_{i=1}^{n} e_i^2 = \frac{1}{n} (e^T e)
= \frac{1}{n} (P - V\beta)^T (P - V\beta)
= \frac{1}{n} \left( P^T P - \beta^T V^T P - P^T V \beta + \beta^T V^T V \beta \right) .
\tag{5.11}
\]

\( \hat{\beta} \) is the β that produces the minimum MSQ. To determine \( \hat{\beta} \), the MSQ is differentiated with respect to β, and the result is equated to zero, i.e.,

\[
\left. \frac{\partial MSQ}{\partial \beta} \right|_{\beta = \hat{\beta}} = \frac{1}{n} \left( -2 V^T P + 2 V^T V \hat{\beta} \right) = 0 .
\tag{5.12}
\]

This yields:

\[
V^T V \hat{\beta} = V^T P , \qquad \hat{\beta} = \left( V^T V \right)^{-1} V^T P .
\tag{5.13}
\]

Therefore, the minimum MSQ is:

\[
MSQ_{\min} = \frac{1}{n} \left( P^T P - \hat{\beta}^T V^T P - P^T V \hat{\beta} + \hat{\beta}^T V^T V \hat{\beta} \right) .
\tag{5.14}
\]
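For completeness, the closed-form solution in Equations (5.13) and (5.14) can be written in a few lines of NumPy, as sketched below. The function name and the use of a pseudo-inverse (chosen for numerical stability) are the writer's choices rather than anything specified in the thesis.

```python
import numpy as np

def fit_least_squares(V: np.ndarray, P: np.ndarray):
    """Fit Equation (5.8) by least squares and return (beta_hat, MSQ_min).

    V -- n x k matrix of predictor values (a constant column is prepended here)
    P -- n-vector of responses (tender price per total floor area)
    """
    X = np.column_stack([np.ones(len(P)), V])        # add the constant term beta_0
    beta_hat = np.linalg.pinv(X.T @ X) @ X.T @ P      # Equation (5.13)
    residuals = P - X @ beta_hat
    msq_min = float(residuals @ residuals) / len(P)   # Equation (5.14)
    return beta_hat, msq_min
```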

5.7.4 Reliability analysis

The fit of a model that is built from historical data is not a reliable indicator of its forecasting ability (Armstrong 1985). In classical statistical inference, a model is validated using ex ante (out of sample) forecasts. However, the lack of available data is always a limitation in the construction of price forecasting models. Unquestionably, it is problematic to use the same data both to build up and to validate a statistical model, i.e., to use ex post simulation prediction (within sample), but the alternative of analysing data blindly, simply to preserve the purity of classical statistical inference, presents even worse problems. In this research, a resampling method is adopted to select variables and evaluate models. Three possible resampling methods were considered (Efron 1982): cross validation, in which one case is omitted in turn from the model derivation and the resulting coefficients are applied to that case; the jack-knife method, in which one case is omitted in turn from the model derivation and the resulting coefficients are applied to the other cases; and the bootstrap method, in which the coefficients are used to generate simulated data from which a second set of coefficients is obtained. For predictive applications, the cross validation method has the most intuitive appeal, as with non-time-series data of this nature each error value can be thought of as a real error that may arise in the practice of forecasting (Skitmore 1992). In cross validation, the accuracy of statistical inference is preserved by dividing a sample of data at random into two sub-samples: an exploratory sub-sample, which is used to select a statistical model for the data, and a validatory sub-sample, which is used for formal statistical inference (Fox 1997). This is a compromise method that keeps the integrity of the inference when the same data are used for the selection and validation of statistical models, and is an approach to ex post forecasting, because the test data are within sample but are not used in model fitting. It differs from split sample validation in that split sample validation uses only a single sub-sample (the validation set) to estimate the error. This distinction is particularly important, because cross validation has been shown to be markedly superior for small data sets (Goutte 1997).

To simulate a practical situation, the 'leave-one-out' cross validation method is the most suitable approach, and is adopted in this study. The steps of the 'leave-one-out' cross validation approach for the assessment of the reliability of a model are shown in Figure 5-1. The accuracy of statistical inference in the leave-one-out method is preserved by dividing a sample that contains n cases of data into n exploratory sub-samples (each containing the n - 1 cases that are obtained from the original n-case sample by the omission of one case, without repetition), each of which is used to select a statistical model using the least-squares approach, and n omitted cases, each of which is used to validate the model selected from the exploratory sub-sample that does not contain the omitted case. An average MSQ is deduced from the n models for each subset of candidates. The average MSQs from models of different subsets of candidates are compared, and the model with the smallest average MSQ is taken to be the best subset model.

Cross validation appears to make no assumptions at all. For the purpose of comparing models, each exploratory sub-sample produces a slightly different best-fitting curve in the family, and there is a penalty for large, complex families of curves, because large families tend to produce greater variation in the curves that best fit an exploratory sub-sample (Turney 1990a). This leads to an average fit that is poorer than the fit of the curve that best fits the total data sample (Forster 2001, pp. 96-97). In cross validation, the selection criterion is designed implicitly, rather than explicitly, as it gives the forecasting accuracy in terms of the MSQ.

5.7.4.1 Matrix Notation for Calculation of MSQ by Leave-one-out Method

Referring to the least-squares method that is described in matrix notation in section 5.7.3.1, let P^(-j) be a column vector that contains the n - 1 observed values of the response, {P_1, P_2, …, P_{j-1}, P_{j+1}, …, P_n}^T, and let V^(-j) be a matrix containing the (n – 1) × (k + 1) observed values for the subset of variables (with the omission of one row of observed values, representing the jth case, from the matrix of variables V, where j is any number from 1 to n):

\[
V^{(-j)} =
\begin{bmatrix}
V_1 \\ V_2 \\ \vdots \\ V_{j-1} \\ V_{j+1} \\ \vdots \\ V_n
\end{bmatrix}
=
\begin{bmatrix}
1 & V_{1,1} & V_{1,2} & \cdots & V_{1,k} \\
1 & V_{2,1} & V_{2,2} & \cdots & V_{2,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & V_{j-1,1} & V_{j-1,2} & \cdots & V_{j-1,k} \\
1 & V_{j+1,1} & V_{j+1,2} & \cdots & V_{j+1,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & V_{n,1} & V_{n,2} & \cdots & V_{n,k}
\end{bmatrix} .
\tag{5.15}
\]

\( \beta^{(-j)} \) is the column vector of coefficients {β_0, β_1, β_2, … , β_k}^T estimated without the jth case, and \( e^{(-j)} \) is a column vector of the forecasting errors {e_1, e_2, …, e_{j-1}, e_{j+1}, … , e_n}^T of the regressed model \( P^{(-j)} = V^{(-j)} \beta^{(-j)} + e^{(-j)} \). Similar to the derivation shown in Equations (5.11) to (5.14), the minimum MSQ of the regressed model that does not contain the jth case becomes:

\[
MSQ_{\min}^{(-j)} = \frac{1}{n} \left( P^{(-j)T} P^{(-j)} - \hat{\beta}^{(-j)T} V^{(-j)T} P^{(-j)} - P^{(-j)T} V^{(-j)} \hat{\beta}^{(-j)} + \hat{\beta}^{(-j)T} V^{(-j)T} V^{(-j)} \hat{\beta}^{(-j)} \right) .
\tag{5.16}
\]

The average of the \( MSQ_{\min}^{(-j)} \), \( \overline{MSQ}_{\min}^{(-j)} \), is deduced from the n regressed models (for j = 1, … , n) for the subset of variables in accordance with Equation (5.17):

\[
\overline{MSQ}_{\min}^{(-j)} = \frac{1}{n} \sum_{j=1}^{n} MSQ_{\min}^{(-j)} .
\tag{5.17}
\]

The different \( \overline{MSQ}_{\min}^{(-j)} \) values from the different subsets of variables that are chosen by the selection strategy described in the next section are compared. The subset of variables that gives the smallest \( \overline{MSQ}_{\min}^{(-j)} \) is the best subset model.
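A compact sketch of the leave-one-out procedure is given below. It scores a candidate subset of predictors by the squared error on each omitted case, averaged over all cases, following the general description in section 5.7.4; the names are the writer's own and the code is an illustration rather than the thesis's exact computation.

```python
import numpy as np

def loo_average_msq(V: np.ndarray, P: np.ndarray) -> float:
    """Leave-one-out cross validation score for one subset of predictors.

    V -- n x k matrix of candidate predictor values for the subset
    P -- n-vector of responses
    Returns the average squared forecasting error over the n omitted cases.
    """
    n = len(P)
    X = np.column_stack([np.ones(n), V])
    errors = []
    for j in range(n):
        keep = np.arange(n) != j                       # exploratory sub-sample
        beta, *_ = np.linalg.lstsq(X[keep], P[keep], rcond=None)
        errors.append(P[j] - X[j] @ beta)              # error on the omitted case
    return float(np.mean(np.square(errors)))
```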

5.7.5 Selection Strategies

The all-possible-regressions procedure, which fits all combinations of variables, is used in preference to other variable selection procedures whenever practicable, because it is the only procedure that guarantees the identification of the best subset model. However, to find the best subset out of all of the subsets for the models with basements that are listed in Table 5-3 using this procedure involves fitting all combinations of 1 to 19 variables, i.e.,

\[
\sum_{i=1}^{19} \frac{19!}{i!\,(19-i)!} = 5.243 \times 10^{5} .
\]

If each fitting consumes four seconds of computing time, then a full analysis of all of the subsets for one type of building using one fitting criterion would take over 24 days of computing time. As four types of building are included in this research and two sets of variables are suggested (refer to Table 5-3), the overall computing time would be much longer than 24 days!
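The figure of 5.243 × 10^5 is simply the number of non-empty subsets of 19 candidate variables; the short check below confirms the count and the implied computing time, using the four-seconds-per-fitting assumption stated above.

```python
from math import comb

n_candidates = 19
n_subsets = sum(comb(n_candidates, i) for i in range(1, n_candidates + 1))  # 2**19 - 1 = 524,287
days = n_subsets * 4 / (60 * 60 * 24)   # at four seconds per fitting
print(n_subsets, round(days, 1))        # 524287 subsets, roughly 24.3 days
```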

Forward

selection begins with no variable in the regression equation. The variable that has the highest correlation with the dependent (criterion) variable is entered into the equation first.

The remaining variables are then entered into the equation

depending on the contribution of each variable. Backward elimination begins with all of the predictor variables in the regression equation, and sequentially removes them. Stepwise selection is a combination of the forward and backward elimination procedures. These procedures can also be applied to non-parametric regression, and the difference rests on the use of different termination criteria. To ensure the selection of the best subset model, a dual stepwise procedure that consists of a combination of the forward stepwise and backward stepwise procedures is adopted (Figure 5-2). According to the algorithm for the forward stepwise procedure (on the left-hand side of the figure), forward regression is first applied by entering one candidate variable at a time. When no candidate that enters into the model can further reduce the average MSQ, the forward regression ends.

A subset of variables that produces the

minimal average MSQ is selected. Backward regression is then applied, and if the number of variables that was selected in the forward regression is less than two, then the stepwise procedure will be terminated, as all single predictor models have been considered in the forward regression. Candidates in the subset that are selected by the forward regression are eliminated one at a time until the average MSQ cannot be

117 further reduced by the elimination of a candidate. Forward regression starts again and backward regression follows until the average MSQ cannot be further reduced, and a minimum average MSQ is determined at the end of the forward stepwise procedure. The backward stepwise procedure (on the right-hand side of the figure) is the same as the forward stepwise procedure, except that it commences with all of the candidates being contained in the model and starts off with a backward regression. The best subset model that is deduced by the forward stepwise procedure is compared with that deduced from the backward stepwise procedure. If they are the same, then the selected subset model will either be very close to, or the same as, the best model using the all-possible regression procedure.

118

FORWARD STEPWISE REGRESSION

Identification of a base model containing n variables

Forward Regression

BACKWARD STEPWISE REGRESSION

Backward Regression

Generate all 1-variable models

Generate n-variable model

Select best 1-variable model

For r = n

For i = 2

Generate all (r-1)-variable models from already entered 1st to rth variables

Generate all i-variable model with 1st to (i-1)th variables already entered

Select best (r-1)-variable model

Select best i-variable model

For r = r - 1

For i = i + 1

Yes Is average MSQ of best i -variable model < that of best (i-1)variable model?

Yes Exclusion of an offending variable

Is average MSQ of best (r-1) variable model < that of rvariable model?

No No Yes No

No

Is r > (n – 1)?

Is i > 2?

Yes

Backward Regression

The best model in this stage contains (i1) number of variables

Best Subset Model by Forward Stepwise Procedure

Generate all (i-2)-variable models from already entered 1st to (i-1)th variables

Are they the same model?

Best Subset Model by Backward Stepwise Procedure

Forward Regression

The best model in this stage contains r number of variables

Generate all (r+1)-variable model with 1st to rth variables already entered

Yes Select best (r+1)-variable model

Select best (i-2)-variable model STOP

Yes For i – 1 = r + 1, which the best model in this stage contains (i -1) number of variables

Is average MSQ of best (i-2) variable model < that of best (i-1)variable model?

No

No

For i = r, which the best model in this stage contains i number of variables

Yes For r = i - 2, which the best model in this stage contains (r + 1) number of variables

For r = i – 3, which the best model in this stage contains (r+1) number of variables

Generate all i-variable model with 1st to (i-1)th variables already entered

Generate all r-variable models from already entered 1st to (r+1)th variables

Select best i-variable model

Select best r-variable model

For r = r - 1

No

Is average MSQ of best (r+1) variable model < that of rvariable model?

Is average MSQ of best r-variable model < that of best (r+1)variable model?

Yes

For i = i + 1

Yes

Is average MSQ of best i variable model < that of (i-1)variable model?

Figure 5-2: Algorithm for Dual Stepwise Selection

No

119
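A sketch of the forward stepwise half of the procedure, driven by a cross-validated average MSQ, is given below. It assumes a scoring function such as the leave-one-out sketch shown earlier and uses the writer's own names throughout, so it illustrates the selection logic rather than reproducing the thesis's implementation.

```python
import numpy as np

def forward_stepwise(V: np.ndarray, P: np.ndarray, score) -> list[int]:
    """Greedy forward selection of predictor columns of V, minimising score(V[:, subset], P).

    score -- e.g. a leave-one-out average-MSQ function such as loo_average_msq.
    Returns the list of selected column indices.
    """
    remaining = list(range(V.shape[1]))
    selected: list[int] = []
    best_score = np.inf
    while remaining:
        trial_scores = {j: score(V[:, selected + [j]], P) for j in remaining}
        j_best = min(trial_scores, key=trial_scores.get)
        if trial_scores[j_best] >= best_score:
            break                                # no candidate reduces the average MSQ
        best_score = trial_scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected
```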

5.8 Model Adjustment

5.8.1 Exclusion of candidates

The best subset models that are selected by the forward stepwise and backward stepwise procedures are not necessarily the same. Divergence is easily caused by multicollinearity, i.e., strong correlations amongst the predictors. One typical strategy to avoid the presence of multicollinearity is to combine or remove predictors that are strongly correlated with each other. This can easily be implemented by the use of correlation tables. However, this strategy is not appropriate for the modelling exercise in this research, because many of the selected predictors are actually interaction terms, and are likely to be strongly correlated with the primary variables (in Table 5-3). Moreover, as the future use of the best model is for forecasting rather than for understanding how the predictors in the model affect the response, good models that suffer from multicollinearity can still produce accurate forecasts. Therefore, except for variables that are very highly correlated (> 0.95), predictors have not been deleted simply because their correlation with each other is high (say, > 0.7). If the cross-validated average MSQs of the best models that are generated from the two procedures are different, then one of them will always be better – the one with the smaller average MSQ. To prevent a less significant candidate acting as an offending variable and entering into the model before a more significant candidate (or a more significant candidate being eliminated from the model before a less significant candidate), an algorithm to exclude offending variables has been set up to deal with the possible divergence. This involves four steps: (1) the exclusion of a candidate in turn before modelling by regression; (2) the generation of models with the forward stepwise and backward stepwise procedures; (3) the selection of the model with the smaller average MSQ if two different subsets of variables are chosen; and (4) the comparison of the smaller average MSQ with that of the subset of variables that is selected from an all-subset model that contains the excluded candidate. Step 1 is repeated (i.e., excluding a second, third or more candidates before modelling) if the forward stepwise and backward stepwise procedures cannot produce an agreeable model, or if the average MSQ of the best subset model is higher than that of the subset of variables selected from an all-subset model that contains the excluded candidate(s). The procedure for excluding candidates stops when the forward and backward stepwise procedures produce the same model (subset of predictors) with the smallest average MSQ.

The use of cross validation is a non-parametric approach to the determination of the best subset of predictors, and therefore does not have to fulfil the assumptions of homoscedasticity and normality of predictors that are required in parametric regression. Because of this, the use of transformation strategies for variables in this research is limited to circumstances in which the original data suggest a model that is non-linear in either the regression coefficients or the original variables, or to the linearisation of the regression coefficients. A few studies have attempted to find the relationships between various predictors and the price of a building or the prices of the components of a building (Wilderness Group 1964; Flanagan and Norman 1978; Russell and Choudhary 1980; Tan 1999). However, a generalised relationship between any particular predictor and the price of a building or the prices of its components is absent; on the contrary, many studies have shown quite different relationships for the same subjects. For example, the relationship between building price (represented by the total price or the price per total floor area) and building height (represented by the number of storeys or the overall building height) has been expressed as a linear (Tregenza 1972; Braby 1975), a parabolic with a minimum (Flanagan and Norman 1978) and a power (Karshenas 1984) function. Perhaps it can only be concluded that each relationship holds true only for the data from which it is generated.

5.8.2 Transformation of variables

For a given set of predictors and a given response, there can be unlimited combinations of transformed predictors and transformed responses.

Certainly,

models with transformed variables are more complicated, more inexplicable, and bear a higher risk of being too specific for the given data than their untransformed counterparts.

More importantly, complicated models often do a bad job of

forecasting new data, although they can be made to fit old data quite well. This is experienced also by modellers in other disciplines (Sober 2001 p.30). In terms of practicability, simplicity also aids understanding and implementation by decision makers, reduces the likelihood of mistakes, and is less expensive (Armstrong 2001 pp. 374-375). In the light of the principle of parsimony 3, as reviewed in Chapter 3, this research avoids the development of models with complex mathematical functions.

Instead, each best subset model has been transformed into a power function, because this has been demonstrated by Karshenas (1984) and Skitmore and Patchell (1990) to improve accuracy. The power function model can be expressed as follows:

\[ P' = \beta'_0 \prod_{i=1}^{k} {V'_i}^{\beta'_i}, \qquad (5.18) \]

where P' is the forecasted price, β'0 and the β'i are constant coefficients, and the V'i are the variables of the best subset model. Taking the natural logarithm (ln) of both sides (the ln transformation of the model), Equation (5.18) may be equivalently expressed as:

\[ \ln P' = \ln \beta'_0 + \sum_{i=1}^{k} \beta'_i \ln V'_i. \qquad (5.19) \]

Equation (5.19) shows the transformation of the original variables into a linear function of ln variables. The forecasting performance of the linear best subset model has to be compared with that of the model represented by Equation (5.18). Referring to the principle of parsimony, the linear model prevails over its power function counterpart unless the latter is shown to make significantly better forecasts.
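As a brief illustration of Equations (5.18) and (5.19), the following sketch fits the ln-transformed model by ordinary least squares and maps a forecast back to the original price scale. It is an assumption-laden toy, not the thesis' procedure; the names (fit_power_function_model, V, P, v_new) are illustrative, and all variable values must be strictly positive for the logarithms to exist.

```python
# A sketch of the ln transformation of Equation (5.19): regress ln(response)
# on ln(predictors), then back-transform forecasts with exp().
import numpy as np

def fit_power_function_model(V, P):
    """Fit ln P' = ln b0 + sum_i b_i * ln V_i by ordinary least squares.
    V is an (n, k) array of predictor values (> 0); P is the response vector."""
    A = np.column_stack([np.ones(len(P)), np.log(V)])
    coef, *_ = np.linalg.lstsq(A, np.log(P), rcond=None)
    return coef  # [ln b0, b1, ..., bk]

def forecast_power_function(coef, v_new):
    """Back-transform a forecast from ln space to the original price scale."""
    return np.exp(coef[0] + coef[1:] @ np.log(v_new))
```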

5.9 Comparison of Best Model with Other Models

To assess the forecasting accuracy of the best subset models for the four types of building, their forecast results have been compared with those obtained from the other three conventional models. The same set of data that was collected for building the regressed price models is used to analyse the performance of all of the models, to facilitate a fair comparison. With regard to the regressed models, the forecasted price per total floor area for each case is multiplied by the total floor area to obtain the forecasted price for calculating the forecasting error. As with the leave-one-out method, the reliability of the three conventional models is also analysed using cross validation. The data for each building type is split into two parts in turn, without repetition. One part is the exploratory sub-sample, which contains all of the cases minus the omitted one and is used to calculate the average unit rate; the other part contains the omitted case, which is used to assess forecasting ability. The forecast for each turn is then calculated by multiplying the average unit rate by the value of the predictor in the omitted case. To measure the closeness of a forecast relative to the actual tender price, the percentage error of the forecast is used, i.e.,

\[ \text{Percentage error} = \frac{\text{Forecasted Tender Price} - \text{Actual Tender Price}}{\text{Actual Tender Price}} \times 100\% \qquad (5.20) \]

The mean and standard deviation of the percentage errors represent the two widely established accuracy measures of bias and consistency. The higher the mean, the more biased the model is, and the higher the standard deviation, the less consistent the model is. However, the magnitude of these two measures cannot establish whether a model is better or worse than the others without significance testing. The confidence level for all of the significance tests that are employed in this research is 95%.
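The two measures can be computed directly from Equation (5.20); the short sketch below does so, using illustrative array names (forecast, actual) that are not taken from the thesis.

```python
# Percentage error of Equation (5.20) and the two accuracy measures used here:
# the mean of the errors (bias) and their standard deviation (consistency).
import numpy as np

def percentage_errors(forecast, actual):
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return (forecast - actual) / actual * 100.0

def bias_and_consistency(forecast, actual):
    e = percentage_errors(forecast, actual)
    return e.mean(), e.std(ddof=1)  # sample standard deviation
```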

5.9.1 Choice of parametric and non-parametric inference

There are two approaches to statistical inference – parametric and non-parametric. The former refers to modern statistical inference that is based on the postulation of a parametric statistical model (Fisher 1922). The parametric models are arguably simpler than the non-parametric models because they are more informative, more amenable to statistical adequacy assessment, often more parsimonious, and more likely to give rise to reliable and precise empirical evidence (Spanos 2001 p.186). Therefore, statistical adequacy can best be analysed in a parametric setting. However, the common assumption of normality that lies behind a parametric model may not always be fulfilled. Statistical tests are available to check normality, such as the Anderson-Darling (A-D) and Kolmogorov-Smirnov (K-S) tests. The K-S test essentially looks at the most extreme absolute deviation and determines the probability that this deviation can be explained by a normally distributed data set, whereas the A-D test is a modification of the K-S test that gives more weight to the tails. The A-D test also differs from the K-S test in that it makes use of specific distributions, such as the normal distribution, in the calculation of critical values, and thus has the advantage of being more sensitive. The A-D test is adopted for testing the assumption of normality in this research. The null hypothesis for the test is that the forecasted percentage errors for a particular model follow a normal distribution. The A-D test statistic is defined as:

\[ A^2 = -n - \sum_{i=1}^{n} \frac{2i-1}{n} \Bigl[ \ln D(y_i) + \ln\bigl(1 - D(y_{n+1-i})\bigr) \Bigr], \qquad (5.21) \]

where D is the cumulative distribution function of the normal distribution, n is the sample size and the yi are the ordered data. In a case in which the assumption of normality proves invalid, transformation using techniques such as the Box-Cox normality plot may help to normalise the distribution. The Box-Cox transformation identifies a value of lambda (λ) such that the suggested transformation of the original data is Yi^λ when λ ≠ 0 and ln(Yi) when λ = 0.

To find the optimal lambda values, the Box-Cox transformation modifies the original data using Equations (5.22) and (5.23) for Wi (a standardised transformed variable). It then calculates the standard deviation of the variable Wi. The goal is to find the value of lambda that minimises the standard deviation of Wi.

\[ W_i = \frac{Y_i^{\lambda} - 1}{\lambda\, G^{\lambda-1}} \quad \text{when } \lambda \neq 0, \qquad (5.22) \]

\[ W_i = G \ln(Y_i) \quad \text{when } \lambda = 0, \qquad (5.23) \]

where Yi is the original data, G is the geometric mean of all of the data and λ is the lambda value. If the transformation of the data fails to fulfil the normality assumption, then the parametric way to proceed is to postulate another appropriate distribution. Unfortunately, far fewer statistical tests are available for distributions other than the normal. Alternatively, the non-parametric model, which makes use of less specific probabilistic assumptions, may be used for inference. The non-parametric model is distribution free, relying only on implicit assumptions such as whether the random variable is discrete or continuous, the nature of the support set of the distribution, the existence of certain moments and the smoothness of the distribution. Inference using a non-parametric model is based on ranks, and is less susceptible to the problem of statistical inadequacy. The benefits of non-parametric inference include significant gains in power and efficiency when the error distribution has tails that are heavier than those of a normal distribution, and superior robustness in general (Hettmansperger and McKean 1998 p. xiii).
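The normality check and Box-Cox step described in this subsection could be sketched with SciPy as below. This is an approximation of, not a substitute for, the procedure above: scipy.stats.anderson reports the A-D statistic against critical values rather than a p-value, and scipy.stats.boxcox chooses lambda by maximum likelihood rather than by minimising the standard deviation of Wi; the shift applied to non-positive data is an added assumption of the sketch, not something taken from the thesis. All names are illustrative.

```python
# A sketch of the Anderson-Darling normality check and a Box-Cox fallback.
import numpy as np
from scipy import stats

def is_normal_ad(x, level_index=2):
    """Anderson-Darling test for normality.
    level_index 2 corresponds to the 5% critical value for dist='norm'."""
    result = stats.anderson(np.asarray(x, dtype=float), dist='norm')
    return result.statistic < result.critical_values[level_index]

def boxcox_if_needed(x):
    """Attempt a Box-Cox transformation when the data are not normal.
    Box-Cox requires strictly positive values, so percentage errors would first
    need shifting; that shift is an assumption of this sketch."""
    x = np.asarray(x, dtype=float)
    shift = 1.0 - x.min() if x.min() <= 0 else 0.0
    transformed, lam = stats.boxcox(x + shift)
    return transformed, lam, shift
```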

5.9.2 Statistical inference for bias

To ascertain the significance of bias, the models are tested against a zero mean using t statistics. The t-test is well known for its robustness, even if the distribution of the data departs from normality (Lehmann 1959). The null hypothesis for the t-test is that the mean percentage error for a model is equal to zero, which represents an unbiased model. Let μd be the mean percentage error, σd be the standard deviation of the percentage errors, and nd be the total number of cases for one of the models, represented by the notation d. The p-value that is calculated from the t statistic in Equation (5.24) shows whether a model is significantly biased from the zero mean percentage error.

\[ t = \frac{\mu_d}{\sigma_d / \sqrt{n_d}}. \qquad (5.24) \]

As all forecasts have been produced by cross-validated models that are represented by the same set of selected predictors but different coefficients in the regressed models (or different average unit rates in the conventional models for different turns), the mean percentage errors for the models are likely to be close to zero.
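Equation (5.24) is the standard one-sample t-test against a zero mean, so in practice it could be evaluated with an off-the-shelf routine, as in this hedged sketch (percentage_errors is an illustrative array name).

```python
# The bias test of Equation (5.24) as a one-sample t-test against a zero mean.
from scipy import stats

def bias_test(percentage_errors, alpha=0.05):
    """Return the t statistic, its p-value, and whether the model is
    significantly biased at the given level."""
    t_stat, p_value = stats.ttest_1samp(percentage_errors, popmean=0.0)
    return t_stat, p_value, p_value < alpha
```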

5.9.3 Statistical inference for consistency

As models are expected to be more or less unbiased, the consistency of the models becomes an important indicator for distinguishing the model or models that perform better than the others. Although the t-test for bias is robust even for departures from normality, the parametric inference tests for consistency (the standard deviation of percentage errors) are not. Figure 5-3 shows an algorithm for the selection of parametric and non-parametric tests. To avoid using the parametric tests naively, the assumption of normality of the data (the forecasted percentage errors) has been tested. As parametric inference is more amenable in terms of statistical adequacy, it is preferable that the assumption of normality be fulfilled, by means of transformation if necessary. The details concerning the checking of the normality assumption and the use of the Box-Cox transformation are described in section 5.9.1. Alternatively, non-parametric inference is employed if the assumption is not satisfied. After deciding on the type of inference, the forecasting models are first tested in groups for homogeneity of variances. This involves the use of Bartlett's test for parametric inference and the Kruskal-Wallis test for non-parametric inference.

[Figure 5-3 appears here as a flowchart. The forecasted percentage errors of the models under comparison are first tested for normality with the Anderson-Darling test; if they are not normal, Box-Cox transformations are applied and the test is repeated. If normality can be achieved, the parametric branch applies Bartlett's test for equality of variances; otherwise the non-parametric branch applies the Kruskal-Wallis test for equality of rank deviations. If the k-sample test finds the variances equal, all models are regarded as comparable in consistency; if not, multiple F-tests (parametric) or multiple Mann-Whitney U tests (non-parametric) are conducted using the LSD approach, and models of about the same potency in consistency are grouped together.]

Figure 5-3: Algorithm for Comparisons of Variances of Percentage Errors

Bartlett's test is used to study the significance of the differences between the variances of percentage errors for the models under comparison. The null hypothesis for the test is that the variances of percentage errors for the models in comparison are equal. Let M be the number of models for comparison, and let the Bartlett's test statistic (B) be represented by Equation (5.25) as follows:

\[ B = \frac{\left( \sum_{d=1}^{M} (n_d - 1) \right) \ln\!\left( \frac{\sum_{d=1}^{M} (n_d - 1)\,\sigma_d^2}{\sum_{d=1}^{M} (n_d - 1)} \right) - \sum_{d=1}^{M} (n_d - 1) \ln \sigma_d^2}{1 + \frac{1}{3(M-1)} \left( \sum_{d=1}^{M} \frac{1}{n_d - 1} - \frac{1}{\sum_{d=1}^{M} (n_d - 1)} \right)}. \qquad (5.25) \]

With reference to a chi-square (χ²) distribution, the B value corresponds to a p-value, which suggests whether the models in comparison are of equal variance. The Kruskal-Wallis test (H-test) is a non-parametric equivalent of one-way ANOVA that tests whether several independent samples have the same central tendency. The central tendencies or medians are the main concern in the H-test. Based on the assumption that the values for each sample under consideration have underlying continuous distributions, the null hypothesis is that the k samples, from possibly different populations, actually originate from similar populations. By replacing the percentage errors with their absolute deviations from the sample mean as the sample values for ranking, the H-test assesses the homogeneity of population variance (Sprent 1993 pp. 155-157). Let Rj be the sum of ranks of the jth sample, nj be the size of the jth sample, and N be the size of the combined sample. The H-test statistic is:

\[ H = \left[ \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} \right] - 3(N+1). \qquad (5.26) \]

With reference to a chi-square (χ²) distribution, the H value corresponds to a p-value, which suggests whether the models in comparison are of equal variance. If the p-value from the Bartlett's test or Kruskal-Wallis test statistic is smaller than 0.05 and the null hypothesis is therefore not supported, then the consistencies of the models in comparison are not equal. The next step is to determine which of the models specifically differ from each other. To do this, the variances of percentage errors of the models are compared pairwise using F-tests or Mann-Whitney U rank sum tests. If Bartlett's test shows a significant difference of variances amongst the models, then the F-test is used to test the null hypothesis that the variances (or standard deviations) of the forecasted percentage errors for two models are equal. The F-test statistic is:

\[ F = \frac{s_1^2}{s_2^2}, \qquad (5.27) \]

where s1² and s2² are the sample variances. The more this ratio deviates from 1, the stronger the evidence for unequal population variances. With reference to the F distribution, a corresponding p-value can be found that suggests whether the two models in comparison are of equal variance.

If the H-test shows a significant difference of variances amongst the models, then it is followed by the Mann-Whitney U test (U-test), which uses the rank sums of two samples to examine the null hypothesis that the absolute deviations from the sample means of the two samples are equal. The observations from both samples are combined and ranked, with the average rank assigned in the case of a tie. If the percentage error deviations for the two samples in comparison are identical, then the ranks should be randomly mixed between the two samples. Two rank sums, Ta and Tb, are calculated. For sample sizes larger than 20, the U statistic refers to a normal Z distribution, as is shown in Equation (5.28):

\[ Z = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}, \qquad (5.28) \]

where U is the smaller of Ua and Ub in Equations (5.29) and (5.30) as follows:

\[ U_a = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - T_a \qquad (5.29) \]

\[ U_b = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - T_b. \qquad (5.30) \]

With reference to the Z distribution, a corresponding p-value can be found that suggests whether the two models in comparison are of equal variance. Unfortunately, performing several F-tests or Mann-Whitney U rank sum tests has a serious drawback. The more null hypotheses there are to be tested, the more likely it is that one of them will be rejected even if all of the null hypotheses are actually true (Kleinbaum et al. 1998 pp. 443-447). In other words, if each test has a 5% probability of erroneously rejecting the null hypothesis (H0), then the probability of incorrectly rejecting at least one H0 is much larger than 5%, and continues to increase with each additional test that is carried out. Fisher's least significant difference (LSD) approach is used to correct the exaggerated significance levels. For example, if k two-sample tests are produced, then the maximum possible value of the overall significance level is 0.05k. The LSD remedy is to decrease the significance level for each test to 0.05/k. In this research, six (4C2) two-sample tests are produced for each type of building (i.e., k = 6), and therefore the corrected significance level for each pairwise test is 0.0083.
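A hedged sketch of the two-stage comparison described in this subsection and in Figure 5-3 is given below: a k-sample test first, then pairwise tests evaluated at the corrected level 0.05/k. The dictionary and variable names are illustrative, and the two-sided p-value construction for the F-test is one conventional choice rather than anything specified in the thesis.

```python
# A sketch of the consistency comparison: k-sample test, then pairwise tests
# at the LSD-corrected significance level 0.05 / (number of pairwise tests).
from itertools import combinations
import numpy as np
from scipy import stats

def compare_consistency(errors_by_model, parametric=True):
    """errors_by_model: dict mapping model name -> array of percentage errors."""
    names = list(errors_by_model)
    samples = [np.asarray(errors_by_model[m], dtype=float) for m in names]
    if parametric:
        _, p = stats.bartlett(*samples)                    # equality of variances
    else:
        # Rank absolute deviations from each sample mean, as in the H-test here.
        deviations = [np.abs(s - s.mean()) for s in samples]
        _, p = stats.kruskal(*deviations)
    if p >= 0.05:
        return {'k_sample_p': p, 'pairwise': None}         # models comparable
    pairs = list(combinations(range(len(names)), 2))
    corrected_alpha = 0.05 / len(pairs)                    # LSD-style correction
    pairwise = {}
    for i, j in pairs:
        if parametric:
            f = samples[i].var(ddof=1) / samples[j].var(ddof=1)
            df1, df2 = len(samples[i]) - 1, len(samples[j]) - 1
            p_ij = 2 * min(stats.f.cdf(f, df1, df2), stats.f.sf(f, df1, df2))
        else:
            d_i = np.abs(samples[i] - samples[i].mean())
            d_j = np.abs(samples[j] - samples[j].mean())
            _, p_ij = stats.mannwhitneyu(d_i, d_j, alternative='two-sided')
        pairwise[(names[i], names[j])] = (p_ij, p_ij < corrected_alpha)
    return {'k_sample_p': p, 'pairwise': pairwise}
```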

5.10 Tools for Computation

Both spreadsheets (e.g. Excel) and statistical software packages (e.g. SPSS) provide built-in regression functions. Users can simply apply these functions by inputting the observed values of the dependent and independent variables, and a regression model fitted by the least-squares method (or another method), together with other relevant information describing the model, is automatically generated in report format. However, these built-in functions do not feature a resampling procedure, which means that they cannot satisfy the needs of this study. Accomplishing this requirement and following the various algorithms that are described in sections 5.7 to 5.9 of this chapter requires a purpose-made programme. Therefore, this research uses the programming language of the mathematical software Mathcad to write a programme for handling the selection procedures and the reliability analysis. Mathcad is also used as a calculation tool in this study. It possesses advantages over other programming languages in its use of direct equation input and its approach to solving mathematical problems symbolically or numerically, which means that programmes written in Mathcad are readable even for someone who has no background in programming languages. To illustrate the use of the worksheets that were written in Mathcad, an example for the RASEM for offices is attached in Appendix D. In addition, the significance test functions, such as the t-test, K-W test and U-test, are available in the spreadsheets and statistical software packages that are used.
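The Mathcad worksheets themselves are not reproduced here, but as an aside, the kind of cross-validated reliability analysis they perform can be sketched with generic tooling, as below; this is an illustrative sketch under assumed names (loo_percentage_errors, X, y), not the thesis' programme.

```python
# A sketch of the leave-one-out reliability analysis: refit the model with one
# case omitted, forecast the omitted case, and collect the percentage errors.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

def loo_percentage_errors(X, y):
    errors = []
    for train, test in LeaveOneOut().split(X):
        model = LinearRegression().fit(X[train], y[train])
        forecast = model.predict(X[test])[0]
        errors.append((forecast - y[test][0]) / y[test][0] * 100.0)
    return np.array(errors)
```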

5.11 Summary

This chapter describes an approach to the further development of JSEM. JSEM is first simplified to avoid the escalation in the number of variables that is induced by an increase in the number of storeys of a building. The simplification procedure successfully reduces the number of variables for JSEM from a function of the number of storeys to 9 for buildings with a podium and 6 for buildings without a podium. The cost analyses of 148 completed projects in Hong Kong for four types of building – offices, private housing, nursing homes and primary and secondary schools – were collected. Ten of the 148 samples were considered to be outliers due to their differences in building provision, contractual arrangement and technology of fabrication, and were discarded from further analysis. The building prices per total floor area, which were extracted from the analyses and rebased in accordance with the tender price index, are set as the observed values of the response for modelling.

With reference to the actual measurements of quantities (e.g. the perimeter and storey height of buildings) for the variables in JSEM (e.g. elevation area), another two sets of variables are identified, one containing 19 variables for buildings with basements and the other containing 12 variables for buildings without basements. A non-parametric approach using the least average MSQ as the termination criterion is proposed to prevent the violation of parametric assumptions that is likely to be caused by small sample sizes. The leave-one-out cross validation method, which on the one hand determines the models from an exploratory sub-sample and on the other hand checks the forecasting ability of the models on an omitted case, is considered to be the most intuitive method for simulating the practice of forecasting. To improve the probability of identifying the best subset models, a dual stepwise procedure, together with an algorithm to eliminate possible offending variables, is suggested. The transformation of variables may further improve forecasting performance, and in this research the natural logarithmic transformation is selected for the variables that are chosen in the best regressed models. The principle of parsimony is particularly addressed in the selection of models, and a more complicated model has to demonstrate its benefits in terms of forecasting accuracy to be chosen over a simpler model. The performance of a forecast is measured by the percentage error of its departure from the actual price. To assess the performance of a model in terms of forecasting accuracy, bias and consistency are adopted.

Statistical inference can be classified as parametric or non-parametric. The former approach is more powerful and is used if its assumptions can be satisfied. If they cannot, then the percentage errors are transformed to see whether the transformed distribution can fulfil the assumptions. If the assumptions are still not fulfilled, then the non-parametric approach is used. As the regressed models and the unit rates in the conventional models are developed by cross validation, it is expected that the forecasts from these models will have a close to zero bias. Because of this, each model is first tested against a zero bias using the t-test. The t-test is parametric, but is known to be robust to departures from normality. However, parametric tests for consistency are not robust, and an algorithm is developed to assist in the selection of an appropriate approach and of significance tests within that approach. Two stages are involved in distinguishing the models using measures of consistency. First, the homogeneity of variance of all of the models is tested using k-sample tests, namely Bartlett's test under the parametric approach and the Kruskal-Wallis test under the non-parametric approach. If the models are found to be significantly different, then these tests are followed by multiple two-sample tests, namely F-tests under the parametric approach and Mann-Whitney U-tests under the non-parametric approach. Because of the exaggerated significance levels that are due to the multiple comparisons, Fisher's least significant difference (LSD) approach is used for rectification. With the assistance of the LSD, models of the same potency in consistency are grouped together. The benefits of advances in computer software are harnessed to assist in this research, and a combination of different software is used.

The mathematical software Mathcad is used to execute the purpose-made algorithm of regression analysis using cross validation, and commonly used spreadsheets and statistical packages that offer a variety of built-in functions for significance tests are also adopted to produce the statistical inferences.


Chapter 6 Analysis

Think as you work, for in the final analysis, your worth to your company comes not only in solving problems, but also in anticipating them. Harold Wallace Ross

6.1 Introduction

This chapter is divided into three sections. The first section concerns the development of the regressed models based on the data that were collected from Hong Kong projects. The details of the eight regressed models that are generated from two sets of variables for the four types of building, and the corresponding logarithmically transformed models, are explained. The variables that are selected in each regressed model are different. The bias and consistency of the percentage errors of the forecasts from the regressed models that were developed in the first section, and those of the conventional methods, are measured in the second section. Each regressed model is compared individually with the conventional models. On average, the regressed models, especially the Regressed Model for Advanced Storey Enclosure Method (RASEM), produce more accurate forecasts, and all fall into the best clusters of models in the eight groups of models under comparison. However, there is insufficient evidence to conclude their superiority over their conventional counterparts. A practical approach to combining forecasts is proposed in the third section to improve prediction accuracy. The combined forecast is always more accurate than the average forecast, and is sometimes better than the best forecast.

6.2 Model Development

6.2.1 Data Collected

The data collected include the number of podium storeys (a), the number of tower storeys (b), the number of basement storeys (m), the average area per podium storey in m² (fp), the average area per tower storey in m² (ft), the average area per basement storey in m² (fb), the average podium storey height in m (sp), the average tower storey height in m (st), the average basement storey height in m (sb), the average perimeter on plan for the superstructure in m (ppt), the average perimeter on plan for the basement in m (pb), the roof area in m² (r), the original tender price in Hong Kong dollars (tp), the date the tender was returned and the tender price index (TPI). Appendix C (Tables C-1 to C-4) displays these data in tabular format according to building type. The original tender prices were rebased to the base period of the second quarter of 1997 in accordance with the tender price index in Appendix B. The rebased prices are also shown in Appendix C.

6.2.2 Candidates for Regression Models

The regression methodology that is described in Chapter 5 is used to advance the original JSEM. A new model – the Regressed Model for James' Storey Enclosure Method (RJSEM) – is developed using the variables that were identified in JSEM for each type of building. The same methodology is also applied to a second new model – the Regressed Model for Advanced Storey Enclosure Method (RASEM) – which uses another set of variables. The RASEM contains four types of candidates: the primary variables (n, m, fpt, fb, spt, sb, ppt, pb, r), a second degree variable (n2), the interaction terms formed amongst the primary variables (nfpt, mfb, nspt, msb, nsptppt, msbpb) and the interaction terms formed between primary variables and the second degree variable (n2fpt, n2spt, n2sptppt). Table 6-1 shows the candidate variables, the response and the corresponding equations for the RJSEM and the RASEM.

6.2.3 Response for Regression Models

A regressed model that produces a small average MSQ may not produce a correspondingly small mean or standard deviation of percentage errors (where the mean represents bias and the standard deviation represents the consistency of the model), because larger response values have more influential effects in the least-squares method, whereas the use of percentage errors for performance assessment is unit free. These large-value influential effects can be reduced tremendously by changing the response from the tender price to the tender price per total floor area, as described in section 5.7.1 of Chapter 5. By adopting this change, the ranges of actual response values, represented by the ratios of the maximum actual response value to the minimum, are reduced from 60.74 to 2.33 for offices, from 113.16 to 2.17 for private housing, from 6.87 to 2.09 for nursing homes and from 7.93 to 2.23 for schools.

Table 6-1: Candidates, Responses and their Equations for the RJSEM and the RASEM

RJSEM candidates
Variable                                                                | Equation                            | Notation
Total floor area for podium                                             | a · fp                              | afp
Storey number for podium · Total floor area for podium                  | a² · fp                             | a2fp
Total floor area for tower                                              | b · ft                              | bft
Storey number for tower · Total floor area for tower                    | b² · ft                             | b2ft
Storey number for podium · Total floor area for tower                   | a · b · ft                          | abft
Total floor area for basement                                           | m · fb                              | mfb
Elevation area                                                          | (a · sp + b · st) · ppt             | nsptppt
Basement wall area                                                      | m · sb · pb                         | msbpb
Roof area                                                               | r                                   | r

RJSEM response
Adjusted tender price per total floor area                              | P ÷ (a · fp + b · ft + m · fb)      | Y

RASEM candidates
Variable                                                                | Equation                            | Notation
Storey number for superstructure                                        | a + b                               | n
Storey number for basement                                              | m                                   | m
Square of storey number for superstructure                              | (a + b)²                            | n2
Average area per storey for superstructure                              | (a · fp + b · ft) ÷ (a + b)         | fpt
Average area per storey for basement                                    | fb                                  | fb
Average storey height of superstructure                                 | (a · sp + b · st) ÷ (a + b)         | spt
Average storey height of basement                                       | sb                                  | sb
Average perimeter on plan for superstructure                            | ppt                                 | ppt
Average perimeter on plan for basement                                  | pb                                  | pb
Total floor area for superstructure                                     | a · fp + b · ft                     | nfpt
Storey number for superstructure · Total floor area for superstructure  | (a + b) · (a · fp + b · ft)         | n2fpt
Total floor area for basement                                           | m · fb                              | mfb
Height of building above ground                                         | a · sp + b · st                     | nspt
Depth of basement                                                       | m · sb                              | msb
Storey number for superstructure · Height of building above ground      | (a + b) · (a · sp + b · st)         | n2spt
Elevation area                                                          | (a · sp + b · st) · ppt             | nsptppt
Basement wall area                                                      | m · sb · pb                         | msbpb
Storey number for superstructure · Elevation area                       | (a + b) · (a · sp + b · st) · ppt   | n2sptppt
Roof area                                                               | r                                   | r

RASEM response
Adjusted tender price per total floor area                              | P ÷ (a · fp + b · ft + m · fb)      | Y

6.2.4 Selection of Predictors

The selection of the best models (the best subsets of the predictors) concerns the minimisation of the average MSQ by leave-one-out cross validation. The dual stepwise procedure that is described in section 5.7.5 of Chapter 5 is applied to the two sets of candidates and responses (one for the RJSEM and the other for the RASEM) that are shown in Table 6-1. Except for the RJSEMs for nursing homes and schools, for which agreeable subsets of predictors were produced, two different subsets of predictors were selected from the values of these candidates and responses by the forward stepwise and backward stepwise procedures run separately. As is explained in section 5.8.1 of Chapter 5, this discrepancy may be due to a less significant predictor that acts as an offending variable and enters the model before a more significant predictor, or to a more significant predictor that acts as an offending variable and is eliminated from the model before a less significant predictor. To avoid this circumstance, candidates in the RJSEMs or RASEMs are excluded repeatedly using the algorithm that is shown in Figure 5.3 of Chapter 5. According to this algorithm, the selection process ceases when both the forward stepwise and backward stepwise procedures produce the same best subset of variables. Several candidates in the RJSEMs and RASEMs for the four types of building were excluded. Table 6-2 shows the included candidates, excluded candidates and selected predictors in these models. Amongst the candidates in the RJSEMs, msbpb (basement wall area) was the only candidate that was excluded, and only in the RJSEMs for offices and private housing. However, there were more excluded candidates in the RASEMs. First of all, the observed values for r (roof area) were found to be very close to, or the same as, those for fpt (average floor area for the superstructure), because most multi-storey buildings in Hong Kong, including those in this research, have a flat roof design for the podium and tower. As fpt is considered to be a more representative candidate, because the average floor area corresponds to more elements of a building than the roof area does, r was excluded from the RASEMs. The other primary variables, n, m, fpt, fb, spt, sb, ppt and pb, and the second degree variable, n2, were kept because the use of untransformed variables excluding any interaction term is the best starting point for a general regression model (Skitmore and Patchell 1990). All of the interaction terms were subject to the exclusion procedures. nfpt (also a candidate in the RJSEM), n2fpt, n2spt, msbpb (also a candidate in the RJSEM) and n2sptppt were excluded from the RASEMs for the four types of building. Furthermore, the interaction terms mfb (total basement floor area) and msb (depth of basement) were also excluded from the private housing and nursing home models. The agreeable best models from both the forward stepwise and backward stepwise procedures were generated from the Mathcad worksheets that were purposely written to carry out the selection algorithm and the reliability analysis using cross validation.


Table 6-2: Included Candidates, Excluded Candidates and Selected Predictors for RJSEMs and RASEMs

RJSEM
Variable      | Office | Private Housing | Nursing Home | School
afp / nfpt*   | ●      | o               | o            | o
a2fp / n2fpt* | ●      | o               | o            | ●
bft           | ●      | ●               | NA           | NA
b2ft          | ●      | o               | NA           | NA
abft          | o      | o               | NA           | NA
mfb           | o      | o               | o            | NA
nsptppt       | ●      | o               | ●            | o
msbpb         | x      | x               | o            | NA
r             | o      | ●               | ●            | ●

RASEM
Variable      | Office | Private Housing | Nursing Home | School
n             | o      | o               | o            | o
m             | o      | o               | o            | NA
n2            | ●      | o               | o            | o
fpt           | ●      | ●               | ●            | ●
fb            | o      | ●               | o            | NA
spt           | o      | ●               | o            | o
sb            | o      | ●               | o            | NA
ppt           | ●      | o               | o            | o
pb            | o      | ●               | o            | NA
nfpt          | x      | x               | x            | x
n2fpt         | x      | x               | x            | x
mfb           | o      | o               | x            | NA
nspt          | ●      | o               | o            | ●
msb           | o      | x               | x            | NA
n2spt         | x      | x               | x            | x
nsptppt       | o      | o               | ●            | o
msbpb         | x      | x               | x            | NA
n2sptppt      | x      | x               | x            | x
r             | x      | x               | x            | x

Legend: ● - Selected predictor; o - Candidate; x - Excluded candidate; NA - Not applicable
Remarks: * - afp and a2fp for office and private housing, nfpt and n2fpt for nursing home and school

6.2.4.1 Selected Predictors for RJSEMs and RASEMs

Tables 6-3 to 6-10 show the step-by-step results of the predictor selection by the forward stepwise and backward stepwise procedures based on the criterion of average MSQ. Tables 6-11 to 6-18 show the regression coefficients for each predictor, the forecasts and the MSQs as determined by the cross-validated models. Table 6-19 divides the constants and selected predictors of all of the regressed models according to the signs of their corresponding coefficients. Special attention is drawn to the fact that the sign of a coefficient does not represent the actual relationship between a predictor and the response of tender price per total floor area, but the relationship between them in the best model under the proposed regression methodology. Thus, the use of another methodology (e.g. the use of another termination criterion rather than the least average MSQ) may produce another best model (such as another group of transformed variables or another subset of predictors) that would suggest a different set of relationships between the selected predictors and the response in terms of the signs and values of the coefficients.

All of the constant terms (β0) are positive except the term for the RASEM for private housing. The selected predictors can be classified into two groups: floor area related predictors and non-floor area related predictors. Referring to Table 6-23, all of the models have at least one floor area related predictor. The floor area related predictors include afp, a2fp, bft, b2ft, fb, fpt, n2fpt and r. Most of these predictors exhibit a negative effect on the tender price per total floor area in the RJSEMs and the RASEMs. The average floor area of the superstructure (fpt) does not exist as a candidate in the RJSEMs. Instead, the effect of floor area on the response is represented by the total floor area and the total floor area multiplied by the number of storeys (afp, a2fp, bft and b2ft, or nfpt and n2fpt). If r in the RJSEMs is considered to be an alternative candidate to fpt in the RASEMs due to their proximity in value, then all of the regressed models except the RJSEMs for offices and private housing would have a negative component that is represented by the average area of the superstructure (similar to the typical floor area for multi-storey buildings of rectangular shape). In addition, the RASEM for nursing homes is considered to be very similar to the corresponding RJSEM in terms of the selected predictors (nsptppt and r are the predictors in the RJSEM, whereas nsptppt and fpt are the predictors in the RASEM) and the values of the corresponding coefficients, due to the proximity in value of fpt and r.

All of the RASEMs contain the predictor fpt with a corresponding negative coefficient. If these models were used for prediction, then they would suggest that the higher the value of the average floor area of a superstructure, the smaller the forecasted tender price per total floor area would be. In the RJSEMs for offices and private housing, the predictors of total floor area, such as a2fp (for offices), bft (for offices and private housing) and b2ft (for offices), rather than the predictors of average area per storey, are present as the negative components. In contrast, some other floor area related predictors, such as afp (in the RJSEM for offices), r (in the RJSEM for private housing), n2fpt (in the RJSEM for schools) and fb (in the RASEM for private housing), are present in the different models with positive coefficients. To find out the overall effect of the floor area related predictors on all of the regressed models, their aggregate contributions to the response were reckoned. Table 6-20 shows the contributions of the floor area related predictors to the response. From the table, it can be found that the aggregate contribution of these floor area related predictors is generally negative (except for a few cases in the RJSEMs for offices and private housing and the RASEM for private housing), which suggests that the tender price per total floor area is inversely related to the floor area related predictors in the models. However, the non-floor area related predictors, n2, pb, ppt, sb, spt, nspt and nsptppt, exhibit solely positive aggregate contributions to all of the responses. Their contributions are shown in Table 6-21. Unlike the original JSEM, which assumes the prices of components (e.g. external walls, windows and external finishes) to be proportional to the measured areas (e.g. external wall area), the regressed models select variables without assuming such a relationship. The aggregate contributions according to the classification of floor or non-floor area related predictors provide further information on the composition of the regressed models.

Table 6-3: Step-by-Step Selection Results of Predictors for the RJSEM for Offices Forward Stepwise Step Variables entered 1 2 3 4 5 6 Final model:

2 3 4 5 6 Final model:

Average MSQ

a2fp nsptppt bft afp b2ft

3.00E+06 2.86E+06 2.48E+06 2.10E+06 2.07E+06

(No entry or deletion, end regression) a2fp, nsptppt, bft, afp, b2ft

2.07E+06

Backward Stepwise Step Variables entered 1

Variables deleted

afp, a2fp, bft, b2ft, abft, mfb, nsptppt, r

Variables deleted

Average MSQ

2.87E+06 abft 2.46E+06 r 2.11E+06 mfb 2.07E+06 (Stop backward, start forward) (No deletion or entry, end regression) 2.07E+06 a2fp, nsptppt, bft, afp, b2ft


Table 6-4: Step-by-Step Selection Results of Predictors for the RJSEM for Private Housing Forward Stepwise Step Variables entered 1 2 3 Final model:

2 3 4 5 6 7 8 Final model:

Average MSQ

bft r

9.72E+05 9.59E+05 (No entry or deletion, end regression)

bft, r

9.59E+05

Backward Stepwise Step Variables entered 1

Variables deleted

Variables deleted

Average MSQ

afp, a2fp, bft, b2ft, abft, mfb, nsptppt, r mfb b2ft nsptppt a2fp abft afp (No deletion or entry, end regression) bft, r

1.69E+06 1.33E+06 1.15E+06 1.08E+06 1.06E+06 9.95E+05 9.59E+05 9.59E+05

Table 6-5: Step-by-Step Selection Results of Predictors for the RJSEM for Nursing Homes Forward Stepwise Step Variables entered 1 2 3 Final model:

2 3 4 5 6 Final model:

Average MSQ

r nsptppt

6.73E+05 6.57E+05 (No entry or deletion, end regression)

r, nsptppt

6.57E+05

Backward Stepwise Step Variables entered 1

Variables deleted

Variables deleted

Average MSQ

nfpt, n2fpt, mfb, nsptppt, msbpb, r n2fpt mfb nfpt msbpb (No deletion or entry, end regression) r, nsptppt

3.26E+06 1.01E+06 7.74E+05 7.00E+05 6.57E+05 6.57E+05

Table 6-6: Step-by-Step Selection Results of Predictors for the RJSEM for Schools Forward Stepwise Step Variables entered 1 2 3 Final model:

Average MSQ

r n2fpt

2.17E+05 2.07E+05 (No entry or deletion, end regression)

r, n2fpt

2.07E+05

Backward Stepwise Step Variables entered 1 2 3 4 Final model:

Variables deleted

Variables deleted

Average MSQ

nfpt, n2fpt, nsptppt, r nsptppt afp (No deletion or entry, end regression) r, n2fpt

3.16E+05 2.35E+05 2.07E+05 2.07E+05


Table 6-7: Step-by-Step Selection Results of Predictors for the RASEM for Offices Forward Stepwise Step Variables entered 1 2 3 4 5 Final model:

Variables deleted

nspt n2 fpt ppt

2.79E+06 2.04E+06 1.79E+06 1.63E+06 (No entry or deletion, end regression)

nspt, n2, fpt, ppt

Backward Stepwise Step Variables entered 1

2 3 4 5 6 7 8 9 10 11 12 13 14 15 Final model:

Average MSQ

1.63E+06

Variables deleted

Average MSQ

n, m, n2, fpt, fb, spt, sb, ppt, pb, mfb, nspt, msb, nsptppt pb nspt spt fb nsptppt msb mfb m (Stop backward, start forward) nspt (Stop forward, start backward) sb n (No deletion or entry, end regression) nspt, n2, fpt, ppt

1.85E+07 7.14E+06 3.78E+06 2.51E+06 2.04E+06 1.93E+06 1.92E+06 1.90E+06 1.89E+06 1.80E+06 1.69E+06 1.63E+06 1.63E+06


Table 6-8: Step-by-Step Selection Results of Predictors for the RASEM for Private Housing Forward Stepwise Step Variables entered 1 2 3 4 5 6 Final model:

Variables deleted

Average MSQ

spt fb pb fpt sb

5.96E+05 5.62E+05 5.19E+05 4.96E+05 4.92E+05 (No entry or deletion, end regression)

spt, fb, pb, fpt, sb

4.92E+05

Backward Stepwise Step Variables entered

Variables deleted

Average MSQ

1 n, m, n2, fpt, fb, spt, sb, ppt, pb, mfb, nspt, nsptppt 2 3 4 5 6 7 8 9 Final model:

m nspt ppt n mfb nsptppt n2 (No deletion or entry, end regression) spt, fb, pb, fpt, sb

5.26E+06 6.74E+05 6.11E+05 5.67E+05 5.41E+05 5.20E+05 5.02E+05 4.92E+05 4.92E+05


Table 6-9: Step-by-Step Selection Results of Predictors for the RASEM for Nursing Homes Forward Stepwise Step Variables entered 1 2 3 Final model:

Variables deleted

fpt nsptppt (No entry or deletion, end regression) fpt, nsptppt

2 3 4 5 6 7 8 9 10 11 Final model:

Variables deleted

Average MSQ

n, m, n2, fpt, fb, spt, sb, ppt, pb, nspt, nsptppt sb n2 pb fb n m nspt ppt spt (No deletion or entry, end regression) fpt, nsptppt

6.70E+05 6.47E+05 6.47E+05

Backward Stepwise Step Variables entered 1

Average MSQ

1.23E+08 2.34E+07 3.79E+06 1.33E+06 9.68E+05 8.63E+05 7.89E+05 7.36E+05 6.47E+05 6.47E+05 6.47E+05


Table 6-10: Step-by-Step Selection Results of Predictors for the RASEM for Schools Forward Stepwise Step Variables entered Variables deleted Average MSQ 1 nspt 1.80E+05 2 fpt 1.75E+05 (No entry or deletion, end regression) 3 Final model: fpt, nspt 1.75E+05 Backward Stepwise Step Variables entered 1 2 3 4 5 6 7 Final model:

Variables deleted

n, n2, fpt, spt, ppt, nspt, nsptppt n2 nsptppt ppt n spt (No deletion or entry, end regression) fpt, nspt

Average MSQ 2.68E+05 2.26E+05 2.07E+05 2.04E+05 2.00E+05 1.75E+05 1.75E+05


Table 6-11: Coefficients, Forecasts and MSQs Determined by Leave-One-Out Method for the RJSEM for Office

Case

RJSEM ( β 0 + β 1 ⋅ a2fp + β 2 ⋅ nsptppt + β 3 ⋅ bft + β 4 ⋅ afp + β 5 ⋅ b2ft )

β0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42

Average:

4696 4708 4690 4710 4706 4541 4686 4614 4568 4697 4663 4620 4676 4694 4703 4524 4787 4713 4510 4655 4711 4599 4665 4662 4667 4593 4559 4649 4665 4727 4715 4669 4653 4642 4700 4702 4639 4718 4745 4722 4604 4698

β1 -0.066 -0.069 -0.063 -0.070 -0.069 -0.083 -0.068 -0.064 -0.059 -0.065 -0.068 -0.066 -0.067 -0.058 -0.064 -0.064 -0.121 -0.069 -0.066 -0.068 -0.069 -0.069 -0.068 -0.068 -0.068 -0.064 -0.069 -0.066 -0.068 -0.087 -0.068 -0.068 -0.068 -0.068 -0.068 -0.071 -0.068 -0.067 -0.078 -0.067 -0.056 -0.067

β2 0.223 0.222 0.221 0.221 0.224 0.242 0.226 0.217 0.239 0.214 0.223 0.224 0.225 0.225 0.212 0.228 0.221 0.221 0.232 0.225 0.221 0.227 0.223 0.223 0.223 0.218 0.229 0.223 0.223 0.227 0.219 0.224 0.224 0.225 0.221 0.220 0.226 0.220 0.221 0.220 0.235 0.223

β3 -0.079 -0.079 -0.078 -0.078 -0.080 -0.097 -0.080 -0.074 -0.081 -0.078 -0.078 -0.078 -0.079 -0.079 -0.075 -0.076 -0.081 -0.079 -0.079 -0.079 -0.077 -0.078 -0.078 -0.078 -0.077 -0.074 -0.078 -0.078 -0.078 -0.088 -0.077 -0.079 -0.078 -0.078 -0.078 -0.082 -0.077 -0.075 -0.082 -0.075 -0.064 -0.076

β4 0.295 0.306 0.286 0.308 0.309 0.402 0.305 0.277 0.251 0.293 0.300 0.293 0.300 0.246 0.280 0.278 0.466 0.307 0.296 0.301 0.300 0.303 0.299 0.300 0.298 0.280 0.303 0.294 0.301 0.403 0.295 0.301 0.299 0.297 0.300 0.325 0.297 0.285 0.349 0.284 0.207 0.290

β5 -0.0002 -0.0002 -0.0002 -0.0002 -0.0002 -0.0001 -0.0003 -0.0003 -0.0002 -0.0002 -0.0003 -0.0003 -0.0003 -0.0002 -0.0002 -0.0003 -0.0002 -0.0002 -0.0003 -0.0003 -0.0002 -0.0003 -0.0003 -0.0003 -0.0003 -0.0003 -0.0003 -0.0003 -0.0003 -0.0002 -0.0003 -0.0003 -0.0003 -0.0003 -0.0002 -0.0002 -0.0003 -0.0003 -0.0002 -0.0003 -0.0005 -0.0003

Forecasted Y 5,802 5,657 5,118 5,645 6,186 2,325 6,247 6,540 9,757 7,508 6,618 6,034 6,375 3,925 7,969 5,639 2,516 5,692 5,279 8,907 5,359 5,226 8,958 5,424 4,518 6,333 5,323 6,432 5,916 6,743 4,828 6,505 5,262 5,506 5,268 6,080 5,149 5,431 6,025 5,401 7,046 5,861

MSQ 1.26E+06 8.77E+05 8.07E+05 8.19E+05 2.33E+06 3.08E+06 2.49E+06 6.78E+06 2.88E+06 1.44E+06 1.73E+04 1.34E+06 6.34E+05 1.16E+06 1.40E+06 8.25E+06 6.12E+06 1.09E+06 6.51E+06 1.36E+04 6.54E+05 1.27E+06 1.74E+03 3.51E+01 3.17E+04 1.01E+07 4.03E+06 2.00E+05 3.09E+03 5.49E+06 5.44E+05 1.56E+05 2.19E+04 1.47E+05 4.07E+05 1.02E+06 9.69E+04 1.54E+06 1.65E+06 1.64E+06 7.38E+06 1.15E+06

2.07E+06

Table 6-12: Coefficients, Forecasts and MSQs Determined by Leave-One-Out Method for the RJSEM for Private Housing RJSEM ( β 0 + β 1 ⋅ bft + β 2 ⋅ r ) β2 Forecasted Y

Case

β0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

Average:

β1 4530 4567 4484 4466 4533 4496 4475 4509 4536 4531 4625 4533 4474 4470 4522 4509 4558 4532 4583 4490 4511 4516 4568 4535 4530 4541 4552 4551 4536 4495 4534 4586 4526 4551 4537 4552 4534 4523 4550 4523 4533 4530 4532 4572 4558 4601 4499 4542 4560 4532

-0.008 -0.008 -0.007 -0.007 -0.007 -0.008 -0.007 -0.007 -0.007 -0.007 -0.010 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.008 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.008 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.007 -0.008 -0.007 -0.007 -0.007 -0.007

0.057 0.052 0.055 0.058 0.057 0.087 0.058 0.057 0.056 0.045 0.069 0.057 0.056 0.058 0.057 0.061 0.056 0.056 0.056 0.058 0.058 0.058 0.055 0.056 0.056 0.060 0.056 0.056 0.056 0.057 0.055 0.054 0.054 0.059 0.057 0.056 0.054 0.054 0.054 0.054 0.055 0.057 0.055 0.055 0.054 0.055 0.058 0.056 0.056 0.054

3,935 3,784 4,504 4,430 3,353 5,744 4,480 4,520 4,466 4,771 2,997 3,845 4,471 4,464 4,514 4,024 4,566 3,926 4,595 4,368 4,213 4,261 4,321 4,048 4,531 4,416 4,472 4,219 4,037 4,418 3,890 4,399 4,648 4,403 4,259 4,499 3,856 3,600 4,067 3,614 3,820 4,028 3,802 4,447 4,153 4,486 4,377 4,479 4,490 3,822

MSQ 2.41E+06 4.77E+06 1.53E+06 2.09E+06 8.83E+02 1.27E+06 1.27E+06 2.25E+05 8.91E+03 5.08E+05 9.64E+06 5.93E+04 1.99E+06 1.52E+06 4.95E+04 7.96E+05 2.35E+05 3.08E+04 9.98E+05 1.12E+06 6.41E+05 2.60E+05 1.35E+06 6.96E+03 3.55E+03 4.88E+05 1.99E+05 9.78E+05 1.91E+05 9.56E+05 2.68E+05 1.85E+06 1.10E+05 8.02E+05 9.87E+04 1.61E+05 6.24E+05 3.37E+05 1.01E+06 3.22E+05 2.76E+05 3.60E+04 9.83E+04 7.69E+05 1.11E+06 2.78E+06 6.62E+05 3.55E+04 4.01E+05 6.27E+05

9.59E+05


Table 6-13: Coefficients, Forecasts and MSQs Determined by Leave-One-Out Method for the RJSEM for Nursing Homes RJSEM ( β 0 + β 1 ⋅ r + β 2 ⋅ nsptppt ) β1 β2 Forecasted Y

Case

β0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Average:

4541 4424 4619 4604 4210 4566 4540 4278 4527 4569 4480 4577 4548 4575 4719 4420 4509 4501 4585 4397 4776 4621 4496

-0.799 -0.814 -0.859 -0.875 -0.722 -0.815 -0.719 -0.730 -0.801 -0.810 -0.830 -0.842 -0.807 -0.740 -0.813 -0.788 -0.803 -0.752 -0.830 -0.760 -0.830 -0.835 -0.752

0.121 0.161 0.111 0.121 0.163 0.124 0.096 0.149 0.125 0.123 0.135 0.122 0.126 0.112 0.099 0.139 0.130 0.122 0.125 0.137 0.093 0.125 0.115

4,389 4,730 3,928 3,512 4,181 4,608 4,822 4,310 3,825 4,434 3,838 3,133 4,476 3,901 4,237 4,082 5,434 3,184 4,734 4,391 4,327 4,520 4,780

MSQ 2.01E+04 1.73E+06 1.20E+06 4.97E+05 1.02E+06 1.02E+05 1.14E+06 1.46E+06 4.84E+03 2.06E+05 6.47E+05 6.30E+04 1.64E+05 1.16E+06 1.10E+06 3.05E+05 5.86E+03 7.23E+04 2.32E+05 4.63E+05 1.48E+06 1.62E+06 4.09E+05

6.57E+05


Table 6-14: Coefficients, Forecasts and MSQs Determined by Leave-One-Out Method for the RJSEM for Schools

RJSEM ( β 0 + β 1 ⋅ r + β 2 ⋅ n2fpt )

Case

β0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Average:

β1 2391 2379 2379 2386 2413 2415 2453 2372 2332 2406 2372 2414 2316 2387 2386 2406 2432 2405 2511 2375 2417 2434 2361

-0.520 -0.512 -0.515 -0.516 -0.519 -0.537 -0.555 -0.530 -0.393 -0.524 -0.504 -0.531 -0.485 -0.556 -0.513 -0.602 -0.649 -0.517 -0.570 -0.527 -0.532 -0.506 -0.473

β2

Forecasted Y 0.013 0.013 0.012 0.012 0.012 0.013 0.012 0.014 0.008 0.012 0.012 0.013 0.012 0.014 0.013 0.017 0.016 0.012 0.011 0.013 0.012 0.011 0.012

2,205 2,016 2,197 2,257 2,042 2,375 2,318 2,359 2,314 2,123 2,343 2,209 2,156 1,910 2,069 2,640 1,204 1,977 2,304 2,063 2,252 2,039 1,786

MSQ 8.05E+03 1.03E+04 6.61E+04 1.69E+04 2.96E+04 1.64E+04 1.77E+05 9.66E+04 1.09E+06 8.67E+03 3.70E+04 7.20E+04 1.25E+06 1.94E+05 3.17E+04 4.36E+05 1.84E+05 1.32E+04 6.32E+05 1.08E+05 1.76E+04 2.07E+05 6.76E+04

2.07E+05


Table 6-15: Coefficients, Forecasts and MSQs Determined by Leave-One-Out Method for the RASEM for Offices

RASEM ( β 0 + β 1 ⋅ nspt + β 2 ⋅ n2 + β 3 ⋅ fpt + β 4 ⋅ ppt)

Case

β0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42

Average:

2370 2359 2363 2356 2300 2025 2288 2395 2235 2405 2281 2286 2298 2387 2433 2106 2290 2361 2096 2239 2370 2218 2352 2300 2291 2269 2108 2292 2275 2335 2293 2296 2266 2244 2347 2270 2101 2367 2284 2378 2290 2202

β1 43.45 45.68 45.18 45.81 48.45 47.18 47.24 42.12 47.75 48.12 46.65 45.11 47.77 46.10 46.52 48.58 46.27 46.88 45.67 47.86 46.51 45.53 46.75 46.61 45.53 46.35 47.33 46.35 46.94 46.00 46.50 45.41 46.47 47.00 45.84 46.98 52.95 45.35 46.62 44.97 46.48 53.66

β2 -1.892 -1.981 -1.961 -1.983 -2.107 -2.038 -2.057 -1.799 -2.025 -2.059 -2.024 -1.955 -2.061 -1.982 -1.989 -2.091 -2.004 -2.029 -1.983 -2.076 -2.016 -1.976 -2.015 -2.019 -1.963 -2.005 -2.061 -2.008 -2.034 -2.001 -2.015 -1.973 -2.013 -2.035 -1.987 -2.035 -2.439 -1.982 -2.020 -1.963 -2.015 -2.347

β3 -1.571 -1.450 -1.454 -1.441 -1.398 -1.688 -1.453 -1.468 -1.407 -1.218 -1.440 -1.480 -1.402 -1.508 -1.260 -1.368 -1.449 -1.396 -1.546 -1.422 -1.405 -1.510 -1.353 -1.427 -1.448 -1.340 -1.478 -1.438 -1.420 -1.440 -1.434 -1.516 -1.448 -1.430 -1.440 -1.425 -1.337 -1.469 -1.434 -1.488 -1.435 -1.232

β4 18.76 16.88 17.08 16.76 15.94 19.90 16.94 17.60 16.24 13.33 16.83 17.57 16.05 16.59 14.10 16.09 16.94 15.99 18.86 16.50 16.15 18.07 15.49 16.61 17.38 15.72 17.69 16.80 16.53 16.93 16.73 17.92 16.99 16.74 16.76 16.63 15.23 17.29 16.73 17.53 16.77 13.35

Forecasted Y 6,005 5,576 4,937 5,520 6,487 2,394 6,433 7,068 8,550 7,416 6,899 6,409 6,610 3,836 7,866 5,694 4,925 5,738 5,414 9,053 5,519 5,430 8,587 5,587 5,605 6,509 5,339 6,770 5,661 5,606 4,113 6,801 5,171 5,392 5,121 7,224 4,320 5,611 4,696 5,720 4,363 6,966

MSQ 1.75E+06 7.34E+05 5.14E+05 6.09E+05 3.34E+06 2.84E+06 3.11E+06 4.31E+06 2.40E+05 1.68E+06 2.23E+04 6.10E+05 1.06E+06 1.36E+06 1.65E+06 7.93E+06 4.25E+03 1.18E+06 5.84E+06 6.92E+04 9.40E+05 8.50E+05 1.70E+05 2.48E+04 1.60E+06 9.02E+06 3.96E+06 1.22E+04 3.95E+04 1.46E+06 5.50E+02 4.78E+05 5.72E+04 2.48E+05 2.41E+05 1.80E+04 1.30E+06 2.02E+06 1.96E+03 2.56E+06 1.06E+03 4.73E+06

1.63E+06


Table 6-16: Coefficients, Forecasts and MSQs Determined by Leave-One-Out Method for the RASEM for Private Housing RASEM ( β 0 + β 1 ⋅ spt + β 2 ⋅ fb + β 3 ⋅ pb + β 4 ⋅ fpt + β 5 ⋅ sb)

Case

β0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50

Average:

-6090 -5808 -6091 -8495 -6419 -6386 -6281 -6403 -6370 -6369 -6309 -6208 -6162 -6258 -7189 -6508 -6301 -6373 -6145 -6251 -6459 -6544 -6197 -6249 -6855 -6368 -6496 -6291 -6369 -6114 -6217 -5997 -6525 -6329 -6371 -6601 -6433 -6380 -6199 -6392 -6341 -6486 -6361 -6237 -6142 -5902 -6694 -6355 -6510 -6383

β1 3757 3666 3748 4625 3873 3863 3813 3861 3861 3857 3836 3804 3782 3803 4168 3901 3846 3859 3791 3815 3890 3908 3799 3822 4009 3863 3898 3837 3860 3767 3801 3744 3902 3847 3859 3934 3880 3859 3804 3862 3848 3895 3855 3817 3785 3705 3952 3855 3918 3864

β2 0.617 0.632 0.603 0.633 0.606 0.514 0.600 0.605 0.612 0.581 0.454 0.664 0.608 0.599 0.629 0.598 0.618 0.610 0.617 0.620 0.611 0.606 0.543 0.623 0.599 0.628 0.608 0.610 0.610 0.662 0.676 0.617 0.608 0.609 0.610 0.619 0.611 0.601 0.609 0.602 0.607 0.611 0.609 0.612 0.611 0.614 0.600 0.616 0.615 0.603

β3

β4

β5

-3.045 -3.119 -2.975 -3.110 -2.985 -2.298 -2.955 -2.978 -3.015 -2.836 -2.157 -3.268 -2.995 -2.949 -3.097 -2.948 -3.048 -3.005 -3.042 -3.069 -3.025 -2.984 -2.588 -3.071 -2.947 -3.110 -2.998 -3.006 -2.991 -3.302 -3.492 -3.046 -2.997 -3.003 -3.007 -3.043 -3.012 -2.964 -3.003 -2.965 -2.991 -3.012 -3.001 -3.020 -3.012 -3.031 -2.954 -3.069 -3.031 -2.972

-0.142 -0.165 -0.125 -0.116 -0.121 -0.126 -0.117 -0.120 -0.128 -0.126 -0.127 -0.133 -0.128 -0.116 -0.134 -0.110 -0.137 -0.126 -0.138 -0.126 -0.126 -0.120 -0.130 -0.145 -0.106 -0.133 -0.123 -0.127 -0.129 -0.127 -0.124 -0.140 -0.124 -0.126 -0.127 -0.118 -0.127 -0.116 -0.127 -0.116 -0.123 -0.127 -0.125 -0.132 -0.131 -0.139 -0.111 -0.130 -0.129 -0.117

-114.9 -105.6 -118.9 -167.1 -131.6 -150.0 -123.0 -126.6 -130.7 -131.7 -121.2 -122.0 -124.7 -122.3 -148.9 -136.2 -132.8 -129.8 -130.8 -134.9 -124.1 -125.6 -119.2 -122.6 -129.1 -135.9 -128.7 -134.9 -134.5 -151.2 -91.8 -132.3 -123.1 -132.6 -129.5 -146.3 -129.2 -133.9 -133.2 -133.6 -131.8 -126.7 -130.3 -129.6 -131.0 -128.7 -124.3 -117.2 -137.7 -136.5

Forecasted Y 4,166 3,969 4,868 7,330 3,564 5,053 4,946 4,590 4,538 5,292 5,212 4,995 5,620 4,974 5,918 3,724 4,622 3,790 4,118 5,148 5,164 4,042 3,631 3,518 3,806 3,451 3,753 4,014 3,488 4,623 3,994 3,964 3,983 3,955 3,958 3,707 2,957 3,536 3,777 3,509 3,622 3,722 3,608 3,797 3,638 3,441 4,067 4,541 4,664 3,856

MSQ 1.74E+06 4.00E+06 7.60E+05 2.12E+06 3.29E+04 1.91E+05 4.37E+05 1.64E+05 2.76E+04 3.67E+04 7.90E+05 8.22E+05 6.93E+04 5.24E+05 1.40E+06 3.51E+05 2.92E+05 1.51E+03 2.73E+05 7.65E+04 2.27E+04 5.32E+05 2.22E+05 3.76E+05 6.15E+05 7.07E+04 7.48E+04 6.16E+05 1.24E+04 5.97E+05 3.85E+05 8.56E+05 9.94E+05 2.01E+05 1.81E+02 1.53E+05 1.18E+04 2.67E+05 5.09E+05 2.14E+05 1.07E+05 2.46E+05 1.43E+04 5.16E+04 2.90E+05 3.88E+05 1.26E+06 6.25E+04 6.52E+05 6.81E+05

4.92E+05

Table 6-17: Coefficients, Forecasts and MSQs Determined by Leave-One-Out Method for the RASEM for Nursing Homes

RASEM ( β 0 + β 1 ⋅ fpt + β 2 ⋅ nsptppt )

Case

β0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Average:

β1 4608 4492 4695 4727 4293 4639 4597 4348 4601 4640 4547 4664 4616 4642 4776 4487 4567 4564 4661 4468 4837 4687 4564

-0.920 -0.929 -0.989 -1.056 -0.832 -0.938 -0.828 -0.845 -0.934 -0.932 -0.939 -0.981 -0.927 -0.855 -0.929 -0.911 -0.930 -0.849 -0.957 -0.877 -0.948 -0.954 -0.869

β2

Forecasted Y 0.125 0.162 0.113 0.123 0.162 0.127 0.100 0.152 0.129 0.126 0.136 0.125 0.129 0.115 0.103 0.143 0.138 0.124 0.128 0.139 0.096 0.128 0.119

4,423 4,690 3,914 3,283 4,262 4,613 4,892 4,315 3,704 4,430 3,944 3,071 4,449 3,890 4,195 4,044 5,494 3,351 4,754 4,405 4,297 4,496 4,803

MSQ 1.15E+04 1.62E+06 1.23E+06 8.74E+05 8.65E+05 1.05E+05 9.94E+05 1.45E+06 3.62E+04 2.02E+05 4.88E+05 9.79E+04 1.43E+05 1.14E+06 1.02E+06 3.47E+05 1.88E+04 1.90E+05 2.51E+05 4.44E+05 1.40E+06 1.56E+06 3.80E+05

6.47E+05

Table 6-18: Coefficients, Forecasts and MSQs Determined by Leave-One-Out Method for the RASEM for Schools RASEM ( β 0 + β 1 ⋅ nspt + β 2 ⋅ fpt ) β1 β2 Forecasted Y

Case

β0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

1484 1443 1485 1461 1473 1489 1503 1445 1673 1497 1478 1488 1559 1395 1476 1391 1389 1480 1635 1415 1465 1576 1402

49.079 50.343 48.432 49.437 49.570 50.339 51.626 50.826 35.520 49.233 49.856 49.902 41.767 52.655 49.299 55.184 55.851 49.330 46.517 51.450 49.734 45.224 50.264

-0.208 -0.185 -0.205 -0.201 -0.206 -0.223 -0.244 -0.192 -0.195 -0.214 -0.211 -0.212 -0.195 -0.201 -0.199 -0.202 -0.245 -0.208 -0.272 -0.198 -0.202 -0.219 -0.135

Average:

2,282 2,023 2,272 2,153 1,840 2,466 2,452 2,225 2,417 2,223 2,596 2,247 2,298 1,914 2,008 2,491 1,271 1,846 2,293 2,003 2,061 1,817 1,918

MSQ 1.73E+02 1.16E+04 3.30E+04 5.47E+04 8.99E+02 4.79E+04 3.08E+05 3.14E+04 8.84E+05 3.74E+04 3.76E+03 9.42E+04 9.48E+05 1.91E+05 1.37E+04 2.61E+05 1.31E+05 2.51E+02 6.15E+05 1.51E+05 3.42E+03 5.45E+04 1.54E+05

1.75E+05

Table 6-19: Signs of Coefficients for Selected Predictors

RJSEM            | Positive Coefficients       | Negative Coefficients
Office           | Constant, nsptppt and afp*  | a2fp*, bft* and b2ft*
Private Housing  | Constant, r*                | bft*
Nursing Home     | Constant, nsptppt           | r*
School           | Constant, n2fpt*            | r*

RASEM            | Positive Coefficients       | Negative Coefficients
Office           | Constant, nspt and ppt      | n2 and fpt*
Private Housing  | spt and fb*                 | Constant, pb, fpt* and sb
Nursing Home     | Constant, nsptppt           | fpt*
School           | Constant, nspt              | fpt*

Remark: * – Floor area related predictor


Table 6-20: Contributions of Floor Area Related Predictor to Response
Values are listed in case order under each model and building type.

Regressed JSEM

Office (β1·a2fp + β3·bft + β4·afp + β5·b2ft):
-750 -431 -1,322 -525 -707 -4,540 -1,395 -330 -9,242 -2,137 -2,173 -570 -688 -15,890 -1,165 -238 -7,146 -435 -291 -1,208 -423 -337 -1,323 -583 -7,349 -556 -510 -524 -639 -1,074 -658 -540 -217 -269 -349 -10,290 -10,140 -1,113 274 -1,135 -2,083 -1,170

Private Housing (β1·bft + β2·r):
-596 -786 21 -36 -1,180 1,255 6 11 -72 236 -1,601 -688 -2 -6 -7 -482 8 -608 13 -122 -298 -254 -248 -485 1 -122 -80 -333 -499 -76 -644 -187 123 -147 -278 -53 -677 -923 -483 -910 -711 -501 -732 -125 -404 -115 -121 -63 -70 -709

Nursing Home (β1·r):
-999 -847 -1,366 -1,768 -332 -534 -396 -460 -1,274 -711 -1,187 -2,172 -738 -1,132 -779 -737 -477 -1,865 -461 -486 -712 -728 -416

School (β1·r + β2·n2fpt):
-175 -364 -193 -140 -371 -34 -141 -42 -18 -289 -37 -198 -165 -471 -313 237 -1,229 -430 -211 -313 -168 -396 -587

Regressed ASEM

Office (β3·fpt):
-945 -598 -1,853 -630 -679 -6,077 -1,363 -399 -2,293 -1,747 -2,113 -489 -425 -11,830 -1,171 -378 -4,839 -474 -247 -758 -367 -241 -1,298 -517 -4,807 -481 -428 -629 -968 -2,305 -1,362 -487 -391 -477 -446 -4,958 -2,801 -1,557 -1,186 -1,324 -6,673 -759

Private Housing (β2·fb + β4·fpt):
-422 -610 -161 -62 -647 6,718 -20 -45 -117 5,379 5,953 6,348 -146 -18 -40 -529 -26 -434 -52 1,664 310 -162 1,070 -715 -17 2,156 -159 -281 -100 1,205 1,176 -147 -238 -303 -319 855 -425 -503 -295 -529 -427 -309 -439 -80 -250 -156 -106 278 -187 -386

Nursing Home (β1·fpt):
-1,058 -966 -1,474 -2,133 -333 -614 -414 -533 -1,485 -800 -1,155 -2,335 -848 -1,223 -891 -852 -553 -1,766 -531 -552 -813 -832 -480

School (β2·fpt):
-187 -461 -184 -178 -186 -110 -135 -311 -155 -186 -108 -204 -174 -218 -331 -207 -515 -219 -146 -194 -100 -213 -327

Remark: Bold numbers represent positive contributions to the responses


Table 6-21: Contribution of Non-Floor Area Related Predictors to Responses
Values are listed in case order under each model and building type.

Regressed JSEM

Office (β2·nsptppt):
1,856 1,380 1,750 1,460 2,187 2,324 2,956 2,256 14,431 4,948 4,128 1,984 2,387 15,121 4,431 1,353 4,875 1,414 1,060 5,460 1,071 964 5,616 1,345 7,200 2,296 1,274 2,307 1,890 3,090 771 2,376 826 1,133 917 11,668 10,650 1,826 1,006 1,814 4,525 2,333

Nursing Home (β2·nsptppt):
847 1,153 675 676 303 576 678 492 572 576 545 728 666 458 297 399 1,402 548 610 480 263 627 700

Regressed ASEM

Office (β1·nspt + β2·n2 + β4·ppt):
4,580 3,815 4,427 3,794 4,866 6,446 5,508 5,072 8,608 6,758 6,731 4,612 4,737 13,279 6,604 3,966 7,474 3,851 3,565 7,572 3,516 3,453 7,533 3,804 8,121 4,721 3,659 5,107 4,354 5,576 3,182 4,992 3,296 3,625 3,220 9,912 5,020 4,801 3,598 4,666 8,746 5,523

Private Housing (β1·spt + β3·pb + β5·sb):
10,678 10,387 11,120 15,887 10,630 4,721 11,247 11,038 11,025 6,282 5,568 4,855 11,928 11,250 13,147 10,761 10,949 10,597 10,315 9,735 11,313 10,748 8,758 10,482 10,678 7,663 10,408 10,586 9,957 9,532 9,035 10,108 10,746 10,587 10,648 9,453 9,815 10,419 10,271 10,430 10,390 10,517 10,408 10,114 10,030 9,499 10,867 10,618 11,361 10,625

Nursing Home (β2·nsptppt):
873 1,164 693 689 302 588 709 500 588 590 552 742 681 471 310 409 1,480 553 624 489 273 641 719

School (β1·nspt):
985 1,041 971 870 553 1,087 1,084 1,091 899 912 1,226 963 913 737 863 1,307 397 585 804 782 696 454 843

Remark: Bold numbers represent positive contributions to the responses


6.2.5 Model Transformation

The regressed models with logarithmically transformed variables can be expressed in the form of Equation (5.19) in Chapter 5. The response and all of the predictors in the regressed models were transformed using natural logarithms. The LRJSEM and the LRASEM denote the transformed counterparts of the RJSEM and the RASEM, respectively. A key condition governs the logarithmic transformation: every value of a variable to be transformed must be greater than zero. Unfortunately, the predictors of two of the regressed models do not satisfy this condition. Because some of the office projects do not have podiums and some of the private housing projects do not have basements, certain predictors, namely afp and a2fp in the RJSEM for offices, and fb, sb and pb in the RASEM for private housing, take zero values and cannot be transformed. To fulfil the condition, these predictors were excluded from the LRJSEM for offices and the LRASEM for private housing.
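As a minimal sketch of this transformation step (the column names and data file are hypothetical; pandas and numpy are assumed), the response and each candidate predictor can be log-transformed only where all observed values are strictly positive, which reproduces the exclusions described above:

```python
import numpy as np
import pandas as pd

def log_transform_candidates(df, response, predictors):
    """Return a natural-log-transformed copy of the data, keeping only the
    predictors whose values are all strictly positive (a requirement of the
    ln transformation).  Column names are illustrative only."""
    usable = [p for p in predictors if (df[p] > 0).all()]
    dropped = sorted(set(predictors) - set(usable))

    transformed = pd.DataFrame({"ln_" + response: np.log(df[response])})
    for p in usable:
        transformed["ln_" + p] = np.log(df[p])
    return transformed, dropped

# Example (offices, where afp and a2fp are zero for projects without podiums):
# offices = pd.read_csv("offices.csv")   # hypothetical data file
# data, dropped = log_transform_candidates(
#     offices, response="price",
#     predictors=["afp", "a2fp", "bft", "b2ft", "nsptppt"])
# dropped would then list the predictors excluded from the LRJSEM.
```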

6.3 Performance Validation

6.3.1 Forecasting Results

To study whether the regressed models improve forecasting performance, their performance was compared with that of the conventional models. The same data that were used to generate the regressed models were used to assess the performance of the conventional models. Forecasted tender prices for the JSEM, the floor area model and the cube model were calculated using Equations (6.1) to (6.3), respectively, as follows:

$$\hat{P} = \left\{ \left(2 - \tfrac{0.15}{2}\right) a \cdot fp + \tfrac{0.15}{2}\, a^{2} \cdot fp + \left(2 - \tfrac{0.15}{2}\right) b \cdot ft + \tfrac{0.15}{2}\, b^{2} \cdot ft + 0.15\, a \cdot b \cdot ft + r + (a + b) \cdot spt \cdot ppt + 2m \cdot fb + 2.5\, m \cdot pb \cdot sb \right\} \cdot R \qquad (6.1)$$

$$\hat{P}' = (a \cdot fp + b \cdot ft + m \cdot fb) \cdot R' \qquad (6.2)$$

$$\hat{P}'' = (a \cdot fp \cdot sp + b \cdot ft \cdot st + m \cdot fb \cdot sb) \cdot R'' \qquad (6.3)$$

where $\hat{P}$, $\hat{P}'$ and $\hat{P}''$ are the forecasted prices for the JSEM, the floor area and the cube models, respectively, and $R$, $R'$ and $R''$ are their corresponding unit rates, deduced by cross validation as described in section 5.9 of Chapter 5. The quantities measured, the cross-validated unit rates and the forecasted tender prices for the three conventional models for offices, private housing, nursing homes and schools are shown in Tables E-1 to E-4 in Appendix E. The forecasted prices shown in those tables were used to calculate the corresponding percentage errors for comparison with the regressed models. To assess the performance of the best subset of regressed models, their forecasting results were compared with those obtained from the conventional models. First, the forecasting errors and percentage errors of all of the models were calculated. The forecasting errors for the various conventional and regressed models are shown in Tables F-1 to F-4, and the percentage errors in Tables F-5 to F-8, in Appendix F. Table 6-22 summarises the means and standard deviations of the percentage errors, which represent the bias and consistency of all of the models as extracted from the appendix, together with the results of the significance testing (p-values of the t-tests) for zero bias. As expected, the forecasted prices from the models that were generated by the method of cross validation generally have very little bias, and most do not deviate significantly from zero. The only exception is the JSEM for offices, which is significantly biased and has the largest mean percentage error in magnitude (-6.88%) of all of the models.
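For illustration, Equations (6.1) to (6.3) can be implemented directly as quantity functions that are then multiplied by the cross-validated unit rates. The function and variable names below are assumptions made for the sketch; the symbols simply follow the equations above, and the rates R, R' and R'' are taken as given.

```python
def jsem_quantity(a, fp, b, ft, m, fb, r, spt, ppt, pb, sb):
    """Weighted storey-enclosure quantity inside the braces of Equation (6.1)."""
    return ((2 - 0.15 / 2) * a * fp + (0.15 / 2) * a**2 * fp
            + (2 - 0.15 / 2) * b * ft + (0.15 / 2) * b**2 * ft
            + 0.15 * a * b * ft
            + r + (a + b) * spt * ppt
            + 2 * m * fb + 2.5 * m * pb * sb)

def floor_area_quantity(a, fp, b, ft, m, fb):
    """Total floor area used in Equation (6.2)."""
    return a * fp + b * ft + m * fb

def cube_quantity(a, fp, sp, b, ft, st, m, fb, sb):
    """Building volume used in Equation (6.3)."""
    return a * fp * sp + b * ft * st + m * fb * sb

# Forecast = quantity x cross-validated unit rate (R, R' or R''), and
# percentage error = (forecast - actual_price) / actual_price * 100.
```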

As bias alone is not informative enough to distinguish the performance of the models, consistency becomes an important measure in this study. Unlike the t-tests that are used to compare means, parametric tests for the homogeneity of variance are not robust to departures from normality, as explained in section 5.9.1 of Chapter 5. As parametric tests are preferable to non-parametric tests, the distributions of errors (in terms of the ratio of forecast to actual tender price) for all of the models were examined in order to choose the appropriate tests.
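A rough sketch of the checks discussed here, assuming scipy: the one-sample t-test of zero mean percentage error corresponds to the p-values reported in Table 6-22, while Bartlett's and Levene's tests are shown only as examples of a parametric and a more robust variance comparison; the tests actually adopted are those set out in Chapter 5.

```python
import numpy as np
from scipy import stats

def bias_summary(pct_errors):
    """Mean and SD of the percentage errors plus the p-value of the
    one-sample t-test of H0: mean percentage error = 0 (as in Table 6-22)."""
    t_stat, p_value = stats.ttest_1samp(pct_errors, popmean=0.0)
    return np.mean(pct_errors), np.std(pct_errors, ddof=1), p_value

def consistency_comparison(errors_a, errors_b):
    """Compare the spread of two models' errors.  Bartlett's test assumes
    normality; Levene's test is more robust to departures from it."""
    return {
        "bartlett_p": stats.bartlett(errors_a, errors_b).pvalue,
        "levene_p": stats.levene(errors_a, errors_b).pvalue,
    }
```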

Table 6-22: Summary of Means and Standard Deviations of Percentage Errors

                                  Office      Private Housing    Nursing Home    School

JSEM
  Mean % error (m)                -6.88%      -2.73%             2.09%           4.08%
  SD of % error                   21.43%      29.04%             20.03%          21.25%
  p-value for t-test (H0: m=0)    0.04        0.51               0.62            0.37

FLOOR AREA
  Mean % error (m)                5.62%       1.31%              4.20%           3.35%
  SD of % error                   27.32%      23.53%             24.45%          21.45%
  p-value for t-test (H0: m=0)    0.19        0.69               0.42            0.46

CUBE
  Mean % error (m)                0.16%       1.47%              5.75%           3.56%
  SD of % error                   26.99%      19.59%             25.21%          24.56%
  p-value for t-test (H0: m=0)    0.97        0.60               0.29            0.49

RJSEM
  Mean % error (m)                3.06%       4.84%              3.21%           3.41%
  SD of % error                   25.38%      22.64%             21.45%          20.84%
  p-value for t-test (H0: m=0)    0.44        0.14               0.48            0.44
  Predictors:
    Office: afp, a2fp, bft, b2ft, nsptppt
    Private Housing: bft, r
    Nursing Home: n2fpt, r
    School: n2fpt, r

RASEM
  Mean % error (m)                2.96%       2.66%              3.09%           2.94%
  SD of % error                   22.15%      15.95%             21.36%          19.56%
  p-value for t-test (H0: m=0)    0.39        0.24               0.49            0.48
  Predictors:
    Office: n2, fpt, ppt, nspt
    Private Housing: fpt, fb, spt, sb, pb
    Nursing Home: fpt, nspt
    School: fpt, nspt

LRJSEM
  Mean % error (m)                1.87%       2.27%              1.44%           2.14%
  SD of % error                   19.47%      21.14%             20.28%          19.64%
  p-value for t-test (H0: m=0)    0.54        0.45               0.74            0.61
  Predictors:
    Office: ln(bft), ln(b2ft), ln(nsptppt)
    Private Housing: ln(bft), ln(r)
    Nursing Home: ln(n2fpt), ln(r)
    School: ln(n2fpt), ln(r)

LRASEM
  Mean % error (m)                2.71%       1.68%              1.36%           2.07%
  SD of % error                   21.86%      17.60%             19.69%          20.06%
  p-value for t-test (H0: m=0)    0.43        0.50               0.74            0.63
  Predictors:
    Office: ln(n2), ln(fpt), ln(ppt), ln(nspt)
    Private Housing: ln(fpt), ln(spt)
    Nursing Home: ln(fpt), ln(nspt)
    School: ln(fpt), ln(nspt)

Remark: Bold - p-value < 0.05, H0 is rejected (i.e., mean % error is significantly different from zero)


6.3.2 Normality Testing

To use the parametric tests appropriately, the distributions of the forecast to actual tender price ratios should be normal. If the models have to be transformed to fulfil the normality requirement, then the ratios for the models under examination should be transformed on the same basis. Therefore, the distributions of the ratios for the three conventional models, together with the distribution of the ratios for one of the regressed models (with either the untransformed or the transformed variables, for comparison), would all have to pass the normality tests before the parametric tests could be used to ascertain homogeneity of variance. The same requirement would also apply to the comparison between two regressed models with untransformed and transformed variables. Table 6-23 shows the p-values of the Anderson-Darling (A-D) tests for normality. The ratios of forecast to actual tender price, rather than the percentage errors, were used to produce the normality plots and to deduce the lambda values, because percentage errors can be negative, which would prevent logarithmic or square-root transformation. Seven distributions of the ratios of forecast to actual tender price were found to depart significantly from normality at a confidence level of 95%. They were from the floor area model and the LRASEM for offices, the JSEM and the floor area and cube models for private housing, and the RJSEM and the RASEM for nursing homes. To normalise these distributions, a transformation was carried out using the Box-Cox normality plots, as shown in Figures 6-1 to 6-7.

The best lambda (λ) values were determined from the normality plots and are summarised in Table 6-24. If the best λ equals 1, then no transformation can further normalise the distribution. If it equals 0.5, then a square root transformation is suggested; if 0, then a logarithmic transformation is suggested; and if -1, then a reciprocal transformation is suggested. As none of the best λ values for the models within any particular building type matches any of the others, the ratios for each model of the same building type were transformed according to the determined λ value, with the exception of schools, because all of the school models support the normality assumption.
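As an illustrative sketch of this step (assuming scipy and statsmodels, with `ratios` denoting the forecast to actual tender price ratios for one model), the A-D p-value and the best Box-Cox λ could be obtained as follows:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import normal_ad

def normality_check(ratios):
    """Anderson-Darling p-value for the forecast/actual tender price ratios."""
    _, p_value = normal_ad(np.asarray(ratios))
    return p_value

def boxcox_normalise(ratios):
    """Best Box-Cox lambda for the ratios, and the transformed values.
    The ratios are strictly positive, so the transformation is well defined."""
    lam = stats.boxcox_normmax(np.asarray(ratios))
    transformed = stats.boxcox(np.asarray(ratios), lmbda=lam)
    return lam, transformed

# Workflow sketched in this section:
# p = normality_check(ratios)
# if p < 0.05:                                # significant departure from normality
#     lam, transformed = boxcox_normalise(ratios)
#     p_after = normality_check(transformed)  # re-run the A-D test
```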

The transformed ratios for each distribution were then subjected to the A-D tests again to assess their normality. Unfortunately, the various attempts to transform the ratios for the groups of models under comparison in sections 6.3.3 and 6.3.4 failed to normalise their distributions. Therefore, non-parametric tests were employed for the comparisons involving the seven models that failed to fulfil the normality requirement.

Table 6-23: Results of Normality Tests for Percentage Errors According to Building and Model Types

Anderson-Darling Tests (p-value)
          JSEM    Floor Area    Cube    RJSEM    RASEM    LRJSEM    LRASEM

Office 0.227