Towards a quality control of precipitation data


Meteorologische Zeitschrift, Vol. 17, No. 6, 733-749 (December 2008) © by Gebrüder Borntraeger 2008

ARMIN MATHES*, PETRA FRIEDERICHS and ANDREAS HENSE
Meteorological Institute, University of Bonn, Germany
(Manuscript received February 29, 2008; in revised form October 6, 2008; accepted October 6, 2008)

Abstract

The priority program 'Quantitative Precipitation Forecast' (SPP 1167), funded by the German Research Foundation (DFG), was implemented in April 2004 for a period of six years. Within this program, observations from almost 1000 rain gauges that are not routinely used have been collected since 2007; a quality control of these observed raw data is therefore needed. First, a conditional probabilistic model for precipitation was developed. Based on the large scale atmospheric circulation or on small scale precipitation observations, the probability of precipitation exceeding a threshold is estimated. The model is applied to daily precipitation sums (1986-1999) of 231 rain gauge stations from the German Weather Service (DWD) network in Nordrhein-Westfalen (NRW) and to NCEP re-analysis data. The procedure is based on a generalized linear model (GLM) using logistic regression. Relative vorticity and vertical velocity (both at 850 hPa) and relative humidity (at 700 hPa and 850 hPa) are used as predictor variables; alternatively, precipitation observations are used as predictors. The statistical model forecasts are validated against observations in a cross-validation mode using the Brier Skill Score (BSS) and relative operating characteristics (ROC) curves. It is shown that the model represents the observations very well.

Zusammenfassung (translated from German)

During the DFG priority program SPP 1167 'Quantitative Precipitation Forecast', precipitation observations from almost 1000 stations are being collected, among other data. These stations are not in operational use; a quality control is therefore necessary. To this end, a conditional probabilistic model for precipitation was developed. Given the known large scale atmospheric circulation or neighbouring precipitation observations, the probability that precipitation exceeds a threshold is estimated. The model is based on a generalized linear model (GLM) and uses logistic regression. It is trained on daily precipitation sums from more than 230 stations of the German Weather Service (DWD) in Nordrhein-Westfalen (NRW) and on NCEP re-analysis data. Relative vorticity, relative humidity, and vertical velocity, as well as the (respectively neighbouring) precipitation observations, serve as predictor variables. The statistical model forecasts are validated against the observations by cross-validation, using the Brier Skill Score (BSS) and relative operating characteristics (ROC) curves. The model represents the observations excellently.

1 Introduction

The development of forecasting methods and observation systems has resulted in a constant increase in the quality of short-range (up to three days) and medium-range (up to ten days) weather forecasts in the past years. However, because precipitation is highly influenced by small-scale and non-resolved processes, quantitative precipitation forecasting is still not nearly as accurate as predictions of synoptic-scale fields such as pressure, temperature, humidity, and wind velocity. Precipitation forecasts today are associated with the same deficiencies as 15 years ago (FRITSCH and CARBONE, 2004). This is true for the forecast of the occurrence of precipitation (precipitation yes/no), but the situation is even more dissatisfying for quantitative precipitation forecasts. At present, deficiencies include (1) the incomplete modeling of the components of the water cycle, (2)

* Corresponding author: Armin Mathes, Meteorological Institute, University of Bonn, Auf dem Hügel 20, 53121 Bonn, Germany, e-mail: [email protected]

DOI 10.1127/0941-2948/2008/0347

gaps, non-resolved structures, and errors of the initial fields, (3) inadequate methods of linking observations with forecast models (assimilation of data into models), but also (4) basic problems of understanding and interpreting deterministic weather forecast models. Hence, the priority program 1167 'Quantitative Precipitation Forecast' (PP 1167 'QPF'), funded by the German Research Foundation (DFG), was implemented in April 2004 for a period of six years. It has, amongst others, the following scientific objective: determination and use of the potentials of existing and new data, and processing descriptions to improve quantitative precipitation forecast (HENSE et al., 2006). Two observation periods are included within the PP 1167 'QPF': a General Observation Period (GOP) and the Convective and Orographically induced Precipitation Study (COPS). One important aspect that must be covered under this objective is the consideration of data that have not yet been assimilated into numerical forecast models. The organization of the data collection and quality control during the GOP is one of the major aims of the PP 1167 'QPF'. Activities in the Mesoscale Alpine Programme



Figure 1: Orography of NRW. The white line is the border of NRW. The dark-gray areas are flat, light-gray areas represent mountainous regions. The flat area in the south of NRW is called 'Cologne Bay'. The mountainous region in the southwest of NRW is called 'Eifel', and in the southeast 'Sauerland' and 'Rothaargebirge'.

(MAP) have shown that high density data sets of standard observations provide essential information for model validation and verification (FREI and SCHÄR, 1998). Within the PP 1167 'QPF', precipitation data from more than 1000 rain gauge stations with daily and sub-daily time resolution, which are neither routinely archived nor routinely used by the German Weather Service (DWD), have been collected since 2007. They come from water authorities, environmental agencies, and university institutes and will be quality controlled. This data set substantially enhances the data coverage with daily and sub-daily observations far beyond the coverage defined by the operational network of the DWD. As these raw data are neither archived nor used, a quality control is essential. Because of their strongly stochastic character, precipitation data need other quality control methods than temperature, pressure, etc. Many methods have been applied to the quality control of precipitation data: statistical interpolation techniques, principal component (PC) analysis, spline surface fitting, and others. Statistical interpolation techniques are the so-called optimum interpolation or optimal averaging (GANDIN, 1993) and the kriging methods (KRIGE, 1981). These methods are based upon knowledge of the statistical structure function (semivariogram) or of the correlation function. For example, with knowledge of the correlation function a scale dependent precipitation analysis can be performed (RUBEL, 1996). The correlation function depends on the considered space-time aggregation and can be characterised by decorrelation distances. Spline surface fitting includes a piecewise functional fitting approach which is based on a variational algorithm (DAQUAMAP, GROEHN et al., 2000). The basic advantages of this method are that neither a first

Figure 2: Top: location of the 20 grid points (circles) where predictors were calculated, and the 231 rain gauge stations (dots); bottom: location of the 231 rain gauge stations only.

guess or (prognostic) model field is necessary, nor is a priori knowledge about structure or weighting functions required. For precipitation, the problems associated with this method have not been solved completely. Within the Baltic Sea Experiment (BALTEX) the precipitation observations were corrected for systematic measuring errors with the Dynamic Correction Model (FORLAND et al., 1996; RUBEL and HANTEL, 1999). Based on synoptic observations, the parameters necessary for a station-related correction (wind speed at the rim of the gauge, air temperature, relative humidity, and precipitation intensity) can be derived on a daily timescale for individual precipitation events (RUBEL and HANTEL, 1999). FREI and SCHÄR (1998) adopted a procedure that is operational at the German Weather Service (BEHRENDT, 1992). In order to test the spatial consistency of precipitation, every daily report is verified against tolerance bounds derived from neighbouring stations.

In contrast to these various techniques, we have developed a probabilistic model for precipitation. First we describe the procedure used to assess the likelihood of precipitation conditional on a set of large scale flow variables. Alternatively, the same probabilistic model is used to assess the likelihood of precipitation conditional on a set of small scale variables, namely precipitation observations from nearby stations. This means that, given the large scale atmospheric circulation or the small scale precipitation observations, the probability of precipitation exceeding a threshold is estimated. If at a specific rain gauge site that threshold is exceeded while the conditional probability for such an event, based on the concurrent large scale weather situation or on the neighbouring precipitation observations, is low, the respective observation is at least questionable and will be flagged. Effectively, this approach is identical to a well known method from statistical forecast postprocessing, the perfect-prog (PP) approach. Here we describe the setup of the model, its training and assessment using cross-validation, and an application to a regional data set. The next step will be the enhancement of the model, so that a quantitative quality control of the observations is possible, i.e., to rate the likelihood of the observations given the estimated probabilities of the model. Improvements in predicting the probability of precipitation (the predictand) are achieved by Model Output Statistics (MOS) in case forecasts are available (GLAHN and LOWRY, 1972), or by a PP approach in case (re)analysis variables of the large scale weather patterns are the predictors. The MOS/PP approach uses statistical relationships between observational data and model or re-analysis variables.
In quantitative precipitation forecasting, statistical methods using products of numerical prediction models have been employed for many years, with MOS/PP generally based on multivariate linear regression schemes. In our case of a probabilistic model, the method is based on logistic regression because of the mixed discrete-continuous character of precipitation. Logistic regression is a special case of a formal generalization of linear regression concepts commonly summarized under the term generalized linear models (GLMs, FAHRMEIR and TUTZ, 1994). These statistical models can be applied to a broad class of problems characterized by a variety of theoretical distributions, the exponential family, which among others also embraces the binomial distribution as being appropriate for an event statistics (CHANDLER, 2005; FREI and SCHÄR, 2001). In comparison with other linear and non-linear statistical methodologies, logistic regression often performs best (APPLEQUIST et al., 2002) with respect to modelling the probability of events. Concerning the correlation between atmospheric circulation and precipitation processes, CLAUSSNITZER et


al. (2008) connect parameters of the atmospheric circulation with observed precipitation. They use a Dynamic State Index to evaluate precipitation processes.

This article is organized as follows: Section 2 describes the data. In Section 3 the framework of logistic regression and the cross-validation technique are explained; furthermore, the Brier Skill Score (BSS) and the relative operating characteristics (ROC) curves used for verification are briefly introduced, and finally some remarks about the covariates are made. In Section 4 results of the verification are shown. Some case studies are presented in Section 5. In Section 6 we discuss some climatological aspects. Section 7 concludes the paper with a brief summary and some possible future applications.

2 Data

We use daily precipitation totals from 231 rain gauge stations of the DWD network in Nordrhein-Westfalen (NRW) in Germany. NRW is, in terms of population and economic output, the largest and westernmost Federal State of Germany. It has more than 18 million inhabitants and comprises a land area of 34085 square kilometers. Figure 1 shows the orography of NRW. The data are available for the 14 year period 1986 to 1999. They represent accumulated precipitation from 7.30 GMT of the calendar day till 7.30 GMT of the following day. In order to account for the seasonally varying character of precipitation, the daily precipitation totals are analysed separately for the four meteorological seasons of the year, i.e., March to May (MAM), June to August (JJA), September to November (SON), and December to February (DJF).

The large scale predictor variables are taken from the NCEP re-analysis project (KALNAY et al., 1996). The data are available on a 2.5° × 2.5° grid. We have chosen a sector between 2.5°E and 12.5°E, and 47.5°N and 55°N, that covers parts of Central Europe. This region is large enough to represent the large scale atmospheric circulation. Fig. 2 shows the 20 NCEP re-analysis grid points as well as the locations of the 231 rain gauge stations. As we deal with daily precipitation totals, we utilize NCEP re-analysis data at 00 GMT and 12 GMT. To represent the different types of atmospheric processes responsible for precipitation in the mid-latitudes, we chose the following variables as potential predictors:
• relative vorticity at 850 hPa (ζ850) for the geostrophic dynamics,
• vertical velocity at 850 hPa (ω850) for the ageostrophic dynamics, and
• relative humidity at 850 hPa and 700 hPa (q850, q700) representing the large scale moisture processes.
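The seasonal stratification described above can be sketched with pandas; the daily series below is synthetic and only illustrative (the real data are 07:30-07:30 GMT accumulations from the DWD stations):

```python
import numpy as np
import pandas as pd

# Hypothetical daily precipitation totals for the study period 1986-1999.
dates = pd.date_range("1986-01-01", "1999-12-31", freq="D")
rr = pd.Series(np.random.default_rng(0).gamma(0.5, 4.0, len(dates)), index=dates)

# Map calendar months to the four meteorological seasons.
season = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
          6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}
by_season = rr.groupby(rr.index.month.map(season))

print(by_season.size())   # number of daily values per season (roughly 1260 each)
```

Each seasonal subset is then analysed separately, as the text describes.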


The precipitation observations are taken from the 231 rain gauge stations described above. In each case we used the observations from the 20 neighbouring rain gauge stations.
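Selecting the 20 neighbouring stations amounts to a simple nearest-neighbour search. The coordinates below are made up for illustration (for real station positions in latitude/longitude, a great-circle distance would be more appropriate than the planar distance used here):

```python
import numpy as np

def nearest_stations(coords, target, k=20):
    """Indices of the k stations closest to station `target`
    (Euclidean distance in the coordinate plane, excluding the station itself)."""
    d = np.linalg.norm(coords - coords[target], axis=1)
    d[target] = np.inf                      # never select the target station itself
    return np.argsort(d)[:k]

# 231 hypothetical station positions (e.g. easting/northing in km).
rng = np.random.default_rng(3)
coords = rng.uniform(0, 250, size=(231, 2))
neigh = nearest_stations(coords, target=0, k=20)
print(len(neigh))   # 20
```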

3 Methods

In this study we want to derive a probabilistic model for the exceedance of station precipitation above a given threshold, conditioned on a given large scale atmospheric flow. Let RR denote the station precipitation; the threshold exceedance is defined as

c = \begin{cases} 1 & : RR \geq u \\ 0 & : RR < u \end{cases}   (3.1)

Two threshold values will be considered: u = 0.1 mm, defining the precipitation event yes or no, and u = 10 mm. The statistical model, relating the exceedance of precipitation above the threshold to the large scale atmospheric circulation or to the neighbouring precipitation observations (indicated by the vector-valued variable \vec{m}), should give an estimate of the conditional probability y = \mathrm{Prob}(RR \geq u \,|\, \vec{m}).

3.1 Generalized Linear Models

Logistic regression is used to estimate the conditional probability of a threshold exceedance. It is a special case of a formal generalization of the linear regression concept, commonly summarized under the term GLM. The class of GLMs is an extension of the standard linear regression model. GLMs model a response variable p as linearly depending on a multivariate covariate \vec{m} through a non-linear link function, taking into account the probability distribution of p. The distribution of the response variable can be any member of the so-called exponential family (e.g., the binomial, gamma, or lognormal distribution). The standard multivariate linear models are used extensively in statistical data analysis. They assume normally distributed response variables and the identity as link function, whereas GLMs are applicable to a much wider range of data types. GLMs were originally introduced by NELDER and WEDDERBURN (1972); a detailed description is given in FAHRMEIR and TUTZ (1994).

To formulate a GLM, the assumptions of the linear regression model are relaxed, notably the assumption of a normally distributed response variable. We assume that we have a sample of size n_s of the random variable y: y_i, i = 1, \ldots, n_s, together with the large scale atmospheric data \vec{X}_i, i = 1, \ldots, n_s. For GLMs the following assumptions have to be satisfied:

Distributional assumption: The y_i are (conditionally) independent, and the (conditional) distribution of y_i belongs to a simple exponential family with (conditional) expectation E(y_i | \vec{X}_i) = \mu_i and a scale or dispersion parameter \Phi which is constant.

Structural assumption: The expectation value \mu_i is related to the covariates \vec{X}_i through the link and response functions

\eta_i = \vec{X}_i \vec{\beta},   (3.2)

with

\mu_i = h(\eta_i) = h(\vec{X}_i \vec{\beta}) \quad \text{and} \quad \eta_i = g(\mu_i) = \vec{X}_i \vec{\beta}.   (3.3)

Here h is a known, sufficiently smooth response function and g the link function, i.e., the inverse of h. Thus, a specific GLM is fully characterized by the following two components:
• the type of the exponential family
• the response or link function
The choice of an appropriate link function depends on the type of the exponential family. For each member of the exponential family of distributions there exists a so-called natural or canonical link function,

g_{can}(\mu_i) = \vec{X}_i \vec{\beta}.   (3.4)

In the case of binary responses one has to consider the binomial distribution, y_i \sim B(n, \pi_i), where y_i \in \{0, 1\}, n = 1, and \pi_i (the probability of observing the event y_i = 1) are the parameters of the binomial distribution. The canonical link function for the binomial distribution is given by the logit function

g_{can} = \log\left(\frac{\pi}{1 - \pi}\right),   (3.5)

and the logistic regression model can be written as

\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \vec{X}_i \vec{\beta}.   (3.6)

This leads to

\pi_i = \frac{\exp(\vec{X}_i \vec{\beta})}{1 + \exp(\vec{X}_i \vec{\beta})}.   (3.7)

The parameter \vec{\beta} is estimated according to the maximum likelihood principle by fitting the logistic regression model to the observed event records. An iterative algorithm known as Fisher scoring (see FAHRMEIR and TUTZ, 1994) is used.
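As an illustration of the fitting procedure (a sketch with synthetic data, not the authors' code), Fisher scoring for the logistic model can be implemented in a few lines; with the canonical logit link it coincides with iteratively reweighted least squares (IRLS):

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression (GLM with logit link) by Fisher scoring."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta                       # linear predictor eta_i = X_i beta
        pi = 1.0 / (1.0 + np.exp(-eta))      # response function, eq. (3.7)
        W = pi * (1.0 - pi)                  # binomial variance weights
        # Fisher scoring update: beta <- beta + (X' W X)^{-1} X'(y - pi)
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - pi))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic example: intercept plus one covariate, known coefficients.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5000), rng.normal(size=5000)])
true_beta = np.array([-0.5, 1.2])
y = (rng.uniform(size=5000) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic_irls(X, y)
print(beta_hat)   # close to [-0.5, 1.2]
```

In practice one would rather use an established GLM implementation; the hand-rolled update above only makes the Fisher scoring step explicit.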

3.2 Model assessment and verification

The 14 years of daily data are first divided into the four main seasons MAM, JJA, SON, and DJF. The data of the four seasons are analysed separately. Thus, the analysis is carried out on a data set that comprises about 1260 daily values for each variable.


In order to prevent overfitting, the probabilistic forecasts are derived using cross-validation (MICHAELSEN, 1987). The data set is therefore further separated into a target period that contains the daily values of a specific season, and a training period that comprises the days of that season from the remaining 13 years. The GLM parameters \vec{\beta} of the logistic regression are estimated using the data from the training period. To assess the quality of the model, the conditional probability of a threshold exceedance, \hat{\pi}_i, is calculated for each day of the target season using the estimate \hat{\vec{\beta}} of the GLM and the given covariates \vec{X}_i of the target season. This is repeated for each target season in the 14 year time sequence.

In order to assess the skill of the probabilistic forecasts, the BSS (BRIER, 1950) and ROC curves (WILKS, 1995) are used. The BSS is defined as

BSS = 1 - \frac{BS}{CS},   (3.8)

where BS is the Brier Score, defined as the mean squared difference between the predicted probabilities \hat{\pi}_i and the observations c_i,

BS = \frac{1}{N} \sum_{i=1}^{N} (\hat{\pi}_i - c_i)^2.   (3.9)

CS denotes the Brier Score of a reference forecast. Here we use the climatological forecast \hat{\pi}_c with \hat{\pi}_c = \frac{1}{N} \sum_{i=1}^{N} c_i. The BSS measures the relative improvement of the forecasts over the reference forecast. It takes values from negative infinity to one, with one representing a perfect forecast, zero indicating that the forecast is just as good as the reference forecast, and negative values indicating that the probabilistic forecast is inferior to the reference forecast.

Regarding the ROC curves, we consider the following two-category contingency table:

                        Event observed    Event not observed
Event predicted               a                   b
Event not predicted           c                   d

Here a denotes the number of hits (correct forecasts of precipitation greater than a threshold), b the number of false alarms (forecasts of precipitation when no precipitation greater than the threshold occurred), c the number of misses (forecasts of no precipitation when precipitation greater than the threshold occurred), and d the number of correct rejections (correct forecasts of no precipitation greater than the threshold). The hit rate

H = \frac{a}{a + c}   (3.10)

gives the relative number of times an event was forecast when it occurred, whereas the false alarm rate

F = \frac{b}{b + d}   (3.11)

gives the relative number of times the event was forecast when it did not occur. If, for a given contingency table, these two measures are plotted against each other, a single point results. It is most desirable that the hit rate be high and the false alarm rate be low; on the graph, the closer the point is to the upper left-hand corner, the better the forecast. We stratified the probability forecasts into 10 % wide categories. If, e.g., 70 % is chosen as a threshold for forecasting precipitation, all estimated probability forecasts greater than 0.7 are counted as forecasts. The contingency table can then be filled by comparing the forecasts to the observations, the hit and false alarm rates can be calculated, and a point can be plotted. This process is repeated for all categories (10 %, 20 %, ..., 90 %). The result is a set of points that usually forms a smooth curve, called the ROC curve. The area under the curve is a measure of forecast skill. A perfect forecast means all correct forecasts and no false alarms regardless of the threshold chosen; such a forecast is represented by a curve that runs along the left-hand side of the graph to the upper left corner and from there along the upper side, and the area under this curve is 1.0. A method with no skill (the diagonal line in the figure) has an area of 0.5, representing forecasts for which there are as many false alarms as there are hits; such a system cannot discriminate between occurrences and non-occurrences of the event. A forecast giving more false alarms than hits is considered to have negative skill. The BSS as well as the ROC curve are calculated solely from the cross-validated data.
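The BSS and the ROC construction can be sketched as follows; the synthetic, well-calibrated forecasts at the end are only for demonstration:

```python
import numpy as np

def brier_skill_score(p_hat, c):
    """BSS relative to the climatological reference forecast (eqs. 3.8-3.9)."""
    bs = np.mean((p_hat - c) ** 2)
    p_clim = np.mean(c)                       # climatological probability
    cs = np.mean((p_clim - c) ** 2)           # Brier score of the reference
    return 1.0 - bs / cs

def roc_points(p_hat, c, thresholds=np.arange(0.1, 1.0, 0.1)):
    """Hit rate H and false alarm rate F (eqs. 3.10-3.11) for each probability
    threshold; plotting F against H gives the ROC curve."""
    pts = []
    for t in thresholds:
        f = p_hat >= t                        # event predicted at this threshold
        a = np.sum(f & (c == 1))              # hits
        b = np.sum(f & (c == 0))              # false alarms
        miss = np.sum(~f & (c == 1))          # misses
        d = np.sum(~f & (c == 0))             # correct rejections
        pts.append((b / (b + d), a / (a + miss)))   # (F, H)
    return pts

# Demonstration with well-calibrated synthetic forecasts.
rng = np.random.default_rng(1)
p_hat = rng.uniform(size=2000)
c = (rng.uniform(size=2000) < p_hat).astype(int)
bss = brier_skill_score(p_hat, c)
pts = roc_points(p_hat, c)
```

A skilful forecast yields a positive BSS and ROC points above the diagonal (H > F at every threshold).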

3.3 Covariates

Until now we have not specified the composition of the covariates. We deal with two different compositions.

3.3.1 Predictors: Large scale atmospheric circulation

These covariates are taken from the NCEP re-analysis. As described in the data section, we a priori selected the relative vorticity and the vertical velocity, both at 850 hPa and at different times (00 and 12 GMT), and the relative humidity at different times (00, 06, 12, and 18 GMT) and at different heights (850 hPa and 700 hPa). From these variables we tried different composites of the covariates and chose the best fit. The composites vary from season to season.

These variables have different physical units and vary over different orders of magnitude. To give equal weight to the different variables, each predictor variable x_p is normalized using the expression

x_p(\text{normalized}) = \frac{x_p - x_{p,m}}{sd},   (3.12)

where x_{p,m} is the mean and sd is the standard deviation for each predictor, calculated from the data set.

The questions are how many predictor variables we shall use and which combinations of predictor variables are the best choice. We are looking for a few dynamically significant components of the atmospheric circulation: when too many predictors are used, the regression model will lead to overfitting. In order to quantify the optimal number of spatial degrees of freedom representing the covariates and to find the significant components, a principal component (PC) analysis of the full (normalized) data vector is used. It identifies the relevant patterns of variability by decomposing the correlation matrix into its Empirical Orthogonal Functions (EOFs) (VON STORCH and ZWIERS, 1999). In order to reduce the degrees of freedom of the covariate, only the leading PCs, being the projections of the full data vector onto the main EOFs, enter the regression. Depending on the choice of the predictor variables included, the dimension of the predictor ranges from 20 (one predictor variable at 20 grid points) to 160 (eight predictor variables at 20 grid points). The leading 20 EOFs represent more than 90 % of the total variance of the normalized covariate. Hence, we composed the multivariate covariate \vec{X} from the first 20 EOFs.

Fig. 3 shows the BSS for single predictors for all seasons; that means we composed the covariate with only one predictor variable. For threshold 0.1 mm, q850 shows the most skill, followed by ω850; q700 plays a moderate role and is not shown here. For threshold 10 mm, ω850 provides the largest skill, while ζ850 and q850 assist. For large precipitation amounts, particularly those arising from stratiform precipitation events (autumn, winter, and spring), ω850 becomes more important.

Figure 3: BSS of single predictors at all seasons for threshold 0.1 mm (top) and threshold 10 mm (bottom) for station Attendorn. The predictors are rel. vorticity (Vort.), vertical velocity (VV), and relative humidity (HUM.).

3.3.2 Predictors: Precipitation observations

As in the former case, the question is how many predictor variables shall be used. We therefore tested different numbers of rain gauge stations and found that 20 neighbouring stations ensure nearly the same information as if we took all 231 stations. Because the predictor variables consist exclusively of precipitation observations, there is no need to normalize these data. The observed values from the 20 neighbouring stations were thus directly used to calculate the probability of precipitation at the considered rain gauge station.

4 Results

4.1 Climatology

As pointed out in Section 2, we use 14 years of daily precipitation totals observed at 231 weather stations in NRW. With this time series we calculated a 14 year climatology (1986-99) for each season. Figure 4 shows the interpolated daily mean precipitation amount. For climatological considerations it is interesting to see that, particularly in winter, the low mountain ranges receive more precipitation than the other regions. Furthermore, the low precipitation mean at the 'Cologne Bay' stands out. The patterns of the daily mean precipitation amounts are similar to the patterns of the climate atlas of Germany from 1999 for the time period 1961-90 (MÜLLER-WESTERMEIER et al., 1999).

Figure 5 shows the observed frequency of precipitation exceeding 0.1 mm compared to the mean probability calculated from the estimated conditional probability of the time series (predictors are the surrounding observations). Figure 6 shows the same for threshold 10 mm. The patterns of observed frequency of precipitation and mean probability for threshold 0.1 mm as well as for threshold 10 mm are similar. For example, the low frequency of precipitation at the 'Cologne Bay' coincides with a low probability mean. The differences between the observed frequency of precipitation and the probability are small and range from 0.0 to 0.026 (threshold 0.1 mm) and from 0.0 to 0.01 (threshold 10 mm). With the large scale atmospheric circulation as predictors, the mean of the probability is similar and not shown here.
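The normalization (3.12) and the EOF truncation of the covariate can be sketched as follows; the data shapes and the synthetic low-dimensional signal are assumptions for illustration only:

```python
import numpy as np

def eof_reduce(X, n_keep=20):
    """Normalize each predictor column (eq. 3.12) and project onto the leading
    EOFs of the correlation matrix, keeping n_keep principal components."""
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)       # eq. (3.12)
    corr = np.corrcoef(Xn, rowvar=False)            # correlation matrix
    evals, evecs = np.linalg.eigh(corr)             # EOFs = eigenvectors
    order = np.argsort(evals)[::-1]                 # sort by explained variance
    evals, evecs = evals[order], evecs[:, order]
    explained = evals[:n_keep].sum() / evals.sum()  # fraction of variance kept
    pcs = Xn @ evecs[:, :n_keep]                    # leading PCs enter the GLM
    return pcs, explained

# Hypothetical example: 1260 days x 80 gridded predictor values with strong
# spatial correlation, so a few EOFs carry most of the variance.
rng = np.random.default_rng(2)
signal = rng.normal(size=(1260, 5)) @ rng.normal(size=(5, 80))
X = signal + 0.3 * rng.normal(size=(1260, 80))
pcs, frac = eof_reduce(X, n_keep=20)
print(pcs.shape)   # (1260, 20)
```

The returned PCs then replace the raw grid-point values as covariates \vec{X}_i in the logistic regression.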


Figure 4: Mean of daily precipitation [mm] at each station as interpolated field plot. Time series of about 14 years (1986–1999).

4.2 Brier skill score

4.2.1 Predictors: Large scale atmospheric circulation

The BSS derived from the modeled probabilities for the independent data set, compared to the observed precipitation occurrence at 231 stations for all four seasons, is displayed in Fig. 7 (0.1 mm threshold) and Fig. 8 (10 mm threshold). The BSS values at each station are shown as interpolated field plots for each season (predictors large scale atmospheric circulation on the left, predictors precipitation observations on the right). The black dots indicate the station positions. For all stations and all seasons, the logistic regression model performs much better than a climatological estimate (for threshold 10 mm it performs better than a climatological estimate for all but eleven stations). The sampling error of the BSS is estimated as the 97.5 % (2.5 %) quantile of a bootstrap sample (EFRON et al., 1993) of size 1000 using the cross-validated data (not shown here). The individual scores for threshold 0.1 mm range from 0.24 (±0.11) to 0.54 (±0.08) (from -0.05 (±0.25) to 0.5 (±0.11) for threshold 10 mm). The BSS of the threshold 0.1 mm (10 mm), averaged over all 231 stations, amounts to 0.46 (0.12) in spring, 0.44 (0.10) in summer, 0.46 (0.17) in autumn, and 0.44 (0.27) in winter. Only 58 cases regarding all seasons and all stations have a BSS below 0.4

(for threshold 10 mm only 27 cases have a BSS greater than 0.3, and these cases occur almost all in the winter season). In order to compare the BSS of two stations, we generated a time series of the difference values (Brier Scores) of the two stations and computed bootstrap confidence intervals on the differences (with equal samples). If the confidence intervals do not overlap, then the BSSs of these stations are statistically different (at this point we would like to thank one of the anonymous referees for this information). Based on this procedure, the confidence intervals are nearly half the size of the confidence intervals based on the sampling error of the BSS (see above).

Threshold 0.1 mm

In general, most stations show larger skill for spring, autumn, and winter than for summer. There are significant differences (with respect to the confidence intervals described above) in the BSS between the stations. This could indicate that the data quality varies strongly from station to station. For example, in autumn and winter the dark-gray boxes southwest of Kahler Asten represent station Bad Laasphe-Rueckershausen (left-hand side of Fig. 7). Its BSS deviates strongly from those of the neighboring stations. Because there are


Figure 5: Frequency of precipitation (left) and mean of probability (right), threshold 0.1 mm, at each station as interpolated field plot; predictor: precipitation observations. Time series of about 14 years (1986–1999).

three other stations with significantly greater skill close to this station, it is unlikely that some physical processes are responsible for these different results; one rather has to suspect this station. For further discussion it is important to take into account the local orography depicted in Fig. 1.

In spring, the BSS is high and very homogeneous except for the northeastern region (this is not the low


Figure 6: Frequency of precipitation (left) and mean of probability (right), threshold 10 mm, at each station as interpolated field plot; predictor: precipitation observations. Time series of about 14 years (1986–1999).

mountainous range Teutoburg Forest, but north-easterly therefrom) and for the ‘Cologne Bay’ in the south. In both regions the BSS is nearly 20 percent lower than in the other regions. For many stations, the lowest skill scores are obtained during summer season. During autumn there are high skill scores in mountainous regions

as well as in flat regions. Just like in spring, the northeastern region and the ‘Cologne Bay’ show lower skill than in the other regions. During winter, the BSS is highest only in mountainous regions (Sauerland, Rothaargebirge, and Eifel). The BSS during the autumn and winter season show strong spatial variances. It is noticeable that


Figure 7: BSS at each station as interpolated field plot; threshold: 0.1 mm. Predictors: large scale atmospheric circulation (left), precipitation observations (right). The black dots indicate the rain gauge stations.

for all seasons the scores in the region ‘Cologne Bay’ are lower than the scores in the neighbourhood. Furthermore, in winter the scores of mountainous stations are better than those of stations in the flat areas, whereas in the other seasons the differences between flat and mountainous regions are not always significant (again with respect to the confidence intervals).
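The station comparison via paired Brier scores and bootstrap confidence intervals, as described above, can be sketched as follows. This is a minimal Python illustration under our own naming; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def daily_brier_scores(p, o):
    """Daily Brier scores for probability forecasts p and binary observations o."""
    return (np.asarray(p, dtype=float) - np.asarray(o, dtype=float)) ** 2

def paired_difference_ci(bs_a, bs_b, n_boot=1000, alpha=0.05):
    """Bootstrap confidence interval for the mean difference of two stations'
    daily Brier scores over the same days (equal samples). Non-overlapping
    intervals are read as a statistically significant skill difference."""
    diff = np.asarray(bs_a) - np.asarray(bs_b)
    n = diff.size
    boot = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, n)      # resample days with replacement
        boot[i] = diff[idx].mean()
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])
```

Resampling whole days preserves the pairing of the two stations, which is what makes the difference-based interval tighter than intervals on each station's BSS separately.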

Threshold 10 mm

Fig. 8 shows the BSS values at each station as an interpolated field plot for each season (large scale atmospheric circulation as predictors on the left, precipitation observations as predictors on the right). During all seasons, the highest skill scores occur in mountainous regions (Sauerland, Rothaargebirge, and Eifel).

Figure 8: BSS at each station as interpolated field plot; threshold: 10 mm. Predictors: large scale atmospheric circulation (left), precipitation observations (right). The black dots indicate the rain gauge stations.

BSS values greater than 0.2 were only found in mountainous regions, and values greater than 0.4 only in winter. The statistical model apparently captures the stratiform character of large precipitation events in winter (and in parts of spring and autumn) better than the more

convective precipitation events in summer (and in parts of spring and autumn). This could be because events with precipitation greater than 10 mm occur less often in flat regions and in summer than in mountainous regions and in winter. Furthermore, there is evidence for both


Figure 9: ROC curves calculated from 14 mountainous stations (the figure shows the mean; station altitudes are greater than 500 m); threshold 0.1 mm (left), threshold 10 mm (right); predictors: precipitation observations. The error bars indicate the sampling error of the hit rates and the false alarm rates. The scales of the two panels differ for clarity.

thresholds 0.1 mm and 10 mm that skill scores are high where precipitation is high and, vice versa, low in regions where precipitation is low. An example of the latter is the region ‘Cologne Bay’, where precipitation is low during all seasons: the skill scores are also low.

4.2.2

Predictors: Precipitation observations

Utilizing precipitation observations as predictors leads to much greater skill than using the large scale circulation as predictors. The BSS derived from the modeled probabilities for the independent data set, compared to the observed precipitation occurrence at 231 stations, is displayed for all four seasons in Fig. 7 (0.1 mm threshold) and Fig. 8 (10 mm threshold). The BSS values at each station are shown as an interpolated field plot for each season (predictors large scale atmospheric circulation on the left, predictors precipitation observations on the right). The black dots indicate the station positions. The

sampling error of the BSS is estimated as the 97.5 % (2.5 %) quantile of a bootstrap sample of size 1000 using the cross validated data (not shown here). The individual scores for threshold 0.1 mm range from 0.34 (±0.1) to 0.73 (±0.07) (0.3 (±0.26) to 0.91 (±0.14) for threshold 10 mm). The BSS for threshold 0.1 mm (10 mm) averaged over all 231 stations amounts to 0.64 (0.63) in spring, 0.64 (0.57) in summer, 0.59 (0.71) in autumn, and 0.60 (0.72) in winter.

Threshold 0.1 mm

As in the previous section, there are significant differences in the BSS between the stations with respect to the confidence intervals. Again, station Bad Laasphe-Rueckershausen (the dark-gray boxes southwest of Kahler Asten in Fig. 7) shows the least skill for spring, autumn, and winter, with the skill in autumn and winter being exceptionally low. Nevertheless, the average BSS is nearly 25 % greater than the BSS with


Figure 10: Case studies for correct probability forecasts. Values are calculated at each station and interpolated as field plot. Probability of precipitation is on top. In the middle the observations are displayed as binary variable (1 for precipitation event, 0 for no event). At bottom the observational data [mm] are shown.

the large scale atmospheric circulation as predictor. In mountainous regions the BSS shows high skill in spring and summer and low skill in autumn and winter. Regarding the seasons separately, these results are opposite to those obtained with the large scale atmospheric circulation as predictor (left hand side). There is no correlation between mountainous regions and high BSS values (compared to flat regions). That means it does not matter whether stations receive much or little precipitation; e.g., the BSS in the Cologne Bay is not lower than in the neighbourhood. In spring the BSS

in the Eifel is high (as well as in the western part of NRW). In summer, the Sauerland and Rothaargebirge in the southeast of NRW show high skill.

Threshold 10 mm

Fig. 8 shows on the right the BSS values calculated with precipitation observations as predictors. The BSS is mostly somewhat greater than for threshold 0.1 mm, but the sampling error and the standard deviation (not shown here) are greater as well. For all seasons the BSS in the ‘Cologne Bay’ is less than the BSS


4.3

ROC curves

Fig. 9 shows ROC curves for both thresholds (0.1 mm on the left, 10 mm on the right) and for the summer (JJA) and winter (DJF) seasons. ROC curves for spring and autumn are similar and not shown here. The ROC curves are calculated from the mean of the hit rates (H) and false alarm rates (F) of 14 mountainous stations (station altitudes are greater than 500 m). The elements in the legend indicate the (H,F)-pairs calculated from the several threshold categories (10 %, 20 %, ..., 90 %), and the dotted curves are the corresponding curves. The solid line is an example of a forecast method with no skill. The sampling error of the hit rates and the false alarm rates is estimated as the 97.5 % (2.5 %) quantile of a bootstrap sample of size 1000 using the cross validated data. If elements lie in or near the upper right corner, the counts in box d (correct rejections) of the contingency table are zero or almost zero (this occurs only for threshold 0.1 mm). That indicates a false alarm rate of 1.0 or nearly 1.0, and it mostly occurs for the lowest threshold category of 10 %. In this case the probability forecast predicts precipitation greater than the specified threshold (0.1 mm) at all times. The areas under the dotted curves in Fig. 9 range from 0.93 to 0.94 for threshold 0.1 mm and from 0.92 to 0.96 for threshold 10 mm. These values are quite high and indicate very good skill. Concerning all 231 stations, the areas under the curves range from 0.85 to 0.96 for threshold 0.1 mm and from 0.83 to 0.98 for threshold 10 mm. As in the case of the BSS, the station Bad Laasphe-Rueckershausen shows low skill (0.85) compared to other stations for threshold 0.1 mm (not shown here). Note that we computed the ROC curves for precipitation observations as predictors only.
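The construction of (H,F)-pairs from the probability threshold categories can be sketched as follows (a minimal Python illustration; the function name and interface are ours, not part of the original analysis):

```python
import numpy as np

def roc_points(prob, obs, thresholds=np.arange(0.1, 1.0, 0.1)):
    """Hit rate H and false alarm rate F for a set of probability
    thresholds, as used to draw a ROC curve from a probabilistic forecast."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs).astype(bool)
    H, F = [], []
    for t in thresholds:
        warn = prob >= t                 # issue a 'warning' if p >= t
        a = np.sum(warn & obs)           # hits
        b = np.sum(warn & ~obs)          # false alarms
        c = np.sum(~warn & obs)          # misses
        d = np.sum(~warn & ~obs)         # correct rejections
        H.append(a / (a + c) if a + c else np.nan)
        F.append(b / (b + d) if b + d else np.nan)
    return np.array(H), np.array(F)
```

When no case falls into box d at a low threshold, F becomes 1.0, which is exactly the behaviour described above for the 10 % category.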

5

Case studies

In the following we show two examples of modeled probability with no apparent measurement problems and one example where an apparently erroneous observational datum could be detected. In all cases we used the precipitation observations as predictors in order to estimate the probabilities.

Figure 11: Case study for erroneous observational data. Values are calculated at each station and interpolated as field plot. Probability of precipitation is on top. In the middle the observations are displayed as binary variable (1 for precipitation event, 0 for no event). At bottom the observational data [mm] are shown.

in the neighbourhood. Stations in mountainous ranges show high BSS values, except for the Eifel in summer. This confirms the argument that the statistical model better captures the stratiform character of large precipitation events in winter (and in parts of spring and autumn) than the more convective precipitation events in summer (unlike for threshold 0.1 mm with precipitation observations as predictor).

5.1

Precipitation on August 27th 1998

The first example is August 27th 1998, using a threshold of 0.1 mm. Germany was influenced by a large low pressure system over the Baltic Sea, and a trough of this system passed Germany during this day. Figure 10 (top left) shows the estimated probability. The probabilities are below 0.4 in the western part of NRW (dark grey boxes), whereas they are significantly higher in the eastern part of NRW, with probabilities ranging from 0.6 to 0.9. The regions with no observation of precipitation (middle and bottom left) indeed coincide with the regions of low probability. Neither do we observe a station reporting precipitation in an area of low probability (based on surrounding rain observations), nor vice versa.

5.2

Strong precipitation on January 14th 1999

This is an example using a threshold of 10 mm. The weather situation on January 14th 1999 was determined by a strong westerly flow caused by a low-pressure system over the northeast Atlantic and high pressure over southern Europe. Two frontal systems passed over NRW, the first in the night from the 13th to the 14th, the second in the night from the 14th to the 15th. Such a weather situation is often associated with strong precipitation, especially in the low mountain ranges ‘Sauerland’ and ‘Rothaargebirge’. The probability of precipitation greater than 10 mm is high only for the low mountain range in the southeast of NRW (Figure 10, top right). The observations again fit well into this modeled probability structure (middle right and bottom right). Again, no apparent station anomaly can be seen.

5.3

Erroneous observation on July 17th 1999

This case study discusses an example of an erroneous observation. On that day, the weather situation was characterized by moderate high pressure over middle Europe and low pressure over the northeast Atlantic. The probability of precipitation using a threshold of 0.1 mm was very low for this day, with values less than 0.01 for the whole area. However, the station Wermelskirchen, north-east of Cologne, reported precipitation (0.2 mm) (Figure 11, middle and bottom). This observation appears at least questionable and would have been flagged by a quality control using the conditional probability as an error measure.
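A flagging rule of the kind invoked here can be sketched in a few lines. The rule, the function name, and the tolerance alpha are hypothetical illustrations of using the conditional probability as an error measure; the paper does not specify a concrete threshold.

```python
def flag_observation(p_event, observed_event, alpha=0.01):
    """Flag a daily observation as questionable when its probability
    under the statistical model falls below alpha (hypothetical rule).
    p_event: modeled probability of exceeding the precipitation threshold.
    observed_event: True if the station reported an exceedance."""
    # probability the model assigns to what was actually observed
    p_obs = p_event if observed_event else 1.0 - p_event
    return p_obs < alpha
```

For the Wermelskirchen case, a modeled probability below 0.01 combined with a reported event would yield a flag, while the surrounding dry reports would pass.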

6

Conclusion

In order to assess the likelihood of precipitation, a conditional probabilistic model for precipitation was developed. Seasonal probabilistic forecasts of 24-h precipitation totals exceeding thresholds of 0.1 mm and 10 mm are calculated using a logistic regression model. This model is similar to a MOS procedure: statistical relationships between observed data and model variables are used. Precipitation data from 231 rain gauge stations in NRW were used as predictand. Two different sets of predictor variables were applied. First, the predictors were chosen from NCEP reanalysis data and composed of the EOFs of a combined vector of relative vorticity, vertical velocity, and relative humidity; the major part of the variance of the predictor variables is represented in this multivariate covariate. Alternatively, we chose precipitation observations from nearby stations as predictor variables. Thus, the information used to assess the likelihood of precipitation is either the large scale atmospheric circulation or the precipitation observations from nearby stations, together with the observations themselves.
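The logistic regression at the core of the procedure can be sketched as follows. This is a minimal gradient-ascent fit in Python, assuming standardized covariates; names are illustrative and the authors' implementation is not reproduced here.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit P(y=1|x) = 1/(1+exp(-(b0 + x.b))) by gradient ascent on the
    Bernoulli log-likelihood; a minimal stand-in for a logistic GLM fit."""
    X = np.column_stack([np.ones(len(X)), X])   # intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)     # score of the Bernoulli GLM
    return beta

def predict_prob(beta, X):
    """Probability of threshold exceedance for covariate rows X."""
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X @ beta))
```

In the paper's setting, the rows of X would be either the leading EOF coefficients of the circulation fields or the precipitation amounts at neighbouring stations, and y the binary exceedance indicator at the station under consideration.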


The BSS and ROC curves were used as measures of skill and were calculated for each station. In general, for threshold 0.1 mm the statistical model shows significant skill at all stations according to the BSS as well as the ROC curves. For threshold 10 mm there is only significant skill if we choose the precipitation observations as predictors. If the predictors are calculated from the large scale atmospheric circulation, the model reproduces the stratiform character of precipitation in winter better than the more convective precipitation events in summer and in parts of spring and autumn. If the predictors are calculated from the neighbouring observations, the model shows greater skill but no significant differences between the seasons. Overall, the results show that the probabilistic forecast model is quite reliable in representing the observations in all seasons. An obvious idea is to combine both predictors, the large scale atmospheric circulation and the precipitation observations, in order to obtain a more powerful covariate; since these are independent, they ought to lead to still better probabilistic models. However, the combination did not lead to a better probabilistic model: the BSS values are not significantly different from the BSS values calculated using only the large scale atmospheric circulation. The reason why the precipitation data perform better as predictors is certainly the high station density of the precipitation observations. We tested the performance with a lower station density, which confirmed this presumption. Furthermore, instead of taking the nearest neighbours, we selected stations with the same elevation as the station considered. However, this did not improve the results; on the contrary, the skill got worse.

Outlook

As the case studies have shown, the model is able to detect possibly erroneous data. This must be verified in general.
The next step has to be the enhancement of the model, so that a quantitative quality control of the observations is possible, i.e., rating the likelihood of the observations given the estimated probabilities of the model. The procedure for estimating rainfall probability from large scale atmospheric flow fields will be implemented as the basic QC. The advanced QC method, where the conditional probability of a certain observation given its neighbouring observations is evaluated, will form the basis for a probabilistic space-time modeling of high resolution precipitation observations. Whereas the QC is applied to a specific observation, we will introduce an additional hidden observational data point, e.g. a grid point. The exceedance of a fixed threshold at this unobserved point is a Bernoulli trial with a certain probability which can be calculated, given the surrounding observations, similarly to the QC case by maximizing the likelihood of a generalized linear model. Unfortunately,


the probability, and indeed the Bernoulli trials themselves, at the hidden grid point are unknown. However, this is the typical setup for the Expectation-Maximization (EM) algorithm (DEMPSTER et al., 1977): given the surrounding observations and a first guess of the regression coefficients for the GLM, a first guess for the threshold exceedances/events of the Bernoulli trial can be calculated (expectation). From these data a new GLM can be estimated by maximizing the likelihood (maximization), which can be used to return to the expectation step. Upon convergence this provides, at any grid point, a probability for exceeding the given threshold (including zero) as a function of the surrounding observations. The EM algorithm is widely used in statistics. In meteorology there are a couple of applications, e.g. ALAVI et al. (2006), who filled gaps in measurements of latent heat fluxes. They compared several gap-filling methods including the EM algorithm; their results demonstrated that a Kalman filter approach provides a closer approximation of the original data and introduces smaller errors than the EM algorithm. Another example is the analysis of incomplete climate data (SCHNEIDER, 2001), where it is shown that a modified EM algorithm leads to more accurate estimates of missing surface temperatures than a conventional non-iterative imputation technique. The application of the quality control procedure to numerical weather prediction output in real time implies the use of precipitation observations as predictors, because the NCEP reanalysis data are not available in real time. Alternatively, it is possible to use short-range forecasts as predictor variables; according to MOS procedures, these forecasts have to be available for an appropriate time period in order to obtain a training data set. Another feasible application could be to assist in the construction of a gridded dataset of hourly precipitation in Germany (PAULAT et al., 2008).
Thereby a quality control of 24-hour precipitation observations could be useful to flag questionable measurements.
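The EM iteration for a hidden grid point can be sketched for a toy, one-covariate case. This is our own illustrative variant (using a soft expected label in the E step rather than hard event assignments), not the authors' implementation; all names are hypothetical.

```python
import numpy as np

def _fit_logistic(X, y, lr=0.1, n_iter=3000):
    """Minimal logistic GLM fit by gradient ascent (soft labels allowed)."""
    X = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta += lr * X.T @ (y - p) / len(y)
    return beta

def _prob(beta, X):
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X @ beta))

def em_hidden_point(X_obs, y_obs, x_hidden, n_em=20):
    """EM sketch: estimate the exceedance probability at an unobserved
    grid point from surrounding observations via a logistic GLM.
    E step: expected event indicator at the hidden point under the
    current model; M step: refit the GLM including that soft label."""
    X_obs = np.asarray(X_obs, dtype=float)
    y_obs = np.asarray(y_obs, dtype=float)
    beta = _fit_logistic(X_obs, y_obs)        # first guess from observed data
    for _ in range(n_em):
        q = _prob(beta, x_hidden[None, :])[0]  # E step: expected exceedance
        X_aug = np.vstack([X_obs, x_hidden])
        y_aug = np.append(y_obs, q)            # soft label replaces 0/1
        beta = _fit_logistic(X_aug, y_aug)     # M step: maximize likelihood
    return _prob(beta, x_hidden[None, :])[0]
```

Upon convergence, the returned value plays the role of the exceedance probability at the unobserved point described in the outlook.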

References

ALAVI, N., J.S. WARLAND, A.A. BERG, 2006: Filling Gaps in Evapotranspiration Measurements for Water Budget Studies: Evaluation of a Kalman Filtering Approach. – Agricult. Forest Meteor. 141, 57–66.
APPLEQUIST, S., G.E. GAHRS, R.L. PFEFFER, 2002: Comparison of Methodologies for Probabilistic Quantitative Precipitation Forecasting. – Wea. Forecast. 17, 783–799.
BEHRENDT, J., 1992: Dokumentation ROUTKLI: Beschreibung der Prüfkriterien im Programmsystem QUALKO. – Deutscher Wetterdienst, Abteilung Klimatologie, Offenbach a.M., Germany, 12 pp.
BRIER, G.W., 1950: Verification of forecasts expressed in terms of probability. – Mon. Wea. Rev. 78, 1–3.
CHANDLER, R.E., 2005: On the use of generalized linear models for interpreting climate variability. – Environmetrics 16, 699–715.


CLAUSSNITZER, A., P. NÉVIR, I. LANGER, E. REIMER, U. CUBASCH, 2008: Scale-dependent analyses of precipitation forecasts and cloud properties using the Dynamic State Index. – Meteorol. Z. 17, 813–825.
DEMPSTER, A.P., N.M. LAIRD, D.B. RUBIN, 1977: Maximum likelihood from incomplete data via the EM algorithm. – J. Roy. Stat. Soc., Series B, 39, 1–38.
EFRON, B., R.J. TIBSHIRANI, 1993: Cross Validation and other estimates of prediction error. In: An Introduction to the Bootstrap. – Monographs on Statistics and Applied Probability 57, Chapman & Hall, Chapter 17, 237–257.
FAHRMEIR, L., G. TUTZ, 1994: Multivariate Statistical Modelling Based on Generalized Linear Models. – Springer-Verlag, New York.
FORLAND, E.J., P. ALLERUP, B. DAHLSTRÖM, E. ELOMAA, T. JONSSON, H. MADSEN, J. PERÄLÄ, P. RISSANEN, H. VEDIN, F. VEJEN, 1996: Manual for Operational Correction of Nordic Precipitation Data. – Norwegian Meteorological Institute, Oslo, 66 pp.
FREI, C., C. SCHÄR, 1998: A Precipitation Climatology of the Alps from High-Resolution Rain-Gauge Observations. – Int. J. Climatol. 18, 873–900.
—, —, 2001: Detection Probability of Trends in Rare Events: Theory and Application to Heavy Precipitation in the Alpine Region. – J. Climate 14, 1568–1584.
FRITSCH, J.M., R.E. CARBONE, 2004: Improving Quantitative Precipitation Forecasts in the Warm Season: A USWRP Research and Development Strategy. – Bull. Amer. Meteor. Soc. 85, 955–965, DOI: 10.1175/BAMS-85-7-955.
GANDIN, L.S., 1993: Optimal Averaging of Meteorological Fields. – National Meteorological Center, Office Note 397, 68 pp.
GLAHN, H.R., D.A. LOWRY, 1972: The Use of Model Output Statistics (MOS) in Objective Weather Forecasting. – J. Appl. Meteor. 11, 1203–1211.
GÖBER, M., C.A. WILSON, S.F. MILTON, D. STEPHENSON, 2004: Fairplay in the Verification of Operational Quantitative Precipitation Forecasts. – J. Hydrol. 288, 225–236.
GROEHN, I., R. STEINACKER, C. HÄBERLI, W. PÖTTSCHACHER, M. DORNINGER, 2000: Data Quality Control of MAP (DAQUAMAP). – MAP Newsletter 13, 6–8.
HENSE, A., G. ADRIAN, CH. KOTTMEIER, C. SIMMER, V. WULFMEYER, 2006: The German Priority Program SPP1167 PQP Quantitative Precipitation Forecasting: an overview. – 2nd International Symposium on Quantitative Precipitation Forecasting (QPF) and Hydrology, Boulder, CO, USA, June 4–8, 2006.
KALNAY, E., M. KANAMITSU, R. KISTLER, W. COLLINS, D. DEAVEN, L. GANDIN, M. IREDELL, S. SAHA, G. WHITE, J. WOOLLEN, Y. ZHU, A. LEETMAA, R. REYNOLDS, M. CHELLIAH, W. EBISUZAKI, W. HIGGINS, J. JANOWIAK, K.C. MO, C. ROPELEWSKI, J. WANG, R. JENNE, D. JOSEPH, 1996: The NCEP/NCAR 40-year Reanalysis Project. – Bull. Amer. Meteor. Soc. 77, 437–471.
KRIGE, D.G., 1981: Lognormal-de Wijsian Geostatistics for Ore Evaluation. – South African Institute of Mining and Metallurgy Monograph Series: Geostatistics 1, Johannesburg, 51 pp.


MICHAELSEN, J., 1987: Cross-validation in statistical climate forecast models. – J. Climate Appl. Meteor. 26, 1589–1600.
MÜLLER-WESTERMEIER, G., A. WALTER, E. DITTMANN, 1999: Klimaatlas Bundesrepublik Deutschland, Teil 1. – Deutscher Wetterdienst.
NELDER, J., R.W.M. WEDDERBURN, 1972: Generalized Linear Models. – J. Roy. Statist. Soc. A 135, 370–384.
PAULAT, M., C. FREI, M. HAGEN, H. WERNLI, 2008: A gridded dataset of hourly precipitation in Germany: its construction, climatology, and application. – Meteorol. Z. 17, 719–732.
RUBEL, F., 1996: Scale dependent statistical Precipitation Analysis. – Proceedings of the International Conference on Water Resources & Environmental Research: Towards the 21st Century, Kyoto, Japan, Volume I, 317–324.
RUBEL, F., M. HANTEL, 1999: Correction of daily rain gauge measurements in the Baltic Sea drainage basin. – Nordic Hydrol. 30, 191–208.
SCHNEIDER, T., 2001: Analysis of Incomplete Climate Data: Estimation of Mean Values and Covariance Matrices and Imputation of Missing Values. – J. Climate 14, 221–232.
VON STORCH, H., F.W. ZWIERS, 1999: Statistical Analysis in Climate Research. – Cambridge University Press, 484 pp.
WILKS, D., 1995: Statistical Methods in the Atmospheric Sciences. – Academic Press, San Diego.