Seismic Traces. Well Logs sand gravel sand gravel

Obje tive Bayesian Comparison of Lapla e Samples from Geophysi al Data ABEL RODRIGUEZ Universidad Simon Bolvar, Venezuela abel esma.usb.ve  GI...
Author: Scot Barrett
3 downloads 2 Views 227KB Size
Obje tive Bayesian Comparison of Lapla e Samples from Geophysi al Data ABEL RODRIGUEZ

Universidad Simon Bolvar, Venezuela abel esma.usb.ve

 GISELLE ALVAREZ

Universidad Simon Bolvar, Venezuela giselle esma.usb.ve

 BRUNO SANSO

University of California Santa Cruz, USA brunoams.u s .edu

SUMMARY We onsider a Bayesian referen e analysis for omparing two samples whi h follow Lapla e distributions as well as the analysis of the performan e of a Bayesian lassi ation pro edure for su h a model. The methodology is motivated by an oil exploration problem where the geologi al stru ture of a reservoir has to be des ribed. The aim is to hara terise the lithology of the reservoir using well registers and seismi tra es. We use a ontinuous wavelet transformation (CWT) of the original data and estimate the power of the transformation that links the energy of the CWT to its s ale. For both well logs and seismi tra es, we found that su h an estimate follows a Lapla e distribution with lo ation depending on whether the lithology orresponds to gravel or sand. In this work we explore the omparison problem from an obje tive Bayesian viewpoint through the use of the Intrinsi Bayes Fa tor (IBF).

Keywords:

LAPLACE DISTRIBUTIONS, INTRINSIC BAYES FACTORS, BAYESIAN

.

CLASSIFICATION

1. INTRODUCTION In this paper we onsider the problem of omparing two populations of independent samples from Lapla e or double exponential distributions using a non subje tive Bayesian approa h. In parti ular, we employ the Intrinsi Bayes Fa tor (IBF) presented by Berger and Peri

hi (1996). We also onsider the impli ations of the results obtained in this omparison on the behaviour of a Bayesian lassi ation algorithm. This study is motivated by an oil exploration problem where the geologi al stru ture of a reservoir has to be des ribed using seismi tra es and -ray logs of wells. This kind of data is shown to have a Lapla e distribution under suitable transformations whi h in lude the use of ontinuous wavelet transformations (CWT).

2

 A. Rodrguez, G. Alvarez and B. Sanso

However, the results presented here have mu h broader impli ations. Lapla e distributions have been proposed as a way to obtain statisti al pro edures that are more robust to outliers than those based on normal distributions. Some interesting appli ations are those of Manly (1976) who uses the Lapla e distribution to model tness fun tions, Easterling (1978) who onsiders a model for steam generator inspe tion as exponential responses with Lapla e errors and Bag hi et al. (1983) who use the Lapla e distribution while modeling demand during lead time for slow-moving items. The paper is organised as follows: rst, we des ribe the motivating problem and the data at hand. Then we pose the identi ation of di eren es in the lithology as a

omparison of models based on Lapla e distributions, obtain the predi tive distributions required for the al ulation of the IBF and perform those al ulations for our problem. Finally, we show a Bayesian approa h that an be used for the lassi ation of new observations and provide ROC urves that an be used to asses the mis lassi ation probability of the method if the existing data are used as the training sample. 2. DESCRIPTION OF THE DATA Seismi tra es and -ray logs are two types of data ommonly available in the area of geophysi al exploration. Well logs are signals of high frequen y re orded at well lo ations that an be helpful to di erentiate among ro k types. In our study we analyse a type of radioa tivity log, the -ray log. However, well logs are only available at lo ations where wells have been drilled. Previous to drilling at a spe i lo ation, an exhaustive seismi exploration of the reservoir is done. Seismi tra es are signals of low frequen y, dete ted at the surfa e, that register the motions aused by vibratory sour es in the earth. These re ords are generally available in the whole area and not only at the wells. Our data onsist of -ray logs for 17 wells, 9 of them known to be lo ated in lithologies that are predominantly gravel and the rest known to be lo ated in lithologies that are predominantly sand. The wells were drilled in a reservoir lo ated in Western Venezuela, overing an area of 100 Km2 for whi h seismi data are available. We are interested in dete ting di eren es in the well logs and seismi tra es arising from the lithologies and, in ase these di eren es exist, use the seismi data as training samples for a Bayesian lassi ation algorithm. 3. A MODEL FOR THE ENERGY OF SIGNALS The appli ation of statisti al methods on -ray logs and seismi tra es for hara terisation of oil reservoir lithology has been an uprising idea in the last years. Some examples are the works of Addy (1998) where neural networks are employed or that of Dumay and Fournier (1988), who apply luster and fa tor analysis. Both works base their methods on the raw data. A di erent approa h, presented on Jimenez et al. (1999), treats these signals as \times series" indexed on depth, whi h allow the authors to perform a dis rete wavelet de omposition of the signals. However this method has redu ed dis rimination power sin e the dis rete transformation provides only a limited number of points for low

Obje tive Bayesian Comparison of Lapla e Samples from Geophysi al Data

3

frequen ies. To solve this we use the ontinuous wavelet transform, whi h is de ned for any square integrable fun tion f (t) as a fun tion of two variables (in our ase, depth and frequen y), Z F (a; b) = f (x) a;b(x) dx where, a;b (x) denotes the onjugate of a;b (x). The fun tion a;b (x) is obtained by translations and dilations of the mother wavelet , that is a;b (x) = xa b , for details see Vidakovi (1999). We use the Haar's base in the appli ation we present in this paper, but other options (like the \Mexi an hat" and the Gaussian) have provided similar results. There is a link between the energy of F and the regularity of f . In fa t, if f 2 C (Holder spa e with exponent ) then the lo alised energy is proportional to a power of the s ale, that is, (F (a; b))2 / a2 +1 : (1) Thus, after applying the CW T to the original signal, an estimate of is obtained for ea h depth b by regressing the points logf[F (a; b)℄2g on log(a). This pro edure, presented originally in Katul et al. (1999), an be onsidered as a generalisation of the one proposed in Jimenez et al. (1999). Well Logs

0.4 0.35

0.35

sand gravel

sand gravel

0.3

0.3

0.25

0.25

0.2

0.2

0.15

0.15

0.1

0.1

0.05

0.05 0 −10

Seismic Traces

0.4

−5

α

0

5

0 −10

−5

α

0

5

Figure 1. Empiri al distribution of the power parameter . On the left panel the estimated densities of the oeÆ ients orresponding to sand and gravel wells orresponding to -ray logs. An analogous plot is shown on the right panel for seismi tra es around the same wells.

If we assume that there is no hange in the lithology with depth, the values of just obtained an be though as an iid sample of the energy of the signal. Now, for the data at hand, this distribution happens to have a peaky density that resembles the Lapla e distribution (see gure 1). To assess the validity of this assumption we

onsidered the more general family of power exponential distributions presented in Box and Tiao (1973), de ned by  2  i  1+ 1 p( i j ; ; ) = w( ) exp ( ) 

 A. Rodrguez, G. Alvarez and B. Sanso

4

where

( ) =



[ 23 (1 + )℄ [ 12 (1 + )℄



1 1+



1

w ( ) =

[ 23 (1 + )℄ 2

:

(1 + )f [ 12 (1 + )℄g 2 3

In this de nition 1 <  < 1 and 0 <  < 1 are, respe tively, lo ation and s ale parameters, while 1 <  1 is a measure of kurtosis. Changes on produ e a wide range of shapes, from smooth densities like the normal ( = 0) to peaky ones, like the Lapla e ( = 1). We onsider independent referen e priors for ea h parameter to obtain the posterior distribution of ;  and . Figure 2 shows the marginal distribution of in our ase. We observe a high on entration around one whi h on rms our sele tion of the Lapla e distribution. Sin e this same behavior has also been present in resistivity logs and in data obtained from di erent lo ations, we are tempted to assume that this is a general distributional result for these types of signals. Logs in Gravel

Logs in Sand 0.8

0.7

0.7 0.6

0.6 0.5

0.5

p(β |y)

p(β |y)

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0

0.1

0 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0

0.1

0.2

0.3

0.4

0.6

0.7

0.8

0.9

1

0.9

1

Seismi Tra es in Sand 0.8

0.7

0.7

0.6

0.6

0.5

0.5

p(β |y)

p(β |y)

Seismi Tra es in Gravel 0.8

0.4

0.4

0.3

0.3

0.2

0.2

0.1

0

0.5

β

β

0.1

0

0.1

0.2

0.3

0.4

0.5

β

0.6

0.7

0.8

0.9

1

0

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

β

Plot of  ( j ) (up to a multipli ative onstant) for -ray logs (upper ouple) and seismi tra es (lower ouple) for the two lithologies involved in our problem. In all ases = 1 is learly favoured by the data.

Figure 2.

Conditional on = 1, the posterior mean for  and  for ea h lithology an be seen on table 1.

Obje tive Bayesian Comparison of Lapla e Samples from Geophysi al Data Table 1.

5

Estimated parameters for gravel and sand in well logs (left) and seismi tra es (right). Well Logs

Seismi Tra es

Lithology

E( j )

E( j )

n

Sand Gravel

-0.6500 -2.4550

1.5543 1.5953

4096 4608

Lithology

E( j )

E( j )

n

Sand Gravel

-0.8000 -2.4002

1.7013 1.7300

1024 1152

4. LITHOLOGY DISCRIMINATION It is apparent from Figure 1 and Table 1 that the lo ation of the densities hange with the lithology, pointing at the possibility of posing the problem of identifying di eren es between lithologies as one of omparing Lapla e populations. Indeed, it is important to verify if the di eren e is just on the mean or is also due to the varian e. Let x 2 Rnx be the ve tor of power oeÆ ients for -ray logs lo ated in gravel and y 2 Rny the ve tor of power oeÆ ients for logs in sand. Let z 0 = (x0 ; y0 ) and n = nx + ny . Based on the previous results we assume that xi  La(xi j 1 ; 1 ) i = 1; : : : ; nx and yi  La(xi j 2 ; 2) i = 1; : : : ; ny . So we need to onsider the following hypotheses M1 : 1 = 2 =  M2 : 1 6= 2 M3 1 = 2 =  M4 : 1 6= 2 1 = 2 =  1 = 2 =  1 6= 2 1 6= 2 where M2 , M3 or M4 imply that power oeÆ ients al ulated from signals that orrespond to di erent lithologies have di erent Lapla e distributions. This result ould be used for for lassi ation purposes. 5. MODEL COMPARISONS 5.1 The Intrinsi Bayes Fa tor

Denote by (i) M1 ; : : : ; Mk the models under onsideration. (ii) fi (z j i) the density of z under model Mi, i 2 i . (iii)  (i) the prior distribution of i and pi the prior probability of model Mi. (iv) mj (z ) the marginal density of z under Mj . The key to Bayesian model sele tion are the Bayes fa tors: R j fj (z j j )j (j )dj mj (z ) R Bji = = m i (z ) i fi(z j i)i (i)di

whi h an be ombined with the prior probabilities pi to get posterior probabilities for ea h model given the data 2

p(Mi j z ) = 41 +

X pj

j= 6 i pi

3

Bji(z )5

1

:

 A. Rodrguez, G. Alvarez and B. Sanso

6

In our ase we have k = 4 di erent models, ea h asso iated with one of the hypotheses presented in the previous se tion. The orresponding likelihoods, under the simplifying assumption of independen e, are

p hPnx xi  Pny yi  io 2 i  + i  P io  nx ny n p hP 2 ni x xi  + ni y yi  f (z j  ;  ;  ) = p  exp io  nx   ny n p hP nx xi  + Pny yi  p f (z j ;  ;  ) = p  exp 2 i i   P io  nx   ny n p hP nx xi  + ny yi  : p 2 f (z j  ;  ;  ;  ) = p  exp i i    

f1 (z j ;  ) = p12

2

1

3

4

1

1

2

1

1

2

2

2

nx +ny

exp

n

=1

=1

+

1

=1

2

1

1

2 1

2 2

1

1

2 1

2 2

=1

2

=1

=1

1

1

=1

1

=1

2

2

2

On the other hand, given the nature of our problem, we have very little prior knowledge about the power oeÆ ients. Thus, we assign equal prior probabilities to all models involved (that is, pi = 1=4 8 i) and use Je rey's priors for the parameters; these orrespond to 1 (;  ) / 1 , 2 (1 ; 2 ;  ) / 12 , 3 (; 1; 2 ) / 112 and:4 (1 ; 2 ; 1; 2 ) / 112 . However, Bayes fa tors are not well de ned when improper prior distributions are used. So, we resort to the Intrinsi Bayes Fa tor (IBF), whi h uses a minimal subset of the sample to train the prior distribution (making it proper) and the rest of the sample for model omparison. The dependen e on the training sample is eliminated by taking averages over all possible samples. A omplete des ription an be found in Berger and Peri

hi (1996). Let z (l) be the l-th minimal training sample (in our ase it is formed by 2 observations from x and 2 observations from y ) and let L be the total number of training samples. The Arithmeti Intrinsi Bayes Fa tor (AIBF) is de ned as

BjiAI (z ) = BjiN (z )

L 1X B N (z (l)) L l=1 ij

and the Geometri Intrinsi Bayes Fa tor (GIBF) is de ned as

BjiGI = BjiN (z )

L Y l=1

!1 L

BijN (z (l))

:

We hose the GIBF for our appli ation sin e it is oherent for multiple omparisons. 5.2 Predi tive distributions for the Lapla e likelihood In order to al ulate the Bayes fa tor we need to obtain the predi tive distributions for ea h of the models. These are provided by the following propositions. Their proves are rather te hni al and are omitted for reasons of spa e. Proposition 1.

The predi tive distribution for model M1 is: n (n 1) X m1 (z ) = w 2n i=0 i

Obje tive Bayesian Comparison of Lapla e Samples from Geophysi al Data with

wi =

8 > >
> : (n + 1)

1

i 6= n2

1

Sin+11

z( n +1) z( n ) 2 2



1

S nn

i = n2

2

where z(i) denotes the i-th order statisti of z and Si = nz(n) P with z(k) = k 1 ki=1 z(k) and z(0) = 0. Proposition 2.

7

2iz(i) + 2iz(i) nz(n)

The predi tive distribution for model M2 is: ny

nx X (n 1) X m 2 (z ) = w 2n i=0 j =0 ij

with8

! > > 1 1 1 1 1 > > n 1 + Qn 1 > (2i nx )(2j ny ) > Qin+11;j +1 Qin+11;j Qi;j > +1 i;j3 > 2 0 1 > >   > > > > 1 A5 > y ny +1 y ny  4 2ni n1x  Qn1n > Qn ny < y 2 2 i; 2 i+1; 2 wij = h !# " i > > > 1 > x nx x( nx ) 2jn n1y Qnn1 > +1) Qnnx ( > 2 2 > x ;j ;j +1 > > 2  2  >  > > 1 >   > x( nx +1) x( nx ) Qn+1 n(n 1) y ny +1 y ny > > 2 2 2 2 : nx ; ny 2 2 y x x where Qi;j = Si + Sj , Si = nx x (nx) 2ix(i) + 2ix(i) nx x(nx) 2j y(j ) + 2jy(j ) ny y(ny ) . Proposition 3

i 6= n2x , j = n2y i = n2x , j 6= n2y i = n2x , j = n2y and Sjy = ny y(ny )

The predi tive distribution for model M3 is:

n (nx) (ny ) X m 3 (z ) = 2n i=0

where Rx () =

i 6= n2x , j 6= n2y

Pnx j =1

j

Z z i+1

zi P y xj and Ry () = nj =1

j

d

Rxnx ()Ryny ()

j  yj j .

A losed form expression of m3 (z ) an be obtained using partial fra tions, nevertheless, the numeri al error involved in solving the linear equations asso iated with partial fra tions for large nx and ny is larger than that of using numeri al integration, whi h is thus our preferred method. The predi tive distribution for model M4 an be written in terms of the predi tive distribution of M1 in the following way:

Proposition 4

m4 (z ) = m1 (x)m1 (y):

 A. Rodrguez, G. Alvarez and B. Sanso

8

5.3 Results Due to the sample sizes involved, the number of training samples L is in the order of 1012 for seismi tra es and 1014 for well logs. Sin e this makes the al ulations infeasible in terms of omputing time, we randomly hose 12,000 training samples. Table 2.

Model probabilities obtained using the GIBF

P (Mi j z)

Well Logs

Seismi Tra es

1 2 3 4

8:42  10 11 0:97 8:42  10 11 0:02

2:71  10 7 0:96 3:38  10 7 0:03

Table 2 shows the posterior probabilities of ea h model using the GIBF. The numbers learly indi ate that there is a di eren e between the lo ation of the distribution of power parameters obtained in gravel and those obtained in sand. We note that the

al ulation time using Fortran ode on a Pentium IV omputer was under 20 se onds for seismi tra es and under 6 minutes for well logs. The al ulations were repeated a dozen times to study its sensitivity to the training samples, showing no important hanges an thus no dependen e on the redu ed set of training samples employed. In reasing the number of training samples did not substantially hange the results. 6. BAYESIAN CLASSIFICATION The predi tive densities obtained in the previous se tion an be used to obtain a Bayesian lassi ation pro edure. Let w be a new sample of power oeÆ ients whose lithology is unknown. Using Bayes rule we an al ulate the probability of ea h type of lithology as:

P (sand) = where

m(w j x) m(w j x) + m(w j y)

P (gravel) =

m(w j y) = 1 P (sand) m(w j x) + m(w j y)

m (w ; y ) m1 (w; x) m(w j y) = 1 m1 (x) m1 (y) Then, we lassify the lithology of w as sand if P (sand) > P (gravel) and as gravel in the other ase. In order to quantify the mis lassi ation error we need to al ulate the distribution of a random variable, say B , de ned as m(w j x) =

B = log [m(W j X )℄ log [m(W j Y )℄ whi h depends on the distribution of X , Y and W . If we assume that there is a true model for the data, the following simpli ation holds:

Obje tive Bayesian Comparison of Lapla e Samples from Geophysi al Data

9

Proposition 5. Let X = (X1 ; : : : ; Xnx ), Y = (Y1 ; : : : ; Yny ), W = (W1 ; : : : ; Wnw ) with Xi  La(Xi j 1 ; 1 ), Yi  La(Yi j 2; 2 ) and Wi  La(Wi j 1 ; 1 ). The distribution of B depends on 1 , 2 , 1 and 2 only through the value of

r=

1 2

jp  j :

d=

and

1

2

1 1 + r 2

1.0 0.0

0.5

Median 80% limit

Normalized difference between densities

1.5

The proof of this proposition an be found in the appendix. It an be seen that Proposition 5 holds not only for the lassi ation of Lapla e distributions but for any other lo ation-s ale family. We shall refer to d as the normalized distan e between densities. Obtaining an analyti al expression for the distribution of B as a fun tion of d and r is a diÆ ult task. However, for the purpose of our example we have used simulation to obtain random samples that orrespond to r = 1, whi h is the ase favoured by our IBF

al ulations, and to the same sample sizes as that of the seismi tra es. Plotting the median and 80% quantile of the mis lassi ation probability P (B < 0), we obtained what an be onsidered as a Bayesian variant of the ROC urve.

0.0

0.2

0.4

0.6

0.8

1.0

Missclasification probability

Figure 3.

Simulated response urse for lassi ation of seismi tra es with r = 1

The resulting urve is presented in Figure 3. From table 1 we see that d  0:65, whi h implies that our method produ es a negligible lassi ation error. 7. CONCLUSIONS We have used the Intrinsi Bayes Fa tor to ompare two Lapla e models in the absen e of prior knowledge for the parameters. Expli it al ulations of the terms involved have been performed, obtaining formulae for the marginals under all the onsidered models whi h an be used to obtain fast and a

urate results in a broad set of appli ations. Also, we presented results that justify the onstru tion of Bayesian ROC urves for the

lassi ation of samples in lo ation-s ale families.

 A. Rodrguez, G. Alvarez and B. Sanso

10

As an appli ation, we have provided theoreti al ba kground for the implementation of Bayesian methods in oil reservoir exploration problems, showing that there are substantial di eren es between the power oeÆ ients of both well logs and seismi tra es obtained in sand and in gravel. This empiri al fa t raises an interesting physi al theoreti al question. We also prove that, despite the redu ed sample sizes, lithology 

lassi ations based on seismi tra es instead of -ray logs are a

urate. Alvarez et al. (2001) have implemented similar ideas with great su

ess using BIC. REFERENCES

Addy, S. K. (1998). Neural network lassi ation method helps seismi interval interpretation. Oil and Gas, 47{58.  Alvarez, G., Sanso, B., Jimenez, J. R., Mi helena, R. (2001). Lithologi hara terization of a Reservoir Using Continuous Wavelet Transforms. Te h. Rep., CESMa, Universidad Simon Bolvar. 2001-03. Bag hy, U. and Hayya, J. C. and Ord, J. K. The Hermite distribution as a model of demand during lead time for slow-moving items. De ision S ien es 14, 447{466. Berger, J. O. and Peri

hi, L. R. (1996). The Intrinsi Bayes Fa tor for Model Sele tion and Predi tion. J. Roy. Statist. So . A 16, 109{122. Box, G. E. P. and Tiao, G. (1973). Bayesian Inferen e in Statisti al Analysis. New York: Wiley. Dumay, J. and Fournier, F., (1998). Multivariate statisti al analyses applied to seismi fa ies re ognition. Geophysi s 53, 1151{1159. Easterling, R. G. (1978). Exponential responses with double exponential measurement error-A model for steam generator inspe tion. Pro eedings of the DOE Statisti al Symposium, U.S Department of Energy, 90{110. Jimenez, J. R.,Peinado, A. and Mi helena, R. (1999). Fa ies re ognition using wavelet based fra tal analysis on ompressed seismi data. Te h. Rep., Intevep, Venezuela. Katul, G., Vidakovi , B. and Albertson, J. (1999). Estimating global and lo al s aling exponents in turbulent ows using ontinuous wavelet transformations. Te h. Rep., ISDS, Duke University. 24. Manly, B. (1976). Some examples of double exponential tness fun tions. Heredity 36, 229{234. Vidakovi , B. (1999). Statisti al Modeling by Wavelets.New York: Wiley.

APPENDIX 1. PROOF OF PROPOSITION 5 Proposition 5 is a straight orollary the following theorem: 

Let f (x j ;  ) = 1 g x   for a given density g , that is, f belongs to a lo ation-s ale family. Let Xi  f (Xi j ;  ), Yi  f (Yi j  + d; r ) and Wi  f (Wi j ;  ) for r and d xed and Xi ; Yi ; Wi  g . Let B = log [m(W j X )℄ log [m(W j Y )℄ and B  = log [m(W  j X  )℄ log [m(W  j rY  + d)℄ where m() denotes the predi tive density. Then B and B  have the same distribution. Proof. Noti e that Xi ; Yi ; Wi are standarised versions of Xi; Yi; Wi . Thus we have Theorem 1

the following relationships for the predi tive densities  nw    Z nx 1Y 1 Wi  1 Xi  Y dd  n g g  i=1     1 w i=1 m(W j X ) = = m(W  j X  )   Z nx Y r 1 1 Xi  dd g  i=1  

Obje tive Bayesian Comparison of Lapla e Samples from Geophysi al Data Z

m(W j Y ) =

ny



1 Y 1 Yi  g  i=1   Z

ny Y

 nw Y





1 Yi  1 dd g  i=1   



1 Wi  dd g   i=1



11

=

  nw

1 r

m(W  j rY  + d)



m(W j X ) j X) thus mm((W W j Y ) = m(W  j rY +d) . The former implies that the moment generating fun tions of B and B are equal, in fa t

B (t) = Z 

Z 



m(W j X ) t f (X ; Y ; W )dX dY dW m(W j Y ) 

m(W  j X  ) t g (X  ; Y  ; W  )dX  dY  dW  = B  (t); = m(W  j rY  + d) whi h implies that both variables follow the same distribution. ACKNOWLEDGEMENTS We spe ially thank Reinaldo Mi helena and Juan Ramon Jimenez from INTEVEP for their support and ollaboration. This resear h was partially funded by grant G97000592 of CONICIT, Venezuela.

Suggest Documents