
Max-linear models on directed acyclic graphs

Nadine Gissibl∗ and Claudia Klüppelberg

arXiv:1512.07522v1 [math.PR] 23 Dec 2015

Technical University of Munich

Abstract: We consider a new structural equation model in which all random variables can be written as a max-linear function of their parents and independent noise variables. For the corresponding graph we assume that it is a directed acyclic graph. We show that the model is max-linear and detail the relation between the weights of the structural equation model and the max-linear coefficients. We characterize all max-linear models which are generated by this structural equation model. This leads to the presentation of a max-linear structural equation model as the solution of a fixed point equation and to a unique minimal DAG describing the relationships between the variables. The model structure introduces an order between the random variables, which yields certain model reductions, represented by subgraphs of the DAG which we call order DAGs. This results also in a reduced form for the regular conditional distributions compared to previous representations.

Primary 60G70, 60E15, 05C20; secondary 05C75.

Keywords and phrases: directed acyclic graph, graphical model, max-linear model, regular conditional distribution, structural equation model.

1. Introduction

We define a graphical model as a pair $(\mathcal{D}, \mathcal{L}(X))$, where the joint probability distribution $\mathcal{L}(X)$ of the random vector $X = (X_1, \dots, X_d)$ is Markov relative to the directed acyclic graph (DAG) $\mathcal{D}$ (cf. [6], Chapter 3.2). In the literature such models are also called Bayesian networks (cf. [5]). Structural equation models (SEMs) offer a possibility to construct such graphical models (cf. [8, 11]). More precisely, for (measurable) functions $f_i$ and sets $\mathrm{pa}(i) \subseteq \{1, \dots, d\} \setminus \{i\}$, the parents of node $i$, define

$$X_i = f_i(X_{\mathrm{pa}(i)}, Z_i), \quad i = 1, \dots, d, \tag{1.1}$$

where the noise variables $Z_1, \dots, Z_d$ are jointly independent. The corresponding graph $\mathcal{D} = (V, E)$ of the SEM (1.1) with node set $V$ and edge set $E$ is obtained by drawing directed edges from each variable $X_k$ for $k \in \mathrm{pa}(i)$ to $X_i$. We require the resulting graph $\mathcal{D}$ to be a DAG. Throughout the paper we identify the vector $X$ with the set of nodes $V = \{1, \dots, d\}$ and write $k \to i$ if there is an edge from $k$ to $i$. From Theorem 1.4.1 of [8] we know that the joint distribution $\mathcal{L}(X)$ is uniquely defined by the distribution of $(Z_1, \dots, Z_d)$ and, denoting by $\mathrm{nd}(i)$ the non-descendants of node $i$,

$$X_i \perp\!\!\!\perp X_{\mathrm{nd}(i) \setminus \mathrm{pa}(i)} \mid X_{\mathrm{pa}(i)} \tag{1.2}$$

for all $i \in V$; i.e., the distribution of $X$ is Markov relative to $\mathcal{D}$, and $(\mathcal{D}, \mathcal{L}(X))$ is a graphical model. Particular attention has been attracted by linear or Gaussian SEMs resulting in a graphical model (see for example [5, 9, 10]). Our focus is not on sums, but on maxima. Consequently, we introduce a SEM, which is to the best of our knowledge new, defined as

$$X_i = \bigvee_{k \in \mathrm{pa}(i)} c_k^i X_k \vee c_i^i Z_i, \quad i = 1, \dots, d, \tag{1.3}$$

where $Z_1, \dots, Z_d$ are independent continuous random variables with $\mathrm{supp}(Z_i) = \mathbb{R}_+ := (0, \infty)$ and $c_k^i > 0$ for all $i = 1, \dots, d$ and $k \in \mathrm{Pa}(i) := \mathrm{pa}(i) \cup \{i\}$; the corresponding graph $\mathcal{D} = (V, E)$ is again required to

∗ Support from the TUM Graduate School’s International School of Applied Mathematics (ISAM) at Technische Universität München is gratefully acknowledged. Address: Center for Mathematical Sciences, Boltzmannstrasse 3, 85748 Garching, Germany, e-mail: [email protected]; [email protected]; url: http://www.statistics.tum.de


Nadine Gissibl and Claudia Klüppelberg/Max-linear models on directed acyclic graphs


be a DAG. We call the resulting graphical model (D, L(X)) a max-linear graphical model (ML graphical model). This model is motivated by applications to risk analysis, where extreme risks play an essential role and may propagate through a network. In such a risk setting we may think of the weights $c_k^i$ as relative quantities, such that a risk may originate with certain proportions in its different ancestors.

In this paper we investigate distributional and graph properties of a ML graphical model (D, L(X)). We show that X can be written as a max-linear function of the noise variables and detail the relation between the weights of the representation (1.3) and the max-linear coefficients, which are determined by a path analysis of the DAG D. We also characterize all max-linear models which give rise to a ML graphical model. This leads to the presentation of X as the solution of a fixed point equation.

It is a simple but important observation that a large noise variable Zi may have a lasting influence on the descendants of node i. Furthermore, since the max-linear coefficients are defined via maxima of weights over different paths, some paths are relevant, and we call them max-weighted since they carry these maxima, while others can be disposed of without changing the model. We present a DAG DB with minimal number of edges such that (DB , L(X)) is a ML graphical model.

The size of the noise variables in (1.3) gives rise to a certain order between the components of the vector X. In particular, if the same noise variable determines the maximum for different nodes, the component variables realize (up to scaling) the same values. Likewise one scaled component can be strictly smaller or larger than another. Possible equalities or strict inequalities between components of X decompose the sample space Ω into finitely many subspaces, each of which may give rise to a new DAG.
It is a natural consequence that a restriction of the sample space given by a specific order between components leads to a complexity reduction of the model. We translate this into a new concept of order sets and order DAGs, which leads to an almost sure representation of some node by a minimal number of ancestors of this node. The order DAG corresponding to the restriction of the sample space by some specific order in a subset O of nodes has itself the node set consisting of O and its ancestors. As edge set we choose all edges, which are relevant with regard to the specific order and are on a max-weighted path in the original DAG D. This concept leads to reduced representations of certain nodes, even to almost sure prediction of unknown nodes. Our paper is organized as follows. In Section 2 we investigate our new ML graphical model in detail and provide in particular necessary conditions on max-linear models to give rise to a ML graphical model (D, L(X)). We also introduce the notion of a max-weighted path, and study its consequences for D. In Section 3 we characterize those max-linear models which give rise to a ML graphical model. Here we also present the DAG DB with minimal number of edges such that (DB , L(X)) is a graphical model with unique representation (1.3). Section 4 is devoted to structural properties of X like bounds for some component, information given by lowest and highest max-weighted ancestors. This leads to minimal representations of X with respect to subsets of nodes of the DAG. In Section 5 we use these results extensively, when knowing (observing) parts of the DAG and show that they may reduce the complexity of the model considerably. The concept of order sets and order DAGs introduced in this section is based on specific orders between components of the vector X and yields model reductions as well as almost sure prediction of some unknown (non-observed) nodes in the DAG. 
Finally, in Section 6 we compute regular conditional distribution functions of parts of the vector X conditioned on the observation of some part of the vector making extensive use of this complexity reduction. Throughout we illustrate our findings with examples. Notation: We will use the following notation throughout. For a node i ∈ V the sets an(i), pa(i), and de(i) denote the ancestors, parents, and descendants of i with respect to D. Furthermore, we use the notation An(i) ∶= an(i) ∪ {i}, Pa(i) ∶= pa(i) ∪ {i}, and De(i) ∶= de(i) ∪ {i}. For a set U ⊆ V of nodes we extend this notation in a natural way by writing an(U ) = ⋃i∈U an(i), An(U ) ∶= an(U ) ∪ U , and so on. We write U ⊆ V for a non-empty subset U of nodes. For such a subset we then define XU = (Xi , i ∈ U ). In general, we consider statements for i ∈ ∅ as invalid. Moreover, for arbitrary (possibly random) ai ≥ 0


we set $\bigvee_{i \in \emptyset} a_i = 0$ and $\bigwedge_{i \in \emptyset} a_i = \infty$. Occasionally, some DAG $\mathcal{D} = (V, E)$ is well-ordered, which means that the set $V = \{1, \dots, d\}$ of nodes is linearly ordered in a way compatible with $\mathcal{D}$ such that $j \in \mathrm{pa}(i)$ implies $j < i$.

2. Distributional and graph properties of a max-linear graphical model

Let $(\mathcal{D}, \mathcal{L}(X))$ be a ML graphical model. The random vector $X$ can be written as a max-linear function of the noise variables. We explain this first for an example, which will appear again later in the paper.

Example 2.1. [Every random vector corresponding to a ML graphical model has a max-linear representation] Consider a ML graphical model $(\mathcal{D}, \mathcal{L}(X_1, \dots, X_4))$ with $\mathcal{D} = (\{1, 2, 3, 4\}, \{(1, 2), (1, 3), (2, 4), (3, 4)\})$ depicted below.

[Figure: the DAG $\mathcal{D}$ with nodes 1, 2, 3, 4 and edges 1 → 2, 1 → 3, 2 → 4, 3 → 4.]

We obtain for the random variables $X_1$, $X_2$, $X_3$ and $X_4$:

$$\begin{aligned}
X_1 &= c_1^1 Z_1 \\
X_2 &= c_1^2 X_1 \vee c_2^2 Z_2 = c_1^2 c_1^1 Z_1 \vee c_2^2 Z_2 \\
X_3 &= c_1^3 X_1 \vee c_3^3 Z_3 = c_1^3 c_1^1 Z_1 \vee c_3^3 Z_3 \\
X_4 &= c_2^4 X_2 \vee c_3^4 X_3 \vee c_4^4 Z_4 \\
    &= c_2^4 (c_1^2 c_1^1 Z_1 \vee c_2^2 Z_2) \vee c_3^4 (c_1^3 c_1^1 Z_1 \vee c_3^3 Z_3) \vee c_4^4 Z_4 \\
    &= (c_2^4 c_1^2 c_1^1 \vee c_3^4 c_1^3 c_1^1) Z_1 \vee c_2^4 c_2^2 Z_2 \vee c_3^4 c_3^3 Z_3 \vee c_4^4 Z_4.
\end{aligned}$$

The right hand side of this representation of $X$ is called max-linear; a formal definition is given in (2.4). We summarize the weights of the noise variables in a matrix $B$:

$$B = \begin{bmatrix}
c_1^1 & c_1^1 c_1^2 & c_1^1 c_1^3 & c_1^1 c_1^2 c_2^4 \vee c_1^1 c_1^3 c_3^4 \\
0 & c_2^2 & 0 & c_2^2 c_2^4 \\
0 & 0 & c_3^3 & c_3^3 c_3^4 \\
0 & 0 & 0 & c_4^4
\end{bmatrix}.$$

Note that B is upper triangular, since the DAG D is well-ordered.



We provide a general method to calculate the max-linear representation of $X$ by a path analysis. A (directed) path in $\mathcal{D}$ from $j$ to $i$ is a sequence of distinct nodes in $V$ such that $j = k_0 \to k_1 \to \dots \to k_{n-1} \to k_n = i$ for some $n \in \mathbb{N}$. We write such a path $p = [j \Rightarrow i] = [j = k_0 \to k_1 \to \dots \to k_{n-1} \to k_n = i]$ and denote by $P_{ji}$ the set of all paths from $j$ to $i$.

Theorem 2.2. Let $(\mathcal{D}, \mathcal{L}(X))$ be a ML graphical model. Define for a path $p = [j = k_0 \to k_1 \to \dots \to k_n = i]$ in $\mathcal{D}$ the constants

$$d_{ji}^{p} := c_{k_0}^{k_0} c_{k_0}^{k_1} \cdots c_{k_{n-2}}^{k_{n-1}} c_{k_{n-1}}^{k_n} = c_{k_0}^{k_0} \prod_{l=0}^{n-1} c_{k_l}^{k_{l+1}} \tag{2.1}$$

and set

$$b_{ji} = \bigvee_{p \in P_{ji}} d_{ji}^{p}, \qquad b_{ii} = c_i^i \qquad \text{for all } j \in \mathrm{an}(i) \text{ and } i = 1, \dots, d \tag{2.2}$$

and all other $b_{ji} = 0$. Then

$$X_i = \bigvee_{j \in \mathrm{An}(i)} b_{ji} Z_j, \quad i = 1, \dots, d. \tag{2.3}$$
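The path analysis of Theorem 2.2 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it enumerates all paths of the DAG of Example 2.1, computes the coefficients of (2.2), and compares the recursion (1.3) with the max-linear representation (2.3). The dictionary layout (`c[(k, i)]` standing for the weight of the edge from k to i, `c[(i, i)]` for the weight of the noise variable), the function names, and all numerical values are assumptions made for illustration.

```python
def edge_products(j, i, parents, c):
    """Yield the product of edge weights along each directed path from j to i."""
    for k in parents[i]:
        if k == j:
            yield c[(j, i)]  # the single edge j -> i
        for w in edge_products(j, k, parents, c):
            yield w * c[(k, i)]  # a path j => k extended by the edge k -> i


def ml_coefficients(parents, c):
    """Return b[(j, i)] following (2.1)-(2.2); b is 0.0 if j is not in An(i)."""
    b = {}
    for i in parents:
        for j in parents:
            if j == i:
                b[(j, i)] = c[(i, i)]
            else:
                path_weights = (c[(j, j)] * w for w in edge_products(j, i, parents, c))
                b[(j, i)] = max(path_weights, default=0.0)
    return b


def sem_values(parents, c, z):
    """Evaluate the recursion (1.3) for fixed noise values z (nodes assumed well-ordered)."""
    x = {}
    for i in sorted(parents):
        x[i] = max([c[(k, i)] * x[k] for k in parents[i]] + [c[(i, i)] * z[i]])
    return x


# DAG of Example 2.1; weights and noise values are made up for illustration.
parents = {1: [], 2: [1], 3: [1], 4: [2, 3]}
c = {(1, 1): 2.0, (2, 2): 1.0, (3, 3): 4.0, (4, 4): 1.0,
     (1, 2): 0.5, (1, 3): 0.75, (2, 4): 0.5, (3, 4): 0.25}
z = {1: 1.0, 2: 3.0, 3: 0.5, 4: 2.0}

b = ml_coefficients(parents, c)
x_sem = sem_values(parents, c, z)                                    # via (1.3)
x_ml = {i: max(b[(j, i)] * z[j] for j in parents) for i in parents}  # via (2.3)
```

With these illustrative numbers the two dictionaries `x_sem` and `x_ml` coincide, and the entry `b[(1, 4)]` is attained along the path through node 2. Path enumeration is exponential in the worst case, so this is meant only for small graphs.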


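Proof. First note that wlog we can assume that the DAG $\mathcal{D} = (V, E)$ is well-ordered. We prove the identity (2.3) by induction on the number of nodes of $\mathcal{D}$. For $d = 1$ we have $X_1 = c_1^1 Z_1 = b_{11} Z_1$, such that both representations are equal provided that $b_{11} = c_1^1$. Assume that (2.3) holds for a ML graphical model $(\mathcal{D}, \mathcal{L}(X))$ of dimension $d$; i.e.,

$$X_k = \bigvee_{j \in \mathrm{pa}(k)} c_j^k X_j \vee c_k^k Z_k = \bigvee_{j \in \mathrm{An}(k)} b_{jk} Z_j = \bigvee_{j \in \mathrm{an}(k)} \bigvee_{p \in P_{jk}} d_{jk}^{p} Z_j \vee c_k^k Z_k, \quad k = 1, \dots, d.$$

Now consider a well-ordered DAG $\mathcal{D}$ corresponding to a ML graphical model with $d + 1$ nodes, and note that for $i = 1, \dots, d$ we have $(d + 1) \notin \mathrm{pa}(i)$. In order to verify (2.3) for the nodes $i = 1, \dots, d$, it is therefore sufficient to consider the subgraph $\mathcal{D}[\{1, \dots, d\}] = (\{1, \dots, d\}, E \cap (\{1, \dots, d\} \times \{1, \dots, d\}))$. Due to the induction hypothesis, (2.3) holds for $\mathcal{D}[\{1, \dots, d\}]$ and therefore also for $\mathcal{D}$. So we can use this hypothesis and (A.1) to obtain

$$\begin{aligned}
X_{d+1} &= \bigvee_{k \in \mathrm{pa}(d+1)} c_k^{d+1} X_k \vee c_{d+1}^{d+1} Z_{d+1} \\
&= \bigvee_{k \in \mathrm{pa}(d+1)} \bigvee_{j \in \mathrm{an}(k)} \bigvee_{p \in P_{jk}} c_k^{d+1} d_{jk}^{p} Z_j \vee \bigvee_{k \in \mathrm{pa}(d+1)} c_k^{d+1} c_k^k Z_k \vee c_{d+1}^{d+1} Z_{d+1} \\
&= \bigvee_{j \in \mathrm{an}(d+1)} \Big( \bigvee_{k \in \mathrm{de}(j) \cap \mathrm{pa}(d+1)} \bigvee_{p \in P_{jk}} c_k^{d+1} d_{jk}^{p} \vee \bigvee_{k \in \mathrm{pa}(d+1) \cap \{j\}} c_k^{d+1} c_k^k \Big) Z_j \vee c_{d+1}^{d+1} Z_{d+1}.
\end{aligned}$$

For some path $p$ from $j$ to $d + 1$ of the form $[j \Rightarrow k \to d + 1]$ for some $k \in \mathrm{de}(j) \cap \mathrm{pa}(d + 1)$, we have $d_{j,d+1}^{p} = d_{jk}^{p} c_k^{d+1}$, and for some edge $[j \to d + 1]$ we must have $d_{j,d+1}^{[j \to d+1]} = c_j^j c_j^{d+1}$. Observe that all paths from some $j \in \mathrm{an}(d + 1)$ to $d + 1$ satisfy one of these two presentations. This yields

$$X_{d+1} = \bigvee_{j \in \mathrm{an}(d+1)} \bigvee_{p \in P_{j,d+1}} d_{j,d+1}^{p} Z_j \vee c_{d+1}^{d+1} Z_{d+1} = \bigvee_{j \in \mathrm{An}(d+1)} b_{j,d+1} Z_j,$$

where we have set $b_{d+1,d+1} = c_{d+1}^{d+1}$ and $b_{j,d+1} = \bigvee_{p \in P_{j,d+1}} d_{j,d+1}^{p}$ for all $j \in \mathrm{an}(d + 1)$.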

In the following we study the random vector $X$ as a max-linear model with specific properties induced by the graph $\mathcal{D}$. We call a random vector $X$ max-linear if there exist independent continuous random variables $Z_1, \dots, Z_d$ with $\mathrm{supp}(Z_i) = \mathbb{R}_+ := (0, \infty)$ and a matrix $B = (b_{ij})_{d \times d}$, which we call max-linear coefficient matrix (ML coefficient matrix), with non-negative entries such that

$$X_i = \bigvee_{j=1}^{d} b_{ji} Z_j, \quad i = 1, \dots, d, \tag{2.4}$$

(for background on max-linear models in the context of extreme value theory see for example [2], Chapter 6). Max-linear models can be linked in a natural way to ML graphical models.

Proposition 2.3. Let $X$ be generated by a SEM (1.1) whose corresponding graph is a DAG $\mathcal{D} = (V, E)$, and assume that $X$ is max-linear. Then the components of $X$ satisfy (1.3) and $(\mathcal{D}, \mathcal{L}(X))$ is a ML graphical model.

Proof. For $i \in V$ we substitute the parents of $X_i$ by their corresponding representation (1.1) and proceed recursively. After finitely many steps we obtain $X_i = g_i(Z_{\mathrm{An}(i)})$ for appropriate functions $g_i$. Comparing $X_i = g_i(Z_{\mathrm{An}(i)})$ and the max-linear representation $X_i = \bigvee_{j=1}^{d} b_{ji} Z_j$ yields that we may set $b_{ji} = 0$ if $j \notin \mathrm{An}(i)$ and, hence, $X_i = \bigvee_{j \in \mathrm{An}(i)} b_{ji} Z_j$. We therefore have the two representations

$$X_i = \bigvee_{j \in \mathrm{An}(i)} b_{ji} Z_j = f_i(X_{\mathrm{pa}(i)}, Z_i).$$


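Since for every $k \in \mathrm{pa}(i)$ we have $X_k = b_{j(k)k} Z_{j(k)}$ for some $j(k) \in \mathrm{An}(k)$, we obtain

$$X_i = \bigvee_{j \in \mathrm{An}(i)} b_{ji} Z_j = f_i(X_{\mathrm{pa}(i)}, Z_i) = h_i(Z_{\{j(k) : k \in \mathrm{pa}(i)\}}, Z_i)$$

for appropriate functions $h_i$, where the change from $f_i$ to $h_i$ takes care of the coefficients $b_{j(k)k}$ for $k \in \mathrm{pa}(i)$. Thus we can write the max-linear representation as

$$X_i = \bigvee_{k \in \mathrm{pa}(i)} b_{j(k)i} Z_{j(k)} \vee b_{ii} Z_i = \bigvee_{k \in \mathrm{pa}(i)} \frac{b_{j(k)i}}{b_{j(k)k}} b_{j(k)k} Z_{j(k)} \vee b_{ii} Z_i = \bigvee_{k \in \mathrm{pa}(i)} \frac{b_{j(k)i}}{b_{j(k)k}} X_k \vee b_{ii} Z_i,$$

which is (1.3) with $c_k^i = \frac{b_{j(k)i}}{b_{j(k)k}}$ and $c_i^i = b_{ii}$.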

For what follows we need some graph theoretical notions. The reachability matrix $R = (r_{ij})_{d \times d}$ of a DAG $\mathcal{D}$ is the matrix with entries

$$r_{ji} = \begin{cases} 1, & \text{if there is a path from } j \text{ to } i, \\ 0, & \text{otherwise.} \end{cases}$$

If $r_{ji} = 1$, we say $i$ is reachable from $j$, and we set for practical reasons $r_{ii} = 1$ for $i = 1, \dots, d$. By Theorem 2.2 we have for the ML coefficient matrix $B = (b_{ij})_{d \times d}$ corresponding to a ML graphical model that $b_{ji} \neq 0$ if and only if $i$ is reachable from $j$ or, equivalently, if and only if $j \in \mathrm{An}(i)$.

Remark 2.4. Let $(\mathcal{D}, \mathcal{L}(X))$ be a ML graphical model.
(i) Between the ML coefficient matrix $B$ of $X$ and the reachability matrix $R$ of $\mathcal{D}$ the following relation holds: $R = \mathrm{sgn}(B)$, where we define equality of matrices componentwise. Hence, the ML coefficient matrix $B$ is a weighted reachability matrix of the DAG $\mathcal{D}$.
(ii) Moreover, if the DAG $\mathcal{D}$ is well-ordered, then $R$ is an upper triangular matrix. ◻

In the following we investigate further the relationship between the weights in (1.3) and the ML coefficient matrix $B = (b_{ij})_{d \times d}$ in (2.3). To this end we define the following matrix operation

$$\odot : \mathbb{R}^{p_1 \times d} \times \mathbb{R}^{d \times p_2} \to \mathbb{R}^{p_1 \times p_2}, \quad (F, G) \mapsto F \odot G =: H \quad \text{such that} \quad h_{ij} := \bigvee_{k=1}^{d} f_{ik} g_{kj}. \tag{2.5}$$

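Denote $F^{\odot 0} = \mathrm{id}_{d \times d}$ and $F^{\odot n} = F^{\odot (n-1)} \odot F$ for $n \geq 1$.

Theorem 2.5. Let $(\mathcal{D}, \mathcal{L}(X))$ be a ML graphical model with weights $c_k^i$ for $i = 1, \dots, d$ and $k \in \mathrm{Pa}(i)$ and ML coefficient matrix $B = (b_{ij})_{d \times d}$. Define the matrices

$$D = (d_{ij})_{d \times d} := \mathrm{diag}(c_i^i, i = 1, \dots, d), \qquad D_0 = (d_{0,ji})_{d \times d} := (c_j^i \mathbf{1}([j \to i]))_{d \times d},$$

and $D_1 = (d_{1,ji})_{d \times d} := (c_j^j c_j^i \mathbf{1}([j \to i]))_{d \times d}$. Then the matrix $B$ has representation

$$B = D \quad \text{for } d = 1 \qquad \text{and} \qquad B = D \vee \bigvee_{k=0}^{d-2} (D_1 \odot D_0^{\odot k}) \quad \text{for } d \geq 2,$$

where $\vee$ denotes componentwise maxima.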
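The matrix formula of Theorem 2.5 can be sketched in a few lines. The code below is an illustration under stated assumptions, not code from the paper: matrices are plain nested lists, nodes are numbered from 0, `edges[(j, i)]` holds the weight of the edge from j to i, and `diag[i]` holds the weight of the noise variable of node i. The numerical values are made up and correspond to the DAG of Example 2.1.

```python
def maxtimes(F, G):
    """Max-times product (2.5): H[i][j] = max over k of F[i][k] * G[k][j]."""
    return [[max(F[i][k] * G[k][j] for k in range(len(G)))
             for j in range(len(G[0]))]
            for i in range(len(F))]


def ml_matrix(diag, edges):
    """B = D v (D1) v (D1 . D0) v ... v (D1 . D0^(d-2)), with '.' the max-times product."""
    d = len(diag)
    D = [[diag[i] if i == j else 0.0 for j in range(d)] for i in range(d)]
    D0 = [[edges.get((j, i), 0.0) for i in range(d)] for j in range(d)]
    D1 = [[diag[j] * D0[j][i] for i in range(d)] for j in range(d)]
    B, term = D, D1
    for _ in range(d - 1):  # k = 0, ..., d-2; no iteration at all for d = 1
        B = [[max(x, y) for x, y in zip(row_b, row_t)] for row_b, row_t in zip(B, term)]
        term = maxtimes(term, D0)
    return B


# DAG of Example 2.1 with 0-based node labels; weights are made up for illustration.
diag = [2.0, 1.0, 4.0, 1.0]
edges = {(0, 1): 0.5, (0, 2): 0.75, (1, 3): 0.5, (2, 3): 0.25}
B = ml_matrix(diag, edges)
```

Each pass of the loop adds the maximal weights of paths with one more edge, which mirrors the induction over path lengths; since a path in a DAG with d nodes has at most d − 1 edges, d − 1 passes suffice.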


Proof. For $d = 1$ we know from (2.2) that $b_{11} = c_1^1$. Hence, $B = D$. Now assume that $d \geq 2$. We denote by $P_{ji}^{n}$ the set of all paths of length $n \in \mathbb{N}$ (defined by $n$ edges) from $j$ to $i$. Recall from (2.1) and (2.2) that the coefficients $b_{ji}$ are defined via maxima over the different paths from $j$ to $i$. Splitting up this operation into maxima over paths of fixed length $n$ we obtain

$$b_{ji} = \bigvee_{n=1}^{\infty} \bigvee_{p \in P_{ji}^{n}} d_{ji}^{p}, \quad j \in \mathrm{an}(i).$$

First, we show that if there exist paths of length $n$ from $j$ to $i$ the $ji$-th component of the matrix $D_1 \odot D_0^{\odot (n-1)}$ equals the maximal weight of these paths, otherwise it is zero. For an edge $p = [j \to i]$, which is the only path of length $n = 1$, we obtain $d_{ji}^{p} = c_j^j c_j^i$. Observe that the $ji$-th component of the matrix $D_1 \odot D_0^{\odot 0} = D_1 \odot \mathrm{id}_{d \times d} = D_1$ is given by $d_{1,ji} = c_j^j c_j^i \mathbf{1}([j \to i])$. So the statement is true for $n = 1$. We proceed by induction on $n \in \mathbb{N}$, assuming that the statement is true for $n$. For a path $p = [j \to k_1 \to \dots \to k_n \to i]$ from $j$ to $i$ of length $n + 1$ for arbitrary nodes $k_1, \dots, k_n$ we obtain $d_{ji}^{p} = c_j^j c_j^{k_1} \cdots c_{k_{n-1}}^{k_n} c_{k_n}^{i}$ and, therefore,

$$\bigvee_{p \in P_{ji}^{n+1}} d_{ji}^{p} = \bigvee_{\{k_1, \dots, k_n : [j \to k_1 \to \dots \to k_n \to i]\}} c_j^j c_j^{k_1} \cdots c_{k_{n-1}}^{k_n} c_{k_n}^{i}.$$

Let $d_{n,ji}$ and $d_{n+1,ji}$ be the $ji$-th component of $D_1 \odot D_0^{\odot (n-1)}$ and $D_1 \odot D_0^{\odot n}$, respectively. Since $D_1 \odot D_0^{\odot n} = (D_1 \odot D_0^{\odot (n-1)}) \odot D_0$, we have $d_{n+1,ji} = \bigvee_{k=1}^{d} d_{n,jk} d_{0,ki} = \bigvee_{k=1}^{d} d_{n,jk} c_k^i \mathbf{1}([k \to i])$. Assuming the existence of a path of length $n$ from $j$ to $k$ and an edge $[k \to i]$, we obtain from the induction hypothesis that

$$d_{n,jk} d_{0,ki} = d_{n,jk} c_k^i = \bigvee_{\{k_1, \dots, k_{n-1} : [j \to k_1 \to \dots \to k_{n-1} \to k \to i]\}} c_j^j c_j^{k_1} \cdots c_{k_{n-1}}^{k} c_k^i,$$

otherwise we have $d_{n,jk} d_{0,ki} = 0$. In particular, all paths from $j$ to $i$ of length $n + 1$ are of this form for some node $k$. Thus, if there exist paths of length $n + 1$ from $j$ to $i$, the $ji$-th component $d_{n+1,ji} = \bigvee_{k=1}^{d} d_{n,jk} d_{0,ki}$ of $D_1 \odot D_0^{\odot n}$ is indeed equal to the maximal weight of these paths, otherwise it is zero.

Noting that due to the acyclicity a path in a DAG with $d$ nodes does not contain more than $d - 1$ edges we obtain that for $j \in \mathrm{an}(i)$ the $ji$-th component of $\bigvee_{k=0}^{d-2} D_1 \odot D_0^{\odot k}$ equals the maximal weight of all paths from $j$ to $i$, otherwise it is zero. Recall from (2.2) that $b_{ji} = 0$ if $j \notin \mathrm{An}(i)$ and $b_{ii} = c_i^i$. This is taken care of by the diagonal matrix $D$. Thus the ML coefficient matrix $B$ is given by

$$B = D \vee D_1 \vee (D_1 \odot D_0) \vee (D_1 \odot D_0^{\odot 2}) \vee \dots \vee (D_1 \odot D_0^{\odot (d-2)}).$$

The definition of the matrix operation $\odot$ in (2.5) modifies and extends the definition given in Section 2.1 of [12]. For some ML graphical model $(\mathcal{D}, \mathcal{L}(X))$ with ML coefficient matrix $B = (b_{ij})_{d \times d}$, summarizing the noise variables of $X$ into the row vector $Z = (Z_1, \dots, Z_d)$, the matrix operation reduces to

$$X^{\top} = Z \odot B = \Big( \bigvee_{j=1}^{d} b_{ji} Z_j, \; i = 1, \dots, d \Big) = \Big( \bigvee_{j \in \mathrm{An}(i)} b_{ji} Z_j, \; i = 1, \dots, d \Big).$$

The coefficients of the max-linear representation (2.3) of $X$ are determined as maxima of products along all paths from $j$ to $i$ (cf. (2.1) and (2.2)). This suggests the following definition.

Definition 2.6. Let $(\mathcal{D}, \mathcal{L}(X))$ be a ML graphical model with ML coefficient matrix $B = (b_{ij})_{d \times d}$. For $j \in \mathrm{an}(i)$ and $i \in V$ we call a path $p = [j = k_0 \to k_1 \to \dots \to k_n = i]$ from $j$ to $i$ max-weighted if $b_{ji} = c_{k_0}^{k_0} \prod_{l=0}^{n-1} c_{k_l}^{k_{l+1}} = d_{ji}^{p}$. Obviously, there may exist several max-weighted paths from $j$ to $i$, and we denote by $P_{ji}^{\mathrm{mw}}$ the set of all max-weighted paths from $j$ to $i$. ◻

Remark 2.7. Assume the situation of Definition 2.6. Let $i \in V$ and $j \in \mathrm{An}(i)$. Then the following statements are immediate consequences of the definition of max-weighted paths and (2.1).
(i) A path $p$ from $j$ to $i$ is max-weighted if and only if $b_{ji} = d_{ji}^{p}$.
(ii) A path $p$ from $j$ to $i$ is not max-weighted if and only if $b_{ji} > d_{ji}^{p}$.


(iii) Let $p \in P_{ji}^{\mathrm{mw}}$. Then every sub-path of $p$ is also max-weighted.
(iv) Let $l_1, l_2 \in V$ and $p \in P_{ji}^{\mathrm{mw}}$ such that $p$ contains some sub-path $p_1$ from $l_1$ to $l_2$. Then the path which results from $p$ by replacing $p_1$ by some $p_2 \in P_{l_1 l_2}^{\mathrm{mw}}$ is also max-weighted. ◻

Remark 2.7(i) indicates that for $i \in V$ the max-linear representation $X_i = \bigvee_{j \in \mathrm{An}(i)} b_{ji} Z_j$ only depends on the weight $b_{ii}$ and the weights along max-weighted paths from ancestors of $i$ to $i$. We discuss Example 2.1 in the light of Definition 2.6 and this observation.

Example 2.8. [Continuation of Example 2.1: max-weighted paths] Assume that $b_{14} = \frac{b_{12} b_{24}}{b_{22}} > \frac{b_{13} b_{34}}{b_{33}}$, which is equivalent to $c_1^1 c_1^2 c_2^4 > c_1^1 c_1^3 c_3^4$. Thus we have $P_{14}^{\mathrm{mw}} = \{[1 \to 2 \to 4]\}$; i.e., the only max-weighted path from 1 to 4 contains node 2, and there is no max-weighted path which contains node 3. Observe from this that the DAG $(\{1, 2, 3, 4\}, \{(1, 2), (2, 4), (3, 4)\})$ completely determines the max-linear representation of $X_4 = b_{14} Z_1 \vee b_{24} Z_2 \vee b_{34} Z_3 \vee b_{44} Z_4$. ◻

This important observation motivates the following.

Definition 2.9. Let $(\mathcal{D}, \mathcal{L}(X))$ be a ML graphical model, and let $A \subseteq V$.
(a) For some $i \in V$ we call an ancestor $j$ of $i$ max-weighted by $A$ if there exists a max-weighted path from $j$ to $i$ which contains a node of $A$. We denote by $\mathrm{an}_{\mathrm{mw}}^{A}(i)$ the set of the ancestors of $i$ which are max-weighted by $A$. By $\mathrm{an}_{\mathrm{nmw}}^{A}(i)$ we further denote the set of the ancestors of $i$ which are not max-weighted by $A$; i.e., $\mathrm{an}_{\mathrm{nmw}}^{A}(i) = \mathrm{an}(i) \setminus \mathrm{an}_{\mathrm{mw}}^{A}(i)$.
(b) For some $j \in V$ we call a descendant $i$ of $j$ max-weighted by $A$ if there exists a max-weighted path from $j$ to $i$ which contains a node of $A$. We denote by $\mathrm{de}_{\mathrm{mw}}^{A}(j)$ the set of the descendants of $j$ which are max-weighted by $A$. By $\mathrm{de}_{\mathrm{nmw}}^{A}(j)$ we further denote the set of the descendants of $j$ which are not max-weighted by $A$; i.e., $\mathrm{de}_{\mathrm{nmw}}^{A}(j) = \mathrm{de}(j) \setminus \mathrm{de}_{\mathrm{mw}}^{A}(j)$. ◻

Obviously, if $i \in A$, then all ancestors and descendants of $i$ are max-weighted by $A$; i.e., we have $\mathrm{an}_{\mathrm{mw}}^{A}(i) = \mathrm{an}(i)$ and $\mathrm{de}_{\mathrm{mw}}^{A}(i) = \mathrm{de}(i)$.
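The following auxiliary result presents properties of the coefficients $b_{ji}$ corresponding to the max-linear representation (2.3) with regard to Definition 2.9(a). We will need them throughout the paper.

Lemma 2.10. Let $(\mathcal{D}, \mathcal{L}(X))$ be a ML graphical model with ML coefficient matrix $B = (b_{ij})_{d \times d}$. Let $A \subseteq V$, and let $i \in V$ and $j \in \mathrm{an}(i)$. Then $j \in \mathrm{an}_{\mathrm{mw}}^{A}(i)$ if and only if

$$b_{ji} = \bigvee_{k \in \mathrm{De}(j) \cap A \cap \mathrm{An}(i)} \frac{b_{jk} b_{ki}}{b_{kk}}, \tag{2.6}$$

and $j \in \mathrm{an}_{\mathrm{nmw}}^{A}(i)$ if and only if

$$b_{ji} > \bigvee_{k \in \mathrm{De}(j) \cap A \cap \mathrm{An}(i)} \frac{b_{jk} b_{ki}}{b_{kk}}. \tag{2.7}$$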
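Criterion (2.6) lends itself to a direct numerical check once a ML coefficient matrix is available. The following minimal sketch assumes that the matrix is stored as a dictionary `b` keyed by `(j, i)` with missing entries meaning zero; the function name and the numerical values (a coefficient matrix for the DAG of Example 2.1 with made-up weights) are illustrative only. Nodes k outside De(j) ∩ An(i) contribute a zero product, so the maximum may be taken over all of A.

```python
import math


def max_weighted_by(b, j, i, A):
    """Check (2.6): is b_ji attained by b_jk * b_ki / b_kk for some k in A?"""
    candidates = [b.get((j, k), 0.0) * b.get((k, i), 0.0) / b[(k, k)] for k in A]
    best = max(candidates, default=0.0)
    return best > 0.0 and math.isclose(best, b[(j, i)])


# Illustrative ML coefficient matrix for the DAG of Example 2.1 (zero entries omitted).
b = {(1, 1): 2.0, (1, 2): 1.0, (1, 3): 1.5, (1, 4): 0.5,
     (2, 2): 1.0, (2, 4): 0.5,
     (3, 3): 4.0, (3, 4): 1.0,
     (4, 4): 1.0}

through_2 = max_weighted_by(b, 1, 4, {2})  # b12 * b24 / b22 = 0.5 = b14
through_3 = max_weighted_by(b, 1, 4, {3})  # b13 * b34 / b33 = 0.375 < b14
```

With these numbers node 1 is max-weighted by {2} but not by {3} as an ancestor of node 4, in line with the situation described in Example 2.8. The tolerance-based comparison via `math.isclose` is a design choice for floating-point input, not part of the lemma.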

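Proof. Denote by $P_{ji,A}$ the set of all paths from $j$ to $i$ which pass through at least one node of $A$. Observe from (2.2), Definition 2.6, and Definition 2.9(a) that $j \in \mathrm{an}_{\mathrm{mw}}^{A}(i)$ if and only if $b_{ji} = \bigvee_{p \in P_{ji,A}} d_{ji}^{p}$ and $j \in \mathrm{an}_{\mathrm{nmw}}^{A}(i)$ if and only if $b_{ji} > \bigvee_{p \in P_{ji,A}} d_{ji}^{p}$. In analogy to $P_{ji,A}$ we define $P_{ji,\{k\}}$ as the set of paths from $j$ to $i$ which pass through $k \in V$. Since every path from $j$ to $i$ in $P_{ji,A}$ contains some $k \in A$, we obtain by (2.2),

$$\bigvee_{p \in P_{ji,A}} d_{ji}^{p} = \bigvee_{k \in \mathrm{De}(j) \cap A \cap \mathrm{An}(i)} \bigvee_{p \in P_{ji,\{k\}}} d_{ji}^{p} = \bigvee_{k \in \{j\} \cap A} b_{ki} \vee \bigvee_{k \in \mathrm{de}(j) \cap A \cap \mathrm{an}(i)} \bigvee_{p \in P_{ji,\{k\}}} d_{ji}^{p} \vee \bigvee_{k \in A \cap \{i\}} b_{jk}.$$

For $k \in \mathrm{de}(j) \cap A \cap \mathrm{an}(i)$ we can decompose every path from $j$ to $i$ containing $k$ into two successive parts, the first one from $j$ to $k$, and the second from $k$ to $i$. By (2.1) and (2.2) this implies that

$$\bigvee_{p \in P_{ji,\{k\}}} d_{ji}^{p} = \frac{1}{c_k^k} \Big( \bigvee_{p \in P_{jk}} d_{jk}^{p} \Big) \Big( \bigvee_{p \in P_{ki}} d_{ki}^{p} \Big).$$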


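Thus we obtain by (2.2),

$$\bigvee_{k \in \mathrm{de}(j) \cap A \cap \mathrm{an}(i)} \bigvee_{p \in P_{ji,\{k\}}} d_{ji}^{p} = \bigvee_{k \in \mathrm{de}(j) \cap A \cap \mathrm{an}(i)} \frac{1}{c_k^k} \Big( \bigvee_{p \in P_{jk}} d_{jk}^{p} \Big) \Big( \bigvee_{p \in P_{ki}} d_{ki}^{p} \Big) = \bigvee_{k \in \mathrm{de}(j) \cap A \cap \mathrm{an}(i)} \frac{b_{jk} b_{ki}}{b_{kk}}.$$

Then altogether we have

$$\bigvee_{p \in P_{ji,A}} d_{ji}^{p} = \bigvee_{k \in \{j\} \cap A} b_{ki} \vee \bigvee_{k \in \mathrm{de}(j) \cap A \cap \mathrm{an}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} \vee \bigvee_{k \in A \cap \{i\}} b_{jk} = \bigvee_{k \in \mathrm{De}(j) \cap A \cap \mathrm{An}(i)} \frac{b_{jk} b_{ki}}{b_{kk}},$$

and (2.6) as well as (2.7) for $j \in \mathrm{an}_{\mathrm{nmw}}^{A}(i)$ follow immediately.

For some ML graphical model $(\mathcal{D}, \mathcal{L}(X))$ let $X$ be defined on the probability space $(\Omega, \mathcal{A}, \mathbb{P})$. For $\omega \in \Omega$ and $i \in V$ we denote by $j(i)$ the node of $\mathrm{An}(i)$ such that

$$X_i(\omega) = b_{j(i)i} Z_{j(i)}(\omega). \tag{2.8}$$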

Note that we have already used this notation in the proof of Proposition 2.3. Obviously, $j(i)$ depends on $\omega$ and is therefore random. In the following we present some properties of the node $j(i)$ and their consequences for the representation of $X$. We show that equality between appropriately scaled components may occur with positive probability, although the corresponding noise variables are continuous. In the context of max-stable processes Dombry et al. [3] call these events extremal concurrent. In their Example 2 the extremal concurrence probabilities for max-linear models with unit Fréchet noise variables are calculated.

Theorem 2.11. Let $(\mathcal{D}, \mathcal{L}(X))$ be a ML graphical model with ML coefficient matrix $B = (b_{ij})_{d \times d}$. Denote by $(\Omega, \mathcal{A}, \mathbb{P})$ the probability space on which $X$ is defined.
(a) For $i \in V$ the node $j(i)$ is $\mathbb{P}$-a.s. unique.
(b) For $i_1, i_2 \in V$ there exists some $a \in \mathbb{R}_+$ such that the equality $X_{i_1} = a X_{i_2}$ holds with positive probability if and only if $\mathrm{An}(i_1) \cap \mathrm{An}(i_2) \neq \emptyset$. In this case, $a = \frac{b_{ji_1}}{b_{ji_2}}$ for some $j \in \mathrm{An}(i_1) \cap \mathrm{An}(i_2)$ and $\mathbb{P}$-a.s. on $\{\omega \in \Omega : X_{i_1}(\omega) = a X_{i_2}(\omega)\}$,

$$j(i_1) = j(i_2) \in M := \Big\{ l \in \mathrm{An}(i_1) \cap \mathrm{An}(i_2) : a = \frac{b_{li_1}}{b_{li_2}} \Big\};$$

i.e., $X_{i_1} = \bigvee_{l \in M} b_{li_1} Z_l$ and $X_{i_2} = \bigvee_{l \in M} b_{li_2} Z_l$.

Proof. (a) Since $X_i = \bigvee_{j \in \mathrm{An}(i)} b_{ji} Z_j$ and the noise variables are independent and continuous, no two of the scaled noise variables can be equal with positive probability such that the maximum has to be taken $\mathbb{P}$-a.s. for a unique node of $\mathrm{An}(i)$.
(b) For $a \in \mathbb{R}_+$ assume that $\Omega_1 = \{\omega \in \Omega : X_{i_1}(\omega) = a X_{i_2}(\omega)\}$ is non-empty. Then we have on $\Omega_1$ that

$$X_{i_1} = \bigvee_{l \in \mathrm{An}(i_1)} b_{li_1} Z_l = \bigvee_{l \in \mathrm{An}(i_2)} a b_{li_2} Z_l = a X_{i_2}.$$

Observe from this that as in (a), $\mathbb{P}(\Omega_1) > 0$ may only hold if $\mathrm{An}(i_1) \cap \mathrm{An}(i_2) \neq \emptyset$ and $b_{ji_1} = a b_{ji_2}$ for some $j \in \mathrm{An}(i_1) \cap \mathrm{An}(i_2)$.

To prove the converse assume that $\mathrm{An}(i_1) \cap \mathrm{An}(i_2) \neq \emptyset$ and set $a = \frac{b_{ji_1}}{b_{ji_2}}$ for some $j \in \mathrm{An}(i_1) \cap \mathrm{An}(i_2)$. Note that $j \in M$ and, hence, $M \neq \emptyset$. We obtain on $\Omega_1$ that

$$X_{i_1} = \bigvee_{l \in M} b_{li_1} Z_l \vee \bigvee_{l \in \mathrm{An}(i_1) \setminus M} b_{li_1} Z_l = \bigvee_{l \in M} b_{li_1} Z_l \vee \frac{b_{ji_1}}{b_{ji_2}} \bigvee_{l \in \mathrm{An}(i_2) \setminus M} b_{li_2} Z_l = a X_{i_2}. \tag{2.9}$$

This holds for instance on

$$\Omega_2 = \Big\{ \omega \in \Omega : \bigvee_{l \in M} b_{li_1} Z_l(\omega) > \bigvee_{l \in \mathrm{An}(i_1) \setminus M} b_{li_1} Z_l(\omega) \vee \bigvee_{l \in \mathrm{An}(i_2) \setminus M} a b_{li_2} Z_l(\omega) \Big\}.$$

Since the noise variables Zi are independent continuous random variables with supp(Zi ) = R+ for i = 1, . . . , d, we know that P (Ω2 ) > 0 and, as Ω2 ⊆ Ω1 , also P (Ω1 ) > 0. Furthermore, we see from (2.9) that, again as in (a), Xi1 and Xi2 are realized P-a.s. on Ω1 for the same (unique) node in M ; i.e., j(i1 ) = j(i2 ) ∈ M and Xi1 = ⋁l∈M bli1 Zl as well as Xi2 = ⋁l∈M bli2 Zl .
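The positive probability of equality in Theorem 2.11(b) can be observed in a small simulation. The sketch below is illustrative only and rests on assumptions not made in the text: a two-node DAG with the single edge 1 → 2, weights 1 for both noise variables, and unit Fréchet noise generated as the reciprocal of a standard exponential variable. The function names are hypothetical.

```python
import random


def unit_frechet(rng):
    """Draw a unit Frechet variable as 1/E with E standard exponential (E > 0)."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against a division by zero in a degenerate draw
        e = rng.expovariate(1.0)
    return 1.0 / e


def concurrence_frequency(c21, n, seed=0):
    """Relative frequency of the event X2 = c21 * X1 in the model X1 = Z1, X2 = c21*X1 v Z2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z1, z2 = unit_frechet(rng), unit_frechet(rng)
        x1 = z1
        scaled = c21 * x1
        x2 = max(scaled, z2)
        hits += (x2 == scaled)  # exact equality: X2 is realized by the noise of node 1
    return hits / n


freq = concurrence_frequency(c21=1.0, n=20000)
```

For unit Fréchet noise the event {c21 · Z1 > Z2} has probability c21 / (1 + c21), so for c21 = 1 the observed frequency should be close to one half; for other continuous noise distributions with support R+ the value differs but stays strictly positive.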


Corollary 2.12. For $i \in V$ and $k \in \mathrm{pa}(i)$ the equality $X_i = c_k^i X_k$ holds with positive probability if and only if the edge $[k \to i]$ is a max-weighted path from $k$ to $i$, which is equivalent to $b_{ki} = c_k^k c_k^i$.

Proof. By Theorem 2.11(b) the equality $X_i = c_k^i X_k$ holds with positive probability if and only if $c_k^i = \frac{b_{ji}}{b_{jk}}$ for some $j \in \mathrm{An}(k) \cap \mathrm{An}(i) = \mathrm{An}(k)$. We show that this is equivalent to $b_{ki} = c_k^k c_k^i$. If $b_{ki} = c_k^k c_k^i$, then the condition holds with $j = k$, as $k \in \mathrm{An}(k)$ and $b_{kk} = c_k^k$. To prove the converse assume that $b_{ji} = b_{jk} c_k^i$ for some $j \in \mathrm{An}(k) \cap \mathrm{An}(i) = \mathrm{An}(k)$. We find from Definition 2.6 that in this case there exists a max-weighted path from $j$ to $i$ consisting of a max-weighted path from $j$ to $k$ and the edge $[k \to i]$ and, hence, $j \in \mathrm{an}_{\mathrm{mw}}^{\{k\}}(i)$. Then by (2.6) we have $b_{ji} = \frac{b_{jk} b_{ki}}{b_{kk}}$ and, thus, as $b_{ji} = b_{jk} c_k^i$, $b_{ki} = b_{kk} c_k^i = c_k^k c_k^i$.

3. Characterization of a max-linear graphical model

In Theorem 2.2 we have started with a ML graphical model $(\mathcal{D}, \mathcal{L}(X))$ and have derived the max-linear representation of $X$. In this section, for a given max-linear model (2.4), we investigate under which conditions this model has a representation as in (1.3) or, equivalently, when this model gives rise to a ML graphical model. First, recall from Remark 2.4(i) that a DAG corresponding to a ML graphical model with ML coefficient matrix $B = (b_{ij})_{d \times d}$ has reachability matrix $R = \mathrm{sgn}(B)$. Thus this condition has to be satisfied in any case. Before we discuss further conditions we give an example.

Example 3.1. [Not every ML model gives rise to a ML graphical model] Consider for $d = 3$ the max-linear model

$$X = Z \odot B \quad \text{with} \quad B = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ 0 & b_{22} & b_{23} \\ 0 & 0 & b_{33} \end{bmatrix}$$

such that $b_{ij} > 0$ for $i \leq j$. Thus $B$ is a weighted reachability matrix corresponding to the DAG given by $\mathcal{D} = (\{1, 2, 3\}, \{(1, 2), (1, 3), (2, 3)\})$.

[Figure: the DAG $\mathcal{D}$ with nodes 1, 2, 3 and edges 1 → 2, 1 → 3, 2 → 3.]

Assume that $(\mathcal{D}, \mathcal{L}(X))$ is a ML graphical model. Thus we have

$$\begin{aligned}
X_1 &= b_{11} Z_1 = c_1^1 Z_1 \\
X_2 &= b_{12} Z_1 \vee b_{22} Z_2 = c_1^2 X_1 \vee c_2^2 Z_2 = c_1^2 b_{11} Z_1 \vee c_2^2 Z_2 \\
X_3 &= b_{13} Z_1 \vee b_{23} Z_2 \vee b_{33} Z_3 = c_1^3 X_1 \vee c_2^3 X_2 \vee c_3^3 Z_3 \\
    &= c_1^3 b_{11} Z_1 \vee c_2^3 (b_{12} Z_1 \vee b_{22} Z_2) \vee c_3^3 Z_3 = (c_1^3 b_{11} \vee c_2^3 b_{12}) Z_1 \vee c_2^3 b_{22} Z_2 \vee c_3^3 Z_3.
\end{aligned}$$

Hence,

$$c_1^1 = b_{11}, \quad c_1^2 = \frac{b_{12}}{b_{11}}, \quad c_2^2 = b_{22}, \quad c_2^3 = \frac{b_{23}}{b_{22}}, \quad c_3^3 = b_{33},$$

and

$$b_{13} = c_1^3 b_{11} \vee c_2^3 b_{12} = c_1^3 b_{11} \vee \frac{b_{12} b_{23}}{b_{22}}.$$

In order to find $c_1^3$ we consider the three possible cases:
(1) If $b_{13} < \frac{b_{12} b_{23}}{b_{22}}$, then there is no DAG consistent with $B$; i.e., $B$ is not a ML coefficient matrix corresponding to a ML graphical model.
(2) If $b_{13} > \frac{b_{12} b_{23}}{b_{22}}$, then $c_1^3 = \frac{b_{13}}{b_{11}}$ is consistent with $\mathcal{D}$, and $B$ is a ML coefficient matrix corresponding to a ML graphical model.
(3) If $b_{13} = \frac{b_{12} b_{23}}{b_{22}}$, then every $c_1^3 < \frac{b_{13}}{b_{11}}$ is consistent with $\mathcal{D}$. This indicates that the edge $[1 \to 3]$ is irrelevant and can be dropped. Thus $(\mathcal{D}_1, \mathcal{L}(X))$ with $\mathcal{D}_1 = (\{1, 2, 3\}, \{(1, 2), (2, 3)\})$ is also a ML graphical model, and the matrix $B$ is also a ML coefficient matrix corresponding to a ML graphical model.
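The case distinction of Example 3.1 can be written as a short decision routine. This is only a sketch for the three-node example; the function name, the returned labels, and the test values are made up for illustration, and exact floating-point comparison is used because the illustrative inputs are dyadic numbers.

```python
def classify_example_3_1(b11, b12, b13, b22, b23):
    """Return (case label, weight of the edge 1 -> 3 if it is determined, else None)."""
    bound = b12 * b23 / b22  # weight contributed by the path 1 -> 2 -> 3
    if b13 < bound:
        return "no ML graphical model", None  # case (1)
    if b13 > bound:
        return "edge 1->3 required", b13 / b11  # case (2)
    return "edge 1->3 redundant", None  # case (3): the weight is not unique


case_2 = classify_example_3_1(b11=1.0, b12=0.5, b13=0.5, b22=1.0, b23=0.5)
case_1 = classify_example_3_1(b11=1.0, b12=0.5, b13=0.125, b22=1.0, b23=0.5)
case_3 = classify_example_3_1(b11=1.0, b12=0.5, b13=0.25, b22=1.0, b23=0.5)
```

For general real-valued input the equality in case (3) would have to be tested with a tolerance, which is a modelling decision outside the scope of the example.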


◻

Observe that in Example 3.1 the DAG $(\{1, 2, 3\}, \{(1, 2), (2, 3)\})$ is the smallest DAG (with respect to the number of edges) such that $\mathrm{sgn}(B)$ is its reachability matrix. We give the general definition of this specific DAG, see for example Aho et al. [1].

Definition 3.2. Let $\mathcal{D} = (V, E)$ be a DAG. A graph $\mathcal{D}^{\mathrm{tr}} = (V, E^{\mathrm{tr}})$ is a transitive reduction of $\mathcal{D}$ if
(a) for all $i, j \in V$, $\mathcal{D}^{\mathrm{tr}}$ has a path from $j$ to $i$ if and only if $\mathcal{D}$ has a path from $j$ to $i$, and
(b) there is no graph with fewer edges than $\mathcal{D}^{\mathrm{tr}}$ satisfying condition (a).
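For a finite DAG a transitive reduction in the sense of Definition 3.2 can be obtained by deleting, one after another, every edge whose endpoints remain connected by another path. The following minimal sketch assumes that the DAG is given as a set of pairs `(j, i)` for edges from j to i; the function names are illustrative, and the quadratic search is chosen for readability rather than speed.

```python
def reachable(edges, src, dst):
    """Depth-first search: is there a directed path from src to dst using `edges`?"""
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False


def transitive_reduction(edges):
    """Delete every edge (j, i) for which another path from j to i remains."""
    reduced = set(edges)
    for edge in sorted(edges):
        rest = reduced - {edge}
        if reachable(rest, edge[0], edge[1]):
            reduced = rest  # the edge is redundant
    return reduced


dag_example_3_1 = {(1, 2), (1, 3), (2, 3)}
dag_example_2_1 = {(1, 2), (1, 3), (2, 4), (3, 4)}
reduced_3_1 = transitive_reduction(dag_example_3_1)
reduced_2_1 = transitive_reduction(dag_example_2_1)
```

In the DAG of Example 3.1 the edge from 1 to 3 is removed, whereas the DAG of Example 2.1 contains no redundant edge and is returned unchanged.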



Since we work with finite DAGs throughout, the transitive reduction is unique and is also a subgraph of the original DAG. For a finite DAG it is therefore the same as the so-called minimum equivalent graph (see for example Moyles and Thompson [7]). The transitive reduction $\mathcal{D}^{\mathrm{tr}}$ of a DAG $\mathcal{D}$ can be computed from the original graph by successively examining the edges of $\mathcal{D}$, in any order, and deleting those edges which are redundant, where an edge $[j \to i]$ is redundant if the graph contains a path from $j$ to $i$ which does not include this edge. Obviously, $\mathcal{D}$ and $\mathcal{D}^{\mathrm{tr}}$ have the same reachability matrix. We denote by $\mathrm{pa}^{\mathrm{tr}}(i)$ the set of parents of $i$ with respect to the transitive reduction of $\mathcal{D}$.

The following theorem characterizes all max-linear random vectors which give rise to a ML graphical model.

Theorem 3.3. Let $\mathcal{D}$ be a DAG with reachability matrix $R$. Assume that $X$ is a ML model as in (2.4) with ML coefficient matrix $B = (b_{ij})_{d \times d}$ such that $R = \mathrm{sgn}(B)$. Define

$$D = (d_{ij})_{d \times d} := \mathrm{diag}(b_{ii}, i = 1, \dots, d) \quad \text{and} \quad D_0 = (d_{0,ji})_{d \times d} := \Big( \frac{b_{ji}}{b_{jj}} \mathbf{1}([j \to i]) \Big)_{d \times d}.$$

Then $(\mathcal{D}, \mathcal{L}(X))$ is a ML graphical model if and only if the following fixed point equation holds:

$$B = D \vee B \odot D_0. \tag{3.1}$$

In this case,

$$X_i = \bigvee_{k \in \mathrm{pa}(i)} \frac{b_{ki}}{b_{kk}} X_k \vee b_{ii} Z_i;$$

i.e., the weights in (1.3) are given by $c_k^i = \frac{b_{ki}}{b_{kk}}$ and $c_i^i = b_{ii}$.
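The fixed point equation (3.1) can be checked numerically for a given matrix and edge set. The sketch below is illustrative and not taken from the paper: matrices are nested lists with 0-based node labels, `edges` is a set of pairs `(j, i)` for edges from j to i, and the three test matrices are made-up instances of the cases in Example 3.1. The function names are hypothetical.

```python
import math


def maxtimes(F, G):
    """Max-times product (2.5): H[i][j] = max over k of F[i][k] * G[k][j]."""
    return [[max(F[i][k] * G[k][j] for k in range(len(G)))
             for j in range(len(G[0]))]
            for i in range(len(F))]


def satisfies_fixed_point(B, edges):
    """Check B = D v (B . D0) componentwise, with D and D0 built from B as in Theorem 3.3."""
    d = len(B)
    D = [[B[i][i] if i == j else 0.0 for j in range(d)] for i in range(d)]
    D0 = [[B[j][i] / B[j][j] if (j, i) in edges else 0.0 for i in range(d)]
          for j in range(d)]
    product = maxtimes(B, D0)
    return all(math.isclose(max(D[j][i], product[j][i]), B[j][i])
               for j in range(d) for i in range(d))


edges = {(0, 1), (0, 2), (1, 2)}  # DAG of Example 3.1 with 0-based labels


def example_matrix(b13):
    """Upper triangular B with b12 * b23 / b22 = 0.25 and a free entry b13."""
    return [[1.0, 0.5, b13], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]]


holds_case_2 = satisfies_fixed_point(example_matrix(0.5), edges)    # b13 above the bound
holds_case_3 = satisfies_fixed_point(example_matrix(0.25), edges)   # b13 equal to the bound
holds_case_1 = satisfies_fixed_point(example_matrix(0.125), edges)  # b13 below the bound
```

The check presupposes that sgn(B) is the reachability matrix of the given DAG, as required in the theorem; it does not verify this condition itself.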

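Proof. Let $\mathcal{D}^{\mathrm{tr}}$ be the transitive reduction of $\mathcal{D}$. First, we have a closer look at the fixed point equation (3.1). We start computing the right-hand side of (3.1) for the $ji$-th component and obtain

$$d_{ji} \vee \bigvee_{k=1}^{d} \frac{b_{jk} b_{ki}}{b_{kk}} \mathbf{1}([k \to i]) = d_{ji} \vee \bigvee_{k \in \mathrm{De}(j) \cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}}, \tag{3.2}$$

since $R = \mathrm{sgn}(B)$. For $j \notin \mathrm{an}(i)$ we have $\mathrm{De}(j) \cap \mathrm{pa}(i) = \emptyset$ such that (3.2) reduces to the coefficient in $D$. For $j \in \mathrm{an}(i)$ we have $d_{ji} = 0$ such that (3.2) reduces to the coefficient in $B \odot D_0$. Furthermore, if $j \in \mathrm{an}(i)$, then either $j \in \mathrm{an}(i) \setminus \mathrm{pa}(i)$, $j \in \mathrm{pa}(i) \setminus \mathrm{pa}^{\mathrm{tr}}(i)$, or $j \in \mathrm{pa}^{\mathrm{tr}}(i)$. For $j \in \mathrm{an}(i) \setminus \mathrm{pa}(i)$ (3.2) reduces to

$$\bigvee_{k \in \mathrm{de}(j) \cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}}.$$

Note that $\mathrm{de}(j) \cap \mathrm{pa}(i) \neq \emptyset$ for $j \in \mathrm{pa}(i) \setminus \mathrm{pa}^{\mathrm{tr}}(i)$ and $\mathrm{de}(j) \cap \mathrm{pa}(i) = \emptyset$ for $j \in \mathrm{pa}^{\mathrm{tr}}(i)$. Thus, for $j \in \mathrm{pa}(i) \setminus \mathrm{pa}^{\mathrm{tr}}(i)$ we obtain for (3.2),

$$\bigvee_{k \in \mathrm{De}(j) \cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} = b_{ji} \vee \bigvee_{k \in \mathrm{de}(j) \cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}},$$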


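and for j ∈ patr(i) (3.2) reduces to
\[
\bigvee_{k\in \mathrm{De}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} = b_{ji}.
\]
Altogether we obtain that (3.1) holds if and only if for i, j = 1, . . . , d
\[
b_{ji} =
\begin{cases}
0, & \text{if } j \notin \mathrm{An}(i),\\
b_{ii}, & \text{if } j = i,\\
\bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \dfrac{b_{jk} b_{ki}}{b_{kk}}, & \text{if } j \in \mathrm{an}(i)\setminus \mathrm{pa}(i),\\
b_{ji} \vee \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \dfrac{b_{jk} b_{ki}}{b_{kk}}, & \text{if } j \in \mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i),\\
b_{ji}, & \text{if } j \in \mathrm{pa}^{tr}(i).
\end{cases}
\]
Since for j ∉ An(i), R = sgn(B) implies $b_{ji} = 0$, the fixed point equation (3.1) is satisfied if and only if
\[
b_{ji} = \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} \quad \text{for all } j \in \mathrm{an}(i)\setminus \mathrm{pa}(i) \tag{3.3}
\]
and
\[
b_{ji} = b_{ji} \vee \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} \quad \text{for all } j \in \mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i). \tag{3.4}
\]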

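Now we prove that, under the conditions above, (D, L(X)) is a ML graphical model if and only if (3.1) or, equivalently, (3.3) and (3.4) hold. Assume that (D, L(X)) is a ML graphical model. For j ∈ an(i) observe that $j \in \mathrm{an}^{\mathrm{pa}(i)}_{mw}(i)$, since every path from j to i contains a node of pa(i). Thus by (2.6) we have for j ∈ an(i) ∖ pa(i),
\[
b_{ji} = \bigvee_{k\in \mathrm{De}(j)\cap \mathrm{pa}(i)\cap \mathrm{an}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} = \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}},
\]
and for j ∈ pa(i) ∖ patr(i),
\[
b_{ji} = \bigvee_{k\in \mathrm{De}(j)\cap \mathrm{pa}(i)\cap \mathrm{An}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} = b_{ji} \vee \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}}.
\]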

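Assume conversely that (3.3) and (3.4) hold. We split up the index set and use (3.3) in the first place, and (3.4) in the second place to obtain
\[
\begin{aligned}
X_i &= \bigvee_{j\in \mathrm{An}(i)} b_{ji} Z_j \\
&= \bigvee_{j\in \mathrm{an}(i)\setminus \mathrm{pa}(i)} b_{ji} Z_j \vee \bigvee_{j\in \mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i)} b_{ji} Z_j \vee \bigvee_{j\in \mathrm{pa}^{tr}(i)} b_{ji} Z_j \vee b_{ii} Z_i \\
&= \bigvee_{j\in \mathrm{an}(i)\setminus \mathrm{pa}(i)} \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} Z_j \vee \bigvee_{j\in \mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i)} \Big(b_{ji} \vee \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}}\Big) Z_j \vee \bigvee_{j\in \mathrm{pa}^{tr}(i)} b_{ji} Z_j \vee b_{ii} Z_i.
\end{aligned}
\]
Now we use (A.2) and (A.3) to interchange the first two as well as the fourth and the fifth maxima to obtain
\[
X_i = \bigvee_{k\in \mathrm{pa}(i)} \bigvee_{j\in \mathrm{an}(k)\setminus \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} Z_j \vee \bigvee_{j\in \mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i)} b_{ji} Z_j \vee \bigvee_{k\in \mathrm{pa}(i)} \bigvee_{j\in \mathrm{an}(k)\cap \mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} Z_j \vee \bigvee_{j\in \mathrm{pa}^{tr}(i)} b_{ji} Z_j \vee b_{ii} Z_i.
\]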

Next, observe for k ∈ pa(i) that an(k)∩pa(i)∖patr (i) = an(k)∩pa(i). Indeed, assume some l ∈ patr (i) such that l ∈ an(k) ∩ pa(i). Then, k ∈ de(l) ∩ pa(i) which implies that de(l) ∩ pa(i) ≠ ∅. This is a contradiction


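to the fact that l ∈ patr(i) (since de(k) ∩ pa(i) = ∅ for all k ∈ patr(i)). Moreover, we combine the maxima over j ∈ pa(i) ∖ patr(i) and j ∈ patr(i). Thus we obtain
\[
\begin{aligned}
X_i &= \bigvee_{k\in \mathrm{pa}(i)} \frac{b_{ki}}{b_{kk}} \bigvee_{j\in \mathrm{an}(k)\setminus \mathrm{pa}(i)} b_{jk} Z_j \vee \bigvee_{j\in \mathrm{pa}(i)} b_{ji} Z_j \vee \bigvee_{k\in \mathrm{pa}(i)} \frac{b_{ki}}{b_{kk}} \bigvee_{j\in \mathrm{an}(k)\cap \mathrm{pa}(i)} b_{jk} Z_j \vee b_{ii} Z_i \\
&= \bigvee_{k\in \mathrm{pa}(i)} \Big( \frac{b_{ki}}{b_{kk}} \bigvee_{j\in \mathrm{an}(k)\setminus \mathrm{pa}(i)} b_{jk} Z_j \vee b_{ki} Z_k \vee \frac{b_{ki}}{b_{kk}} \bigvee_{j\in \mathrm{an}(k)\cap \mathrm{pa}(i)} b_{jk} Z_j \Big) \vee b_{ii} Z_i \\
&= \bigvee_{k\in \mathrm{pa}(i)} \Big( \frac{b_{ki}}{b_{kk}} \bigvee_{j\in \mathrm{an}(k)} b_{jk} Z_j \vee b_{ki} Z_k \Big) \vee b_{ii} Z_i \\
&= \bigvee_{k\in \mathrm{pa}(i)} \frac{b_{ki}}{b_{kk}} \Big( \bigvee_{j\in \mathrm{an}(k)} b_{jk} Z_j \vee b_{kk} Z_k \Big) \vee b_{ii} Z_i \\
&= \bigvee_{k\in \mathrm{pa}(i)} \frac{b_{ki}}{b_{kk}} X_k \vee b_{ii} Z_i.
\end{aligned}
\]
Hence, X has representation (1.3) with weights $c^i_k = b_{ki}/b_{kk}$ and $c^i_i = b_{ii}$, and (D, L(X)) is a ML graphical model.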

In the proof of Theorem 3.3 we have shown that under the given conditions the fixed point equation (3.1) holds if and only if (3.3) and (3.4) hold. This implies the following equivalent version of Theorem 3.3.

Corollary 3.4. Let Dtr be the transitive reduction of D. Then (D, L(X)) is a ML graphical model if and only if for every i = 1, . . . , d,
\[
b_{ji} = \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} \quad \text{for all } j \in \mathrm{an}(i)\setminus \mathrm{pa}(i)
\]
and
\[
b_{ji} \ge \bigvee_{k\in \mathrm{de}(j)\cap \mathrm{pa}(i)} \frac{b_{jk} b_{ki}}{b_{kk}} \quad \text{for all } j \in \mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i). \tag{3.5}
\]

Given a DAG D, Theorem 3.3 or Corollary 3.4 allows us to find all ML models X such that (D, L(X)) is a ML graphical model. On the other hand, we know from Example 3.1 that for a ML graphical model (D, L(X)) the distribution L(X) may also form a ML graphical model together with other DAGs. This raises the question of which DAG is the smallest (with respect to the number of edges) among those forming a ML graphical model together with L(X). The following theorem provides an answer.

Theorem 3.5. Let (D, L(X)) be a ML graphical model with ML coefficient matrix B = (bij)d×d. Let further Dtr = (V, E^tr) be the transitive reduction of D. Define
\[
\begin{aligned}
B^{=} &:= \big\{(k,i)\in V\times V : k \in \big(\mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i)\big) \cap \mathrm{an}^{\mathrm{pa}(i)\setminus\{k\}}_{mw}(i)\big\} \\
&= \Big\{(k,i)\in V\times V : k \in \mathrm{pa}(i)\setminus \mathrm{pa}^{tr}(i) \text{ and } b_{ki} = \bigvee_{l\in \mathrm{de}(k)\cap \mathrm{pa}(i)} \frac{b_{kl} b_{li}}{b_{ll}}\Big\}
\end{aligned}
\]
and for $E^B := E \setminus B^{=}$ the DAG $D^B := (V, E^B)$. Then $D^B$ is the DAG with the minimal number of edges such that $(D^B, L(X))$ is a ML graphical model.

Proof. The equality of the two sets in the definition of $B^{=}$ follows from (2.6). First, we prove that $(D^B, L(X))$ is a ML graphical model. By Theorem 2.2 and Definition 2.6 it suffices to show that for all i ∈ V and j ∈ an(i) there exists a max-weighted path which only contains edges of $E^B$. For some i ∈ V and j ∈ an(i) let $p = [j = k_0 \to k_1 \to \dots \to k_n = i]$ be a max-weighted path in D with the maximal number of edges among all max-weighted paths from j to i. Assume that p contains an edge $k_{l-1} \to k_l$ with $(k_{l-1}, k_l) \in B^{=}$ for some l ∈ {1, . . . , n}. By the definition of $B^{=}$ there exists a max-weighted path $p_1$ from $k_{l-1}$ to $k_l$ which does not include the edge $k_{l-1} \to k_l$. Thus by replacing in p the edge $k_{l-1} \to k_l$ by $p_1$


we obtain by Remark 2.7(iv) a max-weighted path from j to i containing more edges than p. This, however, contradicts the fact that p has the maximal number of edges among all max-weighted paths from j to i. It remains to show that there exists no DAG which has fewer edges than $D^B$ and forms together with L(X) a ML graphical model. Recall from Remark 2.4(i) that a DAG corresponding to a ML graphical model cannot have fewer edges than the transitive reduction of any DAG with reachability matrix R = sgn(B); i.e., $E^{tr} \subseteq E^B$. Let i ∈ V and k ∈ pa(i) ∖ patr(i) such that the only max-weighted path from k to i is [k → i]; i.e., $k \in \mathrm{an}^{\mathrm{pa}(i)\setminus\{k\}}_{nmw}(i)$ and, hence, $(k, i) \in E^B \setminus E^{tr}$. Since [k → i] is the only max-weighted path from k to i in D, every path from k to i in a subgraph of D which does not contain the edge k → i has a smaller weight. Thus all edges of $E^B$ have to be in any DAG which forms together with L(X) a ML graphical model.

We call $D^B$ as in Theorem 3.5 the minimal max-linear DAG, and for i ∈ V we denote by $\mathrm{pa}^B(i)$ the parents of i with respect to $D^B$. For i ∈ V and $k \in \mathrm{pa}^B(i)$ observe from the definition of $D^B$ that the only max-weighted path from k to i is [k → i]. Thus the following is an immediate consequence of Definition 2.6 and Corollary 2.12.

Corollary 3.6. Let (D, L(X)) be a ML graphical model with ML coefficient matrix B = (bij)d×d. Assume that the DAG D is minimal max-linear; i.e., $D = D^B$. Then for all i ∈ V and k ∈ pa(i),
(a) the weights $c^i_k$ are uniquely given by B, namely, $c^i_k = b_{ki}/b_{kk}$, and
(b) the equality $X_i = c^i_k X_k$ holds with positive probability.
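The construction of the minimal max-linear DAG in Theorem 3.5 can be illustrated as follows. This is a minimal sketch under stated assumptions: B is given as a dictionary of exact rationals satisfying (3.5), descendants are read off from the support of B (R = sgn(B)), and the function name `minimal_ml_dag` as well as the example values are hypothetical.

```python
from fractions import Fraction as F

def minimal_ml_dag(edges, b):
    """Edge set of D^B: drop k -> i if b_ki is attained via another parent l of i."""
    kept = set()
    for (k, i) in edges:
        # l ranges over de(k) intersected with pa(i); b_kl > 0 encodes l in de(k)
        detours = [b[(k, l)] * b[(l, i)] / b[(l, l)]
                   for (l, i2) in edges
                   if i2 == i and l != k and b.get((k, l), F(0)) > 0]
        if b[(k, i)] > max(detours, default=F(0)):
            kept.add((k, i))
    return kept

# Hypothetical example on the DAG 1 -> 2 -> 3 with the additional edge 1 -> 3.
edges = {(1, 2), (2, 3), (1, 3)}
B = {(1, 1): F(1), (2, 2): F(1), (3, 3): F(1),
     (1, 2): F(1, 2), (2, 3): F(1, 2), (1, 3): F(1, 4)}
reduced = minimal_ml_dag(edges, B)       # b_13 = b_12 b_23 / b_22, so 1 -> 3 is dropped

B_strict = dict(B)
B_strict[(1, 3)] = F(1, 2)               # b_13 > b_12 b_23 / b_22, so 1 -> 3 stays
unchanged = minimal_ml_dag(edges, B_strict)
```

For edges of the transitive reduction the list of detours is empty, so they are always kept, in line with $E^{tr} \subseteq E^B$.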

Remark 3.7. There are two advantages of considering $(D^B, L(X))$ instead of (D, L(X)):
(i) The representation (1.3) of X is unique.
(ii) As we know from (1.2), missing edges correspond to conditional independence in the distribution of X; i.e., we can read off from $D^B$ more conditional independence properties of L(X) than from D. ◻

4. Structural properties of a max-linear graphical model

Throughout this section we assume that (D, L(X)) is a ML graphical model with ML coefficient matrix B = (bij)d×d. From (1.3), we learn immediately that
\[
c^i_k X_k \le X_i \quad \text{for all } k \in \mathrm{pa}(i) \text{ and } i \in V. \tag{4.1}
\]

The following lemma states inequality (4.1) for nodes Xj and Xi which are not simply connected by an edge [j → i], but by a path [j ⇒ i] of arbitrary length.

Lemma 4.1. Let U ⊆ V. Then for i ∈ V,
\[
\bigvee_{j\in \mathrm{an}(i)\cap U} \frac{b_{ji}}{b_{jj}} X_j \le X_i \le \bigwedge_{l\in \mathrm{de}(i)\cap U} \frac{b_{ii}}{b_{il}} X_l. \tag{4.2}
\]
Proof. Assume nodes i, j ∈ V such that j ∈ an(i), and let $p = [j = k_0 \to k_1 \to \dots \to k_n = i]$ be an arbitrary path in D. By applying (4.1) iteratively we obtain
\[
c^j_j X_i = c^{k_0}_{k_0} X_{k_n} \ge c^{k_0}_{k_0} c^{k_n}_{k_{n-1}} X_{k_{n-1}} \ge \dots \ge c^{k_0}_{k_0} \prod_{l=0}^{n-1} c^{k_{l+1}}_{k_l} X_{k_0} = d^p_{ji} X_j
\]
and, thus, as this holds for all paths, $c^j_j X_i \ge \bigvee_{p\in P_{ji}} d^p_{ji} X_j$, which is by (2.2) equivalent to $\frac{b_{ji}}{b_{jj}} X_j \le X_i$. Thus we have $\frac{b_{ji}}{b_{jj}} X_j \le X_i$ for all j ∈ an(i) ∩ U and $X_i \le \frac{b_{ii}}{b_{il}} X_l$ for all l ∈ de(i) ∩ U, which, finally, implies (4.2).
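The bounds (4.2) can be checked by simulation. The following is a minimal sketch, not part of the original paper: it uses a chain DAG 1 → 2 → 3 → 4 → 5, hypothetical weights, exact rational noise drawn with a fixed seed, and assumes that b_ji is the largest path weight from j to i. It also records which node realizes each bound.

```python
import random
from fractions import Fraction as F

D_NODES = 5
EDGES = {(1, 2), (2, 3), (3, 4), (4, 5)}            # chain DAG
# Hypothetical weights: C[(k, i)] for the edge k -> i, C[(i, i)] for the noise term.
C = {(1, 2): F(1, 2), (2, 3): F(2), (3, 4): F(1, 3), (4, 5): F(3, 2),
     **{(i, i): F(1) for i in range(1, 6)}}

def ml_coefficients(d, edges, c):
    """b_ji as the largest path weight from j to i (nodes in topological order)."""
    b = {(j, i): F(0) for j in range(1, d + 1) for i in range(1, d + 1)}
    for i in range(1, d + 1):
        b[(i, i)] = c[(i, i)]
        for j in range(1, i):
            b[(j, i)] = max((c[(k, i)] * b[(j, k)]
                             for k in range(1, i) if (k, i) in edges),
                            default=F(0))
    return b

def simulate(d, edges, c, z):
    """Recursive max-linear SEM: X_i = max_k c_ki X_k  v  c_ii Z_i."""
    x = {}
    for i in range(1, d + 1):
        x[i] = max([c[(k, i)] * x[k] for k in range(1, i) if (k, i) in edges]
                   + [c[(i, i)] * z[i]])
    return x

def bounds(b, x, i, U):
    """Lower and upper bound of X_i in (4.2) for the node set U."""
    lower = max((b[(j, i)] / b[(j, j)] * x[j]
                 for j in U if j != i and b[(j, i)] > 0), default=None)
    upper = min((b[(i, i)] / b[(i, l)] * x[l]
                 for l in U if l != i and b[(i, l)] > 0), default=None)
    return lower, upper

rng = random.Random(0)
B = ml_coefficients(D_NODES, EDGES, C)
results = []
for _ in range(200):
    z = {i: F(rng.randint(1, 1000), 100) for i in range(1, 6)}
    x = simulate(D_NODES, EDGES, C, z)
    lo, up = bounds(B, x, 3, {1, 2, 4, 5})
    results.append((lo, x[3], up, x))
```

On a chain, the lower bound is always realized by the nearest ancestor in U and the upper bound by the nearest descendant in U, so only nodes 2 and 4 matter for i = 3.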

The bounds given in (4.2) can, under certain conditions, be modified such that they are based on a smaller number of nodes. In certain cases, they reduce to a single properly scaled random variable Xj and Xl , respectively. The following example illustrates this.


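Example 4.2. Consider the ML graphical model (D, L(X1, . . . , X5)) with DAG D = ({1, 2, 3, 4, 5}, {(1, 2), (2, 3), (3, 4), (4, 5)})

[Figure: the DAG D, a chain on the nodes 1, 2, 3, 4, 5.]

and ML coefficient matrix B = (bij)5×5 in the context of Lemma 4.1. For U = {1, 2, 4, 5} and i = 3 we obtain
\[
\frac{b_{13}}{b_{11}} X_1 \vee \frac{b_{23}}{b_{22}} X_2 \le X_3 \le \frac{b_{33}}{b_{34}} X_4 \wedge \frac{b_{33}}{b_{35}} X_5. \tag{4.3}
\]
Furthermore, using Lemma 4.1 with U = {4} and i = 5 as well as (3.5), we obtain
\[
\frac{b_{45}}{b_{44}} X_4 \le X_5 \iff \frac{b_{34} b_{45}}{b_{44}} X_4 \le b_{34} X_5 \iff b_{35} X_4 \le b_{34} X_5 \iff \frac{b_{33}}{b_{34}} X_4 \le \frac{b_{33}}{b_{35}} X_5.
\]
Similarly, we derive $\frac{b_{13}}{b_{11}} X_1 \le \frac{b_{23}}{b_{22}} X_2$. Thus we have identified the bounds in (4.3) as
\[
\frac{b_{13}}{b_{11}} X_1 \vee \frac{b_{23}}{b_{22}} X_2 = \frac{b_{23}}{b_{22}} X_2 \quad \text{and} \quad \frac{b_{33}}{b_{34}} X_4 \wedge \frac{b_{33}}{b_{35}} X_5 = \frac{b_{33}}{b_{34}} X_4. \qquad ◻
\]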

In order to investigate this effect in general, we need the following definition. Since it is not completely obvious how to obtain these particular ancestors and descendants, we present an algorithm in Remark 4.4 and illustrate the concept in Example 4.5 below.

Definition 4.3. Let U ⊆ V.
(a) For nodes i, j ∈ V such that j ∈ an(i) ∩ U, we call j a lowest max-weighted ancestor of i in U if U ∖ {i, j} = ∅ or if U ∖ {i, j} ≠ ∅ and there exists no max-weighted path from j to i containing nodes of U ∖ {i, j}. We denote the set of the lowest max-weighted ancestors of i in U by $\mathrm{an}^U_{low}(i)$.
(b) For nodes i, j ∈ V such that i ∈ de(j) ∩ U, we call i a highest max-weighted descendant of j in U if U ∖ {i, j} = ∅ or if U ∖ {i, j} ≠ ∅ and there exists no max-weighted path from j to i containing nodes of U ∖ {i, j}. We denote the set of the highest max-weighted descendants of j in U by $\mathrm{de}^U_{high}(j)$. ◻

Remark 4.4. Assume the situation of Definition 4.3(a). The set $\mathrm{an}^U_{low}(i)$ may be determined as follows:
(i) Identify every j ∈ an(i) ∩ U such that U ∖ {i, j} = ∅ or that U ∖ {i, j} ≠ ∅ and there exists a path [j ⇒ i] which does not contain any node of U ∖ {i, j}. We summarize these nodes in the set A.
(ii) For all j ∈ A such that U ∖ {i, j} ≠ ∅, remove j from A if there exists a max-weighted path [j ⇒ i] containing some node of U ∖ {i, j}.
(iii) The remaining nodes of A are the lowest max-weighted ancestors of i in U; i.e., $A = \mathrm{an}^U_{low}(i)$. ◻

Example 4.5. [Continuation of Examples 2.1 and 2.8: $\mathrm{an}^U_{low}(i)$ and $\mathrm{de}^U_{high}(j)$] Consider U = {1, 2} and i = 4 in the context of Definition 4.3(a). The only and thus max-weighted path [2 → 4] from 2 to 4 does not contain any node of U ∖ {2, 4} = {1}; i.e., $2 \in \mathrm{an}^U_{low}(4)$. It remains to discuss node 1, which gives rise to two cases:

(1) If $b_{14} = \frac{b_{12} b_{24}}{b_{22}} \ge \frac{b_{13} b_{34}}{b_{33}}$, then the max-weighted path [1 → 2 → 4] does also contain node 2 of U ∖ {1, 4} = {2}. Thus we have that $1 \notin \mathrm{an}^U_{low}(4)$ and, consequently, $\mathrm{an}^U_{low}(4) = \{2\}$.
(2) If $b_{14} = \frac{b_{13} b_{34}}{b_{33}} > \frac{b_{12} b_{24}}{b_{22}}$, then the only max-weighted path from 1 to 4 is [1 → 3 → 4]. Since this path does not contain node 2, we have $\mathrm{an}^U_{low}(4) = \{1, 2\}$.

Now consider the case U = {3, 4} and j = 1 in the context of Definition 4.3(b). Similarly as above, we obtain the two cases:

(1) If $b_{14} = \frac{b_{13} b_{34}}{b_{33}} \ge \frac{b_{12} b_{24}}{b_{22}}$, then $\mathrm{de}^U_{high}(1) = \{3\}$.
(2) If $b_{14} = \frac{b_{12} b_{24}}{b_{22}} > \frac{b_{13} b_{34}}{b_{33}}$, then $\mathrm{de}^U_{high}(1) = \{3, 4\}$. ◻
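The two cases of Example 4.5 can be reproduced with a few lines of code. This is a minimal sketch under assumptions that are not spelled out in this part of the text: the DAG of Examples 2.1 and 2.8 is taken to have the edges 1 → 2, 1 → 3, 2 → 4, 3 → 4 (inferred from the paths mentioned above), all b_ii equal 1, and a max-weighted path from j to i through k is detected by the identity b_jk b_ki / b_kk = b_ji, which is how (2.6) is used in the proofs. All function names are hypothetical.

```python
from fractions import Fraction as F

def make_B(b12, b13, b24, b34):
    """ML coefficient matrix on the assumed DAG 1->2, 1->3, 2->4, 3->4 with b_ii = 1."""
    b = {(j, i): F(0) for j in range(1, 5) for i in range(1, 5)}
    for i in range(1, 5):
        b[(i, i)] = F(1)
    b[(1, 2)], b[(1, 3)], b[(2, 4)], b[(3, 4)] = b12, b13, b24, b34
    b[(1, 4)] = max(b12 * b24, b13 * b34)
    return b

def mw_path_through(b, j, k, i):
    """True if some max-weighted path from j to i passes through node k."""
    return (b[(j, k)] > 0 and b[(k, i)] > 0
            and b[(j, k)] * b[(k, i)] / b[(k, k)] == b[(j, i)])

def lowest_mw_ancestors(b, i, U):
    """an^U_low(i): ancestors j of i in U without a max-weighted path via U minus {i, j}."""
    return {j for j in U
            if j != i and b[(j, i)] > 0
            and not any(mw_path_through(b, j, k, i) for k in U if k not in (i, j))}

def highest_mw_descendants(b, j, U):
    """de^U_high(j): descendants i of j in U without a max-weighted path via U minus {i, j}."""
    return {i for i in U
            if i != j and b[(j, i)] > 0
            and not any(mw_path_through(b, j, k, i) for k in U if k not in (i, j))}

# b14 = b12 b24 / b22 = 1 > b13 b34 / b33 = 1/2
B_via_2 = make_B(F(1), F(1, 2), F(1), F(1))
# b14 = b13 b34 / b33 = 1 > b12 b24 / b22 = 1/2
B_via_3 = make_B(F(1, 2), F(1), F(1), F(1))

low_via_2 = lowest_mw_ancestors(B_via_2, 4, {1, 2})
low_via_3 = lowest_mw_ancestors(B_via_3, 4, {1, 2})
high_via_2 = highest_mw_descendants(B_via_2, 1, {3, 4})
high_via_3 = highest_mw_descendants(B_via_3, 1, {3, 4})
```

Exact rationals are used so that the equality test for max-weighted paths is not affected by rounding.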



The following shows now that, in order to determine the lower and upper bounds in (4.2), it suffices to consider the lowest max-weighted ancestors of i in U and the highest max-weighted descendants of i in U.

Lemma 4.6. Let U ⊆ V. Then for i ∈ V,
\[
\bigvee_{j\in \mathrm{an}^U_{low}(i)} \frac{b_{ji}}{b_{jj}} X_j = \bigvee_{j\in \mathrm{an}(i)\cap U} \frac{b_{ji}}{b_{jj}} X_j \le X_i \le \bigwedge_{l\in \mathrm{de}^U_{high}(i)} \frac{b_{ii}}{b_{il}} X_l = \bigwedge_{l\in \mathrm{de}(i)\cap U} \frac{b_{ii}}{b_{il}} X_l. \tag{4.4}
\]

Proof. Since (4.2) holds, we only have to show the equalities for the lower and upper bounds. Let $k \in (\mathrm{an}(i)\cap U)\setminus \mathrm{an}^U_{low}(i)$. By the definition of the set $\mathrm{an}^U_{low}(i)$ we know that U ∖ {k, i} ≠ ∅ and that there exists a max-weighted path from k to i containing some node of U ∖ {k, i}; i.e., there exists a max-weighted path containing some node of U ∖ {i}. Then by Lemma A.2(a) there exists a max-weighted path from k to i containing some $j \in \mathrm{an}^U_{low}(i)$; i.e., $k \in \mathrm{an}^{\{j\}}_{mw}(i)$. Thus, using (2.6) and Lemma 4.1, we obtain
\[
\frac{b_{ki}}{b_{kk}} X_k = \frac{b_{kj} b_{ji}}{b_{kk} b_{jj}} X_k \le \frac{b_{ji}}{b_{jj}} X_j. \tag{4.5}
\]
Since for all $k \in (\mathrm{an}(i)\cap U)\setminus \mathrm{an}^U_{low}(i)$ there exists some $j \in \mathrm{an}^U_{low}(i)$ such that (4.5) holds, we have
\[
\bigvee_{j\in \mathrm{an}(i)\cap U} \frac{b_{ji}}{b_{jj}} X_j = \bigvee_{j\in \mathrm{an}^U_{low}(i)} \frac{b_{ji}}{b_{jj}} X_j.
\]

Similarly, we obtain the equality for the upper bound of Xi in (4.4).

Next, for U ⊊ V, we write the random variables Xi for $i \in U^c := V \setminus U$ as functions of the random vector XU and noise variables. Of course, there are many such representations. We focus on that with minimal number of nodes in U and noise variables.

Theorem 4.7. Let U ⊊ V and denote $\mathrm{An}^U_{nmw}(i) = \mathrm{an}^U_{nmw}(i) \cup \{i\}$ (cf. Definition 2.9(a)). Then for $i \in U^c$,
\[
X_i = \bigvee_{k\in \mathrm{an}^U_{low}(i)} \frac{b_{ki}}{b_{kk}} X_k \vee \bigvee_{j\in \mathrm{An}^U_{nmw}(i)} b_{ji} Z_j. \tag{4.6}
\]

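This representation of Xi as a function of XU and noise variables involves the minimal number of nodes in U and noise variables.

Proof. Using the first equality in (4.4) and (2.3) as well as in a second step (A.4) to interchange the first two maxima, we obtain for the right-hand side of (4.6)
\[
\begin{aligned}
\bigvee_{k\in \mathrm{an}^U_{low}(i)} \frac{b_{ki}}{b_{kk}} X_k \vee \bigvee_{j\in \mathrm{An}^U_{nmw}(i)} b_{ji} Z_j
&= \bigvee_{k\in \mathrm{an}(i)\cap U} \frac{b_{ki}}{b_{kk}} \Big( \bigvee_{j\in \mathrm{An}(k)} b_{jk} Z_j \Big) \vee \bigvee_{j\in \mathrm{An}^U_{nmw}(i)} b_{ji} Z_j \\
&= \bigvee_{j\in \mathrm{an}(i)} \bigvee_{k\in \mathrm{De}(j)\cap \mathrm{an}(i)\cap U} \frac{b_{jk} b_{ki}}{b_{kk}} Z_j \vee \bigvee_{j\in \mathrm{An}^U_{nmw}(i)} b_{ji} Z_j.
\end{aligned}
\]
Splitting the set an(i) into $\mathrm{an}^U_{mw}(i)$ and $\mathrm{an}(i)\setminus \mathrm{an}^U_{mw}(i) = \mathrm{an}^U_{nmw}(i)$ and, as $i \in U^c$, using (2.6) yields
\[
\begin{aligned}
&\bigvee_{j\in \mathrm{an}^U_{mw}(i)} \bigvee_{k\in \mathrm{De}(j)\cap \mathrm{an}(i)\cap U} \frac{b_{jk} b_{ki}}{b_{kk}} Z_j \vee \bigvee_{j\in \mathrm{an}^U_{nmw}(i)} \bigvee_{k\in \mathrm{De}(j)\cap \mathrm{an}(i)\cap U} \frac{b_{jk} b_{ki}}{b_{kk}} Z_j \vee \bigvee_{j\in \mathrm{An}^U_{nmw}(i)} b_{ji} Z_j \\
&\quad = \bigvee_{j\in \mathrm{an}^U_{mw}(i)} b_{ji} Z_j \vee \bigvee_{j\in \mathrm{an}^U_{nmw}(i)} \bigvee_{k\in \mathrm{De}(j)\cap \mathrm{an}(i)\cap U} \frac{b_{jk} b_{ki}}{b_{kk}} Z_j \vee \bigvee_{j\in \mathrm{An}^U_{nmw}(i)} b_{ji} Z_j.
\end{aligned} \tag{4.7}
\]
For $j \in \mathrm{an}^U_{nmw}(i)$ we obtain by (2.7), as $i \in U^c$,
\[
\bigvee_{k\in \mathrm{De}(j)\cap \mathrm{an}(i)\cap U} \frac{b_{jk} b_{ki}}{b_{kk}} < b_{ji}. \tag{4.8}
\]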


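Hence, we may omit the lower terms in (4.7) corresponding to the nodes of $\mathrm{an}^U_{nmw}(i)$. This yields for the right-hand side of (4.6), since $\mathrm{An}^U_{nmw}(i) = \mathrm{An}(i)\setminus \mathrm{an}^U_{mw}(i)$,
\[
\bigvee_{j\in \mathrm{an}^U_{mw}(i)} b_{ji} Z_j \vee \bigvee_{j\in \mathrm{An}^U_{nmw}(i)} b_{ji} Z_j = \bigvee_{j\in \mathrm{An}(i)} b_{ji} Z_j = X_i.
\]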

It remains to show that (4.6) is the representation of $X_i = \bigvee_{j\in \mathrm{An}(i)} b_{ji} Z_j$ with minimal number of nodes in U and noise variables. We prove that we cannot omit any term in (4.6). Let $j \in \mathrm{an}^U_{low}(i)$. Since $\mathrm{an}^U_{low}(i) \subseteq U$, we know that $j \in \mathrm{an}^U_{mw}(i)$ and, hence, $j \notin \mathrm{An}^U_{nmw}(i)$. Thus the term $b_{ji} Z_j$ is only included in $\bigvee_{k\in \mathrm{an}^U_{low}(i)} \frac{b_{ki}}{b_{kk}} X_k$. Assume that $b_{ji} Z_j$ is only included in $\frac{b_{ki}}{b_{kk}} X_k$ for some $k \in \mathrm{an}^U_{low}(i)\setminus\{j\}$. This implies that $\frac{b_{jk} b_{ki}}{b_{kk}} = b_{ji}$ and, hence, we have by (2.6), $j \in \mathrm{an}^{\{k\}}_{mw}(i)$; i.e., there exists a max-weighted path from j to i containing some node of U ∖ {j, i}. This is a contradiction to the fact that $j \in \mathrm{an}^U_{low}(i)$. Thus $b_{ji} Z_j$ is only included in $\frac{b_{ji}}{b_{jj}} X_j$, and we cannot omit this term in (4.6). Finally, observe from (4.7) and (4.8) that we also cannot omit any of the noise variables of $\mathrm{An}^U_{nmw}(i)$.

The following corollary presents the components of X as minimal functions (with respect to the number of components) of their parent nodes.

Corollary 4.8. Let $D^B$ be the minimal max-linear DAG from Theorem 3.5. Then for i ∈ V, $\mathrm{an}^{\mathrm{pa}(i)}_{low}(i) = \mathrm{pa}^B(i)$ and
\[
X_i = \bigvee_{k\in \mathrm{pa}^B(i)} \frac{b_{ki}}{b_{kk}} X_k \vee b_{ii} Z_i = \bigvee_{k\in \mathrm{pa}^B(i)} c^i_k X_k \vee c^i_i Z_i. \tag{4.9}
\]

Proof. If k ∈ pa(i), then either k ∈ patr(i), $k \in \mathrm{pa}^B(i)\setminus \mathrm{pa}^{tr}(i)$ or $k \in \mathrm{pa}(i)\setminus \mathrm{pa}^B(i)$. For k ∈ patr(i) observe from the definition of the transitive reduction that [k → i] is the only and, thus, max-weighted path from k to i. Hence, $\mathrm{pa}^{tr}(i) \subseteq \mathrm{an}^{\mathrm{pa}(i)}_{low}(i)$. For $k \in \mathrm{pa}^B(i)\setminus \mathrm{pa}^{tr}(i)$ we have by the definition of $D^B$ and (2.6) that $k \notin \mathrm{an}^{\mathrm{pa}(i)\setminus\{k\}}_{mw}(i)$. Thus the only max-weighted path from k to i is the edge [k → i] and, therefore, $k \in \mathrm{an}^{\mathrm{pa}(i)}_{low}(i)$. For $k \in \mathrm{pa}(i)\setminus \mathrm{pa}^B(i)$, however, we know that $k \in \mathrm{an}^{\mathrm{pa}(i)\setminus\{k\}}_{mw}(i)$; i.e., there exists a max-weighted path from k to i containing nodes of pa(i) ∖ {i, k} and, hence, $k \in \mathrm{pa}(i)\setminus \mathrm{an}^{\mathrm{pa}(i)}_{low}(i)$. Altogether, we have $\mathrm{an}^{\mathrm{pa}(i)}_{low}(i) = \mathrm{pa}^B(i)$.

Since every path from some j ∈ an(i) to i contains a node of pa(i), it holds that $\mathrm{an}^{\mathrm{pa}(i)}_{mw}(i) = \mathrm{an}(i)$; i.e., $\mathrm{An}^{\mathrm{pa}(i)}_{nmw}(i) = \mathrm{An}(i)\setminus \mathrm{an}^{\mathrm{pa}(i)}_{mw}(i) = \{i\}$. Furthermore, recall from Theorem 3.5 that for $k \in \mathrm{pa}^B(i)$, $c^i_k = b_{ki}/b_{kk}$. Then, by Theorem 4.7, we obtain (4.9).

Remark 4.9. Since (4.9) is the representation of Xi as a function of its parents pa(i) with minimal number of nodes in pa(i), we recover again $D^B$ as the DAG with minimal number of edges corresponding to the SEM X. ◻

The following example illustrates Theorem 4.7.

Example 4.10. [Continuation of Examples 2.1, 2.8 and 4.5: minimal representation by U] Consider U = {1, 2} and i = 4. Obviously, we have $1, 2 \in \mathrm{an}^U_{mw}(4)$. Since there is no path from 3 to 4 which contains any node of U, we have $\mathrm{an}^U_{mw}(4) = \{1, 2\}$ and, hence, $\mathrm{An}^U_{nmw}(4) = \mathrm{An}(4)\setminus \mathrm{an}^U_{mw}(4) = \{3, 4\}$. In Example 4.5 we have already determined the set $\mathrm{an}^U_{low}(4)$ depending on the ML coefficient matrix B = (bij)4×4. Thus we obtain the two cases:
(1) If $b_{14} = \frac{b_{12} b_{24}}{b_{22}} \ge \frac{b_{13} b_{34}}{b_{33}}$, then $X_4 = \frac{b_{24}}{b_{22}} X_2 \vee b_{34} Z_3 \vee b_{44} Z_4$.
(2) If $b_{14} = \frac{b_{13} b_{34}}{b_{33}} > \frac{b_{12} b_{24}}{b_{22}}$, then $X_4 = \frac{b_{14}}{b_{11}} X_1 \vee \frac{b_{24}}{b_{22}} X_2 \vee b_{34} Z_3 \vee b_{44} Z_4$.

For U = {2} and i = 4, we have $\mathrm{an}^U_{low}(4) = \{2\}$, since U ∖ {2, 4} = ∅. Obviously, we have $2 \in \mathrm{an}^U_{mw}(4)$. Since there is no path from 3 to 4 which contains 2, we have that $3, 4 \in \mathrm{An}^U_{nmw}(4)$. It remains to discuss node 1, which gives rise to the same two cases as above:
(1) If $b_{14} = \frac{b_{12} b_{24}}{b_{22}} \ge \frac{b_{13} b_{34}}{b_{33}}$, then $1 \in \mathrm{an}^U_{mw}(4)$ and, consequently, $\mathrm{An}^U_{nmw}(4) = \{3, 4\}$. Then $X_4 = \frac{b_{24}}{b_{22}} X_2 \vee b_{34} Z_3 \vee b_{44} Z_4$.
(2) If $b_{14} = \frac{b_{13} b_{34}}{b_{33}} > \frac{b_{12} b_{24}}{b_{22}}$, then $1 \notin \mathrm{an}^U_{mw}(4)$ and, consequently, $\mathrm{An}^U_{nmw}(4) = \{1, 3, 4\}$. Then $X_4 = \frac{b_{24}}{b_{22}} X_2 \vee b_{14} Z_1 \vee b_{34} Z_3 \vee b_{44} Z_4$. ◻
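The representations of X4 in Example 4.10 can be verified on simulated data. The sketch below is illustrative only and rests on assumptions: the DAG is taken to have the edges 1 → 2, 1 → 3, 2 → 4, 3 → 4 (inferred from the paths used in Example 4.5), all noise weights are 1 so that b_ii = 1 and the edge weights coincide with the corresponding ML coefficients, and the weights and noise values are hypothetical.

```python
import random
from fractions import Fraction as F

def simulate(c12, c13, c24, c34, z):
    """Recursive SEM on the assumed DAG 1->2, 1->3, 2->4, 3->4 with noise weights 1."""
    x1 = z[1]
    x2 = max(c12 * x1, z[2])
    x3 = max(c13 * x1, z[3])
    x4 = max(c24 * x2, c34 * x3, z[4])
    return {1: x1, 2: x2, 3: x3, 4: x4}

rng = random.Random(1)
draws = [{i: F(rng.randint(1, 1000), 100) for i in range(1, 5)} for _ in range(200)]

# Case (1): b14 = b12 b24 / b22 >= b13 b34 / b33.
w1 = (F(1), F(1, 2), F(1), F(1))           # (c12, c13, c24, c34)
case1_ok = True
for z in draws:
    x = simulate(*w1, z)
    # X4 = (b24 / b22) X2  v  b34 Z3  v  b44 Z4, for U = {1, 2} and for U = {2}
    case1_ok &= x[4] == max(w1[2] * x[2], w1[3] * z[3], z[4])

# Case (2): b14 = b13 b34 / b33 > b12 b24 / b22.
w2 = (F(1, 2), F(1), F(1), F(1))
b14 = max(w2[0] * w2[2], w2[1] * w2[3])
case2_ok_U12 = True
case2_ok_U2 = True
for z in draws:
    x = simulate(*w2, z)
    # U = {1, 2}: X4 = (b14 / b11) X1  v  (b24 / b22) X2  v  b34 Z3  v  b44 Z4
    case2_ok_U12 &= x[4] == max(b14 * x[1], w2[2] * x[2], w2[3] * z[3], z[4])
    # U = {2}:    X4 = (b24 / b22) X2  v  b14 Z1  v  b34 Z3  v  b44 Z4
    case2_ok_U2 &= x[4] == max(w2[2] * x[2], b14 * z[1], w2[3] * z[3], z[4])
```

The check confirms the identities on the sampled realizations only; minimality of the representation is a statement of Theorem 4.7 and is not tested here.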


5. Order and ties between components of a max-linear graphical model

Throughout this section (D, L(X)) is a ML graphical model with ML coefficient matrix B = (bij)d×d. Denote again by (Ω, A, P) the probability space on which X is defined. Simply by definition of the ML graphical model the order between the weights in (1.3), the max-linear coefficients in (2.3), and also the components of X plays an important role. From Lemma 4.1, for instance, we know that there is a natural order between the (appropriately scaled) components of X, which holds on the whole of Ω, and Theorem 2.11 shows conditions and properties for two random variables being equal on a subset of Ω. In this section we take this a step further. For a given subset O ⊆ V we investigate the information which can be drawn from the fact that on a specific subset Ωo of Ω some of the components of XO = (Xi, i ∈ O) may be equal, or one component may be strictly larger or smaller than another. We characterize this subset Ωo and recover on Ωo a reduced model, which may be easier to analyze. When we think of the nodes O as being observed in a statistical experiment, then this section provides probabilistic tools for predicting some part of the DAG which is not observed. Nodes in O naturally lead to information within all its ancestors An(O), but even beyond certain conclusions are still possible.

5.1. Subgraphs of D induced by orders between nodes in O ⊆ V

We start by applying (4.2) to obtain for random variables in An(O) an upper bound based on the nodes in O:
\[
X_j \le \bigwedge_{i\in \mathrm{De}(j)\cap O} \frac{b_{jj}}{b_{ji}} X_i =: X^u_j, \quad j \in \mathrm{An}(O). \tag{5.1}
\]
Note from (4.2) and the definition of $X^u_j$ that
\[
X^u_j = X_j \ \text{ for all } j \in O \quad \text{and} \quad X^u_j \le \frac{b_{jj}}{b_{ji}} X_i \ \text{ for all } j \in \mathrm{An}(O) \text{ and } i \in \mathrm{De}(j)\cap O. \tag{5.2}
\]

A similar concept has been used for conditional simulation of arbitrary max-linear models in [12]. This subsection is related to their important notion of hitting scenarios. For j ∈ An(O) the upper bound $X^u_j$ is not uniquely given by one node, but may be realized by several nodes. Obviously, the order between the components of XO has a direct influence on which random variables of O realize the upper bound $X^u_j$. Different realizations of XO may correspond to different upper bounds in (5.1) and in particular to different nodes in O attaining the bounds. We call Ωo ⊆ Ω order set if for all j ∈ An(O) the upper bound $X^u_j$ is realized by the same nodes of O on the whole of Ωo. The following provides a formal definition of an order set as well as the definition of an important subgraph corresponding to Ωo.

Definition 5.1.
(a) Let Ωo ⊆ Ω. If for all j ∈ An(O) and i ∈ de(j) ∩ O it holds on Ωo that either $X^u_j = \frac{b_{jj}}{b_{ji}} X_i$ or $X^u_j < \frac{b_{jj}}{b_{ji}} X_i$, and Ωo is the largest subset with this property, then we call Ωo order set.
(b) Let Ωo be some order set as defined in (a). Then the corresponding order DAG Do is the subgraph of D with node set An(O) containing only those paths from D which are max-weighted and have start and end node j ∈ An(O) and i ∈ O, respectively, satisfying $X^u_j = \frac{b_{jj}}{b_{ji}} X_i$ on Ωo. ◻

Whereas the node set An(O) is given naturally by O, there is some freedom in choosing the edges of Do. Instead of keeping all paths from nodes j ∈ An(O) and i ∈ O satisfying $X^u_j = \frac{b_{jj}}{b_{ji}} X_i$ on Ωo, we only keep the edges on max-weighted paths from j to i, since they result in a subgraph of D containing only the relevant edges. This kind of construction will also allow us to find a simple characterization of the nodes which we can predict P-a.s. (cf. Section 5.2). Note that, since the set V of nodes is finite, there exist only finitely many different order sets and, hence, also only finitely many different order DAGs.


Remark 5.2. Here and in what follows all node sets related to an order DAG Do are labelled by o, for instance, for i ∈ An(O) we denote by ano(i), pao(i), and deo(i) the sets of the ancestors, parents, and descendants of i with respect to Do and, again, Ano(i) = ano(i) ∪ {i}. All other node sets related to some graphical concept, which are not labelled by o, are defined with respect to the original DAG D. ◻

The following example illustrates the order DAGs implied by partial order information.

Example 5.3. Consider the ML graphical model (D, L(X1, . . . , X7)) with DAG D = ({1, 2, 3, 4, 5, 6, 7}, {(1, 2), (2, 3), (2, 4), (2, 5), (3, 5), (4, 6), (5, 7), (6, 7)})

[Figure: the DAG D on the nodes 1, . . . , 7.]

and ML coefficient matrix B = (bij)7×7. Let O = {4, 5}; hence, An(O) = {1, 2, 3, 4, 5}. Then the upper bounds in (5.1) are given by
\[
X^u_1 = \frac{b_{11}}{b_{14}} X_4 \wedge \frac{b_{11}}{b_{15}} X_5, \quad
X^u_2 = \frac{b_{22}}{b_{24}} X_4 \wedge \frac{b_{22}}{b_{25}} X_5, \quad
X^u_3 = \frac{b_{33}}{b_{35}} X_5, \quad
X^u_4 = X_4, \quad
X^u_5 = X_5.
\]

The three different scenarios $X^u_1 = \frac{b_{11}}{b_{14}} X_4 = \frac{b_{11}}{b_{15}} X_5$ or $X^u_1 = \frac{b_{11}}{b_{14}} X_4 < \frac{b_{11}}{b_{15}} X_5$ or $X^u_1 = \frac{b_{11}}{b_{15}} X_5 < \frac{b_{11}}{b_{14}} X_4$ yield the analogous situations for $X^u_2$ and vice versa, which results in the three different order sets:

Ωo1 = {ω ∈ Ω ∶ b15 X4 (ω) = b14 X5 (ω)} = {ω ∈ Ω ∶ b25 X4 (ω) = b24 X5 (ω)},

Ωo2 = {ω ∈ Ω ∶ b15 X4 (ω) < b14 X5 (ω)} = {ω ∈ Ω ∶ b25 X4 (ω) < b24 X5 (ω)},

Ωo3 = {ω ∈ Ω ∶ b15 X4 (ω) > b14 X5 (ω)} = {ω ∈ Ω ∶ b25 X4 (ω) > b24 X5 (ω)}.

Next, we find the order DAGs from Definition 5.1(b) by identifying the max-weighted paths from j ∈ An(O) to all i ∈ de(j) ∩ O which realize the bounds $X^u_j$. The edge [3 → 5] is the only path from 3 to 5 and, hence, max-weighted. Observe that the sets of paths from 1 to 5 and from 2 to 5 are given by P15 = {[1 → 2 → 3 → 5], [1 → 2 → 5]} and P25 = {[2 → 3 → 5], [2 → 5]}. We assume that the DAG D is minimal max-linear; i.e., $D = D^B$. Since by Theorem 3.5, $b_{25} > \frac{b_{23} b_{35}}{b_{33}}$, the path [2 → 3 → 5] is not max-weighted, and the only max-weighted path from 2 to 5 is [2 → 5]. Consequently, the only max-weighted path from 1 to 5 is [1 → 2 → 5]. Thus, corresponding to the realized bounds $X^u_1$, $X^u_2$, we obtain the following three possible order DAGs $D^o_1$, $D^o_2$, and $D^o_3$ corresponding to the situations on Ωo1, Ωo2, and Ωo3 from left to right:

[Figure: the three order DAGs $D^o_1$, $D^o_2$, and $D^o_3$, each on the node set {1, 2, 3, 4, 5}.] ◻
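The upper bounds (5.1) and the three order sets of Example 5.3 can be computed for concrete realizations. The following is a minimal sketch under assumptions: all edge weights are set to the hypothetical value 1/2 and all noise weights to 1, b_ji is taken to be the largest path weight from j to i, and the noise values are hand-picked. The helper names are illustrative.

```python
from fractions import Fraction as F

D_NODES = 7
EDGES = {(1, 2), (2, 3), (2, 4), (2, 5), (3, 5), (4, 6), (5, 7), (6, 7)}
C = {e: F(1, 2) for e in EDGES}                       # hypothetical edge weights
C.update({(i, i): F(1) for i in range(1, 8)})         # hypothetical noise weights
O = (4, 5)

def ml_coefficients(d, edges, c):
    """b_ji as the largest path weight from j to i (nodes in topological order)."""
    b = {(j, i): F(0) for j in range(1, d + 1) for i in range(1, d + 1)}
    for i in range(1, d + 1):
        b[(i, i)] = c[(i, i)]
        for j in range(1, i):
            b[(j, i)] = max((c[(k, i)] * b[(j, k)]
                             for k in range(1, i) if (k, i) in edges),
                            default=F(0))
    return b

def simulate(d, edges, c, z):
    """Recursive max-linear SEM: X_i = max_k c_ki X_k  v  c_ii Z_i."""
    x = {}
    for i in range(1, d + 1):
        x[i] = max([c[(k, i)] * x[k] for k in range(1, i) if (k, i) in edges]
                   + [c[(i, i)] * z[i]])
    return x

def upper_bounds(b, x, observed, d):
    """X^u_j from (5.1) for every node j in An(O)."""
    xu = {}
    for j in range(1, d + 1):
        terms = [b[(j, j)] / b[(j, i)] * x[i] for i in observed if b[(j, i)] > 0]
        if terms:
            xu[j] = min(terms)
    return xu

def order_set(b, x):
    """Index 1, 2 or 3 of the order set of Example 5.3 containing the realization."""
    lhs, rhs = b[(1, 5)] * x[4], b[(1, 4)] * x[5]
    return 1 if lhs == rhs else (2 if lhs < rhs else 3)

B = ml_coefficients(D_NODES, EDGES, C)
base = {i: F(1) for i in range(1, 8)}
xa = simulate(D_NODES, EDGES, C, {**base, 1: F(8)})              # Z1 dominates
xb = simulate(D_NODES, EDGES, C, {**base, 1: F(8), 4: F(10)})    # large Z4
xc = simulate(D_NODES, EDGES, C, {**base, 1: F(8), 5: F(10)})    # large Z5
labels = [order_set(B, x) for x in (xa, xb, xc)]
```

With these weights the edge 2 → 5 is strictly heavier than the path 2 → 3 → 5, in line with the minimality assumption made in the example.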

Some properties of an order DAG will be needed later on.

Lemma 5.4. Let Ωo be some order set with order DAG Do. Then
(a) all paths in Do are max-weighted in D, and
(b) for i ∈ O, j ∈ Ano(i) if and only if $X^u_j = \frac{b_{jj}}{b_{ji}} X_i$ on Ωo.


Proof. (a) The statement follows directly from the construction of Do and Remark 2.7(iii).
(b) Note first from the definition of Do that, if $X^u_j = \frac{b_{jj}}{b_{ji}} X_i$ on Ωo, then j ∈ Ano(i). In what follows all random variables are considered on Ωo. Assume there exists some j ∈ Ano(i) such that $X^u_j \ne \frac{b_{jj}}{b_{ji}} X_i$. By the definition of Do, however, there exists some k ∈ An(i) such that $X^u_k = \frac{b_{kk}}{b_{ki}} X_i$ and there is a max-weighted path from k to i in D which contains j; i.e., $k \in \mathrm{an}^{\{j\}}_{mw}(i)$. Let l ∈ De(j) ∩ O; hence, l ∈ De(k) ∩ O. Furthermore, observe from (2.6) and (2.7) that, since either $k \in \mathrm{an}^{\{j\}}_{mw}(l)$ or $k \in \mathrm{an}^{\{j\}}_{nmw}(l)$, $b_{kl} \ge \frac{b_{kj} b_{jl}}{b_{jj}}$. Thus altogether, also using (5.2) and (2.6), we obtain
\[
\frac{b_{jj}}{b_{jl}} X_l \ge \frac{b_{kj}}{b_{kl}} X_l \ge \frac{b_{kj}}{b_{kk}} X^u_k = \frac{b_{kj}}{b_{ki}} X_i = \frac{b_{kj} b_{jj}}{b_{kj} b_{ji}} X_i = \frac{b_{jj}}{b_{ji}} X_i.
\]
Consequently, for all l ∈ De(j) ∩ O, $\frac{b_{jj}}{b_{ji}} X_i \le \frac{b_{jj}}{b_{jl}} X_l$ and, hence, $X^u_j = \bigwedge_{l\in \mathrm{De}(j)\cap O} \frac{b_{jj}}{b_{jl}} X_l = \frac{b_{jj}}{b_{ji}} X_i$, which is a contradiction to $X^u_j \ne \frac{b_{jj}}{b_{ji}} X_i$.

Definition 5.5. Let Do be some order DAG. Define $X^o_{\mathrm{An}(O)}$ as the random vector given by
\[
X^o_i = \bigvee_{k\in \mathrm{pa}^o(i)} c^i_k X^o_k \vee c^i_i Z_i, \quad i \in \mathrm{An}(O),
\]
with the same weights $c^i_k$ and noise variables Zi as in the representation (1.3) of X. We call the resulting ML graphical model $(D^o, L(X^o_{\mathrm{An}(O)}))$ ML order graphical model. ◻

Recall that $X^o_{\mathrm{An}(O)}$ has a max-linear representation as in (2.3). The following is a consequence of Lemma 5.4(a) and Definition 2.6 and shows how the max-linear coefficients of X are related to those of $X^o_{\mathrm{An}(O)}$.

Lemma 5.6. Let Ωo be some order set and $(D^o, L(X^o_{\mathrm{An}(O)}))$ be the corresponding ML order graphical model. Denote by $B^o = (b^o_{ij})_{|\mathrm{An}(O)|\times|\mathrm{An}(O)|}$ the ML coefficient matrix of $X^o_{\mathrm{An}(O)}$. Then for all i ∈ An(O) and j ∈ Ano(i), $b^o_{ji} = b_{ji}$; i.e.,
\[
X^o_i = \bigvee_{j\in \mathrm{An}^o(i)} b^o_{ji} Z_j = \bigvee_{j\in \mathrm{An}^o(i)} b_{ji} Z_j, \quad i \in \mathrm{An}(O). \tag{5.3}
\]

It is remarkable that XO and $X^o_O$ coincide on Ωo.

Proposition 5.7. Let Ωo be some order set and $(D^o, L(X^o_{\mathrm{An}(O)}))$ be the corresponding ML order graphical model. Then for i ∈ O we have on Ωo that $X_i = X^o_i$.

Proof. In what follows all random variables are considered on Ωo. Due to (5.3) it suffices to show that $X_i = \bigvee_{j\in \mathrm{An}(i)} b_{ji} Z_j = \bigvee_{j\in \mathrm{An}^o(i)} b_{ji} Z_j$. Let k ∈ an(i) ∖ ano(i). By Lemma 5.4(b) and (5.2) we have $X^u_k < \frac{b_{kk}}{b_{ki}} X_i$. Thus, also using (5.1), we obtain
\[
\frac{b_{kk}}{b_{ki}} X_i > X^u_k \ge X_k = \bigvee_{j\in \mathrm{An}(k)} b_{jk} Z_j \ge b_{kk} Z_k.
\]
Consequently, we have for all k ∈ an(i) ∖ ano(i), $b_{ki} Z_k < X_i = \bigvee_{j\in \mathrm{An}(i)} b_{ji} Z_j$ and, hence, $X_i = \bigvee_{j\in \mathrm{An}(i)} b_{ji} Z_j = \bigvee_{j\in \mathrm{An}^o(i)} b_{ji} Z_j$.

We know from Theorem 2.2 that $X_i = \bigvee_{j\in \mathrm{An}(i)} b_{ji} Z_j$ holds for all i ∈ V. Proposition 5.7 implies that, on some order set Ωo, for all i ∈ O the maximum value of {bji Zj, j ∈ An(i)} can only be achieved for some j ∈ Ano(i); i.e., we have for the node j(i) as in (2.8) that j(i) ∈ Ano(i). Observe from the Definition 5.1(b) of Do and also from the proof of Proposition 5.7 that this model reduction originates in the upper bounds in (5.1). Now assume that there are two distinct nodes i1, i2 ∈ O such that Ano(i1) ∩ Ano(i2) ≠ ∅. So take some j ∈ Ano(i1) ∩ Ano(i2). By Lemma 5.4(b) we then have on Ωo that $X^u_j = \frac{b_{jj}}{b_{j i_1}} X_{i_1} = \frac{b_{jj}}{b_{j i_2}} X_{i_2}$ and, consequently, by Theorem 2.11(b) that, P-a.s. on Ωo, j(i1) = j(i2) ∈ Ano(i1) ∩ Ano(i2). This shows that we may reduce the max-linear representation of XO even further. In the following we therefore investigate these upper bounds more closely and extend Theorem 2.11(b) to more than two nodes. To this end, we first introduce some graphical concepts.


Definition 5.8. Let D = (V, E) be a DAG.
(a) A DAG is called weakly connected if replacing all of its directed edges with undirected edges produces a connected graph; i.e., there is an (undirected) path between every pair of nodes. Moreover, a weakly connected component is a maximal weakly connected subgraph of a DAG.
(b) Let A ⊆ V. A node in $\bigcap_{i\in A} \mathrm{An}(i)$ is a common ancestor of A, and we define $\mathrm{cAn}(A) := \bigcap_{i\in A} \mathrm{An}(i)$. A node j ∈ cAn(A) is a lowest common ancestor of A if there is no node l ∈ cAn(A) such that there exists a path from j to l. We denote the corresponding set by lcAn(A). ◻

Assume that the order DAG Do has T ∈ N weakly connected components, whose corresponding node sets we denote by C1, . . . , CT. They provide a partition of An(O) and O by
\[
\mathrm{An}(O) = \bigcup_{t=1}^{T} C_t \quad \text{and} \quad O = \bigcup_{t=1}^{T} (C_t \cap O) =: \bigcup_{t=1}^{T} O_t. \tag{5.4}
\]

The following gives now a minimal max-linear representation of XO on Ωo with respect to the number of noise variables.

Theorem 5.9. Let Ωo be some order set and $(D^o, L(X^o_{\mathrm{An}(O)}))$ the corresponding ML order graphical model. Assume that the weakly connected components of Do have node sets C1, . . . , CT, and recall (5.4). Then the following holds P-a.s. on Ωo for every t ∈ {1, . . . , T}:
(a) For all i ∈ Ot the node j(i) as in (2.8) is the same unique node and belongs to cAno(Ot).
(b) For every i ∈ Ot we have
\[
X_i = \bigvee_{j\in \mathrm{cAn}^o(O_t)} b_{ji} Z_j = \bigvee_{k\in \mathrm{lcAn}^o(O_t)} \frac{b_{ki}}{b_{kk}} X^o_k. \tag{5.5}
\]

(c) The number of noise variables Zj in representation (5.5) is minimal. In particular, the maximum occurs in every Zj for j ∈ cAno(Ot) with positive probability.

Proof. Assume wlog that P(Ωo) > 0. Restrict all random variables to Ωo, and let t ∈ {1, . . . , T} be fixed. First, suppose that |Ot| = 1; i.e., Ot = {i} for some i ∈ V. Observe from Definition 5.8(b) that cAno(Ot) = cAno({i}) = Ano(i) and, since for every j ∈ ano(i) there exists a path from j to i in Do, that lcAno(Ot) = lcAno({i}) = {i}. Thus (5.5) follows directly from (5.3) and Proposition 5.7. From (5.5) and Theorem 2.11(a) we get (a). Now suppose that |Ot| ≥ 2, and let i1, i2 ∈ Ot and i1 ≠ i2. By Definition 5.1(b), as i1 and i2 are in the same weakly connected component of Do, there must exist nodes
\[
i_1 = l_0, l_1, \dots, l_m = i_2 \in O_t \quad \text{and} \quad j_s \in \mathrm{An}^o(l_{s-1}) \cap \mathrm{An}^o(l_s), \quad s = 1, \dots, m,
\]
such that
\[
X^u_{j_s} = \frac{b_{j_s j_s}}{b_{j_s l_{s-1}}} X_{l_{s-1}} = \frac{b_{j_s j_s}}{b_{j_s l_s}} X_{l_s}, \quad s = 1, \dots, m.
\]
First, assume that m = 1; i.e.,
\[
X^u_j = \frac{b_{jj}}{b_{j i_1}} X_{i_1} = \frac{b_{jj}}{b_{j i_2}} X_{i_2} \quad \text{for some } j \in \mathrm{An}^o(i_1) \cap \mathrm{An}^o(i_2).
\]
By Theorem 2.11(b) we have P-a.s. on Ωo that j(i1) = j(i2). Furthermore, we know from (5.3) and Proposition 5.7 that j(i1) ∈ Ano(i1) and j(i2) ∈ Ano(i2). Hence, P-a.s. on Ωo, j(i1) = j(i2) ∈ cAno({i1, i2}) = Ano(i1) ∩ Ano(i2). For m > 1 we may conclude from above that, P-a.s. on Ωo,
\[
j(i_1) = j(l_0) = j(l_1) = \dots = j(l_m) = j(i_2) \in \bigcap_{s\in\{1,\dots,m\}} \mathrm{An}^o(l_s),
\]


since any finite union of null sets is a null set. This argument applies to all different pairs of nodes in Ot such that j(i) is the same node for all i ∈ Ot , and, hence, by (5.3) and Proposition 5.7, j(i) ∈ ⋂i∈Ot Ano (i) = cAno (Ot ). Its uniqueness is already given in Theorem 2.11(a). Hence, we have proved (a). The first equality of (5.5) follows from (a), and also (c) follows immediately from the above construction. Thus it remains to show that the second equality of (5.5) holds. Observe from the Definition 5.8(b) of the lowest common ancestors that cAno (Ot ) =



k∈lcAno (Ot )

Ano (k).

We therefore obtain for i ∈ Ot that, P−a.s. on Ωo , ⋁

Xi =



bji Zj =

j∈cAno (Ot )



bji Zj .

k∈lcAno (Ot ) j∈Ano (k)

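Now let $k \in \mathrm{lcAn}^o(O_t)$ and $j \in \mathrm{an}^o(k)$. Observe from Lemma 5.4(a) that in $D$ there exists a max-weighted path from $j$ to $i$ containing $k$; hence, $j \in \mathrm{an}^{\{k\}}_{\mathrm{mw}}(i)$. Thus by (2.6) and (5.3) we have $P$-a.s. on $\Omega^o$,
\[
X_i = \bigvee_{k \in \mathrm{lcAn}^o(O_t)} \Big( \bigvee_{j \in \mathrm{an}^o(k)} \frac{b_{jk} b_{ki}}{b_{kk}} Z_j \vee b_{ki} Z_k \Big)
= \bigvee_{k \in \mathrm{lcAn}^o(O_t)} \frac{b_{ki}}{b_{kk}} \Big( \bigvee_{j \in \mathrm{An}^o(k)} b_{jk} Z_j \Big)
= \bigvee_{k \in \mathrm{lcAn}^o(O_t)} \frac{b_{ki}}{b_{kk}} X^o_k .
\]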

5.2. Almost sure prediction

Let $\Omega^o$ be some order set with order DAG $D^o$. From Theorem 5.9(a) we know that for every node $i \in O$ the maximum value over all $\{b_{ji} Z_j,\ j \in \mathrm{An}(i)\}$ can $P$-a.s. on $\Omega^o$ only be achieved for those $j$ which are in $D^o$ common ancestors of all nodes of $O$ contained in the same weakly connected component; i.e., $j(i) \in \mathrm{cAn}^o(O_t)$. In the following, we investigate the information which can be drawn from this fact for nodes outside $O$, i.e. for nodes of $O^c = V \setminus O$. As a motivation we present an example.

Example 5.10. Consider the ML graphical model $(D, L(X_1, X_2, X_3, X_4))$ with DAG
\[
D = (\{1, 2, 3, 4\}, \{(1, 2), (1, 3), (3, 4)\})
\]
and ML coefficient matrix $B = (b_{ij})_{4 \times 4}$. Let $O = \{2, 4\}$. One finds that the order DAG corresponding to the order set $\Omega^o = \{\omega \in \Omega : b_{14} X_2(\omega) = b_{12} X_4(\omega)\}$ is given by $D^o = D$. The following holds $P$-a.s. on $\Omega^o$. Theorem 5.9(b) yields that
\[
X_4 = \bigvee_{k \in \mathrm{lcAn}^o(\{2,4\})} \frac{b_{k4}}{b_{kk}} X_k = \frac{b_{14}}{b_{11}} X_1 .
\]
Thus, using (3.5), we obtain
\[
\frac{b_{33}}{b_{34}} X_4 = \frac{b_{13}}{b_{11}} X_1 .
\]
Furthermore, we know from (4.2) that
\[
\frac{b_{13}}{b_{11}} X_1 \le X_3 \le \frac{b_{33}}{b_{34}} X_4 .
\]
Since the lower and upper bounds are equal, $X_3 = \frac{b_{13}}{b_{11}} X_1 = \frac{b_{33}}{b_{34}} X_4$ must hold. Altogether, we obtain
\[
X_1 = \frac{b_{11}}{b_{14}} X_4 = \frac{b_{11}}{b_{12}} X_2 \qquad \text{and} \qquad X_3 = \frac{b_{33}}{b_{34}} X_4 .
\]

We denote this extension of $O$ by $\overline{O} = \{1, 2, 3, 4\}$. We formulate a general result on nodes outside $O$ leading to this extension of $O$.
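The conclusion of Example 5.10 can be checked by simulation. The following is a minimal sketch, not part of the paper: it assumes the recursive form $X_i = \bigvee_{k \in \mathrm{pa}(i)} c_{ki} X_k \vee c_{ii} Z_i$ with ML coefficients given by path products, and it uses hypothetical weights that are powers of two, so that all floating-point products are exact and the equality defining the order set can be tested with `==`. The exponential noise is likewise an arbitrary choice.

```python
import random

# Hypothetical edge weights c[(k, i)] for k -> i and noise weights c[(i, i)].
# Powers of two keep every floating-point product exact.
c = {(1, 1): 1.0, (2, 2): 1.0, (3, 3): 1.0, (4, 4): 1.0,
     (1, 2): 0.5, (1, 3): 2.0, (3, 4): 0.5}

# ML coefficients along the unique paths of D (assumed path-product form).
b11, b33 = c[1, 1], c[3, 3]
b12 = c[1, 1] * c[1, 2]
b13 = c[1, 1] * c[1, 3]
b14 = c[1, 1] * c[1, 3] * c[3, 4]
b34 = c[3, 3] * c[3, 4]


def sample(rng):
    """Draw one realisation of (X1, X2, X3, X4) on the DAG 1->2, 1->3, 3->4."""
    z = {i: rng.expovariate(1.0) for i in (1, 2, 3, 4)}
    x1 = c[1, 1] * z[1]
    x2 = max(c[1, 2] * x1, c[2, 2] * z[2])
    x3 = max(c[1, 3] * x1, c[3, 3] * z[3])
    x4 = max(c[3, 4] * x3, c[4, 4] * z[4])
    return x1, x2, x3, x4


def check_example(n=10000, seed=0):
    """Count samples in the order set and test the predicted identities there."""
    rng = random.Random(seed)
    hits, ok = 0, True
    for _ in range(n):
        x1, x2, x3, x4 = sample(rng)
        if b14 * x2 == b12 * x4:  # the sample lies in the order set
            hits += 1
            ok = ok and x3 == (b33 / b34) * x4 and x1 == (b11 / b12) * x2
    return hits, ok
```

Calling `check_example()` returns the number of samples that fell into the order set together with a flag saying whether $X_3 = \frac{b_{33}}{b_{34}} X_4$ and $X_1 = \frac{b_{11}}{b_{12}} X_2$ held on all of them.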




Theorem 5.11. Assume the same conditions as in Theorem 5.9 hold. For some $t \in \{1, \dots, T\}$ let $l \in C_t \setminus O$ such that $\mathrm{cAn}^o(O_t) \subseteq \mathrm{An}^o(l)$. Then the following holds $P$-a.s. on $\Omega^o$:
(a) The node $j(l)$ as in (2.8) is unique and $j(l) = j(i)$ for all $i \in O_t$. Moreover, $j(l) \in \mathrm{cAn}^o(O_t)$.
(b) We have
\[
X_l = X^o_l = X^u_l = \bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{jl} Z_j = \bigvee_{k \in \mathrm{lcAn}^o(O_t)} \frac{b_{kl}}{b_{kk}} X^o_k . \tag{5.6}
\]

Proof. Assume wlog that $P(\Omega^o) > 0$. The following holds $P$-a.s. on $\Omega^o$. From Definition 5.1(b) we know that all terminal nodes of $D^o$ belong to $O$. Thus, we have that $\mathrm{de}^o(l) \cap O_t \neq \emptyset$. So take some $i \in \mathrm{de}^o(l) \cap O_t$. From Lemma 5.4(b), as $l \in \mathrm{An}^o(i)$, and (5.5) we obtain
\[
X^u_l = \frac{b_{ll}}{b_{li}} X_i = \bigvee_{j \in \mathrm{cAn}^o(O_t)} \frac{b_{ll} b_{ji}}{b_{li}} Z_j = \bigvee_{k \in \mathrm{lcAn}^o(O_t)} \frac{b_{ll} b_{ki}}{b_{li} b_{kk}} X^o_k .
\]
Let $j \in \mathrm{cAn}^o(O_t)$. Since $\mathrm{cAn}^o(O_t) \subseteq \mathrm{An}^o(l)$, there exists a path in $D^o$ from $j$ to $i$ containing node $l$. Hence, by Lemma 5.4(a) there is also a max-weighted path in $D$ from $j$ to $i$ containing $l$; i.e., $j \in \mathrm{an}^{\{l\}}_{\mathrm{mw}}(i)$. Thus, as $\mathrm{lcAn}^o(O_t) \subseteq \mathrm{cAn}^o(O_t)$, we obtain with (2.6),
\[
X^u_l = \bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{jl} Z_j = \bigvee_{k \in \mathrm{lcAn}^o(O_t)} \frac{b_{kl}}{b_{kk}} X^o_k .
\]
Furthermore, using first (5.1), then $\mathrm{An}^o(l) \subseteq \mathrm{An}(l)$ as well as (5.3), and finally $\mathrm{cAn}^o(O_t) \subseteq \mathrm{An}^o(l)$ yields
\[
X^u_l \ge X_l = \bigvee_{j \in \mathrm{An}(l)} b_{jl} Z_j \ge \bigvee_{j \in \mathrm{An}^o(l)} b_{jl} Z_j = X^o_l \ge \bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{jl} Z_j = X^u_l .
\]
Since the lower and upper bounds are equal, we have shown (5.6). From above we also see that $X_l = \frac{b_{ll}}{b_{li}} X_i$, which implies by Theorem 2.11(b) that $j(l) = j(i)$. Then by Theorem 5.9(a) we have (a).

Define the set
\[
\tilde{O} := \{ l \in V : X_l = a X_i \ P\text{-a.s. on } \Omega^o \text{ for some } a \in \mathbb{R}_+ \text{ and } i \in O \}.
\]
When we think of the nodes $O$ as being observed in a statistical experiment, then $\tilde{O}$ contains exactly those nodes which we can predict, at least $P$-a.s. The following shows that $\tilde{O}$ contains exactly the nodes from $O$ and Theorem 5.11.

Theorem 5.12. Assume the same conditions as in Theorem 5.9 hold. Define the extended O-set
\[
\overline{O} := \bigcup_{t = 1, \dots, T} \{ l \in C_t : \mathrm{cAn}^o(O_t) \subseteq \mathrm{An}^o(l) \}. \tag{5.7}
\]
Then
\[
\overline{O} = \tilde{O} \subseteq \mathrm{An}(O). \tag{5.8}
\]
Moreover, the following assertions hold $P$-a.s. on $\Omega^o$ for every $t \in \{1, \dots, T\}$:
(a) For all $i \in \overline{O}_t := C_t \cap \overline{O}$ the node $j(i)$ as in (2.8) is the same unique node and belongs to $\mathrm{cAn}^o(O_t)$.
(b) We have for $i \in \overline{O}_t$,
\[
X_i = X^o_i = X^u_i = \bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{ji} Z_j = \bigvee_{k \in \mathrm{lcAn}^o(O_t)} \frac{b_{ki}}{b_{kk}} X^o_k . \tag{5.9}
\]

Proof. Assume wlog that $P(\Omega^o) > 0$. First, observe that all nodes of $O$ are contained in the sets $\tilde{O}$ and $\overline{O}$. Thus it remains to show that $\overline{O} \setminus O = \tilde{O} \setminus O$.

Let $l \in \overline{O} \setminus O$; i.e., $l \in C_t \setminus O$ for some $t \in \{1, \dots, T\}$ and $\mathrm{cAn}^o(O_t) \subseteq \mathrm{An}^o(l)$. We know from (5.6) and the definition of $X^u_l$ in (5.1) that $P$-a.s. on $\Omega^o$,
\[
X_l = X^u_l = \bigwedge_{i \in \mathrm{De}(l) \cap O} \frac{b_{ll}}{b_{li}} X_i .
\]
Hence, $l \in \tilde{O} \setminus O$.

Let now $l \in (\overline{O} \setminus O)^c$. To show that $l \in (\tilde{O} \setminus O)^c$, by Theorem 2.11(b) it suffices to construct an event $E \subseteq \Omega^o$ with positive probability such that on $E$ we have $j(l) \neq j(i)$ for all $i \in O$. We consider the cases $l \in \mathrm{An}(O)$ and $l \notin \mathrm{An}(O)$ separately.

Let $l \in \mathrm{An}(O)$; i.e., $l \in C_{t_l}$ for some $t_l \in \{1, \dots, T\}$. As $l \in (\overline{O} \setminus O)^c$, we also have $\mathrm{cAn}^o(O_{t_l}) \not\subseteq \mathrm{An}^o(l)$. Recall from Theorem 5.9(a) that for all $t \in \{1, \dots, T\}$ the set $\mathrm{cAn}^o(O_t)$ is non-empty; i.e., there exists some $j_t \in \mathrm{cAn}^o(O_t)$. For $t \in \{1, \dots, T\} \setminus \{t_l\}$ let $j_t \in \mathrm{cAn}^o(O_t) \subseteq C_t$ be some arbitrary but fixed node, and let $j_{t_l} \in \mathrm{cAn}^o(O_{t_l}) \setminus \mathrm{An}^o(l)$, which exists as $\mathrm{cAn}^o(O_{t_l}) \not\subseteq \mathrm{An}^o(l)$. Since the sets $C_1, \dots, C_T$ are disjoint and $j_{t_l} \notin \mathrm{An}^o(l)$, we have $j_t \neq l$ for all $t \in \{1, \dots, T\}$. Observe from Theorem 5.9 that the event
\[
E_1 = \Big\{ \omega \in \Omega : b_{j_t i} Z_{j_t}(\omega) > \bigvee_{j \in \mathrm{An}(i) \setminus \{j_t\}} b_{ji} Z_j(\omega) \ \ \forall t \in \{1, \dots, T\} \text{ and } i \in O_t \Big\}
\]
satisfies $P(E_1) > 0$. Immediately by construction we have $E_1 \subseteq \Omega^o$. Define
\[
E_2 = \Big\{ \omega \in \Omega : b_{ll} Z_l(\omega) > \bigvee_{j \in \mathrm{an}(l)} b_{jl} Z_j(\omega) \Big\}.
\]
On $E_1 \cap E_2$ we have $j(l) = l \neq j(i) = j_t$ for all $t \in \{1, \dots, T\}$ and $i \in O_t$; i.e., $j(l) \neq j(i)$ for all $i \in O$. Thus it remains to show that $P(E_1 \cap E_2) > 0$.

Since the noise variables are continuous with support $\mathbb{R}_+$, $P(E_1 \cap E_2) = 0$ could happen if and only if $E_1$ and $E_2$ are contradictory. From the definitions of $E_1$ and $E_2$ this would be the case if and only if there exist some $t \in \{1, \dots, T\}$ and $i \in O_t$ such that $l \in \mathrm{An}(i) \setminus \{j_t\}$, $j_t \in \mathrm{an}(l)$, and $b_{j_t i} = \frac{b_{j_t l} b_{li}}{b_{ll}}$. This would lead to a contradiction for this constellation of nodes due to the fact that $E_1$ and $E_2$ imply the following two inequalities, respectively,
\[
b_{j_t i} Z_{j_t} > b_{li} Z_l > \frac{b_{j_t l} b_{li}}{b_{ll}} Z_{j_t},
\]
which indeed contradicts $b_{j_t i} = \frac{b_{j_t l} b_{li}}{b_{ll}}$. Thus it remains to show that for all $t \in \{1, \dots, T\}$ and $i \in O_t$ such that $j_t \in \mathrm{an}(l)$ and $l \in \mathrm{An}(i) \setminus \{j_t\}$ we have $b_{j_t i} > \frac{b_{j_t l} b_{li}}{b_{ll}}$ or, equivalently, by (2.7) $j_t \in \mathrm{an}^{\{l\}}_{\mathrm{nmw}}(i)$. We consider the nodes $i \in O_{t_l}$ with this property separately.

First, let $t \in \{1, \dots, T\} \setminus \{t_l\}$, and assume that there exists some $i \in O_t$ such that $j_t \in \mathrm{an}(l)$ and $l \in \mathrm{An}(i) \setminus \{j_t\}$. Hence, there exists at least one path $p$ from $j_t$ to $i$ containing node $l$. Assume that $p$ is max-weighted. As $j_t \in \mathrm{cAn}^o(O_t)$, we have by Lemma 5.4(b) $X^u_{j_t} = \frac{b_{j_t j_t}}{b_{j_t i}} X_i$, and, hence, by the construction of $D^o$ (cf. Definition 5.1(b)) that $l \in C_t$. This is, however, a contradiction to the fact that $l \in C_{t_l}$ and, hence, $p$ is not max-weighted; i.e., $j_t \in \mathrm{an}^{\{l\}}_{\mathrm{nmw}}(i)$.

Assume now that there exists some $i \in O_{t_l}$ such that $j_{t_l} \in \mathrm{an}(l)$ and $l \in \mathrm{An}(i) \setminus \{j_{t_l}\}$. Observe again from Definition 5.1(b), as $j_{t_l} \in \mathrm{an}(l)$ but $j_{t_l} \notin \mathrm{An}^o(l)$, that there exists no max-weighted path from $j_{t_l}$ to $i$ containing $l$; i.e., $j_{t_l} \in \mathrm{an}^{\{l\}}_{\mathrm{nmw}}(i)$. Thus we have shown that for all $l \in (\overline{O} \setminus O)^c \cap \mathrm{An}(O)$, $l \in (\tilde{O} \setminus O)^c$.

Let now $l \in (\overline{O} \setminus O)^c$ with $l \notin \mathrm{An}(O)$. We may proceed similarly as in the previous case. For $t \in \{1, \dots, T\}$ let $j_t \in \mathrm{cAn}^o(O_t)$ be arbitrary but fixed nodes. Since $\bigcup_{t=1}^T \mathrm{cAn}^o(O_t) \subseteq \mathrm{An}(O)$, $j_t \neq l$ for all $t \in \{1, \dots, T\}$. Define the sets $E_1$ and $E_2$ as above, and observe that $E_1$ and $E_2$ are not contradictory, since the existence of $t \in \{1, \dots, T\}$ and $i \in O_t$ such that $j_t \in \mathrm{an}(l)$ and $l \in \mathrm{An}(i) \setminus \{j_t\}$ is a contradiction to the fact that $l \notin \mathrm{An}(O)$.

Thus we have shown that $\overline{O} = \tilde{O}$. Since by (5.4) $\bigcup_{t=1}^T C_t = \mathrm{An}(O)$, we have $\overline{O} \subseteq \mathrm{An}(O)$ and, hence, (5.8). We have already shown the assertions (a) and (b) in (5.2), Proposition 5.7, and Theorems 5.9 and 5.11 (depending on whether $i \in O$ or $i \in \overline{O} \setminus O$).

The following provides a representation of the nodes of O contained in the same weakly connected component, which is needed later on.


Proposition 5.13. Assume the same conditions as in Theorem 5.9 hold, and recall (5.7). For $t = 1, \dots, T$ let $i_t \in O_t$ and $j_t \in \mathrm{cAn}^o(O_t)$ be arbitrary but fixed nodes. Then $P$-a.s. on $\Omega^o$ we have for all $t \in \{1, \dots, T\}$ and $i \in \overline{O}_t = C_t \cap \overline{O}$,
\[
X_i = X^u_i = \frac{b_{j_t i}}{b_{j_t i_t}} X_{i_t} = \frac{b_{j_t i}}{b_{j_t j_t}} X^u_{j_t} . \tag{5.10}
\]

Proof. Assume wlog that $P(\Omega^o) > 0$. The following holds $P$-a.s. on $\Omega^o$. Let $t \in \{1, \dots, T\}$ be fixed. The first equality of (5.10) is already given in (5.9). Recall from Theorem 5.9(a) that $\mathrm{cAn}^o(O_t)$ is non-empty; i.e., there exists some $j_t \in \mathrm{cAn}^o(O_t)$. Furthermore, observe that $O_t \neq \emptyset$, since all terminal nodes of $D^o$ belong to $O$; i.e., there exists some $i_t \in O_t$.

Let $i \in O_t$. By Lemma 5.4(b), as $j_t \in \mathrm{An}^o(i) \cap \mathrm{An}^o(i_t)$, we have $X^u_{j_t} = \frac{b_{j_t j_t}}{b_{j_t i}} X_i = \frac{b_{j_t j_t}}{b_{j_t i_t}} X_{i_t}$, which implies that
\[
X_i = \frac{b_{j_t i}}{b_{j_t i_t}} X_{i_t} = \frac{b_{j_t i}}{b_{j_t j_t}} X^u_{j_t} .
\]
Now let $i \in \overline{O}_t \setminus O_t$. Since all terminal nodes of $D^o$ belong to $O$, there exists some $k \in \mathrm{de}^o(i) \cap O_t$. Thus from Lemma 5.4(b), as $i \in \mathrm{An}^o(k)$, and (5.10), which we have already shown for $k \in O_t$, we obtain
\[
X^u_i = \frac{b_{ii}}{b_{ik}} X_k = \frac{b_{ii}}{b_{ik}} \frac{b_{j_t k}}{b_{j_t i_t}} X_{i_t} = \frac{b_{ii}}{b_{ik}} \frac{b_{j_t k}}{b_{j_t j_t}} X^u_{j_t} .
\]
Now observe from Lemma 5.4(a) that, as by (5.7) $j_t \in \mathrm{An}^o(i)$, there exists a max-weighted path in $D$ from $j_t$ to $k$ containing $i$; i.e., $j_t \in \mathrm{an}^{\{i\}}_{\mathrm{mw}}(k)$. Thus by (2.6) we have
\[
X^u_i = \frac{b_{ii}}{b_{ik}} X_k = \frac{b_{j_t i}}{b_{j_t i_t}} X_{i_t} = \frac{b_{j_t i}}{b_{j_t j_t}} X^u_{j_t} .
\]
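As a worked instance of (5.10), offered as an illustrative sketch rather than as part of the original argument, consider Example 5.10. There $T = 1$, $O_1 = \{2, 4\}$, $\mathrm{cAn}^o(O_1) = \mathrm{An}^o(2) \cap \mathrm{An}^o(4) = \{1\}$ and $\overline{O} = \{1, 2, 3, 4\}$. The choice $i_1 = 4$, $j_1 = 1$ below is one admissible choice of the arbitrary but fixed nodes.

```latex
% (5.10) for i = 3 with i_1 = 4 and j_1 = 1:
\[
X_3 = \frac{b_{13}}{b_{14}} X_4 = \frac{b_{13}}{b_{11}} X_1^u .
\]
% Example 5.10 uses (3.5) in the form b_{14} b_{33} = b_{13} b_{34}, hence
\[
\frac{b_{13}}{b_{14}} = \frac{b_{33}}{b_{34}}
\qquad\Longrightarrow\qquad
X_3 = \frac{b_{33}}{b_{34}} X_4 ,
\]
% which is the identity obtained in Example 5.10 from the bounds in (4.2).
```

Both routes give the same scaling of $X_4$, so (5.10) reproduces the prediction of the unobserved node 3 from the observed node 4.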

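5.3. What we know about nodes outside $\overline{O}$

Let $\Omega^o$ be some order set with order DAG $D^o$. From Theorem 5.12 we know that the extended O-set $\overline{O}$ contains exactly those nodes which are at least $P$-a.s. on $\Omega^o$ equal to some appropriately scaled node of $O$. Next, we investigate what information we can still deduce for nodes outside $\overline{O}$ from nodes in it. Assume that $\overline{O} \subsetneq V$. Then by setting $U = \overline{O}$, Theorem 4.7 provides a representation for the nodes outside the set $\overline{O}$, which holds $P$-a.s. on $\Omega^o$, as a function of the nodes of $\overline{O}$ and some noise variables. This allows for a substantial reduction in the representation of certain random variables, albeit at some notational cost.

Corollary 5.14. Assume the same conditions as in Proposition 5.13 hold. Recall Definition 4.3(a). Then $P$-a.s. on $\Omega^o$ we have for $i \in \overline{O}^c$,
\[
\begin{aligned}
X_i &= \bigvee_{k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i)} \frac{b_{ki}}{b_{kk}} X^u_k \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j \\
&= \bigvee_{t = 1, \dots, T} \Big( \bigvee_{k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i) \cap C_t} \frac{b_{j_t k} b_{ki}}{b_{j_t j_t} b_{kk}} \Big) X^u_{j_t} \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j \\
&= \bigvee_{t = 1, \dots, T} \Big( \bigvee_{k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i) \cap C_t} \frac{b_{j_t k} b_{ki}}{b_{j_t i_t} b_{kk}} \Big) X_{i_t} \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j .
\end{aligned}
\]

Proof. Observe from (5.7) that every $k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i) \subseteq \overline{O}$ belongs to some $C_t$. Thus we obtain by (4.6),
\[
X_i = \bigvee_{k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i)} \frac{b_{ki}}{b_{kk}} X_k \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j
= \bigvee_{t = 1, \dots, T} \ \bigvee_{k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i) \cap C_t} \frac{b_{ki}}{b_{kk}} X_k \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j . \tag{5.11}
\]
Then by (5.10) we have $P$-a.s. on $\Omega^o$ the three representations.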


We proceed with two examples, which indicate that there is still some room for reduction in our representations.

Example 5.15. [Continuation of Examples 2.1, 2.8, 4.5, 4.10: a new representation] Assume that $O = \{2, 3\}$ and $\Omega^o = \{\omega \in \Omega : b_{13} X_2(\omega) = b_{12} X_3(\omega)\}$. Hence, $D^o = (\{1, 2, 3\}, \{(1, 2), (1, 3)\})$. By Corollary 5.14 for $i = 4$ we obtain $P$-a.s. on $\Omega^o$,
\[
X_4 = \Big( \frac{b_{12} b_{24}}{b_{11} b_{22}} \vee \frac{b_{13} b_{34}}{b_{11} b_{33}} \Big) X^u_1 \vee b_{44} Z_4 = \frac{b_{14}}{b_{11}} X^u_1 \vee b_{44} Z_4 ,
\]
where we have applied (3.5) for the last equality. ◻

Example 5.16. Consider the ML graphical model $(D, L(X_1, X_2, X_3, X_4, X_5))$ with DAG
\[
D = (\{1, 2, 3, 4, 5\}, \{(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (3, 5)\})
\]
and ML coefficient matrix $B = (b_{ij})_{5 \times 5}$. Let $O = \{3, 4\}$. Consider the order set $\Omega^o = \{b_{14} X_3 = b_{13} X_4 \text{ and } b_{24} X_3 = b_{23} X_4\}$, then we find $D^o = (\{1, 2, 3, 4\}, \{(1, 3), (1, 4), (2, 3), (2, 4)\})$. We assume that the DAG $D$ is minimal max-linear; i.e., $D = D^B$ (cf. Theorem 3.5). Then, as $\mathrm{an}^{\overline{O}}_{\mathrm{nmw}}(5) = \{1, 2\}$, we obtain by Theorem 4.7 for $i = 5$ that $X_5 = \frac{b_{35}}{b_{33}} X_3 \vee b_{15} Z_1 \vee b_{25} Z_2 \vee b_{55} Z_5$. Furthermore, we have by (5.5) $P$-a.s. on $\Omega^o$, $X_3 = b_{13} Z_1 \vee b_{23} Z_2$. Finally, since by Theorem 3.5 $b_{15} > \frac{b_{13} b_{35}}{b_{33}}$ and $b_{25} > \frac{b_{23} b_{35}}{b_{33}}$, we obtain $P$-a.s. on $\Omega^o$,
\[
X_5 = \frac{b_{35}}{b_{33}} (b_{13} Z_1 \vee b_{23} Z_2) \vee b_{15} Z_1 \vee b_{25} Z_2 \vee b_{55} Z_5 = b_{15} Z_1 \vee b_{25} Z_2 \vee b_{55} Z_5 .
\]

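◻

Examples 5.15 and 5.16 indicate that the decomposition of Corollary 5.14 can be further reduced. The following is our final result in this direction.

Proposition 5.17. Assume the same conditions as in Theorem 5.9 hold, and recall (5.7). For $i \in \overline{O}^c$ and $t = 1, \dots, T$ let $j^{\mathrm{mw}}_{t,i}$ be a common ancestor of $O_t$, which is at the same time an ancestor of $i$, which is max-weighted by $\overline{O}_t$, i.e. $j^{\mathrm{mw}}_{t,i} \in \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t)$, and $i_t \in O_t$, both arbitrary but fixed. Then we have $P$-a.s. on $\Omega^o$,
\[
\begin{aligned}
X_i &= \bigvee_{\substack{t = 1, \dots, T:\\ \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) \neq \emptyset}} \frac{b_{j^{\mathrm{mw}}_{t,i} i}}{b_{j^{\mathrm{mw}}_{t,i} j^{\mathrm{mw}}_{t,i}}} X^u_{j^{\mathrm{mw}}_{t,i}} \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j \\
&= \bigvee_{\substack{t = 1, \dots, T:\\ \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) \neq \emptyset}} \frac{b_{j^{\mathrm{mw}}_{t,i} i}}{b_{j^{\mathrm{mw}}_{t,i} i_t}} X_{i_t} \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j .
\end{aligned} \tag{5.12}
\]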

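Proof. Assume wlog that $P(\Omega^o) > 0$. The following holds $P$-a.s. on $\Omega^o$. For the existence of the nodes $i_t$ see the proof of Proposition 5.13. Observe that, as the first equality in (4.4) holds, we can replace in (5.11) $\mathrm{an}^{\overline{O}}_{\mathrm{low}}(i)$ by $\mathrm{an}(i) \cap \overline{O}$; i.e., we have
\[
X_i = \bigvee_{t = 1, \dots, T} \ \bigvee_{k \in \mathrm{an}(i) \cap \overline{O}_t} \frac{b_{ki}}{b_{kk}} X_k \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j .
\]
Now let $t \in \{1, \dots, T\}$ such that $\mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) = \emptyset$, and let $k \in \mathrm{an}(i) \cap \overline{O}_t$. Observe from (5.7) that $\mathrm{cAn}^o(O_t) \subseteq \mathrm{An}^o(k) \subseteq \mathrm{An}(k)$; hence, $\mathrm{cAn}^o(O_t) \subseteq \mathrm{An}(i)$. Furthermore, note for all $j \in \mathrm{cAn}^o(O_t)$ that $j \in \mathrm{an}^{\{k\}}_{\mathrm{nmw}}(i)$, since $\mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) = \emptyset$. Altogether, also using (5.9) and (2.7), we obtain
\[
\frac{b_{ki}}{b_{kk}} X_k = \frac{b_{ki}}{b_{kk}} \Big( \bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{jk} Z_j \Big) < \bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{ji} Z_j \le \bigvee_{j \in \mathrm{An}(i)} b_{ji} Z_j = X_i .
\]
Thus we can omit those terms in (5.11) which correspond to a weakly connected component $C_t$ satisfying $\mathrm{an}^{C_t \cap \overline{O}}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) = \emptyset$ and obtain with (5.10)
\[
\begin{aligned}
X_i &= \bigvee_{\substack{t = 1, \dots, T:\\ \mathrm{an}^{C_t \cap \overline{O}}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) \neq \emptyset}} \Big( \bigvee_{k \in \mathrm{an}(i) \cap \overline{O}_t} \frac{b_{ki} \, b_{j^{\mathrm{mw}}_{t,i} k}}{b_{kk} \, b_{j^{\mathrm{mw}}_{t,i} j^{\mathrm{mw}}_{t,i}}} \Big) X^u_{j^{\mathrm{mw}}_{t,i}} \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j \\
&= \bigvee_{\substack{t = 1, \dots, T:\\ \mathrm{an}^{C_t \cap \overline{O}}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) \neq \emptyset}} \Big( \bigvee_{k \in \mathrm{an}(i) \cap \overline{O}_t} \frac{b_{ki} \, b_{j^{\mathrm{mw}}_{t,i} k}}{b_{kk} \, b_{j^{\mathrm{mw}}_{t,i} i_t}} \Big) X_{i_t} \ \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j .
\end{aligned}
\]
Since $j^{\mathrm{mw}}_{t,i} \in \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i)$, applying (2.6) finishes the proof.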

6. Regular conditional distributions of a max-linear graphical model

Throughout this section $(D, L(X))$ is a ML graphical model with ML coefficient matrix $B = (b_{ij})_{d \times d}$. Also denote again by $(\Omega, \mathcal{A}, P)$ the probability space on which $X$ is defined. For $A, O \subseteq V$ our goal is to provide an explicit formula for a regular conditional distribution function $F_{X_A \mid X_O}$ of $X_A$ given $X_O$. Recall the following facts about regular conditional distributions (see for example Chapter 8.3 of [4]).

Definition 6.1. For $d \in \mathbb{N}$ we denote by $\mathcal{B}(\mathbb{R}^d_+)$ the Borel σ-algebra on $\mathbb{R}^d_+$. Let $Y = (Y_1, \dots, Y_m)$ and $Z = (Z_1, \dots, Z_n)$ be random vectors on $(\Omega, \mathcal{A}, P)$ with values in $(\mathbb{R}^m_+, \mathcal{B}(\mathbb{R}^m_+))$ and $(\mathbb{R}^n_+, \mathcal{B}(\mathbb{R}^n_+))$, respectively. We denote by $P_Z$ the image measure of $P$ under $Z$. A map $\kappa : \mathbb{R}^n_+ \times \mathcal{B}(\mathbb{R}^m_+) \to [0, 1]$ is called regular conditional distribution of $Y$ given $Z$ if the following conditions are satisfied:
(a) the function $z \mapsto \kappa(z, B)$ is measurable for every fixed $B \in \mathcal{B}(\mathbb{R}^m_+)$,
(b) $B \mapsto \kappa(z, B)$ is a probability measure on $(\mathbb{R}^m_+, \mathcal{B}(\mathbb{R}^m_+))$ for every fixed $z \in \mathbb{R}^n_+$, and
(c) $P(Z \in A, Y \in B) = \int_A \kappa(z, B) \, P_Z(dz)$ for every $A \in \mathcal{B}(\mathbb{R}^n_+)$ and $B \in \mathcal{B}(\mathbb{R}^m_+)$.

The function $y \mapsto F_{Y \mid Z}(y \mid z) = \kappa(z, (0, y])$ on $\mathbb{R}^m_+$ for $z \in \mathbb{R}^n_+$ is called regular conditional distribution function of $Y$ given $Z = z$. (Note that $F_{Y \mid Z}$ is uniquely defined up to almost sure equality with respect to $P_Z$.) ◻

Define $B_O$ as the $|\mathrm{An}(O)| \times |O|$-submatrix of $B$ with entries $b_{ji}$ for $j \in \mathrm{An}(O)$ and $i \in O$. Observe from Definition 6.1(c) that it suffices to specify $F_{X_A \mid X_O}(y_A \mid x_O)$ only for $x_O \in \mathrm{supp}(X_O) = \{x \in \mathbb{R}^{|O|}_+ : x = z \odot B_O \text{ for some } z \in \mathbb{R}^{|\mathrm{An}(O)|}_+\}$ (cf. (2.5)) and $y_A \in \mathbb{R}^{|A|}_+$. Also note from the definition of the upper bounds in (5.1) and Definition 5.1(a) that the images of the finitely many different order sets under $X_O$ are disjoint sets; i.e., for every $x_O \in \mathrm{supp}(X_O)$ there exists exactly one order set $\Omega^o$ such that $x_O \in X_O(\Omega^o)$. We show that $F_{X_A \mid X_O}(y_A \mid x_O)$ depends on which order set corresponds to $x_O$.

For a general max-linear $X = (X_1, \dots, X_d)$ as in (2.4) a regular conditional distribution of $(Z_1, \dots, Z_d)$ given $X$ has been computed in [12], mainly for simulation purposes in the framework of prediction. We reduce a regular conditional distribution function $F_{X_A \mid X_O}$ of $X_A$ given $X_O$ to the situation considered by these authors such that we can use their results. In contrast to the case of general max-linear models we can use the DAG $D$, the order DAG $D^o$ corresponding to the value of $X_O$ on which we condition, and other embedded graph structures. This results in a reduced form for the regular conditional distribution function of $X_A$ given $X_O$.


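We start with an example.

Example 6.2. [Continuation of Examples 2.1, 2.8, 4.5, 4.10, 5.15: computing a regular conditional distribution] For $i = 1, \dots, d$ assume that the noise variable $Z_i$ has distribution function $F_{Z_i}$ and density $f_{Z_i}$. For this simple example we can compute regular conditional distributions in an informal way, and we do this for $A = \{1, 4\}$ and $O = \{2, 3\}$. A precise mathematical justification will be given later on in this section. Using the independence of the noise variables we obtain for $(y_1, y_4) \in \mathbb{R}^2_+$ and $(x_2, x_3) \in \mathrm{supp}(X_O) = \mathbb{R}^2_+$,
\[
\begin{aligned}
F_{(X_1, X_4) \mid (X_2, X_3)}(y_1, y_4 \mid x_2, x_3)
&= P(X_1 \le y_1, X_4 \le y_4 \mid X_2 = x_2, X_3 = x_3) \\
&= P(c_{11} Z_1 \le y_1,\ c_{42} x_2 \vee c_{43} x_3 \vee c_{44} Z_4 \le y_4 \mid X_2 = c_{21} Z_1 \vee c_{22} Z_2 = x_2,\ X_3 = c_{31} Z_1 \vee c_{33} Z_3 = x_3) \\
&= 1_{(0, y_4]}(c_{42} x_2 \vee c_{43} x_3) \, F_{Z_4}\Big(\frac{y_4}{c_{44}}\Big) \, P(X_1 \le y_1 \mid X_2 = x_2, X_3 = x_3) \\
&= 1_{(0, y_4]}\Big(\frac{b_{24}}{b_{22}} x_2 \vee \frac{b_{34}}{b_{33}} x_3\Big) \, F_{Z_4}\Big(\frac{y_4}{b_{44}}\Big) \, F_{X_1 \mid (X_2, X_3)}(y_1 \mid x_2, x_3),
\end{aligned}
\]
where $F_{X_1 \mid (X_2, X_3)}$ denotes a regular conditional distribution function of $X_1$ given $(X_2, X_3)$. In the following we provide an explicit formula for $F_{X_1 \mid (X_2, X_3)}$. Observe that we have three different order sets, namely $\Omega^o_1 = \{\omega \in \Omega : b_{13} X_2(\omega) = b_{12} X_3(\omega)\}$, $\Omega^o_2 = \{\omega \in \Omega : b_{13} X_2(\omega) < b_{12} X_3(\omega)\}$, and $\Omega^o_3 = \{\omega \in \Omega : b_{13} X_2(\omega) > b_{12} X_3(\omega)\}$ with order DAGs $D^o_1 = (\{1, 2, 3\}, \{(1, 2), (1, 3)\})$, $D^o_2 = (\{1, 2, 3\}, \{(1, 2)\})$, and $D^o_3 = (\{1, 2, 3\}, \{(1, 3)\})$.

Firstly, let $(x_2, x_3) \in (X_O(\Omega^o_1) \cap \mathrm{supp}(X_O)) = \{(x_2, x_3) \in \mathbb{R}^2_+ : b_{13} x_2 = b_{12} x_3\}$. Then from (5.10) we obtain $P$-a.s. on $\Omega^o_1$,
\[
X_1 = \frac{b_{11}}{b_{12}} X_2 = \frac{b_{11}}{b_{13}} X_3 .
\]
Hence, we have
\[
F_{X_1 \mid (X_2, X_3)}(y_1 \mid x_2, x_3) = P\Big(X_1 \le y_1 \,\Big|\, X_2 = \frac{b_{12}}{b_{13}} X_3 = x_2\Big) = P\Big(X_1 \le y_1 \,\Big|\, X_1 = \frac{b_{11}}{b_{12}} x_2\Big) = 1_{(0, y_1]}\Big(\frac{b_{11}}{b_{12}} x_2\Big).
\]
Secondly, let $(x_2, x_3) \in (X_O(\Omega^o_2) \cap \mathrm{supp}(X_O)) = \{(x_2, x_3) \in \mathbb{R}^2_+ : b_{13} x_2 < b_{12} x_3\}$. Using (5.5), the independence of the noise variables, and the fact that by Theorem 2.11(a) the node $j(2)$ as in (2.8) is $P$-a.s. unique, we obtain
\[
\begin{aligned}
F_{X_1 \mid (X_2, X_3)}(y_1 \mid x_2, x_3)
&= P(b_{11} Z_1 \le y_1 \mid b_{12} Z_1 \vee b_{22} Z_2 = x_2,\ b_{33} Z_3 = x_3) \\
&= P(b_{11} Z_1 \le y_1 \mid b_{12} Z_1 \vee b_{22} Z_2 = x_2) \\
&= \lim_{\varepsilon \downarrow 0} \frac{P(b_{11} Z_1 \le y_1,\ (b_{12} Z_1 \vee b_{22} Z_2) \in [x_2, x_2 + \varepsilon))}{P(X_2 \in [x_2, x_2 + \varepsilon))} \\
&= \lim_{\varepsilon \downarrow 0} \frac{P(b_{11} Z_1 \le y_1,\ \{b_{12} Z_1 \in [x_2, x_2 + \varepsilon),\ b_{22} Z_2 \le x_2 \ \text{or} \ b_{12} Z_1 \le x_2,\ b_{22} Z_2 \in [x_2, x_2 + \varepsilon)\})}{P(X_2 \in [x_2, x_2 + \varepsilon))} \\
&= \lim_{\varepsilon \downarrow 0} \frac{\frac{1}{\varepsilon} P(b_{11} Z_1 \le y_1,\ b_{12} Z_1 \in [x_2, x_2 + \varepsilon),\ b_{22} Z_2 \le x_2) + \frac{1}{\varepsilon} P(b_{11} Z_1 \le y_1,\ b_{12} Z_1 \le x_2,\ b_{22} Z_2 \in [x_2, x_2 + \varepsilon))}{\frac{1}{\varepsilon} P(X_2 \in [x_2, x_2 + \varepsilon))} \\
&= \frac{\frac{1}{b_{12}} 1_{(0, \frac{y_1}{b_{11}}]}\big(\frac{x_2}{b_{12}}\big) f_{Z_1}\big(\frac{x_2}{b_{12}}\big) F_{Z_2}\big(\frac{x_2}{b_{22}}\big) + \frac{1}{b_{22}} f_{Z_2}\big(\frac{x_2}{b_{22}}\big) F_{Z_1}\big(\frac{y_1}{b_{11}} \wedge \frac{x_2}{b_{12}}\big)}{f_{X_2}(x_2)} \\
&= \frac{\frac{1}{b_{12}} 1_{(0, \frac{y_1}{b_{11}}]}\big(\frac{x_2}{b_{12}}\big) f_{Z_1}\big(\frac{x_2}{b_{12}}\big) F_{Z_2}\big(\frac{x_2}{b_{22}}\big) + \frac{1}{b_{22}} f_{Z_2}\big(\frac{x_2}{b_{22}}\big) F_{Z_1}\big(\frac{x_2}{b_{12}} \wedge \frac{y_1}{b_{11}}\big)}{\frac{1}{b_{12}} f_{Z_1}\big(\frac{x_2}{b_{12}}\big) F_{Z_2}\big(\frac{x_2}{b_{22}}\big) + \frac{1}{b_{22}} f_{Z_2}\big(\frac{x_2}{b_{22}}\big) F_{Z_1}\big(\frac{x_2}{b_{12}}\big)} .
\end{aligned}
\]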

By the symmetry of the graph a regular conditional distribution function of X1 given (X2 , X3 ) = (x2 , x3 ) corresponding to the third case, (x2 , x3 ) ∈ (XO (Ωo3 ) ∩ supp(XO )) = {(x2 , x3 ) ∈ R2+ ∶ b13 x2 > b12 x3 }, can be obtained by reversing the roles of nodes 2 and 3 in the second case. ◻
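The closed form for the second case of Example 6.2 can be evaluated numerically. The sketch below is an illustration only: the unit Fréchet noise ($F(z) = e^{-1/z}$) and the function names are assumptions made here, not choices of the paper. It shows that the conditional distribution function is non-decreasing in $y_1$, has an atom at $y_1 = b_{11} x_2 / b_{12}$ (the case where $Z_1$ realizes $X_2$), and equals 1 beyond that point.

```python
import math


def cdf(z):
    """Unit Frechet distribution function (an assumed noise distribution)."""
    return math.exp(-1.0 / z) if z > 0 else 0.0


def pdf(z):
    """Unit Frechet density."""
    return math.exp(-1.0 / z) / (z * z) if z > 0 else 0.0


def cond_cdf_x1(y1, x2, b11, b12, b22):
    """F_{X1 | (X2, X3)}(y1 | x2, x3) on the order set b13*x2 < b12*x3.

    The value does not depend on x3, in line with Example 6.2.
    """
    u1, u2 = x2 / b12, x2 / b22
    jump = pdf(u1) / b12 * cdf(u2)       # weight of the event "Z1 realizes X2"
    cont = pdf(u2) / b22                 # density factor of "Z2 realizes X2"
    ind = 1.0 if 0.0 < u1 <= y1 / b11 else 0.0
    num = ind * jump + cont * cdf(min(y1 / b11, u1))
    den = jump + cont * cdf(u1)
    return num / den
```

For instance, with the hypothetical values $b_{11} = b_{22} = 1$, $b_{12} = 0.5$ and $x_2 = 1$, the atom sits at $y_1 = 2$, and `cond_cdf_x1(3.0, 1.0, 1.0, 0.5, 1.0)` evaluates the function to the right of it.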


To prove that a function $F_{X_A \mid X_O} : \mathbb{R}^{|A|}_+ \times \mathrm{supp}(X_O) \to [0, 1]$ is a regular conditional distribution function of $X_A$ given $X_O$, we have to verify the properties of Definition 6.1. In particular, to verify (c) it suffices to show that
\[
P(X_A \le y_A, X_O \le w_O) = \int_{(0, w_O]} F_{X_A \mid X_O}(y_A \mid x_O) \, P_{X_O}(dx_O) \tag{6.1}
\]
for all $y_A \in \mathbb{R}^{|A|}_+$ and $w_O \in \mathbb{R}^{|O|}_+$.

We first present our concept of proof. Assume that there exist $n \in \mathbb{N}$ different order sets, say $\Omega^o_1, \dots, \Omega^o_n$. Since the image sets $X_O(\Omega^o_s)$ for $s = 1, \dots, n$ are disjoint, we can write for $y_A \in \mathbb{R}^{|A|}_+$ and $x_O \in \mathrm{supp}(X_O)$,
\[
F_{X_A \mid X_O}(y_A \mid x_O) = \sum_{s=1}^n F_{X_A \mid X_O}(y_A \mid x_O) \, 1_{X_O(\Omega^o_s)}(x_O).
\]
From this we can conclude
\[
\int_{(0, w_O]} F_{X_A \mid X_O}(y_A \mid x_O) \, P_{X_O}(dx_O) = \sum_{s=1}^n \int_{(0, w_O]} F_{X_A \mid X_O}(y_A \mid x_O) \, 1_{X_O(\Omega^o_s)}(x_O) \, P_{X_O}(dx_O).
\]
Furthermore, since the order sets $\Omega^o_s$ for $s = 1, \dots, n$ are disjoint, we also have for $w_O \in \mathbb{R}^{|O|}_+$,
\[
P(X_A \le y_A, X_O \le w_O) = \sum_{s=1}^n P(X_A \le y_A, X_O \in ((0, w_O] \cap X_O(\Omega^o_s))).
\]
Hence, (6.1) holds, if we can show for $s = 1, \dots, n$ that
\[
P(X_A \le y_A, X_O \in ((0, w_O] \cap X_O(\Omega^o_s))) = \int_{(0, w_O]} F_{X_A \mid X_O}(y_A \mid x_O) \, 1_{X_O(\Omega^o_s)}(x_O) \, P_{X_O}(dx_O)
\]
for all $y_A \in \mathbb{R}^{|A|}_+$ and $w_O \in \mathbb{R}^{|O|}_+$.

These considerations justify our proceeding in the following. For some fixed but arbitrary order set $\Omega^o$ we present a function $F_{X_A \mid X_O} : \mathbb{R}^{|A|}_+ \times (X_O(\Omega^o) \cap \mathrm{supp}(X_O)) \to [0, 1]$ and show that the properties (a) and (b) from Definition 6.1 as well as
\[
P(X_A \le y_A, X_O \in ((0, w_O] \cap X_O(\Omega^o))) = \int_{(0, w_O]} F_{X_A \mid X_O}(y_A \mid x_O) \, 1_{X_O(\Omega^o)}(x_O) \, P_{X_O}(dx_O) \tag{6.2}
\]
for all $y_A \in \mathbb{R}^{|A|}_+$ and $w_O \in \mathbb{R}^{|O|}_+$ hold. For this $\Omega^o$ let $\overline{O}$ be the corresponding extended O-set as defined in (5.7). For $i \in V$ recall from Definition 2.9(a) the set $\mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)$ of ancestors of $i$ which are not max-weighted by $\overline{O}$, and define
\[
S := \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(A \setminus \overline{O}) = \bigcup_{i \in A \setminus \overline{O}} \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i). \tag{6.3}
\]

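Proposition 5.17 allows us now to reduce a regular conditional distribution function $F_{X_A \mid X_O}$ to a regular conditional distribution function of $Z_S$ given $X_O$, which we will calculate subsequently. The following is the main structural result of this section. Here and in what follows, for arbitrary $a_i \ge 0$ we set $\prod_{i \in \emptyset} a_i = 1$.

Theorem 6.3. Let $\Omega^o$ be some order set with order DAG $D^o$. Assume that the weakly connected components of $D^o$ have node sets $C_1, \dots, C_T$, and recall (5.4) as well as (5.7). For $i \in \overline{O}^c$ and $t = 1, \dots, T$ let $j^{\mathrm{mw}}_{t,i} \in \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t)$ and $i_t \in O_t$ be arbitrary but fixed nodes as in Proposition 5.17. For $x_O \in X_O(\Omega^o)$ we denote the corresponding realized upper bounds in (5.1) by $x^u_j$. If the set $S$ as in (6.3) is non-empty, let $F_{Z_S \mid X_O}$ be a regular conditional distribution function of $Z_S$ given $X_O$. Then for $y_A \in \mathbb{R}^{|A|}_+$ and $x_O \in (X_O(\Omega^o) \cap \mathrm{supp}(X_O))$,
\[
\begin{aligned}
F_{X_A \mid X_O}(y_A \mid x_O)
&= \prod_{i \in A \cap \overline{O}} 1_{[0, y_i]}(x^u_i) \prod_{i \in A \setminus \overline{O}} 1_{[0, y_i]}\Big( \bigvee_{\substack{t = 1, \dots, T:\\ \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) \neq \emptyset}} \frac{b_{j^{\mathrm{mw}}_{t,i} i}}{b_{j^{\mathrm{mw}}_{t,i} j^{\mathrm{mw}}_{t,i}}} x^u_{j^{\mathrm{mw}}_{t,i}} \Big) F_{Z_S \mid X_O}(z_S \mid x_O) \\
&= \prod_{t = 1, \dots, T} \ \prod_{i \in A \cap \overline{O}_t} 1_{[0, y_i]}\Big( \frac{b_{j_t i}}{b_{j_t i_t}} x_{i_t} \Big) \prod_{i \in A \setminus \overline{O}} 1_{[0, y_i]}\Big( \bigvee_{\substack{t = 1, \dots, T:\\ \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) \neq \emptyset}} \frac{b_{j^{\mathrm{mw}}_{t,i} i}}{b_{j^{\mathrm{mw}}_{t,i} i_t}} x_{i_t} \Big) F_{Z_S \mid X_O}(z_S \mid x_O),
\end{aligned}
\]
where
\[
z_j = \bigwedge_{i \in \mathrm{De}^{\overline{O}}_{\mathrm{nmw}}(j) \cap A \setminus \overline{O}} \frac{y_i}{b_{ji}} \qquad \text{for } j \in S.
\]
In case that $S$ is empty, a regular conditional distribution function of $X_A$ given $X_O$ is given by the formulas obtained from the above by replacing $F_{Z_S \mid X_O}$ by 1.

Proof. Assume wlog that $P(\Omega^o) > 0$. First, observe from (5.10) that the two formulas are equivalent. By the definition of $F_{X_A \mid X_O}(y_A \mid x_O)$, the properties (a) and (b) from Definition 6.1 are satisfied. Thus it remains to show that (6.2) holds. For $w_O \in \mathbb{R}^{|O|}_+$ and $y_A \in \mathbb{R}^{|A|}_+$ define $P^o(X_A \le y_A) := P(X_A \le y_A, X_O \in ((0, w_O] \cap X_O(\Omega^o)))$. Using (5.10) and (5.12) we obtain
\[
\begin{aligned}
P^o(X_A \le y_A) &= P^o(X_i \le y_i \ \forall i \in A \cap \overline{O},\ X_i \le y_i \ \forall i \in A \setminus \overline{O}) \\
&= P^o\Big( \frac{b_{j_t i}}{b_{j_t i_t}} X_{i_t} \le y_i \ \forall t \in \{t \in \{1, \dots, T\} : A \cap \overline{O}_t \neq \emptyset\} \text{ and } i \in A \cap \overline{O}_t, \\
&\qquad\quad \bigvee_{\substack{t = 1, \dots, T:\\ \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^o(O_t) \neq \emptyset}} \frac{b_{j^{\mathrm{mw}}_{t,i} i}}{b_{j^{\mathrm{mw}}_{t,i} i_t}} X_{i_t} \vee \bigvee_{j \in \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(i)} b_{ji} Z_j \le y_i \ \forall i \in A \setminus \overline{O} \Big).
\end{aligned}
\]
For notational convenience we define the sets
\[
E_1 := \{t \in \{1, \dots, T\} : A \cap \overline{O}_t \neq \emptyset\} \qquad \text{and} \qquad E_2 := \{t \in \{1, \dots, T\} : \mathrm{an}^{\overline{O}_t}_{\mathrm{mw}}(A \setminus \overline{O}) \cap \mathrm{cAn}^o(O_t) \neq \emptyset\}.
\]
We simplify both components in the above probability, where the first is obvious. For the second we use Lemma A.3 with $E = A \setminus \overline{O}$. Hence,
\[
P^o(X_A \le y_A) = P^o\Big( X_{i_t} \le \bigwedge_{i \in A \cap \overline{O}_t} \frac{b_{j_t i_t}}{b_{j_t i}} y_i \ \forall t \in E_1,\ \ X_{i_t} \le \bigwedge_{i \in \mathrm{de}^{\overline{O}_t}_{\mathrm{mw}}(\mathrm{cAn}^o(O_t)) \cap A \setminus \overline{O}} \frac{b_{j^{\mathrm{mw}}_{t,i} i_t}}{b_{j^{\mathrm{mw}}_{t,i} i}} y_i \ \forall t \in E_2,\ \ Z_S \le z_S \Big).
\]
Similar reformulations as above for $x_{i_t}$ instead of $X_{i_t}$ yield
\[
F_{X_A \mid X_O}(y_A \mid x_O) = \prod_{t \in E_1} 1_{\big[0,\ \bigwedge_{i \in A \cap \overline{O}_t} \frac{b_{j_t i_t}}{b_{j_t i}} y_i\big]}(x_{i_t}) \prod_{t \in E_2} 1_{\big[0,\ \bigwedge_{i \in \mathrm{de}^{\overline{O}_t}_{\mathrm{mw}}(\mathrm{cAn}^o(O_t)) \cap A \setminus \overline{O}} \frac{b_{j^{\mathrm{mw}}_{t,i} i_t}}{b_{j^{\mathrm{mw}}_{t,i} i}} y_i\big]}(x_{i_t}) \, F_{Z_S \mid X_O}(z_S \mid x_O).
\]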

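Integration shows that the property of Definition 6.1(c) holds, since it holds for $F_{Z_S \mid X_O}$, which we have assumed to be a regular conditional distribution function. Hence, (6.2) holds.

Observe that in the proof of Theorem 6.3 for nodes of $A \setminus \overline{O}$ we have used representation (5.12). Similarly, we may also apply the representations from Corollary 5.14. The following remark shows the regular conditional distribution function of $X_A$ given $X_O$ corresponding to these representations. Of course all formulas are equivalent.

Remark 6.4. Assume the same conditions as in Theorem 6.3 hold. For $t = 1, \dots, T$ let $i_t \in O_t$ and $j_t \in \mathrm{cAn}^o(O_t)$ be arbitrary but fixed nodes. Then, if the set $S$ as in (6.3) is non-empty, for $y_A \in \mathbb{R}^{|A|}_+$ and $x_O \in (X_O(\Omega^o) \cap \mathrm{supp}(X_O))$,
\[
\begin{aligned}
F_{X_A \mid X_O}(y_A \mid x_O)
&= \prod_{i \in A \cap \overline{O}} 1_{[0, y_i]}(x^u_i) \prod_{i \in A \setminus \overline{O}} 1_{[0, y_i]}\Big( \bigvee_{k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i)} \frac{b_{ki}}{b_{kk}} x^u_k \Big) F_{Z_S \mid X_O}(z_S \mid x_O) \\
&= \prod_{t = 1, \dots, T} \ \prod_{i \in A \cap \overline{O}_t} 1_{[0, y_i]}\Big( \frac{b_{j_t i}}{b_{j_t i_t}} x_{i_t} \Big) \prod_{i \in A \setminus \overline{O}} 1_{[0, y_i]}\Big( \bigvee_{t = 1, \dots, T} \Big( \bigvee_{k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i) \cap C_t} \frac{b_{j_t k} b_{ki}}{b_{j_t j_t} b_{kk}} \Big) x^u_{j_t} \Big) F_{Z_S \mid X_O}(z_S \mid x_O) \\
&= \prod_{t = 1, \dots, T} \ \prod_{i \in A \cap \overline{O}_t} 1_{[0, y_i]}\Big( \frac{b_{j_t i}}{b_{j_t i_t}} x_{i_t} \Big) \prod_{i \in A \setminus \overline{O}} 1_{[0, y_i]}\Big( \bigvee_{t = 1, \dots, T} \Big( \bigvee_{k \in \mathrm{an}^{\overline{O}}_{\mathrm{low}}(i) \cap C_t} \frac{b_{j_t k} b_{ki}}{b_{j_t i_t} b_{kk}} \Big) x_{i_t} \Big) F_{Z_S \mid X_O}(z_S \mid x_O).
\end{aligned}
\]
In case that $S$ is empty, a regular conditional distribution function of $X_A$ given $X_O$ is given by the formulas obtained from the above by replacing $F_{Z_S \mid X_O}$ by 1. ◻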


In the next step our intention is to find a regular conditional distribution function of $Z_S$ given $X_O = x_O$ for $x_O \in (X_O(\Omega^o) \cap \mathrm{supp}(X_O))$. First, for $x_O \in \mathbb{R}^{|O|}_+$ we reduce a regular conditional distribution function of $Z_S$ given $X_O = x_O$ to a regular conditional distribution function of a subvector of $Z_S$ given $X_O = x_O$. For this purpose recall from Theorem 5.9 that $X_O$ is a function of $Z_{\mathrm{An}(O)}$. Thus the independence of the noise variables suggests a partition of the set $S = \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(A \setminus \overline{O})$ into the sets
\[
S_1 := \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(A \setminus \overline{O}) \setminus \mathrm{An}(O) = \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(A \setminus \overline{O}) \setminus \mathrm{an}(O), \tag{6.4}
\]
\[
S_2 := \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(A \setminus \overline{O}) \cap \mathrm{An}(O) = \mathrm{An}^{\overline{O}}_{\mathrm{nmw}}(A \setminus \overline{O}) \cap \mathrm{an}(O), \tag{6.5}
\]
where we have the second equality in each definition, since every path from a node in $O$ to $i \in A \setminus \overline{O}$ is max-weighted by $\overline{O}$. This partition determines a factorization of a regular conditional distribution function $F_{Z_S \mid X_O}$ of $Z_S$ given $X_O$. As we present the following lemma in the context of Theorem 6.3, we work under the very same conditions. Indeed, the result would also hold under weaker conditions.

Lemma 6.5. Assume the same conditions as in Theorem 6.3 hold. Suppose also that the set $S$ as in (6.3) is non-empty. Let $S_1$ and $S_2$ be as in (6.4) and (6.5). If $S_2$ is non-empty, let $F_{Z_{S_2} \mid X_O}$ be a regular conditional distribution function of $Z_{S_2}$ given $X_O$. Then for $z_S \in \mathbb{R}^{|S|}_+$ and $x_O \in \mathbb{R}^{|O|}_+$,
\[
F_{Z_S \mid X_O}(z_S \mid x_O) = F_{Z_{S_1}}(z_{S_1}) \, F_{Z_{S_2} \mid X_O}(z_{S_2} \mid x_O) = \prod_{j \in S_1} F_{Z_j}(z_j) \, F_{Z_{S_2} \mid X_O}(z_{S_2} \mid x_O).
\]

In case that $S_2$ is empty, a regular conditional distribution function of $Z_S$ given $X_O$ is given by the formulas obtained from the above by replacing $F_{Z_{S_2} \mid X_O}$ by 1.

Proof. We have to verify the properties of Definition 6.1. By the definition of $F_{Z_S \mid X_O}(z_S \mid x_O)$, the properties (a) and (b) from Definition 6.1 are satisfied. Using the independence of the noise variables and the property of Definition 6.1(c) for $F_{Z_{S_2} \mid X_O}$, we obtain for $w_O \in \mathbb{R}^{|O|}_+$ and $z_S = (z_{S_1}, z_{S_2}) \in \mathbb{R}^{|S|}_+$,
\[
\begin{aligned}
\int_{(0, w_O]} F_{Z_{S_1}}(z_{S_1}) \, F_{Z_{S_2} \mid X_O}(z_{S_2} \mid x_O) \, P_{X_O}(dx_O)
&= \prod_{j \in S_1} F_{Z_j}(z_j) \int_{(0, w_O]} F_{Z_{S_2} \mid X_O}(z_{S_2} \mid x_O) \, P_{X_O}(dx_O) \\
&= \prod_{j \in S_1} F_{Z_j}(z_j) \, P(X_O \le w_O, Z_{S_2} \le z_{S_2}).
\end{aligned}
\]
Since $X_O$ is a function of $Z_{\mathrm{An}(O)}$ and the sets $S_1$ and $\mathrm{An}(O) \cup S_2 = \mathrm{An}(O)$ are disjoint, we have by the independence of the noise variables that
\[
\prod_{j \in S_1} F_{Z_j}(z_j) \, P(X_O \le w_O, Z_{S_2} \le z_{S_2}) = P(X_O \le w_O, Z_{S_1 \cup S_2} \le z_{S_1 \cup S_2}) = P(X_O \le w_O, Z_S \le z_S).
\]
Hence, property (c) is also satisfied.

The following lemma provides a formula for a regular conditional distribution function $F_{Z_{S_2} \mid X_O}$ of $Z_{S_2}$ given $X_O = x_O$ for $x_O \in (X_O(\Omega^o) \cap \mathrm{supp}(X_O))$. Theorem 2 of [12] gives one of $Z_{\mathrm{An}(O)}$ given $X_O$. By setting in this result $E = (0, z_{\mathrm{An}(O)}]$ for $z_{\mathrm{An}(O)} \in \mathbb{R}^{|\mathrm{An}(O)|}_+$ we have a regular conditional distribution function $F_{Z_{\mathrm{An}(O)} \mid X_O}$ of $Z_{\mathrm{An}(O)}$ given $X_O$. Furthermore, since $S_2 \subseteq \mathrm{An}(O)$, we obtain from this theorem also a regular conditional distribution function of $Z_{S_2}$ given $X_O$ by setting $z_j = \infty$ for all $j \in \mathrm{An}(O) \setminus S_2$. To verify that our formula for $F_{Z_{S_2} \mid X_O}$ is indeed a version of Theorem 2 of [12] also note that their upper bounds $\hat{z}_j$ correspond to $x^u_j / b_{jj}$ and their $J^{(s)}$ and $\overline{J}^{(s)}$ for $s = 1, \dots, r$ to $\mathrm{cAn}^o(O_t)$ and $C_t$ for $t = 1, \dots, T$ in our result, respectively. The following lemma holds not only for $S_2$ as in (6.5) but for some arbitrary set $S_2 \subseteq \mathrm{An}(O)$. The second equality is a consequence of (5.10). In what follows, for arbitrary $a_i \ge 0$ we set $\sum_{i \in \emptyset} a_i = 0$.


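Lemma 6.6. Assume the same conditions as in Lemma 6.5 hold. Suppose also that the set $S_2$ as in (6.5) is non-empty. For $j \in \mathrm{An}(O)$ define $z^u_j := \frac{x^u_j}{b_{jj}}$. Then for $z_{S_2} \in \mathbb{R}^{|S_2|}_+$ and $x_O \in (X_O(\Omega^o) \cap \mathrm{supp}(X_O))$,
\[
\begin{aligned}
F_{Z_{S_2} \mid X_O}(z_{S_2} \mid x_O)
&= \prod_{j \in S_2 \setminus (\bigcup_{t=1}^T \mathrm{cAn}^o(O_t))} \frac{F_{Z_j}(z^u_j \wedge z_j)}{F_{Z_j}(z^u_j)}
\prod_{\substack{t = 1, \dots, T:\\ \mathrm{cAn}^o(O_t) \cap S_2 \neq \emptyset}} \Big( \sum_{j \in \mathrm{cAn}^o(O_t)} z^u_j f_{Z_j}(z^u_j) \prod_{k \in \mathrm{cAn}^o(O_t) \setminus \{j\}} F_{Z_k}(z^u_k) \Big)^{-1} \\
&\quad \times \prod_{\substack{t = 1, \dots, T:\\ \mathrm{cAn}^o(O_t) \cap S_2 \neq \emptyset}} \Big\{ \sum_{j \in \mathrm{cAn}^o(O_t) \setminus S_2} z^u_j f_{Z_j}(z^u_j) \prod_{k \in (\mathrm{cAn}^o(O_t) \cap S_2) \setminus \{j\}} F_{Z_k}(z^u_k \wedge z_k) \prod_{k \in \mathrm{cAn}^o(O_t) \setminus (S_2 \cup \{j\})} F_{Z_k}(z^u_k) \\
&\qquad + \sum_{j \in \mathrm{cAn}^o(O_t) \cap S_2} 1_{(0, z_j]}(z^u_j) \, z^u_j f_{Z_j}(z^u_j) \prod_{k \in (\mathrm{cAn}^o(O_t) \cap S_2) \setminus \{j\}} F_{Z_k}(z^u_k \wedge z_k) \prod_{k \in \mathrm{cAn}^o(O_t) \setminus (S_2 \cup \{j\})} F_{Z_k}(z^u_k) \Big\} \\
&= \prod_{j \in S_2 \setminus (\bigcup_{t=1}^T \mathrm{cAn}^o(O_t))} \frac{F_{Z_j}(z^u_j \wedge z_j)}{F_{Z_j}(z^u_j)}
\prod_{\substack{t = 1, \dots, T:\\ \mathrm{cAn}^o(O_t) \cap S_2 \neq \emptyset}} \Big( \sum_{j \in \mathrm{cAn}^o(O_t)} \frac{1}{b_{j i_t}} f_{Z_j}\Big(\frac{x_{i_t}}{b_{j i_t}}\Big) \prod_{k \in \mathrm{cAn}^o(O_t) \setminus \{j\}} F_{Z_k}\Big(\frac{x_{i_t}}{b_{k i_t}}\Big) \Big)^{-1} \\
&\quad \times \prod_{\substack{t = 1, \dots, T:\\ \mathrm{cAn}^o(O_t) \cap S_2 \neq \emptyset}} \Big\{ \sum_{j \in \mathrm{cAn}^o(O_t) \setminus S_2} \frac{1}{b_{j i_t}} f_{Z_j}\Big(\frac{x_{i_t}}{b_{j i_t}}\Big) \prod_{k \in (\mathrm{cAn}^o(O_t) \cap S_2) \setminus \{j\}} F_{Z_k}\Big(\frac{x_{i_t}}{b_{k i_t}} \wedge z_k\Big) \prod_{k \in \mathrm{cAn}^o(O_t) \setminus (S_2 \cup \{j\})} F_{Z_k}\Big(\frac{x_{i_t}}{b_{k i_t}}\Big) \\
&\qquad + \sum_{j \in \mathrm{cAn}^o(O_t) \cap S_2} 1_{(0, z_j]}\Big(\frac{x_{i_t}}{b_{j i_t}}\Big) \frac{1}{b_{j i_t}} f_{Z_j}\Big(\frac{x_{i_t}}{b_{j i_t}}\Big) \prod_{k \in (\mathrm{cAn}^o(O_t) \cap S_2) \setminus \{j\}} F_{Z_k}\Big(\frac{x_{i_t}}{b_{k i_t}} \wedge z_k\Big) \prod_{k \in \mathrm{cAn}^o(O_t) \setminus (S_2 \cup \{j\})} F_{Z_k}\Big(\frac{x_{i_t}}{b_{k i_t}}\Big) \Big\}.
\end{aligned}
\]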

In the following remark we reformulate the terms in the denominator of $F_{Z_{S_2} \mid X_O}$, giving the intuition behind. A more formal explanation can be found in Lemma 2 of [12].

Remark 6.7. Assume the same conditions as in Lemma 6.6 hold. Recall from (5.5) that $P$-a.s. on $\Omega^o$ for all $t = 1, \dots, T$,
\[
X_{i_t} = \bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{j i_t} Z_j .
\]
Let $t \in \{1, \dots, T\}$ be fixed. Since the noise variables are independent, the distribution function $G_t$ of $\bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{j i_t} Z_j$ is given by
\[
G_t(y) = P\Big( \bigvee_{j \in \mathrm{cAn}^o(O_t)} b_{j i_t} Z_j \le y \Big) = \prod_{j \in \mathrm{cAn}^o(O_t)} F_{Z_j}\Big( \frac{y}{b_{j i_t}} \Big), \qquad y \in \mathbb{R}_+ .
\]
By taking derivatives with respect to $y$ we find the corresponding density $g_t$, namely
\[
g_t(y) = \sum_{j \in \mathrm{cAn}^o(O_t)} \frac{1}{b_{j i_t}} f_{Z_j}\Big( \frac{y}{b_{j i_t}} \Big) \prod_{k \in \mathrm{cAn}^o(O_t) \setminus \{j\}} F_{Z_k}\Big( \frac{y}{b_{k i_t}} \Big).
\]
These densities are the terms in the denominator of $F_{Z_{S_2} \mid X_O}$.
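The product rule behind Remark 6.7 can be checked numerically. The following sketch assumes unit Fréchet noise and hypothetical coefficients $b_{j i_t}$ (neither is prescribed by the paper) and compares $g_t$ with a central finite difference of $G_t$.

```python
import math


def frechet_cdf(z):
    """Unit Frechet distribution function F(z) = exp(-1/z) (an assumption)."""
    return math.exp(-1.0 / z) if z > 0 else 0.0


def frechet_pdf(z):
    """Unit Frechet density f(z) = exp(-1/z) / z^2."""
    return math.exp(-1.0 / z) / (z * z) if z > 0 else 0.0


def G(y, b):
    """Distribution function of max_j b[j] * Z_j: product of F(y / b[j])."""
    out = 1.0
    for bj in b:
        out *= frechet_cdf(y / bj)
    return out


def g(y, b):
    """Density of max_j b[j] * Z_j, written as the sum in Remark 6.7."""
    total = 0.0
    for j, bj in enumerate(b):
        term = frechet_pdf(y / bj) / bj
        for k, bk in enumerate(b):
            if k != j:
                term *= frechet_cdf(y / bk)
        total += term
    return total


def finite_difference(y, b, h=1e-5):
    """Central difference approximation of G'(y)."""
    return (G(y + h, b) - G(y - h, b)) / (2.0 * h)
```

With `b = [0.5, 1.0, 2.0]`, the values `g(1.7, b)` and `finite_difference(1.7, b)` should agree up to discretisation error. For Fréchet noise both also agree with the closed form $\big(\sum_j b_j / y^2\big) \exp\big(-\sum_j b_j / y\big)$.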



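Example 6.8. [Continuation of Examples 2.1, 2.8, 4.5, 4.10, 5.15, 6.2] We verify the regular conditional distribution functions calculated in an informal way there. From Theorem 6.3 and Lemma 6.5 we obtain for $(y_1, y_4) \in \mathbb{R}_+^2$ and $(x_2, x_3) \in (X_O(\Omega^{o_1}) \cap \mathrm{supp}(X_O))$,
\[
\begin{aligned}
F_{(X_1,X_4)\mid(X_2,X_3)}(y_1, y_4 \mid x_2, x_3)
&= \mathbf{1}_{[0,y_1]}\Big(\frac{b_{11}}{b_{12}}x_2\Big)\,\mathbf{1}_{[0,y_4]}\Big(\frac{b_{14}}{b_{12}}x_2\Big)\,F_{Z_4\mid(X_2,X_3)}\Big(\frac{y_4}{b_{44}}\,\Big|\,x_2, x_3\Big)\\
&= \mathbf{1}_{[0,y_1]}\Big(\frac{b_{11}}{b_{12}}x_2\Big)\,\mathbf{1}_{[0,y_4]}\Big(\frac{b_{14}}{b_{12}}x_2\Big)\,F_{Z_4}\Big(\frac{y_4}{b_{44}}\Big).
\end{aligned}
\]
Observe that by (3.5) we have for $(x_2, x_3) \in (X_O(\Omega^{o_1}) \cap \mathrm{supp}(X_O))$, as $b_{13}x_2 = b_{12}x_3$,
\[
\frac{b_{14}}{b_{12}}x_2
= \frac{1}{b_{12}}\Big(\frac{b_{12}b_{24}}{b_{22}}x_2 \vee \frac{b_{13}b_{34}}{b_{33}}x_2\Big)
= \frac{1}{b_{12}}\Big(\frac{b_{12}b_{24}}{b_{22}}x_2 \vee \frac{b_{12}b_{34}}{b_{33}}x_3\Big)
= \frac{b_{24}}{b_{22}}x_2 \vee \frac{b_{34}}{b_{33}}x_3 .
\]
Thus $F_{(X_1,X_4)\mid(X_2,X_3)}(y_1, y_4 \mid x_2, x_3)$ for $(x_2, x_3) \in X_O(\Omega^{o_1})$ coincides with the one calculated in Example 6.2.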
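The last chain of equalities can be checked with concrete numbers. The sketch below is illustrative only: the coefficient values are hypothetical, $b_{14}$ is defined through the maximum that the text attributes to (3.5), and the observations are generated as $x_2 = b_{12}z_1$, $x_3 = b_{13}z_1$ so that $b_{13}x_2 = b_{12}x_3$ holds.

```python
import math

# Hypothetical ML coefficients (not taken from the paper).
b11, b12, b13 = 1.0, 0.8, 0.5
b22, b24 = 1.0, 0.6
b33, b34 = 1.0, 0.9

# Relation used in the text (attributed there to (3.5)).
b14 = max(b12 * b24 / b22, b13 * b34 / b33)

# Observations for which node 1 realizes both X2 and X3,
# which gives b13 * x2 == b12 * x3.
z1 = 2.3
x2, x3 = b12 * z1, b13 * z1

lhs = b14 / b12 * x2
rhs = max(b24 / b22 * x2, b34 / b33 * x3)
print(lhs, rhs)
```

Both sides reduce to $b_{14} z_1$, which is why the indicator $\mathbf{1}_{[0,y_4]}$ can be written in either form.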


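Applying first Theorem 6.3 and then Lemma 6.5 we obtain for $(y_1, y_4) \in \mathbb{R}_+^2$ and $(x_2, x_3) \in (X_O(\Omega^{o_2}) \cap \mathrm{supp}(X_O))$,
\[
\begin{aligned}
F_{(X_1,X_4)\mid(X_2,X_3)}(y_1, y_4 \mid x_2, x_3)
&= \mathbf{1}_{[0,y_4]}\Big(\frac{b_{24}}{b_{22}}x_2 \vee \frac{b_{34}}{b_{33}}x_3\Big)\,F_{(Z_1,Z_4)\mid(X_2,X_3)}\Big(\frac{y_1}{b_{11}}, \frac{y_4}{b_{44}}\,\Big|\,x_2, x_3\Big)\\
&= \mathbf{1}_{[0,y_4]}\Big(\frac{b_{24}}{b_{22}}x_2 \vee \frac{b_{34}}{b_{33}}x_3\Big)\,F_{Z_4}\Big(\frac{y_4}{b_{44}}\Big)\,F_{Z_1\mid(X_2,X_3)}\Big(\frac{y_1}{b_{11}}\,\Big|\,x_2, x_3\Big).
\end{aligned}
\]
For the last term we have by Lemma 6.6,
\[
F_{Z_1\mid(X_2,X_3)}\Big(\frac{y_1}{b_{11}}\,\Big|\,x_2, x_3\Big)
= \frac{\frac{1}{b_{12}}\,\mathbf{1}_{(0,\,y_1/b_{11}]}\big(\frac{x_2}{b_{12}}\big)\,f_{Z_1}\big(\frac{x_2}{b_{12}}\big)\,F_{Z_2}\big(\frac{x_2}{b_{22}}\big) + \frac{1}{b_{22}}\,f_{Z_2}\big(\frac{x_2}{b_{22}}\big)\,F_{Z_1}\big(\frac{x_2}{b_{12}} \wedge \frac{y_1}{b_{11}}\big)}{\frac{1}{b_{12}}\,f_{Z_1}\big(\frac{x_2}{b_{12}}\big)\,F_{Z_2}\big(\frac{x_2}{b_{22}}\big) + \frac{1}{b_{22}}\,f_{Z_2}\big(\frac{x_2}{b_{22}}\big)\,F_{Z_1}\big(\frac{x_2}{b_{12}}\big)},
\]
which we have also calculated in Example 6.2. ◻

Appendix A: Auxiliary lemmata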

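Lemma A.1. Let $(D, \mathcal{L}(X))$ be a ML graphical model, and let $U \subseteq V$. For coefficients $a(i, j, k) > 0$, $i, j, k \in V$, we have for all $i \in V$,
\[
\bigvee_{k\in\mathrm{pa}(i)}\ \bigvee_{j\in\mathrm{an}(k)} a(i,j,k)Z_j
= \bigvee_{j\in\mathrm{an}(i)}\ \bigvee_{k\in\mathrm{de}(j)\cap\mathrm{pa}(i)} a(i,j,k)Z_j, \tag{A.1}
\]
\[
\bigvee_{k\in\mathrm{pa}(i)}\ \bigvee_{j\in\mathrm{an}(k)\setminus\mathrm{pa}(i)} a(i,j,k)Z_j
= \bigvee_{j\in\mathrm{an}(i)\setminus\mathrm{pa}(i)}\ \bigvee_{k\in\mathrm{de}(j)\cap\mathrm{pa}(i)} a(i,j,k)Z_j, \tag{A.2}
\]
\[
\bigvee_{k\in\mathrm{pa}(i)}\ \bigvee_{j\in\mathrm{an}(k)\cap\mathrm{pa}(i)\setminus\mathrm{pa}^{\mathrm{tr}}(i)} a(i,j,k)Z_j
= \bigvee_{j\in\mathrm{pa}(i)\setminus\mathrm{pa}^{\mathrm{tr}}(i)}\ \bigvee_{k\in\mathrm{de}(j)\cap\mathrm{pa}(i)} a(i,j,k)Z_j, \tag{A.3}
\]
\[
\bigvee_{k\in\mathrm{an}(i)\cap U}\ \bigvee_{j\in\mathrm{An}(k)} a(i,j,k)Z_j
= \bigvee_{j\in\mathrm{an}(i)}\ \bigvee_{k\in\mathrm{De}(j)\cap\mathrm{an}(i)\cap U} a(i,j,k)Z_j. \tag{A.4}
\]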
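Each identity of Lemma A.1 amounts to saying that the same pairs $(k, j)$ are indexed on both sides. As a sanity check, this can be verified by brute force on a small example. The sketch below is illustrative only: the DAG is hypothetical, $\mathrm{An}(k)$ and $\mathrm{De}(j)$ are taken to include the node itself, and only the index sets of (A.1), (A.2) and (A.4) are compared.

```python
# Hypothetical DAG on the nodes 1..5, given by its parent sets pa(i).
pa = {1: set(), 2: {1}, 3: {1, 2}, 4: {2, 3}, 5: {3, 4}}
V = set(pa)

def an(i):
    """Ancestors of i (i itself excluded)."""
    found, stack = set(), list(pa[i])
    while stack:
        k = stack.pop()
        if k not in found:
            found.add(k)
            stack.extend(pa[k])
    return found

def de(j):
    """Descendants of j (j itself excluded)."""
    return {i for i in V if j in an(i)}

def identities_hold(U):
    """Compare the sets of index pairs (k, j) on both sides for every node i."""
    for i in V:
        # (A.1): k in pa(i), j in an(k)  vs  j in an(i), k in de(j) & pa(i)
        a1_left = {(k, j) for k in pa[i] for j in an(k)}
        a1_right = {(k, j) for j in an(i) for k in de(j) & pa[i]}
        # (A.2): as (A.1), with j restricted to non-parents of i
        a2_left = {(k, j) for k in pa[i] for j in an(k) - pa[i]}
        a2_right = {(k, j) for j in an(i) - pa[i] for k in de(j) & pa[i]}
        # (A.4): k in an(i) & U, j in An(k)  vs  j in an(i), k in De(j) & an(i) & U
        a4_left = {(k, j) for k in an(i) & U for j in an(k) | {k}}
        a4_right = {(k, j) for j in an(i) for k in (de(j) | {j}) & an(i) & U}
        if (a1_left, a2_left, a4_left) != (a1_right, a2_right, a4_right):
            return False
    return True

print(identities_hold({2, 4}))
```

Since both sides of each identity are maxima over the same pairs, equality of the index sets implies equality of the maxima for any positive coefficients $a(i, j, k)$.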

Proof. Since we take maxima, we only have to prove that each combination of nodes $(k, j)$ on the left-hand side also appears on the right-hand side and vice versa. In order to prove (A.1), it therefore suffices to show that

$k \in \mathrm{pa}(i)$ and $j \in \mathrm{an}(k)$ $\iff$ $j \in \mathrm{an}(i)$ and $k \in \mathrm{de}(j) \cap \mathrm{pa}(i)$.

Since $j \in \mathrm{an}(k)$ if and only if $k \in \mathrm{de}(j)$, and since $\mathrm{an}(\mathrm{pa}(i)) \subseteq \mathrm{an}(i)$, this equivalence is immediate. The other identities are proved in the same way.

Lemma A.2. Let $(D, \mathcal{L}(X))$ be a ML graphical model, and let $U \subseteq V$.
(a) For $i, j \in V$ such that $j \in \mathrm{an}(i)$, there exists a max-weighted path from $j$ to $i$ containing some node of $U \setminus \{i\}$ if and only if there exists a max-weighted path from $j$ to $i$ containing some node of $\mathrm{an}^{U}_{\mathrm{low}}(i)$.
(b) For $i, j \in V$ such that $i \in \mathrm{de}(j)$, there exists a max-weighted path from $j$ to $i$ containing some node of $U \setminus \{j\}$ if and only if there exists a max-weighted path from $j$ to $i$ containing some node of $\mathrm{de}^{U}_{\mathrm{high}}(j)$.

Proof. We only show (a), since (b) can be proved analogously. If there exists a max-weighted path from $j$ to $i$ containing some node of $\mathrm{an}^{U}_{\mathrm{low}}(i)$, then this path also contains a node of $U \setminus \{i\}$, as $\mathrm{an}^{U}_{\mathrm{low}}(i) \subseteq U \setminus \{i\}$.

To prove the converse, we construct for $j \in \mathrm{an}^{U\setminus\{i\}}_{\mathrm{mw}}(i)$ a max-weighted path from $j$ to $i$ containing some node of $\mathrm{an}^{U}_{\mathrm{low}}(i)$. Let $p = [j = k_0 \to k_1 \to \dots \to k_n = i]$ be a max-weighted path from $j$ to $i$ with the maximal number of nodes in $U \setminus \{i\}$ among all max-weighted paths from $j$ to $i$. Denote by $l$ the lowest node on $p$ contained in $U \setminus \{i\}$; i.e., for some $s \in \{0, 1, \dots, n\}$, the sub-path $p_1 = [l = k_s \to k_{s+1} \to \dots \to k_n = i]$ contains no node of $U \setminus \{i\}$ other than $l$. Assume that $l \notin \mathrm{an}^{U}_{\mathrm{low}}(i)$. Since $l \in U \setminus \mathrm{an}^{U}_{\mathrm{low}}(i)$, there exists a max-weighted path $p_2$ from $l$ to $i$ containing some node in $U \setminus \{l, i\}$. Thus, by replacing in $p$ the sub-path $p_1$ by $p_2$, we obtain by Remark 2.7(iv) a max-weighted path from $j$ to $i$ containing more nodes of $U \setminus \{i\}$ than $p$. This, however, contradicts the fact that $p$ has the maximal number of nodes in $U \setminus \{i\}$ among all max-weighted paths from $j$ to $i$. Hence, $l \in \mathrm{an}^{U}_{\mathrm{low}}(i)$, and $p$ is a max-weighted path from $j$ to $i$ containing some node of $\mathrm{an}^{U}_{\mathrm{low}}(i)$.
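Part (a) can be explored by brute force on a small weighted DAG. The sketch below rests on assumptions that are not taken from this section: the edge weights are hypothetical, a path weight is the product of its edge weights (a common factor for the starting node would not change which paths are max-weighted), and $\mathrm{an}^{U}_{\mathrm{low}}(i)$ is read off from the proof as the set of nodes $l \in U \cap \mathrm{an}(i)$ for which no max-weighted path from $l$ to $i$ contains a node of $U \setminus \{l, i\}$.

```python
from itertools import combinations
from math import isclose, prod

# Hypothetical edge weights c[(k, i)] for the edges k -> i of a small DAG.
c = {(1, 2): 0.9, (1, 3): 0.5, (2, 3): 0.8, (2, 4): 0.4, (3, 4): 0.7, (1, 4): 0.1}
V = [1, 2, 3, 4]

def paths(j, i):
    """All directed paths from j to i, each as a tuple of nodes."""
    if j == i:
        return [(i,)]
    return [(j,) + rest for (a, b) in c if a == j for rest in paths(b, i)]

def mw_paths(j, i):
    """Max-weighted paths from j to i (empty list if j is not an ancestor of i)."""
    ps = [p for p in paths(j, i) if len(p) > 1]
    if not ps:
        return []
    w = {p: prod(c[e] for e in zip(p, p[1:])) for p in ps}
    best = max(w.values())
    return [p for p in ps if isclose(w[p], best)]

def an_low(U, i):
    """Assumed definition: ancestors l of i in U such that no max-weighted
    path from l to i meets U outside {l, i}."""
    return {l for l in U if l != i and mw_paths(l, i)
            and not any(set(p) & (U - {l, i}) for p in mw_paths(l, i))}

def lemma_a2a_holds():
    """Check the equivalence in (a) for every U, i and ancestor j of i."""
    for r in range(len(V) + 1):
        for U in map(set, combinations(V, r)):
            for i in V:
                low = an_low(U, i)
                for j in V:
                    mw = mw_paths(j, i)
                    if not mw:
                        continue  # j is not an ancestor of i
                    via_U = any(set(p) & (U - {i}) for p in mw)
                    via_low = any(set(p) & low for p in mw)
                    if via_U != via_low:
                        return False
    return True

print(lemma_a2a_holds())
```

Enumerating all paths is exponential in general and is only meant for toy graphs of this size.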


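Lemma A.3. Assume the same situation as in Theorem 6.3. Set $E := A \setminus O$. Then for $y_E \in \mathbb{R}_+^{|E|}$ we have
\[
\bigvee_{t=1,\dots,T:\ (\mathrm{an}^{O_t}_{\mathrm{mw}}(i)\cap\mathrm{cAn}^{o}(O_t))\neq\emptyset} \frac{b_{j_t^{\mathrm{mw}_i},\,i}}{b_{j_t^{\mathrm{mw}_i},\,i_t}}\,X_{i_t}\ \vee \bigvee_{j\in\mathrm{An}^{O}_{\mathrm{nmw}}(i)} b_{ji}Z_j \le y_i \quad\text{for all } i\in E \tag{A.5}
\]
if and only if
\[
X_{i_t} \le \bigwedge_{i\in\mathrm{de}^{O_t}_{\mathrm{mw}}(\mathrm{cAn}^{o}(O_t))\cap E} \frac{b_{j_t^{\mathrm{mw}_i},\,i_t}}{b_{j_t^{\mathrm{mw}_i},\,i}}\,y_i \quad\text{for all } t\in\{t\in\{1,\dots,T\} : \mathrm{an}^{O_t}_{\mathrm{mw}}(E)\cap\mathrm{cAn}^{o}(O_t)\neq\emptyset\}
\]
and
\[
Z_j \le \bigwedge_{i\in\mathrm{De}^{O}_{\mathrm{nmw}}(j)\cap E} \frac{y_i}{b_{ji}} \quad\text{for all } j\in\mathrm{An}^{O}_{\mathrm{nmw}}(E).
\]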
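The elementary step behind this lemma is a change in the order of quantification: a maximum over $j$ is bounded for every $i$ exactly when each term is bounded by a minimum over $i$. The sketch below illustrates only this step in a simplified setting, which is an assumption for illustration: every pair $(j, i)$ carries a positive coefficient, whereas the lemma restricts the pairs through the sets $\mathrm{An}^{O}_{\mathrm{nmw}}$ and $\mathrm{De}^{O}_{\mathrm{nmw}}$. Exact rational arithmetic is used so that boundary cases are compared without rounding.

```python
import random
from fractions import Fraction

def rand_pos(rng):
    """A random positive rational, so that all comparisons are exact."""
    return Fraction(rng.randint(1, 9), rng.randint(1, 9))

def rowwise(b, z, y):
    """max_j b[j][i] * z[j] <= y[i] for every i."""
    return all(max(b[j][i] * z[j] for j in range(len(z))) <= y[i]
               for i in range(len(y)))

def columnwise(b, z, y):
    """z[j] <= min_i y[i] / b[j][i] for every j."""
    return all(z[j] <= min(y[i] / b[j][i] for i in range(len(y)))
               for j in range(len(z)))

def equivalence_holds(trials=2000, seed=0):
    """Compare both formulations on randomly drawn positive inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n_j, n_i = rng.randint(1, 4), rng.randint(1, 4)
        b = [[rand_pos(rng) for _ in range(n_i)] for _ in range(n_j)]
        z = [rand_pos(rng) for _ in range(n_j)]
        y = [rand_pos(rng) for _ in range(n_i)]
        if rowwise(b, z, y) != columnwise(b, z, y):
            return False
    return True

print(equivalence_holds())
```

The same reindexing applied to the terms involving $X_{i_t}$ gives the first family of inequalities in the lemma.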

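Proof. The inequalities in (A.5) are equivalent to
\[
X_{i_t} \le \frac{b_{j_t^{\mathrm{mw}_i},\,i_t}}{b_{j_t^{\mathrm{mw}_i},\,i}}\,y_i \quad\text{for all } i\in E \text{ and } t\in\{t\in\{1,\dots,T\} : \mathrm{an}^{O_t}_{\mathrm{mw}}(i)\cap\mathrm{cAn}^{o}(O_t)\neq\emptyset\}
\]
and
\[
Z_j \le \frac{y_i}{b_{ji}} \quad\text{for all } i\in E \text{ and } j\in\mathrm{An}^{O}_{\mathrm{nmw}}(i).
\]
We have to show that

$i \in E$ and $t \in \{t \in \{1,\dots,T\} : \mathrm{an}^{O_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^{o}(O_t) \neq \emptyset\}$

if and only if

$t \in \{t \in \{1,\dots,T\} : \mathrm{an}^{O_t}_{\mathrm{mw}}(E) \cap \mathrm{cAn}^{o}(O_t) \neq \emptyset\}$ and $i \in \mathrm{de}^{O_t}_{\mathrm{mw}}(\mathrm{cAn}^{o}(O_t)) \cap E$.

This holds true by observing that $\mathrm{an}^{O_t}_{\mathrm{mw}}(i) \cap \mathrm{cAn}^{o}(O_t) \neq \emptyset$ if and only if $i \in \mathrm{de}^{O_t}_{\mathrm{mw}}(\mathrm{cAn}^{o}(O_t))$. Similarly, we obtain that

$i \in E$ and $j \in \mathrm{An}^{O}_{\mathrm{nmw}}(i)$ if and only if $j \in \mathrm{An}^{O}_{\mathrm{nmw}}(E)$ and $i \in \mathrm{De}^{O}_{\mathrm{nmw}}(j) \cap E$.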

From these equivalences and the inequalities above, the assertion follows.

References

[1] A. V. Aho, M. R. Garey, and J. D. Ullman. The transitive reduction of a directed graph. SIAM Journal on Computing, 1(2):131–137, 1972.
[2] L. De Haan and A. Ferreira. Extreme Value Theory: An Introduction. Springer, New York, 2006.
[3] C. Dombry, M. Ribatet, and S. Stoev. Probabilities of concurrent events. 2015. URL http://arxiv.org/pdf/1503.05748.pdf.
[4] A. Klenke. Probability Theory: A Comprehensive Course. Springer, 2007.
[5] D. Koller and N. Friedman. Probabilistic Graphical Models. MIT Press, 2009.
[6] S. L. Lauritzen. Graphical Models. Oxford University Press, Oxford, 1996.
[7] D. M. Moyles and G. L. Thompson. An algorithm for finding a minimum equivalent graph of a digraph. Journal of the ACM, 16(3):455–460, 1969.
[8] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2nd edition, 2009.
[9] J. Peters and P. Bühlmann. Identifiability of Gaussian structural equation models with equal error variances. Biometrika, 2013.
[10] J. Peters, J. Mooij, D. Janzing, and B. Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014.
[11] P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, New York, 2nd edition, 2000.
[12] Y. Wang and S. A. Stoev. Conditional sampling for spectrally discrete max-stable random fields. Advances in Applied Probability, 43(2):461–483, 2011.