Efficient Algorithms for Reachability and Path-Selection Problems with Applications

Author: Nickolas Cook

Efficient Algorithms for Reachability and Path-Selection Problems with Applications

Final Report of Project Funded by the John S. Latsis Public Benefit Foundation December 2010

Team Members

Department of Informatics and Telecommunications Engineering, University of Western Macedonia
Alexandra Galani, Research and Teaching Staff
Loukas Georgiadis, Assistant Professor (Coordinator)

Department of Computer Science, University of Ioannina
Stavros D. Nikolopoulos, Professor
Leonidas Palios, Associate Professor

Abstract

Graphs are mathematical structures that model many important entities such as the world-wide web, transportation, communication and social networks, databases, and biological systems. The objective of this research project was the design of efficient algorithms for a collection of graph problems related to Reachability and Path-Selection. In Reachability and Path-Selection problems we are given an input graph and wish to efficiently answer queries that report whether two vertices are connected by a path, or to compute paths connecting specified vertices so that certain requirements are satisfied. Algorithmic problems of this kind have numerous applications, including internet routing, geographical navigation, and knowledge-representation systems. Specifically, in this project we studied the following types of problems:

Join-Reachability: This is a natural extension of the standard Reachability problem to a collection of graphs G. We wish to process G so that we can quickly report the set of vertices that reach a given vertex in all graphs of G.

Computation of Disjoint Paths: Our goal is to compute a pair of disjoint paths from a given source vertex to every other vertex, or to a specific target vertex.

We developed algorithmic techniques and provided efficient algorithms for problems of the above types. We also considered new applications of our techniques and algorithms.

Abstract (Greek version, translated)

Graphs are mathematical structures that model many important entities such as the world-wide web, transportation, communication and social networks, databases, and biological systems. The aim of the research project was the design of efficient algorithms for a collection of problems related to Reachability and Path-Selection in graphs. In Reachability and Path-Selection problems we are given an input graph for which we wish to efficiently answer queries on whether two of its vertices are connected by some path, or to compute paths that connect specified vertices and at the same time satisfy prescribed requirements. Algorithmic problems of this type have numerous applications, including network routing, geographical navigation, and knowledge-representation systems. Specifically, in this project we explored the following types of problems:

Join-Reachability: This is a natural extension of the standard reachability problem to a collection of graphs G. We wish to process G so that we can quickly report the set of vertices from which there is a path to a given vertex in all graphs of G.

Computation of Disjoint Paths: Our goal is to compute a pair of disjoint paths from a given source vertex to every other vertex, or to a specific target vertex.

We developed algorithmic techniques and presented efficient algorithms for problems of the above types. In addition, we sought new applications of our techniques and algorithms.

Contents

1 Introduction
  1.1 Fundamental Concepts in Graph Theory
  1.2 Reachability
  1.3 Path-Selection
2 Join-Reachability
  2.1 Applications
  2.2 Results
  2.3 Explicit Join-Reachability
    2.3.1 Computational Complexity
    2.3.2 Combinatorial Complexity
  2.4 Implicit Join-Reachability
3 Connectivity and Vertex-Disjoint Paths
  3.1 Vertex Connectivity
  3.2 Dominator Verification
  3.3 Independent Spanning Trees
  3.4 Testing 2-Vertex Connectivity
  3.5 Computing Pairs of Vertex-Disjoint s-t Paths
4 Further Applications
  4.1 Interprocedural Dominance
  4.2 Computational Morphological Analysis
5 Conclusions and Future Work

List of Figures

1.1 A directed graph.
2.1 An instance of join-reachability for two digraphs.
2.2 Reducing the size of a graph with the use of a Steiner vertex.
2.3 Mapping the vertices of two paths to points in the plane.
2.4 A Cartesian tree.
3.1 A strongly connected and a 2-vertex connected digraph.
3.2 A flowgraph G(s) and its dominator tree D(s).
3.3 Two independent spanning trees of a 2-vertex connected graph.
3.4 Two vertex-disjoint paths in a 2-vertex connected graph.
4.1 An interprocedural flowgraph and its dominator dag.
4.2 A graph of a morphological analysis.

Chapter 1

Introduction

The area of graph algorithms is rich and successful, as graphs are mathematical structures that model many important and diverse entities such as the world-wide web, transportation, communication and social networks, databases, biological systems and the control flow of computer programs. The problems of reachability and path-selection are fundamental in graph algorithms, with numerous application areas, including internet routing, geographical navigation, knowledge representation, computational biology, program optimization and natural language processing.

Our project was motivated by recent advances in these areas, as well as by emerging applications of graph-based data structures. We studied a collection of graph problems related to reachability and path-selection. The outcomes of this program were the design of efficient algorithms, the development of new algorithmic techniques, the design and implementation of practical algorithms, and the identification of new applications of our algorithms and techniques.

In this report we restrict ourselves to an overview of the research project, in order to make the content comprehensible to nonspecialists in theoretical computer science. For the full technical details and proofs of our results we refer to the research articles, which are preliminary versions of [Geo10, GNP10, GT10], posted at the project's website:

http://www.icte.uowm.gr/lgeorg/RPS/

Figure 1.1: A directed graph G = (V, E) with V = {a, b, c, d, e} and E = {(a, b), (b, c), (c, d), (d, a), (d, e), (e, c)}.

In this chapter we introduce the basic terminology and define the problems in our study. Chapter 2 deals with reachability problems and Chapter 3 discusses path-selection problems. In Chapter 4 we present further applications of our techniques. Finally, in Chapter 5 we discuss directions for future research.

1.1 Fundamental Concepts in Graph Theory

A graph G = (V, E) is an abstract representation of a set of objects V, called vertices, and a set of links E, called edges, which connect pairs of objects. The edges may be directed (asymmetric), in which case we have a directed graph, or undirected (symmetric), in which case we have an undirected graph. Figure 1.1 shows a directed graph with 5 vertices and 6 edges.

A path v_1, v_2, ..., v_k in G is a sequence of vertices v_i ∈ V such that there is an edge in E from v_i to v_{i+1}, denoted by (v_i, v_{i+1}), for i = 1, ..., k − 1; v_1 is the start vertex and v_k is the end vertex of the path. For example, the sequence a, b, c, d is a path from a to d in the graph of Figure 1.1. For any pair of vertices v, u ∈ V, vertex v is reachable from u (equivalently, u reaches v) if there is a path with start vertex u and end vertex v. A cycle is a path whose start vertex and end vertex are the same, e.g., e, c, d, e in the graph of Figure 1.1. A graph with no cycle is called acyclic.

An undirected graph G = (V, E) is connected if for every pair of vertices u, v ∈ V there is a path connecting u and v. A tree is an undirected graph that is acyclic and connected. Let G = (V, E) and G′ = (V′, E′) be two undirected graphs. Then G′ is a subgraph of G if V′ is a subset of V (V′ ⊆ V) and E′ is a subset of E (E′ ⊆ E). If V′ = V then G′ is a spanning subgraph of G. If, moreover, G′ is a tree then G′ is a spanning tree of G.

A directed graph G = (V, E) is strongly connected if for every pair of vertices u, v ∈ V, u reaches v and v reaches u. For example, the graph of Figure 1.1 is strongly connected. A spanning tree of G rooted at a vertex s is a subgraph of G such that for any other vertex v ∈ V there is exactly one path from s to v. For example, the spanning subgraph of the graph of Figure 1.1 that is formed by the subset of edges {(a, b), (d, a), (d, e), (e, c)} is a spanning tree with root d.

A planar graph is a graph that can be embedded in the plane, i.e., it can be drawn on the plane in such a way that its edges intersect only at their endpoints. The graph of Figure 1.1 is planar.

All the graphs considered in this study are directed, although some of the problems we discuss can also be defined (and are interesting) for undirected graphs.
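The definitions above can be checked mechanically on the graph of Figure 1.1. The following is a minimal Python sketch, not part of the project's code; the helper names (`is_path`, `reachable_from`, `is_strongly_connected`) are chosen here for illustration only.

```python
from collections import deque

# The directed graph of Figure 1.1.
V = {"a", "b", "c", "d", "e"}
E = {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("d", "e"), ("e", "c")}


def is_path(seq, edges):
    """True if every consecutive pair of the sequence is an edge."""
    return all((seq[i], seq[i + 1]) in edges for i in range(len(seq) - 1))


def reachable_from(u, edges):
    """Set of vertices reachable from u (u itself included), by breadth-first search."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, set()).add(y)
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def is_strongly_connected(vertices, edges):
    """True if every vertex reaches every other vertex."""
    return all(reachable_from(u, edges) == set(vertices) for u in vertices)


# Edges of the spanning tree rooted at d that is mentioned in the text.
TREE = {("a", "b"), ("d", "a"), ("d", "e"), ("e", "c")}
```

With these helpers, `is_path(["a", "b", "c", "d"], E)` confirms the example path, `is_path(["e", "c", "d", "e"], E)` confirms the example cycle, and `reachable_from("d", TREE) == V` confirms that the listed tree edges reach every vertex from the root d.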

1.2 Reachability

In the reachability problem our goal is to preprocess an input graph into a data structure so that queries of whether a vertex b is reachable from a vertex a can be answered quickly. This has applications in internet routing, geographical navigation, knowledge-representation systems and other areas [WHY+06]. In this project we introduced the study of a related collection of novel problems which we call join-reachability problems. These are motivated by recent work on graph-structured databases, social networks and program optimization.

Formally, we are given a collection of graphs G, where each graph G_i ∈ G represents a binary relation over a set of elements V. We define the join-reachability relation R as follows: a is related to b under R if and only if b is reachable from a in all graphs in G. Our goal is to find an efficient representation of R such that, for any given b ∈ V, we can quickly report all the elements that are related to b in R. In Chapter 2 we distinguish two versions of this problem, depending on the type of desired representation of R, and provide an overview of our results.

1.3 Path-Selection

The second main thread of our project deals with the design of algorithms for a specific type of path-selection problem. Path-selection refers to the computation of paths connecting a given subset of the vertices of a graph, such that certain requirements are satisfied. Typical examples are the computation of a shortest path between two vertices, finding a path connecting two vertices such that a region of the graph is avoided, or computing edge-disjoint or vertex-disjoint paths. This area contains some of the most important network optimization problems, which have been extensively studied.

In this project we explored the computation of pairs of disjoint paths: Given a source vertex, our goal is to compute two disjoint paths to every other vertex, or to a specific target vertex. We also considered the problem of testing whether the input graph satisfies the connectivity requirements that are necessary for such disjoint paths to exist. An overview of this study is presented in Chapter 3.


Chapter 2

Join-Reachability

In the reachability problem our goal is to preprocess a graph G into a data structure that can quickly answer queries that ask if a vertex b is reachable from a vertex a. This problem is fundamental for many application areas, including internet routing, geographical navigation, and knowledge-representation systems [WHY+06]. Recently, the interest in graph reachability problems has been rekindled by emerging applications of graph data structures in areas such as the semantic web, bio-informatics and social networks.

The above developments, together with recent applications in graph algorithms [Geo08, Geo10, GT05], have motivated us to introduce the study of the join-reachability problem: We are given a collection G of λ directed graphs G_i = (V_i, A_i), 1 ≤ i ≤ λ, where each graph G_i represents a binary relation R_i over a set of elements V ⊆ V_i in the following sense: For any a, b ∈ V, we have that a is related to b under R_i, denoted by a R_i b, if and only if b is reachable from a in G_i. Let R ≡ R(G) be the binary relation over V defined by: a R b if and only if a R_i b for all i ∈ {1, ..., λ} (i.e., b is reachable from a in all graphs in G). An example is given in Figure 2.1; a join-reachability query for vertex c returns the set {a, b, f, g}, which consists of the vertices reaching c in both G_1 and G_2. Our objective is to find an efficient representation of this relation. For simplicity, we will restrict our attention to the case of two input graphs (λ = 2).

Figure 2.1: An instance of join-reachability for two digraphs G_1 and G_2.

The join-reachability problem admits a simple solution, which is to precompute the answer to all possible join-reachability queries: For each vertex a ∈ V and for each graph G_i ∈ G we can compute the set reach(a, i) consisting of the vertices that reach a in G_i. Then we can store the answer to the join-reachability query for a by computing the intersection $\bigcap_{i=1}^{\lambda} \mathrm{reach}(a, i)$. With this representation join-reachability queries can be answered in optimal time, but it requires O(n^2) storage space, where n is the number of vertices, which is prohibitive for large graphs. Our goal is to construct space-efficient representations that allow fast join-reachability reporting.
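The simple precomputation described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the two small edge lists `G1` and `G2` are hypothetical (they are not the graphs of Figure 2.1), the function names are invented here, and a query vertex is not reported as reaching itself, following the example query in the text.

```python
def reaching_sets(vertices, edges):
    """reach[a] = set of vertices (other than a) that reach a, via a reverse search."""
    pred = {v: set() for v in vertices}
    for u, v in edges:
        pred[v].add(u)
    reach = {}
    for target in vertices:
        seen, stack = {target}, [target]
        while stack:
            x = stack.pop()
            for p in pred[x]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        reach[target] = seen - {target}
    return reach


def join_reachability_table(vertices, graphs):
    """Precompute, for every vertex a, the intersection of reach(a, i) over all graphs."""
    tables = [reaching_sets(vertices, edges) for edges in graphs]
    return {a: set.intersection(*(t[a] for t in tables)) for a in vertices}


# Hypothetical example input on the vertex set {a, b, c, d}.
VERTICES = ["a", "b", "c", "d"]
G1 = [("a", "b"), ("b", "c"), ("c", "d")]
G2 = [("b", "a"), ("a", "c"), ("d", "c")]
TABLE = join_reachability_table(VERTICES, [G1, G2])
```

Here `TABLE["c"]` is `{"a", "b"}`: both a and b reach c in G1 and in G2, whereas d reaches c only in G2. The table answers each query by a lookup, but in the worst case it stores a set of size proportional to n for each of the n vertices, which is the quadratic space cost mentioned above.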

2.1 Applications

Instances of the join-reachability problem appear in various applications. For example, in the rank aggregation problem [DKNS01] we are given a collection of rankings of some elements and we may wish to report which (or how many) elements have the same ranking relative to a given element. This is a special version of join-reachability, since the given collection of rankings can be represented by a collection of directed paths with the elements being the vertices of the paths. Similarly, in a graph-structured database with an associated ranking of its vertices we may wish to find the vertices that are related to a query vertex and have higher or lower ranking than this vertex.

Instances of join-reachability also appear in graph algorithms arising from program optimization. Specifically, in [Geo08] we need a data structure capable of reporting which vertices satisfy certain ancestor-descendant relations in a collection of rooted trees. Also, in work in progress, we show that join-reachability structures for two trees can yield efficient solutions to special cases of the interprocedural dominance problem [dSvPdB07]. See Section 4.1.

There are also instances of join-reachability that are related to the topics considered in Chapter 3. In [GT05] (see also [GT10]) it is shown that any directed graph G with a distinguished source vertex s has two spanning trees rooted at s such that a vertex a is a dominator of a vertex b (meaning that all paths in G from s to b pass through a) if and only if a is an ancestor of b in both spanning trees. This generalizes the graph-theoretical concept of independent spanning trees. Two spanning trees of a graph G are independent if they are both rooted at the same vertex s and for each vertex v the paths from s to v in the two trees are internally vertex-disjoint. Similarly, λ spanning trees of G are independent if they are pairwise independent. In this setting, we can apply a join-reachability structure to decide if λ given spanning trees are independent. Moreover, a variant of the join-reachability problem appears in our algorithm for computing pairs of vertex-disjoint paths [Geo10].

2.2 Results

In [GNP10] we explored two versions of the join-reachability problem. In the explicit version we wish to represent R with a directed graph J ≡ J(G), which we call the join-reachability graph of G, i.e., for any a, b ∈ V, we have a R b if and only if b is reachable from a in J. Our goal is to minimize the size (i.e., the number of vertices plus edges) of J. We presented results on the computational and combinatorial complexity of J.

In the implicit version we wish to represent R with an efficient data structure (in terms of storage space and query time) that can quickly report all elements a ∈ V satisfying a R b for any query element b ∈ V. First, we provided efficient join-reachability structures for simple graph classes. Then, based on these results, we considered planar graphs and general directed graphs.

2.3 Explicit Join-Reachability

In the explicit version of join-reachability we wish to construct a join-reachability graph of small size. First we explore the computational complexity of computing the smallest such graph, and then we provide bounds for its size in several cases.

2.3.1 Computational Complexity

We consider the computational complexity of computing the smallest J({G_1, G_2}): Given two graphs G_1 = (V, A_1) and G_2 = (V, A_2) we wish to compute a graph J ≡ J({G_1, G_2}) of minimum size such that for any a, b ∈ V, b is reachable from a in J if and only if b is reachable from a in both G_1 and G_2. We can further distinguish two versions of this problem, depending on whether J is allowed to have Steiner vertices (i.e., vertices not in V) or not: In the unrestricted version V(J) ⊇ V, while in the restricted version V(J) = V.

The problem of computing the smallest J in the unrestricted case belongs to the class of NP-hard problems. This is implied by a straightforward reduction from the reachability substitute problem, which was shown to be NP-hard by Katriel et al. [KKS05]. In the restricted case, on the other hand, we can compute J using transitive closure and transitive reduction computations, which can be done in polynomial time [AGU72].

Note that the existence of Steiner vertices can reduce the size of J significantly. Consider for example a complete bipartite digraph G with V(G) = X ∪ Y and A(G) = X × Y. Restricted to the vertices of G, this digraph has the same transitive closure as the digraph G′ with V(G′) = V(G) ∪ {z} and A(G′) = {(x, z), (z, y) | x ∈ X, y ∈ Y}, yet G has |X| · |Y| edges while G′ has only |X| + |Y|. See Figure 2.2.

Figure 2.2: Reducing the size of a graph with the use of a Steiner vertex z.
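The effect of the Steiner vertex in the bipartite example can be verified on a small instance. The sketch below is illustrative only: the sets `X` and `Y` have four hypothetical vertices each, and `reachable_pairs` is a helper name chosen here.

```python
from itertools import product


def reachable_pairs(edges, keep):
    """All ordered pairs (u, v) with u != v, both in `keep`, such that u reaches v."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    pairs = set()
    for start in keep:
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        pairs |= {(start, v) for v in seen if v != start and v in keep}
    return pairs


# Hypothetical instance with |X| = |Y| = 4.
X = ["x0", "x1", "x2", "x3"]
Y = ["y0", "y1", "y2", "y3"]

# Complete bipartite digraph G: every x has an edge to every y.
G_EDGES = list(product(X, Y))

# G' routes every connection through the single Steiner vertex z.
G_PRIME_EDGES = [(x, "z") for x in X] + [("z", y) for y in Y]

ORIGINAL = set(X) | set(Y)
```

On this instance `reachable_pairs(G_EDGES, ORIGINAL)` and `reachable_pairs(G_PRIME_EDGES, ORIGINAL)` are the same set of 16 pairs, while the edge count drops from 16 to 8.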

2.3.2 Combinatorial Complexity

The objective here is to develop methods for constructing small (but not necessarily optimal) join-reachability graphs, and then provide bounds on the size (number of vertices plus edges) of these constructions. Our starting point is to build join-reachability graphs for paths and trees. The basic idea is to map the vertices of V to geometric objects in a d-dimensional space, where d is some constant. Then, the join-reachability relation can be decided from the position of these objects in the d-dimensional space.

An example for the case of two paths is depicted in Figure 2.3: Each vertex v ∈ V receives coordinates (x_1(v), x_2(v)), where x_1(v) corresponds to the position of v in the first path, and x_2(v) corresponds to the position of v in the second path. Specifically, x_1(v) is equal to the number of vertices (other than itself) that reach v in G_1; x_2(v) is defined analogously. It follows that each vertex is mapped to a point in the two-dimensional space [0, n − 1]^2 and has integer coordinates. Moreover, for any two vertices a, b ∈ V, we have that b is reachable from a in both G_1 and G_2 if and only if (x_1(a), x_2(a)) ≤ (x_1(b), x_2(b)), where the comparison is coordinate-wise. In Figure 2.3 the vertices that reach f in both G_1 and G_2 are inside the dashed rectangle. Based on this geometrical view we can find the necessary edges (and Steiner vertices) that should be included in the join-reachability graph.

Figure 2.3: Mapping the vertices of two paths to points in the plane. G_1 is the path a, b, c, d, e, f, g, h and G_2 is the path a, e, c, g, b, f, d, h; the bracketed numbers [0] to [7] in the figure give the position of each vertex in its path, and the axes of the plot are x_1 and x_2.

The bound we derive is O(n log n), which turns out to be tight in the worst case: We presented examples where the smallest join-reachability graph must have Ω(n log n) edges. Based on similar ideas we provide methods for building join-reachability graphs of size O(n log^k n), for some constant k ≤ 3, when we deal with trees and planar graphs. These methods can also be applied to general graphs, but the quality of the produced structures depends on the number of disjoint paths into which the graphs can be decomposed.
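The mapping for two paths can be reproduced directly from the vertex orders shown in Figure 2.3. The following Python sketch is illustrative; the function names are chosen here, and the query excludes the query vertex itself, consistent with the definition of x_1(v) as the number of other vertices that reach v.

```python
# Vertex orders of the two paths, as labelled in Figure 2.3.
G1_PATH = ["a", "b", "c", "d", "e", "f", "g", "h"]
G2_PATH = ["a", "e", "c", "g", "b", "f", "d", "h"]


def coordinates(path1, path2):
    """Map each vertex v to the point (x1(v), x2(v)) given by its positions in the two paths."""
    x1 = {v: i for i, v in enumerate(path1)}
    x2 = {v: i for i, v in enumerate(path2)}
    return {v: (x1[v], x2[v]) for v in path1}


def reaches_in_both(points, a, b):
    """True if a != b and a's point is coordinate-wise <= b's point."""
    return (a != b
            and points[a][0] <= points[b][0]
            and points[a][1] <= points[b][1])


def join_query(points, b):
    """All vertices that reach b in both paths (a linear scan, for illustration only)."""
    return {a for a in points if reaches_in_both(points, a, b)}


POINTS = coordinates(G1_PATH, G2_PATH)
```

For example, `POINTS["f"]` is `(5, 5)` and `join_query(POINTS, "f")` returns `{"a", "b", "c", "e"}`, the vertices inside the dashed rectangle of Figure 2.3. The linear scan only demonstrates the dominance test; it is not one of the space-efficient structures discussed in this chapter.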

2.4 Implicit Join-Reachability

In the implicit version of the join-reachability problem our goal is to construct an efficient data structure that supports the following type of query: Given a query vertex b, report all vertices a that reach b in J({G_1, G_2}). We measure the efficiency of a data structure in terms of the storage space it requires and the time it needs to answer a join-reachability query (i.e., the time it needs to locate all vertices that reach b in J({G_1, G_2})). To that end, we use the notation ⟨s(n), q(n, k)⟩ to refer to a data structure with O(s(n)) space and O(q(n, k)) query time for reporting k elements.

In order to design efficient join-reachability data structures we apply the techniques of Section 2.3.2 combined with data structures from computational geometry. Consider, for example, the case of two paths. Using the mapping of Section 2.3.2 we need a data structure that returns the vertices a with (x_1(a), x_2(a)) ≤ (x_1(b), x_2(b)). This reporting can be accomplished with a Cartesian tree [GBT84]. A Cartesian tree T is a binary tree defined recursively as follows: The root of T is the point a with minimum x_2-coordinate. The left subtree of the root is a Cartesian tree for the points b with x_1(b) < x_1(a) and the right subtree of the root is a Cartesian tree for the points b with x_1(b) > x_1(a). See Figure 2.4.

Figure 2.4: A Cartesian tree. The figure uses the paths G_1 = a, b, c, d, e, f, g, h and G_2 = a, e, c, g, b, f, d, h, with plot axes x_1 and x_2.

The reporting algorithm uses the following property: Consider two points a and b, and let c be the point with minimum x_2-coordinate such that x_1(a) ≤ x_1(c) ≤ x_1(b). Then c is the nearest common ancestor of a and b in T. (The nearest common ancestor of two vertices in a tree is their common ancestor that is farthest from the root. For example, in Figure 2.4 the nearest common ancestor of d and h is e.)

Now let ζ be the point with the smallest x_1-coordinate. In order to find all points a such that (x_1(a), x_2(a)) ≤ (x_1(b), x_2(b)) we first locate the nearest common ancestor of ζ and b in T; call this vertex y. The returned point y has the smallest x_2-coordinate in the x_1-range [0, x_1(b)]. If x_2(y) > x_2(b) then the range contains no point to report and we stop the search. Otherwise we return y and search recursively in the x_1-ranges [0, x_1(y) − 1] and [x_1(y) + 1, x_1(b)]. Using the fact that nearest common ancestor queries in a tree can be answered in constant time after linear-time preprocessing [HT84], it follows that the efficiency of the above data structure is ⟨n, k⟩. Again, based on similar ideas, we provide data structures for trees, planar graphs, and general graphs.
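The Cartesian tree and the recursive reporting procedure can be sketched in Python on the points of Figure 2.3. This is a simplified illustration under stated assumptions: the x_1-coordinates are assumed to be exactly 0, ..., n − 1, the function names are chosen here, and the minimum of each x_1-range is found by a plain scan rather than by the constant-time nearest common ancestor queries of [HT84], so the sketch does not achieve the ⟨n, k⟩ bound.

```python
# Points (x1, x2) of the vertices in Figure 2.3.
POINTS = {"a": (0, 0), "b": (1, 4), "c": (2, 2), "d": (3, 6),
          "e": (4, 1), "f": (5, 5), "g": (6, 3), "h": (7, 7)}


def build_cartesian_tree(points):
    """Return (root, parent): parent maps every point to its tree parent (None for the root)."""
    order = sorted(points, key=lambda v: points[v][0])  # sorted by x1
    parent = {}

    def build(lo, hi, par):
        if lo > hi:
            return None
        # The root of a range is its point with minimum x2.
        root = min(order[lo:hi + 1], key=lambda v: points[v][1])
        parent[root] = par
        i = order.index(root)
        build(lo, i - 1, root)   # points with smaller x1
        build(i + 1, hi, root)   # points with larger x1
        return root

    return build(0, len(order) - 1, None), parent


def nearest_common_ancestor(parent, u, v):
    """Naive nearest common ancestor by walking parent pointers."""
    ancestors = set()
    while u is not None:
        ancestors.add(u)
        u = parent[u]
    while v not in ancestors:
        v = parent[v]
    return v


def report(points, b):
    """All points a with (x1(a), x2(a)) <= (x1(b), x2(b)); b itself is included."""
    order = sorted(points, key=lambda v: points[v][0])  # order[i] has x1 == i
    out = []

    def search(lo, hi):
        if lo > hi:
            return
        y = min(order[lo:hi + 1], key=lambda v: points[v][1])
        if points[y][1] > points[b][1]:
            return                      # nothing in this x1-range qualifies
        out.append(y)
        i = points[y][0]
        search(lo, i - 1)
        search(i + 1, hi)

    search(0, points[b][0])
    return out


ROOT, PARENT = build_cartesian_tree(POINTS)
```

On these points the root is a, `nearest_common_ancestor(PARENT, "d", "h")` is e as stated in the text, and `report(POINTS, "f")` returns a, b, c, e and f in some order. Each recursive call either reports a point or stops immediately, which is why the number of range-minimum queries is proportional to the output size.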


Chapter 3

Connectivity and Vertex-Disjoint Paths

In this chapter we present algorithms for computing pairs of vertex-disjoint paths in a graph G = (V, E) from a common start vertex. We consider the following two problems:

(a) Compute a pair of vertex-disjoint s-v paths for all vertices v ∈ V \ {s}, where s ∈ V is a fixed source vertex.

(b) Compute a pair of vertex-disjoint s-t paths for a given start vertex s and a given terminal vertex t.

We also consider the connectivity requirements that G must satisfy in order for such paths to exist. We remark that the more general problem of computing two vertex-disjoint paths that may connect different start and terminal vertices is NP-hard [BJG02].

3.1 Vertex Connectivity

A directed (undirected) graph is k-vertex connected if it has at least k + 1 vertices and the removal of any set of at most k − 1 vertices leaves the graph strongly connected (connected). See Figure 3.1. The vertex connectivity κ ≡ κ(G) of a graph G is the maximum k such that G is k-vertex connected. Graph connectivity is one of the most fundamental concepts in graph theory, with numerous practical applications [BJG02].

Figure 3.1: A strongly connected and a 2-vertex connected digraph (panels (i) and (ii)).

In the bounds below, n and m denote the number of vertices and edges of the graph. Currently, the fastest known algorithm for computing κ is due to Gabow [Gab06], with O((n + min{κ^{5/2}, κ n^{3/4}}) m) running time. A related problem is to test if a graph satisfies κ ≥ k for a given integer k. Henzinger et al. [HRG00] showed how to test k-vertex connectivity in time O(min{k^3 + n, kn} m). They also gave a randomized algorithm for computing κ with error probability 1/2 in time O(nm). For an undirected graph, a result of Nagamochi and Ibaraki [NI92] allows m to be replaced by κn or kn in the above bounds. Cheriyan and Reif [CR94] showed how to test k-vertex connectivity in a directed graph with a Monte Carlo algorithm with running time O((M(n) + n M(k)) log n) and error probability 1/n, and with a Las Vegas algorithm with expected running time O((M(n) + n M(k)) k). In these bounds, M(n) is the time to multiply two n × n matrices, which is O(n^{2.376}) [CW90].
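The definition of k-vertex connectivity for digraphs translates directly into a brute-force test. The Python sketch below enumerates all vertex subsets of size at most k − 1, so it takes exponential time in k and is meant only to make the definition concrete; it is not any of the algorithms cited above, and the function names and example graphs are chosen here.

```python
from itertools import combinations


def strongly_connected(vertices, edges):
    """True if the digraph induced on `vertices` is strongly connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    fwd, bwd = {}, {}
    for u, v in edges:
        if u in vertices and v in vertices:
            fwd.setdefault(u, set()).add(v)
            bwd.setdefault(v, set()).add(u)

    def reach(start, adj):
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for y in adj.get(x, ()):
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    s = next(iter(vertices))
    # s reaches everything and everything reaches s.
    return reach(s, fwd) == vertices and reach(s, bwd) == vertices


def is_k_vertex_connected(vertices, edges, k):
    """Definition check: at least k + 1 vertices, and removing any set of
    at most k - 1 vertices leaves the digraph strongly connected."""
    vertices = set(vertices)
    if len(vertices) < k + 1:
        return False
    for size in range(k):  # sizes 0, 1, ..., k - 1
        for removed in combinations(sorted(vertices), size):
            if not strongly_connected(vertices - set(removed), edges):
                return False
    return True


# Hypothetical examples: a directed 4-cycle and the complete digraph on 4 vertices.
NODES = [0, 1, 2, 3]
CYCLE = [(0, 1), (1, 2), (2, 3), (3, 0)]
COMPLETE = [(u, v) for u in NODES for v in NODES if u != v]
```

The directed cycle is 1-vertex connected (strongly connected) but not 2-vertex connected, since removing any vertex breaks it. The complete digraph on four vertices is 3-vertex connected, and it cannot be 4-vertex connected because that would require at least five vertices.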

3.2 Dominator Verification

A flowgraph G(s) = (V, A, s) is a graph with a distinguished root s ∈ V such that every vertex is reachable from s. The dominance relation in G is defined as follows: A vertex w dominates a vertex v if every path from s to v includes w; if w ∉ {s, v} then w is a proper dominator of v; otherwise, w is a trivial dominator of v. The dominance relation can be represented compactly by the dominator tree D: This is a tree rooted at s that satisfies the following property: For any two vertices v and w, w dominates v if and only if w is an ancestor of v in D [ASU86]. See Figure 3.2.

Figure 3.2: A flowgraph G(s) and its dominator tree D(s).

The computation of dominators appears in several application areas, such as program optimization and code generation, constraint programming, circuit testing, theoretical biology, and other areas [GTW06]. Dominators can be computed in almost linear time with the algorithm of Lengauer and Tarjan [LT79]. This algorithm has some conceptual complexities, but it is used in many applications because simpler algorithms have quadratic complexity or worse. There are also even more complicated truly linear-time algorithms [AHLT99, BGK+08, GT04].

We define the dominator verification problem as follows: Given a flowgraph G(s) and a tree T, test if T is the dominator tree of G(s). An important special case of this problem is the verification of trivial dominators: Given a flowgraph G(s), test if every vertex v ≠ s has only trivial dominators, i.e., no vertex other than s and v itself dominates v. We have shown that the dominator verification problem can be reduced in linear time to the problem of verifying trivial dominators.

The dominator verification problem was initially motivated by the complexities of the efficient algorithms for computing dominators. Moreover, in the next sections we show that the problems of testing the existence of pairs of vertex-disjoint paths of type (a) to all vertices starting from a fixed source, and (b) from a given source vertex to a given target vertex, can be reduced to the verification of trivial dominators.

Figure 3.3: Two independent spanning trees T_1 and T_2 of a 2-vertex connected graph G.
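The dominance relation and the two verification problems of Section 3.2 can be illustrated with a naive computation that follows the definition: w dominates v exactly when deleting w makes v unreachable from s. The Python sketch below runs one search per vertex, so it is far slower than the algorithms cited in that section and is not the verification method of this project; the function names and the two example flowgraphs are hypothetical.

```python
def reachable(adj, s, removed=None):
    """Vertices reachable from s when the vertex `removed` is deleted."""
    seen, stack = {s}, [s]
    while stack:
        x = stack.pop()
        for y in adj.get(x, ()):
            if y != removed and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def dominators(vertices, edges, s):
    """dom[v] = set of all vertices that dominate v (always contains s and v).
    Assumes every vertex is reachable from s, i.e. the input is a flowgraph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    dom = {v: {s, v} for v in vertices}
    for w in vertices:
        if w == s:
            continue
        survivors = reachable(adj, s, removed=w)
        for v in vertices:
            if v not in survivors:
                dom[v].add(w)  # every s-v path passes through w
    return dom


def dominator_tree(vertices, edges, s):
    """Parent of v = its dominator (other than v) that has the most dominators itself."""
    dom = dominators(vertices, edges, s)
    return {v: max(dom[v] - {v}, key=lambda w: len(dom[w]))
            for v in vertices if v != s}


def has_only_trivial_dominators(vertices, edges, s):
    dom = dominators(vertices, edges, s)
    return all(dom[v] <= {s, v} for v in vertices)


def verify_dominator_tree(vertices, edges, s, candidate_parent):
    """Dominator verification by recomputation (naive, for illustration)."""
    return candidate_parent == dominator_tree(vertices, edges, s)


# Hypothetical flowgraphs rooted at s.
VS = ["s", "a", "b", "c", "d"]
E1 = [("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
E2 = E1 + [("a", "d"), ("b", "d")]
```

In the first flowgraph every path from s to d passes through c, so c is a proper dominator of d and the dominator tree has d below c. Adding the edges (a, d) and (b, d) gives d two routes that avoid c, and the second flowgraph has trivial dominators only.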

3.3 Independent Spanning Trees

Let T_1 and T_2 be two spanning trees of a graph G = (V, E) rooted at a vertex s ∈ V. The spanning trees are independent if for each vertex v the two s-v paths in T_1 and T_2 are internally vertex-disjoint. See Figure 3.3. The spanning trees are strongly independent if they contain an s-v path and an s-u path that are vertex-disjoint, for all pairs of vertices u and v. Independent spanning trees have been used in fault-tolerant communications (see, e.g., [AB00, IR88]).

The existence of two such spanning trees is implied by a result of Whitty [Whi87], when G satisfies the following necessary and sufficient condition: G contains two vertex-disjoint s-v paths for all vertices v ≠ s. This is equivalent to stating that the flowgraph with root s has only trivial dominators. Whitty gave a polynomial-time construction for two strongly independent spanning trees. Simpler constructions were later provided by Plehn [Ple91] and by Cheriyan and Reif [CR94], but the time complexity of these constructions was not specified. Huck [Huc94] gave an O(mn)-time construction of two independent spanning trees. In [GT10] we provide linear-time constructions of two strongly independent spanning trees and other related concepts.
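The definition of independence can be checked directly when the two trees are given by parent pointers. The Python sketch below is illustrative: it assumes both inputs are valid spanning trees rooted at s (this is not verified), the function names are chosen here, and the example trees are hypothetical spanning trees of the complete digraph on {s, a, b}.

```python
def tree_path(parent, s, v):
    """The s-v path in a rooted tree given by parent pointers (s has no entry)."""
    path = [v]
    while path[-1] != s:
        path.append(parent[path[-1]])
    return path[::-1]


def are_independent(parent1, parent2, s):
    """True if, for every vertex v, the two s-v tree paths share no internal vertex."""
    for v in parent1:
        inner1 = set(tree_path(parent1, s, v)[1:-1])
        inner2 = set(tree_path(parent2, s, v)[1:-1])
        if inner1 & inner2:
            return False
    return True


# Hypothetical trees in the complete digraph on {s, a, b}.
T1 = {"a": "s", "b": "a"}   # s -> a -> b
T2 = {"b": "s", "a": "b"}   # s -> b -> a
```

Here `are_independent(T1, T2, "s")` holds: the two paths to a are s, a and s, b, a, and the two paths to b are s, a, b and s, b, so no internal vertex is shared. Comparing `T1` with itself fails, because both paths to b pass through a.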

3.4 Testing 2-Vertex Connectivity

Consider a 2-vertex connected graph G = (V, E). For s ∈ V, let G(s) be the flowgraph with root s. The definition of 2-vertex connectivity implies that every vertex v ≠ s has only trivial dominators in G(s). The same property holds for the reverse graph G^r, which is derived from G by reversing all edge directions. In [Geo10] we show that for a graph to be 2-vertex connected it is sufficient that the above two properties hold for two arbitrary vertices. Therefore, testing a graph for 2-vertex connectivity can be reduced to testing whether a constant number of flowgraphs have trivial dominators only. This reduction, together with the results of [GT10], implies a simple linear-time algorithm for testing 2-vertex connectivity.
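The reduction can be sketched as follows: pick two distinct vertices, and check that the flowgraphs rooted at each of them, in both the graph and its reverse, have trivial dominators only. The Python sketch below performs each of the four checks by brute force (one search per deleted vertex), so it illustrates the criterion but not the linear-time algorithm of [Geo10, GT10]; the function names and the example graphs are chosen here.

```python
def only_trivial_dominators(vertices, edges, s):
    """True if every vertex is reachable from s and stays reachable
    after deleting any single vertex other than s and itself."""
    everything = set(vertices)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)

    def reach(removed):
        seen, stack = {s}, [s]
        while stack:
            x = stack.pop()
            for y in adj.get(x, ()):
                if y != removed and y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    if reach(None) != everything:
        return False  # not even a flowgraph with root s
    return all(reach(w) == everything - {w} for w in vertices if w != s)


def is_2_vertex_connected(vertices, edges):
    """Four trivial-dominator checks: roots a and b, in the graph and in its reverse."""
    vertices = sorted(vertices)
    if len(vertices) < 3:
        return False
    a, b = vertices[0], vertices[1]          # any two distinct vertices
    reverse = [(v, u) for u, v in edges]
    return all(only_trivial_dominators(vertices, es, root)
               for es in (edges, reverse)
               for root in (a, b))


# Hypothetical examples: a directed 4-cycle and the complete digraph on 4 vertices.
NODES = [0, 1, 2, 3]
CYCLE = [(0, 1), (1, 2), (2, 3), (3, 0)]
COMPLETE = [(u, v) for u in NODES for v in NODES if u != v]
```

To see why two roots suffice, note that after deleting any vertex w at least one of the two roots survives; the checks for that root say it still reaches every remaining vertex and is reached by every remaining vertex, so the remaining graph is strongly connected.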

3.5 Computing Pairs of Vertex-Disjoint s-t Paths

We consider next the problem of computing two internally vertex-disjoint paths directed from s to t, for any given source vertex s and target vertex t. See Figure 3.4. This problem can be reduced to computing two edge-disjoint paths (by applying a standard vertex-splitting procedure), which in turn can be carried out in O(m) time by computing two flow-augmenting paths [BJG02]. In [Geo10] we presented a faster algorithm for 2-vertex connected graphs. First we note that our algorithm for testing 2-vertex connectivity allows us to find in linear time a 2-vertex connected spanning subgraph of the input digraph with O(n) edges. Hence, the flow-augmenting algorithm can compute two internally vertex-disjoint s-t paths in O(n) time. We can further improve this with the use of independent spanning trees. Based on the results mentioned in Section 3.3, we can construct a linear-space data structure that computes two internally vertex-disjoint s-t paths, for any s, t, in O(log² n) time, so that the two paths can be reported in constant time per vertex. We remark that the reporting algorithm needs to find common ancestors of some vertices in pairs of trees, which is a variant of the join-reachability problem defined in Chapter 2.

Figure 3.4: Two vertex-disjoint paths in a 2-vertex connected graph. (The figure shows a graph G on the vertices a through h, together with two vertex-disjoint d-e paths.)
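The basic reduction mentioned in this section (vertex splitting followed by two flow-augmenting paths) can be sketched as follows. This is an illustrative sketch of the standard technique, not the faster algorithm of [Geo10]. The adjacency-dictionary input, the function name `two_vertex_disjoint_paths`, and the `("v", "in")` / `("v", "out")` node encoding are assumptions; the input is assumed to be a digraph with s ≠ t.

```python
from collections import deque


def two_vertex_disjoint_paths(adj, s, t):
    """Return two internally vertex-disjoint s-t paths, or None."""
    source, sink = (s, "out"), (t, "in")
    cap = {}         # residual capacities cap[x][y]
    forward = set()  # arcs of the split network (not their reversals)

    def add_arc(x, y):
        cap.setdefault(x, {})[y] = 1
        cap.setdefault(y, {}).setdefault(x, 0)
        forward.add((x, y))

    # Split every vertex other than s and t into an in-copy and an out-copy.
    for v in adj:
        if v != s and v != t:
            add_arc((v, "in"), (v, "out"))
    # Each edge (u, v) goes from the out-copy of u to the in-copy of v.
    for u, outs in adj.items():
        for v in outs:
            if u != v and u != t and v != s:
                add_arc((u, "out"), (v, "in"))

    def augment():
        """Find one augmenting path by BFS and push one unit of flow."""
        pred = {source: None}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y, c in cap.get(x, {}).items():
                if c > 0 and y not in pred:
                    pred[y] = x
                    queue.append(y)
        if sink not in pred:
            return False
        y = sink
        while pred[y] is not None:
            x = pred[y]
            cap[x][y] -= 1
            cap[y][x] += 1
            y = x
        return True

    if not (augment() and augment()):
        return None

    # Saturated forward arcs carry the flow; follow them from the source.
    succ = {}
    for x, y in forward:
        if cap[x][y] == 0:
            succ.setdefault(x, []).append(y)
    paths = []
    for node in succ[source]:
        path = [s]
        while node != sink:
            if node[1] == "in":
                path.append(node[0])
            node = succ[node][0]
        path.append(t)
        paths.append(path)
    return paths


# Hypothetical example graph (not the graph of Figure 3.4).
g = {"s": ["a", "b"], "a": ["b", "t"], "b": ["t"], "t": []}
print(sorted(two_vertex_disjoint_paths(g, "s", "t")))
```

Each augmentation is a single breadth-first search, which matches the O(m) bound quoted above for a constant number of paths.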


Chapter 4

Further Applications

Now we consider additional applications of our algorithms and techniques. We remark that the material we present here is part of ongoing research.

4.1 Interprocedural Dominance

As we already mentioned in Section 3.2, the computation of dominators is crucial in the analysis and optimization of computer programs. In the context of whole-program analysis and optimization, however, we have to take into account the fact that there are path constraints which make some paths of the flowgraph invalid [RHS95]. As a result, the most efficient algorithms for intraprocedural dominators are unable to handle the interprocedural case.

We formulate the interprocedural dominance problem as in [dSvPdB07]. The vertices of the flowgraph are partitioned into sets corresponding to different procedures. Each procedure P has a unique entry vertex s(P) and a unique exit vertex t(P); the main procedure contains the root vertex s and the terminal vertex t. An edge e is directed from tail(e) to head(e). A call edge has the form (x, s(P)) with x ∉ P. Similarly, a return edge has the form (t(P), y) with y ∉ P. Each call edge has a unique corresponding return edge and vice versa. We let φ denote the (bijective) function that maps a call edge to its corresponding return edge; if φ((x, s(P))) = (t(P), y) then it is implied that x and y belong to the same procedure. Figure 4.1 gives an example.

Figure 4.1: An interprocedural flowgraph and its dominator dag. Procedure call edges (e1, e3, e5, and e7) and return edges (e2, e4, e6, and e8) are dotted; the call-return correspondence is given by the φ() function: φ(e1) = e2, φ(e3) = e4, φ(e5) = e6, φ(e7) = e8.

A full path starts at s and ends at t. A full path Q is valid if it has a proper nesting of procedure calls and returns, i.e.,

• if Q contains a return edge e = (t(P), y) then the prefix of Q from s to t(P) contains the call edge φ⁻¹(e), and
• if Q contains the call edges e and e′, where e precedes e′, then φ(e′) precedes φ(e) in Q.

A valid path is a prefix of a full valid path. A vertex w dominates a vertex v if every valid path from s to v includes w. The existence of path constraints modifies the structure of the dominance relation (with respect to the standard problem). Specifically, the transitive reduction of the interprocedural dominance relation is no longer a tree but a directed acyclic graph. We have developed efficient algorithms for special cases of this problem by formulating them in the context of join-reachability. We are currently extending our solutions to these special cases in order to derive efficient algorithms for computing the interprocedural dominance relation in the general case.
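The nesting requirement on valid paths can be illustrated with a small stack-based check. The sketch below assumes the usual reading of "proper nesting": a return edge must match the most recent call edge that has not yet returned. It only checks nesting along a given edge sequence and does not decide whether the sequence extends to a full valid path. The function name and the representation of φ as a dictionary are assumptions. The φ values are those listed for Figure 4.1, while the edge sequences and the label "x" (standing for an ordinary intraprocedural edge) are hypothetical.

```python
def is_properly_nested(path_edges, phi):
    """Check call/return nesting along a sequence of edge labels.

    phi maps each call edge to its corresponding return edge; every
    other label is treated as an ordinary intraprocedural edge.
    """
    returns = set(phi.values())
    pending = []  # call edges whose return edge has not been traversed yet
    for e in path_edges:
        if e in phi:
            pending.append(e)
        elif e in returns:
            # The return must match the most recent unmatched call.
            if not pending or phi[pending.pop()] != e:
                return False
    return True


phi = {"e1": "e2", "e3": "e4", "e5": "e6", "e7": "e8"}
print(is_properly_nested(["e1", "x", "e5", "e6", "e2"], phi))  # True
print(is_properly_nested(["e1", "x", "e4"], phi))              # False
```

Calls that are still pending at the end of the sequence are allowed, since a valid path is only required to be a prefix of a full valid path.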

4.2 Computational Morphological Analysis

Morphology is the study of the internal structure of words. Morphological analysis consists of the identification of the constituents of words. The smallest meaningful constituents are called morphemes (e.g., dog, dog-s). Morphemes have grammatical functions; they express inflectional properties. For instance, tense and aspect are the inflectional categories expressed in verbs (play, play-ed, play-ing). Lexemes are abstract entities; each can be thought of as a set of words (PLAY). Word-forms are concrete entities and belong to a single lexeme (plays, played belong to the lexeme PLAY). The set of word-forms that belong to a lexeme is called a paradigm. The constituents of word-forms that carry a concrete meaning are called roots (play). Word-forms also contain affixes, which carry an abstract meaning (play-ing, play-ed, play-er). Affixes that follow the root are called suffixes (play-ing, play-ed, play-er). Affixes that precede the root are called prefixes (re-read).

There are four major theoretical approaches to inflection; see [Se01]. In the present study, we adopt the framework of Distributed Morphology. For simplicity, we only provide a brief sketch of a possible morphological analysis of some forms of a verbal paradigm in Greek [Gal05]. Some of the core questions and issues we need to take into account in any given morphological analysis are:

- What morphological units do languages consist of?
- What features are expressed in each morpheme?
- How do different morphemes interact with one another?
- Can all morphemes be matched to one another?
- How do we account for any constraints between the matching of morphemes?

Derivation is the process by which new words (with a new meaning) are formed (read, read-able, kind, kind-ness). Different languages employ different processes by which derivation occurs, for instance affixation. Here, some of the fundamental questions one needs to answer are:

- How do roots combine with certain prefixes and affixes?
- What are the constraints in such formations?
- What about the interface of inflection and derivation?

Computational approaches to morphology can provide empirical evidence that can help in answering such questions. Parts of such approaches can be formulated as graph reachability and path-selection problems. A simple example is shown in Figure 4.2: constituents are combined in paths to form a word-form. (We stress that this figure is not an exhaustive morphological representation. We leave aside any phonological and/or lexical rules that may further apply.)

Figure 4.2: A graph of a morphological analysis. (Node labels include the root apolim- and affixes such as -en-, -o, -a, -ome, -omun, -ontas, -tis, and -tirio.)
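The idea that word-forms correspond to paths in a graph of constituents can be sketched with a small path enumeration. The graph below is hypothetical and uses the English morphemes from the examples in this section rather than the Greek data of Figure 4.2, whose edges are not reproduced here. The function name `word_forms`, the set `finals` of constituents that may end a word-form, and the assumption that the graph is acyclic are all illustrative choices.

```python
def word_forms(graph, root, finals):
    """Enumerate word-forms as paths from the root in an acyclic graph.

    graph maps a constituent to the constituents that may follow it;
    finals is the set of constituents at which a word-form may end.
    """
    forms = []

    def extend(node, prefix):
        prefix = prefix + [node]
        if node in finals:
            forms.append("-".join(m.strip("-") for m in prefix))
        for nxt in graph.get(node, []):
            extend(nxt, prefix)

    extend(root, [])
    return forms


# Hypothetical toy graph: the root "play" followed by suffixes.
graph = {"play": ["-ed", "-ing", "-er"], "-er": ["-s"]}
finals = {"play", "-ed", "-ing", "-er", "-s"}
print(word_forms(graph, "play", finals))
# ['play', 'play-ed', 'play-ing', 'play-er', 'play-er-s']
```

Constraints between morphemes would appear in such a model as restrictions on which paths may be selected.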


Chapter 5

Conclusions and Future Work

In this project we studied a collection of Reachability and Path-Selection problems, and designed efficient algorithms for their solution. We believe that several related topics, some of which are listed below, deserve further investigation.

Problems related to Reachability:

• Determine the computational complexity of constructing the smallest join-reachability graph for simple graph classes such as trees.
• Provide bounds for the explicit representation of the join-reachability graph for other interesting graph classes.
• Consider the problem of approximating the smallest join-reachability graph for specific graph classes.

Problems related to Path-Selection:

• Design fast algorithms for testing k-connectivity for constant k > 2.
• Consider data structures that quickly report more than two disjoint s-t paths.
• Design fast (linear or near-linear time) algorithms for computing a sparse 2-vertex connected subgraph of a given graph. The computation of the smallest such subgraph is NP-hard, so here we are interested in fast heuristics that achieve good approximation guarantees.

We plan to investigate the above topics in our future studies.


Bibliography

[AB00] F. S. Annexstein and K. A. Berman. Directional routing via generalized st-numberings. SIAM J. Discret. Math., 13(2):268–279, 2000.

[AGU72] A. V. Aho, M. R. Garey, and J. D. Ullman. The transitive reduction of a directed graph. SIAM J. Comput., 1(2):131–137, 1972.

[AHLT99] S. Alstrup, D. Harel, P. W. Lauridsen, and M. Thorup. Dominators in linear time. SIAM Journal on Computing, 28(6):2117–32, 1999.

[ASU86] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.

[BGK+08] A. L. Buchsbaum, L. Georgiadis, H. Kaplan, A. Rogers, R. E. Tarjan, and J. R. Westbrook. Linear-time algorithms for dominators and other path-evaluation problems. SIAM Journal on Computing, 38(4):1533–1573, 2008.

[BJG02] J. Bang-Jensen and G. Gutin. Digraphs: Theory, Algorithms and Applications (Springer Monographs in Mathematics). Springer, 1st ed. 2001. 3rd printing edition, 2002.

[CR94] J. Cheriyan and J. H. Reif. Directed s-t numberings, rubber bands, and testing digraph k-vertex connectivity. Combinatorica, 14(4):435–451, 1994.

[CW90] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comput., 9(3):251–280, 1990.

[DKNS01] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank aggregation methods for the web. In WWW ’01: Proceedings of the 10th international conference on World Wide Web, pages 613–622, 2001.

[dSvPdB07] B. de Sutter, L. van Put, and K. de Bosschere. A practical interprocedural dominance algorithm. ACM Trans. Program. Lang. Syst., 29(4), 2007.

[Gab06] H. N. Gabow. Using expander graphs to find vertex connectivity. Journal of the ACM, 53(5):800–844, 2006.

[Gal05] A. Galani. The Morphosyntax of Verbs in Modern Greek. PhD thesis, University of York, UK, September 2005.

[GBT84] H. N. Gabow, J. L. Bentley, and R. E. Tarjan. Scaling and related techniques for geometry problems. In Proc. 16th ACM Symp. on Theory of Computing, pages 135–143, 1984.

[Geo08] L. Georgiadis. Computing frequency dominators and related problems. In ISAAC ’08: Proceedings of the 19th International Symposium on Algorithms and Computation, pages 704–715, 2008.

[Geo10] L. Georgiadis. Testing 2-vertex connectivity and computing pairs of vertex-disjoint s-t paths in digraphs. In Proc. 37th Int’l. Coll. on Automata, Languages, and Programming, pages 738–749, 2010.

[GNP10] L. Georgiadis, S. D. Nikolopoulos, and L. Palios. Join-reachability in directed graphs. Manuscript, 2010.

[GT04] L. Georgiadis and R. E. Tarjan. Finding dominators revisited. In Proc. 15th ACM-SIAM Symp. on Discrete Algorithms, pages 862–871, 2004.

[GT05] L. Georgiadis and R. E. Tarjan. Dominator tree verification and vertex-disjoint paths. In Proc. 16th ACM-SIAM Symp. on Discrete Algorithms, pages 433–442, 2005.

[GT10] L. Georgiadis and R. E. Tarjan. Dominator verification and independent spanning trees. Manuscript, 2010.

[GTW06] L. Georgiadis, R. E. Tarjan, and R. F. Werneck. Finding dominators in practice. Journal of Graph Algorithms and Applications (JGAA), 10(1):69–94, 2006.

[HRG00] M. R. Henzinger, S. Rao, and H. N. Gabow. Computing vertex connectivity: New bounds from old techniques. Journal of Algorithms, 34:222–250, 2000.

[HT84] D. Harel and R. E. Tarjan. Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing, 13(2):338–55, 1984.

[Huc94] A. Huck. Independent trees in graphs. Graphs and Combinatorics, 10:29–45, 1994.

[IR88] A. Itai and M. Rodeh. The multi-tree approach to reliability in distributed networks. Information and Computation, 79(1):43–59, 1988.

[KKS05] I. Katriel, M. Kutz, and M. Skutella. Reachability substitutes for planar digraphs. Technical Report MPI-I-2005-1-002, Max-Planck-Institut für Informatik, 2005.

[LT79] T. Lengauer and R. E. Tarjan. A fast algorithm for finding dominators in a flowgraph. ACM Transactions on Programming Languages and Systems, 1(1):121–41, 1979.

[NI92] H. Nagamochi and T. Ibaraki. A linear-time algorithm for finding a sparse k-connected spanning subgraph of a k-connected graph. Algorithmica, 7:583–596, 1992.

[Ple91] J. Plehn. Über die Existenz und das Finden von Subgraphen. PhD thesis, University of Bonn, Germany, May 1991.

[RHS95] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 49–61, June 1995.

[Se01] A. Spencer and A. Zwicky (eds). The Handbook of Morphology. Blackwell Publishers, 2001.

[Whi87] R. W. Whitty. Vertex-disjoint paths and edge-disjoint branchings in directed graphs. Journal of Graph Theory, 11:349–358, 1987.

[WHY+06] H. Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. In ICDE ’06: Proceedings of the 22nd International Conference on Data Engineering, page 75, 2006.