An Efficient Genetic Algorithm for Subgraph Isomorphism

Jaeun Choi
School of Computer Science & Engineering, Seoul National University
1 Gwanak-ro, Gwanak-gu, Seoul, 151-744 Korea
[email protected]

Yourim Yoon
Future IT R&D Lab., LG Electronics
19 Yangjae-daero 11gil, Seocho-gu, Seoul, 137-130 Korea
[email protected]

Byung-Ro Moon
School of Computer Science & Engineering, Seoul National University
1 Gwanak-ro, Gwanak-gu, Seoul, 151-744 Korea
[email protected]

ABSTRACT

In this paper we propose a multi-objective genetic algorithm for the subgraph isomorphism problem. Usually, the number of edges that differ between two graphs under a mapping has been used as a fitness function. This approach is limited in that it considers only the directly visible characteristics of a current solution, not its potential to become an optimal solution. We designed a fitness function under which solutions with higher potential are rated highly. The new fitness function has good properties: it makes the solution space more globally convex and improves the performance of local heuristics and genetic algorithms. Experimental results show that the suggested approach brings a considerable improvement in performance and efficiency.

Categories and Subject Descriptors: G.1.6 [Numerical Analysis]: Optimization—Global optimization; G.2.2 [Discrete Mathematics]: Graph Theory—Graph algorithms

General Terms: Algorithms

Keywords: Genetic algorithm, subgraph isomorphism, fitness function, multi-objective genetic algorithm

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. GECCO'12, July 7-11, 2012, Philadelphia, Pennsylvania, USA. Copyright 2012 ACM 978-1-4503-1177-9/12/07 ...$10.00.

1. INTRODUCTION

Graphs are a powerful data representation model in many fields: computer networks, digital circuits, and chemical structures can all be modeled as graphs. Finding a common structure in two graphs arises in many applications, such as social network analysis, chemical structure matching, and digital circuit analysis.

The subgraph isomorphism problem is a decision problem that is known to be NP-complete [5]. The input is a pair of graphs G1 and G2; the answer is positive if G1 is isomorphic to a subgraph of G2, and negative otherwise. Two graphs G and H are isomorphic if there is a bijection between their vertex sets such that any two vertices u and v of G are adjacent in G if and only if their images are adjacent in H.

Since the subgraph isomorphism problem is a special case of the maximum common subgraph (MCS) problem, the methodologies for solving the two problems are closely related. The MCS problem is an optimization problem that seeks the largest subgraphs of two given graphs that are isomorphic to each other; it is known to be NP-hard [10]. A typical objective function is the ratio of the number of edges in the common subgraph to the number of edges in the smaller graph. If the size of the MCS equals the size of the smaller graph, the pair of graphs satisfies subgraph isomorphism.

There are many algorithms for the two problems, but none of them handles large instances in reasonable time. Ullmann [16] described a recursive backtracking procedure for the subgraph isomorphism problem, which has exponential time complexity in general; the running time can be reduced to linear only under limited conditions [8]. The McGregor algorithm [12] and the Durand-Pasari algorithm [7] are two representative algorithms for the MCS problem [3] [4] [14], but they spend hours finding the MCS of graphs with fewer than 100 vertices. Rutgers et al. [14] proposed an approximate algorithm for graphs of up to 200,000 vertices, but it works only when the graphs have specific properties of digital circuits.

Genetic algorithms (GAs) have also been applied to these problems, although not as abundantly as to other combinatorial optimization problems such as the traveling salesman problem and the graph partitioning problem. Brown et al. [2] and Wagener et al. [17] applied GAs to chemical structure matching; they implemented pure GAs without a local optimization algorithm. Fröhlich et al. [9] used a GA to approximate the MCS, but it took more than 9 hours to match graphs of 200 vertices. Kim and Moon [11] proposed a hybrid GA for the subgraph isomorphism problem as part of a malware detection process, where graphs of under 70 vertices were tested. They used directed graphs because a malware's structure is expressed as a dependency graph with directed edges.

Based on the GA structure presented in [11], we propose a multi-objective GA for the subgraph isomorphism problem by introducing a new fitness function. The new fitness function compares the degree of each vertex of the two graphs and exploits this information. Experimental results show that this approach improves the performance of the GA to a significant extent.

The remainder of this paper is organized as follows. We explain the fitness function of previous work and its limitations in Section 2. In Section 3, the new fitness function is described with examples, and the multi-objective GA is explained. The genetic framework is given in Section 4. Section 5 provides experimental results and discussion, and the conclusion is given in Section 6.
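The decision problem above can be made concrete with a short sketch. The following Python is our own illustration, not code from the paper (which shows none); the function name and the toy edge sets are hypothetical. A candidate mapping g witnesses subgraph isomorphism of directed graphs when it is injective and sends every edge of G1 to an edge of G2:

```python
# Our own illustrative sketch (the paper shows no code); the function and
# edge sets below are hypothetical. A mapping g from the vertices of a
# directed graph G1 into a larger directed graph G2 witnesses subgraph
# isomorphism when it is injective and maps every edge of G1 onto an
# edge of G2.

def is_subgraph_isomorphism(g, edges1, edges2):
    """g: dict from each vertex of G1 to a vertex of G2.
    edges1, edges2: sets of directed edges (u, v)."""
    if len(set(g.values())) != len(g):   # g must be injective
        return False
    return all((g[u], g[v]) in edges2 for (u, v) in edges1)

# G1 is the path 1->2->3; G2 contains it both directly and via 2->3->4.
E1 = {(1, 2), (2, 3)}
E2 = {(1, 2), (2, 3), (3, 4), (1, 3)}
print(is_subgraph_isomorphism({1: 1, 2: 2, 3: 3}, E1, E2))  # True
print(is_subgraph_isomorphism({1: 2, 2: 3, 3: 4}, E1, E2))  # True
print(is_subgraph_isomorphism({1: 3, 2: 4, 3: 1}, E1, E2))  # False
```

Verifying a given mapping is cheap; the NP-complete part is finding such a mapping among the exponentially many candidates, which is what the algorithms discussed in this paper search for.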

2. PREVIOUS WORK

2.1 Two-Vertex Exchange Heuristic

Kim and Moon [11] used a two-vertex exchange heuristic for the subgraph isomorphism problem. The two-vertex exchange heuristic is similar to 2-opt in the traveling salesman problem.

Algorithm 1 Two-vertex exchange heuristic
  Input: a permutation P of the vertices of G(V, E)
  for v1 ∈ V do
    for v2 ∈ V with v2 ≠ v1 do
      Copy P into P'
      Exchange the locations of v1 and v2 in P'
      if P' is more attractive than P then
        Copy P' into P
      end if
    end for
  end for

This heuristic behaves differently depending on the fitness function, which determines the direction of its search path. If the fitness function reflects more characteristics of excellent solutions relative to the current solution, the heuristic may take a more efficient path to the optimal solution. Since local search algorithms must terminate in a reasonable number of steps, they usually make a sequence of greedy choices based on the state of a solution, moving from one solution to another according to certain criteria. In Algorithm 1, the sixth line provides this criterion, and it depends on the fitness function: the heuristic compares the fitness of the current solution with that of an adjacent solution. It never takes a step that worsens the fitness value, even if taking that step would eventually lead to an optimal solution. So if we can design a fitness function under which solutions with potential are rated high, it will help the heuristic take a better search path.

2.2 Fitness Function

In previous work [11], the number of edges that differ between the two graphs under the mapping was used as the fitness function. Counting the number of common edges is an alternative [2] [17]. The former aims to minimize the value while the latter aims to maximize it. We give a formal representation of the former. Let G1 = (V1, E1) and G2 = (V2, E2), where |V1| ≤ |V2|, and let Gs = (Vs, Es) be the subgraph of G2 mapped to G1. Then

    f1 = Σ_{e∈E1} I(e, Es) + Σ_{e∈Es} I(e, E1)    (1)

where

    I(e, E) = 0 if e ∈ E, and 1 otherwise.

If the value of f1 is 0, G1 is a complete subgraph of G2 and the attractiveness is the highest possible. This function has limitations in that it only counts the differing edges as a whole and does not consider the degree of each vertex. One can screen out unattractive candidate permutations in a heuristic by exploiting the degrees of vertices [6]. For example, in the graph isomorphism problem, a vertex of one graph cannot be mapped onto a vertex of the other graph if the degrees of the two vertices are not equal. We extend this property: a pair of vertices, one from each of the two graphs, is unlikely to match if the vertex from the smaller graph has a higher degree than the one from the larger graph. We devise another function to reflect this idea in the next section.

[Figure 1: G1 and G2]

[Figure 2: Two possible mappings between G1 and G2. (b) is a better solution than (a).]

3. NEW FITNESS FUNCTION

3.1 Main Idea

In designing heuristics or evolutionary algorithms for an optimization problem, we have to choose an appropriate fitness function that leads to an effective search path. In that respect, f1 is not an ideal fitness function. The fitness of a solution can be measured differently by comparing the degrees of any two vertices that are mapped to each other. Figure 1 shows two graphs, a 3-vertex graph G1 and a 6-vertex graph G2. Figure 2 shows two different mappings between G1 and G2. The number of differing edges between G1 and G2 is two in both cases. However, Figure 2b is a better solution. In Figure 2a, two vertices out of three are mapped to unsuitable vertices. Vertex 5 of G2 has only one

outgoing edge, while vertex 1 of G1 has two outgoing edges. Vertex 6 of G2 has one incoming edge, while vertex 3 of G1 has two incoming edges. As discussed above, a vertex of the smaller graph with more outgoing (incoming) edges cannot be mapped to a vertex of the larger graph with fewer outgoing (incoming) edges. So vertex 1 of G1 and vertex 5 of G2 should not be matched, and neither should vertex 3 of G1 and vertex 6 of G2. On the other hand, no such case exists in Figure 2b: vertices 1, 2, and 3 of G2 have at least as many outgoing and incoming edges as vertices 1, 2, and 3 of G1. It can become an optimal solution by mapping vertex 2 of G1 to vertex 4 of G2 and keeping the rest of the vertices as they are. In the following section, we describe how to calculate the new fitness value based on this simple idea.

3.2 Formal Representation

We present the formal definition of the new fitness function in this section, and give an example with a detailed explanation in the next section. When G1 is mapped to a subgraph of G2 by a function

    g : V(G1) → V(G2)

we calculate the new fitness function value as follows:

    f2 = Σ_{v∈V(G1)} J(v)    (2)

where

    J(v) = 0 if dout(g(v)) ≥ dout(v) and din(g(v)) ≥ din(v), and 1 otherwise.

We will call this function f2 the 2nd fitness function throughout this paper. The fitness function f1 presented in Section 2.2 will be called the 1st fitness function.

3.3 Example

We now show how to calculate the new fitness function value for Figure 2. Figure 3 shows three tables. The first table gives the degree of each vertex in G1, and the second the degree of each vertex in G2. The third table is derived from the first two. Let aij denote the value in the ith row and jth column of that table: aij equals 0 if vertex i of G2 has at least as many outgoing and incoming edges as vertex j of G1, meaning that vertex j of G1 can be mapped to vertex i of G2; aij equals 1 otherwise.

    G1:  v   dout  din        G2:  v   dout  din
         1    2     0              1    3     0
         2    1     1              2    1     3
         3    0     2              3    1     2
                                   4    1     2
                                   5    1     1
                                   6    2     1

    aij (rows i: vertices of G2, columns j: vertices of G1):

          j=1  j=2  j=3
    i=1    0    1    1
    i=2    1    0    0
    i=3    1    0    0
    i=4    1    0    0
    i=5    1    0    1
    i=6    0    0    1

Figure 3: The first two tables show the degree of each vertex in G1 and G2; dout denotes out-degree and din denotes in-degree. The third table marks which mappings are admissible.

We calculate the new fitness value by adding the table entries corresponding to each vertex mapping. The 2nd fitness value of Figure 2a is

    a22 + a51 + a63 = 2

and the value of Figure 2b is

    a11 + a22 + a33 = 0.

A smaller value of f2 indicates a better solution, so Figure 2b is the better solution according to the 2nd fitness function.

3.4 Multi-objective GA

The 2nd fitness function has no meaning by itself and needs to be combined with the 1st fitness function. Both the f1 value and the f2 value equal zero when the smaller graph is a subgraph of the larger graph. If the f1 value equals zero, the solution is optimal: it is a sufficient condition for an optimal solution. The f2 value being zero, however, is only a necessary condition. To use the two fitness functions together, a multi-objective GA is needed. A multi-objective GA is used when more than one objective function must be optimized [15]. A basic method for multi-objective GA is the weighted-sum method. Let w1 be the relative weight of the 1st fitness function and w2 that of the 2nd. The combined fitness function is

    f(w1, w2) = w1 f1 + w2 f2

The goal is still to minimize the f1 value; since f(w1, w2) = 0 implies f1 = 0, optimizing f(w1, w2) also optimizes f1. When w1 = 0.5 and w2 = 0.5, the fitness values of Figures 2a and 2b are

    f(0.5, 0.5) = 0.5 · 2 + 0.5 · 2 = 2
    f(0.5, 0.5) = 0.5 · 2 + 0.5 · 0 = 1

respectively.

4. GENETIC FRAMEWORK

The hybrid GA we used for the subgraph isomorphism problem is described below. It generates a set of initial solutions, evolves them over generations, and terminates when the stopping condition is met; the best solution found is returned. We used the same parameter values as in [11] and compared the results in order to isolate the effect of the 2nd fitness function.

Algorithm 2 Genetic algorithm
  create initial population of fixed size;
  repeat
    for i = 1 to n do
      choose parent1 and parent2 from population;
      offspring_i ← crossover(parent1, parent2);
      mutation(offspring_i);
      local-optimization(offspring_i);
    end for
    replace n chromosomes with the n offspring;
  until stopping condition
  return the best chromosome;

• Representation: The vertices of the two graphs are numbered. A chromosome is a permutation of the vertices of the larger graph. When the smaller graph has n vertices, the first n positions of a chromosome are matched to the vertices of the smaller graph in order.

• Fitness Function: We use f(w1, w2) with different values of w1 and w2 and compare the results.

• Initialization: When the GA starts, one hundred chromosomes are created at random.

• Selection: Roulette-wheel-based proportional selection is used. The probability of choosing the best chromosome was set four times higher than that of choosing the worst chromosome.

• Crossover and Mutation: We used cycle crossover [13], which is applicable to permutation representations. We produce two offspring and choose the better one as the final offspring. After crossover, chromosomes are mutated with a 20 percent chance: two genes are selected at random and exchanged.

• Replacement: We generate 20 offspring per generation and replace the worst members of the population with them.

• Local Optimization: Algorithm 1 is used, with the fitness function changed to f(w1, w2).

• Stopping Criterion: The GA stops when it finds a solution with f1 = 0.

5. EXPERIMENTAL RESULTS

In this section, we provide our experimental results. We generated 200 pairs of graphs at random. Larger graphs have 100 vertices, and smaller graphs have 10, 30, 50, 70, or 90 vertices. The edge density of a larger graph is 0.01, 0.05, 0.1, or 0.2. The density of a larger graph is denoted by η, and the size of a smaller graph by M. The pairs of graphs are divided into 20 classes according to η and M, so each class contains 10 pairs of graphs. In each case the smaller graph is a complete subgraph of the corresponding larger graph, which means the optimal value of f1 equals zero. The code was written in C++ and compiled with GNU's g++ compiler. The program was executed on a 2.4 GHz Xeon machine.

5.1 Local Optimization

We first used f(1, 0) = f1 as the fitness function in Algorithm 1, then used f(0.5, 0.5), and compared the results. Table 1 shows the average value of f1 before and after optimization. We generated 100 chromosomes for each pair of graphs and applied local optimization once. Using f(0.5, 0.5) gave better solutions in 13 cases out of 20. The results of using f1 and f(0.5, 0.5) were mostly the same when M was small; in most of the other cases, the results with f(0.5, 0.5) were better.

    Table 1: The result of local optimization (average f1 before and after)
    M    η      Before   After, f1   After, f(0.5, 0.5)
    10   0.01     3.95      0.87        0.45
    10   0.05     6.41      0.01        0.01
    10   0.1     18.69      2.44        2.44
    10   0.2     27.57      4.91        4.91
    30   0.01    17.46      2.89        1.95
    30   0.05    79.50     23.65       23.87
    30   0.1    163.96     72.99       72.64
    30   0.2    270.15    140.12      140.12
    50   0.01    48.52     12.84       10.78
    50   0.05   222.66     98.56      100.32
    50   0.1    451.08    261.53      261.20
    50   0.2    759.21    486.85      483.87
    70   0.01    88.87     28.24       23.73
    70   0.05   453.83    263.01      262.44
    70   0.1    862.62    573.93      573.17
    70   0.2   1461.46   1032.58     1036.10
    90   0.01   154.34     70.68       62.89
    90   0.05   760.57    520.01      503.87
    90   0.1   1397.63   1038.23     1005.13
    90   0.2   2392.51   1841.27     1831.05

Figures 4a and 4b plot the two fitness values over the course of the local optimization process. Comparing the final value of the 1st fitness function in the two cases, the latter gives a better solution. In Figure 4a, the value of f1 declines over time, but the value of f2 roughly increases; the output is far from satisfying the necessary condition for an optimal solution. In Figure 4b, on the other hand, both the f1 value and the f2 value decline. This shows the synergy of the two fitness functions: introducing and exploiting the 2nd fitness function helps optimize the value of the 1st fitness function.

[Figure 4: Local optimization path plotting. (a) When using f1. (b) When using f(0.5, 0.5). Each panel plots the 1st and 2nd fitness values over 5000 steps.]
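The machinery of Sections 2 through 4 can be sketched compactly. The following Python is our own illustration, not the authors' C++ implementation: the helper names and the toy graphs are hypothetical, and we take Es to be the subgraph of G2 induced by the image vertices, which matches the paper's use of "complete subgraph". It implements f1 (Eq. 1), f2 (Eq. 2), the weighted sum f(w1, w2), and the two-vertex exchange heuristic of Algorithm 1:

```python
# Illustrative sketch, not the authors' C++ implementation. Helper names
# and the toy graphs below are hypothetical. A solution is a permutation
# of G2's vertices; its first n1 entries are the images of G1's vertices
# 1..n1, in order (the chromosome representation of Section 4).

def f1(perm, edges1, edges2, n1):
    """1st fitness (Eq. 1): number of differing edges, taking Es to be
    the subgraph of G2 induced by the mapped vertices."""
    g = {v: perm[v - 1] for v in range(1, n1 + 1)}
    mapped = {(g[u], g[v]) for (u, v) in edges1}          # image of E1
    img = set(g.values())
    es = {(u, v) for (u, v) in edges2 if u in img and v in img}
    return len(mapped - es) + len(es - mapped)

def f2(perm, deg1, deg2, n1):
    """2nd fitness (Eq. 2): a vertex of G1 contributes 1 when its image
    in G2 has a smaller out-degree or in-degree."""
    bad = 0
    for v in range(1, n1 + 1):
        (do1, di1), (do2, di2) = deg1[v], deg2[perm[v - 1]]
        bad += 0 if (do2 >= do1 and di2 >= di1) else 1
    return bad

def f_combined(perm, w1, w2, edges1, edges2, deg1, deg2, n1):
    """Weighted-sum fitness f(w1, w2) = w1*f1 + w2*f2 (Section 3.4)."""
    return w1 * f1(perm, edges1, edges2, n1) + w2 * f2(perm, deg1, deg2, n1)

def two_vertex_exchange(perm, fitness):
    """Algorithm 1: keep any pairwise swap that improves the fitness."""
    perm = list(perm)
    best = fitness(perm)
    for i in range(len(perm)):
        for j in range(len(perm)):
            if i == j:
                continue
            perm[i], perm[j] = perm[j], perm[i]
            cand = fitness(perm)
            if cand < best:            # P' is more attractive than P
                best = cand
            else:                      # undo the swap
                perm[i], perm[j] = perm[j], perm[i]
    return perm, best

# Toy instance: G1 is the path 1->2->3; G2 is 1->2->3->4 plus edge 1->4.
E1 = {(1, 2), (2, 3)}
E2 = {(1, 2), (2, 3), (3, 4), (1, 4)}
deg1 = {1: (1, 0), 2: (1, 1), 3: (0, 1)}        # (out, in) degrees
deg2 = {1: (2, 0), 2: (1, 1), 3: (1, 1), 4: (0, 2)}
fit = lambda p: f_combined(p, 0.5, 0.5, E1, E2, deg1, deg2, 3)
perm, cost = two_vertex_exchange([3, 4, 1, 2], fit)
print(cost)   # -> 0.0: the heuristic finds an exact embedding (1->2, 2->3, 3->4)
```

On this toy instance the heuristic walks from a poor permutation to one with f1 = f2 = 0, mirroring the behavior reported in Section 5.1, where the combined fitness guides the local search more reliably than f1 alone.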

5.2 Global Convexity of the New Fitness Landscape

Given a set of local optima, Boese et al. plotted, for each local optimum, the relationship between its cost and its average distance from all the other local optima [1]. They conducted experiments on graph bisection and the traveling salesman problem, and showed strong positive correlations in both. This hints that the best local optimum is very probably located near the center of the local-optimum space and, roughly speaking, that the local-optimum space is globally convex. We conducted their experiment for the subgraph isomorphism problem with the different fitness functions and the corresponding local optimization methods. The solution sets for the experiment were obtained as follows: for each of f(1, 0) = f1 and f(0.5, 0.5), we chose five hundred random solutions and applied the corresponding local optimization method. Then, for each local optimum, we plotted the relationship between its cost under the given fitness function and its average distance from all the other local optima, using Hamming distance as the distance measure between two solutions.

Table 2 shows the results; each value is the cost-distance correlation for one fitness function and one instance class. Figure 5 shows a sample plot for an instance with η = 0.05 and M = 90. For all instances except those with M = 10, f(0.5, 0.5) showed a stronger positive cost-distance correlation than f1. Instances with M = 10 are too small for the fitness function to affect the structure of the solution space. From this experiment, we can infer that the fitness landscape under f(0.5, 0.5) is more globally convex than that under f1 for sufficiently large subgraph isomorphism instances.

    Table 2: Cost-distance correlations on the test instances
    M    η      f1       f(0.5, 0.5)
    10   0.01   0.2192    0.1282
    10   0.05   0.1061    0.0919
    10   0.1    0.4857    0.4646
    10   0.2    0.2384    0.1871
    30   0.01   0.2554    0.3798
    30   0.05   0.0593    0.1532
    30   0.1    0.1208    0.1393
    30   0.2    0.1491    0.1535
    50   0.01   0.2282    0.4033
    50   0.05   0.0451    0.1510
    50   0.1    0.0713    0.2078
    50   0.2    0.1154    0.1437
    70   0.01   0.2213    0.5545
    70   0.05   0.0604    0.2989
    70   0.1    0.1916    0.4743
    70   0.2    0.2940    0.3247
    90   0.01   0.2625    0.6018
    90   0.05   0.3307    0.6365
    90   0.1    0.4005    0.7366
    90   0.2    0.4365    0.5807

[Figure 5: The relationship between cost and average distance among local optima. (a) When using f1. (b) When using f(0.5, 0.5).]

5.3 GA Performance

In our experiments, the number of generations is limited to 100; if a GA finds an optimal solution, it terminates immediately. To investigate appropriate values of w1 and w2, we tested six pairs of (w1, w2): (1, 0), (0.9, 0.1), (0.7, 0.3), (0.5, 0.5), (0.3, 0.7), and (0.1, 0.9). The solution qualities were not very different, but the results showed clear differences in running time. The average running time tended to decrease as w2 increased, and the difference between the average running times grew with M, i.e., with the instance size. Figure 6a shows the average running time of the GAs over all 200 pairs of graphs; the running time decreased as w2 increased, except in the range [0, 0.1]. Figure 6b, for the largest instances (M = 90), shows the tendency more clearly. This tendency seems to be due to the different ranges of the two fitness values: generally, the value of f2 was much smaller than the value of f1, especially when f1 was large, so f1 exerts the stronger influence even when w1 = w2. Further analysis of the normalization between the two fitness functions is needed.

Table 3 shows the overall results of the GAs using f(1, 0) = f1 and f(0.1, 0.9). The table contains the average f1 value, the CPU time in seconds, and the percentage of runs that found an optimal solution for each class. The GA with f(0.1, 0.9) gave much better solutions than the one with f1, finding optimal solutions without exception in 12 cases out of 20. For small graphs (M = 10), both GAs showed similar results. For medium-size graphs (M = 30 and 50), the performance of both GAs was relatively unsatisfactory compared to the other cases; however, the GA with f(0.1, 0.9) showed clearly better re-

sults. It found optimal solutions in more cases than the GA with f1. For large graphs (M = 70 and 90), the GA with f(0.1, 0.9) showed superior performance in both the average f1 value and the ratio of hitting optimal solutions. Moreover, the running time was reduced significantly when using f(0.1, 0.9). This means that the new fitness function made the genetic search more efficient: the GA with the new fitness function found optimal solutions more quickly.

[Figure 6: The average running time of the GA as a function of the weight of the 2nd fitness function. (a) Total. (b) When M = 90.]

    Table 3: The result of optimization by GA
                       f1                          f(0.1, 0.9)
    M    η      f1    CPU(s)   Ratio(%)     f1    CPU(s)   Ratio(%)
    10   0.01   0.00     0.32    100        0.00     0.28    100
    10   0.05   0.00     0.25    100        0.00     0.24    100
    10   0.1    0.00     1.26    100        0.00     3.45    100
    10   0.2    1.33    19.58     47        1.23    30.31     47
    30   0.01   0.00    18.05    100        0.00     8.54    100
    30   0.05  12.57   194.38      7       12.00   359.96     10
    30   0.1   28.37   133.18     43       21.77   235.84     57
    30   0.2   43.73    92.08     60       25.70   153.58     77
    50   0.01   2.97   469.52     10        0.30   493.49     70
    50   0.05  56.70   412.92     23        7.45   228.06     90
    50   0.1   21.80   129.80     90        9.70   167.71     95
    50   0.2   27.70    94.95     93       41.40   167.59     90
    70   0.01  11.00   936.93      0        0.00   464.26    100
    70   0.05  43.27   388.58     80        0.00   126.65    100
    70   0.1   16.17   132.15     97        0.00    79.29    100
    70   0.2    0.00    95.25    100        0.00   110.03    100
    90   0.01   8.37  1379.68     43        0.00   485.13    100
    90   0.05  14.93   240.13     97        0.00    81.96    100
    90   0.1    0.00   134.92    100        0.00    61.00    100
    90   0.2    0.00   170.88    100        0.00    90.31    100

6. CONCLUSIONS

In this paper, we proposed a multi-objective GA for the subgraph isomorphism problem. We introduced an additional fitness function that reflects the potential of a solution: it checks whether a solution satisfies the necessary condition for an optimal solution and rates such solutions highly. By computing cost-distance correlations, we verified that the new fitness function makes the solution space more globally convex than the previous one, a property that helps local heuristics and GAs find the optimal solution efficiently. We also observed that the running time of the GAs decreased as the relative weight of the new fitness function increased. Based on these observations, we designed an efficient GA with a suitable fitness function and local optimization algorithm. Experimental results showed that the proposed GA improves both performance and efficiency.

Although we dealt only with directed graphs in this paper, the idea can be applied to undirected graphs, and the effect of this approach on undirected graphs is worth examining. As another issue for future study, the normalization between the weights of the two fitness functions requires further analysis: we did not analyze the relation between the relative weights of the two fitness functions, but the experimental results show that the distribution of weights affects the outcome.

7. ACKNOWLEDGEMENTS

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2011-0018006), the Brain Korea 21 Project in 2012, and the Engineering Research Center of Excellence Program of the Korea Ministry of Education, Science and Technology (MEST) / National Research Foundation of Korea (NRF) (Grant 2012-0000463). The ICT at Seoul National University provided research facilities for this study.

8. REFERENCES

[1] K. D. Boese, A. B. Kahng, and S. Muddu. A new adaptive multi-start technique for combinatorial global optimizations. Operations Research Letters, 15:101-113, 1994.
[2] R. Brown, G. Jones, P. Willett, and R. Glen. Matching two-dimensional chemical graphs using genetic algorithms. Journal of Chemical Information and Computer Sciences, 34(1):63-70, 1994.
[3] H. Bunke, P. Foggia, C. Guidobaldi, C. Sansone, and M. Vento. A comparison of algorithms for maximum common subgraph on randomly connected graphs. Structural, Syntactic, and Statistical Pattern Recognition, pages 85-106, 2002.
[4] D. Conte, P. Foggia, and M. Vento. Challenging complexity of maximum common subgraph detection algorithms: A performance analysis of three algorithms on a wide database of graphs. Journal of Graph Algorithms and Applications, 11(1):99-143, 2007.
[5] S. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, pages 151-158. ACM, 1971.
[6] D. Corneil and C. Gotlieb. An efficient algorithm for graph isomorphism. Journal of the ACM, 17(1):51-64, 1970.
[7] P. Durand, R. Pasari, J. Baker, and C. Tsai. An efficient algorithm for similarity analysis of molecules. Internet Journal of Chemistry, 2(17):1-16, 1999.
[8] D. Eppstein. Subgraph isomorphism in planar graphs and related problems. In Proceedings of the Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 632-640. Society for Industrial and Applied Mathematics, 1995.
[9] H. Fröhlich, A. Košir, and B. Zajc. Optimization of FPGA configurations using parallel genetic algorithm. Information Sciences, 133(3):195-219, 2001.
[10] M. Garey and D. Johnson. Computers and Intractability. Freeman, San Francisco, CA, 1979.
[11] K. Kim and B. Moon. Malware detection based on dependency graph using hybrid genetic algorithm. In Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 1211-1218. ACM, 2010.
[12] J. McGregor. Backtrack search algorithms and the maximal common subgraph problem. Software: Practice and Experience, 12(1):23-34, 1982.
[13] I. Oliver, D. Smith, and J. Holland. A study of permutation crossover operators on the traveling salesman problem. In Proceedings of the Second International Conference on Genetic Algorithms and Their Application, pages 224-230. L. Erlbaum Associates Inc., 1987.
[14] J. Rutgers, P. Wolkotte, P. Holzenspies, J. Kuper, and G. Smit. An approximate maximum common subgraph algorithm for large digital circuits. In 2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), pages 699-705. IEEE, 2010.
[15] J. Schaffer and J. Grefenstette. Multi-objective learning via genetic algorithms. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence, pages 593-595, 1985.
[16] J. Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM, 23(1):31-42, 1976.
[17] M. Wagener and J. Gasteiger. The determination of maximum common substructures by a genetic algorithm: Application in synthesis design and for the structural analysis of biological activity. Angewandte Chemie International Edition in English, 33(11):1189-1192, 1994.
