Genetic Algorithms and Genetic Programming
Natalia Grafeeva, Lyudmila Grigorieva, & Natalia Kalinina-Shuvalova
Manuscript Received: 25, Jul., 2013; Revised: 9, Aug., 2013; Accepted: 10, Aug., 2013; Published: 15, Aug., 2013

Keywords: genetic algorithms, genetic programming, Diophantine equation

Abstract — This article discusses genetic algorithms and their application to three specific examples. The basic principles upon which genetic algorithms are built are discussed. First, a genetic algorithm is used to find the roots of a Diophantine equation. Next, a genetic program is used to approximate additional values of a tabulated function. The third case considered is the development of stock-exchange trading systems.

1. Introduction

Programming technology is currently developing at a remarkable pace, and the last decade has seen rapid growth in computing power. These circumstances make it possible to search for new ways of solving problems. We have reached a stage where mathematicians can do what engineers have long done: draw inspiration from nature. For our purposes here, the example from nature we choose to follow is genetics. Once we have completed our system definitions, we can set about performing "evolutionary computations". It is possible that such genetic algorithms and genetic programming will lift programming to the next stage.

The reported origins of genetic algorithms go back to the 1950s [1]. In the early sixties Holland published his work on genetic algorithms and classification systems [2]. General recognition of this new field came after his book "Adaptation in Natural and Artificial Systems" was published [3]; the book has become a classic in the field. The area is currently undergoing rapid development thanks to the availability of high-capacity computing.

Let us recall some of the basic ideas of the theory of natural selection presented by Charles Darwin in 1859 in his work "On the Origin of Species by Means of Natural Selection" [4]:
• The population of any species is composed of different individuals.
• Individuals of the species compete for resources.
• More new individuals are born than the available resources can support (i.e. there is competition among individuals for resources).
• Individuals with features that help them win the fight for resources are more likely to survive and to have offspring that inherit those features.
• Any feature of an individual is either inherited from a parent or due to a mutation that occurred for some external reason.

(Natalia Grafeeva, Lyudmila Grigorieva & Natalia Kalinina-Shuvalova are with the Mathematics and Mechanics Faculty, St. Petersburg State University, RUSSIA. Email: [email protected])

It is quite attractive to transfer the idea of self-development in nature to the area of computer technology. We would like to introduce such an approach into our problem solving; however, there are two difficulties in implementing it in software algorithms. First, nature has no specific target for the alteration of species (other than the requirement of individual survival), whereas in programming we need to define or establish specific preferences. Second, natural development proceeds from many factors, all random and often chaotic, whereas the output of an algorithm has a precise direction and is completely determined.

A classical genetic algorithm must build some "individual" (a number, vector, program, etc.). We can compare individuals with each other and select those better suited to solving the original problem. The fitness function exists to achieve this goal: a fitness value can be calculated for each individual. The existence of a fitness function is a prerequisite for applying the genetic approach to a problem, and defining and constructing it is one of the stages in implementing a genetic algorithm. The algorithm must be able to determine the quality of a constructed solution. We also need rules (genetic operators) for producing the next generation from parent individuals. The difficulty is not simply to mate the original individuals, but to do so in such a way that the offspring are better suited to our purposes than their parents were.
A genetic algorithm is a cycle composed of the following stages:
• generation of an initial population;
• selection of individuals to produce offspring;
• application of the genetic crossover and mutation operators;
• formation of a new generation.

Strictly speaking, the first step lies outside the scope of the cycle: only the last three steps are repeated. The first generation is typically a set of randomly generated individuals, sufficiently large in number (the size of the initial population). During the selection stage the fitness function is evaluated for each individual, and those values are used to choose the parents of the next generation. After the fitness values of all individuals in the current generation have been calculated, potential parents can be chosen in a variety of ways.


International Journal of Advanced Computer Science, Vol. 3, No. 9, Pp. 475-480, Sep., 2013.

A variety of methods can be used for selecting potential parents [5]. The most popular is simply to rank individuals by fitness value: the fitness values are sorted (ascending or descending) and indexed in that order, and the probability of selecting an individual as a parent is proportional to its index (when sorted in ascending order). A second approach is the tournament method: each parent is chosen as the best of M randomly selected individuals (typically M = 2). A third method truncates the fitness list: the individuals are sorted by fitness, the N best are kept, and the required number of parents for the next generation is then drawn at random from this truncated list. There is also an elitist strategy in which a few individuals with good fitness values move into the next generation unchanged.

In the third step the genetic crossover and mutation operators are applied to the selected individuals. The last step of the cycle is the formation of a new generation, for which two options are typically used: either only the children are included in the new generation, or the new generation includes both the resulting children and the best individuals of the current generation. The algorithm continues until a specified number of generations has been completed, or a locally optimal solution is found (an individual with the required fitness value).

In genetic programming an individual is itself a program, produced by crossbreeding programs or by mutating a program. For the efficient operation of a genetic algorithm it is necessary to provide three things:
• a definition of all of an individual's characteristics;
• a suitable fitness function;
• the genetic operations.
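The three selection strategies described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function names are our own, and fitness is assumed here to be "higher is better" (in the Diophantine example below, lower values are better, so the comparisons would be inverted).

```python
import random

def rank_selection(population, fitness, k):
    """Pick k parents with probability proportional to rank:
    the worst individual gets rank 1, the best gets rank len(population)."""
    ranked = sorted(population, key=fitness)           # ascending fitness
    weights = range(1, len(ranked) + 1)                # rank = index + 1
    return random.choices(ranked, weights=weights, k=k)

def tournament_selection(population, fitness, k, m=2):
    """Each parent is the best of m randomly chosen individuals (typically m = 2)."""
    return [max(random.sample(population, m), key=fitness)
            for _ in range(k)]

def truncation_selection(population, fitness, k, n):
    """Keep only the n fittest individuals, then draw k parents from them."""
    best = sorted(population, key=fitness, reverse=True)[:n]
    return [random.choice(best) for _ in range(k)]
```

An elitist strategy would simply copy, say, `sorted(population, key=fitness, reverse=True)[:e]` into the next generation unchanged before filling the rest with children.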
In genetic programming we need a detailed representation of the program that constitutes an individual, and this representation must allow us to apply the genetic operators: what is produced after an operator has been applied must still be a well-formed program. It is reasonably obvious that a tree representation is a natural way to organize the programs. Programs in the new generation are assembled from fragments of programs in their ancestors, and a mutation corresponds to the replacement of a fragment of a program. Ultimately we need software that effectively solves a certain class of problems. The fitness function should be able to determine how effectively each generated program solves the problem. We are interested in crossovers and/or mutations after which the resulting offspring reach a good fitness value as soon as possible. In creating a genetic program one should consider:
• which parts of the programs are considered "indivisible" (i.e. basic) and will be retained when

performing genetic operations over the current generation of programs;
• the fitness function (an additional program or software system) that will test the effectiveness of solutions;
• the algorithm for applying the genetic operators (crossover, mutation, and so on) to fragments of programs in each following generation, with the intention of increasing the fitness value;
• a condition defining the completion of the algorithm.

It is important to consider the question of the convergence of genetic algorithms. Here convergence means that repeated application of the genetic operators improves the fitness values of the individuals. Holland [3] presents the schema theorem, which provides the theoretical basis of the classical genetic algorithm.

We next describe some specific ideas and solution approaches for this class of problems. Standard arithmetic operations, mathematical and logical functions, and specific functions known to be appropriate to the subject area can be selected as basic objects. The software components would likely include standard data types: Boolean, integer, floating point, vector, character, or multi-valued.

How can we assess the fitness of a generated program? There are many ways of determining fitness, depending on the specifics of the program; the most widespread is a linear approach. In this case we have a known solution of the problem for some given sets of source data. The fitness function calculates, for each set of source data, the difference between the known solution and the output of the current program; the average of all these differences is then treated as the criterion. Other criteria that can be taken into account include the size of the resulting program and/or its speed (we might encourage the creation of more compact programs and/or shorter execution times).

We now apply the above approach to solve the following three problems.
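The linear approach to program fitness described above can be sketched as follows. For simplicity the candidate program is modelled as a plain Python callable; in a real GP system it would be an expression tree. The names and the optional size penalty are illustrative assumptions, not the paper's implementation.

```python
def fitness(program, test_cases, size=0, size_weight=0.0):
    """Lower is better: the mean absolute error over known
    (input, expected_output) pairs, plus an optional penalty
    encouraging more compact programs."""
    total_error = sum(abs(expected - program(x)) for x, expected in test_cases)
    return total_error / len(test_cases) + size_weight * size

# Example: how well do two candidates approximate f(x) = x**2 ?
cases = [(x, x * x) for x in range(5)]
exact = lambda x: x * x      # perfect candidate: error 0.0
rough = lambda x: 2 * x      # poor candidate: large average error
```

Here `fitness(exact, cases)` is 0.0, while `fitness(rough, cases)` averages the errors 0, 1, 0, 3, 8 to give 2.4, so selection would favour `exact`.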

2. Application of a genetic algorithm to find the roots of a Diophantine equation

We want to find the roots in positive integers of the following Diophantine equation:

2x + y + 3z + 7u = 54

First we need to check that this equation has a solution. To do this, we make sure that the right-hand side of the equation is divisible by the greatest common divisor (GCD) of the coefficients on the left-hand side. For the coefficients 2, 1, 3 and 7 we find GCD(2, 1, 3, 7) = 1.
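The solvability check above can be sketched in a few lines: a linear Diophantine equation a1*x1 + ... + an*xn = b has integer solutions if and only if gcd(a1, ..., an) divides b. (The function name is ours; note this checks for integer solutions, while the positivity requirement is handled separately by the range bounds discussed below.)

```python
from math import gcd
from functools import reduce

def has_integer_solution(coefficients, rhs):
    # gcd of all coefficients must divide the right-hand side
    return rhs % reduce(gcd, coefficients) == 0

print(has_integer_solution([2, 1, 3, 7], 54))   # True: gcd(2, 1, 3, 7) = 1
```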

International Journal Publishers Group (IJPG) ©



TABLE 1. THE 1-ST GENERATION

ID | Individual (x, y, z, u) | Fitness value                        | Probability
 1 | (3, 6, 7, 4)            | abs(2*3 + 6 + 3*7 + 7*4 - 54) = 7    | (1/7) / 1.31 = .11
 2 | (7, 2, 6, 3)            | abs(2*7 + 2 + 3*6 + 7*3 - 54) = 1    | (1/1) / 1.31 = .76
 3 | (11, 8, 12, 6)          | abs(2*11 + 8 + 3*12 + 7*6 - 54) = 54 | (1/54) / 1.31 = .01
 4 | (5, 30, 9, 5)           | abs(2*5 + 30 + 3*9 + 7*5 - 54) = 48  | (1/48) / 1.31 = .02
 5 | (17, 38, 3, 4)          | abs(2*17 + 38 + 3*3 + 7*4 - 54) = 55 | (1/55) / 1.31 = .01
 6 | (21, 9, 11, 3)          | abs(2*21 + 9 + 3*11 + 7*3 - 54) = 51 | (1/51) / 1.31 = .01
 7 | (9, 33, 12, 3)          | abs(2*9 + 33 + 3*12 + 7*3 - 54) = 54 | (1/54) / 1.31 = .01
 8 | (15, 24, 1, 6)          | abs(2*15 + 24 + 3*1 + 7*6 - 54) = 45 | (1/45) / 1.31 = .02
 9 | (19, 31, 2, 3)          | abs(2*19 + 31 + 3*2 + 7*3 - 54) = 42 | (1/42) / 1.31 = .02
10 | (8, 15, 7, 5)           | abs(2*8 + 15 + 3*7 + 7*5 - 54) = 33  | (1/33) / 1.31 = .02
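The fitness and probability columns of Table 1 can be reproduced as follows: the fitness of an individual (x, y, z, u) is |2x + y + 3z + 7u - 54|, and its selection probability is (1/fitness) normalised by the sum of 1/fitness over the whole generation (approximately 1.31 for this population). This is a sketch of the table's arithmetic only; note that an individual with fitness 0 (an exact root) would need special handling, since 1/fitness is then undefined and the search can simply stop.

```python
def fitness(v):
    """Distance of individual v = (x, y, z, u) from satisfying the equation."""
    x, y, z, u = v
    return abs(2 * x + y + 3 * z + 7 * u - 54)

generation = [(3, 6, 7, 4), (7, 2, 6, 3), (11, 8, 12, 6), (5, 30, 9, 5),
              (17, 38, 3, 4), (21, 9, 11, 3), (9, 33, 12, 3),
              (15, 24, 1, 6), (19, 31, 2, 3), (8, 15, 7, 5)]

# Normalising constant: sum of inverse fitness values (about 1.31 here)
total = sum(1 / fitness(v) for v in generation)
probabilities = [(1 / fitness(v)) / total for v in generation]
```

Individual 2, with fitness 1, receives probability 1/1.31 = .76 and so dominates the selection, exactly as in the table.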

This is sufficient to show that a solution exists, and we can begin our search. We know that x >= 1, y >= 1, z >= 1, u >= 1 and 2x + y + 3z + 7u = 54. Therefore the roots of the equation must lie in the following ranges: 1