Computer aided selection in breeding programs using genetic algorithm in MATLAB program

Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA) Available online at www.inia.es/sjar

Spanish Journal of Agricultural Research 2010 8(3), 672-678 ISSN: 1695-971-X eISSN: 2171-9292

Computer aided selection in breeding programs using genetic algorithm in MATLAB program

M. Azimzadeh1*, R. Amiri1, E. Davoodi-Bojd2, H. Soltanian-Zadeh2, S. Vahedi3 and M. Hoori1

1 Department of Agronomy and Plant Breeding Sciences. College of Abouraihan. University of Tehran. 33916-53775 Tehran. Iran
2 Control and Intelligent Processing Center of Excellence. University of Tehran. P.O. Box: 14395-515. Tehran. Iran
3 Plant Breeding Research Department. Sugar Beet Seed Institute. P.O. Box: 31585-4114. Karaj. Iran

Abstract

In plant and animal breeding, the best individuals are selected for the next breeding cycle based on a selection index computed from the observed phenotypic values of several traits. However, calculating the selection index requires analyzing large amounts of data, a task that is still performed with a calculator, which can introduce errors into breeding procedures. In this paper, an automatic method for simulating a population under natural selection is proposed, based on the selection operator of genetic algorithms. The fitness function of the algorithm is a linear combination of the individual traits imported by the user. The algorithm generates both a general score and detailed per-trait scores for each labeled individual. The individuals are sorted by their general scores, and those whose general scores exceed a user-defined threshold can be extracted. Outlier individuals can also be eliminated. Moreover, for easier illustration and comparison, the individuals are displayed in a graph based on their index values. The proposed algorithm was applied to two distinct datasets, and the results of the two methods were shown to coincide. The proposed method is automatic, fast, and free of human error. It is therefore expected to improve breeding procedures, especially when the numbers of individuals and traits are large.

Additional key words: artificial selection, cucumber, selection index, simulation, sugarbeet.

Resumen Selección con ayuda de ordenador en programas de mejora genética utilizando el algoritmo del programa MATLAB Tanto en mejora vegetal como animal, se seleccionan los mejores individuos para el próximo ciclo de reproducción basándose en el índice de selección de varios caracteres calculado a partir de valores fenotípicos observados. Sin embargo, al calcular el índice de selección, se deben analizar gran cantidad de datos, lo que aún se realiza con una calculadora. Esto puede causar imperfecciones en los procedimientos de mejora. En este trabajo se propone un método automático para la simulación de una población sometida a selección natural basado en el operador de selección de los algoritmos genéticos. La función de aptitud del algoritmo es una combinación lineal de los caracteres individuales importados por el usuario. El algoritmo genera tanto puntuaciones generales como detalladas de cada carácter para cada individuo etiquetado. Los individuos son ordenados de acuerdo a su puntuación general y es posible extraer aquellos cuyos resultados generales son mayores que un umbral definido por el usuario. Los individuos anómalos también pueden ser eliminados. Para una mayor ilustración y comparación, se muestran los individuos en un gráfico según sus valores. Se aplicó el algoritmo propuesto a dos conjuntos de datos distintos y los resultados de los dos métodos coincidieron. El método propuesto es automático, rápido y libre de errores humanos. Por lo tanto, se espera que mejore los procedimientos de cultivo, sobre todo cuando el número de individuos es grande y los caracteres numerosos. Palabras clave adicionales: índice de selección, pepino, remolacha, selección artificial, simulación.

* Corresponding author: [email protected] Received: 25-04-09; Accepted: 17-06-10.


Introduction

In plant and animal breeding, the best individuals are selected for the next breeding cycle on the basis of observed phenotypic values for several traits in each candidate individual (Cerón-Rojas et al., 2006). A selection index (Hazel, 1943) is a recommended method for selecting plants and animals when more than one trait is involved (Falconer and Mackay, 1996; Sivanadian and Smith, 1997), and it can usually estimate the value of an individual within a genotype collection (Yamada, 1989). In other words, the best selection method is simultaneous selection for all important traits, taking into account their heritability, economic value, and the phenotypic and genotypic correlations between distinct traits. In this method, an index is defined as a single trait on which the selection of individuals in the population is based (Borojevic, 1990). The best selection method uses all the available information about each individual's breeding value (Hallauer and Miranda, 1981). The first paper on the selection index was written by Smith (1936), who directly applied the discriminant function of Fisher (1936) to multi-trait selection in plant populations (Yamada, 1989). Breeders have achieved greater success using the selection index than by selecting directly on the traits (Gravois and McNew, 1993; Jannink et al., 2000). When improvement is desired for several traits that may differ in variability, heritability, economic importance, and in the correlations among their phenotypes and genotypes, simultaneous multiple-trait index selection is more effective than independent culling levels or sequential selection (Hazel et al., 1994). Given the importance and usefulness of the selection index and of selection processes in breeding programs, it is worthwhile to apply the capabilities of genetic algorithms in this field.

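
As background, the classical selection index of Smith (1936) and Hazel (1943) is usually written as a weighted sum of phenotypic values. The form below is the standard textbook statement, given here only for orientation; the symbols $\mathbf{P}$, $\mathbf{G}$, $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{x}$ are notation assumed for this note and are not taken from the paper.

```latex
I = \mathbf{b}^{\top}\mathbf{x} = \sum_{j=1}^{n} b_j x_j,
\qquad
\mathbf{b} = \mathbf{P}^{-1}\mathbf{G}\,\mathbf{a}
```

Here $\mathbf{x}$ holds the phenotypic values of the $n$ traits, $\mathbf{P}$ and $\mathbf{G}$ are the phenotypic and genotypic variance-covariance matrices, and $\mathbf{a}$ holds the economic weights. This is how heritability, economic value and trait correlations enter the index.
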
Selection is one of the most important operators of a genetic algorithm, and genetic algorithms are now among the best optimization and evolutionary computation methods (Winter et al., 1995; Mitchell, 1999). Evolutionary algorithms are search and optimization methods derived from the Darwinian theory of evolution («natural selection», defined as the process in nature by which only the organisms best adapted to their environment are selected). These algorithms try to simulate and imitate some evolutionary characteristics of survival in natural environments (Koza, 1992; Fogel, 2006). A genetic algorithm is a search technique used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are a particular class of evolutionary algorithms (also known as evolutionary computation) that use techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover (Mitchell, 1999; Reeves and Rowe, 2002). Genetic algorithms have three important operations, selection, mutation and crossover, which simulate the conditions of individuals in natural environments (Mitchell, 1999; Wang et al., 2007).

Considering the importance of selection indices in most breeding programs, and the large amounts of data that must be analyzed to calculate them, a population was simulated under natural selection using the selection operator of genetic algorithms in the MATLAB program (The MathWorks, Inc.). In this way, the best candidate individuals can be selected from the population accurately, easily, and quickly. Up to now, this field has not been extensively studied, and genetic algorithms have not been used for the selection process in breeding programs. This research is the first work in this field, and thus no literature review on it is available.
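
The three operators named above (selection, mutation and crossover) can be illustrated with a minimal sketch of one generic genetic-algorithm generation. It is written in Python rather than the authors' MATLAB, and everything in it is an assumption made for illustration: the bit-string encoding, the toy fitness function, truncation selection, and all function names. The paper itself uses only the selection operator.

```python
import random

def fitness(bits):
    # Toy fitness (assumption): the number of 1-bits in the chromosome.
    return sum(bits)

def select(population, k):
    # Truncation selection: keep the k fittest individuals.
    return sorted(population, key=fitness, reverse=True)[:k]

def crossover(parent_a, parent_b, rng):
    # One-point crossover: head of one parent joined to the tail of the other.
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

def mutate(bits, rate, rng):
    # Bit-flip mutation: each gene flips independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]

def next_generation(population, n_parents, rate, rng):
    # Select parents, then refill the population with mutated offspring.
    parents = select(population, n_parents)
    children = []
    while len(children) < len(population):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), rate, rng))
    return children

rng = random.Random(0)  # fixed seed so the run is repeatable
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
new_population = next_generation(population, n_parents=4, rate=0.05, rng=rng)
print(len(new_population), max(fitness(ind) for ind in new_population))
```

Truncation selection is used here because it is the closest analogue of keeping individuals above a score threshold; other schemes, such as roulette-wheel or tournament selection, would fit the same structure.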

Material and methods

Specifications of the algorithm

The proposed algorithm was developed in the MATLAB environment. This software has powerful calculation facilities and can import and export data through spreadsheet files. The fitness function of the algorithm (which corresponds to the selection index) is a linear combination of the individual features (traits) imported by the user. The output displays the general score, the detailed score for each trait, and the label of each individual. Individuals are sorted by their general scores, and those whose general scores are greater than a user-defined threshold can be extracted. The threshold can also be used for selection in the opposite direction, in which it serves to eliminate outlier individuals. Individuals selected in this way form the next generation.
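
The scoring, sorting and threshold steps described above can be sketched as follows. This is a minimal illustration in Python, not the authors' MATLAB program: the function names, the trait values, the weights and the threshold are all hypothetical, and the detailed score of a trait is assumed to be its weight multiplied by its observed value.

```python
def score_individuals(traits, weights):
    """Return (label, general_score, detailed_scores) tuples, best first.

    Assumption: the detailed score of a trait is weight * value, and the
    general score (the selection index) is the sum of the detailed scores.
    """
    results = []
    for label, values in traits.items():
        detailed = [w * v for w, v in zip(weights, values)]
        results.append((label, sum(detailed), detailed))
    # Sort by general score, highest first.
    results.sort(key=lambda r: r[1], reverse=True)
    return results

def select_above(results, threshold):
    # Keep individuals whose general score is strictly above the threshold.
    return [r for r in results if r[1] > threshold]

# Hypothetical data: four labeled individuals, three traits each.
traits = {
    "P1": [4.0, 2.0, 10.0],
    "P2": [6.0, 1.0, 8.0],
    "P3": [3.0, 3.0, 5.0],
    "P4": [5.0, 2.0, 12.0],
}
weights = [0.5, 1.0, 0.25]  # hypothetical trait weights

ranked = score_individuals(traits, weights)
selected = select_above(ranked, threshold=6.0)
for label, general, detailed in ranked:
    print(label, general, detailed)
print("selected:", [label for label, _, _ in selected])
```

With raw trait values on very different scales, the largest-valued trait would dominate the index, so in practice the weights (or a prior standardization step, which the source does not describe) would need to account for this.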

Structure of the algorithm

The proposed algorithm can be expressed in detail as follows:



0- begin
1- [input initial values]
   1-0 th←threshold
   1-1 Nt←Number of traits
   1-2 Np←Number of individuals (population size)
   1-3 j←1
   1-4 a(j)←in
   1-5 j←j+1
   1-6 if j
