Evolving classifiers on field programmable gate arrays: Migrating XCS to FPGAs


Cristiana Bolchini, Paolo Ferrandi, Pier Luca Lanzi, Fabio Salice
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy
Available online 19 April 2006

Abstract

The paper presents the first results of a prototype implementation of the eXtended learning Classifier System (XCS) in hardware, specifically on Field Programmable Gate Arrays. For this purpose we introduce a version of the XCS classifier system completely based on integer arithmetic, which we name XCSi, instead of the usual floating point one, to exploit the peculiarities and overcome the limitations of the hardware platform. We present an analysis of XCSi performance, showing that, although there is a dramatic reduction of available precision, the integer version of XCS can reach optimal performance in all the problems considered, though it often converges more slowly than the original floating point version. Guidelines for a hardware implementation are also provided, by analyzing how the XCSi functional components can be designed on an FPGA.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Learning classifier systems; XCS; FPGA

1. Introduction

Field programmable gate arrays (FPGAs) are very large scale integrated (VLSI) circuits consisting of two parts: (i) an array of logic circuits and (ii) programmable interconnect wiring, configured according to the desired functionality; information about the interconnection structure is stored in a RAM during an initial configuration phase. FPGAs have gained rapid acceptance and growth over the past decade because they can be used for a very wide range of applications, and they are today commonly used in embedded systems, e.g., in the telecommunication and automotive fields [1]. With the widespread adoption of portable devices such as PDAs and smart phones, and of other embedded mobile devices equipped with growing computational power and storage resources, numerous applications have been studied with respect to the possibility of being executed on this class of devices. From this viewpoint, genetic algorithms (GAs) for solving practical problems, such as the Travelling Salesman Problem and Job Shop Scheduling, are currently being adapted to be implemented on FPGAs [2,3]. Furthermore, in the last ten years FPGAs have received a lot of attention as a viable target for implementing Artificial Neural Networks (ANNs) [4,5].

Learning classifier systems (LCS) [6–8] are genetics-based problem solvers introduced by John Holland [9], the father of genetic algorithms. They are rule-based systems that exploit methods of evolutionary computation to search for the best set (the best population) of condition–action rules (i.e., the classifiers) to solve a target problem. While in genetic algorithms each individual represents a candidate solution to the target problem, in learning classifier systems the entire population represents the solution, while each individual represents a rule (a classifier), that is, a piece of the overall solution. The last ten years have seen a flourishing of research in learning classifier systems, with a wide range of successful applications, e.g., autonomous robotics [10] and data mining [11–13], but also many others [14]. Among the various models of learning classifier systems available [15], Wilson's eXtended Classifier System (XCS) [16] is by far the most successful and therefore the most studied: in fact, it represents the major research direction in this area.

In this paper we present the first results regarding the migration of the XCS classifier system to Field Programmable Gate Arrays (FPGAs). XCS heavily relies on floating point arithmetic, as do almost all other evolutionary approaches. Although FPGAs allow the use of floating point arithmetic, this approach is expensive in terms of computation cost, area, connections and power dissipation; therefore, to make FPGAs a viable target platform for a high speed, reconfigurable XCS solution, we develop an integer-based version of the classifier system (XCSi). To implement an XCS controller on an FPGA, we can follow two main approaches. We can evolve the controller using the floating point (software) version of XCS (executed on a standard system), then transfer the evolved solution to an integer-based version of a basic XCS core (implemented on an FPGA) in which both evolution and learning are missing. Alternatively, we can evolve the controller directly with an integer-based, fully equipped version of XCS, implemented on an FPGA. In this paper we follow the latter approach and develop a fully integer-based implementation of XCS that can be (and is planned to be) conveniently implemented on an FPGA. For this purpose we introduce XCSi, a version of XCS completely based on integer arithmetic. We present an analysis of the performance of XCSi and the guidelines for a hardware implementation, showing that, although there is a dramatic reduction of available precision, the integer version of XCS can reach optimal performance in all the problems considered, though it often converges more slowly than the original floating point version.

Experimental results we present (preliminarily discussed in [17]) on a fully integer-based version of XCS show that, with a rather small number of bits, XCS can evolve accurate, maximally general solutions for the typical Boolean multiplexer and Boolean parity problems (the most typical testbeds for LCS models [16]). The analysis we present, together with the implementation we test, provides effective guidelines to develop a forthcoming VHDL (Very High Speed Integrated Circuits Hardware Description Language) description of the integer-based XCS that will lead to an actual FPGA-based version of XCS, as recently proposed in [18]. We refer the interested reader to [19] for more details on the FPGA implementation. Finally, with respect to related work, it is worth noting that until now no implementation of classifier systems on FPGAs has been presented in the literature; the only work in this area is the proposal by Danek [18], which inspired this work.

1.1. Organization

The paper is organized as follows. We begin, in Section 2, with a short description of XCS [16,20]. Then, in Section 3, we discuss the various issues that we face when moving from the usual floating-point implementation of XCS to an integer-based implementation. In Section 4, we provide an overall picture of the integer-based implementation of XCS, which we call XCSi. In Section 5, we introduce the experimental design we use to test the integer-based implementation of XCS in Section 6. In Section 7, we illustrate some guidelines for an effective hardware implementation of XCSi on FPGAs. Finally, we draw some future research directions in Section 8.

2. The XCS classifier system

We now provide a brief description of XCS; for further details, we refer the reader to the algorithmic description in [20].

2.1. Overview

Learning classifier systems learn to solve a problem in the most typical reinforcement learning setting [21], i.e., through trial-and-error interactions with an unknown environment which describes the problem. The system and the environment interact continually. At time t, the system receives the current state of the environment s_t (i.e., the current problem instance).

Depending on its current knowledge (represented by the individuals in the population), the system selects which action a_t must be performed; based on the current state s_t and on the action a_t, the system receives an immediate numerical reward r_t; the reward estimates how well action a_t has taken the system toward the solution of the target problem. Note that the system knows nothing about the environment (about the problem) except for the current situation (i.e., the state s_t) and how well its actions perform (i.e., the reward r_t). Learning classifier systems learn to solve the target problem simply by trying to maximize the amount of reward received as a consequence of their actions.

2.2. Classifiers

In XCS, classifiers consist of a condition (commonly defined over the {0, 1, #} alphabet [9,16]), a (usually discrete) action, and four main parameters: (i) the prediction p, which estimates the relative payoff that the system expects when the classifier is used; (ii) the prediction error ε, which estimates the error of the prediction p; (iii) the fitness F, which estimates the accuracy of the payoff prediction given by p; and finally (iv) the numerosity num, which indicates how many copies of classifiers with the same condition and the same action are present in the population [P].

2.3. Performance component

At each time step, XCS builds a match set [M] containing the classifiers in the population [P] whose condition matches the current sensory inputs; if [M] does not contain at least one classifier for each possible action, the covering operator [16] is activated. This creates a new classifier with a randomly generated condition that matches the current inputs, and a random action. For each possible action a_i in [M], XCS computes the system prediction P(a_i), which estimates the payoff that XCS expects if action a_i is performed. The system prediction is computed as the fitness-weighted average of the predictions of the classifiers in [M], cl ∈ [M], which advocate action a_i (i.e., cl.a = a_i):

    P(a_i) = Σ_{cl_k ∈ [M]|a_i} (p_k × F_k) / Σ_{cl_k ∈ [M]|a_i} F_k                  (1)

where [M]|a_i represents the subset of classifiers of [M] with action a_i, p_k identifies the prediction of classifier cl_k, and F_k identifies the fitness of classifier cl_k. Then XCS selects an action to perform; the classifiers in [M] which advocate the selected action form the current action set [A]. The selected action is performed in the environment, and a scalar reward r is returned to XCS together with a new input configuration.

2.4. Reinforcement component

When the reward r is received, the prediction value P is computed as follows:

    P = r + γ max_{a ∈ [M]} P(a)                                                     (2)
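As an illustration of Eqs. (1) and (2), the following Python sketch (not taken from the paper; the classifier field names p, F and a are assumptions) computes the fitness-weighted system prediction and the reinforcement target:

```python
# Illustrative sketch of Eqs. (1) and (2); classifier fields (p, F, a) are hypothetical names.
def system_prediction(match_set, action):
    """Fitness-weighted average of the predictions of classifiers advocating `action`."""
    advocates = [cl for cl in match_set if cl.a == action]
    fitness_sum = sum(cl.F for cl in advocates)
    if fitness_sum == 0:
        return 0.0
    return sum(cl.p * cl.F for cl in advocates) / fitness_sum

def target_prediction(reward, gamma, match_set, actions):
    """Eq. (2): P = r + gamma * max_a P(a) over the prediction array."""
    return reward + gamma * max(system_prediction(match_set, a) for a in actions)
```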

Next, the parameters of the classifiers in [A] are updated in the following order [20]: prediction, prediction error, and finally fitness. Prediction p is updated with learning rate β (0 ≤ β ≤ 1):

    p ← p + β(P − p)                                                                 (3)

If we substitute P from Eq. (2) we obtain the following update for the classifier prediction:

    p ← p + β(r + γ max_{a ∈ A} P(a) − p)                                            (4)

Then, the prediction error ε is updated as:

    ε ← ε + β(|P − p| − ε)

2.5. Fitness update

The update of classifier fitness consists of three steps. First, the raw accuracy κ of the classifiers in [A] is computed as

    κ(ε) = 1                    if ε ≤ ε₀
    κ(ε) = α (ε/ε₀)^(−ν)        otherwise                                            (5)

Fig. 1 illustrates the idea behind raw accuracy. The parameter ε₀ (ε₀ > 0) is the threshold that determines to what extent prediction errors are accepted; α (0 < α < 1) causes a strong distinction between accurate and inaccurate classifiers; ν (ν > 0), together with ε₀, determines the steepness of the slope used to calculate classifier accuracy. The raw accuracy κ is used to calculate the relative accuracy κ′ as

    κ′ = (κ × num) / Σ_{cl ∈ [A]} (cl.κ × cl.num)                                    (6)

where cl.κ is the raw accuracy of classifier cl, as computed in Eq. (5), and cl.num is the numerosity of classifier cl. Finally, the relative accuracy κ′ is used to update the classifier fitness as F ← F + β(κ′ − F). In XCS, the error ε provides an absolute estimate of the error that affects classifier prediction. Instead, the fitness F estimates the classifier's relative accuracy with respect to the evolutionary niches in which the classifier appears.
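The three-step fitness update can be summarized by the following illustrative Python sketch of the floating point version (field and parameter names are assumptions, not the authors' code):

```python
# Illustrative sketch of the fitness update of Section 2.5 (floating point version).
def raw_accuracy(error, eps0, alpha, nu):
    """Eq. (5): kappa = 1 if error <= eps0, else alpha * (error / eps0) ** (-nu)."""
    return 1.0 if error <= eps0 else alpha * (error / eps0) ** (-nu)

def update_fitness(action_set, eps0, alpha, nu, beta):
    kappas = [raw_accuracy(cl.error, eps0, alpha, nu) for cl in action_set]
    total = sum(k * cl.num for k, cl in zip(kappas, action_set))
    for k, cl in zip(kappas, action_set):
        k_rel = (k * cl.num) / total          # Eq. (6): relative accuracy
        cl.F += beta * (k_rel - cl.F)         # Widrow-Hoff update of the fitness
```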

[Fig. 1. The classifier accuracy κ as a function of the classifier prediction error ε: κ = 1 up to ε₀, then a descending slope α(ε/ε₀)^(−ν) starting at (ε₀, α).]
The use of relative accuracy instead of an absolute error induces an intrinsic pressure toward generality, which is the main learning engine in XCS [22].

2.6. Discovery component

On a regular basis (on average every θ_GA steps), the genetic algorithm is applied to the classifiers in [A]. It selects two classifiers with probability proportional to their fitnesses, copies them, and with probability χ performs crossover on the copies; then, with probability μ it mutates each element ({0, 1, #}) of the classifier conditions. The resulting offspring are inserted into the population and two classifiers are deleted to keep the population size constant.

2.7. Classifier deletion

If the number of classifiers in the population [P] exceeds the threshold N, excess classifiers are deleted to keep the population size constant. The deletion process is applied to the classifiers in the whole population [P]. It selects classifiers with probability proportional to an estimate of the size of the action sets that the classifiers occur in. This estimate is stored in the classifier action set size parameter, as. If the classifier is sufficiently experienced and its fitness F is significantly lower than the average fitness of the classifiers in [P], its deletion probability is further increased.

2.8. Macroclassifiers

In XCS a macroclassifier technique is used to speed up processing and provide a more perspicuous view of population contents.

Macroclassifiers represent a set of classifiers with the same condition and the same action by means of a new parameter called numerosity. Whenever a new classifier is generated by the genetic algorithm (or by covering), [P] is scanned to see if there already exists a classifier with the same condition and action. If so, the numerosity parameter of the existing classifier is incremented by one, and the new classifier is discarded. If not, the new classifier is inserted into [P]. The resulting population consists entirely of structurally unique classifiers, each with numerosity ≥ 1. If a classifier is chosen for deletion, its numerosity is decremented by 1, unless the result would be 0, in which case the classifier is removed from [P]. All operations in a population of macroclassifiers are carried out as though the population consisted of conventional classifiers; that is, the numerosity is taken into account. In a macroclassifier population, the sum of the numerosities equals N, the traditional population size. [P]'s actual size in macroclassifiers, M, is of interest as a measure of the population's space complexity.

3. From floating point to integer

In XCS, effective learning is based on the balance between (i) the pressure toward general classifiers caused by the niched genetic algorithm (the set pressure in [22]) and (ii) the pressure toward accurate classifiers enforced by the accuracy-based fitness (the fitness pressure in [22]). Both set pressure and fitness pressure depend on the correct estimates of the three main classifier parameters: prediction, prediction error, and fitness. When moving from the usual floating point based XCS to an integer based XCS, we must ensure that the (integer) classifier parameters can still support this balance. Moreover, we also wish that the computational effort required to evolve accurate, maximally general solutions (in terms of the number of problem examples seen by XCS) be the same for the two versions, although an actual FPGA implementation would execute much faster. The performance and discovery components of XCS are not much affected by the change from floating point to integer-based arithmetic. In fact, given that all the classifier parameters are represented by n-bit integers, both the computation of the system prediction and the offspring selection can be carried out with integer values.

On the other hand, the parameter updates performed in the reinforcement component can be a source of rounding errors. These errors depend both on the number of bits used to represent the classifier parameters and on the procedures adopted to update them. To develop the integer-based implementation of XCS, referred to as XCSi, we need (i) to discretize the classifier parameters, i.e., to select the resolution of the integer representation (the number n of bits used to represent the classifier parameters); and (ii) to modify some of the update procedures performed on the classifier parameters, in order to deal with the integer representation so as to limit possible rounding errors. In the following, we discuss how to select an adequate number n of bits to represent the classifier parameters (Section 3.1) and how the XCS parameter updates need to be modified for the integer representation (Sections 3.2 and 3.3).

3.1. Discretization of classifier parameters

To select the number of bits used to represent classifier parameters, we focus on the three main real-valued classifier parameters: prediction, prediction error, and fitness. In fact, all the other classifier parameters (e.g., experience, action set size, and time stamp) can be represented by integers with no impact on the rest of the system. In XCS the ranges of these three parameters can be arbitrarily set, since they can be easily scaled: the range of classifier prediction is usually set to fit the problem characteristics; the range of prediction error is scaled according to the range of classifier prediction; finally, the range of classifier fitness is between 0 and 1, but a different range must be employed for fitness in an integer representation. Thus in XCSi we scale all the classifier parameters to the same range, [0, MR], where MR is the maximum incoming reward. Note that in XCS this approach is already adopted for classifier prediction and classifier prediction error; here we just extend it to the range of classifier fitness. Given the number n of bits used to represent the three real-valued classifier parameters, we map the three parameters into the integer interval [0, ..., 2^n − 1].

The choice of n is not straightforward, but some bounds can be derived. We started from an implementation of XCS based on double precision floating point arithmetic [23] using the IEEE 754 standard representation [35] (1 bit for sign, 11 bits for exponent, 52 bits for mantissa). Rounding errors could easily lead the system to become unstable, so a minimum significant error (much greater than the double floating point precision) must be adequately selected. A first rough bound on n can be derived by noting that XCS should be able to distinguish among all the classifiers in the population with respect to their parameters. We note that a typical population is composed of some thousands of classifiers and that the performance component can discriminate the different classifiers in a match set by their prediction and fitness. In XCS, classifier fitness does not directly depend on classifier prediction, in that classifier fitness is defined as the accuracy of the estimate of future payoffs provided by classifier prediction. Thus, since prediction and fitness are roughly independent values, n must satisfy this bound:

    N ≪ 2^(2n)                                                                       (7)

which states that, to be able to differentiate among the classifiers in [P], n must be set such that there exist at least N distinct pairs of classifier prediction and fitness values. If this condition is not satisfied, XCSi is not able to exploit the whole population, since some classifiers will be indistinguishable based on their parameters. On the other hand, the genetic algorithm uses only fitness to distinguish among the classifiers in the population, and thus n must also satisfy

    N ≪ 2^n                                                                          (8)

However, since the genetic algorithm is applied to the classifiers in the action set, this bound is actually looser, since we only need to guarantee that the average number of macroclassifiers in the action set (not N) is much smaller than 2^n. This condition is necessary but still not sufficient. Classifier fitness is the result of two separate computations, thus the possible round-off errors must be carefully analyzed. In XCS the error threshold ε₀ (Eq. (5)) defines what an ''accurate classifier'' is: if a classifier has an error smaller than ε₀, then it is accurate, otherwise it is not. Note that ε₀ does not directly depend on the maximum approximation error we allow when setting the number of bits n used to represent integer classifier parameters. ε₀ is the minimum significant error used to estimate the raw accuracy κ: all classifiers with prediction error less than ε₀ have to be considered accurate, perfectly adapted to their niche. Instead, the discretization error of prediction and error, which is determined by n, expresses how sensitive XCSi is to the evolution of classifier parameters.

However, even if ε₀ does not directly depend on n, we can bound n with respect to ε₀. Given the number of bits n, the maximum possible error is 2^n − 1; in the computation of raw accuracy the ratio ε/ε₀ is used, and to distinguish between accurate and inaccurate classifiers we need at least two representable values of ε/ε₀. This bounds n to the value of ε₀ as follows:

    n > log₂(2 · ε₀)                                                                 (9)

Floating point notation approximately keeps the same number of significant digits during computation with elementary operators, but integer notation does not. This problem is analyzed in Section 3.2. It is very difficult to perform a global error analysis for XCS: although some parts of XCS are linear, overall XCS is a strongly non-linear system. Accordingly, we cannot separate error sources (as noise generators) from the system itself. Moreover, integer arithmetic does not preserve relative error, so we cannot apply most of the common approximations about error propagation that are valid in floating point arithmetic analysis.

3.2. Parameter update

The update of the classifier parameters can be a source of (dangerous) round-off errors. In XCS, classifier parameters are updated using the Widrow–Hoff delta rule [24] that, given a parameter x and a target value x̂, updates the current value of x as follows:

    x ← x + β(x̂ − x)                                                                 (10)

where β is the usual learning rate. When x is stored as an integer, the new value of parameter x is computed as the nearest integer to ''x + β(x̂ − x)''. Note that, if β|x̂ − x| < 0.5 (i.e., |x̂ − x| < 0.5/β), the update in Eq. (10) will not modify x. This can typically happen when a classifier always obtains a zero reward (x̂ = 0): in this case, the classifier prediction should go to zero, but (if the above condition holds) the value of x will not be modified, i.e., x will not go to zero as expected. To avoid this problem, we rewrite Eq. (10) as

    x ← (1 − β)x + βx̂                                                                (11)

In this case any value x̂ smaller than 1/β will result in the update of x according to x ← (1 − β)x, which will converge to zero in a number of updates that depends on the value of β. Thus 1/β represents a threshold for the parameter update: any value below 1/β will produce a zero update of the parameter x. This also means that smaller values of β correspond to higher thresholds, e.g., with β = 0.01 any value of x̂ below 100 would correspond to a zero update.
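The difference between the two forms of the update, once values are stored as integers, can be seen in the following sketch (illustrative Python; the rounding conventions, nearest-integer for Eq. (10) and truncation of each product for Eq. (11), are assumptions about a plausible hardware realization, not the paper's datapath):

```python
BETA = 0.2   # the typical XCS learning rate

def update_eq10(x, target):
    # Eq. (10): the correction beta*(target - x) is rounded to the nearest integer.
    return x + int(round(BETA * (target - x)))

def update_eq11(x, target):
    # Eq. (11): x <- (1 - beta)*x + beta*target, with each product truncated to an integer.
    return int((1 - BETA) * x) + int(BETA * target)

x = 2
print(update_eq10(x, 0))   # 2: |beta*(0 - x)| = 0.4 < 0.5, so the prediction never decays
print(update_eq11(x, 0))   # 1, and 0 at the next update
```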

3.3. Fitness update

In XCS, the update of classifier fitness is performed in three separate steps: (i) first the raw accuracy κ is computed, then (ii) the raw accuracy is used to compute the relative accuracy κ′, and finally (iii) the classifier fitness is updated using the classifier relative accuracy as the target value. In XCS, classifier fitness can be very small, though virtually always different from zero if the initial value is not null. In contrast, when classifier parameters are encoded with integers, round-off errors can drive classifier fitness to zero very rapidly. For instance, with 8-bit arithmetic and a typical β = 0.2, classifier fitness can go from 10% of the maximum value (2^8 − 1) to 0 in only 11 updates. The value zero for classifier fitness is ambiguous: the prediction array computation does not take into account any classifier with zero fitness, and the typical ''roulette wheel'' selection does not consider classifiers with zero fitness. Accordingly, to avoid the issues related to a null classifier fitness, in XCSi we set the minimum value of classifier fitness to 1. To implement relative accuracy, we replace the usual floating point κ with its scaled integer version s, defined as

    s(ε) = (2^n − 1) κ(ε)                                                            (12)

which maps the usual classifier error ε into values of raw accuracy between 0 and 2^n − 1. Thus, the scaled raw accuracy s(ε) is defined as s(ε) : [0, ε_M] → [0, 2^n − 1], where ε_M represents the maximum possible error, which is equal to the range of classifier prediction. The scaled raw accuracy is bounded between 0 and 2^n − 1 and it can easily be replaced by a table. For this purpose we define a lookup table κ̂ (the discrete raw accuracy) which maps integer errors into their scaled raw accuracy. The lookup table κ̂ defines a function κ̂ : [0, 2^n − 1] → [1, 2^n − 1], computed as a non-uniform quantization of the scaled raw accuracy s(ε). The maximum number of entries in κ̂ is ⌊α · 2^n + 1⌋ (α is the usual constant value), but it can actually be much smaller, since the power function used in the raw accuracy computation falls to zero very rapidly, so that the raw accuracy is almost a step function.

Table 1
Size of the lookup table that represents the discrete raw accuracy (ε₀ = ε_M/100, α = 0.1, β = 0.25)

    #bit    ν    |{κ(ε)}|    |{κ(ε) : κ(ε) ≥ 1/β}|
    8       4       6                4
    8       5       5                3
    10      4      18               15
    10      5      15               12

[Fig. 2. Raw accuracy: real value (dashed line), discretized with 8 bits (solid line).]
For instance, when ν = 5, the prediction error ε is represented with 8 bits (so that the maximum error is 255), and we consider the typical ε₀ equal to 1% of the classifier prediction range, i.e., ε₀ = 2.55, the lookup table κ̂ has only four different values, as shown in Fig. 2. When we increase the number of bits for the prediction error to 10, there are only 14 different function values, shown in Fig. 3. In addition, not all these values need to be put in the table, since there is also a threshold on the smallest value of the discrete relative accuracy κ̂′ (computed as κ′ but using integer arithmetic) that can be used for the update (see Section 3.2). For instance, when ν = 5, β = 0.25 and ε₀ = (1/100) max(ε), κ̂ values are 8-bit numbers and we have κ̂(ε) ∈ {255, 25, 6, 2, 1}. However, the last two values (2 and 1) are not really useful, since they would not modify classifier fitness in the usual parameter update (in fact, κ̂′ ≤ κ̂ < 1/β, see Section 3.2). Thus, the size of the lookup table for κ̂ can be reduced to three different entries.

[Fig. 3. Raw accuracy: real value (dashed line), discretized with 10 bits (solid line).]
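A straightforward way to tabulate the discrete raw accuracy is sketched below (illustrative Python; the paper builds the table as a non-uniform quantization of s(ε), so the exact entries may differ from this per-error evaluation):

```python
# Sketch of the discrete raw-accuracy lookup table of Section 3.3 (illustrative, not the
# authors' code); n, eps0, alpha and nu follow the paper's symbols.
def build_kappa_table(n, eps0, alpha, nu):
    max_val = 2 ** n - 1
    table = {}
    for err in range(max_val + 1):
        if err <= eps0:
            k = max_val                      # accurate classifier: full scaled accuracy
        else:
            k = int(max_val * alpha * (err / eps0) ** (-nu))
            k = max(k, 1)                    # fitness is never allowed to reach 0
        table[err] = k
    return table

table = build_kappa_table(n=8, eps0=2.55, alpha=0.1, nu=5)
print(sorted(set(table.values()), reverse=True))                  # only a handful of distinct values
print(sorted(v for v in set(table.values()) if v >= 1 / 0.25))    # entries actually useful when beta = 0.25
```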

Overall, we note that when ε < ε₀ then κ̂(ε) = 2^n − 1, while when ε is beyond the slope (Fig. 1) then κ̂(ε) is set to the update threshold 1 discussed in Section 3.2. Therefore the lookup table for κ̂(ε) is built only for the values of ε between the two extremes, using the discrete κ function depicted in Fig. 2 for n = 8 and in Fig. 3 for n = 10. Table 1 reports the size of the lookup table for different settings: n is the number of bits; ν is the usual XCS parameter (Eq. (5)); |{κ(ε)}| is the number of distinct values; and |{κ(ε) : κ(ε) ≥ 1/β}| is the number of values which can actually be used for the classifier fitness update when β = 0.25. Note that, while the typical learning rate is around 0.2, in the case of an integer representation it is convenient to select a similar value that can be easily mapped into binary, like 0.25. The lookup table κ̂ is used to compute the discrete raw accuracy. This is subsequently used to compute the discrete relative accuracy κ̂′ which, to avoid division-by-zero errors, is mapped into values in [1, 2^n − 1].

4. The overall picture

To implement XCSi, we convert the three classifier parameters (prediction, prediction error, and fitness) from double precision floating point into n-bit integers. In our implementation, we selected three values of n to represent classifier parameters: 8 bits, 10 bits, and 16 bits. We performed experiments with values of the population size N between 10^2 and 10^3, thus the bound N ≪ 2^(2n) is always satisfied. The conversion of classifier fitness, raw accuracy κ, relative accuracy κ′, and ε₀ was performed as described in the previous section.

The floating point version works with macroclassifiers: whenever XCS generates a new classifier, the population is scanned to search for an existing classifier with the same condition and action. If an existing classifier with identical condition and action is found, its numerosity parameter is incremented by one instead of inserting the new classifier; otherwise the new classifier is inserted with its own numerosity field initialized to one. Macroclassifiers are essentially a programming technique to speed up computation and matching. The proposed software integer version also works with macroclassifiers, although the hardware implementation will not, since it would be too time/computation expensive to modify/update the FPGA configuration. The raw accuracy computation is replaced by a table, as described in the previous section. The table is built with an ε₀ value that must satisfy Eq. (9). In the hardware implementation, all products by non-integer constant values (learning rate or discount factor) are computed by ad hoc multipliers, which can be fast and accurate; the prediction array and the relative accuracy, however, are computed with a weighted sum of integer values: this is one of the core problems that will be addressed and solved in the hardware implementation (see [19] for FPGA-related details).
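For binary-friendly constants such as β = 0.25, the ''ad hoc'' multipliers mentioned above reduce to shifts and additions, as in the following sketch (illustrative; the actual FPGA datapath is described in [19]):

```python
# Shift-and-add sketch of the "ad hoc" constant multipliers (illustrative).
def mul_beta(x):              # x * 0.25  ->  x >> 2
    return x >> 2

def mul_one_minus_beta(x):    # x * 0.75  ->  (3x) >> 2, i.e. one add and one shift
    return (x + (x << 1)) >> 2

print(mul_beta(200), mul_one_minus_beta(200))   # 50 150
```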

5.1. Boolean multiplexer These are defined over binary strings of length n where n = k + 2k; the first k-bits, x0, . . . , xk(1, represent an address which indexes the remaining 2k bits, y 0 ; . . . ; y 2k (1 ; the function returns the value of the indexed bit. For instance, in the 6-multiplexer function, mp6, we have that mp6(100010) = 1 while mp6(000111) = 0. 5.2. Hidden parity This class of Boolean functions has been first used with XCS in [36] to relate the problem difficulty to the number of accurate maximally general classifiers needed by XCS to solve the problem. They are defined over binary strings of length n in which only k-bits are relevant; the hidden parity function (HPn,k) returns the value of the parity function applied to the k relevant bits, that are hidden among the n inputs. For instance, given the hidden parity function HP6,4 defined over inputs of six bits (n = 6), in which only the first four bits are relevant (k = 4), then we have that HP6,3(110111) = 1 while HP6,3(000111) = 1.

5. Design of experiments 6. Experimental results To test XCSi, we compare it to the original (floating point version) of XCS on the learning of Boolean multiplexer and on the hidden parity problem; for this purpose, we follow the standard settings used in the literature [16]. Each experiment consists of a number of problems that the system must solve. Each problem is either a learning problem or a test problem. In learning problems, the system selects actions randomly from those represented in the match set. In test problems, the system always selects the action with highest prediction. The genetic algorithm is enabled only during learning problems, and it is turned off during test problems. The covering operator is always enabled, but operates only if needed. Learning problems and test problems alternate. The reward policy we use is the usual 1000/0 policy for Boolean functions [16]: when XCS solves the problem correctly, it receives a constant reward of 1000; otherwise it receives a zero reward. The performance is computed as the percentage of correct answers during the last 100 test problems. All the reported statistics are averages over 50 experiments. In this paper, we consider two types of functions: Boolean multiplexer and hidden parity.
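For reference, the two test functions used in the experiments can be written in a few lines (illustrative Python, not part of the original experimental code):

```python
# Reference implementations of the test functions of Section 5 (illustrative).
def multiplexer(bits, k):
    """Boolean k-address multiplexer over k + 2**k input bits."""
    address = int("".join(str(b) for b in bits[:k]), 2)
    return bits[k + address]

def hidden_parity(bits, relevant):
    """Parity of the `relevant` bit positions, hidden among the n inputs."""
    return sum(bits[i] for i in relevant) % 2

print(multiplexer([1, 0, 0, 0, 1, 0], k=2))          # mp6(100010) = 1
print(multiplexer([0, 0, 0, 1, 1, 1], k=2))          # mp6(000111) = 0
print(hidden_parity([1, 1, 0, 1, 1, 1], range(4)))   # parity of the first four bits = 1
```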

We start from the 11-multiplexer and compare XCS to XCSi with 8, 10, and 16 bits. We set the XCS parameters as follows: N = 1000; β = 0.2; α = 0.1; ε₀ = 10; ν = 5; χ = 0.8; μ = 0.04; θ_del = 20; θ_GA = 25; δ = 0.1; GA subsumption is on with θ_GAsub = 20, while action-set subsumption is off; we use roulette wheel selection; the don't care probability for ternary conditions is P# = 0.3. When applying XCSi with 8, 10, and 16 bits (i.e., n ∈ {8, 10, 16}), the 1000/0 reward policy used in XCS (Section 5) is replaced in XCSi by a (2^n − 1)/0 reward, i.e., when XCSi solves the problem correctly, it receives a constant reward of 2^n − 1, and zero otherwise. Likewise, the original ε₀ = 10, which is 1% of the maximum reward, is replaced by 1% of the new maximum reward, i.e., ε₀ = 3 when n = 8, ε₀ = 10 when n = 10, and ε₀ = 65 when n = 16. Fig. 4 compares the performance and the population size of XCS with those of XCSi with 8, 10, and 16 bits; curves are averages over 20 runs. As can be noted, XCS always learns faster than XCSi, although XCSi convergence improves when more bits are used. With 10 and 16 bits, XCSi reaches full optimality in all the 20 runs by 20,000 learning problems, while

[Fig. 4. XCS and XCSi in the 11-multiplexer when roulette wheel selection is applied. Performance (solid dots) and number of macroclassifiers in the population (empty dots), as percentages of N, versus the number of learning problems. Curves are averages over 20 runs.]

with a lower precision (8 bits) XCSi needs almost 40,000 learning problems to reach optimality in all the 20 runs. When we repeat the same experiment using tournament selection instead of roulette wheel the convergence speed of XCSi improves. Fig. 5 compares XCS and XCSi (with 8, 10, and 16 bits) when tournament selection is used; still XCS learns faster than all the three versions of XCSi and still

XCSi convergence improves when more bits are used; however, all the three versions of XCSi reach optimal performance by 15,000 experiments. When we move to the 20-multiplexer, the difference in performance between XCS and XCSi becomes more evident. Fig. 6 compares the performance and the population size of XCS and XCSi in the 20-multiplexer; the parameters are set as in

[Fig. 5. XCS and XCSi in the 11-multiplexer when tournament selection is applied. Performance (solid dots) and number of macroclassifiers in the population (empty dots). Curves are averages over 20 runs.]

[Fig. 6. XCS and XCSi in the 20-multiplexer with tournament selection and N = 2000. Performance (solid dots) and number of macroclassifiers in the population (empty dots). Curves are averages over 20 runs.]

the previous experiments except for N = 2000, P# = 0.5, and tournament selection is used; curves are averages over 20 runs. In this case, the increase in the precision of classifier parameters in XCSi has a more significant impact on the learning speed of XCSi so that the difference in the three performance curves (for 8, 10, and 16 bits) is more visible than for the 11-multiplexer. Nevertheless, XCS always learns

much faster than XCSi. While XCS converges to optimal performance around 75,000 problems, with 8 bits XCSi needs around 125,000 learning problems to reach optimality, with 10 and 16 bits XCSi reaches optimality by 100,000 learning problems. However, an increase of the population size from 2000 to 4000 improves XCSi learning speed. Fig. 7 compares the performance and the population size

[Fig. 7. XCS and XCSi in the 20-multiplexer with tournament selection and N = 4000. Performance (solid dots) and number of macroclassifiers in the population (empty dots). Curves are averages over 20 runs.]

[Fig. 8. XCS and XCSi in the 37-multiplexer with tournament selection and N = 5000. Performance (solid dots) and number of macroclassifiers in the population (empty dots). Curves are averages over 20 runs.]

of XCS and XCSi in the 20-multiplexer with N = 4000. As can be noted, the convergence speed of XCSi improves for all numbers of bits, and when 16 bits are used XCSi can reach optimality by 50,000 learning problems, that is, less than half the number of problems required with fewer classifiers.

Finally, we compare XCS and XCSi on the 37-multiplexer with N = 5000, P# = 0.6, and ε₀ = 10. Fig. 8 compares the performance and the population size of XCS and XCSi. In this case, we note that the convergence with 10 bits is actually faster than in the case of 8 and 16 bits. The analysis of the single runs has shown that the worse performance

[Fig. 9. XCS and XCSi in the hidden parity problem HP20,5 when tournament selection is applied. Performance (solid dots) and number of macroclassifiers in the population (empty dots). Curves are averages over 20 runs.]

of the version with 16 bits is due to two of the 20 runs which did not converge to the full optimum before the end of the runs. When excluding such slowly converging runs, the learning is faster with 16 bits. Further analysis is required to understand why, with 16 bits, the learning in the 37-multiplexer can be so slow. Our current working hypothesis is that n = 10 represents the best balance between the accuracy needed to guarantee convergence and the smaller complexity of the underlying representation: in fact, the fewer the bits, the simpler the solution space. Finally, we compare XCS and XCSi on the hidden parity function HP20,5 with the parameter settings taken from [25]: N = 2000; β = 0.2; α = 0.1; ε₀ = 1; ν = 5; χ = 1.0; μ = 0.04; θ_del = 20; θ_GA = 25; δ = 0.1; GA subsumption is on with θ_GAsub = 20, while action-set subsumption is off; P# = 1.0. The hidden parity problem is interesting in that there is no fitness pressure, so the round-off approximations on the classifier parameters should have less influence on the overall convergence. Fig. 9 compares the performance and population size of the usual three versions of XCSi and of XCS. It is worth noting that both the versions of XCSi with 8 and 16 bits converge to optimal performance almost as fast as XCS. In contrast, the version with 10 bits is rather slow; many of the 20 runs considered could not actually converge, producing on average the result reported. The plot of the macroclassifiers in the population is coherent with the performance plots: with 10 bits the number of macroclassifiers in the population decreases rather slowly. Further analyses are currently being performed to fully understand the reported behavior.

7. Toward a hardware implementation

In this section we present some guidelines for an effective hardware implementation of XCSi on FPGAs. The description of XCSi given above can lead to an implementation based on basic modules providing the necessary functional behavior. Recently, FPGAs have become key components for high performance digital signal processing (DSP) systems, and on the latest FPGA architectures many embedded integer multiplier blocks are provided, together with accumulators and fast dedicated RAM, which are useful for the XCSi implementation. In this paper we focus mainly on the processing modules necessary to carry out the computational tasks, in terms of their hardware requirements and their inputs/outputs (I/O).

7.1. XCS components

The system is functionally composed of three main blocks (Fig. 10): the performance component, the discovery component, and the reinforcement component. A random number generator is needed by the performance and discovery components. Here we focus especially on the processing modules and I/O of the identified main functionalities. Nevertheless, the final hardware implementation will also include a controller unit (designed as a state machine), to manage the entire processing, and a synchronization unit for the external detectors and reward inputs (Fig. 11). All involved numbers are positive integers, except for the constant coefficients. Constant radix coefficients are embedded in hardware ''ad hoc'' multipliers, which are designed with dedicated logic (for example, multipliers for β = 0.25, (1 − β) = 0.75, etc.) [26].

7.1.1. Performance component

The performance component exchanges input/output data with the environment and with the discovery and reinforcement components (Fig. 12).

[Fig. 10. XCS functional description: the performance component (population [P], action sets [A], prediction array and covering requests), the discovery component, the reinforcement component and a random number generator, exchanging detector inputs, actions, rewards and updated classifiers.]

[Fig. 11. FPGA basic structure: a synchronizer between the environment inputs (detectors, reward) and the output (action), an XCS controller and the XCS processing block, driven by a common clock.]

[Fig. 12. XCS performance component.]

It has an internal memory to store the population data and it needs a random number generator.

7.1.2. Storage and match memory

The performance component needs quite a large amount of memory to store the classifier data. Two main memory types are needed for the classifier population: a fast one to store the matching conditions and a storage memory to retrieve and update all the data of the classifiers selected in a match set. The storage memory can also be external to the FPGA, but the match memory should be internal to keep the matching time low. In FPGAs, on-chip memory is limited; in recent FPGA architectures, small memory blocks are also available, providing a spare global memory of a few kbits [27] (they are designed especially for fast multiply-and-accumulate operations and they will be useful in the XCS computations of Eqs. (1) and (6)). The typical ternary representation used for classifier conditions (consisting of 0, 1, and don't care # [9,16]) is implemented with two bits: one bit for don't care and one bit for 1/0; the ''don't care'' bit forces a positive match whatever the corresponding environment state bit is. In the hardware implementation each classifier is a ''microclassifier'', that is, a classifier whose numerosity is always 1, and for each one of them the following data need to be stored:

• a condition, composed of 2 · nd bits: one bit for don't care and one bit for the 1/0 condition for each of the nd detectors;
• an integer number in [0, na − 1] representing one of the na available actions, requiring ⌈log₂ na⌉ bits;
• prediction: p-bit integer number;
• error: e-bit integer number, with the same number of bits as the prediction;
• fitness: f-bit integer number, with the same number of bits as the prediction;
• experience: x-bit integer number;
• action set average size: s-bit integer, if needed;
• time stamp: t-bit integer number.

For example, for the 37-multiplexer, solved by the 10-bit algorithm: condition = 2 · nd = 74 bits, na = 2 (1 bit), p = e = f = 10 bits, x = t = 21 bits, s = 13 bits, i.e., 150 bits for each of the 2000 classifiers, for a total memory of 300,000 bits (about 293 kbit), which, given the size, should be arranged on an external memory. At the beginning of a new experiment the memory can be either empty or loaded with a population evolved via software, with a typical floating point XCS algorithm.

7.1.3. Performance component input/output

The performance component exchanges data directly with the environment and with both the discovery and reinforcement components. New prediction, error and fitness values can come from these two components. More precisely, a complete list of the component's inputs and outputs is the following.

Inputs:
• detectors: r(t) ∈ {0, 1}^nd, if nd is the number of detectors;
• updated prediction from the reinforcement component: p-bit integer number;
• updated error from the reinforcement component: p-bit integer number, with the same number of bits as the prediction;
• updated fitness from the reinforcement component: f-bit integer number;
• new classifier from the discovery component;
• classifier deletion request and classifier number (to address it in memory) from the discovery component;
• input data to load the memory with a population derived from a standard, floating point XCS software implementation (if this solution is preferred).

Outputs:
• action: an integer number in [0, na − 1] representing one of the na available actions;
• action set: memory pointers to the action set classifier data, or the action set itself;
• action set size and fitness: two integer numbers (of a and f bits, respectively) sent to the discovery component, if we choose a deletion strategy based on average action set size and fitness;

• coverage request: a signal requesting covering from the discovery component;
• output from the internal memory to export the population (optional).

It is worth noting that, in single step problems, no value is sent from the prediction array to the reinforcement component by the performance component.

7.1.4. Prediction array

The prediction array is computed with a weighted sum, as shown in Eqs. (1) and (6); each time a match set is formed, the prediction is weighted by the fitness in the prediction array computation. A floating point weighted sum is expensive in an FPGA, in terms of area occupation, computation time and wiring connections [28]. This problem has been fully analyzed for hardware implementations of artificial neural networks (ANNs), especially for multilayer perceptron architectures [28,29], although from a different point of view: in that scenario there are many weighted sums for every computational step, one for each neuron, and every sum has few addends (typically fewer than ten). On the other hand, in every XCS computation step a relatively small number of weighted sums is necessary to build the prediction array, one for every action; every prediction array sum has an average number of addends equal to the average action set size (usually ≈10), but the exact number is not known in advance: we only know that the maximum value can be very large, since it is bounded by the classifier population size (≈1000). For hardware ANNs many different multipliers have been proposed, with different arithmetic notations: positional notation (the usual one), stochastic notation, redundant notation, convolutional (1-bit) notation, etc. [28]. In the presented XCS version, weights and addends can only be integer numbers. Integer multipliers require fewer programmable resources on FPGAs than floating point ones, and integer adders are simple and fast devices (FPGAs have dedicated routing resources for adders), whereas floating point adders are complex. In recent years FPGAs have become key components for high performance digital signal processing (DSP) systems, and thus the latest FPGAs embed many integer multiplier blocks; they are optimized for high-speed operation and low power consumption compared to the usual multipliers built from slices. A multiplier block is associated with a memory block by specific wiring, for fast operation. The read, multiply and accumulate operations that we need to build the prediction array are therefore efficient. We can implement the sum of products either in the logic fabric or using the dedicated multipliers, if we choose one of the recent FPGA series (like the Xilinx Virtex II family [27]). There are many integer or fixed point dividers that can be implemented in an FPGA, with different degrees of pipelining, trading speed against latency and area occupation [30]. When we design the possible divider architectures, we must consider that the prediction array element computation requires a relatively small number of divisions, because the average action set contains a few tens of classifiers.

7.1.5. Reinforcement component

Every time an action is performed, the reinforcement component must update prediction, prediction error, experience, and the average action set size if we choose a deletion strategy based on action set size and fitness. All values but fitness can be updated in a straightforward manner: the performance component scans the action set and for every classifier the new values are immediately computed with a few integer adders, specific multipliers for constant values and one incrementer (Fig. 13), and then they are stored. It is necessary to pay attention to the correct sequence of data updates, guaranteeing that XCS stores the new prediction only after the prediction error.

a memory block by a specific wiring, for fast operations. Read, multiply and accumulate operations, that we need to build the prediction array, are efficient. We can implement the sum of products either in the logic fabric or using dedicated multipliers, if we choose one of the recent FPGAs series (like Xilinx Virtex II family [27]). There are many integer or fixed point divider that can be implemented in FPGA, with different degrees of pipelining, with speed versus latency and area occupation trade-off [30]. When we design the possible divider architectures, we must consider that the prediction array element computation requires a relative small number of divisions, because the average action set have few tens of classifiers. 7.1.5. Reinforcement component Every time an action is performed, the reinforcement component must update prediction, prediction error, experience, average action set size if we choose a deletion strategy based on action set size and fitness. All values fitness can be updated in a straightforward manner: the performance component scans the action set and for every classifier the new values are immediately computed with few integer adders, specific multipliers for constant values and one increment (Fig. 13), and then they are stored. It is necessary to focus the attention on the correct sequence of data update, granting the XCS stores the new prediction only after prediction error.

[Fig. 13. XCS reinforcement component (except fitness update): the reward r from the environment and γ·max{P} from the performance component form the target, which updates the prediction p_j and error e_j of the classifiers in the previous action set through β and (1 − β) constant multipliers, adders and an absolute-value block.]

[Fig. 14. XCS fitness calculation: a κ lookup table, an accumulator (Σ), a divider and β/(1 − β) constant multipliers update the fitness F of the classifiers in the previous action set over two memory scans.]

The update of fitness is a more complex operation and it requires at least two memory scans (Fig. 14). In the first one, the discrete raw accuracy k(ε) ∈ [1, 2^n − 1] is computed and stored in an internal memory while an accumulator adds all the computed values. In the second scan, XCS can finally update all the fitness values in the storage memory. In Fig. 14 we emphasize with a dotted area the part of the update procedure that is not straightforward. The divider can be designed as in the prediction array computation. In integer notation the relative accuracy κ′ cannot be computed directly, because 0 < κ′ ≤ 1; instead we compute a discrete relative accuracy k′, 0 ≤ k′ ≤ 2^n − 1 (n-bit depth). Before the division we need to multiply the dividend by the constant value 2^n − 1 (for example, we can design a specific multiplier based on a decrement by 1). The discrete raw accuracy is stored in a table together with the corresponding error. If ε < ε₀ then k = 2^n − 1, while if ε is beyond the slope then k = 1. Between the two extremes, the table is built with the discrete k function, shown in Fig. 2 for n = 8 and in Fig. 3 for n = 10. Table 1 reports the number of different discrete k values, 2^n − 1 and 1 included.
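The two-scan procedure can be summarized as follows (illustrative Python mirroring Fig. 14; names and the integer rounding are assumptions, not the hardware design):

```python
# Two-scan fitness update sketched in Section 7.1.5/Fig. 14 (illustrative; the real design
# uses the kappa lookup table, an accumulator and an integer divider).
def update_fitness_two_scans(action_set, kappa_table, n, beta):
    max_val = 2 ** n - 1
    # First scan: read the discrete raw accuracy and accumulate the weighted sum.
    ks = [kappa_table[cl.error] * cl.num for cl in action_set]
    total = sum(ks)
    # Second scan: compute the discrete relative accuracy and update each fitness.
    for k, cl in zip(ks, action_set):
        k_rel = max(1, (k * max_val) // total)            # mapped into [1, 2**n - 1]
        cl.F = max(1, int((1 - beta) * cl.F) + int(beta * k_rel))
```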

7.1.6. Reinforcement component input/output

For this component the following inputs and outputs have been identified.

Inputs:
• maximum prediction in the prediction array: p-bit integer from the performance component;
• previous action set from the performance component (either pointers to the classifiers in memory or the action set itself);
• reward: r-bit integer from the environment.

Outputs:
• updated classifiers to store in the performance component memory.

[Fig. 15. XCS genetic algorithm: covering, parent selection, crossover/mutation, subsumption and insertion/deletion modules, driven by a random number generator and connected to the performance component.]

7.1.7. Discovery component

The discovery component generates new classifiers, if a match set is empty, and it performs evolution with a genetic algorithm. It can also delete one or more classifiers to make room for the new ones, if the population is full. Covering can be performed by creating a new classifier with a specific condition part and default values for all the other parameters. The GA components are illustrated in Figs. 15 and 16. Parent selection with tournament requires a random number generator for every classifier and a magnitude comparison (i.e., a subtraction with a check on the final carry output) between the random number itself and the tournament size. Crossover can be implemented as a random memory swap between parts of the condition bits of the two selected parents, activated by a random number. Mutation can consist of a random change of condition or action, also activated by a random number. If crossover is used, new prediction p, error e and fitness F values are computed for the children. Division by a multiple of 2 is performed by means of bit shifts. We must pay attention that the new fitness must be greater than 0.
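A software sketch of these operators is given below (illustrative Python; the one-point crossover and the constants are assumptions consistent with the usual XCS settings, not the hardware design itself):

```python
import random

# Illustrative sketch of the crossover/mutation step of Section 7.1.7 / Fig. 16
# (ternary conditions as lists over {'0', '1', '#'}).
def crossover(cond1, cond2):
    point = random.randrange(len(cond1))
    return cond1[:point] + cond2[point:], cond2[:point] + cond1[point:]

def mutate(cond, mu=0.04):
    return [random.choice("01#") if random.random() < mu else bit for bit in cond]

def offspring_parameters(p1, p2):
    # Children inherit averaged prediction/error and a reduced fitness;
    # divisions by powers of two map to bit shifts in hardware.
    p = (p1.p + p2.p) >> 1
    eps = (p1.error + p2.error) >> 1
    F = max(1, (p1.F + p2.F) // 20)   # (F1 + F2)/2 then /10, never below 1
    return p, eps, F
```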

[Fig. 16. XCS crossover and mutation: two classifiers C1 and C2 selected from [P] are recombined and mutated under the control of the random number generator; the children prediction and error are obtained by dividing by 2 (bit shifts) and the fitness by a further factor of 10, before the subsumption check.]

The ''subsumption'' condition is a Boolean check on the condition part of a new classifier together with a test on its error and experience values. There are three main kinds of classifier deletion strategy: (i) totally random, (ii) roulette wheel selection according to the action set size, and (iii) accuracy-based roulette wheel selection, i.e., according to both action set size and fitness. First of all, the insertion/deletion module (Fig. 15) verifies whether the population is full (for example, a classifier counter is updated each time a new classifier is loaded into memory); when this situation is detected, the module selects which classifier must be deleted to allow a new child to be added. The usual deletion based on the average action set size (Section 2) is a complex and time consuming task for the hardware implementation. For each classifier it computes a deletion vote; since the population is stored in an external memory and its size is large (≈1000), every deletion strategy based on characteristics of the entire classifier population can become the bottleneck of the implementation [31]. This is due to the fact that access to the external memory introduces latency, thus limiting performance. The GA is executed in an action set if the number of time-steps since the last GA in that action set exceeds a threshold θ_GA (θ_GA = 25 in our experiments). The classifier population fills up rapidly when an experiment starts, so deletion is needed quite often. A totally random deletion would avoid the use of any memory, but it would also slow down learning and generalization. In the literature there are experiments with FPGA implementations of GAs where the common roulette wheel deletion strategy is replaced by a deletion rule that involves only parents and children, or only a part of the total population [32,31].

7.1.8. Discovery component input/output

This component is characterized by the following interactions with the rest of the system.

Inputs:
• coverage request: a covering request signal from the performance component;
• action set size and fitness, from the population memory: a-bit and f-bit integer numbers from the performance component, if a deletion strategy based on average action set size and fitness is adopted.

Outputs:
• a new classifier to send to the performance component memory;


7.2. Pseudo-random number generator
The most well-known technique to build a random number generator on an FPGA is based on a Linear Feedback Shift Register (LFSR) [33,34]. An LFSR is an n-bit shift register whose input is the exclusive-or of one or more of its outputs. A properly implemented LFSR generates a pseudo-random bit sequence 2^n − 1 bits long, with uniform distribution; by collecting the output bits in groups of m, we can generate a pseudo-random sequence of m-bit integer numbers. The LFSR is simple, and FPGA design tools can synthesize it with very few reconfigurable resources. A new random bit is available at every clock cycle, and the generator can build r-bit integer random numbers in parallel with the rest of the XCS components. More sophisticated random number generators can be implemented on FPGAs as well [33]; more precisely, with an m-bit random number r (0 ≤ r ≤ 2^m − 1) the roulette wheel selection can be completed.

7.3. Parallel computations
Depending on the software implementation, some tasks are usually executed in parallel to achieve better performance. Parallel execution is well supported in hardware if tasks are allocated on different resources; it is thus worth analyzing this issue with respect to the proposed functional architecture. In common GAs the fitness evaluation of individuals can be computed in parallel, because it is usually independent of the rest of the population. In XCS fitness is normalized within the action set, so a fully parallel solution cannot be pursued. Nevertheless, fitness is updated only in the action set, which has a small average size (~10 classifiers); as a result, the lack of parallelism has a limited impact on performance. On the other hand, the prediction array computation, which XCS executes for every input, can be performed in a parallel fashion: the classifier data can be partitioned by action, and all the array elements can be computed together on different parallel modules. The random number generator can easily work in parallel with the other XCS components.
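To make the parallelism of Section 7.3 concrete, the C fragment below computes each prediction-array entry independently, one per action, exactly as separate hardware modules could do with the embedded multiply-accumulate resources; in the sketch the modules are simply called one after the other. The fitness-weighted sum is the usual XCS system prediction; the names, the bit widths and the binary action space are assumptions.

#include <stdint.h>

#define NUM_ACTIONS 2                       /* assumed binary action space */

typedef struct {
    uint8_t action;
    int32_t prediction;                     /* p */
    int32_t fitness;                        /* F */
} classifier_t;

/* One prediction-array entry: fitness-weighted average of the predictions of
   the matching classifiers that advocate the given action.  Each call only
   reads the match set, so the NUM_ACTIONS calls are independent and can be
   mapped onto parallel multiply-accumulate modules, with a single integer
   division at the end.                                                       */
int32_t prediction_entry(const classifier_t match_set[], int n, uint8_t action)
{
    int64_t num = 0, den = 0;
    for (int i = 0; i < n; i++) {
        if (match_set[i].action != action)
            continue;
        num += (int64_t)match_set[i].fitness * match_set[i].prediction;
        den += match_set[i].fitness;
    }
    return (den > 0) ? (int32_t)(num / den) : 0;
}

void build_prediction_array(const classifier_t match_set[], int n,
                            int32_t pa[NUM_ACTIONS])
{
    for (uint8_t a = 0; a < NUM_ACTIONS; a++)   /* one module per action in hardware */
        pa[a] = prediction_entry(match_set, n, a);
}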

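A minimal software model of the LFSR described in Section 7.2 is given below, using the classic 16-bit maximal-length feedback polynomial x^16 + x^14 + x^13 + x^11 + 1 in Galois form; the width, the taps and the seed are illustrative choices. On the FPGA the same update is a shift register plus a few XOR gates, producing one new pseudo-random bit per clock cycle.

#include <stdint.h>
#include <stdio.h>

/* 16-bit Galois LFSR: with these taps the sequence repeats only after
   2^16 - 1 steps, provided the seed is non-zero.                        */
static uint16_t lfsr_state = 0xACE1u;              /* any non-zero seed  */

uint16_t lfsr_next(void)
{
    uint16_t lsb = lfsr_state & 1u;
    lfsr_state >>= 1;
    if (lsb)
        lfsr_state ^= 0xB400u;                     /* taps 16, 14, 13, 11 */
    return lfsr_state;
}

int main(void)
{
    /* Collecting the 16-bit outputs yields pseudo-random integers that can
       be used, e.g., as the random number r of the roulette wheel.         */
    for (int i = 0; i < 4; i++)
        printf("%04x\n", lfsr_next());
    return 0;
}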


8. Conclusions
We introduced an integer-based version of the XCS classifier system, dubbed XCSi, which represents our first step toward an FPGA implementation of XCS. The results we presented show that, although few bits are used to represent the main classifier parameters (usually implemented in double-precision floating point arithmetic), XCSi can perform surprisingly well, reaching optimality in all the problems given an appropriate number of bits. XCSi has a functional structure that can be designed for implementation on FPGAs, together with a control module and a synchronization module. Recent FPGAs offer reconfigurable resources, embedded multipliers, accumulators and dedicated memory that are suited to implementing all the XCSi functional components. The performance component needs a large memory to store the classifier data, so we divide it into two distinct memories: a match memory holding the condition part of every classifier and a storage memory holding all the other data. The storage memory can be external to the FPGA, but the match memory should be internal to keep the matching time low. The performance component computes the prediction array, a weighted sum that is expensive in terms of area occupation if designed with reconfigurable resources, although in recent FPGA architectures we can take advantage of embedded integer multipliers and accumulators. The reinforcement component is designed to update all the classifier parameters; only the fitness update is critical, because it cannot be done directly but requires the discrete relative accuracy normalization. Furthermore, the discovery component can be the real bottleneck of the system, because it needs to read the whole population each time the deletion function is activated. As a result, from the point of view of the migration to the integer version of XCS, some aspects of the performance of XCSi are not completely clear and will require further investigation. As far as the guidelines for a hardware implementation are concerned, an architectural analysis has been carried out and design proposals have been drawn.

References
[1] S. Brown, J. Rose, Architecture of FPGAs and CPLDs: a tutorial, IEEE Design and Test of Computers 13 (2) (1996) 42–57.
[2] P. Graham, B. Nelson, A hardware genetic algorithm for the travelling salesman problem on SPLASH 2, in: Field-Programmable Logic and Applications, Springer-Verlag, Berlin, 1995, pp. 352–361.

[3] C. Aporntewan, P. Chongstitvatana, A hardware implementation of the compact genetic algorithm, in: Proceedings of the 2001 Congress on Evolutionary Computation (CEC2001), Seoul, Korea, 2001, pp. 624–629.
[4] R. Gadea, J. Cerdá, F. Ballester, A. Mocholí, Artificial neural network implementation on a single FPGA of a pipelined on-line backpropagation, in: Proceedings of the 13th International Symposium on System Synthesis, Madrid, Spain, 2000, pp. 225–230.
[5] J. Zhu, P. Sutton, FPGA implementation of neural networks: a survey of a decade of progress, in: Proceedings of the 13th International Conference on Field-Programmable Logic and Applications (FPL 2003), Lisbon, Portugal, 2003, pp. 1062–1066.
[6] J.H. Holland, J.S. Reitman, Cognitive systems based on adaptive algorithms, 1978; reprinted in: D.B. Fogel (Ed.), Evolutionary Computation: The Fossil Record, IEEE Press, 1998, ISBN 0-7803-3481-7.
[7] J.H. Holland, Adaptive algorithms for discovering and using general patterns in growing knowledge bases, International Journal of Policy Analysis and Information Systems 4 (3) (1980) 245–268.
[8] J.H. Holland, Escaping brittleness, in: Proceedings of the Second International Workshop on Machine Learning, 1983, pp. 92–95.
[9] J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975; republished by the MIT Press, 1992.
[10] M. Dorigo, M. Colombetti, Robot Shaping: An Experiment in Behavior Engineering, MIT Press/Bradford Books, 1998.
[11] E. Bernadó, X. Llorà, J.M. Garrell, XCS and GALE: a comparative study of two learning classifier systems and six other learning algorithms on classification tasks, in: P.L. Lanzi, W. Stolzmann, S.W. Wilson (Eds.), Advances in Learning Classifier Systems, LNAI, vol. 2321, Springer-Verlag, Berlin, Heidelberg, 2002, pp. 115–132.
[12] E. Bernadó-Mansilla, J.M. Garrell-Guiu, Accuracy-based learning classifier systems: models, analysis, and applications to classification tasks, Evolutionary Computation 11 (2003) 209–238.
[13] P.W. Dixon, D.W. Corne, M.J. Oates, A preliminary investigation of modified XCS as a generic data mining tool, in: P.L. Lanzi, W. Stolzmann, S.W. Wilson (Eds.), Advances in Learning Classifier Systems, LNAI, vol. 2321, Springer-Verlag, Berlin, 2002, pp. 133–150.
[14] L. Bull (Ed.), Applications of Learning Classifier Systems, vol. 150, Springer-Verlag, 2004.
[15] P.L. Lanzi, W. Stolzmann, S.W. Wilson (Eds.), Learning Classifier Systems: From Foundations to Applications, Lecture Notes in Computer Science, vol. 1813, Springer-Verlag, 2000.
[16] S.W. Wilson, Classifier fitness based on accuracy, Evolutionary Computation 3 (2) (1995) 149–175.
[17] C. Bolchini, P. Ferrandi, P.L. Lanzi, F. Salice, Toward an FPGA implementation of XCS, in: Proceedings of the IEEE Congress on Evolutionary Computation, CEC-2005, 2005.
[18] M. Danek, Implementing XCS in FPGA, Proposal for Master Thesis, 2003.
[19] P. Ferrandi, Toward an FPGA implementation of learning classifier systems, Master's thesis, Dipartimento di Elettronica e Informazione, Politecnico di Milano, supervised by Pier Luca Lanzi, 2005.


[20] M.V. Butz, S.W. Wilson, An algorithmic description of XCS, Journal of Soft Computing 6 (3–4) (2002) 144–153.
[21] R.S. Sutton, A.G. Barto, Reinforcement Learning – An Introduction, MIT Press, 1998.
[22] M.V. Butz, T. Kovacs, P.L. Lanzi, S.W. Wilson, Toward a theory of generalization and learning in XCS, IEEE Transactions on Evolutionary Computation 8 (1) (2004) 28–46.
[23] P.L. Lanzi, The XCS library, 2003. Available from: .
[24] B. Widrow, M.E. Hoff, Adaptive switching circuits, in: Neurocomputing: Foundations of Research, The MIT Press, Cambridge, 1988, pp. 126–134.
[25] M.V. Butz, D.E. Goldberg, K. Tharakunnel, Analysis and improvement of fitness exploitation in XCS: bounding models, tournament selection, and bilateral accuracy, Evolutionary Computation 11 (4) (2003) 239–277.
[26] Xilinx, Dynamic Constant Coefficient Multiplier v2.0, 2000.
[27] Xilinx, Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet, 2005.
[28] S.L. Bade, B.L. Hutchings, FPGA-based stochastic neural networks – implementation, in: D.A. Buell, K.L. Pocek (Eds.), IEEE Workshop on FPGAs for Custom Computing Machines, IEEE Computer Society Press, Los Alamitos, CA, 1994, pp. 189–198.


[29] A. Perez-Uribe, Structure-adaptable digital neural networks, Ph.D. thesis, École Polytechnique Fédérale de Lausanne, 1999.
[30] Xilinx, Pipelined Divider v3.0, 2004.
[31] P. Martin, A hardware implementation of a genetic programming system using FPGAs and Handel-C, Genetic Programming and Evolvable Machines 2 (4) (2001) 317–343.
[32] P. Martin, A pipelined hardware implementation of genetic programming using FPGAs and Handel-C, 2002.
[33] P. Martin, An analysis of random number generators for a hardware implementation of genetic programming using FPGAs and Handel-C, in: W.B. Langdon, E. Cantú-Paz, K. Mathias, R. Roy, D. Davis, R. Poli, K. Balakrishnan, V. Honavar, G. Rudolph, J. Wegener, L. Bull, M.A. Potter, A.C. Schultz, J.F. Miller, E. Burke, N. Jonoska (Eds.), GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann Publishers, New York, 2002, pp. 837–844.
[34] Xilinx, Efficient Shift Registers, LFSR Counters, and Long Pseudo-Random Sequence Generators, 1996.
[35] IEEE Computer Society, IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std 754-1985, 1985.
[36] T. Kovacs, M. Kerber, What makes a problem hard for XCS? in: P.L. Lanzi, W. Stolzmann, S.W. Wilson (Eds.), Advances in Learning Classifier Systems, LNAI, vol. 1996, Springer-Verlag, Berlin, 2001, pp. 80–99.
