Report - Chapter 4

 

Design

The General Genetic Algorithm Tool is a system composed of two sub-systems. The first and main sub-system is a genetic algorithm engine with five example fitness functions. The other sub-system is a GUI front end aiming to show the functionality of the genetic algorithm engine in an easy for the user way.

 4.1 Design methodology

The design methodologies used for the production of the GGAT are analogous to the evolutionary incremental software engineering model. A very basic genetic algorithm was first designed, implemented and tested. After that very basic input and output functionalities were added to it and the whole program was tested again. By more repetitive cycles like that all the rest of the functionalities were designed, implemented and tested in tern and added to the program at each stage until the final GGAT was produced.

After a genetic algorithm engine with most of the functionality from the specification was produced the work on the GUI sub-system started. The initial GUI designs were used and modified. The GUI responsible for the genetic algorithm parameters entry was produced. Then further development and testing was done on the genetic algorithm engine, since it was easier to test run the system with a GUI. A refinement of the GUI sub-system followed that would introduce validation of the genetic algorithm parameters as they are entered, to eliminate the chance of inappropriate parameters crashing the genetic algorithm engine. After this stage the design of part of the GUI sub-system that controls the actual execution of the genetic algorithm and displays their result was refined and the necessary changes made.

To start with the genetic algorithm had only a single fitness function to work with. A number of functionalities were added to the specification of the system. The architecture of the genetic algorithm engine and the GUI were revised to accommodate the additional features. Another four fitness functions were added to the genetic algorithm engine. A few other refinements of the genetic algorithm engine and the necessary adjustments in the GUI sub-system followed before the final GGAT was ready.

Object-oriented design was chosen for the development of the software tool as opposed to a functional approach because it is considered to produce more maintainable and easily understood system architecture and code [Sommerville, 1995].

 

 4.2 Genetic algorithm engine subsystem architecture

 

The main subsystem of the GGAT is the genetic algorithm engine. It is composed of three elements: data structure, a kernel and an evaluator unit. The genetic data structure is composed of population, genotype, chromosome and gene classes.

 

 4.2.1 Kernel

The kernel module provides the API for the whole genetic algorithm engine and is encapsulated in the Kernel class. All the parameters concerning the data structure, the recombination techniques, the termination criteria and the fitness function of the genetic algorithm are passed to the kernel, which initialises them appropriately.  Hence a change of the internal structure of the genetic algorithm engine, when the program is updated, is less likely to affect the kernel’s API and therefore the GUI subsystem.

To reduce the size and complexity of the kernel all the parent selection, crossover and mutation functions are part of the data structures they use i.e. the Population, Genotype, Chromosome and Gene classes. The kernel controls these functions to make the genetic algorithm engine work. A class diagram on figure 4.1 represents the architecture of the elements of the genetic algorithm engine subsystem. Some details are hidden from the diagram for simplicity.

The Kernel class initialises a Population class (the part of the genetic data structure that encapsulates all the other data structure elements) and an Evaluator class with a set of parameters. The Population class is responsible for initialising the next part of the data structure tree, the genotypes, which in tern are responsible for initialising the chromosomes etc.

The kernel executes the genetic algorithm and decides which parent selection, which crossover, which mutation and which algorithm termination methods is to be used. The kernel also activates the appropriate fitness evaluation and genotype decoding functions.

 

Figure 4.1 – Class diagram of the genetic algorithm engine subsystem

 

 

 4.2.2 Evaluator

The fitness evaluation and genotype decoding functions form the second module of the genetic algorithm engine – the Evaluator class (module). Its responsibilities include decoding a given genotype to a phenotype and calculating the fitness value for a given phenotype.

When the genetic algorithm engine is to be modified to solve a particular new problem, most likely only a new Evaluator class would be needed. Hence the implementation of other problems to be solved, using this genetic algorithm engine can be easier.

 

 4.2.3 Genetic data structure

The data structure of the genetic algorithm engine mimics the data structure of a genetic algorithm described earlier on (figure 2.3). A Population class contains of several Genotype class instances that contain phenotype information, genotype information and fitness value each. The Population class keeps the rank of each Genotype instance. Every genotype information set also consists of a number of Chromosome class instances, representing a genotype. Each Chromosome class instance contains a number of Gene class instances that represent a bit (0 or 1). The phenotype information in a Genotype instance is represented by a string of text, and the fitness by a real number.

All the genotypes throughout a population have the same number of chromosomes and number of genes in each chromosome. The system structure of the genetic algorithm engine could allow for chromosomes of every genotype to have different number of genes as long as all the number of genes for corresponding chromosome in a genotype is the same for the whole population (figure 4.2).

 

Figure 4.2 – Genotype architecture

 

The kernel sets the parent selection, crossover, mutation, and fitness evaluation methods to be used, but the functions performing these operations are part of the genetic data structure classes.

The Population class performs the parent selection, crossover and mutation of genotypes. Mutation at the gene level is done in the Gene class so that when a new schema is introduced only a gene mutation function has to be added.

Mutation and crossover is performed on a chromosome level i.e. each chromosome form a genotype to be recombined with a chromosome at the same position of another genotype. This technique has an advantage over the traditional mutation and crossover on a genotype level as mentioned previously: different chromosomes can use different crossover and mutation methods hence reproduce better.

 

The architecture of the genetic algorithm engine allows for easier extension of the crossover and mutation capabilities of the genetic algorithm engine with minimal changes. Also the introduction of a new schema to the genetic algorithm engine should require only redesign of the Gene class.

 

 4.3 Choice of algorithms for genetic algorithm engine 

  4.3.1 Random number generator

Randomly generated integer numbers in different ranges were needed for the genetic algorithm engine. Therefore a function that randomly generates an integer number within a specified range was designed. The function uses the random number generating function of the implementation language, which returns a value between 0 and 1, excluding 1. The function divides this range in equal sections so that an integer number in the requested range is produced. All the integer numbers in the specified range have equal probability for being generated assuming the implementation language random number generating function is totally random. In most cases these functions however are pseudo-random, but that is sufficient for the purpose of this project.

This algorithm was first designed as two separate functions: one that generates a random integer of 1 or 0, and another function that generates an integer number within a specified range, different to the previous range (1 or 0). After a few alterations an algorithm that generates a random integer within a specified range including for the range between 0 and 1, was designed.

 

 4.3.2 Schema

The genetic algorithm engine is designed to work with probably most schemas known by the author. However binary and string gene representations were only designed with the GGAT.

The genotypes are encoded using Gray code. The Gray code is translated into binary when a genotype needs to be decoded so it can be passed to a fitness function. For the purpose of this tool only the function translating Gray code to binary is needed, since the initial population of genotypes is always random. However for completeness and future expansion of the tools to allow for some initial genotypes to be specified, the binary to Gray code function was designed too. Both functions are part of the Chromosome class of the uk.co.karnig.ggat.ga package. Appendix E can be consulted for the code of the Chromosome class where the algorithms of these functions implemented and outlined. The function algorithms were adapted from [Michalewicz, 1999].

With binary representation four genotype-decoding techniques have been produced: decoding of unsigned binary as integers, decoding signed binary as integers, decoding unsigned real number and decoding signed real numbers. All four are encoded using binary alphabet, that is represented as Gray cone in the genotype data structure.

The unsigned integer binary decoding algorithm works in the orthodox way.

The signed integer decoding algorithm however decodes the binary string to an unsigned integer first. Then half of the largest unsigned integer value possible is subtracted from that number. The range represented by the binary string is as follows: for an 8 bit string the highest value can be 2 7 = 128 and the lowest  -2 7 + 1 = 127.

The unsigned real numbers are interpreted by decoding the binary string as an unsigned integer first. Then the product of the unsigned integer and the maximum chromosome value allocated is divided by the highest obtainable unsigned integer value of the binary string:

decodedUnsignedInteger * chromosomeMaxValue / (2numberOfGenes -1).

For signed real numbers a similar pattern is used. The binary string is first interpreted as a signed integer. If the result is positive its product with the maximum chromosome value allocated is divided by half the highest obtainable positive number using signed integer for that bit string:

decodedSignedInteger * chromosomeMaxValue / 2numberOfGenes -1.

If the result is negative then its product with the maximum chromosome value allocated is divided by half the highest obtainable negative number using signed integer for that bit string:

decodedSignedInteger * chromosomeMaxValue / (2numberOfGenes -1-1).

If an 8 bit string is used and the maximum chromosome value allocated is 1, then this encoding method divides the –1 to 1 domain in 2 7 = 128 equal parts. Hence a real number is represented so that value accuracy does not concentrate accuracy around the 0 like most floating point numbers encoded in binary do [web reference about floating point]. The accuracy of the real numbers depends on the range and number of bits (genes) to represent each chromosome. However the accuracy of the data structures that will represent the decoded real numbers depends on the floating point numbers of the implementation language. An accuracy of 64 bit floating point number in the implementation programming language is however likely to be good enough for the purpose of this project.

These decoding functions are part of the evaluator class of the uk.co.karnig.ggat.ga package, listed in appendix A.

 

It is because of the complexity of the real number decoding that the minimum chromosome value setting for the genetic algorithm engine is restricted to be either 0 or minus the maximum value for the chromosome. Different value selection for the minimum chromosome value setting can be added to the genotype decoding algorithm, however it did not seem to be a necessity for the project. This minimum chromosome value setting also eased the maximum chromosome value validation techniques and the binary decoding algorithm that represents real numbers over a specified range.

The genetic algorithm engine is designed to work with both continuous and discontinuous schema (when some genotypes in the schema specified can represent invalid phenotypes for the fitness function).

 

 4.3.3 Parent selection

The genetic algorithm engine uses two parent selection techniques: roulette wheel selection and tournament. The parents are selected one at a time. Since a single population is used no separate breeding population is need as with the traditional genetic algorithm previously mentioned in the background chapter.

The roulette wheel selection technique involves a random selection of a slot from a roulette wheel. Fitter genotypes of a population are represented in more number of slots compared to genotypes with lower fitness. Hence the process of parent selection is random, but there is greater probability that fitter genotypes will be selected more.

The genetic algorithm engine uses ranking for genotype strength classification as opposed to “fitness” because of the reasons discussed earlier on. As previously mentioned ranking attempts to avoid too early convergence in a population when a super fit element is present. Ranking is neutral towards the distribution of the fitness.

When a single population is used, the population can be sorted according to the fitness or rank. This is not necessary, but will ease the insertion of new children to the population. In the genetic algorithm engine the initial, randomly generated, population is sorted according to fitness and each genotypes position in the sorted population represents its rank. Then every child fit enough to be part of the population is inserted at a position below a genotype that is fitter or has the same fitness as the child. The actual rank of each genotype is not needed to be stored in the data structure since the genotypes position in the sorted population is its rank.

The roulette wheel slots are calculated and set before the genetic algorithm starts. The probabilities do not need to be recalculated for every cycle of the genetic algorithm because the population retains its size.

After the first parent is selected using the roulette wheel selection, a second one is selected, but making sure that it is a different genotype to the first one so that processing time is not wasted on reproducing identical parents. The algorithm for selecting the second parent makes sure that the same ranked genotype is not selected, but if an identical to the first genotype is present in the population with a different rank, no precaution is taken to avoid it from being selected.

 

The other parent selection technique used for the genetic algorithm engine is called tournament. A specified number of genotypes are randomly selected from the population. The genotype with the highest rank in that group is selected as a parent. The number of genotypes to be randomly selected before the fittest of them decided is specified and ranges between 2 and half the size of the population. This whole process is repeated for a second parent to be selected, again making sure that it is a different to the first genotype.

 

 4.3.4 Crossover

A number of crossover algorithms are used in the genetic algorithm engine, single point and multiple point crossovers. Chromosomes of the genotype are recombined during crossover, as opposed to the genotype being recombined as one entity. As previously mentioned this gives the extra flexibility of the genetic algorithm to use different crossover techniques for every chromosome group. Hence a crossover method in the genetic algorithm engine has to be set individually for every chromosome group of the genotype.

The single point crossover algorithm recombines two chromosomes of selected parent genotypes by splitting the chromosomes at a specified point and swapping the binary half strings over. The crossover point is randomly generated for every crossover when the “moving random point crossover” function of the genetic algorithm engine is used. The crossover point is randomly generated only at the start of a genetic algorithm and used for all the crossovers, when the “fixed random point crossover” function is used. The user specifies the crossover point when the “specified point crossover” function is used. A randomly generated crossover point is in the range between the second and the one before the last point of the chromosome string.

In multiple point crossovers, an algorithm randomly generates a specified number of crossover points in the range between the second and the one before the last point of the chromosome string. The number of crossover points is randomly generated or specified by the user, and then used for all the multiple point crossovers done in that genetic algorithm. If that number is randomly generated a value in the range between 2 and one before the chromosome number of genes is picked. The bit string of the chromosomes is diced at the selected points and alternate bit sets are swapped between the two chromosomes. The result is again two new chromosomes.

Chromosomes can be set to skip crossover. The bit strings of the parent chromosomes are then copied to the child chromosomes without recombination.

The crossover process for every parent genotype pair involves a crossover for each of the genotype’s chromosomes using that chromosome group’s specified (selected) crossover method.

 

 4.3.5 Mutation

Similar to crossover, mutation is done on the individual chromosomes of a genotype. Each chromosome group has a mutation rate, a probability factor specifying how likely is it for mutation to take place, and a specified mutation method. There three mutation methods used for the genetic algorithm engine.

The first mutation, changes a chromosome very slightly, is the last bit mutation. As the name implies the algorithm flips the last bit of the chromosome.

There is another single bit mutation method. It flips a bit (gene) at a randomly generated position of the chromosome. The mutation position is generated for every mutation. This is known as the uniform mutation.

Multiple point mutation is also designed for the genetic algorithm engine. It flips a randomly generated number of bits (genes), at random positions. The maximum allowable number of bits to be flipped is specified by the user, and obviously cannot exceed the number of genes of the chromosome.

The number of bits to be mutated is randomly generated for every mutation, and does not exceed the specified limit. The random positions are not repetitive and are also regenerated for every mutation.

 

 4.3.6 Termination criteria

The genetic algorithm engine algorithm uses three termination criteria. The aim of the genetic algorithm is to find a solution for a problem. So when such a solution is found the genetic algorithm stops. One termination criterion is the fitness of the best genotype of the population. Hence the genetic algorithm engine terminates when the best genotype has reached or surpassed the target solution fitness (smaller than that fitness for minimising fitness functions and bigger for maximising fitness functions).

However not after every execution of the genetic algorithm such a solution (optimal solution) is found. Sometimes the algorithm might be trapped in local optima. In cases like that the other two termination criteria come at hand.

The simpler one keeps track of the number of fitness evaluation calls made by the genetic algorithm, function calls to determine the fitness of child genotypes. The genetic algorithm terminates after a set number of these evaluations are done.

The other termination method measures how much the difference between the best and worst genotypes in the population converges. When this difference becomes smaller than a specified percentage of that same difference of the initial population, the genetic algorithm is terminated. This terminates the genetic algorithm when the genotypes in the population become too homogenous (similar). When this percentage is set to 0 this termination technique is disabled. No value higher that 1% was considered as valid for this parameter.

A genetic algorithm can also be terminated at the end of every cycle, when manually interrupted by the user.

 

 4.3.7 Parameter validation and update

Before the genetic algorithm engine is started all its parameters have to be set and validated so that the genetic algorithm executes correctly without causing any errors.

The main parameters have been divided in three groups: population parameters, recombination parameters and termination parameters (Table 4.1).

Table 4.1 – Main parameters for the genetic algorithm engine.

 

Minimum value

Maximum value

Population parameters

Population size

5

10000

Number of parents to be used for crossover

2

2

Parent selection technique

Roulette wheel or tournament

Number of contestants for tournament parent selection

2

Population size / 2

Number of chromosomes

1

100

Number of genes (bits)

8

100

Chromosome representation technique

1

2

Chromosome maximum

Dependant on other genotype and chromosome settings

Chromosome minimum value

Recombination parameters

Crossover technique

Single moving random point

Single fixed random point

Single specified point

Multiple point

Specified crossover position for single point crossover

1

Number of genes -1

Specified number of crossover points for multiple point crossover

2

Number of genes -1

Mutation rate

0.0

1.0

Mutation technique

Last bit flip

Uniform

Multiple point

Maximum number of mutation points for multiple point mutation

2

Number of genes -1

Termination parameters

Fitness target value

Negative real number infinity (exclusive)

Positive real infinity (exclusive)

Fitness function type

Minimising or maximising

Number of evaluations before termination

2

1000000

Fitness value convergence factor(%)

0.0

1.0

 

 

There are a few other parameters for the genetic algorithm engine, the fitness function to be used and the log parameters. The log parameters consist of a parameter specifying whether the logging of the execution progress will be done and a set of five other parameters specifying what to be logged.

All parameters are validating before being set, so that no invalid settings are made. In the case of wrong parameter being attempted to be set, an exception is generated to inform the client (GUI), without modifying that parameter.

The validation domain for some of the important parameters has been mentioned already. Few of the other important parameters will be discussed next.

 

A valid size for a genetic algorithm is set to have a minimum of 5 genotypes and no more than 10000. The upper boundary can be increased if necessary, but was decided as reasonable.

A valid genotype can consist of 1 to 100 chromosomes. The upper boundary was selected for reasons similar as for the population size.

The size of each chromosome can be between 8 and 100.

The range for the maximum chromosome value parameter depends on the number of genes (the bit string size of the chromosome) and the minimum chromosome value setting. Every time after the chromosome size is changed or the chromosome’s minimum value setting, the maximum chromosome value is set to the maximum value possible. The same happens when the genotype decoding setting is changed as well, since it also affects the range represented by the chromosomes. When integer decoding of the genotype is used, the maximum chromosome value is not required to be specifically set, since it will be set to its maximum when the decoding setting is changed. When integers are used the chromosome range has to obey the range the bit string can represent.

Hence setting of the maximum chromosome value should to be done only after the rest of the chromosome and genotype parameters are set.

 

 4.3.8 Fitness functions

In order to test and demonstrate the capabilities of the genetic algorithm engine five fitness functions were design to work with it. All of them are mathematical optimisation functions, but some are easier and other more difficult to solve.

The first function is a multi-modal quadratic equation with two optima. The equation was used from the “Genetic Algorithm Playground” tool [10], however there it was used as a uni-modal function. The aim is to minimise the function f(x) = x4 – 12x3 + 15x2 + 56x - 60 (figure 4.3).

Figure 4.3 - f(x) = x4 – 12x3 + 15x2 + 56x - 60

 

The function has a local minimum at x = -0.870173 and a global minimum at x = 7.81021. This function was included to provide an easy multi-modal example for a single chromosome genotype genetic algorithm.

As a second example the sin function, f(x) = sin(x°) was used. This is also a smooth and easy function to solve (figure 4.4).

 

Figure 4.4 – f(x) = sin(x°)

 

Sin can be a highly multi-modal function when the variable range is increased. Every optima is global, hence the performance of the genetic algorithm engine can be tested as how preferable one optimum can be to another after a series of executions. The global minima are located every 360 degrees in both positive and negative direction from x = 90, and similarly the global maxima are at every 360 degrees in both directions from x = -90.

A more complex optimisation function, known as De Jong’s function 1 [Goldberg, 1989] has been used also used. This function is one out of the five-function test bed set, designed to test genetic algorithms’ performance by De Jong in1975. The function, f(xi) = Σ xi ; i = 1:n, can use multiple number of variables and is very smooth and uni-modal (figure 4.5). The range of -5.12 ≤ xi ≤ 5.12 is used and the global minimum is at xi = 0 and f(x) = 0.

 

Figure 4.5 – De Jong’s function 1. Reproduced with permission from Hartmut Pohlheim – [9]

 

Another famous optimisation function used with genetic algorithms is Rastrigin’s function [9]. The function, f(x) = 10n + Σ (xi2-10cos(2 Pi xi )), I = 1:n, can also use multiple variables but is very multi-modal (figure 4.6a,b). The range of -5.12 <= xi <= 5.12 is used again and the global minimum is at xi = 0 and f(x) = 0.

Figure 4.6a – Rastrigin’s function. Reproduced with permission from Hartmut Pohlheim – [9]

 

Figure 4.6b – Rastrigin’s function. Reproduced with permission from Hartmut Pohlheim – [9]

 

 

The fifth optimisation function is the so-called Schwefel's function [9]. The function, f(x) = Σ (-xi sin( Sqrt|xi| ) ), i = 1:n, can also use multiple variables and is very multi-modal (figure 4.7). The local optima are irregularly distributed making it a difficult function to solve. The range of –500.0 <= xi <= 500.0 is used and the global minimum is at xi = 420.9687 and f(x) = -n418.9829.

Figure 4.7 – Schwefel’s function. Reproduced with permission from Hartmut Pohlheim – [9]

 

Another two example problems, the travelling salesman and the knapsack problems, were planned to be part of the GGAT, but were not implemented because of time constrains.

 

 4.4 Graphical user interface (GUI) subsystem

 

The GUI is the second subsystem of the GGAT. It provides an interface for setting the genetic algorithm engine parameters and controls its execution. It is composed of nine dialog windows each represented as a class. A class diagram on figure 4.8 represents the architecture of these windows.

The Main window is the main interface window of the system. The three parameter windows, the execution window, the function detail and the exit confirmation windows are all accessed from the main window.

The fitness function is selected from the main window and each group of parameters is set in a different window. The population parameters, mentioned previously are set in the Population parameters window, the recombination parameters in the Recombination parameters window, the termination parameters from the Termination parameters window, and the log parameters from the Log parameters window. The first three are accessible from the main window which also displays a summary of the population and termination parameters. The log parameters are accessible trough the Execution window, since the log parameters for every subsequent execution can be changed.

The Execution window controls the start and termination of a series of genetic algorithm executions. These genetic algorithm executions have the same set of parameters (the log parameters can be changed) but the genetic algorithm engine generates different random initial population for each one.

 

Figure 4.8 – Class diagram of the GUI subsystem

 

After every parameter selection the window contests are updated. Options for other irrelevant parameters are disabled, to aid the user understand which parameters will be needed for the genetic algorithm or to stop him/her from changing a parameter with erroneous value. For example when the sin fitness function is selected, the chromosome number field in the Population parameters window will be disabled, because that fitness function supports only one chromosome.

As each parameter of the genetic algorithm is set trough the GUI, the actual setting is passed to the genetic algorithm engine kernel. If the parameter setting is invalid an exception will be generated (error reported). In that case the Error window will be displayed giving details of what went wrong.  Some parameters, like the fitness function selection, use a selection GUI component like a drop down menu that prevents a wrong value being selected.

The Details window displays details about the selected fitness function.

As the main window is started, the genetic algorithm engine kernel is initialised. The parameters of the kernel are set subsequently in all the parameter windows. The Execution window uses the Kernel instance with the selected parameters as a template to create other identical Kernel instances that can be executed. Hence more than one execution of a genetic algorithm with the same parameters can be done from the Execution window, and the results compared.

 

 4.5 Features dropped at design

A number of features of the genetic algorithm engine have been revised at the design stage.

The uniform crossover technique, discussed earlier on is one of them. This technique uses a randomly generated or a specified masking pattern for recombining genes (bits) of a chromosome. A reason for removing this feature from the design is that a sufficient number of crossover techniques were already designed, and a clear strategy for designing uniform crossover had not been made. Because of the time pressure, effort was put into other aspects of the project that were thought would benefit the project as a whole more, like the a fully functional GUI subsystem.

Heuristic crossover was also removed from the specification during the project design for reasons similar to the ones previously mentioned. Heuristic crossover is not a very general crossover technique, it is very specific for chromosomes with numerical representation. The API for specifying a formula to be used with the heuristic crossover had to be imbedded in the code, or a formula parser had to be designed to use user specified formulas. All the effort needed to produce it outweighed the possible benefits of having the crossover technique in the genetic algorithm engine.

The genetic algorithm engine was planned to experiment with mating techniques not only between two parents, but also between three, four and even one parent. The orthodox way is using one or two parents. When one parent only is used, its bit string can be inverted to produce a child [Michalewicz, 1999]. Other single, three and four parent recombination techniques were considered but a full design was not produced. These features were removed from the specification because of their complexity and because they were likely to affect the whole structure of the genetic algorithm engine. For these features to be designed and implemented other major parts of the specification had to be sacrificed because of time constrains. Therefore it was decided that this feature can be designed and implemented if there is sufficient time at the end of the project or as a future expansion of the genetic algorithm engine. There was not enough time for the integration of these features, but the number of parents parameter was integrated with the GUI and genetic algorithm engine subsystems to ease this feature’s integration in the future.

One of the genetic algorithm execution progress attributes to be monitored was the average fitness of the genotype population. It was planned to act as a termination criteria i.e. the genetic algorithm would stop when the average (mean) fitness of a population reaches a certain value.

Finding the mean fitness of a population proved to be a difficult to implement algorithm because a real number data structure was needed that can accommodate the sum of all the fitness values. The fitness values however are in the range of what the real number data structures can represent. Therefore either an algorithm that can somehow calculate the mean of a group of numbers without actually adding them all together had to be designed, or a numerical data structure that can represent real numbers of arbitrary value and precision had to be used.

An effort of two days only produced an algorithm that can find the average of a set of numbers without adding them all at once; however it worked only for 2n number of elements where n is a positive integer. Such a restriction to the population size was unwanted, therefore the average fitness attribute was replaced with a median fitness attribute since the mean and median of a sorted set of fitness values can both indicate when all the genotypes in a population converge towards a particular solution.

The termination criteria with the average fitness was not implemented using the median fitness value since it was not considered as a necessary termination criteria for the genetic algorithm. A population fitness convergence factor, described latter on in this chapter, was introduced instead as a better population convergence monitor.

Later on in the implementation stage a numerical data structure, as extension to the implementation programming language, was found that could be used for the average fitness function. However the mean fitness seemed to good enough and it was decided that enough time was wasted on a feature that was not so important.

An element of the GUI subsystem was intended to be a genetic algorithm execution progress chart. It was planned to plot the progress of the population’s best genotype fitness and the average genotype fitness. The average was replaced with median for the reasons mentioned earlier on. However because of time constrains this feature was removed from the specification. The population’s best genotype fitness and the median genotype fitness would be still shown after every crossover for the user to see, but their value trends would be only traceable in the genetic algorithm log files.

 

 4.6 Added features

There have been a number of features that were not part of the specification when the design of the GGAT started, but have been added to the project. Some of them have been added to the specification because the design of the GGAT system allowed them to be implemented easily, and the features contributed to the system performance and functionality. Some features have been considered but not added to the specification because the cost in time and effort to produce was thought to outweigh their benefit for the project.

Sometimes the first execution of a genetic algorithm does not find a solution, because of local optima. A several executions of the genetic algorithm might be needed, and the results compared. Therefore the earlier mentioned Execution window was designed to execute a number of genetic algorithms sequentially (a genetic algorithm has to terminate before another execution is started), at the user’s choice, so that the results can be compared.

The schema described earlier on using multiple chromosomes, as opposed to only one was also not part of the original specification. Even though this feature altered the architecture of the whole GGAT and required a considerable effort to integrate to the already designed system, it was considered to be important since genetic algorithms are commonly used for multiple variable optimisations.

The fitness value of a genotype that represents a solution was not planned to be a parameter for the genetic algorithm function. Instead a fixed fitness value was to be specified and the fitness functions had to consider it while generating fitness values for the genotypes. It was planned that value to be 0 where higher fitness values would represent partial (worst solutions). The fitness function had to always adjust its fitness value output as minimised. However to allow for easier fitness function implementation target fitness and fitness function type (minimising or maximising) parameters were added to the genetic algorithm engine.

As previously mentioned with the example of the travelling salesman problem, some genotypes, when recombined can produce invalid genotype children. The genetic algorithm engine was not at first considered to tackle that problem. However at the design stage it was decided that a feature to validate whether a genotype is valid before it is passed to the fitness function might be necessary especially when character string chromosome representation is used. A travelling salesman problem was considered to be also integrated to the GGAT to demonstrate these abilities of the genetic algorithm engine.

Two of the single point crossover methods designed were not part of the original specification. After a moving random point crossover was designed, the implementation of a single point crossover with a fixed point, was noticed can reuse the design of the first crossover. Hence to add to the selection of crossover techniques available in the GGAT the feature was added to the specification.

A similar situation is the addition of the uniform crossover feature. The design for a last bit flip mutation allowed for easy implementation of the uniform crossover. The uniform crossover was also evaluated by the author at this stage as being a more popular mutation technique than the last bit mutation and since it did not cost much extra time and effort was added to the specification of the GGAT.

As previously mentioned a genetic algorithm terminates after a solution is found or some other termination criteria is reached. A factor that monitors a population for becoming homogeneous was mentioned earlier.

The difference between the fitness of the best and worst individuals of the initial population is recorded at the start of a genetic algorithm. Then after every successive generation (when new children are added to the population) this difference is likely to decrease since the evolutionary process is likely to encourage fitter individuals to reproduce rather than less fit ones. The change of this difference can be monitored and when it becomes smaller than a specified percentage of that difference in the initial population, the genetic algorithm can be terminated even if an optimal solution is not found. This convergence factor was added to the design of the genetic algorithm after it was observed, at the GGAT test runs how a population can become fully homogeneous and be still executed before the specified number of evaluations are done.

This factor would allow the user not to monitor the best and median genotype’s fitness as the genetic algorithm progresses, for the purpose of manually terminating the genetic algorithm before the population converges to a set of identical genotypes. This additional feature was not as easy to design as some of the other additional features added to the specification, but was considered as a necessary addition to the GGAT.

The Knapsack problem was considered to be implemented as an additional fitness function example for the GGAT. The knapsack problem involves a set of items with different trade values and different carry parameters. One or more knapsacks with a limited capacity are available. The aim is to optimise the choice of items to be placed in the knapsacks so that they yield a maximum trading value, but without overflowing the knapsacks’ capacity.

Binary gene representation would be used, where the number of genes is determined by the number of items available, 1 would represent the presence of an item in a knapsack and its absence would be 0. For this problem a specific decoding function was to be used alongside with its fitness function. The numerical optimisation fitness function examples all use the integer and real number decoding functions.

The GGAT system was considered to be a tool that would encapsulate all its functionality around a GUI. However as the project progressed the genetic algorithm engine seemed to provide a good genetic algorithm package that can be reused by other developers, to save them time and effort when dealing with genetic algorithms. Therefore and extra effort was put into producing an understandable API for the genetic algorithm engine.

 

 4.7 Further development

The GGAT was designed in such a way that expanding the functionalities of the tool would require minimal amount of changes. A number of features that would extend its functionality were planned for, but the time constrains on the project could not allow their implementation or design.

As it was earlier mentioned a genetic algorithm using not only two parent genotypes can be a possible extension to the genetic algorithm engine designed. A parameter specifying the number of parents to be used in genotype recombination is already part of the GGAT, simplifying a future addition of that feature.

The design of the GGAT also allows for easy addition of new schema to the tool. For a new schema to work with the genetic algorithm engine only the Gene class would usually need to be redesigned. The parent selection and crossover methods operate with Gene class instances, as opposed to specifically bits or strings.

The integration of new problems to the GGAT was planned for too. In most cases only a new Evaluator class and few minor adjustments in the Population and Kernel classes, concerning the chromosome decoding and genotype fitness evaluation would be enough.

The value selection for some chromosomes in the initial population of a genetic algorithm, as opposed to being randomly generated was considered as possible future feature of the tool. Since Gray coding is used a function translating binary to Gray code was added to the GGAT that would be needed for this expansion.

Also different chromosome groups can be extended to use different decoding techniques and to have different number of genes. The genetic algorithm architecture could even allow for genetic algorithm implementations that optimise neural networks to be implemented. Sometimes different genotypes might contain different number of chromosomes i.e. when the neural network topology changes [11]. The design of the genetic algorithm engine can accommodate additional features like that.

The Kernel class was designed so that multiple instances of it cannot only be executed sequentially, as done in the GGAT, but also in parallel. Hence the best result found from a set of running in parallel genetic algorithms can provide an optimal solution to a problem without the need of user impertinence. 

 

Fast Page Navigation


Copyright © 2002 K Derderian.  All rights reserved.