Design
The General Genetic
Algorithm Tool is a system composed of two sub-systems. The first and
main sub-system is a genetic algorithm engine with five example fitness
functions. The other sub-system is a GUI front end aiming to show the
functionality of the genetic algorithm engine in an easy for the user
way.
4.1 Design
methodology
The design methodologies
used for the production of the GGAT are analogous to the evolutionary
incremental software engineering model. A very basic genetic algorithm
was first designed, implemented and tested. After that very basic input
and output functionalities were added to it and the whole program was
tested again. By more repetitive cycles like that all the rest of the
functionalities were designed, implemented and tested in tern and added
to the program at each stage until the final GGAT was produced.
After a genetic algorithm
engine with most of the functionality from the specification was
produced the work on the GUI sub-system started. The initial GUI designs
were used and modified. The GUI responsible for the genetic algorithm
parameters entry was produced. Then further development and testing was
done on the genetic algorithm engine, since it was easier to test run
the system with a GUI. A refinement of the GUI sub-system followed that
would introduce validation of the genetic algorithm parameters as they
are entered, to eliminate the chance of inappropriate parameters
crashing the genetic algorithm engine. After this stage the design of
part of the GUI sub-system that controls the actual execution of the
genetic algorithm and displays their result was refined and the
necessary changes made.
To start with the genetic
algorithm had only a single fitness function to work with. A number of
functionalities were added to the specification of the system. The
architecture of the genetic algorithm engine and the GUI were revised to
accommodate the additional features. Another four fitness functions were
added to the genetic algorithm engine. A few other refinements of the
genetic algorithm engine and the necessary adjustments in the GUI
sub-system followed before the final GGAT was ready.
Object-oriented design was
chosen for the development of the software tool as opposed to a
functional approach because it is considered to produce more
maintainable and easily understood system architecture and code [Sommerville,
1995].
4.2
Genetic algorithm engine subsystem architecture
The main subsystem of the
GGAT is the genetic algorithm engine. It is composed of three elements:
data structure, a kernel and an evaluator unit. The genetic data
structure is composed of population, genotype, chromosome and gene
classes.
4.2.1
Kernel
The kernel module provides
the API for the whole genetic algorithm engine and is encapsulated in
the Kernel class. All the parameters concerning the data structure, the
recombination techniques, the termination criteria and the fitness
function of the genetic algorithm are passed to the kernel, which
initialises them appropriately. Hence a change of the internal
structure of the genetic algorithm engine, when the program is updated,
is less likely to affect the kernel’s API and therefore the GUI
subsystem.
To reduce the size and
complexity of the kernel all the parent selection, crossover and
mutation functions are part of the data structures they use i.e. the
Population, Genotype, Chromosome and Gene classes. The kernel controls
these functions to make the genetic algorithm engine work. A class
diagram on figure 4.1 represents the architecture of the elements of the
genetic algorithm engine subsystem. Some details are hidden from the
diagram for simplicity.
The Kernel class
initialises a Population class (the part of the genetic data structure
that encapsulates all the other data structure elements) and an
Evaluator class with a set of parameters. The Population class is
responsible for initialising the next part of the data structure tree,
the genotypes, which in tern are responsible for initialising the
chromosomes etc.
The kernel executes the
genetic algorithm and decides which parent selection, which crossover,
which mutation and which algorithm termination methods is to be used.
The kernel also activates the appropriate fitness evaluation and
genotype decoding functions.
|
Figure 4.1 – Class
diagram of the genetic algorithm engine subsystem
 |
4.2.2
Evaluator
The fitness evaluation and
genotype decoding functions form the second module of the genetic
algorithm engine – the Evaluator class (module). Its responsibilities
include decoding a given genotype to a phenotype and calculating the
fitness value for a given phenotype.
When the genetic algorithm
engine is to be modified to solve a particular new problem, most likely
only a new Evaluator class would be needed. Hence the implementation of
other problems to be solved, using this genetic algorithm engine can be
easier.
4.2.3
Genetic data structure
The data structure of the
genetic algorithm engine mimics the data structure of a genetic
algorithm described earlier on (figure 2.3). A Population class contains
of several Genotype class instances that contain phenotype information,
genotype information and fitness value each. The Population class keeps
the rank of each Genotype instance. Every genotype information set also
consists of a number of Chromosome class instances, representing a
genotype. Each Chromosome class instance contains a number of Gene class
instances that represent a bit (0 or 1). The phenotype information in a
Genotype instance is represented by a string of text, and the fitness by
a real number.
All the genotypes
throughout a population have the same number of chromosomes and number
of genes in each chromosome. The system structure of the genetic
algorithm engine could allow for chromosomes of every genotype to have
different number of genes as long as all the number of genes for
corresponding chromosome in a genotype is the same for the whole
population (figure 4.2).
|
Figure 4.2 – Genotype
architecture
 |
The kernel sets the parent
selection, crossover, mutation, and fitness evaluation methods to be
used, but the functions performing these operations are part of the
genetic data structure classes.
The Population class
performs the parent selection, crossover and mutation of genotypes.
Mutation at the gene level is done in the Gene class so that when a new
schema is introduced only a gene mutation function has to be added.
Mutation and crossover is
performed on a chromosome level i.e. each chromosome form a genotype to
be recombined with a chromosome at the same position of another
genotype. This technique has an advantage over the traditional mutation
and crossover on a genotype level as mentioned previously: different
chromosomes can use different crossover and mutation methods hence
reproduce better.
The architecture of the
genetic algorithm engine allows for easier extension of the crossover
and mutation capabilities of the genetic algorithm engine with minimal
changes. Also the introduction of a new schema to the genetic algorithm
engine should require only redesign of the Gene class.
4.3 Choice
of algorithms for genetic algorithm engine
4.3.1
Random number generator
Randomly generated integer
numbers in different ranges were needed for the genetic algorithm
engine. Therefore a function that randomly generates an integer number
within a specified range was designed. The function uses the random
number generating function of the implementation language, which returns
a value between 0 and 1, excluding 1. The function divides this range in
equal sections so that an integer number in the requested range is
produced. All the integer numbers in the specified range have equal
probability for being generated assuming the implementation language
random number generating function is totally random. In most cases these
functions however are pseudo-random, but that is sufficient for the
purpose of this project.
This algorithm was first
designed as two separate functions: one that generates a random integer
of 1 or 0, and another function that generates an integer number within
a specified range, different to the previous range (1 or 0). After a few
alterations an algorithm that generates a random integer within a
specified range including for the range between 0 and 1, was designed.
4.3.2
Schema
The genetic algorithm
engine is designed to work with probably most schemas known by the
author. However binary and string gene representations were only
designed with the GGAT.
The genotypes are encoded
using Gray code. The Gray code is translated into binary when a genotype
needs to be decoded so it can be passed to a fitness function. For the
purpose of this tool only the function translating Gray code to binary
is needed, since the initial population of genotypes is always random.
However for completeness and future expansion of the tools to allow for
some initial genotypes to be specified, the binary to Gray code function
was designed too. Both functions are part of the Chromosome class of the
uk.co.karnig.ggat.ga package. Appendix E can be consulted for the code
of the Chromosome class where the algorithms of these functions
implemented and outlined. The function algorithms were adapted from
[Michalewicz, 1999].
With binary representation
four genotype-decoding techniques have been produced: decoding of
unsigned binary as integers, decoding signed binary as integers,
decoding unsigned real number and decoding signed real numbers. All four
are encoded using binary alphabet, that is represented as Gray cone in
the genotype data structure.
The unsigned integer binary
decoding algorithm works in the orthodox way.
The signed integer decoding
algorithm however decodes the binary string to an unsigned integer
first. Then half of the largest unsigned integer value possible is
subtracted from that number. The range represented by the binary string
is as follows: for an 8 bit string the highest value can be 2 7
= 128 and the lowest -2 7 + 1 = 127.
The unsigned real numbers
are interpreted by decoding the binary string as an unsigned integer
first. Then the product of the unsigned integer and the maximum
chromosome value allocated is divided by the highest obtainable unsigned
integer value of the binary string:
decodedUnsignedInteger *
chromosomeMaxValue / (2numberOfGenes -1).
For signed real numbers a
similar pattern is used. The binary string is first interpreted as a
signed integer. If the result is positive its product with the maximum
chromosome value allocated is divided by half the highest obtainable
positive number using signed integer for that bit string:
decodedSignedInteger *
chromosomeMaxValue / 2numberOfGenes -1.
If the result is negative
then its product with the maximum chromosome value allocated is divided
by half the highest obtainable negative number using signed integer for
that bit string:
decodedSignedInteger *
chromosomeMaxValue / (2numberOfGenes -1-1).
If an 8 bit string is used
and the maximum chromosome value allocated is 1, then this encoding
method divides the –1 to 1 domain in 2 7 = 128 equal parts.
Hence a real number is represented so that value accuracy does not
concentrate accuracy around the 0 like most floating point numbers
encoded in binary do [web reference about floating point]. The accuracy
of the real numbers depends on the range and number of bits (genes) to
represent each chromosome. However the accuracy of the data structures
that will represent the decoded real numbers depends on the floating
point numbers of the implementation language. An accuracy of 64 bit
floating point number in the implementation programming language is
however likely to be good enough for the purpose of this project.
These decoding functions
are part of the evaluator class of the uk.co.karnig.ggat.ga package,
listed in appendix A.
It is because of the
complexity of the real number decoding that the minimum chromosome value
setting for the genetic algorithm engine is restricted to be either 0 or
minus the maximum value for the chromosome. Different value selection
for the minimum chromosome value setting can be added to the genotype
decoding algorithm, however it did not seem to be a necessity for the
project. This minimum chromosome value setting also eased the maximum
chromosome value validation techniques and the binary decoding algorithm
that represents real numbers over a specified range.
The genetic algorithm
engine is designed to work with both continuous and discontinuous schema
(when some genotypes in the schema specified can represent invalid
phenotypes for the fitness function).
4.3.3
Parent selection
The genetic algorithm
engine uses two parent selection techniques: roulette wheel selection
and tournament. The parents are selected one at a time. Since a single
population is used no separate breeding population is need as with the
traditional genetic algorithm previously mentioned in the background
chapter.
The roulette wheel
selection technique involves a random selection of a slot from a
roulette wheel. Fitter genotypes of a population are represented in more
number of slots compared to genotypes with lower fitness. Hence the
process of parent selection is random, but there is greater probability
that fitter genotypes will be selected more.
The genetic algorithm
engine uses ranking for genotype strength classification as opposed to
“fitness” because of the reasons discussed earlier on. As previously
mentioned ranking attempts to
avoid too early convergence
in a population when a super fit element is present. Ranking is neutral
towards the distribution of the fitness.
When a single population is used, the population can be sorted according
to the fitness or rank. This is not necessary, but will ease the
insertion of new children to the population. In the genetic algorithm
engine the initial, randomly generated, population is sorted according
to fitness and each genotypes position in the sorted population
represents its rank. Then every child fit enough to be part of the
population is inserted at a position below a genotype that is fitter or
has the same fitness as the child. The actual rank of each genotype is
not needed to be stored in the data structure since the genotypes
position in the sorted population is its rank.
The roulette wheel slots are calculated and set before the genetic
algorithm starts. The probabilities do not need to be recalculated for
every cycle of the genetic algorithm because the population retains its
size.
After the first parent is selected using the roulette wheel selection, a
second one is selected, but making sure that
it is a different genotype to the first one so that processing time is
not wasted on reproducing identical parents. The algorithm for selecting
the second parent makes sure that the same ranked genotype is not
selected, but if an identical to the first genotype is present in the
population with a different rank, no precaution is taken to avoid it
from being selected.
The other parent selection
technique used for the genetic algorithm engine is called tournament. A
specified number of genotypes are randomly selected from the population.
The genotype with the highest rank in that group is selected as a
parent. The number of genotypes to be randomly selected before the
fittest of them decided is specified and ranges between 2 and half the
size of the population. This whole process is repeated for a second
parent to be selected, again making sure that it is a different to the
first genotype.
4.3.4
Crossover
A number of crossover
algorithms are used in the genetic algorithm engine, single point and
multiple point crossovers. Chromosomes of the genotype are recombined
during crossover, as opposed to the genotype being recombined as one
entity. As previously mentioned this gives the extra flexibility of the
genetic algorithm to use different crossover techniques for every
chromosome group. Hence a crossover method in the genetic algorithm
engine has to be set individually for every chromosome group of the
genotype.
The single point crossover
algorithm recombines two chromosomes of selected parent genotypes by
splitting the chromosomes at a specified point and swapping the binary
half strings over. The crossover point is randomly generated for every
crossover when the “moving random point crossover” function of the
genetic algorithm engine is used. The crossover point is randomly
generated only at the start of a genetic algorithm and used for all the
crossovers, when the “fixed random point crossover” function is used.
The user specifies the crossover point when the “specified point
crossover” function is used. A randomly generated crossover point is in
the range between the second and the one before the last point of the
chromosome string.
In multiple point
crossovers, an algorithm randomly generates a specified number of
crossover points in the range between the second and the one before the
last point of the chromosome string. The number of crossover points is
randomly generated or specified by the user, and then used for all the
multiple point crossovers done in that genetic algorithm. If that number
is randomly generated a value in the range between 2 and one before the
chromosome number of genes is picked. The bit string of the chromosomes
is diced at the selected points and alternate bit sets are swapped
between the two chromosomes. The result is again two new chromosomes.
Chromosomes can be set to
skip crossover. The bit strings of the parent chromosomes are then
copied to the child chromosomes without recombination.
The crossover process for
every parent genotype pair involves a crossover for each of the
genotype’s chromosomes using that chromosome group’s specified
(selected) crossover method.
4.3.5
Mutation
Similar to crossover,
mutation is done on the individual chromosomes of a genotype. Each
chromosome group has a mutation rate, a probability factor specifying
how likely is it for mutation to take place, and a specified mutation
method. There three mutation methods used for the genetic algorithm
engine.
The first mutation, changes
a chromosome very slightly, is the last bit mutation. As the name
implies the algorithm flips the last bit of the chromosome.
There is another single bit
mutation method. It flips a bit (gene) at a randomly generated position
of the chromosome. The mutation position is generated for every
mutation. This is known as the uniform mutation.
Multiple point mutation is
also designed for the genetic algorithm engine. It flips a randomly
generated number of bits (genes), at random positions. The maximum
allowable number of bits to be flipped is specified by the user, and
obviously cannot exceed the number of genes of the chromosome.
The number of bits to be
mutated is randomly generated for every mutation, and does not exceed
the specified limit. The random positions are not repetitive and are
also regenerated for every mutation.
4.3.6
Termination criteria
The genetic algorithm
engine algorithm uses three termination criteria. The aim of the genetic
algorithm is to find a solution for a problem. So when such a solution
is found the genetic algorithm stops. One termination criterion is the
fitness of the best genotype of the population. Hence the genetic
algorithm engine terminates when the best genotype has reached or
surpassed the target solution fitness (smaller than that fitness for
minimising fitness functions and bigger for maximising fitness
functions).
However not after every
execution of the genetic algorithm such a solution (optimal solution) is
found. Sometimes the algorithm might be trapped in local optima. In
cases like that the other two termination criteria come at hand.
The simpler one keeps track
of the number of fitness evaluation calls made by the genetic algorithm,
function calls to determine the fitness of child genotypes. The genetic
algorithm terminates after a set number of these evaluations are done.
The other termination
method measures how much the difference between the best and worst
genotypes in the population converges. When this difference becomes
smaller than a specified percentage of that same difference of the
initial population, the genetic algorithm is terminated. This terminates
the genetic algorithm when the genotypes in the population become too
homogenous (similar). When this percentage is set to 0 this termination
technique is disabled. No value higher that 1% was considered as valid
for this parameter.
A genetic algorithm can
also be terminated at the end of every cycle, when manually interrupted
by the user.
4.3.7 Parameter validation and update
Before the genetic
algorithm engine is started all its parameters have to be set and
validated so that the genetic algorithm executes correctly without
causing any errors.
The main parameters have
been divided in three groups: population parameters, recombination
parameters and termination parameters (Table 4.1).
Table 4.1 – Main parameters for the genetic algorithm
engine.
|
|
Minimum value |
Maximum value |
|
Population parameters |
|
Population size |
5 |
10000 |
|
Number of parents to be used for crossover |
2 |
2 |
|
Parent selection technique |
Roulette wheel or
tournament |
|
Number of contestants for tournament parent
selection |
2 |
Population size / 2 |
|
Number of chromosomes |
1 |
100 |
|
Number of genes (bits) |
8 |
100 |
|
Chromosome representation technique |
1 |
2 |
|
Chromosome maximum |
Dependant on other
genotype and chromosome settings |
|
Chromosome minimum value |
|
Recombination parameters |
|
Crossover technique |
Single moving random
point
Single fixed random
point
Single specified point
Multiple point |
|
Specified crossover position for single point
crossover |
1 |
Number of genes -1 |
|
Specified number of crossover points for multiple
point crossover |
2 |
Number of genes -1 |
|
Mutation rate |
0.0 |
1.0 |
|
Mutation technique |
Last bit flip
Uniform
Multiple point |
|
Maximum number of mutation points for multiple
point mutation |
2 |
Number of genes -1 |
|
Termination parameters |
|
Fitness target value |
Negative real number
infinity (exclusive) |
Positive real infinity
(exclusive) |
|
Fitness function type |
Minimising or
maximising |
|
Number of evaluations before termination |
2 |
1000000 |
|
Fitness value convergence factor(%) |
0.0 |
1.0 |
There are a few other
parameters for the genetic algorithm engine, the fitness function to be
used and the log parameters. The log parameters consist of a parameter
specifying whether the logging of the execution progress will be done
and a set of five other parameters specifying what to be logged.
All parameters are
validating before being set, so that no invalid settings are made. In
the case of wrong parameter being attempted to be set, an exception is
generated to inform the client (GUI), without modifying that parameter.
The validation domain for
some of the important parameters has been mentioned already. Few of the
other important parameters will be discussed next.
A valid size for a genetic
algorithm is set to have a minimum of 5 genotypes and no more than
10000. The upper boundary can be increased if necessary, but was decided
as reasonable.
A valid genotype can
consist of 1 to 100 chromosomes. The upper boundary was selected for
reasons similar as for the population size.
The size of each chromosome
can be between 8 and 100.
The range for the maximum
chromosome value parameter depends on the number of genes (the bit
string size of the chromosome) and the minimum chromosome value setting.
Every time after the chromosome size is changed or the chromosome’s
minimum value setting, the maximum chromosome value is set to the
maximum value possible. The same happens when the genotype decoding
setting is changed as well, since it also affects the range represented
by the chromosomes. When integer decoding of the genotype is used, the
maximum chromosome value is not required to be specifically set, since
it will be set to its maximum when the decoding setting is changed. When
integers are used the chromosome range has to obey the range the bit
string can represent.
Hence setting of the
maximum chromosome value should to be done only after the rest of the
chromosome and genotype parameters are set.
4.3.8
Fitness functions
In order to test and
demonstrate the capabilities of the genetic algorithm engine five
fitness functions were design to work with it. All of them are
mathematical optimisation functions, but some are easier and other more
difficult to solve.
The first function is a
multi-modal quadratic equation with two optima. The equation was used
from the “Genetic Algorithm Playground” tool [10], however there it was
used as a uni-modal function. The aim is to minimise the function f(x)
= x4 – 12x3 + 15x2 + 56x - 60
(figure 4.3).
|
Figure 4.3 - f(x) = x4 – 12x3
+ 15x2 + 56x - 60
 |
The function has a local
minimum at x = -0.870173 and a global minimum at x = 7.81021. This
function was included to provide an easy multi-modal example for a
single chromosome genotype genetic algorithm.
As a second example the sin
function, f(x) = sin(x°) was used. This is also a smooth and easy
function to solve (figure 4.4).
|
Figure 4.4 – f(x) = sin(x°)
 |
Sin can be a highly
multi-modal function when the variable range is increased. Every optima
is global, hence the performance of the genetic algorithm engine can be
tested as how preferable one optimum can be to another after a series of
executions. The global minima are located every 360 degrees in both
positive and negative direction from x = 90, and similarly the global
maxima are at every 360 degrees in both directions from x = -90.
A more complex optimisation
function, known as De Jong’s function 1 [Goldberg, 1989] has been used
also used. This function is one out of the five-function test bed set,
designed to test genetic algorithms’ performance by De Jong in1975. The
function, f(xi) = Σ xi ; i = 1:n, can use
multiple number of variables and is very smooth and uni-modal (figure
4.5). The range of -5.12 ≤ xi ≤ 5.12 is used and the global
minimum is at xi = 0 and f(x) = 0.
|
Figure 4.5 – De Jong’s function 1. Reproduced
with permission from Hartmut Pohlheim – [9]
 |
Another famous optimisation
function used with genetic algorithms is Rastrigin’s function [9]. The
function, f(x) = 10n + Σ (xi2-10cos(2 Pi xi
)), I = 1:n, can also use multiple variables but is very multi-modal
(figure 4.6a,b). The range of -5.12 <= xi <= 5.12 is used
again and the global minimum is at xi = 0 and f(x) = 0.
|
Figure 4.6a –
Rastrigin’s function. Reproduced with permission from Hartmut
Pohlheim – [9]
 |
|
Figure 4.6b – Rastrigin’s function. Reproduced
with permission from Hartmut Pohlheim – [9]
 |
The fifth optimisation
function is the so-called Schwefel's function [9]. The function, f(x)
= Σ (-xi sin( Sqrt|xi| ) ), i = 1:n, can also
use multiple variables and is very multi-modal (figure 4.7). The local
optima are irregularly distributed making it a difficult function to
solve. The range of –500.0 <= xi <= 500.0 is used and the
global minimum is at xi = 420.9687 and f(x)
= -n418.9829.
|
Figure 4.7 – Schwefel’s function. Reproduced
with permission from Hartmut Pohlheim – [9]
 |
Another two example
problems, the travelling salesman and the knapsack problems, were
planned to be part of the GGAT, but were not implemented because of time
constrains.
4.4 Graphical user interface
(GUI) subsystem
The GUI is the second
subsystem of the GGAT. It provides an interface for setting the genetic
algorithm engine parameters and controls its execution. It is composed
of nine dialog windows each represented as a class. A class diagram on
figure 4.8 represents the architecture of these windows.
The Main window is the main
interface window of the system. The three parameter windows, the
execution window, the function detail and the exit confirmation windows
are all accessed from the main window.
The fitness function is
selected from the main window and each group of parameters is set in a
different window. The population parameters, mentioned previously are
set in the Population parameters window, the recombination parameters in
the Recombination parameters window, the termination parameters from the
Termination parameters window, and the log parameters from the Log
parameters window. The first three are accessible from the main window
which also displays a summary of the population and termination
parameters. The log parameters are accessible trough the Execution
window, since the log parameters for every subsequent execution can be
changed.
The Execution window
controls the start and termination of a series of genetic algorithm
executions. These genetic algorithm executions have the same set of
parameters (the log parameters can be changed) but the genetic algorithm
engine generates different random initial population for each one.
|
Figure 4.8 – Class diagram of the GUI subsystem
 |
After every parameter
selection the window contests are updated. Options for other irrelevant
parameters are disabled, to aid the user understand which parameters
will be needed for the genetic algorithm or to stop him/her from
changing a parameter with erroneous value. For example when the sin
fitness function is selected, the chromosome number field in the
Population parameters window will be disabled, because that fitness
function supports only one chromosome.
As each parameter of the
genetic algorithm is set trough the GUI, the actual setting is passed to
the genetic algorithm engine kernel. If the parameter setting is invalid
an exception will be generated (error reported). In that case the Error
window will be displayed giving details of what went wrong. Some
parameters, like the fitness function selection, use a selection GUI
component like a drop down menu that prevents a wrong value being
selected.
The Details window displays
details about the selected fitness function.
As the main window is
started, the genetic algorithm engine kernel is initialised. The
parameters of the kernel are set subsequently in all the parameter
windows. The Execution window uses the Kernel instance with the selected
parameters as a template to create other identical Kernel instances that
can be executed. Hence more than one execution of a genetic algorithm
with the same parameters can be done from the Execution window, and the
results compared.
4.5
Features dropped at design
A number of features of the
genetic algorithm engine have been revised at the design stage.
The uniform crossover
technique, discussed earlier on is one of them. This technique uses a
randomly generated or a specified masking pattern for recombining genes
(bits) of a chromosome. A reason for removing this feature from the
design is that a sufficient number of crossover techniques were already
designed, and a clear strategy for designing uniform crossover had not
been made. Because of the time pressure, effort was put into other
aspects of the project that were thought would benefit the project as a
whole more, like the a fully functional GUI subsystem.
Heuristic crossover was
also removed from the specification during the project design for
reasons similar to the ones previously mentioned. Heuristic crossover is
not a very general crossover technique, it is very specific for
chromosomes with numerical representation. The API for specifying a
formula to be used with the heuristic crossover had to be imbedded in
the code, or a formula parser had to be designed to use user specified
formulas. All the effort needed to produce it outweighed the possible
benefits of having the crossover technique in the genetic algorithm
engine.
The genetic algorithm
engine was planned to experiment with mating techniques not only between
two parents, but also between three, four and even one parent. The
orthodox way is using one or two parents. When one parent only is used,
its bit string can be inverted to produce a child [Michalewicz, 1999].
Other single, three and four parent recombination techniques were
considered but a full design was not produced. These features were
removed from the specification because of their complexity and because
they were likely to affect the whole structure of the genetic algorithm
engine. For these features to be designed and implemented other major
parts of the specification had to be sacrificed because of time
constrains. Therefore it was decided that this feature can be designed
and implemented if there is sufficient time at the end of the project or
as a future expansion of the genetic algorithm engine. There was not
enough time for the integration of these features, but the number of
parents parameter was integrated with the GUI and genetic algorithm
engine subsystems to ease this feature’s integration in the future.
One of the genetic
algorithm execution progress attributes to be monitored was the average
fitness of the genotype population. It was planned to act as a
termination criteria i.e. the genetic algorithm would stop when the
average (mean) fitness of a population reaches a certain value.
Finding the mean fitness of
a population proved to be a difficult to implement algorithm because a
real number data structure was needed that can accommodate the sum of
all the fitness values. The fitness values however are in the range of
what the real number data structures can represent. Therefore either an
algorithm that can somehow calculate the mean of a group of numbers
without actually adding them all together had to be designed, or a
numerical data structure that can represent real numbers of arbitrary
value and precision had to be used.
An effort of two days only
produced an algorithm that can find the average of a set of numbers
without adding them all at once; however it worked only for 2n
number of elements where n is a positive integer. Such a restriction to
the population size was unwanted, therefore the average fitness
attribute was replaced with a median fitness attribute since the mean
and median of a sorted set of fitness values can both indicate when all
the genotypes in a population converge towards a particular solution.
The termination criteria
with the average fitness was not implemented using the median fitness
value since it was not considered as a necessary termination criteria
for the genetic algorithm. A population fitness convergence factor,
described latter on in this chapter, was introduced instead as a better
population convergence monitor.
Later on in the
implementation stage a numerical data structure, as extension to the
implementation programming language, was found that could be used for
the average fitness function. However the mean fitness seemed to good
enough and it was decided that enough time was wasted on a feature that
was not so important.
An element of the GUI
subsystem was intended to be a genetic algorithm execution progress
chart. It was planned to plot the progress of the population’s best
genotype fitness and the average genotype fitness. The average was
replaced with median for the reasons mentioned earlier on. However
because of time constrains this feature was removed from the
specification. The population’s best genotype fitness and the median
genotype fitness would be still shown after every crossover for the user
to see, but their value trends would be only traceable in the genetic
algorithm log files.
4.6 Added
features
There have been a number of
features that were not part of the specification when the design of the
GGAT started, but have been added to the project. Some of them have been
added to the specification because the design of the GGAT system allowed
them to be implemented easily, and the features contributed to the
system performance and functionality. Some features have been considered
but not added to the specification because the cost in time and effort
to produce was thought to outweigh their benefit for the project.
Sometimes the first
execution of a genetic algorithm does not find a solution, because of
local optima. A several executions of the genetic algorithm might be
needed, and the results compared. Therefore the earlier mentioned
Execution window was designed to execute a number of genetic algorithms
sequentially (a genetic algorithm has to terminate before another
execution is started), at the user’s choice, so that the results can be
compared.
The schema described
earlier on using multiple chromosomes, as opposed to only one was also
not part of the original specification. Even though this feature altered
the architecture of the whole GGAT and required a considerable effort to
integrate to the already designed system, it was considered to be
important since genetic algorithms are commonly used for multiple
variable optimisations.
The fitness value of a
genotype that represents a solution was not planned to be a parameter
for the genetic algorithm function. Instead a fixed fitness value was to
be specified and the fitness functions had to consider it while
generating fitness values for the genotypes. It was planned that value
to be 0 where higher fitness values would represent partial (worst
solutions). The fitness function had to always adjust its fitness value
output as minimised. However to allow for easier fitness function
implementation target fitness and fitness function type (minimising or
maximising) parameters were added to the genetic algorithm engine.
As previously mentioned
with the example of the travelling salesman problem, some genotypes,
when recombined can produce invalid genotype children. The genetic
algorithm engine was not at first considered to tackle that problem.
However at the design stage it was decided that a feature to validate
whether a genotype is valid before it is passed to the fitness function
might be necessary especially when character string chromosome
representation is used. A travelling salesman problem was considered to
be also integrated to the GGAT to demonstrate these abilities of the
genetic algorithm engine.
Two of the single point
crossover methods designed were not part of the original specification.
After a moving random point crossover was designed, the implementation
of a single point crossover with a fixed point, was noticed can reuse
the design of the first crossover. Hence to add to the selection of
crossover techniques available in the GGAT the feature was added to the
specification.
A similar situation is the
addition of the uniform crossover feature. The design for a last bit
flip mutation allowed for easy implementation of the uniform crossover.
The uniform crossover was also evaluated by the author at this stage as
being a more popular mutation technique than the last bit mutation and
since it did not cost much extra time and effort was added to the
specification of the GGAT.
As previously mentioned a
genetic algorithm terminates after a solution is found or some other
termination criteria is reached. A factor that monitors a population for
becoming homogeneous was mentioned earlier.
The difference between the fitness of the best and worst individuals of
the initial population is recorded at the start of a genetic algorithm.
Then after every successive generation (when new children are added to
the population) this difference is likely to decrease since the
evolutionary process is likely to encourage fitter individuals to
reproduce rather than less fit ones. The change of this difference can
be monitored and when it becomes smaller than a specified percentage of
that difference in the initial population, the genetic algorithm can be
terminated even if an optimal solution is not found. This convergence
factor was added to the design of the genetic algorithm after it was
observed, at the GGAT test runs how a population can become fully
homogeneous and be still executed before the
specified number of evaluations are done.
This factor would allow the
user not to monitor the best and median genotype’s fitness as the
genetic algorithm progresses, for the purpose of manually terminating
the genetic algorithm before the population converges to a set of
identical genotypes. This additional feature was not as easy to design
as some of the other additional features added to the specification, but
was considered as a necessary addition to the GGAT.
The Knapsack problem was
considered to be implemented as an additional fitness function example
for the GGAT. The knapsack problem involves a set of items with
different trade values and different carry parameters. One or more
knapsacks with a limited capacity are available. The aim is to optimise
the choice of items to be placed in the knapsacks so that they yield a
maximum trading value, but without overflowing the knapsacks’ capacity.
Binary gene representation
would be used, where the number of genes is determined by the number of
items available, 1 would represent the presence of an item in a knapsack
and its absence would be 0. For this problem a specific decoding
function was to be used alongside with its fitness function. The
numerical optimisation fitness function examples all use the integer and
real number decoding functions.
The GGAT system was
considered to be a tool that would encapsulate all its functionality
around a GUI. However as the project progressed the genetic algorithm
engine seemed to provide a good genetic algorithm package that can be
reused by other developers, to save them time and effort when dealing
with genetic algorithms. Therefore and extra effort was put into
producing an understandable API for the genetic algorithm engine.
4.7
Further development
The GGAT was designed in
such a way that expanding the functionalities of the tool would require
minimal amount of changes. A number of features that would extend its
functionality were planned for, but the time constrains on the project
could not allow their implementation or design.
As it was earlier mentioned
a genetic algorithm using not only two parent genotypes can be a
possible extension to the genetic algorithm engine designed. A parameter
specifying the number of parents to be used in genotype recombination is
already part of the GGAT, simplifying a future addition of that feature.
The design of the GGAT also
allows for easy addition of new schema to the tool. For a new schema to
work with the genetic algorithm engine only the Gene class would usually
need to be redesigned. The parent selection and crossover methods
operate with Gene class instances, as opposed to specifically bits or
strings.
The integration of new
problems to the GGAT was planned for too. In most cases only a new
Evaluator class and few minor adjustments in the Population and Kernel
classes, concerning the chromosome decoding and genotype fitness
evaluation would be enough.
The value selection for
some chromosomes in the initial population of a genetic algorithm, as
opposed to being randomly generated was considered as possible future
feature of the tool. Since Gray coding is used a function translating
binary to Gray code was added to the GGAT that would be needed for this
expansion.
Also different chromosome
groups can be extended to use different decoding techniques and to have
different number of genes. The genetic algorithm architecture could even
allow for genetic algorithm implementations that optimise neural
networks to be implemented. Sometimes different genotypes might contain
different number of chromosomes i.e. when the neural network topology
changes [11]. The design of the genetic algorithm engine can accommodate
additional features like that.
The Kernel class was
designed so that multiple instances of it cannot only be executed
sequentially, as done in the GGAT, but also in parallel. Hence the best
result found from a set of running in parallel genetic algorithms can
provide an optimal solution to a problem without the need of user
impertinence.