User guide of version 0.2 |
|||||||||||||||||||||||||||||||||||||||
|
X |
Y |
1 |
7 |
-5 |
-605 |
2 |
-34 |
9 |
.. |
4 |
.. |
-7 |
.. |
0 |
.. |
2.5 |
… |
3 |
.. |
So can i find a program that I give it X and Y examples and it will give me code how to turn X into Y?
let's see.
GeneticProgrammingEngine engEngine = newGeneticProgrammingEngine(); engEngine.NumberOfPrograms = 1000; engEngine.MinInitialTreeDepth = 3; engEngine.MaxInitialTreeDepth = 7; engEngine.MaxOverallTreeDepth = 10; engEngine.FitnessGoal = Engine.FitnessGoalEnum.MIN_FITNESS; engEngine.Overselection = 10; engEngine.ChanceForCrossover = 1F; engEngine.SaveTopIndividuals = 0.01F;
|
Here i am creating a new Engine and set up some parameters.
You don't even have to set those parameters, because there are default values for each, but if you want to find out more you can look at the reference.
engEngine.SetFunctions = new FunctionType[] { new Add(), new Substract(), new Multiply(), new Divide() }; engEngine.SetValues = new TreeLanguageEvolute.ValueType[] {new RandomMacro()};
|
One of the most important things is to determine what will be in those trees.
I want to functions "Add, substract, multiply , and divide" to be in the trees.
Those functions are just like the ordinary +, -, * and / that we have in C#. they take 2 values and return a result.
By defining what functions I want in the trees, I give an option for the computer to use these functions in the creation of random programs.
I also set the values that i want to appear in the trees. RandomMacro means that in some tree nodes, random values between 0 to 1 will be spread and randomized as leafs.
That can be useful, because it gives a chance to the computer to use some values in his programs. it can help the computer, because such random values may hold the key for the whole solution.
You can also give values out of your own. If the program your'e trying to find, for an example, needs to find out how to calculate the diameter of a circle by itself
, it's a good idea to put a value of PIE in there.
engEngine.DeclareVariable("X");
|
This is a declaration of a variable. By declaring the variable X, it will appear randomly on the tree nodes and then we will be able to give input to the programs before running them.
engEngine.EvalFitnessForProgramEvent += new Engine.EvalFitnessHandler(engEngine_EvalFitnessForProgramEvent); engEngine.GenerationIsCompleteEvent += new Engine.GenerationIsCompleteHandler(engEngine_GenerationIsCompleteEvent);
|
these are two events that can be invoked from the Engine class. The first event is a MUST. it must be invoked. because you must give a fitness to the programs.
the event Engine.EvalFitnessForProgramEvent is a MUST.
The second event, Engine.GenerationIsCompleteEvent is invoked when a generation of programs ends (when all the programs in a generation finished to run and you gave them fitness), that event can provide you statistics about the generation, and a chance for you to do one of your own actions.
all the code so far :
namespace TestProject { public partial class FunctionGuesserTestForm : Form { string strLabelString; public FunctionGuesserTestForm() { InitializeComponent(); GeneticProgrammingEngine engEngine = new GeneticProgrammingEngine(); engEngine.NumberOfPrograms = 1000; engEngine.MinInitialTreeDepth = 3; engEngine.MaxInitialTreeDepth = 7; engEngine.MaxOverallTreeDepth = 10; engEngine.FitnessGoal = Engine.FitnessGoalEnum.MIN_FITNESS; engEngine.Overselection = 10; engEngine.ChanceForCrossover = 1F; engEngine.SaveTopIndividuals = 0.01F; engEngine.SetFunctions = new FunctionType[] { new Add(), new Substract(), new Multiply(), new Divide() }; engEngine.DeclareVariable("X"); engEngine.SetValues = new TreeLanguageEvolute.ValueType[] {new RandomMacro()}; engEngine.EvalFitnessForProgramEvent += new Engine.EvalFitnessHandler(engEngine_EvalFitnessForProgramEvent); engEngine.GenerationIsCompleteEvent += new Engine.GenerationIsCompleteHandler(engEngine_GenerationIsCompleteEvent); engEngine.RunEvoluteOnThread(10000);
Timer tmTimer = new Timer(); tmTimer.Interval = 1000; tmTimer.Tick += new EventHandler(tmTimer_Tick); tmTimer.Start(); }
|
Now let's see what did i do in the engEngine_EvalFitnessForProgramEvent function.
public void engEngine_EvalFitnessForProgramEvent(BaseProgram progProgram, BaseEngine sender) { progProgram.Fitness = 0; Hashtable hsVariables = progProgram.GetVariables(); Variable varX = (Variable)hsVariables["X"];
for (int nCaseNumber = 0; nCaseNumber < 100; nCaseNumber++) { // y = 5x^3+x^2+x varX.Value = (float)GlobalRandom.m_rndRandom.NextDouble(); float fExpectedResult = (5 * varX.Value * varX.Value * varX.Value) + (3 * varX.Value * varX.Value) + varX.Value; progProgram.Run(); float fActualResult = progProgram.Result; progProgram.Fitness += Math.Abs(fExpectedResult - fActualResult); } }
|
One of the parameters, is the program. (progProgram in our case), this incoming parameter is a program, that is made out of a tree like we saw previously, (while having some more capabilities). This is only ONE of the programs that the computer had randomized. But we still need to evaluate it's fitness. which means, we need to run it, and check if it's good or not.
First I initialise the fitness of the program to 0. why? later you will find out.
Next, I pull out the Variable X from the hashtable into a separate variable for more comfort (and optimization). The variable will be used in order to give input for the programs.
Let's focus on the for loop, that's the most important part.
for (int nCaseNumber = 0; nCaseNumber < 100; nCaseNumber++) { // y = 5x^3+x^2+x varX.Value = (float)GlobalRandom.m_rndRandom.NextDouble(); float fExpectedResult = (5 * varX.Value * varX.Value * varX.Value) + (3 * varX.Value * varX.Value) + varX.Value; progProgram.Run(); float fActualResult = progProgram.Result; progProgram.Fitness += Math.Abs(fExpectedResult - fActualResult); } |
The for loop runs through 100 cases. (or 100 examples)
In each iteration of the for loop, I randomly select X value, and I give it to the variable. The moment I give it to the variable (varX.Value = ...) it get's updated in the program tree.
afterwards, I generate the Y for the function 5x³+3x²+x. for the X value I have updated in the program's tree.
So what, you may ask. I know the function, what's the point in all of this?! I know the connection, I don't need to create this program.
While your initial intuition is true,
I could've easily taken X and Y examples from a file, a website, a camera, or a voice recorder. These are 100 examples randomly generated of X and Y.
float fExpectedResult = (5 * varX.Value * varX.Value * varX.Value) + (3 * varX.Value * varX.Value) + varX.Value; |
After we generate an example of X and Y, and we set the input (by setting the variable value of X), it's time to run the program.
progProgram.Run(); float fActualResult = progProgram.Result; |
I run the program, then I take the result that the program gave me for the input I gave.
we do have the original result (the Y) . so we can find out how the computer got it wrong. the difference between the Y (which we know for sure it's the output we wanted, because we made it) - and the output that the program gave, is the fitness that we can give to the program.
That means, the less this fitness is, the better the program because it reaches more and more
to the real output we wanted.
progProgram.Fitness += Math.Abs(fExpectedResult - fActualResult); |
This just sums up what I said above. all the differences for all the 100 cases we generated is the fitness of the program. (this is in order to give the best grade - it's possible that one program will randomly succseed in one example and will fail badly in other. the sum of 100 cases makes it more general so now it will be more difficult for programs to randomly succseed)
One thing we need to take care of is to set the goal of the evolution towards minimal fitness. we can do that in this code, before the algorithm runs.
engEngine.FitnessGoal = Engine.FitnessGoalEnum.MIN_FITNESS; |
This is also appearing in the above complete code example.
The entire code :
void tmTimer_Tick(object sender, EventArgs e) { this.label1.Text = strLabelString; }
void engEngine_GenerationIsCompleteEvent(Statistics stsStatistics, BaseEngine sender) { this.strLabelString = "Generation number " + stsStatistics.GenerationNumber + " : min fitness = " + stsStatistics.MinFitnessProgram.Fitness + ", Total nodes = " + stsStatistics.TotalNodes; try { Graphics g = this.CreateGraphics(); ((TreeProgram)(stsStatistics.MinFitnessProgram)).Draw(g, this.Width, this.Height, this); } catch (System.Exception ex) { } }
public void engEngine_EvalFitnessForProgramEvent(BaseProgram progProgram, BaseEngine sender) { progProgram.Fitness = 0; Hashtable hsVariables = progProgram.GetVariables(); Variable varX = (Variable)hsVariables["X"];
for (int nCaseNumber = 0; nCaseNumber < 100; nCaseNumber++) { // y = 5x^3+x^2+x varX.Value = (float)GlobalRandom.m_rndRandom.NextDouble(); float fExpectedResult = (5 * varX.Value * varX.Value * varX.Value) + (3 * varX.Value * varX.Value) + varX.Value; progProgram.Run(); float fActualResult = progProgram.Result; progProgram.Fitness += Math.Abs(fExpectedResult - fActualResult); } } |
So what do we have so far? over many generations that the program's fitness will be generated, the programs will get close and closer to discovering the mathematical function 5x³+3x²+x.
I want to take a moment to explain about the other event, Engine.GenerationIsCompleteEvent.
This event is called at the end of every generation and you can assign to that event in order to get statistics about the generation.
void engEngine_GenerationIsCompleteEvent(Statistics stsStatistics, BaseEngine sender) { this.strLabelString = "Generation number " + stsStatistics.GenerationNumber + " : min fitness = " + stsStatistics.MinFitnessProgram.Fitness + ", Total nodes = " + stsStatistics.TotalNodes; try { Graphics g = this.CreateGraphics(); ((TreeProgram)(stsStatistics.MinFitnessProgram)).Draw(g, this.Width, this.Height, this); } catch (System.Exception ex) { } } |
We can get from the Statistics object, info about the generation. such as the generation number, the total nodes in the generation, and also we can get the minimal and maximal programs. (we can also run these programs, and draw them, like in this example)
Adding additional functionality to the program.
One of the early projects I have made is about a robot that can rotate itself, and move forward. there are walls around, and it can't cross them. the goal of the robot is to reach a destination point by navigating around the walls. The fitness computation that I did in this example, is the distance between the robot location to the destination location.
The programs that managed to get as close as possible to the destination, are better, that's why they got lower fitness.
Anyways, I'm not going to explain now the entire process, that we saw earlier, but I still going to point out few things :
engEngine.SetFunctions = new TreeLanguageEvolute.FunctionType[] { new Add(), new Substract(), new Multiply(), new Divide(), new IfGreaterThenElse(), new WillIntersect(this.Width, this.Height)}; |
Put attention to the functions in the function set. The IfGreaterThenElse is an "if" function, that let's the random programs take a conditional path around the tree.
A more advanced option is when you want to include functions that are not avaliable in functions that the library includes.
In order to do that, you must create them yourself, by inheriting from the Function class.
This is an example how it's being done, although it's better design to scrape and leave as little as possible in those function classes to another entity class which performs the calculations.
public class RotateRight : Function |
The function class is an abstract class that you inherite from it, you must overload some functions and implement them.
The properties that you must implement :
The NumberOfArguments property means, how much arguments your function takes (or uses). In the function in the above example, it takes 1 arguments.
Notice that for each argument, there will be a new branch in the tree.
The Name property is just a name to give to the function. If you decide to draw the program, or save the program in a file, this will play a role.
The functions that you must implement :
public override float action(Node[] arrNodes, TreeLanguageEvolute.TreeProgram progOwner) { } |
Is the function you must implement. When the function is called, you will receive an array of tree nodes, and a program. You can use the program parameter in order to get information from the program itself. (For an example, you can use "AdditionalInformation" field in order to receive objects from outside the program)
However, the parameter you will mainly use will be the arrNodes parameter. for every node, you can return it's value.
You can return a value from node[0] by calling arrNodes[0].ReturnValue(..)
the NumberOfArguments property will affect how many elements will be inside the array arrNodes that will be passed into the function.
Another example of a function, which actually uses the nodes, is the "Add" function.
/// <summary> /// The add function, Needs two tree sons, adds them both and returns /// A result as the sum of the both sons /// </summary> public class Add : Function { public override float action(Node[] arrNodes, TreeLanguageEvolute.TreeProgram progOwner) { return (arrNodes[0].ReturnValue(progOwner) + arrNodes[1].ReturnValue(progOwner)); }
public override int NumberOfArguments { get { return (2); } }
public override string Name { get { return ("+"); } } } |
As you can see, it takes 2 arguments (which means it will have 2 nodes in arrNodes)
it returns the sum of ReturnValue of node[0] added to the ReturnValue of node[1].
Let's review the tutorial in a brief way and summarise
1. The user decides how the programs will appear :
This is done by setting all the properties, setting the set of functions, the set of values, and declaring variables
2. The computer randomises random programs :
the algorithm is doing it alone.
3. The computer gives to the user those random programs :
This is done by the event for evaluating a program's fitness.
4. The user gives input to the programs, runs them and takes the output. then he calculates how good the program is :
This is done by inserting the input to the programs (by setting the values of variables), by running the programs, and setting the fitness of the programs.
5. The library takes this fitness and works so that the next set of programs will be better than the previous :
The algorithm is doing that alone.
You also have some example projects that you can use to find out what I'm talking about.
Evolute parameters tips :
Before you begin messing with performance, please check that you have the property NumberOfThreads set as the number of cores in your CPU.
The first thing that affects the performance is the population size. No other property in BaseEngine class affects more than that.
More programs meaning having more ideas. More ideas meaning closer chance to discovering the solution.
If you must discover a solution, use more population. It will take much more time (linearly) though.
I can't say what is the best tradeoff between population size and problem complexity. And I did not research that yet.
But I can clearly say that you have bigger chance for finding a solution when evolving a bigger population.
Another thing that can affect performance is the number of islands, and how much population resides in each island.
I did not research it thoroughly, but clearly the statistics shows that having islands is better than not having islands,
And that the minimum island size must be at least 20 individuals.
The migration rate between the islands is another thing that can affect performance.
From one side of the coin, having less migrations means more possible solutions that are searching at once.
From the other side of the coin, having more migrations, means that each island is currently participating in reseraching the current solution further and aiding the current solution.
I did not research this too much, but from what I have found, the best migration rate is the number of islands for a regular evolution purposes, but if you don't care about the time taken, you can also put migration rate that is number of islands / 10, and from what I have found leads to slower convergence rate and better overall fitnesses after long evolution time.
You can also have migration rate that is number of islands / 100, if your'e aiming for evolutions that will take thousands of interations.
So for an example, if you have 100 islands, you'll want to have between : fast convergence rate - 100 migrations per generation, slow convergence rate - 10 migrations per generation, or extremely slow convergence rate - 1 migrations per generation.
Another second thing that can affect performance, is the property "Overselection"
If this property is set to a size which is like 1% of the population,
(and not 2 individuals but more) then a big speed up can occur,
Although, this speed-up is costy, the meaning of selecting only the best individuals from the population is costy. sometimes, the less good
individuals can have good ideas inside their code. throwing them away meaning throwing chances for better solutions in the long term.
It's up to your choice whether to make that sacrafice.
One of the major improvements of Evolute 0.2 is the introduction of Parsimony Pressure.
Using parsimony pressure, Evolute is optimizing program size at the same time that it optimizing the program fitness.
It's a good idea to set parsimony pressure using the following formula : <population size> / 1,000,000
More details are avaliable at the reference.
Dealing with chaostic environments and setting the parameters for such environments :
When dealing with chaostic environments, the environment may be the same (not evolving), and the environments may evolve (change it's difficulty).
When dealing with an environment that does not evolve, but stays at the same difficulty, there is a possibility that we are dealing with a chaostic environment.
In this kind of non-evolving chaostic environment, fitness values may change between programs and vary by a bit in a random way. However, despite this
randomality we still want our programs to evolve in an efficient way. If we want so, then we must first create a stable environment for fitness improvements to occur.
By using the property SaveTopIndividuals and set it to, for an example, 0.1, we are saving 10% of the individuals and prevent them from ongoing evolutionary operations,
Such as crossover, mutation and destruction. Still though, in a chaostic environment, their fitness value will be re-evaluated the next generation, and they will suddenly be
worse than other individuals, and will get placed out of the 10% safety ring into extinction. This kind of event can destroy good solutions to the problem, and we want
to prevent it. The way to prevent it using Evolute is to use OnceInHowManyGenerationsTopIndividualsShouldBeEvaluated property and set it to 20. In this way, only once
In 20 generations, those individuals in the safety ring will get re-evaluated and gives them time to evolve before getting extinct.
This doesn't prevent better individuals to be exchanged for the worse and take their place in the safety ring.
However, in environments that their difficulty is rising in rapidly, setting OnceInHowManyGenerationsTopIndividualsShouldBeEvaluated must be a low number.
Becuase 20 generations ago, those good individuals may not be so good now when the difficulty of the environment have rised. However, even in this case,
Evaluating those individuals in safety ring only once in 20 generations, is still good.
Other great performance tips, are related to the way you have programmed your fitness calculations, and functions inside your programs.
Recommended parameters values :
MinInitialTreeDepth = 2
MaxInitialTreeDepth = 4
MaxOverallTreeDepth = 30 (10 if parsimony pressure is disabled)
Overselection = 2
ChanceForCrossover = 0.9F
SaveTopIndividuals = 0.1F (population size must be higher or equal than 10 per island)
NumberOfPrograms = dependent on fitness evaluation complexity
NumberOfIslands = between NumberOfPrograms / 100 to NumberOfPrograms / 20
MutationChance = 0.1f
EnableParsimonyPressure = NumberOfPrograms / 1,000,000 or NumberOfPrograms / 10,000 for extremely slow fitness evaluation problems.
MigrationsPerMigrationGeneration = Between NumberOfIslands / 100 (very slow evolution, slow convergence) to NumberOfIslands (fast evolution, medium convergence)
OnceInHowManyGenerationsPerformMigrationGeneration = 1
OnceInHowManyGenerationsTopIndividualsShouldBeEvaluated = 20 (or infinity, in this case of non changing environments)
FitnessGoal = dependent on problem
ApplyCrossoverOnlyWithFitnessChange = false
NumberOfThreads = The number of cores in your CPU
NumberOfResults = depdendent on the problem, usually 1.
OnlyBetterFitnessIslands = false
PrecentageToTakeBetterIndividuals = 1.0f
Evolutionary style performance tips :
The style of the evolutionary process is extremely important for good fitness progression.
In this section I give tips on how to program your project so that the evolutionary performance will be better.
1. When building new functions, always build them that they will require the least number of arguments.
More arguments are un-neccessary and will cause overhead and large tree sizes.
2. When using functions, variables, and values. Always use the minimal set of functions, variables and values.
A larger set means larger search space and less chance for the evolutionary process to find a solution.
3. Provide a smooth fitness value : a fitness value that is an integer will not provide enough clues for the evolutionary process
To continue in the search process and it will not find a solution.
4. If you have an extremely complicated problem, separate it to steps. For an example : If you want a robot to be able to walk as fast as it can,
And also shoot at targets automatically on the way to the target - that would be an extremely complicated problem.
What I think it's better to do in this kind of case is to divide and conquer : First, optimize the programs for a robot that is able to walk fast,
And once you have some descent solution, trigger another stage at the fitness progression, and include the shooting objective as well.
5. If you have an extremely complicated problem, try solving only part of it. Don't solve all of it.
6. If you can, provide as "high level" information as you can to the evolving programs. For an example, trying to find a target in a maze could be
By giving an array of the maze matrix cells and expecting the evolutionary process to accurately understand the maze matrix and to understand
the target cell identification. Or, you could put a whole function that is dedicated in telling the distance to the target.
By providing a dedicated function
you are already doing some work for the evolutionary algorithm, so it won't have to figure out everything by it's own.
7. Don't expect the evolutionary process to find a needle in a haystack. For an example, don't provide an "array" of items to the programs, because
they will NOT analyze it and gain nothing from it. Always try to insert the information directly to the progarms. Another example is, imagine you have
A number of variables that you would like the program to use it as a sort of primitive memory. One important question would be : Which size of memory should I
supply to the programs? the answer is : The size that the programs won't need to search a needle in a haystack. Don't provide 100 variables and expect the
evolutionary process to read and write to the same memory cell out of the hundred memory cells. It will be extremely random and this kind of thing will never
happen. Provide at maximum, five memory cells.
7. Understand the limitations of genetic programming, and particularly, the Evolute library :
- Small scale of genetic programming was never shown to be able to solve complex problems - only toy problems.
- Even medium scale of genetic programming (using hundreds of computers) was only be able to solve not so complicated real world problems.
- Genetic programming will enter local minima very fast and won't get out of it. Therefore,
if you see that the evoutionary process did not solve much until generation 200, most chances is that it won't solve much more afterwards.