User guide of version 0.1

1. Introduction
2. Exmaples what can be done
3. general usage
4. Tips for better performance

 

1. Introduction

Have you ever thought about randomizing a code, and let the computer write those programs instead of you? Well... that's what genetic programming and genetic algorithm is all about - Letting the computer write the code for you.
When you look at a program, you see input and output. It's always working like that.
By using this library, you will define the input, and you will take the output.
Some programs that will be made by the computer will be better, and some will be worse. You can check how good a program is by giving it an input, and expecting a certain output. If the program didn't reach that output, then the program is bad! but, if the program came closer to the output. the program is better.
this grade, or mark , is called Fitness.
Let's summarise the role of the user and the library :

1. the user decides how the programs will appear.
2. the computer randomises random programs
3. the computer gives to the user those random programs
4. the user gives input to the programs, runs them and takes the output. then he calculates how good the program is.
5. the library takes this fitness and works so that the next set of programs will be better than the previous.

more on The user decides how the programs will appear :

Since the computer randomises programs, still, it needs some rules how to randomise and what to randomise.
A program in Evolute C# version 0.1 looks like this - a tree.
A tree contains a set of functions, variables and values.
A function in a tree takes parameters, which are values that will comes from the branches below, doing an action, and returning a result.
A variable in a tree has a value that can be changed before the program is being run. it can be accessed globally through the entire program.
A value in a tree is a const value that remained through the entire evolution process. it's only used to receive some const value when executing a function.
The user can decide what function, variables and values will appear in the random trees. he can also decide what size the trees will be.

2. Examples what can be done

I'm referring now to the first TestProject when you download Evolute.

Let's say we have a table, of X values, and a table of Y values.

I would like to find the connection between them. Like i said before, a program recieves an input and gives an output.
The connection will be a code. the code will receive X, and will output Y.
But i have a problem. I don't know the connection between X and Y. I just have examples. and i would like the computer to generate me a code that get's X and outputs Y even not for these examples, but also values outside these examples, like for a new input X that i invented.

X

Y

1

7

-5

-605

2

-34

9

..

4

..

-7

..

0

..

2.5

3

..

So can i find a program that i give it X and Y examples and it will give me code how to turn X into Y?
let's see.

 

            Engine engEngine = new Engine();

            engEngine.NumberOfPrograms = 1000;

            engEngine.MinInitialTreeDepth = 3;

            engEngine.MaxInitialTreeDepth = 7;

            engEngine.MaxOverallTreeDepth = 10;

            engEngine.FitnessGoal = Engine.FitnessGoalEnum.MIN_FITNESS;

            engEngine.Overselection = 10;

            engEngine.ChanceForCrossover = 1F;

            engEngine.SaveTopIndividuals = 0.01F;

 

Here i am creating a new Engine and set up some parameters.
You don't even have to set those parameters, but if you want to find out more you can look at the reference.

            engEngine.SetFunctions = new FunctionType[]

            { new Add(), new Substract(), new Multiply(), new Divide() };

            engEngine.SetValues = new TreeLanguageEvolute.ValueType[] {new RandomMacro()};

 

 

One of the most important things is to determine what will be in those trees.
I want to functions "Add, substract, multiply , and divide" to be in the trees.
Those functions are just like the ordinary +, -, * and / that we have in C#. they take 2 values and return a result.
by defining what functions i want in the trees, i give an option for the computer to use these functions in the creation of random programs.
I also set the values that i want to appear in the trees. RandomMacro means that in some tree nodes, random floats between 0 to 1 will appear there.
That can be useful, because it gives a chance to the computer to use some values in his programs. it can help the computer, because such random values can give help.
(By the way, on this example it actually worsens the process very much)
You can also give random values out of your own. If the program your'e trying to find, for an example, needs to find out how to calculate the diameter of a circle by itself , it's a good idea to put a value of PIE in there.

           

engEngine.DeclareVariable("X");

 

 

This is a declaration of a variable. By declaring the variable X, it will appear randomly on the tree nodes and then we will be able to give input to the programs before running them.

           

engEngine.EvalFitnessForProgramEvent += new Engine.EvalFitnessHandler(engEngine_EvalFitnessForProgramEvent);

            engEngine.GenerationIsCompleteEvent += new Engine.GenerationIsCompleteHandler(engEngine_GenerationIsCompleteEvent);

 

these are two events that can be invoked from the Engine class. The first event is a MUST. it must be invoked. because you must give a fitness to the programs.
the event Engine.EvalFitnessForProgramEvent is a MUST.
The second event, Engine.GenerationIsCompleteEvent is invoked when a generation of programs ends (when all the programs in a generation finished to run and you gave them fitness), that event can provide you statistics about the generation.

all the code so far :

namespace TestProject

{

    public partial class FunctionGuesserTestForm : Form

    {

        string strLabelString;

        public FunctionGuesserTestForm()

        {

            InitializeComponent();

            Engine engEngine = new Engine();

            engEngine.NumberOfPrograms = 1000;

            engEngine.MinInitialTreeDepth = 3;

            engEngine.MaxInitialTreeDepth = 7;

            engEngine.MaxOverallTreeDepth = 10;

            engEngine.FitnessGoal = Engine.FitnessGoalEnum.MIN_FITNESS;

            engEngine.Overselection = 10;

            engEngine.ChanceForCrossover = 1F;

            engEngine.SaveTopIndividuals = 0.01F;

            //engEngine.ApplyCrossoverOnlyWithFitnessChange = 0.1F;

            engEngine.SetFunctions = new FunctionType[]

            { new Add(), new Substract(), new Multiply(), new Divide() };

            engEngine.DeclareVariable("X");

            engEngine.SetValues = new TreeLanguageEvolute.ValueType[] {new RandomMacro()};

            engEngine.EvalFitnessForProgramEvent += new Engine.EvalFitnessHandler(engEngine_EvalFitnessForProgramEvent);

            engEngine.GenerationIsCompleteEvent += new Engine.GenerationIsCompleteHandler(engEngine_GenerationIsCompleteEvent);

            engEngine.RunEvoluteOnThread(10000);

 

            Timer tmTimer = new Timer();

            tmTimer.Interval = 1000;

            tmTimer.Tick += new EventHandler(tmTimer_Tick);

            tmTimer.Start();

        }

 

 

Now let's see what did i do in the engEngine_EvalFitnessForProgramEvent function.

 

 

        void engEngine_EvalFitnessForProgramEvent(TreeLanguageEvolute.Program progProgram, Engine sender)

        {

            progProgram.Fitness = 0;

            Hashtable hsVariables = progProgram.GetVariables();

            Variable varX = (Variable)hsVariables["X"];

 

            for (int nCaseNumber = 0;

                 nCaseNumber < 100;

                 nCaseNumber++)

            {

                // y = 5x^3+x^2+x

                varX.Value = (float)GlobalRandom.m_rndRandom.NextDouble();

                float fExpectedResult = (5 * varX.Value * varX.Value * varX.Value)

                    + (3 * varX.Value * varX.Value) + varX.Value;

                progProgram.Run();

                float fActualResult = progProgram.Result;

                progProgram.Fitness += Math.Abs(fExpectedResult - fActualResult);

            }

}

       

This event gives me the program i need to evaluate it's fitness, that is one program among the entire set of program.

First i initialise the fitness of the program to 0. why? later you will find out.
Next, i pull out the Variable X from the hashtable for easyness. The variable will be used in order to give input for the programs.
let's focus on the for loop, that's the most important part.

 

            for (int nCaseNumber = 0;

                 nCaseNumber < 100;

                 nCaseNumber++)

            {

                // y = 5x^3+x^2+x

                varX.Value = (float)GlobalRandom.m_rndRandom.NextDouble();

                float fExpectedResult = (5 * varX.Value * varX.Value * varX.Value)

                    + (3 * varX.Value * varX.Value) + varX.Value;

                progProgram.Run();

                float fActualResult = progProgram.Result;

                progProgram.Fitness += Math.Abs(fExpectedResult - fActualResult);

            }

The for loop runs through 100 cases. (or 100 examples)
in each iteration of the for loop, i randomly select X value, and i give it to the variable. The moment i give it to the variable (varX.Value = ...) it get's updated in the program tree.
afterwards, i generate the Y for the function 5x³+3x²+x.

So what, you may ask. i know the function, what's the point in all of this?! i know the connection, i don't need to create this program.
YES - but this is only example, not a real world problem, you may know the connection, but the computer don't know. It only knows examples.
instead of writing :

 

                float fExpectedResult = (5 * varX.Value * varX.Value * varX.Value)

                    + (3 * varX.Value * varX.Value) + varX.Value;

I could've easily taken X and Y examples from a file. these are 100 examples randomly generated of X and Y.
after we generate an example of X and Y, and we set the input (by setting the variable value of X), it's time to run the program.

 

 

                progProgram.Run();

                float fActualResult = progProgram.Result;

I run the program, then i take the result that the program gave me for the input i gave.
we do have the original result (the Y) . so we can find out how the computer got it wrong. the difference between the Y (which we know for sure it's the output we wanted, because we made it) - and the output that the program gave, is the fitness that we can give to the program.
that means, the less this fitness is, the better the program because it reaches more and more to the real output we wanted.

 

                progProgram.Fitness += Math.Abs(fExpectedResult - fActualResult);

This just sums up what i said above. all the differences for all the 100 cases we generated is the fitness of the program. (this is in order to give the best grade - it's possible that one program will randomly succseed in one example and will fail badly in other. the sum of 100 cases makes it more general so now it will be more difficult for programs to randomly succseed)

One thing we need to take care of is to set the goal of the evolution towards minimal fitness. we can do that in this code, before the algorithm runs.

 

               engEngine.FitnessGoal = Engine.FitnessGoalEnum.MIN_FITNESS;

This is also appearing in the above complete code example.

The entire code :

void tmTimer_Tick(object sender, EventArgs e)

        {

            this.label1.Text = strLabelString;

        }

 

        void engEngine_GenerationIsCompleteEvent(Statistics stsStatistics, Engine sender)

        {

            this.strLabelString = "Generation number " + stsStatistics.GenerationNumber

                                + " : min fitness = " + stsStatistics.MinFitnessProgram.Fitness

                                + ", Total nodes = " + stsStatistics.TotalNodes;

            try

            {

                Graphics g = this.CreateGraphics();

                stsStatistics.MinFitnessProgram.Draw(g, this.Width, this.Height, this);

            }

            catch (System.Exception ex)

            { 

            }

        }

 

        void engEngine_EvalFitnessForProgramEvent(TreeLanguageEvolute.Program progProgram, Engine sender)

        {

            progProgram.Fitness = 0;

            Hashtable hsVariables = progProgram.GetVariables();

            Variable varX = (Variable)hsVariables["X"];

 

            for (int nCaseNumber = 0;

                 nCaseNumber < 100;

                 nCaseNumber++)

            {

                // y = 5x^3+x^2+x

                varX.Value = (float)GlobalRandom.m_rndRandom.NextDouble();

                float fExpectedResult = (5 * varX.Value * varX.Value * varX.Value)

                    + (3 * varX.Value * varX.Value) + varX.Value;

                progProgram.Run();

                float fActualResult = progProgram.Result;

                progProgram.Fitness += Math.Abs(fExpectedResult - fActualResult);

            }

 

//             if (progProgram.m_fFitness < 0.01F)

//             {

//                 progProgram.m_fFitness =

//                     0.00001F * progProgram.m_arrRoots[0].Count;

//             }

        }

 

So what do we have so far? over many generations that the program's fitness will be generated, the programs will get close and closer to discovering the mathematical function 5x³+3x²+x.

I want to take a moment to explain about the other event, Engine.GenerationIsCompleteEvent.
This event is called at the end of every generation and you can assign to that event in order to get statistics about the generation.

 

        void engEngine_GenerationIsCompleteEvent(Statistics stsStatistics, Engine sender)

        {

            this.strLabelString = "Generation number " + stsStatistics.GenerationNumber

                                + " : min fitness = " + stsStatistics.MinFitnessProgram.Fitness

                                + ", Total nodes = " + stsStatistics.TotalNodes;

            try

            {

                Graphics g = this.CreateGraphics();

                stsStatistics.MinFitnessProgram.Draw(g, this.Width, this.Height, this);

            }

            catch (System.Exception ex)

            { 

            }

        }

We can get from the Statistics object, info about the generation. such as the generation number, the total nodes in the generation, and also we can get the minimal and maximal programs. (we can also run these programs, and draw them, like in this example)

Test project 3 :

Test project 3 is about a robot that can rotate itself, and move forward. there are walls around, and it can't cross them. the goal of the robot is to reach a destination point by navigating around the walls. The fitness computation that i did in this example, is the distance between the robot location to the destination location.
The programs that managed to get as close as possible to the destination, are better, that's why they got lower fitness.

Anyways, I'm not going to explain now the entire process , like i did with test project 1. But i still going to point out few things :

engEngine.SetFunctions = new TreeLanguageEvolute.FunctionType[]

            { new Add(), new Substract(), new Multiply(), new Divide(),

            new IfGreaterThenElse(), new WillIntersect(this.Width, this.Height)};

Put attention to the functions in the function set. The IfGreaterThenElse is an "if" function, that let's the random programs take a conditional path around the tree.
A more advanced option is when you want to include functions that are not avaliable in functions that the library includes.
In order to do that, you must create them yourself, by inheriting from the Function class.
This is an example how it's being done :

    class WillIntersect : Function

    {

        int m_nScreenWidth;

        int m_nScreenHeight;

 

        public override int NumberOfArguments

        {

            get

            {

                return (1);

            }

        }

 

        public override string Name

        {

            get

            {

                return ("Intersects");

            }

        }

 

        public WillIntersect(int nScreenWidth, int nScreenHeight)

        {

            this.m_nScreenWidth = nScreenWidth;

            this.m_nScreenHeight = nScreenHeight;

        }

 

        public override float action(Node[] arrNodes, TreeLanguageEvolute.Program progOwner)

        {

            Hashtable hsVariables = progOwner.GetVariables();

            float fPlayerX = ((Variable)hsVariables["CurrPosX"]).Value * this.m_nScreenWidth;

            float fPlayerY = ((Variable)hsVariables["CurrPosY"]).Value * this.m_nScreenHeight;

            float fAngle = ((Variable)hsVariables["CurrAngle"]).Value;

            Player plrPlayer = new Player(new PointF(fPlayerX, fPlayerY));

            plrPlayer.Angle = fAngle;

            if (plrPlayer.GoingToIntersect(NavigationTestForm.m_lstRects))

            {

                return (1);

            }

            else

            {

                return (-1);

            }

        }

    }

}

The function class is an abstract class that you inherite from it, you must overload some functions and implement them.
The properties that you must implement :

The NumberOfArguments property means, how much arguments your function takes (or uses). In the function in the above example, it takes 1 arguments.
The Name property is just a name to give to the function.

The functions that you must implement :

        public override float action(Node[] arrNodes, TreeLanguageEvolute.Program progOwner)

        {

        }

Is the function you must implement. When the function is called, you will receive an array of tree nodes, and a program. You can use the program parameter in order to get information from the program itself.
However, the parameter you will mainly use will be the arrNodes parameter. for every node, you can return it's value.
You can return a value from node[0] by calling arrNodes[0].ReturnValue(..)
the NumberOfArguments property will affect how many elements will be inside the array arrNodes that will be passed into the function.

WARNING : as of version 0.1, it may not be good to have a function that takes 0 parameters because it can interefere with the tree limits.

Another example of a function, which actually uses the nodes, is the "Add" function.

/// <summary>

    /// The add function, Needs two tree sons, adds them both and returns

    /// A result as the sum of the both sons

    /// </summary>

    public class Add : Function

    {

        public override float action(Node[] arrNodes, Program progOwner)

        {

            return (arrNodes[0].ReturnValue(progOwner) + arrNodes[1].ReturnValue(progOwner));

        }

 

        public override int NumberOfArguments

        {

            get

            {

                return (2);

            }

        }

 

        public override string Name

        {

            get

            {

                return ("+");

            }

        }

    }

As you can see, it takes 2 arguments (which means it will have 2 nodes in arrNodes)
it returns the sum of ReturnValue of node[0] added to the ReturnValue of node[1].

General usage :

Let's review what i said at first, because that will be clearer now.

1. the user decides how the programs will appear
this is done by setting all the properties, setting the set of functions, the set of values, and declaring variables
2. the computer randomises random programs
the algorithm is doing it alone.
3. the computer gives to the user those random programs
This is done by the event for evaluating a program's fitness.
4. the user gives input to the programs, runs them and takes the output. then he calculates how good the program is.
This is done by inserting the input to the programs (by setting the values of variables), by running the programs, and setting the fitness of the program.
5. the library takes this fitness and works so that the next set of programs will be better than the previous.
the algorithm is doing that alone.

you also have 3 example projects that you can use to find out what i'm talking about.

Tips for better performance :

The first thing that affects the performance is the population size. no other property in Engine class affects more than that.
more programs meaning having more ideas. more ideas meaning closer chance to discovering the solution.
If you must discover a solution, use more population. it will take much more time though.

Another second thing that can affect performance, with such easy "one minute tweaks", is the property "Overselection"
If this property is set to a size which is like 1% of the population, (and not 2 individuals but more) then a big speed up can occur,
although, this speed-up is costy, the meaning of selecting only the best individuals from the population is costy. sometimes, the less good
individuals can have good ideas inside their code. throwing them away meaning throwing chances for better solutions in the long term.
It's up to your choice whether to make that sacrafice.

Other good things to know is the limitation of genetic programming.
The genetic programming may look like magic at first, but facts are it's quite limited.
It's actually very limited in Evolute C# version 0.1 because the automatic function discovery can improve things a lot.
You just can't shove things to the algorithm and say "solve it!". so lower your expectations already now.
Things that you should know is that there is no point at waiting above generation 100, on population of 1000 programs.
the more you wait and the programs still perform bad - the less chance that they will ever get better.
if you still want to do something complex by the genetic programming algorithm,
it's better that you will create as much functions as possible that will do most of the work for them.
for an example, if you plan on trying to evolve an artificial intelligence that's moving in a maze, it's good to have a function named "WallAhead" that will return 1 if there is a wall ahead or -1 if not. Don't try to give an array of walls to the program so it will find it itself (because it won't), And also, give to the computer as much information that he needs to know. too much information and the performance won't be good.


one of the other things that should guide you is : "What is the chance, that a completely random program will do something a bit good?"
your fitnesses must be as discriptive as possible. the more information you give through the fitness , the more helpful you are in filtering out the bad programs.
If the fitness is equal to all the random programs - you have already failed. that should be the guide to place in your head - do something that a completely random program CAN be "a bit" better than the rest.

Also , don't expect for the programs to know something that only 1 out of a milliard random program is able to find out.
Like, don't expect programs to find out alone, that they will need a data that is found at the index of 9251 of an array that you give to them. That's a really hard function, because the chance that one program will activate it just in the right time is reaaaaaaly reaaaaally slim.