The aim of this project is to create controllers for autonomous soccer clients using unsupervised learning. The following technologies were applied:
- Neural Networks
- Genetic Algorithm
The training environment is a simulator based on the sserver of the RoboCup 2D soccer simulation league. In our case it uses only 3 players per team and a reduced set of flags.
The learning process (not only of the players but also of the author) is divided into iterations. The results of each iteration are analysed and documented, and that analysis should lead to a better configuration or even new principles in the next iteration.
External Links
- Simulator: vsoc-2007
- Sourcecode: vsoc-ga-2018
- Neural Nets: deeplearning4J
- Genetic Algorithm: Is part of vsoc-ga-2018
Some expressions I use in the following text might not be immediately understandable:
- Stepwise fitness function: A fitness function that does not focus only on the final goal of the game. E.g. the players are trained to kick the ball before they are trained to score goals (which is the final goal, by the way).
- Same as Iteration 5
- Lower max values for kicksMin and otherGoalsMax
+ math.min(500, data.kicksMax)
+ math.min(10000, data.kicksMin * 100)
- data.kickOutMean
+ math.min(2000, data.otherGoalsMax * 500)
+ data.otherGoalsMin * 1000
- data.ownGoalsMean * 500
To get an overview of the expected results, the maximum values of the elements of the fitness function are calculated.
Element | +/- | Weight per action | Max actions | Max value |
---|---|---|---|---|
kicksMax | + | 1 | 500 | 500 |
kicksMin | + | 100 | 100 | 10,000 |
kickOutMean | - | 1 | - | - |
otherGoalsMax | + | 500 | 5 | 2,000 |
otherGoalsMin | + | 1000 | - | - |
ownGoalsMean | - | 500 | - | - |
table 051
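The iterations described here differ only in the caps on kicksMin and otherGoalsMax, so the fitness function can be sketched with those two caps as parameters. This is a minimal sketch, not the code from vsoc-ga-2018: the names `TeamData`, `Caps` and `CappedFitness` are assumptions made for illustration, only the terms of the formula are taken from the text.

```scala
object CappedFitness {

  // Hypothetical container for the per-team statistics. The field names
  // mirror the terms of the formula; the class itself is an assumption.
  final case class TeamData(
      kicksMax: Double,
      kicksMin: Double,
      kickOutMean: Double,
      otherGoalsMax: Double,
      otherGoalsMin: Double,
      ownGoalsMean: Double)

  // The two caps that change between the iterations (hypothetical type).
  final case class Caps(kicksMin: Double, otherGoalsMax: Double)

  // Same terms as the formula, with the two caps passed in.
  def fitness(data: TeamData, caps: Caps): Double =
    math.min(500.0, data.kicksMax) +
      math.min(caps.kicksMin, data.kicksMin * 100) -
      data.kickOutMean +
      math.min(caps.otherGoalsMax, data.otherGoalsMax * 500) +
      data.otherGoalsMin * 1000 -
      data.ownGoalsMean * 500
}
```

For a made-up team with kicksMax 300, kicksMin 150, kickOutMean 20, otherGoalsMax 6, otherGoalsMin 0 and ownGoalsMean 1, the higher pair of caps (50,000 and 5,000) gives a fitness of 17,780 and the lower pair (10,000 and 2,000) gives 11,780. The lower caps are reached earlier, so further kicks of the worst player stop paying off sooner.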
- Same as Iteration 4
- Improved fitness function of Iteration 4
+ math.min(500, data.kicksMax)
+ math.min(50000, data.kicksMin * 100)
- data.kickOutMean
+ math.min(5000, data.otherGoalsMax * 500)
+ data.otherGoalsMin * 1000
- data.ownGoalsMean * 500
To get an overview of the expected results, the maximum values of the elements of the fitness function are calculated.
Element | +/- | Weight per action | Max actions | Max value |
---|---|---|---|---|
kicksMax | + | 1 | 500 | 500 |
kicksMin | + | 100 | 500 | 50,000 |
kickOutMean | - | 1 | - | - |
otherGoalsMax | + | 500 | 10 | 5,000 |
otherGoalsMin | + | 1000 | - | - |
ownGoalsMean | - | 500 | - | - |
table 051
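The capped rows of the table above follow from the weight and the cap of each element: the cap is reached after cap / weight actions. The following is a small sketch of that arithmetic. `Element` and `MaxValues` are hypothetical names; the weights and caps are the ones listed in the table.

```scala
object MaxValues {

  // One capped element of the fitness function (hypothetical type).
  final case class Element(name: String, weight: Int, cap: Int) {
    // Smallest number of actions at which the cap is reached (rounded up).
    def actionsToCap: Int = (cap + weight - 1) / weight

    // Contribution of this element for a given number of actions.
    def value(actions: Int): Int = math.min(cap, actions * weight)
  }

  // Weights and caps of the capped rows of the table.
  val elements: Seq[Element] = Seq(
    Element("kicksMax", 1, 500),
    Element("kicksMin", 100, 50000),
    Element("otherGoalsMax", 500, 5000))

  def main(args: Array[String]): Unit =
    elements.foreach { e =>
      println(s"${e.name}: cap ${e.cap} reached after ${e.actionsToCap} actions")
    }
}
```

Elements without a cap (kickOutMean, otherGoalsMin and the own goal penalty) have no maximum, which is why their rows contain no numbers.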
There were 10 independent testruns, all configured with exactly the same configuration.
Names of the testruns: 'work001', 'work002', 'work003', 'work004', 'work005', 'work006', 'bob001', 'bob002', 'bob003', 'bob004'.
Looking at the timelines of the fitness value and some other parameters shows that the testruns can be assigned to two categories.
- Kickers: All players kick the ball.
- Goalgetters: One player of the team scores goals. The other players do not contribute to the fitness value.
Category | Testrun |
---|---|
Kickers | 'work001', 'work002', 'work004', 'bob001' |
Goalgetters | 'work003', 'work005', 'work006', 'bob002', 'bob003', 'bob004' |
Testruns in the kicker category improve the capability of all players to kick the ball. They do not improve any of the other capabilities, e.g. scoring goals. That means they are optimizing the number of kicks of the worst player, and thereby the number of kicks of all players.
The following diagram shows that the score (fitness value) is only created from 'kicksMin'.
The max score value for kicksMin is 50,000 (see table 051), which is not yet reached by any of the testruns.
Video: work001 ▶
Testruns in the goalgetter category improve the capability of one player to score goals. The other players of a team do not contribute anything to the fitness.
The following diagram shows that the score is only created from goalsMax.
The max score value for goalsMax is 5,000 (see table 051), which is almost reached by some testruns ('bob004', 'work005', ...). Although the max value was almost reached, none of the simulations started to optimize another parameter. They seem to be stuck in a local maximum.
Video: bob004 ▶
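To see why the score of a testrun can come from a single element, the fitness value can be split into the contribution of each element. The sketch below does that with the caps from table 051. The two example teams are made up for illustration and are not measured data, and `TeamData`, `ScoreBreakdown`, `contributions` and `dominant` are hypothetical names.

```scala
object ScoreBreakdown {

  // Hypothetical container for the per-team statistics.
  final case class TeamData(
      kicksMax: Double,
      kicksMin: Double,
      kickOutMean: Double,
      otherGoalsMax: Double,
      otherGoalsMin: Double,
      ownGoalsMean: Double)

  // Signed contribution of every element, using the caps of table 051.
  def contributions(d: TeamData): Seq[(String, Double)] = Seq(
    ("kicksMax", math.min(500.0, d.kicksMax)),
    ("kicksMin", math.min(50000.0, d.kicksMin * 100)),
    ("kickOutMean", -d.kickOutMean),
    ("otherGoalsMax", math.min(5000.0, d.otherGoalsMax * 500)),
    ("otherGoalsMin", d.otherGoalsMin * 1000),
    ("ownGoalsMean", -(d.ownGoalsMean * 500)))

  // Name of the element that contributes most to the score.
  def dominant(d: TeamData): String = contributions(d).maxBy(_._2)._1

  // Made-up teams: every player kicks vs. a single player scores.
  val kickers: TeamData = TeamData(120.0, 80.0, 10.0, 0.0, 0.0, 0.0)
  val goalgetters: TeamData = TeamData(60.0, 0.0, 5.0, 9.0, 0.0, 0.5)
}
```

For the made-up kicker team almost the whole score comes from kicksMin, for the made-up goalgetter team from otherGoalsMax. In both cases the worst scoring player contributes nothing through otherGoalsMin.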
- Players are trained independently.
- Improved stepwise optimizing fitness function.
+ math.min(100, data.kicksMax)
+ math.min(2000, data.kicksMean) * 10
+ math.min(5000, data.kicksMin * 100)
- math.min(1000, data.kickOutMean)
+ math.min(10000, data.otherGoalsMax * 500)
+ math.min(80000, data.otherGoalsMin * 1000)
- math.min(10000, data.ownGoalsMean * 500)
Reformatted for better understanding.
The function has rewards for kicking and goal scoring and gives penalties for kick-outs and own goals.
All rewards and penalties are given for a team of three players during the test of one generation. During this test a team plays several matches, and all values are averages over these matches. E.g. if team A played 3 matches and its best kicker kicked the ball 5 times in the first match, 6 times in the second match and 7 times in the third match, the kicksMax value for that team would be 6.
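The averaging over matches can be sketched as follows. The layout of the input (one sequence of kick counts per match, one entry per player) and the names are assumptions. The kick counts of the best kicker (5, 6, 7) are the ones from the example above, the counts of the other two players are made up. The sketch reads kicksMax as the per-match maximum averaged over all matches, which is one possible reading of the description.

```scala
object MatchAggregation {

  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // kicksPerMatch(m)(p) = number of kicks of player p in match m.
  def kicksMax(kicksPerMatch: Seq[Seq[Int]]): Double =
    mean(kicksPerMatch.map(_.max.toDouble))

  def kicksMin(kicksPerMatch: Seq[Seq[Int]]): Double =
    mean(kicksPerMatch.map(_.min.toDouble))

  def kicksMean(kicksPerMatch: Seq[Seq[Int]]): Double =
    mean(kicksPerMatch.map(m => m.sum.toDouble / m.size))

  // Three matches of one team with three players.
  val matches: Seq[Seq[Int]] = Seq(Seq(5, 2, 0), Seq(6, 3, 1), Seq(7, 1, 2))

  def main(args: Array[String]): Unit =
    println(s"kicksMax = ${kicksMax(matches)}")
}
```

With these numbers kicksMax is (5 + 6 + 7) / 3 = 6, as in the example in the text.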
The reward for kicking is split into three elements: the number of kicks of the best kicking player (kicksMax), the average number of kicks of all players (kicksMean) and the number of kicks of the worst player (kicksMin).
In order to avoid the scenario of having one player that optimizes his kicking skills while the other two players are not trained at all, these values are weighted. A higher reward is given to kicksMean (10 times higher) and kicksMin (100 times higher). To avoid getting teams that optimize their kicking skills but do not focus on scoring goals, all three values are capped. Again the maximum value for kicksMin is the highest, to force all three players to develop their kicking skills.
Penalties are given for kicking the ball out. This should train the players to keep the ball inside the field.
For the goal reward, only the goals of the best scoring player (otherGoalsMax) and those of the worst scoring player (otherGoalsMin) are taken into account. Goals are in general weighted much higher than kicks to favour the main purpose of the game. To get teams where all players score goals, goals of the best player are weighted less than those of the worst player.
The values for scoring are capped (why did I do that?).
Penalties are given for scoring own goals.
To get an overview of the expected results the max values of the elements of the fitness function are calculated.
Element | +/- | Max actions | Max value |
---|---|---|---|
kicksMax | + | 100 | 100 |
kicksMean | + | 200 | 2,000 |
kicksMin | + | 50 | 5,000 |
kickOutMean | - | 1,000 | 1,000 |
otherGoalsMax | + | 20 | 10,000 |
otherGoalsMin | + | 80 | 80,000 |
ownGoalsMean | - | 20 | 10,000 |
(Why did I not make these calculations before the testrun?)
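Because every element of this fitness function is capped, the fitness of a team is bounded in both directions. The sketch below groups the terms into kick rewards, goal rewards and penalties and uses the maximum values from the table above. `TeamData` and `Iteration4Fitness` are hypothetical names. The kicksMean term is capped at 2,000 here, following the table; the listing above applies the factor 10 outside of `math.min`, so this reading is an assumption.

```scala
object Iteration4Fitness {

  // Hypothetical container for the per-team statistics.
  final case class TeamData(
      kicksMax: Double,
      kicksMean: Double,
      kicksMin: Double,
      kickOutMean: Double,
      otherGoalsMax: Double,
      otherGoalsMin: Double,
      ownGoalsMean: Double)

  def fitness(d: TeamData): Double = {
    // Rewards for kicking, at most 100 + 2,000 + 5,000.
    val kickReward =
      math.min(100.0, d.kicksMax) +
        math.min(2000.0, d.kicksMean * 10) + // assumption: cap taken from the table
        math.min(5000.0, d.kicksMin * 100)
    // Rewards for scoring, at most 10,000 + 80,000.
    val goalReward =
      math.min(10000.0, d.otherGoalsMax * 500) +
        math.min(80000.0, d.otherGoalsMin * 1000)
    // Penalties for kick-outs and own goals, at most 1,000 + 10,000.
    val penalty =
      math.min(1000.0, d.kickOutMean) +
        math.min(10000.0, d.ownGoalsMean * 500)
    kickReward + goalReward - penalty
  }
}
```

Under these assumptions the fitness lies between -11,000 (no rewards, both penalties at their cap) and 97,100 (all rewards at their cap, no penalties). Of that maximum, 80,000 can only be earned through goals of the worst scoring player.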
- All three players learn how to kick
- All three players keep the ball inside the field
- All three players score goals
- All three players do not score own goals
Ten independent populations were tested. They all had exactly the same parameters. Only the start values of the neural nets were randomly chosen.
The names of these populations are:
bob001, bob002, bob003, bob004,
work001, work002, work003, work004, work005, work006
bob and work are the names of the workstations used.
In the following we analyse whether, and to what degree, the postulated training goals were attained.
The following diagram shows that the populations can be fit into three categories
- One Kicker [OK]
- All Kickers [AL]
- One Goalgetter [OG]
In OK teams one player hits the ball as often as he can. The hitting player moves slowly towards the ball and hits it very softly so that he can hit it again and again. The other players do not kick the ball. NOT WHAT WE WANTED.
In AL teams all players are kicking the ball. They are not very focused on that aim, but at least all players are kicking. SOMEHOW WHAT WE WANTED TO GET.
The OG category represents teams where one player is scoring goals but the other two are not evolving at all. NOT WHAT WE WANTED.
One of the main aims of the simulation is to breed teams where all players are scoring goals. In this iteration most populations are stuck in local optima where the players do not fulfill the main aim.
The only populations where all players are actually trained (category AL) are unsatisfying, as scoring goals is not really what the players have learned.
As a consequence, the fitness function of further iterations should reward goals of the worst player without limitation.
Another consequence is that the max value for the worst player must be higher. The max action count for that element of the fitness function must be much higher than 20 (for details see the table above).
Rewarding of mean values seems to have no positive effect and should not be a separate element of the fitness function any longer.
The goal of this iteration was to find a better fitness function that avoids the pitfalls of the previous iteration, mainly the breeding of teams that have only one evolving player. To achieve that goal, the max value for the best kicking player was reduced.
This diagram shows that some populations stopped increasing their fitness between 1000 and 1500 generations: two of them at a relatively high level of about 7000 (bob002, work003), five at a lower level of 3000 to 5000 (bob001, work001, work002, work005, work006). bob003 and work004 still show increasing fitness at a high level. bob004 also shows continuously increasing fitness, although it did not increase very fast at the beginning.
This diagram shows two classes of populations: populations where all players are kicking the ball [ALL] and others where only one player kicks the ball [ONE].
Class | Populations |
---|---|
ALL | bob002, bob003, work003, work004 |
ONE | bob001, bob004, work001, work002, work005, work00 |
The populations with continuously increasing fitness can be found in both groups. bob004 gets increasing fitness by improving exactly one player. bob003 and work004 have all players kicking the ball, which was the goal of this iteration.
In that diagram you can watch how the scoring of goals evolves. Of course goal scoring increases as kicking increases, because the players hit the goals by chance. Therefore we also observe the number of own goals. If this number is lower than the number of goals, the players tend to score more goals than own goals (good). At the end of the training there is only one population that fulfills that task (work006). Other populations were scoring goals for a certain time but then did not improve that (very important) behaviour any longer (bob001 generation 200 to 1500, bob004 generation 700 to 1400).
bob001 ▶ bob002 ▶ bob003 ▶ bob004 ▶ work001 ▶ work002 ▶ work003 ▶ work004 ▶ work005 ▶ work006 ▶
bob001 ▶ One player per team is kicking. Other players have little to no interest in the ball.
bob002 ▶ One player per team is kicking and scoring goals. Own goals are avoided. Other players have no interest in the ball.
bob003 ▶ One player per team is kicking and scoring goals. Own goals are avoided. Other players have no interest in the ball.
bob004 ▶ One player per team is kicking. No goals. Other players have no interest in the ball.
work001 ▶ One player per team is kicking. No goals. Other players have no interest in the ball.
work002 ▶ One player per team is kicking and scoring goals. Own goals are avoided. Other players have no interest in the ball.
work003 ▶ One player per team is kicking and scoring some goals. Other players have no interest in the ball.
work004 ▶ One player per team is kicking and scoring goals. Own goals are avoided. Other players have no interest in the ball.
Only one player per team learns kicking and (sometimes) scoring goals. NOT EXACTLY WHAT WE WANTED.
Two more videos with unknown parameter sets