PhD Progress: Bug Fix Improvements

Posted by Sam on August 17th, 2010 under PhD Project  •  No Comments

Seems that little bug fix was what was holding the agent back from attaining better results. Of course, the bug wasn’t always present, I know I introduced it recently, but it’s good to see that the agent is back on its feet – even without the presence of useful fromGhost rules.

Here are the results of the experiment, 62.78% (78.39%) into the first run. Note: this has been updated to show later results, also of two experiments.

While these are promising, they come at a high (possibly negotiable) price: time. The experiment has so far taken 94 and a half hours. Out of that time, 75 hours is learning time. That’s almost 4 days of runtime and just over 3 days of learning time, and it’s not even half complete on the first run. As said in the previous post: this is likely a combination of the changes to the learning rate and the fact that the better PacMan does, the longer it takes. According to the ETA, the first run will be complete in 5 more days… Assuming 10 days of learning per run, that’s 100 days across all 10 runs to simply produce some reliable results. 3 and a third months!

So something needs to be done. Symphony, shorter learning algorithm, optimised Ms. PacMan environment, any of them. Otherwise my PhD will be spent waiting on possibly unstable experiments to complete.

The readable policy goes like this:
A typical policy:
(distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 0.0 10.0)) (pacman player) => (fromGhost ?X ?__Num7) / (distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 0.0 52.0)) (pacman player) => (fromGhost ?X ?__Num7)
(distancePowerDot player ?X ?__Num1&:(betweenRange ?__Num1 0.0 1.0)) (pacman player) => (toPowerDot ?X ?__Num1) / (distancePowerDot player ?X ?__Num1&:(betweenRange ?__Num1 0.0 50.0)) (pacman player) => (toPowerDot ?X ?__Num1)
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)
(distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 43.0 50.0)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 32.75 43.0)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 12.25 22.5)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 2.0 12.25)) (pacman player) => (fromPowerDot ?X ?__Num2)
(junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -11.0 0.0)) => (toJunction ?X ?__Num4) / (junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -16.0 -11.0)) => (toJunction ?X ?__Num4)

Note that some rules are not present. Primarily the agent runs from ghosts, though only ghosts within 10 units (smart move, if they are hostile). Then the agent eats powerdots to keep the ghosts pacified, though the first rule of this slot seems a little useless, as the agent will rarely be 1 unit from a powerdot, and when it is, it’ll probably be eating it anyway. Perhaps it is used to counterbalance an overarching fromPowerDot rule. Following this is the toDot slot, times three. The agent clearly likes to stack that rule, with all three rules of the slot being active. This will result in the agent always pursuing dots, but when dots are 0-13 units away, the agent pursues them with gusto. The fromPowerDot rule seems a little useless, running from distant powerdots, but perhaps it is just how the agent copes with being forced to use that rule (it is likely to disappear with slot removal as it has a selection chance of 0.58). The junctionSafety one doesn’t seem to be useful either, perhaps for the same reason as the previous slot.
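To make the stacking concrete, here is a minimal sketch that counts how many of the three toDot rules in one slot fire at a given dot distance. The ranges are copied from the policy above; the helper names are made up for illustration, and treating betweenRange as inclusive at both ends is an assumption, not something confirmed by the agent's code.

```python
# Ranges of the three toDot rules in one slot, copied from the policy above.
TO_DOT_RANGES = [(1.0, 13.0), (0.0, 52.0), (0.0, 1.0)]


def between_range(value, low, high):
    """Hypothetical stand-in for betweenRange (inclusive bounds assumed)."""
    return low <= value <= high


def active_rules(distance, ranges=TO_DOT_RANGES):
    """Count how many rules of the slot fire for a dot at this distance."""
    return sum(between_range(distance, low, high) for low, high in ranges)


for distance in (0.5, 5.0, 20.0):
    print(distance, active_rules(distance))
```

Under that assumption, a dot 5 units away triggers two of the three rules, while a dot 20 units away triggers only the catch-all 0-52 rule, which matches the "with gusto" behaviour for nearby dots.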

When I checked in on the agent previously on the weekend, the fromGhost rule was not the top rule; I think toPowerDot was. Or perhaps toFruit (which is strangely missing – its selection ratio was too low, probably caused by the fact the rule rarely triggers). So I’m picking that the point in the graph where performance takes off is where fromGhost was most actively used.

PhD Progress: Ghost Rule Mutation

Posted by Sam on August 16th, 2010 under PhD Project  •  No Comments

Well, after fixing up the bug in the environment, PacMan is performing better, but is still limited at the ~4000 point mark. This is likely because there are no useful rules regarding ghost avoidance/chasing. I’m guessing either the pre-goal was never met by the optimal policy, or the goal was met with differing states: edible and non-edible ghosts. Furthermore, it seems like I have made it slower. Whether this is a result of it performing better, or because I changed the number of steps required, I am not sure. I have a feeling both may be involved, but at least I can modify the latter. I should still have the old code for it somewhere in the SVN repository.

The pre-goal model is useless for these sorts of heuristic mutations, so I will likely have to introduce a new algorithm for creating mutations: either with or without the existing pre-goal one. Perhaps maintaining a tree of rules, with the covered rule as the base rule and more specialised rules running down from it. This tree needs to be prunable too. This is beginning to look a lot like TG, though.

A secondary problem is the fact that actions not necessarily directly concerned with objects in the domain may still perform better with conditions linked to those objects (e.g. toPowerDot only when ghosts are not edible). If actions didn’t have delayed effects, this could easily be found by watching the post-action state, but alas, toPowerDot could be active at any point, regardless of PacMan’s distance to a powerdot. As far as I am concerned at the moment, I cannot see a solution to this problem without expensive state watching. Perhaps just the to/fromGhost actions will have to suffice for now.

D&D – Multi-Culture Pals: The Triple Ruby Inn

Posted by Sam on August 14th, 2010 under D&D  •  No Comments

I write this journal so that my memories may be recorded and inscribed upon the world, however insignificant that inscription may be. In this jumble, this maze of thoughts within my mind, writing it into this journal allows me to resolve the chaos that we so often find ourselves in.

I, Dimatrüs, arrived in Sharn with Cecilia and Dolfer, seeking a profitable job offered by the city’s university. While the job does not sound as though it will be helping the world in the ways I wish it to, we nonetheless have found ourselves needing coin, and this job was the most lawful one we could find. Against the grime of this city, Cecilia shines brightly with her pale white Changeling visage, a sight my underground-trained eyes often shy away from. I do not know why she does not take the shape of a human, as it would attract much less unwanted attention. And Dolfer contrasts just as much, though certainly in a very different manner. Many peasants point and stare at his odd Warforged body construction of metal and plant, not to mention the fact his gaze often lingers uncomfortably long on those who get too near him. I fear he has not yet fully adjusted to society and the presence of people. But still, the stares are not just focused on my companions. The people of Sharn, like the people across the rest of the land, often glance up at my own towering height and foreign race. The brothers of my Minotauren race do not do well to inspire acceptance, with the large majority of them choosing a life of evil over my own choice of justice.

After passing through the lower levels we arrived at the university on the Upper Level of Sharn, and I must say that never have I seen such marvellous, labyrinthian construction. The towers of Sharn rise high above the ground below and it is surely through the work of magic that they do not fall. The University itself is grand, though according to Cecilia it is only an average one compared to those in other cities.

I constantly find myself amazed with the devices utilised in these cities, as the serving woman in the main hall was using some sort of magical device for quickly accessing information. I only wish I had such a device to record my thoughts in, though perhaps I would find myself lost within its complexities. She guided us to our employer’s office, though once we arrived we found that this Keith Wyatt was a none-too-reputable sort of academic.

While he was kind enough to offer us fine breads, the rest of the encounter was less pleasant. The man would fit in well in a pit of snakes, though even the snakes may find him too slippery for their likes. However, we managed to barter a reasonable reward from him for the not entirely legal job, though I found that before the day was over, we were already in jail for it. I must remember to watch him when we return so that he does not double-cross us.

The job was in essence treasure hunting, though the slippery devil never admitted to it. We were tasked to recover an original deck of Three Dragon Ante cards, something I was never aware even existed. Seems the popular game of gamblers was originally founded by the dragons and played for much more than coin. While Keith was vague on the location of the cards, he did know of a man that could help us, whom we then went to meet.

The Triple Ruby Inn was the location of our target, a typical gambling den of the Lower Level of Sharn. We found him without trouble, this Ku van’Hu, a sly gnome with a penchant for gambling. But he would not talk to us without first having a game of Three Dragon Ante with him. I found that I was quite skilled at the game, even though I had never had a chance to play it. Unfortunately, our game was cut short by violent patrons of the inn – members of the Order of the Emerald Claw – attacking us without any discernible reason. Such injustice was quickly dealt with by our might, with one member being thrown across the room by the electrical blade of Cecilia and another taking his own life when he was seized by Dolfer. Unfortunately the town guard had entered the inn by then, and immediately saw us as the wrongdoers! Us, the victims of this random attack! They ordered all of us to drop our weapons, but the last Emerald Claw looked as though he was not likely to stop attacking, so with a leap from the table I slammed into the bar bench, pinning him down with my horns. As for Ku and the barkeep, they were nowhere to be found. Perhaps they fled when the fight broke out.

Needless to say the guards were not letting a group such as us off with a warning and we were sent off to the cells. I earned many beatings for my struggles against being bound, though I think the guards were looking for any excuse to strike me. I am thankful none of them were dwarves or I am afraid that I could not control myself and would have mercilessly destroyed them. Our court date is tomorrow and I pray that they see the truth of the situation else we may have to resort to illegal means to right ourselves.

PhD Progress: Doh!

Posted by Sam on August 12th, 2010 under PhD Project  •  No Comments

Dammitall! The PacMan environment has been found to be broken and any agent that passes the first level stands little chance of succeeding from there. So the past 44 and a half hours of experimentation have been for nothing.

How irritating. Well, I guess I’ll just have to restart it. But I will do that tomorrow, when I have found a solution to decreasing the elite policy size (and hopefully increasing the update value).

PhD Progress: Successful developments

Posted by Sam on August 11th, 2010 under PhD Project  •  No Comments

Seems that all of my work over the past few weeks is paying off. All the additions of extra learning options and such. Blocks World is now able to complete its onAB learning task in a little over 36 minutes (just under 20 minutes learning time), and this includes learning modules as well. The modules it learns are compact, neat and valid, consisting of minimal rules (thanks to the slot removal aspect). The convergence property allows the learning to progress quickly, with a little over a minute used per iteration.

As for Ms. PacMan, it is still slow, but after 12 (learning) hours, the experiment is 2.7125% along (27.125% iteration). The ruleset is beginning to shape itself to something useful, with handy fromGhost rules (which include conditions for ghost state: aggressive) and toDot rules near the top, and other, less useful rules near the bottom/disappearing. Assuming it continues at this speed, one iteration will take ~45 hours (2 days). Hence the entire experiment takes 20 days, but these can be split up.
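The estimate above is a straight linear extrapolation, which can be sketched as follows. The function name is made up for illustration, and the figure of 10 iterations is inferred from the two percentages quoted (2.7125% overall against 27.125% of one iteration) rather than stated outright.

```python
def estimate_hours(elapsed_hours, fraction_complete):
    """Linear extrapolation: total time = elapsed time / fraction done."""
    if not 0 < fraction_complete <= 1:
        raise ValueError("fraction_complete must be in (0, 1]")
    return elapsed_hours / fraction_complete


# Assumption: 10 iterations, inferred from 2.7125% overall vs 27.125% of one.
ITERATIONS = 10

per_iteration_hours = estimate_hours(12, 0.27125)
experiment_days = per_iteration_hours * ITERATIONS / 24

print(round(per_iteration_hours, 1), "hours per iteration")
print(round(experiment_days, 1), "days for the whole experiment")
```

This gives roughly 44 hours per iteration and a bit over 18 days in total, so the ~45 hours and 20 days quoted above are rounded up slightly. It also assumes the pace stays constant, which may not hold if better agents play longer episodes.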

Speaking of splitting up, Eibe brought up the possibility of splitting the learning across multiple machines (i.e. Symphony). This could be easily achieved, as the very nature of the experiment allows it to be split. Simply send out X agents operating in their own environments and when they return, sort them in order and update the generator. Then repeat. Of course the current system operates in an iterative manner, but the learning should be roughly equal if the update parameter is proportional to the number of samples.
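A rough sketch of that batched scheme is below. Everything in it is an assumption made for illustration: the generator is reduced to one inclusion probability per rule, the update is a cross-entropy-style step towards the elite samples, the scoring function is a toy stand-in for a game of Ms. PacMan, and a thread pool stands in for the remote machines. None of the names come from the actual agent code or from Symphony.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def evaluate(policy):
    """Toy stand-in for one agent playing in its own environment."""
    return sum(1 if rule.startswith("good") else -1 for rule in policy)


def sample_policy(probs, rng):
    """Draw a policy: each rule is included with its current probability."""
    return [rule for rule, p in probs.items() if rng.random() < p]


def batch_update(probs, num_agents, elite_fraction, step_per_sample, rng):
    """One batched round: sample X agents, evaluate, sort, update."""
    policies = [sample_policy(probs, rng) for _ in range(num_agents)]
    with ThreadPoolExecutor() as pool:  # stands in for the remote machines
        scores = list(pool.map(evaluate, policies))
    ranked = sorted(zip(scores, policies), key=lambda sp: sp[0], reverse=True)
    elite_count = max(1, int(num_agents * elite_fraction))
    elites = [policy for _, policy in ranked[:elite_count]]
    # Update parameter proportional to the number of samples, capped at 1.
    alpha = min(1.0, step_per_sample * num_agents)
    for rule in probs:
        observed = sum(rule in policy for policy in elites) / len(elites)
        probs[rule] = (1 - alpha) * probs[rule] + alpha * observed
    return probs


rng = random.Random(0)
probs = {"good_a": 0.5, "good_b": 0.5, "bad_a": 0.5}
for _ in range(30):
    batch_update(probs, num_agents=20, elite_fraction=0.25,
                 step_per_sample=0.01, rng=rng)
print(probs)
```

Sampling happens on the coordinating side so that only the evaluation needs to be shipped out, which is the part that dominates the runtime.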

An alternative to that method is a much larger one which only requires 10 machines, each running entire experiments. Then the results are averaged and stored. But that takes much longer and doesn’t make full use of the number of processors Symphony has available.

Seems there is still the problem of statistical pre-goal unification, which hasn’t upset the PacMan experiment yet, but is likely to when a pre-goal is created with edible ghosts. I’ll have to give more thought to sorting that out later.

A testing policy from PacMan:
Policy:
(distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 2.0 12.666666666666666)) (nonblinking ?X) (aggressive ?X) (pacman player) => (fromGhost ?X ?__Num7)
(distanceFruit player fruit ?__Num9&:(betweenRange ?__Num9 2.0 14.5)) (pacman player) => (toFruit fruit ?__Num9)
(distancePowerDot player ?X ?__Num1&:(betweenRange ?__Num1 0.0 51.0)) (pacman player) => (toPowerDot ?X ?__Num1)
(distanceDot player ?X ?__Num3&:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3)
(distancePowerDot player ?X ?__Num2&:(betweenRange ?__Num2 19.5 29.25)) (pacman player) => (fromPowerDot ?X ?__Num2)
(distanceGhostCentre player ?X ?__Num8&:(betweenRange ?__Num8 0.0 13.0)) (pacman player) => (toGhostCentre ?X ?__Num8)
(distanceGhost player ?X ?__Num6&:(betweenRange ?__Num6 34.0 43.0)) (pacman player) => (toGhost ?X ?__Num6)
(junctionSafety ?X ?__Num4&:(betweenRange ?__Num4 -8.0 0.0)) => (toJunction ?X ?__Num4)
(distanceGhostCentre player ?X ?__Num5&:(betweenRange ?__Num5 0.0 52.0)) (pacman player) => (fromGhostCentre ?X ?__Num5)

Clearly fromGhost behaviour is most important, along with eating fruit and using the powerdot to keep the ghosts placid. The toDot is all encompassing and will always be active. The fromPowerDot rule will only trigger at a distance, so it has no value. The toGhost rules have little value currently, as they don’t include the edible attribute, but that is likely because they can’t be included. The last 2 are practically useless, though the all-encompassing fromGhostCentre does lend some defensive behaviour.

Just had a thought about pre-goal mutation and such. Haven’t fleshed out the possibilities yet, but mutate rules based on what constant elements are present and mutate in relevant conditions seen in conjunction with said elements. So a rule concerning a ghost would mutate in the 4 attributes concerning the ghost: edible, aggressive, blinking or nonblinking. This could also create an opening for negation, allowing me to remove half of those attributes. I’ll think on it.
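As a first stab at what that could look like, here is a minimal sketch that specialises a ghost rule by adding each of the four attributes in turn. The rule representation (a list of condition strings plus an action string) and the function name are made up for illustration; the real rules live in the rule engine, and this is only one possible reading of the idea.

```python
GHOST_ATTRIBUTES = ("edible", "aggressive", "blinking", "nonblinking")


def mutate_ghost_rule(conditions, action, variable="?X"):
    """Return one specialised rule per ghost attribute not already present."""
    mutants = []
    for attribute in GHOST_ATTRIBUTES:
        extra = f"({attribute} {variable})"
        if extra not in conditions:
            mutants.append((conditions + [extra], action))
    return mutants


# Base rule taken from the toGhost line of the testing policy above.
base_conditions = [
    "(distanceGhost player ?X ?__Num6&:(betweenRange ?__Num6 34.0 43.0))",
    "(pacman player)",
]
base_action = "(toGhost ?X ?__Num6)"

for conditions, action in mutate_ghost_rule(base_conditions, base_action):
    print(" ".join(conditions), "=>", action)
```

With negation available, the attribute tuple could presumably shrink to two entries with a negated form generated for each, but which attributes are true opposites is not settled here.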