PhD Progress: Testing Domains

Posted by Sam on March 16th, 2010 under PhD Project  •  No Comments

Over the course of my research I have come across a number of testing domains, both official and unknown. This post intends to list them, both as possible testing grounds for my agent, and for others to use.

Blocks World
This one needs no link or introduction. If you have come across this post, Blocks World is where you should probably be starting. It’s easy to grasp, but not necessarily simple to overcome (for an agent).

Ms. PacMan
My first domain. Agent plays Ms. PacMan over a number of levels. Domain isn’t easily grasped as a relational domain, but will soon be.

StarCraft
Or for this matter, any RTS. But StarCraft is currently where my efforts are going, as there is an international contest running for it currently, and it already has tools for interfacing with the game and creating a bot.

Game of Intelligent Design
A collection of tasks for a script-based agent to solve. Currently, I am unsure whether it’s simply a test of creating a smart, one-shot agent that completes the problem or if the agent is capable of storing knowledge over a number of iterations. Some of the tasks seem to require that the agent has some form of knowledge evolution. The problems again aren’t easily put into a relational map, but I’m sure it’s possible. The problem lies with the numbers in the environment.

Car Racing
A car racing competition is also being run at the moment, which would be a fun environment to attack. While I’m not entirely sure how the environment is given, I figure it could be relationalised fairly easily.

ACM Queue Challenge
While the competition is over, I reckon the software should still be available. A simple, yet complex game of trying to capture counters using a ‘line-drawer’ and some ‘pushers’. The rules are simple, but an optimal play strategy is near-impossible. Especially when an opponent is involved.

Google AI Challenge
Again, this competition seems to be over, but it is another possible environment. The problem lies in how to implement it.

Mario AI
The Mario AI competition seemed to start in 2009, or at the very least it caught my eye then with the popular video of the A* Mario solution. The Mario world is represented in Java, making the environment easy to rig up and test with the agent. Perhaps this could be part of my batch? StarCraft, Ms. PacMan, Mario and Tetris? A thesis on computers playing computers.

PyLab
A testing environment developed by a group of RL researchers. It contains many testing environment and other things, including RLGlue support. Not that that’s much help for my relational system, but it did begin with RLGlue, so it shouldn’t be too difficult to write a proxy environment.

There are probably heaps of other ones too.

PhD Progress: Dynamic CE Updating

Posted by Sam on March 13th, 2010 under PhD Project  •  No Comments

With mutation completed, and modularisation next, there needs to be a middleman of sorts. The fixed 1000 step CE update process is too rigid to allow the agent to split off into it’s own internal goal. So, the next step is to refactor the system such that the agent decides when to update. Note that this agent is not the same as the acting agent. In fact, this architecture will basically be an actor-critic architecture.

In any case, the critic controls what the actor is doing and observes the outcomes. The dynamicy (?) of the system is such that the number of episodes per iteration is dynamically adjusted towards the number of rules in the rulebase. In the first standard 100 random rule experiments, 1000 episodes were used (sure it was 1000 for the 30 rule experiments too). So, I propose that the population of each iteration is adjusted to 10 * the largest slot. Furthermore, the elites were made up of 5% of the samples. This could be modified too, to perhaps 10%, or 20-25% using weights (the lower samples have low update weight).

The proposed algorithm is as follows:
1) Critic continually runs actor until all LGGs are found, and the pre-goal has settled. Perhaps some form of guidance towards previously seen elite samples could be used here.
2) Once 1 is complete, begin the formal testing over iterations, recording all sample’s performance.
3) Once we have 10*max(slot.length) samples, begin the update process and post-update process (possibly creating new mutations).
4) If using a goal with constants, it is likely that modularisation will need to take place (unless we already have a module for the problem). If we mutate a constant fact, create/load module. Creation involves essentially a recursive creation of the entire system. More on this step later.
5) If, after X iterations, a slot remains roughly the same (same best rule, which only gets better), then fix it in place, and create a new slot, bar the best rule.

That should do it for now.

PhD Progress: The future…

Posted by Sam on March 9th, 2010 under PhD Project  •  No Comments

Progress is going well on the new system, and it appears to be performing well. I just today finished up the mutations and will be moving onto modularisation, which is likely to cause problems. Still, once I have that resolved, I’ll need to move onto other directions.

Sure, there’s all the experiments I can run, like observing the effect of population size/elite percentage, and there’s getting Ms. PacMan and StarCraft running, but those will likely only take up 2010, or less. So the problem now lies in what I can do in the future.

One area is to try to remove the ‘bonuses’ the agent is given for it’s learning (valid actions, initial pre-goal, known goal state). Each of these can safely be removed, but it makes things much more difficult for the agent, as in all cases it has to learn the dynamics of the environment by itself, probably with an alternative learner (TG or RIBS, etc).

Another area is to try and ‘commercialise’ the agent. In the sense that WEKA is a machine learning suite, the agent and environment could be loaded from GUI form and run, allowing the user to specify the necessary information (and be allowed the option of leaving out some things, as mentioned above). This could also tie into teacher/student learning, allowing the user to interrupt the agent and teach it what to do at a given point.

If I was ever to get stuck, I’m sure I could just look back at my posts from earlier in the research to attempt to solve some of the earlier dreams. It would be really cool if I could get a robot to play around with.

PhD Progress: Mutation working

Posted by Sam on March 8th, 2010 under PhD Project  •  No Comments

It appears that single step mutation is fully operational. There do appear to be some bugs here and there, but I should be able to find them eventually.

Single step mutation is where we only have one batch of mutants, so rules aren’t continually mutated. Further steps could result from mutating extra rules every CE update iteration, where we sample the distribution N times (where N is equal to the number of rules in the slot).

A further problem is re-covering, where it isn’t necessary. This could be fixed by appending the general rules (in sampled order at the end of every policy, as default go-to rules should the more specific ones fail.

PhD Progress: Differing pre-goal actions

Posted by Sam on February 26th, 2010 under PhD Project  •  No Comments

Working on forming a general description of the pre-goal state and it is raising issues. In Blocks World, it is all fine and good, because for each of the 3 main goals, the final action is the same, therefore there are no issues in generalising the pre-goal to the action (simply swap all variables for those present in the action). Constants are another issue, but they shouldn’t be conflicting – they are simply a matter of holding onto constants in the pre-goal unless forced to generalise.

The problem lies in the clear(a) case. In the clear(a) pre-goal, the state is: on(X,a), clear(X), clear(Y), etc. However, the problem can be solved by either move(X,Y) or moveFloor(X). The fact that X happens to be the same block in either action is simply circumstance on how the actions are defined.

The only way that I can see this being solved is to have separate pre-goals for each action predicate. Hmm, this actually makes sense anyway, as each action will need to be mutated towards it’s particular pre-goal anyway (if it exists). So this also restricts the mutation operator by only allowing mutations on pre-goals that exist for that action. Unfortunately, restricting mutation could be a problem because while an action may be ideal for the final step, the intermediate steps may require other actions.

I have a feeling this is where modularisation fits in well. While Blocks World is just a toy example, it may do for now. The onAB problem has opportunities for modularisation, as well as the ‘final step’ rule which state ‘if a and b are clear, move a to b’. Dropping into modularisation for clear a, the pre-goal for this state will essentially look the same for either action, but will behave differently.

So, clear A. The general rules for each action are:
(move X Y) <= (clear X) (clear Y)
(moveFloor X) <= (clear X) (on X ?) (above X ?)
Each of these need mutations to be optimal rules for clear A:
(move X Y) <= (clear X) (clear Y) (above X a)
(moveFloor X) <= (clear X) (on X ?) (above X a)

Apart from the move case, which isn't always guaranteed, these should always work. And their pre-goal states will lead towards these mutations, as well. Going back to onAB, the last action will always be move, so the moveFloor will simply remain in its general format (and likely be ignored through iterations).

So, the slightly modified strategy now uses multiple pre-goals for actions.

However, looking at the Ms PacMan case, things may be more difficult. Assuming there are only 2 actions (moveTowards and moveFrom), the pre-goals for each of these will likely be in an extremely general form. Typically, the final action will be moving towards a dot, but Ms PacMan may simply happen across the final dot while running from a ghost, or moving towards a safe junction. Each case will look something like this:
moveTowards(X) <= dot(X)
Sp: dot(X), ghost(?), junction(?), thing(X)

moveTowards(X) <= junction(X)
Sp: dot(?), ghost(?), junction(X), thing(X)

Union: Sp: dot(?), ghost(?), junction(?), thing(X)
Which is rather useless. Perhaps the modified Ms PacMan world (still in progress) will be more helpful. Or perhaps the actions need to be dropped down a level to moveToDot(X), moveFromGhost(X)…

I think I’ll continue trying to solve the Blocks World as currently defined before trying to tackle Ms PacMan/StarCraft.