<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Super Sanity &#187; Academic</title>
	<atom:link href="http://super-sanity.com/category/academic/feed/" rel="self" type="application/rss+xml" />
	<link>http://super-sanity.com</link>
	<description>There is no insanity, rather a super sanity</description>
	<lastBuildDate>Wed, 08 Sep 2010 00:09:39 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>PhD Progress: Results Update</title>
		<link>http://super-sanity.com/2010/09/08/phd-progress-results-update/</link>
		<comments>http://super-sanity.com/2010/09/08/phd-progress-results-update/#comments</comments>
		<pubDate>Wed, 08 Sep 2010 00:09:39 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=270</guid>
		<description><![CDATA[The experiment is nearly complete for the Pacman with Population Constant 10, though the other two are still a long way from total completion. Each run has completed at least twice, though the Pop Const 10 has completed 8 runs. Judging by the speed of each experiment, the Pacman Pop Const 30 takes roughly as [...]]]></description>
			<content:encoded><![CDATA[<p>The experiment is nearly complete for the Pacman with Population Constant 10, though the other two are still a long way from total completion. Each run has completed at least twice, though the Pop Const 10 has completed 8 runs.</p>
<p><a href="http://super-sanity.com/2010/09/08/phd-progress-results-update/results-2/" rel="attachment wp-att-271"><img src="http://super-sanity.com/wp-content/uploads/2010/09/Results.jpg" alt="" title="Results" width="655" height="554" class="aligncenter size-full wp-image-271" /></a></p>
<p>Judging by the speed of each experiment, the Pacman Pop Const 30 takes roughly as long as the 50, which is interesting as they share the same results. The question is, which approach is best? The Pop Const 10 does eventually match the other two performances, and will likely level out just as the others did. It reaches the goal in less time, though it needs more learning iterations to do so. I suppose in the end, all are judged by time, as the number of iterations can be infinite, if necessary. So perhaps this Pop 10 works, but should have a larger number of iterations to learn over.</p>
<p>Note that these results are using purely pre-goal specialisation only, no general rule specialisation. That still needs some work before I am ready to launch an experiment for that.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/09/08/phd-progress-results-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Automatic Environment Learning</title>
		<link>http://super-sanity.com/2010/09/02/phd-progress-automatic-environment-learning/</link>
		<comments>http://super-sanity.com/2010/09/02/phd-progress-automatic-environment-learning/#comments</comments>
		<pubDate>Thu, 02 Sep 2010 02:44:23 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=268</guid>
		<description><![CDATA[I have recently been working on a mutation operator which creates new rules using the known predicates of the environment. For instance, adding cl(X) to a rule not already containing cl(X) but does mention this X. However, a problem this process introduces is the problem of creating rules which are differing duplicates (essentially are the [...]]]></description>
			<content:encoded><![CDATA[<p>I have recently been working on a mutation operator which creates new rules using the known predicates of the environment. For instance, adding cl(X) to a rule which does not already contain cl(X) but does mention X. However, one problem with this process is that it creates rules which are differing duplicates (essentially the same rule, but worded differently, such as the on(X,Y) -> B and on(X,Y) &#038; abv(X,Y) -> B case) and completely useless rules (the on(X,Y) &#038; onFl(X) -> B case). Furthermore, this process can introduce negation, so that needs to be accounted for as well.</p>
<p>I have so far got around this by using existing background knowledge to check rules, and also introducing a new form of background knowledge (well it is still the same) which isn&#8217;t evaluated by the JESS compiler, but is still a valid and legal rule for the environment. However, as I wrote up these rules, I realised that I am essentially telling the agent the dynamics of the environment, which a smarter planning agent could use to achieve its needs. Which isn&#8217;t really a bad idea at all, perhaps something I&#8217;ll check out later. Anyway, I thought of a new agent measure to learn the environment.</p>
<p>Because the agent can spend so much time in the environment, it should be able to learn the dynamics of the environment by itself, and learn which conditions are always together, and which are apart. By allowing the agent to learn the environment, this means that every background knowledge rule doesn&#8217;t have to be declared by the environment designer, only the ones that are required for asserting predicates automatically need be.</p>
<p>The problem this task faces is firstly extra overhead, but as all of the data about the state has already been collected, the agent need only sort and check it against its current beliefs of the environment&#8217;s structure. The second problem is deciding when to stop checking the environment. Because the agent only checks the environment for the first few episodes (for covering purposes), it may find its beliefs of the environment to be short-sighted. Like many learning mechanisms of the agent, it may just have to be settled after a number of episodes or something and only forcefully checked when the agent covers new rules and maybe for the first few pre-goal states.</p>
<p>This learning mechanism may have the capability of learning major shifts in the environment, but for now it can just learn constant rules for the entire environment.</p>
<p>The mechanism operates by maintaining three lists for each (non-type) condition: sometimes true, never true, and always true (Both, False, True). Whenever the agent encounters a state, it evaluates the conditions and their relation against other conditions (with each condition in simplified variable form). So initially, after one state has been seen, all currently true conditions are in the True list and all other conditions in the environment are in the False list. For each state seen after this, conditions in either list can either stay where they are, or shift to the Both list. Eventually, after a number of states, the observed behaviour will have been stable for X steps (and Y pre-goals will have been seen), so the agent can stop actively scanning the state and focus on learning which rules work.</p>
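<p>The bookkeeping described above could be sketched roughly as follows. This is only an illustration in Python, not the actual covering code: the class name <code>CoOccurrenceTracker</code>, its methods, and the <code>stable_steps</code> counter are all hypothetical, conditions are assumed to already be strings in simplified variable form, and the pre-goal count Y is left out.</p>

```python
from typing import Dict, Iterable, Set


class CoOccurrenceTracker:
    """For each condition, track which other conditions are always, never,
    or only sometimes true alongside it (the True, False and Both lists)."""

    def __init__(self, all_conditions: Iterable[str]) -> None:
        self.all_conditions = frozenset(all_conditions)
        # condition -> {"True": ..., "False": ..., "Both": ...}, filled lazily
        self.relations: Dict[str, Dict[str, Set[str]]] = {}
        self.stable_steps = 0  # states seen since any list last changed

    def observe(self, true_conditions: Iterable[str]) -> bool:
        """Update the lists from one state; return True if anything moved."""
        current = set(true_conditions) & self.all_conditions
        changed = False
        for cond in current:
            others = self.all_conditions - {cond}
            others_true = current - {cond}
            lists = self.relations.get(cond)
            if lists is None:
                # First sighting: everything true now goes in True,
                # every other known condition goes in False.
                self.relations[cond] = {
                    "True": set(others_true),
                    "False": set(others - others_true),
                    "Both": set(),
                }
                changed = True
                continue
            # Entries can only stay put or shift to Both.
            moved = (lists["True"] - others_true) | (lists["False"] & others_true)
            if moved:
                lists["True"] -= moved
                lists["False"] -= moved
                lists["Both"] |= moved
                changed = True
        self.stable_steps = 0 if changed else self.stable_steps + 1
        return changed

    def is_settled(self, required_stable_steps: int) -> bool:
        """True once no list has changed for the given number of states."""
        return self.stable_steps >= required_stable_steps


tracker = CoOccurrenceTracker(["highest(X)", "clear(X)", "onFloor(X)"])
tracker.observe(["highest(X)", "clear(X)"])
tracker.observe(["clear(X)", "onFloor(X)"])
tracker.observe(["highest(X)", "clear(X)", "onFloor(X)"])
highest = tracker.relations["highest(X)"]
print(highest["True"], highest["Both"])
```

<p>In this toy run, clear(X) stays in the True list of highest(X), while onFloor(X) shifts from False to Both once the two are seen together.</p>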
<p>There is a natural bonus to this system as well. I&#8217;m still not 100% sure if it&#8217;s foolproof (I&#8217;m sure any logical programming book will be able to confirm it for me), but the system of implication allows conditions to be more quickly spread across other conditions. For example, highest(X) -> clear(X), and clear(X) -> !on(?,X). Therefore, highest(X) -> !on(?,X). This may just be learned automatically anyway, but it may be beneficial to be aware of this.</p>
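<p>The chaining in that example amounts to taking the transitive closure of the learned implications. A minimal sketch in Python, assuming each implication holds for every state and uses the same variable bindings on both sides (the function name <code>implication_closure</code> and the dictionary layout are made up for illustration):</p>

```python
from typing import Dict, Set


def implication_closure(implications: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return every condition reachable from each condition by chaining
    implications (A -> B and B -> C gives A -> C)."""
    closure = {cond: set(implied) for cond, implied in implications.items()}
    changed = True
    while changed:
        changed = False
        for cond, implied in closure.items():
            extra: Set[str] = set()
            for other in implied:
                extra |= closure.get(other, set())
            extra -= implied
            extra.discard(cond)  # ignore trivial A -> A
            if extra:
                implied |= extra
                changed = True
    return closure


rules = {
    "highest(X)": {"clear(X)"},
    "clear(X)": {"!on(?,X)"},
}
closed = implication_closure(rules)
print(sorted(closed["highest(X)"]))
```

<p>Here highest(X) picks up !on(?,X) through clear(X), which matches the hand-derived conclusion above.</p>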
<p>This mechanism can be built into the existing covering class (as a separate class) which could be merged with the known ranges member. Ranges could be a problem too&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/09/02/phd-progress-automatic-environment-learning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Project: Bang for your Buck</title>
		<link>http://super-sanity.com/2010/08/20/phd-project-bang-for-your-buck/</link>
		<comments>http://super-sanity.com/2010/08/20/phd-project-bang-for-your-buck/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 02:42:36 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=258</guid>
		<description><![CDATA[The 10 population constant experiment has completed and after plotting it on a graph where the x axis is time, it is shown to be the best strategy. Sure, the 50 population experiment does get to the apparent ~7000 point limit in fewer iterations, it takes much longer. And as for the 30 population constant [...]]]></description>
			<content:encoded><![CDATA[<p>The 10 population constant experiment has completed and after plotting it on a graph where the x axis is time, it is shown to be the best strategy. Sure, the 50 population experiment does get to the apparent ~7000 point limit in fewer iterations, but it takes much longer. And as for the 30 population constant one, it hasn&#8217;t really run enough to know yet. The curve of its performance vs. time fits it in right between the 10 and 50 in a roughly proportional manner.</p>
<p>I probably should have saved the image for future reference, but I forgot to. Ah well. A possible strategy to pursue is to have a changing population constant, based on the maturity of the experiment. Perhaps a min of 10 and max of 50. Something for later experimentation. The reason for even thinking of it is that smaller population constants are more likely to have large variances. So a form of annealing, I suppose.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/08/20/phd-project-bang-for-your-buck/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Bug Fix Improvements</title>
		<link>http://super-sanity.com/2010/08/17/phd-progress-bug-fix-improvements/</link>
		<comments>http://super-sanity.com/2010/08/17/phd-progress-bug-fix-improvements/#comments</comments>
		<pubDate>Mon, 16 Aug 2010 22:55:43 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=251</guid>
		<description><![CDATA[Seems that little bug fix was what was holding the agent back from attaining better results. Of course, the bug wasn&#8217;t always present, I know I introduced it recently, but it&#8217;s good to see that the agent is back on its feet &#8211; even without the presence of useful fromGhost rules. Here is the results [...]]]></description>
			<content:encoded><![CDATA[<p>Seems that little bug fix was what was holding the agent back from attaining better results. Of course, the bug wasn&#8217;t always present, I know I introduced it recently, but it&#8217;s good to see that the agent is back on its feet &#8211; even without the presence of useful fromGhost rules.</p>
<p>Here are the results of the experiment 62.78% (78.39%) into the first run. <ins datetime="2010-08-19T01:19:25+00:00">Note: this has been updated to show later results, also of two experiments.</ins><br />
<a href="http://super-sanity.com/2010/08/17/phd-progress-bug-fix-improvements/results/" rel="attachment wp-att-257"><img src="http://super-sanity.com/wp-content/uploads/2010/08/Results.jpg" alt="" title="Results" width="418" height="387" class="aligncenter size-full wp-image-257" /></a><br />
While these are promising, they come at a high (possibly negotiable) price: time. The experiment has so far taken 94 and a half hours. Out of that time, 75 hours is learning time. That&#8217;s almost 4 days of runtime and just over 3 days of learning time, and it&#8217;s not even half complete on the first run. As said in the previous post: this is likely a combination of the changes to the learning rate and the fact that the better PacMan does, the longer it takes. According to the ETA, the first run will be complete in 5 more days&#8230; Assuming 10 days of learning, that&#8217;s 100 days to simply produce some reliable results. 3 and a third months!</p>
<p>So something needs to be done. Symphony, shorter learning algorithm, optimised Ms. PacMan environment, any of them. Otherwise my PhD will be spent waiting on possibly unstable experiments to complete.</p>
<p>The readable policy goes like this:<br />
A typical policy:<br />
(distanceGhost player ?X ?__Num7&#038;:(betweenRange ?__Num7 0.0 10.0)) (pacman player) => (fromGhost ?X ?__Num7) / (distanceGhost player ?X ?__Num7&#038;:(betweenRange ?__Num7 0.0 52.0)) (pacman player) => (fromGhost ?X ?__Num7)<br />
(distancePowerDot player ?X ?__Num1&#038;:(betweenRange ?__Num1 0.0 1.0)) (pacman player) => (toPowerDot ?X ?__Num1) / (distancePowerDot player ?X ?__Num1&#038;:(betweenRange ?__Num1 0.0 50.0)) (pacman player) => (toPowerDot ?X ?__Num1)<br />
(distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)<br />
(distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)<br />
(distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 1.0 13.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3) / (distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 0.0 1.0)) (pacman player) => (toDot ?X ?__Num3)<br />
(distancePowerDot player ?X ?__Num2&#038;:(betweenRange ?__Num2 43.0 50.0)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&#038;:(betweenRange ?__Num2 32.75 43.0)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&#038;:(betweenRange ?__Num2 12.25 22.5)) (pacman player) => (fromPowerDot ?X ?__Num2) / (distancePowerDot player ?X ?__Num2&#038;:(betweenRange ?__Num2 2.0 12.25)) (pacman player) => (fromPowerDot ?X ?__Num2)<br />
(junctionSafety ?X ?__Num4&#038;:(betweenRange ?__Num4 -11.0 0.0)) => (toJunction ?X ?__Num4) / (junctionSafety ?X ?__Num4&#038;:(betweenRange ?__Num4 -16.0 -11.0)) => (toJunction ?X ?__Num4)</p>
<p>Note that some rules are not present. Primarily the agent runs from ghosts, though only ghosts 10 units away (smart move, if they are hostile). Then the agent eats powerdots to keep the ghosts pacified. Though the first rule of this slot seems a little useless, as the agent will rarely be 1 unit from a powerdot, and when it is, it&#8217;ll probably be eating it anyway. Perhaps it is used to counterbalance an overarching fromPowerDot rule. Following this is the toDot slot, times three. The agent clearly likes to stack that rule with all three rules of the slot being active. This will result in the agent always pursuing dots, but when dots are 0-13 units away, the agent pursues them with gusto. The fromPowerDot rule seems a little useless; running from distant powerdots, but perhaps it is just how the agent copes with being forced to use that rule (it is likely to disappear with slot removal as it has a selection chance of 0.58). The junctionSafety one doesn&#8217;t seem to be useful either; perhaps the same reason for the previous slot.</p>
<p>When I checked in on the agent previously on the weekend, the fromGhost rule was not the top rule; I think toPowerDot was. Or perhaps toFruit (which is strangely missing &#8211; its selection ratio was too low, probably caused by the fact the rule rarely triggers). So I&#8217;m picking that the point in the graph where performance takes off is where fromGhost was most actively used.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/08/17/phd-progress-bug-fix-improvements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Ghost Rule Mutation</title>
		<link>http://super-sanity.com/2010/08/16/phd-progress-ghost-rule-mutation/</link>
		<comments>http://super-sanity.com/2010/08/16/phd-progress-ghost-rule-mutation/#comments</comments>
		<pubDate>Sun, 15 Aug 2010 23:01:30 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=250</guid>
		<description><![CDATA[Well, after fixing up the bug in the environment, PacMan is performing better, but is still limited at the ~4000 point mark. The reason for this is likely because there are no useful rules regarding ghost avoidance/chasing. I&#8217;m guessing either the pre-goal was never met by the optimal policy, or the goal was met with [...]]]></description>
			<content:encoded><![CDATA[<p>Well, after fixing up the bug in the environment, PacMan is performing better, but is still limited at the ~4000 point mark. The reason for this is likely because there are no useful rules regarding ghost avoidance/chasing. I&#8217;m guessing either the pre-goal was never met by the optimal policy, or the goal was met with differing states: edible and non-edible ghosts. Furthermore, it seems like I have made it slower. Whether this is a result of it performing better, or because I changed the number of steps required, I am not sure. I have a feeling both may be involved, but at least I can modify the latter. I should still have the old code for it somewhere in the SVN repository.</p>
<p>The pre-goal model is useless for these sorts of heuristic mutations, so I will likely have to introduce a new algorithm for creating mutations: either with or without the existing pre-goal one. Perhaps maintaining a tree of rules, with the covered rule as the base rule and more specialised rules running down from it. This tree needs to be prunable too. This is beginning to look a lot like TG, though.</p>
<p>A secondary problem is the fact that actions not necessarily directly concerned with objects in the domain may still perform better with conditions linked to those objects (e.g. toPowerDot only when ghosts are not edible). If actions didn&#8217;t have delayed effects, this could easily be found by watching the post-action state, but alas, toPowerDot could be active at any point, regardless of PacMan&#8217;s distance to a powerdot. As far as I am concerned at the moment, I cannot see a solution to this problem without expensive state watching. Perhaps just the to/fromGhost actions will have to suffice for now.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/08/16/phd-progress-ghost-rule-mutation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Doh!</title>
		<link>http://super-sanity.com/2010/08/12/phd-progress-doh/</link>
		<comments>http://super-sanity.com/2010/08/12/phd-progress-doh/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 00:00:16 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=249</guid>
		<description><![CDATA[Dammitall! The PacMan environment has been found to be broken and any agent that passes the first level stands little chance of succeeding from there. So the past 44 and a half hours of experimentation have been for nothing. How irritating. Well, I guess I&#8217;ll just have to restart it. But I will do that [...]]]></description>
			<content:encoded><![CDATA[<p>Dammitall! The PacMan environment has been found to be broken and any agent that passes the first level stands little chance of succeeding from there. So the past 44 and a half hours of experimentation have been for nothing.</p>
<p>How irritating. Well, I guess I&#8217;ll just have to restart it. But I will do that tomorrow, when I have found a solution to decreasing the elite policy size (and hopefully increasing the update value).</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/08/12/phd-progress-doh/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Successful developments</title>
		<link>http://super-sanity.com/2010/08/11/phd-progress-successful-developments/</link>
		<comments>http://super-sanity.com/2010/08/11/phd-progress-successful-developments/#comments</comments>
		<pubDate>Tue, 10 Aug 2010 23:09:10 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=246</guid>
		<description><![CDATA[Seems that all of my work over the past few weeks is paying off. All the additions of extra learning options and such. Blocks World is now able to complete its onAB learning task in little over 36 minutes (just under 20 minutes learning time) (this includes learning modules as well). The modules it learns [...]]]></description>
			<content:encoded><![CDATA[<p>Seems that all of my work over the past few weeks is paying off. All the additions of extra learning options and such. Blocks World is now able to complete its onAB learning task in little over 36 minutes (just under 20 minutes learning time) (this includes learning modules as well). The modules it learns are compact, neat and valid, consisting of minimal rules (thanks to the slot removal aspect). The convergence property allows the learning to progress quickly, with little over a minute used per iteration.</p>
<p>As for Ms. PacMan, it is still slow, but after 12 (learning) hours, the experiment is 2.7125% along (27.125% iteration). The ruleset is beginning to shape itself to something useful, with handy fromGhost rules (which include conditions for ghost state: aggressive) and toDot rules near the top, and other, less useful rules near the bottom/disappearing. Assuming it continues at this speed, one iteration will take ~45 hours (2 days). Hence the entire experiment takes 20 days, but these can be split up.</p>
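<p>As a rough check of that estimate, the arithmetic can be written out in Python. The variable names are made up for illustration, and the figure of 10 iterations is inferred from the two percentages quoted above rather than stated outright.</p>

```python
hours_elapsed = 12.0            # learning hours so far
fraction_of_iteration = 0.27125  # 27.125% of the current iteration
fraction_of_experiment = 0.027125  # 2.7125% of the whole experiment

# Number of iterations implied by the two percentages.
iterations = round(fraction_of_iteration / fraction_of_experiment)

hours_per_iteration = hours_elapsed / fraction_of_iteration
total_days = hours_per_iteration * iterations / 24.0

print(iterations, round(hours_per_iteration, 1), round(total_days, 1))
```

<p>Without rounding each iteration up to 2 days, the total comes out a little under the 20 days quoted, so 20 days is a slightly pessimistic figure if the speed holds.</p>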
<p>Speaking of splitting up, Eibe brought up the possibility of splitting the learning across multiple machines (i.e. Symphony). This could be easily achieved, as the very nature of the experiment allows it to be split. Simply send out X agents operating in their own environments and when they return, sort them in order and update the generator. Then repeat. Of course this current system operates in an iterative manner, but the learning should be roughly equal if the update parameter is proportional to the number of samples.</p>
<p>An alternative to that method is a much larger one which only requires 10 machines, each running entire experiments. Then the results are averaged and stored. But that takes much longer and doesn&#8217;t make full use of the number of processors Symphony has available.</p>
<p>Seems there is still the problem of statistical pre-goal unification which hasn&#8217;t upset the PacMan experiment yet, but is likely to when a pre-goal is created with edible ghosts. I&#8217;ll have to give more thought to sorting that out later.</p>
<p>A testing policy from PacMan:<br />
Policy:<br />
(distanceGhost player ?X ?__Num7&#038;:(betweenRange ?__Num7 2.0 12.666666666666666)) (nonblinking ?X) (aggressive ?X) (pacman player) => (fromGhost ?X ?__Num7)<br />
(distanceFruit player fruit ?__Num9&#038;:(betweenRange ?__Num9 2.0 14.5)) (pacman player) => (toFruit fruit ?__Num9)<br />
(distancePowerDot player ?X ?__Num1&#038;:(betweenRange ?__Num1 0.0 51.0)) (pacman player) => (toPowerDot ?X ?__Num1)<br />
(distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3)<br />
(distancePowerDot player ?X ?__Num2&#038;:(betweenRange ?__Num2 19.5 29.25)) (pacman player) => (fromPowerDot ?X ?__Num2)<br />
(distanceGhostCentre player ?X ?__Num8&#038;:(betweenRange ?__Num8 0.0 13.0)) (pacman player) => (toGhostCentre ?X ?__Num8)<br />
(distanceGhost player ?X ?__Num6&#038;:(betweenRange ?__Num6 34.0 43.0)) (pacman player) => (toGhost ?X ?__Num6)<br />
(junctionSafety ?X ?__Num4&#038;:(betweenRange ?__Num4 -8.0 0.0)) => (toJunction ?X ?__Num4)<br />
(distanceGhostCentre player ?X ?__Num5&#038;:(betweenRange ?__Num5 0.0 52.0)) (pacman player) => (fromGhostCentre ?X ?__Num5)</p>
<p>Clearly fromGhost behaviour is most important, along with eating fruit and using the powerdot to keep the ghosts placid. The toDot rule is all-encompassing and will always be active. The fromPowerDot rule will only trigger at a distance, so it has no value. The toGhost rules have little value currently, as they don&#8217;t include the edible attribute, but that is likely because they can&#8217;t be included. The last 2 are practically useless, though the all-encompassing fromGhostCentre does lend some defensive behaviour.</p>
<p>Just had a thought about pre-goal mutation and such. Haven&#8217;t fleshed out the possibilities yet, but mutate rules based on what constant elements are present and mutate in relevant conditions seen in conjunction with said elements. So a rule concerning a ghost would mutate in the 4 attributes concerning the ghost: edible, aggressive, blinking or nonblinking. This could also create an opening for negation, allowing me to remove half of those attributes. I&#8217;ll think on it.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/08/11/phd-progress-successful-developments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Slot Addition/Removal</title>
		<link>http://super-sanity.com/2010/08/05/phd-progress-slot-additionremoval/</link>
		<comments>http://super-sanity.com/2010/08/05/phd-progress-slot-additionremoval/#comments</comments>
		<pubDate>Thu, 05 Aug 2010 00:11:47 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=245</guid>
		<description><![CDATA[With rule selection out of the way, I move on to the next issue of address: slot addition/removal. In some problems, the same action may need to be performed more than once under different circumstances. For example, in Towers of Hanoi, all we can do is move a tile from A to B. But a [...]]]></description>
			<content:encoded><![CDATA[<p>With rule selection out of the way, I move on to the next issue to address: slot addition/removal. In some problems, the same action may need to be performed more than once under different circumstances. For example, in Towers of Hanoi, all we can do is move a tile from A to B. But a single rule is not enough to solve the problem; in fact it will only get us 1, maybe 2 steps. And on the other side of the equation, for the sake of policy simplification, it is better to remove slots we never use than to have them hang around. Sure, in Ms. PacMan, every action has a use, and could potentially have a rule which works for it, but some actions are better left without (toGhostCentre, fromPowerDot for example).</p>
<p>And in a simple case, Blocks World has an action without use: moveFloor. Yes, it is used in modules, but when using modules, it is useless. So for the sake of these environments and possibly future environments, I will implement an optional slot addition/removal system.</p>
<p>The algorithm is something like this:<br />
- Each slot maintains an M value for the base chance of it being used (initially M = 1, M >= 0) and an S value for variance (initially S = say 0.5, 0 <= S <= 0.5).<br />
- The chance of a slot being used depends on a normally sampled value of <em>m</em> = M +- S.<br />
- If <em>m</em> is above 1.0, the slot will be used but not removed from the slot distribution. The value for the not removed slot is equal to <em>m</em> &#8211; 1.0.<br />
- If <em>m</em> is below 1.0, the slot will only be used (and removed from the distribution) if a random number is below <em>m</em>. Otherwise the slot is not used but is removed from the distribution.</p>
<p>For example, a slot has M = 1.0 and S = 0.5. It is drawn with <em>m</em> = 1.4. Hence, the slot is used at least once and has a 40% chance of being used twice.</p>
<p>Another example, a slot is drawn with <em>m</em> = 2.3. The slot is used at least twice and has a 30% chance of being used thrice.</p>
<p>Another example, a slot is drawn with <em>m</em> = 0.8. The slot only has an 80% chance of being used at all.</p>
<p>During the update procedure, these slot values M and S are updated as well, with M&#8217;s values being updated in a standard step-wise manner using the elite solution count. S&#8217;s value needs to slowly decrease over time if M remains relatively stable. Perhaps a standard deviation for the elite samples can be calculated and used to update S in a step-wise manner. That should suffice.</p>
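<p>The sampling half of this algorithm could be sketched in Python as below. It is a guess at one reasonable reading, not the real implementation: the function names are hypothetical, S is assumed to be the standard deviation of the normal draw, and negative draws are assumed to be clamped to 0. The update of M and S is not covered.</p>

```python
import math
import random


def uses_for_draw(m: float, rng: random.Random) -> int:
    """Number of times a slot is used for a drawn value m.

    Each whole unit of m is one guaranteed use; the leftover fraction is
    the chance of one further use before the slot leaves the distribution.
    """
    guaranteed = math.floor(m)
    remainder = m - guaranteed
    extra = 1 if rng.random() < remainder else 0
    return guaranteed + extra


def sample_slot_uses(mean: float, spread: float, rng: random.Random) -> int:
    """Draw m from a normal around M (assumed: S is the standard deviation)
    and convert it to a use count. Negative draws are clamped to 0."""
    m = max(0.0, rng.gauss(mean, spread))
    return uses_for_draw(m, rng)


rng = random.Random(0)
trials = 20000
twice = sum(uses_for_draw(1.4, rng) == 2 for _ in range(trials)) / trials
print(round(twice, 2))  # fraction of draws where m = 1.4 gave two uses
```

<p>With <em>m</em> = 1.4 the slot should come out as used twice in roughly 40% of trials, in line with the first example above.</p>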
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/08/05/phd-progress-slot-additionremoval/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Rarely Used Rule Selection Algorithm</title>
		<link>http://super-sanity.com/2010/08/03/phd-progress-rarely-used-rule-selection-algorithm/</link>
		<comments>http://super-sanity.com/2010/08/03/phd-progress-rarely-used-rule-selection-algorithm/#comments</comments>
		<pubDate>Tue, 03 Aug 2010 00:08:11 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=242</guid>
		<description><![CDATA[It is said that the cross-entropy method is good for finding rarely used rules and improve their selection chance, but in a massive ruleset which is already converged, it can be difficult to chance upon them. The algorithm can be changed to modify the probabilities of individual rules being selected by taking into account how [...]]]></description>
			<content:encoded><![CDATA[<p>It is said that the cross-entropy method is good for finding rarely used rules and improving their selection chance, but in a massive ruleset which is already converged, it can be difficult to chance upon them.</p>
<p>The algorithm can be changed to modify the probabilities of individual rules being selected by taking into account how many times a rule has been previously used. So given a distribution of rules D, where each rule has probability P(R) and each rule records how many times it has been used U(R) (this includes times rules are used but never fired), the probabilities of D can be updated at each step using the following formula:</p>
<p>D&#8217; = forall(R): P&#8217;(R) = if (U(R) < #(R in D) * alpha) ? P(R) * (#(R in D) + 1 - U(R)) * beta : P(R)</p>
<p>This states that if a rule's uses are fewer than the number of rules in the distribution * alpha (a coefficient), then its probability of being selected becomes the probability * (the number of rules in the distribution + 1 - the number of uses) * beta (another coefficient). Otherwise the probability is left unchanged.</p>
<p>This would probably work well enough with alpha and beta both set to 1, but some modifications may be needed to properly test each little used rule.</p>
<p>An example distribution:<br />
Rule A: 0.4, U = 5    =>    0.4<br />
Rule B: 0.17, U = 0    =>    0.17 * 5 = 0.85<br />
Rule C: 0.26, U = 1    =>    0.26 * 4 = 1.04<br />
Rule D: 0.17, U = 0    =>    0.17 * 5 = 0.85</p>
<p>Normalised:<br />
Rule A: 0.13<br />
Rule B: 0.27<br />
Rule C: 0.33<br />
Rule D: 0.27</p>
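<p>The formula and the worked example above can be reproduced with a few lines of Python. This is just a sketch: the function name <code>boost_rarely_used</code> and the dictionary layout are made up for illustration, and the normalisation step is folded into the same function.</p>

```python
from typing import Dict


def boost_rarely_used(
    probabilities: Dict[str, float],
    uses: Dict[str, int],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Dict[str, float]:
    """Boost rules used fewer than (number of rules * alpha) times,
    then normalise so the probabilities sum to 1."""
    n = len(probabilities)
    boosted = {}
    for rule, p in probabilities.items():
        u = uses[rule]
        if u < n * alpha:
            boosted[rule] = p * (n + 1 - u) * beta
        else:
            boosted[rule] = p  # used often enough: leave unchanged
    total = sum(boosted.values())
    return {rule: weight / total for rule, weight in boosted.items()}


probs = {"A": 0.4, "B": 0.17, "C": 0.26, "D": 0.17}
used = {"A": 5, "B": 0, "C": 1, "D": 0}
normalised = boost_rarely_used(probs, used)
print({rule: round(p, 2) for rule, p in normalised.items()})
```

<p>With alpha and beta both 1, the rounded output matches the normalised distribution listed above.</p>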
<p>This system attempts to maintain the probabilities of the original distribution but offsets them by the number of uses, which will eventually decrease to the threshold, resulting in a fair and stable distribution again.</p>
<p>I suppose I could ignore the existing probabilities and just set the distribution to reflect the inverse of each rule's uses (until all thresholds are met), but this could result in unfair balancing.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/08/03/phd-progress-rarely-used-rule-selection-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Performance Decrease</title>
		<link>http://super-sanity.com/2010/07/21/phd-progress-performance-decrease/</link>
		<comments>http://super-sanity.com/2010/07/21/phd-progress-performance-decrease/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 21:25:55 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=239</guid>
		<description><![CDATA[Seems the performance of Ms. PacMan has decreased. This is possibly due to the &#8216;broken&#8217; mutation operators in use at the moment. Because the unification process was modified such that general rules now contain ranges, the mutation process no longer mutates these ranges, as it currently lacks the capability. Therefore, PacMan achieves less reward. A [...]]]></description>
			<content:encoded><![CDATA[<p>Seems the performance of Ms. PacMan has decreased. This is possibly due to the &#8216;broken&#8217; mutation operators in use at the moment. Because the unification process was modified such that general rules now contain ranges, the mutation process no longer mutates these ranges, as it currently lacks the capability. Therefore, PacMan achieves less reward.</p>
<p>A Ms. PacMan experiment has been running recently (40 hours in, though half of that was just testing) and has produced the following results so far (about 62% completion):<br />
2064.0<br />
3259.2<br />
3441.8<br />
3574.0334<br />
3836.2334<br />
3582.9666<br />
3420.2<br />
3421.8333<br />
3715.6667<br />
3890.8667<br />
3555.1333<br />
3555.1667<br />
3542.5<br />
3654.1667<br />
3400.3667<br />
3484.4<br />
3443.6667<br />
3809.3333<br />
3616.8667<br />
3061.9<br />
2986.3<br />
3770.3667<br />
3762.3667<br />
3142.9333<br />
3247.0334<br />
3266.5<br />
4481.4<br />
4112.433<br />
4163.967<br />
3327.4333<br />
3934.3<br />
4198.533</p>
<p>This is significantly worse than the previous experiment over the regular environment. While there is some initial improvement in the rules, the score doesn&#8217;t appear to be increasing very fast, and the last few results could simply have been flukes.</p>
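<p>One rough way to check whether the late results are more than noise is to smooth the scores with a moving average. The sketch below assumes the results above are listed in the order they were recorded. The window of 5 is an arbitrary choice and <code>moving_average</code> is a hypothetical helper:</p>

```python
# Scores copied from the list above, assumed to be in recorded order.
scores = [
    2064.0, 3259.2, 3441.8, 3574.0334, 3836.2334, 3582.9666, 3420.2,
    3421.8333, 3715.6667, 3890.8667, 3555.1333, 3555.1667, 3542.5,
    3654.1667, 3400.3667, 3484.4, 3443.6667, 3809.3333, 3616.8667,
    3061.9, 2986.3, 3770.3667, 3762.3667, 3142.9333, 3247.0334, 3266.5,
    4481.4, 4112.433, 4163.967, 3327.4333, 3934.3, 4198.533,
]


def moving_average(values, window):
    """Mean of each consecutive run of `window` values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


window = 5  # arbitrary smoothing width, chosen for illustration
smoothed = moving_average(scores, window)
for start, avg in enumerate(smoothed, start=1):
    print(f"results {start}-{start + window - 1}: {avg:.1f}")
```

<p>If the smoothed values at the end stay above the earlier plateau, the improvement is less likely to be a fluke, though with so few results this is only a rough guide.</p>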
<p>The readable generator is:<br />
A typical policy:<br />
(distanceDot player ?X ?__Num3&#038;:(betweenRange ?__Num3 0.0 52.0)) (pacman player) => (toDot ?X ?__Num3)<br />
(distanceGhost player ?X ?__Num7&#038;:(betweenRange ?__Num7 0.0 52.0)) (pacman player) => (fromGhost ?X ?__Num7)<br />
(distancePowerDot player ?X ?__Num1&#038;:(betweenRange ?__Num1 0.0 50.0)) (pacman player) => (toPowerDot ?X ?__Num1)<br />
(distanceGhostCentre player ?X ?__Num5&#038;:(betweenRange ?__Num5 0.0 51.0)) (pacman player) => (fromGhostCentre ?X ?__Num5)<br />
(junctionSafety ?X ?__Num4&#038;:(betweenRange ?__Num4 -16.0 28.0)) => (toJunction ?X ?__Num4)<br />
(distancePowerDot player ?X ?__Num2&#038;:(betweenRange ?__Num2 0.0 50.0)) (pacman player) => (fromPowerDot ?X ?__Num2)<br />
(distanceGhostCentre player ?X ?__Num8&#038;:(betweenRange ?__Num8 0.0 51.0)) (pacman player) => (toGhostCentre ?X ?__Num8)<br />
(distanceGhost player ?X ?__Num6&#038;:(betweenRange ?__Num6 0.0 52.0)) (pacman player) => (toGhost ?X ?__Num6)<br />
(distanceFruit player fruit ?__Num9&#038;:(betweenRange ?__Num9 0.0 52.0)) (pacman player) => (toFruit fruit ?__Num9)</p>
<p>Obviously, toDot behaviour is best, followed by fromGhost behaviour. Normally, I&#8217;d prefer fromGhost to be the highest weighted, but because it only contains one (or two) all-encompassing rules, it would mean the agent spends most of its time cowering in the corner. ToPowerDot is above fromPowerDot, generally a good choice, and toGhost and toGhostCentre are both weighted low. Strangely, toFruit is quite low, possibly because the fruit doesn&#8217;t turn up very often. You know, the fruit may be making all the difference between the scores from this run and the previous one.</p>
<p>On the bright side, the faster optimisation seems to be working, with this run due to be completed in about 24 hours, making a total time of 63 hours.</p>
<p>I need to fix this range mutation, as well as the mutation towards useful to/fromGhost rules that include the binary attributes edible/aggressive. Furthermore, the modularisation for learning clear still hasn&#8217;t been 100% solved, as sub-optimal rules are being chosen as the best.</p>
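<p>As a purely hypothetical illustration of what a range mutation could look like (not the mutation operator the agent actually uses), the sketch below splits the <code>betweenRange</code> condition of a rule into two halves, producing two more specific rules. The function name <code>split_range</code> and the halving strategy are assumptions, and the rule text is written with a literal <code>&amp;</code> rather than the escaped form shown above:</p>

```python
import re

# Matches a numeric range test such as "(betweenRange ?__Num7 0.0 52.0)".
RANGE = re.compile(
    r"\(betweenRange (\?\w+) (-?\d+(?:\.\d+)?) (-?\d+(?:\.\d+)?)\)"
)


def split_range(rule):
    """Return two specialisations of `rule`, one per half of its range."""
    match = RANGE.search(rule)
    if match is None:
        return []  # nothing to mutate
    var = match.group(1)
    low, high = float(match.group(2)), float(match.group(3))
    mid = (low + high) / 2
    prefix, suffix = rule[:match.start()], rule[match.end():]
    return [
        f"{prefix}(betweenRange {var} {a} {b}){suffix}"
        for a, b in ((low, mid), (mid, high))
    ]


rule = ("(distanceGhost player ?X ?__Num7&:(betweenRange ?__Num7 0.0 52.0)) "
        "(pacman player) => (fromGhost ?X ?__Num7)")
mutants = split_range(rule)
for mutant in mutants:
    print(mutant)
```

<p>Splitting a range this way would give the agent separate near and far versions of an all-encompassing rule, which could then be weighted independently.</p>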
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/07/21/phd-progress-performance-decrease/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
