<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Super Sanity</title>
	<atom:link href="http://super-sanity.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://super-sanity.com</link>
	<description>There is no insanity, rather a super sanity</description>
	<lastBuildDate>Tue, 09 Mar 2010 00:37:00 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>PhD Progress: The future&#8230;</title>
		<link>http://super-sanity.com/2010/03/09/phd-progress-the-future/</link>
		<comments>http://super-sanity.com/2010/03/09/phd-progress-the-future/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 00:37:00 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=208</guid>
		<description><![CDATA[Progress is going well on the new system, and it appears to be performing well. I just today finished up the mutations and will be moving onto modularisation, which is likely to cause problems. Still, once I have that resolved, I&#8217;ll need to move onto other directions.
Sure, there&#8217;s all the experiments I can run, like [...]]]></description>
			<content:encoded><![CDATA[<p>Progress is going well on the new system, and it appears to be performing well. I just today finished up the mutations and will be moving onto modularisation, which is likely to cause problems. Still, once I have that resolved, I&#8217;ll need to move onto other directions.</p>
<p>Sure, there&#8217;s all the experiments I can run, like observing the effect of population size/elite percentage, and there&#8217;s getting Ms. PacMan and StarCraft running, but those will likely only take up 2010, or less. So the problem now lies in what I can do in the future.</p>
<p>One area is to try to remove the &#8216;bonuses&#8217; the agent is given for its learning (valid actions, initial pre-goal, known goal state). Each of these can safely be removed, but it makes things much more difficult for the agent, as in all cases it has to learn the dynamics of the environment by itself, probably with an alternative learner (TG or RIBS, etc).</p>
<p>Another area is to try and &#8216;commercialise&#8217; the agent. In the sense that WEKA is a machine learning suite, the agent and environment could be loaded from GUI form and run, allowing the user to specify the necessary information (and be allowed the option of leaving out some things, as mentioned above). This could also tie into teacher/student learning, allowing the user to interrupt the agent and teach it what to do at a given point.</p>
<p>If I was ever to get stuck, I&#8217;m sure I could just look back at my posts from earlier in the research to attempt to solve some of the earlier dreams. It would be really cool if I could get a robot to play around with.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/03/09/phd-progress-the-future/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Mutation working</title>
		<link>http://super-sanity.com/2010/03/08/phd-progress-mutation-working/</link>
		<comments>http://super-sanity.com/2010/03/08/phd-progress-mutation-working/#comments</comments>
		<pubDate>Sun, 07 Mar 2010 23:58:25 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=207</guid>
		<description><![CDATA[It appears that single step mutation is fully operational. There do appear to be some bugs here and there, but I should be able to find them eventually.
Single step mutation is where we only have one batch of mutants, so rules aren&#8217;t continually mutated. Further steps could result from mutating extra rules every CE update [...]]]></description>
			<content:encoded><![CDATA[<p>It appears that single step mutation is fully operational. There do appear to be some bugs here and there, but I should be able to find them eventually.</p>
<p>Single step mutation is where we only have one batch of mutants, so rules aren&#8217;t continually mutated. Further steps could result from mutating extra rules every CE update iteration, where we sample the distribution N times (where N is equal to the number of rules in the slot).</p>
<p>A further problem is re-covering where it isn&#8217;t necessary. This could be fixed by appending the general rules (in sampled order) at the end of every policy, as default go-to rules should the more specific ones fail.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/03/08/phd-progress-mutation-working/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Differing pre-goal actions</title>
		<link>http://super-sanity.com/2010/02/26/phd-progress-differing-pre-goal-actions/</link>
		<comments>http://super-sanity.com/2010/02/26/phd-progress-differing-pre-goal-actions/#comments</comments>
		<pubDate>Fri, 26 Feb 2010 00:07:36 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=206</guid>
		<description><![CDATA[Working on forming a general description of the pre-goal state and it is raising issues. In Blocks World, it is all fine and good, because for each of the 3 main goals, the final action is the same, therefore there are no issues in generalising the pre-goal to the action (simply swap all variables for [...]]]></description>
			<content:encoded><![CDATA[<p>Working on forming a general description of the pre-goal state and it is raising issues. In Blocks World, it is all fine and good, because for each of the 3 main goals, the final action is the same, therefore there are no issues in generalising the pre-goal to the action (simply swap all variables for those present in the action). Constants are another issue, but they shouldn&#8217;t be conflicting &#8211; they are simply a matter of holding onto constants in the pre-goal unless forced to generalise.</p>
<p>The problem lies in the clear(a) case. In the clear(a) pre-goal, the state is: on(X,a), clear(X), clear(Y), etc. However, the problem can be solved by either move(X,Y) or moveFloor(X). The fact that X happens to be the same block in either action is simply a consequence of how the actions are defined.</p>
<p>The only way that I can see this being solved is to have separate pre-goals for each action predicate. Hmm, this actually makes sense anyway, as each action will need to be mutated towards its particular pre-goal (if it exists). So this also restricts the mutation operator by only allowing mutations on pre-goals that exist for that action. Unfortunately, restricting mutation could be a problem because while an action may be ideal for the final step, the intermediate steps may require other actions.</p>
<p>I have a feeling this is where modularisation fits in well. While Blocks World is just a toy example, it may do for now. The onAB problem has opportunities for modularisation, as well as the &#8216;final step&#8217; rule which states &#8216;if a and b are clear, move a to b&#8217;. Dropping into modularisation for clear a, the pre-goal for this state will essentially look the same for either action, but will behave differently.</p>
<p>So, clear A. The general rules for each action are:<br />
(move X Y) &lt;= (clear X) (clear Y)<br />
(moveFloor X) &lt;= (clear X) (on X ?) (above X ?)<br />
Each of these needs mutations to become an optimal rule for clear A:<br />
(move X Y) &lt;= (clear X) (clear Y) (above X a)<br />
(moveFloor X) &lt;= (clear X) (on X ?) (above X a)</p>
<p>Apart from the move case, which isn't always guaranteed, these should always work. And their pre-goal states will lead towards these mutations, as well. Going back to onAB, the last action will <em>always</em> be move, so the moveFloor will simply remain in its general format (and likely be ignored through iterations).</p>
<p>So, the slightly modified strategy now uses multiple pre-goals for actions.</p>
<p>However, looking at the Ms PacMan case, things may be more difficult. Assuming there are only 2 actions (moveTowards and moveFrom), the pre-goals for each of these will likely be in an extremely general form. Typically, the final action will be moving towards a dot, but Ms PacMan may simply happen across the final dot while running from a ghost, or moving towards a safe junction. Each case will look something like this:<br />
moveTowards(X) &lt;= dot(X)<br />
Sp: dot(X), ghost(?), junction(?), thing(X)</p>
<p>moveTowards(X) &lt;= junction(X)<br />
Sp: dot(?), ghost(?), junction(X), thing(X)</p>
<p>Union: Sp: dot(?), ghost(?), junction(?), thing(X)<br />
Which is rather useless. Perhaps the modified Ms PacMan world (still in progress) will be more helpful. Or perhaps the actions need to be dropped down a level to moveToDot(X), moveFromGhost(X)&#8230;</p>
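<p>A minimal Python sketch of the union above, under some loud assumptions: each pre-goal state is a dictionary with at most one fact per predicate, differing terms collapse to the anonymous variable, and a predicate seen in only one state is kept unchanged. The name <code>unify_pre_goals</code> and the data layout are hypothetical, not the actual implementation.</p>

```python
ANON = "?"


def unify_pre_goals(first, second):
    """Unify two pre-goal states given as {predicate: args} dicts.

    Assumes at most one fact per predicate. Terms that differ between the
    two states become anonymous; a predicate seen in only one state is kept
    unchanged (an assumption about the intended behaviour).
    """
    unified = {}
    for pred in sorted(set(first) | set(second)):
        if pred in first and pred in second:
            pairs = zip(first[pred], second[pred])
            unified[pred] = tuple(x if x == y else ANON for x, y in pairs)
        else:
            unified[pred] = first.get(pred, second.get(pred))
    return unified


# The two Ms PacMan pre-goal states from the post.
towards_dot = {
    "dot": ("X",), "ghost": (ANON,), "junction": (ANON,), "thing": ("X",),
}
towards_junction = {
    "dot": (ANON,), "ghost": (ANON,), "junction": ("X",), "thing": ("X",),
}
print(unify_pre_goals(towards_dot, towards_junction))
```

<p>With these inputs only thing(X) keeps its variable, which is the same rather useless union written out by hand above.</p>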
<p>I think I&#8217;ll continue trying to solve the Blocks World as currently defined before trying to tackle Ms PacMan/StarCraft.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/02/26/phd-progress-differing-pre-goal-actions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Covering from General to Specific, the Pseudo-Algorithm</title>
		<link>http://super-sanity.com/2010/01/28/phd-progress-covering-from-general-to-specific-the-pseudo-algorithm/</link>
		<comments>http://super-sanity.com/2010/01/28/phd-progress-covering-from-general-to-specific-the-pseudo-algorithm/#comments</comments>
		<pubDate>Wed, 27 Jan 2010 23:38:30 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=205</guid>
		<description><![CDATA[An algorithm of sorts describing the covering (and mutating) process.

Agent is at (S, A(S)) with no rules firing in its policy.
Cover rules R:C->A for each A in A(S). Some or all of the rules may be at maximum generality (this is known).
At every state, attempt to further generalise the rules not at maximum generality until [...]]]></description>
			<content:encoded><![CDATA[<p>An algorithm of sorts describing the covering (and mutating) process.</p>
<ol>
<li>Agent is at (S, A(S)) with no rules firing in its policy.</li>
<li>Cover rules R:C->A for each A in A(S). Some or all of the rules may be at maximum generality (this is known).</li>
<li>At every state, attempt to further generalise the rules not at maximum generality until all rules are at maximum generality.</li>
<li>For every rule that reaches maximum generality, create mutated specialisations of the rule using the pre-goal state (<strong>one complete trace is given to the agent at the start of the experiment as a teacher</strong>).</li>
<li>Continue running the agent through the environment (episodes), noting down each pre-goal state and using it to form a general description of the minimal pre-goal state. Also be sure to note the agent&#8217;s final action taken.</li>
<li>Once the pre-goal minimal state is known/settled (no changes to the minimal state for X iterations), mutate further specialisations of the maximally general rules, removing existing mutations (from the first pre-goal) that aren&#8217;t possible mutations for the minimal pre-goal state.</li>
<li>For the action, if the action taken by the agent is always the same (same predicate and constant terms), then create a rule which matches the relevant minimal pre-goal facts for the action (e.g. if the action = on(a,b), then the conditions for the rule are all predicates concerning &#8216;a&#8217; and &#8216;b&#8217;). This rule should be perfect, so fix it in a slot.</li>
<li>If the action isn&#8217;t the same, but is of the same predicate, then it is clearly a general action concerning particular predicates. If the action is general, then clearly the pre-goal states are general and both need to be generalised. (See further notes.)</li>
</ol>
<p>Regarding unifying pre-goal states into a minimal state: When the states are stored and unified, the covering unifier need only note constants present in the goal. So for the onAB case, we only need to check they&#8217;re under the same predicates in the pre-goal (clear). If they&#8217;re not (some other case), well, I&#8217;m not sure. Generalise them?</p>
<p>In the other case, where goals have no constants, the pre-goal states are stored as a bunch of anonymous variables, and a few variables. The variables are the terms used in the action (where the constants are swapped for variables). The rest of the terms in the state are swapped for anonymous variables.</p>
<p><strong>Question!</strong> Is it necessary to store groups (i.e. the number) of facts which all have the same fact and all use anonymous variables? Probably not. It would be best not to store these groups as the rules developed need to be general and not to care about whether there are 5 anonymous blocks clear when there are only four. It probably wouldn&#8217;t matter anyway, as the rules created will likely disregard these anonymous variables anyway and only concern themselves with the variables or constants.</p>
<p>So basically, when a pre-goal state is stored, every term in the state is generalised to an anonymous variable except terms used in the final action and terms in the goal. However, during unification, if these non-anonymous terms don&#8217;t unify, they become anonymous (unless the cause of non-unification is that the fact predicate doesn&#8217;t exist in one pre-goal but does in another).</p>
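<p>The storing step can be sketched in Python as follows. This is only an illustration under assumptions: facts are (predicate, args) tuples, the anonymous variable is written &#8216;?&#8217;, and the name <code>generalise_pre_goal</code> and the example state are hypothetical. Using a set means groups of identical all-anonymous facts are not counted, in line with the answer to the question above.</p>

```python
def generalise_pre_goal(state, action_args, goal_constants):
    """Generalise a pre-goal state before storing it.

    state: iterable of (predicate, args) facts.
    action_args: terms of the final action, swapped for variables.
    goal_constants: constants named in the goal, kept as they are.
    Every other term becomes the anonymous variable '?'.
    """
    names = iter("XYZWVU")  # assumes small action arities
    mapping = {constant: constant for constant in goal_constants}
    for term in action_args:
        if term not in mapping:
            mapping[term] = next(names)
    # A set drops duplicate all-anonymous facts rather than counting them.
    return {
        (pred, tuple(mapping.get(arg, "?") for arg in args))
        for pred, args in state
    }


# Hypothetical pre-goal state for clear(a), with final action move(c, d).
state = [
    ("on", ("c", "a")), ("clear", ("c",)), ("clear", ("d",)),
    ("on", ("d", "e")), ("clear", ("f",)), ("clear", ("g",)),
]
print(sorted(generalise_pre_goal(state, ("c", "d"), {"a"})))
```

<p>Here clear(f) and clear(g) collapse into a single clear(?) fact, while on(c,a) becomes on(X,a) because a is a goal constant.</p>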
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/01/28/phd-progress-covering-from-general-to-specific-the-pseudo-algorithm/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Covering Algorithm Refinement</title>
		<link>http://super-sanity.com/2010/01/25/phd-progress-covering-algorithm-refinement/</link>
		<comments>http://super-sanity.com/2010/01/25/phd-progress-covering-algorithm-refinement/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 00:05:17 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=204</guid>
		<description><![CDATA[It seems like I may need modularisation to be introduced now. The specialisation step of the rule-generating covering algorithm requires more than just random mutations of states, as there are too many states to possibly cover from.
The covering algorithm will generate (in one-to-three covering steps) on(X,_) &#038; clear(X) -> moveFloor(X) and clear(X) &#038; clear(Y) -> [...]]]></description>
			<content:encoded><![CDATA[<p>It seems like I may need modularisation to be introduced now. The specialisation step of the rule-generating covering algorithm requires more than just random mutations of states, as there are too many states to possibly cover from.</p>
<p>The covering algorithm will generate (in one-to-three covering steps) <code>on(X,_) &#038; clear(X) -> moveFloor(X)</code> and <code>clear(X) &#038; clear(Y) -> move(X,Y)</code>.</p>
<p>An example of the algorithm working: unstack goal. Note that the moveFloor rule is optimal for unstack, so no specialisation is necessary (it could also be highest(X), but either rule works fine).</p>
<p>The stack goal is slightly more tricky, in that it requires Y in the move action to also be highest(Y). This can be achieved by noting the pre-goal state. While the agent may not recognise the exact predicate needed, it will generate three specialisations: highest(Y), on(Y,_), above(Y,_).</p>
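<p>A rough Python sketch of generating those three specialisations, assuming rules and pre-goal facts are lists of (predicate, args) tuples already expressed over the rule&#8217;s variables. The function name and the pre-goal facts are hypothetical, made up to match the stack example.</p>

```python
def specialise_from_pre_goal(rule_conditions, pre_goal_facts):
    """Return one specialised copy of the rule per unused pre-goal fact.

    Facts are (predicate, args) tuples; '_' marks a don't-care term.
    """
    mutants = []
    for fact in pre_goal_facts:
        if fact not in rule_conditions:
            mutants.append(list(rule_conditions) + [fact])
    return mutants


# General move rule: clear(X) & clear(Y) -> move(X,Y)
move_rule = [("clear", ("X",)), ("clear", ("Y",))]
# Hypothetical pre-goal facts observed just before the stack goal.
pre_goal = [
    ("clear", ("X",)), ("clear", ("Y",)),
    ("highest", ("Y",)), ("on", ("Y", "_")), ("above", ("Y", "_")),
]
for mutant in specialise_from_pre_goal(move_rule, pre_goal):
    print(mutant)
```

<p>Each mutant adds exactly one condition, so the agent does not need to know in advance that highest(Y) is the one that matters.</p>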
<p>A much harder goal is onAB, which introduces constants as specialisations. Ideally, we want the specialisations to swap X for a and Y for b in the move rule. However, this could take more than one specialisation step (probably 2). Even then, once we have reached this point, the policy will not be optimal. Though it will eventually solve the problem, it will do so by unstacking all or most blocks first with the general moveFloor rule (assuming the general moveFloor rule is in the policy).</p>
<p>Here is where modularisation comes in. Modular rules allow conditions to be achieved over constants. achieve(clear(a)) achieves the goal of clearing a. So when specialisations generate constant preconditions, we need to arc off into learning to achieve that module and then come back, armed with that knowledge.</p>
<p>A problem with modules is that two modules may conflict (i.e. achieve(clear(a)) and achieve(clear(b))). For example, while clearing a, we may cover b. Modules need to be able to be combined to achieve both at an optimal rate.</p>
<p>The optimal clearA goal rule is: above(X,a) &#038; clear(X) -> moveFloor(X). While in this case, the moveFloor(X) will always let X remain clear, making conflicts impossible, in other cases this may not be true. Let&#8217;s say an alternative optimal clearing rule is: above(X,a) &#038; clear(X) &#038; clear(Y) -> move(X,Y). This could cause clashes if Y was say, b.</p>
<p>A simple combination of the two would be to join the rules together in a policy. This still has the possibility of messing up though. Because the clear rule can be defined both by move and moveFloor, I can only hope that both rules are present in the final fixation of the goal policy.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/01/25/phd-progress-covering-algorithm-refinement/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Old People are Rude</title>
		<link>http://super-sanity.com/2010/01/24/old-people-are-rude/</link>
		<comments>http://super-sanity.com/2010/01/24/old-people-are-rude/#comments</comments>
		<pubDate>Sat, 23 Jan 2010 23:55:23 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[Mind of Me]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=203</guid>
		<description><![CDATA[Seems the old people were out in full force today as I went down to my local shopping block (they must be shopping after church). And from today, coupled with experience from other harrowing encounters, I have surmised that in general, old people are quite rude.
My first encounter with them was when I was driving [...]]]></description>
			<content:encoded><![CDATA[<p>Seems the old people were out in full force today as I went down to my local shopping block (they must be shopping after church). And from today, coupled with experience from other harrowing encounters, I have surmised that in general, old people are quite rude.</p>
<p>My first encounter with them was when I was driving about the Warehouse carpark. Generally in a carpark, pedestrians stay out of the middle of the driving lanes, where cars go, but old people seem to completely disregard this fact, ambling towards me as I cruised about for a park. The normal reaction for a pedestrian to perform when a car is coming towards them is to get out of the way. But this old couple seemed to be completely oblivious to my car&#8217;s presence. Something had grabbed the old dude&#8217;s attention off to the left and he was fixated on it, ignoring the fact he was cutting off half of the lane with his aging body.</p>
<p>I figured he was just a stupid old person, the only one I should encounter today. Unfortunately, the carpark was fuller than usual, increasing the proportion of mindless old people. Cursing at him, I parked my car and went into the Warehouse for my supplies.</p>
<p>I should have noticed the number of wrinkled old people entering and exiting the store with me, but it wasn&#8217;t until I got into line that my frustration rose. Normally I make a point of avoiding lines in which people are trying to purchase clothing, as they tend to take a while. But I didn&#8217;t notice the short elderly lady with some nasty looking shirts in front of me in what I thought was a short and quick line. Something in their withered minds must make them regard time as so much less precious. Perhaps it&#8217;s because they&#8217;ve seen so much, or perhaps it&#8217;s because everything took longer &#8216;in their day.&#8217; But something about old people and buying clothes causes them to negotiate and haggle over fixed store prices because &#8216;something&#8217;s missing&#8217;, or &#8216;there&#8217;s a bright tag on that there tunic.&#8217;</p>
<p>This particular old lady was fussed because she had two identical cyan shirts, but one was missing the singlet so she wanted a price reduction. I swear these people purposely search for these sorts of things; missing singlets, broken labels, etc. During the whole ordeal, she briefly turns back to me and utters &#8220;sorry.&#8221; Fuck you, you hag. You ain&#8217;t sorry. You&#8217;re happy you&#8217;ve managed to keep the poor checkout operator busy for as long as you did. Perhaps the crone craves the interaction, the feeling of power over the store workers, knowing that as the consumer, she&#8217;s always right. &#8220;I demand respect because I have lived longer!&#8221; Well that doesn&#8217;t mean you can disrespect everyone else.</p>
<p>Getting through my transaction at a fraction of the time, I zipped away to the other store, thankfully free of anyone. But the next one, a veggie shop, was unfortunately quite busy. I grabbed my goods in short order and proceeded to wait in line. But who should upset the sanctity of the line? None other than some granny with a trolley who happened to be finished with her shop and decided that she&#8217;ll be next to be served. She did all this while walking past the line, glancing at the befuddled group of Asians at the head of the line who were unsure of what they should do. What&#8217;s worse, at the end of her transaction, when she was presented with the total, she almost sounded like she was going to debate the cost. &#8220;$18.70?!&#8221; Don&#8217;t you fucking dare dispute that, or I&#8217;ll destroy you. After realising they had been wronged, the Asians moved closer to the counter, ensuring they were next.</p>
<p>As the transactions were processed, more people started to line up. Well, lines up, as there was more than one line forming, thanks to other old people starting their own line just behind the Asians. As I was originally behind them, I was right annoyed at these fuckers, but thankfully the checkout server was aware I was next and motioned me over. But the first old person in the impromptu &#8216;line&#8217; was not happy with this and made a point of standing right next to me as I went through the transaction, fixing me with an evil glare as though she had been wronged.</p>
<p>At the end of all this, I figure this: sure young people can be insolent and rude, but old people are manipulatively rude. They do things which are flat out crimes against social law, but act as though they are above it. Perhaps they simply forget social laws, or perhaps they are envious and want to piss off those who still have full control of their bladder.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/01/24/old-people-are-rude/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Maximal covering complete</title>
		<link>http://super-sanity.com/2010/01/22/phd-progress-maximal-covering-complete/</link>
		<comments>http://super-sanity.com/2010/01/22/phd-progress-maximal-covering-complete/#comments</comments>
		<pubDate>Fri, 22 Jan 2010 03:43:37 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=202</guid>
		<description><![CDATA[I have finished the code for covering a state when no rules fire. It appears to be quite stable and usually finds the most general rules. However, this can also be a problem, because the most general conditions for an action (i.e. the basic action preconditions) may not be the best rules for [...]]]></description>
			<content:encoded><![CDATA[<p>I have finished the code for covering a state when no rules fire. It appears to be quite stable and usually finds the most general rules. However, this can also be a problem, because the most general conditions for an action (i.e. the basic action preconditions) may not be the best rules for the job (for example stack requires highest(X), which goes beyond the basic condition of clear(X)).</p>
<p>Not only is there a problem in generality, but also in goal constants (onAB). The onAB problem requires that the constant terms a and b are present in the rules to have an optimal policy and as of yet, the algorithm doesn&#8217;t have these. I feel like I&#8217;ve addressed this problem before but I can&#8217;t find where.</p>
<p>A possibility for heuristic specialisation is to note the pre-goal achieved state and perform a direct specialisation, or perhaps a number of specialisations, on the rule (creating a bunch of mutations). The rule can then be removed, as at least one of the children should be on the right track towards an optimal rule.</p>
<p>Another issue is that of covering which does not find the most general rule. For instance, covering would sometimes create rules which included the onFloor(X) condition (tied to clear(X)) simply because the actions it used to cover all dealt with blocks on the floor. The only solution to this I can currently think of is finding the actions proposed by the rule and crossing them with the valid actions of the same type, and ensuring they match. If the valid actions have more actions than the rule predicts, then the rule is not general enough. But this could undo the progress made with the above paragraph&#8217;s specialisation mutation.</p>
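<p>The cross-check against the valid actions might look something like this in Python. It is a sketch under assumptions: actions are (predicate, args) tuples, the rule&#8217;s proposed actions have already been computed for the current state, and the name <code>covers_all_valid_actions</code> is hypothetical.</p>

```python
def covers_all_valid_actions(proposed_actions, valid_actions, action_predicate):
    """Check a rule against the valid actions of its own action type.

    Actions are (predicate, args) tuples. The rule is treated as general
    enough only if it proposes every valid action with that predicate.
    """
    valid_same_type = {a for a in valid_actions if a[0] == action_predicate}
    return valid_same_type <= set(proposed_actions)


# Hypothetical state in which both a and b can be moved to the floor.
valid = {("moveFloor", ("a",)), ("moveFloor", ("b",)), ("move", ("a", "b"))}
# A rule with a spurious extra condition that only fires for block a.
too_specific = {("moveFloor", ("a",))}
general = {("moveFloor", ("a",)), ("moveFloor", ("b",))}

print(covers_all_valid_actions(too_specific, valid, "moveFloor"))  # False
print(covers_all_valid_actions(general, valid, "moveFloor"))  # True
```

<p>A single state can only show that a rule is too specific; passing the check in one state does not prove the rule is maximally general everywhere.</p>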
<p>Perhaps during covering creation the unification needs to occur over ALL rules, not just until it senses no change. This still has a small chance of creating onFloor(X) rules. Maybe mark which rules have attained maximum generality and which haven&#8217;t and for every rule that hasn&#8217;t attained maximum generality, cover it until it does (or has seen enough states, as it may be impossible to attain maximum generality). Using this marking system, rules which are mutants of maximally general rules can avoid being re-generalised.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/01/22/phd-progress-maximal-covering-complete/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Covering non-necessary terms</title>
		<link>http://super-sanity.com/2010/01/22/phd-progress-covering-non-necessary-terms/</link>
		<comments>http://super-sanity.com/2010/01/22/phd-progress-covering-non-necessary-terms/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 21:55:25 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=201</guid>
		<description><![CDATA[When covering an action from the state, it is likely that non-necessary terms (with regards to the action in question) will also be in the rule (such as on(X,Z), where Z isn&#8217;t part of the action). An example of such a rule is:
on(a,c) &#038; cl(a) &#038; on(b,d) &#038; cl(b) -> move(a,b)
Note that &#8216;c&#8217; and &#8216;d&#8217; [...]]]></description>
			<content:encoded><![CDATA[<p>When covering an action from the state, it is likely that non-necessary terms (with regards to the action in question) will also be in the rule (such as on(X,Z), where Z isn&#8217;t part of the action). An example of such a rule is:<br />
<code>on(a,c) &#038; cl(a) &#038; on(b,d) &#038; cl(b) -> move(a,b)</code></p>
<p>Note that &#8216;c&#8217; and &#8216;d&#8217; aren&#8217;t part of the action so when the action is inversely subsumed, these need to be put into variables of sorts. However, if they were given concrete named variables, they may cause problems when unifying later, as the variables would need to be checked to see whether they can subsume one another.</p>
<p>A possible fix for this (though perhaps too rough) is to swap these variables with special terms &#8216;_&#8217; (don&#8217;t care, again seen in FOXCS). These don&#8217;t care terms can be anything (within type constraints) and do not necessarily have to be unequal to one another: &#8216;_(1)&#8217; can equal &#8216;_(2)&#8217;. However, on creation of the final rule for the action, these &#8216;_&#8217; terms need to be concretely put into variable terms, though again without the inequals predicate bounding them from each other.</p>
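<p>As a small Python illustration of the don&#8217;t care idea, applied to the example rule above: action terms are swapped for named variables and every other term for &#8216;_&#8217;. The tuple representation and the name <code>generalise_rule</code> are assumptions made for this sketch, not the real rule classes.</p>

```python
def generalise_rule(conditions, action):
    """Swap action terms for variables and every other term for '_'.

    conditions: list of (predicate, args) tuples, e.g. ("on", ("a", "c")).
    action: a (predicate, args) tuple, e.g. ("move", ("a", "b")).
    """
    variable_names = ["X", "Y", "Z", "W"]  # enough for small action arities
    mapping = {}
    for term in action[1]:
        if term not in mapping:
            mapping[term] = variable_names[len(mapping)]

    def swap(args):
        return tuple(mapping.get(arg, "_") for arg in args)

    general_conditions = [(pred, swap(args)) for pred, args in conditions]
    general_action = (action[0], swap(action[1]))
    return general_conditions, general_action


# on(a,c) & cl(a) & on(b,d) & cl(b) -> move(a,b)
conditions = [
    ("on", ("a", "c")), ("cl", ("a",)), ("on", ("b", "d")), ("cl", ("b",)),
]
action = ("move", ("a", "b"))
print(generalise_rule(conditions, action))
```

<p>Both &#8216;c&#8217; and &#8216;d&#8217; end up as the same &#8216;_&#8217; symbol, so no inequality between them is implied.</p>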
<p>Currently this issue isn&#8217;t a big one, but this is mostly due to the small domain size and lack of background predicates being included in the observations.</p>
<p>The process of inversely substituting terms could be performed with Replacements, but I have a feeling it may proceed more smoothly by writing the actions in string form and using the existing rule parser to create the rules. Strings hold an advantage by allowing simple equality checks, omission of state terms and dealing with state terms, and special behaviour when parsing the rule (&#8216;_&#8217; term).</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/01/22/phd-progress-covering-non-necessary-terms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Valid Actions Problem</title>
		<link>http://super-sanity.com/2010/01/14/phd-progress-valid-actions-problem/</link>
		<comments>http://super-sanity.com/2010/01/14/phd-progress-valid-actions-problem/#comments</comments>
		<pubDate>Thu, 14 Jan 2010 03:23:37 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=199</guid>
		<description><![CDATA[Like the FOXCS paper, this algorithm will be using an observation containing the valid actions for the state. This is all fine and good in Blocks World, but in larger worlds like StarCraft or PacMan, this may be problematic. The algorithm would be required to compute every possible action, which is near infinite in the [...]]]></description>
			<content:encoded><![CDATA[<p>Like the FOXCS paper, this algorithm will be using an observation containing the valid actions for the state. This is all fine and good in Blocks World, but in larger worlds like StarCraft or PacMan, this may be problematic. The algorithm would be required to compute every possible action, which is near infinite in the StarCraft domain. Perhaps the action set needs to be bounded somehow.</p>
<p>I feel like I&#8217;ve touched on this problem before, but I cannot recall where or when. Ergh, it&#8217;s too hot for thinking today&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/01/14/phd-progress-valid-actions-problem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PhD Progress: Scrapping Bernoulli Distributions</title>
		<link>http://super-sanity.com/2010/01/11/phd-progress-scrapping-bernoulli-distributions/</link>
		<comments>http://super-sanity.com/2010/01/11/phd-progress-scrapping-bernoulli-distributions/#comments</comments>
		<pubDate>Sun, 10 Jan 2010 22:14:54 +0000</pubDate>
		<dc:creator>Sam</dc:creator>
				<category><![CDATA[PhD Project]]></category>

		<guid isPermaLink="false">http://super-sanity.com/?p=198</guid>
		<description><![CDATA[With the extensive changes to the system comes new ideas and methods of doing things. Because the policy will look radically different to the previous policies, it will need an alternative method of creation.
In the old CE system, the policy was of fixed size, with each slot containing a large number of random rules to [...]]]></description>
			<content:encoded><![CDATA[<p>With the extensive changes to the system comes new ideas and methods of doing things. Because the policy will look radically different to the previous policies, it will need an alternative method of creation.</p>
<p>In the old CE system, the policy was of fixed size, with each slot containing a large number of random rules to optimise. In the new system (still utilising CE), the policy is of an adaptive size, initially sized at the number of actions in the state specification. Each slot in this adaptive policy contains initially no rules, but these will be filled with rules covered from the environment, though it is likely there won&#8217;t be a large number. Also, each slot is bound by an action, so each rule within leads to the same action.</p>
<p>A problem with an adaptive policy using action-bound slots is that it has no order of rules, so deterministic policies will fail if a bad rule is at the top. A possible ordering is to arrange the rules in order of specificity, such that the most specific rules are checked first. This still may cause problems, as a general rule may never be checked, even if it is right for the job. And this fault, combined with a Bernoulli distribution, may result in useful slots being turned off.</p>
<p>The CE distribution can still be utilised for slot ordering by weighting the usefulness of a slot and creating a policy by sampling from the slot distribution in the creation of an ordered policy. Note that a slot may only occur once in a policy, so sampling is done via removal. So initially every slot (aka every action) has an equal chance of being selected for the top of the policy, but as weights are changed (through updates of firing rules), more useful slots will be placed at the top, allowing useful actions to be quickly evaluated. This strategy will still result in every slot being used, but because there is likely to be a low number of overall slots, it shouldn&#8217;t be an issue. Perhaps slots with probability < epsilon are discarded, and slots with probability > 1 &#8211; epsilon are fixed.</p>
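<p>A minimal Python sketch of sampling an ordered policy by removal, assuming each slot is represented only by its action name and a non-negative weight. The function name <code>sample_policy_order</code> and the example weights are hypothetical.</p>

```python
import random


def sample_policy_order(slot_weights, rng=None):
    """Order slots by repeated weighted sampling without replacement.

    slot_weights maps a slot (action) name to a non-negative weight.
    Each slot appears exactly once in the returned ordering.
    """
    rng = rng or random.Random()
    remaining = dict(slot_weights)
    order = []
    while remaining:
        names = list(remaining)
        weights = [remaining[name] for name in names]
        if sum(weights) <= 0:
            # All remaining weights are zero: fall back to a uniform choice.
            choice = rng.choice(names)
        else:
            choice = rng.choices(names, weights=weights, k=1)[0]
        order.append(choice)
        del remaining[choice]
    return order


# Hypothetical weights: initially equal, later skewed by CE updates.
initial = {"move": 1.0, "moveFloor": 1.0}
print(sample_policy_order(initial, random.Random(0)))
```

<p>Every slot still ends up in the policy; the weights only change how likely each one is to sit near the top.</p>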
<p>Something to note is that while every slot will be present in the policy, updates will still influence particular slots over one another. This is achieved by only looking at which slots (and rules) were used in experimentation. Otherwise nothing would be updated.</p>
<p>A future problem is how to deal with the dynamics of covering and updates. However, I will relegate this to later thought, when I&#8217;m at that stage in the code.</p>
]]></content:encoded>
			<wfw:commentRss>http://super-sanity.com/2010/01/11/phd-progress-scrapping-bernoulli-distributions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
