PhD Progress: Covering from General to Specific, the Pseudo-Algorithm
Posted by Sam on January 28th, 2010 under PhD Project • No Comments
An algorithm of sorts describing the covering (and mutating process)
- Agent is at (S, A(S)) with no rules firing in its policy.
- Cover rules R:C->A for each A in A(S). Some or all of the rules may be at maximum generality (this is known).
- At every state, attempt to further generalise the rules not at maximum generality until all rules are at maximum generality.
- For every rule that reaches maximum generality, create mutated specialisations of the rule using the pre-goal state (one complete trace is given to the agent at the start of the experiment as a teacher).
- Continue running the agent through the environment (episodes), noting down each pre-goal state and using it to form a general description of the minimal pre-goal state. Also be sure to note the agent’s final action taken.
- Once the pre-goal minimal state is known/settled (no changes to the minimal state for X iterations), mutate further specialisations of the maximally general rules, removing existing mutations (from the first pre-goal) that aren’t possible mutations for the minimal pre-foal state.
- For the action, if the action taken by the agent is always the same (same predicate and constant terms), then create a rule which matches the relevant minimal pre-goal facts for the action (e.g. if the action = on(a,b), then the conditions for the rule are all predicates concerning ‘a’ and ‘b’). This rule should be perfect, so fix it in a slot.
- If the action isn’t the same, but is of the same predicate, then it is clearly a general action concerning particular predicates. If the action is general, then clearly the pre-goal states are general and both need to be generalised. See further notes)
Regarding unifying pre-goal states into a minimal state: When the states are stored and unified, the covering unifier need only note constants present in the goal. So for the onAB case, we only need to check they’re under the same predicates in the pre-goal (clear). If they’re not (some other case), well, I’m not sure. Generalise them?
In the other case, where goals have no constants, the pre-goal states are stored as a bunch of anonymous variables, and a few variables. The variables are the terms used in the action (where the constants are swapped for variables). The rest of the terms in the state are swapped for anonymous variables.
Question! Is it necessary to store groups (i.e. the number) of facts which all have the same fact and all use anonymous variables? Probably not. It would be best not to store these groups as the rules developed need to be general and not to care about whether there are 5 anonymous blocks clear when there are only four. It probably wouldn’t matter anyway, as the rules created will likely disregard these anonymous variables anyway and only concern themselves with the variables or constants.
So basically, when a pre-goal state is stored, every term in the state is generalised to an anonymous variable except terms used in the final action and terms in the goal. However, during unification, if these non-anonymous terms don’t unify, they become anonymous (unless the cause of non-unification is that the fact predicate doesn’t exist in one pre-goal but does in another).
PhD Progress: Covering Algorithm Refinement
Posted by Sam on January 25th, 2010 under PhD Project • No Comments
It seems like I may need modularisation to be introduced now. The specialisation step of the rule-generating covering algorithm requires more than just random mutations of states, as there are too many states to possibly cover from.
The covering algorithm will generate (in one-to-three covering steps) on(X,_) & clear(X) -> moveFloor(X) and clear(X) & clear(Y) -> move(X,Y).
An example of the algorithm working: unstack goal. Note that the moveFloor rule is optimal for unstack, so no specialisation is necessary (it could also be highest(X), but either rule works fine).
The stack goal is slightly more tricky, in that it requires Y in the move action to also be highest(Y). This can be achieved by noting the pre-goal state. While the agent may not recognise the exact predicate needed, it will generate three specialisations: highest(Y), on(Y,_), above(Y,_).
A much harder goal is onAB, which introduces constants as specialisations. Ideally, we want the specialisations to swap X for a and Y for b in the move rule. However, this could take more than one specialisation step (probably 2). Even then, once we have reached this point, the policy will not be optimal. Though it will evetually solve the problem, it will do so by unstacking all or most blocks first with the general moveFloor rule (assuming the general moveFloor rule is in the policy).
Here is where modularisation comes in. Modular rules allow conditions to be achieved over constants. achieve(clear(a)) achieves the goal of clearing a. So when specialisations generate constant preconditions, we need to arc off into learning to achieve that module and then come back, armed with that knowledge.
A problem with modules is that two modules may conflict (i.e. achieve(clear(a)) and achieve(clear(b))). For example, while clearing a, we may cover b. Modules need to be able to be combined to achieve both at an optimal rate.
The optimal clearA goal rule is: above(X,a) & clear(X) -> moveFloor(X). While in this case, the moveFloor(X) will always let X remain clear, making conflicts impossible, in other cases this may not be true. Let’s say an alternative optimal clearing rule is: above(X,a) & clear(X) & clear(Y) -> move(X,Y). This could cause clashes if Y was say, b.
A simple combination of the two would be to join the rules together in a policy. This still has the possibility of messing up though. Because the clear rule can be defined both by move and moveFloor, I can only hope that both rules are present in the final fixation of the goal policy.
Old People are Rude
Posted by Sam on January 24th, 2010 under PhD Project • 1 Comment
Seems the old people were out in full force today as I went down to my local shopping block (they must be shopping after church). And from today, coupled with experience from other harrowing encounters, I have surmised that in general, old people are quite rude.
My first encounter with them was when I was driving about the Warehouse carpark. Generally in a carpark, pedestrians stay out of the middle of the driving lanes, where cars go, but old people seem to completely disregard this fact, ambling towards me as I cruised about for a park. The normal reaction for a pedestrian to perform when a car is coming towards them is to get out of the way. But this old couple seemed to be completely oblivious towards my car’s presence. Something had grabbed the old dude’s attention of to the left and he was fixated on it, ignoring the fact he was cutting off half of the lane with his aging body.
I figured he was just a stupid old person, the only one I should encounter today. Unfortunately, the carpark was fuller than usual, increasing the proportion of mindless old people. Cursing at him, I parked my car and went into the Warehouse for my supplies.
I should have noticed the number of wrinkled old people entering and exiting the store with me, but it wasn’t until I got into line that my frustration rose. Normally I make a point of avoiding lines in which people are trying to purchase clothing, as they tend to take a while. But I didn’t notice the short elderly lady with some nasty looking shirts in front of me in what I thought was a short and quick line. Something in their withered minds must make them regard time as so much less precious. Perhaps it’s because they’ve seen so much, or perhaps it’s because everything took longer ‘in their day.’ But something about old people and buying clothes causes them to negotiate and haggle over fixed store prices because ’something’s missing’, or ‘there’s a bright tag on that there tunic.’
This particular old lady was fussed because she had two identical cyan shirts, but one was missing the singlet so she wanted a price reduction. I swear these people purposely search for these sorts of things; missing singlets, broken labels, etc. During the whole ordeal, she briefly turns back to me and utters “sorry.” Fuck you, you hag. You ain’t sorry. You’re happy you’ve managed to keep the poor checkout operator busy for as long as you did. Perhaps the crone craves the interaction, the feeling of power over the store workers, knowing that as the consumer, she’s always right. “I demand respect because I have lived longer!” Well that doesn’t mean you can disrespect everyone else.
Getting through my transaction at a fraction of the time, I zipped away to the other store, thankfully free of anyone. But the next one, a veggie shop was unfortunately quite busy. I grabbed my goods in short order and proceeded to wait in line. But who should upset the sanctity of the line? None other than some granny with a trolley who happened to be finished with her shop and decided that she’ll be next to be served. She did all this while walking past the line, glancing at the befuddled group of Asians at the head of the line who were unsure of what they should do. What’s worse is at the end of her transaction, and she was presented with the total, she almost sounded like she was going to debate the cost. “$18.70?!” Don’t you fucking dare dispute that, or I’ll destroy you. After realising they had been wronged, the Asians moved closer to the counter, ensuring they were next.
As the transactions were processed, more people started to line up. Well, lines up, as there was more tahn one line forming, thanks to other old people starting their own line just behind the Asians. As I was originally behind them, I was right annoyed at these fuckers, but thankfully the checkout server was aware I was next and motioned me over. But the first old person in the impromptu ‘line’ was not happy with this and made a point of standing right next to me as I went through the transaction, fixing me with an evil glare as though she had been wronged.
At the end of all this, I figure this: sure young people can be insolent and rude, but old people are manipulatively rude. They do things which are flat out crimes against social law, but act as though they are above it. Perhaps they simply forget social laws, or perhaps they are envious and want to piss off those who still have full control of their bladder.
PhD Progress: Maximal covering complete
Posted by Sam on January 22nd, 2010 under PhD Project • No Comments
I have finished the code for covering a state when no rules fire. It appears to be quite stable and usually always finds the most general rules. However, this can also be a problem. Because the most general conditions for an action (i.e. the basic action preconditions) may not be the best rules for the job (for example stack requires highest(X), which is above the basic condition of clear(X)).
Not only is there a problem in generality, but also in goal constants (onAB). The onAB problem requires that the constant terms a and b are present in the rules to have an optimal policy and as of yet, the algorithm doesn’t have these. I feel like I’ve addressed this problem before but I can’t find where.
A possibility for heuristical specification is to note the pre-goal achieved state and perform a direct specification, or perhaps a number of specifications on the rule (creating a bunch of mutations). The rule can then be removed, as at least one of the children should be on the right track towards an optimal rule.
Another issue is that of covering which does not find the most general rule. For instance, the generated rules would sometimes create rules which included the onFloor(X) (tied to clear(X)) condition simply because the actions it used to cover all dealt with rules on the floor. The only solution to this I can currently think of is finding the actions proposed by the rule and crossing them with the valid actions of the same type, and ensuring they match. If the valid actions have more actions than the rule predicts, then the rule is not general enough. But this could undo the progress made with the above paragraph’s specialisation mutation.
Perhaps during covering creation the unification needs to occur over ALL rules, not just until it senses no change. This still has a small chance of creating onFloor(X) rules. Maybe mark which rules have attained maximum generality and which haven’t and for every rule that hasn’t attained maximum generality, cover it until it does (or has seen enough states, as it may be impossible to attain maximum generality). Using this marking system, rules which are mutants of maximally general rules can avoid being re-generalised.
PhD Progress: Covering non-necessary terms
Posted by Sam on January 22nd, 2010 under PhD Project • No Comments
When covering an action from the state, it is likely that non-necessary terms (with regards to the action in question) will also be in the rule (such as on(X,Z), where Z isn’t part of the action). An example of such a rule is:
on(a,c) & cl(a) & on(b,d) & cl(b) -> move(a,b)
Note that ‘c’ and ‘d’ aren’t part of the action so when the action is inversely subsumed, these need to be put into variables of sorts. However, if they were given concrete named variables, they may cause problems when unifying later, as the variables would need to be checked if they can sumsume one-another.
A possible fix for this (though perhaps too rough), is to swap these variables with special terms ‘_’ (don’t care, again seen in FOXCS). These don’t care terms can be anything (within type constraints) and do not necessarily have to be inequal to one-another ‘_(1)’ can equal ‘_(2)’. However, on creation of the final rule for the action, these ‘_’ terms need to be concretely put into variable terms, though again without the inequals predicate bounding them from each other.
Currently this issue isn’t a big one, but this is mostly due to the small domain size and lack of background predicates being included in the observations.
The process of inversely substituting terms could be performed with Replacements, but I have a feeling it may proceed more smoothly by using writing the actions in string form and using the existing rule parser to create the rules. Strings hold advantage by allowing simple equality checks, omission of state terms and dealing with state terms, and special behaviour when parsing the rule (‘_’ term).