
Metacognition for Self-Regulated Learning in a Dynamic Environment

Darsana P. Josyula, Franklin C. Hughes, Harish Vadali, Bette J. Donahue


Fassil Molla, Michelle Snowden, Jalissa Miles, Ahmed Kamara and Chinomnso Maduka
Department of Computer Science
Bowie State University, Bowie, MD 20715
darsana@cs.umd.edu
This research has been supported in part by grants from NSF and NASA.

Abstract: This paper describes a self-regulated learning system that uses metacognition to decide what to learn, when to learn and how to learn in order to succeed in a dynamic environment. Metacognition provides the system the ability to monitor anomalies and to dynamically change its behavior to fix or work around them. The dynamic environment for the system is an air traffic control domain that has six approach vectors for planes to land. The system has access to three basic approach strategies for choosing a landing terminal: Nearest Terminal, Free Terminal and Queued Terminal. In addition, the system has access to a supervised-learning algorithm that can be used to create new strategies. The system has the ability to generate its own training data sets to train the supervised learner.

The metacognitive component of the system monitors various expectations; anomalies in the environment cause expectation violations. These expectation violations act as indicators for what, when and how to learn. For instance, if an expectation violation occurs because aircraft are not being assigned approach vectors within a given time threshold, the system automatically triggers a change in landing strategies. Examples of anomalies that cause expectation violations include closing one or more of the six approach vectors or changing all of their geographical locations simultaneously. In either case, the system will respond to the situation by assigning the planes to one of the currently active approach vectors.

I. INTRODUCTION

The ability of an agent to learn about its environment and make decisions based on that information can mean the difference between success and failure. Recent work [1], [2], [3] on human learning has suggested that the best learners are the ones who practice self-regulated learning.

Self-regulated learning refers to the ability of an agent to determine when to learn, what to learn and how to learn. Knowing when to learn requires an ability to judge when to start and stop learning. Knowing what to learn requires an ability to identify the specific piece of knowledge that is lacking. Knowing how to learn requires an ability to choose the best learning strategy available to learn what is required. Metacognition, the act of thinking about thinking, is an integral part of this process. It allows agents to change the way learning occurs.

For students, an example of these steps is exam preparation. A student notes that an exam is approaching and decides that his knowledge of some subject is lacking. The student then chooses the best way to learn the required material. The student may decide to re-read the section of the textbook that deals with that subject. Or, perhaps the student is more hands-on, so instead of reading the text he would solve sample problems; or he is more visual and would find a video tutorial online. The student stops reading, solving, or watching when he is confident that his knowledge has improved enough to earn the grade he desires. Metacognition-guided self-regulated learning gives the student the ability to construct a learning strategy that is appropriate for a particular problem.

In this paper we discuss how a metacognitive component can allow self-regulated learning in a dynamic environment: an air traffic control domain. The air traffic controller (ATC) is tasked with deciding which approach path each incoming aircraft should be assigned. The ATC is equipped with three basic approach path selection strategies and is capable of creating new strategies based on different learning algorithms.

The metacognitive component of the ATC helps it learn the landing strategy that is most appropriate for the configuration of the environment in which it is situated. The learning may be as simple as changing the current strategy to a different but pre-existing strategy, or as complicated as discovering a new strategy to add to its repository of available strategies. The metacognitive component of the ATC initiates the creation of new strategies or switching to alternate strategies in response to failed expectations. For creating new strategies, the metacognitive component of the ATC sets the training parameters (e.g., training duration, data set size and desired output function) based on the current environmental configuration, triggers creation of a training data set making use of the training parameters, and initiates a supervised learning algorithm to train on the data set.

The following sections describe the domain in which the ATC and its metacognitive component were tested, the real power behind metacognition as illustrated by examples, other related work, and our conclusions and future work.
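The when/what/how decisions described above can be sketched as a minimal monitor-and-respond loop. The names (`Expectation`, the threshold values, the response strings) are illustrative assumptions for this sketch, not the paper's implementation.

```python
# Minimal sketch of metacognition-guided self-regulated learning:
# monitor expectations, and let each violation decide *when* to learn,
# *what* knowledge is lacking, and *how* to repair it.
# All names and thresholds here are illustrative, not from the paper.

from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Expectation:
    name: str                          # what knowledge/behavior is monitored
    check: Callable[[Dict], bool]      # returns True when the expectation holds
    response: Callable[[], str]        # how to learn: the repair to trigger

def monitor(observations: Dict, expectations: List[Expectation]) -> List[str]:
    """Return the responses triggered by violated expectations (the 'when')."""
    actions = []
    for exp in expectations:
        if not exp.check(observations):      # expectation violation detected
            actions.append(exp.response())   # choose an appropriate repair
    return actions

expectations = [
    Expectation("flights land within 100 s",
                lambda obs: obs["max_flight_time"] <= 100,
                lambda: "switch to an alternate landing strategy"),
    Expectation("enough strategies available",
                lambda obs: obs["num_strategies"] > 5,
                lambda: "train a new strategy on generated data"),
]

actions = monitor({"max_flight_time": 140, "num_strategies": 3}, expectations)
```

Both expectations fail for these observations, so the loop returns one repair action per violation; the system itself, not a hard-coded schedule, decides that learning is needed.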
II. AIR TRAFFIC CONTROL SIMULATOR

The air traffic control simulator has two major components: (i) the ATC, which monitors the traffic within a specified radar range and directs aircraft toward available approach paths, and (ii) the aircraft, which fly towards the ATC-monitored radar area, wait for direction from the ATC for an approach path and use that approach path for landing.

The simulated x-y area is a 10,000 by 10,000 unit square with the ATC at its center: (5000, 5000). Aircraft have a z component representing their altitude (minimum: 0, maximum: 500). The ATC's radar range is a square with corners at (2500, 2500) and (7500, 7500).

The aircraft are spawned randomly in the region outside the ATC's radar range. The aircraft initiate a connection with the ATC upon creation and are issued a unique ID. The aircraft use their default destination, which is the ATC's location at (5000, 5000), to determine their initial flight path. Aircraft outside of the ATC's radar range fly under their own guidance until they cross into the area, at which point they begin to circle until the ATC determines and communicates instructions. Communication between the aircraft and the ATC is accomplished by TCP/IP socket connection. All aircraft land at the ATC's location and must fly there through one of six approach paths located within 1000 units of the ATC. Once the aircraft lands, its trajectory and current position are erased from the GUI.

A. Aircraft Actions

Each aircraft can perform the following actions:

1) Fly to a destination at (x, y, z): The aircraft determines the proper velocities in the x, y and z directions to take it from its current location to the destination in as straight a line as possible.

2) Fly through multiple destinations: The aircraft flies through a list of destinations in order, each time flying straight from one goal to the next and ending at the final destination.

3) Fly in a circle: The aircraft flies in a circle of a specified radius.

4) Delay: The aircraft delays its flight by slowing down its speed.

5) Communicate with the ATC: The aircraft communicates with the ATC by sending messages to the ATC and receiving messages from the ATC. The aircraft can send one of two messages:
- An update containing the aircraft's current location, goal location and flight path.
- A message telling the ATC that it has successfully landed and will disconnect.

The aircraft can receive messages (see Section II-B1) from the ATC that contain instructions on the actions to be performed.

6) Land: The aircraft checks its location against a list of possible approach paths and, if it matches one, it will begin to perform a landing maneuver which takes the aircraft from its current location and altitude to the ATC's location and an altitude of 0.

B. ATC Actions

The ATC can perform the following actions:

1) Communicate with each aircraft: The ATC can send one of the following messages to one or more of the aircraft:
- An ID message that contains a unique identifier that the aircraft receive on creation. This ID is used by the ATC for tracking and modifying the flight path of specific planes, identifying new planes entering its radar range and distinguishing between different aircraft displayed on the GUI display.
- A terminal approach message that contains instructions for the aircraft to use one of the six (or more) approach paths.
- A delay message that can be accomplished by either slowing down or flying in a circle.
- A destination updating message that contains one or more destinations the aircraft must fly through.

The ATC can also receive messages from all aircraft within the radar range.

2) Alter approach paths: The ATC can add, delete, or edit existing approach path locations. There must always be a minimum of one approach path but there is no maximum.

3) Change strategy for choosing approach paths: The ATC must choose between multiple strategies for determining the most efficient approach path an aircraft should take. The ATC (with its metacognitive component) is capable of choosing between multiple arrival strategies. Nearest Terminal, Free Terminal, and Queued Terminal are the basic strategies available to the ATC. In addition to these basic strategies, the ATC is capable of creating other strategies by self-initiated supervised learning (see Section II-B4). The basic strategies are discussed next.

- Nearest Terminal Strategy: Under this strategy, aircraft will be assigned to the closest approach path. Once an aircraft has been cleared to approach from a specific approach path, that path is unavailable to others until the first aircraft has landed. Other aircraft reaching the ATC radar range must wait their turn for assignment to an empty approach path by circling at the outer radar region.

- Free Terminal Strategy: Aircraft will be assigned to the nearest free approach path when the ATC uses this strategy. As in the Nearest Terminal Strategy, once an aircraft has been cleared to approach from a specific approach path, that path is unavailable to others until the first aircraft has landed. However, unlike the Nearest Terminal Strategy, other aircraft may be diverted to another free path if such a path is available. If no path is free, then this strategy works similarly to the Nearest Terminal Strategy and the other aircraft must wait their turns by circling at the outer radar region.

- Queued Terminal Strategy: This strategy involves the Nearest Terminal strategy with two important differences: (i) the approach path is not closed down after an aircraft has been assigned to it, and (ii) the speed of each aircraft can be manipulated to avoid collisions. The advantage of this strategy is that the duration for which an aircraft circles before it gets assigned an approach path may be shorter if this aircraft can move towards an approach path without colliding with other aircraft that are already moving toward their designated approach paths.
  The strategy places a maximum of five aircraft into a landing queue for each approach path. Once the queue is full, other aircraft will be forced to circle and wait until there is room in the queue. While the first aircraft is on approach at full speed, the second aircraft's flight path from its current location to the starting point of the approach path is checked against the first aircraft's flight path. If the ATC detects that a collision between the two aircraft is likely to occur, it calculates a modified slower speed for the second aircraft and directs the second aircraft to fly at the modified speed. The third aircraft in the queue has its likely flight path checked against the first and second, and so on.

4) Create New Strategies using Supervised Learning: The ATC can discover new strategies for choosing the approach paths by self-initiated supervised learning. The ATC creates its own training data set dynamically with a variable number of data points gathered from a virtual ATC which is operated in the system background. The virtual ATC spawns virtual planes at random points or from specific regions according to the training data requested.

The size of the training data is determined by the metacognitive component. Each input for creating the training data set includes (i) the approach paths, (ii) a set of locations and speeds of aircraft that are in the radar range and (iii) the current speed and location of an aircraft for which the approach terminal and flight speed need to be determined. Each output includes the approach path that the aircraft should be assigned to and the flight speed for that aircraft. The output data for the training set is determined by applying the Queued Terminal strategy to the created input data.

Once the training data is created, varying configurations of a supervised learning algorithm are applied to each training set in order to determine if new strategies with lower anomaly rates can be created for situations modeling the training data. Because of this feature, the ATC is capable of dynamically learning new strategies, increasing its robustness and ability to operate in a dynamic environment.

The supervised learning algorithm that is used in the current version of the simulator is a back propagation neural network. Out of this learning comes a completely new strategy that is added to the ATC's repertoire of Nearest Terminal, Free Terminal and Queued Terminal strategies. This process allows the ATC to learn new strategies that can help in a dynamic environment. If the newly formed strategy is put into use and the metacognitive component continues to see expectation violations, the size of the training data is increased and the strategy creation process is restarted.

5) Avoid Collisions: The three basic strategies guarantee that there will be no direct collisions among the aircraft inside the ATC's radar region; however, this need not be the case for the self-learned strategies. Also, under the Queued Terminal strategy it is possible to have aircraft right next to each other, and this close proximity is dangerous in the real world. For these two reasons, a separate fail-safe collision avoidance mechanism is available to the ATC. This mechanism automatically maintains minimum safe distance zones around each aircraft by calculating their future flight paths and manipulating the speed of one or more aircraft should they intersect.

6) Perform Metacognitive Monitoring and Control: The ATC is equipped with a metacognitive component that monitors expectations and offers responses when the expectations are not being fulfilled. In our dynamic environment, the ATC's ability to perform metacognition makes it a far more effective tool than otherwise. Metacognition makes it possible for the ATC to realize when its standard strategies are ineffective and guides the ATC to learn and apply new ones. Figures 1 through 4 show the flow of information to and from the components of the ATC.

C. ATC Component Interaction

Figure 1. The basic interaction between the components of the ATC.

Figure 1 broadly describes the interaction of the metacognitive component, control modules, user interface and knowledge base. The control modules update and access the knowledge base; the metacognitive component checks the observations in the knowledge base against its own expectations in order to adjust the control modules and inform the user when needed. The user interface allows the user to alter the performance of the control modules and introduce anomalies for testing purposes. Figures 2 through 4 show the detailed interactions of each component under different circumstances.
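The Nearest Terminal and Free Terminal assignment rules of Section II-B3 can be sketched as follows. The data layout (a list of path records with a `busy` flag) is an assumption made for illustration, not the simulator's actual representation.

```python
# Illustrative sketch of the Nearest Terminal and Free Terminal rules
# described in Section II-B3. The data layout (path dicts with an
# occupancy flag) is an assumption, not the simulator's own code.

import math

def nearest_terminal(aircraft_xy, paths):
    """Assign the closest path; if it is occupied, the aircraft circles (None)."""
    closest = min(paths, key=lambda p: math.dist(aircraft_xy, p["xy"]))
    return closest["id"] if not closest["busy"] else None

def free_terminal(aircraft_xy, paths):
    """Assign the closest *free* path; circle only if every path is occupied."""
    free = [p for p in paths if not p["busy"]]
    if not free:
        return nearest_terminal(aircraft_xy, paths)  # degenerates to circling
    return min(free, key=lambda p: math.dist(aircraft_xy, p["xy"]))["id"]

paths = [
    {"id": "W",  "xy": (4000, 5000), "busy": True},   # nearest, but occupied
    {"id": "NW", "xy": (4300, 5700), "busy": False},
]
plane = (3500, 5000)
```

Under Nearest Terminal this plane circles (the nearest path is occupied, so the call returns `None`), while Free Terminal diverts it to the adjacent free path; this is exactly the difference the two strategies exhibit in Figures 6 and 7.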
Figure 2. The detailed interactions of the components of the ATC when the metacognitive component is not involved in decision making.

Figure 2 shows the operation of the ATC when the chosen strategy is performing as desired. The strategy executor feeds the flight data and approach path locations into the algorithm of the currently selected strategy to calculate each aircraft's designated flight velocity and approach path, which are then communicated to each aircraft. The metacognitive component monitors the observations but otherwise is not involved because each of them falls within the expected thresholds.

Figure 3. The detailed interactions of the components of the ATC when the metacognitive component is actively taking part in decision making.

Figure 3 shows the operation of the ATC when the chosen strategy is not performing as desired. Within the knowledge base are the current observations of collision frequency, flight speeds, and flight durations. The metacognitive component has, within its own separate knowledge base, expectations in the form of threshold values for each of these observations. If the observed value of collision frequency is beyond the threshold value, the metacognitive component registers an expectation violation and lowers the current strategy's effectiveness rating. The metacognitive component then instructs the strategy chooser to search through the strategy repository for the highest rated strategy and implement it. Once implemented, the metacognitive component waits for further observations. Should more expectation violations occur, it can instruct the strategy creator to initiate the creation of entirely new strategies and then use the strategy chooser to implement them in an attempt to find a more effective solution, while at the same time keeping the user updated on its actions through the user interface.

Figure 4. The detailed interaction of the User Interface with the various control modules of the ATC.

Figure 4 shows how the user interface (UI) can be used to introduce anomalies into the system. From the UI, users can change approach path locations, initiate strategy creation, manually select the current strategy, and control the flow of communication to and from the aircraft. The expectations of the metacognitive component were chosen such that the component itself need not be aware of each of the numerous ways the system can break down. All the anomalies will have some effect on the observations stored in the knowledge base, which will trigger the metacognitive component should the values cross the thresholds, as the following examples illustrate.

- The user blocks the communication of a designated approach path from an aircraft so that it continues to circle, not knowing that it should have received a clearance to approach. The aircraft's flight duration will increase, eventually crossing the threshold value and starting the metacognitive component's response.

- The user moves the approach path locations. The collision frequency and/or the flight durations could be adversely impacted, raising them enough to trigger the metacognitive component's response.

- The user selects a strategy that is not efficient for the current set of aircraft, communicates an incorrect approach path or sends an aircraft into an infinite loop. In each case the increase in flight duration would trigger the metacognitive component's response.

With proper expectations and response strategies in place, all possible failure modes need not be hard coded into the system. Metacognition allows the system to deal with different failures by monitoring expectations and responding to expectation violations.
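The monitor-and-switch cycle of Figure 3 can be sketched as below. The numeric ratings, the penalty size, and the strategy names are illustrative assumptions; the paper does not specify how the effectiveness rating is computed.

```python
# Sketch of the Figure 3 cycle: a violated threshold lowers the current
# strategy's effectiveness rating, and the highest-rated strategy in the
# repository is chosen instead. Ratings, thresholds and strategy names
# are illustrative assumptions, not values from the paper.

ratings = {"nearest": 0.9, "free": 0.8, "queued": 0.7}

def on_observation(current, observed, thresholds, ratings):
    """Register violations, penalize the current strategy, re-choose."""
    violated = [k for k, limit in thresholds.items() if observed[k] > limit]
    if violated:
        ratings[current] -= 0.3 * len(violated)   # lower effectiveness rating
        current = max(ratings, key=ratings.get)   # pick highest-rated strategy
    return current, violated

thresholds = {"collision_freq": 2, "flight_duration": 100}
current, violated = on_observation(
    "nearest", {"collision_freq": 1, "flight_duration": 140},
    thresholds, ratings)
```

Here the flight-duration observation crosses its threshold, the current strategy is penalized, and the chooser falls over to the next best rated strategy, mirroring the switch from Nearest Terminal to Free Terminal described in Section III-A.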

III. THE POWER OF METACOGNITION

The ATC makes its metacognitive component aware of the following observations: (i) aircraft circle times, (ii) flight speeds, (iii) aircraft locations, (iv) the number of times the collision avoidance mechanism is used under the current strategy and (v) the strategies that are available in the repository. The expectations of the metacognitive component are as follows:

- Each flight will land within 100 seconds of coming into the ATC's radar region.
- The aircraft speed should match the speed the ATC assigns to it.
- The collision avoidance system should be used a minimal number of times.
- The number of strategies that are available must be greater than 5.

If an expectation violation occurs because there are not enough strategies available for choosing terminals, the metacognitive component will trigger the strategy creator to generate new test data in order to create a new strategy by applying various supervised learning algorithms. Once the ATC learns a new strategy, that strategy becomes part of the strategy repository.

Should the flight times continue to exceed expectations or the collision avoidance mechanism be overworked, the metacognitive component will ask the ATC to change to another strategy that has not been tried yet. The new strategy could be a newly discovered one.

If further expectation violations occur, the metacognitive component can tell the strategy creator to increase the number of data points that it uses to generate training data in order to create a more robust strategy that provides more efficient results. The metacognitive component can tell the ATC to use any of the strategies it knows, and it can delete strategies that do not perform.

A. Illustration

The following example illustrates how the metacognitive component helps the ATC regulate its learning of the best strategy to be used in a dynamic environment.

Figure 5. Trajectories of multiple aircraft flying toward the ATC.

Figure 5 shows the GUI with multiple aircraft beginning their flights toward the ATC. The darkened center rectangle represents the ATC's radar range. Once an aircraft crosses the radar range it circles until the ATC consults its current strategy and communicates instructions. The default strategy is Nearest Terminal.

The aircraft and their trajectories are represented by the thick dark lines; their current position is marked by the ID/Altitude label. The ATC is located at the center of the radar region. Each of the lines connected to the ATC's location represents one of six approach paths the aircraft must use. Although the number and location of these approach paths can change, shown here are the six default values.

Under the Nearest Terminal strategy, an aircraft entering the ATC's radar range is assigned to its closest approach path if the path is free; otherwise it begins to circle. The disadvantage is that multiple aircraft could be circling while waiting for one approach path when adjoining approach paths are free. Figure 6 illustrates this occurrence.

Figure 6. The southwestern aircraft 8LOOE, 2S7A2 and EXKJI are made to circle while QRUSA lands from the western approach path and MQJAX approaches the southwestern approach path. Both northern approach paths are empty.

The metacognitive component's expectation that all aircraft should arrive at the ATC within 100 seconds of crossing into the ATC's radar region fails, since many of the aircraft are circling while their nearest approach paths are full. It responds by having the ATC change the strategy to the Free Terminal strategy. Under the Free Terminal strategy, aircraft are cleared to approach from the nearest free terminal. This allows the ATC to land aircraft without wasting as much time, thereby fulfilling the stated expectation.

Figure 7. All aircraft immediately head to approach paths as determined under the Free Terminal Strategy.

Figure 7 shows the result of the change in strategy from Nearest Terminal to Free Terminal. Aircraft 2S7A2, 8LOOE and EXKJI are sent to approach paths that are available instead of waiting at the edge of the radar range for their closest approach paths to become available. When the distribution of aircraft is spread out, the metacognitive component selects the Free Terminal strategy; however, as Figures 8 and 9 illustrate, with a more concentrated distribution of aircraft, the metacognitive component selects the Queued Terminal strategy.

Figure 8. For illustration purposes all aircraft are created such that their nearest approach is the western approach path.

Under the Nearest Terminal strategy, all but the first aircraft in Figure 8 would be left circling while the approach path was unavailable, leading to very long flight durations for the last flights to cross into the radar region. However, with the Queued Terminal strategy, five of the aircraft are immediately sent to the closest approach path with the other two circling for a smaller amount of time. This strategy is best used when the concentration of aircraft is high for multiple approach paths.

Figure 9. The Queued Terminal strategy directs five aircraft at once to approach by manipulating their flight speeds to avoid any collisions. Two aircraft must wait for the first aircraft to land before one of the two is permitted in the approach.

Figure 9 illustrates a high concentration of aircraft at one approach path being successfully instructed under the Queued Terminal strategy. Using flight speed manipulation, this strategy can provide efficient approach path instructions during periods of high traffic.

Our approach to solving the problem of learning strategy selection in a dynamic environment necessarily cannot rely upon any explicit knowledge of the environment itself. The metacognitive component therefore does not incorporate hard-coded heuristics for determining learning strategies, since dynamic environments can lead to any such heuristics becoming obsolete from one moment to the next. Instead, the metacognitive component looks solely at failed expectations and determines in real time how to respond based on its array of available actions. Say, for instance, the approach paths need to be changed. Perhaps the system is being deployed at a different airport or some terminal has to go down for maintenance. An ATC with metacognition is able to treat this situation the same way it treats the others. Aircraft that are sent to incorrect approach path locations would begin to circle and their flight times increase. The metacognitive component of the ATC notices that flight times are outside of expected values and instructs the ATC to create a new training data set that incorporates the new approach path locations. This new strategy then fulfills the required maximum flight time expectation.
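The Queued Terminal flight-speed manipulation used in Figures 8 and 9 can be sketched in one dimension as below. The uniform geometry and the fixed-separation slowdown rule are simplifying assumptions for illustration; the simulator checks full flight paths rather than arrival times on a line.

```python
# One-dimensional sketch of the Queued Terminal speed manipulation:
# each queued aircraft is slowed just enough that it reaches the start
# of the approach path later than the aircraft ahead of it. The uniform
# geometry and the minimum-separation rule are simplifying assumptions.

def queue_speeds(distances, full_speed, separation=1.0):
    """Return a speed per queued aircraft, given distances to the path start."""
    speeds, prev_eta = [], 0.0
    for d in distances:                       # aircraft in queue order
        eta = d / full_speed
        if eta <= prev_eta + separation:      # would catch up: likely conflict
            eta = prev_eta + separation       # delay arrival by slowing down
        speeds.append(d / eta)
        prev_eta = eta
    return speeds

# First aircraft flies at full speed; the closer second aircraft is slowed;
# the distant third aircraft needs no adjustment.
speeds = queue_speeds([100.0, 90.0, 200.0], full_speed=10.0)
```

Only the aircraft whose projected arrival would conflict with the one ahead of it is slowed, which is the behavior Section II-B3 describes for the second and subsequent aircraft in the landing queue.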
IV. RELATED WORK

Metacognition has been shown to be a key ingredient in problem solving and learning in research in various fields including psychology, education and linguistics. For example, a study [3] involving college students and their ability to predict the grades they would receive on examinations, based on their self-awareness of their academic strengths or weaknesses, found that the more accurate the student's prediction, the higher the student's score. The authors summarize that expert learners are also skillful at metacognitive knowledge monitoring. The ability, or lack thereof, of a problem solver such as a student to recognize that their knowledge has come to a point that will allow them to succeed at difficult tasks related to that knowledge can predict future success. This relationship between self-regulated learning and metacognitive knowledge monitoring as witnessed in human beings is the motivation behind our approach to regulating the learning for systems deployed in dynamic environments.

In the field of artificial intelligence, metacognition has been applied in various ways to help systems learn. Cox [4], [5] presents a computational theory of introspective failure-driven multistrategy learning. The reasoning of an agent is represented explicitly in knowledge structures called Meta-XPs. The Meta-XPs explain how and why reasoning fails. This knowledge is used by the learner to determine the proper learning strategy. The theory is implemented in a case-based reasoning system called Meta-AQUA [6], [7]. This system reads stories sentence by sentence and attempts to understand them. If the knowledge base fails to provide an explanation for a sentence, then a reasoning failure is generated. When a reasoning failure occurs in Meta-AQUA, the system creates new learning goals autonomously based on the introspective analysis of its own successes and failures at the performance task. A nonlinear planner then selects a learning strategy based on the learning goal. This approach to solving the problem of selecting the best learning strategy is based on the machine's prior knowledge of what strategy best fits the determined goal. However, in certain dynamic environments the effectiveness of any one strategy can change with time, so it is harder to explicitly state the best strategy for each learning goal.

Raja and Lesser [8] describe a reinforcement learning technique which allows agents to learn meta-level control policies that govern decisions in multiagent environments. The system learns a meta-level Markov Decision Process (MDP) model that represents the system behavior for a particular environment from a set of states, actions, transition probabilities and a reward function. The system learns the MDP model by making random decisions to collect state transition probabilities. While these studies focus on learning metacognitive control knowledge that can help in domain activities like task scheduling, our research focuses on the application of metacognition to self-regulate the acquisition of new knowledge (discover a new approach-choosing strategy) or revise existing knowledge (change the current landing strategy).

Fox and Leake [9], [10] use a model of the reasoning process to derive expectations about the ideal reasoning behavior of the system; the actual reasoning is compared to this ideal to detect reasoning failures. Their system uses introspective reasoning to monitor the retrieval process of a case-based planner and detect the retrieval of inappropriate cases. When retrieval problems occur, the explanations for the failures are evaluated and these explanations are used to update case indices in order to improve future performance. This work is close in spirit to ours in the use of expectations as a means to detect failures. While their system monitors expectations to improve the performance of a case-based planner, our system monitors expectations to improve the performance of an agent situated in a dynamic environment.

The POIROT project [11] presents an architecture that combines a set of machine learning approaches to learn complex procedural models from a single demonstration in a medical evacuation airlift planning domain. The overall learning framework is based on goal-driven learning that performs targeted searches for explanations when observations do not agree with the developing model and creates new knowledge goals for the different learning components. A meta-control learning moderator signals when to propose learning hypotheses and when to evaluate them based on the knowledge needs of the different components. While POIROT focuses on learning a generalized hierarchical task model from a demonstration of a sequence of web service transactions, our work focuses on learning a task model that is appropriate for the current environmental setting.

Cox and Raja [12] believe that at the meta-level, an agent must have a model of itself to represent the products of experience and to mediate the choices effectively at the object level. Facing novel situations, the successful agent must learn from experience and create new strategies based upon its self-perceived strengths and weaknesses. In this way, the system becomes a complete cognitive system capable of making decisions and having a clear understanding of its own capabilities, its relationship to the problem and the environment in which it exists.
V. C ONCLUSIONS AND F UTURE W ORK
[6] A. Ram and M. Cox, Introspective reasoning using
At this point the metacognitive component is helpful in meta-explanations for multistrategy learning, in Machine
choosing strategies that reduce expectation violations, but Learning: A Multistrategy Approach IV, R. Michalski and
G. Tecuci, Eds. San Mateo, California: Morgan Kaufmann,
it has a limited number of learning algorithms which it 1994, pp. 349377.
can apply to the training data set. In the future we hope
to increase the number and type of classifiers available [7] A. Ram, AQUA: questions that drive the understanding
for new strategy creation. To this end we are exploring process, in Inside Case-Based Explanation, A. R.C. Schank
the Weka data mining and machine learning software [13] and C. Riesbeck, Eds. Hillsdale, New Jersey: LEA, 1994,
pp. 207261.
from the University of Waikato. This software package
contains multiple machine learning algorithms incorporated [8] A. Raja and V. Lesser, A framework for meta-level control
into a Java API and has independently produced solutions to in multi-agent systems, Autonomous Agents and Multi-Agent
several training data sets. Integrating this software package Systems, vol. 15, no. 2, pp. 147196, 2007.
into the strategy learning module of the ATC will allow for
[9] S. Fox, Introspective learning for case-based planning.
more diverse strategies yielding an increase in efficiency and Ph.D. dissertation, Department of Computer Science, Indiana
robustness for dynamic environments. University, Bloomington, IN., 1995.
Also in the future we would like to remove the metacog-
nitive component and instead have the ATC communicate [10] S. Fox and D. Leake, Introspective reasoning for index
with the Metacognitive Loop (MCL) software [14], [15], refinement in case-based reasoning, Journal of Experiment
and Theoretical Artificial Intelligence, vol. 13, pp. 26388,
[16], that reasons with the same metacognitive algorithms 2001.
but makes use of three generic ontologiesindications, fail-
ures and responsesto note anomalies, assess failures and [11] M. H. Burstein, R. Laddaga, D. McDonald, M. T. Cox,
respond to the situation. MCL has been tested successfully B. Benyo, P. Robertson, T. Hussain, M. Brinn, and D. V.
in other domains and its use in the air traffic control domain McDermott, Poirot - integrated learning of web service
procedures. in AAAI, D. Fox and C. P. Gomes, Eds. AAAI
will further test the generality of the MCL ontologies. MCL Press, 2008, pp. 12741279.
uses a Bayesian network to select a proper response for
a particular input and only requires reconfiguration of the [12] M. Cox and A. Raja, Metareasoning: A manifesto,
fringe nodes when applied to different domains. This project in BBN Technical Memo TM-2028. Cambridge,
will be used to support further evolution of MCL in hopes MA: BBN Technologies, 2007. [Online]. Available:
http://www.mcox.org/Metareasoning/Manifesto/manifesto.pdf
of creating a general reasoning system that can be applied in
any domain with minimal reconfiguration. Use of MCL will [13] I. H. Witten and E. Frank, Data Mining: Practical Machine
allow the system access to a more complete decision making Learning Tools and Techniques (Second Edition). Morgan
system that uses more complex methods of evaluation to Kaufmann, 2005.
learn proper reactions to expectation failures.
[14] M. L. Anderson and D. R. Perlis, Logic, self-awareness and
self-improvement: The metacognitive loop and the problem
R EFERENCES of brittleness, Journal of Logic and Computation, vol. 15,
[1] A. L. Wenden, Metacognitive knowledge and language learn- no. 1, 2005.
ing, Applied Linguistics, vol. 19, no. 4, pp. 515537, 1998.
[15] M. Schmill, D. Josyula, M. L. Anderson, S. Wilson, T. Oates,
D. Perlis, and S. Fults, Ontologies for reasoning about
[2] W. P. Rivers, Autonomy at all costs: An ethnography of
failures in AI systems, in Proceedings from the Workshop
metacognitive self-assessment and self-management among
on Metareasoning in Agent Based Systems at the Sixth
experienced language learners, The Modern Language Jour-
International Joint Conference on Autonomous Agents and
nal, vol. 85, no. 2, pp. 279290, 2001.
Multiagent Sytems, 2007.
[3] R. Isaacson and F. Fujita, Metacognitive knowledge moni- [16] M. L. Anderson, S. Fults, D. P. Josyula, T. Oates, D. Perlis,
toring and self-regulated learning: Academic success and re- M. D. Schmill, S. Wilson, and D. Wright, A Self-Help Guide
flections on learning, Journal of the Scholarship of Teaching for Autonomous Systems, AI Magazine, 2008.
and Learning, vol. 6, no. 1, pp. 3955, 2006.
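The note-anomalies, assess-failures, respond cycle over MCL's three ontologies, described in the conclusions, can be illustrated with a small sketch. Everything below is hypothetical and not MCL's actual API: the ontology links are reduced to plain lookup tables, whereas MCL proper traverses indication, failure, and response ontologies linked by a Bayesian network to choose a response.

```python
# Illustrative sketch of an MCL-style metacognitive cycle:
# note an anomaly (indication), assess it (failure), respond (response).
# All names and mappings here are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Expectation:
    name: str
    low: float   # lowest acceptable observed value
    high: float  # highest acceptable observed value

# Hypothetical ontology links: indication -> failure -> response.
# MCL itself would weigh these links with a Bayesian network.
FAILURE_OF = {"below-range": "model-too-optimistic",
              "above-range": "model-too-pessimistic"}
RESPONSE_OF = {"model-too-optimistic": "retrain-strategy",
               "model-too-pessimistic": "adjust-expectation"}

def note(exp: Expectation, observed: float):
    """Phase 1: compare an observation against an expectation."""
    if observed < exp.low:
        return "below-range"
    if observed > exp.high:
        return "above-range"
    return None  # no anomaly noted

def assess(indication: str):
    """Phase 2: map the noted indication to a hypothesized failure."""
    return FAILURE_OF.get(indication)

def guide(failure):
    """Phase 3: choose a repair response for the assessed failure."""
    return RESPONSE_OF.get(failure, "try-again")

def mcl_cycle(exp: Expectation, observed: float):
    indication = note(exp, observed)
    if indication is None:
        return "continue"  # expectations met; no repair needed
    return guide(assess(indication))

# Example: a hypothetical expectation that landing delay stays
# between 0 and 5 minutes.
delay = Expectation("landing-delay", 0.0, 5.0)
print(mcl_cycle(delay, 3.0))  # within range, no repair
print(mcl_cycle(delay, 9.5))  # violation triggers a repair response
```

Replacing the two lookup tables with learned conditional probabilities, so that only these "fringe" mappings change per domain, mirrors the reconfiguration property claimed for MCL above.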