
Nima Mohajerin, Mohammad B. Menhaj, Member, IEEE, Ali Doustmohammadi

Abstract—In this paper, a new controller, namely the Reinforcement Learning Fuzzy Controller (RLFC), is proposed and implemented. Based on fuzzy logic, this newly proposed

online-learning controller is capable of improving its behavior by

learning from experiences it gains through interaction with the

plant. RLFC is well established for hardware implementation

with or without a priori knowledge about the plant. To evidence

this claim, a hardware implementation of Ball and Plate system

was established, and RLFC was then developed and applied to it.

The obtained results are illustrated in this paper.

Index Terms—Fuzzy Logic Controller, Reinforcement Learning, Ball and Plate system, Balancing Systems, Model-free optimization

I. INTRODUCTION

There are several challenging test platforms for control engineers. Such

systems are the traditional cart-pole system (inverted

pendulum), the ball and beam (BnB) system, the multiple

inverted pendulums, the ball and plate system (BnP), etc.

These systems are promising test-benches for investigating

the performance of both model-free and model-based

controllers. Considering those complicated ones (such as

multiple inverted pendulums or BnP) even if one bothers to

mathematically model them, the resulting model is likely to be

too complicated to be used in a model-based design. One

would highly prefer to use an implemented version of such a

system (if available and not risky) and observe its behavior

while the proposed controller is applied to it. This paper is

devoted to the efforts made in a project in which the main

goal is to control a ball over a flat rotary surface (the plate)

mimicking human behavior in controlling the same plant, i.e. the BnP system. The proposed controller should neither depend on any physical characteristics of the BnP system nor be supervised by an expert. It should learn an

optimal behavior from its experiences interacting with the BnP

system and improve its action-generation strategy; however,

some prior knowledge about the overall behavior of the BnP system may be embedded to reduce the time needed for reaching the goal.

Nima Mohajerin is with the School of Science and Technology, Örebro University, Örebro, Sweden (e-mail: nima.mohajerinh091@student.oru.se). Mohammad B. Menhaj is with the Electrical Engineering Department, Amirkabir University of Technology (e-mail: tmenhaj@ieee.org). Ali Doustmohammadi is with the Electrical Engineering Department, Amirkabir University of Technology (e-mail: doustm@aut.ac.ir).

The few published papers on the BnP system are mainly devoted to achieving the defined goals regarding the BnP itself rather than to how those goals are achieved [3, 4, 5, 8, 9]. They can

be categorized into two main groups: those which are based on

mathematical modeling (with or without hardware

implementation), those proposing model-free controllers.

Since the simplification in the mathematical modeling of the

BnP system yields two separate BnB systems, the first

category is goal-oriented and is of no interest for the current

project [4, 5]. On the other hand, the hardware apparatus used

in the second category is the CE151 [6] (or, in some rare cases, another apparatus [9]), all of which use image feedback for ball

position sensing. However, among all, [5] is devoted to a

mechatronic design of the BnP system controlled by a classic model-based controller that benefits from touch-sensor feedback, while [3,

4] used the CE151 apparatus. Note that the image feedback is

a time bottleneck which will be discussed in Section III. In [3],

a fuzzy logic controller (FLC) is designed which learns online

from a conventional controller. Although the work in [4] is

done through mathematical modeling and is applied to CE151

apparatus, it is of more interest because it tackles the problem

of trajectory planning (to be stated in Section III). Reports [8] and [9] focus on motion planning and control, though they are less interesting for us.

To achieve the desired goal, a different approach is demonstrated in this paper. This approach is based on a fuzzy logic controller which learns on the basis of reinforcement

learning. Additionally, a modified version of the BnP

apparatus is also implemented in this project as a test platform

for the proposed controller.

In this paper, the fundamental concepts of RL are embodied

into fuzzy logic controlling methodologies. This leads to a

new controller, namely Reinforcement Learning Fuzzy

Controller (RLFC) which is capable of learning from its own

experiences. Inherited from fuzzy controlling methodologies,

RLFC is a model-free controller and, of course, previous

knowledge about the system can be included in RLFC so as to

decrease the learning time. However, as will be seen, learning in RLFC is not a phase separate from its normal operation.

This paper is divided into six sections. After this introduction, in section II RLFC is explained in full, both conceptually and mathematically. In section III, the BnP system is introduced and the hardware specification of the implemented version of this system is also outlined. In section IV, the modifications making RLFC applicable to the implemented BnP system are presented. Section V illustrates and analyzes the results of RLFC performance on the implemented BnP system; in this section, RLFC performance is also compared with that of a human controlling the same plant. Finally, section VI concludes the paper.

II. CONTROLLER DESIGN

This section is dedicated to explaining the idea and mathematics of the proposed controller, i.e. RLFC. First, the behavior of RLFC is outlined conceptually, and then the mathematics is detailed.

A. Outline

According to Fig. 1, RLFC consists of two main blocks, a

controller (FLC) and a critic. The task of the controller is to

generate and apply actions in each given situation as well as

improving its state-action mapping while the critic has to

evaluate the current state of the plant. Neither the controller nor the critic necessarily knows anything about how the plant responds to actions, how the actions will influence its states, or what the best action is in any given situation. There is also

no need for the critic to know how the actions in the controller

are generated. The most important responsibility of the critic

is to generate an evaluating signal (reinforcement signal),

which best represents the current state of the plant. The

reinforcement signal is then fed to the controller.

Once the controller receives a reinforcement signal, it

should realize whether its generated action was indeed a good

action or a bad one and, in either case, how good or how bad it was. These terms are embodied in a measure named the reinforcement measure, which is apparently a function of the reinforcement signal. Then, according to this measure, the

controller attempts to update its knowledge of generating

actions, i.e. improves its mapping from states to actions. The

process of generating actions is separate from the process of

updating the knowledge and thus they can be assumed as two

parallel tasks. This implies that while the learning procedure is a discrete-time process, the controlling procedure can be continuous-time. However, without loss of generality, it is

assumed that the actions are generated in a discrete-time

manner and each action is generated after the reinforcement

signal - describing consequences of the previously generated

action - has been reported and the parameters have been

updated. The dashed line in Fig. 1 implies the controller's awareness of its own generated actions. Although the controller is apparently aware of its generated actions, in the case of a hardware implementation, inaccuracies in the mechanical structure and electronic devices, as well as other unknown disturbances, may impose distortions on the generated actions.
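Before turning to the mathematics, a minimal Python sketch of the loop just outlined may help. The object names (controller, critic, plant) and their methods are our own placeholders, not interfaces defined in the paper.

```python
# Hypothetical skeleton of the RLFC loop outlined above. The critic scores
# the plant state; the controller turns the change in that score into a
# reward/punishment update of its rule base, then emits the next action.

def rlfc_loop(controller, critic, plant, n_iterations):
    state = plant.observe()
    r_prev = critic.evaluate(state)           # reinforcement signal r(k-1)
    for k in range(n_iterations):
        action = controller.act(state)        # stochastic state-to-action mapping
        plant.apply(action)
        state = plant.observe()
        r = critic.evaluate(state)            # reinforcement signal r(k)
        controller.update(r_prev - r)         # learn from improvement/worsening
        r_prev = r
```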

B. The Controller

The aforementioned concept is general enough to be applicable to any fuzzy controller scheme; however, we assume that the fuzzy IF-THEN rules and controller structure are of Mamdani type [7]. Imagine that the input to the fuzzy inference system (FIS) is an n-element vector, each element of which is a fuzzy number produced by the fuzzification block [11]:

$$ x = [x_1\ x_2\ \cdots\ x_n]^T \qquad (1) $$

On the universe of discourse of each $x_i$, i.e. $U_i$, a number $n_i$ of fuzzy term sets is defined, and the rules take the form:

$$ \text{IF } x_1 \text{ is } A_1^{l_1} \text{ AND } x_2 \text{ is } A_2^{l_2} \text{ AND } \cdots \text{ AND } x_n \text{ is } A_n^{l_n} \text{ THEN } y \text{ is } B^l \qquad (2) $$

where $A_i^{l_i}$ is a term set defined on the universe of discourse of $x_i$, $B^l$ is one of the $M$ term sets defined on $V$, the universe of discourse of $y$, and $U = U_1 \times U_2 \times \cdots \times U_n$. It is assumed that all the corresponding universes of discourse are bounded and can be covered by a limited number of fuzzy sets.

For hardware implementation, what matters first is the processing speed of the controller. In other words, we have to establish a fuzzy controller architecture that offers optimum performance relative to its complexity. For this reason, we propose the following elements for the FLC structure: singleton fuzzifier, product inference engine and center-average defuzzifier [11]. In this case, given an input vector, the output of the controller is:

$$ y = f(x) = \frac{\sum_{l=1}^{L} \bar{y}^{\,l} \left( \prod_{i=1}^{n} \mu_{A_i^{l_i}}(x_i) \right)}{\sum_{l=1}^{L} \left( \prod_{i=1}^{n} \mu_{A_i^{l_i}}(x_i) \right)} \qquad (3) $$

where $\bar{y}^{\,l}$ denotes the center of the consequence set $B^l$. As mentioned earlier, other FLC structures may also be considered. Apparently, only the rules with a non-zero premise (IF-part), i.e. the fired rules, participate in generating y; this fact does not depend on the FLC structure.

At the design stage, the controller does not know which states it will observe; in other words, the designer can hardly know which rules would be useful to embody in the fuzzy rule base. Thus, all rules whose premises are formed by all possible combinations of the input term sets, using the AND operator, are included in the fuzzy rule base. The number of these rules is:

$$ L = \prod_{i=1}^{n} n_i \qquad (4) $$

which grows rapidly with the number of variables and defined term sets. Consequently, the processing time drastically increases. To solve this, we assume that for any given value of each variable, there are at most two term sets with non-zero membership values. This condition, which will be referred to as the covering condition, is necessary; if it holds, then the number of fuzzy rules contributing to the actual output, i.e. the fired rules, is at most $2^n$. Noticeably, to reduce the time needed for discovering the fired rules, we implement a set of conditional statements rather than an extensive search among all of the rules.
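As an illustration of this structure, the following Python sketch evaluates (3) directly. The triangular membership functions and rule centers in the toy example are our own assumptions, not values from the paper.

```python
import numpy as np

def flc_output(x, rules):
    """Center-average output (3). Each rule is (memberships, y_bar): one
    membership function per input component, and the center of B^l."""
    num, den = 0.0, 0.0
    for memberships, y_bar in rules:
        # product inference engine: premise degree = product of memberships
        w = np.prod([mu(xi) for mu, xi in zip(memberships, x)])
        num += y_bar * w
        den += w
    return num / den if den > 0.0 else 0.0   # only fired rules contribute

# toy example: one input, two overlapping triangular sets with centers -1, +1
def tri(a, b, c):
    return lambda v: max(0.0, min((v - a) / (b - a), (c - v) / (c - b)))

rules = [([tri(0.0, 0.5, 1.0)], -1.0), ([tri(0.5, 1.0, 1.5)], 1.0)]
print(flc_output([0.75], rules))             # both fire with weight 0.5 -> 0.0
```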

C. The Decision Making

As previously mentioned, the controller is discrete-time. So,

in each iteration, as the controller observes the plant state, it

distinguishes the fired rules. From now on, the L-rules FLC

shrinks to a 2n-rules FLC where 2n << L . The key-point in

generating output, i.e. decision making, is how the

consequences (THEN parts) of these fired rules are selected.

Noticing (2), the term set assigned to the consequence of the lth rule is $B^l$. It was also mentioned that there are $M$ term sets defined on the universe of discourse of the output variable. Assume that these term sets are referred to by $W^i$, $i = 1, \dots, M$. In the kth iteration, the lth rule selects $W^i$ for its consequence with probability

$$ P_k^l(j = i) \qquad (5) $$

where $j$ is a random index drawn from a probability distribution over the indices. The aim of the reinforcement learning algorithm, which is discussed in the next sub-section, is to learn this probability for each rule such that the overall performance of the controller is (nearly) optimized.

A representation of these probabilities is needed that is well suited for applying the reward/punishment procedure and also for software programming. Fulfilling these objectives, for each rule a bounded sub-space of $\mathbb{R}$ is chosen ($\mathbb{R}$ represents the set of real numbers). Factors governing how this one-dimensional sub-space should be chosen are discussed in section IV. Let the sub-space related to the lth rule be:

$$ \Delta^l = \left[ a_0^l,\ a_M^l \right] \qquad (6) $$

This sub-space is divided into M sub-distances, (8-a), each of which is assigned to an index $i$, $i = 1, 2, \dots, M$. We have:

$$ \Delta^l = \sum_{r=1}^{M} \delta_r^l \qquad (7) $$

$$ \text{a)}\ \ \delta_i^l = \left[ a_{i-1}^l,\ a_i^l \right] \qquad \text{b)}\ \ a_0^l \le a_s^l \le a_{s+1}^l \le a_M^l,\ \ s = 1, \dots, M-2 \qquad (8) $$

We calculate the probability $P^l$ by (9):

$$ P^l(j = i) = \frac{\delta_i^l}{\Delta^l} \qquad (9) $$

where, with a slight abuse of notation, $\delta_r^l$ and $\Delta^l$ denote the lengths of the corresponding intervals:

$$ \delta_r^l = a_r^l - a_{r-1}^l \qquad (10) $$

Since the reinforcement learning procedure is done by tuning the above parameters, they are all functions of k. Thus, (9) turns to:

$$ P_k^l(j = i) = \frac{\delta_i^l(k)}{\Delta^l(k)} \qquad (11) $$

or:

$$ P_k^l(j = i) = \frac{a_i^l(k) - a_{i-1}^l(k)}{a_M^l(k) - a_0^l(k)} \qquad (12) $$

By observing (7), (8) and (11), it is obvious that $P_k^l(j = i)$ satisfies the necessary probability axioms.
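A minimal sketch of this selection mechanism, under our reading of (6)-(12): each rule stores its boundary points a_0 ... a_M, and an index is drawn with probability proportional to the length of its sub-distance. All names are illustrative.

```python
import random

def choose_index(a):
    """a = [a_0, ..., a_M]: boundary points of a rule's sub-space (6).
    Returns i in 1..M with probability (a_i - a_{i-1})/(a_M - a_0), i.e. (12)."""
    rho = random.uniform(a[0], a[-1])        # uniform point in [a_0, a_M]
    for i in range(1, len(a)):
        if rho <= a[i]:                      # rho fell inside sub-distance i
            return i
    return len(a) - 1

a = [0.0, 1.0, 3.0, 4.0]                     # M = 3 sub-distances, lengths 1, 2, 1
counts = {1: 0, 2: 0, 3: 0}
for _ in range(10000):
    counts[choose_index(a)] += 1
print(counts)                                # roughly 2500 / 5000 / 2500
```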

D. Reinforcement Learning Algorithm

In this sub-section, the proposed algorithm for tuning the above-defined parameters is presented. The algorithm is based on reinforcement learning methods and satisfies the six axioms mentioned by Barto in [12]. Let $r(k)$ be the reinforcement signal generated by the critic module in the kth iteration. Note that it represents the effect on the plant of the action previously generated by the FLC, i.e. $y(k-1)$, and that before the kth action is generated, the parameters of the related fired rules should be updated. In other words, this scalar can represent the change in the current state of the plant.

To be more specific, as a general assumption, imagine that a smaller reinforcement signal represents a better state of the plant. The change in the state of the plant is then expressed by (13); an improvement in the plant situation is indicated by $\Delta r(k) > 0$, while $\Delta r(k) < 0$ indicates that the plant situation has worsened.

$$ \Delta r(k) = r(k-1) - r(k) \qquad (13) $$

However, since (13) is based solely on the output of the critic, it does not contain information about which rules have been fired, which term sets have been chosen for generating $y(k-1)$, etc. Thus, $\Delta r(k)$ is not immediately applicable for updating the parameters. The mentioned reward/punishment updating procedure means that if the generated action resulted in an improvement in the plant state, the probabilities of choosing the same term sets for the consequences of the corresponding fired rules should be increased; if this action caused the plant state to worsen, these probabilities should be decreased. $\Delta r(k)$ will therefore be mathematically manipulated to yield the amount by which the mentioned probabilities, (11), are affected.

As the first step, a simple modification is done on $\Delta r(k)$. This step may be skipped if $\Delta r(k)$ is already a suitable representation of the change in the system state; a comprehensive example will illustrate this case in section IV. This manipulation is done by $f(\cdot)$, noting that $f: \mathbb{R} \to \mathbb{R}$:

$$ \bar{r}(k) = f\left( \Delta r(k) \right) \qquad (14) $$

According to $\bar{r}(k)$, the corresponding sub-distances, (8), are changed. The amount of change in the sub-distances relating to the lth rule is defined by $\bar{\delta}_i^l(k)$ in (15):

$$ \bar{\delta}_i^l(k) = g \cdot \beta(l, i) \cdot \alpha(l) \cdot \bar{r}(k) \qquad (15) $$

Regarding (15), $g$ is a gain and acts as a scaling factor; $\beta(l, i)$ represents the exploration/exploitation policy [1]; and $\alpha(l)$ is the firing rate of the lth rule, obtained by substituting x into the membership function formed from the premise part of the lth rule. Note that this factor expresses the contribution of the rule in generating the output.

There is a variety of exploration/exploitation policies [1, 2, 12]; however, here we propose a simple one:

$$ \beta(l, i) = 1 - e^{-n_i^l(k)} \qquad (16) $$

where $n_i^l(k)$ is the number of times $W^i$ has been chosen for the lth rule. As $W^i$ is chosen more often for this rule, $\beta(l, i)$ grows exponentially toward one, letting $\bar{\delta}_i^l(k) \to g \cdot \alpha(l) \cdot \bar{r}(k)$.

The reinforcement measure $\Delta_i^l(k)$ applied to the sub-distances is then given by (17):

$$ \Delta_i^l(k) = \begin{cases} \bar{\delta}_i^l(k), & \bar{r}(k) \ge 0 \\ \max\left\{ -\delta_i^l(k),\ \bar{\delta}_i^l(k) \right\}, & \bar{r}(k) < 0 \end{cases} \qquad (17) $$

In (17), the max operator is used in the case $\bar{r}(k) < 0$; this avoids overly punishing those sub-distances that have not been chosen. The reason is clearer if (18) is studied.

Equation (18) depicts the updating rule:

$$ a_q^l(k) = \begin{cases} a_q^l(k-1), & q < i \\ \max\left\{ a_{q-1}^l(k),\ a_q^l(k-1) + \Delta_i^l(k) \right\}, & q \ge i \end{cases} \qquad (18) $$

where $q = 0, 1, \dots, M$. Regarding (18), note the following:

1- $i$ is the index of the term set chosen for the consequence part of the lth fired rule, i.e. $B^l = W^i$.

2- $q$ is a counter which starts from $i$ and ends at $M$. Apparently, there is no need to update the parameters which are not modified, so $q$ may start from $i$; this indeed reduces the processing time.

3- There are $M$ parameters for each rule in the rule base; hence there are at most $M \cdot 2^n$ modifications per iteration.

4- By (18) it should be understood that only the lengths of $\delta_i^l$ and $\Delta^l$ are modified: although the boundaries $a_q^l$ for $q \ge i$ are shifted, the length of every other sub-distance remains unchanged.

5- Modification of $\Delta^l$ always lets the other subspaces (and hence the other indices) be chosen. As the system learns more, a dominant subspace will be found, but the lengths of the un-chosen subspaces remain non-zero as long as the effect of choosing them has not been observed to worsen the result. This feature is useful in the case of slightly changing plants.

Theorem 1. By (18), if a term set receives reward/punishment, then the probability of choosing that term set is increased/decreased.

Proof.

a) Reward. Assume that $W^i$ has been chosen for the lth rule and that the resulting action improved the plant state, so that $P_k^l(j=i)$ should increase. Note that in this case $\Delta_i^l(k)$ is a positive scalar. Using (5) we have:

$$ \Delta P_k^l(j=i) = \frac{\delta_i^l(k)}{\Delta^l(k)} - \frac{\delta_i^l(k-1)}{\Delta^l(k-1)} \qquad (19) $$

Using (9), (10), (11) and (12) in (19), and taking $a_0^l = 0$ without loss of generality, we obtain:

$$ \Delta P_k^l(j=i) = \frac{a_i^l(k) - a_{i-1}^l(k)}{a_M^l(k)} - \frac{a_i^l(k-1) - a_{i-1}^l(k-1)}{a_M^l(k-1)} $$

According to (18), the above equation yields (dropping the argument $k-1$ for brevity):

$$ \Delta P_k^l(j=i) = \frac{a_i^l + \Delta_i^l(k) - a_{i-1}^l}{a_M^l + \Delta_i^l(k)} - \frac{a_i^l - a_{i-1}^l}{a_M^l} $$

It can easily be seen that:

$$ \Delta P_k^l(j=i) = \frac{\Delta_i^l(k) \left( a_M^l - a_i^l + a_{i-1}^l \right)}{a_M^l \left( a_M^l + \Delta_i^l(k) \right)} $$

which, with regard to (8), implies that $\Delta P_k^l(j=i) > 0$.

b) Punishment. In this case $\bar{r}(k) < 0$. Hence:

$$ \Delta_i^l(k) = \max\left\{ -\delta_i^l(k),\ \bar{\delta}_i^l(k) \right\} $$

The max operators used in the above equation and in (17) assure us that the update rule (18) does not yield an undefined (negative-length) sub-distance; the max operators in (18) are used to satisfy (8-b). Note that in this case $\Delta_i^l(k)$ is a negative scalar, and the procedure is the same as in part a. ∎
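The following sketch transcribes (17) and (18) as reconstructed above; it is our reading of the update, not the authors' code. Shifting every boundary from a_i upward grows (or shrinks) only the chosen sub-distance while preserving the ordering constraint (8-b).

```python
def update_rule_parameters(a, i, delta_bar):
    """In-place update of one fired rule's boundaries a = [a_0, ..., a_M].
    i: index of the chosen consequence term set; delta_bar: measure (15)/(30)."""
    # (17): on punishment, never remove more than the current length of delta_i
    if delta_bar >= 0.0:
        change = delta_bar
    else:
        change = max(-(a[i] - a[i - 1]), delta_bar)
    # (18): shift every boundary from a_i upward; max keeps the ordering (8-b)
    for q in range(i, len(a)):
        a[q] = max(a[q - 1], a[q] + change)
    return a

a = [0.0, 1.0, 2.0, 3.0]
print(update_rule_parameters(a, 2, 0.5))   # reward index 2 -> [0.0, 1.0, 2.5, 3.5]
print(update_rule_parameters(a, 1, -2.0))  # punishment, clipped to delta_1's length
                                           # -> [0.0, 0.0, 1.5, 2.5]
```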

III. THE BALL AND PLATE SYSTEM

A Ball and Plate system, the aforementioned BnP, is an evolution of the traditional Ball and Beam (BnB) system [13, 14]. It consists of a plate whose slope can be manipulated in two perpendicular directions and a ball rolling over it. The behavior of the ball is of interest and can be controlled by tilting the plate. Within this scheme, various structures may be proposed and applied in practice. Usually, image feedback is used to locate the position of the ball; however, due to its lower accuracy and slower sampling rate compared with touch-screen sensors (or simply touch sensors), we opt for a touch sensor.

The hardware structure implemented in this project, as outlined in Fig. 2, consists of five blocks. However, the whole system can be viewed as a single block whose input is a two-element vector, the target angles (20). The output of this assumed block is a six-element vector which contains the position and velocity of the ball and the current angles of the plate. Table I illustrates the related parameters.

TABLE I
PARAMETERS OF THE BALL AND PLATE SYSTEM AND THEIR UNITS

Symbol | Parameter | Unit
$(x_d, y_d)$ | Desired ball position | Pixel
$(x, y)$ | Ball position | Pixel
$(v_x, v_y)$ | Ball velocity | Pixels per second
$(\theta_x, \theta_y)$ | Plate angles | Angle step
$(u_x, u_y)$ | Control signal | Angle step

$$ u = [\theta_x\ \theta_y]^T \qquad (20) $$

The control objective is simple command of the ball: to place the ball at any desired location on the plate surface, starting from an arbitrary initial position.

The hardware specification of the implemented BnP for this project is given below. A complete, or even brief, description of how we made this apparatus is beyond the scope of this paper, but a summary is needed to show that the plant used for RLFC is roughly made and contains many inaccuracies, so much so that a classic controller would be unable to control it. Referring back to Fig. 2, each block is described next.

The Actuating Block

The actuating block consists of high-accuracy stepping motors equipped with precise incremental encoders (3600 ppr*) coupled to their shafts, plus accurate micro-stepping-enhanced drivers. The original step size of the steppers is 0.9 degrees, reducible by the drivers down to 1/200 of a step. Taking the mechanical limitations into account, the smallest measurable and applicable amount of rotation is 0.1 degrees.

(Fig. 2: the square separates the electronics section from the mechanical parts.)

The Sensor Block

The sensor is a 15-inch touch-screen sensor. Sensor output is a message packet sent through RS-232 serial communication at 19200 bps*. The fastest sampling rate of the whole sensor block is thus 384 samples per second (i.e. 50 bits per packet on the line), which implies that the maximum available time for decision making is $1/384 \approx 2.604 \times 10^{-3}$ seconds. The sensitive area over which pressure can be sensed is $30.41 \times 22.81$ cm, and the sensor resolution is $1900 \times 1900$ pixels. If the sensor sensitivity is uniformly distributed over its sensitive area, each pixel is assigned an area of approximately $0.16 \times 0.12$ mm² of the sensor surface.

The Interface Block

The third and main section of the BnP system is its

electronic interface. This interface receives commands from

an external device in which the controller has been

implemented and then takes necessary corresponding actions.

Each decision made by the controller algorithm is translated

and formed into a message packet which is then sent to the

interface via a typical serial communication (RS232) or other

communication platforms. Then, the interface sends necessary

signals to actuators. In addition to some low-level signal

manipulation (such as applying a low-pass filter to the sensor

reading and noise cancelling), upon request from the main

controller, the interface sends current information, such as ball

position and velocity or position of the actuators, to the main

controller.

* ppr: pulses per revolution; bps: bits per second, a measuring unit for serial communication.

IV. APPLYING RLFC TO THE IMPLEMENTED BNP SYSTEM

In section II, RLFC was discussed in full. In this section, the modifications necessary for making it applicable to controlling the implemented BnP system, solving the second problem stated in the previous section, are presented.

Fig. 1 depicts the control architecture as well as the signal flows. With regard to Table I, the illustrated signals are explained next.

Control signal vector:

$$ u = [u_x\ u_y]^T \qquad (21) $$

State vector:

$$ s = [x\ y\ v_x\ v_y\ \theta_x\ \theta_y]^T \qquad (22) $$

Desired state vector:

$$ s_d = [x_d\ y_d\ 0\ 0]^T \qquad (23) $$

Error vector:

$$ e = [e_x\ e_y\ e_{v_x}\ e_{v_y}]^T \qquad (24) $$

where $e_x = x - x_d$ and $e_y = y - y_d$; and, since we want to make the ball steady, $e_{v_x} = v_x$ and $e_{v_y} = v_y$.

These error components, together with the plate angles, are the inputs to the controller section of RLFC. Let us arrange them in the vector x as written in (25):

$$ x = [e_x\ e_y\ v_x\ v_y\ \theta_x\ \theta_y]^T \qquad (25) $$

On the universe of discourse of each input variable, a specific number of term sets is defined; let these numbers be $n_x, n_y, n_{v_x}, n_{v_y}, n_{\theta_x}$ and $n_{\theta_y}$. According to (4), the fuzzy rule base contains L rules, where:

$$ L = n_x \cdot n_y \cdot n_{v_x} \cdot n_{v_y} \cdot n_{\theta_x} \cdot n_{\theta_y} $$

For our very specific implementation, the following quantities are assigned to these variables:

$$ n_x = n_y = 7,\quad n_{v_x} = n_{v_y} = 5,\quad n_{\theta_x} = n_{\theta_y} = 2 \quad\Rightarrow\quad L = 4900 $$

Fig. 3 shows the defined term sets graphically, on the universes of discourse of (a) $e_x$ and $e_y$, (b) $v_x$ and $v_y$, and (c) $\theta_x$ and $\theta_y$; for the respective units, refer to Table I. The number, shape and distribution of the defined term sets are chosen based on logical sense and practical experience; nothing prevents the system from performing well if these selections are changed. On the universes of discourse of the output variables, $M = 51$ triangular term sets are defined, uniformly distributed over each universe of discourse; Fig. 4 illustrates the locations of these term sets.

Assuming that the covering condition is satisfied, as is the case in Fig. 3, in each iteration there are at most $2^6 = 64$ fired rules. However, if an extensive search were done among the 4900 rules to discover the fired ones, i.e. calculating each premise membership value and checking whether it is zero or not, the processing time would grow beyond tolerable bounds. Instead, since we know the exact location of each term set, locating each measured input value on its corresponding universe of discourse directly reveals the term sets with non-zero membership value. Doing this for all six input variables leaves at most two discovered term sets for each of them; the combinations of these term sets under the AND operator directly show which rules are fired. This locating procedure can be coded in software as a set of conditional statements. Apparently, if there are n term sets defined on a particular universe of discourse, then (n+1) conditional statements are needed to discover the active ones. Hence, instead of L arithmetic calculations, we are faced with S logical comparisons, where:

$$ S = \sum_{i=1}^{n} (n_i + 1) \qquad (26) $$

For our case study, S = 32.
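A sketch of this locating procedure, assuming ordered triangular term sets whose peaks are known; each input needs at most (n+1) comparisons, and the fired rules are the cross-product of the surviving term sets. The toy peak values are our own.

```python
from itertools import product

def active_term_sets(value, peaks):
    """peaks: sorted centers of the n triangular term sets on one universe of
    discourse, adjacent sets overlapping (covering condition). Returns the
    indices with non-zero membership using at most n+1 comparisons."""
    if value <= peaks[0]:
        return [0]                        # left of the first peak: only set 0
    for i in range(1, len(peaks)):        # walk the peak boundaries left to right
        if value <= peaks[i]:
            return [i - 1, i]             # between two peaks: exactly these two
    return [len(peaks) - 1]               # right of the last peak: only set n-1

# toy case: two inputs, each with three triangular sets peaking at -1, 0, 1
all_peaks = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
x_measured = [0.3, -1.5]
actives = [active_term_sets(v, p) for v, p in zip(x_measured, all_peaks)]
fired = list(product(*actives))           # premise combinations of fired rules
print(fired)                              # [(1, 0), (2, 0)]  (at most 2^n)
```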

Until now, the reinforcement signal has only been defined as an evaluation of the current situation of the system, generated by the critic block. No more precise definition could be given up to now, since the nature of this signal directly depends on the nature of the plant to be controlled. Referring back to Fig. 1, it is obvious that the input to the critic is the error signal, (24); according to this figure, the reinforcement signal is a function of this vector:

$$ r = g(e) = g(e_x, e_y, v_x, v_y) \qquad (27) $$

Apparently, the critic is defined by g(·) and is the designer's choice. The only necessary condition is that this function should represent the current state of the plant as well as it possibly can. A proposed general form of this function is depicted in (28).

(28)

Note that the aim of RLFC is to minimize (28). In (28), three coefficients are defined, which are explained next. $c_v$ weights the velocity error relative to the position error: as $c_v$ increases, more emphasis is placed on making the ball steady than on driving it toward the desired location. $c_x$ and $c_y$ compensate for the inevitable imprecision of the mechanical structure of the BnP; because of this mutual interaction, the motion of the ball in each direction is not a function of the corresponding plate angle alone. The exact values of $c_x$ and $c_y$ are then tuned experimentally, or are to be learnt.

C. Reinforcement Measure

Having proposed the reinforcement signal, we now seek a suitable function to produce $\bar{\delta}_i^l(k)$ for use in (17). Equation (29) is a proposed function for (14):

$$ \bar{r}(k) = f\left( \Delta r(k) \right) = \frac{\Delta r(k)}{r_{\max}} + \frac{\operatorname{sgn}\left( \Delta r(k) \right)}{r(k)} \qquad (29) $$

This function consists of two terms: the first scales the pure reinforcement signal received from the critic, and the second tunes the learning sensitivity when the plant is around the target state; actions receive more reward/punishment as they affect the plant state near the desired target state. Note that $r_{\max}$ can easily be calculated using (28).

According to (29) and (15), the reinforcement measure replaced in (17) is given by (30):

$$ \bar{\delta}_i^l = g \cdot \left( 1 - e^{-n_i^l(k)} \right) \cdot \alpha(l) \cdot \left[ \frac{\Delta r(k)}{r_{\max}} + \frac{\operatorname{sgn}\left( \Delta r(k) \right)}{r(k)} \right] \qquad (30) $$
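Transcribing (29) and (30) under the reconstruction above; the small guard against division by zero near the target is our own addition, and all argument values are illustrative.

```python
import math

def reinforcement_measure(r_prev, r_curr, r_max, n_chosen, alpha, g=1.0):
    """delta-bar of (30) for one fired rule. r_prev, r_curr: critic outputs
    r(k-1), r(k); n_chosen: times the chosen term set was picked for this
    rule; alpha: the rule's firing rate; g: the gain of (15)."""
    dr = r_prev - r_curr                         # (13): positive = improvement
    sgn = 0.0 if dr == 0 else math.copysign(1.0, dr)
    # (29); the epsilon guards the division when r(k) is ~0 at the target
    r_bar = dr / r_max + sgn / max(abs(r_curr), 1e-9)
    beta = 1.0 - math.exp(-n_chosen)             # (16): exploration/exploitation
    return g * beta * alpha * r_bar              # (15) with (29) substituted

# an improving step (r dropped from 4 to 2), moderately explored term set
print(reinforcement_measure(r_prev=4.0, r_curr=2.0, r_max=10.0, n_chosen=3, alpha=0.5))
```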

D. Adding a Priori Knowledge

From a very general point of view, the proposed algorithm is a search in the space of possible actions. However, it is possible to add a priori knowledge in order to increase the learning speed. To describe this, it helps first to explain how the random selection of output term sets takes place. Digital processors can produce uniformly distributed random numbers, and this is used in RLFC: a random number is generated by the processor, and it is then checked to which sub-distance (see equation (8)) it belongs; the index of that sub-distance becomes the index of the chosen term set. Let the randomly generated number for the lth rule be $\rho^l$; then the chosen index is:

$$ j^l:\ \rho^l \in \delta_{j^l}^l \qquad (31) $$

A priori knowledge can then be added by modifying the bounds on this selection for chosen rules. For example, consider rules of the following form:

IF $e_x$ is $A_1^1$ AND $e_y$ is whatever AND ...

Apparently, this applies to a set of rules in which the first and third conditions are fixed as mentioned. Referring back to the term sets depicted in Fig. 3 and Fig. 4, this clearly indicates that the sensible choice for this condition is a high deviation in the output; hence, for the mentioned rules, the selection is restricted to:

$$ j^l \in [35, 50] \qquad (32) $$
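One way to read this mechanism in code, under our assumptions: the draw of (31) is simply confined to the boundary points of the allowed indices, here a toy-scale stand-in for the range [35, 50] above.

```python
import random

def choose_index_with_prior(a, allowed=None):
    """a = [a_0, ..., a_M]: boundaries of a rule's sub-space; allowed = optional
    (lo, hi) index bounds encoding a priori knowledge, e.g. (35, 50)."""
    lo, hi = (1, len(a) - 1) if allowed is None else allowed
    rho = random.uniform(a[lo - 1], a[hi])   # draw only inside the allowed span
    for i in range(lo, hi + 1):
        if rho <= a[i]:
            return i                         # (31): index of the hit sub-distance
    return hi

a = [0.0, 1.0, 2.0, 3.0, 4.0]                # toy rule with M = 4 sub-distances
print(choose_index_with_prior(a))            # any index in 1..4
print(choose_index_with_prior(a, (2, 3)))    # prior restricts the choice to 2 or 3
```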

V. PERFORMANCE RESULTS OF RLFC ON BALL AND PLATE

After the modified RLFC was implemented, it was experimentally applied to the implemented BnP system. The graphs illustrated in this section are the results of a series of experiments. In all experiments, the position of the ball versus time was collected using the monitoring section of the implemented BnP. These raw data were then processed by software enhancement procedures to avoid a huge amount of confusing graphs: after omitting time, the x position versus the y position is obtained, and the points corresponding to a series of iterations are drawn on a single graph. The units of x and y are pixels, and the origin of the coordinate system is as seen by the touch sensor. Each figure illustrates the touch-sensitive area of the plate, i.e. the $1900 \times 1900$ pixels of the touch sensor. The location of the ball is illustrated approximately around the mean of the acquired points: dark areas in the figures indicate the presence of the ball over the corresponding areas of the plate, and the darker a region of the figure, the longer the ball stayed in the corresponding area over the observed iterations. The target (desired) location of the ball in all the illustrated experiments is the centre of the plate.

In Fig. 5, the improvement in the behavior of the ball under the control of RLFC is shown. It is observed that after approximately 70000 iterations, an acceptable performance is obtained. Note that the time needed per iteration normally varies; in our experiments, 70000 iterations took around 20 minutes. Since the performance of the system is satisfactory around the 70000th iteration, at this stage RLFC is regarded as a trained system. However, since we do not switch the learning procedure off afterwards, the comments under the next figures do not include the term "trained"; instead, the 70000th iteration is mentioned as a reference point for a well-trained system. The control signals relating to the best performance illustrated in Fig. 5 are shown in Fig. 6.

In the experiments above, the initial position of the ball was located in the same place. To show that this is not a necessary condition, in Fig. 7 another start point is chosen. It is seen that, since RLFC had not experienced these new states enough before, at first it could not perform well; however, as the ball touches a previously well-experienced state (shown by an arrow), the behavior comes under control. The number of iterations in this figure is about 3500.

To compare the performance of RLFC with that of a human, 10 individuals (all healthy, normal and mature, with no apparent nervous or muscular disorder) were selected and asked to control the implemented BnP system. Each individual was allowed to try the system 10 times. Note that in all these experiments the steppers were released and the individuals controlled the plate with their own hands; this indeed removes the most complicated nonlinearity and imprecision of the system: the actuators. Fig. 8 illustrates the best performance, which was the 7th try of the individual who possessed the best control over the BnP system.

VI. CONCLUSION

The main idea of the discussed work was to propose a human-like controller, capable of learning from its own past experience as well as of embedding some prior knowledge and reasonable facts. Although still far from exact human-like behavior, applying this controller to a very complex and uncertain plant resulted in satisfactory performance, especially when compared with a capable human trying to control the same plant. There is a variety of extensions and modifications to the proposed method, from the form of the fuzzy IF-THEN rules to the method of tuning the various defined parameters.

Fig. 5. Improvement in the behavior of the ball under RLFC control. The top left figure corresponds to the first 15000 iterations, where initially a priori knowledge is embedded. The top right and bottom left figures show improvement in performance. After approximately 70000 iterations, the bottom right performance is regarded as acceptable.

REFERENCES

[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press/Bradford Books, Cambridge, MA, 1998.
[2] L. P. Kaelbling, M. L. Littman and A. W. Moore, "Reinforcement Learning: A Survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, May 1996.
[3] A. B. Rad, P. T. Chan, W. L. Lo and C. K. Mok, "An Online Learning Fuzzy Controller," IEEE Trans. Industrial Electronics, vol. 50, no. 5, pp. 1016-1021, October 2003.
[4] X. Fan, N. Zhang and S. Teng, "Trajectory planning and tracking of ball and plate system using hierarchical fuzzy control scheme," Fuzzy Sets and Systems, vol. 144, pp. 297-312, 2003.
[5] S. Awtar and K. C. Craig, "Mechatronic Design of a Ball on Plate Balancing System," Proc. 7th Mechatronics Forum International Conference, Atlanta, GA, 2000.
[6] Humusoft, User's Manual: CE 151 Ball & Plate Apparatus, Humusoft.
[7] E. H. Mamdani, "Application of fuzzy algorithms for simple dynamic plant," Proceedings of the IEE, vol. 121, pp. 1585-1588, 1974.
[8] M. Bai, H. Lu, J. Su and Y. Tian, "Motion Control of Ball and Plate System Using Supervisory Fuzzy Controller," Proc. WCICA 2006, vol. 2, pp. 8127-8131, June 2006.
[9] H. Wang, Y. Tian, Z. Sui, X. Zhang and C. Ding, "Tracking Control of Ball and Plate System with a Double Feedback Loop Structure," Proc. ICMA 2007, August 2007.
[10] R. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, 1961.
[11] L. X. Wang, A Course in Fuzzy Systems and Control, Prentice-Hall International Inc., 1997.
[12] A. G. Barto, "Reinforcement Learning," in The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed., The MIT Press, Cambridge, MA, 2006, pp. 804-809.
[13] H. K. Lam, F. H. F. Leung and P. K. S. Tam, "Design of a fuzzy controller for stabilizing a ball and beam system," Proc. IECON 1999, vol. 2, pp. 520-524.
[14] E. Laukonen and S. Yurkovich, "A Ball and Beam testbed for Fuzzy identification and Control design," Proc. American Control Conf., June 1993, pp. 665-669.
