
IEEE/CAA JOURNAL OF AUTOMATICA SINICA, VOL. 4, NO. 3, JULY 2017

Online Learning Control for Harmonics Reduction Based on Current Controlled Voltage Source Power Inverters

Naresh Malla, Student Member, IEEE, Ujjwol Tamrakar, Student Member, IEEE, Dipesh Shrestha, Student Member, IEEE, Zhen Ni, Member, IEEE, and Reinaldo Tonkoski, Member, IEEE

Abstract: Nonlinear loads in the power distribution system cause non-sinusoidal currents and voltages with harmonic components. Shunt active filters (SAF) with current controlled voltage source inverters (CCVSI) are usually used to obtain balanced and sinusoidal source currents by injecting compensation currents. However, CCVSIs with traditional controllers have limited transient and steady state performance. In this paper, we propose an adaptive dynamic programming (ADP) controller with online learning capability to improve the transient response and harmonics. The proposed controller works alongside existing proportional integral (PI) controllers to efficiently track the reference currents in the d-q domain. It can generate adaptive control actions to compensate the PI controller. The proposed system was simulated under different nonlinear (three-phase full wave rectifier) load conditions. The performance of the proposed approach was compared with the traditional approach. We have also included the simulation results without connecting the traditional PI control based power inverter for reference comparison. The online learning based ADP controller not only reduced the average total harmonic distortion by 18.41 %, but also outperformed traditional PI controllers during transients.

Index Terms: Adaptive dynamic programming (ADP), current controlled voltage source power inverter (CCVSI), online learning based controller, neural networks, shunt active filter (SAF), total harmonic distortion (THD).

Manuscript received October 26, 2016; accepted March 30, 2017. This work was partly supported by Microsoft Inc. Recommended by Associate Editor Qinglai Wei. (Corresponding author: Zhen Ni.)
Citation: N. Malla, U. Tamrakar, D. Shrestha, Z. Ni, and R. Tonkoski, "Online learning control for harmonics reduction based on current controlled voltage source power inverters," IEEE/CAA J. of Autom. Sinica, vol. 4, no. 3, pp. 447-457, Jul. 2017.
The authors are with the Electrical Engineering and Computer Science Department, South Dakota State University, Brookings, SD 57007, USA (e-mail: naresh.malla@jacks.sdstate.edu; ujjwol.tamrakar@jacks.sdstate.edu; dipesh.shrestha@jacks.sdstate.edu; zhen.ni@sdstate.edu; tonkoski@ieee.org).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JAS.2017.7510541

I. INTRODUCTION

ELECTRICAL energy is expected to contribute to about 60 % of all world energy consumption by 2040 [1]. Power electronics based equipment in data centers and in commercial and industrial applications, as shown in Fig. 1, is non-linear in nature. Hence, such equipment draws harmonic currents from the power grid and can cause power quality issues. Some of the major issues related to current harmonics include vibration in motors, generator burnouts, and computer network failures, to name a few [2], [3]. Among the different approaches, shunt active filters (SAF) have become a particularly popular solution for harmonic reduction, owing to the recent advancements in power electronic switches and their guaranteed performance over a wide range of dynamic operating conditions, unlike passive filters [4], [5].

Fig. 1. Microgrid with non-linear loads and shunt active filter connected to reduce harmonics.

The total harmonic distortion (THD) of a current signal is a measurement of the harmonic distortion present and is defined as the summation of all the harmonic components of the current waveform compared against the fundamental component of the current waveform. The IEEE Std. 519 recommendations for harmonic control in power systems state that the THD of the source current should be less than 5 % [6]. The standard also has stringent requirements on individual harmonic components. The basic concept behind a SAF is to compensate the higher order harmonic currents and the reactive power demand of the load through a power electronics based current controlled voltage source power inverter (CCVSI), hereinafter referred to as power inverter. Thus, the source only needs to provide the fundamental component of the load current [7]. The control structure of a SAF has two control loops: 1) an outer loop that estimates the current that the inverter needs to inject/absorb to compensate the load harmonics and the reactive power, and 2) an inner loop responsible for tracking the generated reference current. The outer current loop is typically implemented using either the instantaneous reactive power theory [8] or the synchronous reference frame theory [9]. The inner current control loop is traditionally implemented using a proportional-integral (PI) control based approach.

However, the dynamics of the PI controller are inadequate because of the limited bandwidth of the controller. As a result, the SAF may not give the required performance when the loads have a high degree of non-linearity [10]. This has led to the development of a number of machine learning based adaptations to improve the performance of the traditional control in power inverters. It also motivated us to design an online learning based controller to improve the performance of the power inverter in terms of the THD. Artificial neural network (NN) techniques and classical back-propagation algorithms have been used to improve the control performance of the SAF in [2], [7], [11]. However, the implemented algorithms lack online adaptation, and their performance under varying load conditions has not been quantified. In [12], NNs were used in the outer loop to improve the estimation of compensating currents. Although the THD was reduced to within the IEEE Std. 519 limits, the inner control structure still used a hysteresis based current control approach, which has its own disadvantages related to high switching frequency and difficulty in the filter design. In [13], recurrent neural network (RNN) training with the Levenberg-Marquardt algorithm is used for optimal control of a grid-connected inverter, but this does not involve online training and requires a well-trained RNN controller. Other optimal control techniques, like the model predictive control with selective harmonic elimination proposed in [14] for multilevel power converters, have resulted in a fast dynamic response. However, these techniques are highly dependent on the accuracy of the system model and are prone to limited transient and steady state behavior due to parameter uncertainties [15].

All of the aforementioned literature indicates the need for developing an online learning approach to improve the dynamic performance of the inner current control loop, hence improving the harmonic compensating capability of power inverters. An adaptive dynamic programming (ADP) controller has been proposed to address these issues in the recent power and control communities. It consists of an architecture based on two NNs: an action network and a critic network. The critic network is designed to evaluate the online learning based controller, while the action network produces an online learning based control signal to improve the control performance [16]-[23]. The ADP controller has shown promising results on a number of power system control examples [24]-[35]. For example, the concept of supplementary ADP control has been used to enhance power system stability in [25], [26], [36], [37]. In [20], [27], [38], a similar control approach is incorporated into the generator's frequency control loop to improve the frequency response of the system and smooth out the variations due to large scale wind, photovoltaics (PVs) and/or electric vehicles (EVs) integration.

Yet, only a few papers have investigated power electronics pulse width modulation (PWM) current control through the ADP based controller, owing to the difficulty involved, as mentioned in [39]. References [39]-[42] proposed and validated the vector control of a grid-connected rectifier/inverter using an artificial neural network and a back-propagation-through-time weight updating rule. However, that system was not implemented for harmonic reduction applications and the current controller lacks the online learning capability. In this paper, we propose an integration of the online learning ADP control approach for the power electronic inverter to improve the harmonic compensating capability and the transient performance. The model-free ADP control does not rely on an accurate mathematical model of the system or the pole-zero placement design of traditional controllers. Thus, mathematical model descriptions of the benchmark system are not required. The ADP controller is a purely data-driven control approach which observes the system states and outputs the control actions over time. In addition, the ADP controller has learning and adaptation capabilities. The key parameters in the ADP controller are adjusted online for disturbances and changing loads.

The major contributions of this paper are summarized as follows: 1) A multiple-input multiple-output (MIMO) online learning control system is developed based on the ADP design. A series of time-delayed current error signals is designed as the input for the ADP controller, which outputs compensating control actions (one for the d-axis and the other for the q-axis) for the current controller in the power inverter. In addition, an appropriate reinforcement signal has been designed with the delayed tracking errors in the power inverter. Based on this, the ADP controller generates compensating control actions, which guarantees its success. 2) The proposed approach has been applied in two challenging case studies. First, the proposed system was tested for a three-phase full wave bridge rectifier with a resistive load (non-linear load). The current drawn by such a load shows a high degree of non-linearity. The ADP controller improved the bandwidth of the inner current control loop, leading to efficient tracking of the currents and an average reduction of 18.41 % in THD. Next, the proposed system was subjected to a sudden step change in the load. Furthermore, the system was tested under different load conditions. The system obtained better transient and steady state performance (e.g., faster response, lower overshoot, and efficient tracking) compared to a traditional PI controller based system.

The rest of the paper is organized as follows: Section II describes the system configuration and benchmark used for the simulation. Section III illustrates the algorithm used for the ADP applied here as the online learning based controller. Section IV compares the performance of the ADP controller with the traditional PI controller for two case studies. Finally, Section V concludes the paper.

II. BENCHMARK AND SYSTEM CONFIGURATION

A simplified diagram of the SAF configuration is shown in Fig. 2. The non-linear load is connected to the three-phase AC source. The non-linear load considered in this paper is a three-phase diode bridge rectifier, followed by a resistance, R_L. In order to supply the harmonics and the reactive power consumed by this non-linear load, the SAF is connected in parallel to the system. The SAF thus provides the harmonic current demand of the load, and as a consequence only sinusoidal currents with an almost unity power factor would be delivered from the source.

Fig. 2. Schematic diagram of a shunt active filter connected to source and non-linear load for compensation of harmonics.

The SAF generates the compensation current, $i_C$. The sum of the source current, $i_S$, and $i_C$ is the non-linear load current, $i_L$. The coupling inductor is used to suppress the high-frequency currents originating from the switching of the power inverter. There are two main control loops associated with the SAF control algorithm: an inner current control loop and an outer harmonic current estimation loop. The inner current control loop tracks the reference current in the d-q frame using the PI controller. The outer control loop measures the load current and the source voltage and generates the reference active and reactive power to be compensated by the SAF using instantaneous p-q theory, as shown in Fig. 3. First, the instantaneous three-phase source voltages ($v_{ga}$, $v_{gb}$ and $v_{gc}$) and the three-phase load currents ($i_{La}$, $i_{Lb}$ and $i_{Lc}$) are measured and transformed to $\alpha$-$\beta$-0 coordinates ($i_{L0}$, $i_{L\alpha}$, $i_{L\beta}$, $v_{g0}$, $v_{g\alpha}$ and $v_{g\beta}$, as shown in Fig. 3) using the Clarke transformation as follows:

$$\begin{bmatrix} v_{g0} \\ v_{g\alpha} \\ v_{g\beta} \end{bmatrix} = \sqrt{\frac{2}{3}} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & \frac{\sqrt{3}}{2} & -\frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} v_{ga} \\ v_{gb} \\ v_{gc} \end{bmatrix} \quad (1)$$

$$\begin{bmatrix} i_{L0} \\ i_{L\alpha} \\ i_{L\beta} \end{bmatrix} = \sqrt{\frac{2}{3}} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 1 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & \frac{\sqrt{3}}{2} & -\frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} i_{La} \\ i_{Lb} \\ i_{Lc} \end{bmatrix}. \quad (2)$$

Fig. 3. Current reference generation ($i_{\alpha ref}$ and $i_{\beta ref}$).

Next, these currents and voltages in the $\alpha$-$\beta$-0 coordinate are used to calculate the instantaneous active ($p$) and reactive power ($q$) consumed by the load as follows:

$$p = v_{g\alpha} i_{L\alpha} + v_{g\beta} i_{L\beta} = \bar{p} + \tilde{p} \quad (3)$$

$$q = v_{g\beta} i_{L\alpha} - v_{g\alpha} i_{L\beta} = \bar{q} + \tilde{q} \quad (4)$$

where $\bar{p}$ and $\tilde{p}$ are the low frequency and the high frequency components of the instantaneous active power, and $\bar{q}$ and $\tilde{q}$ are the low frequency and the high frequency components of the instantaneous reactive power.

The instantaneous active power consumed by the load ($\bar{p} + \tilde{p}$) is passed through a low pass filter (LPF, Butterworth type) to filter out the high frequency harmonic component ($\tilde{p}$). Next, the output of the LPF is deducted from the original instantaneous active power to give just the high frequency component ($\tilde{p}$). This combination is essentially a high pass filter (HPF) that extracts only the high frequency components of the load active power. This is the reference compensating active power ($\tilde{p}_{ref}$) for the SAF. In order to compensate all the reactive power demanded by the load and to maintain unity power factor in the source current, all the calculated instantaneous reactive power becomes the reference compensating reactive power ($q_{ref}$) for the SAF. There is one extra control loop which compensates the active power loss in the capacitor to maintain a constant DC voltage across it. The DC voltage across the capacitor is compared with the standard voltage (400 V); the error is then fed to a PI controller which generates the compensating active power ($\bar{p}_{loss}$) to maintain the constant voltage across the DC capacitor. For the PI controller design of this extra control loop, the inverter is considered to be a constant current source for the DC side of the voltage controller. Thus, the transfer function of the plant for the DC side controller is as follows:

$$\frac{V_{DC}}{\bar{p}_{loss}} = \frac{1/(C V_{CO})}{s + 2/(RC)} \quad (5)$$

where $\bar{p}_{loss}$ is the compensating active power loss in the capacitor, $V_{DC}$ is the DC bus voltage across the capacitor, $C$ is the rated capacitance, $V_{CO}$ is the reference capacitor voltage, and $R$ is the resistance parallel to the capacitor.
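To illustrate this extra control loop, the sketch below shows a discrete PI acting on the 400 V error to produce $\bar{p}_{loss}$. It is a minimal example only: the gains kp and ki are illustrative placeholders, not the paper's tuned values, and the class name is hypothetical.

```python
class DCBusPI:
    """Sketch of the DC-bus voltage loop around (5): the 400 V reference is
    compared with the measured V_DC and a discrete PI produces p_loss.
    Gains are illustrative placeholders, not the paper's tuned values."""
    def __init__(self, kp=5.0, ki=50.0, v_ref=400.0, dt=1e-6):
        self.kp, self.ki, self.v_ref, self.dt = kp, ki, v_ref, dt
        self.integ = 0.0  # integrator state

    def step(self, v_dc):
        err = self.v_ref - v_dc
        self.integ += self.ki * err * self.dt
        return self.kp * err + self.integ   # p_loss, added into (6) below
```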
Next, the reference active power loss for the DC voltage controller is added to the reference active power calculated earlier from the HPF. The reference compensating currents are then calculated from the reference instantaneous active and reactive power using:

$$\begin{bmatrix} i_{\alpha ref} \\ i_{\beta ref} \end{bmatrix} = \frac{1}{v_{g\alpha}^2 + v_{g\beta}^2} \begin{bmatrix} v_{g\alpha} & v_{g\beta} \\ v_{g\beta} & -v_{g\alpha} \end{bmatrix} \begin{bmatrix} \tilde{p} + \bar{p}_{loss} \\ q \end{bmatrix} \quad (6)$$

where $i_{\alpha ref}$ and $i_{\beta ref}$ are the reference compensating currents in the $\alpha$-$\beta$-0 coordinate system. Since the PI controller cannot track sinusoidal currents, the reference current in the $\alpha$-$\beta$-0 coordinate is converted into d-q coordinates using the following relation:

$$\begin{bmatrix} I_{dref} \\ I_{qref} \end{bmatrix} = \begin{bmatrix} \cos(\omega t) & \sin(\omega t) \\ -\sin(\omega t) & \cos(\omega t) \end{bmatrix} \begin{bmatrix} i_{\alpha ref} \\ i_{\beta ref} \end{bmatrix} \quad (7)$$

where $\omega t$ is the synchronous reference frame angle, and $I_{dref}$ and $I_{qref}$ are the reference currents in the d-q coordinate system. Finally, the reference current in d-q coordinates is passed to a standard tuned PI controller designed with a switching frequency of 20 kHz, a cut-off frequency of 4 kHz and a phase margin of 45° (tuned by the methods described in [43], [44]), which generates the modulating signal for the PWM generator of the inverter. The cut-off frequency is kept relatively high in this case to increase the PI controller's bandwidth.
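To make the outer loop of Fig. 3 concrete, here is a minimal per-sample NumPy sketch of (1)-(7). It is a sketch under stated assumptions: a first-order low-pass stands in for the Butterworth LPF, and the function and variable names are illustrative rather than taken from the paper's Simulink model.

```python
import numpy as np

# Power-invariant Clarke transformation matrix used in (1) and (2).
CLARKE = np.sqrt(2.0 / 3.0) * np.array([
    [1 / np.sqrt(2), 1 / np.sqrt(2), 1 / np.sqrt(2)],
    [1.0, -0.5, -0.5],
    [0.0, np.sqrt(3) / 2, -np.sqrt(3) / 2],
])

def reference_currents(v_abc, i_abc, p_loss, omega_t, p_bar, alpha_lpf=0.01):
    """One sampling step of the outer reference-generation loop, (1)-(7).

    v_abc, i_abc : three-phase source voltages and load currents
    p_loss       : output of the DC-bus voltage PI controller
    omega_t      : synchronous reference frame angle
    p_bar        : running LPF state approximating the mean active power
    alpha_lpf    : smoothing constant standing in for the Butterworth LPF
    """
    _, v_al, v_be = CLARKE @ np.asarray(v_abc)   # (1)
    _, i_al, i_be = CLARKE @ np.asarray(i_abc)   # (2)

    p = v_al * i_al + v_be * i_be                # (3) instantaneous active power
    q = v_be * i_al - v_al * i_be                # (4) instantaneous reactive power

    # LPF extracts p_bar; subtracting it from p leaves the ripple (HPF action).
    p_bar += alpha_lpf * (p - p_bar)
    p_tilde = p - p_bar                          # reference compensating power

    # (6): alpha-beta reference currents from the compensating powers.
    den = v_al**2 + v_be**2
    i_al_ref = (v_al * (p_tilde + p_loss) + v_be * q) / den
    i_be_ref = (v_be * (p_tilde + p_loss) - v_al * q) / den

    # (7): rotate into the synchronous d-q frame for the inner control loop.
    c, s = np.cos(omega_t), np.sin(omega_t)
    i_d_ref = c * i_al_ref + s * i_be_ref
    i_q_ref = -s * i_al_ref + c * i_be_ref
    return i_d_ref, i_q_ref, p_bar
```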

Fig. 4. Overall diagram of online learning based ADP controller for the inner current control loop in the power inverter (Fig. 2).

The benchmark simulation model was developed and tested in MATLAB/Simulink. A three-phase three-wire balanced system is considered in this paper. A three-phase bridge diode rectifier with a DC load of 700 W (base case) is used as the non-linear load. The load was simulated using six diodes in a three-phase bridge configuration, as illustrated in Fig. 2. The output of the diode bridge was then connected to the resistor $R_L$. The value of $R_L$ was obtained using the simple equation:

$$P = \frac{V_r^2}{R_L} \quad (8)$$

where $V_r$ is the output voltage of the rectifier, which in our case is 272 V (for the 700 W base case, this gives $R_L = 272^2/700 \approx 105\ \Omega$, the value used in Section IV). The power inverter is rated at 1 kW. A capacitor rated at 400 V DC is used as the energy storage device. The switching frequency of the PWM inverter is 20 kHz. The grid side impedance is set as $Z_{source} = (0.01 + j0.00038)\ \Omega$ and the load side impedance is set as $Z_{load} = (0.01 + j3.8)\ \Omega$. The parameters used for the simulation of the benchmark are given in Table I.

TABLE I
BENCHMARK SYSTEM PARAMETERS

Parameter                                      Value
1. Inductance of coupling inductor             10 mH
2. Internal resistance of coupling inductor    0.1 Ω
3. Resistance of grid                          0.01 Ω
4. Inductance of grid                          1 µH
5. Load side resistance                        0.01 Ω
6. Load side inductance                        10 mH
7. Capacitance of DC bus capacitor             450 µF

In this paper, we used the THD as the performance measurement to evaluate the success of the control approaches. The THD is computed as:

$$THD = \frac{\sqrt{I_2^2 + I_3^2 + I_4^2 + \cdots + I_N^2}}{I_1} \quad (9)$$

where $I_2, I_3, I_4, \ldots, I_N$ are the root-mean-square (RMS) values of harmonics $2, 3, 4, \ldots, N$ respectively, and $I_1$ is the RMS value of the fundamental source current. Here, the $N$th harmonic order corresponds to the Nyquist frequency. The Nyquist frequency is half the sampling frequency of the selected signal. In our case, the sampling frequency is 1 MHz and the switching frequency is 20 kHz. Thus, the Nyquist criterion is satisfied.
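For reference, a compact NumPy sketch of the THD computation (9), consistent with the FFT-based evaluation used later in Section IV. It assumes the record spans an integer number of fundamental cycles so that the harmonics land on exact FFT bins; the function name is illustrative.

```python
import numpy as np

def thd(i_s, f_s=1e6, f1=60.0, n_cycles=12):
    """THD per (9) from an FFT over n_cycles of the fundamental.

    i_s : sampled source current, f_s : sampling frequency (1 MHz here).
    With a window of exactly n_cycles periods, harmonic h of f1 falls on
    FFT bin h * n_cycles; RMS scaling cancels in the ratio.
    """
    n = int(n_cycles * f_s / f1)                  # samples in the window
    spectrum = np.abs(np.fft.rfft(i_s[:n]))
    fund = spectrum[n_cycles]                     # fundamental (60 Hz) bin
    harmonics = spectrum[2 * n_cycles::n_cycles]  # harmonics 2, 3, ..., N
    return np.sqrt(np.sum(harmonics**2)) / fund
```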
III. PROPOSED CONTROLLER DESIGN FOR BENCHMARK SYSTEM

The basic idea in an adaptive-critic design is to adapt the weights of the critic network to approximate the optimal cost function, $J^*(X(t))$, satisfying the modified Bellman principle of optimality [45], given by:

$$J^*(X(t)) = \min_{u(t)} \{\alpha J^*(X(t+1)) + r(X(t)) - U_c\}. \quad (10)$$

The optimal online learning based controller can be written as:

$$u^*(X(t)) = \arg\min_{u(t)} J^*(X(t)) \quad (11)$$

where $X(t)$ is the input state vector, $r(X(t))$ is the immediate cost incurred by $u(t)$ at time $t$, and $U_c$ is the ultimate desired objective which we want our cost function to achieve. This equation cannot be analytically solved in general; thus, the problem of optimal current control needs to be solved iteratively and approximately by using ADP [46]-[49]. In ADP, the action network generates the optimal control action iteratively, and the critic network evaluates the performance of the action network by approximating $J$ close to the optimal solution. $J$ is also the output of the critic network. The connection detail for the inner current control loop with the proposed online learning based ADP controller is shown in Fig. 4, with the signal flow from left to right. The errors in the d-axis and q-axis feedback currents ($I_d$ and $I_q$) are fed to both the PI and ADP controllers, whose combined output produces the appropriate modulating signal for the PWM inverter. The gate signals for the inverter are generated using this modulating signal after considering the cross-coupling terms $\omega L I_q$ and $\omega L I_d$, and adding the feed-forward terms $V_d$ and $V_q$. This signal is then transformed from d-q to a-b-c coordinates using the inverse Park transformation.
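Putting this signal path together, a pseudo-Python sketch of one inner-loop step of Fig. 4 follows. The ADP actions are simply added to the PI outputs before the decoupling and feed-forward terms; the function signature, the `adp` callable, and the sign convention of the cross-coupling terms are assumptions for illustration only.

```python
def inner_loop_step(pi_d, pi_q, adp, e_d, e_q, i_d, i_q, v_d, v_q, omega, L):
    """One control step of the inner current loop of Fig. 4 (illustrative).

    pi_d/pi_q : PI controllers for the d- and q-axis tracking errors
    adp       : callable returning the supplementary actions u_d, u_q
    """
    u_d, u_q = adp(e_d, e_q)             # online learning based actions
    m_d = pi_d.step(e_d) + u_d           # ADP compensates the PI output
    m_q = pi_q.step(e_q) + u_q
    # Cross-coupling and feed-forward terms (sign convention assumed).
    m_d += -omega * L * i_q + v_d
    m_q += omega * L * i_d + v_q
    # Next: inverse Park (d-q -> a-b-c), LPF, then the PWM generator.
    return m_d, m_q
```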

A LPF is added before PWM generation to avoid potential high frequency noise introduced into the modulating signal by the ADP. Here, the online learning based control action produced by the ADP is defined as $u(t) = [u_d(t), u_q(t)]$, where $u_d(t)$ and $u_q(t)$ are supplementary control actions for the outputs of the d-axis and q-axis PI controllers respectively. There are two paths to tune the parameters of the two types of networks in the ADP, which are discussed below.

A. Critic Network

The critic network is a three-layer neural network with 12 hidden neurons. The architecture of the critic network is 8-12-1, as shown in Fig. 5. The inputs to the critic network are the measured system state vector, $X(t)$, and the action network output, $u(t)$. Here, $J(t)$ is the output of the critic element, and the $J$ function approximates the discounted total reward-to-go.

Fig. 5. Critic neural network with 8 inputs, 12 hidden neurons, and 1 output neuron.

The prediction error for the critic network is given by:

$$e_c(t) = \alpha J(t) - [J(t-1) - r(t)]. \quad (12)$$

The reinforcement signal, $r(t)$, for the critic network is defined as follows:

$$r(t) = c(a_1 x_1^2 + a_2 x_2^2 + a_3 x_3^2 + a_4 x_4^2 + a_5 x_5^2 + a_6 x_6^2) \quad (13)$$

where $c$, $a_1$, $a_2$, $a_3$, $a_4$, $a_5$ and $a_6$ are the coefficients of this quadratic equation, and

$x_1$ = error signal between $I_d$ and $I_{dref}$
$x_2$ = one-time-step delayed error signal for $I_d$
$x_3$ = two-time-step delayed error signal for $I_d$
$x_4$ = error signal between $I_q$ and $I_{qref}$
$x_5$ = one-time-step delayed error signal for $I_q$
$x_6$ = two-time-step delayed error signal for $I_q$
$X(t) = [x_1, x_2, x_3, x_4, x_5, x_6]$.

The above delayed error signals (1 time-step and 2 time-step delayed signals) are fed into the ADP controller in order for it to work properly. This ensures that compensating control actions of high magnitude are produced during transients, and of lower magnitude during steady state conditions. The objective function to be minimized in the critic network is given by:

$$E_c(t) = \frac{1}{2} e_c^2(t). \quad (14)$$

Thus, the critic network can be represented as:

$$J = nn_c(X, u, w_c). \quad (15)$$

The weight update algorithm by a gradient descent rule for the critic network is given as follows:

$$w_c(t+1) = w_c(t) + \Delta w_c(t) \quad (16)$$

$$\Delta w_c(t) = l_c(t) \left[ -\frac{\partial E_c(t)}{\partial w_c(t)} \right] \quad (17)$$

$$\frac{\partial E_c(t)}{\partial w_c(t)} = \frac{\partial E_c(t)}{\partial J(t)} \frac{\partial J(t)}{\partial w_c(t)} \quad (18)$$

where $l_c(t) > 0$ is the learning rate of the critic network at time $t$, which usually decreases with time to a small value, and $w_c$ is the weight vector in the critic network.
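The update (12)-(18) amounts to back-propagating the temporal-difference error through a small feedforward network. The Python sketch below is one possible realization using the layer sizes, initialization range, discount factor and learning rate reported in this paper; the tanh hidden layer and linear output are assumptions, not taken from the paper.

```python
import numpy as np

class Critic:
    """Sketch of the 8-12-1 critic with the TD-style update (12)-(18)."""
    def __init__(self, n_in=8, n_hid=12, lc=0.01, alpha=0.95, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.uniform(-0.1, 0.1, (n_hid, n_in))  # input -> hidden
        self.w2 = rng.uniform(-0.1, 0.1, n_hid)          # hidden -> J(t)
        self.lc, self.alpha = lc, alpha

    def forward(self, xu):
        self.h = np.tanh(self.w1 @ xu)       # hidden activations
        return float(self.w2 @ self.h)       # J(t)

    def update(self, xu, J_prev, r):
        J = self.forward(xu)
        e_c = self.alpha * J - (J_prev - r)  # prediction error (12)
        # Gradient descent (16)-(18): dE_c/dw = e_c * alpha * dJ/dw.
        g2 = self.h
        g1 = np.outer(self.w2 * (1 - self.h**2), xu)
        self.w2 -= self.lc * e_c * self.alpha * g2
        self.w1 -= self.lc * e_c * self.alpha * g1
        return J

    def grad_wrt_u(self, xu, n_state=6):
        """dJ/du, back-propagated through the critic for the action update;
        assumes forward() has just run on this xu (last 2 inputs are u)."""
        return (self.w1.T @ (self.w2 * (1 - self.h**2)))[n_state:]

def reinforcement(x, coeff=(0.4, 0.2, 0.04, 0.4, 0.2, 0.04), c=1.0):
    """Quadratic reinforcement signal (13); the coefficients shown are the
    values given later in Section IV-A."""
    return c * float(np.dot(coeff, np.square(x)))
```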
B. Action Network

The action network has a similar multi-layer perceptron neural network architecture to the critic network. However, the numbers of input neurons and output neurons are different. The inputs to the action network are the measured system state vector, $X(t)$, and the output of the action network is the online learning based control signal, $u(t)$. The principle in adapting the action network is to indirectly back-propagate the error between the desired ultimate objective, denoted by $U_c$, and the approximate $J$ function from the critic network. Since we have defined 0 as the reinforcement signal for success, $U_c$ is set to 0 in our design paradigm. In the action network, the state measurements are used as inputs to create a control as the output of the network. In turn, the action network can be implemented by either a linear or a non-linear network, depending on the complexity of the problem. The weight updating in the action network can be formulated as follows:

$$e_a(t) = J(t) - U_c(t). \quad (19)$$

The performance error measure to be minimized in the action network is given by:

$$E_a(t) = \frac{1}{2} e_a^2(t). \quad (20)$$

Thus, the action network can be represented as:

$$u = nn_a(X, w_a). \quad (21)$$

The weight update algorithm by a gradient descent rule is given as follows:

$$w_a(t+1) = w_a(t) + \Delta w_a(t) \quad (22)$$

$$\Delta w_a(t) = l_a(t) \left[ -\frac{\partial E_a(t)}{\partial w_a(t)} \right] \quad (23)$$

$$\frac{\partial E_a(t)}{\partial w_a(t)} = \frac{\partial E_a(t)}{\partial J(t)} \frac{\partial J(t)}{\partial u(t)} \frac{\partial u(t)}{\partial w_a(t)} \quad (24)$$

where $l_a(t) > 0$ is the learning rate of the action network at time $t$, which usually decreases with time to a small value, and $w_a$ is the weight vector in the action network.

C. Training Procedures for ADP Controller

The objective of the ADP controller is to provide an optimal control signal that exhibits fast dynamics in tracking the $I_d$ and $I_q$ current references. The weights of the two networks are updated at most $N_a$ and $N_c$ times for the action and critic networks respectively within each time step; updating also stops once the internal training error thresholds $T_a$ and $T_c$ have been met, using the gradient descent algorithm described in Section III. The typical learning process of the ADP controller includes two trials, as described in [24]. In the first trial, we initialized the NN with random weights. We repeated the simulation until we got no further improvement in the THD. At the end of this process, the ADP would learn a considerable amount of information about the system and state-action pairs. This is called offline learning. In the second trial, or online learning, fully trained weights are used. The online training procedure for the ADP controller is shown in Algorithm 1, with a code sketch after it. The weights evolution of the ADP controller is saved for the run with the best performance in terms of THD. Fig. 6 shows the trajectories for a typical training process. Fig. 6(a) shows the weight evolution of the action network from the 6 input nodes to 1 hidden node. Similarly, Fig. 6(b) shows the weight evolution of the action network from the 12 hidden nodes to the 1 output node. Here, the power inverter and ADP are connected at 0.11 s, at which point there is a major adjustment of the weights because of transients in the system. In addition, Fig. 6(c) shows the convergence of the $J(t)$ function, and Fig. 6(d) shows the convergence of the reinforcement signal, $r(t)$. The convergence of both $w_{a1}$ and $w_{a2}$ indicates the convergence of the learning process. The weights are saved at the end of the simulation process and fine-tuned until there is no further improvement in the THD.

Fig. 6. The training process of ADP controller. (a) Weights trajectories from 6 inputs to 1 hidden node in action network. (b) Weights trajectories from 12 hidden to 1 output node in action network. (c) Trajectory of output of critic network J(t). (d) Reinforcement signal r(t) during the training process.

Algorithm 1 Online learning process of ADP controller
1: Initialize the ADP with i = 0, online learning based control action u^(0)(t) = [0, 0], weights from offline training, and J^(0) = 0.
2: Apply u^(i)(t) in addition to the output of the PI controller in the power inverter and collect the state X(t) from the 3 most recent samples for the ADP controller.
3: Obtain the cost function J^(i)(t) by using (15).
4: Obtain the online learning based controller function u^(i)(t) by using (21).
5: Update the critic and action NN weights using (16) and (22), respectively.
6: i = i + 1. Repeat Steps 2 to 5 until the end of the simulation.
7: Observe the THD and repeat Steps 1 to 6 until no further improvement.
8: Save the NN weights.
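As a concrete illustration, the loop below renders Algorithm 1 in Python around the Critic and Actor sketches above. The `plant` object is a stand-in for the Simulink benchmark, not an API from the paper: `plant.step(u)` is assumed to apply the supplementary action on top of the PI output and return the six tracking-error states of (13).

```python
import numpy as np

def online_adp_loop(plant, actor, critic, n_steps, Uc=0.0):
    """Sketch of Algorithm 1 (one simulation run, Steps 1-6)."""
    u = np.zeros(2)                       # Step 1: u(0) = [0, 0]
    J_prev = 0.0
    for _ in range(n_steps):              # Steps 2-6
        x = plant.step(u)                 # errors from the 3 most recent samples
        r = reinforcement(x)              # quadratic reinforcement signal (13)
        xu = np.concatenate([x, u])       # the 8 critic inputs
        J = critic.update(xu, J_prev, r)  # Step 3: cost via (15), update (16)
        dJ_du = critic.grad_wrt_u(xu)     # critic sensitivity to the actions
        actor.update(x, J, dJ_du, Uc)     # Steps 4-5: action via (21), update (22)
        u = actor.forward(x)
        J_prev = J
    return actor, critic                  # Steps 7-8: check THD, save weights
```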
D. Stability of ADP Controller

There are several major approaches to analyzing the stability of the ADP controller. A detailed Lyapunov stability analysis of the ADP based controller is presented in [50] to support the ADP structure from a theoretical point of view. The authors demonstrated that the auxiliary error and the error in the weight estimates are uniformly ultimately bounded (UUB) using the Lyapunov stability construct. In [51], the proportional-integral-derivative (PID) control rule is incorporated into neural networks (NNs) and new UUB results are provided using a Lyapunov stability construct. The monotonic convergence of optimality is discussed for the goal representation ADP control design in [52], and a theoretical proof of convergence is given in terms of both the internal reinforcement signal and the performance index. In [53]-[56], the stability of the ADP controller is presented, where the authors demonstrated through theoretical analysis that the estimation errors of the NN weights are UUB by the Lyapunov stability construct. We can follow a similar technique to conduct the analysis. We follow the similar algorithm in [53] and set our parameters accordingly. From Corollary 4.4 in [53], the errors in the NN weight estimates are UUB, provided that the conditions are met. For our case, we set the discount factor $\alpha = 0.95$, the number of hidden nodes for both the action and critic networks to 12, and the learning rates $l_a = l_c = 0.01$. With these parameters, all criteria described by Corollary 4.4 in [53] are met. Thus, the estimation errors of the NN weights are UUB for the ADP controller designed in this paper. Experimentally, we can observe from Fig. 6 that the weights and parameters of the ADP controller converge quickly and remain bounded after the system transients.

IV. SIMULATION RESULTS

The performance of the learning control approach using ADP was assessed by testing the integrated controller and the benchmark with the ADP embedded in the S-function block in Simulink. Two case studies were performed: the first case study was focused on the improvement of current harmonics by connecting the power inverter with the traditional PI controller in the inner current control loop, and the subsequent performance improvement by the ADP. The second case study was a performance comparison between the traditional PI and ADP based control approaches during a sudden step change

in load as well as under different loading conditions. Note that "without power inverter" in the following sections refers to the system without the SAF connected. "PI" is used for the traditional PI controller in the inner loop of the power inverter, and "PI + ADP" is used for the integrated PI and ADP controller in the inner loop of the power inverter.

A. Parameters Setup for ADP Controller

The following parameters were adjusted for training of the ADP. Inputs were normalized to the range of [-1, 1] after dividing the d-axis error by 3 and the q-axis error by 2, since the input current was assumed to vary in the range of 0 to 3 A for a 1 kW inverter. Outputs were amplified by a factor of 100 for the d-axis and 66.67 for the q-axis, since the output of the PI controller varies from 0 to 400. Here, a 25 % adjustment was assumed for the online learning based ADP controller. The coefficients of (13) were set as follows: c = 1, a1 = 0.4, a2 = 0.2, a3 = 0.04, a4 = 0.4, a5 = 0.2 and a6 = 0.04. For both the action and critic networks, weights were initialized in the range of [-0.1, 0.1].

The following additional parameters are set in the ADP:
Nc: internal cycle of the critic network;
Na: internal cycle of the action network;
Ta: internal training error threshold for the action network;
Tc: internal training error threshold for the critic network;
la: learning rate of the action network = 0.01 for training, and la(f) = 0.001;
lc: learning rate of the critic network = 0.01 for training, and lc(f) = 0.001.

For the training process, after the above parameters were set in MATLAB inside the S-function block, the ADP controller was connected as shown in Fig. 4 to the power inverter shown in Fig. 2. Then, these trained weights were used for improving the transient and steady state operation in both case studies. If major system parameters change, the normalization of the input and the amplification of the output might change, or the ADP might need training again.
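The interface scaling just described can be captured in a few lines; the helper below is illustrative only, with names invented for this sketch.

```python
# Scaling at the ADP interface (this subsection): tracking errors are
# normalized to roughly [-1, 1], and the ADP outputs are amplified toward
# the PI output range (0 to 400), about a 25 % adjustment authority.
D_ERR_DIV, Q_ERR_DIV = 3.0, 2.0    # error dividers for the 0-3 A swing
D_GAIN, Q_GAIN = 100.0, 66.67      # output amplification factors

def scale_adp_io(e_d, e_q, u_d, u_q):
    """Normalize the ADP inputs and amplify its actions."""
    return (e_d / D_ERR_DIV, e_q / Q_ERR_DIV), (D_GAIN * u_d, Q_GAIN * u_q)
```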
C. Case Study II: Sudden Step Change in Non-linear Load
B. Case Study I: Three-phase Full Wave Bridge Rectifier With Resistive Load

A three-phase diode bridge rectifier was used as the non-linear load (described in Section II), as shown in Fig. 2, for case study I. The 700 W load was simulated using a 105 Ω load resistance from (8). The three-phase AC source with a frequency of 60 Hz and a line-to-line root-mean-square (RMS) voltage of 208 V was connected to this load. Then, the THD was calculated by using the fast Fourier transform (FFT) analysis tool of the powergui block in Simulink for 12 cycles of the source current waveform. Here, equation (9) with a sampling time of 1 µs was used for the THD calculation. The current ($i_S$) supplied by the source was distorted and had a high THD of 25.21 % before connecting the power inverter. Fig. 7 shows the performance comparison between the PI controller and the PI + ADP controller with a tracking curve for the d-axis current, $I_d$, and q-axis current, $I_q$. These are denoted by $I_d$: PI and $I_d$: PI + ADP in Fig. 7(a), and $I_q$: PI and $I_q$: PI + ADP in Fig. 7(b), respectively. $I_{dref}$ and $I_{qref}$ represent the reference currents to be tracked. When the power inverter was connected at 0.11 s, the initial charging of the capacitor caused transient behavior for a few cycles, as shown in Fig. 7. In this case, with the traditional PI controller, the THD of the source current was 3.69 %. We can also see that the proposed online learning based controller has a lower overshoot and more efficient tracking than the PI controller, which reduced the THD in the source current to 2.70 %.

The reason why the ADP gave better results than the traditional PI controllers can be explained with the help of Figs. 7 and 8. The ADP controller generated adaptive control actions for the d-axis and q-axis currents, represented by $u_d$ and $u_q$ respectively in Figs. 8(a) and (b). The transient dips in the d-axis and q-axis currents were reduced by the compensating control actions generated by the ADP controller from 0.11 s to 0.111 s, as shown in Figs. 8(a) and (b). The tracking errors for $I_q$ are clearly visible at 0.1108 s and 0.1135 s in Fig. 7(b). The ADP controller produced appropriate compensating control actions, as shown in Fig. 8(b), to reduce these tracking errors. These control actions were added to the output of the PI controller so that the appropriate gate signals could be produced for the power inverter. This reduced the error between $I_q$ and $I_{qref}$. This error signal was also set as one of the components of the reinforcement signal (represented by $x_4$ in (13)), and this helped the ADP controller to generate proper control actions. Similar reasons apply for $I_d$ as well.

The THD of 3.69 % with the PI only was due to the inefficient tracking of the reference $I_{dref}$ and $I_{qref}$ currents, as shown in Fig. 7. With the integration of the ADP controller, the tracking became efficient. Thus, the source current became sinusoidal with fewer harmonic components, and from (9), the THD was improved with the ADP controller in comparison to traditional PI controllers. This is why the ADP gave improved total harmonics in terms of the THD during the step response and changing of non-linear loads.

C. Case Study II: Sudden Step Change in Non-linear Load and Different Loading Conditions

The same trained weights used for the ADP in case study I were used for case study II as well. The non-linear load from case study I was reduced from 700 W to 400 W at 0.35 s to observe the transient performance of the proposed controller. This was conducted by changing the value of the output resistance $R_L$ from 105 Ω to 192 Ω at 0.35 s. The ADP controller generated compensating control actions similar to those in Figs. 8(a) and (b) during the step change in load. Thus, the online learning based ADP controller showed consistent performance and improved the transients as well, which can be seen in Fig. 9. Figs. 9(a) and (b) show the zoomed-in plots comparing the transient performance of the active and reactive power supplied by the source with the power inverter connected at 0.11 s. This type of oscillation in the active and reactive power is normal in the case of power electronic circuits. However, the adaptive control approach improved the transients in the power drawn from the source.
current, Id , and q-axis current, Iq . These are denoted by circuits. However, the adaptive control approach improved

Fig. 7. Performance comparison between two control techniques: PI and PI + ADP with tracking curve for (a) direct axis current Id, and (b) quadrature axis current Iq.

Fig. 8. Control actions generated by ADP for (a) direct axis current Id, and (b) quadrature axis current Iq.

Fig. 9. Comparative study of PI and PI + ADP controller for (a) active power transients, and (b) reactive power transients because of the power inverter connected at 0.11 s. These figures are zoomed-in to show the superior performance of the ADP controller.

The proposed system with an online learning based ADP controller performed well during such conditions. Because of the efficient tracking of Id and Iq, the fluctuations and spikes in the reactive power for the power inverter connected system were reduced by the integration of our proposed online learning based ADP control system, which can be seen in Fig. 9(b) after 0.11 s. The proposed online learning based controller enabled the most efficient tracking of both the Id and Iq currents in comparison to the PI controller alone.

Fig. 10. Performance comparison of PI and ADP under different loading conditions.

Fig. 11. Comparison of dominant harmonics for PI and ADP under load = 100 W. The blue dashed line shows IEEE Std. 519 limits.

Thus, the fluctuations in the active and reactive power for the PI controller were reduced by the integration of the ADP. The system performed well even during a sudden load change from 700 W to 400 W at 0.35 s. The THD of the system for the 400 W load was reduced from 5.37 % (PI controller only) to 4.33 % by the online learning based ADP.

We went further and tried different load conditions from 700 W to 100 W, while using the same ADP approach. Fig. 10 shows the THD comparison of the source current for the above mentioned controllers under different loading conditions. Without the power inverter connected, the THD was poor (above 25 %) for all loading conditions. Although the THD increases with a decrease in load, the online learning based ADP control produced the best THD results for the source current even under low load conditions. Generally, THD improvement from a high value to a low value can be easily achieved by connecting the power inverter. However, further improvement of the THD at a lower level is a challenging task, and this is the major contribution of this paper. For loads from 700 W to 100 W, the PI and ADP controllers together achieved further THD improvement, which proves the effectiveness of the proposed controller under different operating modes, unlike passive filters. On average, the ADP controller reduced the THD by 18.41 %.

To observe a clear difference between the PI controller and the ADP, the worst case among all the load conditions described previously (i.e., a load of 100 W) was considered for a detailed individual harmonic analysis. The individual harmonics of the PI and ADP controllers are compared in Fig. 11. The 3rd harmonic is not presented here since it was close to 0 in all cases. It can be seen that the ADP outperformed the PI even in this worst-case condition and across the different harmonic spectrums as well. According to the IEEE recommendations for harmonic control in power systems defined in IEEE Std. 519 [6], the requirements on the individual odd harmonic components of the source current (less than 1.5 %) were satisfied by the PI controller up to the 19th harmonic. However, for the 23rd to 31st harmonics, the PI controller violated the standard, which requires these harmonics to be less than 0.6 %. That requirement was met by the proposed online learning based ADP control approach. It can also be observed from Fig. 11 that the PI + ADP could achieve a significant improvement in harmonics beyond the 23rd harmonic.

V. CONCLUSIONS

In this paper, an online learning based ADP controller for d-axis and q-axis current control was proposed so that the Id and Iq currents effectively track their respective reference currents. The increased speed of the PI controller with the online learning based control improved the harmonic compensating capability of the power inverter.

This subsequently reduced the THD of the source current by an average of 18.41 % compared to a traditional PI controller alone. With the advantage of the NN weight updating rules and a properly defined reinforcement signal, the ADP controller adapted its key parameters when the system was under different conditions. The ADP controller produced the fastest response time, low overshoot and, in general, the best performance in comparison to the traditional PI controller. This controller improved the power quality problem of harmonics in the case of dynamic non-linear load conditions. The improvement in the dynamics of the inner current control loop achieved in this work can have a number of applications for other power electronic systems such as grid-connected inverters, STATCOMs, virtual synchronous machines, etc. For future work, we plan to develop a hardware experiment system for the verification of these simulation results.

REFERENCES

[1] ECPE European Center for Power Electronics, EPE European Power Electronics and Drives Association, "Position paper on energy efficiency - the role of power electronics," in European Workshop on Energy Efficiency - the Role of Power Electronics, 2007.
[2] D. O. Abdeslam, P. Wira, J. Merckle, D. Flieller, and Y. A. Chapuis, "A unified artificial neural network architecture for active power filters," IEEE Trans. Industr. Electron., vol. 54, no. 1, pp. 61–76, Feb. 2007.
[3] F. S. dos Reis, J. Ale, F. D. Adegas, R. Tonkoski, S. Slan, and K. Tan, "Active shunt filter for harmonic mitigation in wind turbines generators," in Proc. 37th IEEE Power Electronics Specialists Conf., Jeju, Korea, 2006, pp. 1–6.
[4] L. Asiminoael, F. Blaabjerg, and S. Hansen, "Detection is key - harmonic detection methods for active power filter applications," IEEE Ind. Appl. Mag., vol. 13, no. 4, pp. 22–33, Jul.–Aug. 2007.
[5] L. Marconi, F. Ronchi, and A. Tilli, "Robust nonlinear control of shunt active filters for harmonic current compensation," Automatica, vol. 43, no. 2, pp. 252–263, Feb. 2007.
[6] IEEE Recommended Practice and Requirements for Harmonic Control in Electric Power Systems, IEEE Standard 519-2014, 2014, pp. 1–29.
[7] J. Vazquez and P. Salmeron, "Active power filter control using neural network technologies," IEE Proc. Electr. Power Appl., vol. 150, no. 2, pp. 139–145, Mar. 2003.
[8] H. Akagi, A. Nabae, and S. Atoh, "Control strategy of active power filters using multiple voltage-source PWM converters," IEEE Trans. Ind. Appl., vol. IA-22, no. 3, pp. 460–465, May 1986.
[9] V. Soares, P. Verdelho, and G. D. Marques, "An instantaneous active and reactive current component method for active filters," IEEE Trans. Power Electron., vol. 15, no. 4, pp. 660–669, Jul. 2000.
[10] S. Buso, L. Malesani, and P. Mattavelli, "Comparison of current control techniques for active filter applications," IEEE Trans. Industr. Electron., vol. 45, no. 5, pp. 722–729, Oct. 1998.
[11] L. L. Lai, Intelligent System Applications in Power Engineering: Evolutionary Programming and Neural Networks. Chichester, England; New York, USA: John Wiley & Sons, Inc., 1998.
[12] M. A. M. Radzi and N. A. Rahim, "Neural network and bandless hysteresis approach to control switched capacitor active power filter for reduction of harmonics," IEEE Trans. Industr. Electron., vol. 56, no. 5, pp. 1477–1484, May 2009.
[13] X. G. Fu, S. H. Li, M. Fairbank, D. C. Wunsch, and E. Alonso, "Training recurrent neural networks with the Levenberg-Marquardt algorithm for optimal control of a grid-connected converter," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 9, pp. 1900–1912, Sep. 2015.
[14] R. P. Aguilera, P. Acuna, P. Lezana, G. Konstantinou, B. Wu, S. Bernet, and V. G. Agelidis, "Selective harmonic elimination model predictive control for multilevel power converters," IEEE Trans. Power Electron., vol. 32, no. 3, pp. 2416–2426, Mar. 2017.
[15] H. A. Young, M. A. Perez, and J. Rodriguez, "Analysis of finite-control-set model predictive current control with model parameter mismatch in a three-phase inverter," IEEE Trans. Industr. Electron., vol. 63, no. 5, pp. 3100–3107, May 2016.
[16] F. L. Lewis and D. R. Liu, Reinforcement Learning and Approximate Dynamic Programming for Feedback Control. Piscataway, NJ, USA: Wiley-IEEE Press, 2012.
[17] J. Si, A. G. Barto, W. B. Powell, and D. C. Wunsch, Handbook of Learning and Approximate Dynamic Programming. Hoboken, NJ, USA: IEEE Press and John Wiley & Sons, 2004.
[18] J. Si and Y. T. Wang, "Online learning control by association and reinforcement," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264–276, Mar. 2001.
[19] H. B. He, Z. Ni, and J. Fu, "A three-network architecture for on-line learning and optimization based on adaptive dynamic programming," Neurocomputing, vol. 78, no. 1, pp. 3–13, Feb. 2012.
[20] W. T. Guo, F. Liu, J. Si, D. W. He, R. Harley, and S. W. Mei, "Approximate dynamic programming based supplementary reactive power control for DFIG wind farm to enhance power system stability," Neurocomputing, vol. 170, pp. 417–427, Dec. 2015.
[21] H. G. Zhang, X. Zhang, Y. H. Luo, and J. Yang, "An overview of research on adaptive dynamic programming," Acta Automat. Sinica, vol. 39, no. 4, pp. 303–311, Apr. 2013.
[22] Q. L. Wei, D. R. Liu, Q. Lin, and R. Z. Song, "Adaptive dynamic programming for discrete-time zero-sum games," IEEE Trans. Neural Netw. Learn. Syst., to be published. doi: 10.1109/TNNLS.2016.2638863.
[23] Z. Ni, H. B. He, J. Y. Wen, and X. Xu, "Goal representation heuristic dynamic programming on maze navigation," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 12, pp. 2038–2050, Dec. 2013.
[24] C. Lu, J. Si, and X. R. Xie, "Direct heuristic dynamic programming for damping oscillations in a large power system," IEEE Trans. Syst. Man Cybern. B Cybern., vol. 38, no. 4, pp. 1008–1013, Aug. 2008.
[25] Y. F. Tang, H. B. He, Z. Ni, J. Y. Wen, and X. C. Sui, "Reactive power control of grid-connected wind farm based on adaptive dynamic programming," Neurocomputing, vol. 125, pp. 125–133, Feb. 2014.
[26] Y. F. Tang, H. B. He, Z. Ni, J. Y. Wen, and T. W. Huang, "Adaptive modulation for DFIG and STATCOM with high-voltage direct current transmission," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 8, pp. 1762–1772, Aug. 2016.
[27] W. T. Guo, F. Liu, J. Si, D. W. He, R. Harley, and S. W. Mei, "Online supplementary ADP learning controller design and application to power system frequency control with large-scale wind energy integration," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 8, pp. 1748–1761, Aug. 2016.
[28] J. Na and G. Herrmann, "Online adaptive approximate optimal tracking control with simplified dual approximation structure for continuous-time unknown nonlinear systems," IEEE/CAA J. Automat. Sinica, vol. 1, no. 4, pp. 412–422, Oct. 2014.
[29] W. Guo, F. Liu, J. Si, and S. W. Mei, "Incorporating approximate dynamic programming-based parameter tuning into PD-type virtual inertia control of DFIGs," in Proc. 2013 Int. Joint Conf. Neural Networks (IJCNN), Dallas, TX, USA, 2013, pp. 1–8.
[30] Z. Ni, Y. F. Tang, X. C. Sui, H. B. He, and J. Y. Wen, "An adaptive neuro-control approach for multi-machine power systems," Int. J. Electr. Power Energy Syst., vol. 75, pp. 108–116, Feb. 2016.
[31] Z. Ni, Y. F. Tang, H. B. He, and J. Y. Wen, "Multi-machine power system control based on dual heuristic dynamic programming," in Proc. 2014 IEEE Symp. Computational Intelligence Applications in Smart Grid (CIASG), Orlando, FL, USA, 2014, pp. 1–7.
[32] U. Tamrakar, N. Malla, D. Shrestha, Z. Ni, and R. Tonkoski, "Design of online supplementary adaptive dynamic programming for current control in power electronic systems," in Proc. IEEE Energy Conversion Congress and Exposition (ECCE'17), Cincinnati, OH, USA, Oct. 1–5, 2017, pp. 1–5.
[33] Q. L. Wei, D. R. Liu, G. Shi, and Y. Liu, "Multibattery optimal coordination control for home energy management systems via distributed iterative adaptive dynamic programming," IEEE Trans. Industr. Electron., vol. 62, no. 7, pp. 4203–4214, Jul. 2015.
[34] S. Poudel, Z. Ni, and N. Malla, "Real-time cyber physical system testbed for power system security and control," Int. J. Electr. Power Energy Syst., vol. 90, pp. 124–133, Sep. 2017.
[35] Z. Ni, H. B. He, D. B. Zhao, X. Xu, and D. V. Prokhorov, "GrDHP: A general utility function representation for dual heuristic dynamic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 3, pp. 614–627, Mar. 2015.
[36] Y. F. Tang, H. B. He, J. Y. Wen, and J. Liu, "Power system stability control for a wind farm based on adaptive dynamic programming," IEEE Trans. Smart Grid, vol. 6, no. 1, pp. 166–177, Jan. 2015.
[37] L. Dong, Y. F. Tang, H. B. He, and C. Y. Sun, "An event-triggered approach for load frequency control with supplementary ADP," IEEE Trans. Power Syst., vol. 32, no. 1, pp. 581–589, Jan. 2017.
[38] Y. F. Tang, J. Yang, J. Yan, and H. B. He, "Intelligent load frequency controller using GrADP for island smart grid with electric vehicles and renewable resources," Neurocomputing, vol. 170, pp. 406–416, Dec. 2015.

[39] S. H. Li, D. C. Wunsch, M. Fairbank, and E. Alonso, "Vector control of a grid-connected rectifier/inverter using an artificial neural network," in Proc. 2012 Int. Joint Conf. Neural Networks (IJCNN), Brisbane, QLD, Australia, 2012, pp. 1–7.
[40] S. H. Li, M. Fairbank, C. Johnson, D. C. Wunsch, E. Alonso, and J. L. Proaño, "Artificial neural networks for control of a grid-connected rectifier/inverter under disturbance, dynamic and power converter switching conditions," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 4, pp. 738–750, Apr. 2014.
[41] M. Fairbank, S. H. Li, X. G. Fu, E. Alonso, and D. Wunsch, "An adaptive recurrent neural-network controller using a stabilization matrix and predictive inputs to solve a tracking problem under disturbances," Neural Netw., vol. 49, pp. 74–86, Jan. 2014.
[42] S. H. Li, M. Fairbank, X. G. Fu, D. C. Wunsch, and E. Alonso, "Nested-loop neural network vector control of permanent magnet synchronous motors," in Proc. 2013 Int. Joint Conf. Neural Networks (IJCNN), Dallas, TX, USA, 2013, pp. 1–8.
[43] N. Malla, D. Shrestha, Z. Ni, and R. Tonkoski, "Supplementary control for virtual synchronous machine based on adaptive dynamic programming," in Proc. 2016 IEEE Congr. Evolutionary Computation (CEC), Vancouver, BC, Canada, 2016, pp. 1998–2005.
[44] U. Tamrakar, R. Tonkoski, Z. Ni, T. M. Hansen, and I. Tamrakar, "Current control techniques for applications in virtual synchronous machines," in Proc. 2016 IEEE 6th Int. Conf. Power Systems (ICPS), New Delhi, India, 2016, pp. 1–6.
[45] R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton University Press, 1957.
[46] Q. L. Wei, D. R. Liu, and H. Q. Lin, "Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 46, no. 3, pp. 840–853, Mar. 2016.
[47] Q. L. Wei, D. R. Liu, and X. Yang, "Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 26, no. 4, pp. 866–879, Apr. 2015.
[48] Q. L. Wei, F. L. Lewis, D. R. Liu, R. Z. Song, and H. Q. Lin, "Discrete-time local value iteration adaptive dynamic programming: Convergence analysis," IEEE Trans. Syst. Man Cybern. Syst., to be published. doi: 10.1109/TSMC.2016.2623766.
[49] Q. L. Wei, D. R. Liu, and Q. Lin, "Discrete-time local value iteration adaptive dynamic programming: Admissibility and termination analysis," IEEE Trans. Neural Netw. Learn. Syst., to be published. doi: 10.1109/TNNLS.2016.2593743.
[50] Z. Ni, H. B. He, and J. Y. Wen, "Adaptive learning in tracking control based on the dual critic network design," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 6, pp. 913–928, Jun. 2013.
[51] X. Luo and J. Si, "Stability of direct heuristic dynamic programming for nonlinear tracking control using PID neural network," in Proc. 2013 Int. Joint Conf. Neural Networks (IJCNN), Dallas, TX, USA, 2013, pp. 1–7.
[52] X. N. Zhong, Z. Ni, and H. B. He, "A theoretical foundation of goal representation heuristic dynamic programming," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 12, pp. 2513–2525, Dec. 2016.
[53] F. Liu, J. Sun, J. Si, W. T. Guo, and S. W. Mei, "A boundedness result for the direct heuristic dynamic programming," Neural Netw., vol. 32, pp. 229–235, Aug. 2012.
[54] L. Yang, J. Si, K. S. Tsakalis, and A. A. Rodriguez, "Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error," IEEE Trans. Syst. Man Cybern. B Cybern., vol. 39, no. 6, pp. 1617–1622, Dec. 2009.
[55] H. G. Zhang, J. L. Zhang, G. H. Yang, and Y. H. Luo, "Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming," IEEE Trans. Fuzzy Syst., vol. 23, no. 1, pp. 152–163, Feb. 2015.
[56] Q. L. Wei, R. Z. Song, and P. F. Yan, "Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP," IEEE Trans. Neural Netw. Learn. Syst., vol. 27, no. 2, pp. 444–458, Feb. 2016.

Naresh Malla (S'16) received the B.E. degree from the Department of Electrical Engineering, Pulchowk Campus, Kathmandu, Nepal, in 2010. He is now pursuing the M.S. degree in the Electrical Engineering and Computer Science Department, South Dakota State University, Brookings, SD, USA. His current research interests include machine learning, computational intelligence, adaptive control for power electronic devices, smart grid, resilience of electric power delivery systems, and electric power systems stability and control.

Ujjwol Tamrakar (S'14) received the B.E. degree in electrical engineering, in 2011, from Tribhuvan University, Nepal, and the M.S. degree in electrical engineering, in 2015, from South Dakota State University (SDSU), Brookings, SD, USA. Presently, he is pursuing the Ph.D. degree in electrical engineering at SDSU. His research interests include power electronics and control, grid integration of renewable energy systems, power system stability and power quality.

Dipesh Shrestha (S'16) received the bachelor degree in electrical engineering from the Institute of Engineering, Tribhuvan University, Kirtipur, Nepal, and the M.S. degree in electrical engineering from South Dakota State University, Brookings, SD, USA, in 2014 and 2016, respectively. Currently, he is working as a volunteer research assistant in the Department of Electrical Engineering, South Dakota State University. His research interests include grid integration of renewable energy systems, distributed generation, power electronics, power quality, power systems reliability, computational intelligence, electric vehicles, and the internet of energy.

Zhen Ni (M'15) received the B.S. degree from the Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, China, in 2010, and the Ph.D. degree from the Department of Electrical, Computer and Biomedical Engineering, the University of Rhode Island (URI), Kingston, RI, USA, in 2015. He is currently an Assistant Professor with the Department of Electrical Engineering and Computer Science, South Dakota State University, Brookings, SD, USA. His current research interests include computational intelligence, smart grid, machine learning, and cyber-physical systems. He received the URI Excellence in Doctoral Research Award (2016), the Chinese Government Award for Outstanding Students Abroad (2014), and the Second Prize of the Graduate Student Poster Contest at the IEEE Power and Energy Society General Meeting (2015). He has been actively involved in numerous conference and workshop organization committees in the society, including as the General Co-Chair of the IEEE CIS Winter School, Washington, DC, USA, in 2016.

Reinaldo Tonkoski (S'04, M'11) received the B.A.Sc. degree in control and automation engineering, in 2004, and the M.Sc. degree in electrical engineering, in 2006, from PUC-RS (Pontifícia Universidade Católica do RS), Brazil, and the Ph.D. degree, in 2011, from Concordia University, Canada. He was with CanmetENERGY, Natural Resources Canada, from January 2009 to January 2010, where he worked on projects related to the grid integration of renewable energy sources. Presently, he is an Assistant Professor in the Electrical Engineering and Computer Science Department, South Dakota State University, Brookings, SD, USA. Dr. Tonkoski has authored over seventy technical publications in peer reviewed journals and conferences. His research interests include grid integration of renewable energy systems, distributed generation, power quality and power electronics.
