Data Centers
Rahul Urgaonkar, Bhuvan Urgaonkar, Michael J. Neely, Anand Sivasubramanian
General Terms
Algorithms, Performance, Theory
Keywords
Power Management, Data Centers, Stochastic Optimization,
Optimal Control
1.
INTRODUCTION
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SIGMETRICS11, June 711, 2011, San Jose, California, USA.
Copyright 2011 ACM 978-1-4503-0262-3/11/06 ...$10.00.
150
Price ($/MWHour)
Since the electricity bill of a data center constitutes a significant portion of its overall operational costs, reducing this
has become important. We investigate cost reduction opportunities that arise by the use of uninterrupted power
supply (UPS) units as energy storage devices. This represents a deviation from the usual use of these devices as
mere transitional fail-over mechanisms between utility and
captive sources such as diesel generators. We consider the
problem of opportunistically using these devices to reduce
the time average electric utility bill in a data center. Using the technique of Lyapunov optimization, we develop an
online control algorithm that can optimally exploit these
devices to minimize the time average cost. This algorithm
operates without any knowledge of the statistics of the workload or electricity cost processes, making it attractive in the
presence of workload and pricing uncertainties. An interesting feature of our algorithm is that its deviation from
optimality reduces as the storage capacity is increased. Our
work opens up a new area in data center power management.
100
50
0
0
20
40
60
80
100
Hour
120
140
160
2.
RELATED WORK
W(t)
P(t) - R(t)
+
Grid
P(t)
R(t)
-
Data
Center
D(t)
Battery
3. BASIC MODEL
We consider a time-slotted model. In the basic model, we
assume that in every slot, the total power demand generated
by the data center in that slot must be met in the current
slot itself (using a combination of power drawn from the utility and the battery). Thus, any buering of the workload
generated by the data center is not allowed. We will relax
this constraint later in Sec. 6 when we allow buering of
some of the workload while providing worst case delay guarantees. In the following, we use the terms UPS and battery
interchangeably.
(1)
(2)
Let Y (t) denote the battery energy level in slot t. Then, the
dynamics of Y (t) can be expressed as:
Y (t + 1) = Y (t) D(t) + R(t)
(3)
(4)
(5)
(6)
(7)
C(t) = C(S(t),
P (t))
(8)
(9)
S S (10)
4.
CONTROL OBJECTIVE
Let P (t), R(t) and D(t) denote the control decisions made
in slot t by any feasible policy under the basic model as
discussed in Sec. 3. These must satisfy the constraints (1),
(2), (6), (7), and (9) every slot. We dene the following
indicator variables that are functions of the control decisions
regarding a recharge or discharge operation in slot t:
j
j
1 if R(t) > 0
1 if D(t) > 0
1D (t) =
1R (t) =
0 else
0 else
Note that by (2), at most one of 1R (t) and 1C (t) can take
the value 1. Then the total cost incurred in slot t is given
by P (t)C(t) + 1R (t)Crc + 1D (t)Cdc . The time-average cost
under this policy is given by:
lim
t1
1X
E {P ( )C( ) + 1R ( )Crc + 1D ( )Cdc }
t =0
(11)
Minimize:
lim
R = lim
t1
1X
E {P ( )C( ) + 1R ( )Crc + 1D ( )Cdc}
t t
=0
lim
(13)
t1
X
E {R( ) D( )}
=0
stat
stat
E P
(t)C(t) + 1R (t)Crc + 1D (t)Cdc =
(15)
where the expectations above are with respect to the stationary distribution of (W (t), S(t)) and the randomized control
decisions.
Proof. This result follows from the framework in [8, 15]
and is omitted for brevity.
It should be noted that while it is possible to characterize
and potentially compute such a policy, it may not be feasible for the original problem P1 as it could violate the constraints (6) and (7). However, the existence of such a policy
can be used to construct an approximately optimal policy
that meets all the constraints of P1 using the technique of
Lyapunov optimization [8] [15]. This policy is dynamic and
does not require knowledge of the statistical description of
the workload and cost processes. We present this policy and
derive its performance guarantees in the next section. This
dynamic policy is approximately optimal where the approximation factor improves as the battery capacity increases.
Also note that the distance from optimality for our policy is
However, since opt , in pracmeasured in terms of .
tice, the approximation factor is better than the analytical
bounds.
5.
We now present an online control algorithm that approximately solves P1. This algorithm uses a control parameter
V > 0 that aects the distance from optimality as shown
later. This algorithm also makes use of a queueing state
variable X(t) to track the battery charge level and is dened
as follows:
X(t) = Y (t) V min Dmax Ymin
(16)
(17)
Note that X(t) can be negative. We will show that this definition enables our algorithm to ensure that the constraint
(4) is met.
We are now ready to state the dynamic control algorithm.
Let (W (t), S(t)) and X(t) denote the system state in slot t.
Then the dynamic algorithm chooses control action P (t) as
Whigh
Wmid
Wlow
t
i
h
Minimize: X(t)P (t) + V P (t)C(t) + 1R (t)Crc + 1D (t)Cdc
Subject to: Constraints (1), (2), (5), (9)
The constraints above result in the following constraint on
P (t):
Plow P (t) Phigh
(18)
where
Plow = max[0, W (t)Dmax] and Phigh = min[Ppeak , W (t)+
Rmax ]. Let P (t), R (t), and D (t) denote the optimal solution to P3. Then, the dynamic algorithm chooses the
recharge and discharge values as follows.
j
P (t) W (t) if P (t) > W (t)
R (t) =
0
else
D (t) =
30
1.25
92.5
40
2.5
91.1
50
3.75
88.5
75
6.875
88.0
100
10.0
87.0
Ymax 20
(this choice will become clear in Sec. 5.2 when we
8
relate V to the battery capacity). Note that the number of
slots for which a fully charged battery can sustain the data
center at maximum load is Ymax /Whigh .
In Table 1, we show the time average cost achieved for different values of Ymax . It can be seen that as Ymax increases,
the time average cost approaches the optimal value (this
behavior will be formalized in Theorem 1). This is remarkable given that the dynamic algorithm operates without any
knowledge of the future workload and cost processes. To examine the behavior of the dynamic algorithm in more detail,
we x Ymax = 100 and look at the sample paths of the control decisions taken by the optimal oine algorithm and the
dynamic algorithm during the rst 200 slots. This is shown
in Figs. 4 and 5. It can be seen that initially, the dynamic
tends to perform suboptimally. But eventually it learns to
make close to optimal decisions.
It might be tempting to conclude from this example that
an algorithm based on a price threshold is optimal. Specifically, such an algorithm makes a recharge vs. discharge
decision depending on whether the current price C(t) is
smaller or larger than a threshold. However, it is easy
to construct examples where the dynamic algorithm outperforms such a threshold based algorithm. Specically,
suppose that the W (t) process takes values from the interval [10, 90] uniformly at random every slot. Also, suppose
20
P(t)
20
0
94.0
15
10
0
20
40
60
80
100
time
120
140
160
180
200
P(t)
Ymax
V
Avg. Cost
15
10
0
20
40
60
80
100
time
120
140
160
180
200
5.1 Solving P3
In general, the complexity of solving P3 depends on the
structure of the unit cost function C(t). For many cases
of practical interest, P3 is easy to solve and admits closed
form solutions that can be implemented in real time. We
consider two such cases here. Let (t) denote the value of
the objective in P3 when there is no recharge or discharge.
Thus (t) = W (t)(X(t) + V C(t)).
(19)
(20)
opt + B/V
(21)
max[R2
,D 2
max
max
where B is a constant given by B =
2
and opt is the optimal solution to P1 under any feasible control algorithm (possibly with knowledge of future
events).
Theorem 1 part 4 shows that by choosing larger V , the timeaverage cost under the dynamic algorithm can be pushed
closer to the minimum possible value opt . However, Vmax
limits how large V can be chosen. We prove Theorem 1 in
the next section.
(22)
(23)
(24)
max[R2
,D 2
max
max
where B =
. Following the Lyapunov opti2
mization framework of [8], we add to both sides of (24) the
penalty term V E {P (t)C(t) + 1R (t)Crc + 1D (t)Cdc |X(t)} to
get the following:
stat
B + V E P stat (t)C stat (t) + 1stat
R (t)Crc + 1D (t)Cdc |X(t)
= B + V B + V opt
Taking the expectation of both sides and using the law of
iterated expectations and summing over t {0, 1, 2, . . . , T
1}, we get:
T
1
X
t=0
T 1
1 X
E {P (t)C(t) + 1R (t)Crc + 1D (t)Cdc } opt + B/V
T t=0
(27)
(29)
6.
Minimize:
t1
1X
E {P ( )C( ) + 1R ( )Crc + 1D ( )Cdc}
t t
=0
lim
Subject to: Constraints (2), (5), (6), (7), (9), (27), (29)
Finite average delay for W1 (t)
Similar to the basic model, we consider the following relaxed
problem:
P5 :
Minimize:
t1
1X
E {P ( )C( ) + 1R ( )Crc + 1D ( )Cdc}
t t
=0
lim
(30)
U <
(31)
W2(t)
W1(t)
U(t)
P(t) - R(t)
+
Grid
P(t)
R(t)
-
D(t)
Battery
(t)
Lemma 4. (Worst Case Delay) Suppose a control algorithm ensures that U (t) Umax and Z(t) Zmax for all
t, where Umax and Zmax are some positive constants. Then
the worst case delay for any delay tolerant workload is at
most max slots where:
1-(t)
Data Center
max =
(Umax + Zmax )/
t1
1X
E {U ( )}
t =0
(32)
Let ext and ext denote the optimal value for problems P4
and P5 respectively. Since P5 is less constrained than P4,
we have that ext ext . Similar to Lemma 1, the following
holds:
Lemma 3. (Optimal Stationary, Randomized Policy): If
the workload process W1 (t), W2 (t) and auxiliary process S(t)
are i.i.d. over slots, then there exists a stationary, random
E R(t)
= E D(t)
(33)
n
o
+ D(t))
(34)
E
(t)(P (t) R(t)
E {W1 (t)}
o
n
+
1D (t)Cdc = ext
(35)
E P (t)C(t)
1R (t)Crc +
where the expectations above are with respect to the stationary distribution of (W1 (t), W2 (t), S(t)) and the randomized
control decisions.
Proof. This result follows from the framework in [8, 15]
and is omitted for brevity.
The condition (34) only guarantees queueing stability, not
bounded worst case delay. We will now design a dynamic
control algorithm that will yield bounded worst case delay
while guaranteeing an average cost that is within O(1/V ) of
ext (and therefore ext ).
(37)
t+
max
X
[( )(P ( ) R( ) + D( ))]
=t+1
(38)
=t+1
(39)
(40)
i
h
Max:[U (t) + Z(t)]P (t) V P (t)C(t) + 1R (t)Crc + 1D (t)Cdc
6.3 Solving P6
Similar to P3, the complexity of solving P6 depends on
the structure of the unit cost function C(t). For many cases
of practical interest, P6 is easy to solve and admits closed
form solutions that can be implemented in real time. We
consider one such case here.
max
We dene an upper bound Vext
on the maximum value
that V can take in our algorithm for the extended model.
max
Vext
=
Zmax =V min +
(42)
(43)
(44)
40
80
60
40
0
10
15
20
Hour
Price ($/MWHour)
41
100
39
38
37
36
35
34
33
32
0.8
50
100
150
200
Ymax (MWminute)
250
300
0.6
0.4
0
10
15
20
Hour
t1
1X
E {P ( )C( ) + 1R ( )Crc + 1D ( )Cdc }
t =0
ext + Bext /V
(47)
7.
SIMULATION-BASED EVALUATION
We evaluate the performance of our control algorithm using both synthetic and real pricing data. To gain insights
into the behavior of the algorithm and to compare with the
optimal oine solution, we rst consider the basic model
Ymax
Battery, No WP
WP, No Battery
WP, Battery
15
95%
96%
92%
30
92%
92%
85%
50
89%
88%
79%
8.
Acknowledgments
This work was supported, in part, by the NSF grants CCF0811670, CNS-0720456, CNS-0615097, CAREER awards CCF0747525 and CNS-0953541, and a research award from HP.
This work was performed while Rahul Urgaonkar was a student at the University of Southern California.
9. REFERENCES
[1] California ISO Open Access Same-time Information System
(OASIS) Hourly Average Energy Prices.
http://oasisis.caiso.com.
[2] Lead-acid batteries: Lifetime vs. Depth of discharge.
http://www.windsun.com/Batteries/Battery_FAQ.htm.
[3] A. Bar-Noy, Y. Feng, M. P. Johnson, and O. Liu. When to
reap and when to sow: Lowering peak usage with realistic
batteries. In Proc. 7th International Conference on
Experimental Algorithms, 2008.
[4] A. Bar-Noy, M. P. Johnson, and O. Liu. Peak shaving through
resource buering. In Proc. WAOA, 2008.
[5] D. P. Bertsekas. Dynamic Programming and Optimal Control,
vols. 1 and 2. Athena Scientic, 2007.
[6] J. S. Chase, D. C. Anderson, P. N. Thakar, A. M. Vahdat, and
R. P. Doyle. Managing energy and server resources in hosting
centers. SIGOPS Oper. Syst. Rev., 35:103116, Oct. 2001.
[7] M. Gatzianas, L. Georgiadis, and L. Tassiulas. Control of
wireless networks with rechargeable batteries. IEEE Trans.
Wireless. Comm., 9:581593, Feb. 2010.
[8] L. Georgiadis, M. J. Neely, and L. Tassiulas. Resource
allocation and cross-layer control in wireless networks. Found.
and Trends in Networking, 1:1144, 2006.
[9] S. Gurumurthi, A. Sivasubramaniam, M. Kandemir, and
H. Franke. Drpm: Dynamic speed control for power
management in server class disks. In Proc. ISCA 03, 2003.
[10] U. Hoelzle and L. A. Barroso. The Datacenter as a Computer:
An Introduction to the Design of Warehouse-Scale Machines.
Morgan & Claypool, 2009.
[11] K. Le, R. Bianchini, M. Martonosi, and T. Nguyen. Cost- and
energy-aware load distribution across data centers. In Proc.
HOTPOWER, 2009.
[12] A. R. Lebeck, X. Fan, H. Zeng, and C. Ellis. Power aware page
allocation. SIGOPS Oper. Syst. Rev., 34:105116, Nov. 2000.
[13] D. Linden and T. B. Reddy. Handbook of Batteries. McGraw
Hill Handbooks, 2002.
[14] M. Marwah, P. Maciel, A. Shah, R. Sharma, T. Christian,
V. Almeida, C. Ara
ujo, E. Souza, G. Callou, B. Silva,
S. Galdino, and J. Pires. Quantifying the sustainability impact
of data center availability. SIGMETRICS Perform. Eval.
Rev., 37:6468, March 2010.
[15] M. J. Neely. Stochastic Network Optimization with
Application to Communication and Queueing Systems.
Morgan & Claypool, 2010.
[16] M. J. Neely, A. S. Tehrani, and A. G. Dimakis. Ecient
algorithms for renewable energy allocation to delay tolerant
consumers. In Proc. IEEE SmartGridComm, 2010.
[17] S. Park, W. Jiang, Y. Zhou, and S. Adve. Managing
energy-performance tradeos for multithreaded applications on
multiprocessor architectures. In Proc. ACM SIGMETRICS,
2007.
[18] A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, and
B. Maggs. Cutting the electric bill for internet-scale systems.
In Proc. SIGCOMM, 2009.
[19] R. Urgaonkar, B. Urgaonkar, M. J. Neely, and
A. Sivasubramaniam. Optimal power cost management using
stored energy in data centers. arXiv Technical Report:
arXiv:1103.3099v2, March 2011.
[20] Q. Zhu, F. David, C. Devaraj, Z. Li, Y. Zhou, and P. Cao.
Reducing energy consumption of disk storage using
power-aware cache management. In Proc. HPCA, 2004.