
Distributed Subgradient Methods for

Multi-agent Optimization

Asu Ozdaglar

February 2009

Department of Electrical Engineering & Computer Science

Massachusetts Institute of Technology, USA



Motivation
Increasing interest in distributed control and coordination of networks
consisting of multiple autonomous agents

Motivated by many emerging networking applications, such as ad hoc wireless


communication networks and sensor networks, characterized by:
Lack of centralized control and access to information
Time-varying connectivity

Control and optimization algorithms for such networks should be:


Completely distributed, relying on local information
Robust against changes in the network topology



Multi-Agent Optimization Problem

Goal: Develop a general computational model for cooperatively optimizing a global system objective through local interactions and computations in a multi-agent system.
The global objective is a combination of individual agent performance measures.

Examples:

Consensus problems: Alignment of estimates maintained by different agents


Control of moving vehicles (UAVs), computing averages of initial values

Parameter estimation in distributed sensor networks:


Regression-based estimates using local sensor measurements

Congestion control in data networks with heterogeneous users



Related Literature
Parallel and Distributed Optimization Algorithms:
General computational model for distributed asynchronous optimization
Tsitsiklis 84, Bertsekas and Tsitsiklis 95

Consensus and Cooperative Control:


Analysis of group behavior (flocking) in dynamical-biological systems
Vicsek 95, Reynolds 87, Toner and Tu 98
Mathematical models of consensus and averaging
Jadbabaie et al. 03, Olfati-Saber Murray 04, Boyd et al. 05

Game Theory/Mechanism Design for Distributed Cooperative Control:


Assign each agent a local utility function such that:
The equilibrium of the resulting game is the same as (or close to) the
global optimum
Derive learning algorithms that reach equilibrium
Marden, Arslan, Shamma 07 used this approach for the consensus problem
to deal with constraints
This Talk
Development of a distributed subgradient method for multi-agent optimization
[Nedic, Ozdaglar 08]
Convergence analysis and performance bounds for time-varying topologies
under general connectivity assumptions

Effects of local constraints [Nedic, Ozdaglar, Parrilo 08]

Effects of networked-system constraints: quantization, delay, asynchronism



Model
We consider a network of m agents with node set V = {1, . . . , m}.
The agents want to cooperatively solve

    \min_{x \in R^n} \sum_{i=1}^m f_i(x)

[Figure: network of agents, with each node i holding a local objective f_i]

The function f_i : R^n -> R is a convex objective function known only by node i.
Agents update and send their information at discrete times t_0, t_1, t_2, . . .
We use x^i(k) \in R^n to denote the estimate of agent i at time t_k.

Agent Update Rule:
Agent i locally minimizes f_i according to

    x^i(k+1) = \sum_{j=1}^m a^i_j(k)\, x^j(k) - \alpha^i(k)\, d^i(k),

where the a^i_j(k) are weights, \alpha^i(k) is a stepsize, and d^i(k) is a subgradient of f_i at x^i(k).
The vector a^i(k) = (a^i_1(k), . . . , a^i_m(k)) represents agent i's time-varying neighbors at slot k.
The model includes consensus as a special case (f_i(x) = 0 for all i).
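To make the update concrete, here is a minimal Python sketch for an assumed toy instance: quadratic objectives f_i(x) = ||x - c_i||^2 / 2 (so the subgradient at x is x - c_i), fixed equal doubly stochastic weights, and a constant stepsize. None of these specific choices come from the talk; they are illustrative only.

```python
import numpy as np

# Toy instance (assumed): m agents, n-dimensional estimates, objectives
# f_i(x) = 0.5 * ||x - c_i||^2, whose gradient at x is x - c_i.
# The global optimum of sum_i f_i is the mean of the c_i.
m, n = 4, 2
rng = np.random.default_rng(0)
c = rng.normal(size=(m, n))        # private data of each agent
x = rng.normal(size=(m, n))        # initial estimates x^i(0)
A = np.full((m, m), 1.0 / m)       # fixed doubly stochastic weights (assumed)
alpha = 0.05                       # constant stepsize

for k in range(1000):
    d = x - c                      # d^i(k): subgradient of f_i at x^i(k)
    x = A @ x - alpha * d          # x^i(k+1) = sum_j a^i_j(k) x^j(k) - alpha d^i(k)

print(x)                # rows (nearly) agree with each other ...
print(c.mean(axis=0))   # ... and sit near the global minimizer
```

With a constant stepsize the estimates settle in a neighborhood of the optimum, consistent with the error bounds discussed later in the talk.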
Linear Dynamics and Transition Matrices
We let A(s) denote the matrix whose ith column is the vector a^i(s) and introduce the transition matrices

    \Phi(k, s) = A(s) A(s+1) \cdots A(k-1) A(k)   for all k \ge s.

We use these matrices to relate x^i(k+1) to the estimates x^j(s) at a time s \le k:

    x^i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x^j(s) - \sum_{r=s}^{k-1} \sum_{j=1}^m [\Phi(k, r+1)]^i_j\, \alpha^j(r)\, d^j(r) - \alpha^i(k)\, d^i(k).

We analyze convergence properties of the distributed method by establishing:
Convergence of the transition matrices Φ(k, s) (consensus part; a numerical sketch follows below)
Convergence of an approximate subgradient method (effect of optimization)
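The consensus part can be checked numerically: products of stochastic weight matrices with entries bounded away from zero approach a rank-one matrix. Below is a minimal sketch, with randomly generated row-stochastic matrices standing in for the A(s); the talk's column-stochastic convention is the transpose of this.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5

def random_weights(m):
    # Row-stochastic matrix with entries bounded away from zero,
    # a stand-in for the Weights assumption (not the talk's exact model)
    W = rng.uniform(0.1, 1.0, size=(m, m))
    return W / W.sum(axis=1, keepdims=True)

# Phi(k, 0) = A(0) A(1) ... A(k): accumulate the per-slot products
Phi = random_weights(m)
for k in range(1, 40):
    Phi = Phi @ random_weights(m)

# Rows of Phi(k, 0) become (nearly) identical as k grows:
print(Phi.max(axis=0) - Phi.min(axis=0))   # column-wise spread ~ 0
```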



Assumptions
Assumption (Weights) At all times k, we have:

(a) The weights a^i_j(k) are nonnegative for all agents i, j.

(b) There exists a scalar η ∈ (0, 1) such that for all agents i,

    a^i_i(k) \ge \eta \quad \text{and} \quad a^i_j(k) \ge \eta

for all agents j communicating with agent i at time k.

(c) The agent weight vectors a^i(k) = [a^i_1(k), . . . , a^i_m(k)]^T are stochastic, i.e., \sum_{j=1}^m a^i_j(k) = 1 for all i and k.

Example: Equal neighbor weights,

    a^i_j(k) = \frac{1}{n_i(k) + 1},

where n_i(k) is the number of agents communicating with i at time k.


Information Exchange
Agent i influences any other agent infinitely often - connectivity.
Agent j sends his information to a neighboring agent i within a bounded time interval - bounded intercommunication interval.

At slot k, information exchange may be represented by a directed graph (V, E_k) with

    E_k = \{(j, i) \mid a^i_j(k) > 0\}.

Assumption (Connectivity) The graph (V, E_∞) is connected, where

    E_\infty = \{(j, i) \mid (j, i) \in E_k \text{ for infinitely many indices } k\}.

Assumption (Bounded Intercommunication Interval) There is some B ≥ 1 such that

    (j, i) \in E_k \cup E_{k+1} \cup \cdots \cup E_{k+B-1}   for all (j, i) \in E_\infty and k \ge 0.



Properties of Transition Matrices
Lemma: Let the Weights and Information Exchange assumptions hold. We then have

    [\Phi(s + (m-1)B - 1,\, s)]^i_j \ge \eta^{(m-1)B}   for all s, i, and j,

where η is the lower bound on the weights and B is the intercommunication interval bound.

We introduce the matrices D_k(s) as follows: for a fixed s ≥ 0,

    D_k(s) = \Phi'(s + kB_0 - 1,\, s + (k-1)B_0)   for k = 1, 2, . . . ,

where B_0 = (m − 1)B.

By the previous lemma, all entries of D_k(s) are positive.



Convergence of Transition Matrices
Lemma: Let the Weights and Information Exchange assumptions hold. For each s ≥ 0, we have:

(a) The limit \bar{D}(s) = \lim_{k \to \infty} D_k(s) \cdots D_1(s) exists.

(b) The limit \bar{D}(s) has identical rows and the rows are stochastic.

(c) For every j, the entries [D_k(s) \cdots D_1(s)]^j_i, i = 1, . . . , m, converge to the same limit φ_j(s) as k → ∞ with a geometric rate:

    \big| [D_k(s) \cdots D_1(s)]^j_i - \phi_j(s) \big| \le 2 \left(1 + \eta^{-B_0}\right) \left(1 - \eta^{B_0}\right)^k,

where η is the lower bound on the weights, B is the intercommunication interval bound, and B_0 = (m − 1)B.



Proof Outline
We show that the sequence {(D_k \cdots D_1) x} converges for every x ∈ R^m.

Consider the sequence {x_k} with x_k = D_k \cdots D_1 x and write x_k as

    x_k = z_k + c_k e,   where c_k = \min_{1 \le i \le m} [x_k]_i.

Using the property that each entry of the matrix D_k is positive, we show

    \|z_k\| \le \left(1 - \eta^{B_0}\right)^k \|z_0\|   for all k.

Hence z_k → 0 with a geometric rate.

We then show that the sequence {c_k} converges to some \bar{c} ∈ R and use the contraction constant to establish the rate estimate.

The final relation follows by picking x = e_j, the jth unit vector.



Convergence of Transition Matrices
Proposition: Let the Weights and Information Exchange assumptions hold. For each s ≥ 0, we have:

(a) The limit \bar{\Phi}(s) = \lim_{k \to \infty} \Phi(k, s) exists.

(b) The limit matrix \bar{\Phi}(s) has identical columns and the columns are stochastic, i.e.,

    \bar{\Phi}(s) = \phi(s)\, e',

where φ(s) ∈ R^m is a stochastic vector.

(c) For every i, the entries [\Phi(k, s)]^j_i, j = 1, . . . , m, converge to the same limit φ_i(s) as k → ∞ with a geometric rate, i.e., for all i, j and all k ≥ s,

    \big| [\Phi(k,s)]^j_i - \phi_i(s) \big| \le 2\, \frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}} \left(1 - \eta^{B_0}\right)^{(k-s)/B_0},

where η is the lower bound on the weights, B is the intercommunication interval bound, and B_0 = (m − 1)B.

The rate estimate in part (c) was recently improved in [Nedic, Olshevsky, Ozdaglar, Tsitsiklis 08].
Convergence Analysis
Recall the evolution of the estimates (with a constant stepsize \alpha^i(s) = \alpha):

    x^i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x^j(s) - \alpha \sum_{r=s}^{k-1} \sum_{j=1}^m [\Phi(k, r+1)]^i_j\, d^j(r) - \alpha\, d^i(k).

Proof method: Consider a stopped process, where after time \bar{k} the agents stop computing subgradients but keep exchanging their estimates: d^i(k) = 0 for all k \ge \bar{k} and all i.

It can be seen that the stopped process takes the form

    x^i(k+1) = \sum_{j=1}^m [\Phi(k,0)]^i_j\, x^j(0) - \alpha \sum_{r=1}^{\bar{k}} \sum_{j=1}^m [\Phi(k,r)]^i_j\, d^j(r-1).

Using \lim_{k \to \infty} [\Phi(k, s)]^i_j = \phi_j(s) for all i, we see that the limit vector \lim_{k \to \infty} x^i(k) exists and is independent of i, but depends on \bar{k}:

    \lim_{k \to \infty} x^i(k) = y(\bar{k}).



Behavior of Stopped Process
The stopped process is described by

    y(k+1) = y(k) - \alpha \sum_{j=1}^m \phi_j(k)\, d^j(k).

The subgradients d^j(k) of f_j are computed at x^j(k) instead of y(k).
This would correspond to an approximate subgradient method for minimizing \sum_j f_j(x), provided that the values φ_j(k) are the same for all j.

Assumption (Doubly Stochastic Weights) The matrices A(k) are doubly stochastic, i.e., \sum_{i=1}^m a^i_j(k) = 1 for all j and k.
This can be ensured when the agents exchange their information simultaneously and coordinate the selection of the weights a^i_j(k).

In this case the stopped process reduces to

    y(k+1) = y(k) - \frac{\alpha}{m} \sum_{j=1}^m d^j(k).



Main Convergence Result
Proposition: Let the Weights, Doubly Stochastic Weights, and Information Exchange assumptions hold, let the subgradients of each f_i be uniformly bounded by a constant L, and let \max_{1 \le j \le m} \|x^j(0)\| \le \alpha L.
Then, for the averages \hat{x}^i(k) of the estimates x^i(0), . . . , x^i(k-1), we have

    f(\hat{x}^i(k)) \le f^* + \frac{m\, \mathrm{dist}^2(y(0), X^*)}{2\alpha k} + \frac{\alpha L^2 C}{2} + 2\alpha L^2 m C_1,

where f^* is the optimal value of f = \sum_i f_i, X^* is the optimal set,

    y(0) = \frac{1}{m} \sum_i x^i(0), \qquad C = 1 + 8mC_1,

    C_1 = 1 + \frac{m}{1 - (1 - \eta^{B_0})^{1/B_0}} \cdot \frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}, \qquad B_0 = (m-1)B.

The estimates hold per iteration.
The bound captures the tradeoff between accuracy and computational complexity.



Proof Outline
We analyze the stopped process

    y(k+1) = y(k) - \frac{\alpha}{m} \sum_{j=1}^m d^j(k).

We establish an approximate convergence relation for the running averages

    \hat{y}(k) = \frac{1}{k} \sum_{h=0}^{k-1} y(h).

Using the convergence rate estimate for the transition matrices Φ(k, s), we show that the \hat{y}(k) are close to the agents' averages \hat{x}^i(k) = \frac{1}{k} \sum_{h=0}^{k-1} x^i(h):

    \|\hat{y}(k) - \hat{x}^i(k)\| \le 2\alpha L C_1   for all i and k.

We then infer the result for the running averages \hat{x}^i(k).



Constrained Consensus Problem
The estimates of agent i are restricted to lie in a closed convex constraint set X_i.
We assume that the intersection set X = \cap_{i=1}^m X_i is nonempty.

Examples where constraints are important:
Motion planning and alignment problems, where each agent's position is limited to a certain region or range
Distributed constrained multi-agent optimization

This talk: The pure consensus problem in the presence of constraints.



Projected Consensus Algorithm
For the constrained consensus problem, we develop a consensus algorithm based on projections.
We use x^i(k) ∈ X_i to denote the estimate of agent i at time t_k.
Given a closed convex set X ⊆ R^n and a vector y ∈ R^n, we define

    \mathrm{dist}(y, X) = \min_{x \in X} \|y - x\|, \qquad P_X(y) = \arg\min_{x \in X} \|y - x\|.

Agent Update Rule:
Agent i updates his estimate subject to his constraint set:

    x^i(k+1) = P_{X_i}\Big[ \sum_{j=1}^m a^i_j(k)\, x^j(k) \Big],

where a^i(k) = (a^i_1(k), . . . , a^i_m(k))' is the weight vector.
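A minimal sketch of this projected consensus update, assuming interval constraint sets (so each projection is a clip) and equal doubly stochastic weights; the specific sets and initial points are illustrative, not from the talk.

```python
import numpy as np

# Assumed toy instance: scalar estimates, X_i are intervals whose
# intersection is [0.4, 0.5], and equal (doubly stochastic) weights.
lo = np.array([0.0, 0.2, 0.4])
hi = np.array([0.5, 0.9, 1.0])
x = np.clip(np.array([-1.0, 0.5, 2.0]), lo, hi)   # x^i(0) in X_i
A = np.full((3, 3), 1.0 / 3.0)

for k in range(200):
    w = A @ x               # w^i(k) = sum_j a^i_j(k) x^j(k)
    x = np.clip(w, lo, hi)  # x^i(k+1) = P_{X_i}[w^i(k)]; interval projection = clip

print(x)  # all three estimates agree on a point in the intersection
```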



Connection to Alternating Projections
The update rule is similar to the classical alternating projection method.

Alternating/Cyclic Projection Methods:
Given closed convex sets X_1, X_2 ⊆ R^n, find a point in X = X_1 ∩ X_2:

    x(k+1) = P_{X_1}(x(k))
    x(k+2) = P_{X_2}(x(k+1))

Convergence analysis: X_i affine [von Neumann; Aronszajn 50], X_i convex [Gubin, Polyak, Raik 67]

Constrained Consensus Algorithm:
Given closed convex sets X_1, X_2 ⊆ R^n:

    w^1(k) = \sum_j a^1_j(k)\, x^j(k), \qquad w^2(k) = \sum_j a^2_j(k)\, x^j(k)

    x^1(k+1) = P_{X_1}(w^1(k)), \qquad x^2(k+1) = P_{X_2}(w^2(k))



Convergence Analysis
Analysis of the impact of the constraints.

Recall the update rule

    x^i(k+1) = \sum_{j=1}^m a^i_j(k)\, x^j(k) + e^i(k),

where the projection error e^i(k) is given by

    e^i(k) = P_{X_i}\Big[ \sum_{j=1}^m a^i_j(k)\, x^j(k) \Big] - \sum_{j=1}^m a^i_j(k)\, x^j(k) = x^i(k+1) - w^i(k).

Proposition: Assume that the intersection X = \cap_{i=1}^m X_i is nonempty, and let the Doubly Stochastic Weights assumption hold. Then

    \lim_{k \to \infty} e^i(k) = 0   for all i.

Implication: The analysis translates into the unconstrained case.
The proof relies on two lemmas.



Convergence of Projection Error
Lemma: Let w ∈ R^n and let X ⊆ R^n be closed and convex. For all x ∈ X,

    \|w - P_X(w)\|^2 \le \|w - x\|^2 - \|P_X(w) - x\|^2.

Lemma: Let the Doubly Stochastic Weights assumption hold. Then:

    \|x^i(k+1) - x\| \le \|w^i(k) - x\|   for all x ∈ X_i (nonexpansiveness of the projection)

    \sum_{i=1}^m \|w^i(k) - x\|^2 \le \sum_{i=1}^m \|x^i(k) - x\|^2   for all x (by double stochasticity)

These relations are of independent interest in the convergence analysis of doubly stochastic matrices.

Implication:

    \lim_{k \to \infty} \sum_{i=1}^m \Big( \|w^i(k) - x\|^2 - \|x^i(k+1) - x\|^2 \Big) = 0   for all x ∈ X.



Convergence of the Estimates
Recall that the transition matrices are defined by

    \Phi(k, s) = A(s) A(s+1) \cdots A(k-1) A(k)   for all s and k with k \ge s.

Using the transition matrices and the decomposition of the estimate evolution,

    x^i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x^j(s) + \sum_{r=s+1}^{k} \sum_{j=1}^m [\Phi(k,r)]^i_j\, e^j(r-1) + e^i(k).

We use a two-time-scale analysis, where we define a similar stopped process

    y(k) = \frac{1}{m} \sum_{j=1}^m x^j(s) + \frac{1}{m} \sum_{r=s+1}^{k} \sum_{j=1}^m e^j(r-1).

It can be seen that the agent estimates reach a consensus:

    \lim_{k \to \infty} \|x^i(k) - y(k)\| = 0   for all i.

Proposition: (Convergence) Let the Weights, Doubly Stochastic Weights, and Information Exchange assumptions hold. We have, for some \tilde{x} ∈ X,

    \lim_{k \to \infty} \|x^i(k) - \tilde{x}\| = 0   for all i.



Rate Analysis
Assumption (Interior Point): There exists a vector \bar{x} such that

    \bar{x} \in \mathrm{int}(X) = \mathrm{int}\big( \cap_{i=1}^m X_i \big),

i.e., there exists some scalar δ > 0 such that \{z \mid \|z - \bar{x}\| \le \delta\} \subseteq X.

Proposition: Let the Interior Point assumption hold, and let the weight vectors a^i(k) be given by a^i(k) = (1/m, . . . , 1/m)' for all i and k. Then

    \sum_{i=1}^m \|x^i(k) - \tilde{x}\|^2 \le \Big( 1 - \frac{1}{4R^2} \Big)^k \sum_{i=1}^m \|x^i(0) - \tilde{x}\|^2   for all k \ge 0,

where \tilde{x} ∈ X is the limit of the sequence {x^i(k)} and R = \frac{1}{\delta} \sum_{i=1}^m \|x^i(0) - \tilde{x}\|, i.e., the convergence rate is linear.

The linear convergence rate extends to time-varying weights with X_i = X.
The convergence rate for time-varying weights and different local constraints is open.
Summary
We presented a distributed subgradient method for multi-agent optimization

The method can operate over networks with time-varying connectivity

We proposed a constrained consensus policy for the case when agents have local
constraints on their estimates

This policy has connections to the classical alternating projection method

We analyzed the convergence and rate of convergence of the algorithms



Putting Things Together: Constrained Distributed
Optimization
Each agent has a convex closed local constraint set X_i [Nedic, Ozdaglar, Parrilo 08].

Agent i updates his estimate by

    v^i(k) = \sum_{j=1}^m a^i_j(k)\, x^j(k),

    x^i(k+1) = P_{X_i}\big[ v^i(k) - \alpha(k)\, d^i(k) \big],

where α(k) > 0 is a diminishing stepsize sequence and d^i(k) is a subgradient of f_i(x) at x = v^i(k).
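A runnable sketch of this projected update under assumed data: quadratic f_i, interval constraint sets with nonempty intersection, equal constant weights, and stepsize α(k) = 1/(k+1). These choices are illustrative, not the paper's experiments.

```python
import numpy as np

# Assumed toy instance: f_i(x) = 0.5*(x - c_i)^2 with subgradient x - c_i,
# interval sets whose intersection is [0.2, 2.0], equal constant weights.
c  = np.array([0.0, 1.0, 2.0])
lo = np.array([-1.0, 0.2, 0.0])
hi = np.array([ 2.0, 3.0, 4.0])
x  = np.clip(np.array([5.0, -5.0, 0.0]), lo, hi)
A  = np.full((3, 3), 1.0 / 3.0)

for k in range(5000):
    v = A @ x                                # v^i(k) = sum_j a^i_j(k) x^j(k)
    d = v - c                                # d^i(k): subgradient of f_i at v^i(k)
    x = np.clip(v - d / (k + 1.0), lo, hi)   # P_{X_i}[v^i(k) - alpha(k) d^i(k)]

print(x)  # estimates approach argmin of sum_i f_i over the intersection (x* = 1)
```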

Results:
The agent estimates generated by this algorithm converge to the same optimal solution in the cases when the weights are constant and equal, and when the weights are time-varying but X_i = X for all i.
Convergence analysis in the general case is open!


Optimization over Random Networks
Existing work focuses on deterministic models of network connectivity (i.e., worst-case assumptions about intercommunication intervals).
Time-varying connectivity is modeled probabilistically in [Lobel, Ozdaglar 08].
Each component l of the agent estimates evolves according to

    x_l(k+1) = A(k)\, x_l(k) - \alpha(k)\, d_l(k).

We assume that the matrix A(k) is a random matrix drawn independently over time from a probability space of stochastic matrices.
This allows edges at any time k to be correlated.

Results:
We establish properties of random transition matrices by constructing positive probability events in which information propagates from every node to every other node in the network.
We provide convergence analysis of a stochastic subgradient method under different stepsize rules.
Limits on Communication
Quantization Effects:
Agents have access to quantized estimates due to storage and communication bandwidth constraints [Nedic, Olshevsky, Ozdaglar, Tsitsiklis 07].
Agent i updates his estimate by

    x^i(k+1) = \sum_{j=1}^m a^i_j(k)\, x^j_Q(k) - \alpha\, d^i(k),

    x^i_Q(k+1) = \lfloor x^i(k+1) \rfloor,

where \lfloor \cdot \rfloor represents rounding down to the nearest integer multiple of 1/Q.
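A minimal sketch of the quantized update for the pure averaging case (taking f_i = 0, an assumption made here to isolate the quantization effect); the data and weights are illustrative.

```python
import numpy as np

# Sketch with assumed data: averaging with quantized exchanges (f_i = 0,
# so the subgradient term drops); agents store multiples of 1/Q.
m, Q = 4, 100.0
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=m)
xQ = np.floor(x * Q) / Q              # x^i_Q: round down to a multiple of 1/Q
A = np.full((m, m), 1.0 / m)          # doubly stochastic weights (assumed)

for k in range(50):
    x = A @ xQ                        # mix the *quantized* neighbor estimates
    xQ = np.floor(x * Q) / Q          # re-quantize before the next exchange

print(xQ)  # agents agree up to the 1/Q quantization granularity
```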
Delay Effects:
Agents have access to outdated estimates due to communication delays [Bliman, Nedic, Ozdaglar 08].
In the presence of delays, agent i updates his estimate by

    x^i(k+1) = \sum_{j=1}^m a^i_j(k)\, x^j(k - t^i_j(k)) - \alpha\, d^i(k),

where t^i_j(k) is the delay in passing information from j to i.
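A companion sketch for the delayed update, again for pure averaging (f_i = 0) and with an assumed fixed one-slot delay on all links between distinct agents; this delay pattern is a simplification for illustration.

```python
import numpy as np
from collections import deque

# Assumed setting: averaging (f_i = 0) where every off-diagonal link has a
# fixed one-slot delay, i.e., t^i_j(k) = 1 for j != i and 0 for j = i.
m = 4
rng = np.random.default_rng(3)
A = np.full((m, m), 1.0 / m)
hist = deque([rng.uniform(size=m)] * 2, maxlen=2)  # estimates at k-1 and k

for k in range(100):
    x_old, x_now = hist[0], hist[-1]
    new = np.empty(m)
    for i in range(m):
        delayed = x_old.copy()     # neighbors' estimates x^j(k - 1)
        delayed[i] = x_now[i]      # agent i sees its own current estimate
        new[i] = A[i] @ delayed
    hist.append(new)

print(hist[-1])  # consensus is still reached despite the delays
```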


Applications to Social Networks
There is growing interest in the dynamics of a social network of communicating agents.
A specific example is learning and information aggregation over networks.
Consensus policies can be used to develop and analyze myopic/quasi-myopic learning models [Golub, Jackson 07], [Acemoglu, Ozdaglar, ParandehGheibi 08].

In the context of social networks, these rules may be too myopic.
Alternative approach: Bayesian learning over social networks [Acemoglu, Dahleh, Lobel, Ozdaglar 08]



References
A. Nedic and A. Ozdaglar, "Distributed Subgradient Methods for Multi-agent Optimization," IEEE Transactions on Automatic Control, 2008.
A. Ozdaglar, "Constrained Consensus and Alternating Projections," Proc. of Allerton Conference, 2007.
A. Nedic, A. Olshevsky, A. Ozdaglar, and J.N. Tsitsiklis, "On Distributed Averaging Algorithms and Quantization Effects," IEEE Transactions on Automatic Control, 2008.
A. Nedic, A. Ozdaglar, and P.A. Parrilo, "Constrained Consensus and Optimization in Multi-agent Networks," submitted for publication, 2008.
I. Lobel and A. Ozdaglar, "Distributed Subgradient Methods over Random Networks," Proc. of Allerton Conference, 2008.
D. Acemoglu, M. Dahleh, I. Lobel, and A. Ozdaglar, "Bayesian Learning in Social Networks," submitted for publication, 2008.
D. Acemoglu, A. Ozdaglar, and A. ParandehGheibi, "Spread of (Mis)Information over Social Networks," working paper, 2008.

All papers can be downloaded from web.mit.edu/asuman/www


