
Distributed Subgradient Methods for

Multi-agent Optimization

Asu Ozdaglar

February 2009

Department of Electrical Engineering & Computer Science

Massachusetts Institute of Technology, USA



Motivation
Increasing interest in distributed control and coordination of networks
consisting of multiple autonomous agents

Motivated by many emerging networking applications, such as ad hoc wireless


communication networks and sensor networks, characterized by:
Lack of centralized control and access to information
Time-varying connectivity

Control and optimization algorithms for such networks should be:


Completely distributed, relying on local information
Robust against changes in the network topology



Multi-Agent Optimization Problem

Goal: Develop a general computational model for cooperatively optimizing a global system objective through local interactions and computations in a multi-agent system.
The global objective is a combination of individual agent performance measures.

Examples:

Consensus problems: Alignment of estimates maintained by different agents


Control of moving vehicles (UAVs), computing averages of initial values

Parameter estimation in distributed sensor networks:


Regression-based estimates using local sensor measurements

Congestion control in data networks with heterogeneous users



Related Literature
Parallel and Distributed Optimization Algorithms:
General computational model for distributed asynchronous optimization
Tsitsiklis 84, Bertsekas and Tsitsiklis 95

Consensus and Cooperative Control:


Analysis of group behavior (flocking) in dynamical-biological systems
Vicsek 95, Reynolds 87, Toner and Tu 98
Mathematical models of consensus and averaging
Jadbabaie et al. 03, Olfati-Saber Murray 04, Boyd et al. 05

Game Theory/Mechanism Design for Distributed Cooperative Control:


Assign each agent a local utility function such that:
The equilibrium of the resulting game is the same as (or close to) the
global optimum
Derive learning algorithms that reach equilibrium
Marden, Arslan, Shamma 07 used this approach for the consensus problem
to deal with constraints
This Talk
Development of a distributed subgradient method for multi-agent optimization
[Nedic, Ozdaglar 08]
Convergence analysis and performance bounds for time-varying topologies
under general connectivity assumptions

Effects of local constraints [Nedic, Ozdaglar, Parrilo 08]

Effects of networked-system constraints: quantization, delay, asynchronism



Model
We consider a network of m agents with node set V = {1, . . . , m}.
The agents want to cooperatively solve

    \min_{x \in R^n} \sum_{i=1}^m f_i(x)

[Figure: network of agents, with each node i holding a local objective f_i]

The function f_i : R^n -> R is a convex objective function known only by node i.
Agents update and send their information at discrete times t_0, t_1, t_2, . . .
We use x^i(k) \in R^n to denote the estimate of agent i at time t_k.

Agent Update Rule:
Agent i locally minimizes f_i according to

    x^i(k+1) = \sum_{j=1}^m a^i_j(k)\, x^j(k) - \alpha^i(k)\, d^i(k),

where the a^i_j(k) are weights, \alpha^i(k) is a stepsize, and d^i(k) is a subgradient of f_i at x^i(k).
The vector a^i(k) = (a^i_1(k), . . . , a^i_m(k)) represents agent i's time-varying neighbors at slot k.
The model includes consensus as a special case (f_i(x) = 0 for all i).
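To make the update concrete, here is a minimal Python sketch for an assumed toy instance: quadratic objectives f_i(x) = ||x - c_i||^2 / 2 (so the subgradient at x is x - c_i), fixed equal doubly stochastic weights, and a constant stepsize. None of these specific choices come from the talk; they are illustrative only.

```python
import numpy as np

# Toy instance (assumed): m agents, n-dimensional estimates, objectives
# f_i(x) = 0.5 * ||x - c_i||^2, whose gradient at x is x - c_i.
# The global optimum of sum_i f_i is the mean of the c_i.
m, n = 4, 2
rng = np.random.default_rng(0)
c = rng.normal(size=(m, n))        # private data of each agent
x = rng.normal(size=(m, n))        # initial estimates x^i(0)
A = np.full((m, m), 1.0 / m)       # fixed doubly stochastic weights (assumed)
alpha = 0.05                       # constant stepsize

for k in range(1000):
    d = x - c                      # d^i(k): subgradient of f_i at x^i(k)
    x = A @ x - alpha * d          # x^i(k+1) = sum_j a^i_j(k) x^j(k) - alpha d^i(k)

print(x)                # rows (nearly) agree with each other ...
print(c.mean(axis=0))   # ... and sit near the global minimizer
```

With a constant stepsize the estimates settle in a neighborhood of the optimum, consistent with the error bounds discussed later in the talk.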
Linear Dynamics and Transition Matrices
We let A(s) denote the matrix whose ith column is the vector a^i(s) and introduce the transition matrices

    \Phi(k, s) = A(s) A(s+1) \cdots A(k-1) A(k)   for all k \ge s.

We use these matrices to relate x^i(k+1) to the estimates x^j(s) at a time s \le k:

    x^i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x^j(s) - \sum_{r=s}^{k-1} \sum_{j=1}^m [\Phi(k, r+1)]^i_j\, \alpha^j(r)\, d^j(r) - \alpha^i(k)\, d^i(k).

We analyze convergence properties of the distributed method by establishing:
Convergence of the transition matrices Φ(k, s) (consensus part; a numerical sketch follows below)
Convergence of an approximate subgradient method (effect of optimization)
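The consensus part can be checked numerically: products of stochastic weight matrices with entries bounded away from zero approach a rank-one matrix. Below is a minimal sketch, with randomly generated row-stochastic matrices standing in for the A(s); the talk's column-stochastic convention is the transpose of this.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 5

def random_weights(m):
    # Row-stochastic matrix with entries bounded away from zero,
    # a stand-in for the Weights assumption (not the talk's exact model)
    W = rng.uniform(0.1, 1.0, size=(m, m))
    return W / W.sum(axis=1, keepdims=True)

# Phi(k, 0) = A(0) A(1) ... A(k): accumulate the per-slot products
Phi = random_weights(m)
for k in range(1, 40):
    Phi = Phi @ random_weights(m)

# Rows of Phi(k, 0) become (nearly) identical as k grows:
print(Phi.max(axis=0) - Phi.min(axis=0))   # column-wise spread ~ 0
```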



Assumptions
Assumption (Weights) At all times k, we have:

(a) The weights a^i_j(k) are nonnegative for all agents i, j.

(b) There exists a scalar η ∈ (0, 1) such that for all agents i,

    a^i_i(k) \ge \eta \quad \text{and} \quad a^i_j(k) \ge \eta

for all agents j communicating with agent i at time k.

(c) The agent weight vectors a^i(k) = [a^i_1(k), . . . , a^i_m(k)]^T are stochastic, i.e., \sum_{j=1}^m a^i_j(k) = 1 for all i and k.

Example: Equal neighbor weights,

    a^i_j(k) = \frac{1}{n_i(k) + 1},

where n_i(k) is the number of agents communicating with i at time k.


Information Exchange
Agent i influences any other agent infinitely often - connectivity.
Agent j sends his information to a neighboring agent i within a bounded time interval - bounded intercommunication interval.

At slot k, information exchange may be represented by a directed graph (V, E_k) with

    E_k = \{(j, i) \mid a^i_j(k) > 0\}.

Assumption (Connectivity) The graph (V, E_∞) is connected, where

    E_\infty = \{(j, i) \mid (j, i) \in E_k \text{ for infinitely many indices } k\}.

Assumption (Bounded Intercommunication Interval) There is some B ≥ 1 such that

    (j, i) \in E_k \cup E_{k+1} \cup \cdots \cup E_{k+B-1}   for all (j, i) \in E_\infty and k \ge 0.



Properties of Transition Matrices
Lemma: Let the Weights and Information Exchange assumptions hold. We then have

    [\Phi(s + (m-1)B - 1,\, s)]^i_j \ge \eta^{(m-1)B}   for all s, i, and j,

where η is the lower bound on the weights and B is the intercommunication interval bound.

We introduce the matrices D_k(s) as follows: for a fixed s ≥ 0,

    D_k(s) = \Phi'(s + kB_0 - 1,\, s + (k-1)B_0)   for k = 1, 2, . . . ,

where B_0 = (m − 1)B.

By the previous lemma, all entries of D_k(s) are positive.



Convergence of Transition Matrices
Lemma: Let the Weights and Information Exchange assumptions hold. For each s ≥ 0, we have:

(a) The limit \bar{D}(s) = \lim_{k \to \infty} D_k(s) \cdots D_1(s) exists.

(b) The limit \bar{D}(s) has identical rows and the rows are stochastic.

(c) For every j, the entries [D_k(s) \cdots D_1(s)]^j_i, i = 1, . . . , m, converge to the same limit φ_j(s) as k → ∞ with a geometric rate:

    \big| [D_k(s) \cdots D_1(s)]^j_i - \phi_j(s) \big| \le 2 \left(1 + \eta^{-B_0}\right) \left(1 - \eta^{B_0}\right)^k,

where η is the lower bound on the weights, B is the intercommunication interval bound, and B_0 = (m − 1)B.



Proof Outline
We show that the sequence {(D_k \cdots D_1) x} converges for every x ∈ R^m.

Consider the sequence {x_k} with x_k = D_k \cdots D_1 x and write x_k as

    x_k = z_k + c_k e,   where c_k = \min_{1 \le i \le m} [x_k]_i.

Using the property that each entry of the matrix D_k is positive, we show

    \|z_k\| \le \left(1 - \eta^{B_0}\right)^k \|z_0\|   for all k.

Hence z_k → 0 with a geometric rate.

We then show that the sequence {c_k} converges to some \bar{c} ∈ R and use the contraction constant to establish the rate estimate.

The final relation follows by picking x = e_j, the jth unit vector.



Convergence of Transition Matrices
Proposition: Let the Weights and Information Exchange assumptions hold. For each s ≥ 0, we have:

(a) The limit \bar{\Phi}(s) = \lim_{k \to \infty} \Phi(k, s) exists.

(b) The limit matrix \bar{\Phi}(s) has identical columns and the columns are stochastic, i.e.,

    \bar{\Phi}(s) = \phi(s)\, e',

where φ(s) ∈ R^m is a stochastic vector.

(c) For every i, the entries [\Phi(k, s)]^j_i, j = 1, . . . , m, converge to the same limit φ_i(s) as k → ∞ with a geometric rate, i.e., for all i, j and all k ≥ s,

    \big| [\Phi(k,s)]^j_i - \phi_i(s) \big| \le 2\, \frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}} \left(1 - \eta^{B_0}\right)^{(k-s)/B_0},

where η is the lower bound on the weights, B is the intercommunication interval bound, and B_0 = (m − 1)B.

The rate estimate in part (c) was recently improved in [Nedic, Olshevsky, Ozdaglar, Tsitsiklis 08].
Convergence Analysis
Recall the evolution of the estimates (with a constant stepsize \alpha^i(s) = \alpha):

    x^i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x^j(s) - \alpha \sum_{r=s}^{k-1} \sum_{j=1}^m [\Phi(k, r+1)]^i_j\, d^j(r) - \alpha\, d^i(k).

Proof method: Consider a stopped process, where after time \bar{k} the agents stop computing subgradients but keep exchanging their estimates: d^i(k) = 0 for all k \ge \bar{k} and all i.

It can be seen that the stopped process takes the form

    x^i(k+1) = \sum_{j=1}^m [\Phi(k,0)]^i_j\, x^j(0) - \alpha \sum_{r=1}^{\bar{k}} \sum_{j=1}^m [\Phi(k,r)]^i_j\, d^j(r-1).

Using \lim_{k \to \infty} [\Phi(k, s)]^i_j = \phi_j(s) for all i, we see that the limit vector \lim_{k \to \infty} x^i(k) exists and is independent of i, but depends on \bar{k}:

    \lim_{k \to \infty} x^i(k) = y(\bar{k}).



Behavior of Stopped Process
The stopped process is described by

    y(k+1) = y(k) - \alpha \sum_{j=1}^m \phi_j(k)\, d^j(k).

The subgradients d^j(k) of f_j are computed at x^j(k) instead of y(k).
This would correspond to an approximate subgradient method for minimizing \sum_j f_j(x), provided that the values φ_j(k) are the same for all j.

Assumption (Doubly Stochastic Weights) The matrices A(k) are doubly stochastic, i.e., \sum_{i=1}^m a^i_j(k) = 1 for all j and k.
This can be ensured when the agents exchange their information simultaneously and coordinate the selection of the weights a^i_j(k).

In this case the stopped process reduces to

    y(k+1) = y(k) - \frac{\alpha}{m} \sum_{j=1}^m d^j(k).



Main Convergence Result
Proposition: Let the Weights, Doubly Stochastic Weights, and Information Exchange assumptions hold, let the subgradients of each f_i be uniformly bounded by a constant L, and let \max_{1 \le j \le m} \|x^j(0)\| \le \alpha L.
Then, for the averages \hat{x}^i(k) of the estimates x^i(0), . . . , x^i(k-1), we have

    f(\hat{x}^i(k)) \le f^* + \frac{m\, \mathrm{dist}^2(y(0), X^*)}{2\alpha k} + \frac{\alpha L^2 C}{2} + 2\alpha L^2 m C_1,

where f^* is the optimal value of f = \sum_i f_i, X^* is the optimal set,

    y(0) = \frac{1}{m} \sum_i x^i(0), \qquad C = 1 + 8mC_1,

    C_1 = 1 + \frac{m}{1 - (1 - \eta^{B_0})^{1/B_0}} \cdot \frac{1 + \eta^{-B_0}}{1 - \eta^{B_0}}, \qquad B_0 = (m-1)B.

The estimates hold per iteration.
The bound captures the tradeoff between accuracy and computational complexity.



Proof Outline
We analyze the stopped process

    y(k+1) = y(k) - \frac{\alpha}{m} \sum_{j=1}^m d^j(k).

We establish an approximate convergence relation for the running averages

    \hat{y}(k) = \frac{1}{k} \sum_{h=0}^{k-1} y(h).

Using the convergence rate estimate for the transition matrices Φ(k, s), we show that the \hat{y}(k) are close to the agents' averages \hat{x}^i(k) = \frac{1}{k} \sum_{h=0}^{k-1} x^i(h):

    \|\hat{y}(k) - \hat{x}^i(k)\| \le 2\alpha L C_1   for all i and k.

We then infer the result for the running averages \hat{x}^i(k).



Constrained Consensus Problem
The estimates of agent i are restricted to lie in a closed convex constraint set X_i.
We assume that the intersection set X = \cap_{i=1}^m X_i is nonempty.

Examples where constraints are important:
Motion planning and alignment problems, where each agent's position is limited to a certain region or range
Distributed constrained multi-agent optimization

This talk: The pure consensus problem in the presence of constraints.



Projected Consensus Algorithm
For the constrained consensus problem, we develop a consensus algorithm based on projections.
We use x^i(k) ∈ X_i to denote the estimate of agent i at time t_k.
Given a closed convex set X ⊆ R^n and a vector y ∈ R^n, we define

    \mathrm{dist}(y, X) = \min_{x \in X} \|y - x\|, \qquad P_X(y) = \arg\min_{x \in X} \|y - x\|.

Agent Update Rule:
Agent i updates his estimate subject to his constraint set:

    x^i(k+1) = P_{X_i}\Big[ \sum_{j=1}^m a^i_j(k)\, x^j(k) \Big],

where a^i(k) = (a^i_1(k), . . . , a^i_m(k))' is the weight vector.
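A minimal sketch of this projected consensus update, assuming interval constraint sets (so each projection is a clip) and equal doubly stochastic weights; the specific sets and initial points are illustrative, not from the talk.

```python
import numpy as np

# Assumed toy instance: scalar estimates, X_i are intervals whose
# intersection is [0.4, 0.5], and equal (doubly stochastic) weights.
lo = np.array([0.0, 0.2, 0.4])
hi = np.array([0.5, 0.9, 1.0])
x = np.clip(np.array([-1.0, 0.5, 2.0]), lo, hi)   # x^i(0) in X_i
A = np.full((3, 3), 1.0 / 3.0)

for k in range(200):
    w = A @ x               # w^i(k) = sum_j a^i_j(k) x^j(k)
    x = np.clip(w, lo, hi)  # x^i(k+1) = P_{X_i}[w^i(k)]; interval projection = clip

print(x)  # all three estimates agree on a point in the intersection
```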



Connection to Alternating Projections
The update rule is similar to the classical alternating projection method.

Alternating/Cyclic Projection Methods:
Given closed convex sets X_1, X_2 ⊆ R^n, find a point in X = X_1 ∩ X_2:

    x(k+1) = P_{X_1}(x(k))
    x(k+2) = P_{X_2}(x(k+1))

Convergence analysis: X_i affine [von Neumann; Aronszajn 50], X_i convex [Gubin, Polyak, Raik 67]

Constrained Consensus Algorithm:
Given closed convex sets X_1, X_2 ⊆ R^n:

    w^1(k) = \sum_j a^1_j(k)\, x^j(k), \qquad w^2(k) = \sum_j a^2_j(k)\, x^j(k)

    x^1(k+1) = P_{X_1}(w^1(k)), \qquad x^2(k+1) = P_{X_2}(w^2(k))



Convergence Analysis
Analysis of the impact of the constraints.

Recall the update rule

    x^i(k+1) = \sum_{j=1}^m a^i_j(k)\, x^j(k) + e^i(k),

where the projection error e^i(k) is given by

    e^i(k) = P_{X_i}\Big[ \sum_{j=1}^m a^i_j(k)\, x^j(k) \Big] - \sum_{j=1}^m a^i_j(k)\, x^j(k) = x^i(k+1) - w^i(k).

Proposition: Assume that the intersection X = \cap_{i=1}^m X_i is nonempty, and let the Doubly Stochastic Weights assumption hold. Then

    \lim_{k \to \infty} e^i(k) = 0   for all i.

Implication: The analysis translates into the unconstrained case.
The proof relies on two lemmas.



Convergence of Projection Error
Lemma: Let w ∈ R^n and let X ⊆ R^n be closed and convex. For all x ∈ X,

    \|w - P_X(w)\|^2 \le \|w - x\|^2 - \|P_X(w) - x\|^2.

Lemma: Let the Doubly Stochastic Weights assumption hold. Then:

    \|x^i(k+1) - x\| \le \|w^i(k) - x\|   for all x ∈ X_i (nonexpansiveness of the projection)

    \sum_{i=1}^m \|w^i(k) - x\|^2 \le \sum_{i=1}^m \|x^i(k) - x\|^2   for all x (by double stochasticity)

These relations are of independent interest in the convergence analysis of doubly stochastic matrices.

Implication:

    \lim_{k \to \infty} \sum_{i=1}^m \Big( \|w^i(k) - x\|^2 - \|x^i(k+1) - x\|^2 \Big) = 0   for all x ∈ X.



Convergence of the Estimates
Recall that the transition matrices are defined by

    \Phi(k, s) = A(s) A(s+1) \cdots A(k-1) A(k)   for all s and k with k \ge s.

Using the transition matrices and the decomposition of the estimate evolution,

    x^i(k+1) = \sum_{j=1}^m [\Phi(k,s)]^i_j\, x^j(s) + \sum_{r=s+1}^{k} \sum_{j=1}^m [\Phi(k,r)]^i_j\, e^j(r-1) + e^i(k).

We use a two-time-scale analysis, where we define a similar stopped process

    y(k) = \frac{1}{m} \sum_{j=1}^m x^j(s) + \frac{1}{m} \sum_{r=s+1}^{k} \sum_{j=1}^m e^j(r-1).

It can be seen that the agent estimates reach a consensus:

    \lim_{k \to \infty} \|x^i(k) - y(k)\| = 0   for all i.

Proposition: (Convergence) Let the Weights, Doubly Stochastic Weights, and Information Exchange assumptions hold. We have, for some \tilde{x} ∈ X,

    \lim_{k \to \infty} \|x^i(k) - \tilde{x}\| = 0   for all i.



Rate Analysis
Assumption (Interior Point): There exists a vector \bar{x} such that

    \bar{x} \in \mathrm{int}(X) = \mathrm{int}\big( \cap_{i=1}^m X_i \big),

i.e., there exists some scalar δ > 0 such that \{z \mid \|z - \bar{x}\| \le \delta\} \subseteq X.

Proposition: Let the Interior Point assumption hold, and let the weight vectors a^i(k) be given by a^i(k) = (1/m, . . . , 1/m)' for all i and k. Then

    \sum_{i=1}^m \|x^i(k) - \tilde{x}\|^2 \le \Big( 1 - \frac{1}{4R^2} \Big)^k \sum_{i=1}^m \|x^i(0) - \tilde{x}\|^2   for all k \ge 0,

where \tilde{x} ∈ X is the limit of the sequence {x^i(k)} and R = \frac{1}{\delta} \sum_{i=1}^m \|x^i(0) - \tilde{x}\|, i.e., the convergence rate is linear.

The linear convergence rate extends to time-varying weights with X_i = X.
The convergence rate for time-varying weights and different local constraints is open.
Summary
We presented a distributed subgradient method for multi-agent optimization

The method can operate over networks with time-varying connectivity

We proposed a constrained consensus policy for the case when agents have local
constraints on their estimates

This policy has connections to the classical alternating projection method

We analyzed the convergence and rate of convergence of the algorithms



Putting Things Together: Constrained Distributed
Optimization
Each agent has a convex closed local constraint set X_i [Nedic, Ozdaglar, Parrilo 08].

Agent i updates his estimate by

    v^i(k) = \sum_{j=1}^m a^i_j(k)\, x^j(k),

    x^i(k+1) = P_{X_i}\big[ v^i(k) - \alpha(k)\, d^i(k) \big],

where α(k) > 0 is a diminishing stepsize sequence and d^i(k) is a subgradient of f_i(x) at x = v^i(k).
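A runnable sketch of this projected update under assumed data: quadratic f_i, interval constraint sets with nonempty intersection, equal constant weights, and stepsize α(k) = 1/(k+1). These choices are illustrative, not the paper's experiments.

```python
import numpy as np

# Assumed toy instance: f_i(x) = 0.5*(x - c_i)^2 with subgradient x - c_i,
# interval sets whose intersection is [0.2, 2.0], equal constant weights.
c  = np.array([0.0, 1.0, 2.0])
lo = np.array([-1.0, 0.2, 0.0])
hi = np.array([ 2.0, 3.0, 4.0])
x  = np.clip(np.array([5.0, -5.0, 0.0]), lo, hi)
A  = np.full((3, 3), 1.0 / 3.0)

for k in range(5000):
    v = A @ x                                # v^i(k) = sum_j a^i_j(k) x^j(k)
    d = v - c                                # d^i(k): subgradient of f_i at v^i(k)
    x = np.clip(v - d / (k + 1.0), lo, hi)   # P_{X_i}[v^i(k) - alpha(k) d^i(k)]

print(x)  # estimates approach argmin of sum_i f_i over the intersection (x* = 1)
```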

Results:
The agent estimates generated by this algorithm converge to the same optimal solution in the cases when the weights are constant and equal, and when the weights are time-varying but X_i = X for all i.
Convergence analysis in the general case is open!


Optimization over Random Networks
Existing work focuses on deterministic models of network connectivity (i.e., worst-case assumptions about intercommunication intervals).
Time-varying connectivity is modeled probabilistically in [Lobel, Ozdaglar 08].
Each component l of the agent estimates evolves according to

    x_l(k+1) = A(k)\, x_l(k) - \alpha(k)\, d_l(k).

We assume that the matrix A(k) is a random matrix drawn independently over time from a probability space of stochastic matrices.
This allows edges at any time k to be correlated.

Results:
We establish properties of random transition matrices by constructing positive probability events in which information propagates from every node to every other node in the network.
We provide convergence analysis of a stochastic subgradient method under different stepsize rules.
Limits on Communication
Quantization Effects:
Agents have access to quantized estimates due to storage and communication bandwidth constraints [Nedic, Olshevsky, Ozdaglar, Tsitsiklis 07].
Agent i updates his estimate by

    x^i(k+1) = \sum_{j=1}^m a^i_j(k)\, x^j_Q(k) - \alpha\, d^i(k),

    x^i_Q(k+1) = \lfloor x^i(k+1) \rfloor,

where \lfloor \cdot \rfloor represents rounding down to the nearest integer multiple of 1/Q.
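A minimal sketch of the quantized update for the pure averaging case (taking f_i = 0, an assumption made here to isolate the quantization effect); the data and weights are illustrative.

```python
import numpy as np

# Sketch with assumed data: averaging with quantized exchanges (f_i = 0,
# so the subgradient term drops); agents store multiples of 1/Q.
m, Q = 4, 100.0
rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=m)
xQ = np.floor(x * Q) / Q              # x^i_Q: round down to a multiple of 1/Q
A = np.full((m, m), 1.0 / m)          # doubly stochastic weights (assumed)

for k in range(50):
    x = A @ xQ                        # mix the *quantized* neighbor estimates
    xQ = np.floor(x * Q) / Q          # re-quantize before the next exchange

print(xQ)  # agents agree up to the 1/Q quantization granularity
```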
Delay Effects:
Agents have access to outdated estimates due to communication delays [Bliman, Nedic, Ozdaglar 08].
In the presence of delays, agent i updates his estimate by

    x^i(k+1) = \sum_{j=1}^m a^i_j(k)\, x^j(k - t^i_j(k)) - \alpha\, d^i(k),

where t^i_j(k) is the delay in passing information from j to i.
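A companion sketch for the delayed update, again for pure averaging (f_i = 0) and with an assumed fixed one-slot delay on all links between distinct agents; this delay pattern is a simplification for illustration.

```python
import numpy as np
from collections import deque

# Assumed setting: averaging (f_i = 0) where every off-diagonal link has a
# fixed one-slot delay, i.e., t^i_j(k) = 1 for j != i and 0 for j = i.
m = 4
rng = np.random.default_rng(3)
A = np.full((m, m), 1.0 / m)
hist = deque([rng.uniform(size=m)] * 2, maxlen=2)  # estimates at k-1 and k

for k in range(100):
    x_old, x_now = hist[0], hist[-1]
    new = np.empty(m)
    for i in range(m):
        delayed = x_old.copy()     # neighbors' estimates x^j(k - 1)
        delayed[i] = x_now[i]      # agent i sees its own current estimate
        new[i] = A[i] @ delayed
    hist.append(new)

print(hist[-1])  # consensus is still reached despite the delays
```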


Applications to Social Networks
There is growing interest in the dynamics of a social network of communicating agents.
A specific example is learning and information aggregation over networks.
Consensus policies can be used to develop and analyze myopic/quasi-myopic learning models [Golub, Jackson 07], [Acemoglu, Ozdaglar, ParandehGheibi 08].

In the context of social networks, these rules may be too myopic.
Alternative approach: Bayesian learning over social networks [Acemoglu, Dahleh, Lobel, Ozdaglar 08]



References
A. Nedic and A. Ozdaglar, "Distributed Subgradient Methods for Multi-agent Optimization," IEEE Transactions on Automatic Control, 2008.
A. Ozdaglar, "Constrained Consensus and Alternating Projections," Proc. of Allerton Conference, 2007.
A. Nedic, A. Olshevsky, A. Ozdaglar, and J.N. Tsitsiklis, "On Distributed Averaging Algorithms and Quantization Effects," IEEE Transactions on Automatic Control, 2008.
A. Nedic, A. Ozdaglar, and P.A. Parrilo, "Constrained Consensus and Optimization in Multi-agent Networks," submitted for publication, 2008.
I. Lobel and A. Ozdaglar, "Distributed Subgradient Methods over Random Networks," Proc. of Allerton Conference, 2008.
D. Acemoglu, M. Dahleh, I. Lobel, and A. Ozdaglar, "Bayesian Learning in Social Networks," submitted for publication, 2008.
D. Acemoglu, A. Ozdaglar, and A. ParandehGheibi, "Spread of (Mis)Information over Social Networks," working paper, 2008.

All papers can be downloaded from web.mit.edu/asuman/www


