Greedy techniques
CSC 3102
Basics
A greedy technique constructs a solution to an optimization problem through a sequence of steps, each expanding a partially constructed solution obtained so far, until a complete solution to the problem is reached.
On each step, the choice made must be feasible, locally optimal, and irrevocable.
Examples:
Constructing a minimum spanning tree (MST) of a weighted connected graph
  Grows an MST through a greedy inclusion of the nearest vertex or shortest edge to the tree under construction
  Prim's and Kruskal's algorithms
Solving the single-source shortest-path problem
  Finds shortest paths from a given vertex (the source) to all other vertices
  Dijkstra's algorithm
Huffman tree and code
  A binary tree that minimizes the weighted path length from the root to the leaves containing a set of predefined weights
  An optimal prefix-free variable-length encoding scheme.
Connect a set of points in the cheapest possible way so that there will be a path between every pair of points
  Graph representation, e.g., a network
  Solve the minimum spanning tree problem.
A spanning tree of a connected graph is its connected acyclic subgraph (a tree) that contains all the vertices of the graph.
A minimum spanning tree of a weighted connected graph is its spanning tree of the smallest weight.
  The weight of a tree is the sum of the weights on all its edges
  Sum of the lengths of all edges
If the edge weights are unique, then there is only one minimum spanning tree; otherwise more than one MST may exist.
[Figure: a small weighted graph and two of its spanning trees, with weights W = 9 and W = 17]
Constructing an MST
Exhaustive search approach: List all spanning trees and find the one with the smallest weight.
Efficient algorithms for finding an MST of a connected weighted graph
  Prim's algorithm (R.C. Prim, 1957)
    Constructs an MST one vertex at a time by including the nearest vertex to the vertices already in the tree
  Kruskal's algorithm (J.B. Kruskal, 1956)
    Constructs an MST one edge at a time by selecting edges in increasing order of their weights, provided that the inclusion does not create a cycle.
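Prim's Algorithm
Constructs an MST through a sequence of expanding subtrees
  The initial subtree (VT) in such a sequence consists of a single vertex selected arbitrarily from the set V of the graph's vertices
  On each iteration, the current tree is expanded by attaching to it the nearest vertex not in that tree
The algorithm stops after all the graph's vertices have been included in the tree being constructed
  The tree is expanded by exactly one vertex at each iteration, so the total number of iterations is |V| - 1
  The MST is then defined by the set of edges used for the tree expansions.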
Pseudocode
Each vertex u not in the current tree VT, i.e., each vertex in the set V - VT, needs a label
Attach two labels to each non-tree vertex u
  Name of the nearest tree vertex and weight of the corresponding edge
  For vertices that are not adjacent to any tree vertex, the name label is null and the weight is infinity
Split the non-tree vertices u into two sets
  Fringe: vertices adjacent to at least one tree vertex
  Unseen: vertices yet to be affected by the algorithm.
Algorithm Prim(G)
// Input: weighted connected graph G = <V, E>
// Output: ET, the set of edges composing an MST of G
VT ← {v0}
ET ← ∅
for i ← 1 to |V| - 1 do
    find a minimum-weight edge e* = (v*, u*) among all edges (v, u)
        such that v is in VT and u is in V - VT
    VT ← VT ∪ {u*}
    ET ← ET ∪ {e*}
return ET
Two operations after finding the vertex u* to be added to the tree VT
  Move u* from the set V - VT to the set of tree vertices VT
  For each remaining vertex u in V - VT that is connected to u* by an edge shorter than u's current distance label, update its labels to u* and the weight of the edge between u* and u.
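The labeling scheme above can be sketched in Python as follows. This is a minimal sketch, not the course's reference implementation: the graph is assumed to be a dictionary of dictionaries, the names `prim`, `make_graph`, and `EXAMPLE_EDGES` are hypothetical, and the edge list is the one from the sorted edge list in the Kruskal example of these slides.

```python
import math


def make_graph(edges):
    """Hypothetical helper: build {vertex: {neighbor: weight}} from (u, v, w) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def prim(graph, start):
    """Return the MST edges (v*, u*, weight) of a connected undirected graph."""
    # Two labels per non-tree vertex: nearest tree vertex and that edge's weight.
    nearest = {u: None for u in graph}
    dist = {u: math.inf for u in graph}
    in_tree = {start}
    for u, w in graph[start].items():
        nearest[u], dist[u] = start, w

    mst_edges = []
    for _ in range(len(graph) - 1):
        # Select the non-tree vertex with the smallest weight label.
        u_star = min((u for u in graph if u not in in_tree), key=dist.get)
        mst_edges.append((nearest[u_star], u_star, dist[u_star]))
        in_tree.add(u_star)
        # Update the labels of remaining vertices that u_star brings closer.
        for u, w in graph[u_star].items():
            if u not in in_tree and w < dist[u]:
                nearest[u], dist[u] = u_star, w
    return mst_edges


EXAMPLE_EDGES = [
    ("a", "b", 3), ("b", "c", 1), ("a", "f", 5), ("a", "e", 6), ("b", "f", 4),
    ("c", "f", 4), ("c", "d", 6), ("d", "f", 5), ("d", "e", 8), ("e", "f", 2),
]
mst = prim(make_graph(EXAMPLE_EDGES), "a")
print(mst, sum(w for _, _, w in mst))
```

Scanning all non-tree vertices for the minimum label on every iteration corresponds to the unordered-array priority queue, so this sketch runs in time quadratic in the number of vertices.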
Example
The labels indicate the nearest tree vertex and edge weight. The vertex selected on each iteration becomes the tree vertex of the next row.

Tree vertices   Remaining vertices
a(-, -)         b(a, 3)  c(-, ∞)  d(-, ∞)  e(a, 6)  f(a, 5)
b(a, 3)         c(b, 1)  d(-, ∞)  e(a, 6)  f(b, 4)
c(b, 1)         d(c, 6)  e(a, 6)  f(b, 4)
f(b, 4)         d(f, 5)  e(f, 2)
e(f, 2)         d(f, 5)
d(f, 5)
[Figure: the example weighted graph on vertices a-f and the minimum spanning tree found by Prim's algorithm]
B.B. Karki, LSU
Correctness
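Correctness can be proved by induction: T0, consisting of a single vertex, must be a part of any MST.
For the ith inductive step, assume that Ti-1 is a part of some MST T, and let e = (v, u) be the minimum-weight edge from a vertex v in Ti-1 to a vertex u in G - Ti-1 that the algorithm adds to obtain Ti. Suppose, for contradiction, that no MST contains Ti; in particular, T does not contain e.
[Figure: graph G split into Ti-1 and G - Ti-1, with edge e = (v, u) crossing between them]
Because T is a spanning tree, it contains a unique path from v to u, which together with edge e forms a cycle in G. This path has to include another edge f = (v', u') connecting Ti-1 to G - Ti-1.
T + e - f is another spanning tree, whose weight is no larger than that of T because e weighs no more than f.
So T + e - f is an MST that contains Ti, which contradicts the assumption and completes the proof.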
Efficiency
Efficiency depends on the data structures chosen for the graph itself and the priority queue of the set V - VT whose vertex priorities are the distances (edge weights) to the nearest tree vertices.
For a graph represented by its weight (adjacency) matrix and the priority queue implemented as an unordered array, the running time is Θ(|V|^2).
A min-heap is a complete binary tree in which every element is less than or equal to its children. The root contains the smallest element. Deletion of the smallest element and insertion of a new element in a min-heap of size n are O(log n) operations, and so is the operation of changing an element's priority.
For a graph represented by its adjacency linked lists and the priority queue implemented as a min-heap, the running time is O(|E| log |V|).
  Use a heap to remember, for each vertex, the smallest edge connecting VT with that vertex.
  Perform |V| - 1 steps in which we remove the smallest element in the heap, and at most 2|E| steps in which we examine an edge e = (v, u). For each of these steps, we might replace a value on the heap, reducing its weight.
  Each of these heap operations costs O(log |V|), which gives O(|E| log |V|) in total for a connected graph.
Algorithm PrimWithHeaps(G)
VT ← {v0}
ET ← ∅
make a heap of values (vertex, edge, wt(edge))
for i ← 1 to |V| - 1 do
    let (u*, e*, wt(e*)) have the smallest weight in the heap
    remove (u*, e*, wt(e*)) from the heap
    add u* to VT and e* to ET
    for each edge e = (u*, u) do
        if u is not already in VT
            find value (u, f, wt(f)) in heap
            if wt(e) < wt(f)
                replace (u, f, wt(f)) with (u, e, wt(e))
return ET
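A heap-based variant can be sketched with Python's standard `heapq` module. Note the swapped-in technique: `heapq` has no operation for replacing a value inside the heap, so this sketch pushes a new entry for every candidate edge and skips stale entries when they are popped (lazy deletion) instead of the in-place replacement used in the pseudocode. The function name `prim_with_heap` and the dictionary `GRAPH` are hypothetical; the edge weights come from the sorted edge list in the Kruskal example of these slides.

```python
import heapq


def prim_with_heap(graph, start):
    """Return (total weight, MST edges); graph maps vertex -> {neighbor: weight}."""
    in_tree = {start}
    # Heap entries are (weight, tree vertex, candidate vertex).
    heap = [(w, start, u) for u, w in graph[start].items()]
    heapq.heapify(heap)
    mst_edges, total = [], 0
    while heap and len(in_tree) < len(graph):
        w, v, u = heapq.heappop(heap)
        if u in in_tree:
            continue  # stale entry: u was reached earlier by a lighter edge
        in_tree.add(u)
        mst_edges.append((v, u, w))
        total += w
        for x, wx in graph[u].items():
            if x not in in_tree:
                heapq.heappush(heap, (wx, u, x))
    return total, mst_edges


GRAPH = {
    "a": {"b": 3, "e": 6, "f": 5},
    "b": {"a": 3, "c": 1, "f": 4},
    "c": {"b": 1, "d": 6, "f": 4},
    "d": {"c": 6, "e": 8, "f": 5},
    "e": {"a": 6, "d": 8, "f": 2},
    "f": {"a": 5, "b": 4, "c": 4, "d": 5, "e": 2},
}
total, edges = prim_with_heap(GRAPH, "a")
print(total, edges)
```

With lazy deletion the heap may hold up to 2|E| entries, so each operation costs O(log |E|), which is still O(log |V|) and keeps the overall O(|E| log |V|) bound.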
Kruskal's Algorithm
A greedy algorithm for constructing a minimum spanning tree (MST) of a weighted connected graph.
  Finds an acyclic subgraph with |V| - 1 edges for which the sum of the edge weights is the smallest.
  Constructs an MST as an expanding sequence of subgraphs, which are always acyclic but are not necessarily connected until the final stage.
The algorithm begins by sorting the graph's edges in non-decreasing order of their weights and then scans the list, adding the next edge on the list to the current subgraph provided that the inclusion does not create a cycle.

Algorithm Kruskal(G)
// Edges are sorted so that w(ei1) ≤ w(ei2) ≤ ... ≤ w(ei|E|)
ET ← ∅; ecounter ← 0
k ← 0
while ecounter < |V| - 1 do
    k ← k + 1
    if ET ∪ {eik} is acyclic
        ET ← ET ∪ {eik}; ecounter ← ecounter + 1
return ET
Example
[Figure: the example weighted graph on vertices a-f]
Sorted list of graph edges (the selected edges are bc, ef, ab, bf, df):

Edge     bc  ef  ab  bf  cf  af  df  ae  cd  de
Weight    1   2   3   4   4   5   5   6   6   8

Picking any of the remaining edges (cf, af, ae, cd, de) would create a cycle. For a graph of 6 vertices, only five edges need to be picked.
[Figure: the resulting minimum spanning tree; total weight = 15]
On each iteration (operation), the algorithm takes the next edge (u, v) from the ordered list of the graph's edges, finds the trees containing the vertices u and v, and, if these trees are not the same, unites them in a larger tree by adding the edge. This avoids a cycle.
Checking whether two vertices belong to two different trees requires an application of the union-find algorithm.
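The edge scan with the "different trees" test can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `kruskal` and the `parent` dictionary are hypothetical, the cycle test uses a bare parent-pointer forest without any balancing, and the edge list is the sorted list from the example above.

```python
def kruskal(vertices, edges):
    """Return the MST edges of a connected graph given as (u, v, weight) triples."""
    parent = {v: v for v in vertices}  # every vertex starts as its own tree

    def find(x):
        # Follow parent pointers up to the root of x's tree.
        while parent[x] != x:
            x = parent[x]
        return x

    tree_edges = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # different trees: no cycle is created
            parent[root_u] = root_v   # unite the two trees
            tree_edges.append((u, v, w))
            if len(tree_edges) == len(vertices) - 1:
                break                 # the spanning tree is complete
    return tree_edges


EDGES = [
    ("b", "c", 1), ("e", "f", 2), ("a", "b", 3), ("b", "f", 4), ("c", "f", 4),
    ("a", "f", 5), ("d", "f", 5), ("a", "e", 6), ("c", "d", 6), ("d", "e", 8),
]
mst = kruskal("abcdef", EDGES)
print([u + v for u, v, _ in mst], sum(w for _, _, w in mst))
```

The scan stops as soon as |V| - 1 edges have been accepted, so the edges ae, cd, and de are never examined in this example.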
Union-Find Algorithm
Kruskal's algorithm requires a dynamic partition of some n-element set S into a collection of disjoint subsets S1, S2, ..., Sk.
  Initialization: each disjoint subset is a one-element subset containing a different element of S.
  Union-find operation: acts on the collection of n one-element subsets to give larger subsets.
Abstract data type for the finite set:
  makeset(x) - creates a one-element set {x}
  find(x) - returns a subset containing x
  union(x, y) - constructs the union of disjoint subsets containing x and y.
Subset's representative: use one element from each of the disjoint subsets in a collection.
Two principal implementations
  Quick find - uses an array indexed by the elements of the set; the array's values indicate the representatives of the subsets containing those elements. Each subset is implemented as a linked list.
  Quick union - represents each subset by a rooted tree with one element per node and the root's element as the subset's representative.
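The quick find variant can be sketched in Python as follows. This is an illustrative sketch only: the class name `QuickFind` is hypothetical, a dictionary stands in for the array of representatives, Python lists stand in for the linked lists, and `find` returns the subset's representative rather than the subset itself. Relabeling the smaller subset on each union (union by size) is an added assumption, not something stated on the slide.

```python
class QuickFind:
    """Disjoint subsets with constant-time find and relabeling union."""

    def __init__(self):
        self.rep = {}      # element -> representative of its subset
        self.members = {}  # representative -> list of the subset's elements

    def makeset(self, x):
        """Create the one-element subset {x}."""
        self.rep[x] = x
        self.members[x] = [x]

    def find(self, x):
        """Return the representative of the subset containing x."""
        return self.rep[x]

    def union(self, x, y):
        """Merge the subsets containing x and y (no effect if already merged)."""
        rx, ry = self.rep[x], self.rep[y]
        if rx == ry:
            return
        # Relabel the smaller subset so fewer entries have to change.
        if len(self.members[rx]) < len(self.members[ry]):
            rx, ry = ry, rx
        for z in self.members[ry]:
            self.rep[z] = rx
        self.members[rx].extend(self.members.pop(ry))


sets = QuickFind()
for x in range(1, 7):
    sets.makeset(x)
sets.union(1, 4)
sets.union(4, 5)
sets.union(3, 6)
print(sets.find(1) == sets.find(5), sets.find(3) == sets.find(1))
```

Here `find` is a single lookup, while `union` pays for the relabeling; quick union makes the opposite trade by linking roots cheaply and walking up the tree in `find`.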
Problem Statement
For a given vertex called the source in a weighted connected graph, find the shortest paths to all its other vertices.
  A family of paths, each leading from the source to a different vertex in the graph.
  The resulting tree is a spanning tree.
A variety of applications exist: e.g., to find the shortest route between two cities.
Dijkstra's algorithm finds the shortest paths to the graph's vertices in order of their distance from a given source.
  Works for a graph with nonnegative edge weights.
Different versions of the problem:
  Single-pair shortest-path problem
  Single-destination shortest-paths problem
  All-pairs shortest-paths problem
  Traveling salesman problem.
[Figure: a weighted graph on vertices a-e and a tree representing all possible shortest paths to four vertices, b, d, c and e, from the source a, of path lengths 3, 5, 7 and 9, respectively. If the source is different, then a different tree results.]
Dijkstra's Algorithm
Dijkstra's algorithm works in the same way as Prim's algorithm does.
  Both construct an expanding subtree of vertices by selecting the next vertex from the priority queue of the remaining vertices and using similar labeling.
  However, the priorities are computed differently: Dijkstra's algorithm compares path lengths (by adding edge weights) while Prim's algorithm compares the edge weights as given.
The algorithm works by first finding the shortest path from the source to a vertex nearest to it, then to a second nearest, and so on.
  Before the ith iteration, the vertices already reached, the source, and the edges of the shortest paths leading to them from the source form a subtree Ti of the given graph.
  The next vertices nearest to the source can be found among the vertices adjacent to the vertices of Ti.
  These adjacent vertices are referred to as fringe vertices. They are the candidates from which the algorithm selects the next vertex nearest to the source.
Labeling
For every fringe vertex u, the algorithm computes the sum of the distance to the nearest tree vertex v (the weight of the edge (v, u)) and the length dv of the shortest path from the source to v, and selects the vertex with the smallest such sum.
Each vertex has two labels.
  The numeric label d indicates the length of the shortest path from the source to this vertex found by the algorithm so far
    When a vertex is added to the tree, d indicates the length of the shortest path from the source to that vertex.
  The other label indicates the name of the next-to-last vertex on such a path
    The parent of the vertex in the tree being constructed.
With such labeling, finding the next nearest vertex u* becomes a simple task of finding a fringe vertex with the smallest d value.
[Figure: the source v0, a tree vertex v*, and the fringe vertex u* selected next]
Pseudocode
Shows explicit operations on two sets of labeled vertices
  The set VT of vertices for which a shortest path has already been found
  The priority queue Q of the fringe vertices.
Initialize: initialize vertex priority queue to empty.
Insert: initialize vertex priority in the priority queue.
Decrease: update priority of s with ds.
DeleteMin: delete the minimum priority element.
Algorithm Dijkstra(G, s)
// Input: weighted connected graph G = <V, E> and its vertex s
// Output: the length dv of a shortest path from s to v, and its
//         penultimate vertex pv, for every vertex v in V
//         (pv is the list of predecessors for each v)
Initialize(Q)
for every vertex v in V do
    dv ← ∞; pv ← null
    Insert(Q, v, dv)
ds ← 0; Decrease(Q, s, ds)
VT ← ∅
for i ← 0 to |V| - 1 do
    u* ← DeleteMin(Q)
    VT ← VT ∪ {u*}
    for every vertex u in V - VT that is adjacent to u* do
        if du* + w(u*, u) < du
            du ← du* + w(u*, u); pu ← u*
            Decrease(Q, u, du)
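The pseudocode can be sketched in Python with the standard `heapq` module. Two assumptions are worth stating loudly. First, `heapq` offers no Decrease operation, so this sketch pushes a fresh entry whenever a label improves and skips outdated entries when they are popped. Second, the edge weights in `GRAPH` are an assumed reconstruction, chosen to be consistent with the path lengths 3, 5, 7 and 9 quoted for the example; the figure itself did not survive. The names `dijkstra` and `GRAPH` are hypothetical.

```python
import heapq
import math


def dijkstra(graph, source):
    """Return (dist, parent) for a graph {vertex: {neighbor: weight}} with nonnegative weights."""
    dist = {v: math.inf for v in graph}   # label d: best path length found so far
    parent = {v: None for v in graph}     # label p: next-to-last vertex on that path
    dist[source] = 0
    heap = [(0, source)]
    done = set()                          # tree vertices (VT in the pseudocode)
    while heap:
        d, u_star = heapq.heappop(heap)
        if u_star in done:
            continue                      # outdated entry, a shorter path was found
        done.add(u_star)
        for u, w in graph[u_star].items():
            if u not in done and d + w < dist[u]:
                dist[u] = d + w
                parent[u] = u_star
                heapq.heappush(heap, (dist[u], u))
    return dist, parent


# Assumed edge weights (not recoverable from the slide text itself).
GRAPH = {
    "a": {"b": 3, "d": 7},
    "b": {"a": 3, "c": 4, "d": 2},
    "c": {"b": 4, "d": 5, "e": 6},
    "d": {"a": 7, "b": 2, "c": 5, "e": 4},
    "e": {"c": 6, "d": 4},
}
dist, parent = dijkstra(GRAPH, "a")
print(dist, parent)
```

Following the `parent` labels backward from any vertex recovers its shortest path from the source; in this assumed graph the path to e runs through b and d.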
Example
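Tree vertices and remaining vertices. The labels indicate the nearest tree vertex and path length.
[Table: tree vertices and remaining vertices on each iteration, starting with the source a(-, 0) and ending with e(d, 9)]
[Figure: the example weighted graph on vertices a-e and the resulting shortest-path tree, with path lengths d = 5 and e = 9]
Shortest paths from a to b, d, c and e have lengths 3, 5, 7 and 9, respectively.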
Correctness
Correctness can be proved by induction on i, the assertion being that the tree Ti contains the i vertices closest to the source (edge weights are nonnegative):
For i = 1, the assertion is true for the trivial path from the source to itself.
For the general step, assume that it is true for the algorithm's tree Ti with i vertices.
Let vi+1 be the vertex to be added next to the tree by the algorithm.
All vertices preceding vi+1 on a shortest path from s to vi+1 must be in Ti because they are closer to s than vi+1.
Hence, the (i+1)st closest vertex can be selected as the algorithm does: by minimizing the sum of dv and the length of the edge from v to an adjacent vertex not in the tree.
dv is the length of the shortest path from s to v (contained in Ti) by the assumption of induction.
Efficiency
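Efficiency depends on the data structures chosen for the graph itself and the priority queue.
For a graph represented by its weight matrix and the priority queue implemented as an unordered array, the running time is Θ(|V|^2).
For a graph represented by its adjacency linked lists and the priority queue implemented as a min-heap, the running time is O(|E| log |V|).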
Huffman coding is a variable-length encoding scheme that assigns bit strings to characters based on their frequencies in a given text.
  Uses a greedy construction of a binary tree whose leaves represent the alphabet characters and whose left and right edges are labeled with 0s and 1s.
  Assigns shorter bit strings to high-frequency characters and longer ones to low-frequency characters.
Huffman's Algorithm
Initialize n one-node trees and label them with characters of the alphabet, with the frequency of each character recorded in its tree's root as the tree's weight.
Repeat until a single tree is obtained: find the two trees with the smallest weights, make them the left and right subtrees of a new tree, and record the sum of their weights in the root of the new tree as its weight.
  The resulting binary tree is called a Huffman tree.
Obtain the codeword of a character by recording the labels (0 or 1) on the simple path from the root to the character's leaf.
  This is the Huffman code
  It provides an optimal encoding.
Dynamic Huffman encoding
  The coding tree is updated each time a new character is read from the source text.
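The construction can be sketched in Python with a min-heap from the standard `heapq` module. This is a minimal sketch, not a definitive implementation: the function name `huffman_codes` is hypothetical, symbols are assumed to be strings (internal nodes are stored as tuples), at least two symbols are assumed, and the weights are the example's probabilities scaled to integer percentages to avoid floating-point ties. Ties between equal weights are broken by insertion order, which is an arbitrary choice.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Return {symbol: codeword} for a dict {symbol: weight} with two or more symbols."""
    tie = count()  # running counter so the heap never has to compare trees
    heap = [(w, next(tie), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest trees into a new tree of combined weight.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], prefix + "0")   # left edge is labeled 0
            walk(node[1], prefix + "1")   # right edge is labeled 1
        else:                             # leaf: a symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


WEIGHTS = {"A": 35, "B": 10, "C": 20, "D": 20, "-": 15}  # percentages
codes = huffman_codes(WEIGHTS)
print(codes)
```

Because every symbol sits at a leaf, no codeword is a prefix of another, which is what makes the variable-length code decodable without separators.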
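Character     A      B      C      D      -
Probability   0.35   0.1    0.2    0.2    0.15
Codeword      11     100    00     01     101

[Figure: Huffman tree with root weight 1.0; its subtrees have weights 0.4 (leaves C 0.2 and D 0.2) and 0.6 (a subtree of weight 0.25 with leaves B 0.1 and - 0.15, and the leaf A 0.35)]

Average codeword length: $\bar{l} = \sum_{i=1}^{5} l_i p_i = 2.25$
Variance: $\mathrm{Var} = \sum_{i=1}^{5} (l_i - \bar{l})^2 p_i \approx 0.19$
In the fixed-length scheme each codeword will contain three bits. So the Huffman code saves 3 - 2.25 = 0.75 bits per character, a compression of 0.75 / 3 = 25%.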
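The figures above can be checked with a few lines of Python. This is only an arithmetic check under the example's probabilities and codewords; the variable names are hypothetical.

```python
# (probability, codeword) for each character of the example
TABLE = {
    "A": (0.35, "11"),
    "B": (0.10, "100"),
    "C": (0.20, "00"),
    "D": (0.20, "01"),
    "-": (0.15, "101"),
}

# Average codeword length: sum of l_i * p_i
avg = sum(p * len(code) for p, code in TABLE.values())
# Variance of the codeword length around that average
var = sum(p * (len(code) - avg) ** 2 for p, code in TABLE.values())
# Saving relative to a 3-bit fixed-length code
saving = (3 - avg) / 3

print(round(avg, 2), round(var, 4), round(saving, 2))
```

The exact variance is 0.1875, which the slide rounds to 0.19.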