
 
• A famous quote: Program = Algorithm + Data Structure.
• All of you have programmed, and thus have already been exposed to algorithms and data structures.
• Perhaps you didn't see them as separate entities.
• Perhaps you saw data structures as simple programming constructs (provided by the STL, the Standard Template Library).
• However, data structures are quite distinct from algorithms, and very important in their own right.

• The main focus of this course is to introduce you to a systematic study of algorithms and data structures.
• The two guiding principles of the course are abstraction and formal analysis.
• Abstraction: We focus on topics that are broadly applicable to a variety of problems.
• Analysis: We want a formal way to compare two objects (data structures or algorithms).
• In particular, we will worry about "always correct"-ness, and worst-case bounds on time and memory (space).

• Foundations of Algorithm Analysis and Data Structures.
• Analysis:
  - How to predict an algorithm's performance
  - How well an algorithm scales up
  - How to compare different algorithms for a problem
• Data Structures
  - How to efficiently store, access, and manage data
  - Data structures affect an algorithm's performance



• Constructions of Euclid
• Newton's root finding
• Fast Fourier Transform
• Compression (Huffman, Lempel-Ziv, GIF, MPEG)
• DES, RSA encryption
• Simplex algorithm for linear programming
• Shortest Path Algorithms (Dijkstra, Bellman-Ford)
• Error correcting codes (CDs, DVDs)
• TCP congestion control, IP routing
• Pattern matching (Genomics)
• Search Engines

• Two algorithms for computing the factorial
• Which one is better?

    int factorial(int n) {
        if (n <= 1) return 1;
        else return n * factorial(n - 1);
    }

[Second listing not legible in the source.]
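As an assumption only, a natural counterpart to the recursive factorial above is a simple loop (the slide's own second listing is not legible, so it may differ from this). The names `factorialRecursive` and `factorialIterative` and the `unsigned long long` return type are choices made for this sketch, not taken from the slides.

```cpp
#include <cassert>

// Recursive version, as on the slide (return type widened to delay overflow).
unsigned long long factorialRecursive(int n) {
    assert(n >= 0);
    if (n <= 1) return 1;
    return n * factorialRecursive(n - 1);
}

// Hypothetical iterative counterpart: same result, no growth of the call stack.
unsigned long long factorialIterative(int n) {
    assert(n >= 0);
    unsigned long long result = 1;
    for (int i = 2; i <= n; ++i)
        result *= i;
    return result;
}
```

Both versions perform about n multiplications; the loop avoids the O(n) recursion depth, which is one possible answer to "which one is better?".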

Role of Algorithms in Modern World
• Enormous amount of data
  - E-commerce (Amazon, eBay)
  - Network traffic (telecom billing, monitoring)
  - Database transactions (sales, inventory)
  - Scientific measurements (astrophysics, geology)
  - Sensor networks, RFID tags
  - Bioinformatics (genome, protein bank)
• Amazon hired its first Chief Algorithms Officer (Udi Manber)

A Real-World Problem
• Communication in the Internet
• A message (email, ftp) is broken down into IP packets.
• Sender and receiver are identified by IP address.
• The packets are routed through the Internet by special computers called routers.
• Each packet is stamped with its destination address, but not the route.
• Because the Internet topology and network load are constantly changing, routers must discover routes dynamically.
• What should the routing table look like?

IP Prefixes and Routing
• Each router is really a switch: it receives packets at several input ports, and appropriately sends them out to output ports.
• Thus, for each packet, the router needs to transfer the packet to the output port that gets it closer to its destination.
• Should each router keep a table: IP address x Output Port?
• How big is this table?
• When a link or router fails, how much information would need to be modified?
• A router typically forwards several million packets/sec!

Data Structures
• IP packet forwarding is a data structure problem!
• Efficiency and scalability are very important.
• Similarly, how does Google find the documents matching your query so fast?
• It uses sophisticated algorithms to create index structures, which are just data structures.
• Algorithms and data structures are ubiquitous.
• With the data glut created by new technologies, the need to organize, search, and update MASSIVE amounts of information FAST is more severe than ever before.

Algorithms to Process these Data
• Which are the top K sellers?
• Correlation between time spent at a web site and purchase amount?
• Which flows at a router account for > 1% of traffic?
• Did source S send a packet in the last s seconds?
• Send an alarm if any international arrival matches a profile in the database.
• Similarity matches against genome databases
• Etc.

Max Subsequence Problem
• Given a sequence of integers A1, A2, ..., An, find the maximum possible sum of a contiguous subsequence Ai, ..., Aj.
• Numbers can be negative.
• You want a contiguous chunk with the largest sum.
• Example: -2, 11, -4, 13, -5, -2
• The answer is 20 (subsequence A2 through A4).
• We will discuss 4 different algorithms, with time complexities O(n^3), O(n^2), O(n log n), and O(n).
• With n = 10^6, algorithm 1 may take > 10 years; algorithm 4 will take a fraction of a second!

Algorithm 1 for Max Subsequence Sum
• Given A1, ..., An, find the maximum value of Ai + Ai+1 + ... + Aj (0 if the max value is negative).

[Code listing not legible in the source.]

• Time complexity: O(n^3)
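The listing for Algorithm 1 is not legible in the extracted text. A minimal sketch of a cubic brute-force approach consistent with the O(n^3) bound might look like the following; the function name `maxSubSumCubic` and the `std::vector<int>` input are assumptions of this sketch, not the slide's own code.

```cpp
#include <cstddef>
#include <vector>

// Brute force: try every (i, j) pair and re-add a[i..j] from scratch.
// Returns 0 when every element is negative (the empty subsequence).
int maxSubSumCubic(const std::vector<int>& a) {
    int maxSum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        for (std::size_t j = i; j < a.size(); ++j) {
            int thisSum = 0;
            for (std::size_t k = i; k <= j; ++k)  // innermost loop costs O(n)
                thisSum += a[k];
            if (thisSum > maxSum)
                maxSum = thisSum;
        }
    }
    return maxSum;
}
```

Three nested loops over at most n indices each give the O(n^3) running time.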

Algorithm 2
• Idea: Given the sum from i to j-1, we can compute the sum from i to j in constant time.
• This eliminates one nested loop, and reduces the running time to O(n^2).

[Code listing not legible in the source.]
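The idea of extending the sum from i to j-1 by one element can be sketched as below. The listing on the slide is not legible, so this is an illustration under assumptions: the name `maxSubSumQuadratic` and the `std::vector<int>` input are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// For each start i, keep a running sum of a[i..j] instead of recomputing it.
// Returns 0 when every element is negative (the empty subsequence).
int maxSubSumQuadratic(const std::vector<int>& a) {
    int maxSum = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        int thisSum = 0;                  // running sum of a[i..j]
        for (std::size_t j = i; j < a.size(); ++j) {
            thisSum += a[j];              // extend a[i..j-1] by one element: O(1)
            if (thisSum > maxSum)
                maxSum = thisSum;
        }
    }
    return maxSum;
}
```

Only two nested loops remain, which matches the O(n^2) bound stated on the slide.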

Algorithm 3
• This algorithm uses the divide-and-conquer paradigm.
• Suppose we split the input sequence at the midpoint.
• The max subsequence is entirely in the left half, entirely in the right half, or it straddles the midpoint.
• Example:
      left half  | right half
      4 -3 5 -2  | -1 2 6 -2
• Max in left is 6 (A1 through A3); max in right is 8 (A6 through A7). But the straddling max is 11 (A1 through A7).

Algorithm 3 (cont.)
• Example:
      left half  | right half
      4 -3 5 -2  | -1 2 6 -2
• Max subsequences in each half are found by recursion.
• How do we find the straddling max subsequence?
• Key Observation:
  - The left half of the straddling sequence is the max subsequence ending with -2.
  - The right half is the max subsequence beginning with -1.
• A linear scan lets us compute these in O(n) time.
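The three cases (entirely left, entirely right, straddling the midpoint) can be sketched as a recursive function. This is a minimal illustration, not the slide author's code; the names `maxSubSumRec` and `maxSubSumDivideConquer` and the `std::vector<int>` input are assumptions.

```cpp
#include <algorithm>
#include <vector>

// Max contiguous sum within a[left..right] (inclusive); 0 if all are negative.
int maxSubSumRec(const std::vector<int>& a, int left, int right) {
    if (left > right) return 0;                      // empty range
    if (left == right) return std::max(a[left], 0);  // single element

    int mid = left + (right - left) / 2;
    int maxLeft  = maxSubSumRec(a, left, mid);
    int maxRight = maxSubSumRec(a, mid + 1, right);

    // Best run ending at index mid (0 if every such run is negative).
    int bestLeftBorder = 0, sum = 0;
    for (int i = mid; i >= left; --i) {
        sum += a[i];
        bestLeftBorder = std::max(bestLeftBorder, sum);
    }
    // Best run starting at index mid + 1 (0 if every such run is negative).
    int bestRightBorder = 0;
    sum = 0;
    for (int j = mid + 1; j <= right; ++j) {
        sum += a[j];
        bestRightBorder = std::max(bestRightBorder, sum);
    }
    return std::max({maxLeft, maxRight, bestLeftBorder + bestRightBorder});
}

int maxSubSumDivideConquer(const std::vector<int>& a) {
    return maxSubSumRec(a, 0, static_cast<int>(a.size()) - 1);
}
```

On the slide's example, the left border scan yields 4 (ending with -2) and the right border scan yields 7 (beginning with -1), giving the straddling sum 11.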

Algorithm 3: Analysis
• The divide-and-conquer algorithm is best analyzed through a recurrence:

      T(1) = 1
      T(n) = 2T(n/2) + O(n)

• This recurrence solves to T(n) = O(n log n).
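To see why the recurrence T(n) = 2T(n/2) + O(n) gives O(n log n), it can be unrolled level by level. The sketch below assumes n is a power of two and writes the O(n) term as cn for some constant c; both are simplifying assumptions rather than part of the slide.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^{k}\,T(n/2^{k}) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n).
\end{align*}
```

Each of the log2(n) levels of recursion contributes cn work in total, and the n base cases contribute n T(1).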

Algorithm 4

    int maxSum = 0, thisSum = 0;

    for( int j = 0; j < a.size( ); j++ )
    {
        thisSum += a[ j ];

        if ( thisSum > maxSum )
            maxSum = thisSum;
        else if ( thisSum < 0 )
            thisSum = 0;
    }
    return maxSum;

• Time complexity is clearly O(n).
• But why does it work? That is, we need a proof of correctness.

Proof of Correctness
• The max subsequence cannot start or end at a negative Ai.
• More generally, the max subsequence cannot have a prefix with a negative sum.
      Ex: -2 11 -4 13 -5 -2
• Thus, if we ever find that Ai through Aj sums to < 0, then we can advance i to j+1.
  - Proof. Suppose j is the first index after i at which the sum becomes < 0.
  - The max subsequence cannot start at any p between i and j: Ai through Ap-1 has a non-negative sum, so starting at i would have been at least as good.

Algorithm 4

    int maxSum = 0, thisSum = 0;
    for( int j = 0; j < a.size( ); j++ )
    {
        thisSum += a[ j ];
        if ( thisSum > maxSum )
            maxSum = thisSum;       // new best sum so far
        else if ( thisSum < 0 )
            thisSum = 0;            // negative prefix: start over after j
    }
    return maxSum;

• The algorithm resets whenever the prefix sum is < 0. Otherwise, it forms new sums and updates maxSum in one pass.
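As a hedged illustration of the "advance i to j+1" step from the proof, the one-pass loop can be wrapped in a function that also records where the best run starts and ends. The struct `MaxSubResult`, the function name `maxSubSumLinear`, and the half-open index convention are assumptions added for this sketch.

```cpp
#include <cstddef>
#include <vector>

struct MaxSubResult {
    int sum;            // best sum found (0 if all elements are negative)
    std::size_t begin;  // 0-based index of the first element of the best run
    std::size_t end;    // one past the last element; begin == end means empty
};

MaxSubResult maxSubSumLinear(const std::vector<int>& a) {
    MaxSubResult best{0, 0, 0};
    int thisSum = 0;
    std::size_t start = 0;                // where the current run begins
    for (std::size_t j = 0; j < a.size(); ++j) {
        thisSum += a[j];
        if (thisSum > best.sum) {
            best = {thisSum, start, j + 1};
        } else if (thisSum < 0) {
            thisSum = 0;                  // negative prefix: restart after j
            start = j + 1;
        }
    }
    return best;
}
```

For the sequence -2, 11, -4, 13, -5, -2 this reports sum 20 over 0-based indices 1 to 3, which is A2 through A4 in the slides' 1-based numbering.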

Why Efficient Algorithms Matter
• Suppose N = 10^6.
• A PC can read/process N records in 1 sec.
• But if some algorithm does N*N computation, then it takes 1M seconds, about 11 days!!!
• 100-city Traveling Salesman Problem.
  - A supercomputer checking 100 billion tours/sec still requires 10^100 years!
• Fast factoring algorithms can break encryption schemes. Algorithms research determines what code length is safe (> 100 digits).

How to Measure Algorithm Performance
• What metric should be used to judge algorithms?
  - Length of the program (lines of code)
  - Ease of programming (bugs, maintenance)
  - Memory required
  - Running time
• Running time is the metric of choice:
  - Quantifiable and easy to compare
  - Often the critical bottleneck

Abstraction
• An algorithm may run differently depending on:
  - the hardware platform (PC, Cray, Sun)
  - the programming language (C, Java, C++)
  - the programmer (you, me, Bill Joy)
• While different in detail, all hardware and programming models are equivalent in some sense.
• It suffices to count basic operations.
• This is a crude but valuable measure of an algorithm's performance as a function of input size.

Average, Best, and Worst-Case
• On which input instances should the algorithm's performance be judged?
• Average case:
  - Real-world distributions are difficult to predict
• Best case:
  - Seems unrealistic
• Worst case:
  - Gives an absolute guarantee

Examples
• Vector addition Z = A + B

      for (int i = 0; i < n; i++)
          Z[i] = A[i] + B[i];

  T(n) = c n, for some constant c

• Vector (inner) multiplication z = A * B

      z = 0;
      for (int i = 0; i < n; i++)
          z = z + A[i] * B[i];

  T(n) = c' + c n, for some constants c' and c

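Examples
• Vector (outer) multiplication Z = A * B^T

      for (int i = 0; i < n; i++)
          for (int j = 0; j < n; j++)
              Z[i][j] = A[i] * B[j];

  T(n) = c n^2, for some constant c

• A program does all the above

  T(n) = c0 + c1 n + c2 n^2, for some constants c0, c1, c2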

Simplifying the Bound
• T(n) = c_k n^k + c_(k-1) n^(k-1) + c_(k-2) n^(k-2) + ... + c_1 n + c_0
  - too complicated
  - too many terms
  - difficult to compare two expressions, each with 10 or 20 terms
• Do we really need that many terms?

Simplifications
• Keep just one term!
  - the fastest growing term (dominates the runtime)
• No constant coefficients are kept
  - Constant coefficients are affected by machines, languages, etc.
• Asymptotic behavior (as n gets large) is determined entirely by the leading term.
  - Example: T(n) = 10n^3 + n^2 + 40n + 800
      If n = 1,000, then T(n) = 10,001,040,800
      The error is 0.01% if we drop all but the n^3 term
  - In an assembly line, the slowest worker determines the throughput rate
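The quoted value T(1,000) = 10,001,040,800 is consistent with the polynomial T(n) = 10n^3 + n^2 + 40n + 800, and the arithmetic can be checked directly. The function names below are hypothetical helpers for this check only.

```cpp
// Full cost polynomial: T(n) = 10n^3 + n^2 + 40n + 800.
long long fullCost(long long n) {
    return 10 * n * n * n + n * n + 40 * n + 800;
}

// Leading term only: 10n^3.
long long leadingTerm(long long n) {
    return 10 * n * n * n;
}

// Fraction of T(n) that is lost by keeping only the leading term.
double relativeError(long long n) {
    return static_cast<double>(fullCost(n) - leadingTerm(n)) /
           static_cast<double>(fullCost(n));
}
```

At n = 1,000 the dropped terms sum to 1,040,800 out of 10,001,040,800, a relative error of roughly 0.0001, that is, about 0.01%.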

Simplification
• Drop the constant coefficient
  - Does not affect the relative order

Simplification
• The faster growing term (such as 2^n) eventually will outgrow the slower growing terms (e.g., 1000 n) no matter what their coefficients!
• Put another way, given a certain increase in allocated time, a higher order algorithm will not reap the benefit by solving a much larger problem.


Another View
• More resources (time and/or processing power) translate into larger problems solved if complexity is low.

  T(n)     Problem size solved   Problem size solved   Increase in
           in time 10^3          in time 10^4          problem size
  100n     10                    100                   10
  1000n    1                     10                    10
  5n^2     14                    45                    3.2
  n^3      10                    22                    2.2
  2^n      10                    13                    1.3


Caveats
• Follow the spirit, not the letter
  - a 100n algorithm is more expensive than an n^2 algorithm when n < 100
• Other considerations:
  - a program used only a few times
  - a program run on small data sets
  - ease of coding, porting, maintenance
  - memory requirements

Asymptotic Notations
• Big-O, "bounded above by": T(n) = O(f(n))
  - For some c > 0 and N, T(n) ≤ c f(n) whenever n > N.
• Big-Omega, "bounded below by": T(n) = Ω(f(n))
  - For some c > 0 and N, T(n) ≥ c f(n) whenever n > N.
  - Same as f(n) = O(T(n)).
• Big-Theta, "bounded above and below": T(n) = Θ(f(n))
  - T(n) = O(f(n)) and also T(n) = Ω(f(n))
• Little-o, "strictly bounded above": T(n) = o(f(n))
  - T(n)/f(n) → 0 as n → ∞
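As a worked instance of the Big-O definition, consider the polynomial T(n) = 10n^3 + n^2 + 40n + 800 (chosen here for illustration). For n ≥ 1, every lower power of n is at most n^3, which gives explicit constants:

```latex
\begin{align*}
T(n) &= 10n^3 + n^2 + 40n + 800 \\
     &\le 10n^3 + n^3 + 40n^3 + 800n^3 \qquad (n \ge 1) \\
     &= 851\,n^3 .
\end{align*}
```

So T(n) = O(n^3) with c = 851 for all n ≥ 1. Since also T(n) ≥ 10n^3 for all n ≥ 1, T(n) = Ω(n^3), and therefore T(n) = Θ(n^3).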

By Pictures
• Big-Oh (most commonly used)
  - bounded above
• Big-Omega
  - bounded below
• Big-Theta
  - exactly
• Small-o
  - not as expensive as ...


Summary (Why O(n)?)
• T(n) = c_k n^k + c_(k-1) n^(k-1) + c_(k-2) n^(k-2) + ... + c_1 n + c_0
• Too complicated
• O(n^k)
  - a single term with constant coefficient dropped
• Much simpler; extra terms and coefficients do not matter asymptotically
• Other criteria are hard to quantify

Runtime Analysis
• Useful rules
  - simple statements (read, write, assign)
      O(1) (constant)
  - simple operations (+ - * / == > >= < <=)
      O(1)
  - sequence of simple statements/operations
      rule of sums
  - for, do, while loops
      rule of products

Runtime Analysis (cont.)
• Two important rules
  - Rule of sums
      if you do a number of operations in sequence, the runtime is dominated by the most expensive operation
  - Rule of products
      if you repeat an operation a number of times, the total runtime is the runtime of the operation multiplied by the iteration count

Runtime Analysis (cont.)

    if (cond) then        O(1)
        body1             T1(n)
    else
        body2             T2(n)
    endif

    T(n) = O(max(T1(n), T2(n)))

Runtime Analysis (cont.)
• Method calls
  - A calls B
  - B calls C
  - etc.
• A sequence of operations when call sequences are flattened:

    T(n) = max(TA(n), TB(n), TC(n))

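Example

    // Find the largest element and its position; maxVal and maxPos
    // are assumed to be initialized from A[0] before the loop.
    for (int i = 1; i < n; i++)
        if (A[i] > maxVal) {
            maxVal = A[i];
            maxPos = i;
        }

• Asymptotic complexity: O(n)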

Example

    // Bubble sort on A[0..n-1]: each pass moves the smallest
    // remaining element down to position i.
    for (int i = 0; i < n - 1; i++)
        for (int j = n - 1; j >= i + 1; j--)
            if (A[j - 1] > A[j]) {
                int temp = A[j - 1];
                A[j - 1] = A[j];
                A[j] = temp;
            }

• Asymptotic complexity is O(n^2)
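The quadratic bound for the nested loops can be checked empirically by counting comparisons. The sketch below is an illustration under assumptions: the function name `bubbleSortCountingComparisons` is hypothetical, and it uses 0-based indexing on a `std::vector<int>`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts a in place (bubble sort); returns the number of comparisons made.
std::size_t bubbleSortCountingComparisons(std::vector<int>& a) {
    std::size_t comparisons = 0;
    const std::size_t n = a.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {       // n - 1 passes
        for (std::size_t j = n - 1; j > i; --j) {   // n - 1 - i comparisons
            ++comparisons;
            if (a[j - 1] > a[j])
                std::swap(a[j - 1], a[j]);
        }
    }
    return comparisons;
}
```

The count is (n-1) + (n-2) + ... + 1 = n(n-1)/2 regardless of the input order, which is O(n^2) by the rule of products.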

Run Time for Recursive Programs
• T(n) is defined recursively in terms of T(k), k < n.
• The recurrence relations allow T(n) to be "unwound" recursively into some base cases (e.g., T(0) or T(1)).
• Examples:
  - Factorial
  - Hanoi towers
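The two examples can be illustrated by writing their recurrences directly as code. This is a sketch under assumptions: the function names are hypothetical, the Hanoi recurrence T(n) = 2T(n-1) + 1 with T(0) = 0 counts disk moves, and the factorial recurrence T(n) = T(n-1) + 1 with T(1) = 1 counts calls made by the recursive factorial.

```cpp
#include <cassert>
#include <cstdint>

// Towers of Hanoi: moves for n disks satisfy T(0) = 0, T(n) = 2 T(n-1) + 1.
std::uint64_t hanoiMoves(int n) {
    assert(n >= 0 && n < 64);   // keeps the result within 64 bits
    if (n == 0) return 0;
    return 2 * hanoiMoves(n - 1) + 1;
}

// Recursive factorial: calls satisfy T(n) = T(n-1) + 1 with T(1) = 1.
std::uint64_t factorialCalls(int n) {
    if (n <= 1) return 1;
    return factorialCalls(n - 1) + 1;
}
```

Unwinding the first recurrence gives T(n) = 2^n - 1 (exponential), while the second unwinds to T(n) = n (linear) for n ≥ 1.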

An Analogy: Cooking Recipes
• Algorithms are detailed and precise instructions.
• Example: bake a chocolate mousse cake.
  - Convert raw ingredients into processed output.
  - Hardware (PC, supercomputer vs. oven, stove)
  - Pots, pans, pantry are data structures.
• Interplay of hardware and algorithms
  - Different recipes for oven, stove, microwave, etc.
• New advances.
  - New models: clusters, Internet, workstations
  - Microwave cooking, 5-minute recipes, refrigeration

  
