The running time of bucket sort is

$$T(n) = \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2).$$

Taking expectations of both sides and using linearity of expectation, we have

$$
\begin{aligned}
E[T(n)] &= E\left[\Theta(n)+\sum_{i=0}^{n-1}O(n_i^2)\right]\\
&= \Theta(n)+\sum_{i=0}^{n-1}E\left[O(n_i^2)\right] \quad\text{(by linearity of expectation)}\\
&= \Theta(n)+\sum_{i=0}^{n-1}O\!\left(E[n_i^2]\right) \quad\text{(since } E[aX]=aE[X]\text{)} \qquad (8.1)
\end{aligned}
$$
linsort - 20
Lin / Devi
Comp 122
Analysis Contd.
Claim: $E[n_i^2] = 2 - 1/n$.
Proof:
Define indicator random variables

$$X_{ij} = I\{A[j] \text{ falls in bucket } i\}, \qquad \Pr\{A[j] \text{ falls in bucket } i\} = 1/n.$$

Then

$$n_i = \sum_{j=1}^{n} X_{ij}. \qquad (8.2)$$
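As a sanity check (not part of the slides), a short Monte Carlo simulation can estimate $E[n_i^2]$ directly: throw n keys uniformly into n buckets and average the squared occupancy of one bucket. The function name and trial count below are illustrative choices of ours.

```python
import random

def estimate_ni_squared(n, trials=200_000):
    """Estimate E[n_i^2] empirically: distribute n keys uniformly at
    random over n buckets and average the squared count of bucket 0."""
    total = 0
    for _ in range(trials):
        # n0 = number of the n keys that land in bucket 0
        n0 = sum(1 for _ in range(n) if random.randrange(n) == 0)
        total += n0 * n0
    return total / trials
```

For n = 10 the claim predicts $E[n_i^2] = 2 - 1/10 = 1.9$, and the estimate should land close to that value.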
Analysis Contd.
$$
\begin{aligned}
E[n_i^2] &= E\left[\left(\sum_{j=1}^{n}X_{ij}\right)^{2}\right]\\
&= E\left[\sum_{j=1}^{n}\sum_{k=1}^{n}X_{ij}X_{ik}\right]\\
&= E\left[\sum_{j=1}^{n}X_{ij}^{2}+\sum_{1\le j\le n}\;\sum_{\substack{1\le k\le n\\ k\ne j}}X_{ij}X_{ik}\right]\\
&= \sum_{j=1}^{n}E[X_{ij}^{2}]+\sum_{1\le j\le n}\;\sum_{\substack{1\le k\le n\\ k\ne j}}E[X_{ij}X_{ik}], \quad\text{by linearity of expectation.} \qquad (8.3)
\end{aligned}
$$
Analysis Contd.
$$E[X_{ij}^{2}] = 0^{2}\cdot\Pr\{A[j] \text{ doesn't fall in bucket } i\} + 1^{2}\cdot\Pr\{A[j] \text{ falls in bucket } i\} = 0\cdot\left(1-\frac{1}{n}\right)+1\cdot\frac{1}{n} = \frac{1}{n}.$$

$E[X_{ij}X_{ik}]$ for $j \ne k$: since $j \ne k$, $X_{ij}$ and $X_{ik}$ are independent randomables, so

$$E[X_{ij}X_{ik}] = E[X_{ij}]\,E[X_{ik}] = \frac{1}{n}\cdot\frac{1}{n} = \frac{1}{n^{2}}.$$
Analysis Contd.
Substituting these two expectations in (8.3), we have

$$
\begin{aligned}
E[n_i^2] &= \sum_{j=1}^{n}\frac{1}{n}+\sum_{1\le j\le n}\;\sum_{\substack{1\le k\le n\\ k\ne j}}\frac{1}{n^{2}}\\
&= n\cdot\frac{1}{n}+n(n-1)\cdot\frac{1}{n^{2}}\\
&= 1+\frac{n-1}{n}\\
&= 2-\frac{1}{n},
\end{aligned}
$$

which proves the claim. Substituting this bound on $E[n_i^2]$ in (8.1), we have

$$E[T(n)] = \Theta(n)+\sum_{i=0}^{n-1}O\!\left(2-\frac{1}{n}\right) = \Theta(n)+O(n) = \Theta(n).$$
Comp 122, Spring 2004
Hash Tables 1
Dictionary
Dictionary:
Dynamic-set data structure for storing items indexed
using keys.
Supports operations Insert, Search, and Delete.
Applications:
Symbol table of a compiler.
Memory-management tables in operating systems.
Large-scale distributed systems.
Hash Tables:
Effective way of implementing dictionaries.
Generalization of ordinary arrays.
Direct-address Tables
Direct-address Tables are ordinary arrays.
Facilitate direct addressing.
Element whose key is k is obtained by indexing into the k-th position of the array.
Applicable when we can afford to allocate an array
with one position for every possible key.
i.e. when the universe of keys U is small.
Dictionary operations can be implemented to take
O(1) time.
Details in Sec. 11.1.
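A direct-address table is short enough to sketch in full. This minimal Python version (class and method names are ours, not from the slides) shows why every operation is O(1):

```python
class DirectAddressTable:
    """Ordinary array with one slot per possible key 0..m-1."""

    def __init__(self, m):
        self.slots = [None] * m   # None marks an empty slot

    def insert(self, key, value):
        self.slots[key] = value   # direct indexing: O(1)

    def search(self, key):
        return self.slots[key]    # O(1); None if key is absent

    def delete(self, key):
        self.slots[key] = None    # O(1)
```

Each operation is a single array access, which is exactly the direct-addressing ability that is lost once the universe U becomes too large to allocate one slot per key.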
Hash Tables
Notation:
U : universe of all possible keys.
K : set of keys actually stored in the dictionary, with |K| = n.
When U is very large,
Arrays are not practical.
|K| << |U|.
Use a table of size proportional to |K|: the hash table.
However, we lose the direct-addressing ability.
Define functions that map keys to slots of the hash table.
Hashing
Hash function h: Mapping from U to the slots of a
hash table T[0..m−1].
h : U → {0, 1, …, m−1}
With arrays, key k maps to slot A[k].
With hash tables, key k maps or hashes to slot
T[h(k)].
h(k) is the hash value of key k.
Hashing
[Figure: universe of keys U, actual keys K = {k_1, …, k_5}, and table slots 0..m−1; h(k_2) = h(k_5) is a collision.]
Issues with Hashing
Multiple keys can hash to the same slot, so collisions are possible.
Design hash functions such that collisions are
minimized.
But avoiding collisions is impossible.
Design collision-resolution techniques.
Search will cost Θ(n) time in the worst case.
However, all operations can be made to have an
expected complexity of Θ(1).
Methods of Resolution
Chaining:
Store all elements that hash to the same
slot in a linked list.
Store a pointer to the head of the linked
list in the hash table slot.
Open Addressing:
All elements stored in hash table itself.
When collisions occur, use a systematic
(consistent) procedure to store elements
in free slots of the table.
[Figure: table with slots 0..m−1 holding keys k_1, …, k_8.]
Collision Resolution by Chaining
[Figure: keys from K ⊆ U hashing into slots 0..m−1, with collisions marked: h(k_1)=h(k_4), h(k_2)=h(k_5)=h(k_6), h(k_3)=h(k_7); k_8 hashes to its own slot.]
Collision Resolution by Chaining (contd.)
[Figure: the same keys, now showing the chains: each occupied slot stores a linked list of the keys that hashed to it, e.g. k_1 and k_4 on one chain.]
Hashing with Chaining
Dictionary Operations:
Chained-Hash-Insert (T, x)
Insert x at the head of list T[h(key[x])].
Worst-case complexity O(1).
Chained-Hash-Delete (T, x)
Delete x from the list T[h(key[x])].
Worst-case complexity proportional to length of list with
singly-linked lists. O(1) with doubly-linked lists.
Chained-Hash-Search (T, k)
Search an element with key k in list T[h(k)].
Worst-case complexity proportional to length of list.
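The three operations above can be sketched as follows. This is a Python approximation of ours: chains are plain Python lists standing in for linked lists, and the division-method hash is chosen purely for illustration.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining."""

    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return key % self.m   # illustrative hash function (division method)

    def insert(self, key, value):
        # Insert at the head of the chain, as Chained-Hash-Insert does.
        self.table[self._h(key)].insert(0, (key, value))

    def search(self, key):
        # Scan the chain: cost proportional to the chain's length.
        for k, v in self.table[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.table[self._h(key)]
        self.table[self._h(key)] = [(k, v) for (k, v) in chain if k != key]
```

Note that with Python lists, head insertion is not truly O(1); a real linked list is needed for that bound, but the structure of the three operations is the same.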
Analysis on Chained-Hash-Search
Load factor α = n/m = average number of keys per slot, where
m = number of slots, and
n = number of elements stored in the hash table.
Worst-case complexity: O(n) + time to compute h(k).
Average depends on how h distributes keys among m slots.
Assume
Simple uniform hashing.
Any key is equally likely to hash into any of the m slots,
independent of where any other key hashes to.
O(1) time to compute h(k).
Time to search for an element with key k is O(|T[h(k)]|).
Expected length of a linked list = load factor = α = n/m.
Expected Cost of an Unsuccessful Search
Theorem:
An unsuccessful search takes expected time Θ(1+α).
Proof:
Any key not already in the table is equally likely to hash
to any of the m slots.
To search unsuccessfully for any key k, we need to scan to
the end of the list T[h(k)], whose expected length is α.
Adding the time to compute the hash function, the total
time required is Θ(1+α).
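The theorem's list-scanning part can be checked empirically (a sketch of ours, modeling simple uniform hashing by uniform random slot choices): insert n random keys into m chains, then measure the length of the chain a fresh, absent key hashes to. Its average should approach α = n/m.

```python
import random

def avg_unsuccessful_scan(n, m, trials=50_000):
    """Average length of the chain an absent key hashes to, i.e. the
    list-scanning cost of an unsuccessful search."""
    total = 0
    for _ in range(trials):
        counts = [0] * m
        for _ in range(n):
            counts[random.randrange(m)] += 1   # insert n keys uniformly
        total += counts[random.randrange(m)]   # chain a new key would scan
    return total / trials
```

For n = 20 and m = 8 the load factor is α = 2.5, so the average scan length should come out close to 2.5; with the O(1) hash computation added, the total expected cost is Θ(1+α).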
Expected Cost of a Successful Search
Theorem:
A successful search takes expected time Θ(1+α).
Proof:
The probability that a list is searched is proportional to the number
of elements it contains.
Assume that the element being searched for is equally likely to be
any of the n elements in the table.
The number of elements examined during a successful search for
an element x is 1 more than the number of elements that appear
before x in x's list.
Since insertion is at the head of the list, these are exactly the
elements inserted after x was inserted.
Goal:
Find the average, over the n elements x in the table, of how many elements
were inserted into x's list after x was inserted.
Expected Cost of a Successful Search
Proof (contd):
Let $x_i$ be the $i$-th element inserted into the table, and let $k_i = \mathrm{key}[x_i]$.
Define indicator random variables $X_{ij} = I\{h(k_i) = h(k_j)\}$, for all $i, j$.
Simple uniform hashing ⇒ $\Pr\{h(k_i) = h(k_j)\} = 1/m$ ⇒ $E[X_{ij}] = 1/m$.
The expected number of elements examined in a successful search is

$$E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right],$$

where the inner sum $\sum_{j=i+1}^{n}X_{ij}$ is the number of elements inserted after $x_i$ into the same slot as $x_i$.
Proof Contd.
$$
\begin{aligned}
E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}E[X_{ij}]\right) \quad\text{(linearity of expectation)}\\
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\frac{1}{m}\right)\\
&= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i)\\
&= 1+\frac{1}{nm}\left(\sum_{i=1}^{n}n-\sum_{i=1}^{n}i\right)\\
&= 1+\frac{1}{nm}\left(n^{2}-\frac{n(n+1)}{2}\right)\\
&= 1+\frac{n-1}{2m}\\
&= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}.
\end{aligned}
$$

Expected total time for a successful search
= time to compute the hash function + time to search
= O(2 + α/2 − α/(2n)) = O(1 + α).
Expected Cost Interpretation
If n = O(m), then α = n/m = O(m)/m = O(1).
Searching takes constant time on average.
Insertion is O(1) in the worst case.
Deletion takes O(1) worst-case time when lists are doubly
linked.
Hence, all dictionary operations take O(1) time on
average with hash tables with chaining.