Hashing
David Kaplan
So …
How about O(1) insert/find/delete for any key type?
[Figure: a hash function f(x) maps keys such as Hannah, Dave, Adrien, Donald, and Ed to table slots. Adrien hashes to slot 2 (value: roller-blade demon); Hannah hashes to slot 5 (value: C++ guru).]
Example:
tableSize = 7
insert(4)
insert(17)
find(12)
insert(9)
delete(17)
Problems:
hash(“really, really big”) = well… something really, really big
Simplify computation
Use Horner’s Rule
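#include <string>

// Horner's Rule; tableSize is the table size, defined elsewhere.
int hash(const std::string & s) {
  int h = 0;
  for (int i = static_cast<int>(s.length()) - 1; i >= 0; i--) {
    h = (s[i] + 128*h) % tableSize;   // reduce at every step so h stays small
  }
  return h;
}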
no restriction on size!
when building a static table, we can try several values of A
more computationally intensive than a single mod
Definition:
H is a universal collection of hash functions if and only if …
For any two distinct keys k1, k2 in K, there are at most |H|/m functions in H for which h(k1) = h(k2), where m is the table size.
Conclusions
Worst Enemy never knows which hash function we will choose –
neither do we!
No single input (set of keys) can always evoke worst-case behavior
$h_a(k) = \left( \sum_{i=0}^{r} a_i k_i \right) \bmod \text{size}$
Weaknesses
must choose prime table size larger than any $k_i$
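A minimal sketch of one member of the dot-product family above, assuming the key is split into base-size digits k_0..k_r (so every k_i is smaller than the prime table size) and the coefficients a_i are drawn uniformly from [0, size). The names DotProductHash and digitsOf are hypothetical, and std::mt19937 is only a stand-in for the random source.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Split key into `count` base-`size` digits, least significant first,
// so that every digit k_i is smaller than the table size.
std::vector<std::uint64_t> digitsOf(std::uint64_t key, std::uint64_t size, std::size_t count) {
    std::vector<std::uint64_t> digits(count);
    for (std::size_t i = 0; i < count; ++i) {
        digits[i] = key % size;
        key /= size;
    }
    return digits;
}

// One hash function h_a, selected by the random coefficient vector a.
// Assumes size is prime and small enough that size * size fits in 64 bits.
struct DotProductHash {
    std::uint64_t size;             // table size (assumed prime)
    std::vector<std::uint64_t> a;   // coefficients a_0..a_r, each in [0, size)

    DotProductHash(std::uint64_t tableSize, std::size_t count, std::uint32_t seed)
        : size(tableSize), a(count) {
        std::mt19937 gen(seed);
        std::uniform_int_distribution<std::uint64_t> pick(0, tableSize - 1);
        for (auto & ai : a) ai = pick(gen);
    }

    std::uint64_t operator()(std::uint64_t key) const {
        const std::vector<std::uint64_t> k = digitsOf(key, size, a.size());
        std::uint64_t sum = 0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            sum = (sum + a[i] * k[i]) % size;   // reduce each step to keep sum small
        }
        return sum;
    }
};
```

Constructing a new DotProductHash with a fresh seed picks a different function from the family, which is what keeps any fixed set of keys from always being a worst case.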
Weaknesses
need to turn non-integer keys into integers
[private]
Dictionary & findBucket(const Key & k) {
return table[hash(k)%table.size];
}
successful search:
[Table fragment: slots 5 and 6 after each of six inserts; slots 0-4 not recoverable]
slot   1st   2nd   3rd   4th   5th   6th
5       -     -    40    40    40    40
6      76    76    76    76    76    76
probes: 1 1 1 3 1 3
Load Factor in Linear Probing
For any $\lambda < 1$, linear probing will find an empty slot
Search cost (for large table sizes)
successful search: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$
unsuccessful search: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
Linear probing suffers from primary clustering
Performance quickly degrades for $\lambda > 1/2$
[Table fragment: slots 2-6 after each of five inserts; slots 0-1 not recoverable]
slot   1st   2nd   3rd   4th   5th
2       -     -     -     5     5
3       -     -     -     -    55
4       -     -     -     -     -
5       -    40    40    40    40
6      76    76    76    76    76
probes: 1 1 2 3 3
Bad Quadratic Probing Example
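       insert(76)  insert(93)  insert(40)  insert(35)  insert(47)
hash   76%7 = 6    93%7 = 2    40%7 = 5    35%7 = 0    47%7 = 5
slot 0     -           -           -          35          35
slot 1     -           -           -           -           -
slot 2     -          93          93          93          93
slot 3     -           -           -           -           -
slot 4     -           -           -           -           -
slot 5     -           -          40          40          40
slot 6    76          76          76          76          76
probes:    1           1           1           1      (never succeeds)
insert(47) probes slots 5, 6, 2, 0, 0, 2, 6, 5, … and never reaches the empty slots 1, 3, 4.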
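The failure in this example can be reproduced with a short sketch. It assumes h(k) = k % size and probe offsets i*i, as in the table above; the names kEmpty and quadraticInsert are hypothetical. Because i*i mod size repeats with period size, trying i = 0..size-1 covers every slot the probe sequence can ever reach.

```cpp
#include <vector>

const int kEmpty = -1;   // marker for an unused slot (keys assumed non-negative)

// Insert key using quadratic probing.
// Returns the number of probes used, or -1 if no reachable slot is empty.
int quadraticInsert(std::vector<int> & table, int key) {
    const int size = static_cast<int>(table.size());
    for (int i = 0; i < size; ++i) {
        const int slot = (key % size + i * i) % size;
        if (table[slot] == kEmpty) {
            table[slot] = key;
            return i + 1;
        }
    }
    return -1;   // every reachable slot is occupied
}
```

With a table of size 7, inserting 76, 93, 40, and 35 takes one probe each, and inserting 47 then returns -1 even though slots 1, 3, and 4 are still empty.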
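Quadratic Probing Succeeds for $\lambda \le 1/2$
If size is prime and $\lambda \le 1/2$, then quadratic probing will find an empty slot in size/2 probes or fewer.
show for all $0 \le i, j \le \text{size}/2$ and $i \ne j$:
$(h(x) + i^2) \bmod \text{size} \ne (h(x) + j^2) \bmod \text{size}$
by contradiction: suppose that for some $i \ne j$:
$(h(x) + i^2) \bmod \text{size} = (h(x) + j^2) \bmod \text{size}$
$i^2 \bmod \text{size} = j^2 \bmod \text{size}$
$(i^2 - j^2) \bmod \text{size} = 0$
$[(i + j)(i - j)] \bmod \text{size} = 0$
since size is prime, it must divide $i + j$ or $i - j$
but how can $i + j = 0$ or $i + j = \text{size}$ when $i \ne j$ and $i, j \le \text{size}/2$?
same for $(i - j) \bmod \text{size} = 0$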
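The claim can also be checked by brute force for small tables. The sketch below (with a hypothetical helper name) tests whether the probes for i = 0..size/2 all land on distinct slots; it is an empirical check, not a substitute for the proof.

```cpp
#include <vector>

// Returns true if the quadratic probes (home + i*i) % size
// for i = 0 .. size/2 all hit different slots.
bool firstHalfProbesDistinct(int size, int home) {
    std::vector<bool> seen(size, false);
    for (int i = 0; i <= size / 2; ++i) {
        const int slot = (home + i * i) % size;
        if (seen[slot]) return false;   // two probes collided
        seen[slot] = true;
    }
    return true;
}
```

For prime sizes such as 7 or 11 the check passes for every home slot; for size 8 it fails because the probes at i = 1 and i = 3 both land on home + 1.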
[Table fragment: slots 1-6 after each of six inserts; slot 0 not recoverable]
slot   1st   2nd   3rd   4th   5th   6th
1       -     -     -    47    47    47
2       -    93    93    93    93    93
3       -     -     -     -    10    10
4       -     -     -     -     -    55
5       -     -    40    40    40    40
6      76    76    76    76    76    76
probes: 1 1 1 2 1 2
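The table above is consistent with inserting 76, 93, 40, 47, 10, 55 into a table of size 7 using h(k) = k % 7 and a second hash of 5 - (k % 5). That second hash is an assumption chosen because it reproduces the probe counts shown; the names kFree, hash2, and doubleHashInsert are hypothetical.

```cpp
#include <vector>

const int kFree = -1;   // marker for an unused slot (keys assumed non-negative)

// Assumed second hash: always in 1..5, so the step is never zero.
int hash2(int key) {
    return 5 - (key % 5);
}

// Insert key using double hashing: probe i visits (h(key) + i * hash2(key)) % size.
// Returns the number of probes used, or -1 if no empty slot was found.
int doubleHashInsert(std::vector<int> & table, int key) {
    const int size = static_cast<int>(table.size());
    const int home = key % size;
    const int step = hash2(key);
    for (int i = 0; i < size; ++i) {
        const int slot = (home + i * step) % size;
        if (table[slot] == kFree) {
            table[slot] = key;
            return i + 1;
        }
    }
    return -1;
}
```

Because the table size 7 is prime and the step is between 1 and 5, each key's probe sequence visits all seven slots before repeating.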
Load Factor in Double Hashing
For any $\lambda < 1$, double hashing will find an empty slot
(given appropriate table size and hash2)
unsuccessful search: $\frac{1}{1-\lambda}$
No primary clustering and no secondary clustering
Issues
Which data structure should we use?
Which type of hash function should we use?
…
How many pointers does each use?
functions
Table contains
buckets, each fitting in one disk block, with the data
a directory that fits in one disk block, used to hash to the correct bucket
directory for k = 3
000 001 010 011 100 101 110 111
insert(11011)
insert(11000)
000 001 010 011 100 101 110 111
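A small sketch of the directory lookup, assuming 5-bit keys as in the insert examples and assuming the directory is indexed by the leading k bits of the key (the slides do not say whether leading or trailing bits are used). The function name directoryIndex is hypothetical.

```cpp
#include <cstdint>

// Index into a directory of 2^k entries using the top k bits of a key
// that is keyBits wide. Assumes 0 < k <= keyBits and key < 2^keyBits.
std::uint32_t directoryIndex(std::uint32_t key, int keyBits, int k) {
    return key >> (keyBits - k);
}
```

Under these assumptions, with k = 3 both 11011 and 11000 map to directory entry 110, so the two inserts land in the same bucket.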