HASHING
All the programs in this file are selected from
Ellis Horowitz, Sartaj Sahni, and Susan Anderson-Freed
Fundamentals of Data Structures in C,
Computer Science Press, 1992.
CHAPTER 8
Symbol Table
Definition
A set of name-attribute pairs
Operations
Static Hashing
hash table (ht): b buckets, numbered 0 to b-1, each with s slots
f(x): maps an identifier x to a bucket address in 0 ~ (b-1)
Synonyms
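Example
b = 26 buckets, s = 2 slots per bucket

bucket   Slot 0   Slot 1
0        acos     atan
1
2        char     ceil
3        define
4        exp
5        float    floor
6
...
25

synonyms: char, ceil, clock, ctime
(acos and atan are also synonyms, as are float and floor)
overflow: clock and ctime hash to bucket 2, whose two slots are already taken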
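The example above can be reproduced with a small sketch. It assumes the hash function is the position of the identifier's first letter in the alphabet, which is consistent with the 26 buckets shown but is not stated on the slide. The names `bucket`, `first_letter_hash`, and `static_insert` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define B 26        /* number of buckets (assumed: one per letter) */
#define S 2         /* slots per bucket */
#define MAX_KEY 16  /* maximum identifier length including the terminator */

typedef struct {
    char slot[S][MAX_KEY];
    int used;       /* number of occupied slots */
} bucket;

/* Assumed hash function: 0 for 'a', 1 for 'b', ..., 25 for 'z'. */
static int first_letter_hash(const char *key)
{
    assert(key != NULL && isalpha((unsigned char)key[0]));
    return tolower((unsigned char)key[0]) - 'a';
}

/* Store key in its home bucket. Returns 1 if stored, or 0 on overflow,
   that is, when the home bucket already holds S synonyms. */
static int static_insert(bucket ht[B], const char *key)
{
    bucket *bk = &ht[first_letter_hash(key)];
    if (bk->used == S)
        return 0;
    strncpy(bk->slot[bk->used], key, MAX_KEY - 1);
    bk->slot[bk->used][MAX_KEY - 1] = '\0';
    bk->used++;
    return 1;
}
```

With a zero-initialized table, inserting acos, define, float, exp, char, atan, ceil, floor succeeds, and the later inserts of clock and ctime return 0 because bucket 2 is full.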
Hashing Functions
Two requirements
easy computation
minimal number of collisions
division
f_D(x) = x % M
(bucket addresses 0 ~ (M-1))
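Example: six bits per character, identifiers right justified with zero fill

identifier     low-order bits shown
000000A1  =    000001 011100
000000B1  =    000010 011100
000000C1  =    000011 011100
00000X41  =    011000 011100
00NTXY1   =    011000 011001 011100

M = 2^i, i <= 6: f_D(x) depends only on the low-order i bits, so all of the above have the same bucket address
Similarly, AXY, BXY, WTXY (M = 2^i, i <= 12) have the same bucket address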
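A short sketch makes the effect of M = 2^i concrete. It assumes a six-bit character code in which A..Z map to 1..26 and the digits 0..9 map to 27..36. The codes for A, B, C, X, Y and 1 agree with the bit patterns above, and the rest of the mapping is an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed six-bit code: A..Z -> 1..26, digits 0..9 -> 27..36. */
static unsigned six_bit_code(char c)
{
    if (c >= 'A' && c <= 'Z')
        return (unsigned)(c - 'A') + 1u;
    assert(c >= '0' && c <= '9');
    return (unsigned)(c - '0') + 27u;
}

/* Pack an identifier of at most 10 characters, right justified,
   six bits per character (zero fill on the left). */
static unsigned long long pack(const char *id)
{
    unsigned long long x = 0;
    size_t n = 0;
    for (; *id; id++, n++)
        x = (x << 6) | six_bit_code(*id);
    assert(n <= 10);
    return x;
}

/* Division hashing: f_D(x) = x % M. */
static unsigned long long f_div(unsigned long long x, unsigned long long m)
{
    assert(m > 0);
    return x % m;
}
```

Under these assumptions, `f_div(pack("A1"), 64)`, `f_div(pack("B1"), 64)` and `f_div(pack("NTXY1"), 64)` all return 28, the code of the final character, while a prime modulus such as 1009 separates A1 from B1.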
X=X1X2
Y=X2X1
X1 --> C(X1)
X2 --> C(X2)
Each character is represented by six bits
X: C(X1) * 2^6 + C(X2)
Y: C(X2) * 2^6 + C(X1)
(f_D(X) - f_D(Y)) % P
(where P is a prime number)
= C(X1) * 2^6 % P + C(X2) % P - C(X2) * 2^6 % P - C(X1) % P
P = 3:
= 64 % 3 * C(X1) % 3 + C(X2) % 3 - 64 % 3 * C(X2) % 3 - C(X1) % 3
= C(X1) % 3 + C(X2) % 3 - C(X2) % 3 - C(X1) % 3
= 0
P = 7?
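The question about P = 7 can be checked by brute force. The sketch below counts, for a given modulus, how many two-character identifiers X1X2 collide with their permutation X2X1 when each character is a six-bit code. Since 64 % 7 = 1, just as 64 % 3 = 1, every permutation pair collides for both moduli. The function names are hypothetical.

```c
#include <assert.h>

/* Value of the identifier X1X2 with six bits per character:
   C(X1) * 2^6 + C(X2). */
static unsigned two_char_value(unsigned c1, unsigned c2)
{
    assert(c1 < 64 && c2 < 64);
    return c1 * 64u + c2;
}

/* 1 if X1X2 and X2X1 get the same bucket under f_D with modulus p. */
static int swap_collides(unsigned c1, unsigned c2, unsigned p)
{
    assert(p > 0);
    return two_char_value(c1, c2) % p == two_char_value(c2, c1) % p;
}

/* Count ordered pairs (c1, c2) with c1 != c2 whose permutation collides. */
static unsigned count_swap_collisions(unsigned p)
{
    unsigned count = 0;
    for (unsigned c1 = 0; c1 < 64; c1++)
        for (unsigned c2 = 0; c2 < 64; c2++)
            if (c1 != c2 && swap_collides(c1, c2, p))
                count++;
    return count;
}
```

For p = 3 and p = 7 the count is 64 * 63 = 4032, meaning every pair collides. For p = 1009 it is 0, because the difference 63 * (C(X1) - C(X2)) is never a nonzero multiple of 1009.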
M is a prime number such that M does not divide r^k ± a for
small k and a, where r is the radix of the character set (Knuth)
for example, M = 1009
Hashing Functions
Folding
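Partition the identifier into parts:
P1 = 123, P2 = 203, P3 = 241, P4 = 112, P5 = 20

shift folding
123 + 203 + 241 + 112 + 20 = 699

folding at the boundaries (every second part, here P2 and P4, is reversed)
123 + 302 + 241 + 211 + 20 = 897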
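Both folding variants can be sketched in a few lines. The code below treats the key as a decimal digit string (here the concatenation of P1 to P5) and splits it into parts of a fixed length. The names `part_value` and `fold` are hypothetical, and reversing every second part is the usual reading of folding at the boundaries.

```c
#include <assert.h>
#include <string.h>

/* Decimal value of the digits s[0..len-1]; the digits are read
   right to left when reversed is nonzero. */
static unsigned long part_value(const char *s, size_t len, int reversed)
{
    unsigned long v = 0;
    for (size_t i = 0; i < len; i++) {
        char c = reversed ? s[len - 1 - i] : s[i];
        assert(c >= '0' && c <= '9');
        v = v * 10 + (unsigned long)(c - '0');
    }
    return v;
}

/* Sum the parts of the digit string. With at_boundaries == 0 this is
   shift folding; otherwise every second part (P2, P4, ...) is reversed
   before it is added. */
static unsigned long fold(const char *digits, size_t part_len, int at_boundaries)
{
    size_t n = strlen(digits);
    unsigned long sum = 0;
    assert(part_len > 0);
    for (size_t start = 0, k = 0; start < n; start += part_len, k++) {
        size_t len = (n - start < part_len) ? n - start : part_len;
        sum += part_value(digits + start, len, at_boundaries && (k % 2 == 1));
    }
    return sum;
}
```

For the digit string "12320324111220" with three-digit parts, `fold(..., 3, 0)` gives 699 and `fold(..., 3, 1)` gives 897.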
Digital Analysis
X_m = d_m1 d_m2 ... d_mn
Select 3 digits from n
Criterion:
Delete the digits having the most skewed distributions
Overflow Handling
Example
Identifier   Additive Transform                 x     Hash
for          102+111+114                        327   2
do           100+111                            211   3
while        119+104+105+108+101                537   4
if           105+102                            207   12
else         101+108+115+101                    425   9
function     102+117+110+99+116+105+111+110     870   12
Linear Probing
(linear open addressing)
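/* Needs <stdio.h>, <stdlib.h> and <string.h>. The type element, hash()
   and TABLE_SIZE are assumed to be declared elsewhere. An empty bucket
   is one whose key is the empty string. */
void linear_insert(element item, element ht[])
{
    /* insert the key using linear open addressing; exit if the key is
       a duplicate or the table is full */
    int i, hash_value;
    i = hash_value = hash(item.key);
    while (strlen(ht[i].key)) {
        if (!strcmp(ht[i].key, item.key)) {
            fprintf(stderr, "Duplicate entry\n");
            exit(1);
        }
        i = (i + 1) % TABLE_SIZE;   /* probe the next bucket, wrapping around */
        if (i == hash_value) {      /* back at the home bucket */
            fprintf(stderr, "The table is full\n");
            exit(1);
        }
    }
    ht[i] = item;
}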
Coalesce Phenomenon
bucket   x        buckets searched
0        acos     1
1        atoi     2
2        char     1
3        define   1
4        exp      1
5        ceil     4
6        cos      5
7        float    3
8        atol     9
9        floor    5
10       ctime    9
...
25
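The clustering in this table can be reproduced with a sketch of linear probing over 26 one-slot buckets. It assumes the hash function is the position of the first letter and that the identifiers were inserted in the order acos, atoi, char, define, exp, ceil, cos, float, atol, floor, ctime. Neither assumption is stated on the slide, but together they yield exactly the buckets and search counts listed. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BUCKETS 26
#define MAX_KEY 16

typedef struct { char key[MAX_KEY]; } element;   /* empty key = free bucket */

/* Assumed hash function: 0 for 'a', ..., 25 for 'z'. */
static int first_letter_hash(const char *key)
{
    assert(key[0] >= 'a' && key[0] <= 'z');
    return key[0] - 'a';
}

/* Insert key with linear probing. Returns the bucket used, or -1 if
   the key is a duplicate or the table is full. */
static int probe_insert(element ht[], const char *key)
{
    int home = first_letter_hash(key), i = home;
    assert(strlen(key) < MAX_KEY);
    while (ht[i].key[0] != '\0') {
        if (strcmp(ht[i].key, key) == 0)
            return -1;
        i = (i + 1) % BUCKETS;
        if (i == home)
            return -1;
    }
    strcpy(ht[i].key, key);
    return i;
}

/* Number of buckets examined to find key, or 0 if it is absent. */
static int buckets_searched(const element ht[], const char *key)
{
    int home = first_letter_hash(key), i = home, count = 0;
    do {
        count++;
        if (ht[i].key[0] == '\0')
            return 0;
        if (strcmp(ht[i].key, key) == 0)
            return count;
        i = (i + 1) % BUCKETS;
    } while (i != home);
    return 0;
}
```

Summing the search counts over the 11 identifiers gives 41, an average of about 3.73 buckets per search. Keys that start with different letters (a, c and f here) end up in one merged cluster.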
Quadratic Probing
rehashing
Chain Insert
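/* Needs <stdio.h>, <stdlib.h> and <string.h>. The types element, list
   and list_pointer, the function hash() and the macro IS_FULL() are
   assumed to be declared elsewhere; IS_FULL(ptr) is assumed to be true
   when ptr is NULL. */
void chain_insert(element item, list_pointer ht[])
{
    /* append the key to the end of its chain; exit on a duplicate key */
    int hash_value = hash(item.key);
    list_pointer ptr, trail = NULL, lead = ht[hash_value];
    for (; lead; trail = lead, lead = lead->link)
        if (!strcmp(lead->item.key, item.key)) {
            fprintf(stderr, "The key is in the table\n");
            exit(1);
        }
    ptr = (list_pointer) malloc(sizeof(list));
    if (IS_FULL(ptr)) {
        fprintf(stderr, "The memory is full\n");
        exit(1);
    }
    ptr->item = item;
    ptr->link = NULL;
    if (trail) trail->link = ptr;   /* chain was not empty */
    else ht[hash_value] = ptr;      /* first node in this chain */
}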
Figure: hash chains for buckets [0] to [25]
average # of key comparisons = 21/11 = 1.91
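The figure of 21 comparisons can be checked with a small chaining sketch. It assumes the same 11 identifiers as in the linear probing example (acos, atoi, char, define, exp, ceil, cos, float, atol, floor, ctime), a first-letter hash function, and insertion at the end of each chain. These assumptions are not stated on the slide but reproduce its total. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 26
#define MAX_KEY 16

typedef struct node {
    char key[MAX_KEY];
    struct node *link;
} node;

/* Assumed hash function: 0 for 'a', ..., 25 for 'z'. */
static int first_letter_hash(const char *key)
{
    assert(key[0] >= 'a' && key[0] <= 'z');
    return key[0] - 'a';
}

/* Append key to the end of its chain. Returns 1 on success, or 0 on a
   duplicate key or an allocation failure. */
static int chain_insert(node *ht[], const char *key)
{
    node **tail = &ht[first_letter_hash(key)];
    node *ptr;
    assert(strlen(key) < MAX_KEY);
    for (; *tail; tail = &(*tail)->link)
        if (strcmp((*tail)->key, key) == 0)
            return 0;
    ptr = malloc(sizeof *ptr);
    if (!ptr)
        return 0;
    strcpy(ptr->key, key);
    ptr->link = NULL;
    *tail = ptr;
    return 1;
}

/* Key comparisons needed to find key, or 0 if it is absent. */
static int chain_comparisons(node *ht[], const char *key)
{
    int count = 0;
    for (const node *p = ht[first_letter_hash(key)]; p; p = p->link) {
        count++;
        if (strcmp(p->key, key) == 0)
            return count;
    }
    return 0;
}
```

The chains for a, c, d, e and f hold 3, 4, 1, 1 and 2 keys, so the comparisons add up to 6 + 10 + 1 + 1 + 3 = 21 over 11 keys.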
α = n/b               .50           .75           .90           .95
hashing function   chain/open    chain/open    chain/open    chain/open
mid square         1.26/1.73     1.40/9.75     1.45/37.14    1.47/37.53
division           1.19/4.52     1.31/7.20     1.38/22.42    1.41/25.79
shift fold         1.33/21.75    1.48/65.10    1.40/77.01    1.51/118.57
bound fold         1.39/22.97    1.57/48.70    1.55/69.63    1.51/97.56
digit analysis     1.35/4.55     1.49/30.62    1.52/89.20    1.52/125.59
theoretical        1.25/1.50     1.37/2.50     1.45/5.50     1.48/10.50
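The theoretical row agrees, to within rounding, with the standard estimates for a successful search: 1 + α/2 for chaining and (1 + 1/(1 - α))/2 for linear open addressing. That the slide's figures come from exactly these formulas is an assumption. The function names are hypothetical.

```c
#include <assert.h>

/* Expected comparisons for a successful search with chaining:
   1 + alpha / 2. */
static double chain_expected(double alpha)
{
    assert(alpha >= 0.0);
    return 1.0 + alpha / 2.0;
}

/* Expected comparisons for a successful search with linear open
   addressing: (1 + 1 / (1 - alpha)) / 2. */
static double open_expected(double alpha)
{
    assert(alpha >= 0.0 && alpha < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - alpha));
}
```

At α = .95 these give about 1.48 and 10.5. The chaining estimate grows slowly with the loading density, while the open addressing estimate grows without bound as α approaches 1.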
dynamic hashing
(extensible hashing)
identifier   binary representation
a0           100 000
a1           100 001
b0           101 000
b1           101 001
c0           110 000
c1           110 001
c2           110 010
c3           110 011

bits are used from LSB to MSB
allocation: lower order two bits
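Figure: a binary trie whose leaves are pages
(a) four pages: (a0, b0), (c2), (a1, b1), (c3)
(b) inserting c5 with overflow: pages (a0, b0), (c2), (a1, b1), (c5), (c3)
(c) inserting c1 with overflow: pages (a0, b0), (c2), (a1, c1), (b1), (c5), (c3)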
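Extendible Hashing
f(x) = a set of binary digits --> table lookup
Each directory entry is a page pointer; the entries grouped in braces point to the same page.
The directory has a global depth; each page has a local depth.

Directory indexed by 3 bits:
{000,100}  {001}  {010,110}  {011,111}  {101}

Directory indexed by 4 bits (global depth: 4):
entries                  local depth
{0000,1000,0100,1100}    2
{0001}                   4
{0010,1010,0110,1110}    2
{0011,1011,0111,1111}    2
{0101,1101}              3
{1001}                   4

Pages before the insertions: (a0, b0), (c2), (a1, b1), (c3); c5 and then c1 are inserted.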
identifier   binary    hash(key, p)
a0           100 000   hash(a0,2)=00
b1           101 001   hash(b1,4)=1001
c3           110 011   hash(c3,2)=11
a1           100 001   hash(a1,4)=0001
c1           110 001   hash(c1,4)=0001
b0           101 000   hash(b0,2)=00
c2           110 010   hash(c2,2)=10
c5           110 101   hash(c5,3)=101
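The hash values above are the p low-order bits of each identifier's six-bit representation. A minimal sketch follows, assuming the encoding implied by the table (letters a, b, c map to 100, 101, 110 and the digits 0 to 7 map to 000 to 111). The helper name `id_bits` is hypothetical, and `hash` here returns an integer rather than a bit string.

```c
#include <assert.h>

/* Assumed six-bit representation: three bits for the letter
   (a = 100, b = 101, c = 110) followed by three bits for the digit. */
static unsigned id_bits(const char *key)
{
    assert(key[0] >= 'a' && key[0] <= 'c');
    assert(key[1] >= '0' && key[1] <= '7' && key[2] == '\0');
    return (((unsigned)(key[0] - 'a') + 4u) << 3) | (unsigned)(key[1] - '0');
}

/* hash(key, p): the p low-order bits of the representation. */
static unsigned hash(const char *key, unsigned p)
{
    assert(p >= 1 && p <= 6);
    return id_bits(key) & ((1u << p) - 1u);
}
```

For example, `hash("b1", 4)` is 9 (binary 1001) and `hash("c5", 3)` is 5 (binary 101). Both a1 and c1 give 1 (binary 0001) with four bits, so they share a page.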
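/* ... the head of find() is not shown; insert() below calls it as
   find(key, &ptr). It sets *ptr to the page that was searched. */
    paddr index;
    int intindex;
    index = hash(key, global_depth);
    intindex = convert(index);
    *ptr = directory[intindex];
    return pgsearch(key, ptr);
}

void insert(brecord r, char *key)
{
    paddr ptr;
    if (find(key, &ptr)) {
        fprintf(stderr, "The key is already in the table.\n");
        exit(1);
    }
    if (ptr->num_idents != PAGE_SIZE) {   /* the page still has room */
        enter(r, ptr);
        ptr->num_idents++;
    }
    else {
        /* Split the page into two, insert the new key, and update
           global_depth ... (the rest of this branch is not shown) */
    }
}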
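Figure: the four pages (a0, b0), (c2), (a1, b1), (c3) stored contiguously and addressed by two bits

page   contents
00     a0, b0
01     c2, -
10     a1, b1
11     c3, -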
Figure: an example with two insertions
(a) start of expansion 2: there are 4 pages (00, 01, 10, 11)
(b) insert c5: page 10 overflows and c5 goes to an overflow page; page 00 splits, adding new page 100
(c) insert c1: page 10 overflows again; page 01 splits, adding another new page
*Figure 8.14: During the rth phase of expansion of the directoryless method (p.426)
Suppose we are at phase r: there are 2^r pages at the start, addressed (indexed) by r bits.