Hash Table Ind
Hash Table Ind
Hash Table
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Pengertian
Merupakan struktur data yang menawarkan operasi insertion dan searching (juga deletion) dengan sangat cepat
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Pengertian
Hash tabel biasa juga disebut map, lookup table, assosiatif array atau dictionary Hash tabel adalah container yang mengizinkan akses langsung oleh indeks dengan tipe apapun. Hash tabel bekerja seperti array, akan tetapi indeksnya tidak harus integer. Contoh yang sederhana dari hash tabel adalah kamus
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Hash tabel berbasis array, dan array sangat sulit ditambah setelah array tersebut dibuat Untuk beberapa bentuk Hash tabel, performance nya semakin menurun bila tabel telah penuh. Sehingga programmer dari awal sudah harus memperhitungkan berapa besar data yang akan disimpan
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Introduction to Hashing
Hash function memetakan value ke indeks pada tabel. Fungsi tsb menyediakan akses ke suatu elemen, seperti suatu indeks menyediakan akses ke suatu elemen dari array. Hash tabel menyediakan implementasi Set dan Map interface
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Hasing merupakan suatu struktur penyimpanan data yang menghasilkan O(1) waktu pengembalian rata-rata. Dengan cara ini, item independen terhadap jumlah item lain pada collection
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Yang berhubungan dengan hash tabel adalah hash function yang mempunyai key sebagai argument dan mengembalikan nilai integer Dengan menggunakan sisa setelah membagi hash value dengan ukuran tabel, kita mempunyai pemetaan dari key ke indeks pada tabel
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
0 1 i n-1
key entry
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Misalkan hash function hf(x) = x, dimana x adalah nonnegative integer (identity function). Asumsikan tabel adalah array tableEntry dengan n = 7 elemen
hf(22) = 22 22 % 7 = 1 0 1 2 3 4 5 6
tableEntry[1]
hf(4) = 4
4%7=4
tableEntry[4]
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Dengan hash function hf() dan ukuran tabel n, indeks dari tabel untuk key i = hf(key)%n. Collision akan muncul jika ada dua key yang berbeda dengan dibagi oleh n
hf(22) = 22 hf(36) = 36 22 % 7 = 1 36 % 7 = 1 0 1 2 3 4 5 6
tableEntry[5] tableEntry[1]
hf(5) = 5 hf(33) = 33
5%7=5 33 % 7 = 5
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Mengevaluasi hash function harus efisien Hash function harus menghasilkan nilai hash yang terdistribusi secara uniform. Ini akan menyebarkan indek hash tabel ke tabel yang meminimalisasi collision
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Java programming language menyediakan hashing function dengan method hashCode() pada Object superclass
public int hashCode() { }
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
hashCode mengkonversi internal address dari objek menjadi integer value, yang mempunyai aplikasi yang terbatas karena 2 objek yang berbeda akan mempunyai value yang berbeda untuk hashCode(), walaupun mereka menyimpan data yang sama
// strings one and two are the same; not so for integer values // one.hashCode() and two.hashCode() String one = "java", two = "java";
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Kecuali data integer memiliki karakteristik random, ini bukan fungsi hash yang baik
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Untuk membuat hash function yang efisien, kita harus menggabungkan urutan karakter pada string untuk membentuk integer.
public int hashCode() { int hash = 0; for (int i = 0; i < n; i++) hash = 31*hash + s[i]; return hash; }
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Ketika ada dua atau lebih data di-hash ke dalam indeks yang sama,mereka tidak dapat mengisi posisi yang sama pada tabel
Pilihan yang dapat kita lakukan adalah mengalokasikan salah satu item ke posisi yang lain dalam tabel (linear probing) atau mendesain ulang tabel untuk menyimpan sekumpulan key yang collide pada setiap indeks (chaining dengan list yang terpisah)
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Hash tabel adalah array dari elemen yang berasosiasi dengan hash function. Untuk menambahkan item
Linear Probing
Awalnya, tag masing-masing entri pada tabel dengan empty Gunakan hash function thd key dan bagi value dengan ukuran tabel untuk memperoleh indeks. Jika entry nya kosong, masukkan item Jika tidak, mulai pada indeks hash berikutnya dan scan indeks-indeks berturut-turut,. Insertion dilakukan pada lokasi pertama yang terbuka
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
In general, a hash function may result in integer overflow and return a negative number. The following calculation insures that the table index is nonnegative.
tableIndex = (hashValue & Integer.MAX_VALUE) % tableSize
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Pencarian menghasilkan lokasi awal dari hash tanpa menemukan slot yang terbuka, tabel penuh dan algoritma linear probing throw suatu exception
tableIndex = x % 11
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Jika ukuran tabel relatif lebih besar dari jumlah item, linear probing akan bekerja dengan baik, karena hash function yang baik membuat indek yang terdistribusi ke semua tabel dan collision akan minimal. Karena rasio dari ukuran tabel terhadap jumlah item yang didekati 1, algoritma lebih buruk dari sequential search.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
HashCode
Urutan key/value yang disimpan pada tabel HashMap tergantung capacity dari tabel dan nilai dari hash code dari object
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Chaining dengan daftar terpisah mendefinisikan tabel hash sebagai urutan indeks dari linked list. Setiap list, disebut bucket, mengandung satu set item yang hash ke lokasi tabel yang sama
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Bucket adalah singly linked list. Masing-masing entri dari array merupakan simpul pertama dalam urutan item yang di hash ke indeks tabel. Node memiliki struktur dengan dua field, satu untuk nilai dan satu untuk referensi ke node berikutnya.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Untuk menambahkan objek item, gunakan hash function untuk mengidentifikasi indeks dari bucket yang tepat dalam array (tabel).
If table[i] is null, add item as the first entry in the list. Otherwise begin with the first node, entry = table[i], and compare item with entry.nodeValue. If there is no match, continue the scan with node entry.next, and so forth. If item is not in the list, add it to the
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Chaining dengan daftar terpisah umumnya lebih cepat daripada probing linear karena chaining hanya mencari item yang hash ke lokasi table yang sama Dengan linear probing, jumlah entri tabel adalah terbatas pada ukuran tabel, sedangkan linked list yang digunakan dalam chaining bertambah sesuai dgn yang diperlukan Untuk menghapus elemen, hanya dengan menghapusnya dari daftar terkait.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
As the number of entries in the hash table increases, search performance deteriorates. Rehashing increases the hash table size when the number of entries in the table is a specified percentage of its size.
Rehashing
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The generic class Hash stores elements in a hash table using chaining with separate lists and implements the Collection interface.
hashCode() must be provided by the generic type. The constructor creates a hash table with initial size 17. The table grows as rehashing occurs. The method toString() returns a commaseparated list that, by the nature of hashing, is not ordered.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The hash table is an array whose elements are the first node in a singly linked list.
Define an inner class Entry with an integer field hashValue that stores the hash code value and avoids recomputing the hash function during rehashing.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The Entry array, table, defines the singly-linked lists that store the elements. The integer variable hashTableSize specifies the number of entries in the table. The variable tableThreshold has the value
(int)(table.length * MAX_LOAD_FACTOR)
where the double constant MAX_LOAD_FACTOR specifies the maximum allowed ratio of the elements in the table and the table size.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
MAX_LOAD_FACTOR = 0.75 (number of hash table entries is 75% of the table size) is generally a good value. When the number of elements in the table equals tableThreshold, a rehash occurs. The variable modCount is used by iterators to determine whether external updates may have invalidated the scan.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The Hash class constructor creates the 17-element array table with 17 empty lists. A rehash will first occur when the hash collection size equals 12.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
// construct an empty hash table with 17 buckets public Hash() { table = new Entry[17]; hashTableSize = 0; tableThreshold = (int)(table.length * MAX_LOAD_FACTOR); } . . .
}
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Compute the hash index for the parameter item and scan the list to see if item is currently in the hash table. If so, return false. Create a new Entry with value item and insert it at the front of the list.
hashValue is assigned to the entry so it will not have to be computed when rehashing occurs.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Increment hashTableSize and modCount. If hashTableSize tableThreshold, call rehash(). The size of the new table is
2*table.length + 1
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The method rehash() takes the size of the new hash table as an argument performs rehashing.
Create a new table with the specified size and cycle through the nodes in the original table. For each node, use the hashValue field modulo the new table size to hash to the new index. Insert the node at the front of the linked list.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Hash remove()
Compute the hash table index. Using variables prev and curr that move through the linked list in tandem, search for item. If not present, return false; otherwise, remove item from the list. If prev == null, this involves updating table[index] to reference the successor to the front of the list. Decrement hashTableSize, increment modCount, and return true.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
return true;
}
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
return false;
}
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Search the hash table for the first nonempty bucket in the array of linked lists. Once the bucket is located, the iterator traverses all of the elements in the corresponding linked list and then continues the process by looking for the next nonempty bucket. The iterator reaches the end of the table when it reaches the end of the list for the last nonempty bucket.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Iterator objects are instances of the inner class IteratorImpl whose variables are:
Integer index that identifies the current bucket (table[index]) scanned by the iterator. The Entry reference next pointing to the current node in the current bucket. The variable lastReturned that references the last value returned by next(). The iterator variable expectedModCount used in conjunction with the collection variable modCount.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The elements enter the collection in the order (19, 32, 11, 27) using the identify hash function. The iterator visits the elements in the order (11, 32, 27, 19).
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
A loop iterates up the list of buckets until it locates the first nonempty bucket. The loop variable i becomes the initial value for index and table[i] references the front of the list. This is the initial value for next.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
}
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The method next() first determines that the operation is valid by checking that modCount and expectedModCount are equal and that we are not at the end of the hash table. If the iterator is in a consistent state, next() saves entry.value in lastReturned and uses a loop index i and entry to perform the iterator scan for the subsequent element in the hash table. The return value is lastReturned.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The remove() method first determines that the operation is valid by checking that lastReturned is not null and that modCount and expectedModCount are equal. If all is well, the iterator remove() method calls the Hash class remove() method with lastReturned as the argument. By assigning to expectedModCount the current value of modCount, the iterator remains consistent.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The design of the HashMap collection is similar to the implementation of TreeMap. A HashMap is not ordered since the position of elements depends on hashing the keys. This affects the method toString() which returns a listing of the elements based on the iterator order.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The HashMap class stores elements in a hash table containing linked lists of Entry objects. The inner class Entry contains key-value pairs.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The inner class Entry implements the Map.Entry interface which defines the methods getKey(), getValue() and setValue(). A toString() method returns a representation of an entry in the format "key=value". The constructor has arguments for each field in the node.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
}
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The methods get(), and containsKey() take a key reference argument and must locate a corresponding entry in the map.
This task is performed by the private HashMap method getEntry() which takes a key as an argument, applies the hash function to the key and searches the resulting list for a key-value pair with the same key.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
return null;
}
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Construct a table index by applying the hash function for the key and scan the linked list for a match with the key. If a match occurs, apply setValue() and return its result. If key does not occur in the list, insert a new Entry object at the front of the linked list. If the hash map size has reached the table threshold, apply rehashing. Conclude by returning null.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
return null;
}
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The HashSet class uses a HashMap by composition. The class defines a static Object reference called PRESENT. This becomes the value component for each entry in the map. The constant reference serves as a dummy placeholder in an entry pair.
HashSet Class
Declare a private instance variable map of type HashMap having T as the type of the set elements and Object as the value type. The constructor instantiates the map collection. This has the effect of creating an empty set.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
HashSet add()
The set methods are implemented with map methods that use the entry <item, PRESENT> as the argument.
add() uses the map method put(). If a duplicate exists, then put() simply updates the value field of the entry to PRESENT which is its current value. The map method returns null if a new element is added, so a return value of null indicates that the add() inserted item.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
HashSet iterator()
The HashSet iterator must traverse the keys in the map. Implement the method iterator() by returning an iterator for the key set collection view of the map.
// returns an iterator for the elements in the set public Iterator<T> iterator() { return map.keySet().iterator(); }
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
HashSet remove()
The HashSet remove() method calls the remove() method for the map. To determine whether an element was removed from the set, verify that the return value from the map remove() call is the reference PRESENT.
public boolean remove(Object obj) { return map.remove(obj) == PRESENT; }
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
A good hash function provides a uniform distribution of hash values. Hash table performance is measured by using the load factor = n/m, where n is the number of elements in the hash table and m is the number of buckets.
For linear probe, 0 1. For chaining with separate lists, it is possible that > 1.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
The worst case linear probe or chaining with separate lists occurs when all data items hash to the same table location. If the table contains n elements, the search time is O(n), no better than that for the sequential search.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Assume that the hash function uniformly distributes indices around the hash table.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Assume the number of elements n in the hash table is bounded by some amount, say, R*m, where m is the table size.
In this case, = n/m (R*m)/m = R, and the following relationships hold for the average cases, so the average running time is O(1)!
S 1 + /2 1 + R/2 U = R (Successful Search) (Unsuccessful Search)
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Use an ordered set or map if an iteration should return elements in order (average search O(log2n). Use an unordered set or map when fast access and updates are needed without any concern for the ordering of elements (average search time O(1)).
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Timing Example
Program SearchComp.java:
Reads a file of 25025 randomly ordered words and inserts each word into a TreeSet and into a HashSet. Determines the amount of time required to build both of the data structures. Shuffles the input from the file and times a search of the TreeSet and HashSet for each word in the shuffled input. Displays the time required for each search technique.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.
Note that the HashSet search time is considerably better than that for a TreeSet.
2005 Pearson Education, Inc., Upper Saddle River, NJ. All rights reserved.