
com264/com6471

Assignment

Assignment 2 (70 points)

Purpose

- Create your own Java implementation of a linked list that involves a recursive structure;
- Create your own Java implementation of a hash table and test its functions;
- Compare various hash functions and hashing strategies;
- Design a sorting algorithm;
- Implement inheritance in hash functions;
- Exercise creation of Java classes and objects, and their composition.

Overview

A hashtable is an effective and practical data structure that realises dictionary operations in O(1) time on average. In this assignment, you will first implement a simple hashtable. The hashtable plays a major role in the second part of the assignment, where a list of vocabulary words is created from a given text file.

Your hashtable will have the following structure. First, a hash function is used to derive a hashtable index that is specific to a given word. Ideally, different words are mapped to different hashtable indices; however, some amount of clustering (i.e., different words mapped to the same index) is often unavoidable. In order to alleviate the clustering issue, we store each individual word in a linked list selected by the hash function.

A linked list has a recursive structure, and each object in the list consists of four instance fields: index is an integer that increases each time a new object is created, word is a string entry, count carries the number of occurrences of that string, and finally next points to the next object in the list (null if it is the last object). When a new word arrives, a new object is attached to the last object of the linked list selected by the hash function. If the word already exists in the list, its count is incremented.
[Figure: a hashtable of size m, with hashtable indices 0 to m − 1, implemented as an array of dummy objects indexed 0; each entry heads a linked list of length n. Format of an object in the linked list: index, word, count, next.]

You may find further detail of a hashtable and a hash function by searching the web; Wikipedia may be a good place to start. The tasks here involve the creation of a few classes (NewLinkedList, HashFunction and NewHash) that provide the functionality of a hashtable, followed by a fourth piece of Java code, WordList.java, for counting vocabulary words.

Important note: You are not allowed to use any classes except:

- classes of your own;
- classes from the java.lang package;
- classes from the java.io package;
- the Scanner class from the java.util package;
- EasyReader and EasyWriter from the sheffield package.
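The recursive linked-list object described in the overview can be sketched as follows. This is only one possible reading of the handout, not the provided template: the class is left package-private for brevity, a new word is assumed to start with a count of 1, and a dummy head (index 0, word null, count 0) is assumed. The demo class name is hypothetical. Only java.lang is used.

```java
// Sketch only: field and method names follow the handout; the bodies are assumptions.
class NewLinkedList {
    private final int index;      // position in the list (0 for the assumed dummy head)
    private final String word;    // null for the assumed dummy head
    private int count;            // number of occurrences of word
    private NewLinkedList next;   // null if this is the last object

    public NewLinkedList(String w, int j) {
        word = w;
        index = j;
        count = (w == null) ? 0 : 1;  // assumption: a real word starts with one occurrence
        next = null;
    }

    // Increments count on a match, otherwise recurses or appends a new object.
    public void setWord(String w) {
        if (w.equals(word)) {
            count++;
        } else if (next != null) {
            next.setWord(w);
        } else {
            next = new NewLinkedList(w, index + 1);
        }
    }

    // Returns the count for w, or 0 if w is not in this or any following object.
    public int getCount(String w) {
        if (w.equals(word)) {
            return count;
        }
        return (next != null) ? next.getCount(w) : 0;
    }

    // The index of the last object equals the length of the list.
    public int getLength() {
        return (next == null) ? index : next.getLength();
    }
}

// Hypothetical demo class, not part of the assignment.
class NewLinkedListDemo {
    public static void main(String[] args) {
        NewLinkedList head = new NewLinkedList(null, 0);  // dummy head
        for (String w : "the cat saw the dog".split(" ")) {
            head.setWord(w);
        }
        System.out.println(head.getCount("the") + " " + head.getLength());  // prints "2 4"
    }
}
```

Each method handles only the current object and delegates the rest of the list to next, which is what makes the structure recursive.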
Foundations of Object Oriented Programming


Requirements

Implementation of your own linked list (Task 1) and hash function (Task 2) is the basic part. Task 3 can be flexibly programmed as long as you use your implementation of the linked list and the hash function. It also involves the creation of an HTML document. You must use your implementation of the hash table when counting vocabulary items in Task 4. Finally, you are required to experiment with the functionality of the hash table you implemented.

Task 1

In a linked list, objects are arranged in a linear order, determined by a reference in each object, providing a flexible representation for dynamic sets. Your first task is to write a class NewLinkedList that provides the following methods. As a starter, a template NewLinkedList.template.java is provided on MOLE. You are free to add other methods if you find them useful for your implementation.

- public NewLinkedList(String w, int j) is the constructor that sets word and index from its parameters. It also sets initial values for count and next.
- public void setWord(String w) increments count if w matches word. If they are different and the next object exists, w is passed on to the next object. Otherwise, it creates a new object for w.
- public boolean isWord(String w) returns true if w matches word. If they are different and the next object exists, w is passed to the next object. The method eventually returns false only when w does not match any of this and the following objects.
  Note: This method is completed in the template.
- public int getCount(String w) returns count if w is equal to word. If they are different and the next object exists, w is passed to the next object. If w is found in one of the following objects, the corresponding count is returned. The method eventually returns 0 if w does not match any of this and the following objects.
- public int getLength() returns index if the next object does not exist. Otherwise, it tests the next object.
  Note: The method is able to find the final index, which is equivalent to the length, of the linked list.
- public String getWord(int j) returns word if j matches index. Otherwise, j is passed to the next object.
  Note: The method is able to find the word string specified by an index in the list.
- public String toString() first displays the attributes of the next object in the list if it exists. It also displays the attributes of this object.

Task 2

A good hash function should result in a uniform distribution of hashtable indices so that the adverse effect of clustering may be reduced. However, this is not a straightforward problem. The task requires you to implement a superclass HashFunction and multiple subclasses, each of which provides a single implementation of a hash function algorithm, including the use of the first letter, the division method and its Knuth variant outlined below. They are simple schemes, although they may not be as good as other carefully designed functions:

- use of the first letter: simply returns the Unicode value of the first letter in the string;
- division method: returns the remainder when the key k is divided by an integer divisor d, i.e., h(k) = k mod d. One approach to producing the key k from a string w may be to sum the Unicode values of all characters in the string;
- Knuth variant: returns h(k) = k(k + 3) mod d.

You are also encouraged to investigate and implement alternative algorithms, such as the multiplication method, to obtain various hashing values. An appropriate set of parameters and a return value should be considered. You should choose the right name for each subclass and its methods.
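One way to arrange the inheritance for the three schemes above is sketched below. All subclass names, the hash and key method names, and the choice to let the Knuth variant extend the division method are assumptions made for illustration; the handout leaves naming and design to you. The key k is formed by summing Unicode values, as the handout suggests, and words are assumed to be non-empty.

```java
// Sketch only: every name except HashFunction is hypothetical.
abstract class HashFunction {
    // Maps a word to a non-negative integer hash value.
    abstract int hash(String w);

    // One way to form the key k: the sum of the Unicode values of all characters.
    protected int key(String w) {
        int k = 0;
        for (int i = 0; i < w.length(); i++) {
            k += w.charAt(i);
        }
        return k;
    }
}

// Use of the first letter: the Unicode value of the first character.
class FirstLetterHash extends HashFunction {
    @Override
    int hash(String w) {
        return w.charAt(0);  // assumes w is not empty
    }
}

// Division method: h(k) = k mod d.
class DivisionHash extends HashFunction {
    protected final int d;  // integer divisor, assumed positive

    DivisionHash(int d) {
        this.d = d;
    }

    @Override
    int hash(String w) {
        return key(w) % d;
    }
}

// Knuth variant: h(k) = k(k + 3) mod d.
class KnuthHash extends DivisionHash {
    KnuthHash(int d) {
        super(d);
    }

    @Override
    int hash(String w) {
        long k = key(w);  // long avoids int overflow in k(k + 3)
        return (int) ((k * (k + 3)) % d);
    }
}
```

For the word "cat" the key is 99 + 97 + 116 = 312. With d = 11, the division method gives 312 mod 11 = 4 and the Knuth variant gives (312 × 315) mod 11 = 6, while the first-letter scheme returns 99, which a hash table would still need to reduce to a valid index.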


Task 3

The hashtable is built with linked lists of length 0 or greater. The NewHash class will utilise an array of linked lists and the methods provided by NewLinkedList. It chooses a hash function from your inventory in order to map a word string to one of the linked lists.

Note: Providing a dummy object (with index 0, word null, and count 0) for each linked list may result in a simple implementation, although it is not a requirement.

Create an HTML document that describes the NewHash class.

Task 4

Using your implementation of the NewHash class, write a program WordList.java that reads plain text from a specified input file, counts the number of occurrences of each different word, and outputs a vocabulary list into a file vocabulary.txt.

Note: You should not directly access any methods or fields provided by NewLinkedList or any hash functions that you implemented.

You may use the files small.txt, medium.txt, big.txt and massive.txt for testing your work; however, your code should be able to handle other input files. It is safe to assume that an input file contains an English text of multiple lines and arbitrary length, consisting of words separated by spaces and/or new lines. For simplicity, all words are spelt using lower case letters only. Digits are replaced with spelt words (e.g., "year nineteen ninety" instead of "year 1990", or "two point five million pounds" instead of "2.5M"). All punctuation is removed except the full stop (.) and the apostrophe (').

The vocabulary list is in a two-column format: words in the first column and their frequencies in the second. It is first sorted in frequency order, i.e., words that occurred more frequently appear at a higher rank. Words with the same frequency should be ordered alphabetically.

Evaluation

Test various hash functions and hashing strategies by observing how words are stored in the hash table. Are they distributed nicely (uniformly), or clustered locally? Is there any dependency on something (e.g., parameter values, input text)?
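The ordering rule for the vocabulary list in Task 4 (higher frequency first, ties broken alphabetically) can be illustrated with a small insertion sort over two parallel arrays. This is a sketch under stated assumptions, not a prescribed design: the class name VocabularySort, its method names, and the parallel-array representation are all hypothetical, and insertion sort is chosen only because it is short and needs nothing beyond java.lang.

```java
// Sketch only: class, method names and the parallel-array layout are assumptions.
class VocabularySort {
    // True if entry (w1, c1) should be listed before entry (w2, c2).
    static boolean comesBefore(String w1, int c1, String w2, int c2) {
        if (c1 != c2) {
            return c1 > c2;           // higher frequency ranks first
        }
        return w1.compareTo(w2) < 0;  // same frequency: alphabetical order
    }

    // In-place insertion sort; words[i] and counts[i] describe the same entry.
    static void sort(String[] words, int[] counts) {
        for (int i = 1; i < words.length; i++) {
            String w = words[i];
            int c = counts[i];
            int j = i - 1;
            // Shift entries that should come after (w, c) one slot to the right.
            while (j >= 0 && comesBefore(w, c, words[j], counts[j])) {
                words[j + 1] = words[j];
                counts[j + 1] = counts[j];
                j--;
            }
            words[j + 1] = w;
            counts[j + 1] = c;
        }
    }
}
```

For example, sorting the words {"dog", "the", "cat", "a"} with counts {1, 3, 1, 3} yields a, the, cat, dog. Insertion sort takes quadratic time in the worst case, so a large vocabulary may call for a different algorithm.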
Report

The report should discuss any issues relating to design and implementation. In particular, it should outline your work in the following areas:

- Task 1: Describe how the recursive structure works in your linked list;
- Task 2: Explain your design of inheritance in hash functions and the algorithms for all hash functions that you implemented;
- Task 3: Explain how you put the linked list structure and a choice of hash functions together to create your own hash table;
- Task 4: Describe your strategies for use of the hash table and for sorting;
- Evaluation: Discuss the performance of the various hash functions and hashing strategies that you tested.

The report is a typed (i.e., not handwritten) description of your work. You may include table(s) and figure(s) if they help your discussion.

Note: It is not required to include the listing of the program code in the report. It is also not required to include the HTML document for the NewHash class; it will be tested using javadoc.


Download from MOLE: assignment2.zip contains the following files:

- assignment2.pdf
- NewLinkedList.template.java
- small.txt, medium.txt, big.txt, massive.txt

Upload to MOLE: program source code and the report:

- sourcecode2.zip: it should include all source code that you programmed, and nothing else (i.e., do not zip bytecode, the sheffield package, data files, HTML files, etc.);
- report2.pdf: pdf format please.

Paper hand-in: not required.

Late upload rule: For each day late, 3.5 points (i.e., 5%) will be deducted from your total grade. Saturday and Sunday are also counted as days. The assignment will not be graded if it is uploaded more than a week late.

Assessment criteria:

40 points are allocated for a complete set of program code:

- 28 points or more: code is fully functional for all requirements; code uses appropriate Java constructs, and displays good programming style in all places (e.g., sensible use of loops, indentation, commenting); javadoc produces a clean and informative HTML document for the NewHash class;
- 20 to 27 points: code compiles without an error; code is mostly or fully functional, producing the correct outputs; code uses appropriate Java constructs, and displays good programming style in most places; javadoc produces a clean and informative HTML document for the NewHash class;
- 19 points or fewer: code produces compile error(s); code may be partly functional; some Java constructs are less appropriate; there are some problems with the programming style; javadoc produces an HTML document for the NewHash class.

30 points are allocated for the report:

- 21 points or more: the report is clean, compact and well documented; clear, consistent and detailed discussions are made for all tasks and the evaluation;
- 15 to 20 points: the report is mostly clean, compact and well documented; discussions are mostly clear, consistent and sufficiently detailed;
- 14 points or fewer: the report is not very clean, not very compact or not very well documented; some important discussions are missing, less clear or inconsistent.

Plagiarism: Any work that you hand in for the assignments must be your own. Plagiarism will result in a mark of 0. Knowingly allowing someone else to copy your work will also result in a mark of 0, as it would be facilitating and encouraging plagiarism. Code in the electronic hand-in directory may be scanned by plagiarism detection software.
