COMP09044 Algorithms and Collections

Coursework 2014/15
Amending and Experimenting with the BinarySearchTree class
Note that the development work for this coursework may be completed in
pairs (recommended, but you may NOT work in groups larger than two) or
individually.
If you work as a pair, this means working on all four parts of the
submission together: the report is an individual report, but the
development, testing and experimentation should be done jointly. It is not
intended that you delegate different parts of the implementation work to
partners individually, though it is acceptable to identify who will do the
lead work for each function/part (as in, who types the code into the
computer once the algorithm/logic has been agreed and who sits and watches
over their shoulder!). You may also wish to decide that where one person
takes the lead on implementing some function, the other person takes the
lead responsibility for testing it.
Part A - Extended Binary Search Trees
In the lecture based on chapter 9 of the textbook by Collins we met the
concept of the external path length of a tree and the External Path Length
Theorem. In the context of trees the material as presented in the
book/lecture is not all that interesting as the lower bound given by the
theorem for the external path length is typically much lower than the actual
external path length for a tree. It is noted at the end of section 9.4 that for a
two-tree the external path length can be proven to be at least k log2 k
(where k is the number of leaves, which we will now call external nodes)
rather than (k/2) floor(log2 k), and this is a more useful lower bound.
Many theoretical treatments of binary trees are framed in terms of an
extended binary tree in which leaves are appended wherever possible to the
nodes of the original tree (this is sometimes called decorating the tree). The
original nodes in such an extended tree are called internal nodes (and in
diagrams are usually represented by circles) and the appended nodes are
called external nodes (in diagrams, usually represented by squares). Among
the benefits that these treatments provide is that every extended binary
search tree is a two-tree and each node represents a distinct search outcome:
a successful search terminates at an internal node and an unsuccessful
search terminates at an external node. This contrasts with the situation
where an unsuccessful search for an element in a binary search tree without
external nodes terminates when the search falls off the tree.
To illustrate and expand on some of these points, consider the extended
binary search tree formed by inserting the sequence: 72, 31, 44, 87, 37, 75,
60, 24 into an initially empty tree.

COMP09044 Cswk

Page 1

For simplicity, assume that the range of allowed values in the tree is 1-100.
The tree is shown below.
Internal path length (I) = (2x1) + (3x2) + (2x3) = 14
External path length (E) = (1x2) + (4x3) + (4x4) = 30
For an extended tree, E = 2n + I. For this tree: 30 = (2x8) + 14
The n+1 external nodes are notionally labelled with the values
that could be inserted at that position (assuming no duplicates
are allowed):
A = 1-23, B = 25-30, C = 32-36, D = 38-43, E = 45-59
F = 61-71, G = 73-74, H = 76-86, I = 88-100.

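[Figure: the extended binary search tree, listed by level]
Level 0: 72
Level 1: 31, 87
Level 2: 24, 44, 75, I
Level 3: A, B, 37, 60, G, H
Level 4: C, D, E, F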

The labelling of the external nodes is purely for illustrative purposes, and
indicates where a search for any of the labelled values would terminate (so,
for example, a search for the value 22 would terminate in the external node
labelled A). In completing this coursework you will not need to store any
information in an external node (apart from the link to its parent).
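The path-length figures quoted for the example tree can be checked mechanically. The sketch below is only a checking aid and is not the BinarySearchTree<E> class from the lecture: the class PathLengthDemo, its Node type and the convention that a null key marks an external node are all assumptions made for illustration.

```java
/** Hypothetical checking aid; not part of the BinarySearchTree class. */
class PathLengthDemo {

    /** Minimal node for this sketch: key == null marks an external node. */
    static final class Node {
        Integer key;
        Node left, right;
    }

    /** Leaf insertion: the external node that the search reaches becomes internal. */
    static void insert(Node root, int key) {
        Node cur = root;
        while (cur.key != null) {
            if (key == cur.key) return;              // duplicates are ignored
            cur = key < cur.key ? cur.left : cur.right;
        }
        cur.key = key;
        cur.left = new Node();
        cur.right = new Node();
    }

    /** Returns {n, I, E} for the subtree rooted at node, which sits at the given depth. */
    static int[] measure(Node node, int depth) {
        if (node.key == null) return new int[] {0, 0, depth};
        int[] l = measure(node.left, depth + 1);
        int[] r = measure(node.right, depth + 1);
        return new int[] {1 + l[0] + r[0], depth + l[1] + r[1], l[2] + r[2]};
    }

    /** Builds the example tree from the text and measures it. */
    static int[] measureSample() {
        Node root = new Node();                      // an empty tree is one external node
        for (int k : new int[] {72, 31, 44, 87, 37, 75, 60, 24}) insert(root, k);
        return measure(root, 0);
    }
}
```

For the example sequence, measureSample() yields n = 8, I = 14 and E = 30, which agrees with E = 2n + I.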
The first part of the coursework is to amend the BinarySearchTree<E> class
discussed in the lecture so that it represents an extended binary search tree.
Whenever a new value is inserted into the tree the result is that an external
node is replaced by an internal node containing that value and two external
nodes (where the external nodes are the left and right children of the new
internal node).
To ensure that the Entry<E> class supports both internal and external nodes,
add the following methods to the Entry<E> class:
public boolean isExternal(); /* returns true if this entry is an
external node and false otherwise */
public E makeExternal(); /* converts this internal node to an
external node and returns the element that the internal node
contained (used when deleting an internal node that has no
internal nodes as children) */
public void makeInternal(E element); /* converts this external
node to an internal node containing the given element and adds
two new external nodes as the left and right children of the node
(used when inserting an element into the tree) */
It may also simplify your solution if you ensure that the Entry<E> class has a
constructor that will allow you to create an internal node and another
constructor that will allow you to create an external node.
For external nodes the element and left and right links will all be null (an
external node is always a leaf), for an internal node none of these fields will
be null (as the BinarySearchTree class will not allow null elements to be
inserted, and an internal node is guaranteed to have two children).
Both internal and external nodes have a link to their parent (except for the
root of the tree, of course).
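One possible shape for such a node is sketched below. It is a starting point under stated assumptions and not the required solution: the class name EntrySketch, the package-private fields and the choice to throw exceptions on misuse are all illustrative, and the real Entry<E> class from the lecture may differ.

```java
/** Sketch of a node that can play either role; all names are assumptions. */
class EntrySketch<E> {
    E element;                           // null for an external node
    EntrySketch<E> left, right, parent;  // left/right are null for an external node

    /** Creates an external node (parent is null only for the root of an empty tree). */
    EntrySketch(EntrySketch<E> parent) {
        this.parent = parent;
    }

    /** Creates an internal node holding element, with two new external children. */
    EntrySketch(E element, EntrySketch<E> parent) {
        this(parent);
        makeInternal(element);
    }

    public boolean isExternal() {
        return element == null;
    }

    public void makeInternal(E element) {
        if (element == null) throw new NullPointerException("null elements are not allowed");
        if (!isExternal()) throw new IllegalStateException("node is already internal");
        this.element = element;
        left = new EntrySketch<>(this);
        right = new EntrySketch<>(this);
    }

    public E makeExternal() {
        if (isExternal() || !left.isExternal() || !right.isExternal())
            throw new IllegalStateException("node must be internal with two external children");
        E old = element;
        element = null;
        left = null;
        right = null;
        return old;
    }
}
```

Using a null element as the marker works here only because the tree rejects null elements; a separate boolean flag would be an equally reasonable design.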
To allow some information about the tree to be gathered for testing
and experimental purposes:
Add a method to the BinarySearchTree<E> class to allow the elements to be
displayed in the order they would be visited in a breadth-first traversal of the
tree (it is easy to work out the tree structure from this). You can think about
whether you want a method that displays the tree directly or whether to
override the toString() method so that it returns a String representation of
the tree that can be displayed. You may find it helpful if the display includes
the external nodes. See pages 391-392 of the book for a description of an
iterative implementation of a breadth-first traversal using a queue.
Add a method to the BinarySearchTree<E> class that returns the height of
the tree (include the external nodes when calculating this).
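As an illustration of the two methods just described, here is a minimal sketch of a queue-based breadth-first listing and a height calculation on a stripped-down extended tree. The class TraversalSketch, the "[]" marker for external nodes and the convention that a lone external node has height 0 are assumptions for this example, not requirements of the coursework.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical demo class; key == null marks an external node. */
class TraversalSketch {
    static final class Node {
        Integer key;
        Node left, right;
    }

    /** Leaf insertion into an extended tree (duplicates ignored). */
    static void insert(Node root, int key) {
        Node cur = root;
        while (cur.key != null) {
            if (key == cur.key) return;
            cur = key < cur.key ? cur.left : cur.right;
        }
        cur.key = key;
        cur.left = new Node();
        cur.right = new Node();
    }

    /** Lists the nodes level by level, showing external nodes as "[]". */
    static String breadthFirst(Node root) {
        List<String> out = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            if (n.key == null) {
                out.add("[]");
                continue;                 // external nodes have no children to enqueue
            }
            out.add(String.valueOf(n.key));
            queue.add(n.left);
            queue.add(n.right);
        }
        return String.join(" ", out);
    }

    /** Height counting external nodes; a lone external node has height 0 (assumed convention). */
    static int height(Node node) {
        if (node.key == null) return 0;
        return 1 + Math.max(height(node.left), height(node.right));
    }

    /** The example tree built from 72, 31, 44, 87, 37, 75, 60, 24. */
    static Node sample() {
        Node root = new Node();
        for (int k : new int[] {72, 31, 44, 87, 37, 75, 60, 24}) insert(root, k);
        return root;
    }
}
```

For the example tree, breadthFirst(sample()) returns "72 31 87 24 44 75 [] [] [] 37 60 [] [] [] [] [] []" and height(sample()) returns 4.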
Very important: Document the changes you have made to the
BinarySearchTree<E> class and write an application that tests that the
changes you have made behave as required. Make sure that you record
sample test cases (you should be able to copy the output from the Eclipse
console and paste it into a Word document, for example) and include
them, and a discussion of them, in your submission.
Part B - Insertion at the Root
The standard insertion algorithm you have studied inserts each new node at
the bottom of the tree; in Part A you amended the BinarySearchTree<E>
class so that (except perhaps when inserting the first item into an empty
tree) an insertion always inserts into an external node in the extended tree.
An alternative idea, involving the use of left and right rotations, is to revise
the insertion algorithm so that it inserts at the root, and this has the
potential advantage that the most recently inserted items are near the top of
the tree. If an application is more likely to search for elements that have
been inserted recently this approach should reduce the number of
comparisons required to find the element.
If new elements are inserted at the root, rather than in a leaf, the
sequence of insertions used for the extended tree above would result in
this tree:
Internal path length (I) = (1x1) + (2x2) + (4x3) = 17
External path length (E) = (1x1) + (8x4) = 33
E = 2n + I = (2x8) + 17 = 33
A = 1-23, B = 25-30, C = 32-36, D = 38-43, E = 45-59
F = 61-71, G = 73-74, H = 76-86, I = 88-100.

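[Figure: the extended tree produced by root insertion, listed by level]
Level 0: 24
Level 1: A, 60
Level 2: 37, 75
Level 3: 31, 44, 72, 87
Level 4: B, C, D, E, F, G, H, I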

Note that the last four elements added were 37, 75, 60 and 24 and these are
indeed the values closest to the root, with the last added value being in the
root node.
As we have not discussed root insertion at all algorithmically (though we
have discussed rotation in connection with AVL trees) here is a starting point,
courtesy of Robert Sedgewick (the code in the two boxes is from chapter
twelve of his book, Algorithms in Java, Parts 1-4). His Node class has a field
called l for the left child and a field called r for the right child, and less(x,y)
returns true if x is less than y and false otherwise. Note that you will need to
think about how to deal with the parent, which Sedgewick's code does not
deal with (the book/lecture did include discussion of this so look it up in one
of those places if you need help). It is vital that the parent links are correctly
updated after a rotation, as the successor() method relies on them.
Program 12.18 Rotations in BSTs
The twin routines perform the rotation operation on a BST. A right rotation
makes the old root the right subtree of the new root (the old left subtree of
the root); a left rotation makes the old root the left subtree of the new root
(the old right subtree of the root).
private Node rotR(Node h) {Node x = h.l; h.l = x.r; x.r = h; return x;}
private Node rotL(Node h) {Node x = h.r; h.r = x.l; x.l = h; return x;}
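Sedgewick's rotations ignore parent links. The sketch below shows one way the same rotations might be written when each node also stores its parent. It is an assumption-laden illustration and not the textbook's code: the class RotationSketch, the helper attach() and the plain int keys are invented for this example, and the caller is still responsible for updating the tree's root field when the rotated node had no parent.

```java
/** Hypothetical parent-aware rotations; names are assumptions. */
class RotationSketch {
    static final class Node {
        final int key;
        Node left, right, parent;
        Node(int key) { this.key = key; }
    }

    /** Hangs child under parent on the chosen side and fixes the child's parent link. */
    static void attach(Node parent, Node child, boolean asLeft) {
        if (asLeft) parent.left = child; else parent.right = child;
        if (child != null) child.parent = parent;
    }

    /** Right rotation about h: h's left child x becomes the subtree root. */
    static Node rotateRight(Node h) {
        Node x = h.left;
        Node above = h.parent;
        boolean wasLeft = above != null && above.left == h;
        attach(h, x.right, true);     // x's right subtree becomes h's left subtree
        attach(x, h, false);          // h becomes x's right child
        x.parent = above;             // reconnect x to the rest of the tree
        if (above != null) {
            if (wasLeft) above.left = x; else above.right = x;
        }
        return x;
    }

    /** Left rotation about h: the mirror image of rotateRight. */
    static Node rotateLeft(Node h) {
        Node x = h.right;
        Node above = h.parent;
        boolean wasLeft = above != null && above.left == h;
        attach(h, x.left, false);     // x's left subtree becomes h's right subtree
        attach(x, h, true);           // h becomes x's left child
        x.parent = above;
        if (above != null) {
            if (wasLeft) above.left = x; else above.right = x;
        }
        return x;
    }
}
```

Each rotation changes three parent links: that of the subtree that moves across, that of the old subtree root, and that of the new subtree root.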

Program 12.19 Root insertion in BSTs


With the rotation methods of Program 12.18, a recursive method that inserts
a new node at the root of a BST is immediate: Insert the new item at the
root in the appropriate subtree, then perform the appropriate rotation to
bring it to the root of the main tree:
private Node insertT(Node h, ITEM x) {
    if (h == null) return new Node(x);
    if (less(x.key(), h.item.key())) { h.l = insertT(h.l, x); h = rotR(h); }
    else { h.r = insertT(h.r, x); h = rotL(h); }
    return h;
}
public void insert(ITEM x) { root = insertT(root, x); }
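To see the two programs working together, the sketch below applies them to the example sequence using plain int keys. The class RootInsertTrace and its helpers build() and preorder() are assumptions added for this illustration. Like Sedgewick's code, it uses null children rather than external nodes, has no parent links and does not reject duplicates, so it is a trace of the idea and not a solution to Part B.

```java
/** Hypothetical trace of root insertion on int keys; names are assumptions. */
class RootInsertTrace {
    static final class Node {
        final int key;
        Node l, r;
        Node(int key) { this.key = key; }
    }

    static Node rotR(Node h) { Node x = h.l; h.l = x.r; x.r = h; return x; }
    static Node rotL(Node h) { Node x = h.r; h.r = x.l; x.l = h; return x; }

    /** Inserts key at the root of the subtree h and returns the new subtree root. */
    static Node insertT(Node h, int key) {
        if (h == null) return new Node(key);
        if (key < h.key) { h.l = insertT(h.l, key); h = rotR(h); }
        else { h.r = insertT(h.r, key); h = rotL(h); }
        return h;
    }

    /** Builds a tree by inserting the keys at the root in the order given. */
    static Node build(int... keys) {
        Node root = null;
        for (int k : keys) root = insertT(root, k);
        return root;
    }

    /** Pre-order listing of the keys, each followed by a space. */
    static String preorder(Node h) {
        if (h == null) return "";
        return h.key + " " + preorder(h.l) + preorder(h.r);
    }
}
```

Calling build(72, 31, 44, 87, 37, 75, 60, 24) gives a tree whose pre-order listing is 24 60 37 31 44 75 72 87, with 24 at the root and no left child, which matches the root-insertion tree described earlier.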

Modify your solution to Part A to provide a constructor for the
BinarySearchTree<E> class that takes a boolean parameter which, if true,
means that the add() method inserts at the root and, if false, means that
it inserts in an external node. Do not provide any method to change the
insertion method once a tree has been created. This means that when a
BinarySearchTree is created it will either always insert at the root or
always in a leaf. You may also retain the default constructor, which takes
no parameter; you should ensure that by default the tree will insert using
the standard algorithm (not at the root). Note that, whether insertion is
at the root or in an external (leaf) node, the tree should still be an
extended binary search tree that does not allow duplicate values.
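One simple way to meet the "fixed at construction" requirement is a final field set by the constructor. The fragment below is a sketch with assumed names (InsertionModeSketch, insertAtRoot, insertsAtRoot) and it omits the tree itself.

```java
/** Hypothetical fragment showing only the insertion-mode plumbing. */
class InsertionModeSketch {
    // final: the mode cannot change once the tree has been created
    private final boolean insertAtRoot;

    /** Default constructor: standard insertion in an external node. */
    InsertionModeSketch() {
        this(false);
    }

    /** If insertAtRoot is true, add() would insert at the root. */
    InsertionModeSketch(boolean insertAtRoot) {
        this.insertAtRoot = insertAtRoot;
    }

    /** Read-only query, useful in tests; there is deliberately no setter. */
    boolean insertsAtRoot() {
        return insertAtRoot;
    }
}
```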
Once again, document the changes you make to the BinarySearchTree<E>
class and provide examples and a discussion of the test cases you used to
reassure yourself that the modified class behaves as required (and to do
that, you will need to include checks that methods of the tree that you have
not changed still work).

Part C - Some experiments with BSTs
This part of the coursework is relatively open-ended. You will investigate the
number of comparisons necessary to search for items using the tree
implementations you have written for parts A and B against those required
for the red-black tree implementation in the concrete class java.util.TreeSet.
You will also investigate the effect on average tree height and number of
comparisons after large numbers of insertion/deletion pairs have been
executed for these implementations.
To perform your tests use the following class, which you can find in the
Assignments section on the Moodle site for the module:

/**
 * Provides an immutable integer valued item that counts
 * comparisons.
 */
public class Item implements Comparable<Item> {

    private static long compCount = 0;
    private final Integer value;

    /**
     * Constructor - creates an Item and sets its value
     * @param value - the value for the Item
     */
    public Item(Integer value) {
        this.value = value;
    }
    /**
     * The value of this Item
     * @return the Item's value
     */
    public Integer value() {
        return value;
    }
    /**
     * Compares the value of this Item with that of other according to
     * the contract for Comparable.
     * Increments the count of comparisons.
     */
    @Override
    public int compareTo(Item other) {
        compCount++;
        return value.compareTo(other.value);
    }

    /**
     * Returns the total number of comparisons performed on instances
     * of type Item since the counter was last reset (or the total if
     * it has never been reset).
     * @return the count of calls to compareTo() and equals() for type
     * Item
     */
    public static long getCompCount() {
        return compCount;
    }
    /**
     * Resets the count of comparisons to zero.
     */
    public static void resetCompCount() {
        compCount = 0;
    }
    ...

    // a few more methods here
}
In your investigation you should write an application that creates two
instances of your modified BinarySearchTree class, one inserting at the root
and one inserting in a leaf. The application should also create an instance of
java.util.TreeSet. In all cases, the element type for the trees should be Item.
Using randomly generated Items to construct the trees, plot the average
number of comparisons required to successfully search for an Item against
the size of the tree for each of the three trees. Do similar plots for the
average number of comparisons required for an unsuccessful search. You
should base these averages on a number of searches of each kind (successful
and unsuccessful) to avoid biasing your results. If you based an average on
just one or two searches, you might by chance pick a value that is near the
root (or far from it) in one tree but not in the others. For unsuccessful
searches, the results would also be biased if the same path were always
followed, as would happen if, for example, you always searched for values
larger than any value in the tree. You can use an application such as Excel
to plot the data. Include tree sizes ranging from at least ten to a million
elements. Discuss the results against the expected outcome.
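The measurement described above might be structured along the following lines, shown here for java.util.TreeSet only. This is a sketch under assumptions: the class ComparisonCountDemo and the method averageSuccessful() are invented for illustration, and the nested Item is a cut-down copy of the handout's class so that the example compiles on its own. In the real experiment you would use the Item class from Moodle and repeat the same measurement for each of your own trees.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

/** Hypothetical measurement harness; names are assumptions. */
class ComparisonCountDemo {

    /** Cut-down copy of the handout's Item, included so the sketch is self-contained. */
    static final class Item implements Comparable<Item> {
        private static long compCount = 0;
        private final Integer value;
        Item(Integer value) { this.value = value; }
        @Override
        public int compareTo(Item other) {
            compCount++;
            return value.compareTo(other.value);
        }
        static long getCompCount() { return compCount; }
        static void resetCompCount() { compCount = 0; }
    }

    /** Average number of comparisons per successful search in a TreeSet of the given size. */
    static double averageSuccessful(int size, int searches, long seed) {
        Random rnd = new Random(seed);
        TreeSet<Item> tree = new TreeSet<>();
        List<Integer> present = new ArrayList<>();
        while (tree.size() < size) {
            int v = rnd.nextInt(Integer.MAX_VALUE);
            if (tree.add(new Item(v))) present.add(v);   // remember values that are in the tree
        }
        Item.resetCompCount();                           // discard comparisons made while building
        for (int i = 0; i < searches; i++) {
            int v = present.get(rnd.nextInt(present.size()));
            if (!tree.contains(new Item(v))) throw new IllegalStateException("value should be present");
        }
        return (double) Item.getCompCount() / searches;
    }
}
```

Resetting the counter after building the tree matters, because insertions also call compareTo(). For unsuccessful searches, one option (an assumption, not a requirement) is to build the tree from even values only and search for random odd values, so that misses are spread across all external nodes.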
Investigate and comment on the effect, if any, on the number of required
comparisons for searching for items after large numbers of insertions and
deletions have been performed on each tree. Once you have filled the tree,
keep the average size of the tree over the experiment roughly constant (that
is, keep the overall number of deletions and insertions roughly equal).
Discuss whether you expect the trees to become progressively more
unbalanced over time as items are removed and new items added and
whether your observations (for your implementations you can directly inspect
the height of the trees) confirmed this expectation. Is there a larger or
smaller effect depending on what the average size of the tree is?
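A loop of deletion/insertion pairs that holds the tree size constant could be organised as in the sketch below. It is illustrative only: the class ChurnSketch and the method churn() are assumed names, and it drives a TreeSet<Integer> so that it can run without the coursework classes. The same loop shape would apply to your own trees with Item elements, with the height and comparison counts sampled before and after.

```java
import java.util.List;
import java.util.Random;
import java.util.TreeSet;

/** Hypothetical churn driver; names are assumptions. */
class ChurnSketch {

    /**
     * Performs the given number of delete/insert pairs and returns the final size.
     * The list 'present' must hold exactly the values currently in the tree.
     */
    static int churn(TreeSet<Integer> tree, List<Integer> present, int pairs, Random rnd) {
        for (int i = 0; i < pairs; i++) {
            // delete a randomly chosen existing value (swap with last, then drop last)
            int idx = rnd.nextInt(present.size());
            int victim = present.get(idx);
            int lastIndex = present.size() - 1;
            present.set(idx, present.get(lastIndex));
            present.remove(lastIndex);
            tree.remove(victim);

            // insert a fresh value that is not already in the tree
            int v;
            do {
                v = rnd.nextInt(Integer.MAX_VALUE);
            } while (!tree.add(v));
            present.add(v);
        }
        return tree.size();
    }
}
```

Because every deletion is paired with exactly one successful insertion, the size after churn() equals the size before it.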

Investigate and comment on what, if any, advantage the tree that inserts at
the root has against the other two trees when searches involve only the most
recently inserted 10% of items.
Part D - Serialization
There is a separate handout providing information and guidance on
serialization. For this final part you are asked to have your
BinarySearchTree<E> class declare that it implements Serializable, so that
the state of a tree can be serialized by passing a tree to the writeObject()
method of an ObjectOutputStream, and deserialized by calling the
readObject() method of an ObjectInputStream and assigning the Object
returned to a BinarySearchTree variable. Marks will be awarded for this
working correctly and on the basis of the extent to which the serialized and
runtime forms of the tree are decoupled.
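One common way to decouple the serialized form from the runtime form is to mark the linked structure transient and write only the elements, rebuilding the tree on read. The sketch below illustrates this with private writeObject() and readObject() methods from the standard Java serialization mechanism. Everything else is an assumption for illustration: the class SerializableTreeSketch is a stand-in for your tree (it is not extended and has no parent links), and roundTrip() is a test helper. The separate handout on serialization remains the authority on what is expected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the coursework tree; names are assumptions. */
class SerializableTreeSketch<E extends Comparable<E>> implements Serializable {
    private static final long serialVersionUID = 1L;

    private static final class Node<T> {
        final T element;
        Node<T> left, right;
        Node(T element) { this.element = element; }
    }

    // transient: the linked structure itself is never written to the stream
    private transient Node<E> root;
    private transient int size;

    public int size() { return size; }

    public boolean add(E e) {
        if (root == null) { root = new Node<>(e); size++; return true; }
        Node<E> cur = root;
        while (true) {
            int c = e.compareTo(cur.element);
            if (c == 0) return false;                    // no duplicates
            Node<E> next = c < 0 ? cur.left : cur.right;
            if (next == null) {
                if (c < 0) cur.left = new Node<>(e); else cur.right = new Node<>(e);
                size++;
                return true;
            }
            cur = next;
        }
    }

    public boolean contains(E e) {
        Node<E> cur = root;
        while (cur != null) {
            int c = e.compareTo(cur.element);
            if (c == 0) return true;
            cur = c < 0 ? cur.left : cur.right;
        }
        return false;
    }

    /** Elements in pre-order; re-adding them in this order rebuilds the same shape. */
    public List<E> preorder() {
        List<E> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private void collect(Node<E> n, List<E> out) {
        if (n == null) return;
        out.add(n.element);
        collect(n.left, out);
        collect(n.right, out);
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        List<E> elements = preorder();
        out.writeInt(elements.size());
        for (E e : elements) out.writeObject(e);         // elements must themselves be serializable
    }

    @SuppressWarnings("unchecked")
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();                          // transient fields start as null and 0
        int n = in.readInt();
        for (int i = 0; i < n; i++) add((E) in.readObject());
    }

    /** Test helper: serializes the tree to memory and reads it back. */
    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> SerializableTreeSketch<T> roundTrip(SerializableTreeSketch<T> tree) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(tree);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableTreeSketch<T>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Writing the elements in pre-order means that leaf insertion on read reproduces the original shape. A tree that inserts at the root would need a different rebuild strategy to preserve its shape, which is worth discussing in the report.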
Summary and Marking Scheme
1. Part A (25%)
   a. Update the class to implement an extended binary search tree.
   b. Document your updates.
   c. Write an application to test your class and document your test
      results.
   d. Put this class in a package called part1.
2. Part B (20%)
   a. Copy your class into a new package called part2 and update the
      class to provide the option of inserting at the root.
   b. Write an application to test your class when inserting at the root
      and document your test results.
   c. Run your tests that involve insertion again with a tree instance
      that inserts using the standard algorithm (inserting in a leaf) to
      confirm that the normal insertion mechanism still performs as
      expected with the changes that you have made.
3. Part C (15% - for the approach taken and the data, but many of the
   marks for the report will relate to your discussion of the results)
   a. Write an application to investigate the performance of the
      revised BinarySearchTree class produced in part B, as outlined
      above. (In your report, contrast the performance with that of an
      instance of TreeSet and discuss the results).
4. Part D (10%)
   a. Implement serialization for your BinarySearchTree class and test
      that you can serialize and deserialize an instance of the class.
Produce and submit an individual report (30%) on the work covering
the above points and critically appraising your work (if you did this as a
pair, this means that each of you should submit your own report). Note
that the report must be included. As noted above, where a part of the task
asks you to compare or discuss things, the comparison and the discussion
should be included in the report, and the marks for the report reflect
this.
Note that you must include an acknowledgement in your report for any
sources you have used (on the web, textbooks etc) in developing or
commenting on design, code and in your testing, and any source code used
from sources other than the course textbook should be indicated in the code
by a comment at the beginning and end of the code stating the source from
which it was taken. If you receive any advice from anyone you should
acknowledge this also and indicate what part of the work it related to. You
should not share any of the code that you produce with anyone other than
your partner.
What You Should Submit
Each of you should submit the work, whether you worked as part of a
pair or on your own. Submit your report (as a Word or rich text format
document, or as a pdf) in a zip file which also contains all the Java source
code that you have written (you do not need to upload the Eclipse project
files; just the source code files will be fine). Include your Banner ID, and
that of your partner if you worked as a pair, in your report but do not include
your name anywhere in the submission. Submit all the files for your
submission as a zip file named XXXX.zip (where XXXX are the last four
digits of your Banner ID) using the link in the Assignments section for this
module on Moodle, by 2300 on Monday 12 January 2015.