http://www.cs.utk.edu/~plank/plank/classes/cs360/360/notes/Rbtree-1/lecture.html

CS360 Lecture notes -- Red-Black Trees #1


Jim Plank
- Directory: /blugreen/homes/plank/cs360/notes/Rbtree-1
- Lecture notes -- plain text: /blugreen/homes/plank/cs360/notes/Rbtree-1/lecture
- Lecture notes -- html: http://www.cs.utk.edu/~plank/plank/classes/cs360/360/notes/Rbtree-1/lecture.html

Compiling
In order to use the rbtree library, you should include the file "rb.h", which can be found in /blugreen/homes/plank/cs360/include. Instead of including the full path name in your C file, just do:

    #include "rb.h"

and then compile the program with:

    gcc -I/blugreen/homes/plank/cs360/include

Also, when you link your object files to make an executable, you need to include /blugreen/homes/plank/cs360/objs/rb.o. The makefile in this directory does both of these things for you.

Red-Black Trees
The file README tells you all you should need to know about using the red-black tree library. As it might seem too high-level, I'll go over several examples.

Rb-trees are data structures based on balanced binary trees. You don't need to know how they work -- just that they do work, and all operations are in O(log(n)) time, where n is the number of elements in the tree. (If you really want to know more about red-black trees, let me know and I can point you to some texts on them.)

The main struct for rb-trees is the Rb_node. Like dlists, all rb-trees have a header node. You create an rb-tree by calling make_rb(), which returns a pointer to the header node of an empty rb-tree. This header points to the main body of the rb-tree, which you don't need to care about, and to the first and last external nodes of the tree. These external nodes are hooked together with flink and blink pointers, so that you can view rb-trees as being dlists with the property that they are sorted.

The Rb-tree data structure is a bit confusing with all the unions. The only fields that you need to care about are:
- c.list.flink -- the flink pointer. If r is the header node, then r->c.list.flink is the first external node in the rb-tree. If r is an external node, then r->c.list.flink points to the next external node in sorted order. If r is the last external node, then r->c.list.flink points to the header node.
- c.list.blink -- the blink pointer. It mirrors flink: from the header node it points to the last external node, and from an external node it points to the previous one in sorted order.
- k.key or k.ikey -- the "key" field. All sorting is done with respect to the key. If the key is a character string, then you use k.key. If it is an integer, then you use k.ikey.
- v.val -- the "value" field, much like the val field in dlists. This can hold any 4-byte quantity.

You use rb_insert(r, k, v) to create a new node with key k (where k is a character string) and val v, and insert it into the tree r in lexicographic order (i.e. it uses strcmp to compare strings). Thus, in jh.c, we create a tree and insert two strings ("Jim" and "Heather") into it. Since "Heather" is lexicographically less than "Jim", it will be the first node in the tree, and "Jim" will be the last. We print out the tree by traversing the external node list using the c.list.flink pointer:

    #include <stdio.h>
    #include "rb.h"

    int main()
    {
      Rb_node r, tmp;

      r = make_rb();
      rb_insert(r, "Jim", NULL);
      rb_insert(r, "Heather", NULL);

      /* Walk the external nodes in sorted order until we are back at the header. */
      for (tmp = r->c.list.flink; tmp != r; tmp = tmp->c.list.flink) {
        printf("%s\n", tmp->k.key);
      }
      return 0;
    }

When you compile and run this, you'll see that "Heather" is printed first, and "Jim" second.

    UNIX> jh
    Heather
    Jim
    UNIX>

To make things more readable, there are the following macros defined in rb.h:

    #define rb_first(n) (n->c.list.flink)
    #define rb_last(n)  (n->c.list.blink)
    #define rb_next(n)  (n->c.list.flink)
    #define rb_prev(n)  (n->c.list.blink)
    #define rb_empty(t) (t->c.list.flink == t)

Thus, the above for loop can be written:

    for (tmp = rb_first(r); tmp != r; tmp = rb_next(tmp)) {
      printf("%s\n", tmp->k.key);
    }

Since tree-traversal is something you tend to do a lot, I have also defined the macro rb_traverse, in which the for loop is put into a #define:

    #define rb_traverse(ptr, lst) \
      for((ptr) = rb_first((lst)); (ptr) != (lst); (ptr) = rb_next((ptr)))

With it, the loop becomes:

    rb_traverse(tmp, r) {
      printf("%s\n", tmp->k.key);
    }

Now, suppose you want to sort a file lexicographically (i.e. pretty much alphabetically). With rb-trees, this is very simple. You just read in lines of text, insert them into a rb-tree, and then traverse the rb-tree and print out the lines: (this is in mysort.c)

    #include <stdio.h>
    #include <string.h>
    #include "rb.h"
    #include "fields.h"

    int main()
    {
      IS is;
      char *copy;
      Rb_node sorted_lines, tmp;

      sorted_lines = make_rb();
      is = new_inputstruct(NULL);

      /* Insert a private copy of every line, keyed on the line itself. */
      while (get_line(is) >= 0) {
        copy = strdup(is->text1);
        rb_insert(sorted_lines, copy, NULL);
      }

      /* The traversal visits the lines in sorted order. */
      rb_traverse(tmp, sorted_lines) {
        printf("%s", tmp->k.key);
      }
      return 0;
    }

Notice that I make a copy of the string before inserting it into the tree. If I don't make a copy, then all of the k.key fields will point to the same array is->text1, which means that all of the fields will have the same string. Thus, you must make a copy.

Try mysort.c out for yourself:

    UNIX> head randfile
    13 hkrob
    13 isofq
    15 lninv
    0 ezvpy
    8 xxgxs
    18 wzypq
    19 jatzg
    16 vrbdg
    3 kkwfb
    0 bbvhy
    UNIX> head randfile | mysort
    0 bbvhy
    0 ezvpy
    13 hkrob
    13 isofq
    15 lninv
    16 vrbdg
    18 wzypq
    19 jatzg
    3 kkwfb
    8 xxgxs
    UNIX>

In mysort2.c, we implement 'sort -r', which sorts the lines of standard input and prints them out in reverse order. This is a simple matter of traversing the list backwards. Try it out to make sure it works:

    UNIX> head randfile | mysort2
    8 xxgxs
    3 kkwfb
    19 jatzg
    18 wzypq
    16 vrbdg
    15 lninv
    13 isofq
    13 hkrob
    0 ezvpy
    0 bbvhy
    UNIX>

Now, suppose we want to implement 'sort -u', which works just like 'sort', only it just prints out one copy of each line. I.e., the following is how 'sort -u' and 'sort' differ:

    UNIX> cat > jfile
    Jim
    Jim
    Heather
    UNIX> sort jfile
    Heather
    Jim
    Jim
    UNIX> sort -u jfile
    Heather
    Jim
    UNIX>

One way we can do this is to read all lines into an rb-tree as in mysort.c, and then only print out a line if it is different from the previous line in the rb-tree. Since the tree is sorted, duplicate lines will be adjacent to each other in the rb-tree, so this algorithm will indeed work, and is in mysortu0.c.

Note that in the body of the rb_traverse loop is a check to see if a node is the first one in the list. If so, then it prints that line. Otherwise, it checks to see if the node's line is equal to the previous one and only prints it if not. The reason we need that extra check is that when tmp is the first node, its blink pointer is the header node sorted_lines, and we don't know what the value of sorted_lines->k.key is. It could cause strcmp() to dump core. Thus we must take care not to call strcmp() on it.

    #include <string.h>
    #include <stdio.h>
    #include "fields.h"
    #include "rb.h"

    int main()
    {
      IS is;
      char *copy;
      Rb_node sorted_lines, tmp;
      int found;

      sorted_lines = make_rb();
      is = new_inputstruct(NULL);

      while (get_line(is) >= 0) {
        copy = strdup(is->text1);
        rb_insert(sorted_lines, copy, NULL);
      }

      /* Print a line if it is the first one, or if it differs from the
         previous one. The first test keeps strcmp() away from the header. */
      rb_traverse(tmp, sorted_lines) {
        if (tmp == rb_first(sorted_lines) ||
            strcmp(tmp->k.key, tmp->c.list.blink->k.key) != 0) {
          printf("%s", tmp->k.key);
        }
      }
      return 0;
    }

A few years ago in class, I tried to code up mysort.c using a temporary string s2:

    s2 = NULL;
    rb_traverse(tmp, sorted_lines) {
      if (strcmp(tmp->k.key, s2) != 0) printf("%s", tmp->k.key);
      s2 = tmp->k.key;
    }

This doesn't work (try it -- it dumps core). Why? I'll let you figure it out. How would you fix it?

A second way to implement 'sort -u' is to only insert a node into the rb-tree if it is not there already. To do this, we need to use:

    rb_find_key_n(Rb_node t, char *k, int *f)

This works as follows: If there is a node with key k in the tree, then rb_find_key_n() sets *f to 1 and returns a pointer to that node. If there is no node with key k in the tree, then rb_find_key_n() sets *f to zero and returns a pointer to the Rb_node in the tree whose key is the smallest key greater than k. If there is no key in the tree greater than or equal to k, then the root of the tree is returned. Like rb_insert(), rb_find_key_n() works on character strings, and works in O(log(n)) time, where n is the number of elements in the tree.

So, to implement 'sort -u', we first check to see if a string is in the tree already. If so, then we do nothing. If not, then we insert it into the tree. Mysortu1.c does this:

    #include <string.h>
    #include <stdio.h>
    #include "fields.h"
    #include "rb.h"

    int main()
    {
      IS is;
      char *copy;
      Rb_node sorted_lines, tmp;
      int found;

      sorted_lines = make_rb();
      is = new_inputstruct(NULL);

      while (get_line(is) >= 0) {

        /* Insert the line into the tree only if it is not there already */
        (void) rb_find_key_n(sorted_lines, is->text1, &found);
        if (!found) {
          copy = strdup(is->text1);
          rb_insert(sorted_lines, copy, NULL);
        }
      }

      rb_traverse(tmp, sorted_lines) {
        printf("%s", tmp->k.key);
      }
      return 0;
    }

The reason I have the (void) before "rb_find_key_n" is because we don't care about rb_find_key_n's return value. Thus we tell the compiler to cast it to a (void). Otherwise, the compiler may complain that we are ignoring the return value.

Now, suppose we'd like to print out the lines sorted, with duplicates removed, and print out a count of how many times each line is in the file. This can be done by using the val field. When we first insert a line into the tree, we set its val to a newly malloc'd integer, which has been initialized to be 1. After that, whenever we find that a line is already in the tree, we increment the integer that its val field points to. This will take some care, since v.val is a (char *) and not an (int *), but that's ok -- both are pointers that occupy 4 bytes on our machine, so we can cast them back and forth. The code is in mysortu2.c:

    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include "fields.h"
    #include "rb.h"

    #define talloc(type, size) (type *) malloc(sizeof(type)*(size))

    int main()
    {
      IS is;
      char *copy;
      Rb_node sorted_lines, tmp, r;
      int found;
      int *count;

      sorted_lines = make_rb();
      is = new_inputstruct(NULL);

      while (get_line(is) >= 0) {
        r = rb_find_key_n(sorted_lines, is->text1, &found);

        /* If the line is already in the tree, then just increment its count */
        if (found) {
          count = (int *) r->v.val;
          *count = *count + 1;

        /* Otherwise, insert the line into the tree with a count of one */
        } else {
          count = talloc(int, 1);
          *count = 1;
          copy = strdup(is->text1);
          rb_insert(sorted_lines, copy, (char *)count);
        }
      }

      rb_traverse(tmp, sorted_lines) {
        count = (int *) tmp->v.val;
        printf("%6d\t%s", *count, tmp->k.key);
      }
      return 0;
    }

We can be more sloppy -- instead of mallocing space for the integer, we can just use the v.val field as an integer. This works, because both (char *)'s and (int)'s occupy 4 bytes on our machine. It makes the code uglier (and less portable). It's in mysortu3.c.

Finally, you can use integers as keys instead of (char *)'s. To do this, you should use rb_inserti() and rb_find_ikey_n() instead of rb_insert() and rb_find_key_n(). Moreover, you should access the key field as k.ikey instead of k.key, because this treats it as an integer (this is why you use unions). mysorti.c shows how to sort lines like they were integers. Note that you now should keep a copy of the entire string in the v.val field because the k.ikey field holds atoi(s), which is just an integer:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "fields.h"
    #include "rb.h"

    int main()
    {
      IS is;
      char *copy;
      Rb_node sorted_lines, tmp;
      int i;

      sorted_lines = make_rb();
      is = new_inputstruct(NULL);

      /* Key each node on atoi() of the line; keep the whole line in v.val. */
      while (get_line(is) >= 0) {
        copy = strdup(is->text1);
        i = atoi(is->text1);
        rb_inserti(sorted_lines, i, copy);
      }

      rb_traverse(tmp, sorted_lines) {
        printf("%s", tmp->v.val);
      }
      return 0;
    }

Try this using randfile as input, and note how the output differs from mysort:

    UNIX> head randfile
    13 hkrob
    13 isofq
    15 lninv
    0 ezvpy
    8 xxgxs
    18 wzypq
    19 jatzg
    16 vrbdg
    3 kkwfb
    0 bbvhy
    UNIX> head randfile | mysort
    0 bbvhy
    0 ezvpy
    13 hkrob
    13 isofq
    15 lninv
    16 vrbdg
    18 wzypq
    19 jatzg
    3 kkwfb
    8 xxgxs
    UNIX> head randfile | mysorti
    0 bbvhy
    0 ezvpy
    3 kkwfb
    8 xxgxs
    13 isofq
    13 hkrob
    15 lninv
    16 vrbdg
    18 wzypq
    19 jatzg
    UNIX>

http://www.cs.utk.edu/~plank/plank/classes/cs360/360/notes/Rbtree-2/lecture.html

CS360 Lecture notes -- Red-Black Trees #2


Jim Plank
- Directory: /blugreen/homes/plank/cs360/notes/Rbtree-2
- Lecture notes -- plain text: /blugreen/homes/plank/cs360/notes/Rbtree-2/lecture
- Lecture notes -- html: http://www.cs.utk.edu/~plank/plank/classes/cs360/360/notes/Rbtree-2/lecture.html

This lecture goes over more fun things you can do with red-black trees.

sorti1.c
First, suppose you want to implement "sort -n", which sorts lines of stdin as numbers, but resolves collisions lexicographically. In other words, if you have two lines "1 b" and "1 a", it will print the second before the first, because the second is lexicographically less than the first. Try it out:

    UNIX> cat > f1
    1 b
    1 a
    0 Hmmm
    Jim
    Heather
    <CNTL-D>
    UNIX> sort -n f1
    0 Hmmm
    Heather
    Jim
    1 a
    1 b
    UNIX>

Note that the lines "Heather" and "Jim" are treated like they are zero when sorting as integers.

So, this is more complex than before -- we need to do two levels of sorting -- first sorting lines as integers using atoi(), and then resolving collisions by sorting lines as strings using strcmp().

There are two ways we can do this. The first is in sorti1.c. What this does is use two levels of trees. The first-level tree is a red-black tree that sorts lines by their integer value. In other words, it is keyed by atoi(s), and uses rb_inserti() to do the insertion. The v.val field is another red-black tree, which we call a second-level tree, that contains all the strings with that integer value, sorted lexicographically.

In other words, for the file f1 above, the first-level tree will have two nodes -- one with key 0, and one with key 1. The node with key 0 will have a v.val field that is another red-black tree with three nodes: "0 Hmmm", "Heather", and "Jim". The node with key 1 will have a v.val field that is a red-black tree with 2 nodes: "1 a" and "1 b".

When we go to print out the file, we traverse the first-level tree. On each node of that tree, we traverse the second-level tree and print out the string, which is in the k.key field.

Note that processing a line is a little more complex. First, you look for atoi(s) in the first-level tree. If it is not found, you create a node for it whose v.val field is a new rb-tree. Then you insert the string into this second-level tree.

Try this out on the file randfile. Does it work correctly? See how the output differs from mysorti.c in the previous lecture.

sorti2.c -- passing a function to rb_insertg()


The second way to implement "sort -n" is to use just one tree, but define a different comparison function. When we use rb_insert(), strings are inserted into the tree using strcmp() as the comparison function. When we use rb_inserti(), integers are inserted into the tree using standard inequality (<, >, =) for the comparison. There is a third function, rb_insertg(), which allows you to pass a comparison function as an argument, and that function is used to perform the insertion. Specifically, the function must take two arguments (char *k1, char *k2) (actually, k1 and k2 should be (void *)'s; in other words, they are just pointers) and return:
- 1 if k1 is greater than k2
- 0 if k1 is equal to k2
- -1 if k1 is less than k2

For this program, we write a comparison function atoicmp() which compares two strings using atoi(), and if atoi() says that they are equal, then it uses strcmp(). With this mode of insertion, we only need one tree, and thus just have to do a simple tree traversal to print out the sorted file. The code is in sorti2.c.

rb_find_gkey_n() works with rb_insertg() just like rb_find_ikey_n() works with rb_inserti(), and rb_find_key_n() works with rb_insert().

read_roster.c

Look at read_roster.c. It goes through the following steps. First, it opens the roster file, and makes a new red-black tree. Next, it reads in each line of the roster file, which is in the following format: last name, first name, height, position, year, team, and home town. After reading in each line, it inserts a new node into the rb-tree, keyed on the last name, and containing the struct with all of this information in the v.val field. Once the file is read in, it prompts for a last name. When one is entered, it is looked up in the rb-tree. If found, the roster entry for that player is printed out. If not found, an error statement is printed. This shows how to use a rb-tree to perform logarithmic time searching. Try it out to see how it works.
