Collections
1
Chapter 16 Collections
In Brief
16. Collections
The .NET BCL also provides functions that work with elements in a
collection. All these algorithms depend on your class implementing
particular interfaces. You will want to implement functionality such
as ordering and copying using these interfaces to get the most out of
the collection classes.
Arrays
The array is the simplest collection. It is the most limited, and it is the
fastest. It is the fastest because it provides the least dynamic data
structure. You must declare the number of dimensions and the size of
the array when you create it. Any array you create is derived from the
System.Array abstract class. The array is the simplest collection
structure. It is the only collection class that is part of the System
namespace, instead of the System.Collections namespace.
2
In Brief
16. Collections
had originally planned.
These two restrictions help to make the array faster than other col-
lection classes. You do not need to check types when you examine
elements in an array. Also, there is much less work for the runtime to
manage the memory in an array. The array is also unique in that you
can declare arrays with multiple dimensions or jagged arrays.
Notice that the size of the array is not part of the variable type. The
following is legal C#:
When you specify the size of the array, as in the second example, the
number of elements in the initializer must match the size of the array.
Neither of the following will compile:
Once you create the array, you can access the elements by using the
array index:
3
Chapter 16 Collections
This example sets i to the value of the fourth element in the array.
The lowest index in all C# arrays is 0. If you try to access the
16. Collections
These simple examples show arrays with two dimensions. You can,
however, use any number of dimensions in an array. In fact, you can
even mix jagged arrays and multiple dimensional arrays.
4
In Brief
TIP: The extra overhead means that multiple dimension arrays are more efficient than
jagged arrays. You should use multiple dimension arrays instead of jagged arrays unless you
16. Collections
need to have different lengths for the different rows in your jagged array.
Collection Classes
The simple array can handle many of the cases where you need more
than one object of a particular type. When you need more sophisti-
cated data storage, the .NET BCL contains many classes you can use
for different data structures. I go through the collection classes in
order from the most general to the most specific. All the collection
classes store references to System.Object objects. Therefore, any
time you work with the elements in the collection, you will need to
explicitly cast the object in the collection to the specific type of ob-
ject you actually stored in the collection.
ArrayList
The ArrayList stores a sequence of elements. The ArrayList differs
from the Array class in that you can change the storage capacity of
an ArrayList dynamically. Internally, the ArrayList class uses an
array to store the objects in the collection. The performance charac-
teristics of the ArrayList and the array are very similar. The dynamic
memory management of the ArrayList will affect the performance
of this data structure. You can control these effects by using the Ca-
pacity property of the ArrayList.
TIP: You should choose the ArrayList to get the best performance for an unsorted sequence
of elements that must change size dynamically. The ArrayList also gets better performance
for sorted operations when you can control the frequency of insert and remove operations.
HashTable
The HashTable stores key/value pairs. HashTables cannot contain
multiple copies of the same key value. The HashTable provides the
5
Chapter 16 Collections
TIP: You should use the HashTable when most of your element access is searching for
elements, not iterating the collection. The HashTable is also faster than the SortedList when
elements are frequently added to the collection.
SortedList
The SortedList stores key/value pairs, just like the hash table does.
The SortedList differs from the HashTable in how it stores values
in the collection. A SortedList stores two arrays internally, one for
the keys and one for the values. Using the SortedList, you can search
for keys in the structure, just like you can with the HashTable. When
you iterate a SortedList, you will get the keys in sorted order. Every
insert operation moves elements in the internal arrays; each new ob-
ject gets inserted in sorted order. Insertions and lookups are slower
than with the HashTable, but you do get to iterate all the items in
order.
TIP: You should use the SortedList when you need key/value pairs in sorted order. It is
slower than the HashTable, but you can control the iteration order.
Stack
The Stack is a special purpose collection. The Stack provides a lim-
ited interface to support the specific interface of a stack: Items are
pushed onto and popped off of a stack. You can enumerate all the
items in the stack by using the IEnumerable interface.
TIP: You should use the Stack only when you need the particular characteristics of a last-in,
first-out stack container.
Queue
The Queue is also a special purpose collection. It provides a limited
interface to support adding items to the end of the queue and remov-
ing them from the front of the queue. The Queue does support the
IEnumerable interface, so you can examine all the items in the queue.
6
In Brief
TIP: You should only use the Queue when you need the particular characteristics of a first-in,
first-out queue container.
16. Collections
BitArray
The BitArray is a special structure that stores bits to represent an
array of Boolean values. The BitArray trades space for speed. The
BitArray supports Boolean logic operations on the bits in the
BitArray. You can perform AND, OR, NOT, and XOR operations on
all the bits in two different BitArray containers.
TIP: You should use the BitArray when your design creates Boolean data structures that
model large numbers of Boolean values.
Choosing a Collection
Choosing one of the special purpose collections should be fairly ob-
vious. If you need the special properties in the Queue, Stack, or
BitArray, that is what you use. You should choose one of the general
collection classes based on the performance characteristics of the
collection and which operations you will perform most often. Here
are a few selection tips:
• Use an array of the specified type if you do not need to resize the
collection often. This will give you the best performance.
• Use the ArrayList structure when you need to resize the collec-
tion. This is the most common collection to use when you need or
want the dynamic memory management provided by the runtime.
• Use the HashTable when you need to store keys and values, and
the majority of your access is searching for a specific key.
• Use the SortedList when you need to store keys and values, and
you need to enumerate those values in order.
7
Chapter 16 Collections
The boxing and unboxing operations take time and memory. You do
pay a performance penalty when you put value types in collections.
16. Collections
You can avoid this penalty by storing your value types in arrays, rather
than the more sophisticated collection classes.
Collection Interfaces
The collection classes are primarily distinguished by their performance
characteristics. You can write code that works correctly with mul-
tiple different collection classes by coding to the interfaces defined
by the collection classes.
The .NET design team separated the collection classes from the inter-
faces that collections implement for two different reasons. In addi-
tion to allowing you to create methods that can work with different
collection classes, it can help you write your own collection classes.
By implementing these interfaces, you can create a specialized col-
lection class that will work with any methods that work with the stan-
dard collections in the .NET BCL. The collections that are outside of
the System.Collections namespace implement many of these inter-
faces so that those specialized collections work seamlessly with the
.NET BCL.
ICollection
The ICollection interface provides methods that are implemented
by all classes that store more than one object. All the collection classes
listed in the “Collection Classes” section, including arrays, implement
ICollection. The ICollection interface contains a property for the
number of objects in the collection. The only method in the
ICollection interface supports copying the collection to an array.
IEnumerable
The IEnumerable interface contains one method: GetEnumerator
(). All the collections support this interface. In addition, many classes
that can act like a collection support this interface. For example, the
string class supports this interface so that you can enumerate the
characters in the string.
IEnumerator
You use an object that supports the IEnumerator interface to enu-
merate all the elements in a collection. The IEnumerator interface
has methods to move forward in the collection and to reset the enu-
merator to the beginning of the collection. Every collection returns
8
In Brief
16. Collections
The HashTable and SortedList classes support a special kind of
enumerator called the IDictionaryEnumerator. The IDictionary
Enumerator contains extra methods to return either the key or the
value in a dictionary.
IDictionary
The IDictionary interface is derived from both the ICollection and
IEnumerable interfaces. It adds those methods that are specific to
collections that store key/value pairs and support lookup based
on the keys. The IDictionary interface is implemented by
the HashTable, the SortedList, and many other special purpose col-
lections outside of the System.Collections namespace. The
IDictionary interface lets you add and remove key/value pairs or
enumerate the keys or values separately.
IList
The IList interface contains methods that should be supported by
any collection that stores a sequence of values. IList methods let you
add and remove elements, find elements by value, and retrieve values
based on their index in the collection. The common methods used by
all collections are actually part of the IList interface.
IComparer
The IComparer interface is used by the Sort () methods in the
ArrayList and SortedList collections to sort the objects that are
stored in the collection. You implement the IComparer interface to
provide an ordering in your class. If your class does not naturally
support an ordering function, you need not implement this interface.
If your class does not support an ordering function, simply don’t sup-
port this interface; the collections still work, except that you cannot
sort the elements.
9
Chapter 16 Collections
IHashCodeProvider
The IHashCodeProvider provides a custom hash function for dif-
16. Collections
10
Immediate Solutions
Immediate Solutions
16. Collections
The key point to remember as you look at these examples is that the
.NET collections are designed so that similar or identical constructs
work with different collections.
Using Arrays
Many of the samples written so far have included some form of array.
The System.Array and instantiations of that class are by far the most
common simple collection you will use.
11
Chapter 16 Collections
dimension array. When you declare the array, you must declare how
many dimensions are in the array. The number of dimensions is also
referred to as the rank of the array.
When you create the array, you must specify the length of each
dimension:
Like single dimension arrays, you can initialize the array when you
create it, or even when you declare the array:
12
Immediate Solutions
16. Collections
// of single dimension arrays.
JaggedArray[0] = new int [4]; // Create an array
JaggedArray[1] = new int [3]; // Second Array.
JaggedArray[2] = new int [5]; // Third Array.
// Create elements:
JaggedArray[0][0] = 0;
JaggedArray[0][1] = 1;
JaggedArray[0][2] = 2;
JaggedArray[0][3] = 3;
// Next array:
JaggedArray[1][0] = 4;
JaggedArray[1][1] = 5;
JaggedArray[1][2] = 6;
// Last array:
JaggedArray[2][0] = 7;
JaggedArray[2][1] = 8;
JaggedArray[2][2] = 9;
JaggedArray[2][3] = 10;
JaggedArray[2][4] = 11;
The foreach construct provides you with two very important benefits:
• The elements being retrieved from the array are typechecked.
Any incompatible elements in the array are skipped. In fact, the
InvalidCastException gets caught and handled, without you
needing to write any extra code.
13
Chapter 16 Collections
This will print all the numbers in the array. The innermost dimension
increases first, the outermost dimension last. For example, the pre-
ceding statement will print 0 through 11, in order. If you want to enu-
merate the rows and columns separately, you will need to use the
GetLength () method of the array class:
WARNING! Do not use the Rank or Length properties to enumerate the array. The
Rank property tells you how many dimensions are in the array. The Length property
gives the total length of the array, not the length of a single dimension.
14
Immediate Solutions
16. Collections
{ new int [] {0,1,2,3},
new int [] {4,5,6},
new int [] {7,8,9,10,11}};
The ArrayList stores a single sequence of elements, and you can add
and remove those elements anytime in your program. The Remove
() method lets you remove an element either by index or by value:
15
Chapter 16 Collections
Reserving Space
The ArrayList has two properties that let you manage the storage in
the ArrayList container: Capacity and Count. Capacity stores the
amount of space reserved in the collection for elements. Count re-
turns the number of elements actually in the collection. You can con-
trol the way in which the collection grows by manipulating the
Capacity before a large insert operation:
Trim the ArrayList with caution, because the next Add () will cause
the ArrayList to grow, with the associated time cost.
Enumerating Collections
These collections all contain indexers that return a reference to the
individual elements in the collection:
16
Immediate Solutions
16. Collections
l.Add (2); // Add one element.
l.AddRange (twoDArray); // Add a range of elements.
foreach (int i in l) {
Console.WriteLine (i.ToString ());
}
The foreach statement means you do not have to make the cast when
you retrieve objects from your collections. However, the foreach state-
ment does perform the same cast. If the wrong type of object is in
your collection, an InvalidCastException gets thrown, and the re-
mainder of the collection does not get enumerated.
17
Chapter 16 Collections
The class simply stores two strings: the artist and the title of the al-
bum. The last method, CompareTo (), provides the ordering func-
tion for the Album class. The CompareTo () method should return
–1 if this object is less than the reference object, 0 if the objects are
equivalent, and 1 if this object is greater than the reference object. If
the other object is not of the correct type, the CompareTo () method
should throw an ArgumentException.
// Sort:
arr.Sort ();
// Print all the albums:
foreach (Album a in arr) {
Console.WriteLine (a);
}
But, what if you want to sort the albums by title? Well, an overloaded
version of the Sort () method takes another parameter: an object
that supports the IComparer interface. IComparer has one method,
Compare (), which compares two objects of the same type. The re-
turn value for Compare () follows the same convention as the
CompareTo () method in the Icomparable () method. Here is a
simple class to compare the albums based on the title:
18
Immediate Solutions
16. Collections
public int Compare (object l, object r) {
Album left = l as Album;
Album right = r as Album;
if ((left == null) || (right == null))
throw new ArgumentException ();
if (left.Title != right.Title)
return left.Title.CompareTo (right.Title);
else
return left.Artist.CompareTo (right.Artist);
}
}
// Using it is simple:
// Sort by title:
arr.Sort (new TitleComparer());
foreach (Album a in arr) {
Console.WriteLine (a);
}
// Sort by title.
arr.Sort (new TitleComparer());
// Search by title (returns 2):
index = arr.BinarySearch (l, new TitleComparer ());
Console.WriteLine (index.ToString ());
WARNING! If you do not use the same Compare () functions for both the Sort () and
BinarySearch () functions, you get undefined results. Searching on an unsorted
collection also yields undefined behavior.
The ArrayList collection contains many other methods that let you
manipulate the collection. These methods let you copy ranges of the
ArrayList, remove or replace elements, or access individual elements.
19
Chapter 16 Collections
The .NET BCL has two kinds of dictionary collections: the HashTable
and the SortedList. Both of these collections store key/value pairs.
Because these collections store pairs of values, the interface is a bit
different from the sequence containers. When you enumerate these
values, you get a special type of object that contains the key/value
pair. The observable difference between these two types of collec-
tions is the order of the elements in the collections. When I refer to a
“dictionary” in any of the following discussions, the discussion is valid
for both the SortedList and the HashTable. The performance of the
collections differs as well. Searching for a specific key in a HashTable
is a constant time operation—it completes in the same amount of
time, no matter how many objects are in the collection. Searching for
a specific key in a SortedList is a logarithmic operation. The time to
find an element grows with the logarithm of the number of elements
in a collection.
20
Immediate Solutions
16. Collections
SortedList list = new SortedList();
list.Add ("Gordon", "Bare Naked Ladies");
list.Add ("Astro Lounge", "Smashmouth");
list.Add ("American Beauty", "Grateful Dead");
list.Add ("Abraxus", "Santana");
list.Add ("Wheels of Fire", "Cream");
Finding Elements
The dictionary classes contain indexers that let you find values in the
collection by the key. These give you a very natural syntax to access
and change elements:
21
Chapter 16 Collections
Enumerating Elements
As with other collections, you can use the foreach statement to enu-
16. Collections
foreach (DictionaryEntry d in h) {
Console.WriteLine ("{0}, {1}", d.Key, d.Value);
}
Up to this point, every section has shown you that the HashTable is
faster than the SortedList. Enumerations are the one area where the
SortedList may be more appropriate. The SortedList does keep
the collection in sorted order, based on the keys. The HashTable
enumerates the collection based on the value returned by the hash
function. That order is most likely meaningless to you and your users.
22