
Topic 1011: Topics in Computer Science
Dr J Frost (jfrost@tiffin.kingston.sch.uk)
Last modified: 2nd November

A note on these Computer Science slides


These slides are intended to give just an introduction to two key topics in Computer Science: algorithms and data structures. Unlike the other topics in the Riemann Zeta Club, they're not intended to give the deeper knowledge required for solving difficult problems. The main intention is to provide an initial base of Computer Science knowledge, which may help you in your university interviews.
In addition to these slides, it's highly recommended that you study the following Riemann Zeta slides to deal with more specific Computer-Science-ey questions:
1. Logic
2. Combinatorics
3. Pigeonhole Principle

Slide Guidance
Any box with a ? can be clicked to reveal the answer (this works particularly well with interactive whiteboards!). Make sure you're viewing the slides in slideshow mode.
For multiple choice questions (e.g. SMC), click your choice to reveal the answer (try below!)

Question: The capital of Spain is:
A: London
B: Paris
C: Madrid

Contents
1. Time and Space Complexity
2. Big O Notation
3. Sets and Lists
4. Binary Search
5. Sorted vs Unsorted Lists
6. Hash Tables
7. Recursive Algorithms
8. Sorting Algorithms
   a) Bubble Sort
   b) Merge Sort
   c) Bogosort

Time and Space Complexity


Suppose we had a list of unordered numbers, which the computer can only view one at a time. Suppose we want to check if the number 8 is in the list.
If the size of the problem is n (i.e. there are n cards in the list), then in the worst case, how much time will it take to check that some number is in there?
And given that the list is stored on a disc (rather than in memory), how much memory (i.e. space) do we need for our algorithm?

(Worst Case) Time Complexity: There's n items to check, and each takes some constant amount of time to check, so we know the time will be at most some constant times n.
Space Complexity: We only need one slot of memory for the number we're checking against the list, and one slot of memory for the current item in the list we're looking at. So the space needed will be constant.
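A minimal sketch of this linear scan in Python (added for illustration; the example values are made up):

def linear_search(items, target):
    # Check each item one at a time: worst case n comparisons, i.e. O(n) time.
    for item in items:
        if item == target:
            return True
    # We only ever held the target and the current item: O(1) space.
    return False

print(linear_search([4, -2, 3, 6, 3], 8))   # False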

Big O notation
So the time and space complexity of an algorithm gives us a measure of how complex the algorithm is, in terms of the time it'll take and the space required to do its handiwork.
In mathematics, Big O notation is used to measure how some expression grows. Suppose for example we have the function:

f(x) = 15x³ + 10x² + 2x + 3

We can see that as x becomes larger, the 10x² and 2x terms become inconsequential, because the 15x³ term dominates. (Since x² ≤ x³ and x ≤ x³ for all positive x, we have f(x) ≤ 30x³ whenever x ≥ 1.)
We're also not interested in the scaling of 15, since this doesn't tell us anything about the growth of the function.
We say that:

f(x) = O(x³)

i.e. f grows cubically.

Big O notation

Formally, if f(x) = O(g(x)), then there is some constant c such that f(x) ≤ c g(x) for all sufficiently large x. So technically we could say that f(x) = O(x⁴), because the big-O just provides an upper bound to the growth. But we would want to keep this upper bound as low as possible, so it would be more useful to say that f(x) = O(x³).
While big-O notation has been around for centuries (particularly in number theory), in the 1950s it started to be used to describe the complexity of algorithms.
Returning to our problem of finding a number in an unordered list, we can now express our time and space complexity using big-O notation (in terms of the list size n):

Time Complexity: O(n)
Space Complexity: O(1)

Remember that constant scaling doesn't matter in big-O notation. So 1 is used to mean constant time/space.

Big O notation
We'll see some examples of more algorithms and their complexity in a second, but let's see how we might describe algorithms based on their complexity.

Time Complexity    We say the time complexity of the algorithm is...
O(1)               constant time
O(n)               linear time
O(n²)              quadratic time
O(nᵏ)              polynomial time
O(kⁿ)              exponential time
O(log n)           logarithmic time

Sets and lists


A data structure is, unsurprisingly, some way of structuring data, whether as a tree, a set, a list, a table, etc.
There's two main ways of representing a collection of items: lists and sets.

                                  Lists               Sets
Example                           <4, -2, 3, 6, 3>    {4, -2, 3, 6}
Does ordering of items matter?    Yes                 No. {4, -2, 3, 6} and {6, 3, -2, 4} are the same set.
Duplicates allowed?               Yes                 No
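To make the table concrete, here's a quick sketch in Python (added here, not from the original slides):

items_list = [4, -2, 3, 6, 3]             # ordered, duplicates kept
items_set = {4, -2, 3, 6, 3}              # unordered, the duplicate 3 collapses
print(len(items_list), len(items_set))    # 5 4
print({4, -2, 3, 6} == {6, 3, -2, 4})     # True: the same set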

Binary Search

[Diagram: a sorted list, of which the values 12, 15 and 20 are visible.]

Suppose we have either a set or list where the items are in ascending order. We want to determine if the number 14 is in the list.
Previously, when the items were unordered, we had to scan through the whole list (a simple algorithm where the time complexity was O(n)). But can we do better?
More specifically, seeing if an item is within an unsorted list is known as a linear search. (Because we have to check every item, taking time linear in n!)

Binary Search

[Diagram: the sorted list with 12, 15 and 20 visible; a line above shows where 14 could possibly be.]

Looking to see if 14 is in our list/set.
This line represents where the number we're looking for could possibly be. At the start of a binary search, the number could be anywhere.
A sensible thing to do is to look at the number just after the centre. That way, we can narrow down our search by half in one step. In this case that number is 20, and 20 > 14, so we know that if the number is in the collection, it must be to the left of this point.

Binary Search

[Diagram: the search range has shrunk to everything left of 20.]

Looking to see if 14 is in our list/set.
Now we look halfway across what we have left to check. The number just after the halfway point is 15. 15 > 14, so if 14 is in our collection, it must be to the left of this point.

Binary Search

[Diagram: only the region to the left of 15 remains.]

Looking to see if 14 is in our list/set.
Now we'd compare our number 14 against the 12. Since 12 < 14, and there are no items left to check between 12 and 15, we now know that 14 is not in the collection of items.

Binary Search

We can see that on each step, we halve the number of items that need to be searched. The number of steps (i.e. the time complexity) in terms of the number of items n must therefore be:

Time Complexity: O(log n)

This makes sense when you think about it. If n = 16, then log₂ 16 = 4, i.e. we can halve 16 four times until we get to 1, so only 4 steps are needed.
You might be wondering why we wrote O(log n) instead of O(log₂ n). This is because changing the base of a log only scales it by a constant, and as we saw, big-O notation doesn't care about constant scaling. So the base is irrelevant.

Space Complexity: O(1)

We only ever look at one number at a time, so we only need a constant amount of space.
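A minimal binary search sketch in Python (added for illustration; the example list is made up, extending the 12, 15, 20 visible in the diagram):

def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2               # each step halves the search range
        if sorted_items[mid] == target:
            return True
        elif sorted_items[mid] < target:
            lo = mid + 1                   # target could only be to the right
        else:
            hi = mid - 1                   # target could only be to the left
    return False                           # O(log n) time, O(1) space

print(binary_search([3, 7, 12, 15, 20, 24], 14))   # False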

Sorted vs unsorted lists


Keeping our list sorted, or leaving it unsorted, has advantages either way. We've already seen that keeping the list sorted makes it much quicker to see if the list contains an item or not.
What is the time complexity of the best algorithm to do each of these tasks?

Seeing if the list contains a particular value:
Sorted: O(log n), using a binary search.
Unsorted: O(n), using a linear search.

Adding an item to the list:
Sorted: We find the correct position to insert in O(log n) time using a binary search. If we have some easy way to splice in the new item somewhere in the middle of the list, without having to move the items after it up to make space, then we're done. However, if we do have to move up the items after it (e.g. the values are stored in an array), then it takes O(n) time to shift the items up, hence it's O(n) time overall.
Unsorted: O(1). We can just stick the item on the end!

Merging two lists (of size n and m respectively, where n ≥ m):
Sorted: Start with the largest list, with its n items. Then insert each of the m items from the second list into it. Each insert operation costs O(n) time (from above). But there's m items to add, giving O(mn) overall.
Unsorted: O(1). Easy again. Just have the end of the first list somehow link to the start of the second list so that they're joined together.

Sorted vs unsorted lists


                                    Sorted      Unsorted
Seeing if the list contains
a particular value                  O(log n)    O(n)
Adding an item to the list          O(n)        O(1)
Merging two lists (of size n and
m respectively, where n ≥ m)        O(mn)       O(1)

We can see that the advantage of keeping the list unsorted is that it's much quicker to insert new items into the list. However, it's much slower to find/retrieve an item in the list, because we can't exploit binary search.
So it's a trade-off.
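As a side note, Python's standard bisect module implements exactly this sorted-insertion pattern (a binary search for the position, then an O(n) shift); the example values echo the slides' hash table numbers:

import bisect

sorted_list = [19, 31, 42, 55, 67, 112]
bisect.insort(sorted_list, 50)     # O(log n) to find the spot, O(n) to shift
print(sorted_list)                 # [19, 31, 42, 50, 55, 67, 112]

unsorted_list = [31, 19, 55, 42, 112, 67]
unsorted_list.append(50)           # O(1): just stick it on the end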

Hash Table
Hash Tables are a structure which allows us to do certain operations to do with collections much more quickly: e.g. inserting a value into the collection, and retrieving!
Imagine we had 10 buckets to put new values into. Suppose we had a rule which decided what bucket to put a value v into:
Find the remainder when v is divided by 10 (i.e. v mod 10)

Hash Table
We can use our mod 10 hash function to insert new values into our hash table.

[Diagram: the values 31, 67, 42, 19, 112, 55, 57, 29, 33 and 69 are placed into buckets 0-9 by their last digit. Bucket 1: 31; Bucket 2: 42, 112; Bucket 3: 33; Bucket 5: 55; Bucket 7: 67, 57; Bucket 9: 19, 29, 69.]

Hash Table
The great thing about a hash table is that if we want to check if some value is contained within it, we only need to check within the bucket it corresponds to.
e.g. Is 65 in our hash table?
Using the same hash function, we'd just check Bucket 5. At this point, we might just do a linear search of the items in the bucket to see if the 65 matches. In this case, we'd conclude that 65 isn't part of our collection of numbers.

[Diagram: the same hash table; only Bucket 5 (containing 55) needs to be checked.]

Hash Table
Suppose we've put n items in a hash table with k buckets:

Operation: Seeing if some number is contained in our collection.
Time Complexity: O(n/k)
But only if our chosen hash function distributes items fairly evenly across buckets. If our data tended to have 1 as the last digit, mod 10 would be a bad hash function, because all the items would end up in the same bucket. The result would be that if we wanted to then check if 71 was in our collection, we'd end up having to check every item still! Using mod p, where p is a prime, reduces this problem.

Operation: Inserting a new item into the hash table structure.
Time Complexity: O(1)
Presuming the hash function takes a constant amount of time to evaluate, we just stick the new item at the top of the correct bucket. We could always keep the buckets sorted, in which case insertion would take O(log(n/k)) time.
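A minimal hash table sketch in Python (added here; the class name is made up, but the 10-bucket mod rule follows the slides):

class HashTable:
    def __init__(self, num_buckets=10):
        self.buckets = [[] for _ in range(num_buckets)]

    def _hash(self, value):
        return value % len(self.buckets)     # the "v mod 10" rule

    def insert(self, value):
        self.buckets[self._hash(value)].append(value)    # O(1)

    def contains(self, value):
        # Only the one relevant bucket is searched: O(n/k) on average.
        return value in self.buckets[self._hash(value)]

table = HashTable()
for v in [31, 67, 42, 19, 112, 55, 57, 29, 33, 69]:
    table.insert(v)
print(table.contains(65))   # False: only Bucket 5 is checked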

Recursive Algorithms

The Towers of Hanoi is a classic game in which the aim is to get the tower (composed of varying sized discs) from the first peg to the last peg. There's one spare peg available.
The only rule is that a larger disc can never be on top of a smaller disc, i.e. on any peg the discs must be in decreasing size order from bottom to top.
There's two questions we might ask:
1. For n discs, what is the minimum number of moves required to win?
2. What sequence of moves do we need to win?

Recursive Algorithms

We can answer both questions at the same time.


Suppose HANOI(START,SPARE,GOAL,n) is a function which
generates a sequence of moves for n discs, where START is the
start peg, SPARE is the spare peg and GOAL is the goal peg.
Then we can define an algorithm as such:

Recursive Algorithms

Recursively solve the problem of moving n-1 discs from the start peg to the spare peg.
i.e. HANOI(START, GOAL, SPARE, n-1)
(notice that we've made the original goal peg the new spare peg and vice versa)

It's quite common to define a function in terms of itself but with smaller arguments. It's recommended you first look at some of the examples in the Recurrence Relations section of the RZC Combinatorics slides to get your head around this.

Recursive Algorithms

Next move the 1 remaining disc (or whatever disc is at the top
of the peg) from the start to goal peg.
i.e. MOVE(START,GOAL)

Recursive Algorithms

Finally, recursively solve the problem of moving n-1 discs from the spare peg to the goal peg.
i.e. HANOI(SPARE, START, GOAL, n-1)
Notice here that the original start peg is now the spare peg, and the original spare peg is now the start peg.

Recursive Algorithms

Putting this together, we have the algorithm:


FUNCTION HANOI(START, SPARE, GOAL, n) =
HANOI(START, GOAL, SPARE, n-1),
MOVE(START, GOAL),
HANOI(SPARE, START, GOAL, n-1)

But just like recurrences in maths, we need a base case, to say what happens when n=1 (i.e. we have one disc):
FUNCTION HANOI(START, SPARE, GOAL, 1) =
MOVE(START, GOAL)
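A direct Python translation of this pseudocode (a sketch added here; the function and move names mirror the slides):

def hanoi(start, spare, goal, n):
    if n == 1:
        print(f"MOVE({start}, {goal})")    # base case: one disc
        return
    hanoi(start, goal, spare, n - 1)       # n-1 discs: start -> spare
    print(f"MOVE({start}, {goal})")        # largest disc: start -> goal
    hanoi(spare, start, goal, n - 1)       # n-1 discs: spare -> goal

hanoi("A", "B", "C", 3)   # prints the 7 moves traced on the next slide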

Recursive Algorithms
We can see this algorithm in action. If the 3 pegs are A, B and C, and we have 3 discs, then we want to execute HANOI(A, B, C, 3) to get our moves:
HANOI(A, B, C, 3)
= HANOI(A, C, B, 2), MOVE(A, C), HANOI(B, A, C, 2)
= HANOI(A, B, C, 1), MOVE(A, B), HANOI(C, A, B, 1), MOVE(A, C), HANOI(B, C, A, 1), MOVE(B, C), HANOI(A, B, C, 1)
= MOVE(A, C), MOVE(A, B), MOVE(C, B), MOVE(A, C), MOVE(B, A), MOVE(B, C), MOVE(A, C)

Recursive Algorithms

The same approach applies when counting the minimum number of moves. Let F(n) be the number of moves required to move n discs to the target peg.
1. We require F(n-1) moves to move n-1 discs from the start to spare peg.
2. We require 1 move to move the remaining disc to the goal peg.
3. We require F(n-1) moves to move n-1 discs from the spare to goal peg.
This gives us the recurrence relation F(n) = 2F(n-1) + 1, with base case F(1) = 1.
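Unrolling this recurrence (a short worked step added here) gives a closed form: F(n) = 2F(n-1) + 1 = 4F(n-2) + 3 = ... = 2^(n-1) F(1) + (2^(n-1) - 1) = 2^n - 1. So for 3 discs we need 2^3 - 1 = 7 moves, matching the sequence generated above.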

Sorting Algorithms
One very fundamental algorithm in Computer Science is sorting a collection of items so that they are in order (whether in numerical order, or some order we've defined).
We'll look at the main well-known algorithms, and look at their time complexity.

[Diagram: the example collection 19, 31, 42, 55, 67, 112.]

Bubble Sort
This looks at each pair of numbers in turn, starting with the 1st and 2nd, then the 2nd and 3rd, and swaps them if they're in the wrong order:

[Diagram: the unsorted list 31, 19, 55, 42, 112, 67. Click to Animate.]

At the end of the first pass*, we can guarantee that the largest number will be at the end of the list.
We then repeat the process, but we can now ignore the last number (because it's in the correct position). This continues, until eventually on the last pass, we only need to compare the first two items.
* A pass in an algorithm means that we've looked through all the values (or some subset of them) within this stage. You can think of a pass as someone checking your university personal statement and making corrections, before you give this updated draft to another person for an additional pass.
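A straightforward Python sketch of bubble sort (added for illustration, using the slides' example list):

def bubble_sort(items):
    n = len(items)
    for end in range(n - 1, 0, -1):        # each pass ignores one more item at the end
        for i in range(end):
            if items[i] > items[i + 1]:    # wrong order: swap the pair
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

print(bubble_sort([31, 19, 55, 42, 112, 67]))   # [19, 31, 42, 55, 67, 112]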

Bubble Sort

[Diagram: the list 31, 19, 55, 42, 112, 67 again.]

Time Complexity? O(n²)
The first pass requires n-1 comparisons, the next pass requires n-2 comparisons, and so on, giving us the sum of an arithmetic sequence.
So the exact number of comparisons is (n-1) + (n-2) + ... + 1 = n(n-1)/2.
This is growth quadratic in n, i.e. O(n²).

Merge Sort
First treat each individual value as an individual list (with 1 item in it!).
Then we repeatedly merge each pair of lists, until we only have 1 big fat list.
We'll go into more detail on this merge operation on the next slide.

[Diagram: the list 31, 19, 42, 55, 112, 67 becomes the single-item lists (31) (19) (42) (55) (112) (67); pairs merge into (19, 31) (42, 55) (67, 112); these merge into (19, 31, 42, 55) and (67, 112); and finally into (19, 31, 42, 55, 67, 112).]

Merge Sort
At each point in the algorithm, we know each smaller list will be in order.
Merging two sorted lists can be done quite quickly (click the button below):

[Diagram: two sorted lists, e.g. (19, 31, 42, 55) and (67, 112), being combined into a new merged list. Click to Animate.]

General gist: Start with a marker at the beginning of each list. Compare the two elements at the markers. The lowest value gets put in the new list, and the marker for that value moves up one. Then repeat!
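A compact recursive Python sketch (added here; top-down recursive halving is equivalent to the bottom-up pairwise merging the diagram shows):

def merge(left, right):
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:            # lowest marked value goes in first
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]   # one list may have leftovers

def merge_sort(items):
    if len(items) <= 1:                    # a 1-item list is already sorted
        return items
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))

print(merge_sort([31, 19, 55, 42, 112, 67]))   # [19, 31, 42, 55, 67, 112]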

Merge Sort
Time Complexity? O(n log n)
Each merging phase requires exactly n steps, because when merging each pair of lists, each comparison puts an element in our new list. So there's exactly n comparisons per phase.
There's log₂ n phases because, similarly to the binary search, each phase halves the number of mini-lists.

Bogosort
The Bogosort, also known as Stupid Sort, is intentionally a joke sorting algorithm, but provides some educational value. It simply goes like this:
1. Put all the elements of the list in a completely random order.
2. Check if the elements are in order. If so, you're done. If not, then go back to Step 1.
We can describe time complexity in different ways: the worst-case behaviour (i.e. the longest amount of time the algorithm can possibly take) and the average-case behaviour (i.e. how long we expect the algorithm to take on average).

Worst Case Time Complexity?
The algorithm theoretically may never terminate, because the order may be wrong every time.

Average Case Time Complexity? O(n · n!)
There are n! possible ways the items can be ordered. Presuming no duplicates in the list, there's a 1 in n! chance that the list is in the correct order. We therefore expect to have to repeat Step 1 n! times. Each check in Step 2 requires checking all the elements, which is O(n) time. It might be worth checking out the Geometric Distribution in the RZC slides.
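For completeness, a Python sketch of Bogosort (added here; obviously not for real-world use!):

import random

def is_sorted(items):
    return all(items[i] <= items[i + 1] for i in range(len(items) - 1))

def bogosort(items):
    while not is_sorted(items):    # expect ~n! shuffles, each check costing O(n)
        random.shuffle(items)      # Step 1: a completely random order
    return items

print(bogosort([3, 1, 2]))         # [1, 2, 3] (eventually!)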
