**2. Sorting**

The problem we address is as follows:

We have n objects which we can compare with one another; we always assume we have a method for determining which of two objects is bigger.

We usually do not sort the objects themselves, which could be huge, so that moving even one could be a large chore. Instead we represent each object by a numerical **key**. Given two keys, we can go back to the objects, make whatever comparison is appropriate between the two objects, and determine which key is bigger in the sense we are considering.

The keys are given to us in a random
order. We want to rearrange them in order of size.

Our tools are the ability:

to compare keys: given keys a and b, to determine which is bigger;

to move keys around; and

to do an action we will call “compare and switch”, by which we mean: compare the two keys a and b, put the larger one on the right and the smaller one on the left.

Our object is to do this with the
least amount of effort, where by effort we can mean either the fewest
comparisons, the least motion, the least time when comparisons can be done in
parallel, or whatever else you can think of.

We will stick to binary comparisons in
which we compare one key with another.

The problem is actually a variant of
the weighing problem except there are no ties, and there seems to be no point
in comparing several keys at a time with a similar number of others.

**First question: what is a lower bound on the number of comparisons needed to solve this problem?**

We can argue exactly as in the
weighing case. The number of outcomes of our comparisons must equal or exceed
the number of possible situations.

Here the actual ordering of our n keys
could be any ordering of them: and there are n! of these.

The number of possible outcomes of k comparisons is 2^{k} (there are no ties, so there are only two possible outcomes per comparison).

We get n! ≤ 2^{k}, or log n! ≤ k, where here the log is taken to base 2.

As we shall see a bit later, we can find an expression for n! which is quite accurate for our purposes. We claim here, with proof deferred, that it has the form c(n/e)^{n+1/2}, where c is a constant.

**Exercise 1.1: This yields a bound for k. Figure it out.**
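The bound can be checked numerically. The following is a Python sketch (the function names are our own); the constant c = sqrt(2·pi·e) is the value that makes the claimed form c(n/e)^{n+1/2} agree with Stirling's formula.

```python
import math

# Information-theoretic bound: any comparison sort needs at least
# ceil(log2(n!)) comparisons in the worst case.
def comparison_lower_bound(n):
    return math.ceil(math.log2(math.factorial(n)))

# The claimed approximation n! ~ c * (n/e)**(n + 1/2), with c = sqrt(2*pi*e).
def stirling_log2_factorial(n):
    c = math.sqrt(2 * math.pi * math.e)
    return math.log2(c) + (n + 0.5) * math.log2(n / math.e)

for n in (8, 64, 1024):
    print(n, comparison_lower_bound(n), round(stirling_log2_factorial(n), 1))
```

For n = 8 the exact bound is ceil(log2 40320) = 16 comparisons, and the approximation gives about 15.3; the agreement only improves as n grows.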

Now let's consider some schemes for actually sorting.

It
turns out that almost any conceivable approach can be used to accomplish this.

Here are some examples.

1. **Sorting by insertion**: start with an unsorted mess of keys; pick out two and sort them; then insert another key to get three sorted keys, etc. At stage k you will have k or so sorted keys; take another one from the unsorted rest and “**insert it**” among the sorted ones, until all are sorted.

2. **Sorting by merging:** you start with your unsorted mess; sort the keys into pairs; then, by a step for “merging two sorted lists to make one”, sort them into fours, then eights, and so on, until they are all sorted.

There is simple merging, and more complicated variants of it; we will eventually look at a **non-adaptive** sorting procedure of this kind.

3. **Sorting by pulling off the biggest (or smallest), successively**: the top, next to top, third biggest, ….

We will look at two such methods: **sorting by having a tournament, and heap-sort.**

4. Sorting by inserting keys one at a time where they belong, dividing the rest so that smaller keys are to the left of the inserted key and bigger ones to the right.

This is called **quicksort.**

Let us consider these in turn.

**Quicksort**

The basic step, starting with n unsorted keys arranged in a line, is: pick one of them, x, and by comparing it with all n-1 other keys, rearrange the line so that each key smaller than x is to its left and each larger key is to its right.

Here is a small example: x = 6 and the other 7 keys (n = 8) are, in order, 5 1 9 45 7 8 2.

There are three questions to address:

**First, how do we choose x?**

**Second, how do we arrange it so that smaller keys are to the left of x and larger keys to the right?**

**Finally, what does this accomplish for us?**

For the first you usually pick a key
not picked before at random and use it as x.

We insert x where it belongs as
follows.

We start by putting x at one end of the list; all the other keys are said to be alive.

We then **compare x with the live key (call it z) furthest from x**. We
rearrange those two keys only, leaving the others fixed by putting the larger
one on the right and the smaller on the left. We then declare z dead.

We repeat this step until all keys
other than x are dead.

Then our task is accomplished.

Let us see what happens with our keys.

We start with __6__ 5 1 9 45 7 8 2; we mark dead keys in boldface and underline x.

First we compare 6 with 2, the key farthest from 6; 6 is bigger, and they therefore switch and 2 dies, giving us **2** 5 1 9 45 7 8 __6__.

Now we compare 6 and 5 (the farthest live key from 6); there is no switch, and 5 dies;

we compare 6 with 1, and 1 similarly dies;

then we compare 6 with 9; they switch, and 9 dies;

then we compare 6 successively with 8, then 7, then 45, each of which dies at its comparison.

Writing out every single step, the procedure looks like

__6__ 5 1 9 45 7 8 2

**2** 5 1 9 45 7 8 __6__

**2 5** 1 9 45 7 8 __6__

**2 5 1** 9 45 7 8 __6__

**2 5 1** __6__ 45 7 8 **9**

**2 5 1** __6__ 45 7 **8 9**

**2 5 1** __6__ 45 **7 8 9**

**2 5 1** __6__ **45 7 8 9**

So with n-1 comparisons (you have to
compare x with every other key to be sure that that key ends up on the correct
side of x) we get x in its right place.
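The compare-and-switch dance described above can be sketched in Python (`partition` is our own name; two pointers track the ends of the live region, and x always sits at one of them). It reproduces the trace: starting from 6 5 1 9 45 7 8 2 it ends with 2 5 1 6 45 7 8 9, with x = 6 in position 4.

```python
def partition(keys):
    """Partition around x = keys[0] by the compare-and-switch scheme:
    x sits at one end of the live region; repeatedly compare it with the
    live key farthest from it, put the larger of the two on the right,
    and declare the non-x key dead.  Returns x's final index."""
    lo, hi = 0, len(keys) - 1
    x_at_lo = True                          # x currently at the low end
    while lo < hi:
        if x_at_lo:
            if keys[lo] > keys[hi]:         # larger key belongs on the right
                keys[lo], keys[hi] = keys[hi], keys[lo]
                lo += 1                     # the switched key dies at lo
                x_at_lo = False             # x is now at the high end
            else:
                hi -= 1                     # keys[hi] dies in place
        else:
            if keys[lo] > keys[hi]:         # keys[lo] is larger than x: switch
                keys[lo], keys[hi] = keys[hi], keys[lo]
                hi -= 1
                x_at_lo = True
            else:
                lo += 1                     # keys[lo] dies in place
    return lo

keys = [6, 5, 1, 9, 45, 7, 8, 2]
print(partition(keys), keys)
```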

**What good does this do us?**

We have split the problem into two
smaller ones, the left hand one (here
sorting 2 5 1, a three key problem) and a right hand one (here a 4 key
problem);

in general, the sum of the sizes of the two problems will be n-1 (since we can now forget about x itself).

And what good does this do us? If we are lucky (or skillful) and x is near the middle of the ordering of the keys, it does us lots of good; we can use the same procedure again on both of the two remaining unsorted blocks, and therefore, with n-2 additional comparisons, break the keys into **four** smaller groups, and so on.

If we break it evenly each time, we
will be done after log n (to base 2)
rounds of this procedure. This will
require a total of roughly n log n steps.

On the other hand, we could have very bad luck, and x could be maximal or minimal every time we pick it. If so (in the worst possible case), this method could be quadratic, since all the comparisons to insert x could reduce the problem in size merely from n to n-1.

Quicksort is often used because, if you are able to pick the keys randomly, you can expect to take only a small multiple of n log n steps.

Here is a plausibility argument for this: if you pick x at random, then approximately 1/3 of the time your x will be between the n/3rd smallest and the n/3rd largest key. If you look only at the effects of these choices, each such step reduces the problem at hand from its size, say m, to two problems, the larger of which has size at most 2m/3: a reduction by a factor of 3/2 at worst.

This means that after log_{3/2} n of these good steps the largest block should go down to size 1, and that means after roughly 3*log_{3/2} n (“x insertion”) steps we should be done. And we can more or less expect to be done in at most this many steps.

This gives us a crude upper-bound estimate on the expected performance that is only a small constant factor off from the best possible.
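Putting the pieces together, here is a minimal Python sketch of quicksort with random pivot choice. It is not the in-place compare-and-switch version; for brevity the partition is written as list filtering, which uses the same n-1 comparisons per round but extra space.

```python
import random

def quicksort(keys):
    """Pick a random pivot x, split the keys into those smaller and those
    larger than x, and recurse on each side."""
    if len(keys) <= 1:
        return list(keys)
    x = random.choice(keys)
    left = [k for k in keys if k < x]       # keys that belong left of x
    right = [k for k in keys if k > x]      # keys that belong right of x
    middle = [k for k in keys if k == x]    # keys are assumed distinct, but be safe
    return quicksort(left) + middle + quicksort(right)

print(quicksort([6, 5, 1, 9, 45, 7, 8, 2]))
```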

**Heap Sort**

This is a **pull off at the top** method that has the virtue that it can sort everything in place: it has no need of remembering anything, and it uses only comparisons and compare-and-switches, plus n-1 additional switches of keys. At the end the keys are all lined up in order.

How is it done?

Here are the steps.

First, number the keys as they are given to you and imagine them arranged as the vertices of a balanced binary tree (better called a root system, since as a tree it is upside down), with vacancies perhaps in the bottom row.

**What does this mean?** Key number 1 is at the top; it has two children, keys 2 and 3; these have children in turn: 2 has keys 4 and 5, and 3 has keys 6 and 7.

In general, **the children of key x will be keys 2x and 2x+1**, unless of course we
started with fewer keys than 2x+1; if there are fewer than 2x keys, then x is
at the bottom of the tree.

**The first major step is to rearrange the keys in order to make this tree into a “heap”**. This means **we arrange it so that each key is bigger than its children**. We will discuss how to do this shortly. It is not the big step, since it takes a number of actions only of order n, not n log n.

Once one has a heap, the top (active) key is bigger
than its children which are each bigger than their own, and so on, so that the
top key is bigger than **every (active)
key**.

What we do then is to take that top key and switch
it with the last active key, and then make the former top key inactive.

The active keys then form a “headless heap”; they
are heap-like except for the top key, which is not necessarily bigger than its
children.

The key step then, which gets repeated over and over
again, is to convert a “headless heap” back into a heap.

We will see that this can be done in at most 2 log_{2} n steps (often somewhat fewer), so that this method takes at most roughly twice the minimum number of steps.

**So how do we
convert a headless heap into a heap**?

The possibly wrongly placed key is in
position 1;

we compare its children, in positions
2 and 3, and determine that the bigger child is in position 2 (**say**);

we then compare and switch 1 with 2
putting the bigger of the two keys in 1. Now the key in 1 is definitely bigger
than that in 2, and since it is at least as big as the former 2 key, it is
bigger than the key in 3 as well.

**There is still a problem: if the new key in 2 is the key that was in 1, the subtree whose top key is now 2 may be a headless heap**.

We treat this subtree exactly the same
way we treated the whole tree before;

namely we compare the children of 2, (5
with 4) and determine which is bigger;

we compare and switch the key now in 2
with the bigger, and by the same argument, the new key in 2 is bigger than the
new ones in 5 and 4, and the headless heap problem is now at worst in the
subtree headed by either 5 or 4 **but not
both.**

We continue this process; at each step, **which consists of a comparison and a compare-and-switch**, the headless heap problem trickles **one level down** the tree.

When the possibly bad head becomes a
leaf among the active keys, the problem disappears: **a leaf is automatically a heap since it has no children to be bigger
than it.**


The depth of our tree is log_{2}
n; so in at most 2log_{2} n comparisons, we cure the headless heap
problem; and are ready to make another switch.
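The headless-heap cure can be sketched in Python as follows (`sift_down` is our own name). The list is 1-indexed with a padding cell so that, as above, the children of position i are 2i and 2i+1; each level costs one comparison to pick the bigger child and one compare-and-switch with the parent.

```python
def sift_down(keys, top, n_active):
    """Cure a headless heap: keys[1..n_active] is a heap except possibly
    at position `top`.  Trickle the problem one level down per round."""
    i = top
    while 2 * i <= n_active:                     # while i has a live child
        child = 2 * i
        if child + 1 <= n_active and keys[child + 1] > keys[child]:
            child += 1                           # one comparison: bigger child
        if keys[child] > keys[i]:                # compare and switch with parent
            keys[i], keys[child] = keys[child], keys[i]
            i = child                            # problem moves one level down
        else:
            break                                # already a heap: done

# The first cure from the worked example below: 1 11 6 7 8 3, with 15 dead.
h = [None, 1, 11, 6, 7, 8, 3, 15]
sift_down(h, 1, 6)
print(h[1:])
```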

Suppose we start with 7 keys and make
them into the following heap:

k1 = 15

k2 = 11    k3 = 6

k4 = 7    k5 = 8    k6 = 3    k7 = 1

We start with the switch between k1 and k7, after which the new k7 dies.

We underline the dead keys and put the suspect head location in boldface, along with the larger of its children's keys:
location in boldface along with the largest of its child's keys;

**k1** = 1

k2 = 11    k3 = 6

k4 = 7    k5 = 8    k6 = 3    k7 = __15__

Now the problem is k1; we compare its
children, k2 and k3 and find k2 to be bigger;

**k1** = 1

k2 = **11**    k3 = 6

k4 = 7    k5 = 8    k6 = 3    k7 = __15__

Now compare and switch k1 with k2 to get:

k1 = 11

**k2** = 1    k3 = 6

k4 = 7    k5 = 8    k6 = 3    k7 = __15__

Now the problem is k2; we compare its
children and find k5 bigger

k1 = 11

**k2** = 1    k3 = 6

k4 = 7    k5 = **8**    k6 = 3    k7 = __15__

Now compare and switch k2 with k5 to get:

k1 = 11

k2 = 8    k3 = 6

k4 = 7    **k5** = 1    k6 = 3    k7 = __15__

We now have a heap again and can
switch again:

**k1** = 3

k2 = **8**    k3 = 6

k4 = 7    k5 = 1    k6 = __11__    k7 = __15__

We now have our headless problem at
the top: we compare k2 and k3 and find that k2 is bigger; so we next compare
and switch k1 and k2; combining these two steps we push the headless problem to
level 2, to k2 getting

k1 = 8

**k2** = 3    k3 = 6

k4 = **7**    k5 = 1    k6 = __11__    k7 = __15__

Comparing k2’s children, k4 and k5, we
find k4 larger and so next compare and switch k2 and k4, getting

k1 = 8

k2 = 7    k3 = 6

**k4** = 3    k5 = 1    k6 = __11__    k7 = __15__

We now have a heap again and can
switch; getting

**k1** = 1

k2 = **7**    k3 = 6

k4 = 3    k5 = __8__    k6 = __11__    k7 = __15__

Again the problem is at the top; we
compare k2 and k3, find k2 bigger, and compare and switch k2 and k1 to get

k1 = 7

**k2** = 1    k3 = 6

k4 = **3**    k5 = __8__    k6 = __11__    k7 = __15__

This time k2 has only one live descendant, so we can just compare and switch it with k4, which again gives a heap among the now few live keys:

k1 = 7

k2 = 3    k3 = 6

**k4** = 1    k5 = __8__    k6 = __11__    k7 = __15__

After one more switch we get

**k1** = 1

k2 = 3    k3 = **6**

k4 = __7__    k5 = __8__    k6 = __11__    k7 = __15__

Now we compare k2 with k3, find k3 larger, and compare and switch k1 with k3; k3 is now a leaf, so we have a heap again.

k1 = 6

k2 = 3    **k3** = 1

k4 = __7__    k5 = __8__    k6 = __11__    k7 = __15__

Now we make our last switch (k1 with k3); then we have only to do one comparison and switch between k1 and k2, and one final switch, and we are done, getting

k1 = __1__

k2 = __3__    k3 = __6__

k4 = __7__    k5 = __8__    k6 = __11__    k7 = __15__

Notice that after every switch step except the next to last, the first compare and switch step is unnecessary; the suspect top key, having come from the bottom, is definitely smaller than one of the level two keys, and so it will always switch.

Subsequent comparison and switches may
not switch, and if so the children's heaps below need not be fixed at all.

**Now how do we get the original keys into a heap in the first place?**

**Here is one way**: if we look at the leaves alone, they **are heaps**, since there are no descendants to be bigger than them.

If we go up one level, then we have **headless heaps with two levels**.

We know how to handle headless heaps!

So we make them into heaps by curing
their heads as done ad nauseam above.

Now we can go up to the third level
from the bottom: including these keys we **now
have headless heaps again**!

Well, cure them, and keep rising in the same way.

Each time we extend our heapishness
one level up the tree, having cured to the previous level, we need only cure
headless heaps!
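The whole procedure can be sketched in Python: build the heap bottom-up by curing headless heaps level by level, then repeatedly switch the top with the last active key and cure again. (`sift_down`, our name for the headless-heap cure, is included so the sketch is self-contained; the list is 1-indexed with a padding cell so the children of i are 2i and 2i+1.)

```python
def sift_down(keys, top, n_active):
    """Cure a headless heap in keys[1..n_active] whose suspect is `top`."""
    i = top
    while 2 * i <= n_active:
        child = 2 * i
        if child + 1 <= n_active and keys[child + 1] > keys[child]:
            child += 1                      # the bigger child
        if keys[child] > keys[i]:           # compare and switch
            keys[i], keys[child] = keys[child], keys[i]
            i = child
        else:
            break

def heap_sort(keys):
    """In-place heap sort on a 1-indexed copy of the keys."""
    a = [None] + list(keys)
    n = len(keys)
    for i in range(n // 2, 0, -1):          # leaves are already heaps;
        sift_down(a, i, n)                  # cure upward, level by level
    for last in range(n, 1, -1):
        a[1], a[last] = a[last], a[1]       # top key switches with last active
        sift_down(a, 1, last - 1)           # cure the headless heap
    return a[1:]

print(heap_sort([7, 3, 15, 1, 11, 6, 8]))
```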

Another way is to put all the small
keys at the bottom level, then all the next smallest keys at the next level,
etc.

In our case that involves finding the
4th largest key out of our 7 and putting it and the smaller keys in the last
row, then finding the middle key of the top 3 and then putting the bottom two
in level two, and the biggest at the top.

For a full tree like we have here,
this requires finding the median of the keys, (the middle element) and the
median of the top half, etc. We will see how to do that with a number of
comparisons linear in n.


**Tournament Sort**


**Tournament sort** is a pull-off-the-top method that is very efficient, but applying it involves remembering which keys have been compared to which others and using that information to determine what to do next. This takes some effort; it really isn’t too bad otherwise.

First you have a regular tournament
among the players, oops among the keys.

At first they play in pairs. The
winners advance to the next round and again play each other. Winners again
advance, etc., until there is one winner.

Assume that we started with a power of
two number of keys, like 32.

Then after one round there are 16 winners, after the second round 8, after the third 4, after the fourth 2, and at last 1 winner.

Notice that the winner plays 5
matches, (in our case key comparisons) and everyone else plays at most 4
matches with players other than the winner.

At this point there have been 16+8+4+2+1 = 31 comparisons and we have 1 winner.

**But
notice that the second best must have lost only to the winner.**

We can therefore find it by comparing
the 5 players that the winner beat.

So have the winner’s first-eliminated victim play the second, the winner of that match play the third of the winner’s victims, etc.

You can show that we can determine the
second winner with at most 4 contests and **again,
there will be at most 5 players that the second winner and winner alone beat**.

And these can fight it all out in 4 more contests, etc.; so we can find the second, third, fourth, fifth, etc. winners, all with at most 4 contests each.

The number of comparisons to find the
next winner after the first here is 4 or log_{2}32 - 1. In general we end up with somewhat fewer than log_{2}n
comparisons per key after the first.

Please convince yourself that if you always have the players who were eliminated after playing the fewest games play first, the keys that are eligible to become k-th biggest are always at most 5 in number.

Notice that as you go along, more and
more of the keys have their ranks determined, so that the number of keys left
to compare with to determine the new winner goes down from the 4 above to 3
then 2 then 1.
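The bookkeeping can be sketched in Python with an array-based bracket (our own construction, not the only way): leaves hold the keys, each internal node remembers which leaf won that match, and after a winner is pulled off only the matches along its own path, about log2 n of them, are replayed.

```python
NEG_INF = float("-inf")

def tournament_sort(keys):
    """Sort into decreasing order by running a tournament."""
    n = len(keys)
    if n <= 1:
        return list(keys)
    m = 1
    while m < n:
        m *= 2                                  # pad the bracket to a power of 2
    leaves = list(keys) + [NEG_INF] * (m - n)
    # Array-as-tree: node i has children 2i and 2i+1; leaf j sits at node m+j.
    winner = [0] * (2 * m)                      # winner[i] = leaf index winning at i
    for j in range(m):
        winner[m + j] = j
    for i in range(m - 1, 0, -1):               # the initial round-by-round tournament
        l, r = winner[2 * i], winner[2 * i + 1]
        winner[i] = l if leaves[l] >= leaves[r] else r
    out = []
    for _ in range(n):
        w = winner[1]                           # current champion's leaf
        out.append(leaves[w])
        leaves[w] = NEG_INF                     # the champion retires
        i = (m + w) // 2
        while i >= 1:                           # replay only its own matches
            l, r = winner[2 * i], winner[2 * i + 1]
            winner[i] = l if leaves[l] >= leaves[r] else r
            i //= 2
    return out

print(tournament_sort([6, 5, 1, 9, 45, 7, 8, 2]))
```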

**Simple Merging**

This involves sorting into pairs, then combining them into sorted 4’s, then 8’s, and so on.

At each stage you take two sets of keys of the same
size, each of which is already sorted, and combine them together.

Notice that the only candidates for the top of the combined list are **the tops of the two lists**.

If we compare them and pull off the biggest, we
again have two sorted lists left, and can again pull off the biggest of the two
tops.

If we keep on doing this until one list is depleted,
we will have sorted the pair of lists into one.

The trouble with this approach (which at one time
was used commercially) is that it requires extra space to put the keys that are
pulled off the top. Since you do not know which list will be depleted when you
merge them, you can’t use the partially depleted list space very efficiently.
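The merge step and the bottom-up rounds can be sketched in Python (this version pulls off the smallest of the two front keys, the mirror image of the description above; the output list it builds is exactly the extra space just complained about).

```python
def merge(a, b):
    """Merge two already-sorted lists by repeatedly comparing the two
    front keys -- the only candidates for smallest remaining -- and
    pulling off the smaller, until one list is depleted."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])                    # one list is depleted;
    out.extend(b[j:])                    # copy the rest of the other
    return out

def merge_sort(keys):
    """Bottom-up merging: sorted runs of size 1, then 2, 4, 8, ..."""
    runs = [[k] for k in keys]
    while len(runs) > 1:
        runs = [merge(runs[i], runs[i + 1]) if i + 1 < len(runs) else runs[i]
                for i in range(0, len(runs), 2)]
    return runs[0] if runs else []

print(merge_sort([6, 5, 1, 9, 45, 7, 8, 2]))
```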

**Insertion Sort**

To describe **insertion sort**, you need only say how you will go about inserting a
new unknown key, call it x, in a sorted list of size k.

Let the sorted keys be denoted key1, key2, …, keyk.

What we want to do is to compare x to
a middle key, and the one in position [k/2]+1
namely key([k/2]+1) will do.

If x is smaller, we want to move all the keys from this middle one and beyond over by 1 to make room for x, and insert x into the list key1, key2, …, key([k/2]).

If x is bigger, we don’t move anything, but now insert x into the right-hand half of the list: key([k/2]+2), …, keyk.

This can be shown to require a minimum
number of comparisons, but it requires lots of key movement, which can be bad
if key movement has cost.

We can describe this algorithm for insertion of x into L as follows, if we call it I(x, L) and call moving a list L over by one M(L):

I(x, {1,…,k}) = if x < key([k/2]+1): I(x, {1,…,[k/2]}) and M({[k/2]+1,…,k});

otherwise: I(x, {[k/2]+2,…,k})
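One way to realize the recursion I(x, L) in Python is the sketch below (our own function names); Python’s list `insert` performs the move step M by shifting everything after the insertion point, which is where the costly key movement hides.

```python
def binary_insert(x, sorted_keys):
    """Insert x into an already-sorted list, halving the candidate range
    with one comparison against the middle key per round."""
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if x < sorted_keys[mid]:
            hi = mid                    # recurse into the left half
        else:
            lo = mid + 1                # recurse into the right half
    sorted_keys.insert(lo, x)           # the move step M: shifts later keys

def insertion_sort(keys):
    """Insert the keys one at a time into a growing sorted list."""
    out = []
    for k in keys:
        binary_insert(k, out)
    return out

print(insertion_sort([6, 5, 1, 9, 45, 7, 8, 2]))
```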

Exercises:

1. Estimate the number of comparisons needed in each of the five algorithms discussed above. Also estimate the number of other actions needed, such as movements of keys.

2. If you can program, write a computer program to implement each of these methods. If you do not program, write explicit directions in “pseudocode” to do this. (Pseudocode means a set of instructions in English that are so explicit that a moron (i.e., a computer) could follow them and perform the sorting task.)