Boost logo

Boost :

From: Sam Schetterer (samthecppman_at_[hidden])
Date: 2007-03-18 17:18:11


Herve Bronnimann wrote
>Sam: again I'm curious (read: highly skeptical). When you say "much
slower" do you base yourself on actual data? >I.e. Have you tried it? The
reason I ask is that I don't believe it. Let's talk for concreteness of
three levels (name, >number, grade). Any algorithm that does three passes
on the data when one will do will be slower (doesn't matter how >complicated
your comparison is, if all the name data is distinct the outer sort
comparison will conclude quickly). >Radix sort insists on doing all three
passes even if all names are distinct. I'm sure data will back me up, if
not please >show me the numbers. --Herve
Here are some numbers for normal quicksort and radix quicksort on sorting
strings - these are from Algorithms in C++ third edition, by Robert
Sedgewick.
N is the number of words, and these words are from Moby Dick.
N standard quicksort three way partitioning quicksort
radix quicksort
12500
7 6
               6
25000 14 12
11
50000
34 26
25
100000 83
61 54
Now, these results are from single words.
here is the general equation for quicksort and radix quicksort on strings
Radix quicksort - k + (2N log N), where k is the average number of bytes in
each string.
Normal quicksort k(2N log N), where k is the average number of bytes in each
string.
Imagine the speed difference in the sorting algorithms if the strings were
first and last names, which commonly reach lengths of 15 or more characters,
unlike the words in Moby Dick. Think of how much faster radix quicksort
would be.
>From the point of view of a user appending bytes onto the end of a string
from a native arithmetic type, even if you lose time on the integer bytes,
you gain a very large amount of time when you sort the name part of the
string.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk