Boost logo

Boost :

From: Steven Ross (spreadsort_at_[hidden])
Date: 2008-07-04 10:31:27


Hi Phil,

I investigated a little further about VIA C3s and downloaded your testcase,
so I have more to add.

On Thu, Jul 3, 2008 at 9:36 AM, Phil Endecott <
spam_from_boost_dev_at_[hidden]> wrote:

>
> No, this is Linux. It's a VIA C3, so it probably has less cache than many
> other systems.
>

>From my reading, it actually has a larger L1 cache (and smaller L2 cache),
and for that purpose a MAX_SPLITS as high as 13 might be better. Increasing
LOG_MIN_SPLIT_COUNT might also speed up your test a little. Those are
probably the only consts that are worth changing.
std::sort performs very well as soon as the data is small enough to fit in
the L1 cache. Spreadsort performs less iterations the larger MAX_SPLITS is,
though if MAX_SPLITS exceeds the cache size, its performance gets worse by
50% or sometimes more (drops off a cliff). The best usage of Spreadsort is
usually to cut the data up small enough for std::sort to fit in the L1 cache
and finish sorting.

> OK, I've put a 14 Mbyte file called placenames.dat.bz2 in the directory
> http://chezphil.org/tmp/ (I've not written the URL in full to stop search
> engines from wasting their time on it). When un-bziped you'll find a
> tab-delimited 3-column text file. I got basically the same results using
> the latitude and longitude columns.
>

What's the best way to read data like that from a file?
fscanf("%f %f %s\n")? Won't that have problems if there is a space? I
don't do much file parsing in C/C++.
Spreadsort is often even faster relative to std::sort when sorting key +
data relative to sorting just keys, because it performs less swaps than
std::sort, and outside of swapping just uses the keys. It also looks like
you could use multikey sorting, which is a logical next step once I have
integer_sort, float_sort, and string_sort fully implemented.

Steve


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk