From: Steven Ross (spreadsort_at_[hidden])
Date: 2008-07-08 10:42:11
I haven't gotten around to making changes yet (weekends are when I have
time), but I still have some feedback.
- Use new and delete rather than *alloc and free.
> - Can any exceptions occur? If so, is the dynamically allocated memory
> leaked? Could you use a smart pointer to avoid this?
Using *alloc, they should return NULL if allocation failed, and that
situation is handled. With new and delete I'll have to catch memory
exceptions (and write a constructor for the Bin structure) and probably use
smart pointers. I could also convert to only using a fixed-size bin array,
and putting this on the stack, as if you look at what the default constants
do, the bin array ends up being of size MAX_SPLITS.
If it doesn't impact performance, I'll be happy to do it the C++ way.
Otherwise I'll have to investigate what's going on in detail.
> - Does it work with uint64_t? I see some variables declared as 'long',
> which often can't store a uint64_t.
There exists the problem of how to identify and deal with negatives. I try
to avoid forcing any particular return data type, but in the case of divMin
and divMax, they are used in performance-critical loops, so they should be
used as constants without any extra computation. They are already divided
by 512, so an unsigned should fit inside a signed data type without trouble.
Should I use an int64_t?
- If std::vector<int>::iterator really is slower than int* (and I suggest
> some investigation of the assembler to see why this is), then you could add
> a specialisation that converts vector::iterators into pointers.
Good point. I can look at that, though it could be due to my ancient
system. I see the same performance problem with std::sort, and for a vector
that isn't vector<bool> (why would anybody sort a vector<bool>?) I should be
able to convert to a pointer.
I haven't read assembler since college, and I don't even remember how to
obtain the assembler for C++ source.
I do also have a Windows development environment if finding assembler on
that is easier.
> - Is there a use-case for a pure radix sort, without the std::sort
> fallback? i.e. are there types that can't easily provide a comparison
> function, or distributions where the std::sort fallback is never useful?
A stable, external LSB Radix Sort could be templated by overriding the &
operator, and passing in the key size in bits as an argument. This would be
particularly useful to those who need a stable sort, and for those with
small keys (32 bits or less) and memory to spare, it should be quite fast
for large n, running in O(n(keysize/bincount)) operations, and without
recursion. The ideal bincount would be close to Spreadsort's MAX_SPLITS,
probably between 9 and 11. It would probably be best to have a radix_sort
and a float_radix_sort. I make no claim to be an expert on LSB Radix sort,
and would welcome someone else writing it. If I did write it, I'd probably
take some fast open-source implementation and just template it.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk