|
Boost : |
From: Lewis Hyatt (lhyatt_at_[hidden])
Date: 2007-03-22 22:30:38
Sam Schetterer <samthecppman <at> gmail.com> writes:
>
> Hello. I have recently uploaded the updated sorting.zip file to vault, and
> it contains the full header file for multikey quicksort. For those of you
> who download the library and those of you who have downloaded the example
> library, some feedback, good or bad, is appreciated. Thanks.
Hi Sam-
First of all, please try to exhibit more patience. Most of the people here are
only working during their spare time.
Secondly, you seem to be using a very bizarre interface to this mailing list.
You don't quote properly, and you keep starting new threads to respond to old
threads. This makes it harder for people to find your messages, assuming they
are willing to put in the effort to find them in the first place. If you can't
figure out how to get your mailer to work properly, please just use the gmane
interface here: http://news.gmane.org/gmane.comp.lib.boost.devel. IMHO, it's the
best way to do it anyway.
Anyway, on to the code. I don't know much about sorting. I tried compiling your
radix sort example code last week. It failed to compile because of misuse of the
typename keyword, either too often or not often enough. I fixed that, and was
able to compile the code, but your example failed. (It tested an error
condition, and reported an error). I tried writing my own test code to do some
timings, but your code crashed. What compiler are you using? I am surprised the
example radix code worked at all on any compiler.
I haven't tried compiling the multi-key quicksort code here, but I did read the
beginning of it and found a number of problems with the fundamental design. I
tried to indicate a few things in the code annotations below, but I didn't make
it all the way through.
All things considered, I think it is great that you are so excited about C++,
and that you want to apply what you know to sorting algorithms, but you still
have *a lot* to learn before you will be able to write a library suitable for
inclusion into Boost. Here are my suggestions:
-Slow down. You don't need to post code to the vault every day. Instead,
focus on learning how to improve the algorithms and the fundamental design.
-Premature optimization is bad, and it seems to be your number one concern.
You need to write a working library *first*, and then optimize it. Adding things
like different calling conventions and i686-specific assembly instructions is
necessary _way down the road_, if at all--certainly not now, when your code
doesn't even work or compile on all systems, and you yourself don't know whether
it is endianness-independent or not.
-I think you have done a good job of identifying specific goals for the
library at this point. The focus on radix sorts, multikey sorts, etc, could
eventually produce something very valuable. But you need to write a coherent set
of timings and test cases to demonstrate conclusively that your code is easy to
use and is better than std::sort. Quoting theoretical arguments from some book
is not enough; you need concrete test cases to establish this.
-I think you need to learn more about how to write good generic code. One
possible suggestion: read through your STL implementation of <algorithm>,
<numeric>, and <vector>. Make sure you understand what each and every line does,
and why it is there. If you don't have a good STL implementation to look at,
then look at gcc, which is very comprehensible. Then, read through the complete
source of some top-notch boost generic libraries to understand the techniques
used there. I would recommend at least ptr_container and shared_ptr, which are
incredibly useful and well-designed generic libraries. There are many others in
Boost as well, of course. Understanding these concepts is infinitely more
important than adding inline assembly language and unnecessarily tossing in the
register keyword here and there.
-Read all of the articles here: http://www.gotw.ca/gotw/
-Learn about exception safety.
I hope these suggestions are useful to you and encourage you to keep working on
this C++ library.
Code Annotations prefixed by ==> :
-------------------------------------------
//include file for multikey quicksort
#ifndef BOOST_MULTIKEY_QUICKSORT_HPP
#define BOOST_MULTIKEY_QUICKSORT_HPP
#include "detail/insertion.hpp"
#include <deque>
#ifndef BOOST_NO_FASTCALL
#define BOOST_NO_FASTCALL
#define callingconv __fastcall
#else
#define callingconv
#endif
==> First of all, __fastcall is a nonstandard extension to C++ and most likely
does not belong in a general library. In any case, it should not be enabled by
default. Also, your logic here does not make sense. You should not #define
BOOST_NO_FASTCALL right before implementing fastcall!
namespace boost
{
namespace detail
{
template<typename T, typename Holder>
class qobjclass
//I use a class because only class templates can be
//partially specialized
{
public:
inline static const T& qobj(typename const Holder& a, int d)
{
return a[d];
}
};
==> The use of typename in this context is incorrect. You only need to use
typename to clarify that a dependent name must be a type. In this context,
Holder has to be a type. it looks like you added typename everywhere because you
don't fundamentally understand what it is for. I haven't noted all of the
mistakes below.
==> You should not use int to index an array; std::size_t is the correct type to
use to index an array, lacking any other information.
==> There is no need to type "inline" when you define a method inside a class
definition, as it is already implied. When you do add it unnecessarily, it
mainly just suggests that you intend your code to be "fast", but you don't
understand what that means.
==> If a class has only public members, just make it a struct to signify your
intent.
template<typename T>
class qobjclass<T, typename std::deque<T>::iterator>
//I use a class
//because only class templates can be partially specialized
{
public:
inline static const T& qobj(
typename const std::deque<T>::iterator& a,
int d)
{
return *(a + d);
}
};
==> What is the point of this? This is fully equivalent to the general template
already. And why is std::deque appearing at all in a *generic* sorting library?
Even if deque is somehow so special, why do you only support std::deque<T,
std::allocator<T> > ? What about other allocators?
==> You seem to be under the impression that there is a difference between a[d]
and *(a+d). There isn't. Again, the issue here is that you are more concerned
with ill-motivated optimizations than with writing a good algorithm in the first
place.
==> Iterators should be passed by value.
==> It looks like instead of your previous class, what you are trying to say is
that Holder should be a Random Access Iterator with value_type T. You should
phrase this as follows:
template<typename RandomAccessIterator>
class qobjclass {
typedef std::iterator_traits<RandomAccessIterator> traits;
public:
static typename traits::value_type const& qobj(
RandomAccessIterator i,
typename traits::difference_type d)
{
return i[d];
}
};
==> With this formulation it is clear that your qobjclass is fully redundant,
since all it does it provide a synonym for operator[] for a generic random
access iterator. You need to re-organize the code below to be templated on a
random access iterator type. Whether it is a pointer or a std::deque::iterator
or something else is not your concern.
//I need all of these special functions because I cannot do
//partial specializations for function templates, but I need
//to simulate it
template<typename T>
inline T& _get(typename T* a, int d)
{
return a[d];
}
==> Again typename is incorrect here, and this function is redundant, since this
code would compile for any random access iterator, not just a pointer. Also,
don't use identifiers beginning with an underscore, as the rules for when these
are reserved and when they are not reserved are too hard to remember. The whole
point of putting something in namespace detail is that you don't have to mangle
the name by hand.
template<typename T>
inline T& _get(typename std::deque<T>::iterator& a, int d)
{
return *(a + d);
}
==> This specialization is already equivalent to the generic version, which is
already unnecessary.
#define qobj(a, b) qobjclass<KeyHolds, Key>::qobj((a), (b))
#define get(a, b) _get<Key>((a), (b))
==> These macros do not save enough typing to be worth the evilness of using
them. Also, if you do use them, you should use the boost naming conventions and
you should #undef them at the end of the file. But don't use them.
template<typename KeyHolds, typename Key>
void callingcov mkquicksort(
typename Key* a, int l, int r,
int bit,typename const KeyHolds& term)
=> You don't seem to understand how to make this code generic at all. The whole
point of the templates is to allow any random access iterator, not just a
pointer.
{
if(r <= l) return;
register int i = l - 1, j = r, d = bit;
typename const KeyHolds v =
qobj(get(a, j), d);//speed increase with registers
==> I'm pretty certain it's been a good many decades since the register keyword
held any meeting at all. With today's modern compilers, it's probably just as
likely to confuse the optimizer and make things worse, but in all likelihood
it's fully equivalent to whitespace. Do you have any evidence to back this up
whatsoever? What makes you think you can tell better than the compiler what
should be in a register?
==> At this point, I stopped reading, as I don't know enough to evaluate your
algorithms, but I do know enough to know that this code needs to be redesigned.
It's not just a matter of a few details here and there; it's a matter of a
dramatic increase in your understanding of how to write a generic library in
C++.
-Lewis
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk