Boost logo

Boost :

From: Hervé Brönnimann (hervebronnimann_at_[hidden])
Date: 2007-12-10 21:26:59


Daniel: First, thanks for your reply. And sorry if I came across as
overly critical. I think your library is excellent. The point of
the review process is to try to make it even better :) In fact, if
we could make the C++ standard better in the process, I'd be delighted.

As I've said in my review, but to be clear I'll say once again, none
of these items under discussion below are contingent to my
(enthusiastic) vote in favor of your library. They are merely for
your consideration.

On Dec 10, 2007, at 3:54 PM, Daniel James wrote:
>> I find it unfortunate that it's not defined in the standard. Howard
>> gives the same implementation I would have given, which works in O(N)
>> time.
>
> And only works for unordered_set. How would you define equality for
> unordered_multimap? Do you compare the mapped values, does the order
> of the values matter? If it doesn't then the comparison becomes more
> expensive, if it does then it's a bit odd for an 'unordered' container
> (although this implementation does preserve insertion order, if it
> didn't then things would be worse). Is it okay to define equality in
> terms of the equality predicate for keys, but operator== for values?

It works equally well with a little tweak if you require equality of
count(key) for the multiset. I didn't think about map/multimap,
thanks for pointing this out. But Daniel, what exactly does
operator== for std::multimap do??? In fact, because insertion in
std::multimap inserts at the end of the range, the value of
std::multimap depends on the insertion order. You wouldn't be doing
anything different with unordered_multimap. And as you mention, your
implementation does already preserve insertion order, so you wouldn't
have any problem to supply these extensions, and they'd be well-
defined. Isn't it better if you can provide stronger guarantees
than the current draft standard, just as you already do for exceptions?

Now the standard should really define what happens when inserting
equivalent keys in unordered_multimap, just as it does for multimap,
but we're not fixing the standard. I'll raise that point on another
mailing list if I get around it. As I understand, it depends on
whether you use singly- or doubly-linked lists for chaining, and I
buy your argument as to why use doubly-linked lists for multiset. I
think you made good points there.

As for the equality comparison vs predicate, the same again is true
for operator== for the standard associative containers, defined as
a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin()
[sic] (no, *I* didn't forget the final paren, it's not on page 671
either). Nowhere in that makes use of the "equivalence" implied by
operator<, which is used for uniqueness in set/map and equivalence in
multiset/multimap. So by defining equality in terms of the equality
predicate for keys in order to check for the uniqueness property, but
operator== for both keys and values in the implementation of
operator== for unordered_map/multimap, you wouldn't exactly break new
ground.

Or am I missing some subtle implications or other clause in the
standard?

>> I found also the following bogus note in the Working Draft:
>>
>>> 10 Unordered associative containers are not required to support the
>>> expressions a == b or a != b. [ Note: This is because the container
>>> requirements define operator equality in terms of equality of
>>> ranges. Since the elements of an unordered associative container
>>> appear in an arbitrary order, range equality is not a useful
>>> operation. — end note]
>>>
>
> That's not bogus at all. An equality operator for unordered containers
> would have completely different semantics to all other containers
> (especially when using custom comparisons, comparing associative
> containers can have surprising results).

Why not? As I argue above, the same is already true of the ordered
associative containers. I don't think you'd be providing anything
more surprising than already present in the standard.

As for surprising semantics, a set and multiset have well-defined
notions of value (in a mathematical sense), i.e., a subset from a
ground set of key, which is exactly what operator== compares. For
map and multimap, the value is a function from a ground set of keys
to a value for map, sequence of values for multimap (sequence means
that order matters). The operational definition in terms of range is
simply an algorithm for comparing those values. Exactly the same is
true for the values of unordered containers; operator== as Howard
and I've defined them precisely perform this comparison of value. I
find nothing surprising in that. The fact that operator== is defined
in terms of ranges for sequences and ordered associative containers,
and is defined/implemented differently for unordered containers, is
not surprising, it's necessary. Again, I believe the standard could
be improved on this, and wouldn't your library be the natural vector
to provide a reference implementation for a proposal to define
operator== for unordered containers?

> "Right now, I'm trying to limit myself to obvious design decisions for
> extensions so I don't really want to go into this area.

Fair enough. You gave good points. I hope my counter-arguments are
equally relevant, but you're of course free to not implement them.
Still, as boost explores possibilities for later standardization, it
could be hoped that you would provide an implementation that provides
prior art for future discussion in the committee. Whether you or
someone else (perhaps me, I really would like to) would make the
proposal, is for the future.

Still, I think there's only one sensible definition for
unordered_set, and unordered_multiset, and the multiset version isn't
trivial either. Would be nice if at least those werre in the
library, perhaps with a named function instead of an operator. Would
it be a problem if you at least provided those?

>> The prime number list is too short, it roughly doubles every time.
>> On the other hand, I understand that it wouldn't do that have a
>> *much* larger list, as this static array is included in every
>> translation unit.

> I think it can actually cause an ODR violation so I probably should
> change that anyway. David Abrahams recently posted a way to avoid this
> by including the value in a template, so I'll try that.

In theory, yes. In practice, linkers don't really enforce it, I
hear. Having it as a template doesn't solve the basic problem, which
is the replication of identical data in many translation units. In
fact, it might exacerbate it by having the same table defined
multiple times in the same translation unit! Where I work, we're
trying to curb executable sizes (which can be enormous, like you
wouldn't believe). This is precisely what would get a frown at the
code review.

> How much of a difference would it make? The memory requirements for
> the array of buckets is typically smaller than the memory required for
> the nodes (unless you have a small number of nodes or increase the
> maximum load factor). T

You're correct, unless people like to have very large maps of small
types (unordered_map<int, some_type*> with 10M elements) where it
starts to really matter. On the other end, it also matters if a user
has lots of small maps. I think it can really be a problem, in fact
I have been in that situation myself (having to fix the prime number
table of our implementation of hash table, by request of one user
with performance-critical needs). So here I actually speak from
practical experience.

I believe the best thing, is to move the table to the .cpp, but I
also I totally understanding the appeal of header-only libraries.
Would you consider a configuration flag, whereby unless explicitly
defined, your table is in the header, but for a user which controls
its own site installation, a table can be generated into a .cpp and
compiled into a library to be linked into programs using
Boost.Unordered? I actually did write such a piece of code, it isn't
hard, and you can specify the step (30% or 100%) and the range of the
desired table. If you include this code into the example section and
make the flag public, your users can choose what they prefer based on
their requirements.

Then again, if they're that sophisticated, maybe they can do it
themselves.

>> Re: swap: I really think you should be following the committee's
>> current advice (I'm looking at the code of hash_table_impl.hpp, lines
>> 1194 -- 1219), sooner rather than later. But my opinion is biased.
>
> Why is your opinion biased?

My boss is John Lakos, and my colleague Pablo Halpern. They both
have strong opinions on allocators and value-semantic types. In
fact, they're arguing exactly about these issues with the committee
(N2436, N2387, N1850 ). I've been converted myself :)

> That works both ways, as the committee could change its mind. Since we
> don't have concepts, following their current decision means slow
> swaps. Which isn't that bad since it's only for the rare occasions
> when the allocators aren't equal. I haven't got a strong preference
> here either, but this was discussed before and the general consensus
> seemed to be for what I've done.

What I hear from Pablo and John is that they're getting good support
around their proposal (ok, they're biased too :)... Maybe you're
right to wait and see what the committee will do. Hopefully, won't
take long as C++0x is around the horizon and there'll be two meetings
in Jan. and June. Somehow, I didn't think that Pablo's proposal
(N2436) required concepts, but required traits instead. But I'm not
intimately familiar.

--
Hervé Brönnimann
hervebronnimann_at_[hidden]

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk