Boost logo

Boost :

From: Joaquín Mª López Muñoz (joaquin_at_[hidden])
Date: 2004-03-29 06:05:29


Hello David, thanks for the effort of reviewing indexed_set
(and seems you made a thorough review indeed!)

"David B. Held" ha escrito:

> *** What is your evaluation of the design?

[...]

>
> It's too bad that you have to specify the type of the member when
> using member<>, but I don't see a way to infer it without having
> measurable runtime overhead.

This is a recurrent request, but afaik noone has come up with a
solution.

> The range() member is very clever and is a good use case for
> Boost.Lambda, IMO.

Credit here goes entirely to Pavel, who pushed me to include this
memfun --I was reluctant to add non-standard features.

> It seems to me that modify() need not have the destructive policy
> that it does. It appears that modify() erases the node from the
> index, then reinserts it into the appropriate place. However, it seems
> to me that modify() could first look to see if the modified node
> already exists, save the iterator to the correct insertion point, and
> then only unlink the node if the modification would be succesful.
>
> So the strategy would be like this:
>
> 1) Change the modifier function object to require a value() member
> that gives the new value to be used.
> 2) Instead of calling mod() right away, call something like
> lower_bound(), using mod.value(). Save the iterator if there isn't
> already an element there that conflicts.
> 3) Perform the erase as usual, and then do mod(), link2(), etc.

Ummm... If I followed your approach I would be incurring two
extra copies which is precisely what modify() is meant to avoid.
For non-destructive updating you can use update(), that's the
idea. If I'm not making myself clear please tell me so.

> One thing that I think should be added (though not affecting my
> vote) is ranked indexes. So just as an index might be unique or
> non-unique, it can be ranked or not ranked. Ranking gives an
> index an O(log n) ordinal search. That is, you can search for the
> ith item according to a certain index in O(log n) time, given an
> ordered index (obviously, the sequenced indexes would have an
> O(n) search time). This does change the structure of the nodes,
> and necessarily makes them bigger for ranked indexes. So it is
> not a trivial change. However, I am sure that it is doable and would
> add value to the library.

Yep. The idea of ranked containers came up recently, as introduced
by Peter Palotas. I'll add this to the future work section. The most
wanted new features seem to be (in this order) serialization,
hashed indices and ranked indices.

> *** What is your evaluation of the implementation?

[...]

> Some of the lines are > 80 columns (e.g.: the copyright notice).

Is this strictly required? If so I'll go thru the code and make the changes.
If not, I'd rather leave it like that.

> I think that BOOST_NO_MEMBER_TEMPLATES should be
> deprecated. I don't think there's a single supported compiler in
> the Boost compiler configs that defines this any more. I think
> checking against BOOST_MSVC6_MEMBER_TEMPLATES is
> sufficient, but it would be nice to not even have to check against
> that.

<off topic> It'd be nice if Boost.Config provided a (possibly
automatically generated) reference table showing which compilers
do have a specific defect macro.</off topic>

Would it be OK to you if we delayed the remotion of this macro till
the next dev cycle? I fear I could be inadvertently missing an
otherwise supportable compiler.

> I do believe the parts that come from other code need to retain the
> original copyright. The parts that were inspired by, but not directly
> derived from, another implementation probably don't need a
> copyright notice, but should probably give a mention in the copyright,
> at the least as a matter of professional courtesy.

No problem. Apart from stl_set.h I borrowed nothing from other
sources. Will add the (c) where appropriate.

> *** What is your evaluation of the documentation?

[...]

> Here is a somewhat minor nitpick. In the tutorial, the index_list is
> specified as so:
>
> (type of index)<[(tag)[,(key extractor)[,(comparison predicate)]]]>
> (type of index)<[(key extractor)[,(comparison predicate)]]>
>
> I would change it to thus:
>
> (unique | non_unique)<[(tag)[,(key extractor)[,(comparison
> predicate)]]]>
> (unique | non_unique)<[(key extractor)[,(comparison predicate)]]>

Yes, why not. Consider it changed.

> It would be nice to be able to read through the documentation like
> a manual. Thus, next links on each page would be appropriate.
>
> On the performance page, you state that the tests were performed on
> a system with 260 KB "RAM memory", running W2K. I'm absolutely
> certain that is not correct. ;) "256 MB RAM" sounds a lot more likely.

You're absolutely correct. Fixed. Thanks.

> Something that I think should be added to the documentation is a
> concept page that lists the main concepts in IndexedSet and a concept
> interface a la the SGI STL reference (www.sgi.com/tech/stl).

More on this later down the post.

> *** What is your evaluation of the potential usefulness of the library?

[...]

> *** Did you try to use the library? With what compiler?

[...]

> *** How much effort did you put into your evaluation? A glance? A
> quick reading? In-depth study?

[...]

> *** Are you knowledgeable about the problem domain?

[...]

> *** Do you think the library should be accepted as a Boost library?
>
> Yes, most definitely.

Great! The score is 9-0 so far.

>
> *** Naming issues:
>
> Here's my take on names, but should not be construed as affecting
> my vote for inclusion...
>
> Namespace: I think too many namespace levels are not helpful.
> A separate 'container' namespace seems a bit superfluous to me.
> There is no danger in name clashes within Boost between container
> libraries and other kinds of libraries that I can see, and I don't see
> that it provides a significant documentation benefit. I think additional
> namespace depth is most appropriate for collections of algorithms,
> where a proliferation of names is more likely to lead to name
> collisions.

Well, I stated previously that I'll do what people agree on. I don't
know if the final decision should be made by Pavel as the review manager.
IMHO, qualified identifiers will be so long anyway that having an
additional ::container:: won't hurt much.

> Library name: I don't think IndexedSet is so bad, but an alternative
> that I like is CompositeSet. Then there could be a Composite
> Sorted Associative Container concept which would support the
> notion of containers with multiple keys. Then CompositeSet
> would be a model of Composite Sorted Associative Container.
> A synonym for Sorted Associative Container could be Simple
> Sorted Associative Container or Trivial Sorted Associative
> Container.
>

The main problem with "Set" slipping into the name of the container
is that, for instance, the following

indexed_set<int,index_list<sequenced<> > >

does not have anything to do with sets (it resembles a std::list of
constant ints.) For this reason I like Multicontainer better (or something
along this line.)

<off topic>Reviewers have showed little or no interest in sequenced indices.
Seems weird to me, as they can be used to construct a seemingly useful
list-like container with fast lookup.</off topic>

As for the Composite Sorted Associative Container concept, we have the
same problem. Technically, the requirements imposed to current or
future index types by indexed_set are much weaker than being some
sort of associative container. I cannot even state global complexity
bounds (please check the reference, page "Index reference", section
"Complexity signature").
I agree with you a concept section would be highly desirable, but here
I definitely need some help. What you request is not trivial.

> Indexes: According to the American Heritage Dictionary, Fourth
> Edition, as channeled by www.dictionary.com, "indexes" is
> preferred to "indices". Another spelling metric I like to use is
> Google. According to Google, "indexes" returns 5.8 million
> hits, while "indices" returns 5.5 million hits. So that's only a
> 5% difference, but it's statistically significant. ;) I personally
> think the preferred spelling should be used throughout the
> documentation, but that's more of a nitpick than a criticism.

My Latin (I speak Spanish) heritage makes me favor "indices"
to "indexes" (and "minuscule" to "miniscule", etc.)
The Google metric could be biased by the fact that "indexes" is
not only the plural of "index" but also the 3rd singular formal
of the Present tense of the verb "index".
May people allow me to retain "indices"? If only as a small
concession to those of use who studied and love the
olde Latin language.

> Anyway, I prefer "ordered index" to "regular index". As far as
> index types go, I prefer "ordered_unique" to "set_like", due to
> consistency. I think Ordered Index should be a concept for
> this library.

Others do also like ordered_unique etc. I do too.

> nth_index_type: I don't like the word "nth", personally. I suggest
> either of the following naming schemes:
>
> nth_index_type -> ordinal_index
> index_type -> named_index | tagged_index
>
> nth_index_type -> index_byval | index_bynum
> index_type -> index_byname | index_bytag
>
> Generalize appropriately to the iterator types.

I recently proposed the following

template<int N> struct nth_index;
template<typename Tag> struct index;
template<int N> struct nth_index_iterator;
template<int N> struct nth_index_const_iterator;
template<typename Tag> struct index_iterator;
template<typename Tag> struct index_const_iterator;

which seems to me good enough. Don't you like it?
The nth_ prefix has some tradition in C++, why not use it here
too?

> update(): Actually, I do prefer "replace()" over "update()", because
> it does more accurately reflect the operation.

Me too.

> As far as header organization goes, I would think of it this way:
> What are the major features and concepts of the library, and
> how will they be used separately or together? Organize the
> headers in a way that allows users to include feature sets as
> modules. If you have to, create dummy headers that just include
> sets of headers that are logically dependent so that users can
> activate a certain feature by including that header. Also,
> consider the cost of each header. It is worth the extra effort
> to isolate expensive headers than cheap ones. Other than
> that, I don't have any particular suggestions.
>

Well, I'm gathering all the feedback from the reviewers wrt to
naming and header organization and will try to come up with some
consistent proposal today or tomorrow.

Thank you again for taking the effort to review the library. I hope
you can find it useful in the future. (Here goes a challenge: what
about writing a multiply indexed map on top of indexed_set?
You're the map expert here at Boost ;) Please check the Future
work section for a preliminary discussion on the subject of
indexed maps.)

Best,

Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk