Boost logo

Boost :

Subject: [boost] [Containers] Review
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-10 17:26:55


Dear All,

Following are some comments about Ion's proposed Boost.Containers. I
have not really reviewed the code, but frankly that doesn't seem
necessary: Ion knows what he's doing, much of this code has already
been in Boost for a while as part of Interprocess, and much of it is
designed to implement standard interfaces. I'll confine my comments to
a few words about the documentation, and some observations about the
"flat" containers.

Regarding the documentation, I believe that it would benefit from:

- The "Acknowledgements" page is currently empty. There are various
(c) notices in the source referring to sources from which parts of the
code may have come. This page should clarify these sources, including
any cases where the original code has now all been replaced, and assert
that the licenses are all in order. It should also point out the
history of the code in Boost, i.e. that parts were previously in
Boost.Interprocess, so that anyone with previous exposure to the
Interprocess containers is not confused.

- A few lines of rationale for stateful allocators would be useful,
i.e. pointing out that although the std::allocator has a typedef for
pointer the containers can assume it is T*, and the implications of
that. (I dealt with all this back in about 2004, and it took me a long
time to get up-to-speed with the issues.)

- Similarly, some rationale for string (and pointing out how it differs
from std::string in the common implementations) would be useful. (Now,
I can't even find where it says that it is non-COW. I know that I've
seen that written somewhere. A page specifically about string shown on
the top-level table of contents would be appropriate.)

- The Doxygen reference docs seem to jump to the header docs from the
table of contents; I think it would be more useful to jump primarily to
the classes. Is this the same issue that I saw in the last review?

Regarding the "flat" containers, I've previously posted about this in a
couple of threads on this list.

I have used this sort of structure in two applications recently:

- In the first case, I store read-only (key,value) data in a binary
file. I have a utility that prepares this file offline by inserting
into a vector<pair<>>. I then sort it, and save the contents of the
vector in the file. In the program that uses the data, I memory-map
the file. I then have an adaptor class that takes a pair of pointers
to the begin and end of the mapped data and permits read-only access
like Ion's flat_multimap<>.

- In the second case, I have an insert-only key-value data structure.
New data is inserted in batches. I have a vector<pair<>> and insert
new data at the end; at the end of each batch, I sort the new values
and merge them into the body of the vector using std::inplace_merge.

Neither of these cases can be implemented efficiently using the current
containers. Doing so would need:

- Access to the underlying container, at least for reading.
- An "adaptor" version that could wrap a pair of pointers to raw
memory, or more generally a pair of iterators, at least for reading.
- Some sort of efficient batched insert mode.

Regarding the batched insert, we have previously discussed the
complexity of inserting a range. Fundamentally, however, I don't think
it is acceptable to have efficient insertion ONLY in the case of the
insert(begin,end) method: it is too common to want to insert several
values in a loop without creating a temporary container. I'm not aware
of any established (i.e. std:: or boost::) pattern for doing this:

class flat_container {
   void insert(val); // appends at the end of the vector
   void commit(); // merges new values into the body of the vector
};

- Between calling insert() and calling sort(), what do lookups do?
Undefined? Only look at old values? Sort-on-demand?
- Is there also an "insert one" method? How are the variants named?
- What about erase()?
- How about a "batch" or "transaction" object?:

flat_container c;
{
   flat_container::insert_batch b(c);
   b.insert(data1);
   b.insert(data2);
} // b's dtor merges new values.

A further interesting case is when the new values are already sorted,
because insertion can be more efficient in this case:

void insert_sorted(iter first, iter last)
{
   resize vector to make space for new items
   for each new item in reverse order {
     find the position where it should be inserted using binary chop
     move old items after that point forward
     move the new item in behind them
   }
}

I could imagine a general insert_batch and a specialised
insert_sorted_batch, for example.

Anyway, this is all just food for thought really. It would be
interesting to hear from other people who have used data structures
like this. In my own code I have taken a rather ad-hoc approach, but
that is the nature of custom vs. library development: to be useful, a
library needs to try to second-guess the requirements of all potential
users, yet do so without becoming too complex for anyone to learn.

Hopefully there will be some more discussion about some of this, but as
far as the review is concerned we should allow Ion to decide what he
eventually commits.

Regards, Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk