Boost logo

Boost :

Subject: Re: [boost] [Containers] Review
From: Ion Gaztañaga (igaztanaga_at_[hidden])
Date: 2011-08-11 02:49:23


El 10/08/2011 23:26, Phil Endecott escribió:
> Dear All,
>
> Following are some comments about Ion's proposed Boost.Containers. I
> have not really reviewed the code, but frankly that doesn't seem
> necessary: Ion knows what he's doing, much of this code has already been
> in Boost for a while as part of Interprocess, and much of it is designed
> to implement standard interfaces. I'll confine my comments to a few
> words about the documentation, and some observations about the "flat"
> containers.

Thanks for your time.

> Regarding the documentation, I believe that it would benefit from:
>
> - The "Acknowledgements" page is currently empty. There are various (c)
> notices in the source referring to sources from which parts of the code
> may have come. This page should clarify these sources, including any
> cases where the original code has now all been replaced, and assert that
> the licenses are all in order. It should also point out the history of
> the code in Boost, i.e. that parts were previously in
> Boost.Interprocess, so that anyone with previous exposure to the
> Interprocess containers is not confused.

Ok.

>
> - A few lines of rationale for stateful allocators would be useful, i.e.
> pointing out that although the std::allocator has a typedef for pointer
> the containers can assume it is T*, and the implications of that. (I
> dealt with all this back in about 2004, and it took me a long time to
> get up-to-speed with the issues.)

Ok, this already has been requested.

> - Similarly, some rationale for string (and pointing out how it differs
> from std::string in the common implementations) would be useful. (Now, I
> can't even find where it says that it is non-COW. I know that I've seen
> that written somewhere. A page specifically about string shown on the
> top-level table of contents would be appropriate.)

Ok.

> - The Doxygen reference docs seem to jump to the header docs from the
> table of contents; I think it would be more useful to jump primarily to
> the classes. Is this the same issue that I saw in the last review?

Doxygen reference is auto-generated by Quickbook, but I'll check what we
need to do to improve this.

> Regarding the "flat" containers, I've previously posted about this in a
> couple of threads on this list.
>
> I have used this sort of structure in two applications recently:
>
> - In the first case, I store read-only (key,value) data in a binary
> file. I have a utility that prepares this file offline by inserting into
> a vector<pair<>>. I then sort it, and save the contents of the vector in
> the file. In the program that uses the data, I memory-map the file. I
> then have an adaptor class that takes a pair of pointers to the begin
> and end of the mapped data and permits read-only access like Ion's
> flat_multimap<>.
>
> - In the second case, I have an insert-only key-value data structure.
> New data is inserted in batches. I have a vector<pair<>> and insert new
> data at the end; at the end of each batch, I sort the new values and
> merge them into the body of the vector using std::inplace_merge.
>
> Neither of these cases can be implemented efficiently using the current
> containers. Doing so would need:
>
> - Access to the underlying container, at least for reading.

I think we can get something useful now that underlying implementations
are movable or swappable. Something to extract the implementation,
manipulate it and reinsert it (with some checks in debug mode about
invariants).

> - An "adaptor" version that could wrap a pair of pointers to raw memory,
> or more generally a pair of iterators, at least for reading.

We could write an adaptor taking a read-only random-access iterator
range (boost::range) that could be used as implementation, only
implementing read-only support, so that flat_xxx fails (static_asserts
with "no permission to modify container: read only implementation") at
compile time if you call non-const member functions.

> - Some sort of efficient batched insert mode.

It would be nice if you could measure your inplace_merge strategy with
insert(iterator begin, iterator end) for different input ranges and
already inserted values.

> Regarding the batched insert, we have previously discussed the
> complexity of inserting a range. Fundamentally, however, I don't think
> it is acceptable to have efficient insertion ONLY in the case of the
> insert(begin,end) method: it is too common to want to insert several
> values in a loop without creating a temporary container. I'm not aware
> of any established (i.e. std:: or boost::) pattern for doing this:

I agree we should find a solution. However, I think this shouldn't stop
flat_xxx adoption, since it's already quite useful in its current form
as a drop-in replacement for set/map. We can work in the mailing list
until we find a satisfactory solution.

> class flat_container {
> void insert(val); // appends at the end of the vector
> void commit(); // merges new values into the body of the vector
> };
>
> - Between calling insert() and calling sort(), what do lookups do?
> Undefined? Only look at old values? Sort-on-demand?
> - Is there also an "insert one" method? How are the variants named?
> - What about erase()?
> - How about a "batch" or "transaction" object?:
>
> flat_container c;
> {
> flat_container::insert_batch b(c);
> b.insert(data1);
> b.insert(data2);
> } // b's dtor merges new values.

Sounds very interesting, insert_batch could reuse memory from the
container, we could even "suggest" the insertion range in the
constructor (so that we can "reserve" in the flat container). each
insertion could write at the end of the container ("reserved" memory)
destroying already constructed objects in case reserve fails and we have
already inserted some values (we could also "move" them, right?). In the
destructor we can just use your inplace_merge algorithm, if that's more
efficient that current insert loop.

> A further interesting case is when the new values are already sorted,
> because insertion can be more efficient in this case:

So similar to constructors taking already sorted ranges, right? We can
also add these "extensions" to map/set, although I'm not an expert in
trees so we'd need to rebalance the tree for each insertion. But I guess
that with help from boosters we can find an efficient batch insertion
for tree-like containers ;-)

Thanks for your valuable input!

Ion


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk