Subject: Re: [boost] [review] The review of Boost.DoubleEnded starts today: September 21 - September 30
From: Thorsten Ottosen (tottosen_at_[hidden])
Date: 2017-10-04 09:45:40


Hi Ion,

Thanks for the detailed review. Some minor comments follow.
>
> The library presents two containers, devector and batch_deque. I find
> the first one as a relatively simple but interesting vector extension
> supporting front insertions. In the future it could maybe take advantage
> of the backwards memory expansion (taking the previously adjacent free
> memory segment so that new elements could be added to front without
> moving the rest of the elements) supported by Boost.Interprocess and the
> Boost.Container’s modified DLMalloc.

Interesting idea.

>
> ----------
> devector
> --------
>
> 1) From a design point of view, I would prefer not to include the small
> buffer optimization with the data structure, although reviewing code, it
> seems many parts are to be optimized.

Everything should be removed by the optimizer; if not, then that section
of code needs to be amended with a constant expression.
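
For illustration, a minimal sketch of what I mean, with purely hypothetical
names (this is not the library's actual storage class):

#include <cstddef>

template <class T, std::size_t SmallSize>
struct devector_storage
{
    bool is_small() const noexcept
    {
        // SmallSize is a constant expression, so for SmallSize == 0 this
        // condition folds to false and the whole small-buffer code path
        // becomes dead code the optimizer can drop.
        return SmallSize != 0 && m_size <= SmallSize;
    }

    std::size_t m_size = 0;
};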

> It's like just having small_vector
> without having vector; most libraries have two classes for these two use
> cases. Different classes also make it possible to define a type-erased
> base class independently of the static storage size (Boost.Container
> calls it small_vector_base, LLVM calls it SmallVectorBase). I think we
> should have a devector, and add small_devector and static_devector
> containers if there is demand.

Can they share some of the implementation, or are we left with a double
(or triple) implementation effort?
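
Something like the small_vector_base / SmallVectorBase pattern you mention
could perhaps be reused here. A rough sketch, with illustrative names only
(not a proposed design):

#include <cstddef>
#include <type_traits>

template <class T>
class small_devector_base              // independent of the static buffer size
{
public:
    T*          data() noexcept       { return m_data; }
    std::size_t size() const noexcept { return m_size; }
    // the common push_front/push_back/insert machinery would live here

protected:
    small_devector_base(T* buf, std::size_t cap)
        : m_data(buf), m_size(0), m_capacity(cap) {}

    T*          m_data;
    std::size_t m_size;
    std::size_t m_capacity;
};

template <class T, std::size_t N>
class small_devector : public small_devector_base<T>
{
    typename std::aligned_storage<sizeof(T) * N, alignof(T)>::type m_buffer;

public:
    small_devector()
        : small_devector_base<T>(reinterpret_cast<T*>(&m_buffer), N) {}
};

Code written against small_devector_base<T>& then works for every N, so most
of the implementation is shared rather than duplicated per storage size.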

> 2) Defaulting size_type to unsigned int is arbitrary and against
> existing practice.

We didn't do that as an arbitrary decision.

> According to the usual STL rules, size_type shall
> come from the allocator, which knows what type can represent the number
> of elements that can be allocated.

I have no problems with STL rules, but, as you point out, even
Boost.Container deviates from several of them.
>
> 3) reserve_only_tag: I would rename it as “reserve_only_t” (similarly to
> std::allocator_arg_t and Boost.Container’s “default_init_t”). The front
> and back capacity version should be enough as I think  devector should
> not show preference for back insertion (if a user wants only a back
> capacity, then it would use a vector and not a devector).

So you would prefer it to divide the buffer into two approximately equal
chunks? Or that one has to specify each chunk size explicitly each time
(so full symmetry, and the one-sided constructor gone)?
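
For concreteness, the two shapes under discussion would look something like
this (signatures are only a sketch, not the current API):

devector( size_type capacity, reserve_only_t );   // split roughly evenly between front and back
devector( size_type front_capacity, size_type back_capacity, reserve_only_t );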

> Strictly
> speaking, capacity constructors are a bit redundant, but the same could
> be said about default constructing constructors:
> //it could be implemented as default constructor + resize
> explicit vector(size_type n, const Allocator & allocator = Allocator());

Yes, just like push_back is redundant when we have insert( iter, x ) :-)

> A reserve-only constructor plus an “unsafe” emplacement could make a
> container similar to an array, where no redundant checks are done when
> filling the array. But I don’t expect noticeable performance gains with
> this reserve-only constructor.

No, it is for convenience.
>
> 4) unsafe_uninitialized_tag: I don’t like these non-safe constructors;
> what are the use cases? With reserve-only constructors + unsafe emplace
> back a user can achieve nearly the same performance. For POD types (like
> the socket example mentioned in the documentation), I would suggest
> something like Boost.Container’s “default_init_t”. Instead of
> “unsafe_uninitialized_resize_front” I suggest resize_front(size_type,
> default_init_t)

I didn't know about these default_init_t tags when we started in 2015. They
have a good design IMO. But unfortunately they don't get rid of double
initialization for non-trivial types. So maybe both

resize_front( size_type, default_init_t )
resize_front( size_type, uninitialized_t )

would be an idea.
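
For a trivially constructible element type the three flavours would then
behave roughly like this (again just a sketch of possible signatures):

d.resize_front( n );                      // value-initialization: new elements are zeroed
d.resize_front( n, default_init_t() );    // default-initialization: trivial types left
                                          // indeterminate, non-trivial types still constructed
d.resize_front( n, uninitialized_t() );   // no construction at all: the caller must
                                          // construct or overwrite the new elements before use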

> 5) reserve() / resize(): I think they have no sensible meaning in the
> container (capacity() could be useful, depending on the reallocation
> strategy). We should not treat back insertion as the “default”
> insertion. IMHO the API of devector should be symmetrical.

Symmetry is good, but what about generic code?

> 8) Regarding "move_if_noexcept", Boost.Container does not use it as it
> considers it a pessimized design for a situation (throwing moves) that
> will almost never occur
[snip]
> devector itself would be copied if
> small_buffer_size is used, etc.).

Only when it stores elements with a non-noexcept move, though.

> 9) Minor comment 2: IMHO I think iterators would be better as classes
> instead of pointers, a user can always use data() to make sure pointers
> are returned. Many times unwanted overload resolutions can happen if
> pointers are used, so I prefer the type safety offered by a class-based
> iterator. It can also offer checked iterators in the future.

I think you have a way to turn these into pointers again in
Boost.Container by defining some preprocessor symbol, right? Having
pointers matters when interfacing with other types, because this is the
only case where the iterators are seen as contiguous.

I would much rather see that in debug mode the iterator type is a class
and in release mode it is a pointer, rather than relying on extra
preprocessor symbols.
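
Something along these lines, where checked_iterator is a hypothetical
range-checking iterator class defined elsewhere (a sketch only, not a
concrete proposal):

#ifdef NDEBUG
typedef T*                        iterator;        // release: raw pointer, contiguous
typedef const T*                  const_iterator;
#else
typedef checked_iterator<T>       iterator;        // debug: class type with checking
typedef checked_iterator<const T> const_iterator;
#endif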

> 2) Segment_iterators: Raw pointers are stored in segment iterators. That
> means that those could not be placed in the memory that the allocator
> allocates, if the allocator does not define ::pointer as T*.

There is definitely some work relating to allocators and such.

> Also, it seems strange that there is no segment_type/segment_info
> class. The iterator itself has members that return the address and the
> size of a segment. I think that’s unexpected; the expected segment
> iterator would return a reference to a class (e.g. segment_type) that
> represents the segment,

I think we discussed that design at some point. Benedek, can you
remember why we went in the other direction?

> 3) stable_insert: I can’t see the usefulness of this operation. A
> batch_deque is a sequence; the user cares about the insertion point
> but does not care if additional elements are created somewhere? I fail
> to see a real use case.

A deque is a sequence, sure, but also a container with reference
stability. I'm curious: when you use Boost.Intrusive, where do you store
the elements?
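
What I have in mind is the usual pattern of keeping the elements in a
reference-stable sequence and linking some of them into intrusive views, e.g.
(plain std::deque here just to illustrate; a batch_deque would serve the same
role):

#include <deque>
#include <boost/intrusive/list.hpp>

struct node : boost::intrusive::list_base_hook<>
{
    int value;
};

int main()
{
    std::deque<node> storage;               // owns the elements, references are stable
    boost::intrusive::list<node> active;    // non-owning view of some of them

    storage.push_back(node());
    storage.back().value = 42;
    active.push_back(storage.back());       // safe: the deque never relocates elements

    active.clear();                         // unlink before the hooks are destroyed
}
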
> --------------------------
> devector
> --------------------------
>
> emplace_slow_path: The forced temporary creation should be avoided; it
> is overkill for containers of containers, devector<static_vector> or
> containers with sentinel nodes. IMHO supporting references to already
> included elements is a bad feature of std::vector (a corner case
> supported in single element insertion, ignored for ranges, etc.)
> Boost.Container chose not to support it and no one has demanded that
> feature:
> http://www.boost.org/doc/libs/1_65_1/doc/html/container/Cpp11_conformance.html#container.Cpp11_conformance.container_const_reference_parameters

Yeah, I remember heated discussions about this in the past. I'm not a fan of
that feature of the STL containers.

>
> Reallocation strategy: First, I considered this strategy unnatural, but
> now I am not sure. For a push_front, if no free front capacity is found,
> but there is free back capacity, then all data is shifted to insert the
> element. If interleaved front and back insertions are performed, I don’t
> know if this will lead to too many data movements. When memory is
> reallocated to make room for a front insertion, the free back capacity
> is unchanged and all new free capacity goes to front. This means that
> the last operation (front or back) determines where all the new capacity
> goes. I find it a bit strange that N == front_back_capacity() does not mean
> that a reallocation will happen if we push_back N times, as repeated
> data movement will happen if front_free_capacity() is available.
>
> Another strategy is to treat both ends independently: if front insertion
> has no free space, it could always reallocate and make room at front, so
> further push_fronts don’t provoke data movement. This would be similar
> to batch_deque, where memory is allocated if there is no free front
> capacity regardless of the free back capacity (no data movement), and a
> free segment is recycled from the back capacity to insert elements at front.
> Obviously, this approach consumes more memory. Yet another strategy
> would be to distribute any new free capacity (after reallocation) between
> front and back, independently of which operation (back or front) has
> provoked the allocation.

This is tricky. I think the best is to treat both ends independently.
If you start moving things around, you can break the O(1) amortized
guarantee by interleaving push_front/push_back.
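
To make that concrete, a toy scenario (assuming a shift policy that makes room
for exactly the element being inserted; the actual policy may differ):

   capacity 8, size 7:  [ x x x x x x x _ ]   front_free = 0, back_free = 1
   push_front        -> [ y x x x x x x x ]   all 7 elements shifted right (7 moves)
   pop_back          -> [ y x x x x x x _ ]   front_free = 0, back_free = 1 again
   push_front        -> another full shift of 7 elements

A run of k such operations then costs O(k*n) element moves instead of
amortized O(1) per operation, and under such a policy similar repeated
shifting can be provoked between reallocations by mixing push_front and
push_back.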

> 2) The data structure of batch_deque is slightly bigger than the
> traditional one, just asking if it’s intended or it was developed like
> this to reuse code. Traditional deque (from SGI) used four elements (map
> pointer, map size, iterator to begin, iterator to end). Your
> implementation defines the map as a devector, which stores size and
> capacity for the map, but this allows you to handle the excess capacity of
> the map, I’m not sure if it adds performance benefits.

Reuse & simplicity would be the answer. If the iterators are fat, I bet
you get the same size as a batch_deque.

Having a secondary ad hoc data structure with amortized O(1) double
ended growth doesn't exactly make the code easier to reason about.
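
Roughly, the comparison is between the following (field names are
illustrative, and my reading of the sources may be off):

// SGI-style deque: four members, map capacity tracked by hand
T**      map;        // array of segment pointers
size_t   map_size;
iterator start;      // fat iterator: position plus segment bookkeeping
iterator finish;

// batch_deque: the segment-pointer array is itself a devector, which carries
// its own size, capacity and front/back free capacity; a bit more state, but
// spare map slots at either end can be reused directly
devector<T*> map;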

kind regards

Thorsten

