Boost logo

Boost :

Subject: Re: [boost] Interest in a container which can hold multiple data types?
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2015-05-05 12:58:50


M.a. wrote:
>> On 05 May 2015, at 17:17, Boris Rasin <boris_at_[hidden]> wrote:
>>
>>> On 5/5/2015 1:12 PM, TONGARI J wrote:
>>> 2015-05-05 17:21 GMT+08:00 Boris Rasin <boris_at_[hidden]>:
>>>
>>>>> On 5/5/2015 9:31 AM, Thijs (M.A.) van den Berg wrote:
>>>>>
>>>>> tuple_vector<Point, Line, Rectangle, Circle> shapes;
>>>>>>> shapes.push_back(Point{1.0, 1.0} );
>>>>>> This is neat, but it' not quite the same thing. Unlike
>>>>>> std::vector<boost::any>, your tuple_vector does not organize objects into
>>>>>> strictly linear arrangement. Using your example, there is no way to draw
>>>>>> shapes in the order they were inserted into container.
>>>>>>
>>>>>> Good point! So it loses overall ordering, only preservers it at the
>>>>> type level. The reason is that objects gets stored in linear memory per
>>>>> type.
>>>> One could conceivably add another internal container to maintain overall
>>>> index. But than again, is such tuple_vector<Point, Line, Rectangle, Circle>
>>>> really better than std::vector<boost::variant<Point, Line, Rectangle,
>>>> Circle>>? It would use less memory per object when some stored types are
>>>> smaller than others, and in some circumstances these savings would exceed
>>>> overhead of multiple internal containers and an additional index container.
>>>> Any other advantages?
>>> If the data sequence shows some affinity, I guess an ordered
>>> tuple_vector-based sequence can provide some performance benefit in
>>> traversal (with some crafted for_each method).
>>>
>>> That is, a sequence of [AAABBB] may be traversed faster than [ABABAB] for
>>> such a container, and I believe in the later case such that the types are
>>> uniformly distributed, vector<variant> will perform better.
>> Why would a sequence of [AAABBB] be traversed faster for such a container compared to vector<variant>? Especially if it would need additional look up in internal index container on every iteration?
>>
> I don't think it makes sense to add an index to an tuple_vector, it's more like an unordered heterogenous container, it'll be good at iterating all "B"s though. Every variant will have pros and cons when implemented correctly.
>
> In general, the challenge imo is iterating heterogenous containers, a specialized for_each will probably be very handy.

Such container would be the most useful for people wanting to fight with
performance degradation caused by some non-perfect memory access
patterns, cache misses, etc.
There are at least two concepts:

1. A heterogenous container/collection grouping objects of the same type
in the same place in memory (homogeneous containers)
This is AFAIU something you're planning to implement.

Take a look at the poly_collection by Joaquín M López Muñoz
http://bannalia.blogspot.com/2014/05/fast-polymorphic-collections.html

2. A semantically homogeneous container breaking large objects into
smaller ones in order to separate the hot and cold data to improve caching
I think tuple_vector would be a good name for such container.

Some time ago I implemented a dirty version full of hacks only to do
some benchmarking:
https://github.com/awulkiew/test-tuple_vector

The goal was to implement a collection semantically the same as vector
of tuples but internally breaking the tuple into elements and storing
those elements in separate containers.
The tricky part is that the reference type of such collection is not a
true reference but a tuple of references, or a similar type which should
semantically work like a reference, with collapsing, etc.

AFAIK such container could also be used if some use case required to
store specific chunks of data in buffers, e.g. in GPU processing where
the GPU must be fed with one type of data.

Regards,
Adam


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk