Boost logo

Boost Users :

Subject: Re: [Boost-users] container with large number of elements and a small number of invalide elements
From: Vincent N. Virgilio (virgilio_at_[hidden])
Date: 2013-11-05 13:34:25


Yes, stable_partition (and probably partition) involve swaps, according to
SGI's doc.

Instead, you might apply one of the above partition functions to a parallel
container of indices, and dereference with the predicate?

Seems like this is steering back to an ersatz version of a boost adaptor.

Vince

On Tue, Nov 5, 2013 at 1:29 PM, MM <finjulhich_at_[hidden]> wrote:

> On 5 November 2013 18:09, Vincent N. Virgilio <virgilio_at_[hidden]> wrote:
>
>> On Tue, Nov 5, 2013 at 12:43 PM, MM <finjulhich_at_[hidden]> wrote:
>>
>>> On 4 November 2013 18:18, MM <finjulhich_at_[hidden]> wrote:
>>>
>>>> On 3 November 2013 22:55, Jeff Flinn <Jeffrey.Flinn_at_[hidden]> wrote:
>>>>
>>>>> On 11/1/2013 7:24 PM, Travis Gockel wrote:
>>>>>
>>>>>> On Fri, Nov 1, 2013 at 1:18 PM, Jeff Flinn <Jeffrey.Flinn_at_[hidden]
>>>>>> <mailto:Jeffrey.Flinn_at_[hidden]>> wrote:
>>>>>>
>>>>>> On 11/1/2013 1:28 PM, MM wrote:
>>>>>>
>>>>>> On 1 November 2013 12:37, Jeff Flinn <Jeffrey.Flinn_at_[hidden]
>>>>>> <mailto:Jeffrey.Flinn_at_[hidden]>
>>>>>> <mailto:Jeffrey.Flinn_at_gmail.__com
>>>>>>
>>>>>> <mailto:Jeffrey.Flinn_at_[hidden]>>> wrote:
>>>>>> On 10/31/2013 1:14 PM, MM wrote:
>>>>>> On 31 October 2013 16:10, Klaim - Joël Lamotte
>>>>>> <mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>
>>>>>> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>>
>>>>>> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>
>>>>>> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>>>>
>>>>>> wrote:
>>>>>>
>>>>>> What would "valid"/"invalid" means here?
>>>>>> If it's "exist" or "don't exist", then maybe
>>>>>> using
>>>>>> boost::optional<MyPod> would be enough.
>>>>>>
>>>>>> That's right. How about sizeof(
>>>>>> boost::optional<MyPod>
>>>>>> ) vs sizeof(
>>>>>> MyPOD ) ?
>>>>>> Also, what sort of iterator can iterate over just
>>>>>> the valid
>>>>>> values? or
>>>>>> how to implement one?
>>>>>> The point here is that there's a small number of
>>>>>> elements that are
>>>>>> invalid, their indices in the container are random,
>>>>>>
>>>>>> So why not keep the invalid items from getting into the
>>>>>> container in
>>>>>> the first place?
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>> because they are processed at a later stage, their indices in
>>>>>> the
>>>>>> container are relevant,
>>>>>>
>>>>>>
>>>>>> I'd need a better description of just what defines and item to be
>>>>>> valid/invalid. And just what this processed at a later stage
>>>>>> means.
>>>>>>
>>>>>> If you've some functions:
>>>>>>
>>>>>> bool valid(const MyPod& ) { return ...; }
>>>>>> bool invalid(const MyPod& pod) { return !valid(pod); }
>>>>>>
>>>>>> , and a pre-existing container then you can get an adapted view
>>>>>> using boost::range::adaptors::__filtered like:
>>>>>>
>>>>>>
>>>>>> typedef std::vector<MyPod> MyPods;
>>>>>>
>>>>>> MyPods all(...);
>>>>>> MyPods bad;
>>>>>>
>>>>>> boost::copy(all | filtered(invalid), std::back_inserter(bad));
>>>>>>
>>>>>> boost::foreach(all | filtered(valid), someFunction);
>>>>>>
>>>>>> It all depends on the context of where this is used. Another
>>>>>> approach uses the boost icl (interval container library). One
>>>>>> example is:
>>>>>>
>>>>>> http://www.boost.org/doc/libs/__1_55_0b1/libs/icl/doc/html/_
>>>>>> _boost_icl/projects.html#boost___icl.projects.large_bitset
>>>>>>
>>>>>> <http://www.boost.org/doc/libs/1_55_0b1/libs/icl/doc/
>>>>>> html/boost_icl/projects.html#boost_icl.projects.large_bitset>
>>>>>>
>>>>>> I've used a similar approach to just store the "valid" bases in a
>>>>>> dna sequences and keeping track of gaps with an interval
>>>>>> container.
>>>>>> The advantage is avoiding the filtering cost when the application
>>>>>> is
>>>>>> primarily interested in the valid items only.
>>>>>>
>>>>>>
>>>>>> I would recommend checking out Google sparsehash -- specifically,
>>>>>> sparse_hash_set<T>:
>>>>>> http://code.google.com/p/sparsehash/source/browse/
>>>>>> trunk/src/sparsehash/sparse_hash_set
>>>>>>
>>>>>> The biggest difference between that and something like an
>>>>>> std::vector<boost::optional<T>> is google::sparse_hash_set will
>>>>>> re-order
>>>>>> values on you (the nature of hash tables). It also forces you to have
>>>>>> a
>>>>>> "deleted key" which exists in the realm of possible T values. Based on
>>>>>> impression that I get from your original email, this fits with your
>>>>>> idea
>>>>>> of an "invalid" T.
>>>>>>
>>>>>> That said, it is a ridiculously low-overhead container (on the order
>>>>>> of
>>>>>> ~2 bits/entry vs 4 bytes/entry with a boost::optional).
>>>>>>
>>>>>
>>>>> That's 2x10^6 bits for the OP's case, where-as the ICL approach is
>>>>> 64bits per interval. In the OP's case that would be 64x100(max number of
>>>>> invalid items assuming all invalid separated by atleast 1 valid item) about
>>>>> 3 orders of magnitude less.
>>>>>
>>>>>
>>>>> Jeff
>>>>>
>>>>
>>>> Thanks for the various answers. I'm reading about ICL, and sparsehash
>>>> to learn more.
>>>>
>>>> basically,
>>>>
>>>> struct MyPOD { double d1; ..... double d8; }
>>>> and I currently have std::vector<MyPOD> with 1e6 elements.
>>>> 100 of those would be == a particular instance of MyPOD invalid {
>>>> +Inf, +Inf, ... -Inf,, 0. }. It's a known, predefined MyPOD constant
>>>>
>>>> I need:
>>>> 1. calculate the max element, and the min element, _while_ ignoring the
>>>> 100 or so invalid entries (I don't know where they are), I can't however
>>>> change the vector<MyPOD>. it's read only.
>>>>
>>>> 2. I need also to plot it (ie, plot all the elements that are not
>>>> Invalid).
>>>> The index of each element in vector is transformed and returns a sort
>>>> of coordinate to plot a point
>>>>
>>>> So, I would have wrapped this vector into a class, and exposed similar
>>>> interface to vector, but would have implemented an iterator that just skips
>>>> the invallid entries:the iterator would skip the invalid entry until next
>>>> valid one when incremented or decremented.
>>>> My class would also be indexable with [] with this access not with the
>>>> same order as vector's [] (it would skip the invalid entries, so slower)
>>>>
>>>> Perhaps the best choice is a boost range adapted view.
>>>>
>>>> Reading the docs....
>>>>
>>>> thanks very much
>>>>
>>>> MM
>>>>
>>>
>>> I have wrapped my vector in a class, and used boost range adaptor
>>> filtered.
>>>
>>> I've implemented operator[](std::size_t i) as a simple
>>>
>>> begin is the begin iterator to a filtered view on myvector |
>>> boost::adapators::filtered( ... ) with a predicate that ignores my invalid
>>> values.
>>>
>>> std::advance(begin, i)
>>> /// this advance on the view gives a performance equal to what a forward
>>> only iterator (not random access iterator) would give.
>>>
>>> This appears to be too slow in my case, I need a better order of
>>> magnitude of such access,
>>>
>>> I probably should have a solution where I store the indices of the
>>> invalid values, and I implement
>>> operator[](std::size_t i) with the help of those indices.
>>>
>>> MM
>>>
>>> Is std::stable_partition too slow? (Sorry if answered earlier in the
>>> thread.)
>>>
>>> http://www.sgi.com/tech/stl/stable_partition.html
>>>
>>> Vince
>>>
>>> I am unable to copy or modify my original vector, I can only browse over
> it, iterate over it in different ways.
> stable_partition would move around the elements if I'm not mistaken
>
> MM
>
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>



Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net