|
Boost Users : |
Subject: Re: [Boost-users] container with large number of elements and a small number of invalide elements
From: Vincent N. Virgilio (virgilio_at_[hidden])
Date: 2013-11-05 13:09:03
Is std::stable_partition too slow? (Sorry if answered earlier in the
thread.)
http://www.sgi.com/tech/stl/stable_partition.html
Vince
On Tue, Nov 5, 2013 at 12:43 PM, MM <finjulhich_at_[hidden]> wrote:
> On 4 November 2013 18:18, MM <finjulhich_at_[hidden]> wrote:
>
>> On 3 November 2013 22:55, Jeff Flinn <Jeffrey.Flinn_at_[hidden]> wrote:
>>
>>> On 11/1/2013 7:24 PM, Travis Gockel wrote:
>>>
>>>> On Fri, Nov 1, 2013 at 1:18 PM, Jeff Flinn <Jeffrey.Flinn_at_[hidden]
>>>> <mailto:Jeffrey.Flinn_at_[hidden]>> wrote:
>>>>
>>>> On 11/1/2013 1:28 PM, MM wrote:
>>>>
>>>> On 1 November 2013 12:37, Jeff Flinn <Jeffrey.Flinn_at_[hidden]
>>>> <mailto:Jeffrey.Flinn_at_[hidden]>
>>>> <mailto:Jeffrey.Flinn_at_gmail.__com
>>>>
>>>> <mailto:Jeffrey.Flinn_at_[hidden]>>> wrote:
>>>> On 10/31/2013 1:14 PM, MM wrote:
>>>> On 31 October 2013 16:10, Klaim - Joël Lamotte
>>>> <mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>
>>>> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>>
>>>> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>
>>>> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>>>> wrote:
>>>>
>>>> What would "valid"/"invalid" means here?
>>>> If it's "exist" or "don't exist", then maybe using
>>>> boost::optional<MyPod> would be enough.
>>>>
>>>> That's right. How about sizeof( boost::optional<MyPod>
>>>> ) vs sizeof(
>>>> MyPOD ) ?
>>>> Also, what sort of iterator can iterate over just the
>>>> valid
>>>> values? or
>>>> how to implement one?
>>>> The point here is that there's a small number of
>>>> elements that are
>>>> invalid, their indices in the container are random,
>>>>
>>>> So why not keep the invalid items from getting into the
>>>> container in
>>>> the first place?
>>>>
>>>> Jeff
>>>>
>>>> because they are processed at a later stage, their indices in
>>>> the
>>>> container are relevant,
>>>>
>>>>
>>>> I'd need a better description of just what defines and item to be
>>>> valid/invalid. And just what this processed at a later stage means.
>>>>
>>>> If you've some functions:
>>>>
>>>> bool valid(const MyPod& ) { return ...; }
>>>> bool invalid(const MyPod& pod) { return !valid(pod); }
>>>>
>>>> , and a pre-existing container then you can get an adapted view
>>>> using boost::range::adaptors::__filtered like:
>>>>
>>>>
>>>> typedef std::vector<MyPod> MyPods;
>>>>
>>>> MyPods all(...);
>>>> MyPods bad;
>>>>
>>>> boost::copy(all | filtered(invalid), std::back_inserter(bad));
>>>>
>>>> boost::foreach(all | filtered(valid), someFunction);
>>>>
>>>> It all depends on the context of where this is used. Another
>>>> approach uses the boost icl (interval container library). One
>>>> example is:
>>>>
>>>> http://www.boost.org/doc/libs/__1_55_0b1/libs/icl/doc/html/_
>>>> _boost_icl/projects.html#boost___icl.projects.large_bitset
>>>>
>>>> <http://www.boost.org/doc/libs/1_55_0b1/libs/icl/doc/
>>>> html/boost_icl/projects.html#boost_icl.projects.large_bitset>
>>>>
>>>> I've used a similar approach to just store the "valid" bases in a
>>>> dna sequences and keeping track of gaps with an interval container.
>>>> The advantage is avoiding the filtering cost when the application is
>>>> primarily interested in the valid items only.
>>>>
>>>>
>>>> I would recommend checking out Google sparsehash -- specifically,
>>>> sparse_hash_set<T>:
>>>> http://code.google.com/p/sparsehash/source/browse/
>>>> trunk/src/sparsehash/sparse_hash_set
>>>>
>>>> The biggest difference between that and something like an
>>>> std::vector<boost::optional<T>> is google::sparse_hash_set will
>>>> re-order
>>>> values on you (the nature of hash tables). It also forces you to have a
>>>> "deleted key" which exists in the realm of possible T values. Based on
>>>> impression that I get from your original email, this fits with your idea
>>>> of an "invalid" T.
>>>>
>>>> That said, it is a ridiculously low-overhead container (on the order of
>>>> ~2 bits/entry vs 4 bytes/entry with a boost::optional).
>>>>
>>>
>>> That's 2x10^6 bits for the OP's case, where-as the ICL approach is
>>> 64bits per interval. In the OP's case that would be 64x100(max number of
>>> invalid items assuming all invalid separated by atleast 1 valid item) about
>>> 3 orders of magnitude less.
>>>
>>>
>>> Jeff
>>>
>>
>> Thanks for the various answers. I'm reading about ICL, and sparsehash to
>> learn more.
>>
>> basically,
>>
>> struct MyPOD { double d1; ..... double d8; }
>> and I currently have std::vector<MyPOD> with 1e6 elements.
>> 100 of those would be == a particular instance of MyPOD invalid { +Inf,
>> +Inf, ... -Inf,, 0. }. It's a known, predefined MyPOD constant
>>
>> I need:
>> 1. calculate the max element, and the min element, _while_ ignoring the
>> 100 or so invalid entries (I don't know where they are), I can't however
>> change the vector<MyPOD>. it's read only.
>>
>> 2. I need also to plot it (ie, plot all the elements that are not
>> Invalid).
>> The index of each element in vector is transformed and returns a sort of
>> coordinate to plot a point
>>
>> So, I would have wrapped this vector into a class, and exposed similar
>> interface to vector, but would have implemented an iterator that just skips
>> the invallid entries:the iterator would skip the invalid entry until next
>> valid one when incremented or decremented.
>> My class would also be indexable with [] with this access not with the
>> same order as vector's [] (it would skip the invalid entries, so slower)
>>
>> Perhaps the best choice is a boost range adapted view.
>>
>> Reading the docs....
>>
>> thanks very much
>>
>> MM
>>
>
> I have wrapped my vector in a class, and used boost range adaptor filtered.
>
> I've implemented operator[](std::size_t i) as a simple
>
> begin is the begin iterator to a filtered view on myvector |
> boost::adapators::filtered( ... ) with a predicate that ignores my invalid
> values.
>
> std::advance(begin, i)
> /// this advance on the view gives a performance equal to what a forward
> only iterator (not random access iterator) would give.
>
> This appears to be too slow in my case, I need a better order of magnitude
> of such access,
>
> I probably should have a solution where I store the indices of the invalid
> values, and I implement
> operator[](std::size_t i) with the help of those indices.
>
> MM
>
>
>
>
>
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users
>
Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net