Boost logo

Boost Users :

Subject: Re: [Boost-users] container with large number of elements and a small number of invalide elements
From: Travis Gockel (travis_at_[hidden])
Date: 2013-11-01 19:24:32


On Fri, Nov 1, 2013 at 1:18 PM, Jeff Flinn <Jeffrey.Flinn_at_[hidden]> wrote:

> On 11/1/2013 1:28 PM, MM wrote:
>
>> On 1 November 2013 12:37, Jeff Flinn <Jeffrey.Flinn_at_[hidden]
>> <mailto:Jeffrey.Flinn_at_gmail.**com <Jeffrey.Flinn_at_[hidden]>>> wrote:
>> On 10/31/2013 1:14 PM, MM wrote:
>> On 31 October 2013 16:10, Klaim - Joël Lamotte
>> <mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>
>> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>>> wrote:
>>
>> What would "valid"/"invalid" means here?
>> If it's "exist" or "don't exist", then maybe using
>> boost::optional<MyPod> would be enough.
>>
>> That's right. How about sizeof( boost::optional<MyPod> ) vs
>> sizeof(
>> MyPOD ) ?
>> Also, what sort of iterator can iterate over just the valid
>> values? or
>> how to implement one?
>> The point here is that there's a small number of elements that are
>> invalid, their indices in the container are random,
>>
>> So why not keep the invalid items from getting into the container in
>> the first place?
>>
>> Jeff
>>
>> because they are processed at a later stage, their indices in the
>> container are relevant,
>>
>
> I'd need a better description of just what defines and item to be
> valid/invalid. And just what this processed at a later stage means.
>
> If you've some functions:
>
> bool valid(const MyPod& ) { return ...; }
> bool invalid(const MyPod& pod) { return !valid(pod); }
>
> , and a pre-existing container then you can get an adapted view using
> boost::range::adaptors::**filtered like:
>
> typedef std::vector<MyPod> MyPods;
>
> MyPods all(...);
> MyPods bad;
>
> boost::copy(all | filtered(invalid), std::back_inserter(bad));
>
> boost::foreach(all | filtered(valid), someFunction);
>
> It all depends on the context of where this is used. Another approach uses
> the boost icl (interval container library). One example is:
>
> http://www.boost.org/doc/libs/**1_55_0b1/libs/icl/doc/html/**
> boost_icl/projects.html#boost_**icl.projects.large_bitset<http://www.boost.org/doc/libs/1_55_0b1/libs/icl/doc/html/boost_icl/projects.html#boost_icl.projects.large_bitset>
>
> I've used a similar approach to just store the "valid" bases in a dna
> sequences and keeping track of gaps with an interval container. The
> advantage is avoiding the filtering cost when the application is primarily
> interested in the valid items only.
>
>
> Jeff
>
>
>
>
>
> ______________________________**_________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/**mailman/listinfo.cgi/boost-**users>
>

I would recommend checking out Google sparsehash -- specifically,
sparse_hash_set<T>:
http://code.google.com/p/sparsehash/source/browse/trunk/src/sparsehash/sparse_hash_set

The biggest difference between that and something like an
std::vector<boost::optional<T>> is google::sparse_hash_set will re-order
values on you (the nature of hash tables). It also forces you to have a
"deleted key" which exists in the realm of possible T values. Based on
impression that I get from your original email, this fits with your idea of
an "invalid" T.

That said, it is a ridiculously low-overhead container (on the order of ~2
bits/entry vs 4 bytes/entry with a boost::optional).

-- 
Travis Gockel
Chief λ Combinator


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net