Boost logo

Boost Users :

Subject: Re: [Boost-users] container with large number of elements and a small number of invalide elements
From: Jeff Flinn (Jeffrey.Flinn_at_[hidden])
Date: 2013-11-03 17:55:27


On 11/1/2013 7:24 PM, Travis Gockel wrote:
> On Fri, Nov 1, 2013 at 1:18 PM, Jeff Flinn <Jeffrey.Flinn_at_[hidden]
> <mailto:Jeffrey.Flinn_at_[hidden]>> wrote:
>
> On 11/1/2013 1:28 PM, MM wrote:
>
> On 1 November 2013 12:37, Jeff Flinn <Jeffrey.Flinn_at_[hidden]
> <mailto:Jeffrey.Flinn_at_[hidden]>
> <mailto:Jeffrey.Flinn_at_gmail.__com
> <mailto:Jeffrey.Flinn_at_[hidden]>>> wrote:
> On 10/31/2013 1:14 PM, MM wrote:
> On 31 October 2013 16:10, Klaim - Joël Lamotte
> <mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>
> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>>
> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>
> <mailto:mjklaim_at_[hidden] <mailto:mjklaim_at_[hidden]>>>> wrote:
>
> What would "valid"/"invalid" means here?
> If it's "exist" or "don't exist", then maybe using
> boost::optional<MyPod> would be enough.
>
> That's right. How about sizeof( boost::optional<MyPod>
> ) vs sizeof(
> MyPOD ) ?
> Also, what sort of iterator can iterate over just the valid
> values? or
> how to implement one?
> The point here is that there's a small number of
> elements that are
> invalid, their indices in the container are random,
>
> So why not keep the invalid items from getting into the
> container in
> the first place?
>
> Jeff
>
> because they are processed at a later stage, their indices in the
> container are relevant,
>
>
> I'd need a better description of just what defines and item to be
> valid/invalid. And just what this processed at a later stage means.
>
> If you've some functions:
>
> bool valid(const MyPod& ) { return ...; }
> bool invalid(const MyPod& pod) { return !valid(pod); }
>
> , and a pre-existing container then you can get an adapted view
> using boost::range::adaptors::__filtered like:
>
> typedef std::vector<MyPod> MyPods;
>
> MyPods all(...);
> MyPods bad;
>
> boost::copy(all | filtered(invalid), std::back_inserter(bad));
>
> boost::foreach(all | filtered(valid), someFunction);
>
> It all depends on the context of where this is used. Another
> approach uses the boost icl (interval container library). One
> example is:
>
> http://www.boost.org/doc/libs/__1_55_0b1/libs/icl/doc/html/__boost_icl/projects.html#boost___icl.projects.large_bitset
> <http://www.boost.org/doc/libs/1_55_0b1/libs/icl/doc/html/boost_icl/projects.html#boost_icl.projects.large_bitset>
>
> I've used a similar approach to just store the "valid" bases in a
> dna sequences and keeping track of gaps with an interval container.
> The advantage is avoiding the filtering cost when the application is
> primarily interested in the valid items only.
>
>
> I would recommend checking out Google sparsehash -- specifically,
> sparse_hash_set<T>:
> http://code.google.com/p/sparsehash/source/browse/trunk/src/sparsehash/sparse_hash_set
>
> The biggest difference between that and something like an
> std::vector<boost::optional<T>> is google::sparse_hash_set will re-order
> values on you (the nature of hash tables). It also forces you to have a
> "deleted key" which exists in the realm of possible T values. Based on
> impression that I get from your original email, this fits with your idea
> of an "invalid" T.
>
> That said, it is a ridiculously low-overhead container (on the order of
> ~2 bits/entry vs 4 bytes/entry with a boost::optional).

That's 2x10^6 bits for the OP's case, where-as the ICL approach is
64bits per interval. In the OP's case that would be 64x100(max number of
invalid items assuming all invalid separated by atleast 1 valid item)
about 3 orders of magnitude less.

Jeff


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net