On 4 November 2013 18:18, MM <finjulhich@gmail.com> wrote:
On 3 November 2013 22:55, Jeff Flinn <Jeffrey.Flinn@gmail.com> wrote:
On 11/1/2013 7:24 PM, Travis Gockel wrote:
On Fri, Nov 1, 2013 at 1:18 PM, Jeff Flinn <Jeffrey.Flinn@gmail.com
<mailto:Jeffrey.Flinn@gmail.com>> wrote:

    On 11/1/2013 1:28 PM, MM wrote:

        On 1 November 2013 12:37, Jeff Flinn <Jeffrey.Flinn@gmail.com
        <mailto:Jeffrey.Flinn@gmail.com>
        <mailto:Jeffrey.Flinn@gmail.__com

        <mailto:Jeffrey.Flinn@gmail.com>>> wrote:
             On 10/31/2013 1:14 PM, MM wrote:
                 On 31 October 2013 16:10, Klaim - Joël Lamotte
                 <mjklaim@gmail.com <mailto:mjklaim@gmail.com>
        <mailto:mjklaim@gmail.com <mailto:mjklaim@gmail.com>>
                 <mailto:mjklaim@gmail.com <mailto:mjklaim@gmail.com>
        <mailto:mjklaim@gmail.com <mailto:mjklaim@gmail.com>>>> wrote:

                      What would "valid"/"invalid" means here?
                      If it's "exist" or "don't exist", then maybe using
                      boost::optional<MyPod> would be enough.

                 That's right. How about sizeof(  boost::optional<MyPod>
        ) vs sizeof(
                 MyPOD ) ?
                 Also, what sort of iterator can iterate over just the valid
                 values? or
                 how to implement one?
                 The point here is that there's a small number of
        elements that are
                 invalid, their indices in the container are random,

             So why not keep the invalid items from getting into the
        container in
             the first place?

             Jeff

        because they are processed at a later stage, their indices in the
        container are relevant,


    I'd need a better description of just what defines and item to be
    valid/invalid. And just what this processed at a later stage means.

    If you've some functions:

       bool   valid(const MyPod& ) { return ...; }
       bool invalid(const MyPod& pod) { return !valid(pod); }

    , and a pre-existing container then you can get an adapted view
    using boost::range::adaptors::__filtered like:


    typedef std::vector<MyPod> MyPods;

    MyPods all(...);
    MyPods bad;

    boost::copy(all | filtered(invalid), std::back_inserter(bad));

    boost::foreach(all | filtered(valid), someFunction);

    It all depends on the context of where this is used. Another
    approach uses the boost icl (interval container library). One
    example is:

    http://www.boost.org/doc/libs/__1_55_0b1/libs/icl/doc/html/__boost_icl/projects.html#boost___icl.projects.large_bitset

    <http://www.boost.org/doc/libs/1_55_0b1/libs/icl/doc/html/boost_icl/projects.html#boost_icl.projects.large_bitset>

    I've used a similar approach to just store the "valid" bases in a
    dna sequences and keeping track of gaps with an interval container.
    The advantage is avoiding the filtering cost when the application is
    primarily interested in the valid items only.


I would recommend checking out Google sparsehash -- specifically,
sparse_hash_set<T>:
http://code.google.com/p/sparsehash/source/browse/trunk/src/sparsehash/sparse_hash_set

The biggest difference between that and something like an
std::vector<boost::optional<T>> is google::sparse_hash_set will re-order
values on you (the nature of hash tables). It also forces you to have a
"deleted key" which exists in the realm of possible T values. Based on
impression that I get from your original email, this fits with your idea
of an "invalid" T.

That said, it is a ridiculously low-overhead container (on the order of
~2 bits/entry vs 4 bytes/entry with a boost::optional).

That's 2x10^6 bits for the OP's case, where-as the ICL approach is 64bits per interval. In the OP's case that would be 64x100(max number of invalid items assuming all invalid separated by atleast 1 valid item) about 3 orders of magnitude less.


Jeff

Thanks for the various answers. I'm reading about ICL, and sparsehash to learn more.

basically, 

struct MyPOD { double d1; ..... double d8; }
and I currently have std::vector<MyPOD> with 1e6 elements.
100 of those would be == a particular instance of MyPOD  invalid  { +Inf, +Inf, ... -Inf,, 0. }. It's a known, predefined MyPOD constant

I need:
1. calculate the max element, and the min element, _while_ ignoring the 100 or so invalid entries (I don't know where they are), I can't however change the vector<MyPOD>. it's read only.

2. I need also to plot it (ie, plot all the elements that are not Invalid).
The index of each element in vector is transformed and returns a sort of coordinate to plot a point 

 So, I would have wrapped this vector into a class, and exposed similar interface to vector, but would have implemented an iterator that just skips the invallid entries:the iterator would skip the invalid entry until next valid one when incremented or decremented.
My class would also be indexable with [] with this access not with the same order as vector's [] (it would skip the invalid entries, so slower)

Perhaps the best choice is a boost range adapted view.

Reading the docs.... 

thanks very much

MM

I have wrapped my vector in a class, and used boost range adaptor filtered.

I've implemented operator[](std::size_t i) as a simple

begin is the begin iterator to a filtered view on  myvector | boost::adapators::filtered( ... ) with a predicate that ignores my invalid values.

std::advance(begin, i)
/// this advance on the view gives a performance equal to what a forward only iterator (not random access iterator) would give.

This appears to be too slow in my case, I need a better order of magnitude of such access,

I probably should have a solution where I store the indices of the invalid values, and I implement 
operator[](std::size_t i) with the help of those indices.

MM