Boost logo

Boost Users :

Subject: Re: [Boost-users] Merging sorted iterators?
From: ajb_at_[hidden]
Date: 2008-09-17 01:11:12


G'day Roland.

Quoting "Dr. Roland Bock" <rbock_at_[hidden]>:

> I have a bunch of iterators over some sorted input from several big
> files which do not fit into memory.
>
> What I would like to have is one iterator which allows me to iterate
> over the combined sorted input (of course, the result should be sorted,
> too).

I wrote the following iterator some time ago. It's for ints, it
only handles two input iterators, and doesn't handle end conditions
(this wasn't an issue in the application), but it illustrates the idea:

template<typename It1, typename It2>
class merge_iterator
     : public boost::iterator_facade<
              merge_iterator<It1, It2>
            , const int
            , incrementable_traversal_tag
>
{
public:
     merge_iterator(It1 p_it1, It2 p_it2)
        : m_it1(p_it1), m_it2(p_it2)
     {
     }

private:
     friend class iterator_core_access;

     void increment()
     {
        int x1 = *m_it1;
        int x2 = *m_it2;
        if (x1 <= x2) { ++m_it1; }
        if (x1 >= x2) { ++m_it2; }
     }

     const int& dereference() const
     {
        return std::min(*m_it1, *m_it2);
     }

     It1 m_it1;
     It2 m_it2;
};

As you can see, these things are actually pretty simple to write
in Boost. Indeed, it's hard to see how this could be made any
simpler and still retain C++ syntax.

For your problem, we'd have to assume that all of the input iterators
have the same time, and store them in a priority_queue ordered on
initial element. This is the sort of thing I'd be looking to do:

// WARNING: Untested code follows. This may not compile, let
// alone work. There are probably const-correctness issues, for
// example. Usual disclaimers about using at own risk apply as
// if incorporated herein.

template<typename It>
struct iterator_pair
{
     It m_cur;
     It m_end;

     bool at_end()
     {
         return m_cur == m_end;
     }

     iterator_pair<It>& increment()
     {
         ++m_cur;
         return *this;
     }

     const It& current()
     {
         return m_cur;
     }
};

template<typename It>
struct iterator_pair_comparator
{
     bool operator()(const It& p_i1, const It& p_i2)
     {
         return std::less_than(*p_i1.current(), *p_i2.current());
     }
};

template<typename T, typename It>
class merge_iterator
     : public boost::iterator_facade<
              merge_iterator<T, It>
            , const T
            , incrementable_traversal_tag
>
{
public:
     // Constructor and/or stuff to add iterators NYI. Also need
     // to implement some way to signal end conditions.

private:
     friend class iterator_core_access;

     void increment()
     {
         It top = m_iterators.top();
         if (!top.increment().at_end())
         {
             m_iterators.push(top);
         }
     }

     const T& dereference() const
     {
        return *m_iterators.top().current();
     }

     std::priority_queue<iterator_pair<It>,
         std::deque<iterator_pair<It> >,
         iterator_pair_comparator<It> >
             m_iterators;
};

If you come up with a nice variation on this, please add it to the
vault.

Cheers,
Andrew Bromage


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net