Boost logo

Boost Users :

From: Marco Correia (mvc_at_[hidden])
Date: 2006-02-05 11:54:13


Hi Paul,

Your algorithm perfoms much better than mine, and I think it would never occur
to me.

Thanks a lot!
Marco

On Saturday 04 February 2006 12:57, Paul Mensonides wrote:
> > -----Original Message-----
> > From: Marco Correia [mailto:mvc_at_[hidden]]
> >
> > Thanks!
> >
> > Today I managed to get a little bit further, let me tell you where I
> > am.
> >
> > First I aliased every symbol of the alphabet as an integer.
> >
> > #define A 1
> > #define B 2
> > ...
>
> This is a good start, but the wrong way to do this part of it. Instead,
> you should do something like:
>
> #define CONVERT(sym) BOOST_PP_CAT(CONVERT_, sym) #define CONVERT_A 1
> #define CONVERT_B 2 ...
>
> > #define NOT_EQ(s,E1,E2) \
> > BOOST_PP_NOT_EQUAL(E1,E2)
> >
> > #define REMOVE_ALL(Elem,Seq) \
> > BOOST_PP_SEQ_FILTER(NOT_EQ, Elem, Seq)
> >
> > #define REMOVE_ELEM(s,Seq,Elem) REMOVE_ALL(Elem,Seq)(Elem)
> >
> > #define REMOVE_DUPLICATES(Seq) \
> > BOOST_PP_SEQ_FOLD_LEFT(REMOVE_ELEM, Seq, Seq)
> >
> > This starts with the full sequence, and for each element, removes all
> > occurences and add it in the end.
>
> I like that you're thinking about this algorithmically. This is an
> interesting solution. The downside is that it is redundantly adding and
> later removing elements of the result sequence when the input sequence
> contains the same element again.
>
> > Do you see any other way, more efficient than this?
>
> This is how I did it:
>
> #include <boost/preprocessor/cat.hpp>
> #include <boost/preprocessor/comparison/not_equal.hpp>
> #include <boost/preprocessor/control/while.hpp>
> #include <boost/preprocessor/seq/filter.hpp>
> #include <boost/preprocessor/seq/seq.hpp> #include
> <boost/preprocessor/tuple/elem.hpp>
>
> #define CONVERT(sym) BOOST_PP_CAT(CONVERT_, sym)
>
> #define CONVERT_0 0
> #define CONVERT_A 1
> #define CONVERT_B 2
> #define CONVERT_C 3
> #define CONVERT_D 4
> #define CONVERT_E 5
> #define CONVERT_F 6
> #define CONVERT_G 7
> #define CONVERT_H 8
> #define CONVERT_I 9
> #define CONVERT_J 10
> // ...
>
> #define ID(x) (x)
>
> #define PRED_1(d, tuple) \
> CONVERT( \
> BOOST_PP_SEQ_HEAD(BOOST_PP_TUPLE_ELEM(2, 1, tuple)) \
> ) \
> /**/
> #define PRED_2(s, x, y) \
> BOOST_PP_NOT_EQUAL(CONVERT(x), CONVERT(y)) \
> /**/
>
> #define OP_1(d, tuple) OP_2 tuple
> #define OP_2(result, seq) \
> ( \
> result(BOOST_PP_SEQ_HEAD(seq)), \
> BOOST_PP_SEQ_FILTER( \
> PRED_2, BOOST_PP_SEQ_HEAD(seq), \
> BOOST_PP_SEQ_TAIL(seq) \
> ) \
> ) \
> /**/
>
> #define REMOVE_DUPLICATES(seq) \
> BOOST_PP_TUPLE_ELEM( \
> 2, 0, \
> BOOST_PP_WHILE( \
> PRED_1, OP_1, \
> (ID, seq(0)) \
> ) \
> ) \
> /**/
>
> REMOVE_DUPLICATES( (A)(B)(A)(C)(B)(B)(A)(C) ) // (A)(B)(C)
>
> Instead of using FOLD_LEFT, this one works by using WHILE to iterate over
> the elements of the input sequence. This allows you to *change* the
> sequence that you're iterating over. At the beginning of the algorithm, a
> (0) is added as a rogue element to the end of the sequence (since you can't
> have nil sequences). An extra CONVERT_ macro is defined that converts 0 to
> 0 (i.e. a no-op).
>
> The predicate passed to WHILE tests whether the head of the to-be-processed
> sequence is non-zero after being CONVERT'ed.
>
> The operation passed to WHILE appends the first element of the sequence to
> the result, and removes all occurrences of that element in the
> to-be-processed sequence.
>
> Note that we "initialize" the result sequence to ID, but append to the tail
> of the result (and thus ID removes itself when the first element gets added
> to the result).
>
> With each application of the operation, the tuple changes:
>
> ( ID , (A)(B)(A)(C)(B)(B)(A)(C)(0) ) // start
> ( (A) , (B)(C)(B)(B)(C)(0) )
> ( (A)(B) , (C)(C)(0) )
> ( (A)(B)(C) , (0) ) // end
>
> At the end of the loop, we simply extract the first element of tuple. From
> there, if you want to concatenate all the elements of the result together,
> just use SEQ_CAT.
>
> The worst-case scenario for any given length of input sequence occurs when
> there are no duplicates. E.g.
>
> REMOVE_DUPLICATES( (A)(B)(C)(D)(E)(F)(G)(H)(I)(J) )
>
> Even though this version is more complex, it is consistently faster because
> it doesn't do any redundant work.
>
> Regards,
> Paul Mensonides
>
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> http://lists.boost.org/mailman/listinfo.cgi/boost-users

-- 
Marco Correia <mvc_at_[hidden]>

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net