Boost logo

Boost :

From: Baptiste Lepilleur (blepctc_at_[hidden])
Date: 2002-09-18 15:12:25


----- Original Message -----
From: "Craig Henderson" <cdm.henderson_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Tuesday, September 17, 2002 1:37 PM
Subject: [boost] Re: Longest Common Subsequence

>
> "Gennadiy Rozental" <gennadiy_at_[hidden]> wrote in message
> news:am6olc$4lc$1_at_main.gmane.org...
> [snip]
>
> A brief discussion over the class or function interface has already taken
> place on the list. I concluded that the class interface was preferable and
> Baptiste Lepilleur provided a nice justification for it. "it keeps the
> algorithm interface as simple as possible and mesh very well with existing
> STL algorithm. It makes it less clumsy to add functions to extend the
> interface, and allows
> for automatic type deduction. It also keep the algorithm apart from sugar
> coat functions like apply_diff ".

Actually, I was refering to the templatized function interface !!!

I've done an attempt at converting the algorithm to a template function (and
some refactoring to hopefully improve readability). You can get the result
at:
http://gaiacrtn.free.fr/temp/. It should convince you of how much simple and
flexible it makes thing. Feel free to use it as you see fit.

The inteface look like this:

 template<typename size_type, typename InIt1, typename InIt2, typename
OutIt>
 size_type
 compute_lcs( InIt1 first_begin,
              InIt1 first_end,
              InIt2 second_begin,
              InIt2 second_end,
              OutIt result );

size_type need to be explicitly specified. It is the storage type used by
the intermediate array. To go further, I would replace size_type with a type
used as a 2d-array abstraction. This would solve memory allocation and array
cell size issues.

Input iterators must be random access iterators, as performance penality
would be to great otherwise (with the current implementation). Looking back
at the code, it seems that it should be possible to implement the main loop
using only pre-decrementation on the iterator, lowering the constraint to
bidirectional iterator.

I also implemented a 'sugar-coat' function as an example:

 template<typename InIt, typename ChangeIt, typename OutIt>
 void extract_lcs( InIt input_sequence_first,
                   ChangeIt change_first,
                   ChangeIt change_last,
                   OutIt lcs );

It outputs the lcs given one of the input sequence and the list of computed
changes.

A test application is provided which show how to reconstruct the original
input and pinpoint the difference without storing anything other than a
sequence of change_type.

Though, while the interface is generic, the algorithm itself is still a long
way from being all purpose. Memory usage grow very quickly: computing lcs on
two 256 characters string requires 64ko (assuming that can use a byte as
size_type, I haven't looked at the algorithm closely enough to see if an
array cell value can be greater than the length of an input sequence).

Now, what's really need to be done to make this algorithm general purpose is
to implement the array slicing algorithm you talked about. ;-)

Regards,
Baptiste.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk