Boost logo

Boost :

From: Craig Henderson (cdm.henderson_at_[hidden])
Date: 2002-09-17 06:37:13


"Gennadiy Rozental" <gennadiy_at_[hidden]> wrote in message
news:am6olc$4lc$1_at_main.gmane.org...
[snip]

> Note also that there are in fact 2 different algorithms here not one.
But
> useful in specific contexts. One is to compute only the length of LCS.
> Second is to produce the LSC itself. it appears that first is much more
easy
> and particularly could be implemented with use of extra memory
proportional
> to maximum of size(A) and size(B), while second will require memory
> proportional to size(A)*size(B) (in fact there is linear memory algorithm
> for the second problem either but it more difficult and not discussed
here).

Yes, this is covered in the documentation and comments in the code. See the
"Still to do" section of the docs; "Implement a new llcs() method with a
faster algorithm for calculating the length _only_". Using the compute_lcs()
method is expensive if all that is required is the length, hence the
compute_llcs() method.

> Note that we never mentioned in formal description any notion of "find
> diff" or "apply diff". That mean that those are independent algorithms
that
> could use generic LCS algorithms in their implementation.
> Our first attempt to formalize the solution would look like these two
> independent algorithms
>
> template<typename Iterator1, typename Iterator2>
> size_t
> longest_common_subsequence_length( Iterator1 begin1, Iterator1 end1,
> Iterator2
> begin2, Iterator2 end2 );
>
> template<typename Iterator1, typename Iterator2, typename ResultIterator>
> void
> longest_common_subsequence( Iterator1 begin1, Iterator1 end1,
> Iterator2 begin2,
Iterator2
> end2,
> ResultIterator res );
>

A brief discussion over the class or function interface has already taken
place on the list. I concluded that the class interface was preferable and
Baptiste Lepilleur provided a nice justification for it. "it keeps the
algorithm interface as simple as possible and mesh very well with existing
STL algorithm. It makes it less clumsy to add functions to extend the
interface, and allows
for automatic type deduction. It also keep the algorithm apart from sugar
coat functions like apply_diff ".

> First thing that we may find is missing is Allocator template parameter
> since both algorithms need to allocate the memory to keep helper
> information. Alternative would be to require the user to provide this
memory
> somehow.

I'd prefer the Allocator approach in line with STL algos.

> Let now look on the requirements on the iterator types. to answer on this
> question one will need to look on the algorithms description in David
> Eppstein work. It appeared that there two ways to implement the algorithm:
> top-bottom and bottom-up. Second is more efficient. On the hand first
> algorithm probably require only Forward iterators, while second require at
> least Bidirectional iterators (or even Random Access, like with your
> implementation). To be as generic as possible we will need to implement
both
> versions and dispatch to specific implementation depends on iterators
> categories.

I need to think about the requirements on the iterators a bit more before I
can comment on this. I appreciate it is an issue that must be addressed.

> The most difficult issue with proposed solution interface is output
> result specification. The problem is that there maybe very different thing
> user would expect as an algorithm output: values of first sequence that
> belongs to LCS, values of second sequence that belongs to LCS, references
to
> them both and so on. The solution is to leave the decision how to
> interpret/populate output to the user. The user will be required to
provide
> the binary predicate P that should accept to iterators as an argument. now
> for every pair of iterators it1, it2 which algorithm found are belong to
the
> result LCS it will call P( it1, it2 ). That should be generic enough to
> cover most reasonable cases

The output of the LCS algorithm was difficult. I opted for the mask to
determine was is required and a container to receive the requested results
as a good solution, but admit it is not necessarily the best.

The use of a user-defined predicate to process the result set on a callback
is an interesting approach. However, I think your suggestion of a binary
predicate that is passed elements contained in both original sequences (ie.
elements of the subsequence) is too limited. The user would have to do a lot
of within the predicate it track the changes if they were interested in the
elements removed from the first or added to the second, in repect of the
other. The 'change list' approach that I adopted works well in all three
situations.

[snip]

> Now as an example application you could implement usage of LCS algorithm
for
> "diff" problems.
I already do in the test problem.

>
> Finally couple comments to your code, while I am at it:
> 1. 6 STL includes as well as TRACE staff on top I presume temporary since
> none of them is required for generic algorithm implementation.
These are mostly for debugging purposes. The only one that is required in
production code is deque.

> 2. free_heap_ptr: why not checked_delete?
Historical reasons, I guess. I can change this without any worries. However,
this will become redundant with the introduction of an Allocator that you
suggested ealier :-)

> 3. const lcs_size_type *pvalue. Boost recommend to use C++ notation. "*"
is
> part of the type not a value property.
Sorry, I don't understand this comment. What is it that you/boost objects
to?

> 4. FWIW Bottom-Up algorithm implementation should not require random
access
> iterators.
See Above

> 5. sequence_traits defined but never used.
This is defined for the user. See the test file lcs.cpp, the sequence_traits
struct is used at line 240 to enable the LCS algo to manipulate a raw C++
char[] array.

Thanks for you feedback.

Regards
-- Craig


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk