|
Boost : |
From: Gennadiy Rozental (gennadiy_at_[hidden])
Date: 2002-09-17 03:21:16
Hi, Craig
Without sinking into the details of your code, I wanted to make one
general comment: independently of whether your design/implementation is
good, it has very little to do with generic Longest Common Subsequence (LCS
in future) algorithm.
Let me for those who don't know give a little overview of what are we
talking about here. LCS algorithms solving following problem: For a given
sequences A and B find longest common space subsequence.
For example if :
A = { 3, 7, 1, 2, 5, 8, 9 }
B = { 2, 1, 3, 2, 7, 9 }
LCS(A,B) = {1,2,9}
This algorithm has application in several practical domains, one of the most
important is implementing of "diff" utility.
Note also that there are in fact 2 different algorithms here not one. But
useful in specific contexts. One is to compute only the length of LCS.
Second is to produce the LSC itself. it appears that first is much more easy
and particularly could be implemented with use of extra memory proportional
to maximum of size(A) and size(B), while second will require memory
proportional to size(A)*size(B) (in fact there is linear memory algorithm
for the second problem either but it more difficult and not discussed here).
Note that we never mentioned in formal description any notion of "find
diff" or "apply diff". That mean that those are independent algorithms that
could use generic LCS algorithms in their implementation.
Our first attempt to formalize the solution would look like these two
independent algorithms
template<typename Iterator1, typename Iterator2>
size_t
longest_common_subsequence_length( Iterator1 begin1, Iterator1 end1,
Iterator2
begin2, Iterator2 end2 );
template<typename Iterator1, typename Iterator2, typename ResultIterator>
void
longest_common_subsequence( Iterator1 begin1, Iterator1 end1,
Iterator2 begin2, Iterator2
end2,
ResultIterator res );
First thing that we may find is missing is Allocator template parameter
since both algorithms need to allocate the memory to keep helper
information. Alternative would be to require the user to provide this memory
somehow.
Let now look on the requirements on the iterator types. to answer on this
question one will need to look on the algorithms description in David
Eppstein work. It appeared that there two ways to implement the algorithm:
top-bottom and bottom-up. Second is more efficient. On the hand first
algorithm probably require only Forward iterators, while second require at
least Bidirectional iterators (or even Random Access, like with your
implementation). To be as generic as possible we will need to implement both
versions and dispatch to specific implementation depends on iterators
categories.
The most difficult issue with proposed solution interface is output
result specification. The problem is that there maybe very different thing
user would expect as an algorithm output: values of first sequence that
belongs to LCS, values of second sequence that belongs to LCS, references to
them both and so on. The solution is to leave the decision how to
interpret/populate output to the user. The user will be required to provide
the binary predicate P that should accept to iterators as an argument. now
for every pair of iterators it1, it2 which algorithm found are belong to the
result LCS it will call P( it1, it2 ). That should be generic enough to
cover most reasonable cases. So resulting interface look like this:
template<typename Iterator1, typename Iterator2, typename Allocator=...>
size_t
longest_common_subsequence_length( Iterator1 begin1, Iterator1 end1,
Iterator2
begin2, Iterator2 end2 );
template<typename Iterator1, typename Iterator2, typename Predicate,
typename Allocator=...>
void
longest_common_subsequence( Iterator1 begin1, Iterator1 end1,
Iterator2 begin2, Iterator2
end2,
Predicate P = ... );
Now as an example application you could implement usage of LCS algorithm for
"diff" problems.
Finally couple comments to your code, while I am at it:
1. 6 STL includes as well as TRACE staff on top I presume temporary since
none of them is required for generic algorithm implementation.
2. free_heap_ptr: why not checked_delete?
3. const lcs_size_type *pvalue. Boost recommend to use C++ notation. "*" is
part of the type not a value property.
4. FWIW Bottom-Up algorithm implementation should not require random access
iterators.
5. sequence_traits defined but never used.
Regards,
Gennadiy.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk