"Baptiste
Lepilleur" <blepctc@free.fr> wrote in message
news:002001c2552a$ff558fa0$d0b0c2d4@gaia...
> I would suggest slighty changing
your example. I'm not familar with the
> Longest Common
Subsequence
> algorithm and looking at your example I was falsely lead to
believe that
> (possibly):
> - the input sequence need to be
sorted
> - the matched element needed to be at the same position in the
input
> sequences
Baptiste, thanks for your comments. I agree with
your observations on my
choice of example; it wasn't the best. I just
selected one example from my
test harness. A better one is listed
below.
The input sequences need not be sorted, and the matched elements
do no need
to be at the same position.
>
> Giving a quick
look at the page the pointed to, it does not seems to be
the
> case
(only the relative order seems to matter). From what I
understood,
LCS
> is a "diff" algorithm. It's equivalent to a
combination of set_intersect
and
> set_difference on two sorted
sequence, right ?
Yes, it is a "diff" algorithm; as you would expect to
find in a code
management system for calculating a delta between two source
files.
>
> Beyond the count of inserted and deleted elements, do
we also get access
to
> what was deleted, inserted ?
Yes. I am
open to discussion on how best to achieve this. The current
implementation
fills a std::vector<> containing a list of changes (param 1
of
compute_lcs). The second parameter determines what is done with each
"record"
to reconstruct the second container from the first: REMOVE - the
record is
removed, INSERT - the record is inserted and KEEP - the record
is
unchanged.
>
> Still have to look deeper into what that
algorithm can do, but it looks
> promising.
>
> On the
practial side, wouldn't it be simpler is compute_lcs() returned 0
if
>
there was no common sequence ? (this avoid dealing with a special
case).
Yes. I've changed this now ;-)
A better
example:
int main(int, char **)
{
const char
*str1 = "nematode knowledge";
const char *str2 = "empty
bottle";
typedef cmp::diff<std::string>
compare_t;
compare_t compare(str1, str1+strlen(str1),
str2, str2+strlen(str2));
signed
lcs;
std::basic_string<char> result_str1,
result_str2;
compare_t::result_set
seq;
// calculate the lowest common
substring
lcs = compare.compute_lcs(&seq, cmp::REMOVE
| cmp::KEEP | cmp::INSERT);
std::cout << "Comparing:
\"" << str1 << "\" & \"" << str2 << "\""
<<
std::endl
<< "Lowest Common Subsequence length is " << lcs
<<
std::endl;
compare_t::result_set::iterator
it = seq.begin();
compare_t::result_set::iterator ite = seq.end();
for
(; it != ite; ++it)
{
compare_t::result_set::value_type res =
*it;
if (res->type() ==
cmp::REMOVE)
{
result_str1 +=
res->data();
result_str2 += '_';
}
else if (res->type() ==
cmp::INSERT)
{
result_str1 +=
'_';
result_str2 += res->data();
}
else
{
result_str1 +=
res->data();
result_str2 += res->data();
}
}
std::cout <<
result_str1 <<
std::endl
<< result_str2 << std::endl <<
std::endl;
return 0;
}
The output
is:
Comparing: "nematode knowledge" & "empty bottle"
Lowest Common
Subsequence length is 7
nema_tode_ kn_ow__ledge
_em_pt___y
__bo_ttle___