|
Boost : |
From: Craig Henderson (cdm.henderson_at_[hidden])
Date: 2002-09-07 03:46:29
"Baptiste Lepilleur" <blepctc_at_[hidden]> wrote in message
news:002001c2552a$ff558fa0$d0b0c2d4_at_gaia...
> I would suggest slighty changing your example. I'm not familar with the
> Longest Common Subsequence
> algorithm and looking at your example I was falsely lead to believe that
> (possibly):
> - the input sequence need to be sorted
> - the matched element needed to be at the same position in the input
> sequences
Baptiste, thanks for your comments. I agree with your observations on my
choice of example; it wasn't the best. I just selected one example from my
test harness. A better one is listed below.
The input sequences need not be sorted, and the matched elements do no need
to be at the same position.
>
> Giving a quick look at the page the pointed to, it does not seems to be
the
> case (only the relative order seems to matter). From what I understood,
LCS
> is a "diff" algorithm. It's equivalent to a combination of set_intersect
and
> set_difference on two sorted sequence, right ?
Yes, it is a "diff" algorithm; as you would expect to find in a code
management system for calculating a delta between two source files.
>
> Beyond the count of inserted and deleted elements, do we also get access
to
> what was deleted, inserted ?
Yes. I am open to discussion on how best to achieve this. The current
implementation fills a std::vector<> containing a list of changes (param 1
of compute_lcs). The second parameter determines what is done with each
"record" to reconstruct the second container from the first: REMOVE - the
record is removed, INSERT - the record is inserted and KEEP - the record is
unchanged.
>
> Still have to look deeper into what that algorithm can do, but it looks
> promising.
>
> On the practial side, wouldn't it be simpler is compute_lcs() returned 0
if
> there was no common sequence ? (this avoid dealing with a special case).
Yes. I've changed this now ;-)
A better example:
int main(int, char **)
{
const char *str1 = "nematode knowledge";
const char *str2 = "empty bottle";
typedef cmp::diff<std::string> compare_t;
compare_t compare(str1, str1+strlen(str1), str2, str2+strlen(str2));
signed lcs;
std::basic_string<char> result_str1, result_str2;
compare_t::result_set seq;
// calculate the lowest common substring
lcs = compare.compute_lcs(&seq, cmp::REMOVE | cmp::KEEP | cmp::INSERT);
std::cout << "Comparing: \"" << str1 << "\" & \"" << str2 << "\"" <<
std::endl
<< "Lowest Common Subsequence length is " << lcs <<
std::endl;
compare_t::result_set::iterator it = seq.begin();
compare_t::result_set::iterator ite = seq.end();
for (; it != ite; ++it)
{
compare_t::result_set::value_type res = *it;
if (res->type() == cmp::REMOVE)
{
result_str1 += res->data();
result_str2 += '_';
}
else if (res->type() == cmp::INSERT)
{
result_str1 += '_';
result_str2 += res->data();
}
else
{
result_str1 += res->data();
result_str2 += res->data();
}
}
std::cout << result_str1 << std::endl
<< result_str2 << std::endl << std::endl;
return 0;
}
The output is:
Comparing: "nematode knowledge" & "empty bottle"
Lowest Common Subsequence length is 7
nema_tode_ kn_ow__ledge
_em_pt___y __bo_ttle___
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk