Boost logo

Boost :

From: Craig Henderson (cdm.henderson_at_[hidden])
Date: 2002-09-06 15:20:08


"Baptiste Lepilleur" <blepctc_at_[hidden]> wrote in message
news:002001c2552a$ff558fa0$d0b0c2d4_at_gaia...

> I would suggest slighty changing your example. I'm not familar with the
> Longest Common Subsequence
> algorithm and looking at your example I was falsely lead to believe that
> (possibly):
> - the input sequence need to be sorted
> - the matched element needed to be at the same position in the input
> sequences

Baptiste, thanks for your comments. I agree with your observations on my
choice of example; it wasn't the best. I just selected one example from my
test harness. A better one is listed below.

The input sequences need not be sorted, and the matched elements do no need
to be at the same position.

>
> Giving a quick look at the page the pointed to, it does not seems to be
the
> case (only the relative order seems to matter). From what I understood,
LCS
> is a "diff" algorithm. It's equivalent to a combination of set_intersect
and
> set_difference on two sorted sequence, right ?

Yes, it is a "diff" algorithm; as you would expect to find in a code
management system for calculating a delta between two source files.

>
> Beyond the count of inserted and deleted elements, do we also get access
to
> what was deleted, inserted ?

Yes. I am open to discussion on how best to achieve this. The current
implementation fills a std::vector<> containing a list of changes (param 1
of compute_lcs). The second parameter determines what is done with each
"record" to reconstruct the second container from the first: REMOVE - the
record is removed, INSERT - the record is inserted and KEEP - the record is
unchanged.

>
> Still have to look deeper into what that algorithm can do, but it looks
> promising.
>
> On the practial side, wouldn't it be simpler is compute_lcs() returned 0
if
> there was no common sequence ? (this avoid dealing with a special case).

Yes. I've changed this now ;-)

A better example:

int main(int, char **)
{
    const char *str1 = "nematode knowledge";
    const char *str2 = "empty bottle";

    typedef cmp::diff<std::string> compare_t;
    compare_t compare(str1, str1+strlen(str1), str2, str2+strlen(str2));

    signed lcs;
    std::basic_string<char> result_str1, result_str2;
    compare_t::result_set seq;

    // calculate the lowest common substring
    lcs = compare.compute_lcs(&seq, cmp::REMOVE | cmp::KEEP | cmp::INSERT);
    std::cout << "Comparing: \"" << str1 << "\" & \"" << str2 << "\"" <<
std::endl
                << "Lowest Common Subsequence length is " << lcs <<
std::endl;

    compare_t::result_set::iterator it = seq.begin();
    compare_t::result_set::iterator ite = seq.end();
    for (; it != ite; ++it)
    {
        compare_t::result_set::value_type res = *it;
        if (res->type() == cmp::REMOVE)
        {
            result_str1 += res->data();
            result_str2 += '_';
        }
        else if (res->type() == cmp::INSERT)
        {
            result_str1 += '_';
            result_str2 += res->data();
        }
        else
        {
            result_str1 += res->data();
            result_str2 += res->data();
        }
    }

    std::cout << result_str1 << std::endl
              << result_str2 << std::endl << std::endl;

    return 0;
}

The output is:

Comparing: "nematode knowledge" & "empty bottle"
Lowest Common Subsequence length is 7
nema_tode_ kn_ow__ledge
_em_pt___y __bo_ttle___


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk