|
Boost : |
From: Bohdan (warever_at_[hidden])
Date: 2002-10-23 08:35:23
"Ben Young" <ben_at_[hidden]> wrote in message
news:Pine.LNX.4.33.0210230844140.11148-100000_at_fileserver.cluster.local...
>
> Also, if any one is interested, I have an implementation of an Levenstein
> distance that calculates the sub set of a sorted range that is within a
> specified edit distance of a reference "string", far more efficiently than
> just calculating it for each element. I implemented this for a spell
> checker I wrote, but I can imagine it having lots of other uses.
>
> At the moment it needs some work (its a bit specific) but say if you would
> like it
>
> Cheers
This is closer to what i'm trying to do. But AFAIK this algorithm still uses
dynamic memory allocation.
Can i/we take a look at it ?
I think that more appropriate for spellchecker will be algrithm with limited gap
length, i.e. algorithm
which assumes that it is impossible that word has more than N insert/delete
errors following each other.
IMHO this is not too strong assumption for spellchecker, because if there is
high probability that
user makes 4 insert/delete mistakes in in succession than most probably he is
drunk of his knowledges
on language are ~0.
Actually that is what i have now. But i'm still looking for a theory on this.
The only thing i found is
algorithm which penalize gap length, but it still uses dynamic memory :(
PS:
if talking about aesthetics ... the simplest Lavenstain difference
implementation :
size_t recdiff( char const *s0, char const *s1 )
{
if( !*s0 ) return strlen( s1 );
if( !*s1 ) return strlen( s0 );
return min3( recdiff(s0+1,s1) + 1, recdiff(s0,s1+1) + 1, recdiff(s0+1,s1+1)
+ int(*s0!=*s1) );
}
But aesthetics hasn't many to share with effectiveness :o)
regards,
bohdan
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk