Boost logo

Boost :

Subject: Re: [boost] [gsoc-2013] proposal for approximate string matching
From: Yu Jiang (sunlightt07_at_[hidden])
Date: 2013-04-29 12:25:35


2013/4/29 Erik Erlandson <eje_at_[hidden]>

>
> >
> > Part A, similarity metrics(use edit distance as an example):
> > template....
> > (1) (size_type or unsigned) edit_distance(const Range1T&, const
> Range2T&);
> > (2) (size_type or unsigned) edit_distance(const Range1T&, const Range2T&,
> > (size_type or unsigned) threshold);
> > (3) (size_type or unsigned) edit_distance(const Range1T&, const Range2T&,
> > equal_func);
> > (4) (user-defined type) edit_distance(const Range1T&, const Range2T&,
> cost);
> >
> > the cost struct may be like this:
> > template<T, V>
> > struct cost
> > {
> > V substitution(const T&, const T&) const;
> > V insertion(const T &a) const;
> > V deletion(const T &a) const;
> > }
>
>
> This seems like a good approach to me.
>
>
> >
> > (5) pair<container of operations, edit_distance>
> > edit_distance_with_operation(const Range1T&, const Range2T&);
> > (6) pair<container of operations, edit_distance>
> > edit_distance_with_operation(const Range1T&, const Range2T&, threshold);
> > (7) pair<container of operations, edit_distance>
> > edit_distance_with_operation(const Range1T&, const Range2T&, equal_func);
> > (8) pair<container of operations, edit_distance>
> > edit_distance_with_operation(const Range1T&, const Range2T&, cost);
> > the "operation" may be a pair<"operator", "position">, where "operator"
> may
> > be an enum of insertion, deletion and substitution, "position" is an
> > unsigned value.
>
>
> In the past, I would have said "returning a container on the stack will be
> an efficiency problem", however with the new "move" semantics, I *think* it
> will OK if it is implemented properly.
>
> I have only read about the new "move" semantics, and not used them, so if
> I'm wrong about that somebody please set me straight :)
>
>
Thanks a lot for your valuable advice!

Now I also have much clear thought for Part B (similarity search).
Several days ago you suggested me to make the "searcher" class behave like
a container.
Currently I think similarity search is very like exact search, where we
usually use the "unordered_map".
In fact, when we call the functions "find" or "operator[]" of an
"unordered_map", exactly matched values are returned.
Here, what I want to implement is just to return similar matched values. So
here I'd like to make
my searcher class like an "unordered_map". I 'll support as much as
possible functions
in the "unordered_map" as long as it's useful and compatible.

>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk