Boost logo

Boost :

Subject: [boost] [gsoc-2013]Approximate string matching
From: Yu Jiang (sunlightt07_at_[hidden])
Date: 2013-04-23 04:26:35


Hello, I'm Yu Jiang, currently a master student from Tsinghua University in
China. I'm attracted by the idea of "approximate string matching" since
it's highly related about my research and also my interest.

I have seen in a previous discussion, another student proposed to implement
some basic matching functions such as "edit distance" for boost. He said by
using dynamic programming, the time complexity is O(m*n). However, if we
give a threshold T and only report the exact edit distance if it's no
larger than T, the complexity can be O(T * min(m,n)). I think sometimes we
just need to decide whether the edit distance is small enough.

Furthermore, I'm also interested in approximate string search. That is,
given a query string word Q and a candidate word set S, find all the
strings in S who have the edit distance no larger than a threshold T with
the query string. This is very useful in many area such as data
integration, and I have strong knowledge in fast algorithms to solve this.

So, is it compatible for gsoc 2013 and for boost? Looking forward to your
opinions.

Thanks,
Yu


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk