Subject: Re: [boost] [GSoC] NLP Idea for GSoC 2011 project
From: Andrew Sutton (asutton.list_at_[hidden])
Date: 2011-03-27 08:56:08


Hi,

> I have worked with Boost on my projects but haven't really thought of using
> C++ as a language for NLP.
>
> The NLP that I have done is on Python and Java, for their built-in string
> methods.

I think this would be an interesting project, but doing it correctly
would require *way* more than 3 months of effort. If you were
interested in starting to work on an NLP support library, you should
focus on designing a small set of tools (WordNet support, stopword
removal support, stemmers, etc.).
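
To give a feel for how small one of those tools could be, here is a rough
sketch of stopword removal using only the standard library. The names
tokenize and remove_stopwords are made up for illustration and are not part
of Boost or any existing library, and the tokenizer handles ASCII letters
only, which a real NLP library would have to improve on.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: split text into lowercase tokens made of ASCII
// letters. Any non-letter character ends the current token.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}

// Hypothetical helper: return the tokens that are not in the stopword set,
// preserving their original order.
std::vector<std::string> remove_stopwords(
    const std::vector<std::string>& tokens,
    const std::unordered_set<std::string>& stopwords) {
    std::vector<std::string> kept;
    std::copy_if(tokens.begin(), tokens.end(), std::back_inserter(kept),
                 [&stopwords](const std::string& token) {
                     return stopwords.count(token) == 0;
                 });
    return kept;
}
```

Calling remove_stopwords(tokenize("The cat is on a mat."), {"the", "is", "a"})
should leave "cat", "on" and "mat". Keeping the stopword set as a parameter
is just one possible design; it lets callers supply language-specific lists.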

> I don't know if this is inexperience or ignorance, but does C++ work well
> for NLP?

There's no reason it should not be a great choice for NLP
applications. It might be worth pointing out that some of the
performance-critical components of the Python NLTK (Natural Language
Toolkit) are written in C... and we all know that C++ is a better C
than C :)

> Also, I was looking at some C++ code using Boost/tokenizer.hpp that
> tokenized some text and it looked a bit scary.

Welcome to Boost. The learning curve can be a bit steep, but don't let
that scare you away.

Andrew

