Subject: Re: [boost] [GSoC] NLP Idea for GSoC 2011 project
From: Sarma Tangirala (tvssarma.omega9_at_[hidden])
Date: 2011-03-28 21:23:17


Hello,

On 27 March 2011 18:26, Andrew Sutton <asutton.list_at_[hidden]> wrote:

> Hi,
>
> > I have worked with Boost on my projects but haven't really thought of
> > using C++ as a language for NLP.
> >
> > The NLP that I have done is on Python and Java, for their built-in string
> > methods.
>
> I think this would be an interesting project, but doing it correctly
> would require *way* more than 3 months of effort. If you were
> interested in starting to work on an NLP support library, you should
> focus on designing a small set of tools (WordNet support, stopword
> removal support, stemmers, etc.),
>

I do realize it will take a lot more than a summer's worth of effort, but as
you pointed out, a small library with a couple of basic tools could be an
excellent starting point.
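
To make that a bit more concrete, a first cut at two of those tools
(tokenization and stopword removal) might look something like the sketch
below. This is only an illustration under my own assumptions: it uses just the
C++ standard library rather than Boost.Tokenizer, and the function names
`tokenize` and `remove_stopwords`, as well as whatever stopword list gets
passed in, are made up for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: split text on whitespace, strip leading and trailing
// non-alphanumeric characters from each word, and lowercase the result.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::istringstream stream(text);
    std::string word;
    auto is_alnum = [](unsigned char c) { return std::isalnum(c) != 0; };
    while (stream >> word) {
        // First alphanumeric character, and one past the last one.
        auto first = std::find_if(word.begin(), word.end(), is_alnum);
        auto last = std::find_if(word.rbegin(), word.rend(), is_alnum).base();
        if (first >= last) {
            continue;  // the word was punctuation only
        }
        std::string token(first, last);
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::tolower(c));
                       });
        tokens.push_back(token);
    }
    return tokens;
}

// Hypothetical helper: keep only the tokens that are not in the stopword set.
std::vector<std::string> remove_stopwords(
        const std::vector<std::string>& tokens,
        const std::unordered_set<std::string>& stopwords) {
    std::vector<std::string> kept;
    for (const auto& token : tokens) {
        if (stopwords.count(token) == 0) {
            kept.push_back(token);
        }
    }
    return kept;
}
```

Calling `remove_stopwords(tokenize(text), stopwords)` chains the two steps. A
real library would also have to deal with Unicode, contractions and
language-specific rules, none of which this sketch attempts.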

>
> > I don't know if this is inexperience or ignorance, but does C++ work well
> > for NLP?
>
> There's no reason it should not be a great choice for NLP
> applications. It might be worth pointing out that some of the
> performance critical components of the Python NLTK (Natural Language
> Toolkit) are written in C... and we all know that C++ is a better C
> than C :)
>
>

I have worked with the Python NLTK and absolutely loved it. I did not know
that its performance-critical components were written in C. But I must admit,
I haven't seen a final application written entirely in C/C++. I am a moderate
to good programmer, and I think the reason a lot of people prefer Python is
the simplicity of the code, as opposed to what one forum user called "the
syntactical fluff and non-abstraction" that goes with C++.

> > Also, I was looking at some C++ code using Boost/tokenizer.hpp that
> > tokenized some text and it looked a bit scary.
>
> Welcome to Boost. The learning curve can be a bit steep, but don't let
> that scare you away.
>
> Andrew
>

Haha. Thanks for the welcome!

I do realize the complexity involved in a project such as Boost, but as a
noob, I was in total awe! :)

-- 
Regards,
Sarma Tangirala,
Junior - Class of 2012,
Department of Information Science and Technology,
College of Engineering Guindy - Anna University
