Subject: Re: [boost] [GSoC] NLP Idea for GSoC 2011 project
From: Sarma Tangirala (tvssarma.omega9_at_[hidden])
Date: 2011-04-07 15:59:43
Just a small follow-up.
I was caught up in exam week and could not do anything constructive for a
while. I did catch up with my advisor, who specializes in AI, and she was also
of the opinion that a small set of properly implemented tools should keep me
busy through the summer.
I am preparing my proposal and should submit it soon.
I want to know if I have a good chance of being selected. Any advice at this
stage would be awesome.
I am looking at tagging, chunking, tokenizing, parsing, stemming and
stop-word removal, as suggested.
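To give a rough idea of the smallest pieces on that list, tokenizing and stop-word removal might look something like the sketch below. It uses only the standard library; the names `tokenize` and `remove_stop_words` are placeholders of mine, not an existing Boost interface, and the splitting rule (any non-alphanumeric character ends a token) is an assumption made to keep the example short.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Split text into lowercase word tokens; any non-alphanumeric character
// ends the current token. (Hypothetical helper, not a Boost API.)
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(std::tolower(c)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Drop every token found in the stop-word set, keeping the original order.
// The stop-word list is supplied by the caller; none is built in.
std::vector<std::string> remove_stop_words(
        const std::vector<std::string>& tokens,
        const std::set<std::string>& stop_words) {
    std::vector<std::string> kept;
    for (const std::string& t : tokens) {
        if (stop_words.count(t) == 0) kept.push_back(t);
    }
    return kept;
}
```

For example, tokenizing "The cat sat on the mat." and then removing the stop words "the" and "on" should leave the tokens cat, sat and mat. A real library would presumably take the splitting rule and the stop-word list as configurable parts rather than hard-coding them.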
I will be using the O'Reilly NLTK book as a model reference. Any other good
reference sources would be helpful!
On 29 March 2011 06:53, Sarma Tangirala <tvssarma.omega9_at_[hidden]> wrote:
> On 27 March 2011 18:26, Andrew Sutton <asutton.list_at_[hidden]> wrote:
>> > I have worked with Boost on my projects but haven't really thought of
>> > C++ as a language for NLP.
>> > The NLP that I have done is on Python and Java, for their built-in
>> > methods.
>> I think this would be an interesting project, but doing it correctly
>> would require *way* more than 3 months of effort. If you were
>> interested in starting to work on an NLP support library, you should
>> focus on designing a small set of tools (WordNet support, stopword
>> removal support, stemmers, etc.),
> I do realize it will take a lot more than a summer's worth of effort, but
> as you pointed out, a small library with a couple of basic tools could be
> an excellent starting point.
>> > I don't know if this is inexperience or ignorance, but does C++ work
>> > for NLP?
>> There's no reason it should not be a great choice for NLP
>> applications. It might be worth pointing out that some of the
>> performance critical components of the Python NLTK (Natural Language
>> Toolkit) are written in C... and we all know that C++ is a better C
>> than C :)
> I have worked with the Python NLTK and absolutely loved it. I did not know
> that the critical components were written in C/C++. But I must admit, I
> haven't seen a final application written entirely in C/C++. I am a moderate
> to good programmer, and I think the reason a lot of people prefer
> Python is the simplicity of the code or, as one forum user put it, "the
> syntactical fluff and non-abstraction" that goes with C++.
>> > Also, I was looking at some C++ code using Boost/tokenizer.hpp that
>> > tokenized some text and it looked a bit scary.
>> Welcome to Boost. The learning curve can be a bit steep, but don't let
>> that scare you away.
> Haha. Thanks for the welcome!
> I do realize the complexity involved in a project such as Boost, but for a
> noob, I was in total awe! :)
> Sarma Tangirala,
> Junior - Class of 2012,
> Department of Information Science and Technology,
> College of Engineering Guindy - Anna University
--
Regards,
Sarma Tangirala,
Junior - Class of 2012,
Department of Information Science and Technology,
College of Engineering Guindy - Anna University