Subject: Re: [boost] [GSoC] NLP Idea for GSoC 2011 project
From: Sarma Tangirala (tvssarma.omega9_at_[hidden])
Date: 2011-04-07 15:59:43


Just a small follow-up.

I was caught up in exam week and could not do anything constructive for a
while. I did catch up with my advisor, who specializes in AI, and she was also
of the opinion that a small set of tools, properly implemented, should keep me
busy through the summer.

I am preparing my proposal and should submit it shortly.

I want to know if I have a good chance of being selected. Any advice at this
stage would be awesome.

I am looking at tagging, chunking, tokenizing and parsing, stemming and
stop-word removal as suggested.
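
For illustration, here is a minimal sketch of two of these pieces (tokenizing
and stop-word removal) in C++ using only the standard library. The function
names tokenize and remove_stop_words, and the idea of passing the stop-word
list as a set, are assumptions made for this example; they are not part of
any existing Boost API or of the proposal itself. A Boost-based version
would more likely build on boost::tokenizer.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: split text into lowercase word tokens.
// Every non-alphanumeric character is treated as a separator, so
// "don't" becomes "don" and "t"; a real tokenizer would need to be
// smarter about apostrophes, hyphens, and non-ASCII text.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);  // flush the trailing token
    }
    return tokens;
}

// Hypothetical helper: return the tokens that are not in stop_words,
// preserving their original order.
std::vector<std::string> remove_stop_words(
        const std::vector<std::string>& tokens,
        const std::unordered_set<std::string>& stop_words) {
    std::vector<std::string> kept;
    for (const auto& token : tokens) {
        if (stop_words.count(token) == 0) {
            kept.push_back(token);
        }
    }
    return kept;
}
```

Calling tokenize on a sentence and passing the result to remove_stop_words
with a small set such as {"the", "on"} gives the filtered token list; a
stemmer or tagger could then consume that list as the next stage.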

I will be using the O'Reilly NLTK book as a model reference. Any other good
reference sources would be helpful!

On 29 March 2011 06:53, Sarma Tangirala <tvssarma.omega9_at_[hidden]> wrote:

> Hello,
>
>
> On 27 March 2011 18:26, Andrew Sutton <asutton.list_at_[hidden]> wrote:
>
>> Hi,
>>
>> > I have worked with Boost on my projects but haven't really thought of
>> > using C++ as a language for NLP.
>> >
>> > The NLP that I have done is on Python and Java, for their built-in
>> > string methods.
>>
>> I think this would be an interesting project, but doing it correctly
>> would require *way* more than 3 months of effort. If you were
>> interested in starting to work on an NLP support library, you should
>> focus on designing a small set of tools (WordNet support, stopword
>> removal support, stemmers, etc.),
>>
>
>
> I do realize it will take a lot more than a summer's worth of effort, but
> as you pointed out, a small library with a couple of basic tools could be
> an excellent starting point.
>
>
>
>>
>> > I don't know if this is inexperience or ignorance, but does C++ work
>> > well for NLP?
>>
>> There's no reason it should not be a great choice for NLP
>> applications. It might be worth pointing out that some of the
>> performance critical components of the Python NLTK (Natural Language
>> Toolkit) are written in C... and we all know that C++ is a better C
>> than C :)
>>
>>
>
> I have worked on the Python NLTK and absolutely loved it. I did not know
> that the critical components were written in C/C++. But I must admit, I
> haven't seen a final application written entirely in C/C++. I am a moderate
> to good programmer, and I think the reason a lot of people prefer
> Python is the simplicity of the code or, as one forum user put it, to avoid
> "the syntactical fluff and non-abstraction" that goes with C++.
>
>
>> > Also, I was looking at some C++ code using Boost/tokenizer.hpp that
>> > tokenized some text and it looked a bit scary.
>>
>> Welcome to Boost. The learning curve can be a bit steep, but don't let
>> that scare you away.
>>
>> Andrew
>>
>
> Haha. Thanks for the welcome!
>
> I do realize the complexity involved in a project such as Boost, but for a
> noob, I was in total awe! :)
>
>
>
> --
>
> Regards,
> Sarma Tangirala,
> Junior - Class of 2012,
> Department of Information Science and Technology,
> College of Engineering Guindy - Anna University
>
>

-- 
Regards,
Sarma Tangirala,
Junior - Class of 2012,
Department of Information Science and Technology,
College of Engineering Guindy - Anna University

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk