Subject: Re: [boost] [GSoC] NLP Idea for GSoC 2011 project
From: Sarma Tangirala (tvssarma.omega9_at_[hidden])
Date: 2011-04-08 08:42:37


I finally finished my proposal, and it's here:

http://www.google-melange.com/gsoc/proposal/review/google/gsoc2011/omega9/1

Any comments or suggestions would be helpful.

Thanks again for this opportunity!

On 8 April 2011 01:29, Sarma Tangirala <tvssarma.omega9_at_[hidden]> wrote:

> Just a small follow-up.
>
> I was caught up in exam week and could not do anything constructive for a
> while. I did catch up with my advisor, who specializes in AI, and she was also
> of the opinion that a small set of tools, properly implemented, should keep me
> busy through the summer.
>
> I am preparing my proposal and should submit it soon.
>
> I want to know if I have a good chance of being selected. Any advice at
> this stage would be awesome.
>
> I am looking at tagging, chunking, tokenizing and parsing, stemming and
> stop-word removal as suggested.
>
> I will be using the O'Reilly NLTK book as a model reference. Any other good
> reference sources would be helpful!
>
> On 29 March 2011 06:53, Sarma Tangirala <tvssarma.omega9_at_[hidden]> wrote:
>
>> Hello,
>>
>>
>> On 27 March 2011 18:26, Andrew Sutton <asutton.list_at_[hidden]> wrote:
>>
>>> Hi,
>>>
>>> > I have worked with Boost on my projects but haven't really thought of
>>> > using C++ as a language for NLP.
>>> >
>>> > The NLP that I have done is in Python and Java, for their built-in
>>> > string methods.
>>>
>>> I think this would be an interesting project, but doing it correctly
>>> would require *way* more than 3 months of effort. If you were
>>> interested in starting to work on an NLP support library, you should
>>> focus on designing a small set of tools (WordNet support, stopword
>>> removal support, stemmers, etc.),
>>>
>>
>>
>> I do realize it will take a lot more than a summer's worth of effort, but
>> as you pointed out, a small library with a couple of basic tools could be
>> an excellent starting point.
>>
>>
>>
>>>
>>> > I don't know if this is inexperience or ignorance, but does C++ work
>>> > well for NLP?
>>>
>>> There's no reason it should not be a great choice for NLP
>>> applications. It might be worth pointing out that some of the
>>> performance critical components of the Python NLTK (Natural Language
>>> Toolkit) are written in C... and we all know that C++ is a better C
>>> than C :)
>>>
>>>
>>
>> I have worked with the Python NLTK and absolutely loved it. I did not know
>> that the performance-critical components were written in C. But I must admit,
>> I haven't seen a final application written entirely in C/C++. I am a
>> moderately good programmer, and I think the reason a lot of people prefer
>> Python is the simplicity of the code or, as one forum user put it, "the
>> syntactical fluff and non-abstraction" that goes with C++.
>>
>>
>>> > Also, I was looking at some C++ code using boost/tokenizer.hpp that
>>> > tokenized some text and it looked a bit scary.
>>>
>>> Welcome to Boost. The learning curve can be a bit steep, but don't let
>>> that scare you away.
>>>
>>> Andrew
>>>
>>
>> Haha. Thanks for the welcome!
>>
>> I do realize the complexity involved in a project such as Boost, but as a
>> noob, I was in total awe! :)
>>
>>
>>
>> --
>>
>> Regards,
>> Sarma Tangirala,
>> Junior - Class of 2012,
>> Department of Information Science and Technology,
>> College of Engineering Guindy - Anna University
>>
>>

-- 
Regards,
Sarma Tangirala,
Junior - Class of 2012,
Department of Information Science and Technology,
College of Engineering Guindy - Anna University
