
Subject: Re: [Boost-users] [rfc] a library for gesture recognition, speech recognition, and synthesis
From: Roland Bock (rbock_at_[hidden])
Date: 2009-10-27 06:07:44


Stjepan Rajko wrote:
[...]
> OK, I just completed a small experiment on the 9 texts of the Brown
> Corpus categorized as "humor". I used 6 of the texts for training, and 3
> for testing.
>
> I created one submodel per tag
> (http://kh.aksis.uib.no/icame/manuals/brown/INDEX.HTM#bc6), trained each
> from the training data, and then connected the submodels into a larger
> model with transitions also trained by the training data.
>
> Here are the results:
>
> Out of 7159 tagged parts of speech (words, symbols, etc.) present in the
> 3 test texts:
> 5190 were tagged correctly
> 300 were tagged incorrectly
> 1669 were not tagged, because the word or symbol was not present (at
> least not in a verbatim form) in the training data.
>
> So, if you only consider the 7159-1669=5490 parts that could possibly be
> tagged based on what the training data covers, you get a 94.5% success rate.
>
> By using a larger training set, the number of non-tagged parts should go
> down. Also, I'm sure there are domain-specific tricks to improving the
> results.
>
> BTW, 95% of the work to get this done was putting together the code that
> reads the corpus, since I already have generic code that does this kind
> of experiment.
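
This thread does not show Stjepan's generic code, so the following is not his implementation. It is a minimal sketch of the same kind of experiment. It assumes a bigram hidden Markov model where each tag's "submodel" is reduced to a single state that emits words, and where emission and tag-to-tag transition probabilities are relative frequencies counted from the training data. To mirror the "not tagged" category, any word that never appears in training is left untagged (None), and the known runs between such words are decoded separately with Viterbi. Unseen transitions get a small floor probability. The names train, tag, evaluate, success_rate and UNSEEN, and the floor value itself, are all hypothetical.

```python
import math
from collections import defaultdict

START = "<s>"              # pseudo-tag marking the start of a sentence
UNSEEN = math.log(1e-6)    # hypothetical floor for transitions never seen in training


def _log_normalize(counts):
    """Turn raw counts into log relative frequencies."""
    total = sum(counts.values())
    return {key: math.log(c / total) for key, c in counts.items()}


def train(sentences):
    """sentences: list of [(word, tag), ...]. Returns (emit, trans) log-prob tables."""
    emit_counts = defaultdict(lambda: defaultdict(int))
    trans_counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        prev = START
        for word, tag_ in sent:
            emit_counts[tag_][word] += 1   # per-tag submodel: which words it emits
            trans_counts[prev][tag_] += 1  # connections between submodels
            prev = tag_
    emit = {t: _log_normalize(ws) for t, ws in emit_counts.items()}
    trans = {p: _log_normalize(ts) for p, ts in trans_counts.items()}
    return emit, trans


def _viterbi(words, emit, trans, first_prev):
    """Most likely tag sequence for a run of words that all occur in training."""
    def tr(prev, t):
        # No transition term when the run starts right after an untagged word.
        return 0.0 if prev is None else trans.get(prev, {}).get(t, UNSEEN)

    best = {t: tr(first_prev, t) + e[words[0]] for t, e in emit.items() if words[0] in e}
    backpointers = []
    for w in words[1:]:
        new_best, ptr = {}, {}
        for t, e in emit.items():
            if w not in e:
                continue
            prev_tag, score = max(((p, s + tr(p, t)) for p, s in best.items()),
                                  key=lambda item: item[1])
            new_best[t] = score + e[w]
            ptr[t] = prev_tag
        best = new_best
        backpointers.append(ptr)
    path = [max(best, key=best.get)]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


def tag(words, emit, trans):
    """Tag a sentence; words absent from the training data are left as None."""
    known = {w for ws in emit.values() for w in ws}
    result = [None] * len(words)
    i = 0
    while i < len(words):
        if words[i] not in known:
            i += 1
            continue
        j = i
        while j < len(words) and words[j] in known:
            j += 1
        result[i:j] = _viterbi(words[i:j], emit, trans, START if i == 0 else None)
        i = j
    return result


def evaluate(test_sentences, emit, trans):
    """Returns (correct, incorrect, untagged) counts over all test tokens."""
    correct = incorrect = untagged = 0
    for sent in test_sentences:
        guesses = tag([w for w, _ in sent], emit, trans)
        for (_, gold), guess in zip(sent, guesses):
            if guess is None:
                untagged += 1
            elif guess == gold:
                correct += 1
            else:
                incorrect += 1
    return correct, incorrect, untagged


def success_rate(correct, incorrect):
    """Accuracy over the parts that could be tagged (untagged parts excluded)."""
    return correct / (correct + incorrect)
```

With the counts quoted above, success_rate(5190, 300) is 5190 / 5490, about 0.945, which matches the 94.5% figure. Excluding untagged tokens from the denominator is what separates that number from the overall rate of 5190 / 7159.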

That is most impressive! I am looking forward to analysing the work you
did and using it for German, too (which will be more complex, if I am
not mistaken). Alas, as I wrote earlier, I have to patiently complete
some other things before diving in :-)

> Great! I hope to have things cleaned up and better documented by then.

Thanks for your efforts! I really appreciate it!

Regards,

Roland


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net