

Subject: [Boost-users] [boost.locale] Question about boundary rules
From: alex_perry (alex.perry_at_[hidden])
Date: 2012-03-28 20:41:51


I have just been exploring Boost.Locale, which I hadn't used before, but I
don't quite understand some of the behaviour of the boundary rules when
segmenting text.

Fortunately I can see this behaviour in the example code, so it's probably
my misunderstanding and easily corrected.

If I compile
http://www.boost.org/doc/libs/1_49_0/libs/locale/doc/html/boundary_8cpp-example.html

and run it, I get:

[...skipped to avoid a long quote...]
Part [Linux2.6] has number(s)
Part [ ] has no word characters
Part [and] has letter(s)
Part [ ] has no word characters
Part [Windows7] has number(s) letter(s)
Part [ ] has no word characters
[...]

However, I don't understand why "Linux2.6" is detected as having number(s)
but no letters, whilst "Windows7" is detected as having both. It doesn't
appear to be the decimal point: "Linux26" behaves the same way, whilst
"Linux2" is detected as having both.

I haven't debugged this, just glanced at the code, which seems to set these
flags from all the values returned by ICU's
RuleBasedBreakIterator::getRuleStatusVec().

I thought I would just ask whether I'm misunderstanding something
fundamental here before trying to work out what is going on and where the
problem is (if there is one).

TIA

Alex Perry

PS: In case this is a known platform/version issue, I was running this
on:
Windows7
MSVC 10
boost 1.49
icu 4.9.1

--
View this message in context: http://boost.2283326.n4.nabble.com/boost-locale-Question-about-boundary-rules-tp4514159p4514159.html
Sent from the Boost - Users mailing list archive at Nabble.com.
