Boost Users :

Subject: [Boost-users] [boost.locale] Question about boundary rules
From: alex_perry (alex.perry_at_[hidden])
Date: 2012-03-28 20:41:51

I have just been exploring Boost.Locale, which I hadn't used before. However,
I don't quite understand some of the behaviour of the boundary rules when
segmenting text.

Fortunately, I can see this behaviour in the example code, so it's
probably my misunderstanding and easily corrected.

If I compile

and run it I get :-

[...skipped to avoid long quote ]
Part [Linux2.6] has number(s)
Part [ ] has no word characters
Part [and] has letter(s)
Part [ ] has no word characters
Part [Windows7] has number(s) letter(s)
Part [ ] has no word characters

However, I don't understand why "Linux2.6" is detected as having number(s)
but no letters, whilst "Windows7" is detected as having both. It doesn't
appear to be the decimal point: "Linux26" shows the same behaviour (whilst
"Linux2" is detected as having both).

I haven't debugged this, just glanced at the code (which seems to set
these flags based on all the values returned by ICU's
RuleBasedBreakIterator getRuleStatusVec()).
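
To make the question concrete, here is a minimal, self-contained sketch of how a vector of break-rule status values could be folded into per-segment flags. It is not Boost.Locale's actual code. The numeric ranges mirror ICU's documented word-break tags (UBRK_WORD_NONE = 0, UBRK_WORD_NUMBER = 100, UBRK_WORD_LETTER = 200, UBRK_WORD_KANA = 300, UBRK_WORD_IDEO = 400), and the bit values follow the style of the boost::locale::boundary word rules. All identifiers (kWordNumber, kFlagLetter, flags_from_statuses, describe) are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Lower bounds of the status ranges (assumed to mirror ICU's UWordBreak tags).
constexpr std::int32_t kWordNone   = 0;
constexpr std::int32_t kWordNumber = 100;
constexpr std::int32_t kWordLetter = 200;
constexpr std::int32_t kWordKana   = 300;
constexpr std::int32_t kWordIdeo   = 400;
constexpr std::int32_t kWordLimit  = 500;

// Bit flags, one group per category (hypothetical names).
constexpr std::uint32_t kFlagNone   = 0x0000F;
constexpr std::uint32_t kFlagNumber = 0x000F0;
constexpr std::uint32_t kFlagLetter = 0x00F00;
constexpr std::uint32_t kFlagKana   = 0x0F000;
constexpr std::uint32_t kFlagIdeo   = 0xF0000;

// OR together one flag per status value, chosen by the range it falls in.
std::uint32_t flags_from_statuses(const std::vector<std::int32_t>& statuses) {
    std::uint32_t flags = 0;
    for (std::int32_t s : statuses) {
        if (s >= kWordNone && s < kWordNumber)        flags |= kFlagNone;
        else if (s >= kWordNumber && s < kWordLetter) flags |= kFlagNumber;
        else if (s >= kWordLetter && s < kWordKana)   flags |= kFlagLetter;
        else if (s >= kWordKana && s < kWordIdeo)     flags |= kFlagKana;
        else if (s >= kWordIdeo && s < kWordLimit)    flags |= kFlagIdeo;
    }
    return flags;
}

// Render the flags in the same wording as the example program's output.
std::string describe(std::uint32_t flags) {
    std::string out;
    auto add = [&out](const char* text) {
        if (!out.empty()) out += ' ';
        out += text;
    };
    if (flags & kFlagNumber) add("number(s)");
    if (flags & kFlagLetter) add("letter(s)");
    if (flags & kFlagKana)   add("kana character(s)");
    if (flags & kFlagIdeo)   add("ideographic character(s)");
    if (out.empty()) return "no word characters";
    return out;
}
```

Under this sketch, a segment is reported as "number(s) letter(s)" only when the status vector contains a value from each range. A status vector holding only a value in the 100 to 199 range would produce "number(s)" alone, which is the pattern reported for "Linux2.6".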

I thought I would ask whether I'm misunderstanding something fundamental
before trying to work out what is going on and where the problem is (if
there is one).


Alex Perry

PS: Just in case this is a known platform/version issue, I was running this
on :-
boost 1.49
icu 4.9.1

