From: Graham (Graham_at_[hidden])
Date: 2008-03-10 16:10:29

>We can implement UTF-8's and UTF-16's skip_forward by looking at the
>current byte. But does that work with all encodings? I think it doesn't
>work for shift encodings, unless you're willing to come to a stop on a
>shift character. I'm not: there's a rule for some shift encodings that
>they *must* end in the initial shift state, which means that there's a
>good chance that a shift character is the last thing in the string. This
>would mean, however, that if you increment an iterator that points to
>the last real character, it must scan past the shift character or it
>won't compare equal to the end iterator. Unless you're willing to scan
>past the shift in the equality test, another thing I wouldn't do.
>
>Seems to me that shift encodings are a lot more pain than they're worth.
>I really have to wonder why anyone would ever have come up with them.
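
For reference, the lead-byte approach described in the quoted text might
look roughly like the sketch below for UTF-8. The names
utf8_sequence_length and the std::string/index signature of skip_forward
are my own assumptions for illustration, not an existing Boost interface,
and the code assumes mostly well-formed input (a malformed lead byte is
simply treated as a one-byte character so that the caller keeps moving).

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by lead byte `b`.
// Returns 1 for malformed lead bytes (e.g. a stray continuation byte)
// so that the caller always makes progress.
inline std::size_t utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;
}

// Hypothetical skip_forward: advance `pos` past one encoded character,
// clamped to the end of `s`. Only the byte at `pos` is inspected.
inline std::size_t skip_forward(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return s.size();
    std::size_t next =
        pos + utf8_sequence_length(static_cast<unsigned char>(s[pos]));
    return next < s.size() ? next : s.size();
}
```

This works because the length is carried by the current byte alone. A
shift encoding has no such property: the meaning of a byte depends on
state set earlier in the string, which is exactly the problem raised in
the quoted text.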




As Unicode characters that are not in page zero can require more than 32
bits to encode them [yes really], this means that one 'character' can be
very long in UTF-8/16 encoding. It is even worse if you start looking at
conceptual characters [graphemes], where you can easily have three
characters make up a conceptual character.
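
To make the three-characters-per-grapheme case concrete, here is a small
sketch that counts code points in a UTF-8 string. The function name
count_code_points is hypothetical, and the input is assumed to be
well-formed UTF-8. Note that it only counts code points: finding the
grapheme boundaries themselves needs the Unicode character property
data, which this sketch deliberately does not attempt.

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string (assumed well-formed) by counting
// every byte that is not a continuation byte (10xxxxxx).
inline std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For example, "e" followed by U+0301 (combining acute accent) and U+0323
(combining dot below) is the byte string "e\xCC\x81\xCC\xA3": five bytes
and three code points, but a reader sees a single character.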


The only way I have found of handling this is to base the string library
on a proper Unicode character support library written according to the
Unicode standard. This means that you need character movement support,
grapheme support, and sorting support.


As I said to Phil, Rogier and I completed a Unicode character library for
release under Boost, but never submitted it to Boost as we had intended to
release it with a string library built on it, and never had time to do the
second part of the work.





