Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-15 04:25:04
> From: Patrick Horgan <phorgan1_at_[hidden]>
> On 01/14/2011 02:05 PM, Peter Dimov wrote:
> > John B. Turpish wrote:
> > > By the way, I disagree with Peter's assessment that, "you rarely, if
> > > ever, need to access the Nth character," but I will gladly cede that
> > > this depends on your problem domain.
> > It obviously depends on the problem domain :-) but, when
> > talking about Unicode, you can't reliably access the Nth character,
> > in general, even with UCS-32. (As far as I know.)
> I don't understand. UCS-32 (I assume you meant encoded as UTF-32)
> is a fixed width encoding so the n-th character is just
> 4n away from the beginning of the string. Right?
The Nth position holds the Nth Unicode code point, not the Nth character.
For example, the word "שָלוֹם" has 4 characters ("שָ", "ל", "וֹ", "ם") and 6
code points: ש ָ ל ו ֹ ם
Two of those code points are diacritic marks.
Boost.Locale has a special character iterator for this purpose; it works on
characters, not code points.
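
To make the difference concrete, here is a minimal sketch in plain C++. It does
not use Boost.Locale. The helper names (decode_utf8, is_combining_mark,
count_characters, check_example) are made up for illustration, and the
"character" rule is a deliberate simplification: a code point starts a new
character unless it is a combining mark from a small hard-coded set. Real
segmentation follows the Unicode grapheme cluster rules and needs a proper
library. The example word is assumed to be the six code points U+05E9 U+05B8
U+05DC U+05D5 U+05B9 U+05DD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-8 into code points. Assumes well-formed input; no validation.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        // Sequence length from the lead byte.
        std::size_t len = b < 0x80 ? 1 : b < 0xE0 ? 2 : b < 0xF0 ? 3 : 4;
        // Payload bits of the lead byte.
        char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Simplified check: only the basic Latin combining block and the Hebrew
// points and accents. This is NOT the full Unicode definition.
bool is_combining_mark(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F)   // Combining Diacritical Marks
        || (cp >= 0x0591 && cp <= 0x05BD)   // Hebrew accents and points
        || cp == 0x05BF || cp == 0x05C1 || cp == 0x05C2
        || cp == 0x05C4 || cp == 0x05C5 || cp == 0x05C7;
}

// Count "characters": a base code point plus any marks that follow it.
std::size_t count_characters(const std::vector<char32_t>& cps) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < cps.size(); ++i)
        if (i == 0 || !is_combining_mark(cps[i])) ++n;
    return n;
}

// Usage: six code points, but only four characters.
void check_example() {
    const std::string word =
        "\xD7\xA9\xD6\xB8\xD7\x9C\xD7\x95\xD6\xB9\xD7\x9D";
    const std::vector<char32_t> cps = decode_utf8(word);
    assert(cps.size() == 6);             // code points
    assert(count_characters(cps) == 4);  // characters
}
```

Even in UTF-32, where cps[n] is a constant-time lookup, cps[1] here is a
diacritic mark and not the second character. Indexing by position gives you
code points only.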