Boost :

Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-15 04:25:04

Next message: Paul A. Bristow: "Re: [boost] [Wiki] Changes in information about gcc warnings."
Previous message: vicente.botet: "Re: [boost] [pgi-10.1] relocations compiler option"
In reply to: Patrick Horgan: "Re: [boost] [General] Always treat std::strings as UTF-8"
Next in thread: Robert Kawulak: "Re: [boost] [General] Always treat std::strings as UTF-8"

> From: Patrick Horgan <phorgan1_at_[hidden]>
> On 01/14/2011 02:05 PM, Peter Dimov wrote:
> > John B. Turpish wrote:
> > > By the way, I disagree with Peter's assessment that, "you rarely, if ever,
> > > need to access the Nth character," but I will gladly cede that
> > > this depends on your problem domain.
> >
> > It obviously depends on the problem domain :-) but, when
> > talking about Unicode, you can't reliably access the Nth character,
> > in general, even with UCS-32. (As far as I know.)
>
> I don't understand. UCS-32 (I assume you meant encoded as UTF-32)
> is a fixed width encoding so the n-th character is just
> 4n away from the beginning of the string. Right?

No, it is the Nth Unicode code point that sits at the Nth position,
not the Nth character.

For example, the word "שָלוֹם" has 4 characters:

  "שָ", "ל", "וֹ", "ם"

and 6 code points:

  ש ָ ל ו ֹ ם

Two of those code points are diacritic marks.

Boost.Locale has a special character iterator for this purpose; it works
on characters, not code points.

See: http://cppcms.sourceforge.net/boost_locale/html/tutorial.html#8e296a067a37563370ded05f5a3bf3ec

Artyom
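
To make the count above concrete, here is a minimal sketch in plain standard C++ (no Boost.Locale, no ICU). It decodes the UTF-8 bytes of the same word and counts bytes, code points, and an approximation of characters. The names kShalom, decode_utf8, is_combining_mark and count_characters are made up for this illustration, and "character" is approximated as "a base code point plus any combining marks that follow it", using a small hard-coded table of mark ranges. Real character segmentation follows the Unicode text segmentation rules and needs a proper library; this only shows why indexing by code point does not land on the Nth character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The word from the message, spelled out as UTF-8 bytes so the example does
// not depend on the source file encoding (hypothetical constant name).
const std::string kShalom = "\xD7\xA9\xD6\xB8\xD7\x9C\xD7\x95\xD6\xB9\xD7\x9D";

// Decode UTF-8 into code points. Sketch only: assumes well-formed input and
// does no validation beyond checking that a sequence is not truncated.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        assert(i + len <= s.size() && "truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Assumption: a deliberately tiny table of combining marks, just enough for
// Latin diacritics and Hebrew points. Not a complete Unicode property lookup.
bool is_combining_mark(char32_t cp) {
    return (cp >= 0x0300 && cp <= 0x036F)   // Combining Diacritical Marks
        || (cp >= 0x0591 && cp <= 0x05BD)   // Hebrew accents and points
        || cp == 0x05BF
        || cp == 0x05C1 || cp == 0x05C2
        || cp == 0x05C4 || cp == 0x05C5
        || cp == 0x05C7;
}

// Approximate the number of characters: every code point that is not a
// combining mark starts a new character; marks attach to the previous one.
std::size_t count_characters(const std::string& s) {
    std::size_t n = 0;
    for (char32_t cp : decode_utf8(s)) {
        if (!is_combining_mark(cp)) ++n;
    }
    return n;
}
```

With these definitions, kShalom.size() is 12 (bytes), decode_utf8(kShalom).size() is 6 (code points), and count_characters(kShalom) is 4, matching the counts in the message. Even in UTF-32, index 1 would be the qamats mark (U+05B8), not the second character.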

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk