Boost Users :

Subject: Re: [Boost-users] UTF-16
From: Dominique Devienne (ddevienne_at_[hidden])
Date: 2009-07-24 11:17:55


On Wed, Jul 22, 2009 at 2:25 PM, Robert Dailey<rcdailey_at_[hidden]> wrote:
> Problem with that is that std::string::length() no longer provides a
> meaningful value. It will count each byte as 1 character.

Instead of ICU, there's also http://utfcpp.sourceforge.net/ with its
utf8::distance, which may be lighter weight. --DD

Quoting from that web page:
This function is used to find the length (in code points) of a UTF-8
encoded string. The reason it is called distance, rather than, say,
length, is mainly that developers are used to length being an O(1)
function. Computing the length of a UTF-8 string is a linear
operation, and it looked better to model it after the std::distance
algorithm.

In case of an invalid UTF-8 sequence, a utf8::invalid_utf8 exception
is thrown. If last does not point to the past-of-end of a UTF-8
sequence, a utf8::not_enough_room exception is thrown.
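
To make the byte-count versus code-point-count difference concrete, here is a minimal sketch that uses only the standard library. It is not utf8::distance itself: count_code_points is a hypothetical helper name, it assumes the input is already valid UTF-8, and it does no validation and throws nothing, whereas the quoted documentation says utf8::distance throws on invalid input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of utfcpp): counts the code points in a
// UTF-8 encoded std::string by counting every byte that is NOT a
// continuation byte. Continuation bytes have the bit pattern 10xxxxxx.
//
// Assumption: the input is valid UTF-8. No validation is performed, so
// malformed input yields a meaningless count instead of an exception.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {  // lead byte or plain ASCII byte
            ++n;
        }
    }
    return n;
}
```

For the UTF-8 string "h\xC3\xA9llo" ("héllo"), std::string::length() reports 6 bytes while count_code_points reports 5 code points. Like utf8::distance, this is a linear scan over the whole string. With utfcpp itself, the corresponding call would presumably be utf8::distance(s.begin(), s.end()), which also validates the input.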


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net