

From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2004-10-21 16:02:23


On Thu, 21 Oct 2004 12:08:18 -0700, Eric Niebler
<eric_at_[hidden]> wrote:
> I disagree. The user should be allowed to twiddle as many bits as she
> pleases, even permitted to create an invalid UTF string. However,
> operations that interpret the string as a whole (comparison,
> canonicalization, etc.) should detect invalid strings and throw. The
> reason is that people will need to manipulate strings at the bit level,
> and intermediate states may be invalid, but that the final state may be
> valid. We shouldn't do too much nannying during these intermediate states.

Manipulation of strings at the bit level is always possible -- use
std::string to fiddle with your UTF-8 string all you like. IMO
unicode::string (or whatever you wish to call it) should always
contain a valid Unicode string.
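To make that invariant concrete, here is a minimal sketch that assumes UTF-8 storage. The names `valid_utf8_string` and `is_valid_utf8` are hypothetical and are not a Boost API. Validation happens once, in the constructor. Byte-level editing is done on a plain std::string, and the result is validated again when it is wrapped.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Returns true if `s` is well-formed UTF-8: no stray continuation bytes,
// no truncated sequences, no overlong forms, no surrogates (U+D800..U+DFFF),
// and nothing above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        if (lead < 0x80) { ++i; continue; }            // ASCII
        std::size_t len;
        std::uint32_t cp;
        if (lead >= 0xC2 && lead <= 0xDF)      { len = 2; cp = lead & 0x1Fu; }
        else if (lead >= 0xE0 && lead <= 0xEF) { len = 3; cp = lead & 0x0Fu; }
        else if (lead >= 0xF0 && lead <= 0xF4) { len = 4; cp = lead & 0x07u; }
        else return false;  // continuation byte, C0/C1, or F5..FF as lead
        if (n - i < len) return false;                  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0u) != 0x80u) return false;     // not a continuation byte
            cp = (cp << 6) | (c & 0x3Fu);
        }
        if ((len == 3 && cp < 0x800u) || (len == 4 && cp < 0x10000u))
            return false;                               // overlong encoding
        if (cp >= 0xD800u && cp <= 0xDFFFu) return false;  // surrogate
        if (cp > 0x10FFFFu) return false;               // beyond Unicode range
        i += len;
    }
    return true;
}

// Hypothetical string type whose invariant is "always valid UTF-8".
// The check runs once, at construction; nothing afterwards can break it,
// because the bytes are only exposed read-only.
class valid_utf8_string {
public:
    explicit valid_utf8_string(std::string bytes) : bytes_(std::move(bytes)) {
        if (!is_valid_utf8(bytes_))
            throw std::invalid_argument("valid_utf8_string: invalid UTF-8");
    }
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};
```

A user who needs invalid intermediate states copies `bytes()` into a std::string, edits it freely, and then constructs a new `valid_utf8_string` from the result. Only that final construction can throw.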
Otherwise, every call to operator++() on an iterator would have to check
the bytes it steps over, and could throw an exception. The iterators would
also take more memory, because they would have to know the begin and end
positions of the string as well.
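The iterator cost can be sketched as follows, again assuming UTF-8. Both iterator classes are hypothetical illustrations. When the owning string guarantees validity, `operator++()` can find the next code point from the lead byte alone. When it does not, the iterator must also carry the end position, inspect each continuation byte, and throw on failure. A bidirectional checked iterator would additionally need the begin position for `operator--()`. That part is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Length of a UTF-8 sequence, judged from its lead byte alone.
// Only meaningful if the bytes are already known to be valid.
inline std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    return 4;
}

// Iterator over bytes whose validity is guaranteed by the owning string:
// one pointer of state, and ++ neither checks nor throws.
class unchecked_iterator {
public:
    explicit unchecked_iterator(const char* p) : p_(p) {}
    unchecked_iterator& operator++() {
        p_ += sequence_length(static_cast<unsigned char>(*p_));
        return *this;
    }
    const char* base() const { return p_; }
private:
    const char* p_;
};

// Iterator over bytes that may be invalid: it must also store the end,
// check every step, and may throw. (This sketch checks lead and
// continuation bytes only; a full validator would also reject overlong
// 3- and 4-byte forms and surrogates.)
class checked_iterator {
public:
    checked_iterator(const char* p, const char* end) : p_(p), end_(end) {}
    checked_iterator& operator++() {
        if (p_ == end_) throw std::out_of_range("increment past end");
        const unsigned char lead = static_cast<unsigned char>(*p_);
        if ((lead >= 0x80 && lead < 0xC2) || lead > 0xF4)
            throw std::runtime_error("invalid UTF-8 lead byte");
        const std::size_t len = sequence_length(lead);
        if (static_cast<std::size_t>(end_ - p_) < len)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k)
            if ((static_cast<unsigned char>(p_[k]) & 0xC0) != 0x80)
                throw std::runtime_error("bad UTF-8 continuation byte");
        p_ += len;
        return *this;
    }
    const char* base() const { return p_; }
private:
    const char* p_;
    const char* end_;
};
```

On typical platforms the checked iterator is twice the size of the unchecked one, and every increment executes branches that the unchecked version skips entirely.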

Furthermore, I'm not convinced the average C++ programmer should be
expected to know the Unicode standard well enough to make twiddling
bits the primary mode of Unicode string manipulation.

Regards,
Rogier


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk