Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Patrick Horgan (phorgan1_at_[hidden])
Date: 2011-01-18 20:27:27


On 01/18/2011 04:39 PM, Chad Nelson wrote:
> ...elision by patrick...
> The conversion code in those classes does exactly that, and will (at
> the moment) throw an exception on any problem.
>
> It is, again at the moment, possible for a programmer to get invalid
> encodings into the utf*_t strings, but it shouldn't be possible to ever
> get them from the conversion functions. The unit tests that I wrote for
> it (not included in the package) deliberately tries to feed in invalid
> code, just to ensure that it's caught correctly.
It shouldn't be possible at all to have one with an invalid encoding in
it. Do the constructors not check that the data passed in is valid for
the encoding? I can easily imagine someone ending up with user data
from a web page in one of these strings. Could invalid data get in that
way? If so, all it takes is a clever person looking for an exploit. You
don't want to pass invalid utf8_t strings around to routines that trust
them. If you _are_ going to have these types, their utility comes from
being able to trust that they are what they say they are. If you can
have one that isn't what it says it is, you might as well just have
std::string.
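
To make the point concrete, here is a rough sketch of a type that cannot hold invalid data, because its only constructor validates the input and throws. The names checked_utf8 and is_valid_utf8 are made up for illustration and are not taken from Chad's utf*_t classes; the check rejects overlong forms, surrogates, truncated sequences, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the library under discussion):
// returns true only if s is well-formed UTF-8.
inline bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (n - i < len) return false;  // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (len > 1 && cp < min_cp[len]) return false;   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// Hypothetical wrapper: the only way to build one is through a
// constructor that validates, so every instance holds valid UTF-8.
class checked_utf8 {
public:
    explicit checked_utf8(const std::string& bytes) : data_(bytes) {
        if (!is_valid_utf8(data_))
            throw std::invalid_argument("invalid UTF-8");
    }
    // Read-only access keeps callers from breaking the invariant afterwards.
    const std::string& str() const { return data_; }
private:
    std::string data_;
};
```

With something along these lines, a routine that takes a checked_utf8 never has to re-validate, and untrusted input such as form data from a web page fails at the point of construction instead of further downstream.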

Patrick

