Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Chad Nelson (chad.thecomfychair_at_[hidden])
Date: 2011-01-18 21:07:39
On Tue, 18 Jan 2011 17:27:27 -0800
Patrick Horgan <phorgan1_at_[hidden]> wrote:
>On 01/18/2011 04:39 PM, Chad Nelson wrote:
>> It is, again at the moment, possible for a programmer to get invalid
>> encodings into the utf*_t strings, but it shouldn't be possible to
>> ever get them from the conversion functions. The unit tests that I
>> wrote for it (not included in the package) deliberately try to
>> feed in invalid code, just to ensure that it's caught correctly.
> It shouldn't be possible at all to have one with invalid encodings in
> it. Is it that you don't check in the constructors to make sure that
> the data passed in is valid for the encoding?
In the present incarnation, it's that the code using the classes can
directly manipulate the internal storage if it wants to. For the
purpose I designed those classes (use within my company), that's not a
problem, but I'll certainly change it before offering it up for
dissection by bloodthirsty Boost reviewers. ;-)
> I could just imagine someone ending up with user data from a web page
> in one of these strings. Could you get invalid data in there?
Only if the program blindly puts it there -- a problem that our
code-review system should prevent. In the hypothetical Boost version,
you'd *have* to feed it into the class through something like the
utf8_t::precoded function, and that function would confirm that it's
all correct before allowing it in.
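To make that concrete, here is a minimal sketch of the idea, not the actual code: the storage is private, and the only way to bring in already-encoded bytes is a checked factory function. Only the names utf8_t and precoded come from the discussion above; is_valid_utf8, raw, the private constructor, and the choice to throw std::invalid_argument are assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch of a validated UTF-8 string type. It is not the
// class discussed in this thread, only an illustration of the design.
class utf8_t {
public:
    // The only way to bring in bytes that are already encoded.
    // Throws (an assumed error policy) if the bytes are not valid UTF-8.
    static utf8_t precoded(const std::string& bytes) {
        if (!is_valid_utf8(bytes))
            throw std::invalid_argument("utf8_t::precoded: invalid UTF-8");
        return utf8_t(bytes);
    }

    // Read-only access, so callers cannot corrupt the storage afterwards.
    const std::string& raw() const { return data_; }

    // Rejects bad lead bytes, stray or missing continuation bytes,
    // overlong forms, surrogates, and values above U+10FFFF.
    static bool is_valid_utf8(const std::string& s) {
        const std::size_t n = s.size();
        std::size_t i = 0;
        while (i < n) {
            const unsigned char c = static_cast<unsigned char>(s[i]);
            if (c < 0x80) { ++i; continue; }  // plain ASCII

            std::size_t len = 0;
            unsigned long cp = 0, min = 0;
            if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min = 0x80; }
            else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
            else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
            else return false;  // stray continuation or invalid lead byte

            if (n - i < len) return false;  // sequence is truncated
            for (std::size_t k = 1; k < len; ++k) {
                const unsigned char cc = static_cast<unsigned char>(s[i + k]);
                if ((cc & 0xC0) != 0x80) return false;  // not a continuation
                cp = (cp << 6) | (cc & 0x3F);
            }
            if (cp < min) return false;                      // overlong form
            if (cp > 0x10FFFF) return false;                 // out of range
            if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
            i += len;
        }
        return true;
    }

private:
    // Private, so every instance has passed through a checked entry point.
    explicit utf8_t(const std::string& validated) : data_(validated) {}
    std::string data_;
};
```

With something like this, utf8_t::precoded(user_input) either returns a string that a trusting routine can rely on or throws at the boundary, and nothing outside the class can touch data_ directly.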
> If so, it's just a matter of a clever person looking for an exploit.
> You don't want to go passing around utf8_t strings that are invalid
> to trusting routines. If you _are_ going to have these types their
> utility comes from being able to trust that they are what they say
> they are. If you can have one that isn't what it says it is you
> might as well just have std::string.
A valid point, and one I'll keep in mind for the next iteration of
-- 
Chad Nelson
Oak Circle Software, Inc.