Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Dave Abrahams (dave_at_[hidden])
Date: 2011-01-18 08:48:59
At Tue, 18 Jan 2011 13:27:29 +0200,
Peter Dimov wrote:
>
> Dave Abrahams wrote:
>
> > I think the reason to use separate types is to provide a type-safety
> > barrier between your functions that operate on utf-8 and system or
> > 3rd-party interfaces that don't or may not. In principle, that should
> > force you to think about encoding and decoding at all the places where
> > it may be needed, and should allow you to code naturally and with
> > confidence where everybody is operating in utf8-land.
>
> Yes, in principle. It isn't terribly necessary if everybody is
> operating in UTF-8 land though.
But they won't be. That's not today's reality.
> It's a bit like defining a separate integer type for nonnegative
> ints for type safety reasons - useful in theory, but nobody does it.
I refer you to Boost.Units.
> If you're designing an interface that takes UTF-8 strings,
...as we are...
> it still may be worth it to have the parameters be of a
> utf8-specific type, if you want to force your users to think about
> the encoding of the argument each time they call one of your
> functions...
Or, you may want to use a UTF-8-specific type to force users of legacy
char* interfaces (and ourselves) to think about decoding each time
they call a legacy char* interface.
> this is a
> legitimate design decision. If you're in control of the whole program,
> though, it's usually not worth it - you just keep everything in UTF-8.
By definition, since we're library designers, we don't have said
control. And people *will* be using whatever Boost does with "legacy"
non-UTF-8 interfaces.
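
To make the type-safety barrier concrete, here is a minimal sketch of
what such a type might look like. Every name in it (utf8_string,
from_utf8, to_latin1, legacy_length) is hypothetical and is not an
existing or proposed Boost interface. The validity check is only
structural (it does not reject every overlong form or surrogate), and
Latin-1 stands in for "some legacy encoding".

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <utility>

// Structural UTF-8 check only: lead byte ranges and continuation bytes.
// A real implementation would also reject overlong forms and surrogates.
inline bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra;
        if (c < 0x80) extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return false;
        if (extra > s.size() - i - 1) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical wrapper type. There is no implicit conversion to or from
// std::string or char*, so every crossing of the boundary is spelled out.
class utf8_string {
public:
    static utf8_string from_utf8(std::string bytes) {
        if (!is_valid_utf8(bytes))
            throw std::invalid_argument("not valid UTF-8");
        return utf8_string(std::move(bytes));
    }
    const std::string& bytes() const { return bytes_; }
private:
    explicit utf8_string(std::string b) : bytes_(std::move(b)) {}
    std::string bytes_;
};

// Explicit decoding step required before calling a legacy interface.
// Throws if a code point does not fit in Latin-1 (U+0000..U+00FF).
inline std::string to_latin1(const utf8_string& s) {
    const std::string& in = s.bytes();
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if (c == 0xC2 || c == 0xC3) {
            // Safe: the invariant guarantees a continuation byte follows.
            unsigned char c2 = static_cast<unsigned char>(in[++i]);
            out.push_back(static_cast<char>(((c & 0x1F) << 6) | (c2 & 0x3F)));
        } else {
            throw std::range_error("code point not representable in Latin-1");
        }
    }
    return out;
}

// Stand-in for a legacy char* interface that assumes Latin-1 input.
inline std::size_t legacy_length(const char* latin1) {
    return std::strlen(latin1);
}

inline void demo() {
    utf8_string s = utf8_string::from_utf8("caf\xC3\xA9");
    // legacy_length(s) does not compile; the caller must pick a decoding.
    assert(legacy_length(to_latin1(s).c_str()) == 4);
}
```

The point is only that passing a utf8_string straight to legacy_length
fails to compile, so the decision about decoding has to be made at the
call site rather than forgotten.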
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com