Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Dave Abrahams (dave_at_[hidden])
Date: 2011-01-18 08:48:59


At Tue, 18 Jan 2011 13:27:29 +0200,
Peter Dimov wrote:
>
> Dave Abrahams wrote:
>
> > I think the reason to use separate types is to provide a type-safety
> > barrier between your functions that operate on utf-8 and system or
> > 3rd-party interfaces that don't or may not. In principle, that should
> > force you to think about encoding and decoding at all the places where
> > it may be needed, and should allow you to code naturally and with
> > confidence where everybody is operating in utf8-land.
>
> Yes, in principle. It isn't terribly necessary if everybody is
> operating in UTF-8 land though.

But they won't be. That's not today's reality.

> It's a bit like defining a separate integer type for nonnegative
> ints for type safety reasons - useful in theory, but nobody does it.

I refer you to Boost.Units.

> If you're designing an interface that takes UTF-8 strings,

...as we are...

> it still may be worth it to have the parameters be of a
> utf8-specific type, if you want to force your users to think about
> the encoding of the argument each time they call one of your
> functions...

Or, you may want to use a UTF-8-specific type to force users of legacy
char* interfaces (and ourselves) to think about decoding each time
they call a legacy char* interface.

> this is a
> legitimate design decision. If you're in control of the whole program,
> though, it's usually not worth it - you just keep everything in UTF-8.

By definition, since we're library designers, we don't have said
control. And people *will* be using whatever Boost does with "legacy"
non-UTF-8 interfaces.
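
A minimal sketch of the kind of barrier I mean. Everything here is
hypothetical and for illustration only: the utf8_string wrapper, the
from_latin1/to_latin1 helpers, and the choice of Latin-1 as the stand-in
"legacy" encoding are assumptions, not anything proposed in this thread
or present in Boost.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical wrapper: bytes that are known to be UTF-8.
class utf8_string {
public:
    // The only way in from raw bytes is explicit, so the caller has to
    // state "I know this is UTF-8". No validation is done in this sketch.
    static utf8_string from_utf8(std::string bytes) {
        return utf8_string(std::move(bytes));
    }
    const std::string& bytes() const { return bytes_; }

private:
    explicit utf8_string(std::string b) : bytes_(std::move(b)) {}
    std::string bytes_;
};

// Encode at the boundary: Latin-1 bytes -> UTF-8.
utf8_string from_latin1(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {  // U+0080..U+00FF take two bytes in UTF-8
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8_string::from_utf8(std::move(out));
}

// Decode at the boundary: UTF-8 -> Latin-1.
// Anything outside U+0000..U+00FF (or malformed) becomes '?'.
std::string to_latin1(const utf8_string& s) {
    const std::string& in = s.bytes();
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char c = static_cast<unsigned char>(in[i++]);
        if (c < 0x80) {
            out += static_cast<char>(c);
            continue;
        }
        bool two_byte = (c == 0xC2 || c == 0xC3) && i < in.size() &&
                        (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80;
        if (two_byte) {
            unsigned char next = static_cast<unsigned char>(in[i++]);
            out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
        } else {
            out += '?';
            // Skip any continuation bytes belonging to this sequence.
            while (i < in.size() &&
                   (static_cast<unsigned char>(in[i]) & 0xC0) == 0x80) {
                ++i;
            }
        }
    }
    return out;
}

// Stand-in for a legacy char* interface that expects Latin-1.
std::size_t legacy_char_count(const char* latin1) {
    return std::strlen(latin1);
}

// A function in "UTF-8 land". Writing legacy_char_count(s) does not
// compile, because utf8_string has no implicit conversion to char*,
// so the decoding step has to be spelled out.
std::size_t char_count(const utf8_string& s) {
    return legacy_char_count(to_latin1(s).c_str());
}

// Usage: "cafe" with an e-acute arrives as Latin-1, lives as UTF-8,
// and is decoded again on the way out to the legacy call.
inline void example() {
    utf8_string cafe = from_latin1("caf\xE9");
    assert(cafe.bytes() == "caf\xC3\xA9");
    assert(char_count(cafe) == 4);
}
```

The barrier is not airtight (s.bytes().c_str() still hands raw bytes to
anyone who asks), but the accidental path no longer compiles, and that
is the point.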

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
