Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Dave Abrahams (dave_at_[hidden])
Date: 2011-01-18 08:37:23


At Mon, 17 Jan 2011 21:46:36 -0800,
Emil Dotchevski wrote:
>
> > I think the reason to use separate types is to provide a type-safety
> > barrier between your functions that operate on utf-8 and system or
> > 3rd-party interfaces that don't or may not.  In principle, that should
> > force you to think about encoding and decoding at all the places where
> > it may be needed, and should allow you to code naturally and with
> > confidence where everybody is operating in utf8-land.  The typical
> > failures I've seen, where there is no such mechanism (e.g. in Python
> > where there's no static typing), happen because programmers lose
> > track of whether what they're handling is encoded as utf-8 or not.
>
> UTF-8 allows the use of char * for type erasure for strings, much like
> void * allows that in general.

Yes, that's exactly my point, although this isn't a property of UTF-8;
it's a more general thing. In a dynamic language like Python
everything is type-erased.

> Using C++ type tags to discriminate
> between different data pointed to by void pointers is mostly redundant

Exactly. I'm suggesting, essentially, that you avoid the use of void
pointers except where you're forced to use them: at the boundaries
with "legacy" interfaces.

-- 
Dave Abrahams
BoostPro Computing
http://www.boostpro.com