Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Christian Holmquist (c.holmquist_at_[hidden])
Date: 2011-01-18 15:50:51


>
> There are two ways this could go AFAICS:
>
> 1. We just use std::string for UTF-8 and eventually the whole world
> will catch up
>
This would be nice.

> 2. We establish some other type for UTF-8 and *it* becomes the lingua
> franca
>
If Boost abandons std::string in interfaces that expect UTF-8,
does that mean that I, as a user, need to sprinkle
boost::to_utf_8(my_std_string,...) // in whatever form to_utf8 may be
all over my/our (quite gigantic) code base?
Not doing so will, I assume, cause compilation errors, but for what gain?
If some code was broken before, it will remain broken after I've injected
all those to_utf8 calls.
To solve actual problems I need to track the origin of my std::string's
content, which requires a traditional bug-hunting session anyway.
No additional typed interface in the world will help me here, IMO.
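
To illustrate the point, here is a minimal sketch. The utf8_string type and
this to_utf8 helper are hypothetical, invented for the example; they are not
existing Boost names. The wrapper makes the call compile, but the bytes
inside are exactly what they were before.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical UTF-8 string type (an assumption for illustration only).
struct utf8_string {
    std::string bytes;
};

// Hypothetical conversion helper. It only relabels the bytes as "UTF-8";
// it neither inspects nor repairs them.
inline utf8_string to_utf8(std::string s) {
    return utf8_string{std::move(s)};
}

// An interface that insists on the typed wrapper.
inline std::size_t byte_length(const utf8_string& s) {
    return s.bytes.size();
}

// Wrapping a Latin-1 string satisfies the compiler but changes nothing.
inline void demo() {
    const std::string latin1 = "caf\xE9";        // Latin-1 bytes, not valid UTF-8
    const utf8_string wrapped = to_utf8(latin1); // compiles fine
    assert(wrapped.bytes == latin1);             // content is still wrong
}
```

A helper that actually transcoded would still need to know the source
encoding, which is the information that has to be tracked down by hand.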

> Aren't things still enough of a mess out there that #2 is just as
> likely to work well?
> --
>

"Just as likely to work well" doesn't sound good enough to me, from a
maintenance point of view. I can picture what the changeset would look like
on the poor branch that decides to upgrade to such a version of Boost.
The problem isn't the type, but the content.

There are algorithms in the STL that place requirements on their input
(usually that it is sorted), so why is this different?
I'm sure nobody would support introducing a sorted_value_input_iterator
that I would have to pass to the std::set_xxx functions (?).
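
For comparison, a small sketch of how a sortedness precondition is usually
handled: std::set_intersection takes ordinary iterators, and the caller is
responsible for passing sorted ranges. The common_elements wrapper is a
hypothetical name for this example; the asserts only check the precondition
in debug builds.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// std::set_intersection requires both input ranges to be sorted.
// Nothing in the iterator types encodes that; it is a documented
// precondition on the content.
inline std::vector<int> common_elements(const std::vector<int>& a,
                                        const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::vector<int> out;
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(out));
    return out;
}
```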

What would be helpful, if doable, is to be able to build Boost with
BOOST_TRACK_INVALID_UTF_8 defined, also for release builds.
This would cause an exception or a call to a user-defined function if Boost
code stumbles upon bad strings.
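
A rough sketch of what such a build option might look like. Everything here
is an assumption for illustration: BOOST_TRACK_INVALID_UTF_8 is only the
macro name suggested above, and is_valid_utf8, invalid_utf8_handler,
current_handler and check_utf8 are hypothetical names, not existing Boost
APIs.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if s is well-formed UTF-8. Rejects overlong encodings,
// surrogate code points, and values above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }          // ASCII

        std::size_t len = 0;
        unsigned long cp = 0;
        unsigned long min_cp = 0;                 // smallest value for this length
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
        else return false;                        // stray continuation or invalid lead

        if (n - i < len) return false;            // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;
        i += len;
    }
    return true;
}

// Hypothetical hook: called with the offending string.
using invalid_utf8_handler = void (*)(const std::string&);

// Default behaviour in this sketch: throw.
inline void throw_on_invalid_utf8(const std::string&) {
    throw std::invalid_argument("invalid UTF-8");
}

// User code could assign its own function to replace the default.
inline invalid_utf8_handler& current_handler() {
    static invalid_utf8_handler handler = &throw_on_invalid_utf8;
    return handler;
}

// What library code would call on std::string input it treats as UTF-8.
inline void check_utf8(const std::string& s) {
#ifdef BOOST_TRACK_INVALID_UTF_8
    if (!is_valid_utf8(s)) current_handler()(s);
#else
    (void)s;                                      // tracking disabled: no cost
#endif
}
```

With the macro undefined, check_utf8 compiles to nothing, so existing
std::string call sites would not need to change.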

- Christian

