
Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-15 04:39:02

> From: Dave Abrahams <dave_at_[hidden]>
> Peter Dimov wrote:
> >
> > Alexander Lamaison wrote:
> > > I'm opposed to this strategy simply because it differs from the way
> > > existing libraries treat narrow strings.
> >
> > It differs from them because it's right, and existing libraries are
> > wrong. Unfortunately, they'll continue being wrong for a long time,
> > because of this same argument.
> Does the "right" strategy come with some policies/practices that can
> allow it to coexist with the existing "wrong" libraries? If so, I'm
> all +1 on it.

Combining old libraries with new ones:

It would be simple to combine a library that follows
the old policy with libraries that follow the new one,
given a few conversion helpers:

namespace boost {
   std::string utf8_to_ansi(std::string const &s);
   std::string ansi_to_utf8(std::string const &s);
   std::wstring utf8_to_wide(std::string const &s);
   std::string wide_to_utf8(std::wstring const &s);
}

- If it supports wide strings, call boost::utf8_to_wide
  **under Windows** and nothing is lost.

- If it supports only narrow strings:

  a) If it is encoding agnostic, like a unit test
     that only opens files with ASCII names,
     then you can safely ignore the issue and pass UTF-8
     strings as ASCII and ASCII strings as UTF-8, since
     ASCII is a subset of UTF-8.

  b) Otherwise, do the following:

     1. File a bug with the library owner about the lack
        of Unicode string support under Windows.

     2. Use utf8_to_ansi/ansi_to_utf8 to pass strings
        to this library under Windows.
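
To make the helpers above concrete, here is a minimal, portable sketch of utf8_to_wide and wide_to_utf8. The two signatures are taken from the declarations above; everything else is an assumption for illustration: the namespace name `sketch`, the choice of std::invalid_argument for malformed input, and the decision to emit UTF-16 when wchar_t is 16 bits wide and UTF-32 otherwise. It does not reject overlong encodings, so it is not a validating decoder.

```cpp
#include <stdexcept>
#include <string>

namespace sketch { // hypothetical namespace, not part of Boost

// Decode UTF-8 into wchar_t units (UTF-16 if wchar_t is 16-bit, else UTF-32).
inline std::wstring utf8_to_wide(std::string const &s)
{
    std::wstring out;
    std::string::size_type i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i++]);
        unsigned long cp;
        int trail; // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        trail = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trail = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trail = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trail = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        for (; trail > 0; --trail) {
            if (i >= s.size())
                throw std::invalid_argument("truncated UTF-8 sequence");
            unsigned char c = static_cast<unsigned char>(s[i++]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // Outside the BMP: split into a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Encode wchar_t units back into UTF-8.
inline std::string wide_to_utf8(std::wstring const &w)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < w.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(w[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the low surrogate that must follow.
            if (i + 1 >= w.size())
                throw std::invalid_argument("unpaired surrogate");
            unsigned long lo = static_cast<unsigned long>(w[++i]);
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::invalid_argument("unpaired surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

} // namespace sketch
```

A real implementation under Windows would more likely delegate to the platform conversion functions; the hand-written loop is used here only so that the sketch compiles everywhere.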

Current State of Using Wide/ANSI API in Boost:

I did a small search to find which libraries use which API:

The following libraries use both types of API:


According to the new policy, they should replace the
ANSI API with the wide API plus conversion between UTF-8 and UTF-16.

The following libraries use only the ANSI API:


They should replace their ANSI API with the wide one,
using simple glue built on utf8_to_wide/wide_to_utf8.

The following libraries use STL functions that are not aware of Unicode under Windows:


- Serialization
- Graph
- wave
- datetime
- property_tree
- program_options


- gil
- spirit
- python
- regex

These need to be replaced with something like:




that work with UTF-8 under Windows.
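
As an illustration of what such a replacement could look like, here is a minimal sketch of a file-opening wrapper that accepts a UTF-8 file name on every platform. The names `sketch::utf8_fopen` and `sketch::to_wide` are hypothetical and are not existing Boost functions. Under Windows the sketch converts the name with MultiByteToWideChar and calls _wfopen; elsewhere it assumes the narrow API already accepts UTF-8 and forwards to std::fopen.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#  include <wchar.h>
#  include <windows.h>
#endif

namespace sketch { // hypothetical namespace, not part of Boost

#ifdef _WIN32
// Convert a NUL-terminated UTF-8 string to UTF-16 with the Windows API.
// Returns an empty string if the conversion fails.
inline std::wstring to_wide(char const *utf8)
{
    // Passing -1 means "input is NUL-terminated"; the returned count
    // then includes the terminating NUL.
    int n = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (n <= 0)
        return std::wstring();
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, &w[0], n);
    w.resize(static_cast<std::size_t>(n - 1)); // drop the terminator
    return w;
}
#endif

// Open a file whose name is encoded as UTF-8, regardless of platform.
inline std::FILE *utf8_fopen(char const *name, char const *mode)
{
#ifdef _WIN32
    std::wstring wname = to_wide(name);
    std::wstring wmode = to_wide(mode);
    if (wname.empty() || wmode.empty())
        return NULL; // conversion failed, or empty name/mode
    return _wfopen(wname.c_str(), wmode.c_str());
#else
    // Assumption: the narrow API on this platform takes UTF-8 as is.
    return std::fopen(name, mode);
#endif
}

} // namespace sketch
```

A caller would then write sketch::utf8_fopen(name, "rb") wherever it previously called fopen, with no platform-specific code at the call site. A stream-based equivalent would follow the same pattern.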

The rest of the libraries seem to be encoding agnostic.


