Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Joshua Boyce (raptorfactor_at_[hidden])
Date: 2011-01-15 07:14:02


On Sat, Jan 15, 2011 at 8:39 PM, Artyom <artyomtnk_at_[hidden]> wrote:

>
> Combining old libraries with new ones:
> ======================================
>
> It would be simple to combine a library that
> uses old policies with new ones.
>
> namespace boost {
> std::string utf8_to_ansi(std::string const &s);
> std::string ansi_to_utf8(std::string const &s);
> std::wstring utf8_to_wide(std::string const &s);
> std::string wide_to_utf8(std::wstring const &s);
> }
>
>
> - If it supports wide strings call boost::utf8_to_wide
> **under Windows platform** and nothing is lost.
>
> - If it supports only narrow strings:
>
> a) If it is encoding-agnostic, like some unit test
> that only opens files with ASCII names,
> then you can safely ignore the issue and pass UTF-8 strings
> as ASCII and ASCII as UTF-8, since ASCII is a subset of UTF-8.
>
> b) Otherwise, do the following:
>
> 1. File a bug with the library owner about the missing support
> for Unicode strings under Windows.
>
> 2. Use utf8_to_ansi/ansi_to_utf8 to pass strings
> to this library under Windows.
>
>
> Current State of Using Wide/ANSI API in Boost:
> ==============================================
>
> I did a quick search to find which libraries use which API:
>
> The following use both types of API:
> -------------------------------
>
> thread
> asio
> system
> iostreams
> regex
> filesystem
>
>
> According to the new policy, they should replace the
> ANSI API with the wide API plus conversion between UTF-8 and UTF-16.
>
> The following libraries use only the ANSI API
> --------------------------------------
>
> interprocess
> spirit
> test
> random
>
>
>
> They should replace their ANSI API with the wide one,
> using utf8_to_wide/wide_to_utf8 as simple glue.
>
> The following libraries use STL functions that are not Unicode-aware under Windows
> ----------------------------------------------------------------------------------
>
>
> std::fstream
>
> - Serialization
> - Graph
> - wave
> - datetime
> - property_tree
> - program_options
>
>
> fopen
>
> - gil
> - spirit
> - python
> - regex
>
>
>
> These need to be replaced with something like:
>
> boost::fstream
>
> and
>
> boost::fopen
>
> that work with UTF-8 under Windows.
>
>
> The rest of the libraries seem to be encoding-agnostic.
>
> Artyom

As far as I know, boost::filesystem::fstream uses a wide string under Windows
(assuming it can detect that you're using an STL implementation which has
wide-string overloads, i.e. Dinkumware). However, there's still the problem
that if you're using MinGW (or some other non-MSVC toolset that doesn't use a
recent Dinkumware STL implementation), it will fall back to a narrow string
and we're back where we started...
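
To make the fallback problem concrete, here is a rough sketch of how a UTF-8-aware fopen wrapper could avoid depending on the STL's wide-string overloads at all, by going through the C runtime instead. Everything in the `sketch` namespace is hypothetical (it is not an existing Boost API); the names simply mirror the utf8_to_wide and boost::fopen ideas quoted above. The decoder does only minimal validation, and it assumes wchar_t holds UTF-16 when it is 16 bits wide and UTF-32 otherwise.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#if defined(_WIN32)
#include <wchar.h>   // _wfopen (Microsoft C runtime extension)
#endif

namespace sketch {   // hypothetical namespace, not part of Boost

// Decode UTF-8 into wchar_t units: UTF-16 where wchar_t is 16 bits wide
// (Windows), UTF-32 elsewhere. Minimal validation only: overlong forms
// and encoded surrogates are not rejected.
inline std::wstring utf8_to_wide(std::string const &s)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char const lead = static_cast<unsigned char>(s[i]);
        unsigned long cp = 0;      // code point being assembled
        std::size_t extra = 0;     // continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (s.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char const c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of Unicode range");
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // Outside the BMP: emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Open a file whose name is given as UTF-8. On Windows the name is
// converted and handed to _wfopen; elsewhere the bytes are passed through
// unchanged, which assumes the platform treats narrow names as UTF-8.
inline std::FILE *fopen_utf8(std::string const &name, char const *mode)
{
#if defined(_WIN32)
    return _wfopen(utf8_to_wide(name).c_str(), utf8_to_wide(mode).c_str());
#else
    return std::fopen(name.c_str(), mode);
#endif
}

} // namespace sketch
```

A caller would write something like sketch::fopen_utf8(path, "rb") and use the returned FILE* as usual. Since _wfopen comes from the C runtime rather than from the STL, this route should not care which standard library implementation the toolset ships, though I have not verified that on every MinGW variant.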

