Subject: Re: [boost] [unicode] Interest Check / Proof of Concept
From: James Porter (porterj_at_[hidden])
Date: 2008-11-19 19:29:38


Eric Niebler wrote:
> Agree. Thanks Zach. I'm discouraged that every time the issue of a
> Unicode library comes up, the discussion immediately descends into a
> debate about how to design yet another string class. Such a high level
> wrapper *might* be useful (strong emphasis on "might"), but the core
> must be the Unicode algorithms, and the design for a Unicode library
> must start there.

Since it seems like there's a lot of concern with making a new string
type, how about the following (off-the-cuff):

* Iterator filters a la Zach's message:

        typedef std::basic_string<char16_t> utf16_string;

        utf16_string u_string = /*...*/;
        std::string std_string = /*...*/;

        typedef boost::recoding_iterator<boost::utf16, boost::utf8>
                utf16_to_utf8_iter;
        std::copy(utf16_to_utf8_iter(u_string.begin()),
                utf16_to_utf8_iter(u_string.end()),
                std::back_inserter(std_string));

* Runtime-defined filters:

        typedef boost::recoding_iterator<boost::utf16,boost::runtime>
                utf16_to_any_iter;
        boost::runtime *my_codec = /*...*/;
        std::copy(utf16_to_any_iter(u_string.begin(), my_codec),
                utf16_to_any_iter(u_string.end(), my_codec),
                std::back_inserter(std_string));

* Shorthand for the above two points:

        boost::transcode(u_string, boost::utf16(),
                std_string, boost::utf8());

* String views that can wrap up the encoding type and the data (a
container of some kind: strings, vector<char>s, ropes, etc):

        boost::estring_view<boost::utf8> my_utf8_string(std_string);
        boost::estring_view<> my_rt_string(str, my_codec);

        boost::transcode(my_utf8_string, my_rt_string);
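
To make the iterator-filter and transcode ideas above a bit more concrete, here is a minimal sketch in plain standard C++ of the work a UTF-16 to UTF-8 pass would have to do. None of the names below (encode_utf8, transcode_utf16_to_utf8) are Boost APIs; they are hypothetical helpers for illustration only, and the choice to replace unpaired surrogates with U+FFFD is an assumption rather than anything proposed in this thread.

```cpp
#include <iterator>
#include <string>

// Hypothetical helper (not a Boost API): write one code point as UTF-8.
template <typename OutputIt>
OutputIt encode_utf8(char32_t cp, OutputIt out) {
    if (cp < 0x80) {
        *out++ = static_cast<char>(cp);
    } else if (cp < 0x800) {
        *out++ = static_cast<char>(0xC0 | (cp >> 6));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *out++ = static_cast<char>(0xE0 | (cp >> 12));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        *out++ = static_cast<char>(0xF0 | (cp >> 18));
        *out++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        *out++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        *out++ = static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper (not a Boost API): decode UTF-16 code units from
// [first, last) and write the UTF-8 encoding to out.
template <typename InputIt, typename OutputIt>
OutputIt transcode_utf16_to_utf8(InputIt first, InputIt last, OutputIt out) {
    while (first != last) {
        char32_t cp = static_cast<char16_t>(*first++);
        // A high surrogate followed by a low surrogate forms one code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && first != last) {
            char32_t low = static_cast<char16_t>(*first);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++first;
            }
        }
        // Assumption: any surrogate still unpaired here becomes U+FFFD.
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;
        }
        out = encode_utf8(cp, out);
    }
    return out;
}
```

With this sketch, the std::copy call in the first bullet would correspond roughly to transcode_utf16_to_utf8(u_string.begin(), u_string.end(), std::back_inserter(std_string)). A real recoding_iterator would do the same decoding lazily, one code unit per increment, instead of in a single loop.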

Luckily, most of the work I've done is in making the encoding facets
extensible and choosable at runtime, so I wouldn't mourn the loss of my
(frankly none-too-zazzy) string class.
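
For readers unfamiliar with the idea of an encoding chosen at runtime, the following is a rough sketch of one way it could look. It is not the facet design mentioned above, whose details are not given in this message; the codec interface, the two concrete codecs, and the make_codec lookup are all hypothetical names, and replacing unrepresentable Latin-1 characters with '?' is an assumption.

```cpp
#include <memory>
#include <string>

// Hypothetical interface: an encoder whose concrete type is picked at runtime.
struct codec {
    virtual ~codec() {}
    // Append the encoded form of one code point to out.
    virtual void encode(char32_t cp, std::string& out) const = 0;
};

struct latin1_codec : codec {
    void encode(char32_t cp, std::string& out) const override {
        // Assumption: code points outside Latin-1 are replaced with '?'.
        out += (cp <= 0xFF) ? static_cast<char>(cp) : '?';
    }
};

struct utf8_codec : codec {
    void encode(char32_t cp, std::string& out) const override {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
};

// Hypothetical lookup: choose a codec by name; returns null if unknown.
inline std::unique_ptr<codec> make_codec(const std::string& name) {
    if (name == "latin1") return std::unique_ptr<codec>(new latin1_codec);
    if (name == "utf8") return std::unique_ptr<codec>(new utf8_codec);
    return std::unique_ptr<codec>();
}

// Encode a whole sequence of code points with whichever codec was chosen.
inline std::string encode_all(const std::u32string& text, const codec& c) {
    std::string out;
    for (char32_t cp : text) {
        c.encode(cp, out);
    }
    return out;
}
```

The point of the virtual interface is that the caller only needs a name (from a config file, an HTTP header, and so on) to pick the encoding, at the cost of one indirect call per code point compared with the compile-time tags in the earlier bullets.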

- Jim

