From: Jonathan Turkanis (technews_at_[hidden])
Date: 2005-01-06 22:33:36


Beman Dawes wrote:

> > hm...why not remove the dependency of std::basic_string altogether and
> > make it a template parameter.
>
> Jonathan Turkanis' original comment was:
>
> > (One thing I don't understand is why the character type of wbuffer_convert
> > is allowed to be specified as the second template argument. It seems to
> > me that the character type should always be equal to Codevt::intern_type.)
>
> But I think that you are closer to the real problem with the
> proposal; the full string type rather than just the character type
> should be a template parameter. That allows any std::basic_string to
> be used.

I was talking about wbuffer_convert; at the time I hadn't looked at
wstring_convert very closely. Since then I have started to factor the code
conversion routines out of the iostreams library to make them more useful for
string conversion. I haven't worked on it much since I finished the iostreams
revision, but I was leaning toward an interface something like this for string
conversion:

    template<typename Codecvt = use_default>
    struct string_converter { // Nice name ;-)
        // typedefs

        template<typename InIt, typename OutIt>
        OutIt narrow(InIt first, InIt last, OutIt dest);

        template<typename InIt, typename OutIt>
        OutIt widen(InIt first, InIt last, OutIt dest);

        // Convenience functions:

        template<typename WideStr> // Version of Thorsten's suggestion
        basic_string<typename Codecvt::extern_type>
        narrow(const WideStr&);

        template<typename NarrowStr> // Version of Thorsten's suggestion
        basic_string<typename Codecvt::intern_type>
        widen(const NarrowStr&);
    };

    // Convenience functions:

    template<typename InIt, typename OutIt>
    OutIt narrow(InIt first, InIt last, OutIt dest)
    {
          string_converter<> cvt;
          return cvt.narrow(first, last, dest);
    }

    template<typename InIt, typename OutIt>
    OutIt widen(InIt first, InIt last, OutIt dest)
    {
          string_converter<> cvt;
          return cvt.widen(first, last, dest);
    }

    template<typename WideStr>
    basic_string<char> // extern_type of the default codecvt
    narrow(const WideStr& str)
    {
          string_converter<> cvt;
          return cvt.narrow(str);
    }

    template<typename NarrowStr>
    basic_string<wchar_t> // intern_type of the default codecvt
    widen(const NarrowStr& str)
    {
          string_converter<> cvt;
          return cvt.widen(str);
    }

Remarks:

1. The names 'narrow' and 'widen' could be confused with the ctype members of
the same names, which do not perform code conversion, but I like them better
than 'to_bytes' and 'from_bytes' (since extern_type may not represent a byte)
and 'wide_to_multi_char' and 'multi_char_to_wide' (too long).

2. The narrow and widen overloads which take iterators have the same signature
as std::copy.

3. If no Codecvt template parameter is specified, an instance of
codecvt<wchar_t, char, mbstate_t> is fetched from the global locale. The
non-member versions of narrow and widen use this option.

4. Thorsten asks why the widening and narrowing functions shouldn't be
non-member functions. One answer is that code conversion can be (slightly) more
efficient if a large buffer is used. Making the core conversion functions member
functions allows buffers to be used for several string conversions. A second
answer is that it's a bit awkward to specify a codecvt in a non-member function:

       narrow< utf8_codecvt_facet<wchar_t> >
            (str.begin(), str.end(), back_inserter(dest));

or

       narrow( str.begin(), str.end(), back_inserter(dest),
           utf8_codecvt_facet<wchar_t>() );

When a non-default codecvt is being used, I think it's reasonable to ask people
to use a member function, to keep the non-member usage simple.
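
To make remarks 3 and 4 concrete, here is a minimal sketch of how an
iterator-based widen member might fetch the default codecvt from a locale and
reuse its buffers across calls. It is not the code from the post: the class
name converter_sketch, the 256-element output buffer, staging the whole input
in a std::string, and reporting failures with std::runtime_error are all
assumptions made for illustration.

```cpp
#include <cwchar>
#include <iterator>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical illustration only; the name and layout are not from the post.
class converter_sketch {
public:
    typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;

    // By default, use a copy of the global locale (remark 3).
    explicit converter_sketch(const std::locale& loc = std::locale())
        : loc_(loc), out_(256) {}

    // Same shape as std::copy: reads chars from [first, last) and writes
    // wchar_t values to dest.
    template<typename InIt, typename OutIt>
    OutIt widen(InIt first, InIt last, OutIt dest)
    {
        const codecvt_type& cvt = std::use_facet<codecvt_type>(loc_);

        // codecvt::in needs contiguous input, so stage it in a member
        // string whose capacity is reused by later calls (remark 4).
        in_.assign(first, last);

        std::mbstate_t state = std::mbstate_t();
        const char* from = in_.data();
        const char* const from_end = from + in_.size();

        while (from != from_end) {
            const char* from_next = from;
            wchar_t* const to = &out_[0];
            wchar_t* to_next = to;

            std::codecvt_base::result r =
                cvt.in(state, from, from_end, from_next,
                       to, to + out_.size(), to_next);

            if (r == std::codecvt_base::error)
                throw std::runtime_error("code conversion failed");
            if (r == std::codecvt_base::noconv)
                return std::copy(from, from_end, dest); // facet is a no-op

            // Flush whatever was converted in this round.
            dest = std::copy(to, to_next, dest);

            // No progress means the input ends in an incomplete sequence.
            if (from_next == from && to_next == to)
                throw std::runtime_error("incomplete multibyte sequence");
            from = from_next;
        }
        return dest;
    }

private:
    std::locale          loc_;
    std::string          in_;   // staging buffer for the narrow input
    std::vector<wchar_t> out_;  // fixed-size conversion buffer
};
```

With a converter object cvt of this type, a call such as
cvt.widen(s.begin(), s.end(), std::back_inserter(w)) appends the widened
characters of a std::string s to a std::wstring w, and a second call on the
same object reuses both buffers. A non-member widen would have to set these
buffers up on every call.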

> --Beman

Jonathan

