From: Jonathan Turkanis (technews_at_[hidden])
Date: 2005-01-06 22:33:36
Beman Dawes wrote:
> > hm...why not remove the dependency of std::basic_string altogether and
> > make it a template parameter.
>
> Jonathan Turkanis' original comment was:
>
> > (One thing I don't understand is why the character type of wbuffer_convert
> > is allowed to be specified as the second template argument. It seems to
> > me that the character type should always be equal to Codevt::intern_type.)
>
> But I think that you are closer to the real problem with the
> proposal; the full string type rather than just the character type
> should be a template parameter. That allows any std::basic_string to
> be used.
I was talking about wbuffer_convert; at the time I hadn't looked at
wstring_convert very closely. Since then I started to factor the code conversion
routines out of the iostreams library to make them more useful for string
conversion. I haven't worked on it much since I finished the iostreams revision,
but I was leaning toward an interface something like this for string conversion:
    template<typename Codecvt = use_default>
    struct string_converter { // Nice name ;-)
        // typedefs

        template<typename InIt, typename OutIt>
        OutIt narrow(InIt first, InIt last, OutIt dest);

        template<typename InIt, typename OutIt>
        OutIt widen(InIt first, InIt last, OutIt dest);

        // Convenience functions:

        template<typename WideStr> // Version of Thorsten's suggestion
        basic_string<typename Codecvt::extern_type>
        narrow(const WideStr&);

        template<typename NarrowStr> // Version of Thorsten's suggestion
        basic_string<typename Codecvt::intern_type>
        widen(const NarrowStr&);
    };
    // Convenience functions (non-members; these always use the default
    // codecvt<wchar_t, char, mbstate_t>, see remark 3):

    template<typename InIt, typename OutIt>
    OutIt narrow(InIt first, InIt last, OutIt dest)
    {
        string_converter<> cvt;
        return cvt.narrow(first, last, dest);
    }

    template<typename InIt, typename OutIt>
    OutIt widen(InIt first, InIt last, OutIt dest)
    {
        string_converter<> cvt;
        return cvt.widen(first, last, dest);
    }

    template<typename WideStr>
    basic_string<char> // extern_type of the default codecvt
    narrow(const WideStr& str)
    {
        string_converter<> cvt;
        return cvt.narrow(str);
    }

    template<typename NarrowStr>
    basic_string<wchar_t> // intern_type of the default codecvt
    widen(const NarrowStr& str)
    {
        string_converter<> cvt;
        return cvt.widen(str);
    }
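As a rough illustration of how the iterator-based members might be implemented, here is a minimal sketch. It is not the actual code: the class name string_converter_sketch, the fixed 64-element buffer, and the error handling are assumptions made for this example, and the codecvt type is hard-wired to codecvt<wchar_t, char, mbstate_t> rather than being a template parameter. State-dependent encodings would also need a call to unshift(), which is omitted.

```cpp
#include <algorithm>   // std::copy
#include <cassert>
#include <cwchar>      // std::mbstate_t
#include <iterator>    // std::back_inserter
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical sketch: only the iterator overloads, default codecvt only.
class string_converter_sketch {
public:
    typedef std::codecvt<wchar_t, char, std::mbstate_t> codecvt_type;

    // By default the facet is fetched from the global locale.
    explicit string_converter_sketch(const std::locale& loc = std::locale())
        : loc_(loc), cvt_(&std::use_facet<codecvt_type>(loc_)) {}

    // Wide -> narrow; same shape as std::copy.
    template<typename InIt, typename OutIt>
    OutIt narrow(InIt first, InIt last, OutIt dest) const {
        const std::wstring in(first, last);  // codecvt wants contiguous input
        std::mbstate_t state = std::mbstate_t();
        const wchar_t* from = in.data();
        const wchar_t* const from_end = from + in.size();
        char buf[buf_size];
        while (from != from_end) {
            const wchar_t* from_next = from;
            char* to_next = buf;
            std::codecvt_base::result r = cvt_->out(
                state, from, from_end, from_next, buf, buf + buf_size, to_next);
            // Treat an error, or a call that makes no progress, as failure.
            if (r == std::codecvt_base::error ||
                (from_next == from && to_next == buf))
                throw std::runtime_error("narrow: conversion failed");
            dest = std::copy(buf, to_next, dest);
            from = from_next;
        }
        return dest;
    }

    // Narrow -> wide; mirror image of narrow().
    template<typename InIt, typename OutIt>
    OutIt widen(InIt first, InIt last, OutIt dest) const {
        const std::string in(first, last);
        std::mbstate_t state = std::mbstate_t();
        const char* from = in.data();
        const char* const from_end = from + in.size();
        wchar_t buf[buf_size];
        while (from != from_end) {
            const char* from_next = from;
            wchar_t* to_next = buf;
            std::codecvt_base::result r = cvt_->in(
                state, from, from_end, from_next, buf, buf + buf_size, to_next);
            if (r == std::codecvt_base::error ||
                (from_next == from && to_next == buf))
                throw std::runtime_error("widen: conversion failed");
            dest = std::copy(buf, to_next, dest);
            from = from_next;
        }
        return dest;
    }

private:
    enum { buf_size = 64 };     // arbitrary size chosen for the sketch
    std::locale loc_;           // keeps the facet alive
    const codecvt_type* cvt_;
};
```

With this sketch, a call such as cvt.narrow(w.begin(), w.end(), back_inserter(s)) reads just like std::copy, and one converter object can be reused for several conversions.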
Remarks:
1. The names 'narrow' and 'widen' could be confused with the ctype members of the
same name, which do not perform code conversion, but I like them better than
'to_bytes' and 'from_bytes' (since extern_type may not represent a byte) and
'wide_to_multi_char' and 'multi_char_to_wide' (too long).
2. The narrow and widen overloads which take iterators have the same signature
as std::copy.
3. If no Codecvt template parameter is specified, an instance of
codecvt<wchar_t, char, mbstate_t> is fetched from the global locale. The
non-member versions of narrow and widen use this option.
4. Thorsten asks why the widening and narrowing functions shouldn't be
non-member functions. One answer is that code conversion can be (slightly) more
efficient if a large buffer is used. Making the core conversion functions member
functions allows buffers to be used for several string conversions. A second
answer is that it's a bit awkward to specify a codecvt in a non-member function:
    narrow< utf8_codecvt_facet<wchar_t> >
        (str.begin(), str.end(), back_inserter(dest));
or
    narrow( str.begin(), str.end(), back_inserter(dest),
            utf8_codecvt_facet<wchar_t>() );
When a non-default codecvt is being used, I think it's reasonable to ask people
to use a member function, to keep the non-member usage simple.
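To illustrate the naming concern in remark 1: the standard ctype members map one character to one character, with no multibyte sequences and no conversion state, which is why they are not code conversion. A small sketch follows; the helper names ctype_widen and ctype_narrow are made up for this example.

```cpp
#include <cassert>
#include <locale>

// Per-character widening via ctype<wchar_t>::widen: one char in, one wchar_t out.
inline wchar_t ctype_widen(char c,
                           const std::locale& loc = std::locale::classic())
{
    return std::use_facet< std::ctype<wchar_t> >(loc).widen(c);
}

// Per-character narrowing via ctype<wchar_t>::narrow: a wide character with
// no single-char equivalent is replaced by 'fallback' instead of being
// encoded as a multibyte sequence.
inline char ctype_narrow(wchar_t c, char fallback,
                         const std::locale& loc = std::locale::classic())
{
    return std::use_facet< std::ctype<wchar_t> >(loc).narrow(c, fallback);
}
```

A codecvt-based narrow, by contrast, may emit several extern_type elements for a single wide character, so its output length can differ from its input length.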
> --Beman
Jonathan