From: Beman Dawes (bdawes_at_[hidden])
Date: 2005-01-05 11:02:25


Toward the end of a thread with the subject "std::string <-> std::wstring
conversion" there was some discussion of how the C++ committee N1683
proposal could be improved. I volunteered to write up our discussions.

See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1683.html for
a copy of the proposal.

Here is a draft of what I have written so far. Comments and improvements
welcome.

--Beman

Critique of Code Conversion Proposal (N1683)
--------------------------------------------

N1683==04-0123, Proposed Library Additions for Code Conversion, proposes
sorely needed code conversion facilities for the standard library. (See
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1683.html) Without
these facilities, programmers concerned with internationalization are forced
to reinvent the wheel; Boost has run into that problem two or three times
in existing libraries, and additional times in libraries currently in the
Boost pipeline. The proposal should be accepted by the LWG as a high
priority need.

That being said, there are several concerns described in this paper which
may indicate the proposal can be further refined and improved.

1. Hard-wired byte_string type in wstring_convert
-------------------------------------------------

The underlying wstring_convert design seems flexible enough to cope with
conversion between any two character types which meet std::basic_string
requirements. Conversion is actually performed by std::codecvt, which is
already parameterized by both internalT and externalT types. It seems
artificial to restrict wstring_convert::byte_string to
std::basic_string<char>. New character types such as the proposed char16_t
and char32_t will need conversions to and from other wide types, yet with
the current restriction wstring_convert could not be used for that purpose.

Suggested change: replace
     typedef std::basic_string<char> byte_string;
with:
     typedef std::basic_string<typename Codecvt::extern_type> byte_string;
and change from_bytes argument types accordingly.

If this suggested change is accepted, it will probably make sense to rename
some wstring_convert members.
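
As a rough illustration of the suggested typedef, here is a minimal sketch. The
class name wstring_convert_sketch and the stub facet utf16_utf32_codecvt_stub
are hypothetical and not part of N1683; only the typedefs under discussion are
shown, and the code assumes a compiler that already provides char16_t,
char32_t, and static_assert.

```cpp
#include <cwchar>
#include <locale>
#include <string>
#include <type_traits>

// Hypothetical skeleton of the converter; only the typedefs are shown.
template <class Codecvt, class Elem = wchar_t>
class wstring_convert_sketch {
public:
    // Suggested form: derive the external string type from the facet
    // instead of hard-wiring std::basic_string<char>.
    typedef std::basic_string<typename Codecvt::extern_type> byte_string;
    typedef std::basic_string<Elem> wide_string;
};

// Hypothetical stand-in for a facet whose external type is not char.
// It carries only the nested types and performs no conversion.
struct utf16_utf32_codecvt_stub {
    typedef char32_t intern_type;
    typedef char16_t extern_type;
};

typedef std::codecvt<wchar_t, char, std::mbstate_t> narrow_codecvt;

// With the usual wchar_t/char facet, nothing changes for existing users.
static_assert(
    std::is_same<wstring_convert_sketch<narrow_codecvt>::byte_string,
                 std::string>::value,
    "byte_string is still std::string for a char-based facet");

// With a non-char external type, byte_string follows the facet.
static_assert(
    std::is_same<
        wstring_convert_sketch<utf16_utf32_codecvt_stub, char32_t>::byte_string,
        std::basic_string<char16_t> >::value,
    "byte_string follows Codecvt::extern_type");
```

With the hard-wired typedef, the second instantiation would still yield
std::basic_string<char>, which is the restriction described above.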

2. wstring_convert template parameter Elem seems unneeded
---------------------------------------------------------

The wstring_convert template parameter Elem seems unneeded. Isn't it always
Codecvt::intern_type?

Suggested change: remove the Elem parameter and replace
     Elem
with
     Codecvt::intern_type
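
A minimal sketch of what removing Elem might look like follows. The name
converter_without_elem is hypothetical, and the sketch assumes the facet
exposes intern_type and extern_type as std::codecvt does.

```cpp
#include <cwchar>
#include <locale>
#include <string>
#include <type_traits>

// Hypothetical single-parameter form: both string types come from the facet,
// so a mismatched Elem can no longer be supplied by mistake.
template <class Codecvt>
class converter_without_elem {
public:
    typedef std::basic_string<typename Codecvt::intern_type> wide_string;
    typedef std::basic_string<typename Codecvt::extern_type> byte_string;
};

typedef converter_without_elem<std::codecvt<wchar_t, char, std::mbstate_t> >
    narrow_converter;

static_assert(std::is_same<narrow_converter::wide_string, std::wstring>::value,
              "wide_string is deduced from Codecvt::intern_type");
static_assert(std::is_same<narrow_converter::byte_string, std::string>::value,
              "byte_string is deduced from Codecvt::extern_type");
```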

3. Need target-argument form for wstring_convert conversion functions
---------------------------------------------------------------------

wstring_convert's conversion functions are in the form:

     byte_string to_bytes(const wide_string& wstr) const;

While this form is often useful and should be retained, it may imply an
extra copy of the result if a compiler is not smart enough to optimize the
copy away.

Suggested change: add functions of the form:

     void to_bytes(const wide_string& wstr, byte_string & target) const;
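
The two forms can be sketched as free functions built on std::codecvt::out.
This is only an illustration under stated assumptions: N1683 specifies member
functions, so the free-function signatures, the explicit facet argument, the
64-element buffer, and the helper demo_to_bytes are all hypothetical. Error
handling is reduced to throwing std::range_error, and shift-state termination
(unshift) is omitted.

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Target-argument form: the caller supplies the string to fill, so its
// existing capacity can be reused and no result copy is implied.
template <class Codecvt>
void to_bytes(const Codecvt& cvt,
              const std::basic_string<typename Codecvt::intern_type>& wstr,
              std::basic_string<typename Codecvt::extern_type>& target)
{
    typedef typename Codecvt::intern_type intern_type;
    typedef typename Codecvt::extern_type extern_type;

    target.clear();  // keeps the capacity already allocated by the caller
    typename Codecvt::state_type state = typename Codecvt::state_type();
    const intern_type* from = wstr.data();
    const intern_type* const from_end = from + wstr.size();
    extern_type buf[64];

    while (from != from_end) {
        const intern_type* from_next = from;
        extern_type* to_next = buf;
        const std::codecvt_base::result r =
            cvt.out(state, from, from_end, from_next, buf, buf + 64, to_next);

        if (r == std::codecvt_base::noconv) {
            // The facet performs no conversion; copy the input unchanged.
            target.append(from, from_end);
            return;
        }
        // Treat an explicit error, or a call that made no progress, as failure.
        if (r == std::codecvt_base::error ||
            (from_next == from && to_next == buf)) {
            throw std::range_error("to_bytes: conversion failed");
        }
        target.append(buf, to_next);  // flush the converted chunk
        from = from_next;             // continue after the consumed input
    }
}

// Value-returning form, retained and implemented via the target form.
template <class Codecvt>
std::basic_string<typename Codecvt::extern_type>
to_bytes(const Codecvt& cvt,
         const std::basic_string<typename Codecvt::intern_type>& wstr)
{
    std::basic_string<typename Codecvt::extern_type> result;
    to_bytes(cvt, wstr, result);
    return result;
}

// Hypothetical usage helper: converts an ASCII-only wide string with the
// classic "C" locale facet, using both forms.
inline std::string demo_to_bytes()
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> narrow_codecvt;
    const narrow_codecvt& cvt =
        std::use_facet<narrow_codecvt>(std::locale::classic());

    std::string target;
    to_bytes(cvt, std::wstring(L"Boost"), target);
    assert(target == to_bytes(cvt, std::wstring(L"Boost")));
    return target;
}
```

A caller converting many strings in a loop could pass the same target each
time, which is the case where the target-argument form is most likely to help.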

4. More explicit name for wstring_convert
-----------------------------------------

"wstring" might be misleading, depending on the actual types involved.
"convert" is a verb, yet nouns make better class names.

Suggested change:
     wstring_convert
to:
     string_converter

5. Standardese needed
---------------------

The proposal needs improved standardese. For example, the requirements on
the template parameters need to be specified and the function descriptions
converted to canonical form.

6. Comparable changes need to be made for wbuffer_convert
---------------------------------------------------------

Any of the above changes which are accepted need to be folded into
wbuffer_convert.

Acknowledgements
----------------

This critique is based on discussions with Thorsten Ottosen, Stefan
Slapeta, and Jonathan Turkanis.

Revised: 05 January 2005

