Subject: Re: [boost] [strings][unicode] Proposals for Improved String Interoperability in a Unicode World
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2012-01-28 20:12:34


On 01/28/2012 05:46 PM, Beman Dawes wrote:
> Beman.github.com/string-interoperability/interop_white_paper.html
> describes Boost components intended to ease string interoperability in
> general and Unicode string interoperability in particular.
>
> These proposals are the Boost version of the TR2 proposals made in
> N3336, Adapting Standard Library Strings and I/O to a Unicode World.
> See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3336.html.
>
> I'm very interested in hearing comments about either the Boost or the
> TR2 proposal. Are these useful additions? Is there a better way to
> achieve the same easy interoperability goals?

I think you should consider the points being made in N3334.
While that proposal is in my opinion not good enough, it raises an
important issue that is often present with std::string-based or similar
designs.

A function that takes a std::string, or a boost::filesystem::path for
that matter, necessarily forces the caller to copy the data into a
(typically heap-allocated) buffer whenever it does not already hold an
object of that type, even if there is no need to.

Use of the range concept would solve that issue, but then that requires
making the function a template. A type-erased range would be possible,
but that has significant performance overhead.
A string_ref or path_ref is maybe the lesser evil.
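
To illustrate the trade-off, here is a minimal sketch. The string_ref
class below is hypothetical and much smaller than what N3334 proposes;
the function names are made up for the example. It only shows that a
non-owning view lets a non-template function accept literals, buffers
and std::string without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view over a contiguous character sequence.
// Illustration only; this is not the N3334 interface.
class string_ref {
public:
    string_ref(const char* s) : data_(s), size_(std::strlen(s)) {}
    string_ref(const char* s, std::size_t n) : data_(s), size_(n) {}
    string_ref(const std::string& s) : data_(s.data()), size_(s.size()) {}

    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }

private:
    const char* data_;   // not owned; must outlive the string_ref
    std::size_t size_;
};

// Taking const std::string& makes the caller build a temporary
// std::string when it only has a literal or a char buffer.
std::size_t count_slashes_copying(const std::string& s) {
    std::size_t n = 0;
    for (std::string::const_iterator p = s.begin(); p != s.end(); ++p)
        if (*p == '/') ++n;
    return n;
}

// Same logic, still a plain (non-template) function, but the argument
// is only a pointer and a length, so nothing is copied.
std::size_t count_slashes(string_ref s) {
    std::size_t n = 0;
    for (const char* p = s.begin(); p != s.end(); ++p)
        if (*p == '/') ++n;
    return n;
}
```

The cost is the usual one for views: the caller has to keep the
underlying buffer alive for as long as the string_ref is used.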

> Where is the best home for the Boost proposals? A separate library?
> Part of some existing library?
>
> Are these proposals orthogonal to the need for deeper Unicode
> functionality, such as Mathias Gaunard's Unicode components?

It seems all you really care about is having iterator adaptors that do
character set conversion, so that any range in any encoding can be
lazily converted to a particular Unicode encoding.
This has always been the goal of my library, which somewhat provides
that along with more advanced Unicode features. Those two things could
live separately though.
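
As a rough sketch of what such an adaptor looks like (this is neither
my library's interface nor the one in the white paper; the names
utf8_decode_iterator and decode_utf8 are made up, and the input is
assumed to be well-formed UTF-8 with no validation at all):

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical adaptor: walks UTF-8 code units through the wrapped
// iterator and yields one UTF-32 code point per dereference.
// Assumes well-formed input and a multi-pass (forward) underlying
// iterator, because operator* reads ahead on a copy.
template <class It>
class utf8_decode_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef char32_t value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const char32_t* pointer;
    typedef char32_t reference;

    explicit utf8_decode_iterator(It pos) : pos_(pos) {}

    // Decode the code point starting at the current position.
    char32_t operator*() const {
        It p = pos_;
        unsigned char lead = static_cast<unsigned char>(*p);
        int extra = trailing_units(lead);
        // Strip the length marker bits from the lead unit.
        char32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        for (int i = 0; i < extra; ++i) {
            ++p;
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3F);
        }
        return cp;
    }

    // Skip over every code unit of the current code point.
    utf8_decode_iterator& operator++() {
        int n = 1 + trailing_units(static_cast<unsigned char>(*pos_));
        std::advance(pos_, n);
        return *this;
    }
    utf8_decode_iterator operator++(int) {
        utf8_decode_iterator tmp(*this);
        ++*this;
        return tmp;
    }

    friend bool operator==(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return !(a == b);
    }

private:
    // Number of continuation units implied by a lead unit.
    static int trailing_units(unsigned char lead) {
        if (lead < 0x80) return 0;
        if (lead < 0xE0) return 1;
        if (lead < 0xF0) return 2;
        return 3;
    }

    It pos_;
};

// Usage: nothing is converted until the vector constructor pulls
// values through the adaptor.
std::vector<char32_t> decode_utf8(const std::string& s) {
    typedef utf8_decode_iterator<std::string::const_iterator> iter;
    return std::vector<char32_t>(iter(s.begin()), iter(s.end()));
}
```

A real adaptor would also have to deal with ill-formed sequences and
truncated input, which is where most of the complexity lies.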

For standardization, the problem with iterator adaptors is that they
cannot be as fast as free functions operating on pointers, unless the
optimizer is pretty darn good. The conversion algorithms are also fully
templated, so they cannot be compiled into the library binary.
Those are disadvantages compared to the mechanisms that exist today in
the standard.

By the way, you only have input iterator adaptors. In my library I've
implemented bidirectional iterator adaptors and output iterator adaptors.
You've only been considering input, but output can also be useful
depending on the situation.
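
For the output direction, a sketch could look like the following.
Again the names (utf8_encode_iterator, encode_utf8) are hypothetical
and not taken from my library, and the code assumes every value
written is a valid code point (no surrogates, nothing above 0x10FFFF).

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical output adaptor: every UTF-32 code point assigned
// through it is encoded as 1 to 4 UTF-8 code units, which are written
// to the wrapped output iterator. No validation is performed.
template <class Out>
class utf8_encode_iterator {
public:
    typedef std::output_iterator_tag iterator_category;
    typedef void value_type;
    typedef void difference_type;
    typedef void pointer;
    typedef void reference;

    explicit utf8_encode_iterator(Out out) : out_(out) {}

    // "*it = cp" ends up here because operator* returns *this.
    utf8_encode_iterator& operator=(char32_t cp) {
        if (cp < 0x80) {
            put(cp);
        } else if (cp < 0x800) {
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
        return *this;
    }

    // Usual output-iterator boilerplate: dereference and increment
    // are no-ops, all the work happens on assignment.
    utf8_encode_iterator& operator*() { return *this; }
    utf8_encode_iterator& operator++() { return *this; }
    utf8_encode_iterator& operator++(int) { return *this; }

    Out base() const { return out_; }

private:
    // Write one code unit to the wrapped iterator.
    void put(char32_t unit) {
        *out_ = static_cast<char>(static_cast<unsigned char>(unit));
        ++out_;
    }

    Out out_;
};

// Usage: any algorithm that writes code points can target UTF-8
// storage directly.
std::string encode_utf8(const std::u32string& s) {
    std::string out;
    typedef std::back_insert_iterator<std::string> sink;
    std::copy(s.begin(), s.end(),
              utf8_encode_iterator<sink>(std::back_inserter(out)));
    return out;
}
```

The output form is convenient when the producer is an algorithm that
pushes code points, since no intermediate UTF-32 buffer is needed.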

