Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Soares Chen Ruo Fei (crf_at_[hidden])
Date: 2011-08-09 06:45:03


2011/8/9 Klaim - Joël Lamotte wrote:
> I'm reading the documentation and I must say it's very clear and easy to
> understand (at least for someone who did follow the recent discussions about
> the subject on this mailing list).

Thanks for the feedback. It encourages and motivates me to keep
improving this library. :)

> Minor error : missing '>' in
> http://crf.scriptmatrix.net/ustr/ustr/unicode_string_adapter.html (String
> Concatenation)
>
> unicode_string_adapter< std::vector< char16_t > second_string = USTR("你好");

Ah, I see I missed that. Thanks for noticing; I've fixed it on my website.

> Also, in the same section, maybe adding informations about potential stream
> operators (I don't see them so far) might help?

Currently Boost.Ustr provides only limited support for I/O. Because of
portability issues, I think it is very hard for Boost.Ustr alone to
solve problems such as printing Unicode strings to the screen. In
fact, I have no idea how to print even raw Unicode strings to the
Windows terminal. (Does any Windows expert know how to solve this?)

My current solution is to rely on the raw string class to provide the
actual I/O operations. For example, it is possible to print a
`unicode_string_adapter<std::string>` by passing the const reference
to the raw string, obtained via operator *(), to std::cout. I've also
implemented a convenience function that automatically exposes the
internal raw string when the string adapter is passed to std::cout
through operator <<().
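
To make the idea concrete, here is a minimal sketch of the two access
paths. It assumes nothing about the real implementation:
`adapter_sketch` and `render` are hypothetical stand-ins written for
this example, not the actual `unicode_string_adapter` class.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for unicode_string_adapter<std::string>.
// This is NOT the real Boost.Ustr class, only an illustration.
class adapter_sketch {
public:
    explicit adapter_sketch(std::string raw) : raw_(std::move(raw)) {}

    // operator*() exposes the underlying raw string as a const reference,
    // so callers can hand it to any API that already accepts std::string.
    const std::string& operator*() const { return raw_; }

private:
    std::string raw_;
};

// Convenience overload: forwards the raw string to the stream unchanged.
// No transcoding happens here; the bytes are written as stored.
inline std::ostream& operator<<(std::ostream& os, const adapter_sketch& s) {
    return os << *s;
}

// Demonstration helper (hypothetical): formats the adapter via operator<<.
inline std::string render(const adapter_sketch& s) {
    std::ostringstream out;
    out << s;
    assert(out.good());  // writing to a string stream should not fail
    return out.str();
}
```

With this sketch, `std::cout << *str` and `std::cout << str` write the
same bytes. Whether they display correctly still depends on the
terminal, which is exactly the portability problem described above.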

On the other hand, I haven't given much thought to input streams.
`unicode_string_adapter_builder` already has code point and code unit
output iterators, so I think it shouldn't be too hard to perform input
stream operations through these output iterators. However, I didn't
provide operator *() for the mutable builder class, because I think
exposing the raw string class in the string adapter builder would
re-enable read operations on the mutable string.
(`unicode_string_adapter_builder` purposely forbids read operations to
discourage programmers from reading and writing strings at the same
time.)
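
A rough sketch of how input could flow through an output iterator
while the builder stays write-only. All names here (`builder_sketch`,
`code_unit_out`, `release`, `scan_all`, `scan_from_text`) are
hypothetical and invented for this example; the real
`unicode_string_adapter_builder` interface may differ. The sketch
copies raw bytes and does not validate the encoding.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical write-only builder; NOT the real
// unicode_string_adapter_builder, only an illustration of the idea.
class builder_sketch {
public:
    using code_unit_output_iterator = std::back_insert_iterator<std::string>;

    // The only way in: an output iterator that appends code units.
    code_unit_output_iterator code_unit_out() {
        return std::back_inserter(buffer_);
    }

    // The only way out: hand over the finished string and reset the
    // builder. There is deliberately no operator*() and no read access.
    std::string release() {
        std::string result = std::move(buffer_);
        buffer_.clear();
        return result;
    }

private:
    std::string buffer_;
};

// Copies every byte of the stream into the builder via the output iterator.
inline void scan_all(std::istream& in, builder_sketch& b) {
    std::copy(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>(),
              b.code_unit_out());
}

// Demonstration helper: scans a string through an input stream.
inline std::string scan_from_text(const std::string& text) {
    std::istringstream in(text);
    builder_sketch b;
    scan_all(in, b);
    std::string result = b.release();
    assert(result.size() == text.size());  // every byte was copied
    return result;
}
```

Using `std::istreambuf_iterator` rather than `operator>>` keeps
whitespace intact, so the scanned string matches the input byte for
byte.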

My conclusion is that instead of trying to make
`unicode_string_adapter` work with the old iostream libraries, we
could leverage the encoding-agnostic API of Boost.Ustr to implement a
truly portable I/O library for Unicode strings. For example, a
`print(str)` function that accepts a `unicode_string_adapter`, or a
`scan(mstr)` function that accepts a `unicode_string_adapter_builder`,
could correctly print or scan Unicode strings of any encoding,
regardless of the actual encoding that the system uses. (No more pain
of choosing between the wide and non-wide versions of the functions.)
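
As an illustration of what such a `print` might look like, here is a
minimal sketch that works on plain `std::u16string` and
`std::u32string` instead of the adapter. It assumes well-formed input
and a UTF-8 output device. The helpers `append_utf8` and `code_points`
are hypothetical names made up for this example, and the `print`
signature takes an explicit stream, which is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Appends one code point to `out` as UTF-8 (assumes a valid scalar value).
inline void append_utf8(std::string& out, char32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// UTF-32: every code unit is already a code point.
inline std::u32string code_points(const std::u32string& s) { return s; }

// UTF-16: combine surrogate pairs (assumes well-formed input).
inline std::u32string code_points(const std::u16string& s) {
    std::u32string result;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t unit = s[i];
        if (unit >= 0xD800 && unit <= 0xDBFF && i + 1 < s.size()) {
            char32_t low = s[++i];
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
        }
        result += unit;
    }
    return result;
}

// Hypothetical print(): the caller writes the same call for any source
// encoding; the output is always UTF-8 in this sketch.
template <typename String>
void print(std::ostream& os, const String& s) {
    std::string utf8;
    for (char32_t cp : code_points(s)) append_utf8(utf8, cp);
    os << utf8;
}
```

The caller never chooses between a wide and a non-wide function:
`print(std::cout, u"...")` and `print(std::cout, U"...")` go through
the same template. A real library would pick the output encoding per
platform rather than hard-coding UTF-8.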

That said, I should admit that I am not familiar with the C++
iostream library, as I find its design too complex and I personally
don't like it. My bias might be wrong, and I'm willing to add more
functionality to Boost.Ustr to work with iostream if there are
actually simple ways to do it.

> if you have the chance
>> to go back in time and restart your project from scratch, are you
>> willing to use Boost.Ustr in your library APIs and do you think that
>> it could have solved the Unicode problems you're having right now?
>
>
> From the documentation, that library seems to solve the problem as  by
> several boost authors. However, real-world usage and experimentation would
> help figuring out if there is any important flaw in the design.
>
> As a boost user, I find it interesting even for my non-library projects, but
> I wouldn't expect all boost libraries to expose encoded strings in their
> interfaces. Maybe just Boost.Locale and Boost.FileSystem.
>
> If I can find time in the coming weeks, I'll try to use it in a prototype
> and provide feedbacks.

Thanks. I'm looking forward to hearing your feedback. By the way, do
feel free to email me if you run into any problems compiling the code,
as I have not yet thoroughly tested it on all platforms, though it
should work on at least Ubuntu Linux, Mac OS X, and Windows 7. You
might also want to add the C++11 flag to your bjam command line
options, as it is not enabled by default.

cheers,

Soares

