Subject: Re: [boost] [gsoc] unicode tools and an unicode string type
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-03-30 10:30:34


Beman Dawes wrote:
> On Sun, Mar 29, 2009 at 9:40 PM, Mathias Gaunard

>> - iterator adaptors to iterate sequences of code units, code points and
>> graphemes, and eventually more, from a sequence in UTF-8, UTF-16, UCS-2 or
>> UTF-32/UCS-4.
>
> What about conversion algorithms to conveniently generate these
> sequences in the first place?

I am not really interested in supporting arbitrary conversions between
charsets, since that is mostly about writing big charset-specific
look-up tables.
This could be done by a separate library.

Conversions between the different Unicode encodings as well as from
charsets that are included verbatim into Unicode (such as ISO-8859-1)
should probably be allowed, however.
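
As a rough illustration of why the ISO-8859-1 case needs no look-up table: each ISO-8859-1 byte has the same value as its Unicode code point, so converting to UTF-8 is pure bit manipulation. The sketch below is only that, a sketch; the name latin1_to_utf8 is hypothetical and not part of any proposed interface.

```cpp
#include <string>

// Hypothetical helper, for illustration only.
// Every ISO-8859-1 byte value equals its Unicode code point
// (U+0000..U+00FF), so no charset-specific table is required.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // U+0000..U+007F: a single UTF-8 code unit, unchanged.
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF: two code units, 110xxxxx 10xxxxxx.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

For example, the ISO-8859-1 byte 0xE9 (e with acute accent) would become the two UTF-8 code units 0xC3 0xA9.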

> From prior discussions, it seemed to me that there were actually needs
> for several unicode string types.
>
> * Specific UTF-8, UTF-16, UTF-*, string classes to be used within an
> application, when a particular Unicode string type and internal
> representation is the optimal choice.
>
> * A single utf_string that varies its internal representation at
> run-time. This is the choice for communication between third parties
> where not enough is known about the applications to choose a
> particular internal representation, or within an application when the
> application must cope with runtime changing needs.

Since I see integration with existing string types as important,
I was thinking of templating on the underlying string type:
unicode_string<std::string>, for example. The value type of the
underlying string type determines the encoding used (here, UTF-8).
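
One possible shape for this, as a minimal sketch: the encoding is deduced from the size of the underlying string's value type. Everything here other than the name unicode_string is an assumption made for illustration (the encoding enum, the encoding_for_size helper, the enc and base() members), and mapping by code unit size is just one way the deduction could be done.

```cpp
#include <cstddef>
#include <string>

// Hypothetical tag for the encoding form; not from the proposal.
enum encoding { utf8, utf16, utf32 };

// Hypothetical mapping from code unit size (in bytes) to encoding.
// This assumes 8-bit bytes; other sizes fail to compile.
template<std::size_t N> struct encoding_for_size;
template<> struct encoding_for_size<1> { static const encoding value = utf8; };
template<> struct encoding_for_size<2> { static const encoding value = utf16; };
template<> struct encoding_for_size<4> { static const encoding value = utf32; };

// Thin wrapper around an existing string type. The wrapped string
// is stored as-is; only the interpretation of its code units changes.
template<typename String>
class unicode_string
{
public:
    typedef typename String::value_type code_unit;

    // Encoding deduced from the underlying value type.
    static const encoding enc = encoding_for_size<sizeof(code_unit)>::value;

    explicit unicode_string(const String& s) : data_(s) {}

    // Access to the underlying string for interoperability.
    const String& base() const { return data_; }

private:
    String data_;
};
```

With this sketch, unicode_string<std::string>::enc would be utf8, while a string type with 16-bit code units would give utf16.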

Different levels of type erasure could maybe be used to forget either
the underlying string type or the underlying encoding.

This still requires some thought, obviously. Any ideas on how it should
be done are welcome.

> While this is an interesting proposal, it appears to me to be several
> years worth of work. How would you structure the first summer's work?
> Would you aim at breadth (a prototype covering the whole) or depth
> (production quality work that concentrates on one aspect)?

I suppose breadth. Quality could be increased after the SoC. I'm more
interested in using the time during which I am mentored to come up with
interesting and practical designs that would pass a review.

Also, I'm quite the regular on this list, so it's not like I'll
disappear once the SoC is done.

