
Subject: Re: [boost] [rfc] Unicode GSoC project
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-05-14 18:28:52


Eric Niebler wrote:
> Mathias Gaunard wrote:

> Also needed are tables that store the
> various character properties, and (hopefully) some parsers that build
> the tables directly from the Unicode character database so we can easily
> rev it whenever the database changes.

For the record, I have scripts that can generate ISO-8859-* to/from
Unicode tables from the downloaded data; I'll happily contribute them
if they are useful to anyone.
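For illustration only, here is a minimal C++ sketch of the parsing step such a script needs. It assumes the format of the unicode.org ISO-8859 mapping files (a hex byte column, a hex code point column, then a `#` comment). The function name `parse_8859_mapping` is hypothetical, and this is not the script mentioned above. Bytes with no mapping are left as U+FFFD.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: parse a unicode.org-style ISO-8859 mapping file
// (lines such as "0xA0\t0x00A0\t# NO-BREAK SPACE") into a 256-entry
// byte -> code point table. Unmapped bytes stay U+FFFD.
std::array<char32_t, 256> parse_8859_mapping(std::istream& in) {
    std::array<char32_t, 256> table;
    table.fill(U'\uFFFD');
    std::string line;
    while (std::getline(in, line)) {
        // Drop the trailing comment (and whole-line header comments).
        std::string::size_type hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);
        std::istringstream fields(line);
        std::string byte_str, cp_str;
        if (!(fields >> byte_str >> cp_str)) continue;  // blank line
        // Base 16 accepts the optional "0x" prefix used in these files.
        unsigned long byte = std::stoul(byte_str, nullptr, 16);
        unsigned long cp = std::stoul(cp_str, nullptr, 16);
        if (byte < 256) table[byte] = static_cast<char32_t>(cp);
    }
    return table;
}
```

A generator would then print this table as a C++ array initializer; the reverse (Unicode to ISO-8859) direction can be derived by inverting it.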

> The library provides the following core types in the boost namespace:
>
> uchar8_t
> uchar16_t
> uchar32_t
>
> In C++0x, these are called char, char16_t and char32_t.

I liked the idea of making them obviously unsigned; I had some nasty
bugs in my UTF-8 code where I made invalid assumptions about signedness.
But of course, being consistent with C++0x is more important.
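A small C++ sketch of the kind of sign bug meant here, assuming UTF-8 text held in plain `char`, which is signed on many platforms (e.g. x86 with GCC). The function names are hypothetical.

```cpp
#include <cassert>

// Buggy: if char is signed, a lead byte such as 0xC3 arrives as -61,
// so this range test is false on those platforms (and true elsewhere).
bool is_lead_byte_buggy(char c) {
    return c >= 0xC2 && c <= 0xF4;
}

// Fixed: convert to unsigned char first so the byte is in 0..255.
bool is_lead_byte(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    return u >= 0xC2 && u <= 0xF4;
}

// Length of the UTF-8 sequence introduced by byte c, or 0 if c cannot
// start a well-formed sequence (continuation byte, 0xC0/0xC1, > 0xF4).
int utf8_sequence_length(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (u < 0x80) return 1;
    if (u >= 0xC2 && u <= 0xDF) return 2;
    if (u >= 0xE0 && u <= 0xEF) return 3;
    if (u >= 0xF0 && u <= 0xF4) return 4;
    return 0;
}
```

An unsigned code unit type makes the fixed version the natural one to write.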

> I strongly disagree with requiring normalization form C for the concept
> UnicodeRange. There are many more valid Unicode sequences.

Agreed.
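To make the point concrete, a sketch using C++0x `char32_t` strings (names are hypothetical): U+00E9 and U+0065 U+0301 both spell "é" and both are valid sequences of Unicode scalar values, but only the first is in normalization form C.

```cpp
#include <cassert>
#include <string>

// "é" precomposed: U+00E9 LATIN SMALL LETTER E WITH ACUTE (NFC).
const std::u32string e_nfc = U"\u00E9";
// "é" decomposed: 'e' + U+0301 COMBINING ACUTE ACCENT (NFD, not NFC).
const std::u32string e_nfd = U"e\u0301";

// True if every code point is a Unicode scalar value
// (at most U+10FFFF and not a surrogate). Says nothing about normalization.
bool is_valid_scalar_sequence(const std::u32string& s) {
    for (char32_t c : s) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return false;
    }
    return true;
}
```

Both strings pass the validity check, so a concept that only admits NFC input would reject perfectly good text such as `e_nfd`.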

> the concrete algorithms must come first.

Agreed. Mathias, I would love to see an "end user perspective"
view of how this library will be used, i.e. its scope and basic usage patterns.

Phil.

