

From: Matt Austern (austern_at_[hidden])
Date: 2001-09-26 12:29:57

dietmar_kuehl_at_[hidden] wrote:
> What do other LWG members think of Unicode characters? Do we need
> another character type, e.g. with character and string literals
> introduced by a 'U' like in U'B' or U"Boost"? We should discuss
> this with Core. ...or just 'ucchar_t' support, wide enough to hold
> Unicode, i.e. what 'wchar_t' was intended for, and leave those stuck
> with a 16 bit 'wchar_t' in the rain? Is there anybody who really
> wants character or string literals not fitting into 16 bits?

The discussion was about a proposal (originally made to the C
committee, forwarded to C++) to add support for utf-8 encodings.
The proposal was fairly short, and included core language changes
that would create utf-8 string literals.

I didn't sense a lot of enthusiasm for it. I wasn't all that
enthusiastic about it myself: I didn't think it was all that well
thought out, and I didn't think that utf-8 string literals would
be all that useful by themselves. I was also unhappy about the idea
of introducing more and more special cases for different kinds of
string literals, instead of providing something user-extensible.

I do think that some kind of core language changes might be useful.
The Xalan people report that the inability to write Unicode
string literals in any convenient way is a nuisance. They've had
to resort to monstrosities like this:
  WideChar xmltag[] = { wideX, wideM, wideL, wideNull };
(I've made up the type and constant names, because I'm too lazy to
look up the exact details. I think they use UCS-4 internally.)
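
To make the workaround concrete, here is a minimal self-contained sketch
of the same pattern. Every name in it (WideChar, wideX, wideM, wideL,
wideNull, wide_length, check_xmltag) is hypothetical, as above, and the
choice of unsigned long as a type wide enough for a UCS-4 code point is
an assumption, not Xalan's actual declaration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names throughout; these are not Xalan's real declarations.
// Assumption: unsigned long (at least 32 bits) is wide enough to hold
// one UCS-4 code point.
typedef unsigned long WideChar;

// One named constant per character, since there is no literal syntax
// that yields an array of WideChar directly.
const WideChar wideX    = 0x58;  // 'X'
const WideChar wideM    = 0x4D;  // 'M'
const WideChar wideL    = 0x4C;  // 'L'
const WideChar wideNull = 0;     // terminator

// The "string" has to be spelled out element by element.
const WideChar xmltag[] = { wideX, wideM, wideL, wideNull };

// Counts the elements before the terminator, like strlen for WideChar.
std::size_t wide_length(const WideChar* s) {
    std::size_t n = 0;
    while (s[n] != wideNull) {
        ++n;
    }
    return n;
}

// Small self-check: three characters plus the terminator.
void check_xmltag() {
    assert(wide_length(xmltag) == 3);
    assert(sizeof(xmltag) / sizeof(xmltag[0]) == 4);
}
```

Every distinct character needs its own named constant, and every string
its own hand-written initializer list, which is why this scales so badly
compared with an ordinary "XML" literal.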

Incidentally, I'd recommend checking out GNOME libxml. It's a
C interface, it's poorly documented, and in some ways it's not as
convenient as it ought to be, but I found it faster and more
convenient than the other XML libraries I've tried. It uses
utf-8 internally.

