

From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2005-07-23 08:01:46


Hello Graham,

There was a student project that aimed to produce a Unicode library, but I
haven't heard anything about it since the thread at
http://lists.boost.org/boost/2005/03/22580.php

That thread contains plenty of comments and ideas. Everyone wants a
Unicode library, but no-one seems to have the time to write it well.
Over the past few weeks I have again been toying with the idea of
writing one myself.

You seem to be quite well versed in Unicode. My (hopefully
constructive) comments on your post:
First, are WORD and DWORD the Windows equivalents of uint16_t and
uint32_t, respectively?
I think the C++ way would be to leave the choice of encoding to the
user, through a template parameter. This would, I guess, do away with
the separate assign* and insert* methods for each encoding.
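
A minimal sketch of the encoding-as-template-parameter idea, assuming hypothetical policy classes utf8, utf16 and utf32 and a hypothetical unicode_string template (none of these are an existing Boost API). Each policy names its code unit type and knows how to append one code point, so a single push_back replaces a family of per-encoding methods. Input code points are assumed to be valid scalar values; no validation is done.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding policies. Each appends one code point, assumed to be
// a valid scalar value (at most U+10FFFF, not a surrogate), in its own form.
struct utf8 {
    using code_unit = std::uint8_t;
    static void append(std::vector<code_unit>& out, char32_t cp) {
        if (cp < 0x80) {
            out.push_back(static_cast<code_unit>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<code_unit>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<code_unit>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<code_unit>(0x80 | (cp & 0x3F)));
        }
    }
};

struct utf16 {
    using code_unit = char16_t;
    static void append(std::vector<code_unit>& out, char32_t cp) {
        if (cp < 0x10000) {
            out.push_back(static_cast<code_unit>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<code_unit>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<code_unit>(0xDC00 + (cp & 0x3FF)));
        }
    }
};

struct utf32 {
    using code_unit = char32_t;
    static void append(std::vector<code_unit>& out, char32_t cp) {
        out.push_back(cp);
    }
};

// The encoding is chosen by the user; the string only talks in code points.
template <class Encoding>
class unicode_string {
public:
    using code_unit = typename Encoding::code_unit;

    void push_back(char32_t cp) { Encoding::append(units_, cp); }

    std::size_t code_unit_count() const { return units_.size(); }
    const std::vector<code_unit>& code_units() const { return units_; }

private:
    std::vector<code_unit> units_;
};
```

For example, pushing U+1F600 gives two code units in unicode_string<utf16> but one in unicode_string<utf32>; the calling code is identical in both cases.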
I think the normalisation form should be an invariant of the string as
well (and a template parameter). This makes it possible to implement
operator== and operator< as binary comparisons of code points, so that
they will be relatively fast (more so for UTF-8 and UTF-32, whose code
units already sort in code point order, than for UTF-16, where
surrogates need a fix-up). People will surely want to use the string as
a key for std::maps, for example. Other, more expensive collation
methods (including localised ones) could be implemented by separate
classes.
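
A rough sketch of the comparison side, assuming hypothetical tag types nfc and nfd and a hypothetical utf16_string template. Actual normalisation needs the Unicode Character Database, so the constructor here simply assumes its input is already in the named form. Because the form is part of the type, equality is a plain binary comparison; operator< remaps UTF-16 code units so that the result follows code point order.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical tags naming the normalisation form a string type guarantees.
struct nfc {};
struct nfd {};

template <class NormalizationForm>
class utf16_string {
public:
    // Assumption: the input is already normalised to NormalizationForm.
    explicit utf16_string(std::u16string already_normalized)
        : units_(std::move(already_normalized)) {}

    // Same type means same normalisation form, so canonically equivalent
    // strings hold identical code units: equality is a binary compare.
    friend bool operator==(const utf16_string& a, const utf16_string& b) {
        return a.units_ == b.units_;
    }

    // Raw UTF-16 order puts surrogates (0xD800..0xDFFF) before
    // U+E000..U+FFFF, unlike code point order, so remap before comparing.
    friend bool operator<(const utf16_string& a, const utf16_string& b) {
        return std::lexicographical_compare(
            a.units_.begin(), a.units_.end(), b.units_.begin(), b.units_.end(),
            [](char16_t x, char16_t y) { return fixup(x) < fixup(y); });
    }

private:
    static unsigned fixup(char16_t u) {
        if (u >= 0xE000) return u - 0x800u;   // U+E000..U+FFFF -> 0xD800..0xF7FF
        if (u >= 0xD800) return u + 0x2000u;  // surrogates     -> 0xF800..0xFFFF
        return u;                             // below 0xD800: unchanged
    }

    std::u16string units_;
};
```

Since utf16_string<nfc> and utf16_string<nfd> are distinct types, comparing strings in different forms does not compile, and either type can be used directly as a std::map key. A localised collation would be a separate comparator class passed to std::map instead of relying on operator<.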
As for iterators, I believe the standard Unicode string should contain
grapheme clusters, so its iterator should have this beast as its
value_type. I would call it "character": as far as the Unicode standard
and combining characters are concerned, C++ programmers in general are
"users", and grapheme clusters are what users think of as characters.
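
A deliberately simplified sketch of an iterator whose value_type is a "character", here a hypothetical alias for std::u32string holding one base code point plus any following combining marks. This is NOT the UAX #29 grapheme cluster algorithm: as an assumption for illustration, only the Combining Diacritical Marks block (U+0300..U+036F) counts as attaching to the preceding code point, and the names character_iterator, characters_begin and characters_end are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Rough stand-in for "extends the previous character"; real segmentation
// follows UAX #29 and needs Unicode property tables.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Hypothetical: a "character" is a base code point plus trailing marks.
using character = std::u32string;

class character_iterator {
public:
    using iterator_category = std::input_iterator_tag;  // yields by value
    using value_type = character;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = character;

    character_iterator(const char32_t* pos, const char32_t* end)
        : pos_(pos), end_(end) {}

    character operator*() const { return character(pos_, cluster_end()); }
    character_iterator& operator++() { pos_ = cluster_end(); return *this; }
    character_iterator operator++(int) { auto old = *this; ++*this; return old; }

    friend bool operator==(const character_iterator& a, const character_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const character_iterator& a, const character_iterator& b) {
        return !(a == b);
    }

private:
    // One past the last code point of the cluster starting at pos_.
    const char32_t* cluster_end() const {
        const char32_t* p = pos_;
        if (p != end_) ++p;                               // base code point
        while (p != end_ && is_combining_mark(*p)) ++p;  // attached marks
        return p;
    }

    const char32_t* pos_;
    const char32_t* end_;
};

inline character_iterator characters_begin(const std::u32string& s) {
    return {s.data(), s.data() + s.size()};
}
inline character_iterator characters_end(const std::u32string& s) {
    const char32_t* e = s.data() + s.size();
    return {e, e};
}
```

With this, U"e\u0301a" (e, COMBINING ACUTE ACCENT, a) iterates as two characters, "é" and "a", even though it holds three code points.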

Hope this helps.
Rogier


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk