From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2007-09-24 06:54:33


Phil Endecott wrote:
> Dear All,
>
> Something that I have been thinking about for a while is storing
> strings tagged with their character set. Since I now have a practical
> need for this I plan to try to implement something. Your feedback
> would be appreciated.
>
Hi,

I've played around with this concept a lot already. I basically think
that encoding-bound strings are a MUST for proper, safe,
internationalized string handling. Everything else, in particular the
current situation, is a mess.

If you want, I can package up what I've done so far (not really much,
but a lot of comments containing concepts) and put it somewhere.

One thing: I think runtime-tagged strings are useless. Programming
should happen with one or at most two fixed encodings, known at compile
time. Because encodings differ in behaviour (code units of 8, 16 or 32
bits, the wider ones in either byte order, fixed-length vs
variable-length encodings, ...), it is not good to write a type handling
them all at runtime. I think that runtime-specified string conversion
should be an I/O question. In other words, when character data enters
your program, you convert it to the encoding you use internally; when it
leaves the program, you convert it to an external encoding. In between,
you use whatever your program uses, and you specify it at compile time.
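
To make that concrete, here is a rough sketch of what I mean. It is not
taken from any existing library: the names tagged_string, utf8_tag,
latin1_tag, to_utf8 and decode_input are all made up for illustration,
and UTF-8 input is passed through without validation to keep it short.
The encoding is part of the string's type, and the only place a charset
name is looked at during runtime is the input boundary.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical encoding tags. Each one names its code unit type.
struct utf8_tag   { using unit = char; };
struct latin1_tag { using unit = char; };
struct utf32_tag  { using unit = char32_t; };

// A string whose encoding is fixed at compile time (illustrative only).
template <class Enc>
class tagged_string {
public:
    using unit = typename Enc::unit;

    tagged_string() = default;
    explicit tagged_string(std::basic_string<unit> units)
        : units_(std::move(units)) {}

    const std::basic_string<unit>& units() const { return units_; }

private:
    std::basic_string<unit> units_;
};

// Same code unit type, different encoding: still distinct types, so the
// two cannot be mixed up by accident.
static_assert(!std::is_same<tagged_string<utf8_tag>,
                            tagged_string<latin1_tag>>::value,
              "encodings must yield distinct string types");

// Compile-time-selected conversion: Latin-1 -> UTF-8.
// Latin-1 bytes map directly to code points U+0000..U+00FF.
inline tagged_string<utf8_tag> to_utf8(const tagged_string<latin1_tag>& in) {
    std::string out;
    for (char c : in.units()) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));                 // one byte
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (b & 0x3F))); // trail byte
        }
    }
    return tagged_string<utf8_tag>(std::move(out));
}

// Input boundary: the external charset is only known at runtime, but the
// result always has the program's internal encoding (UTF-8 here).
// Assumption: UTF-8 input is trusted and not validated in this sketch.
inline tagged_string<utf8_tag> decode_input(const std::string& bytes,
                                            const std::string& charset) {
    if (charset == "UTF-8") {
        return tagged_string<utf8_tag>(bytes);
    }
    if (charset == "ISO-8859-1") {
        return to_utf8(tagged_string<latin1_tag>(bytes));
    }
    throw std::invalid_argument("unsupported charset: " + charset);
}
```

Everything after decode_input works on tagged_string<utf8_tag> only; a
matching encode_output at the other end would be the only other place
that needs a runtime charset name.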

I'd be willing to cooperate on this project, too. I'm mostly busy with
my new I/O stuff, but the tagged strings form the foundation of the text
I/O part, so I need the character library sooner or later anyway.

Sebastian Redl

