Subject: [boost] [unicode] Interest Check / Proof of Concept
From: James Porter (porterj_at_[hidden])
Date: 2008-11-18 23:38:16

Over the past few months, I've been tinkering with a Unicode string
library. It's still *far* from finished, but it's far enough along that
the overall structure is visible. I've seen a bunch of Unicode proposals
for Boost come and go, so hopefully this one will address the most
common needs people have.

The library is based on two (immutable) string types: ct_string and
rt_string. ct_strings are _C_ompile _T_ime tagged with a particular
encoding, and rt_strings are _R_un _T_ime tagged with an encoding. This
allows faster conversion when the encoding is known at compile time,
while still allowing the encoding to be chosen at run time (useful for
reading XML!).

General usage would look something like this:

        ct_string<ct::utf8> foo("Hello, world!");

        ct_string<ct::utf16> bar;

        rt_string baz;

Note the use of ct::utf8 and rt::utf8. As you might expect from the
syntax, ct::utf8 is a type, and rt::utf8 is an object. Broadly speaking,
to create an encoding, you create a class with read and write methods,
and then you create an instance of rt_encoding<MyEncoding>. Most of
this is laid out in the comments of my code, so I won't go into too much
detail here.
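
To make the type-versus-object distinction concrete, here is a minimal
sketch of how a class with read and write methods could be wrapped in an
rt_encoding instance. Only the names ct, rt and rt_encoding come from
the description above. The signatures, the rt_encoding_base interface,
the latin1 encoding and the round_trip helper are assumptions made for
illustration, and the real code may well look different.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch: the names and signatures below are assumptions,
// not the library's real interface.
namespace encoding_sketch {

typedef std::uint32_t code_point;

// An encoding is a class with read and write methods. Latin-1 keeps the
// example short because every code point occupies exactly one byte.
struct latin1 {
    // Decode the code point at `it` and advance `it` past it.
    static code_point read(const char*& it) {
        return static_cast<unsigned char>(*it++);
    }
    // Append the encoded form of `cp`; '?' stands in for anything above U+00FF.
    static void write(code_point cp, std::string& out) {
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    }
};

// Run-time interface, so an encoding can be passed around as an object.
struct rt_encoding_base {
    virtual ~rt_encoding_base() {}
    virtual code_point read(const char*& it) const = 0;
    virtual void write(code_point cp, std::string& out) const = 0;
};

// Wraps a compile-time encoding class behind the run-time interface.
template <typename Encoding>
struct rt_encoding : rt_encoding_base {
    rt_encoding() {}
    virtual code_point read(const char*& it) const { return Encoding::read(it); }
    virtual void write(code_point cp, std::string& out) const { Encoding::write(cp, out); }
};

namespace ct { typedef encoding_sketch::latin1 latin1; }   // a type
namespace rt { const rt_encoding<ct::latin1> latin1; }     // an object

// Decodes with the compile-time tag and re-encodes through a run-time object.
inline std::string round_trip(const std::string& in, const rt_encoding_base& enc) {
    std::string out;
    const char* it = in.data();
    const char* end = it + in.size();
    while (it != end) enc.write(ct::latin1::read(it), out);
    return out;
}

}  // namespace encoding_sketch
```

The compile-time path calls the static methods directly, so the compiler
can inline them. The run-time path pays for a virtual call per code
point, which is the trade-off the two string types are meant to expose.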

There's still a lot missing from the code (most notably,
dynamically-sized strings and string concatenation), but here's a
rundown of what *is* present:

* Compile-time and run-time tagged strings
* Re-encoding of strings based on compile-/run-time tags
* Uses simple memory copying when source and dest encodings are the same
* Forward iterators to step through code points in strings
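
As a rough illustration of the last three points, the sketch below
re-encodes between two compile-time tags, falls back to a plain memory
copy when both tags are the same, and steps through code points with a
forward iterator. The names reencoder and code_point_iterator, the unit
typedefs and the use of std::vector as the output buffer are all
hypothetical. The sketch also assumes well-formed input and does no
validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <vector>

// Hypothetical sketch: names and signatures are assumptions, and input is
// assumed to be well formed (no validation).
namespace reencode_sketch {
typedef std::uint32_t code_point;

struct utf8 {
    typedef char unit;
    static code_point read(const unit*& p) {
        unsigned char b = static_cast<unsigned char>(*p++);
        if (b < 0x80) return b;
        int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;  // continuation bytes
        code_point cp = b & (0x3F >> extra);                // lead-byte payload
        while (extra-- > 0)
            cp = (cp << 6) | (static_cast<unsigned char>(*p++) & 0x3F);
        return cp;
    }
    static void write(code_point cp, std::vector<unit>& out) {
        if (cp < 0x80) { out.push_back(static_cast<unit>(cp)); return; }
        int extra = (cp >= 0x10000) ? 3 : (cp >= 0x800) ? 2 : 1;
        static const unsigned char lead[] = { 0x00, 0xC0, 0xE0, 0xF0 };
        out.push_back(static_cast<unit>(lead[extra] | (cp >> (6 * extra))));
        while (extra-- > 0)
            out.push_back(static_cast<unit>(0x80 | ((cp >> (6 * extra)) & 0x3F)));
    }
};

struct utf16 {
    typedef char16_t unit;
    static code_point read(const unit*& p) {
        code_point hi = *p++;
        if (hi < 0xD800 || hi > 0xDBFF) return hi;          // not a lead surrogate
        code_point lo = *p++;
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }
    static void write(code_point cp, std::vector<unit>& out) {
        if (cp < 0x10000) { out.push_back(static_cast<unit>(cp)); return; }
        cp -= 0x10000;                                      // split into a surrogate pair
        out.push_back(static_cast<unit>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<unit>(0xDC00 + (cp & 0x3FF)));
    }
};

// General case: decode each code point from Src, encode it into Dst.
template <typename Src, typename Dst>
struct reencoder {
    static std::vector<typename Dst::unit>
    run(const typename Src::unit* first, const typename Src::unit* last) {
        std::vector<typename Dst::unit> out;
        while (first != last) Dst::write(Src::read(first), out);
        return out;
    }
};

// Same encoding on both sides: copy the raw code units, no decoding.
template <typename Enc>
struct reencoder<Enc, Enc> {
    static std::vector<typename Enc::unit>
    run(const typename Enc::unit* first, const typename Enc::unit* last) {
        std::vector<typename Enc::unit> out(static_cast<std::size_t>(last - first));
        if (!out.empty())
            std::memcpy(&out[0], first, out.size() * sizeof(typename Enc::unit));
        return out;
    }
};

// Forward iterator that yields code points by decoding on dereference.
template <typename Enc>
class code_point_iterator {
public:
    typedef std::forward_iterator_tag iterator_category;
    typedef code_point value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const code_point* pointer;
    typedef code_point reference;

    explicit code_point_iterator(const typename Enc::unit* p) : pos_(p) {}
    code_point operator*() const { const typename Enc::unit* p = pos_; return Enc::read(p); }
    code_point_iterator& operator++() { Enc::read(pos_); return *this; }
    bool operator==(const code_point_iterator& o) const { return pos_ == o.pos_; }
    bool operator!=(const code_point_iterator& o) const { return pos_ != o.pos_; }
private:
    const typename Enc::unit* pos_;
};
}  // namespace reencode_sketch
```

Because the same-encoding case is a partial specialization, the choice
between decoding and copying is made by the compiler, with no run-time
check. A run-time tagged string would need an explicit comparison of the
two encoding objects to get the same shortcut.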

If you'd like to take a look at the code, it's available here: . I've tested it in gcc
4.3.2 and MSVC8, but most modern compilers should be able to handle it.
Comments and criticisms are, of course, welcome.

- Jim
