Subject: [boost] [unicode] Interest Check / Proof of Concept
From: James Porter (porterj_at_[hidden])
Date: 2008-11-18 23:38:16

Next message: Bjørn Roald: "Re: [boost] BOOST_FOREACH slow?"
Previous message: Kenneth Laskoski: "[boost] namespace noncopyable_"
Next in thread: Andrew Sutton: "Re: [boost] [unicode] Interest Check / Proof of Concept"
Reply: Andrew Sutton: "Re: [boost] [unicode] Interest Check / Proof of Concept"
Reply: Zach Laine: "Re: [boost] [unicode] Interest Check / Proof of Concept"
Reply: Phil Endecott: "Re: [boost] [unicode] Interest Check / Proof of Concept"
Maybe reply: Kasra: "Re: [boost] [unicode] Interest Check / Proof of Concept"

Over the past few months, I've been tinkering with a Unicode string
library. It's still *far* from finished, but it's far enough along that
the overall structure is visible. I've seen a bunch of Unicode proposals
for Boost come and go, so hopefully this one will address the most
common needs people have.

The library is based on two (immutable) string types: ct_string and
rt_string. ct_strings are _C_ompile _T_ime tagged with a particular
encoding, and rt_strings are _R_un _T_ime tagged with an encoding. This
allows faster conversion when the encoding is known at compile time,
while still supporting conversion when the encoding is only known at
run time (useful for reading XML!).
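
To make the compile-time/run-time distinction concrete, here is a rough
standalone sketch. It is not taken from the library: the names
ct_string_sketch, rt_string_sketch, encoding_id, convert_to and
latin1_to_utf8 are all made up for illustration, and only Latin-1 to
UTF-8 is handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoding tags: empty types used only at compile time.
struct utf8_tag {};
struct latin1_tag {};

// Compile-time tagged: the encoding is part of the string's type.
template <class Encoding>
struct ct_string_sketch {
    std::string bytes;
};

// Run-time tagged: the encoding is a value stored next to the bytes.
enum class encoding_id { utf8, latin1 };
struct rt_string_sketch {
    std::string bytes;
    encoding_id encoding;
};

// Latin-1 code points are 0x00..0xFF, so each needs at most two UTF-8 bytes.
inline std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Same encoding on both sides: selected by overload resolution at
// compile time, so no per-character work and no run-time check.
template <class E>
ct_string_sketch<E> convert_to(const ct_string_sketch<E>& src, E) {
    return src;
}

// Latin-1 -> UTF-8, also selected at compile time.
inline ct_string_sketch<utf8_tag>
convert_to(const ct_string_sketch<latin1_tag>& src, utf8_tag) {
    return ct_string_sketch<utf8_tag>{latin1_to_utf8(src.bytes)};
}

// Run-time version: the encodings have to be inspected on every call.
inline rt_string_sketch convert_to(const rt_string_sketch& src,
                                   encoding_id target) {
    if (src.encoding == target) return src;
    if (src.encoding == encoding_id::latin1 && target == encoding_id::utf8)
        return rt_string_sketch{latin1_to_utf8(src.bytes), encoding_id::utf8};
    assert(false && "conversion not covered by this sketch");
    return src;
}
```

The point is only that the compile-time version picks its conversion
routine through the type system, while the run-time version has to
branch on a stored value.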

General usage would look something like this:

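// Encoding fixed at compile time: UTF-8
ct_string<ct::utf8> foo("Hello, world!");

// Re-encode foo as UTF-16 (both encodings known at compile time)
ct_string<ct::utf16> bar;
bar.encode(foo);

// Re-encode bar with an encoding chosen at run time
rt_string baz;
baz.encode(bar,rt::utf8);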

Note the use of ct::utf8 and rt::utf8. As you might expect from the
syntax, ct::utf8 is a type, and rt::utf8 is an object. Broadly speaking,
to create an encoding, you write a class with read and write methods,
and then you create an instance of rt_encoding<MyEncoding>. Most of
this is laid out in the comments of my code, so I won't go into too much
detail here.
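
As a rough illustration of the "class with read and write methods plus
a run-time wrapper" idea, something along these lines would work. This
is a sketch under my own assumptions, not the actual interface in the
tarball: latin1_sketch, utf8_sketch, encoding_base, rt_encoding_sketch
and transcode are hypothetical names, the signatures are invented, and
the UTF-8 reader assumes well-formed input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical encoding class: static read/write of one code point.
struct latin1_sketch {
    // Decode the code point at `pos` and advance `pos` past it.
    static std::uint32_t read(const std::string& in, std::size_t& pos) {
        assert(pos < in.size());
        return static_cast<unsigned char>(in[pos++]);
    }
    // Append one code point; anything above 0xFF becomes '?'.
    static void write(std::uint32_t cp, std::string& out) {
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    }
};

struct utf8_sketch {
    // Assumes well-formed UTF-8; no validation is attempted.
    static std::uint32_t read(const std::string& in, std::size_t& pos) {
        assert(pos < in.size());
        unsigned char lead = static_cast<unsigned char>(in[pos++]);
        if (lead < 0x80) return lead;
        int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
        std::uint32_t cp = lead & (0x3F >> extra);  // payload bits of lead
        while (extra-- > 0 && pos < in.size())
            cp = (cp << 6) | (static_cast<unsigned char>(in[pos++]) & 0x3F);
        return cp;
    }
    static void write(std::uint32_t cp, std::string& out) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
};

// Run-time interface: the same two operations behind virtual calls.
struct encoding_base {
    virtual ~encoding_base() {}
    virtual std::uint32_t read(const std::string& in,
                               std::size_t& pos) const = 0;
    virtual void write(std::uint32_t cp, std::string& out) const = 0;
};

// Wraps a compile-time encoding class in an object usable at run time.
template <class Encoding>
struct rt_encoding_sketch : encoding_base {
    std::uint32_t read(const std::string& in,
                       std::size_t& pos) const override {
        return Encoding::read(in, pos);
    }
    void write(std::uint32_t cp, std::string& out) const override {
        Encoding::write(cp, out);
    }
};

// Re-encode by reading code points with one encoding and writing
// them with another, both chosen at run time.
inline std::string transcode(const std::string& in,
                             const encoding_base& from,
                             const encoding_base& to) {
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) to.write(from.read(in, pos), out);
    return out;
}
```

With this shape, an object like rt_encoding_sketch<utf8_sketch> plays
the role that rt::utf8 plays in the example above, while utf8_sketch
itself plays the role of ct::utf8.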

There's still a lot missing from the code (most notably,
dynamically-sized strings and string concatenation), but here's a
rundown of what *is* present:

* Compile-time and run-time tagged strings
* Re-encoding of strings based on compile-/run-time tags
* Uses simple memory copying when source and destination encodings are the same
* Forward iterators to step through code points in strings
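
For anyone unsure what stepping through code points (as opposed to
bytes) means in practice, here is a minimal sketch for UTF-8. It is not
the iterator from my code: utf8_code_point_iterator and
count_code_points are hypothetical names, and the input is assumed to
be well-formed.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>

// Hypothetical multi-pass iterator over the code points of well-formed
// UTF-8. Dereferencing yields a value rather than a reference, which is
// a simplification compared with a strict pre-C++20 forward iterator.
class utf8_code_point_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = std::uint32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const std::uint32_t*;
    using reference = std::uint32_t;

    utf8_code_point_iterator() : p_(nullptr) {}
    explicit utf8_code_point_iterator(const char* p) : p_(p) {}

    // Decode the code point that starts at the current byte.
    std::uint32_t operator*() const {
        unsigned char lead = static_cast<unsigned char>(*p_);
        if (lead < 0x80) return lead;
        int extra = length(lead) - 1;
        std::uint32_t cp = lead & (0x3F >> extra);  // payload bits of lead
        for (int i = 1; i <= extra; ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(p_[i]) & 0x3F);
        return cp;
    }

    // Advance by one code point, i.e. by one to four bytes.
    utf8_code_point_iterator& operator++() {
        p_ += length(static_cast<unsigned char>(*p_));
        return *this;
    }
    utf8_code_point_iterator operator++(int) {
        utf8_code_point_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return a.p_ == b.p_;
    }
    friend bool operator!=(const utf8_code_point_iterator& a,
                           const utf8_code_point_iterator& b) {
        return a.p_ != b.p_;
    }

private:
    // Sequence length in bytes, derived from the lead byte.
    static int length(unsigned char lead) {
        return lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    }
    const char* p_;
};

// Number of code points (not bytes) in a UTF-8 string.
inline std::size_t count_code_points(const std::string& s) {
    utf8_code_point_iterator first(s.data());
    utf8_code_point_iterator last(s.data() + s.size());
    return static_cast<std::size_t>(std::distance(first, last));
}
```

A string such as "a", U+00E9, U+20AC occupies six bytes in UTF-8 but
the iterator visits exactly three values.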

If you'd like to take a look at the code, it's available here:
http://www.teamboxel.com/misc/unicode.tar.gz . I've tested it with GCC
4.3.2 and MSVC 8, but most modern compilers should be able to handle it.
Comments and criticisms are, of course, welcome.

- Jim

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk