Subject: Re: [boost] [unicode] Interest Check / Proof of Concept
From: Andrew Sutton (andrew.n.sutton_at_[hidden])
Date: 2008-11-19 09:33:05


>
> There's still a lot missing from the code (most notably, dynamically-sized
> strings and string concatenation), but here's a rundown of what *is*
> present:
>
> * Compile-time and run-time tagged strings
> * Re-encoding of strings based on compile-/run-time tags
> * Uses simple memory copying when source and dest encodings are the same
> * Forward iterators to step through code points in strings
>
> If you'd like to take a look at the code, it's available here:
> http://www.teamboxel.com/misc/unicode.tar.gz . I've tested it in gcc 4.3.2
> and MSVC8, but most modern compilers should be able to handle it. Comments
> and criticisms are, of course, welcome.
>

I think it looks like a good start. I'm getting a warning about a
string->wchar_t conversion.

Just a couple comments/questions...
- I don't think the global rt encoding objects are the way to go. I would
just have each string object declare the encoding object either as a member
variable or as needed inside a member function. Since they don't have any
member variables, the cost is negligible.
- Would it be possible to merge the ct/rt classes into a single type?
- Maybe encode/decode should be free functions, in the style of the standard
algorithms.
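
To illustrate the first point, here is a rough sketch of a string that creates its encoder locally instead of relying on a global object. The names `utf8_encoder`, `utf8_string`, and the placeholder `length` operation are hypothetical and are not taken from the posted code; the only claim being shown is that an encoder with no member variables is an empty type, so creating one on demand adds nothing to the string's size.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stateless encoder: it has no member variables, so an
// instance carries no data and is trivial to construct.
struct utf8_encoder {
    // Placeholder operation: counts the bytes that start a code point.
    std::size_t length(const std::string& bytes) const {
        std::size_t n = 0;
        for (unsigned char b : bytes)
            if ((b & 0xC0) != 0x80) ++n;  // skip UTF-8 continuation bytes
        return n;
    }
};

static_assert(std::is_empty<utf8_encoder>::value, "encoder holds no state");

// Hypothetical string type that declares its encoder only where needed.
class utf8_string {
public:
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {}

    std::size_t length() const {
        utf8_encoder enc;  // local object instead of a global one
        return enc.length(bytes_);
    }

private:
    std::string bytes_;
};

// The string stores only its bytes; no encoder is kept as a member.
static_assert(sizeof(utf8_string) == sizeof(std::string),
              "no per-string encoder storage");
```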

You might have something like:

estring<> s = ...; // Create an encodable string with some default encoding (ascii?)
encode(s, utf8()); // utf8 is a functor object that returns a utf8_encoder object.

I guess if you go this way, the estring class would just contain an encoded
string associated with the encoder type. It might be an interesting
approach. Still, a good start.

Andrew Sutton
andrew.n.sutton_at_[hidden]

