From: Tobias Schwinger (tschwinger_at_[hidden])
Date: 2007-09-24 14:06:33


Sebastian Redl wrote:
> Phil Endecott wrote:
>> Dear All,
>>
>> Something that I have been thinking about for a while is storing
>> strings tagged with their character set. Since I now have a practical
>> need for this I plan to try to implement something. Your feedback
>> would be appreciated.
>>
> Hi,
>
> I've played around with this concept a lot already. I basically think
> that encoding-bound strings are a MUST for proper, safe,
> internationalized string handling. Everything else, in particular the
> current situation, is a mess.
>
> If you want, I can package up what I've done so far (not really much,
> but a lot of comments containing concepts) and put it somewhere.
>
> One thing: I think runtime-tagged strings are useless. Programming
> should happen with one or at most two fixed encodings, known at compile
> time. Because of the differences in behaviour in encodings (base unit 8,
> 16 or 32 bits, or 8 with various endians, fixed-length encodings vs
> variable-length encodings, ...), it is not good to write a type handling
> them all at runtime. I think that runtime-specified string conversion
> should be an I/O question. In other words, when character data enters
> your program, you convert it to the encoding you use internally, when it
> leaves the program, you convert it to an external encoding. In-between,
> you use whatever your program uses, and you specify it at compile time.

Well, if I/O facilities were the only means of converting between
string encodings, using compiled libraries whose string encoding differs
from my program's would be pretty awkward, wouldn't it?

I agree that the "runtime tagging" suggestion seems like overkill.
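
To make the compile-time alternative concrete, here is a minimal sketch of
a string whose encoding is part of its type, with an explicit conversion
that is an ordinary function rather than an I/O facility. The names
tagged_string, latin1, utf8 and to_utf8 are hypothetical and only serve
the illustration; they are not an existing or proposed Boost interface.

```cpp
#include <string>
#include <utility>

// Hypothetical encoding tags (assumption: not part of any real library).
struct latin1 {};
struct utf8 {};

// A byte string whose encoding is fixed at compile time by its type.
template <typename Encoding>
class tagged_string {
public:
    tagged_string() = default;
    explicit tagged_string(std::string bytes) : bytes_(std::move(bytes)) {}
    const std::string& bytes() const { return bytes_; }
private:
    std::string bytes_;
};

// Explicit conversion, usable at any library boundary, not only during I/O.
// Every Latin-1 byte is the code point U+0000..U+00FF with the same value,
// so it becomes one UTF-8 byte (below 0x80) or two bytes (0x80 and above).
inline tagged_string<utf8> to_utf8(const tagged_string<latin1>& in) {
    std::string out;
    out.reserve(in.bytes().size() * 2);
    for (unsigned char c : in.bytes()) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return tagged_string<utf8>(std::move(out));
}
```

With this shape, passing a tagged_string<latin1> where a
tagged_string<utf8> is expected fails to compile, and the call to to_utf8
makes the conversion visible at the point where two libraries meet.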

However, providing lazily evaluated, possibly cached, compile-time and
runtime "string views" might be a good idea (and would probably give a
nice framework for implementing encoding conversions as well).

Examples:

     // ...given some strings a,b, and c
     string<utf8> s = a + b + ":" + c;
     // can get away with exactly one allocation since operator+ can
     // return a compile-time string view
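
The single-allocation claim can be illustrated with a small expression
template. This is only a sketch under assumptions: text, piece and concat
are hypothetical names, the encoding parameter is left out for brevity,
and the operands must outlive the expression because it only stores
pointers to them.

```cpp
#include <cstddef>
#include <string>

template <typename L, typename R> struct concat;

// Hypothetical owning string type (stands in for string<utf8>).
struct text {
    std::string bytes;
    text(const char* s) : bytes(s) {}
    // Materialize a concatenation expression: reserve once, then copy.
    template <typename L, typename R>
    text(const concat<L, R>& expr) {
        bytes.reserve(expr.size());
        expr.append_to(bytes);
    }
};

// Leaf node: refers to existing characters without copying them.
struct piece {
    const char* data;
    std::size_t len;
    piece(const char* s) : data(s), len(std::char_traits<char>::length(s)) {}
    piece(const text& t) : data(t.bytes.data()), len(t.bytes.size()) {}
    std::size_t size() const { return len; }
    void append_to(std::string& out) const { out.append(data, len); }
};

// Inner node: the "compile-time string view" returned by operator+.
template <typename L, typename R>
struct concat {
    L lhs;
    R rhs;
    std::size_t size() const { return lhs.size() + rhs.size(); }
    void append_to(std::string& out) const {
        lhs.append_to(out);
        rhs.append_to(out);
    }
};

inline concat<piece, piece> operator+(piece a, piece b) { return {a, b}; }

template <typename L, typename R>
concat<concat<L, R>, piece> operator+(concat<L, R> a, piece b) {
    return {a, b};
}
```

Given text objects a, b and c, the statement text s = a + b + ":" + c;
builds a nested concat object without touching the heap, and the text
constructor then performs at most one allocation (the reserve call)
before copying the pieces.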

--
     string<utf8> s = "world";
     string_view<utf8> v = "Hello " + s + "!";
     std::cout << v << std::endl;
     s = "you";
     std::cout << v << std::endl;
     // Output:
     // Hello world!
     // Hello you!
     // For a more "real-world" use case of runtime string_views
     // consider a lexer taking apart an in-memory file, with SBO
     // applied to the string_view template...
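
The behaviour of the second example (the view reflects a later assignment
to s) can be approximated with a runtime view that stores references to
its operands and only assembles the characters on output. This is a
sketch, not the proposed string_view: lazy_view and its add member are
hypothetical names, no caching or SBO is attempted, and every std::string
passed to add must outlive the view.

```cpp
#include <ostream>
#include <string>
#include <vector>

// Hypothetical runtime "string view": a list of parts, evaluated lazily.
class lazy_view {
public:
    // Refer to a string literal (or other long-lived C string).
    lazy_view& add(const char* literal) {
        parts_.push_back({literal, nullptr});
        return *this;
    }
    // Refer to a live std::string; later changes to it remain visible.
    lazy_view& add(const std::string& s) {
        parts_.push_back({nullptr, &s});
        return *this;
    }
    // Assemble the current contents on demand.
    std::string str() const {
        std::string out;
        for (const part& p : parts_) {
            if (p.str) out += *p.str;
            else out += p.literal;
        }
        return out;
    }
private:
    struct part {
        const char* literal;     // used when str is null
        const std::string* str;  // non-owning pointer to the operand
    };
    std::vector<part> parts_;
};

inline std::ostream& operator<<(std::ostream& os, const lazy_view& v) {
    return os << v.str();
}
```

Building the view with v.add("Hello ").add(s).add("!") and streaming it
before and after assigning a new value to s yields the two different
lines shown in the example above, because the view holds a pointer to s
rather than a copy of its contents.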

Regards,
Tobias

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk