From: Tobias Schwinger (tschwinger_at_[hidden])
Date: 2007-09-24 14:06:33
Sebastian Redl wrote:
> Phil Endecott wrote:
>> Dear All,
>>
>> Something that I have been thinking about for a while is storing
>> strings tagged with their character set. Since I now have a practical
>> need for this I plan to try to implement something. Your feedback
>> would be appreciated.
>>
> Hi,
>
> I've played around with this concept a lot already. I basically think
> that encoding-bound strings are a MUST for proper, safe,
> internationalized string handling. Everything else, in particular the
> current situation, is a mess.
>
> If you want, I can package up what I've done so far (not really much,
> but a lot of comments containing concepts) and put it somewhere.
>
> One thing: I think runtime-tagged strings are useless. Programming
> should happen with one or at most two fixed encodings, known at compile
> time. Because of the differences in behaviour in encodings (base unit 8,
> 16 or 32 bits, or 8 with various endians, fixed-length encodings vs
> variable-length encodings, ...), it is not good to write a type handling
> them all at runtime. I think that runtime-specified string conversion
> should be an I/O question. In other words, when character data enters
> your program, you convert it to the encoding you use internally, when it
> leaves the program, you convert it to an external encoding. In-between,
> you use whatever your program uses, and you specify it at compile time.
Well, having I/O facilities provide the only means for converting
strings of different encodings would make using compiled libraries that
use a different string encoding than my program pretty awkward, wouldn't it?
I agree that the "runtime tagging suggestion" seems overkill.
However, providing lazily evaluated, possibly cached, compile-time and
runtime "string views" might be a good idea (and would probably also give
a nice framework for implementing encoding conversions).
Examples:
// ...given some strings a,b, and c
string<utf8> s = a + b + ":" + c;
// can get away with exactly one allocation since operator+ can
// return a compile-time string view
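
To make the "exactly one allocation" point concrete, here is a rough sketch of how operator+ could return a compile-time view, using plain std::string in place of the proposed string<utf8>. Every name in it (view_tag, leaf, concat, view_of, literal, materialize, demo) is made up for illustration and is not part of any existing library; it only shows the expression-template idea, not a finished design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>

namespace sketch {

// Marker base so operator+ below only applies to our own view types.
struct view_tag {};

// Hypothetical leaf view: points at characters owned by someone else.
struct leaf : view_tag {
    const char* data;
    std::size_t len;
    leaf(const char* d, std::size_t n) : data(d), len(n) {}
    std::size_t size() const { return len; }
    void append_to(std::string& out) const { out.append(data, len); }
};

// The viewed string must outlive the view (no temporaries here).
inline leaf view_of(const std::string& s) { return leaf(s.data(), s.size()); }
inline leaf literal(const char* s) { return leaf(s, std::strlen(s)); }

// Hypothetical concatenation node: stores both operands, copies no characters.
template <class L, class R>
struct concat : view_tag {
    L lhs;
    R rhs;
    concat(const L& l, const R& r) : lhs(l), rhs(r) {}
    std::size_t size() const { return lhs.size() + rhs.size(); }
    void append_to(std::string& out) const {
        lhs.append_to(out);
        rhs.append_to(out);
    }
};

// operator+ only builds a nested concat<...> type; nothing is allocated.
template <class L, class R,
          class = typename std::enable_if<
              std::is_base_of<view_tag, L>::value &&
              std::is_base_of<view_tag, R>::value>::type>
concat<L, R> operator+(const L& l, const R& r) {
    return concat<L, R>(l, r);
}

// Evaluating the view: the total length is known up front, so the
// destination buffer is reserved once and then filled.
template <class V>
std::string materialize(const V& v) {
    std::string out;
    out.reserve(v.size());
    v.append_to(out);
    return out;
}

// Usage mirroring "a + b + ":" + c" from the example above.
inline std::string demo() {
    std::string a = "foo", b = "bar", c = "baz";
    auto expr = view_of(a) + view_of(b) + literal(":") + view_of(c);
    assert(expr.size() == 10);  // length is known before any copying
    return materialize(expr);
}

}  // namespace sketch
```

In a real string<Encoding> the materialize step would presumably live in the converting constructor or assignment operator, so that the line above could be written without the explicit view_of/literal helpers.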
--
string<utf8> s = "world";
string_view<utf8> v = "Hello " + s + "!";
std::cout << v << std::endl;
s = "you";
std::cout << v << std::endl;
// Output:
//   Hello world!
//   Hello you!

// For a more "real-world" use case of runtime string_views
// consider a lexer taking apart an in-memory file, with SBO
// applied to the string_view template...

Regards,
Tobias