Boost :
From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-10-20 02:50:37
Erik Wien wrote:
> The basic idea I have been working around is to make an encoded_string
> class templated on unicode encoding types (e.g. UTF-8, UTF-16). This is
> made possible through an encoding_traits class which contains all necessary
> implementation details for working on strings of code units.
>
> The outline of the encoding traits class looks something like this:
>
> template<typename encoding>
> struct encoding_traits
> {
>     // Type definitions for code_units etc.
>     // Is the encoding fixed width? (allows a good deal of iterator
>     // optimizations)
>     // Algorithms for iterating forwards and backwards over code units.
>     // Function for converting a series of code units to a unicode code
>     // point.
>     // Any other operations that are encoding specific.
> };
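
For concreteness, here is a minimal sketch of what one specialization of
such a traits class might look like. The utf16_tag type, the member names,
and the signatures are illustrative assumptions, not part of the actual
proposal:

    #include <cstdint>

    // Hypothetical encoding tag; not part of the actual proposal.
    struct utf16_tag {};

    template<typename encoding>
    struct encoding_traits;          // primary template left undefined

    template<>
    struct encoding_traits<utf16_tag>
    {
        typedef std::uint16_t code_unit;
        typedef std::uint32_t code_point;

        // UTF-16 is variable width (surrogate pairs), so the iterator
        // optimizations for fixed-width encodings do not apply.
        static const bool fixed_width = false;

        // Advance past one encoded character.
        static const code_unit* next(const code_unit* p)
        {
            // A lead surrogate (0xD800..0xDBFF) starts a two-unit sequence.
            return p + ((*p >= 0xD800 && *p <= 0xDBFF) ? 2 : 1);
        }

        // Convert the code unit sequence at p to a single code point.
        static code_point decode(const code_unit* p)
        {
            if (*p >= 0xD800 && *p <= 0xDBFF)
                return 0x10000 + ((code_point(p[0] - 0xD800) << 10)
                                  | code_point(p[1] - 0xDC00));
            return *p;
        }
    };

A fixed-width encoding like UTF-32 would presumably set fixed_width to
true and make next() a plain pointer increment, which is where the
iterator optimizations mentioned above would come in.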
Why do you need the traits at compile time?
- Why would the user want to change the encoding, especially between
UTF-16 and UTF-32?
- Why would the user want to specify the encoding at compile time? Are there
performance benefits to that? Basically, if we agree that UTF-32 is not
needed, then UTF-16 is the only encoding which does not require complex
handling. Maybe, for the other encodings, using virtual functions in the
character iterator is OK? And if iterators have "abstract characters" as
their value_type, maybe the overhead of that is much larger than a virtual
function call, even for UTF-16. (A sketch of that run-time alternative
follows the questions below.)
(As a side note, a discussion of templated vs. non-templated interfaces
seems a reasonable addition to a thesis. It's a sure thing that if anybody
wrote such a thesis in our lab, he would be asked to justify such
global decisions.)
- What if the user wants to specify the encoding at run time? For example,
XML files specify their encoding explicitly. I'd want to use the ASCII/UTF-8
encoding when the XML document is 8-bit, and UTF-16 when it's Unicode.
(A sketch of such run-time selection also follows below.)
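
To make the virtual-function alternative concrete, here is a minimal
sketch, assuming an abstract iterator interface chosen at run time; all
names are illustrative, not an existing Boost interface:

    #include <cstdint>

    // The decoding logic lives behind a virtual interface instead of a
    // compile-time traits parameter.
    class character_iterator
    {
    public:
        virtual ~character_iterator() {}
        virtual std::uint32_t dereference() const = 0; // current code point
        virtual void increment() = 0;                  // advance one character
    };

    class utf16_iterator : public character_iterator
    {
        const std::uint16_t* p_;
    public:
        explicit utf16_iterator(const std::uint16_t* p) : p_(p) {}

        std::uint32_t dereference() const
        {
            if (*p_ >= 0xD800 && *p_ <= 0xDBFF)   // surrogate pair
                return 0x10000 + ((std::uint32_t(p_[0] - 0xD800) << 10)
                                  | std::uint32_t(p_[1] - 0xDC00));
            return *p_;
        }

        void increment()
        {
            p_ += (*p_ >= 0xD800 && *p_ <= 0xDBFF) ? 2 : 1;
        }
    };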
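And a minimal sketch of run-time selection, building on the illustrative
character_iterator above; the make_iterator factory is hypothetical:

    #include <cstdint>
    #include <memory>
    #include <stdexcept>
    #include <string>

    // Pick a decoder from the encoding named in the XML declaration,
    // which is only known at run time.
    std::unique_ptr<character_iterator>
    make_iterator(const std::string& encoding, const void* data)
    {
        if (encoding == "UTF-16")
            return std::unique_ptr<character_iterator>(
                new utf16_iterator(static_cast<const std::uint16_t*>(data)));
        // "UTF-8", "US-ASCII", etc. would dispatch to their own decoders here.
        throw std::runtime_error("unsupported encoding: " + encoding);
    }

The point is that the concrete decoder is chosen from a string that is
available only after the document header has been read.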
- Volodya