
From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-10-20 02:50:37


Erik Wien wrote:

> The basic idea I have been working on is to make an encoded_string
> class templated on unicode encoding types (i.e. UTF-8, UTF-16). This is
> made possible through an encoding_traits class which contains all necessary
> implementation details for working on strings of code units.
>
> The outline of the encoding traits class looks something like this:
>
> template<typename encoding>
> struct encoding_traits
> {
>     // Type definitions for code_units etc.
>     // Is the encoding fixed width? (allows a good deal of iterator
>     // optimizations)
>     // Algorithms for iterating forwards and backwards over code units.
>     // Function for converting a series of code units to a unicode code
>     // point.
>     // Any other operations that are encoding specific.
> };
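
For concreteness, here is roughly how I read that outline, as a possible
UTF-16 specialization. The names and details below are my guesses, not
something from your post:

#include <boost/cstdint.hpp>

// Primary template, as in the outline above.
template<typename encoding>
struct encoding_traits;

// Hypothetical tag type for the UTF-16 encoding.
struct utf16_encoding;

template<>
struct encoding_traits<utf16_encoding>
{
    typedef boost::uint16_t code_unit;

    // Not fixed width: characters outside the BMP take two code
    // units (a surrogate pair).
    static const bool is_fixed_width = false;

    // Convert the code units starting at 'first' to one code point,
    // advancing 'first' past them. Assumes well-formed input.
    template<typename Iterator>
    static boost::uint32_t decode(Iterator& first, Iterator last)
    {
        boost::uint32_t unit = *first++;
        if (unit >= 0xD800 && unit <= 0xDBFF && first != last)
        {
            boost::uint32_t low = *first++;
            return 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
        }
        return unit;
    }
};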

Why do you need the traits at compile time?

- Why would the user want to change the encoding? Especially between
  UTF-16 and UTF-32?

- Why would the user want to specify the encoding at compile time? Are there
  performance benefits to that? Basically, if we agree that UTF-32 is not
  needed, then UTF-16 is the only encoding which does not require complex
  handling. Maybe, for the other encodings, using virtual functions in the
  character iterator is OK? And if iterators have "abstract characters" as
  their value_type, maybe the overhead of that is much larger than a virtual
  function call, even for UTF-16. (A rough sketch of what I mean follows
  after this list.)
  (As a side note, a discussion of templated vs. non-templated interfaces
   seems a reasonable addition to a thesis. It's a sure thing that if anybody
   wrote such a thesis in our lab, he would be asked to justify such
   global decisions.)

- What if the user wants to specify the encoding at run time? For example, XML
  files specify their encoding explicitly. I'd want to use an ascii/UTF-8
  encoding if the XML document is 8-bit, and UTF-16 when it's Unicode.
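
To illustrate the last two points, here is a rough sketch of the
virtual-function, run-time-selected alternative I have in mind. All the
names here (abstract_decoder, utf16_decoder, make_decoder) are
hypothetical, just to show the shape of a non-templated interface:

#include <memory>
#include <string>
#include <stdexcept>
#include <boost/cstdint.hpp>

// Runtime-polymorphic decoding: the string/iterator types do not
// depend on the encoding, at the price of a virtual call per step.
struct abstract_decoder
{
    virtual ~abstract_decoder() {}
    virtual bool at_end() const = 0;
    virtual boost::uint32_t current() const = 0; // code point at the position
    virtual void advance() = 0;                  // step over one code point
};

class utf16_decoder : public abstract_decoder
{
public:
    utf16_decoder(const boost::uint16_t* first, const boost::uint16_t* last)
        : cur_(first), last_(last) {}

    bool at_end() const { return cur_ == last_; }

    boost::uint32_t current() const
    {
        boost::uint32_t hi = *cur_;
        if (hi >= 0xD800 && hi <= 0xDBFF && cur_ + 1 != last_)
            return 0x10000 + ((hi - 0xD800) << 10) + (cur_[1] - 0xDC00);
        return hi;
    }

    void advance()
    {
        boost::uint32_t hi = *cur_;
        cur_ += (hi >= 0xD800 && hi <= 0xDBFF && cur_ + 1 != last_) ? 2 : 1;
    }

private:
    const boost::uint16_t* cur_;
    const boost::uint16_t* last_;
};

// An XML parser could then select the decoder from the encoding
// declared in the document, entirely at run time.
std::auto_ptr<abstract_decoder>
make_decoder(const std::string& declared_encoding,
             const void* first, const void* last)
{
    if (declared_encoding == "UTF-16")
        return std::auto_ptr<abstract_decoder>(
            new utf16_decoder(static_cast<const boost::uint16_t*>(first),
                              static_cast<const boost::uint16_t*>(last)));
    // ... "UTF-8", "US-ASCII", etc. would be handled the same way.
    throw std::runtime_error("unsupported encoding: " + declared_encoding);
}

The cost is one virtual call per character step; the gain is that the
encoding can come from the XML declaration at run time, without any
encoding parameter in the interface.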

- Volodya

