
From: Erik Wien (wien_at_[hidden])
Date: 2004-10-19 12:07:53


As I have said in a couple of other posts here, I have already started
testing different approaches to this library, so I might as well post some
examples of what I have so far and how it would be used. I have only looked
closely at the string representation part so far, so don't expect too much. ;)

The basic idea I have been working around is to make an encoded_string
class templated on Unicode encoding types (e.g. UTF-8, UTF-16). This is made
possible through an encoding_traits class, which contains all the necessary
implementation details for working on strings of code units.

The outline of the encoding traits class looks something like this:

template<typename encoding>
struct encoding_traits
    {
    // Type definitions for code units etc.
    // Is the encoding fixed width? (allows a good deal of iterator optimizations)
    // Algorithms for iterating forwards and backwards over code units.
    // Function for converting a series of code units to a Unicode code point.
    // Any other operations that are encoding specific.
    };
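To make the outline a bit more concrete, here is a minimal sketch of what specializations for UTF-8, UTF-16 and UTF-32 might look like. The tag types `utf8`/`utf16`/`utf32` and the member names `code_unit`, `is_fixed_width`, `decode` and `retreat` are hypothetical choices for illustration, the code uses C++11 `char16_t`/`char32_t`, and it assumes well-formed input (no handling of malformed sequences or lone surrogates).

```cpp
#include <cassert>  // <cassert> and <string> are only for trying the traits out
#include <string>

// Hypothetical tag types naming the three Unicode encoding forms.
struct utf8 {};
struct utf16 {};
struct utf32 {};

// Primary template left undefined: only the specializations below exist.
template <typename Encoding>
struct encoding_traits;

template <>
struct encoding_traits<utf8>
{
    typedef char code_unit;
    static const bool is_fixed_width = false;   // 1 to 4 units per code point

    // Decode the code point starting at `it` and advance `it` past it.
    // Assumes well-formed UTF-8; nothing is validated.
    template <typename It>
    static char32_t decode(It& it)
    {
        unsigned char lead = static_cast<unsigned char>(*it++);
        if (lead < 0x80) return lead;                          // single byte
        int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;   // trailing bytes
        char32_t cp = lead & (0x3F >> extra);                  // payload bits of lead
        while (extra-- > 0)
            cp = (cp << 6) | (static_cast<unsigned char>(*it++) & 0x3F);
        return cp;
    }

    // Step `it` back to the first code unit of the previous code point
    // by skipping continuation bytes (10xxxxxx).
    template <typename It>
    static void retreat(It& it)
    {
        do { --it; } while ((static_cast<unsigned char>(*it) & 0xC0) == 0x80);
    }
};

template <>
struct encoding_traits<utf16>
{
    typedef char16_t code_unit;
    static const bool is_fixed_width = false;   // 1 or 2 units per code point

    template <typename It>
    static char32_t decode(It& it)
    {
        char32_t hi = *it++;
        if (hi < 0xD800 || hi > 0xDBFF) return hi;   // not a lead surrogate
        char32_t lo = *it++;                         // assumes a trail follows
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }

    template <typename It>
    static void retreat(It& it)
    {
        --it;
        if (*it >= 0xDC00 && *it <= 0xDFFF) --it;    // step over the whole pair
    }
};

template <>
struct encoding_traits<utf32>
{
    typedef char32_t code_unit;
    static const bool is_fixed_width = true;    // one unit per code point

    template <typename It>
    static char32_t decode(It& it) { return *it++; }

    template <typename It>
    static void retreat(It& it) { --it; }
};
```

The `is_fixed_width` flag is what would let an iterator over a UTF-32 string fall back to plain random access, while the variable-width encodings need `decode`/`retreat` to find code point boundaries.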

This traits class is used by the encoded_string class to support strings
that use any Unicode encoding internally. This lets the programmer choose,
from string to string, whichever encoding is best suited. The external
interface of the class would mainly be code point iterators. These iterators
can iterate over any encoded_string, and the underlying encoding should be
invisible. (Because dereferencing such an iterator yields a decoded code
point by value rather than a reference into the string, it cannot strictly
satisfy the C++ standard's requirements for anything beyond an input
iterator, but it would work nicely with the Boost Iterator Library.)
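A minimal sketch of how encoded_string and its code point iterator could sit on top of such a traits class. Only a trimmed UTF-16 specialization is repeated so the example stands alone; the names `const_iterator`, `code_units()` and the code-unit constructor are hypothetical. Note that the iterator's `reference` type is `char32_t` by value while it advertises `std::bidirectional_iterator_tag`: strictly that breaks the standard's forward iterator requirements (the non-standard aspect mentioned above), although algorithms such as `std::for_each` only rely on input-iterator behaviour.

```cpp
#include <algorithm>  // std::for_each, for the usage below
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

struct utf16 {};   // hypothetical encoding tag

template <typename Encoding> struct encoding_traits;

// Trimmed-down UTF-16 traits; well-formed input is assumed.
template <>
struct encoding_traits<utf16>
{
    typedef char16_t code_unit;

    template <typename It>
    static char32_t decode(It& it)   // decode one code point, advance `it`
    {
        char32_t hi = *it++;
        if (hi < 0xD800 || hi > 0xDBFF) return hi;
        char32_t lo = *it++;
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    }

    template <typename It>
    static void retreat(It& it)      // move back one code point
    {
        --it;
        if (*it >= 0xDC00 && *it <= 0xDFFF) --it;
    }
};

template <typename Encoding>
class encoded_string
{
    typedef encoding_traits<Encoding> traits;
    typedef typename traits::code_unit code_unit;
    typedef std::basic_string<code_unit> storage;

public:
    // Widens each byte of a 7-bit ASCII string into one code unit.
    explicit encoded_string(const char* ascii)
    {
        for (; *ascii; ++ascii) units_.push_back(static_cast<code_unit>(*ascii));
    }

    // Adopts already-encoded code units (assumed well-formed).
    explicit encoded_string(const storage& units) : units_(units) {}

    // Read-only code point iterator: decodes on the fly and returns the
    // code point by value, so `reference` is not a real reference.
    class const_iterator
    {
    public:
        typedef std::bidirectional_iterator_tag iterator_category;
        typedef char32_t value_type;
        typedef std::ptrdiff_t difference_type;
        typedef void pointer;
        typedef char32_t reference;

        explicit const_iterator(typename storage::const_iterator pos) : pos_(pos) {}

        char32_t operator*() const
        {
            typename storage::const_iterator tmp = pos_;   // decode a copy
            return traits::decode(tmp);
        }
        const_iterator& operator++() { traits::decode(pos_); return *this; }
        const_iterator operator++(int) { const_iterator old = *this; ++*this; return old; }
        const_iterator& operator--() { traits::retreat(pos_); return *this; }
        const_iterator operator--(int) { const_iterator old = *this; --*this; return old; }

        bool operator==(const const_iterator& o) const { return pos_ == o.pos_; }
        bool operator!=(const const_iterator& o) const { return pos_ != o.pos_; }

    private:
        typename storage::const_iterator pos_;   // first unit of current code point
    };

    const_iterator begin() const { return const_iterator(units_.begin()); }
    const_iterator end() const { return const_iterator(units_.end()); }

    // Number of stored code units (not code points).
    std::size_t code_units() const { return units_.size(); }

private:
    storage units_;
};
```

Because the encoding only shows up as a template argument, the same algorithm code works whichever encoding a particular string was created with.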

You could use the encoded_string class like this:

// The constructor converts the ASCII string to UTF-16.
encoded_string<utf16> some_string("Hello World");
// Run some standard algorithm on the string:
std::for_each(some_string.begin(), some_string.end(), do_some_operation);
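For pure ASCII input, the conversion in that constructor is trivial: each byte is already a code point below U+0080 and becomes exactly one UTF-16 code unit. A constructor taking arbitrary code points would also have to emit surrogate pairs above U+FFFF. A minimal sketch of that encoding step, using hypothetical free functions `append_utf16` and `ascii_to_utf16` and assuming every input is a valid Unicode scalar value (at most U+10FFFF and not a surrogate):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the UTF-16 encoding of one code point.
// Assumes `cp` is a valid scalar value (<= 0x10FFFF, not a surrogate).
inline void append_utf16(std::u16string& out, char32_t cp)
{
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));                     // BMP: one unit
    } else {
        cp -= 0x10000;                                                // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // lead surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // trail surrogate
    }
}

// What a constructor taking 7-bit ASCII might do internally.
inline std::u16string ascii_to_utf16(const char* ascii)
{
    std::u16string out;
    for (; *ascii; ++ascii)
        append_utf16(out, static_cast<unsigned char>(*ascii));
    return out;
}
```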

I currently have a really rough implementation that works as described
above, and I would probably base parts of a potential library on it.

I am aware that this implementation will be less than ideal for integration
with the current C++ standard, but those are exactly the issues I would like
to explore in more depth during development.

Any comments you might have on this approach are most welcome.

Regards
Erik


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk