Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-07-18 02:14:30


Rogier van Dalen wrote:

> Freestanding transcoding functions and codecvt facets are not the only
> thing I believe a UTF library would need, though.

I have deliberately chosen not to use codecvt facets in my Unicode
library at all, but maybe I should provide them anyway for compatibility
with the iostreams subsystem.
I don't really find them practical to use.

> I'd add to the list:
> - compile-time encoding (meta-programming);

Didn't think of that.

> Iterator
> adaptors, I found, are a pain to attach error policies to and write
> them correctly. For example, with a policy equivalent to your
> "ReplaceCheckFailures", you need to produce the same code point
> sequence whether you traverse an invalid encoded string forward or
> backward. I've got code for UTF-8 that passes my unit tests, but the
> error checking and the one-by-one decoding makes it much harder to
> optimise.

For now my iterator adaptors (and the codecs they're based on, for that
matter) perform full checks, including checking that we don't go past
either end of the input range, in either direction.
While I initially wanted both checked and unchecked versions, having
only one does make the library easier to use.

An error policy isn't really enough, though: to do full checks, each
iterator needs to know the beginning and the end of the range it's
working on, which could be avoided altogether when trusting the input.
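
As an illustration of why a checked decoder needs the range bounds, here is a minimal sketch of one forward decoding step for UTF-8. It is not code from the library discussed in this thread: the names `decode_checked`, `decode_all` and `replacement` are hypothetical, and the policy shown (one U+FFFD per ill-formed sequence, leaving an unexpected byte unconsumed) is only one of several reasonable choices.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names throughout; this is an illustrative sketch only.
const std::uint32_t replacement = 0xFFFDu;  // U+FFFD REPLACEMENT CHARACTER

// Decodes one code point from [it, last) and advances `it`.
// Any ill-formed sequence yields `replacement`.
template <class It>
std::uint32_t decode_checked(It& it, It last)
{
    assert(it != last);  // precondition: the range is not empty
    unsigned char b0 = static_cast<unsigned char>(*it++);
    if (b0 < 0x80) return b0;  // ASCII

    int extra = 0;                  // number of continuation bytes expected
    std::uint32_t cp = 0, min = 0;  // accumulated value, smallest legal value
    if      (b0 >= 0xC2 && b0 <= 0xDF) { extra = 1; cp = b0 & 0x1F; min = 0x80; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { extra = 2; cp = b0 & 0x0F; min = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { extra = 3; cp = b0 & 0x07; min = 0x10000; }
    else return replacement;  // stray continuation byte or invalid lead byte

    for (int i = 0; i < extra; ++i) {
        // This is the check that requires knowing `last`: a trusted-input
        // decoder could simply read the next byte.
        if (it == last) return replacement;  // truncated sequence
        unsigned char b = static_cast<unsigned char>(*it);
        if ((b & 0xC0) != 0x80) return replacement;  // leave b unconsumed
        cp = (cp << 6) | (b & 0x3F);
        ++it;
    }
    // Reject overlong forms, surrogates and values beyond U+10FFFF.
    if (cp < min || cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return replacement;
    return cp;
}

// Decodes a whole string by repeatedly applying decode_checked.
std::vector<std::uint32_t> decode_all(const std::string& s)
{
    std::vector<std::uint32_t> out;
    std::string::const_iterator it = s.begin();
    while (it != s.end()) out.push_back(decode_checked(it, s.end()));
    return out;
}
```

A backward step would need the beginning of the range in the same way, so a bidirectional checked iterator ends up carrying both bounds in addition to its current position.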

They're fairly simple implementations and were never benchmarked
(benchmarking my library isn't even scheduled at the moment), but
they're quite correct (proper unit tests are in the works).

> I believe that Mathias Gaunard is working on a library at
> <http://blogloufoque.free.fr/unicode/doc/html/>. I don't know how
> complete it is, but from the documentation it looks well thought-out
> so far. I'm looking forward to seeing where that's going!

Thanks!
I'm in the middle of writing several tutorials to make the design easier
to understand. (Plus, I still need to actually implement some of the
things described in that version of the docs.)

