Subject: Re: [boost] GSoC Unicode library: second preview
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-06-21 21:45:38


Phil Endecott wrote:

> I have looked quickly at your UTF8 code at
> https://svn.boost.org/trac/boost/browser/sandbox/SOC/2009/unicode/boost/unicode/utf_codecs.hpp
> in comparison with mine at
> http://svn.chezphil.org/libpbe/trunk/include/charset/utf8.hh . The
> encoding is similar, though I have avoided some code duplication (which
> is probably worthwhile in an inline function) and used an IF_LIKELY
> macro to enable gcc's branch hinting.
>
> My decoding implementation is rather different than yours, though. You
> explicitly determine the length of the code first and then loop, while
> I do this:
> <code snip />
>
> You may find that that is faster.

My code wasn't fine-tuned for performance at all; I'm still trying to
make things work first ;).
I'll certainly consider your technique once I get around to measuring.

On another note, while I do think IF_LIKELY is a good idea for UTF-16,
doesn't it heavily penalize certain scripts, such as Asian ones, in
the case of UTF-8?
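
To make the concern concrete, here is a minimal sketch of a decoder with a hinted ASCII fast path. It is not the code from either library: the IF_LIKELY definition (built on gcc's `__builtin_expect`), the function names `decode_utf8` and `count_slow_path`, and the choice to skip validation are all assumptions made for illustration. The slow path uses the "determine the length first, then loop" style.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical definition; the real IF_LIKELY macro in libpbe may differ.
#if defined(__GNUC__)
#  define IF_LIKELY(cond) if (__builtin_expect(!!(cond), 1))
#else
#  define IF_LIKELY(cond) if (cond)
#endif

// Decodes one code point from s starting at pos and advances pos past it.
// Assumes well-formed UTF-8; no validation is performed.
inline char32_t decode_utf8(const std::string& s, std::size_t& pos)
{
    const unsigned char lead = static_cast<unsigned char>(s[pos++]);

    // Hinted fast path: single-byte (ASCII) sequences.
    IF_LIKELY(lead < 0x80) {
        return lead;
    }

    // Slow path: derive the number of continuation bytes from the lead
    // byte first, then loop over them.
    int trailing;
    char32_t cp;
    if (lead < 0xE0)      { trailing = 1; cp = lead & 0x1F; }
    else if (lead < 0xF0) { trailing = 2; cp = lead & 0x0F; }
    else                  { trailing = 3; cp = lead & 0x07; }

    for (int i = 0; i < trailing; ++i) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[pos++]) & 0x3F);
    }
    return cp;
}

// Counts the code points in s that miss the hinted ASCII fast path.
inline std::size_t count_slow_path(const std::string& s)
{
    std::size_t pos = 0, misses = 0;
    while (pos < s.size()) {
        if (decode_utf8(s, pos) >= 0x80) ++misses;
    }
    return misses;
}
```

For mostly-ASCII text the hint is right nearly every time. For text in a CJK script, where almost every code point is a three-byte sequence, it is wrong on almost every character. Whether that costs anything noticeable would have to be measured.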

> Regarding the character database, the size is an issue. Can unwanted
> parts be omitted? For example, I would guess that the character names
> are not often used except for debugging messages and they are probably a
> large part of it.

Unfortunately, the current design doesn't allow it to be shrunk any
further.
I'm not sure how to extend it so that parts can be removed, either.

