Boost :

Date view	Thread view	Subject view	Author view

Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-07-18 06:02:31

Next message: James Mansion: "Re: [boost] RFC: interest in Unicode codecs?"
Previous message: Edward Grace: "Re: [boost] proposal - Statistically robust function timing for performance decisions."
In reply to: Cory Nelson: "Re: [boost] RFC: interest in Unicode codecs?"
Next in thread: James Mansion: "Re: [boost] RFC: interest in Unicode codecs?"
Reply: James Mansion: "Re: [boost] RFC: interest in Unicode codecs?"
Reply: Cory Nelson: "Re: [boost] RFC: interest in Unicode codecs?"

Cory Nelson wrote:
> I finally found some time to do some optimizations of my own and have
> had some good progress using a small lookup table, a switch, and
> slightly reducing branches. See line 318:
>
> http://svn.int64.org/viewvc/int64/snips/unicode.hpp?view=markup
>
> Despite these efforts, Windows 7 still decodes UTF-8 three times
> faster (~750MiB/s vs ~240MiB/s on my Core 2). I assume they are either
> using some gigantic lookup tables or SSE.

Hi Cory,

What is your test input?

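When the input is largely ASCII, a worthwhile optimisation is to load
groups of 4 (or 8) bytes as a single 32-bit (or 64-bit) integer and AND
it with 0x80808080 (or 0x8080808080808080); if the result is zero, none
of those bytes has its top bit set, so they are all ASCII and need no
further conversion.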
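A minimal C++ sketch of this check, assuming the input is a buffer of unsigned char. The function name ascii_prefix_length is hypothetical. The 8-byte word is loaded with std::memcpy rather than a pointer cast, since a cast can break alignment and strict-aliasing rules; compilers typically turn the memcpy into a single load.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the length of the leading run of ASCII bytes
// in [p, p + n). Bytes are examined 8 at a time where possible.
std::size_t ascii_prefix_length(const unsigned char* p, std::size_t n) {
    const std::uint64_t high_bits = 0x8080808080808080ULL;
    std::size_t i = 0;
    // Fast path: if no byte in the 8-byte word has its top bit set,
    // all 8 bytes are ASCII and need no further decoding.
    while (i + 8 <= n) {
        std::uint64_t word;
        std::memcpy(&word, p + i, sizeof word);  // safe unaligned load
        if ((word & high_bits) != 0) break;      // a non-ASCII byte is in this word
        i += 8;
    }
    // Slow path: finish byte by byte (the tail, or the word that failed the test).
    while (i < n && p[i] < 0x80) ++i;
    return i;
}
```

A decoder could call this first and fall back to its general multi-byte path only at the returned offset.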

In general I'm unsure how lookup tables compare with explicit bit
manipulation for performance. Cache effects may be significant, and a
benchmark that decodes the same data in a tight loop will tend to keep
the tables warm in the cache far better than a real application might.
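To make the comparison concrete, here is a sketch of one small decision both approaches can make: the total length of a UTF-8 sequence given its lead byte. The names are hypothetical and neither version is taken from any decoder in this thread. Both versions only inspect the bit pattern; a strict decoder must additionally reject the lead bytes 0xC0, 0xC1 and 0xF5 to 0xF7. Which one is faster depends on the cache behaviour described above, so only a benchmark on realistic input can decide.

```cpp
#include <cstdint>

// Returns the sequence length (1-4) for a UTF-8 lead byte, or 0 if the byte
// cannot start a sequence (continuation byte 10xxxxxx, or 11111xxx).

// Version 1: a 32-entry table indexed by the top 5 bits of the lead byte.
// At 32 bytes, its cache footprint is small.
constexpr std::uint8_t kLengthByTop5Bits[32] = {
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  // 0x00-0x7F: ASCII
    0, 0, 0, 0, 0, 0, 0, 0,                          // 0x80-0xBF: continuation
    2, 2, 2, 2,                                      // 0xC0-0xDF: 2-byte lead
    3, 3,                                            // 0xE0-0xEF: 3-byte lead
    4,                                               // 0xF0-0xF7: 4-byte lead
    0                                                // 0xF8-0xFF: invalid
};

constexpr int seq_len_table(std::uint8_t lead) {
    return kLengthByTop5Bits[lead >> 3];
}

// Version 2: explicit bit manipulation, with no memory access at all.
constexpr int seq_len_bits(std::uint8_t lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx or 11111xxx
}

// Both versions must agree on every possible byte value (checked at compile time).
constexpr bool versions_agree() {
    for (int b = 0; b < 256; ++b)
        if (seq_len_table(static_cast<std::uint8_t>(b)) !=
            seq_len_bits(static_cast<std::uint8_t>(b)))
            return false;
    return true;
}
static_assert(versions_agree(), "table and bit-manipulation versions disagree");
```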

I can't see how SSE could be applied to this problem, but it's not
something I know much about.

I don't have much time to work on this right now, but if the algorithm
plus test harness and test data were bundled up into something that I
can just "make", I will try to compare it with my version.

Regards, Phil.



Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk