Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Cory Nelson (phrosty_at_[hidden])
Date: 2009-07-18 11:03:08


On Sat, Jul 18, 2009 at 6:21 AM, James Mansion <james_at_[hidden]> wrote:
> Cory Nelson wrote:
>>>
>>> I finally found some time to do some optimizations of my own and have
>>> had some good progress using a small lookup table, a switch, and
>>> slightly reducing branches.  See line 318:
>>>
>>> http://svn.int64.org/viewvc/int64/snips/unicode.hpp?view=markup
>>>
>>> Despite these efforts, Windows 7 still decodes UTF-8 three times
>>> faster (~750MiB/s vs ~240MiB/s on my Core 2).  I assume they are either
>>> using some gigantic lookup tables or SSE.
>
> How much cost are you incurring in the tests for whether the traits
> indicate that the error returns are valid?

I've played around with it and have not noticed any significant
difference for this.

> I'm wondering if there is a case for requiring that these be compile-time
> constants in the Traits class rather than flags in a Traits value.
>
> And why is 'last' passed in to decode_unsafe?

Leftover from copy-paste, good catch.
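
On the compile-time constants idea, here is a minimal sketch of what I
understand the suggestion to be. The names static_traits, report_errors and
continuation_bits are made up for illustration and are not what unicode.hpp
uses. The point is only that when the flag is a static constant, the
compiler can drop the error branch entirely for the "unsafe" instantiation.

```cpp
// Hypothetical traits with the error policy fixed at compile time.
// (Illustrative names only; not the interface in unicode.hpp.)
template<bool ReportErrors>
struct static_traits {
    static const bool report_errors = ReportErrors;
};

// Returns the six payload bits of a UTF-8 continuation byte.
// The validity test is a compile-time constant per instantiation, so for
// static_traits<false> the whole check can be optimized away.
template<typename Traits>
inline unsigned continuation_bits(unsigned char b, bool& error)
{
    if (Traits::report_errors && (b & 0xC0) != 0x80)
        error = true;   // not of the form 10xxxxxx
    return b & 0x3F;
}
```

With a flag stored in a Traits value instead, the same test is a runtime
load and branch unless the optimizer can prove the value is constant.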

> Is there any indication that Duff's device will prevent aggressive inlining?

I have looked at the output of both GCC 4.4 and VC++ 2008 with
optimization flags cranked up. Both generate inlined code
exactly how I want.
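
For anyone without the source open, the general shape under discussion is a
small lookup table plus a fall-through switch on the sequence length. The
following is a simplified sketch under that assumption, not the code in
unicode.hpp: decode_unchecked, extra and lead_mask are hypothetical names,
and it does no validation at all, so it is only correct for well-formed
input.

```cpp
#include <cstdint>

// Decodes one code point from well-formed UTF-8 and advances p past it.
// Hypothetical helper for illustration; performs no error checking.
inline std::uint32_t decode_unchecked(const unsigned char*& p)
{
    // Number of continuation bytes, indexed by the top four bits of the
    // lead byte. 0x8..0xB are continuation bytes, treated here as length 0.
    static const unsigned char extra[16] =
        { 0,0,0,0,0,0,0,0, 0,0,0,0, 1,1, 2, 3 };
    // Payload mask for the lead byte, indexed by continuation count.
    static const unsigned char lead_mask[4] = { 0x7F, 0x1F, 0x0F, 0x07 };

    const unsigned char lead = *p++;
    const unsigned n = extra[lead >> 4];
    std::uint32_t cp = lead & lead_mask[n];

    switch (n) {
        case 3: cp = (cp << 6) | (*p++ & 0x3F); // fall through
        case 2: cp = (cp << 6) | (*p++ & 0x3F); // fall through
        case 1: cp = (cp << 6) | (*p++ & 0x3F);
        default: break;
    }
    return cp;
}
```

There is no loop inside the switch, so strictly speaking this is just
fall-through rather than a full Duff's device, which may be part of why the
compilers have no trouble inlining it.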

> I'm assuming you need this method to be fully inlined into the outer loop,
> and maybe it's not happening - ideally you'd want some loop unrolling too.
>
> I suspect that, as noted, the lack of a special case for largely 7-bit
> ASCII input will tend to make it slow on most Western texts, though
> speedups for the multi-character case will need care on
> alignment-sensitive hardware: you'll need to fix that in the outermost loop.

Indeed. I haven't done this because the code uses iterators, but I
think some small specializations could be made to enable this in
transcode() when the input is a raw pointer.
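
A rough sketch of what such a raw-pointer fast path could look like,
assuming 32-bit code point output. copy_ascii_prefix is a hypothetical
helper, not something in the current code; transcode() would call it first
and fall back to the normal decoder at the first non-ASCII byte. The memcpy
load is there so that alignment-sensitive hardware never sees a misaligned
word read.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies the leading run of 7-bit ASCII bytes in [first, last) to out as
// 32-bit code points. Returns the number of input bytes consumed.
// Hypothetical helper for illustration only.
inline std::size_t copy_ascii_prefix(const unsigned char* first,
                                     const unsigned char* last,
                                     std::uint32_t* out)
{
    const unsigned char* p = first;

    // Test eight bytes at a time. memcpy into a local avoids an unaligned
    // load; compilers turn it into a plain load where that is safe.
    while (last - p >= 8) {
        std::uint64_t chunk;
        std::memcpy(&chunk, p, sizeof chunk);
        if (chunk & 0x8080808080808080ULL)
            break;                      // at least one byte has bit 7 set
        for (int i = 0; i < 8; ++i)
            *out++ = p[i];
        p += 8;
    }

    // Finish byte by byte up to the first non-ASCII byte or the end.
    while (p != last && *p < 0x80)
        *out++ = *p++;

    return static_cast<std::size_t>(p - first);
}
```

The high-bit mask test does not depend on byte order, so the same check
works on both little- and big-endian targets.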

One thing I have been trying is in that decode_unsafe. It has fewer
branches overall and compiles down to the optimal assembly I'd expect.
For some reason, it runs slower. No clue why!

-- 
Cory Nelson
http://int64.org

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk