Subject: Re: [boost] [xpressive] Performance Tuning?
From: OvermindDL1 (overminddl1_at_[hidden])
Date: 2009-07-13 23:38:48
On Fri, Jul 10, 2009 at 7:32 AM, Stewart, Robert <Robert.Stewart_at_[hidden]> wrote:
> Eric Niebler wrote:
>> Paul Baxter wrote:
>> > "Stewart, Robert" wrote:
>> >> Thorsten Ottosen wrote:
>> >>> It would be good if you could submit your code as a case study in
>> >>> optimizing xpressive code.
>> >> Where and how would you have me do so? I can see developing
>> >> a performance tuning example from it to put in the
>> >> documentation. I don't know how much of that Eric has in
>> >> mind already or if he is interested in such an addition (of
>> >> his own doing or mine).
>> > This has been an incredibly useful thread. Thanks to you all.
>> > As a potential user put off in the past by concerns over
>> > abstraction penalties with such libraries (even compile-time
>> > libraries often fail to deliver in all but simple cases), I
>> > urge Eric to embrace such an example to illustrate just how
>> > powerful and maintainable a solution based on xpressive can
>> > be.
>> I think this is a fine idea. All these tips and tricks are
>> already described in the docs, but they are not described in depth
>> from an end-user perspective. I think a performance tuning case
>> study would make a valuable appendix.
>> Robert, can you send me the latest version of your regex
>> grammar and your hand-coded parser? I'll see what I can do.
> Sure. I've attached both in one file, but with separate namespaces. My progression, as you can see by retracing this thread, went from automatic (function-local) sregexes and an automatic smatch, to putting the sregexes in a namespace (which required creating placeholders), to putting the smatch into thread-local storage, to using BOOST_PROTO_AUTO. That progression took the Xpressive code from about 175X slower to less than 2X slower than the custom code. (I haven't measured against the final, tuned custom parsing code.)
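The object-lifetime part of that tuning progression (build each regex once at namespace scope, and keep one match-results object alive per thread instead of recreating it on every call) can be sketched as follows. This is an illustration under stated assumptions, not the attached code: it substitutes the standard `std::regex`/`std::smatch` for xpressive's `sregex`/`smatch` and C++11 `thread_local` for the `ThreadLocal` wrapper, so it says nothing about the BOOST_PROTO_AUTO step or about xpressive's actual performance. The pattern and the name `split_number` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

namespace patterns {
// Hypothetical grammar: digits, optionally followed by '.' and more digits.
// Constructed once, during static initialization, rather than per call.
const std::regex whole_or_real(R"((\d+)(?:\.(\d+))?)");
}  // namespace patterns

// Splits text into its integer digits and (possibly empty) fraction digits.
// Returns false if the whole of text does not match the pattern.
bool split_number(const std::string& text, std::string& whole, std::string& frac)
{
    // One match_results object per thread, reused across calls
    // (std::smatch standing in for xpressive's smatch).
    thread_local std::smatch what;
    if (!std::regex_match(text, what, patterns::whole_or_real))
        return false;
    whole = what[1].str();
    frac = what[2].str();  // an unmatched group yields an empty string
    return true;
}
```

Hoisting the regex avoids recompiling the pattern on every call; reusing the match object can let its sub-match storage be recycled rather than reallocated, though that detail is implementation-dependent.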
> This code is used to parse whole numbers, real numbers, fractions, and mixed numbers from a string, producing an integer from which the (possibly fractional) value can be recovered. example::lcm<T>::as() returns 160,000 as type T for that purpose because 160,000 is the least common multiple of all supported denominators. That aspect of this code should probably be removed in order to concentrate on the parsing, but it was too deeply entrenched for me to remove.
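As a worked example of that scaled representation: with a scale of 160,000, the mixed number 1 3/4 becomes 1 * 160,000 + 3 * (160,000 / 4) = 280,000, and dividing by 160,000 recovers 1.75. A minimal sketch, assuming the denominator divides the scale evenly; `kScale`, `to_scaled`, and `from_scaled` are hypothetical names, not part of the attached code.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror the 160,000 returned by example::lcm<T>::as().
constexpr std::int64_t kScale = 160000;

// Encodes whole + num/den as an integer count of 1/kScale units.
// Precondition: den divides kScale, so the result is exact.
std::int64_t to_scaled(std::int64_t whole, std::int64_t num, std::int64_t den)
{
    assert(den > 0 && kScale % den == 0);
    return whole * kScale + num * (kScale / den);
}

// Recovers the (possibly fractional) value from its scaled form.
double from_scaled(std::int64_t scaled)
{
    return static_cast<double>(scaled) / kScale;
}
```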
> I have changed some names and namespaces from those in the original code to normalize it. I also have omitted exception throwing code and denominator validation logic. I haven't compiled since making those changes, so there may be minor mistakes in the attached code.
> The code references some things you won't have access to, so let me explain them so that you can make the necessary substitutions.
> - core::to_number() uses TMP to select among several overloads of a conversion function which are wrappers around strtol(), strtod(), etc. Note that specifying a conversion radix is important to avoid octal parsing in the custom code. (Otherwise, the custom code would need to account for leading zeroes in other ways.)
> - core::numeric_cast is a function template that converts a string to a numeric type. It uses TMP to select among several overloads of a conversion function which are wrappers around core::to_number() and which log a debug-level message and throw std::bad_cast on failure. boost::lexical_cast should be a slower equivalent.
> - ThreadLocal, as you can well infer, manages memory referenced by thread-local storage. (It mimics the interface of Rogue Wave's RWTThreadLocal.)
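The note about specifying a radix matters because the strtol() family, when called with base 0, detects the base from the prefix, so "010" parses as octal 8. A minimal sketch of a numeric_cast-style wrapper, assuming a `long long` target; `parse_int64` is a hypothetical name, and the debug-level logging of the real wrapper is omitted.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>
#include <typeinfo>  // std::bad_cast

// Hypothetical stand-in for core::numeric_cast<long long>: wraps strtoll()
// with an explicit base and throws std::bad_cast on failure.
long long parse_int64(const std::string& text)
{
    const char* begin = text.c_str();
    char* end = nullptr;
    errno = 0;
    // Base 10 is explicit: with base 0, "010" would be read as octal (8).
    const long long value = std::strtoll(begin, &end, 10);
    // Reject empty input, trailing characters, and out-of-range values.
    if (end == begin || *end != '\0' || errno == ERANGE)
        throw std::bad_cast();
    return value;
}
```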
> FYI, direct_impl/direct_ exists because I couldn't distinguish its function call operator from the one in to_price_impl/price_ without resorting to passing a dummy parameter. While it isn't strictly necessary, I chose to provide it because it avoids using double as an intermediate type.
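Building the scaled integer without a double intermediate can be done with integer arithmetic on the two digit runs. A minimal sketch, assuming non-negative, well-formed input with at most 12 fractional digits (so the intermediate product fits in 64 bits) and the 160,000 scale described above; `real_to_scaled` is hypothetical and is not the attached direct_impl.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::int64_t kScale = 160000;

// Parses "W" or "W.F" into 1/kScale units using only integer arithmetic.
std::int64_t real_to_scaled(const std::string& text)
{
    std::int64_t whole = 0, frac = 0, frac_den = 1;
    std::size_t i = 0;
    for (; i < text.size() && text[i] != '.'; ++i)
        whole = whole * 10 + (text[i] - '0');
    if (i < text.size())  // skip the '.'
        ++i;
    for (; i < text.size(); ++i) {
        assert(frac_den < 1000000000000LL && "at most 12 fractional digits");
        frac = frac * 10 + (text[i] - '0');
        frac_den *= 10;
    }
    // Scale the fraction and round half up (input is non-negative).
    return whole * kScale + (frac * kScale + frac_den / 2) / frac_den;
}
```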
> Notice that the rounding code assumes a positive value and that I manage the sign separately.
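Why the rounding needs a non-negative value: adding 0.5 and truncating rounds half up only for non-negative input, because the conversion to an integer truncates toward zero. A minimal sketch of managing the sign separately, with hypothetical function names and the 160,000 scale assumed from the description above.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kScale = 160000;

// Rounds by adding 0.5 and truncating; correct only for magnitude >= 0.
std::int64_t round_positive(double magnitude)
{
    assert(magnitude >= 0.0);
    return static_cast<std::int64_t>(magnitude * kScale + 0.5);
}

// Strips the sign, rounds the magnitude, then reapplies the sign,
// so -x always maps to the negation of x.
std::int64_t to_scaled_signed(double value)
{
    const bool negative = value < 0.0;
    const std::int64_t scaled = round_positive(negative ? -value : value);
    return negative ? -scaled : scaled;
}
```

Applying the add-0.5 trick directly to -1.75 would give trunc(-279999.5) = -279999 instead of -280000, which is the failure that separate sign handling avoids.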
> The custom version was tuned via profiling, which explains the different treatment of the sign between parsing reals and fractions.
I find this quite interesting. I wonder if I might have the time
tonight to make a Spirit2.1 version of this, the code would certainly
be a great deal shorter.
Just to make sure I understand the code: you are trying to parse a
number out of an ASCII string that could be an integer (64-bit, just
digits, always base 10) or a double, written either as digits, a
period, and more digits (each run parsed as an integer), or as a whole
integer followed by one or more spaces, then an int, a '/', and
another int. It also looks like a real number can have a 'g' after
it, but what is a 'g'? I know what an 'e' means, but 'g'? I am also
confused: your types seem to support int64 as well as double, but you
only ever return an int64, so why not a variant of both? Should I do
that for Spirit2.1? Spirit2.1 naturally wants to use such types
anyway, so it is actually easier for me, and the user would get a
more accurate value, either an int64 or a double depending on what
was parsed. I could also add other representations, such as a struct
of two int64s for a numerator/denominator, for the best accuracy.
What would you
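The variant-returning idea could look roughly like this in standard C++17, using `std::variant` rather than Spirit2.1 attributes. A minimal sketch, assuming well-formed, unsigned input and no validation; `Fraction`, `Number`, and `parse_number` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Keeps whichever form was parsed instead of one scaled int64.
struct Fraction {
    std::int64_t num;
    std::int64_t den;
};
using Number = std::variant<std::int64_t, double, Fraction>;

// Classifies "42", "2.5", "3/4", or "1 3/4" (space-separated mixed number).
Number parse_number(const std::string& s)
{
    const auto slash = s.find('/');
    if (slash != std::string::npos) {
        std::int64_t whole = 0;
        std::string::size_type num_pos = 0;
        const auto space = s.find(' ');
        if (space != std::string::npos && space < slash) {
            whole = std::stoll(s.substr(0, space));
            num_pos = s.find_first_not_of(' ', space);
        }
        const std::int64_t num = std::stoll(s.substr(num_pos, slash - num_pos));
        const std::int64_t den = std::stoll(s.substr(slash + 1));
        // Fold a mixed number into an improper fraction.
        return Fraction{whole * den + num, den};
    }
    if (s.find('.') != std::string::npos)
        return std::stod(s);
    return static_cast<std::int64_t>(std::stoll(s));  // std::stoll defaults to base 10
}
```

Leaving the fraction unreduced preserves exactly what was written; reducing by the GCD would be a small addition.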