Subject: Re: [boost] [convert] Performance
From: Joel de Guzman (djowel_at_[hidden])
Date: 2014-06-11 20:30:12


On 6/11/14, 7:55 PM, Vladimir Batov wrote:
> Joel de Guzman wrote
>> On 6/11/14, 2:58 PM, Vladimir Batov wrote:
>>> ...
>>> Thank you, Joel, for sharing your performance testing framework.
>>
>> My pleasure. I'm glad it helped. BTW, as I said before, you can use
>> the low-level spirit ...
>
> Thanks, very much appreciated. Spirit is such terra incognita for me,
> though, so I am still hoping Jeroen will jump in and do that. :-)
>
> Now, let me get back to the performance tests... You gave me this new "toy"
> to play with... so now I cannot stop spamming the list with my new findings.
> Apologies. Still, it seems important as there were concerns voiced about
> boost::convert() performance penalties.
>
> The results I posted before were for your vanilla performance tests... well,
> with an addition of 2 of my own tests:
>
> atoi_test: 2.2135431510 [s] {checksum: 730759d}
> strtol_test: 2.1688206260 [s] {checksum: 730759d}
> spirit_int_test: 0.5207926250 [s] {checksum: 730759d}
> spirit_new_test: 0.5803735980 [s] {checksum: 730759d}
> cnv_test: 0.6161884860 [s] {checksum: 730759d}
>
> However, I felt somewhat uneasy with the limited number (9) of strings
> tested. More importantly, I felt that the short strings were too heavily
> represented in the test. What I mean is, for example, there are only 10
> 1-digit strings out of an enormous sea of available numbers. That is,
> statistically, they are only
> 10 * 100% / (INT_MAX * 2) out of all available numbers. However, in the test
> they contributed 11% of the performance results. And I felt that short
> strings might be spirit's "speciality" so to speak. In other words, I felt
> that the test might use input data that favored spirit. So, I replaced
> that 9-string input set with 1000 randomly generated strings from the
> [INT_MIN, INT_MAX] range... and these are the results I got:

I do not think numbers drawn at random from the full int range are a
good representation of what's happening in the real world. In
the real world, especially with human-generated numbers(*), shorter
strings are of course more common.

(* e.g. programming languages, which, you are right to say, is
spirit's specialty).
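
To put rough numbers on this, here is a minimal sketch that counts how many
values in [INT_MIN, INT_MAX] have each decimal digit length. It assumes a
32-bit int, and the name digit_length_counts is made up for illustration; it
is not part of the test framework discussed here.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <limits>

// Count the values in [INT_MIN, INT_MAX] by decimal digit length (the sign is
// not counted as a digit). counts[d] holds the number of d-digit values;
// counts[0] is unused. Negative values are included, so there are 19
// one-digit values (-9..9) rather than the 10 non-negative ones.
inline std::array<std::int64_t, 11> digit_length_counts()
{
    static_assert(std::numeric_limits<int>::digits == 31,
                  "this sketch assumes a 32-bit int");

    std::array<std::int64_t, 11> counts{};
    std::int64_t const max_mag = std::numeric_limits<int>::max();
    std::int64_t const min_mag =
        -static_cast<std::int64_t>(std::numeric_limits<int>::min());

    std::int64_t lo = 0; // smallest magnitude with d digits (0 has 1 digit)
    std::int64_t hi = 9; // largest magnitude with d digits
    for (int d = 1; d <= 10; ++d)
    {
        // Non-negative values: magnitudes in [lo, min(hi, INT_MAX)].
        std::int64_t const pos_hi = std::min(hi, max_mag);
        if (pos_hi >= lo) counts[d] += pos_hi - lo + 1;

        // Negative values: magnitudes in [max(lo, 1), min(hi, |INT_MIN|)].
        std::int64_t const neg_lo = std::max<std::int64_t>(lo, 1);
        std::int64_t const neg_hi = std::min(hi, min_mag);
        if (neg_hi >= neg_lo) counts[d] += neg_hi - neg_lo + 1;

        lo = hi + 1;
        hi = hi * 10 + 9;
    }
    return counts;
}
```

Dividing each count by 2^32 gives its share of a uniform draw: roughly 53%
of the values have ten digits and roughly 42% have nine, so a uniformly
random input set consists almost entirely of the longest strings.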

BTW, the fact that smaller numbers occur more often in real life is
taken advantage of by some optimized encodings, such as Google
Protocol Buffers' Base 128 Varints, where smaller numbers
take fewer bytes. If the distribution were uniform,
then encodings such as Varints would not make sense.

(https://developers.google.com/protocol-buffers/docs/encoding)
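
For illustration, a minimal sketch of Base 128 varint encoding for unsigned
values, following the scheme described on that page. The function name
encode_varint is an assumption for this example and is not the Protocol
Buffers library API.

```cpp
#include <cstdint>
#include <vector>

// Encode an unsigned value as a Base 128 varint: 7 payload bits per byte,
// least-significant group first, with the high bit set on every byte
// except the last one.
inline std::vector<std::uint8_t> encode_varint(std::uint64_t value)
{
    std::vector<std::uint8_t> out;
    while (value >= 0x80)
    {
        // More groups follow: emit the low 7 bits with the continuation bit.
        out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    // Final group: continuation bit clear.
    out.push_back(static_cast<std::uint8_t>(value));
    return out;
}
```

With this sketch, any value up to 127 takes a single byte, 300 takes two
bytes (0xAC 0x02), and INT_MAX takes five, so the encoding only pays off
when small values dominate.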

> local::strtol_test: 312.5899575630 [s] {checksum: 7aa26f0b}
> local::old_spirit_test: 132.2640077370 [s] {checksum: 7aa26f0b}
> local::new_spirit_test: 148.1716253210 [s] {checksum: 7aa26f0b}
> local::cnv_test: 143.4929925850 [s] {checksum: 7aa26f0b}
>
> 1) With the original 9-string test spirit was 4 times faster than strtol;
> with 1000 strings the difference is down to about 2.5 times... which is what
> I've been getting: str-to-int spirit/strtol=1.45/3.76 seconds;
> 2) the overhead of "new_spirit_test" compared to "old_spirit_test" is still
> about 12%. What is important is that the only difference between the tests
> is 2 additional validity checks:
>
> struct old_spirit_test : test::base
> { ...
>     boost::spirit::qi::parse(beg, end, boost::spirit::qi::int_, n);
>     return n;
> }
> struct new_spirit_test : test::base
> { ...
>     if (boost::spirit::qi::parse(beg, end, boost::spirit::qi::int_, n))
>         if (beg == end)
>             return n;
>
>     return (BOOST_ASSERT(0), 0);
> }
>
> It seems that Spirit is really testing the speed limits, given that other
> operations start playing a visible role... like those (necessary IMO) checks
> adding 12%!
>
> 3) The "funny" (as you mentioned before) part is that with the 1000-string set
> the cnv_test is again better than the raw new_spirit_test (which has the same
> process flow as cnv_test). That's what my tests have been showing all along
> (although they are run against a 10000000-string input set):
>
> str-to-int spirit: raw/cnv=1.45/1.44 seconds (99.05%).
> str-to-int spirit: raw/cnv=1.45/1.44 seconds (99.00%).
> str-to-int spirit: raw/cnv=1.45/1.44 seconds (99.01%).
> str-to-int spirit: raw/cnv=1.45/1.44 seconds (99.07%).
> str-to-int spirit: raw/cnv=1.45/1.44 seconds (99.59%).
>
> All compiled with gcc-4.8.2
>
> I personally have no explanation for that, but at least I feel that the
> boost::convert() framework does not result in performance degradation as we
> were concerned it might... it seems to be the opposite.

Shrug. Obviously, there's something wrong with that picture, but
I am not sure what. It may be that what's happening here is that some
other overhead(*) outweighs the actual conversion by a large
factor at that scale and that is what you are actually seeing.

(* E.g. string operations)

Regards,

-- 
Joel de Guzman
http://www.ciere.com
http://boost-spirit.com
http://www.cycfi.com/
