Subject: Re: [boost] [xpressive] Performance Tuning?
From: OvermindDL1 (overminddl1_at_[hidden])
Date: 2009-07-18 07:00:44
On Sat, Jul 18, 2009 at 2:13 AM, Eric Niebler<eric_at_[hidden]> wrote:
> Michael Caisse wrote:
>>
>> OvermindDL1 wrote:
>>>
>>> Parsing: 42.5
>
> <snip>
>>>
>>> spirit-grammar(threadsafe/reusable): 3.1393
>>
>> Thank you for pulling this together. Would you mind sharing your test
>> suite?
Er, I meant to attach it; it is attached now. :)
It requires Boost trunk. The timer header I include is part of the
Boost.Spirit2.1 examples/test/somewhere_in_there area, but I included
it alongside my .cpp file so you do not need to hunt for it.
The defines at the top control which parts get compiled: 0 disables
compiling that part, 1 enables it.
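The on/off switches described above follow the usual preprocessor pattern. A minimal sketch, using made-up macro names (BENCH_XPRESSIVE, BENCH_SPIRIT_QUICK) rather than the ones in the attached file:

```cpp
#include <string>
#include <vector>

// Hypothetical switch names: 1 compiles that benchmark in, 0 leaves it out.
// They can be edited here or overridden on the command line (-D or /D).
#ifndef BENCH_XPRESSIVE
#define BENCH_XPRESSIVE 0
#endif
#ifndef BENCH_SPIRIT_QUICK
#define BENCH_SPIRIT_QUICK 1
#endif

// Lists which benchmarks were compiled into this build.
std::vector<std::string> enabled_benchmarks() {
    std::vector<std::string> names;
#if BENCH_XPRESSIVE
    names.push_back("xpressive");
#endif
#if BENCH_SPIRIT_QUICK
    names.push_back("spirit-quick");
#endif
    return names;
}
```

Code guarded by a 0 switch is never compiled at all, so a disabled benchmark costs nothing in build time or binary size.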
I built it with Visual Studio 8 (2005) SP1. Compiler options are
basically the defaults, except that I turned off the secure CRT stuff
that Microsoft screwed up (enabling it slows down Spirit parsers on my
system a *lot*). The exe I built is in the attached 7zip file. As I
mentioned before, I have heard that Visual Studio handles template-heavy
code like Spirit better than GCC, so I am very curious what GCC's
timings on this file would be. There are still more changes I intend to
make, but first I really want the original code in a form I can use.
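The post does not name the exact setting. If the "secure CRT" option that hurts speed is MSVC's checked iterators (an assumption), VS2005 and VS2008 control it with the `_SECURE_SCL` macro, which defaults to 1 even in release builds. A config sketch of turning it off:

```cpp
// Assumption: this is the checked-iterator ("Secure SCL") switch of
// VS2005/VS2008. It must appear before any standard library header and must
// have the same value in every translation unit and every linked library,
// so in practice it is usually set project-wide (e.g. /D_SECURE_SCL=0).
#if defined(_MSC_VER) && !defined(_SECURE_SCL)
#define _SECURE_SCL 0
#endif

#include <string>
#include <vector>
```

Mixing translation units built with different `_SECURE_SCL` values can break at run time, because the iterator and container layouts differ, so the project-wide define is the safer route.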
To be honest, I had to replace the core::to_number lines (left in,
commented out) with boost::lexical_cast (right below each commented
line), so the xpressive version could be slightly faster if I actually
had the core::to_number implementation and it was well made. The
xpressive code also emits a 100-line warning in my build log, all about
a conversion from double to __int64. I have no clue how to fix that
since I do not know xpressive, so I would be glad if someone could get
rid of that nasty warning in my otherwise clean build log. With my
compiler, my Spirit2.1 grammar builds perfectly clean, and I would like
xpressive to do the same.
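Since the real core::to_number was not available, here is a hypothetical stand-in, `to_number_sketch`, built on `std::strtod`. As far as I know, `boost::lexical_cast` in the Boost releases of this period converts through a `std::stringstream`, so converting directly from the character buffer avoids that stream machinery. The second helper, `price_to_fixed` (also hypothetical, assuming the parsed numbers end up as scaled 64-bit integers), shows the usual way to quiet a double-to-integer warning such as MSVC's C4244: make the narrowing explicit with `static_cast`. This is a sketch, not the xpressive code from the attachment.

```cpp
#include <cmath>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for core::to_number: converts with std::strtod,
// no stringstream involved. Returns false unless the whole text is a number.
bool to_number_sketch(const std::string& text, double& out) {
    if (text.empty()) return false;
    const char* begin = text.c_str();
    char* end = nullptr;
    out = std::strtod(begin, &end);
    // Reject trailing garbage such as "42.5x".
    return end == begin + text.size();
}

// Hypothetical: turn a price such as "42.5" into a fixed-point 64-bit value
// (scale 100 = hundredths). The explicit static_cast marks the double ->
// integer narrowing as intentional, which is what silences C4244.
bool price_to_fixed(const std::string& text, long long scale, long long& out) {
    double value = 0.0;
    if (!to_number_sketch(text, value)) return false;
    // Round to nearest so a product landing just below a whole number
    // (e.g. 1.005 * 1000) is not truncated one unit low.
    out = static_cast<long long>(std::floor(value * static_cast<double>(scale) + 0.5));
    return true;
}
```

For example, `price_to_fixed("42.5", 100, v)` stores 4250 in `v`.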
I honestly do not know *why* the Spirit version is so much faster than
the xpressive version. I whipped up the spirit-quick version (the
non-threadsafe one) in about 2 minutes, the threadsafe version took
about 5 minutes, and the grammar/threadsafe/reusable version took about
10 minutes, whereas I know a lot more work went into the xpressive
version, especially with the auto macros and all. I would love it if
someone could find out why. If someone else with MSVC, and someone with
GCC and perhaps other compilers, could build it and post the results it
prints, I would much appreciate it. I do have a Linux computer here,
but, to be honest, I have no clue what to pass to gcc to build
something; the command-line switches I pass to MSVC are rather
monstrous, so converting them to GCC's seems nightmarish from my point
of view.
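The post does not say what makes the spirit-quick version non-threadsafe. One common reason, and only a guess here based on the `void (__cdecl*)(double)` semantic action in the spirit-quick profile further down, is that a plain function-pointer callback can only hand its result back through shared state. This sketch contrasts that with a callback that writes into storage owned by the caller; `parse_number`, `store_global` and `store_into` are hypothetical, and the "parser" is plain `std::strtod`, not Spirit.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a parser with a semantic action: converts the
// text with std::strtod and hands the value to `action`.
template <typename Action>
bool parse_number(const std::string& text, Action action) {
    const char* begin = text.c_str();
    char* end = nullptr;
    const double value = std::strtod(begin, &end);
    if (end == begin) return false;  // no number at the start of the text
    action(value);
    return true;
}

// Function-pointer style: a free function has nowhere to put its result
// except shared state, so two threads parsing at once race on g_last_value.
double g_last_value = 0.0;
void store_global(double value) { g_last_value = value; }

// Caller-owned style: the function object carries a reference to the
// caller's variable, so concurrent parses never share storage.
struct store_into {
    double& out;
    void operator()(double value) const { out = value; }
};
```

Usage: `parse_number(text, &store_global)` versus `double price; parse_number(text, store_into{price});`.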
On Sat, Jul 18, 2009 at 2:13 AM, Eric Niebler<eric_at_[hidden]> wrote:
> Yes, please. I know Spirit2 is great tech, but I have to wonder how it's
> over 10X faster than the hand-coded parser.
I have not tested the hand-coded parser, as I cannot get it to
compile. If you can get me a complete, standalone version of it, I
would be very happy. :)
Either way, Windows users, could you please run the attached exe (it
is in the 7zip file) and paste the results it prints into a reply to
this thread, along with your Windows version and basic hardware?
Before I attach this, I am going to run the release exe through a
profiler right quick.
With 1000000 iterations (only one million, so the xpressive version
does not take so long) and only the xpressive version enabled, these
are the top 10 functions by time:
CS:EIP    CPU clocks  IPC   DC miss rate  DTLB L1M L2M rate  Misalign rate  Mispredict rate  Symbol + Offset
0x421860  2248        1.98  0             0                  0              0                strcmp
0x42bc84  1196        1.1   0             0                  0.02           0.01             __strgtold12_l
0x4068a0  744         1.06  0             0                  0              0.02             std::operator<<<std::char_traits<char> >
0x41d864  686         0.03  0.11          0                  0              0                TrailUpVec
0x40e0e0  571         0.94  0             0                  0              0.01             std::num_get<char,std::istreambuf_iterator<char,std::char_traits<char> > >::_Getffld
0x42d344  447         2.2   0             0                  0              0                __mtold12
0x4170a0  406         0.38  0             0                  0.05           0.08             std::basic_istream<char,std::char_traits<char> >::operator>>
0x414150  358         1.36  0             0                  0              0                boost::xpressive::detail::posix_charset_matcher<boost::xpressive::cpp_regex_traits<char> >::match<std::_String_const_iterator<char,std::char_traits<char>,std::allocator<char> >,boost::xpressive::detail::static_xpression<boost::xpressive::detail::true_matcher,
0x419231  334         0.26  0             0                  0              0                std::_Lockit::~_Lockit
0x42b200  333         1.05  0             0                  0.01           0.01             _ld12tod
10 functions, 700 instructions, Total: 48191 samples, 50.01% of samples in the module, 31.99% of total session samples
So it looks like strcmp is massively hobbling it, taking almost twice
the time of the next-highest function (2248 vs. 1196 in the CPU clocks
column). Now for 1000000 (one million) iterations of just the
spirit-quick version (every function is listed; surprisingly few):
CS:EIP    CPU clocks  IPC   DC miss rate  DTLB L1M L2M rate  Misalign rate  Mispredict rate  Symbol + Offset
0x4188c9  358         1.04  0             0                  0              0                _pow_pentium4
0x404d70  116         1.71  0             0                  0              0                ??$phrase_parse@PBDU?$expr@Ubitwise_or@tag@proto@boost@@U?$list2@ABU?$expr@Ushift_right@tag@proto@boost@@U?$list2@ABU?$expr@Ushift_right@tag@proto@boost@@U?$list2@ABU?$expr@Usubscript@tag@proto@boost@@U?$list2@ABU?$terminal073d7121f2c9203b84cbac5f1ea1214c
0x405080  76          1.21  0             0                  0              0                boost::spirit::qi::detail::real_impl<double,boost::spirit::qi::real_policies<double> >::parse<char const *,double>
0x405f90  68          2.35  0             0                  0              0                boost::spirit::qi::detail::extract_int<__int64,10,1,-1,boost::spirit::qi::detail::positive_accumulator<10>,0>::parse_main<char const *,__int64>
0x405550  66          1.82  0             0                  0              0                boost::spirit::qi::detail::extract_int<double,10,1,-1,boost::spirit::qi::detail::positive_accumulator<10>,0>::parse_main<char const *,double>
0x4053e0  63          1.14  0             0                  0              0                boost::spirit::qi::detail::`anonymous namespace'::scale_number<double>
0x404300  62          1.31  0             0                  0              0.03             parse_price_spirit_quick<char const *>
0x4054e0  59          1.78  0             0                  0              0                boost::spirit::qi::detail::fail_function<char const *,boost::fusion::unused_type const ,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::blank,boost::spirit::char_encoding::ascii> > >::operator()<boost::spirit::qi::action<boost:
0x404f30  58          1.59  0             0                  0              0                boost::spirit::qi::skip_over<char const *,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::blank,boost::spirit::char_encoding::ascii> > >
0x417b90  48          0.67  0             0                  0              0                floor
0x417b16  46          2.37  0             0                  0              0                _ftol2
0x4018f0  42          0.86  0             0                  0              0                dotNumber
0x404fa0  41          1.12  0             0                  0              0                boost::spirit::qi::action<boost::spirit::qi::real_parser_impl<double,boost::spirit::qi::real_policies<double> >,void (__cdecl*)(double)>::parse<char const *,boost::fusion::unused_type const ,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::s
0x405660  31          1.29  0             0                  0              0                boost::spirit::qi::detail::extract_int<double,10,1,-1,boost::spirit::qi::detail::positive_accumulator<10>,1>::parse_main<char const *,double>
0x417890  31          1.68  0             0                  0              0                _CIpow
0x405af0  29          0.48  0             0                  0              0                boost::spirit::qi::int_parser_impl<__int64,10,1,-1>::parse<char const *,boost::fusion::unused_type const ,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::blank,boost::spirit::char_encoding::ascii> >,__int64>
0x405010  27          1.04  0             0                  0              0                boost::spirit::qi::action<boost::spirit::qi::real_parser_impl<double,boost::spirit::qi::real_policies<double> >,void (__cdecl*)(double)>::parse<char const *,boost::fusion::unused_type const ,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::s
0x4174c0  27          1     0             0                  0              0                _allmul
0x405b60  25          1     0             0                  0              0                boost::spirit::qi::not_predicate<boost::spirit::qi::literal_char<boost::spirit::char_encoding::standard,1,0> >::parse<char const *,boost::fusion::unused_type const ,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::blank,boost::s
0x404ec0  23          0.17  0             0                  0              0.12             bo$phrase_parse@PBDU?$expr@Ubitwise_or@tag@proto@boost@@U?$list2@ABU?$expr@Ushift_right@tag@proto@boost@@U?$list2@ABU?$expr@Ushift_right@tag@proto@boost@@U?$list2@ABU?$expr@Usubscript@tag@proto@boost@@U?$list2@ABU?$terminal073d7121f2c9203b84cbac5f1ea1214c
0x417bd0  17          0.24  0             0                  0              0                _floor_pentium4
0x4188b0  14          0     0             0                  0              0                _CIpow_pentium4
0x401970  9           0.11  0             0                  0              0.3              main
0x404f10  4           0     0             0                  0              0                boost::spirit::qi::skip_over<char const *,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::blank,boost::spirit::char_encoding::ascii> > >
0x40cc02  1           0     0             0                  0              0                _flsbuf
0x40e8b0  0           0     0             0                  0              0                __SEH_prolog4
26 functions, 447 instructions, Total: 6513 samples, 100.00% of samples in the module, 69.20% of total session samples
Now the same run with the spirit grammar version, since it is so much
slower than the quick one for some reason (again every function is
listed; not that many):
CS:EIP    CPU clocks  IPC   DC miss rate  DTLB L1M L2M rate  Misalign rate  Mispredict rate  Symbol + Offset
0x419909  365         0.97  0             0                  0              0                _pow_pentium4
0x4056a0  129         1.19  0             0                  0              0.02             boost::function4<bool,char const * &,char const * const &,boost::spirit::context<boost::fusion::cons<__int64 &,boost::fusion::nil>,boost::fusion::vector0<void> > &,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::blank,boost::sp
0x405780  99          1.12  0             0                  0              0.03             boost::detail::function::function_obj_invoker4<boost::spirit::qi::detail::parser_binder<boost::spirit::qi::alternative<boost::fusion::cons<boost::spirit::qi::reference<boost::spirit::qi::rule<char const *,__int64 __cdecl(void),boost::proto::exprns_::expr<boos
0x406f50  81          1.28  0             0                  0              0                boost::spirit::qi::detail::extract_int<__int64,10,1,-1,boost::spirit::qi::detail::positive_accumulator<10>,0>::parse_main<char const *,__int64>
0x406100  77          1.38  0             0                  0              0                boost::spirit::qi::detail::real_impl<double,boost::spirit::qi::real_policies<double> >::parse<char const *,double>
0x406bc0  77          0.87  0             0                  0              0.04             boost::spirit::qi::rule<char const *,__int64 __cdecl(void),boost::proto::exprns_::expr<boost::proto::tag::terminal,boost::proto::argsns_::term<boost::spirit::tag::char_code<boost::spirit::tag::blank,boost::spirit::char_encoding::ascii> >,0>,boost::fusion::unu
0x406c30  74          1.61  0             0                  0              0                boost::spirit::qi::action<boost::spirit::qi::int_parser_impl<__int64,10,1,-1>,boost::phoenix::actor<boost::phoenix::composite<boost::phoenix::assign_eval,boost::fusion::vector<boost::spirit::attribute<0>,boost::phoenix::composite<boost::phoenix::multiplies_ev
0x406620  64          1.22  0             0                  0              0                boost::spirit::qi::detail::extract_int<double,10,1,-1,boost::spirit::qi::detail::positive_accumulator<10>,0>::parse_main<char const *,double>
0x4050b0  56          0.29  0             0                  0              0.11             boost::spirit::qi::phrase_parse<char const *,price_grammar<char const *>,boost::proto::exprns_::expr<boost::proto::tag::terminal,boost::proto::argsns_::term<boost::spirit::tag::char_code<boost::spirit::tag::blank,boost::spirit::char_encoding::ascii> >,0>,__in
0x406460  53          1.79  0             0                  0              0                boost::spirit::qi::detail::`anonymous namespace'::scale_number<double>
0x405810  52          1.98  0             0                  0              0.02             boost::detail::function::function_obj_invoker4<boost::spirit::qi::detail::parser_binder<boost::spirit::qi::alternative<boost::fusion::cons<boost::spirit::qi::reference<boost::spirit::qi::rule<char const *,__int64 __cdecl(void),boost::proto::exprns_::expr<boos
0x418b56  50          1.68  0             0                  0              0                _ftol2
0x401940  45          0.67  0             0                  0              0.04             main
0x405fe0  43          1.19  0             0                  0              0                boost::spirit::traits::action_dispatch<boost::spirit::qi::real_parser_impl<double,boost::spirit::qi::real_policies<double> > >::operator()<dot_number_to_long_long_function,double,boost::spirit::context<boost::fusion::cons<__int64 &,boost::fusion::nil>,boost::
0x405f70  41          0.83  0             0                  0              0                boost::spirit::qi::action<boost::spirit::qi::real_parser_impl<double,boost::spirit::qi::real_policies<double> >,dot_number_to_long_long_function>::parse<char const *,boost::spirit::context<boost::fusion::cons<__int64 &,boost::fusion::nil>,boost::fusion::vecto
0x405930  36          2     0             0                  0              0                boost::detail::function::function_obj_invoker4<boost::spirit::qi::detail::parser_binder<boost::spirit::qi::action<boost::spirit::qi::real_parser_impl<double,boost::spirit::qi::real_policies<double> >,dot_number_to_long_long_function>,boost::mpl::bool_<0> >,bo
0x418bd0  34          1.12  0             0                  0              0                floor
0x405e60  33          0.15  0             0                  0              0.28             boost::spirit::qi::action<boost::spirit::qi::real_parser_impl<double,boost::spirit::qi::real_policies<double> >,dot_number_to_long_long_function>::parse<char const *,boost::spirit::context<boost::fusion::cons<__int64 &,boost::fusion::nil>,boost::fusion::vecto
0x4182a0  33          3.42  0             0                  0              0                _allmul
0x4188d0  27          0.52  0             0                  0              0                _CIpow
0x406730  26          2.62  0             0                  0              0                boost::spirit::qi::detail::extract_int<double,10,1,-1,boost::spirit::qi::detail::positive_accumulator<10>,1>::parse_main<char const *,double>
0x406560  19          0.16  0             0                  0              0                boost::spirit::qi::int_parser_impl<__int64,10,1,-1>::parse<char const *,boost::spirit::context<boost::fusion::cons<__int64 &,boost::fusion::nil>,boost::fusion::vector0<void> >,boost::spirit::qi::char_class<boost::spirit::tag::char_code<boost::spirit::tag::bla
0x418c10  16          0     0             0                  0              0                _floor_pentium4
0x406ca0  11          0.36  0             0                  0              0                boost::spirit::qi::not_predicate<boost::spirit::qi::literal_char<boost::spirit::char_encoding::standard,1,0> >::parse<char const *,boost::spirit::context<boost::fusion::cons<__int64 &,boost::fusion::nil>,boost::fusion::vector0<void> >,boost::spirit::qi::char_
0x4198f0  11          0     0             0                  0              0                _CIpow_pentium4
0x40b090  1           0     0             0                  0              0                _flush
26 functions, 451 instructions, Total: 7342 samples, 100.00% of samples in the module, 71.73% of total session samples
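In the grammar profile, `boost::function4` and `function_obj_invoker4` sit near the top. A `qi::rule` stores its parser in a `boost::function`, so every rule call goes through a type-erased, indirect invocation, while the quick version's inline parser expression is a single concrete type the compiler can see through. That is one plausible contributor to the gap, not a measured conclusion. A toy sketch of the two call shapes, with `std::function` standing in for `boost::function` and a hypothetical `digits_parser`:

```cpp
#include <functional>

// Hypothetical toy parser: consumes leading decimal digits.
struct digits_parser {
    bool operator()(const char*& first, const char* last) const {
        const char* start = first;
        while (first != last && *first >= '0' && *first <= '9') ++first;
        return first != start;
    }
};

// Inline-expression shape: the parser's concrete type is a template
// parameter, so the compiler can inline the call.
template <typename Parser>
bool parse_direct(const Parser& parser, const char*& first, const char* last) {
    return parser(first, last);
}

// Rule shape: the parser is hidden behind a type-erased wrapper, so each
// call is an indirect call through the wrapper's stored invoker.
typedef std::function<bool(const char*&, const char*)> erased_parser;

bool parse_erased(const erased_parser& parser, const char*& first, const char* last) {
    return parser(first, last);
}
```

Both calls do the same parsing work; the only difference is whether the parser's type is visible at the call site, which is what lets the optimizer inline it.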