Subject: Re: [boost] [config] RFC PR 82
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2015-12-01 06:10:33


On 2015-12-01 09:18, Domagoj Šarić wrote:
> On Wed, 25 Nov 2015 01:50:40 +0530, Andrey Semashev
> <andrey.semashev_at_[hidden]> wrote:
>>
>> As far as I understand, sealed can be used only with C++/CLR, is that
>> right? If so then I'd rather not add a macro for it.
>>
>> If on the other hand sealed can be used equivalently to final in all
>> contexts, then you could use it to implement BOOST_FINAL.
>
> It is available in C++ but technically it is not _the_ C++ keyword but an
> extension with the same purpose so 'purists' might mind that
> BOOST_NO_CXX11_FINAL is not defined even when BOOST_FINAL is defined to
> sealed instead of final...

Again, if sealed is equivalent to final in all contexts then I don't
mind BOOST_FINAL expanding to sealed. Otherwise think of a separate
macro for sealed.
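Here is a minimal sketch of the fallback discussed above. It assumes that MSVC accepts `sealed` on native classes from MSVC 2005 (`_MSC_VER` 1400) and the standard `final` from MSVC 2012 (`_MSC_VER` 1700); both cut-offs are assumptions. The macro name `EXAMPLE_FINAL` and the `shape`/`square` types are hypothetical. They are not the macro proposed in the PR.

```cpp
#include <cassert>

// Hypothetical macro: the standard keyword where available, MSVC's 'sealed'
// extension on older MSVC (assumed version cut-offs), nothing otherwise.
#if __cplusplus >= 201103L || (defined(_MSC_VER) && _MSC_VER >= 1700)
#  define EXAMPLE_FINAL final
#elif defined(_MSC_VER) && _MSC_VER >= 1400
#  define EXAMPLE_FINAL sealed
#else
#  define EXAMPLE_FINAL
#endif

struct shape
{
    virtual ~shape() {}
    virtual int sides() const { return 0; }
};

// When the macro expands to 'final' or 'sealed', deriving from 'square'
// is rejected by the compiler.
struct square EXAMPLE_FINAL : shape
{
    int sides() const { return 4; }
};

inline int count_sides(const shape& s) { return s.sides(); }
```

The sketch only exercises the class-level use. It does not settle whether `sealed` matches `final` in every context (e.g. on virtual member functions), which is the question raised above.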

>>>> I don't see the benefit of BOOST_NOTHROW_LITE.
>>>
>>> It's a nothrow attribute that does not insert runtime checks to call
>>> std::terminate...and it is unfortunately not offered by Boost.Config...
>>
>> Do you have measurements of the possible benefits compared to noexcept?
>> I mean, noexcept was advertised as the more efficient version of
>> throw() already.
>
> What more measurements beyond the disassembly window which clearly shows
> unnecessary EH codegen (i.e. bloat) are necessary?

I'll reiterate: what are the practical benefits? I don't care about a
couple of instructions there or not there - I will never see them in
performance numbers or binary size.
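The following sketch makes the disputed difference concrete. It uses a hypothetical `EXAMPLE_NOTHROW_LITE` macro, which stands in for the proposed BOOST_NOTHROW_LITE but does not reproduce it. The macro maps to GCC/Clang's `__attribute__((nothrow))` and MSVC's `__declspec(nothrow)`. Both promise the compiler that no exception escapes. Unlike `noexcept`, they do not require a call to `std::terminate` if one does; the behaviour is undefined instead.

```cpp
#include <cassert>

// Hypothetical macro: a "no exception escapes" hint without the
// std::terminate guarantee that noexcept carries.
#if defined(_MSC_VER)
#  define EXAMPLE_NOTHROW_LITE __declspec(nothrow)
#elif defined(__GNUC__)
#  define EXAMPLE_NOTHROW_LITE __attribute__((nothrow))
#else
#  define EXAMPLE_NOTHROW_LITE
#endif

// Attribute-based hint: an optimiser hint only, not part of the function's
// exception specification.
EXAMPLE_NOTHROW_LITE int add_lite(int a, int b) { return a + b; }

// Standard alternative: if an exception ever escaped, std::terminate
// would be called.
int add_noexcept(int a, int b) noexcept { return a + b; }
```

This exchange leaves open whether the avoided EH code ever shows up in benchmarks or binary size. The sketch shows only the mechanism.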

>>>> I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the
>>>> same effect can be achieved in pure C++.
>>>
>>> You mean creating a class template with a single dummy template argument
>>> and a static data member just so that you can define a global variable
>>> in a header w/o linker errors?
>>
>> Slightly better:
>>
>> template< typename T, typename Tag = void >
>> struct singleton
>> {
>> static T instance;
>> };
>> template< typename T, typename Tag >
>> T singleton< T, Tag >::instance;
>
> That's what I meant...and it is really verbose (and slower to compile than
> a compiler-specific attribute)...

I won't argue about compilation speeds, although I doubt that the
difference (in either favor) is measurable. As for verbosity, the above
code needs to be written only once.
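Below is a self-contained version of the quoted idiom, with hypothetical tag types and accessor functions. The idiom relies on one rule: the definition of a static data member of a class template may appear in every translation unit that includes the header without violating the one-definition rule. That rule is what lets it replace a compiler-specific "overridable symbol" attribute.

```cpp
#include <cassert>
#include <string>

// The definition below may live in a header included by many translation
// units; repeated definitions of a class template's static member are allowed.
template< typename T, typename Tag = void >
struct singleton
{
    static T instance;
};
template< typename T, typename Tag >
T singleton< T, Tag >::instance;

// Hypothetical tags: each (T, Tag) pair names a distinct global object.
struct request_count_tag {};
struct app_name_tag {};

inline int& request_count() { return singleton< int, request_count_tag >::instance; }
inline std::string& app_name() { return singleton< std::string, app_name_tag >::instance; }
```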

>>>> Calling conventions macros are probably too specialized to functional
>>>> libraries, I don't think there's much use for these. I would rather
>>>> not have them in Boost.Config to avoid spreading their use to other
>>>> Boost libraries.
>>>
>>> That's kind of self-contradicting, if there is a 'danger' of them being
>>> used in other libraries that would imply there is a 'danger' from them
>>> being useful...
>>
>> What I mean is that having these macros in Boost.Config might
>> encourage people to use them where they would normally not.
>
> The same as above...I don't see a problem? If they are useful - great, if
> not and people still use them - 'we have bigger problems'...

[snip]

> There is no 'standard' calling convention, just the 'default' one...and
> what headache can a non-default c.convention in an API cause (e.g. the
> whole Win32 and NativeNT APIs use the non-default stdcall convention)?

By using non-default calling conventions you're forcing your users out
of the standard C++ land. E.g. the user won't be able to store an
address of your function without resorting to compiler-specific keywords
or macros to specify the calling convention. It complicates integration
of your library with other code. I'd rather ban non-default calling
conventions at the API level altogether.
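This sketch illustrates the integration cost described above on MSVC, where `__stdcall` changes the ABI on 32-bit x86 (it is accepted but ignored on x64). The macro `EXAMPLE_CC` and the function `library_twice` are hypothetical. On other compilers the macro is left empty so the snippet still builds.

```cpp
#include <cassert>

// Hypothetical macro for a non-default calling convention (MSVC only here).
#if defined(_MSC_VER)
#  define EXAMPLE_CC __stdcall
#else
#  define EXAMPLE_CC
#endif

// A library function declared with a non-default convention...
int EXAMPLE_CC library_twice(int x) { return 2 * x; }

// ...forces every user who stores its address to spell the convention too.
// On 32-bit MSVC, &library_twice does not convert to a plain int (*)(int).
typedef int (EXAMPLE_CC *library_twice_ptr)(int);
```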

>> You might use them in library internals but there I think it's better
>> to avoid the call at all - by forcing the hot code inline.
>
> Enter bloatware...
> A statically dispatched call to a 'near' function has near zero overhead
> for any function with half-a-dozen instructions _if_ it (i.e. the
> ABI/c.convention) does not force the parameters to ping-pong through the
> stack...
> Forceinlining is just a primitive bruteforce method in such
> cases...which eventually makes things even worse (as this 'bloatware
> ignoring' way of thinking is certainly a major factor why the dual-core
> 1GB RAM netbook I'm typing on now slows down to a crawl from paging when
> I open gmail and 3 more tabs...).

There are different kinds of bloat. Force-inlining critical functions of
your program will hardly make a significant difference to the total
binary size, unless used unwisely or you're in the hardcore embedded world
where every byte counts.
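For reference, a force-inline macro is usually spelled roughly as shown below. Boost.Config already offers BOOST_FORCEINLINE for this purpose. The `EXAMPLE_FORCEINLINE` macro and the `rotl32` helper are hypothetical stand-ins that keep the snippet self-contained.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro: inline regardless of the compiler's cost heuristics.
#if defined(_MSC_VER)
#  define EXAMPLE_FORCEINLINE __forceinline
#elif defined(__GNUC__)
#  define EXAMPLE_FORCEINLINE inline __attribute__((always_inline))
#else
#  define EXAMPLE_FORCEINLINE inline
#endif

// A small hot helper: once inlined, no call (and no calling convention)
// remains at the call site.
EXAMPLE_FORCEINLINE std::uint32_t rotl32(std::uint32_t x, unsigned r)
{
    r &= 31u;
    return r == 0u ? x : static_cast<std::uint32_t>((x << r) | (x >> (32u - r)));
}
```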

> For dynamically dispatched calls (virtual functions) choosing the
> appropriate c.convention and decorating the function with as many
> relevant attributes as possible is even more important (as the dynamic
> dispatch is a firewall for the optimiser and it has to assume that the
> function 'accesses&throws the whole universe')...

My point was that one should avoid dynamic dispatch in hot code in the
first place. Otherwise you're healing a dead horse. Argument passing has
little effect compared to a failure to predict the jump target. Even
when the target is known statically (i.e. non-virtual function call) the
effect of the call can be significant if it's on the hot path -
regardless of the calling convention.
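The next sketch moves a hot loop from dynamic to static dispatch, as suggested above. All names are hypothetical. In the template version the call target is known at compile time, so the compiler can inline it and no calling convention is involved. In the virtual version the compiler generally cannot inline the call unless it manages to devirtualize it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dynamic dispatch: one indirect call per element.
struct transform
{
    virtual ~transform() {}
    virtual int apply(int x) const = 0;
};

struct add_one : transform
{
    int apply(int x) const { return x + 1; }
};

inline long sum_dynamic(const std::vector<int>& v, const transform& t)
{
    long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += t.apply(v[i]);
    return s;
}

// Static dispatch: the callable's type is a template parameter, so the
// target is known at compile time and can be inlined into the loop.
struct add_one_fn
{
    int operator()(int x) const { return x + 1; }
};

template< typename F >
long sum_static(const std::vector<int>& v, F f)
{
    long s = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += f(v[i]);
    return s;
}
```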

>> If that code is unimportant then why do you care?
>
> Already explained above - precisely because it is unimportant it is
> important that it be compiled for size (and possibly moved to the 'cold'
> section of the binary) to minimise its impact on the performance of the
> code that does matter; loading speed of the binary; virtual memory; disk
> space, fragmentation and IO...

I think you're reaching here. Modern OSs don't 'load' binaries, but map
them into address space. The pages are loaded on demand, and the typical
page size is 4k - you'd have to save at least 4k of code to measure the
difference, let alone feel it. Virtual address space is not an issue
unless you're on a 32-bit system, which is only widespread in the
embedded area. Disk space consumed by data exceeds that consumed by code
by orders of magnitude, which in turn shows in IO, memory and other
related stuff. And the net effect of these optimization attributes on a
real program is yet to be seen.
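Here is a sketch of what a cold-path attribute could look like. It uses GCC/Clang's `cold` and `noinline` attributes, with only `__declspec(noinline)` on MSVC. `EXAMPLE_COLD`, `report_bad_input` and `checked_half` are hypothetical. According to GCC's documentation, `cold` functions are optimized for size and, on many targets, grouped into a separate text subsection, which is the effect argued for in this thread. Whether that effect is measurable in a real program remains open.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical cold-path macro: keep the function out of line and, on
// GCC/Clang, optimize it for size and treat paths leading to it as unlikely.
#if defined(__GNUC__)
#  define EXAMPLE_COLD __attribute__((cold, noinline))
#elif defined(_MSC_VER)
#  define EXAMPLE_COLD __declspec(noinline)
#else
#  define EXAMPLE_COLD
#endif

EXAMPLE_COLD void report_bad_input(int value)
{
    std::fprintf(stderr, "bad input: %d\n", value);
}

inline int checked_half(int value)
{
    if (value < 0)
    {
        report_bad_input(value); // rarely taken, kept out of the hot code
        return 0;
    }
    return value / 2;
}
```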

>> Simply organizing code into functions properly and using
>> BOOST_LIKELY/UNLIKELY where needed will do the thing.
>
> No it will not (at least not w/o PGO)

These hints don't require PGO; they work without it.

> as the compiler cannot deduce
> these things (except for simple scenarios like assuming all noreturn
> functions are cold)...and saying that we can/should then help it with
> BOOST_LIKELY while arguing that we shouldn't help it with
> BOOST_COLD/MINSIZE/OPTIMIZE_FOR_* is 'beyond self contradicting'...

The difference is the amount of effort you have to put into it and the
resulting portability and effect.

The other difference is in the amount of control the user has over the
resulting code compilation. This important point you seem to disregard.
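This sketch shows the alternative proposed above: the rare path lives in its own function, and the branch carries a hint in the spirit of Boost.Config's BOOST_UNLIKELY. `EXAMPLE_UNLIKELY`, `throw_out_of_range` and `element_at` are hypothetical. The sketch assumes `__builtin_expect` is available on GCC and Clang; other compilers get the plain condition.

```cpp
#include <cassert>
#include <stdexcept>

// Branch hint; falls back to the plain condition where unsupported.
#if defined(__GNUC__)
#  define EXAMPLE_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define EXAMPLE_UNLIKELY(x) (x)
#endif

// The error path is a separate function, so the hot function stays small
// without any compiler-specific function attributes.
[[noreturn]] void throw_out_of_range()
{
    throw std::out_of_range("element_at: index out of range");
}

inline int element_at(const int* data, int size, int index)
{
    if (EXAMPLE_UNLIKELY(index < 0 || index >= size))
        throw_out_of_range();
    return data[index];
}
```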

>> What I was saying is that it's the user who has to decide whether to
>> build your code for size or for speed or for debug. That includes the
>> parts of the code that you, the library author, consider performance
>> critical or otherwise.
>
> I'm sorry I fail to take this as anything else than just pointless
> nagging for the sake of nagging (and we are not talking about debug
> builds here).

I am talking about debug builds in particular. If I build a debug
binary, I want to be able to step through every piece of code, including
the ones you marked for speed. If I build for binary size, I want to
minimize the size of all code, including the code you marked. I don't care
about speed in either of these cases.
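One way to honour the user's build choice is sketched below. It assumes GCC or Clang, which define `__OPTIMIZE__` when optimizing and `__OPTIMIZE_SIZE__` under `-Os`. The hypothetical `EXAMPLE_HOT_INLINE` macro forces inlining only in speed-optimized builds and degrades to plain `inline` in debug (`-O0`) and size builds. As a result, the library's hint never overrides the flags the user picked.

```cpp
#include <cassert>

// Hypothetical macro: force inlining only in speed-optimized GCC/Clang
// builds; debug and size builds fall back to ordinary 'inline'.
#if defined(__GNUC__) && defined(__OPTIMIZE__) && !defined(__OPTIMIZE_SIZE__)
#  define EXAMPLE_HOT_INLINE inline __attribute__((always_inline))
#else
#  define EXAMPLE_HOT_INLINE inline
#endif

EXAMPLE_HOT_INLINE int clamp_to_byte(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```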

>> You may want to restrict his range of choices, e.g. when a certain
>> optimization breaks your code.
>
> More strawman 'ivory towering'...how exactly am I restricting anyone's
> choices? A real world example please?

Read that quote again, please.

For example, if your code relies on strict IEEE 754 you may want to mark
the function with -fno-fast-math. Or if your library is broken with LTO
on gcc older than 5.1 (like Boost.Log, for instance) you might want to
add -fno-lto to your library build scripts. The thing is, there are so
many things that may potentially break the code, most of which you and I
are simply unaware of, that this kind of defensive practice just isn't
practical.
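The `-fno-fast-math` example can be expressed per function with GCC's `optimize` attribute. This is only a sketch. `EXAMPLE_STRICT_FP` is hypothetical, and Clang does not implement the attribute, hence the guard. GCC's own documentation also says the `optimize` attribute is meant for debugging rather than production code. Kahan summation is a good test case because its compensation term `(t - sum) - y` is exactly the kind of expression that fast-math reassociation may simplify away.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-function opt-out: on GCC, compile this function as if
// -fno-fast-math were given, even in a translation unit built with -ffast-math.
#if defined(__GNUC__) && !defined(__clang__)
#  define EXAMPLE_STRICT_FP __attribute__((optimize("no-fast-math")))
#else
#  define EXAMPLE_STRICT_FP
#endif

// Compensated (Kahan) summation: relies on strict IEEE 754 evaluation order.
EXAMPLE_STRICT_FP double kahan_sum(const double* x, std::size_t n)
{
    double sum = 0.0;
    double c = 0.0; // running compensation for lost low-order bits
    for (std::size_t i = 0; i < n; ++i)
    {
        double y = x[i] - c;
        double t = sum + y;
        c = (t - sum) - y; // algebraically zero, numerically not
        sum = t;
    }
    return sum;
}
```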


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk