
Subject: Re: [boost] Efficient tuple implementation
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2014-06-09 05:55:15


On 8 Jun 2014 at 18:37, Stephan T. Lavavej wrote:

> > Do also bear in mind that MSVC is not an AST based compiler,
> > so all your benchmarks will look totally different on MSVC anyway.
> > What may have O(N^N) scaling on an AST compiler might well be O(N)
> > on MSVC - in fact, that is exactly why MSVC isn't AST based,
> > as internal Microsoft code is so complex it is uncompilable by an AST based compiler.
>
> 1. That's not why. The reason is that manipulating tokens seemed like a
> good idea 20-30 years ago, when this was sufficient to deal with C and
> C++ (before templates, variadic templates, decltype, constexpr,
> expression SFINAE, etc.) when computers were far smaller and slower.

For a good chunk of MSVC's (and C++'s) userbase, template use never
really goes beyond cookie-cutter use, e.g. template<class T> class
vector. And for those guys - which includes a good chunk of Microsoft
itself - MSVC's approach makes perfect sense. It is, after all,
blindingly quick, and relatively speaking it uses virtually no RAM.

By "internal Microsoft code is so complex it is uncompilable by an
AST based compiler" I was referring to a conversation I had with Gaby
and James McNellis at C++ Now with Chandler occasionally jumping in
to confirm the insanity :). They were telling me about the problems
of getting 18k line long functions to compile with *any* AST based
compiler, and that a unique problem facing MSVC is that a ton of
Microsoft internal code is automatically generated in a way done
under the expectation that MSVC doesn't mind really long statements
such as an if() statement spanning a few thousand lines. One then
faces the choice of retrofixing those automated generators to produce
less insane output, or keep MSVC non-AST based for non-template
non-constexpr code. I was told to date the latter has been chosen.

> 2. MS has a whole bunch of code, but only a tiny fraction approaches the
> complexity of Boost (in terms of "doing tricky things with the
> language"; stuff that's just loops and pointers is not complicated by
> this metric, no matter how twisty). Our STL implementation is usually
> the high water mark for complexity, and while we do tricky stuff, I know
> that Boost is trickier.

Oh sure. I should have clarified that I meant things like a single
boolean if() containing thousands of tests. I meant sheer size
complexity, not language complexity.

> 3. I am not really qualified to talk about compiler tech, but I cannot
> possibly understand how keeping an AST around would change the
> *asymptotic* complexity of translation (it introduces constant factors
> which are an issue in practice). The AST has a strict superset of the
> information available by looking at a few tokens at a time. By
> definition, any speedups with a token-based approach that can't be
> obtained through an AST would be due to nonconformance.

The problem is RAM consumption, because you need to build an AST for
the item being compiled before you can compile it. Louis's empirical
tests of candidate implementations for MPL11 (now Hana) did a lot of
time and space comparisons across various metaprogramming techniques.
Both clang and GCC are AST based, so their space consumption graphs
looked not dissimilar.
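
To give a flavour of the kind of comparison involved (this is a toy
sketch of my own, not Louis's actual benchmark code, and it assumes
C++14's std::index_sequence), compare a classic recursive tuple with
a flat, multiple-inheritance one:

    #include <cstddef>
    #include <utility>

    // Recursive tuple: rec_tuple<A, B, C> also instantiates
    // rec_tuple<B, C> and rec_tuple<C>, so the compiler must build
    // and retain a chain of nested class instantiations per tuple.
    template <class... Ts> struct rec_tuple;
    template <> struct rec_tuple<> {};
    template <class Head, class... Tail>
    struct rec_tuple<Head, Tail...> {
        Head head;
        rec_tuple<Tail...> tail;
    };

    // Flat tuple: one leaf per element, all inherited at one level,
    // so there is no nested chain of instantiations to keep around.
    template <std::size_t I, class T> struct leaf { T value; };
    template <class Indices, class... Ts> struct flat_tuple_impl;
    template <std::size_t... Is, class... Ts>
    struct flat_tuple_impl<std::index_sequence<Is...>, Ts...>
        : leaf<Is, Ts>... {};
    template <class... Ts>
    using flat_tuple =
        flat_tuple_impl<std::make_index_sequence<sizeof...(Ts)>, Ts...>;

How much of that instantiation chain an AST based compiler has to
keep resident is what shows up in the space graphs.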

MSVC has invested considerable effort in using sane amounts of RAM
under very trying conditions, so I would suspect - without evidence -
that MSVC would produce *very* different graphs: some worse, but
probably many better. What is O(N^2) on clang or GCC I could see
being O(N) on MSVC, for example, simply due to the lack of two phase
lookup.
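
For anyone who hasn't run into the term, here is a toy illustration
of what two phase lookup buys (and costs) an AST based compiler -
just my own sketch, not a claim about where MSVC actually spends its
RAM:

    void f(int);          // non-dependent candidate

    template <class T>
    void g(T t) {
        f(1);             // phase one: bound to f(int) when the
                          // template is *defined*
        f(t);             // phase two: looked up again at
                          // *instantiation*, and may find further
                          // overloads via ADL on T
        // undeclared();  // an AST based compiler can reject this
                          // even if g is never instantiated; a
                          // token-replaying front end only notices
                          // once g<T> is actually used
    }

Doing that first phase means holding a parsed, checked representation
of every template definition you see, whether or not it ever gets
instantiated.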

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/


