From: Andy Little (andy_at_[hidden])
Date: 2006-09-25 18:14:14
"Joel de Guzman" <joel_at_[hidden]> wrote in message
> Andy Little wrote:
>> There do seem to be limits to when the compiler will optimise a view though,
>> may have something to do with copying references(aka joint-view) as raw
>> after which AFAICS the compiler will treat them as pointers. that said that
>> only an impression so far.
> Ok, now I understand. You sent me an email about stacked fold of
> stacked joint_view created by push_back being difficult for the
> compiler to optimize. Here's why (and there's a solution):
> joint_view is a prime example of a segmented sequence. The resulting
> iterators from it will have some overhead. That is inevitable.
> See joint_view_iterator so you'll know what I mean.
> However, Eric Niebler did some work on segmented algorithms. His
> initial testing (for proto and early spirit-2 experiments) shows
> some exciting results. Basically, a segmented sequence is a sequence
> of sequences (for each segment). Instead of creating an iterator that
> "weaves" through each segment (like joint_view_iterator), segmented
> algorithms apply the algorithm recursively to each of the segments.
> For example, a fold of a joint_view will be 2 calls to fold, one
> for the left join and one for the right.
> I know this is a bit hand-wavy, but once you see it in action, it
> really makes a lot of sense. I'm CCing Eric. I think now is the right
> time to incorporate his work into the fusion code base.
Yep. I don't understand,but it sounds great :-)
OK. FWIW I found that creating a sequence by pushing_back result_types to an
empty vector is a great way to 'create' a result sequence at compile time. It
kind of creates something out of nothing. At runtime however I found I could
employ a different approach by using recursive functors(that is still compile
time rather than runtime recursion, but much faster to compile and using less
resources, whereas joint_view can be a resource hog), because now the type of
the result sequence is known. Basically the compile time and runtime stages are
doing different jobs. The compile time fold push_back is working out what the
final sequence will be. Now the runtime's job is to figure out the most
efficient way to initialise the sequence, that it already knows about. I havent
yet figured how to create an initialiser so that the compiler will initialise
the sequence right in the constructor(not that its impossible, just havent got
round to trying), but I guess that would be ideal. The compiler is seeming to
optimise away the default ctor, followed by the fill anyway, but it would be
satisfying if I could express that intent more directly. I guess this might
involve making a view, where each element contains an individual view of the
function for initialising that element.
Anyway it might be worth thinking about not copying too slavishly the compile
time behaviour at runtime, as maybe these are very different situations.
Second my analysis of copying is based on not too much science. IN VC8 the
assembler code gets pretty big and later functions tend to join the output of
earlier functions and with 20 Mb assembler files I tended to have ideas half way
through reading it and never get to the end of the story to see what the
compiler finally did! If that makes any sense. However it may well be that the
compiler does keep track of the stuff its copying. It just gets a bit
frightening seeing rep movsd instructions on bigger and bigger joint_views
(where these might have typically 16 elemnts each with 2 referneces), with
bigger and bigger arguments, but OTOH I suspect once it has to copy then the
game is up.
The other question is whether Fusion scales. My guess is that the compile time
should flatten out once the compiler starts seeing the same types, but OTOH this
business of accumulating the result type may be 'interesting', but it requires
writing some applications to find out, so that is what I'm currently about.