Subject: Re: [boost] Final benchmark graphs for Colony vs std:: containers now available
From: Soul Studios (matt_at_[hidden])
Date: 2016-05-03 20:07:18
> Downloading a zip is a higher barrier of entry than what is usually
> expected. It would help the feedback if the source code would be browsable
> online. Boost uses github and other libraries being submitted do as well
> but that is not the only alternative. This would probably increase the
> amount of feedback.
You said you were concerned about people's ability to replicate the
tests. I pointed out they have that ability. I don't consider a zip file
a barrier but I accept it may be easier for some people.
> The text just says that N elements are inserted into a container. Is the
> container empty before the insertion? If not, how many elements does it
> have? Is it the same number of elements for all N?
You say you've read the source code, so I take it you're using this as a
way of pointing out what you think is necessary to be communicated, and
that's fine. But for the record we are starting with empty containers
and inserting singly on the fly. This replicates typical game engine
code and the scenario which colony is made for, described throughout the
benchmarks as fast insertion/erasure in realtime without invalidating
Single insertion is also the only insertion criteria for any insertion
benchmark I've seen, so I personally feel it's a logical leap to say it
needs a large amount of explanation, but I accept that you feel that
should be made more clear.
>> Regards insertion, it is all the fastest method available to that class,
> No, it is not. From the source code it looks like you use push_back for
> vector when inserting a range of N elements at the end instead of
> vector::insert(end, N, T()) or vector::insert(end, begin, end). This is why
> I would like to see a table in the text indicating which method it is used
> because this is relevant performance-wise.
That doesn't typically replicate game engine code, as it assumes you
know the number of elements in advance and have all data in advance, in
which case, all the AAA dev's I have talked to or listened to have said,
if you know the size in advance, and already have all the data, you
would use an C-style array for performance/vectorizing/compatibility
reasons. Watch the SG14's and mike acton's cppcon talks for more info.
> Since the text is omitting relevant details to me as a reader it feels like
> it is trying to hide something.
I don't feel this is a polite or appropriate thing to say. There is a
lot of information and a LOT of writing in between those benchmarks, I'm
sorry that it doesn't include the Specific information you are
interested in, but that's not a reason to go making assumptions.
> You haven't answered my question about erasure, but from the source code it
> seems that the benchmarking code is erasing single elements of a vector and
> a deque which is N^2 and something nobody would do in a real
> performance-sensitive application.
Actually this happens in game engines a lot, but a raw unmodified vector
or deque wouldn't be used in this context because of the performance hit
you've mentioned. But what I'm doing in those first tests is measuring
raw performance for comparison in later situations where the vectors etc
are modified to avoid pointer/index invalidation - as described in the
text. In the situations where colony would be used - where pointer/index
validity matters ie. the bulk of situations for game engine code -
remove_if will invalidate pointers and indexes to elements within the
container, and is not an appropriate solution by itself.
In the context of modified containers, a remove_if-*like* command might
be used, but only when the modification prevents invalidation via the
erase command, when more than a few erasures were happening per-frame,
the data set is reasonably large, and the erase command was expensive.
In that case, an additional flag would have to be added to the stored
structure or the modified container, as conditions for erasure are not
likely to be the same between elements. Then a remove_if-like command
would fire at the end-of-frame.
As mentioned in detail, the first, raw benchmarks are ONLY to measure
which containers might be useful, modified or otherwise, in comparative
tests. But I grant that the above information is probably something that
most people would not know, and should possibly be included.
At some point in the future, if I have the time, I may construct a
variant of one of the modification tests featuring extremely high
modification and compare using a remove_if-like command. That is the
only context within which it might be useful.
> In a nutshell, the thing that I think would benefit the benchmark text the
> most is including more details about what exactly are the benchmarks doing
> and how (a table would probably suffice).
I think this is a reasonable point, but I think for the most part this
is pretty clear. I take your point that a differentiation as to what
erasure method is being used and why is appropriate, but I don't agree
with you for the insertion details. There's one benchmark where data is
re-inserted back in, and that is clearly labelled, and toward the end of
all tests. Regardless if I update the page, which I might, then I will
include the specification that we are inserting into empty containers,
though I fail to see how the initial insertion benchmarks would be
useful or rational if we weren't.
Making the code browsable online
> would certainly help with getting more feedback and improving the
> benchmarks to further pinpoint the best use cases of colony. Choosing the
> range insert methods for the containers that offer them as well as
> erase_remove_if for the erasure benchmarks is something that is worth
> exploring (since in the use cases considered nobody would use vector and
> deque that way).
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost