Boost logo

Boost :

Subject: Re: [boost] Final benchmark graphs for Colony vs std:: containers now available
From: Gonzalo BG (gonzalobg88_at_[hidden])
Date: 2016-05-03 05:32:52


> Hi Gonzalo, I take your points, but the source code for the tests is
listed on the site: http://www.plflib.org/colony.htm#download
> You can look there if you have further questions.

Downloading a zip is a higher barrier of entry than what is usually
expected. It would help the feedback if the source code would be browsable
online. Boost uses github and other libraries being submitted do as well
but that is not the only alternative. This would probably increase the
amount of feedback.

> Regards vector insertion I think you need to check your assumptions
there.

If the vector is initially empty my assumption is "nothing can beat vector
sized constructor".

The text just says that N elements are inserted into a container. Is the
container empty before the insertion? If not, how many elements does it
have? Is it the same number of elements for all N?

Important information to understand the benchmarks is missing _in the
text_. IMO "download this zip file and take a look at the source code" is
not a good solution to this problem. A better solution would be to detail
in the text exactly what is the status of the containers size-wise (and
capacity-wise where appropiate) before and after insertion and erasure, as
well as where is the range of N elements being inserted for each container
and how, how are the elements being erased, etc. This could be done in a
table and would _significantly_ help understanding the results.

> It is not a fast insertor due to the fact that it reallocates to a new
memory block upon expansion past capacity. Deque does not do this. Nor does
colony.

I explicitly asked in my previous email what is the status of the size and
capacity before/after insertion. The text doesn't mention this, and it is
useful information to know. I am not arguing about whether one should
consider or ignore the capacity for, e.g., std::vector. I am just arguing
that this information is _required_ to help the reader properly understand
the benchmark results.

> Regards insertion, it is all the fastest method available to that class,

No, it is not. From the source code it looks like you use push_back for
vector when inserting a range of N elements at the end instead of
vector::insert(end, N, T()) or vector::insert(end, begin, end). This is why
I would like to see a table in the text indicating which method it is used
because this is relevant performance-wise.

> As noted at the top of the benchmarks, "I have not included results
involving 'reserve()' functions as the differences to overall insertion
performance were not adequate.". This is true, and what's also true is that
I find reserve irrelevant. If you know the size you need, use an array,
that's my philosophy.

Since the text is omitting relevant details to me as a reader it feels like
it is trying to hide something. It is not about whether reserve is relevant
or not. There are many use cases where one doesn't have a meaningful value
for reserve. It is more about giving the reader all the information it
needs to make sense of the results and decide for itself whether colony is
appropriate for its use case.

You haven't answered my question about erasure, but from the source code it
seems that the benchmarking code is erasing single elements of a vector and
a deque which is N^2 and something nobody would do in a real
performance-sensitive application. In my opinion a fair (and realistic)
erasure benchmark should use erase_remove_if which is O(N) (and can be also
used for all containers).

In a nutshell, the thing that I think would benefit the benchmark text the
most is including more details about what exactly are the benchmarks doing
and how (a table would probably suffice). Making the code browsable online
would certainly help with getting more feedback and improving the
benchmarks to further pinpoint the best use cases of colony. Choosing the
range insert methods for the containers that offer them as well as
erase_remove_if for the erasure benchmarks is something that is worth
exploring (since in the use cases considered nobody would use vector and
deque that way).


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk