On 15 July 2015 at 01:07, Soul Studios <matt@soulstudios.co.nz> wrote:
Firstly, you don't invest the multi-year effort to get a library into
Boost because it's superior, as it probably isn't.

You do it because of what you will become after you succeed: a Boost
library author who has passed the judgement of his or her peers.

Personally, I want to do it to help others, as that is the focal point of my project. I'm not particularly looking for personal gain, but I know that this algorithm gives much greater efficiency under the circumstances described - and I believe efficiency is important. It also solves some problem scenarios that cannot be solved with current algorithms - at least not easily.


I am struggling to see the gain over other combinations of STL
containers. If you need very fast deletion and never invalidated
iterators, unordered_map is very hard to beat - the cost is insertion
performance, but even that goes away with Hinnant's node_ptr proposed
extensions to the standard. Dense hash maps probably will be very
competitive, and we sorely need one of those in Boost and the STL.

The problem with map and unordered_map, which you can see from the first benchmark on the colony page (map currently outperforms unordered_map under GCC, so I didn't include it), is that they have terrible iteration and search performance. A colony has better erase and insert performance, and the best iteration speed of any std:: container in the context of requiring valid pointers/iterators.
http://www.plflib.org/colony.htm#benchmarks


Don't do it because of that. Do it for yourself personally, or not at
all. Trust me: my library is up for review this Friday. I started the
process of getting it in in 2012. That's the commitment you need to
make.

Gotcha, that is good to know - thanks.


I'd have thought there is surely a space wastage going on here too if
I understand the algorithm. That can have cache spillage effects
under multithreaded use.

If you take a look at the diagram at the top of the colony page, those gaps are filled in when any new 'add' is called - basically, they get added to a 'recycle' stack and popped off at the next insertion. All empty slots are reused before adding to the end of the colony. Hence the unordered schema.

However, you are correct that if many elements are deleted and no inserts are performed, iteration will suffer - slightly. It doesn't affect cache performance, but obviously skipping over many empty element slots takes more time than if those slots didn't exist at all!

Also, when a memory block is 100% empty, i.e. all of its elements have been erased, the block is removed from the chain, so you don't have that wastage anymore. Block sizes can be tuned in the constructor.


I'd benchmark your algorithm when running four benchmarks in
parallel. A lot of implementations which look faster single threaded
look very slow when multithreaded due to LL cache pressure. The STL
tries to balance those two off, and it generally succeeds,
sacrificing some single threaded performance for much improved cache
load.

To be fair, I have not looked into multithreaded performance. I pretty much followed the STL guidelines as to the conditions under which containers can be used concurrently. I am potentially looking at implementing colonies on CUDA, however.



I lack time right now, but I'm interested in trying your container in the context of (mostly experimental) implementations of component systems.
I don't think anything related to multithreading should be done by this kind of container. Most high-performance usage will involve going through all the elements
to transform them (if "alive"), and as long as you provide iterators this can be done in parallel however the user wants.
 

FYI dense hash maps and bitwise trie maps have *excellent*
multithreaded performance, almost linear speedup with cores, thanks
to their low LL cache loading.

You may correct me here, but my understanding is that vector pretty much outperforms the other std:: containers in terms of iteration, as cache performance is better due to the contiguous data layout.
According to Chandler Carruth, this also holds true for key-value searches vs map, as the same rules apply (a 400ns L1-cache search beats multiple 200ns main-memory fetches). www.youtube.com/watch?v=fHNmRkzxHWs
If I'm missing something do let me know here.
Genuine thanks,


There are cases where vector is not the right solution, but yeah, in my experience too it fits most use cases.
Also: http://bannalia.blogspot.fr/2015/06/cache-friendly-binary-search.html
I remember Stroustrup making the same statement at the Going Native conference, and it's also a common observation in game dev circles (several other CppCon talks covered it too).

By the way, it seems that you might also be interested in recent discussions taking place in SG14: https://groups.google.com/a/isocpp.org/forum/?fromgroups#!topic/sg14/csLjKbLOArw

 

M
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users