
Subject: Re: [Boost-users] Query regarding interest in my submitting plf::colony to Boost
From: Klaim - Joël Lamotte (mjklaim_at_[hidden])
Date: 2015-07-14 19:44:24


On 15 July 2015 at 01:07, Soul Studios <matt_at_[hidden]> wrote:

> Firstly, you don't invest the multi-year effort to get a library into
>> Boost because it's superior, as it probably isn't.
>>
>> You do it because of what you will become after you succeed: a Boost
>> library author who has passed the judgement of his or her peers.
>>
>
> Personally, I actually want to do it to help others, as that is the focal
> point of my project. I'm not greatly looking for personal gain, but I know
> that this algorithm gives much greater efficiency under the circumstances
> described - and I believe efficiency is important. Also, it solves some
> problem scenarios that cannot be solved with current algorithms - at least
> not easily.
>
>
> I am struggling to see the gain over other combinations of STL
>> containers. If you need very fast deletion and iterators that are never
>> invalidated, unordered_map is very hard to beat - the cost is insertion
>> performance, but even that goes away with Hinnant's node_ptr proposed
>> extensions to the standard. Dense hash maps probably will be very
>> competitive, and we sorely need one of those in Boost and the STL.
>>
>
> The problem with map and unordered_map, which you can see from the first
> benchmark on the colony page (map currently outperforms unordered_map under
> GCC, so I didn't include it), is that they have terrible iteration and
> search performance. A colony has better erase and insert performance, and
> the best iteration speed of all the std:: containers in the context of
> requiring valid pointers/iterators.
> http://www.plflib.org/colony.htm#benchmarks
>
>
> Don't do it because of that. Do it for yourself personally, or not at
>> all. Trust me: my library is up for review this Friday. I started the
>> process of getting it in in 2012. That's the commitment you need to
>> make.
>>
>
> Gotcha, that is good to know - thanks.
>
>
> I'd have thought there is surely a space wastage going on here too if
>> I understand the algorithm. That can have cache spillage effects
>> under multithreaded use.
>>
>
> If you take a look at the diagram at the top of the colony page, those
> gaps are filled in when 'add' is next called - basically, erased slots are
> pushed onto a 'recycle' stack and popped off at the next insertion. All
> empty slots are reused before adding to the end of the colony. Hence the
> unordered schema.
>
> However you are correct in that if many elements are deleted and no
> inserts are performed, iteration will suffer - slightly. It doesn't affect
> cache performance, but obviously skipping over many empty element slots
> takes more time than if those slots didn't exist at all!
>
> Also, when a memory block is 100% empty, i.e. all its elements have been
> erased, the block is removed from the chain, so you don't have that wastage
> anymore. Block sizes can be tuned in the constructor.
>
>
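
If I've read the recycle-stack description above correctly, the reuse
scheme is roughly the following - a hypothetical sketch with made-up
names, not plf::colony's actual internals:

    #include <cstddef>
    #include <vector>

    template <typename T>
    struct slot_storage {
        std::vector<T>           slots;    // element storage
        std::vector<bool>        alive;    // skip-over flags for iteration
        std::vector<std::size_t> recycle;  // indices of erased slots

        std::size_t insert(const T& value) {
            if (!recycle.empty()) {         // reuse an erased slot first
                const std::size_t i = recycle.back();
                recycle.pop_back();
                slots[i] = value;
                alive[i] = true;
                return i;
            }
            slots.push_back(value);         // otherwise grow at the end
            alive.push_back(true);
            return slots.size() - 1;
        }

        void erase(std::size_t i) {         // O(1) erase: nothing is shifted
            alive[i] = false;
            recycle.push_back(i);
        }
    };

(A single std::vector doesn't give the stable memory addresses - that's
what the chained memory blocks are for - but it shows why insertion order
ends up unordered.)
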
> I'd benchmark your algorithm when running four benchmarks in
>> parallel. A lot of implementations which look faster single threaded
>> look very slow when multithreaded due to LL cache pressure. The STL
>> tries to balance those two off, and it generally succeeds,
>> sacrificing some single threaded performance for much improved cache
>> load.
>>
>
> To be fair, I have not looked into multithreaded performance. I pretty
> much followed the STL guidelines as to the conditions under which containers
> can be used concurrently. I am, however, potentially looking at implementing
> colonies on CUDA.
>
>

I lack time right now, but I'm interested in trying your container in the
context of (mostly experimental) implementations of component systems.
I don't think anything related to multithreading should be done by this
kind of container. Most high-performance usage will be going through all
the elements to transform them (if "alive"), and as long as you provide
iterators it can be done in parallel however the user wants.
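
For instance, a minimal chunked-iteration sketch (hypothetical, assuming
only forward iterators, a known element count, and n >= 1 - not tied to
colony's actual API):

    #include <algorithm>
    #include <cstddef>
    #include <iterator>
    #include <thread>
    #include <vector>

    // Apply f to every element of [first, first + size) using n threads,
    // giving each thread one contiguous chunk of the range.
    template <typename Iterator, typename F>
    void parallel_for_each(Iterator first, std::size_t size,
                           std::size_t n, F f) {
        std::vector<std::thread> workers;
        const std::size_t chunk = size / n;
        for (std::size_t t = 0; t < n; ++t) {
            const Iterator begin = first;
            // the last thread takes the remainder of the range
            std::advance(first, t + 1 == n ? size - t * chunk : chunk);
            workers.emplace_back([=] { std::for_each(begin, first, f); });
        }
        for (auto& w : workers) w.join();
    }

The container itself stays single-threaded; the parallelism is entirely
the caller's choice.
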

>
> FYI dense hash maps and bitwise trie maps have *excellent*
>> multithreaded performance, almost linear speedup with cores, thanks
>> to their low LL cache loading.
>>
>
> You may correct me here, but my understanding is that vector pretty much
> outperforms the other std:: containers in terms of iteration as cache
> performance is better, due to contiguous data location.
> According to Chandler Carruth, this also holds true for key-value searches
> vs map, as the same rules apply (a 400ns search within L1 cache beats
> multiple 200ns main-memory fetches). www.youtube.com/watch?v=fHNmRkzxHWs
> If I'm missing something, do let me know.
> Genuine thanks,

There are cases where vector is not the right solution, but yeah, in my
experience too it fits most use cases.
Also: http://bannalia.blogspot.fr/2015/06/cache-friendly-binary-search.html
I remember Stroustrup making the same statement at the GoingNative
conference too, and it's also a common observation in game dev circles
(several other CppCon talks covered it as well).
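
The gist, as I understand it (a sketch using only standard containers -
the cache behaviour is the point, not these exact functions):

    #include <algorithm>
    #include <map>
    #include <utility>
    #include <vector>

    // Node-based lookup: every step chases a pointer to a separately
    // allocated node, so each comparison risks a main-memory fetch.
    bool contains(const std::map<int, int>& m, int key) {
        return m.find(key) != m.end();
    }

    // Contiguous lookup: binary search over a sorted vector touches only
    // a handful of cache lines, and the last few probes usually hit
    // lines that are already loaded.
    bool contains(const std::vector<std::pair<int, int>>& v, int key) {
        return std::binary_search(
            v.begin(), v.end(), std::make_pair(key, 0),
            [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                return a.first < b.first;
            });
    }

(The bannalia post above goes further and reorders the sorted data into a
cache-friendlier layout.)
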

By the way, it seems that you might also be interested in recent
discussions taking place in SG14:
https://groups.google.com/a/isocpp.org/forum/?fromgroups#!topic/sg14/csLjKbLOArw

>
> M
>


