Boost Users :

Date view	Thread view	Subject view	Author view

Subject: Re: [Boost-users] Query regarding interest in my submitting plf::colony to Boost
From: Soul Studios (matt_at_[hidden])
Date: 2015-07-14 19:07:17

Next message: Jeff Garland: "Re: [Boost-users] Boost.DateTime and leap year/leap seconds"
Previous message: Joel de Guzman: "Re: [Boost-users] Boost 1_59_0_b1_rc1 regression in spirit::qi"
In reply to: Niall Douglas: "Re: [Boost-users] Query regarding interest in my submitting plf::colony to Boost"
Next in thread: Klaim - Joël Lamotte: "Re: [Boost-users] Query regarding interest in my submitting plf::colony to Boost"
Reply: Klaim - Joël Lamotte: "Re: [Boost-users] Query regarding interest in my submitting plf::colony to Boost"

> Firstly, you don't invest the multi-year effort to get a library into
> Boost because it's superior, as it probably isn't.
>
> You do it because of what you will become after you succeed: a Boost
> library author who has passed the judgement of his or her peers.

Personally, I actually want to do it to help others, as that is the
focal point of my project. I'm not greatly looking for personal gain,
but I know that this algorithm gives much greater efficiency under the
circumstances described - and I believe efficiency is important. Also,
it solves some problem scenarios that cannot be solved with current
algorithms - at least easily.

> I am struggling to see the gain over other combinations of STL
> containers. If you need very fast deletion and never invalidated
> iterators, unordered_map is very hard to beat - the cost is insertion
> performance, but even that goes away with Hinnant's node_ptr proposed
> extensions to the standard. Dense hash maps probably will be very
> competitive, and we sorely need one of those in Boost and the STL.

The problem with map and unordered_map, which you can see from the first
benchmark on the colony page (map currently outperforms unordered_map
under GCC, so I didn't include it), is that they have terrible iteration
and search performance. A colony has better erase and insert
performance, and the best iteration speeds against all std:: containers
in the context of requiring valid pointers/iterators.
http://www.plflib.org/colony.htm#benchmarks

> Don't do it because of that. Do it for yourself personally, or not at
> all. Trust me: my library is up for review this Friday. I started the
> process of getting it in in 2012. That's the commitment you need to
> make.

Gotcha, that is good to know- thanks.

> I'd have thought there is surely a space wastage going on here too if
> I understand the algorithm. That can have cache spillage effects
> under multithreaded use.

If you take a look at the diagram at the top of the colony page, those
gaps are filled in when any new 'add' is called - basically, they get
added to a 'recycle' stack and popped off at next insertion. All empty
slots are reused before adding to the end of the colony. Hence the
unordered schema.

However you are correct in that if many elements are deleted and no
insert's performed, iteration will suffer - slightly. It doesn't affect
cache performance, but obviously if you are skipping over many element
slots that takes more time than if those slots don't actually exist anymore!

Also when a memory block is 100% empty ie. all elements erased, the
memory block is removed from the chain, so you don't have that wastage
anymore. Those blocks can be tuned for size in the constructor.

> I'd benchmark your algorithm when running four benchmarks in
> parallel. A lot of implementations which look faster single threaded
> look very slow when multithreaded due to LL cache pressure. The STL
> tries to balance those two off, and it generally succeeds,
> sacrificing some single threaded performance for much improved cache
> load.

To be fair, I have not looked into multithreaded performance. I pretty
much followed the STL guidelines as per under what conditions containers
can be used concurrently. I am potentially looking at implementing
colonies on CUDA however.

> FYI dense hash maps and bitwise trie maps have *excellent*
> multithreaded performance, almost linear speedup with cores, thanks
> to their low LL cache loading.

You may correct me here, but my understanding is that vector pretty much
outperforms the other std:: containers in terms of iteration as cache
performance is better, due to contiguous data location.
According to Chandler Carruth, this also holds true for key-value
searchs vs map, as the same rules apply (400ns for L1-cache-search beats
multiple 200ns main memory fetches). www.youtube.com/watch?v=fHNmRkzxHWs
If I'm missing something do let me know here.
Genuine thanks,
M

Next message: Jeff Garland: "Re: [Boost-users] Boost.DateTime and leap year/leap seconds"
Previous message: Joel de Guzman: "Re: [Boost-users] Boost 1_59_0_b1_rc1 regression in spirit::qi"
In reply to: Niall Douglas: "Re: [Boost-users] Query regarding interest in my submitting plf::colony to Boost"
Next in thread: Klaim - Joël Lamotte: "Re: [Boost-users] Query regarding interest in my submitting plf::colony to Boost"
Reply: Klaim - Joël Lamotte: "Re: [Boost-users] Query regarding interest in my submitting plf::colony to Boost"

Date view	Thread view	Subject view	Author view

Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net