On 20/04/2026 at 23:15, Ion Gaztañaga wrote:
Hi,
Since the review was announced, I've been doing some experiments with the proposed data structure. I've collected some notes for technical discussion.
[...]
---------------------------------------- Prefetch/stored_data_in_block performance difference ----------------------------------------
Using the above options, it's difficult to see if prefetching or storing data in the block noticeably improves performance on my computer. I see contradictory results depending on the OS (Linux/Windows) or compiler (MSVC/Clang-CL, GCC).
In theory I would expect that storing data and metadata in the same allocation would help a bit, but I could not get a clear answer. I imagine that results could differ on a deeply embedded machine versus a desktop/server CPU.
My conclusion is that the defaults chosen by Joaquín are correct, at least for usual Server/Desktop machines.
-------------- Prefetch changes -----------------
I experimented with some prefetch changes in iterator's operator++
https://github.com/boostorg/container/blob/develop/include/boost/container/e...
and operator--:
https://github.com/boostorg/container/blob/develop/include/boost/container/e...
Basically, this prefetches the first slot with a value in the array for operator++, or the last one for operator--, instead of always prefetching the start of the array. I only see a slight improvement (or maybe it's a placebo effect).
Regarding prefetching and storing data in block or separately, I did my own experiments and settled on what's been proposed, though of course more experimenting can't hurt. Separate data in particular was a noticeable win for some element sizes and environments, which I attribute (without proof) to block headers and data blocks ending up in different per-size pools in the allocator, with a resulting improvement in cache locality for both.
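For anyone who wants to repeat the prefetch experiment, here is a minimal sketch of the variation described above: hinting the first (or last) occupied slot of the neighbouring block rather than the start of its array. The layout (64 slots per block, one 64-bit occupancy mask) and all names (SKETCH_PREFETCH, first_occupied, last_occupied, prefetch_for_increment, prefetch_for_decrement) are assumptions made for illustration, not the code in either implementation.

```cpp
#include <cstdint>

// Portable prefetch hint: a real intrinsic on GCC/Clang, a no-op elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_PREFETCH(p) __builtin_prefetch(p)
#else
#  define SKETCH_PREFETCH(p) ((void)(p))
#endif

// Assumed layout: 64 slots per block, bit i of the mask is set when
// slot i holds a value. Both helpers require mask != 0.
inline int first_occupied(std::uint64_t mask) {
    int i = 0;
    while (((mask >> i) & 1u) == 0) ++i;
    return i;
}

inline int last_occupied(std::uint64_t mask) {
    int i = 63;
    while (((mask >> i) & 1u) == 0) --i;
    return i;
}

// operator++ variant: hint the first occupied slot of the next block.
template <class T>
const T* prefetch_for_increment(std::uint64_t next_mask, const T* next_slots) {
    const T* p = next_slots + first_occupied(next_mask);
    SKETCH_PREFETCH(p);
    return p;  // returned only so the chosen address can be inspected
}

// operator-- variant: hint the last occupied slot of the previous block.
template <class T>
const T* prefetch_for_decrement(std::uint64_t prev_mask, const T* prev_slots) {
    const T* p = prev_slots + last_occupied(prev_mask);
    SKETCH_PREFETCH(p);
    return p;
}
```

The bit scans are written as plain loops to keep the sketch portable; a real implementation would presumably use a count-trailing-zeros / count-leading-zeros intrinsic instead.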
-------------- Local/segmented iterators --------------
I tried to implement efficient segment
https://github.com/boostorg/container/blob/develop/include/boost/container/e...
and local_iterators
https://github.com/boostorg/container/blob/develop/include/boost/container/e...
in order to test if hub can take advantage of generic segmented algorithms. The problem I found with my implementation is that the local iterator is nearly as expensive as the general iterator, because it must navigate through the data array using nearly the same bit-tricks.
At this microlevel, it's hard to beat a local mask-based loop like the one boost::container::hub uses.
I noticed that you check for empty segments here: https://github.com/boostorg/container/blob/develop/include/boost/container/e... ("if (BOOST_LIKELY(m != 0))"). The check is not needed: boost::container::{hub|nest} only traverses non-empty segments.
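To make the "local mask-based loop" concrete, here is a rough sketch of the idea under the same assumed layout (64 slots per segment plus a 64-bit occupancy mask). The names lowest_set_index, for_each_in_segment and sum_segment are hypothetical and do not come from hub or nest.

```cpp
#include <cstdint>

// Index of the lowest set bit; requires m != 0.
inline int lowest_set_index(std::uint64_t m) {
    int i = 0;
    while (((m >> i) & 1u) == 0) ++i;
    return i;
}

// Visit every occupied slot of one segment. Each iteration handles the
// lowest set bit of m, then clears it with m &= m - 1.
template <class T, class F>
void for_each_in_segment(std::uint64_t mask, const T* slots, F f) {
    for (std::uint64_t m = mask; m != 0; m &= m - 1) {
        f(slots[lowest_set_index(m)]);
    }
}

// Usage example: sum the occupied slots of a segment of ints.
inline long sum_segment(std::uint64_t mask, const int* slots) {
    long total = 0;
    for_each_in_segment(mask, slots, [&total](int v) { total += v; });
    return total;
}
```

The loop condition in this sketch already copes with an empty mask. If traversal is guaranteed to reach only non-empty segments, the first test of m is always true, which is why a separate emptiness check before the loop adds nothing.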
[...]
-------------- Summary --------------
- hub's defaults seem correct and performant
- I found that hub is faster than plf_hive, as Joaquín claimed.
- It's possible to fully integrate hub at the same level as other containers, with moderate effort. A prototype (nest) is already there with similar performance.
- I enjoyed reviewing the implementation; more details about the design will come in the official review I plan to write.
Looking forward to your review,
Joaquín M López Muñoz