[hub] Porting and pre-review experiments
Hi,

Since the review was announced, I've been doing some experiments with the proposed data structure. I've collected some notes for technical discussion.

Porting and benchmarking
------------------------

I took the hub proposal and ported it (the initial version was AI-assisted, a very good starting point), then tuned it until the result was good enough for a prototype. It's called "nest" (I could not find a fancy name starting with "h", sorry), here:

https://github.com/boostorg/container/blob/develop/include/boost/container/e...

with a benchmark (based on the one provided by hub) here:

https://github.com/boostorg/container/blob/develop/experimental/bench_hub.cp...

The port supports the full range of compilers Boost.Container is tested on, and uses internal Boost.Container utilities. So the first conclusion is that "hub" is perfectly integrable into Boost.Container, at the same level as the other existing containers, with low effort.

Porting performance
-------------------

In my local benchmarks the ported container achieves similar performance results, although there are some outliers with some tools (e.g. Clang) that I haven't investigated enough. Also, I only tested this on x86/x64, Windows/Linux, on my desktop machine, so there are probably rough edges here and there. Benchmarking is certainly magic; I tried to use the recommended "clobber()"-like instructions before benchmarked operations, but many Boost authors would probably measure this much better than me.

Data structure configuration options
------------------------------------

I also added configurability options to "nest" in order to test other performance decisions Joaquín has made:

- *prefetch< boolean >* --> activates prefetching for the nest container, to see if prefetching improves benchmark results.
- *stored_data_in_block< boolean >* --> hub stores the 64 values of data separately from the "block" (the metadata with the mask and the linked-list pointers).
  When "true", it stores the values in the same struct as the metadata (so metadata + data are allocated in a single block).

   // An example of a modified configuration
   nest< element   // element type
       , void      // default allocator
       , nest_options_t< prefetch<false>, store_data_in_block<true> >
       >;
Supporting these options is not complicated, so they could probably be added to "hub" if it's accepted, and we could measure whether they make a difference on some platforms.

----------------------------------------
Prefetch/stored_data_in_block performance difference
----------------------------------------

Using the above options, it's difficult to see whether prefetching or storing the data in the block noticeably improves performance on my computer. I see contradictory results depending on the OS (Linux/Windows) and compiler (MSVC/Clang-CL, GCC). In theory I would expect that storing data and metadata in the same allocation would help a bit, but I could not get a clear answer. I imagine that results could differ on a deeply embedded machine versus a desktop/server CPU. My conclusion is that the defaults chosen by Joaquín are correct, at least for the usual server/desktop machines.

--------------
Prefetch changes
-----------------

I experimented with some prefetch changes in the iterator's operator++

https://github.com/boostorg/container/blob/develop/include/boost/container/e...

and operator--:

https://github.com/boostorg/container/blob/develop/include/boost/container/e...

Basically, prefetching the first slot with a value in the array for operator++, or the last one for operator--, instead of always prefetching the start of the array. I only see a slight improvement (or maybe it's a placebo effect).

--------------
Local/segmented iterators
--------------

I tried to implement efficient segment

https://github.com/boostorg/container/blob/develop/include/boost/container/e...

and local_iterators

https://github.com/boostorg/container/blob/develop/include/boost/container/e...

in order to test whether hub can take advantage of generic segmented algorithms. The problem I found with my implementation is that the local iterator is nearly as expensive as the general iterator, because it must navigate through the data array using nearly the same bit-tricks.
I managed to make the difference between local_iterators (operator-) constant-time (basically counting the number of 1s in the mask between both positions), and that could be a slight advantage if the algorithm does some size-bounded computation (say, algorithms like count_n, fill_n, ...):

https://github.com/boostorg/container/blob/develop/include/boost/container/e...

In my implementation it's also more efficient to only support positive differences (the general implementation supporting negative differences is present but disabled), which is the usual case for iterator ranges in algorithms. Some algorithms I tested improved a bit with the difference operation, but the segmented approach was still less efficient than using general iterators. Probably Joaquín can improve it and make it sufficiently performant.

Hub's iterators are bidirectional, but it's possible to make the local_iterator random-access (the needed operations can be made O(1)). However, the performance of my implementation was not good enough IMHO, so it's probably not very practical. You have the prototype here:

https://github.com/boostorg/container/blob/develop/include/boost/container/e...

--------------
Summary
--------------

- hub's defaults seem correct and performant.
- I found hub is faster than plf::hive, as Joaquín claimed.
- It's possible to fully integrate hub at the same level as the other containers, with moderate effort. A prototype (nest) is already there, with similar performance.
- I enjoyed reviewing the implementation; more details about my design review in the official review I plan to submit.

Best,

Ion
On 20/04/2026 at 23:15, Ion Gaztañaga wrote:
Hi,
Since the review was announced, I've been doing some experiments with the proposed data structure. I've collected some notes for technical discussion.
[...]
----------------------------------------
Prefetch/stored_data_in_block performance difference
----------------------------------------
Using the above options, it's difficult to see whether prefetching or storing the data in the block noticeably improves performance on my computer. I see contradictory results depending on the OS (Linux/Windows) and compiler (MSVC/Clang-CL, GCC).
In theory I would expect that storing data and metadata in the same allocation would help a bit, but I could not get a clear answer. I imagine that results could differ on a deeply embedded machine versus a desktop/server CPU.
My conclusion is that the defaults chosen by Joaquín are correct, at least for the usual server/desktop machines.
--------------
Prefetch changes
-----------------
I experimented with some prefetch changes in iterator's operator++
https://github.com/boostorg/container/blob/develop/include/boost/container/e...
and operator--:
https://github.com/boostorg/container/blob/develop/include/boost/container/e...
Basically, prefetching the first slot with a value in the array for operator++, or the last one for operator--, instead of always prefetching the start of the array. I only see a slight improvement (or maybe it's a placebo effect).
Regarding prefetching and storing the data in the block or separately, I did my own experiments and settled on what's been proposed, though of course some more experimenting can't hurt. Separate data in particular was a noticeable win for some element sizes and environments, which I attribute (without proof) to block headers and data blocks ending up in different per-size pools in the allocator, with a resulting improvement in cache locality for both.
--------------
Local/segmented iterators
--------------
I tried to implement efficient segment
https://github.com/boostorg/container/blob/develop/include/boost/container/e...
and local_iterators
https://github.com/boostorg/container/blob/develop/include/boost/container/e...
in order to test whether hub can take advantage of generic segmented algorithms. The problem I found with my implementation is that the local iterator is nearly as expensive as the general iterator, because it must navigate through the data array using nearly the same bit-tricks.
At this microlevel, it's hard to beat a local mask-based loop like the one boost::container::hub uses. I noticed that you check for empty segments here: https://github.com/boostorg/container/blob/develop/include/boost/container/e... ("if (BOOST_LIKELY(m != 0))"). That check is not needed: boost::container::{hub|nest} only traverses non-empty segments.
[...]
--------------
Summary
--------------

- hub's defaults seem correct and performant.
- I found hub is faster than plf::hive, as Joaquín claimed.
- It's possible to fully integrate hub at the same level as other containers, with moderate effort. A prototype (nest) is already there with similar performance.
- I enjoyed reviewing the implementation; more details about my design review in the official review I plan to submit.
Looking forward to your review,

Joaquín M López Muñoz
On 21/04/2026 at 18:25, Joaquin M López Muñoz via Boost wrote:
At this microlevel, it's hard to beat a local mask-based loop as boost::container::hub uses.
Very hard. I was going crazy trying to figure out why nest was behind in the for_each bench, without luck. It turns out that changing

   pbb->mask & (full << 1 << std::size_t(n));

to (note: no std::size_t conversion on n, which is an int)

   pbb->mask & (full << 1 << n);

was a 5% improvement, because converting the int to uint64 before shifting had a noticeable impact. I was quite shocked.

I think the port's performance is on par with the proposal on my machine, in case you want to test it:

https://github.com/boostorg/container/blob/develop/experimental/bench_hub.cp...

Ion
participants (2):

- Ion Gaztañaga
- Joaquin M López Muñoz