[hub] on the specialized iteration over elements

newer
[release] Boost 1.91 RC1 available

Andrzej Krzemienski

16 Apr 2026 16 Apr '26

7:42 p.m.

Hi Everyone, Once again, thank you Joaquín for this contribution to the community. This is not really a review, but I wanted to talk about this idea to implement a specialized algorithm for iterating over the container elements in the form of hub::visit. First, should we draw from this a more general observation that iterators are an abstraction with overhead, and other containers (deque?) would also benefit from this inversion-of-control approach? Second, I think that the most natural naming choice for this operation is "for each", not "visit". It does practically the same thing as `std::for_each`. Third, the interface is rather unusual, and it allows one hub to iterate over another hub's elements: hub<T> h, g; h.visit(g.begin(), g.end(), fun); https://godbolt.org/z/bbb765bb1 It looks like this function should be a free function rather than a member function. And if there is a reason to make it a member function, at minimum add a precondition that both the iterators should point inside the container. Regards, &rzej;

Show replies by date

Ion Gaztañaga

16 Apr 16 Apr

9:49 p.m.

El 16/04/2026 a las 21:42, Andrzej Krzemienski via Boost escribió:

...

Hi Everyone, Once again, thank you Joaquín for this contribution to the community.

This is not really a review, but I wanted to talk about this idea to implement a specialized algorithm for iterating over the container elements in the form of hub::visit.

First, should we draw from this a more general observation that iterators are an abstraction with overhead, and other containers (deque?) would also benefit from this inversion-of-control approach?

Yes, this is a real abstraction penalty. For deque, I'm expementing (you can see it in the develop branch of containers) with segmented iterators (based on Austern's classic paper, https://lafstern.org/matt/segmented.pdf) and the initial tests shows improvement vs current STL implementations. Member functions know the internal structure of the container, so that iteration can be implemented more efficiently. Another advantage of those "visit"-like member functions is that in some platforms, knowing the internal data structure allows using prefetch instructions so that the CPU can do work while the memory we are very probably going to touch next is already requested to the cache. With hive, I think the same optimization applies: the member function iterates more efficiently and explicitly prefetching next block and/or element array (the hardware prefetcher probably only helps with contiguous memory accesses) while calling the function object (which is doing work) can mask the memory latency and make a difference. The STL is great but any abstraction has a penalty. I suspect std::ranges has even a greater performance penalty. Best, Ion

Andrzej Krzemienski

17 Apr 17 Apr

6:46 a.m.

czw., 16 kwi 2026 o 23:49 Ion Gaztañaga <igaztanaga@gmail.com> napisał(a):

...

El 16/04/2026 a las 21:42, Andrzej Krzemienski via Boost escribió:

...
Hi Everyone, Once again, thank you Joaquín for this contribution to the community.

This is not really a review, but I wanted to talk about this idea to implement a specialized algorithm for iterating over the container elements in the form of hub::visit.

First, should we draw from this a more general observation that iterators are an abstraction with overhead, and other containers (deque?) would also benefit from this inversion-of-control approach?

Yes, this is a real abstraction penalty. For deque, I'm expementing (you can see it in the develop branch of containers) with segmented iterators (based on Austern's classic paper, https://lafstern.org/matt/segmented.pdf) and the initial tests shows improvement vs current STL implementations.

Member functions know the internal structure of the container, so that iteration can be implemented more efficiently.

Another advantage of those "visit"-like member functions is that in some platforms, knowing the internal data structure allows using prefetch instructions so that the CPU can do work while the memory we are very probably going to touch next is already requested to the cache.

With hive, I think the same optimization applies: the member function iterates more efficiently and explicitly prefetching next block and/or element array (the hardware prefetcher probably only helps with contiguous memory accesses) while calling the function object (which is doing work) can mask the memory latency and make a difference.

The STL is great but any abstraction has a penalty. I suspect std::ranges has even a greater performance penalty.

Ok, so in the context of this review, focusing only on the API part, maybe this concept of "iteration with the opportunity to use segmented iteration" calls for a "generic" interface with the ability for concrete containers to customize (specialize) for their implementation details? If I am implementing my own iterator-agnostic algorithm, I would like to be able to say, "iterate over this range in the fastest possible way". Regards, &rzej;

Ion Gaztañaga

7:34 a.m.

El 17/04/2026 a las 8:46, Andrzej Krzemienski via Boost escribió:

...

Ok, so in the context of this review, focusing only on the API part, maybe this concept of "iteration with the opportunity to use segmented iteration" calls for a "generic" interface with the ability for concrete containers to customize (specialize) for their implementation details?

If I am implementing my own iterator-agnostic algorithm, I would like to be able to say, "iterate over this range in the fastest possible way".

Regards, &rzej;

The "generic interface" would be a good tool, but I think that a generic interface would not reach the optimization level that a member function that specifically targets the details of the data structure can achieve. Benchmarks will tell... I, for instance, could not beat the performance of the current member functions, using the Austerns "segmented iterator" generic interface. In some algorithms, it was even slower than the non-segmented, classic STL algorithm. Best, Ion

Jean-Louis Leroy

16 Apr 16 Apr

10:23 p.m.

...

Third, the interface is rather unusual, and it allows one hub to iterate over another hub's elements:

hub<T> h, g; h.visit(g.begin(), g.end(), fun);

I guess it can be useful to iterate over a narrower range. But in that spirit, I would expect 'visit' to return the "end" iterator. J-L

Joaquin M López Muñoz

17 Apr 17 Apr

7:51 a.m.

El 17/04/2026 a las 0:23, Jean-Louis Leroy via Boost escribió:

...

...
Third, the interface is rather unusual, and it allows one hub to iterate over another hub's elements:

hub<T> h, g; h.visit(g.begin(), g.end(), fun); I guess it can be useful to iterate over a narrower range. But in that spirit, I would expect 'visit' to return the "end" iterator.

To be quite honest, I don't see tremendous value on iterating over a range narrower than the whole container, because insertion order in hub is arbitrary and there's no meaningul property that elements of a subrange would share --same rationale applies to other unordered containers like, for instance, std::unordered_map. The function visit(first, last, f) is there for operational closure reasons, so to say. Joaquín M López Muñoz

Alexander Grund

8:06 a.m.

Am 17.04.26 um 09:51 schrieb Joaquin M López Muñoz via Boost:

...

El 17/04/2026 a las 0:23, Jean-Louis Leroy via Boost escribió:

...
...
Third, the interface is rather unusual, and it allows one hub to iterate over another hub's elements:

hub<T> h, g; h.visit(g.begin(), g.end(), fun); I guess it can be useful to iterate over a narrower range. But in that spirit, I would expect 'visit' to return the "end" iterator.

To be quite honest, I don't see tremendous value on iterating over a range narrower than the whole container, because insertion order in hub is arbitrary and there's no meaningul property that elements of a subrange would share --same rationale applies to other unordered containers like, for instance, std::unordered_map. The function visit(first, last, f) is there for operational closure reasons, so to say.

If your functor could break out of iteration, like Ions suggested visit_while it could make sense to be able to "resume" iteration where it left off

Joaquin M López Muñoz

8:35 a.m.

El 17/04/2026 a las 10:06, Alexander Grund via Boost escribió:

...

Am 17.04.26 um 09:51 schrieb Joaquin M López Muñoz via Boost:

...
El 17/04/2026 a las 0:23, Jean-Louis Leroy via Boost escribió:

...
...
Third, the interface is rather unusual, and it allows one hub to iterate over another hub's elements:

hub<T> h, g; h.visit(g.begin(), g.end(), fun); I guess it can be useful to iterate over a narrower range. But in that spirit, I would expect 'visit' to return the "end" iterator.

To be quite honest, I don't see tremendous value on iterating over a range narrower than the whole container, because insertion order in hub is arbitrary and there's no meaningul property that elements of a subrange would share --same rationale applies to other unordered containers like, for instance, std::unordered_map. The function visit(first, last, f) is there for operational closure reasons, so to say.

If your functor could break out of iteration, like Ions suggested visit_while it could make sense to be able to "resume" iteration where it left off

Umm, yes, that's a potential use case, thank you! FWIW, visit_while is already provided in the version of hub under review. Joaquín M López Muñoz

Peter Dimov

16 Apr 16 Apr

11:16 p.m.

Andrzej Krzemienski wrote:

...

First, should we draw from this a more general observation that iterators are an abstraction with overhead, and other containers (deque?) would also benefit from this inversion-of-control approach?

This is called "segmented iterators" (originally motivated by std::deque as the best known segmented data structure); Matt Austern coined the name in https://lafstern.org/matt/segmented.pdf and libc++ has been doing work on it (under the hood), e.g. https://github.com/llvm/llvm-project/blob/main/libcxx/include/__iterator/seg... https://github.com/llvm/llvm-project/issues/102817 When I last looked at this I had some ideas how it can be improved and possibly standardized, but I've forgotten everything already. :-) This might be a good opportunity for Boost to add a segmented iterator API.

Jean-Louis Leroy

17 Apr 17 Apr

1:27 a.m.

It seems to me that 'visit' is useful and legal only in combination with 'operator++', 'operator--' and the likes of 'std::next', 'std::prev', 'std::advance', etc. For example: int main() { boost::container::hub<int> h {1, 3, 5, 7, 9}; auto first = h.begin(); auto middle = std::next(first, h.size() / 2); auto last = h.end(); h.visit(first, middle, [](int i){ std::print("{} ", i); }); std::print(" | "); h.visit(middle, last, [](int i){ std::print("{} ", i); }); } (https://godbolt.org/z/1e5E413r8) Important use cases are divide-and-conquer algorithms and parallelization. From here, two things. If 'visit' is faster than just "ranging over", it probably (?) means that we need 'hub::advance' etc as well. But then an integer displacement is always involved, be it 1 or -1. Doesn't it suggest that 'visit' should take offsets, not iterators? That would take care of Andrzej's concern. And there is no need for public 'advance' etc members.

Alexander Grund

7:34 a.m.

Am 17.04.26 um 03:27 schrieb Jean-Louis Leroy via Boost:

...

It seems to me that 'visit' is useful and legal only in combination with 'operator++', 'operator--' and the likes of 'std::next', 'std::prev', 'std::advance', etc. For example:

int main() { boost::container::hub<int> h {1, 3, 5, 7, 9}; auto first = h.begin(); auto middle = std::next(first, h.size() / 2); auto last = h.end(); h.visit(first, middle, [](int i){ std::print("{} ", i); }); std::print(" | "); h.visit(middle, last, [](int i){ std::print("{} ", i); }); }

(https://godbolt.org/z/1e5E413r8)

Important use cases are divide-and-conquer algorithms and parallelization.

...
From here, two things.

If 'visit' is faster than just "ranging over", it probably (?) means that we need 'hub::advance' etc as well.

But then an integer displacement is always involved, be it 1 or -1. Doesn't it suggest that 'visit' should take offsets, not iterators? That would take care of Andrzej's concern. And there is no need for public 'advance' etc members.

You'd probably still want iterator support not (only) offsets as you might want to process all elements until or after the element returned by e.g. `std::find` As for `advance`: That sounds like an ADL-enabled `next` similar to `swap`

Joaquin M López Muñoz

8:05 a.m.

El 17/04/2026 a las 3:27, Jean-Louis Leroy via Boost escribió:

...

It seems to me that 'visit' is useful and legal only in combination with 'operator++', 'operator--' and the likes of 'std::next', 'std::prev', 'std::advance', etc. For example:

int main() { boost::container::hub<int> h {1, 3, 5, 7, 9}; auto first = h.begin(); auto middle = std::next(first, h.size() / 2); auto last = h.end(); h.visit(first, middle, [](int i){ std::print("{} ", i); }); std::print(" | "); h.visit(middle, last, [](int i){ std::print("{} ", i); }); }

(https://godbolt.org/z/1e5E413r8)

Important use cases are divide-and-conquer algorithms and parallelization.

...
From here, two things.

If 'visit' is faster than just "ranging over", it probably (?) means that we need 'hub::advance' etc as well.

This paralellization/divide_and_conquer scenario opens up an interesting exploration opportunity. There's a performance problem with std::next, though: a hub is basically a linked list of 64-element blocks, each equipped with an occupancy bitmask; advancing an iterator n positions reduces necessarily to incrementing the iterator n times, checking for occupied slots etc at each step, that is, there's no shortcut that could make the operation faster. An alternative would be having something like mid(first, last) that returns an iterator _roughly_ midway between first and last. The "roughly" part allows us to do the operation much more quickly (like 64x faster or more) by happily jumping over entire blocks, assuming the load factor of each block is more or less the same (which is not true generally, but it may be ok if your goal is to split the entire range into reasonably sized subranges for parallelization).

...

But then an integer displacement is always involved, be it 1 or -1. Doesn't it suggest that 'visit' should take offsets, not iterators? That would take care of Andrzej's concern. And there is no need for public 'advance' etc members.

Sorry, I don't get your argument here, care elaborate? Thank you! Joaquín M López Muñoz

Ion Gaztañaga

6:26 a.m.

El 17/04/2026 a las 1:16, Peter Dimov via Boost escribió:

...

Andrzej Krzemienski wrote:

...
First, should we draw from this a more general observation that iterators are an abstraction with overhead, and other containers (deque?) would also benefit from this inversion-of-control approach?

This is called "segmented iterators" (originally motivated by std::deque as the best known segmented data structure); Matt Austern coined the name in

https://lafstern.org/matt/segmented.pdf

and libc++ has been doing work on it (under the hood), e.g.

https://github.com/llvm/llvm-project/blob/main/libcxx/include/__iterator/seg... https://github.com/llvm/llvm-project/issues/102817

When I last looked at this I had some ideas how it can be improved and possibly standardized, but I've forgotten everything already. :-)

This might be a good opportunity for Boost to add a segmented iterator API.

You can find the experiments that I'm doing with segmented iterators and several algorithms (some are more complicated to implement, if you want to support recursively segmented iterators or input and output iterator) that the Austern's std::fill example). Here the experimental implementation of several segmented algorithms using Austern's traits: Prototype implementations: https://github.com/boostorg/container/tree/develop/include/boost/container/e... Some tests and a general bench agains std: https://github.com/boostorg/container/tree/develop/experimental If the vectorizer is activated (deque is the best example, as the local_iterator can be a raw pointer) you can get 5x-10x performance improvement in some scenarios. You can have worse performance in some cases. I haven't investigate those. I tried to design efficient segmented iterators for a hub-like container but I didn't have success (the hub segmente is relatively small, 64 elements, and the local iterator is not a sufficiently simpler iterator than the general iterator that Joaquín has designed). I'm currently trying to make all those algorithms support recursively segmented iterators, including also dual segmented input and output iterators (say, for std::copy). If results are consistent across compilers and in a broad ranges of types, as those obtained in these preliminary tests, maybe I should try to present this sutff as a new Boost library... If you could use your local brain prompt to recover those lost ideas... Best, Ion

Joaquin M López Muñoz

8:33 a.m.

El 17/04/2026 a las 8:26, Ion Gaztañaga via Boost escribió:

...

El 17/04/2026 a las 1:16, Peter Dimov via Boost escribió:

...
Andrzej Krzemienski wrote:

...
First, should we draw from this a more general observation that iterators are an abstraction with overhead, and other containers (deque?) would also benefit from this inversion-of-control approach?

This is called "segmented iterators" (originally motivated by std::deque as the best known segmented data structure); Matt Austern coined the name in

https://lafstern.org/matt/segmented.pdf

and libc++ has been doing work on it (under the hood), e.g.

https://github.com/llvm/llvm-project/blob/main/libcxx/include/__iterator/seg...

https://github.com/llvm/llvm-project/issues/102817

When I last looked at this I had some ideas how it can be improved and possibly standardized, but I've forgotten everything already. :-)

This might be a good opportunity for Boost to add a segmented iterator API.

You can find the experiments that I'm doing with segmented iterators and several algorithms (some are more complicated to implement, if you want to support recursively segmented iterators or input and output iterator) that the Austern's std::fill example). Here the experimental implementation of several segmented algorithms using Austern's traits:

Prototype implementations:

https://github.com/boostorg/container/tree/develop/include/boost/container/e...

Some tests and a general bench agains std:

https://github.com/boostorg/container/tree/develop/experimental

If the vectorizer is activated (deque is the best example, as the local_iterator can be a raw pointer) you can get 5x-10x performance improvement in some scenarios. You can have worse performance in some cases. I haven't investigate those.

I tried to design efficient segmented iterators for a hub-like container but I didn't have success (the hub segmente is relatively small, 64 elements, and the local iterator is not a sufficiently simpler iterator than the general iterator that Joaquín has designed).

Exactly, hub doesn't have a local iterator per se, and intrablock visit resolves to (simplified pseoducode): auto mask = pb->mask; // pb is a pointer to the block while(mask) { int n = std::countr_zero(mask); f(pb->data[n]); mask &= mask - 1; // clear least significant one } which, I presume, any local iterator abstraction wouldn't optimize to unless the compiler is extremely smart, though I can work with you Ion to see if we can get close. Joaquín M López Muñoz

Ion Gaztañaga

9:38 a.m.

El 17/04/2026 a las 10:33, Joaquin M López Muñoz via Boost escribió:

...

Exactly, hub doesn't have a local iterator per se, and intrablock visit resolves to (simplified pseoducode):

auto mask = pb->mask; // pb is a pointer to the block while(mask) { int n = std::countr_zero(mask); f(pb->data[n]); mask &= mask - 1; // clear least significant one }

which, I presume, any local iterator abstraction wouldn't optimize to unless the compiler is extremely smart, though I can work with you Ion to see if we can get close.

Right. The local iterator must do the same bitmask trick, which is highly optimized (congrats Joaquín). The only advantage is that we could have constant-time local iterator distance (using popcount or similar), the classic STL-like algorithm does not take advantage of, but I have ideas on how this could be used to implement some hand-made unrolling in some algorithms. For instance, std::ranges have this "sized_range" concept that has weaker requirements than a full random-access_iterator, and I've found some algorithms can be a bit faster just knowing the distance between the iterator range without having random-access functions (e.g. you can optimize a bit things like count_n and other xxx_n algorithms for forward/bidirectional iterators if additionally you can calculate the distance between to iterators (which is possible for hub local iterators because after some masking popcount-like intrinsic operators can count the number of "1s" in the bit mask quite fast. Also, I managed to convert the hub local_iterator into a random-access iterator (implementing operator += inside the block in constant-time), but the random-access operations were not fast enough (I'm not expert on bit-handling optimizations, I must admit) to make the local iterator noticealy faster vs the general iterator. The segmentation iteration has also some overhead that must be compensated with a lighter local iterator and I could not achieve this for the hub-like data structure. I'll try to describe my experiments with a hub-like port in another thread, although I'm spoiling my own review effort. The goal of this experiment was to deeply analyze the design of hub and alternative design decisions (spoiler, Joaquín's design seem to be the right one) for my own review, but maybe I can describe a bit more the experiment to analyze if there is interesting work to do in that line... Best, Ion

Joaquin M López Muñoz

7:43 a.m.

El 16/04/2026 a las 21:42, Andrzej Krzemienski escribió:

...

Hi Everyone, Once again, thank you Joaquín for this contribution to the community.

This is not really a review, but I wanted to talk about this idea to implement a specialized algorithm for iterating over the container elements in the form of hub::visit.

First, should we draw from this a more general observation that iterators are an abstraction with overhead, and other containers (deque?) would also benefit from this inversion-of-control approach?

Ion has aready followed up on this subtopic, I'll add my observations on that subthread.

...

Second, I think that the most natural naming choice for this operation is "for each", not "visit". It does practically the same thing as `std::for_each`.

Third, the interface is rather unusual, and it allows one hub to iterate over another hub's elements:

hub<T> h, g; h.visit(g.begin(), g.end(), fun);

https://godbolt.org/z/bbb765bb1

It looks like this function should be a free function rather than a member function. And if there is a reason to make it a member function, at minimum add a precondition that both the iterators should point inside the container.

Yes, I think your proposal makes sense, since, as you CE snippet shows, the operation can be performed based on the iterators alone. So, instead of visit and visit_all we can have global functions: template<hub_iterator Iterator, typename F> Iterator for_each(Iterator first, Iterator last, F f); template<typename T, typename Allocator, typename F> typename hub<T, Allocator>::iterator for_each(hub<T, Allocator>& x, F f) { return for_each(x.begin(), x.end(), std:.ref(f)); } // same for const hub I have a bikeshedding problem with visit_while/visit_all_while, though: the morally closest function in <algorithm> would be std::find_if_not, which is not a very intuitive equivalent and has, in fact, different semantics (predicates can't change the passed element). Ideas on naming? for_each_while would be my first option, I guess. Joaquín M López Muñoz

Andrzej Krzemienski

8:51 a.m.

On Fri, Apr 17, 2026, 09:43 Joaquin M López Muñoz < joaquinlopezmunoz@gmail.com> wrote:

...

El 16/04/2026 a las 21:42, Andrzej Krzemienski escribió:

...
Hi Everyone, Once again, thank you Joaquín for this contribution to the community.

This is not really a review, but I wanted to talk about this idea to implement a specialized algorithm for iterating over the container elements in the form of hub::visit.

First, should we draw from this a more general observation that iterators are an abstraction with overhead, and other containers (deque?) would also benefit from this inversion-of-control approach?

Ion has aready followed up on this subtopic, I'll add my observations on that subthread.

...
Second, I think that the most natural naming choice for this operation is "for each", not "visit". It does practically the same thing as `std::for_each`.

Third, the interface is rather unusual, and it allows one hub to iterate over another hub's elements:

hub<T> h, g; h.visit(g.begin(), g.end(), fun);

https://godbolt.org/z/bbb765bb1

It looks like this function should be a free function rather than a member function. And if there is a reason to make it a member function, at minimum add a precondition that both the iterators should point inside the container.

Yes, I think your proposal makes sense, since, as you CE snippet shows, the operation can be performed based on the iterators alone.

So, instead of visit and visit_all we can have global functions:

template<hub_iterator Iterator, typename F> Iterator for_each(Iterator first, Iterator last, F f);

template<typename T, typename Allocator, typename F> typename hub<T, Allocator>::iterator for_each(hub<T, Allocator>& x, F f) { return for_each(x.begin(), x.end(), std:.ref(f)); }

// same for const hub

I have a bikeshedding problem with visit_while/visit_all_while, though: the morally closest function in <algorithm> would be std::find_if_not, which is not a very intuitive equivalent and has, in fact, different semantics (predicates can't change the passed element). Ideas on naming? for_each_while would be my first option, I guess.

A very good and descrptive name, IMO. Regards, &rzej;

...

Joaquín M López Muñoz

Age (days ago)

Last active (days ago)

List overview

16 comments

6 participants

participants (6)

Alexander Grund
Andrzej Krzemienski
Ion Gaztañaga
Jean-Louis Leroy
Joaquin M López Muñoz
Peter Dimov

[hub] on the specialized iteration over elements

tags

participants (6)