Boost logo

Boost :

From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-05-21 18:06:44


El 21/05/2025 a las 11:05, Ruben Perez via Boost escribió:
> On Tue, 13 May 2025 at 11:48, Arnaud Becheler via Boost
> <boost_at_[hidden]> wrote:
>> Dear Boost community,
>>
>> The review of the Boost.Bloom library begins today May 13th, 2025, and will
>> run through May 22nd, 2025.
>
> Hi all,
>
> Since I'm not very knowledgeable of Bloom filters or containers, and
> I'm friends with the author, I won't emit a formal review of the
> proposed Boost.Bloom. However, I still would like to provide some
> (hopefully useful) feedback for the author.

Your feedback is much welcome!

> In general, I think the library is great - it solves one problem and
> solves it well, and the documentation does a great job explaining the
> domain problem, the design decisions that were made and how to use the
> library.
>
> Some points regarding the interface and implementation:
>
> * Peeking the implementation, I've got the impression that the block
> class, when using integers distinct to unsigned char (e.g.
> block<std::uint32_t>), end up using reinterpret_cast<std::uint32_t*>
> on the underlying unsigned char array. AFAIK this is undefined
> behavior, since no std::uint32_t object exists at this location. I
> don't know how relevant this is though, as I guess it works fine in
> most compilers.

There are two situations here:

* When possible, the array memory is aligned to Block and accessed
directly as
if it were an array of Blocks. I think we're good here as Block, being
an unsigned
integral type, is also an implicit-lifetime type.
* When alignment is not possible (typically, when there's subarray
overlap), Blocks
are memcpy'ed from and to the array. Again I think this is legal because
unsigned integral types are trivially copyable.

> * Is there a reason to use boost::uint64_t and boost::uint32_t in a
> C++11 library? (vs. std::uint64_t and std::uint32_t).

I don't know, I'm just in the habit of using <boost/cstdint.hpp> since time
immemorial. Yes, I shall use <cstdint> instead.

> * Does filter::reset() without arguments make sense?

filter::reset() is the same as filter::reset(0), i.e. it deallocates the bit
array. I find reset() convenient and reasonably self-explanatory, but if
the community thinks otherwise I can just omit the default arg value.

> * Does filter::emplace() add any value over just constructing and
> inserting an object? In a traditional container, this is justified
> because the elements are actually stored, but that does not seem to be
> the case here.

It's basically there for consistency with standard container APIs. There's
an astronomically unlikely use case for it, namely uses-allocator value_type
construction:

https://en.cppreference.com/w/cpp/memory/uses_allocator

> * Do the constructor and operator= overloads taking a range of
> elements add any value? They seem to be equivalent to
> constructing/assigning and then inserting the values separately.

Again, consistency with standard APIs. Yes, they're equivalent to
construction
plus insertion.

> * The block classes seem to perform no type checking on the passed
> Block type argument.

You're right, I had it in my mental backlog and forgot about it. Will fix.

> * The convenience header <boost/bloom.hpp> seems to be missing.

I'm a little wary of omnibus headers cause they keep growing as the
lib does. In this particular case the whole library is very small, so I
guess
it's not a problem.

> [...]
>
> Some points regarding the documentation:
>
> * I struggle to understand the BucketSize template parameter. I think
> this is because the term "bucket" is never mentioned in the primer.
> It'd be cool if some information regarding this concept was mentioned
> in the primer.

The concept does not belong in the primer as this business with
overlapping subarrays is an innovation in candidate Boost.Bloom. Anyway,
this was already brought up by Ivan Matek, I will add a diagram to support
the explanation on buckets, subarrays, and how they interact.

> * The serialization example saves and loads a filter using
> multiblock<std::uint64_t>. Is the bit pattern generated by the filter
> endianness-dependent? One of the papers you reference [1] mentions
> sending such representations over the network, which would make this
> point relevant for the reader.

Yes, it is endianness-dependent. Should I mention this somewhere, I guess?

> * I've found the examples more difficult to read than what I'd have
> liked. Some more comments would be useful.

Noted, I can add more comments.

> * Examples in the docs would benefit from syntax highlighting. You can
> use asciidoc [source,cpp] blocks to achieve it.

This is how it was initially but then I took it back for some reason
I can't remember now. Will look at it again.

> * I'd try to avoid C-style casts in the examples.

Noted.

> * I'd try to make code snippets as close to real code as possible.
> That is, I'd try to avoid things like "using filter =
> boost::bloom::filter<std::string, ...>;", in favor of something that
> the user may copy to their code.

You mean user-defined filter being homonym with boost::bloom::filter?
I don't see a problem with that, even in real code.

> Some points regarding testing:
>
> * It looks like the examples are not being built by any CI build. This
> is dangerous because it yields to code rotting.

Ok. Do you of any Boost lib already doing that from which I can take
inspiration?

> * It looks like the tests are not being built from CMake. Having
> "if(BUILD_TESTING AND EXISTS
> "${CMAKE_CURRENT_SOURCE_DIR}/test/CMakeLists.txt")" in the main
> CMakeLists.txt may trick the reader into thinking they are, but there
> is no test/CMakeLists.txt. I understand that this is what's generated
> by default by boostdep --cmake, but I'd advise either to add a
> test/CMakeLists.txt (preferable) or removing the if statement
> altogether.

I depend on others' kindness for that as my knowledge of CMake
is embarrasingly limited. Will ask for help to do it (if you know how to do
it and would like to submit a PR, be my guest!)

> * The Jamfile in example/ shouldn't specify <cxxstd> directly.

How should I do then?

> * Adding a code coverage metric would be useful.

Noted, will do.

> Hope this helps.

It certainly does! My post-review backlog is growing by the hour :)
Thanks again for taking the time to go through my proposal.

Joaquin M Lopez Munoz


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk