Boost logo

Boost :

From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-05-18 16:52:33


El 18/05/2025 a las 13:29, Andrzej Krzemienski via Boost escribió:
> Hi Everyone,
> The following can hardly be called a library review, but I still wanted to
> give this feedback.

Hi Andrzej, thanks for this feedback! It'd be great if you can find the
time to write a full review.

> [...].
>
> Some interface suggestions:
>
> 1. Use the term "bit_capacity" rather than "capacity" for representing
> capacity in bits. This is not to accidently mislead the users.

I find the name ugly but it certainly dispels any potential
ambiguity, so, yes, I can adopt that.

> [...]
> 3. When describing the |= operator, use the name "set union", which is the
> industry standard. The current choice of words "containing both the
> elements of f and f2" can be interpreted both as a union and as an
> intersection.

Noted, will do.

> 4. I am uneasy about the functionality under &= operator. It implies
> symmetry with operator |=. But the symmetry is broken for at least two
> reasons:
> * If I put some elements to filter f1 and some other elements to filter f2,
> and then I apply |=, then the effect is the same as if I put all the
> elements in f1 to begin with. The same property, if I understand correctly,
> does not apply to operator &=.

The following, dual property holds: if I put some elements to f1 and
some other
elements to f2 and then I apply &=, those elements that were both in f1
and f2
will also be in the result. The symmetry breaks in that this operation
is _not_
the same as constructing a filter from scratch with the elements in both f1
and f2, as the latter won't have stray, unnecessary bits set to one that may
potentially be left over in the former.

> * The compromising of the FPR value that you are describing.
> I am not sure if anyone ever needs this intersection behavior. But if it is
> useful, maybe reflect this lack of symmetry by using a named function
> rather than an operator?

I don't like this much. Admittedly there's a lack of simmetry due to FPR
degradation, but &= is so descriptive of what's going on that I'd rather
keep it.

> 5. Does the suffix "_for" in the name "fpr_for" bring any value?

This suffix has been added for consistency reasons. We have
capacity() (non-static, capacity of the filter under consideration)
and capacity_for(...) (static, capacity estimation, not dependent
on the state of any filter). And fpr_for(...) is sort of the inverse
function of capacity_for(...), so it makes sense (in my mind at
least) that it have the same suffix.

> 6. Not sure if it is a good idea to have two overloads for `reset()` doing
> slightly different things. For constructors this is unavoidable, but for
> member functions, you can have `reset` and `reset_for`.

This usage of "_for()" to unload the reset overload is not consistent
with how it's used for capacity() and capacity_for(). I mean,
reset() is NOT to reset_for() as capacity() is to capacity_for(); rather
both overloads of reset do the same thing with different input parameters.

> 7. I think that you are missing a precondition for operators &= and |=.
> They work under the assumptions that for both filters f1 and f2 inserting
> the same element renders the same bit pattern in the hash. But this is not
> the case when the hashers that I provide to f1 and f2 have the same type
> but different initial state:
>
> filter<T, MyHash> f1(1000, MyHash{1});
> filter<T, MyHash> f2(1000, MyHash{2});

Well, technically the behavior is well defined even if the hashers are not
equivalent:

"[...] changes the value of each bit in the internal array with the
result of
doing a logical AND operation of that bit and the corresponding one in x."

Admittedly, doing this with non-equivalent hashers will produce nothing
but garbage, so... well, I don't know what I should do here :-)

> [...]
> Joaquín, thank you for writing and sharing this library!

Thank _you_ for taking the time to look into my proposal.

Joaquin M Lopez Munoz


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk