![]() |
Boost : |
From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2025-05-22 17:36:31
On Thu, May 22, 2025 at 10:09â¯AM Joaquin M López Muñoz via Boost <
boost_at_[hidden]> wrote:
> In fact I'm somewhat surprised most reviewers haven't asked for variations
> of/alternatives to Bloom filters. This can be added to the roadmap if
> the library gains traction, I guess.
>
Ideally, variations would have motivating use-cases so their performance
could be studied in a real-world setting.
> > The example in the README could be improved by declaring the filter
> thusly:
> >
> > using filter = boost::bloom::filter<boost::core::string_view, 5>;
> >
> > I don't think it is a good idea to give users an example which performs
> > unnecessary copying or allocation.
>
> This is not needed cause the filter (with the exception of emplace) does
> not create elements on its own --you comment on this later in the review.
>
It does create the element temporarily. Here, we are constructing a
`std::string` from `char const*`:
This is only because the filter type uses `std::string`. I realize in the
example that these probably qualify for std::string's small buffer
optimization, and users might be surprised when "inserting" larger strings
that dynamic allocations take place.
> > I'm puzzled about the terminology. "insert" and "emplace" imply that new
> > nodes or array elements of type value_type are being stored. This is not
> > the case though, as these objects are hashed and the resulting digest is
> > used to algorithmically adjust the filter which is in essence an array of
> > bool. `emplace` documentation says that it constructs elements.
>
> The interface mimics that of a container as a way to lower the entry
> barrier
> for new users learning it. But filters are _not_ containers, as you've
> found
> out. Claudio also mentioned this. I'll stress the point in some
> prominent place in the documentation.
>
I have the opposite opinion. That the interface should distance itself from
a typical container interface, otherwise new users may mistakenly believe
that they behave similarly in terms of resource usage. At least when it
comes to strings, proper selection of the template parameter type seems to
be a foot-gun. Maybe there is room for a type alias "string_filter" which
is tuned for character strings.
BTW, emplace() does construct the element, calculates its hash, and then
> throws it away.
>
Yeah, this is weird. Why even have the element type? `insert` can in theory
accept any object which is hashable. And emplace doesn't seem as valuable
for the bloom filter container as it is for regular containers, since there
are no nodes. That is, rather than `emplace(args...)` a user could just as
easily write `insert(T(args...))`
> > I think that the array() function(s) need to guarantee that bits beyond
> the
> > capacity but present in the storage will be zeroed out. This way they can
> > be serialized deterministically.
>
> Not sure I'm getting you, but the extent of the span returned by array()
> is exactly as wide as capacity() indicates.
>
What if I construct the filter with say, a prime number of bits like 137?
What happens to the last 7 bits (to bring it up to 144)?
Thanks
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk