Boost logo

Boost :

From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-05-22 18:23:47


El 22/05/2025 a las 19:36, Vinnie Falco escribió:
> On Thu, May 22, 2025 at 10:09 AM Joaquin M López Muñoz via Boost
> <boost_at_[hidden]> wrote:
>
> In fact I'm somewhat surprised most reviewers haven't asked for
> variations
> of/alternatives to Bloom filters. This can be added to the roadmap if
> the librarygains traction, I guess.
>
>
> Ideally, variations would have motivating use-cases so their
> performance could be studied in a real-world setting.
>
> > The example in the README could be improved by declaring the
> filter thusly:
> >
> >      using filter =
> boost::bloom::filter<boost::core::string_view, 5>;
> >
> > I don't think it is a good idea to give users an example which
> performs
> > unnecessary copying or allocation.
>
> This is not needed cause the filter (with the exception of
> emplace) does
> not createelements on its own --you comment on this later in the
> review.
>
>
> It does create the element temporarily. Here, we are constructing a
> `std::string` from `char const*`:
>
> https://github.com/joaquintides/bloom/blob/fabc8698c1053efdfe5336de2d01a4fb3e44d79f/example/basic.cpp#L28
>
> This is only because the filter type uses `std::string`. [...]
>

Ah ok. Yes, a string must be constructed to extract its hash value.
boost::bloom::filter supports heterogeneous lookup, so you can avoid it
with that,
or use std::string_view to beign with as you pointed out.

> > I'm puzzled about the terminology. "insert" and "emplace" imply
> that new
> > nodes or array elements of type value_type are being stored.
> This is not
> > the case though, as these objects are hashed and the resulting
> digest is
> > used to algorithmically adjust the filter which is in essence an
> array of
> > bool. `emplace` documentation says that it constructs elements.
>
> The interface mimics that of a container as a way to lower the
> entry barrier
> for new users learning it. But filters are _not_ containers, as
> you've found
> out. Claudio also mentioned this. I'll stress the point in some
> prominent place inthe documentation.
>
>
> I have the opposite opinion. That the interface should distance itself
> from a typical container interface, otherwise new users may mistakenly
> believe that they behave similarly in terms of resource usage. At
> least when it comes to strings, proper selection of the template
> parameter type seems to be a foot-gun. Maybe there is room for a type
> alias "string_filter" which is tuned for character strings.

I personally don't see the need: if you want to process strings, you have to
be aware of the implications of using std::string vs. std::string_view,
this is
not something germane to candidate Boost.Bloom per se.

>
> BTW, emplace() does construct the element, calculates its hash,
> and then
> throwsit away.
>
>
> Yeah, this is weird. Why even have the element type? `insert` can in
> theory accept any object which is hashable.

I've been tempted to introduced an insert_hash_value method, but this looks
mighty dangerous for the casual user. As an alternative, one can always
declare
a filter<std::size_t, ...> and do all the hashing stuff externally.

> And emplace doesn't seem as valuable for the bloom filter container as
> it is for regular containers, since there are no nodes. That is,
> rather than `emplace(args...)` a user could just as easily write
> `insert(T(args...))`

It's there just for consistency with container APIs. It also supports
the incredibly
unlikely case of uses-allocator constuction, but this is admittedly not
a real
motivation:

https://en.cppreference.com/w/cpp/memory/uses_allocator

> > I think that the array() function(s) need to guarantee that bits
> beyond the
> > capacity but present in the storage will be zeroed out. This way
> they can
> > be serialized deterministically.
>
> Not sure I'm getting you, but the extent of the span returned by
> array()
> is exactlyas wide as capacity() indicates.
>
>
> What if I construct the filter with say, a prime number of bits like
> 137? What happens to the last 7 bits (to bring it up to 144)?

You can _request_ 137 bits, but the resulting capacity will be slightly
larger
than that to accomodate for the subfilter block size and round up to the
byte.

Joaquin M Lopez Munoz


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk