From: Ivan Matek (libbooze_at_[hidden])
Date: 2025-05-18 21:38:54
Had a bit more time to think :) so here are my replies and a few more
questions.
On Sun, May 18, 2025 at 11:56 AM Joaquin M López Muñoz via Boost <
boost_at_[hidden]> wrote:
> On 17/05/2025 at 20:36, Ivan Matek via Boost wrote:
> > Hi everybody,
> >
> > I was reading the documentation/code, and I have some questions/comments.
> >
> >
> > 1. Why is unit of measurement bit? I understand that conceptually bloom
> > filters (maybe others, did not check) are bit arrays, but for me this
> > seems like very error prone. Would verbose alternative of using
> > boost::bloom::bits(1000) or boost::bloom::bytes(125) be preferred? I did
> > not suggest UDL, as I feel for such rarely used class it may be overkill.
>
> I'm happy to do whatever the communitty decides is best here.
>
I am a strong fan of strong types (e.g. std::chrono), so a bits/bytes strong
type has my vote; not sure what the community in general prefers.
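Just to illustrate what I mean, a rough sketch (obviously not the library's
actual API; the names and conversions are made up):

#include <cstddef>

// hypothetical strong types so a caller cannot silently pass bytes where
// bits are expected
struct bits
{
  explicit bits(std::size_t n): value(n) {}
  std::size_t value;
};

struct bytes
{
  explicit bytes(std::size_t n): value(n) {}
  operator bits() const { return bits(8 * value); } // 1 byte == 8 bits
  std::size_t value;
};

// a filter constructor taking bits could then accept both bits(1000) and
// bytes(125) with no ambiguity about the unit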
>
> > 5. Why is BOOST_ASSERT(fpr>=0.0&&fpr<=1.0); not
> > BOOST_ASSERT(fpr>0.0&&fpr<=1.0);, i.e. is there benefit of allowing calls
> > with impossible fpr argument?
> fpr==0.0 is a legitimate (if uninteresting) argument value for
> capacity_for:
>
> capacity_for(0, 0.0) --> 24
> capacity_for(1, 0.0) --> 18446744073709549592
>
> The formal reason why fpr==0.0 is supported is because of symmetry:
> some calls to fpr_for actually return 0.0 (for instance, fpr_for(0, 100)).
>
This is a bit philosophical, but I actually do not feel this is correct.
First of all, is (0, 0.0) the only use case where an fpr of 0.0 makes sense?
I.e. any time n > 0, an fpr of 0.0 is impossible (or I misunderstood
something). So the assert could be an implication (it is funny because we had
a discussion about implies on the ML a few months ago), so something like:
BOOST_IMPLICATION(fpr == 0.0, n == 0);
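(Without such a macro the same check can be spelled with a plain assert, e.g.

BOOST_ASSERT(fpr != 0.0 || n == 0); // fpr == 0.0 implies n == 0

just a sketch of the intent, not a proposal for the exact spelling.)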
Similarly, for (1, 0.0) I do not believe the result should be the size_t max
value, as this is not a correct value. Now, we both know you will OOM before
noticing this in reality, but even if we imagine a magical computer that can
allocate that much memory, the fpr is still not 0.0.
If this library were not C++11 I would suggest std::optional as the return
type, but as it is, boost::optional seems like the best option.
Now I know people might think I am making the API ugly for users, but I
really do not think a bit more typing is such a big problem when the
alternative is people messing up (probably not when using constants in code,
more likely when they dynamically compute something). Bloom filters are
important, but they are not like JSON or protobuf where they are everywhere
in a codebase; users needing to use a slightly uglier API in 10 LOC of a
1M LOC codebase does not seem like a big deal to me.
So in the examples above:
capacity_for(0, 0.0) -> min_possible_capacity
capacity_for(1, 0.0) -> nullopt
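To make it concrete, a rough sketch of the signature I have in mind (the
names and the value 24 are placeholders, the real formula is elided):

#include <boost/assert.hpp>
#include <boost/optional.hpp>
#include <cstddef>

// sketch: return boost::none when no capacity can achieve the requested fpr
boost::optional<std::size_t> capacity_for(std::size_t n, double fpr)
{
  BOOST_ASSERT(fpr >= 0.0 && fpr <= 1.0);
  const std::size_t min_possible_capacity = 24; // whatever the minimum is
  if(fpr == 0.0){
    if(n == 0) return min_possible_capacity; // an empty filter never lies
    return boost::none;                      // unattainable for any n > 0
  }
  // ... existing closed-form computation goes here ...
  return min_possible_capacity; // placeholder so the sketch is complete
}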
A few more questions I thought of recently; apologies in advance if I
misunderstood something.
1. There is some SIMD code that seems to have a cost in K that is
proportional to the next greater-or-equal multiple of 8. I wonder, does that
affect the recommendation for optimal K or not? E.g. if the table we
discussed before tells me to use K = 14, should I bump it up to 16 when using
fast_multiblock64, since K = 14 and K = 16 both issue two check_m256ix2 calls
anyway?
static BOOST_FORCEINLINE bool check(const value_type& x, boost::uint64_t hash)
{
  for(int i = 0; i < k/8; ++i){
    if(!check_m256ix2(x[i], hash, 8)) return false;
    hash = detail::mulx64(hash);
  }
  if(k%8){
    if(!check_m256ix2(x[k/8], hash, k%8)) return false;
  }
  return true;
}
2. The fact that a bloom filter lookup can break early (you can stop as soon
as one bit is not set; let's ignore SIMD for now) makes me wonder if
performance depends on the hit rate. E.g. if I know my hit rate will be ~10%
or ~90%, should that affect my picking of K?
3. I have just noticed that I wrongly remembered that g++ defaults to
march=native; it does not. Why is this related to this library? Well, I am
not sure the fallback in the fast_ headers should exist. In other words, when
I want to use fast_ and AVX2 is not enabled, I would prefer the code to fail
to compile. So I would remove this fallback:
template<std::size_t K>
using fast_multiblock64=multiblock<boost::uint64_t,K>;
Now there is a reasonable argument that people do not want their code to
break when they change compiler flags or target a new architecture (e.g. one
you did not add SIMD for), but I believe the SIMD speedup is so huge that
this should not be a silent change. If users really want the fallback
behavior they can do their own preprocessor + using logic, as sketched below.
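For example (just a sketch of what I mean; __AVX2__ is the usual
compiler-defined macro, and I am guessing the exact header paths):

#include <boost/bloom/fast_multiblock64.hpp>
#include <boost/bloom/multiblock.hpp>
#include <boost/cstdint.hpp>
#include <cstddef>

// user-side choice: explicit SIMD type when AVX2 is enabled,
// explicit scalar fallback otherwise
#if defined(__AVX2__)
template<std::size_t K>
using my_block64 = boost::bloom::fast_multiblock64<K>;
#else
template<std::size_t K>
using my_block64 = boost::bloom::multiblock<boost::uint64_t, K>;
#endif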
4. Another thing related to SIMD is the question of the ones arrays used in
make_m256ix2/make_m256i. Initially I assumed they are not static const
(guaranteed one-time construction, but with thread-safe init overhead)
because the compiler is always smart enough to optimize this into some
read-only memory that is initialized at compile time, and everything is
great. I looked into the asm of a simple test program I wrote on my machine;
initialization is done at compile time, great.
But :) then I went to godbolt and made a small reproduction of the code, and
there I cannot get the compilers to avoid populating that array at runtime
(https://godbolt.org/z/sYfc7rffa). Now I do not expect you to go over the
godbolt link and try to nudge the compilers to optimize this by changing
flags or example code; my question is more about whether, if this is fragile,
that is maybe a reason to use some ugly trick to initialize those values at
compile time (e.g. as a constexpr array of properly aligned uint32 that, when
interpreted as ones, has the correct values). If anybody reading this has any
idea why this happens, please do tell. My first guess is that I messed up
when porting the code to the small godbolt example. :) My second guess would
be that in bloom the code I extracted for godbolt is used in a way that hints
to the compiler to do the initialization at compile time (e.g. the kp values
are known at compile time), but this is just a guess. So to be clear, there
is no problem I can detect when using the bloom library; I am just worried
that with the huge number of combinations of compilers, standards,
optimization levels (e.g. O2 vs O3) and architectures, there may be a subset
of users that will hit a performance issue if bloom does not force
compile-time init of those values.
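The kind of ugly trick I have in mind would be roughly this (generic
illustration only; the actual contents of the ones array are whatever
make_m256i needs, the values below are placeholders):

#include <immintrin.h>
#include <cstdint>

// 32-byte aligned, compile-time initialized storage that lives in .rodata
alignas(32) constexpr std::uint32_t ones_storage[8] = {1, 1, 1, 1, 1, 1, 1, 1};

inline __m256i load_ones()
{
  // aligned load from memory that needs no runtime initialization
  return _mm256_load_si256(reinterpret_cast<const __m256i*>(ones_storage));
}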
To be more specific, for this source:
// includes added for completeness; exact header paths from memory
#include <boost/bloom/filter.hpp>
#include <boost/bloom/fast_multiblock64.hpp>
#include <cstdlib>
#include <string_view>

int main()
{
  using simd_filter = boost::bloom::filter<
    std::string_view, 1, boost::bloom::fast_multiblock64<2>, 1>;
  simd_filter sf;
  char str[2] = {char('a' + (rand() % 26)), '\0'};
  sf.insert(str);
  str[0] = char('a' + (rand() % 26));
  return sf.may_contain(str);
}
on my machine clang generates beautiful asm, in the sense that there is no
initialization of the ones at runtime. When I copy the godbolt example to my
machine, the runtime initialization of the ones is there, so it is probably
not just a godbolt/my-machine difference.