
Hi everybody, I was reading the documentation/code, and I have some questions/comments. 1. Why is unit of measurement bit? I understand that conceptually bloom filters(maybe others, did not check) are bit arrays, but for me this seems like very error prone. Would verbose alternative of using boost::bloom::bits(1000) or boost::bloom::bytes(125) be preferred? I did not suggest UDL, as I feel for such rarely used class it may be overkill. 2. boost::hash<short> (and other integral ones) afaik are identity, I presume this is not an issue because you do extra steps if hash_is_avalanching is false, but maybe users should prefer avalanching hash? If so maybe suggest fast avalanching hash available in boost... btw I tried to read the docs but I got(not a big issue, since I read them from original. version): 404 Not Found - Code: NoSuchKey - Message: The specified key does not exist. - Key: develop.bloom.cpp.al/unordered/doc/html/unordered/reference/hash_traits.html - RequestId: JHRNESDX8BYY6T2G - HostId: nz9kkYamNQ4ocu023JneA7PyMHTEyQK9jgMSSBuwReE0OQAslcN0xOZxgopXpvn/63AFQzreXgM= 3. libs/unordered/include/boost/unordered/detail/mulx.hpp contains mulx implementation that seems very similar to one used in bloom, and authors are same. Although this is tiny duplication, would it be best to extract it somewhere to avoid duplication? 4. Is there no benefit in having a power of 2 sized bloom filters? AFAIK it is trick used by hash maps (requires hash is good) to get faster indexing. I see you use clever 128bit multiplication that generates fast mul instruction to avoid division, but I still presume shift and masking would be faster. 5. Why is BOOST_ASSERT(fpr>=0.0&&fpr<=1.0); not BOOST_ASSERT(fpr>0.0&&fpr<=1.0); , i.e. is there benefit of allowing calls with impossible fpr argument? This may sound like nitpicking, but I think it is wrong to allow calls with arguments that give people wrong idea of what bloom filters are capable. I understand that this functionality is under capacity *esimation**,*but I still think it may be confusing. 6. Related to 6. : It is relatively easy to OOM with this if you are "reckless" with fpr value, I would add a suggestion to docs to mention this(if it is not already there, maybe I missed it). 7. It may be just me, but I believe Filter Definition chapter would benefit from visual example, in particular how is BucketSize related to subarray size. 8. Regarding choosing optimal value: In the example we use 10M elements. From what I see they are unused in our computations till the end so I would rewrite: "*Suppose we plan to insert 10M elements and want to keep the FPR at 10−4. The chart gives us five possibilities:*" to something like " Suppose we plan to insert 10M elements and want to keep the FPR at 10−4. Looking up FPR in the chart(this step is not dependent on number of elements) gives us five possibilities:" 9. My IDE complains about redundant inline on member functions. Is this intentionally done since (afaik msvc) some compilers use inline as optimization(inlining) hint or? 10. Should #if 1 in may_contain be there?