|
Boost : |
Subject: Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)
From: Dean Michael Berris (mikhailberis_at_[hidden])
Date: 2009-06-10 05:17:45
On Tue, Jun 9, 2009 at 11:47 PM, Dean Michael
Berris<mikhailberis_at_[hidden]> wrote:
> Hi Manuel,
>
> On Tue, Jun 9, 2009 at 8:45 PM, Manuel
> Fiorelli<manuel.fiorelli_at_[hidden]> wrote:
>> About the implementation:
>> If I understand your implementation, you do something like that:
>> given a value V,
>> you compute a number X (say the digest of V),
>> then you compute K different values applying the transformation (X +
>> offset) mod(M)
>>
>
> Right. This naive filter will almost always create K consecutive
> indexes (or results) from the hash functions.
>
Which I've addressed in revision 53785 -- now the default_hash<> uses
Boost.Hash internally with statically defined seeds.
>> About the documentation:
>> I think that the following paragraph in the documentation must be reformulated
>>
>> <snip>
>> To find out whether we've encountered a certain URL before, we simply
>> pass the URL to the same three hash functions. If we see that all the
>> positions returned by the three hash functions are all 1, then we
>> might have seen the URL before -- this is called a false positive.
>> </snip>
>>
>> indeed, you cannot say that the URL is a false positive, since it
>> could be equal to an URL stored previously.
>>
>
> Right, I need to make that a little more clear. Thanks for pointing it out.
>
This is something I'll be working on later.
HTH!
-- Dean Michael Berris | Software Engineer, Friendster, Inc. blog.cplusplus-soup.com | twitter.com/mikhailberis | linkedin.com/in/mikhailberis | profiles.friendster.com/mikhailberis | deanberris.com
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk