Subject: Re: [boost] Boost.Bloom_filter now in Sandbox (WAS Re: Interest in a Bloom Filter?)
From: Arash Partow (arash_at_[hidden])
Date: 2009-06-17 19:00:56


1. Can you please explain why you need multiple "hashes"?
2. Your code doesn't trivially exit on the contains operation (see the sketch after this list).
3. Your solution doesn't provide a means to determine the various optimal parameter combinations.
4. Your solution doesn't provide a counting mode.
5. Your solution doesn't provide for erasures (not to be confused with removal of inserted items).
6. Your solution doesn't provide a self-superimposed compression mode (similar to a union, but with a doubling of the FPP).
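
Regarding points 1 and 2, here is a minimal sketch of what I mean, using
boost::hash_combine with per-hash seeds (the seeds and names below are
illustrative, not your sandbox API): contains() should return false as soon
as any probed bit is unset, and the K indexes should be derived independently.

#include <bitset>
#include <cstddef>
#include <boost/functional/hash.hpp>

template <typename T, std::size_t M, std::size_t K>
class toy_bloom_filter {
   std::bitset<M> bits_;

   // i-th index: hash the value together with a per-hash seed
   std::size_t index(T const& value, std::size_t i) const {
      std::size_t seed = i * 0x9e3779b9u; // arbitrary seed per hash
      boost::hash_combine(seed, value);
      return seed % M;
   }

public:
   void insert(T const& value) {
      for (std::size_t i = 0; i < K; ++i)
         bits_.set(index(value, i));
   }

   bool contains(T const& value) const {
      for (std::size_t i = 0; i < K; ++i)
         if (!bits_.test(index(value, i)))
            return false; // trivial exit on the first unset bit
      return true;        // maybe present (possible false positive)
   }
};

For instance, toy_bloom_filter<std::string, 8192, 3> would give an
8192-bit filter probed by three seeded hashes.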

Arash Partow
__________________________________________________
Be one who knows what they don't know,
Instead of being one who knows not what they don't know,
Thinking they know everything about all things.
http://www.partow.net

Dean Michael Berris wrote:
> On Tue, Jun 9, 2009 at 11:47 PM, Dean Michael
> Berris<mikhailberis_at_[hidden]> wrote:
>> Hi Manuel,
>>
>> On Tue, Jun 9, 2009 at 8:45 PM, Manuel
>> Fiorelli<manuel.fiorelli_at_[hidden]> wrote:
>>> About the implementation:
>>> If I understand your implementation, you do something like this:
>>> given a value V,
>>> you compute a number X (say the digest of V),
>>> then you compute K different values by applying the transformation
>>> (X + offset) mod M.
>>>
>> Right. This naive filter will almost always create K consecutive
>> indexes (or results) from the hash functions.
>>
>
> Which I've addressed in revision 53785 -- now the default_hash<> uses
> Boost.Hash internally with statically defined seeds.
>
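
For illustration, a small self-contained sketch (with arbitrary seeds, not
the statically defined seeds of revision 53785) contrasting the naive
(X + offset) mod M scheme with per-hash seeded hashing:

#include <cstddef>
#include <iostream>
#include <string>
#include <boost/functional/hash.hpp>

int main() {
   const std::size_t M = 256;  // bits in the filter
   const std::size_t K = 3;    // number of hash functions
   const std::string value = "http://www.boost.org";

   // Naive scheme: one digest X, then (X + offset) mod M --
   // the K indexes land on consecutive positions.
   std::size_t x = boost::hash<std::string>()(value);
   for (std::size_t offset = 0; offset < K; ++offset)
      std::cout << "naive:  " << (x + offset) % M << '\n';

   // Seeded scheme: combine a distinct seed with the value for each
   // hash, so the K indexes are effectively independent of each other.
   for (std::size_t i = 0; i < K; ++i) {
      std::size_t seed = i * 0x9e3779b9u;
      boost::hash_combine(seed, value);
      std::cout << "seeded: " << seed % M << '\n';
   }
}
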
>>> About the documentation:
>>> I think that the following paragraph in the documentation must be reformulated
>>>
>>> <snip>
>>> To find out whether we've encountered a certain URL before, we simply
>>> pass the URL to the same three hash functions. If we see that all the
>>> positions returned by the three hash functions are all 1, then we
>>> might have seen the URL before -- this is called a false positive.
>>> </snip>
>>>
>>> Indeed, you cannot say that the URL is a false positive, since it
>>> could be equal to a URL stored previously.
>>>
>> Right, I need to make that a little more clear. Thanks for pointing it out.
>>
>
> This is something I'll be working on later.
>
> HTH!
>

