Subject: [boost] [gsoc16] Re: Potential contribution scenarios?
From: Miguel Coimbra (miguel.e.coimbra_at_[hidden])
Date: 2016-01-05 12:52:35
> It is not hard to find either myself or Antony. Both of us are here,
> and both of us are at the top of Google search results for our names.
Probably my bad, I only bumped into LinkedIn references.
I'll wait for a reply from Antony as well, thanks.
I will explore other ideas as well.
> 1. A thing Boost really could do with is a decent (optionally
> mmappable) dense hash map implementation, so the hash map could
> contain billions of items and mostly reside on disk yet lookups would
> still be very fast.
I have some questions regarding this first idea. I believe a concept and
implementation already exist and are actually in use at Google:
Quoting from that page:
"dense_hash_map is distinguished from other hash-map implementations
by its speed and by the ability to save and restore contents to disk. On
the other hand, this hash-map implementation can use significantly more space
than other hash-map implementations, and it also has requirements -- for
instance, for a distinguished "empty key" -- that may not be easy for all
applications to satisfy.
This class is appropriate for applications that need speedy access to
small "dictionaries" stored in memory, and for applications that need these
dictionaries to be persistent."
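To make the "empty key" requirement from that quote concrete, here is a minimal sketch of an open-addressing map whose unused slots are marked by a reserved key. It is only an illustration under my own assumptions: the name DenseMapSketch, the fixed power-of-two capacity, the 64-bit keys and values, and the mixing function are all hypothetical, and this is neither sparsehash's code nor a proposed Boost API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration: fixed capacity, linear probing, no erase, no resize.
class DenseMapSketch {
public:
    struct Slot {
        std::uint64_t key;
        std::uint64_t value;
    };

    // capacity must be a power of two; empty_key must never be inserted.
    DenseMapSketch(std::size_t capacity, std::uint64_t empty_key)
        : slots_(capacity, Slot{empty_key, 0}), empty_key_(empty_key), size_(0) {
        assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
    }

    // Inserts or overwrites; returns false if every slot is occupied.
    bool insert(std::uint64_t key, std::uint64_t value) {
        assert(key != empty_key_);
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = mix(key) & mask;
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) & mask) {
            Slot& s = slots_[i];
            if (s.key == empty_key_) {  // free slot: claim it
                s.key = key;
                s.value = value;
                ++size_;
                return true;
            }
            if (s.key == key) {  // existing key: overwrite
                s.value = value;
                return true;
            }
        }
        return false;
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const std::uint64_t* find(std::uint64_t key) const {
        const std::size_t mask = slots_.size() - 1;
        std::size_t i = mix(key) & mask;
        for (std::size_t n = 0; n < slots_.size(); ++n, i = (i + 1) & mask) {
            const Slot& s = slots_[i];
            if (s.key == empty_key_) return nullptr;  // probe chain ends here
            if (s.key == key) return &s.value;
        }
        return nullptr;
    }

    std::size_t size() const { return size_; }

private:
    // Simple bit mixer chosen for brevity, not for hash quality.
    static std::size_t mix(std::uint64_t k) {
        k ^= k >> 33;
        k *= 0xff51afd7ed558ccdULL;
        k ^= k >> 33;
        return static_cast<std::size_t>(k);
    }

    std::vector<Slot> slots_;
    std::uint64_t empty_key_;
    std::size_t size_;
};
```

Because every slot is a plain fixed-size record and emptiness is encoded in the key itself, the whole table is one flat array with no pointers inside it, which is what makes saving it to disk straightforward.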
So what you want is something like this, but able to "contain billions of
items and mostly reside on disk yet lookups would still be very fast"?
I assume the "billions of items" part is what is lacking from the implementation
already in use at Google? Is this the difference between that and your idea?
Assuming this is indeed what you were proposing, would the work encompass
defining an API and also exploring mmap for lazy iteration for performance
in the context of handling billions of items?
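
To illustrate what I mean by using mmap here, below is a rough sketch, assuming a POSIX system, of a read-only lookup over a table file that is mapped rather than loaded. Everything in it is my own assumption for the sake of the question: the names Slot, write_slots and MappedTableSketch, the on-disk layout (a power-of-two number of fixed-size key/value records), and the identity hash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical on-disk record; the file is just an array of these.
struct Slot {
    std::uint64_t key;
    std::uint64_t value;
};

// Helper that dumps a prepared slot array to a file.
inline bool write_slots(const char* path, const Slot* slots, std::size_t n) {
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const bool ok = std::fwrite(slots, sizeof(Slot), n, f) == n;
    return std::fclose(f) == 0 && ok;
}

// Read-only view over such a file; the kernel pages data in on demand.
class MappedTableSketch {
public:
    MappedTableSketch(const char* path, std::uint64_t empty_key)
        : slots_(nullptr), count_(0), bytes_(0), empty_key_(empty_key) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            const std::size_t bytes = static_cast<std::size_t>(st.st_size);
            const std::size_t n = bytes / sizeof(Slot);
            // Accept only a whole, power-of-two number of slots.
            if (n != 0 && (n & (n - 1)) == 0 && n * sizeof(Slot) == bytes) {
                void* p = ::mmap(nullptr, bytes, PROT_READ, MAP_SHARED, fd, 0);
                if (p != MAP_FAILED) {
                    slots_ = static_cast<const Slot*>(p);
                    count_ = n;
                    bytes_ = bytes;
                }
            }
        }
        ::close(fd);  // the mapping remains valid after the descriptor is closed
    }

    ~MappedTableSketch() {
        if (slots_) ::munmap(const_cast<Slot*>(slots_), bytes_);
    }

    MappedTableSketch(const MappedTableSketch&) = delete;
    MappedTableSketch& operator=(const MappedTableSketch&) = delete;

    bool ok() const { return slots_ != nullptr; }

    // Linear probing; only the pages holding probed slots are touched.
    const std::uint64_t* find(std::uint64_t key) const {
        if (!slots_) return nullptr;
        const std::size_t mask = count_ - 1;
        std::size_t i = static_cast<std::size_t>(key) & mask;  // identity hash, for brevity
        for (std::size_t n = 0; n < count_; ++n, i = (i + 1) & mask) {
            if (slots_[i].key == empty_key_) return nullptr;
            if (slots_[i].key == key) return &slots_[i].value;
        }
        return nullptr;
    }

private:
    const Slot* slots_;
    std::size_t count_;
    std::size_t bytes_;
    std::uint64_t empty_key_;
};
```

A lookup then costs a few page faults at most when the data is cold, and the file can be far larger than physical memory. Whether this is the kind of design you had in mind is exactly my question.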
Miguel E. Coimbra