Re: [Boost-bugs] [Boost C++ Libraries] #12919: multi_index indexed with hashed_non_unique produces buffer overrun.

Subject: Re: [Boost-bugs] [Boost C++ Libraries] #12919: multi_index indexed with hashed_non_unique produces buffer overrun.
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2017-03-21 18:29:00


#12919: multi_index indexed with hashed_non_unique produces buffer overrun.
----------------------------------------+-------------------------
  Reporter: Alex Huszagh <ahuszagh@…> | Owner: joaquin
      Type: Bugs | Status: new
 Milestone: To Be Determined | Component: multi_index
   Version: Boost 1.63.0 | Severity: Problem
Resolution: | Keywords: multi_index
----------------------------------------+-------------------------

Comment (by Alex Huszagh <ahuszagh@…>):

 Hi Joaquin,

 I can email you private details about the data, most of the strings are
 concatenations of UniProt identifiers, a residue, and a protein position
 (of the form "P46406:C15", or "P46406:C15-P46406:C230", and as the data
 has not yet been published, I would rather not provide it on a public
 mailing list.

 The hash function used is boost::hash (the default hash is), and I have
 noticed that changing the hash function (std::hash does not cause an
 issue), or slightly tweaking the input data (such as removing the ":" in
 the identifiers), will make a given test-case that normally overruns the
 container size on a rehash to run normally.

 Just for the full, public bug tracker, I am using the multi_index
 characteristic of the container, the primary indexer is a hashed struct
 using boost::hash_combine, which includes an atypical data structure (a
 short, immutable, ordered set of pointers), and I have confirmed in all
 working cases the hash function is unique (there is not a single
 collision, and bucket loads are typically low).

 One other thing is that my proposed patch also does not fix infinite
 iteration when I was attempting to calculate the bucket size of the
 hashed_non_unique index, so my patch seems to be woefully insufficient,
 but I am unsure how to continue.

 I can provide a working test case privately via email.


 '''Rudimentary Code Snippet'''[[BR]]

 Note: This snippet is missing includes, some template specializations, and
 a few other things. It's meant as a general idea of the model, not as
 working code.
 {{{
 #!div style="font-size: 80%"
 Code highlighting:
   {{{#!cpp

 // this is identical to boost hash_combine_impl, just exported here
 inline void hash_combine_impl(size_t& seed, size_t value)
 {
     seed ^= value + 0x9e3779b9 + (seed<<6) + (seed>>2);
 }


 // this uses the same magic numbers as Python's frozenset hash,
 // for immutable set hashing
 struct set_hash
 {
     template <typename T>
     size_t operator(const T &t)
     {
         size_t h = 1927868237 * (t.size() + 1);
         size_t i;

         for (auto &v: t) {
             i = std::hash<decltype(v)>()(v);
             h ^= (i ^ (i << 16) ^ 89869747UL) * 3644798167UL;
         }
         h = h * 69069U + 907133923UL;
         return h;
     }
 };


 struct my_item
 {
     std::set<uintptr_t> pointers;
     std::set<uint32_t> positions;
     std::string value;
 };


 bool operator==(const my_item &lhs,
     const my_item &rhs)
 {
     auto l = std::tie(lhs.pointers, lhs.positions, lhs.value);
     auto r = std::tie(rhs.pointers, rhs.positions, rhs.value);
     return l == r;
 }


 struct my_item_hash
 {
     size_t operator(const my_item &i)
     {
         size_t seed = 0;
         hash_combine_impl(seed, set_hash()(i.pointers));
         hash_combine_impl(seed, set_hash()(i.positions));
         boost::hash_combine(seed, i.value);

         return seed;
     }
 };


 struct my_struct
 {
     std::set<my_item, my_item_hash> set;
     std::string string;
     uint32_t count;
 };


 bool operator==(const my_struct &lhs,
     const my_struct &rhs)
 {
     auto l = std::tie(lhs.set, lhs.string, lhs.count);
     auto r = std::tie(rhs.set, rhs.string, rhs.count);
     return l == r;
 }


 struct my_struct_hash
 {
     size_t operator(const my_struct &s)
     {
         size_t seed = 0;
         // these following lines are identical to boost::hash_combine_impl
         hash_combine_impl(seed, set_hash()(s.set));
         boost::hash_combine(seed, s.string);
         boost::hash_combine(seed, s.count);

         return seed;
     }
 };


 typedef boost::multi_index_container<
     my_struct,
     boost::multi_index::indexed_by<
         boost::multi_index::hashed_unique<
             boost::multi_index::identity<my_struct>,
             my_struct_hash
>,
         boost::multi_index::hashed_non_unique<
             boost::multi_index::tag<tags::my_tag>,
             boost::multi_index::const_mem_fun<my_struct, std::string,
 &my_struct::linkage>
>
>
> my_struct_memo;
     }}}
 }}}


 Best,
 Alex

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/12919#comment:2>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.

This archive was generated by hypermail 2.1.7 : 2017-03-21 18:32:01 UTC