Subject: Re: [Boost-bugs] [Boost C++ Libraries] #12919: multi_index indexed with hashed_non_unique produces buffer overrun.
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2017-03-21 18:29:00
#12919: multi_index indexed with hashed_non_unique produces buffer overrun.
----------------------------------------+-------------------------
Reporter: Alex Huszagh <ahuszagh@â¦> | Owner: joaquin
Type: Bugs | Status: new
Milestone: To Be Determined | Component: multi_index
Version: Boost 1.63.0 | Severity: Problem
Resolution: | Keywords: multi_index
----------------------------------------+-------------------------
Comment (by Alex Huszagh <ahuszagh@â¦>):
Hi Joaquin,
I can email you private details about the data, most of the strings are
concatenations of UniProt identifiers, a residue, and a protein position
(of the form "P46406:C15", or "P46406:C15-P46406:C230", and as the data
has not yet been published, I would rather not provide it on a public
mailing list.
The hash function used is boost::hash (the default hash is), and I have
noticed that changing the hash function (std::hash does not cause an
issue), or slightly tweaking the input data (such as removing the ":" in
the identifiers), will make a given test-case that normally overruns the
container size on a rehash to run normally.
Just for the full, public bug tracker, I am using the multi_index
characteristic of the container, the primary indexer is a hashed struct
using boost::hash_combine, which includes an atypical data structure (a
short, immutable, ordered set of pointers), and I have confirmed in all
working cases the hash function is unique (there is not a single
collision, and bucket loads are typically low).
One other thing is that my proposed patch also does not fix infinite
iteration when I was attempting to calculate the bucket size of the
hashed_non_unique index, so my patch seems to be woefully insufficient,
but I am unsure how to continue.
I can provide a working test case privately via email.
'''Rudimentary Code Snippet'''[[BR]]
Note: This snippet is missing includes, some template specializations, and
a few other things. It's meant as a general idea of the model, not as
working code.
{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!cpp
// this is identical to boost hash_combine_impl, just exported here
inline void hash_combine_impl(size_t& seed, size_t value)
{
seed ^= value + 0x9e3779b9 + (seed<<6) + (seed>>2);
}
// this uses the same magic numbers as Python's frozenset hash,
// for immutable set hashing
struct set_hash
{
template <typename T>
size_t operator(const T &t)
{
size_t h = 1927868237 * (t.size() + 1);
size_t i;
for (auto &v: t) {
i = std::hash<decltype(v)>()(v);
h ^= (i ^ (i << 16) ^ 89869747UL) * 3644798167UL;
}
h = h * 69069U + 907133923UL;
return h;
}
};
struct my_item
{
std::set<uintptr_t> pointers;
std::set<uint32_t> positions;
std::string value;
};
bool operator==(const my_item &lhs,
const my_item &rhs)
{
auto l = std::tie(lhs.pointers, lhs.positions, lhs.value);
auto r = std::tie(rhs.pointers, rhs.positions, rhs.value);
return l == r;
}
struct my_item_hash
{
size_t operator(const my_item &i)
{
size_t seed = 0;
hash_combine_impl(seed, set_hash()(i.pointers));
hash_combine_impl(seed, set_hash()(i.positions));
boost::hash_combine(seed, i.value);
return seed;
}
};
struct my_struct
{
std::set<my_item, my_item_hash> set;
std::string string;
uint32_t count;
};
bool operator==(const my_struct &lhs,
const my_struct &rhs)
{
auto l = std::tie(lhs.set, lhs.string, lhs.count);
auto r = std::tie(rhs.set, rhs.string, rhs.count);
return l == r;
}
struct my_struct_hash
{
size_t operator(const my_struct &s)
{
size_t seed = 0;
// these following lines are identical to boost::hash_combine_impl
hash_combine_impl(seed, set_hash()(s.set));
boost::hash_combine(seed, s.string);
boost::hash_combine(seed, s.count);
return seed;
}
};
typedef boost::multi_index_container<
my_struct,
boost::multi_index::indexed_by<
boost::multi_index::hashed_unique<
boost::multi_index::identity<my_struct>,
my_struct_hash
>,
boost::multi_index::hashed_non_unique<
boost::multi_index::tag<tags::my_tag>,
boost::multi_index::const_mem_fun<my_struct, std::string,
&my_struct::linkage>
>
>
> my_struct_memo;
}}}
}}}
Best,
Alex
-- Ticket URL: <https://svn.boost.org/trac/boost/ticket/12919#comment:2> Boost C++ Libraries <http://www.boost.org/> Boost provides free peer-reviewed portable C++ source libraries.
This archive was generated by hypermail 2.1.7 : 2017-03-21 18:32:01 UTC