Boost logo

Boost :

From: Ivan Matek (libbooze_at_[hidden])
Date: 2024-12-06 22:42:31


On Fri, Dec 6, 2024 at 10:36 PM Andrey Semashev via Boost <
boost_at_[hidden]> wrote:

> On 12/2/24 22:48, Peter Dimov via Boost wrote:
> >
> > Comments and questions are welcome; you don't need to wait for the
> > review.
>
> A few comments:
>
> 1. In the HashAlgorithm interface, there is the result() method.
>
> https://pdimov.github.io/hash2/doc/html/hash2.html#hashing_bytes_result
>
> I think the name is rather confusing as usually nouns are used for pure
> accessors, which result() isn't. I think, something like finalize()
> would be better. At the very least, it should probably be a verb and
> indicate that the method is actually supposed to modify the internal
> state before returning the final hash value.
>
> I admit that in the docs you describe a use case of repeatedly calling
> result() to obtain a pseudo-random sequence of numbers, and for that
> usage the finalize() name doesn't look appropriate. But (a) result() is
> even less appropriate (because to an uninitiated user it looks like the
> code always returns the same value) and (b) I would say that use case is
> not the primary one anyway.
>
>
>
>
I was looking into if this could be wrapped into a nicer interface, that is
harder to misuse.
Main idea is to use std::move to signal we are done.
This has the benefit that most users know that after std::move object state
is not guaranteed to be usable.
Second benefit is "Rust at home". We can use clang-tidy as very primitive
use after move checker.

What I came up quickly is this:

template <typename H, typename F = boost::hash2::default_flavor>
class simple_hash_values
{
public:
    simple_hash_values()
    {
        // safety for case user never calls append
        boost::hash2::hash_append(h, F{}, filler);
    }

    // to prevent clang tidy complaining about std::move of trivially
copyable type
    simple_hash_values(simple_hash_values&) = delete;
    simple_hash_values(simple_hash_values&&) = default;

    template <typename V>
    void append(const V& value)
    {
        boost::hash2::hash_append(h, F{}, value);
    }
    // limit to simple results
    static_assert(std::is_same_v<size_t,
std::remove_cvref_t<decltype(H{}.result())>> ||
        std::is_same_v<uint32_t, std::remove_cvref_t<decltype(H{}.result())>>);
private:
    auto result() {
        return h.result();
    }
    // value does not matter
    static constexpr char filler = 'x';
    H h;

    [[nodiscard]] friend auto get_digest(simple_hash_values&& simple_hash) {
        return simple_hash.result();
    }
};

 use is like this:

{
    simple_hash_values<fnv1a_32> h;
    std::string x{"ab"};
    std::string y{"c"};
    h.append(x);
    h.append(y);
    // does not compile
    // get_digest(h);
    const auto digest = get_digest(std::move(h));
    // compiles, gives clang-tidy bugprone-use-after-move
    get_digest(std::move(h));
    std::print("{:x}\n", digest);
}

clang-tidy warning is nice, minor issue is that for me it is not running in
IDE, I presume this kind of checks is expensive so I had to run it in
terminal manually.
 warning: 'h' used after it was moved [bugprone-use-after-move]
   20 | get_digest(std::move(h));
      | ^
...example/simple_hash_values.cpp:18:29: note: move occurred here
   18 | const auto digest = get_digest(std::move(h));

But this has limitations: (from documentation)

In some cases, the check may not be able to detect that two branches are
mutually exclusive. For example (assuming that i is an int):

if (i == 1) { messages.emplace_back(std::move(str));}if (i == 2) {
std::cout << str;}

In this case, the check will erroneously produce a warning, even though it
is not possible for both the move and the use to be executed. More
formally, the analysis is flow-sensitive but not path-sensitive
<https://en.wikipedia.org/wiki/Data-flow_analysis#Sensitivities>.

Another option for safer wrapper is to handle case when user just hashes
everything in one place, so we would be returning digest immediately from
function invoked with (x, y). This function can not be called append since
we are actually producing final result immediately, but beside that idea
is similar. So maybe something like

simple_hash_all<fnv1a_32> h;

h.get_digest(x,y);

h.get_digest(x,y); // recreates hash, gives same result as above


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk