Boost logo

Boost :

From: Ruben Perez (rubenperez038_at_[hidden])
Date: 2024-10-21 08:43:34


Hi all,

This is my review of async_mqtt5.

First of all, thanks Ivica and the Mireo team for submitting this
library for Boost review, and Klemens for managing the review.

- Does this library bring real benefit to C++ developers for real world
use-case?

Yes. MQTT clients are widely used in the IoT world. Boost.Asio is one
of the strongest networking libraries in C++. Bringing these together
in a high-quality library is valuable. The library already has already
more than 140 stars on GitHub, which is a good sign.

- Do you have an application for this library?

Not right now. But I used to work in the IoT world, and I would have
used it if I would have had it available back then.

- Does the API match with current best practices?

It does almost everywhere. Analysis follows.

1. It's only a high-level API, akin to Boost.Redis. This may not cover
some very specialized use cases, but makes it much easier handling
common tasks robustly. I don't think this is concerning, since the API
is flexible enough. Moreover, the Mireo team has more field experience
regarding the protocol than I do. A lower level client can be added in
the future without breaking compatibility, if requested by users.

2. It correctly follows Boost.Asio universal async model (except for a
gotcha with executors, see my next point), accounting for allocator
customization, custom completion tokens, and per-operation
cancellation. Looking at the code, it's clear that the developers bear
a deep understanding of Asio, which is very valuable.

3. At the moment, the client requires the following: "The same
executor must execute mqtt_client::async_run and all the subsequent
async_xxx operations." (quoting
https://spacetime.mireo.com/async-mqtt5/async_mqtt5/asio_compliance/executors.html).
This is a deviation from the standard Asio model, not present in any
of the libraries currently in Boost. The former statement makes the
following malformed, without any diagnostic being emitted:

cli.async_run(asio::bind_executor(ex1, ...)); // 1
cli.async_publish(asio::bind_executor(ex2, ...)); // 2
cli.async_publish(asio::bind_executor(ex3, ...)); // 3

Where operations 2 and 3 may complete on any of ex1, ex2 or ex3.

I think this limitation should be lifted, and that the above code
should be legal. I'd suggest making stream writes child agents of
async_run, instead of async_publish and similar functions (see
https://lists.boost.org/Archives/boost/2024/10/258132.php for the
discussion regarding this issue).

4. The client is templated on the stream type. This is a well-tested
design pattern, but can lead to higher compile times. Writing dynamic
clients (where the transport type or TLS usage is determined by a
runtime variable) is also more involved. I think the current
architecture is acceptable, but I'd consider adding a type-erased
client in the future. While this shouldn't be too much effort for
well-versed Asio developers, regular users may find it unsurmountable.
Alternatively, I'd also be good with completely replacing the
templated client with a type-erased one (up to the authors).

5. The library internally buffers outgoing and ingoing messages
automatically, which is great because it's easier to maintain
resilience. However, the only limit on the publishing side are the
limited 2^16 packet IDs (restricted by the protocol), with no limit on
the reading side. I'm slightly concerned about applications buffering
too much on either side. I don't know if this is a real concern or
just produced by my lack of MQTT field experience. Would it make sense
to somehow introduce a per-topic limit on the number of unacknowledged
messages on both the subscribing side (possibly specified when the
client subscribes) and on the publishing side? If that's not possible,
I'd like to see an example on how to implement this flow control limit
on the publishing side, at least.

6. Some APIs use owning types (std::vector and std::string) where
non-owning types (boost::core::span and std::string_view) could
suffice. I'd suggest reviewing these.

7. TLS streams require a number of template specializations to set up.
At the moment, this must be done by the user even for the usual
asio::ssl::stream. This risks ODR violations if two libraries end up
defining such specializations independently. I'd advise placing the
required boilerplate code, together with helper typedefs, in an
optional, separate header.

8. The API is really small, and I think that's great.

- Is the documentation helpful and clear?

It is in general well written and clear. It is however geared towards
a user that knows Asio and MQTT pretty well. I'd suggest a couple of
changes:

1. There are a lot of pages justifying design decisions made by the
library. This has been very useful to me, but is geared towards
advanced users. I'd attempt to gather all pages explaining how to use
the library in the first section, and move pages with design
rationales to a second section.
2. There is no page in the discussion addressing how to use
async_publish and async_receive (only an example). This new page
should likely go into the first section.
3. There is no page in the discussion addressing TLS.
4. I miss an example about custom authentication.
5. I don't think the multi-threaded example captures the problems that
a multi-threaded program integrating the library may face. Running
everything under a single strand negates the benefit of
multi-threading. I'd target the publisher example, instead. It's
important that you ensure that reading the sensor happens outside the
strand.

I have more minor comments, included at the bottom of this email.

- Did you try to use it? What problems or surprises did you encounter?

I built a sample publisher and subscriber, based on the examples.
They're available here: https://github.com/anarthal/async-mqtt5-test

The publisher reads two sensors at different frequencies, which is
something that scales poorly when using threads. The async
architecture makes this task trivial, which is great. Note that the
publisher doesn't co_await async_publish, because that'd prevent
further sensor reads, reducing the benefits of resilience. The
publisher example in the docs may consider mentioning this.

Debugging config issues is extremely difficult (see
https://spacetime.mireo.com/async-mqtt5/async_mqtt5/auto_reconnect.html
for a rationale). I initially tried to use a public test broker that
rejected my connection because of high utilization. It took me a
decent amount of time to figure it out - I tried debugging, enabling
Asio handler tracking, until I finally launched Wireshark and figured
it out. This is a problem during development but also in production,
as logging when things go wrong is a usual practice.

I think this should be somehow addressed, at the very least by
suggesting to use Wireshark/tshark (or other tools that may be
practical) in the docs. When using TLS, using
SSL_CTX_set_keylog_callback
(https://docs.openssl.org/1.1.1/man3/SSL_CTX_set_keylog_callback/#synopsis)
to create a keylog file for Wireshark may also be useful, and
mentioning how to do it may be useful.

I'd at least consider the possibility of including some sort of
logging or diagnostics generation, either by using a custom macro or
abusing Asio handler tracking.

I found that the implementation does not correctly shutdown TLS and
websockets (no calls to asio::ssl::stream::async_shutdown and
beast::websocket::stream::async_close). These should be added
(https://github.com/mireo/async-mqtt5/issues/18).

- What is your evaluation of the implementation?

The implementation is well written and very easy to follow, which is great.

Build times are pretty high, which is not great. The sample publisher
takes 22s to build under clang-18, Linux, Debug, C++23 on my machine
(30s in Release mode). Building with -ftime-trace, two points show up
in the flame graph:

1. The client header includes boost/beast/websocket/stream.hpp, even
when not using websockets. The header is very heavyweight and slows
down compilation. I think the class can be forward declared if the
definition of rebind_executor for websocket streams is changed to the
following:

template <typename Stream, bool deflate_supported, typename Executor>
struct rebind_executor<boost::beast::websocket::stream<asio::ssl::stream<Stream>,
deflate_supported>, Executor> {
    using other = typename boost::beast::websocket::stream<
        asio::ssl::stream<typename rebind_executor<Stream, Executor>::other>,
        deflate_supported
>;
};

Some other Beast headers are included in detail/connect_op.hpp.
Connection attempts to distinguish whether the stream is a websocket
using metaprogramming, and then runs Beast-specific code when a
websocket is detected. I think this creates too much coupling between
the libraries. My suggestion is to re-architect this a bit. I'd
rewrite websocket detection as a separate trait, and then provide a
definition for it in a separate header, thus making Beast a peer
dependency.

Beast is also use for get_lowest_layer. I don't know if this is
strictly required, since asio::ip::tcp::socket does implement the
layered protocol (it defines a lowest_layer_type and a
get_lowest_layer function, see asio::basic_socket).

2. asio::append and asio::prepend are heavily used, which are heavy to
instantiate. I'd suggest experimenting with replacing these by a
custom handler class that specializes asio::associator to propagate
associated characteristics, as suggested in this ML post:
https://lists.boost.org/Archives/boost/2024/10/258132.php. This may or
may not reduce compile times, but it's worth a try.

The library has many Boost dependencies. While this might not itself
impact compile times, it affects package managers. Using Boostdep, the
dependency with most transitive dependencies (by far) is Spirit,
pulling in variant, thread, preprocessor, phoenix, and a number of
other libs.

The obvious solution would be encouraging the author to remove the
dependency. But I think we have something off in how we manage
dependencies in Boost, since the parts of Spirit they're pulling don't
require most of these libraries.

Regarding testing, I like seeing both unit and integration tests. Code
coverage is great. I haven't looked into them in great detail, but
from some snippets I saw and Ivica's comments, they seem correctly
architectured. I miss sanitizers, fuzz testing and some compilers. See
my comments below.

- Do you have experience with asio?

Yes. I developed and maintain Boost.MySQL.

- Did you work with MQTT in the past?

I know the basic concepts and have read part of the spec for this
review. I haven't used MQTT in production. I have worked in embedded
systems before, though.

- How much time did you spend on this review?

I spent around 25h reading the documentation, reference, researching
MQTT, building code snippets and discussing design decisions on the
mailing list.

- Final decision and comments

My recommendation is to CONDITIONALLY ACCEPT async_mqtt5 into Boost.
It's a great library and I'd love to see it in Boost.

Here are a lot of feedback points, in decreasing order of importance.

1. Acceptance conditions: I think these should be addressed before
Boost integration.

- Executor correctness. The restriction of forbidding different
executors for different operations should be lifted.
- Making debugging configuration easier. This can have the form of a
documentation page explaining the available tools and how to use them,
or by enhancing the current API with logging facilities.
- TLS/websocket shutdown code should be added, as per
https://github.com/mireo/async-mqtt5/issues/18.

2. Recommendations. These are not acceptance criteria, but I strongly
recommend considering them, at least:

- Remove the hard dependency on Boost.Beast websocket in
detail/rebind_executor.hpp by forward-declaring websocket::stream.
- Change the websocket connect handling from if constexpr to traits,
to remove the Beast hard dependency.
- Consider creating an optional header containing the required
boilerplate to use TLS-based connections without the need of template
specializations. Helper typedefs could be beneficial.
- Experiment with build times and -ftime-trace by reducing the usage
of asio::prepend and Boost.Spirit in the implementation. If this
happens to improve build times, consider applying the relevant
changes.
- Introduce fuzz testing for the encoders and decoders (I can aid with
this if required).
- Introduce address and undefined sanitizer builds (I can also aid with this).
- Consider supporting total cancellation in async_publish and other
functions. Probably requires fixing the executor issue.
- Consider introducing a type-erased client that does not depend on
the stream type. I'd consider acceptable entirely removing the
templated one in favor of the type-erased one, if the authors feel
comfortable with that.
- Consider changing the mqtt_client::brokers signature to take a
span<host_and_port>, where host_and_port contains a string and a port
number. I think this better represents what the function does.
- Consider changing usages of std::vector and std::string in public
APIs to boost::core::span and std::string_view where it makes sense.
- Consider introducing APIs to limit the message
production/consumption rates per topic. If not feasible, introduce an
example on how to rate-limit a publisher user-side.
- Consider trimming some Boost dependencies to make the library
lighter for package managers.

3. Minor comments. These are not acceptance criteria, either.

- Consider adding CIs for clang 16 to 19.
- Apply concept checking to completion tokens, using the
BOOST_ASIO_COMPLETION_TOKEN_FOR macro for C++17 compatibility.
- Consider using an inline variable to implement
get_error_code_category, instead of a static one.
- Place non-documented, non-public functions in the detail namespace.
Candidate functions:
    - client_error_to_string, in error.hpp
    - client_ec_category, in error.hpp
    - Auxiliary functions and types implementing properties, in
property_types.hpp
    - to_reason_code, in reason_codes.hpp
- Consider making the library compatible with the
BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT macro. Although legacy, other
libraries (namely a postgresql one) require it.
- Consider making the will class a plain struct, instead of a class.
This would make it compatible with C++20 designated initializers.
- Consider renaming the library and namespace to mqtt5 or mqtt,
following what we have for MySQL and Redis. Note that typing
boost::async_mqtt5 may be long.
- Consider making constexpr global variables (like the ones in
reason_codes.hpp) inline constexpr.
- Consider enhancing support for Asio handler tracking (as per
https://live.boost.org/doc/libs/1_86_0/doc/html/boost_asio/overview/core/handler_tracking.html)
by using the BOOST_ASIO_HANDLER_LOCATION macro. This may mitigate the
frustration of users experiencing connectivity problems.
- Consider running the thread-safety example under the thread
sanitizer. I've found a lot of surprises (and learnt a lot) by doing
this.
- Consider renaming client_service to something that doesn't include
the term "service". When working with Asio, a service usually refers
to something inheriting from asio::execution_context::service,
allocated once per execution context. This is not the case for
client_service, which initially created confusing expectations to me.
- Consider replacing the SFINAE clause in authenticator() by a
static_assert. Calling authenticator() with something not fulfilling
the concept is always an error - the clearer the diagnostic, the
better.
- Consider whether asio::cancel_after/asio::cancel_at can be used to
simplify some of the internal waiting logic.
- Consider moving the code snippets shown in the discussion to a C++
file and use Quickbook import statements to include them in the docs.
This reduces the possibility of having out-of-sync snippets.
- Consider specializing asio::associated_executor,
asio::associated_immediate_executor and asio::associated_allocator to
implement property propagation in detail::cancellable_handler.

4. Comments on docs. These are not acceptance criteria.

- The reference lists is_authenticator
(https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/is_authenticator.html),
but it looks like it's in the detail namespace. Either promote it to
the public namespace, or hide it from the docs (you can still create a
page for the concept even if you don't expose the actual type).
- The reference lists the phrase "On immediate completion, invocation
of the handler will be performed in a manner equivalent to using
boost::asio::post.". This is not true, since you support immediate
executors.
- Some property types are listed as scalars, when they are really
vectors of scalars. This might be confusing. For instance,
https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/publish_props.html
lists subscription_identifier as std::string, when it's actually
std::vector<std::string>. Same applies for user_property.
- As of Boost 1.86, timeouts should be implemented using
asio::cancel_after, rather than by awaitable operators or parallel
groups. I suggest replacing the examples on timeouts by a single one
using asio::cancel_after.
- Type requirements for ExecutionContext
(https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/ExecutionContext.html)
are not correct. You can probably just link to Asio's description:
https://www.boost.org/doc/libs/1_86_0/doc/html/boost_asio/reference/ExecutionContext.html.
- The multithreaded example recommends the following code to run
callbacks though a strand:

boost::asio::post(
    strand,
    [&client, &strand] {
        // Considering that the default associated executor of all
completion handlers is the strand,
        // it is not necessary to explicitly bind it to async_run or
other async_xxx's handlers.
        client.async_run(boost::asio::detached);
    }
);

This form of post is not equivalent to
asio::post(asio::bind_executor(ex, ...)), and can yield surprising
behavior if the token has a bound executor. Furthermore, post might be
less efficient than dispatch, and should only be used if a reschedule
is required (which is not the case). I'd advise to change the above
by:

boost::asio::dispatch(boost::asio::bind_executor(
    strand,
    [&client, &strand] {
        // Considering that the default associated executor of all
completion handlers is the strand,
        // it is not necessary to explicitly bind it to async_run or
other async_xxx's handlers.
        client.async_run(boost::asio::detached);
    }
));

- The multi-threaded example uses asio::io_context and a vector of
threads. This is error-prone. It can be simplified by using
asio::thread_pool, asio::thread_pool::attach and
asio::thread_pool::join.
- The coroutine examples can be simplified with the latest Asio
additions. At the moment, asio::deferred is recommended over
asio::use_awaitable for efficiency reasons. With partial tokens,
async_xxx(asio::as_tuple(asio::deferred)) can be replaced by
async_xxx(asio::as_tuple).
- The coroutines examples shouldn't use asio::detached with
asio::co_spawn, as this swallows any thrown exceptions. It's better to
use [](std::exception_ptr e) { if(e) std::rethrow_exception(e); }, as
this propagates exceptions to main.
- It's probably better to avoid the unified header in the examples. We
should encourage including individual headers, instead, as this is
better for compile times.
- In https://spacetime.mireo.com/async-mqtt5/async_mqtt5/connection_maintenance.html:
should be "the MQTT <protocol>" (missing word).
- https://spacetime.mireo.com/async-mqtt5/async_mqtt5/optimising_communication.html:
typo: "simultanious".
- https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/mqtt_client/operator_eq_.html:
moved-from state probably supports being assigned to, too.
- https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/mqtt_client/tls_context.html:
render error in the 1st template parameter.
- https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/mqtt_client/authenticator.html:
render error in the 1st template parameter.
- https://spacetime.mireo.com/async-mqtt5/async_mqtt5/ref/authority_path.html:
member descriptions are empty.

Thanks Ivica for proposing the library. I really hope to see it in
Boost in the near future.

Regards,
Ruben.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk