
From: Marcelo Zimbres Silva (mzimbres_at_[hidden])
Date: 2023-01-24 18:53:38


On Tue, 24 Jan 2023 at 12:11, Klemens Morgenstern via Boost
<boost_at_[hidden]> wrote:
>
> The connection being reconnectable seems a design flaw:

There are many problems with connections that can't be reconnected:

* They can't be easily shared among e.g. http/websocket sessions
without an additional layer. To address this, users would have to
either use one redis-connection per e.g. http-session, which might
exhaust ports on the Redis server, or implement a connection pool,
which brings a whole new world of complexity to the problem. Putting
this burden on every user does not seem a wise decision.

* The performance of RESP3 is built around command pipelining. A
connection that can be shared makes better use of this feature, as it
can put e.g. 10k commands in a single write instead of 1000 commands
in each of 10 writes (among other things).

* Non-reconnectable connections make it harder to write failover-safe
apps. For example, say you receive a chat message over a websocket
session and want to store it in Redis. What should you do if a
failover is happening at that moment? Implementing retry logic in
client code is inefficient and complex, as it would involve canceling
and retrying, another implementation burden.

> The idea behind having a reconnectable connection seems to be there
> to make the user's life easier (by keeping the user's request and
> retrying them on reconnect) but there are some issues: A network
> disconnect leaves the user's request in an undefined state (they
> might have applied on the server or not) we can't simply retry them
> after a reconnection it requires the user's intervention to handle a
> reconnect. For example, we can't simply retry an INCRBY command, it
> might increment the value twice.

Double execution is guaranteed not to happen by default. The
implementation will cancel all requests that have been written but
remained unresponded when the connection was lost. To allow resending
those unresponded requests you need to mark them with

   request::config::cancel_if_unresponded = false // default is true.

which is useful for read-only commands (e.g. GET) and for commands
like SET that are safe to repeat. This means the API is safe by
default: you won't get double execution if you don't want to risk it.

> Because of the possibility of reconnecting the handshake process has
> to be done by users (by sending HELLO command as the first request)
> and on a reconnect this should be repeated again, this affected the
> implementation in a way it has to check for each command to see if
> it is a HELLO command or not.

You are talking about the performance overhead of this line

      if (cmd == "HELLO")

which is unmeasurable. I don't see how this can be a concern of yours.

> If a user is using async_receive there is no way to detect a disconnect,
> users have to use some form of async condition variable (which Asio doesn't
> have at the moment) to signal the disconnect event.

This claim does not make sense to me. What do you mean by detect? You
"detect" that a disconnection occurred when async_run completes:

   co_await conn->async_run();
   conn->cancel(operation::receive); // If you want to cancel.

> On a reconnect all Redis subscriptions are dropped and again a user has a
> hard time signaling the async_receive part of the code to subscribe again.

This can be easily addressed by running async_run and async_receive in
parallel, so that the receiver is cancelled when async_run completes,
as I show in the examples:

    co_await ((conn->async_run() || receiver(conn)));

> Because of the possibility of reconnecting, each request gets a lot of
> boolean config parameters to handle the reconnect scenario
> (cancel_on_connection_lost, cancel_if_not_connected, cancel_if_unresponded,
> hello_with_priority) that made the API difficult to use and understand.

This complexity is inherent to the problem and we cannot pretend it
does not exist. If I ignore it, users will have to address it
themselves (wasted effort).

> There are a lot of rescheduling for reading a single push event
> (performance issue): At the moment the demultiplexing of Redis
> events seems to happen with a strategy
> https://github.com/mzimbres/aedis/blob/master/include/aedis/detail/guarded_operation.hpp
> The problem is that for reading each event message around 4
> rescheduling on executor happens (it might be less or more than 4 it
> is hard to be sure from the code) and that has degraded the
> performance of reading push events (rescheduling on Asio executors
> is not a cheap operation).

Is this backed by benchmarks? Parsing RESP3 itself requires lots of
rescheduling, so I am not sure this is going to impact performance
heavily.

In any case there are alternatives. Implementing more than one read in
a single call to async_receive is also simple.

> There is no possibility of reading push events in batches
> (performance issue): Redis pub/sub receives its messages as push
> events and because it is a way of broadcasting messages it is
> sometimes used at rates like 50,000 messages per second, with the
> current API there is no possibility for reading these messages in
> batches which made the reading operation very costly.

Do you have any numbers to share so I have an idea of how costly this is?

> Creating an Asio timer for each request (performance issue):
> Currently, Aedis creates an Asio timer for each request in the queue
> to act as an async condition variable, but it seems it can be
> eliminated if the exec_op is converted into a polymorphic type which
> can be resumed externally, after this change would a need for an
> Asio service too, to destroy paused operation on a service shutdown.

The dynamic allocation is performed with the completion handler's
associated allocator, so you have full control over it. If suspending
an operation without a timer is possible and produces better results,
please open an issue. It is, however, hard for me to imagine the timer
can hurt performance.

> The echo server benchmark doesn't carry any information and is misleading
> in my opinion, if there is some benchmark it should be a practical usage
> example.

The echo server is very good for Redis benchmarks because it measures
the ability of a client to perform automatic pipelining and how well
it handles concurrent use of a connection.

Thanks,
Marcelo


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk