From: Marcelo Zimbres Silva (mzimbres_at_[hidden])
Date: 2022-04-08 19:52:19


Hi Vinícius,

> On Fri, 8 Apr 2022 at 20:20, Vinícius dos Santos Oliveira <vini.ipsmaker_at_[hidden]> wrote:
>>
>> I'm not a redis user myself so I won't be able to comment
>> much on this topic, but here's some early and sloppy
>> feedback...

Thanks.

>> The dynamic buffer example from the documentation's
>> front-page seems weird.
>>
>> Dynamic buffer fills the gap to turn raw IO into buffered
>> IO. Raw IO is all about passing the read calls directly
>> to the layer immediately below. When you're parsing
>> streams there's no implicit framing between the messages,
>> so you buffer. And you do need to buffer because only by
>> chance you'd be able to receive exactly the contents of a
>> single message. If you receive less bytes than required,
>> keep the buffer and read more. If you receive more bytes
>> than required, consume the current message and keep the
>> rest for the next read.
>>
>> It follows that the dynamic buffer will abstract a few
>> properties:
>>
>> Capacity (non-used part of the buffer)
>> Ready data
>>
>> Then Boost.Asio will also introduce the concept of
>> max_size to allow growing buffers with a level of
>> protection against DoS attacks.

Notice that Aedis works on the client side, which means users will be
connecting to a trusted server. A DoS attack would therefore have to
come from the server against the client, a scenario I am not aware
of, i.e. not a security hole.

That said, I don't have any special treatment for max_size at the
moment, so reading more than the permitted amount will throw. I will
check the code to see where this could be an issue.
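For reference, and as a minimal sketch rather than Aedis code (the
1 MiB cap below is just illustrative), Asio already exposes this
knob: asio::dynamic_buffer takes an optional maximum size, and
growing the buffer past it makes prepare() throw std::length_error,
which matches the behaviour described above.

std::string read_buffer;
auto buffer = boost::asio::dynamic_buffer(read_buffer, 1024 * 1024); // cap growth at 1 MiB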

>> Similar frameworks will do just the same (e.g.
>> bufio.Scanner from Golang). But do notice a concept is
>> still missing in Boost.Asio's dynamic buffer: a
>> pointer/marker to the current message size. Boost.Asio's
>> buffered IO abstraction (dynamic buffer) is different
>> from other frameworks in this regard (e.g. Golang's
>> bufio.Scanner) and defer this responsibility to the user
>> (cf. read_until()'s return value). I personally don't
>> like Boost.Asio choice here, but that's just the way it
>> is.

Thanks, informative.

>> Now, from your example:
>>
>> std::string request, read_buffer, response;
>> // ...
>> co_await resp3::async_read(socket, dynamic_buffer(read_buffer)); // Hello (ignored).
>> co_await resp3::async_read(socket, dynamic_buffer(read_buffer), adapt(response)); // Set
>> co_await resp3::async_read(socket, dynamic_buffer(read_buffer)); // Quit (ignored)
>>
>> By recreating the dynamic buffer every time, you discard
>> the "native" capacity property from the dynamic buffer.
>>
>> Also I don't see a return value to indicate the current
>> message size

The async_read completion handler has the signature
void(boost::system::error_code, std::size_t), where the second
argument is the number of bytes that have been read, i.e. the size of
the message. Users can keep it if they judge it necessary, namely

auto n = co_await resp3::async_read(...);
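or, sketching it a bit more completely with the same names as in the
example above

auto n = co_await resp3::async_read(socket, dynamic_buffer(read_buffer), adapt(response));
// n is the size of the message just read from read_buffer.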

>> so I know what to discard from the current buffer. You
>> always discard the current message yourself (and a second
>> argument must be supplied if the user wants to save the
>> response).

Yes, Aedis will consume the buffer as it parses the message, in a
manner similar to the async_read_until - erase pattern [1]. Each new
chunk of data is handed to the user and erased afterwards; this is
efficient because it doesn't require much memory to read messages.
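For readers unfamiliar with it, the pattern in [1] boils down to
something like the sketch below (handle() is just a placeholder for
whatever processes the chunk):

std::string buffer;
for (;;) {
   auto n = boost::asio::read_until(socket, boost::asio::dynamic_buffer(buffer), "\r\n");
   handle(std::string_view(buffer.data(), n)); // hand the chunk to the user...
   buffer.erase(0, n); // ...then erase it, keeping any surplus bytes
}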

I believe I am not alone here: I've had a look at Beast and it looks
like it also consumes the message on its own, see
https://github.com/boostorg/beast/blob/17141a331ad4943e76933e4db51ef48a170aaf84/example/http/client/async/http_client_async.cpp#L136

The downside of the *async_read_until - erase* pattern, however, is
that it rotates the buffer constantly with x.consume(n), adding some
latency.

However, this could be mitigated with a dynamic_buffer implementation
that consumes the content less eagerly (in exchange for increased
memory use).
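For example (a sketch of the idea only, not something Aedis does
today; rotate_threshold is an illustrative name): instead of rotating
after every message, one could track the consumed prefix and erase it
only once it grows past some threshold

std::size_t offset = 0;
// After parsing a message of n bytes starting at offset:
offset += n;
if (offset > rotate_threshold) {
   buffer.erase(0, offset); // one rotation amortised over many messages
   offset = 0;
}

trading a larger resident buffer for fewer rotations.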

>> If the current message was kept in the buffer then the
>> response could just hold string_views to it. What are
>> your thoughts?

As I said above, response adapters have to act on each new chunk of
resp3 data; afterwards that memory will be overwritten with new data
when the buffer is rotated and won't be available anymore.

I consider this a good thing. For example, if you are receiving JSON
files from Redis, it is better to deserialize them as the content
becomes available than to keep them in intermediate storage to be
processed later. I can't come up with a use case where the latter
would be desirable.
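To illustrate what acting on each chunk looks like (a hypothetical
adapter, not Aedis's actual adapter interface), a chunk can be fed
straight into an incremental parser such as boost::json::stream_parser

boost::json::stream_parser parser;
auto on_chunk = [&](std::string_view chunk) {
   parser.write(chunk.data(), chunk.size()); // deserialize as data arrives
};

so by the time the message has been fully read the value is already
built, with no copy of the raw resp3 payload kept around.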

In any case, I made a lot of effort to avoid temporaries. I also don't
see a way around it: sooner or later the buffer will have to be
consumed.

Regards,
Marcelo

[1] https://www.boost.org/doc/libs/1_78_0/doc/html/boost_asio/reference/async_read_until/overload1.html

