
From: Marcelo Zimbres Silva (mzimbres_at_[hidden])
Date: 2022-04-09 08:36:45


On Sat, 9 Apr 2022 at 03:22, Vinícius dos Santos Oliveira
<vini.ipsmaker_at_[hidden]> wrote:
>
> Can you clarify what you mean by "erased afterwards"?
> Afterwards when? Before or after the delivery to the user?
> When? I need to know when before I can comment much
> further.

Let me give you an example: a map data type with two elements looks
like this on the wire

   "%2\r\n$4\r\nkey1\r\n$6\r\nvalue1\r\n$4\r\nkey2\r\n$6\r\nvalue2\r\n"

The parser will start reading the message header with async_read_until
("\r\n") and see that it is a map of size 2. This information will be
passed to the user by means of a callback (the adapter in my examples)

   callback({type::map, 2, ...}, ...);

after that the read operation will consume the "%2\r\n" and the buffer
content will be reduced to

   "$4\r\nkey1\r\n$6\r\nvalue1\r\n$4\r\nkey2\r\n$6\r\nvalue2\r\n"

Reading the next element works likewise but now the element is a blob
type and not a map. The parser reads the header to know the size of
the blob (again with read_until) and "$4\r\n" is consumed, reducing
the buffer to

   "key1\r\n$6\r\nvalue1\r\n$4\r\nkey2\r\n$6\r\nvalue2\r\n"

then it reads the blob "key1" with a read of size 6 (two more bytes to
consume the trailing \r\n) and passes that info to the user

   callback({type::blob_string, 1, 1, "key1"}, ...);

"key1\r\n" is then consumed resulting in a buffer

   "$6\r\nvalue1\r\n$4\r\nkey2\r\n$6\r\nvalue2\r\n"

The same procedure will be applied to the remaining elements until the
map is completely processed.

In simple terms: as soon as data becomes available, it is passed to
the user and then consumed from the buffer.

   callback(...)
   x.consume(n)

> It's not clear to me at all how aedis manages the buffer.
> If the buffer were an internal implementation detail (as
> in wrapping the underlying socket) I wouldn't care, but as
> it is... it's part of the public interface and I must
> understand how to use it.

Sure, does the explanation above make things clearer?

> Golang's bufio.Scanner implementation will avoid excessive
> memcopies to the head of the buffer by using a "moving
> window" over the buffer. It only uses the tail of the
> buffer to new read operations. Only when the buffer fully
> fills it'll memcpy the current message to the head of the
> buffer as to have more space.

That is something I would like to see in Asio. It would definitely
improve performance.

> The pattern to parse the textual protocol is simple: get
> message, process message, discard message.
>
> Upon accumulating a whole message, you decode its fields.

What is the point of accumulating the whole message if I am done with
what has already been read? If I am reading a Redis Hash with millions
of elements into a std::unordered_map<std::string, std::string>, I
prefer to get rid of processed data as soon as possible, releasing
memory for subsequent reads.

We have the following scenarios:

  1. The read operation doesn't consume the buffer.

  2. The read operation consumes the buffer after the data has been
passed to the user.

I find number 2 far more attractive:

  a. Most users won't be interested in string views, as the data
lifetime will be too short. They will have to convert to owning
strings anyway.

  b. Most users will prefer a custom serialization, i.e. std::map<U,
V> instead of std::map<string_view, string_view>. Having the string
content available in the buffer will then be pointless.

  c. Scenario 2 can still emulate 1 by passing a custom
implementation of dynamic_buffer whose consume() doesn't overwrite
elements but merely advances an offset. It may, however, make more
sense to add a new buffer concept (like the one from Go you mention
below) rather than hack an existing one.

Number 1 has the following problems:

  d. It may be an issue for users that read large maps, lists, etc.
from Redis. There will be no opt-out: reading will never consume the
buffer, which may result in higher memory consumption.

  e. It is not clear to me whether the Asio buffer concepts support
this properly for RESP3 or whether I need my own buffer concept.

> Does redis's usage pattern feel similar to this? If it
> doesn't, then how does it differ? If it differs, I should
> reevaluate my thoughts for this discussion.

I hope these points were also addressed in the comments above. If not,
please ask.

> As for "[rather] than keeping it in intermediate storage",
> that's more complex. The deserialized object *is*
> intermediate storage. The question is: can I use pointers
> to the original stream to put less pressure on the
> allocator (even if we customize the allocator, the gains
> only accumulate)? For instance, suppose the deserialized
> object is map<string, string>:
>
> for (;;) {
> dynamic_buffer buf;
> map<string_view, string_view> result;
> auto message_size = read(socket, buf, result);
> process(result);
> buf.consume(message_size);
> }
>
> Now the container is cheaper.

Ditto. This is indeed something nice that I would like to support.
But, as I said above, I don't know how to achieve it with the current
Asio buffers, or whether it is really useful.

Regards,
Marcelo


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk