From: Vinnie Falco (vinnie.falco_at_[hidden])
Date: 2023-02-07 12:19:22


On Mon, Feb 6, 2023 at 11:40 PM Andrzej Krzemienski <akrzemi1_at_[hidden]> wrote:
> Let me offer some further thoughts.

Yes these are very good :)

> 1. It is more of a question. These buffers::sink and buffers::source being mentioned in this thread: are they already present in one of the Boost libraries (ASIO, Beast)?

These are new. They were originally motivated by the need to
type-erase certain HTTP bodies during parsing or serialization in
the new Http.Proto library
(https://github.com/CPPAlliance/http_proto). This library implements
HTTP/1 and makes some different design choices, which aim to fix some
inherent defects in Beast's design.

> The only thing that multiple libraries (Boost and non-Boost) would benefit from is the buffer interface (boost::sink, boost::source). No buffer implementations, no buffer algorithms.

The situation in Beast is that it is actually several libraries in one:

1. "sans-IO" HTTP/1
2. HTTP/1 on Asio
3. "sans-IO" Websocket
4. Websocket on Asio
5. Buffer algorithms and containers
6. Asio utilities
7. C++ port of ZLib (!)

The "core" directory contains the files for items 5 and 6 above:

<https://github.com/boostorg/beast/tree/341ac7591b2b023c81de13312a80d1e824742a1c/include/boost/beast/core>

In theory there is nothing wrong with aggregating these things into
one library. But for practical reasons, quite frankly it sucks. It
takes forever for CI to turn around, the docs end up taking longer to
build because there is so much more stuff, and it just gives off a
bulky vibe that isn't fun to work with.

One of my goals for my new generation of libraries (which are intended
to replace Beast) is to design things in a way that they do more with
less API and implementation. Less "try-hard" so to speak. In
Http.Proto I tried very hard to avoid needing these various buffer
implementations and concepts, but in the end it proved unworkable. It
turns out that these buffer algorithms and containers are just so damn
useful that even in a "sans-IO" (https://sans-io.readthedocs.io/)
library they end up being the correct choice.

Okay, fine, so I brought some selected buffer-related things back into
Http.Proto, but only as implementation details and private interfaces.
No problem, the library stays lean. But... oh well, that didn't work
out so well either, because as it turns out, using buffer sequences to
define the HTTP body is very useful, and that brought me right back to
where I started: the HTTP protocol library API benefits from buffer
concepts. And the user benefits from having implementations of buffers
on hand.

Specifically, HTTP/1 message body serialization and parsing should support:

    Three body styles for `serializer`
        1 Specify a ConstBufferSequence
        2 Specify a Source
        3 Write into a serializer::stream

    Three body styles for `parser`
        4 Specify a DynamicBuffer
        5 Specify a Sink
        6 Read from a parser::stream

Boost.Buffers fulfills the API requirements for achieving 1, 2, 4, and 5 above.
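
To make styles 2 and 5 a bit more concrete, here is a rough sketch of
what type-erased source and sink interfaces could look like. Every name
and signature in it (source, sink, results, string_source, string_sink,
pump, roundtrip) is an assumption made up for this illustration; the
actual buffers::source and buffers::sink may differ substantially.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical interfaces, for illustration only.
struct source
{
    struct results
    {
        std::size_t bytes = 0;  // bytes written into dest
        bool finished = false;  // true once the body is exhausted
    };

    virtual ~source() = default;

    // Produce up to `size` bytes of body data into `dest`.
    virtual results read(void* dest, std::size_t size) = 0;
};

struct sink
{
    virtual ~sink() = default;

    // Consume `size` bytes of body data; `more` is false on the last call.
    virtual void write(void const* src, std::size_t size, bool more) = 0;
};

// A source that serves a body held in memory.
class string_source : public source
{
    std::string_view s_;
public:
    explicit string_source(std::string_view s) : s_(s) {}

    results read(void* dest, std::size_t size) override
    {
        std::size_t const n = std::min(size, s_.size());
        std::memcpy(dest, s_.data(), n);
        s_.remove_prefix(n);
        return { n, s_.empty() };
    }
};

// A sink that appends everything it receives to a string.
class string_sink : public sink
{
    std::string& out_;
public:
    explicit string_sink(std::string& out) : out_(out) {}

    void write(void const* src, std::size_t size, bool) override
    {
        out_.append(static_cast<char const*>(src), size);
    }
};

// Move a body from a source to a sink one small buffer at a time,
// without knowing the concrete type of either side.
inline void pump(source& src, sink& dst)
{
    char buf[8];
    for(;;)
    {
        auto const rv = src.read(buf, sizeof(buf));
        assert(rv.bytes <= sizeof(buf));
        dst.write(buf, rv.bytes, ! rv.finished);
        if(rv.finished)
            break;
    }
}

// Usage: copy a body through both interfaces.
inline std::string roundtrip(std::string_view body)
{
    std::string out;
    string_source src(body);
    string_sink dst(out);
    pump(src, dst);
    return out;
}
```

The point of the virtual interfaces is that a serializer or parser can
be compiled once and still accept any body type the user supplies.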

> 3. There are a number of interfaces, where everyone can plug their type, in the STD and Boost that have an overlap. We have the IOStream interface, we have Boost.Serialization interface, std::format is coming, and we have now Boost.Buffer being proposed. Can you make a clear distinction why Boost.Buffers is different? Why do we need another one? Are previous ones defective (and can be superseded), or do they play a different, incompatible role?

Yes.

std::istream, std::ostream: These are actually pretty good substitutes
for source and sink. They are in the standard already. They perform
type-erasure. And they come with implementations (e.g. stringstream,
ofstream). They could in theory work, and many types already support
operator<< to std::ostream& so these could be generically used. Good
thinking Andrzej :) But it is not without problems. It's got weird
error handling and signaling for end of stream. Implementing your own
istream or ostream can be difficult. They aren't designed from the
ground up for a buffer-oriented interface. They are biased toward
character-based output, and part of their interface has to do with
formatting (which is out of scope for Boost.Buffers).
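
As a small illustration of the end-of-stream quirk, here is a sketch
that drains a std::istream in fixed-size chunks using only standard
library calls. The function name drain and the 4-byte chunk size are
arbitrary choices for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Drain `is` in fixed-size chunks, appending each chunk to `out`.
// Returns the number of non-empty chunks read.
inline std::size_t drain(std::istream& is, std::string& out)
{
    char buf[4];
    std::size_t chunks = 0;
    for(;;)
    {
        is.read(buf, static_cast<std::streamsize>(sizeof(buf)));
        // A short final read sets both eofbit and failbit, so the stream
        // state alone cannot distinguish "got the last few bytes" from
        // "got nothing". gcount() has to be checked either way.
        auto const n = static_cast<std::size_t>(is.gcount());
        assert(n <= sizeof(buf));
        if(n > 0)
        {
            out.append(buf, n);
            ++chunks;
        }
        if(! is)
            break; // end of stream or an error; eof() and bad() say which
    }
    return chunks;
}
```

Note that the last successful chunk arrives together with a "failed"
stream state, which is the kind of signaling a buffer-oriented source
would rather report explicitly.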

Boost.Serialization: This is an entirely different thing from
buffer-oriented exchange of data. Rather it is about defining
algorithms for transporting types to and from an "archive" at full
fidelity. This is out of scope. Boost.Buffers could have a say in how
an archive represented as zero or more contiguous spans of bytes might
be transported or streamed from one API to another. But it has nothing
to say (nor does it care) about how user-defined types map to or from
those bytes.

std::format: Kind of the same situation as Boost.Serialization. It
defines a way to convert types into ASCII text and substitute that
text into a larger corpus, which is out of scope. Boost.Buffers could
have a say in how the buffers produced by std::format might be
transported to another API.

> I suppose that buffers have much to do with buffering, that is working with chunks of messages. But how does this work with JSON? Can you even start thinking about parsing JSON if you only have a part, broken at an arbitrary position?

As a matter of fact... you can :) Boost.JSON is unique in that it is
the only JSON library which comes with a streaming parser and a
streaming serializer, to allow buffer-at-a-time processing. This is
essential for network programs to provide fairness. Specifically the
streaming interface allows the implementor to restrict the amount of
work performed when serializing or parsing JSON, and spread the
computational requirements of handling large JSON texts across
multiple I/O cycles so that one connection does not monopolize a
thread. You can see those interfaces here:

<https://www.boost.org/doc/libs/1_81_0/libs/json/doc/html/json/input_output.html#json.input_output.parsing.streaming_parser>

<https://www.boost.org/doc/libs/1_81_0/libs/json/doc/html/json/ref/boost__json__serializer/read/overload1.html>
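
To show why a chunk boundary at an arbitrary position is not a problem
in principle, here is a toy incremental scanner. It is NOT Boost.JSON
and does not parse JSON; it only tracks nesting depth and string/escape
state between calls. The names depth_scanner, feed, and demo are made up
for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy incremental scanner: keeps just enough state to survive a chunk
// ending anywhere, even inside a string or an escape sequence.
class depth_scanner
{
    std::size_t depth_ = 0;
    bool in_string_ = false;
    bool escaped_ = false;
    bool started_ = false;
    bool error_ = false;

public:
    // Feed the next chunk; all state is kept between calls.
    void feed(std::string_view chunk)
    {
        for(char c : chunk)
        {
            if(error_)
                return;
            if(in_string_)
            {
                if(escaped_)        escaped_ = false;
                else if(c == '\\')  escaped_ = true;
                else if(c == '"')   in_string_ = false;
                continue;
            }
            switch(c)
            {
            case '"': in_string_ = true; break;
            case '{': case '[': ++depth_; started_ = true; break;
            case '}': case ']':
                if(depth_ == 0) error_ = true;
                else            --depth_;
                break;
            default: break;
            }
        }
    }

    bool error() const { return error_; }

    // True once a top-level object or array has been opened and closed.
    bool done() const
    {
        return started_ && ! error_ && depth_ == 0 && ! in_string_;
    }
};

// Usage: the text {"a":[1,"x\"}",2]} split at awkward positions.
inline bool demo()
{
    depth_scanner s;
    s.feed("{\"a\":[1,\"x\\"); // chunk ends in the middle of an escape
    assert(! s.done());
    s.feed("\"}\",2]");        // the '}' here is inside a string
    assert(! s.done());
    s.feed("}");
    return s.done();
}
```

Because each call does a bounded amount of work and returns, the caller
decides how much input to hand over per I/O cycle.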

> Also, IOStreams have a layer of buffers. What is the relation there (Between IOStream buffers and the proposed Boost.Buffers)?

I have to be honest, I find the IOStreams interfaces incredibly
confusing, with the buffers and the "controlled sequence" and the g
pointer and the p pointer and the... well, you get the point. Maybe you
could tell me what the relationship is, if any, as I am not quite
sure...

> 4. Would Boost.Buffers satisfy my every use case for buffers?

Well, I don't know. You'd have to list the use cases :) My recipe for
this library was to start with Asio concepts, add in some
implementations which end up being needed often, and add my own
buffer-oriented filter, source, and sink abstract interfaces. If these
are insufficient for a particular use case I would need to study the
use case and figure out if it is the right fit for Buffers and how it
could be satisfied.

> Are buffers only about how you allocate a new chunk of memory?

No. There are roughly three areas of interest:

1. range-like sequences of contiguous storage:
  - ConstBufferSequence, MutableBufferSequence
  - buffers::const_buffer, buffers::mutable_buffer

2. stream-like controlled buffers:
  - DynamicBuffer
  - buffers::circular_buffer, buffers::flat_buffer, buffers::string_buffer

3. abstract buffering interfaces:
  - buffers::filter, buffers::source, buffers::sink

In 1 the ranges are static: neither the individual buffers nor the
range itself can change size. The concepts are akin to
`span<span<char> const> const`. In 2 the controlled buffers can grow
(via prepare/commit) and shrink (via consume). Depending on the
implementation this may or may not allocate memory. In 3 the
interfaces define how buffers are passed from one program interface to
another when running a buffer-at-a-time processing algorithm.
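
The prepare/commit/consume cycle from area 2 can be sketched with a toy
flat buffer. This is an illustration under assumed names
(toy_flat_buffer, demo) and a simplified interface, not the
buffers::flat_buffer API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>
#include <vector>

// Toy flat dynamic buffer.
// Layout: [ consumed | readable | writable ), tracked by two offsets.
class toy_flat_buffer
{
    std::vector<char> storage_;
    std::size_t in_ = 0;   // start of readable bytes
    std::size_t out_ = 0;  // end of readable bytes

public:
    // The readable region.
    std::string_view data() const
    {
        return { storage_.data() + in_, out_ - in_ };
    }

    std::size_t size() const { return out_ - in_; }

    // Make room for n writable bytes after the readable region.
    // This is the step that may allocate.
    char* prepare(std::size_t n)
    {
        if(storage_.size() < out_ + n)
            storage_.resize(out_ + n);
        return storage_.data() + out_;
    }

    // Turn n bytes written into the prepared area into readable bytes.
    void commit(std::size_t n)
    {
        assert(out_ + n <= storage_.size());
        out_ += n;
    }

    // Drop n bytes from the front of the readable region.
    void consume(std::size_t n)
    {
        assert(n <= size());
        in_ += n;
        if(in_ == out_)
            in_ = out_ = 0; // everything read: reuse storage from the start
    }
};

// Usage: grow with prepare/commit, shrink with consume.
inline std::size_t demo()
{
    toy_flat_buffer b;
    std::memcpy(b.prepare(16), "hello world", 11); // reserve 16, write 11
    b.commit(11);                                  // only 11 become readable
    assert(b.data() == "hello world");
    b.consume(6);                                  // shrink from the front
    assert(b.data() == "world");
    return b.size();
}
```

Separating prepare from commit lets a reader ask for more room than it
ends up filling, which is the usual situation with a socket read.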

> Or are they about identifying a place where you can cut your message into meaningful portions?

No, but if you have already cut your message into zero or more
contiguous spans of bytes then Boost.Buffers can help you do things
with it.
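
As a sketch of "doing things" with a message that is already cut into
pieces, the example below measures and flattens a sequence of
non-owning pointer/size pairs. The byte_span type and the total_size,
flatten, and demo functions are hypothetical stand-ins invented for this
example; they are not the Boost.Buffers types or algorithms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for a const buffer: a pointer/size pair that owns nothing.
struct byte_span
{
    char const* data;
    std::size_t size;
};

// Total number of bytes across all pieces of the sequence.
inline std::size_t total_size(std::vector<byte_span> const& seq)
{
    std::size_t n = 0;
    for(auto const& b : seq)
        n += b.size;
    return n;
}

// Copy at most `limit` bytes from the sequence into one contiguous string.
inline std::string flatten(std::vector<byte_span> const& seq, std::size_t limit)
{
    std::string out;
    for(auto const& b : seq)
    {
        if(out.size() == limit)
            break;
        std::size_t const n = std::min(b.size, limit - out.size());
        out.append(b.data, n);
    }
    return out;
}

// Usage: a request line stored as three separate pieces.
inline std::string demo()
{
    std::string_view const method = "GET ";
    std::string_view const target = "/index";
    std::string_view const version = " HTTP/1.1";
    std::vector<byte_span> const seq = {
        { method.data(),  method.size()  },
        { target.data(),  target.size()  },
        { version.data(), version.size() } };
    assert(total_size(seq) == 19);
    return flatten(seq, 7); // stops partway through the second piece
}
```

The algorithms never care where the cuts are, only that each piece is
contiguous.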

Thanks

