
Subject: Re: [boost] [Beast] Questions Before Review
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2017-06-26 14:47:16


>> Most users I should imagine would therefore build
>> scatter-gather lists on the stack as they'll be thrown away immediately,
>> indeed I usually feed it curly braced initializer_lists personally,
>
> Thus imposing limitation on the size of the buffer sequence.

The kernel imposes very significant limits on the size of the buffer
list anyway: some OSs allow as few as 16 scatter-gather buffers per i/o,
and as few as 1024 scatter-gather buffers in flight across the entire
OS. So when you initiate an async i/o, you may get a resource
temporarily unavailable error even for a single buffer, let alone two.

On top of that, even if the OS accepts more, the DMA hardware has a
fixed-size buffer-list capacity. 64 entries is not uncommon, and that's
after the kernel has split your virtual-memory scatter-gather list into
physical memory and added its own scatter-gather headers. So 32 buffers
is a very realistic limit, and 16 is the portable maximum.

(AFIO v2 doesn't involve itself with any of this at all: it sends what
you ask for to the OS, and reports back whatever errors the OS returns.)
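
For illustration, here's a minimal sketch (not AFIO or Beast code, the
names are mine) of the kind of small, stack-built scatter-gather list I
mean, submitted with POSIX writev() and staying well under the portable
16-buffer maximum. Even then the caller must be prepared for a
resource-temporarily-unavailable error:

    #include <sys/types.h>
    #include <sys/uio.h>   // writev, iovec
    #include <cerrno>
    #include <cstddef>

    // Gather a header and a body into one write without first copying
    // them into a combined buffer. Two entries is well under any
    // per-i/o kernel or DMA limit.
    bool write_gathered(int fd, const char *hdr, std::size_t hdr_len,
                        const char *body, std::size_t body_len)
    {
        iovec iov[2];
        iov[0].iov_base = const_cast<char *>(hdr);
        iov[0].iov_len = hdr_len;
        iov[1].iov_base = const_cast<char *>(body);
        iov[1].iov_len = body_len;

        ssize_t n = ::writev(fd, iov, 2);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return false;  // resource temporarily unavailable: retry later
        return n >= 0;     // (a real caller would also handle short writes)
    }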

>> I think you are going to have to back up your claim that memory
>> copying all incoming data is faster rather than being bad implementation
>> techniques with discontinuous storage
>
> I'll let Kazuho back it up for me since I use his ideas:
> https://github.com/h2o/picohttpparser
>
> Here's the slide show explaining the techniques:
> https://www.slideshare.net/kazuho/h2o-20141103pptx
>
> And here is an example of the optimizations possible with linear
> buffers, which I plan to directly incorporate into Beast in the near
> future:
> https://github.com/h2o/picohttpparser/blob/2a16b2365ba30b13c218d15ed9991576358a6337/picohttpparser.c#L110

Ah, I see you're referring to SIMD. I thought you were claiming that
linear-buffer-based parsers were significantly faster than forward-only
iterator-based parsers.

You solve that problem by doing all i/o in multiples of the SIMD width,
and memcpying any partial-SIMD-width tail left at the end of a short i/o
into the start of the next buffer. That avoids copying the bulk of the
data, yet keeps SIMD.
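
As a rough sketch of what I mean (assuming a hypothetical 16-byte SIMD
width and plain POSIX read(); the names are illustrative, not Beast's):

    #include <unistd.h>    // read
    #include <cstring>     // memcpy
    #include <cstddef>

    constexpr std::size_t simd_width = 16;   // assumed SIMD register width

    struct chunk
    {
        alignas(simd_width) char data[4096];
        std::size_t len = 0;
    };

    // Carry any partial-SIMD-width tail of `prev` into the front of
    // `next`, then read into the remainder of `next`. The parser only
    // ever sees whole SIMD-width blocks in `prev`.
    ssize_t refill(int fd, chunk &prev, chunk &next)
    {
        std::size_t tail = prev.len % simd_width;
        std::memcpy(next.data, prev.data + (prev.len - tail), tail);
        prev.len -= tail;

        ssize_t n = ::read(fd, next.data + tail, sizeof(next.data) - tail);
        if (n < 0)
            return n;      // caller handles the error
        next.len = tail + static_cast<std::size_t>(n);
        return n;
    }

Only the at-most-15-byte tail ever gets memcpy()d, which is negligible
next to copying whole incoming buffers.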

> Of course if you think you can do better I would love to see your
> working parser that operates on multiple discontiguous high quality
> ring buffered page aligned DMA friendly storage iterators so that I
> might compare the performance. The good news is that you can do so
> while leveraging Beast's message model to produce objects that people
> using Beast already understand. Except that you'll be producing them
> much, much faster (which is a win-win for everyone).

You're the person bringing the library before review, not me. If you
have a severe algorithmic flaw in your implementation, reviewers would
be right to reject your library. They did so with me for AFIO v1, so it
was on me to start AFIO again from scratch.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
