Subject: Re: [boost] C++ Networking Library Release 0.5
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2010-02-01 16:49:25


Hi Peter,

Peter Petrov wrote:
> On Mon, Feb 1, 2010 at 1:26 PM, Glyn Matthews <glyn.matthews_at_[hidden]>wrote:
>> On 31 January 2010 12:08, Phil Endecott <spam_from_boost_dev_at_[hidden]
>> >wrote:
>> > James Mansion wrote:
>> >> Phil Endecott wrote:
>> >>> I have an HTTP request parser using Spirit, if you're interested. It is
>> >>> a bit grotty as I wrote it as my first exercise using Spirit - but it does
>> >>> work. http://svn.chezphil.org/libpbe/trunk/src/parse_http_request.cc.
>> >>>
>> >> Out of interest, is the parser suitable to use as a tutorial on how to
>> >> translate from RFC specs?
>> >>
>> >
>> > You're welcome to use it in that way if you wish. Most of it was translated
>> > directly from the BNF in the RFCs.
>> >
>>
>> I would say that this is on-topic as it is an issue that we face in
>> implementing cpp-netlib. Currently, the request parser in the HTTP server
>> is taken from Boost.Asio HTTP example but I'm certain that this can be
>> improved.
>>
>>
>
> Let me chime in, as I've recently developed an Asio-based HTTP server as
> well.
>
> First, Spirit is unsuitable for the task - it consumes all the input in one
> pass, and doesn't support the case when the HTTP request arrives in more
> than one read. The real solution is a state-machine-based parser, just like
> the one in the Asio HTTP example.

I disagree in general. My parser is primarily an HTTP request _header_
parser, and the headers are normally relatively small. For most
requests (i.e. GETs) the request body doesn't add much, and in those
cases it is likely that the whole request can be obtained in a single read.
In fact browser implementations go to some lengths to make their
requests fit in single network packets (about 1500 bytes) for
performance reasons, and single network packets will generally be
accessible as single reads.

I normally use this code in a thread-per-connection environment, but if
you wanted to use it in a single-threaded system you would need to
modify it to detect incomplete input in the (rare) case when the input
was split over multiple packets.
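
As a rough illustration of what "detect incomplete input" could look like, here is a minimal sketch. It is not taken from parse_http_request.cc; the names complete_header_length and header_accumulator are hypothetical. It assumes only that an HTTP header block ends with an empty line (CRLF CRLF), buffers each read, and reports when the full header is available to hand to a one-pass parser.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length of the header block, including
// the terminating blank line, or std::string::npos if the blank line has
// not arrived yet.
std::size_t complete_header_length(const std::string& buf)
{
    static const std::string terminator = "\r\n\r\n";
    const std::size_t pos = buf.find(terminator);
    return pos == std::string::npos ? std::string::npos
                                    : pos + terminator.size();
}

// Hypothetical accumulator: collects the data from successive reads until
// a complete header block is present.
class header_accumulator {
public:
    // Append newly read bytes; returns true once the header is complete.
    bool feed(const char* data, std::size_t n)
    {
        buf_.append(data, n);
        return complete();
    }

    bool complete() const
    {
        return complete_header_length(buf_) != std::string::npos;
    }

    // The complete header block, or an empty string if still incomplete.
    std::string header() const
    {
        const std::size_t len = complete_header_length(buf_);
        return len == std::string::npos ? std::string() : buf_.substr(0, len);
    }

    // Bytes received after the header (the start of the body, if any).
    std::string leftover() const
    {
        const std::size_t len = complete_header_length(buf_);
        return len == std::string::npos ? std::string() : buf_.substr(len);
    }

private:
    std::string buf_;
};
```

The idea would be to call feed() after each read and only invoke the header parser on header() once it returns true. This version rescans the whole buffer on every call, which seems acceptable for headers of a kilobyte or two; a real server would also want an upper limit on the buffer size.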

In the case of HTTP POST and PUT requests, on the other hand, the body
(but not the header) can be large, and parsing it incrementally as it
arrives is probably necessary. I noticed a BoostCon paper about a MIME
parser (Marshall?) - this would definitely benefit from working
incrementally in many applications.

> In my case, I used an automatically generated parser from EBNF, via Ragel (
> http://www.complang.org/ragel/). The grammar itself I "borrowed" from the
> sandbox version of Lighttpd, which uses the same approach. Link:
>
> http://redmine.lighttpd.net/projects/lighttpd-sandbox/repository/revisions/master/entry/src/main/http_request_parser.rl
>
> Ragel is the best solution I'm aware of, and it's easy to integrate its
> output into Boost-style C++ code. I've not yet benchmarked my solution
> against the Asio HTTP example parser for performance, but I assume they are
> close.

This is interesting, and I'll have a look at it next time I need to do
some BNF-like parsing.

Regards, Phil.

