From: Darren Garvey (lists.drrngrvy_at_[hidden])
Date: 2007-04-06 17:28:31


On 06/04/07, Phil Endecott <spam_from_boost_dev_at_[hidden]> wrote:
>
> Darren wrote:
> > I think the library should really be separated into (for example) a
> > cgi::service - which handles the protocol specifics - and
> > cgi::request's.
>
> I think I agree, except that 'cgi' is the wrong name; it's an http
> request, which could be a CGI request or something else.

Well, I intended to have a template class like cgi::basic_cgi_service<>, for
instance. Then cgi::service would be a typedef for the basic cgi service. I
suppose a sensible typedef for an http-service would be cgi::http_service.
Would you find that misleading?
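
To make the template/typedef arrangement concrete, here is a rough sketch of what I have in mind. Everything in it is hypothetical: the cgi_protocol policy, the member names, and the use of environment variables as the only source of request metadata are assumptions for illustration, not a settled interface.

```cpp
#include <cstdlib>
#include <string>

namespace cgi {

// Hypothetical protocol policy: classic CGI receives its request
// metadata through environment variables set by the web server.
struct cgi_protocol {
    static std::string meta(const char* name) {
        const char* value = std::getenv(name);
        return value ? std::string(value) : std::string();
    }
};

// Hypothetical service template, parameterised on the protocol so that
// another protocol could be plugged in without touching the request code.
template <typename Protocol>
class basic_cgi_service {
public:
    std::string request_method() const { return Protocol::meta("REQUEST_METHOD"); }
    std::string query_string() const { return Protocol::meta("QUERY_STRING"); }
};

// The convenience typedef: cgi::service is just the basic CGI service.
typedef basic_cgi_service<cgi_protocol> service;

} // namespace cgi
```

A cgi::http_service typedef would then presumably be the same template instantiated with a different (as yet unwritten) protocol policy.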

> > I have high hopes that a good cgi::service template would allow the
> > library to be extended to handle arbitrary cgi-based protocols,
> > including 'standalone HTTP'
>
> Yes, except again you need to swap that around; "standard HTTP" is not
> a "CGI-based protocol", but the converse.

I probably should have said something like '... extended to handle arbitrary
protocols as long as they can be mapped to a cgi request'. The aim of the
library, as I see it (feel free to disagree), is only to enable writing CGI
programs. The idea I mentioned of adding an 'http service' to basically give
you a standalone server would depend on the 'service' handling everything
by itself except for what it determines to be CGI requests. Only those
requests would be passed on to the program itself. I'd imagine the service
in this case would be housed in an external library.

> >>> Of particular interest:
> >>> *should GET/POST variables be parsed by default?
> >>
> >> So the issue is can you be more efficient in the case when the variable
> >> is not used by not parsing it? Well, if you're concerned about
> >> efficiency in that case then you would be better off not sending the
> >> thing in the first place. So I suggest parsing everything immediately,
> >> or at least on the first use of any variable.
> >
> > I'd agree in theory, but automatic parsing would make it easy for a
> > malicious user to cripple the server by just POSTing huge files,
> > wouldn't it?
>
> A DOS attack of X million uploads of a file of size S is in most ways
> equivalent to 10*X million uploads of a file of size S/10, or 100*X
> million uploads of a file of size S/100. Where do you draw the line?
> The place to avoid this sort of concern is with bandwidth throttling in
> the front-end of the web server.

This is something I honestly don't know about. I would have thought that the
fewer connections an attacker needs to cause a DOS, the easier the attack is
to mount, but my intuition has frequently turned out to be the polar opposite
of reality on this subject. I see what you're getting at, Phil, and maybe the
concern is misplaced, but I'm wary of glossing over it just yet.

> > There are also situations where a cgi program accepts large files and
> > possibly parses them on the fly, or encrypts them or sends them in
> > chunks to a database. As a real-world example, if you attach an exe to
> > a gmail message, you have to wait for the whole file to be sent first
> > before the server returns the reply that it's an invalid file type.
>
> I think it's hard to avoid parsing the whole stream in order to know
> which variables are present and that it's syntactically correct before
> continuing. And I don't think you can control the order in which the
> browser sends the variables. But if you can devise a scheme that
> allows lazy parsing of the data, great! As long as it doesn't add any
> syntactic complexity in the common case of a few small variables.

You can't control variable order, no. Lazy parsing is by no means trivial,
but I think it's quite doable. Keeping the common case simple and intuitive
is the main concern, but I'd very much like to try incorporating this sort
of delayed parser.
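
As a starting point, the simplest form of deferral is 'parse on first use'. The sketch below is only that: lazy_params is a hypothetical name, it handles just 'name=value&name=value' data, it does no percent-decoding, and it does not attempt the harder problem of streaming a large POST body.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: keeps the raw data untouched until a variable
// is first requested, then parses all of it once.
class lazy_params {
public:
    explicit lazy_params(const std::string& raw) : raw_(raw), parsed_(false) {}

    // Returns the value for 'name', or an empty string if it is absent.
    std::string get(const std::string& name) {
        parse_if_needed();
        std::map<std::string, std::string>::const_iterator it = vars_.find(name);
        return it == vars_.end() ? std::string() : it->second;
    }

    // True once the raw data has been parsed.
    bool parsed() const { return parsed_; }

private:
    void parse_if_needed() {
        if (parsed_) return;
        parsed_ = true;
        std::size_t pos = 0;
        while (pos < raw_.size()) {
            // Each pair runs up to the next '&' or the end of the data.
            std::size_t amp = raw_.find('&', pos);
            if (amp == std::string::npos) amp = raw_.size();
            const std::string pair = raw_.substr(pos, amp - pos);
            const std::size_t eq = pair.find('=');
            if (!pair.empty()) {
                if (eq == std::string::npos)
                    vars_[pair] = "";  // a bare name with no value
                else
                    vars_[pair.substr(0, eq)] = pair.substr(eq + 1);
            }
            pos = amp + 1;
        }
    }

    std::string raw_;
    bool parsed_;
    std::map<std::string, std::string> vars_;
};
```

The common case stays a one-liner, e.g. lazy_params p("a=1&b=two"); p.get("b"); and a program that never calls get() never pays for the parse.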

> >>> *should cookie variables be accessible just like GET/POST vars, or
> >>> separately?
> >>
> >> Separately
> >
> > Ok. Although I think direct access is important, I'm tempted to include
> > a helper function like:
> > cgi::param( /*name*/ ) // returns 'value'
> > That would iterate over the GET/POST vars _as well as_ the cookie vars.
> > I'll keep my eye open for objections to the idea.
>
> I think that the recent fuss about "Javascript Hijacking" has
> emphasised the fact that programmers need to be aware of whether they
> are dealing with cookies, GET (URL) variables, or POST data. Cookies
> set by example.com are returned to example.com even when the request
> comes from a script element on a page served by bad.com. In contrast,
> the bad.com page's script cannot see the GET or POST data that
> example.com's page is sending.

That's a good point. Security is obviously a big concern with CGI programs.
I've no intention of doing anything like tainting variables (a la Perl - I
tried incorporating this in the past, but I found it painful to work with),
but perhaps this sort of 'forced awareness' is worth the extra typing?
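
To show what 'forced awareness' might look like in practice, here is a minimal sketch in which every lookup names its source and there is no merged accessor. The request type, its member names and the use of std::map are all assumptions for illustration only.

```cpp
#include <map>
#include <string>

namespace cgi {

typedef std::map<std::string, std::string> var_map;

// Hypothetical request type: GET, POST and cookie variables live in
// separate maps, and each accessor consults exactly one of them.
struct request {
    var_map get_vars;
    var_map post_vars;
    var_map cookie_vars;

    std::string get(const std::string& name) const { return lookup(get_vars, name); }
    std::string post(const std::string& name) const { return lookup(post_vars, name); }
    std::string cookie(const std::string& name) const { return lookup(cookie_vars, name); }

private:
    // Returns the stored value, or an empty string if 'name' is absent.
    static std::string lookup(const var_map& vars, const std::string& name) {
        var_map::const_iterator it = vars.find(name);
        return it == vars.end() ? std::string() : it->second;
    }
};

} // namespace cgi
```

With this shape, a cookie called 'id' can never silently stand in for a GET variable called 'id'; the cost is typing req.get("id") or req.cookie("id") instead of a single param("id").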

Regards,
Darren

