

From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2007-04-06 16:27:52


Darren wrote:
> I think the library should really be separated into
> (for example) a cgi::service - which handles the protocol specifics - and
> cgi::request's.

I think I agree, except that 'cgi' is the wrong name; it's an HTTP
request, which could be a CGI request or something else.

> I have high hopes that a
> good cgi::service template would allow the library to be extended to handle
> arbitrary cgi-based protocols, including 'standalone HTTP'

Yes, except again you need to swap that around: "standalone HTTP" is not
a "CGI-based protocol"; rather, CGI is an HTTP-based protocol.

>>> Of particular interest:
>>> *should GET/POST variables be parsed by default?
>>
>> So the issue is can you be more efficient in the case when the variable
>> is not used by not parsing it? Well, if you're concerned about
>> efficiency in that case then you would be better off not sending the
>> thing in the first place. So I suggest parsing everything immediately,
>> or at least on the first use of any variable.
>
> I'd agree in theory, but automatic parsing would make it easy for a
> malicious user to cripple the server by just POSTing huge files, wouldn't it?

A DoS attack of X million uploads of a file of size S is in most ways
equivalent to 10*X million uploads of a file of size S/10, or 100*X
million uploads of a file of size S/100: the total volume of data, X*S,
is the same in each case. Where do you draw the line? The place to deal
with this sort of concern is bandwidth throttling in the front end of
the web server.

> There are also situations where a cgi program accepts large files and possibly
> parses them on the fly, or encrypts them or sends them in chunks to a
> database. As a real-world example, if you attach an exe to a gmail message,
> you have to wait for the whole file to be sent first before the server
> returns the reply that it's an invalid file type.

I think it's hard to avoid parsing the whole stream: you need to know
which variables are present, and that the stream is syntactically
correct, before continuing. And I don't think you can control the order
in which the browser sends the variables. But if you can devise a
scheme that allows lazy parsing of the data, great, as long as it
doesn't add any syntactic complexity in the common case of a few small
variables.
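
To illustrate "parse on the first use of any variable" without extra
syntax in the common case, here is a minimal sketch. The class name
lazy_form and its members are hypothetical, not part of the proposed
library, and it only handles URL-encoded name=value pairs; it still
parses the whole string at once, just later.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: these names are illustrative only.
class lazy_form {
public:
    explicit lazy_form(std::string raw) : raw_(std::move(raw)) {}

    // Returns a pointer to the decoded value, or nullptr if absent.
    // The raw string is parsed in full on the first call only.
    const std::string* find(const std::string& name) {
        ensure_parsed();
        assert(parsed_);
        auto it = vars_.find(name);
        return it == vars_.end() ? nullptr : &it->second;
    }

    bool parsed() const { return parsed_; }

private:
    static int hex_value(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Decodes '+' and %XX; malformed escapes are kept verbatim.
    static std::string decode(const std::string& s) {
        std::string out;
        for (std::size_t i = 0; i < s.size(); ++i) {
            if (s[i] == '+') {
                out += ' ';
            } else if (s[i] == '%' && i + 2 < s.size() &&
                       hex_value(s[i + 1]) >= 0 && hex_value(s[i + 2]) >= 0) {
                out += static_cast<char>(hex_value(s[i + 1]) * 16 +
                                         hex_value(s[i + 2]));
                i += 2;
            } else {
                out += s[i];
            }
        }
        return out;
    }

    // Splits on '&' and '='; the first value wins for a repeated name.
    void ensure_parsed() {
        if (parsed_) return;
        parsed_ = true;
        std::size_t pos = 0;
        while (pos <= raw_.size()) {
            std::size_t end = raw_.find('&', pos);
            if (end == std::string::npos) end = raw_.size();
            std::string pair = raw_.substr(pos, end - pos);
            if (!pair.empty()) {
                std::size_t eq = pair.find('=');
                std::string key = decode(pair.substr(0, eq));
                std::string value = eq == std::string::npos
                                        ? std::string()
                                        : decode(pair.substr(eq + 1));
                vars_.emplace(std::move(key), std::move(value));
            }
            pos = end + 1;
        }
    }

    std::string raw_;
    std::map<std::string, std::string> vars_;
    bool parsed_ = false;
};
```

The caller just writes form.find("name"); a program that never asks
for a variable never pays for the parse.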

>>> *should cookie variables be accessible just like GET/POST vars, or
>>> separately?
>>
>> Separately
>
> Ok. Although I think direct access is important, I'm tempted to include a
> helper function like:
> cgi::param( /*name*/ ) // returns 'value'
> That would iterate over the GET/POST vars _as well as_ the cookie vars. I'll
> keep my eye open for objections to the idea.

I think that the recent fuss about "JavaScript Hijacking" has
emphasised the fact that programmers need to be aware of whether they
are dealing with cookies, GET (URL) variables, or POST data. Cookies
set by example.com are returned to example.com even when the request
comes from a script element on a page served by bad.com. In contrast,
the bad.com page's script cannot see the GET or POST data that
example.com's page is sending.
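
As a rough sketch of what "separately" could look like, the lookup
below makes the caller name the source of every variable, so cookie
data can never be mistaken for GET or POST data. The names request,
source and find are assumptions made for illustration, not an existing
interface.

```cpp
#include <map>
#include <string>

// Hypothetical sketch: these names are illustrative only.
enum class source { get, post, cookie };

struct request {
    std::map<std::string, std::string> get_vars;
    std::map<std::string, std::string> post_vars;
    std::map<std::string, std::string> cookies;

    // There is deliberately no lookup that searches all three maps:
    // the caller must say which source it trusts for this name.
    const std::string* find(source from, const std::string& name) const {
        const std::map<std::string, std::string>& vars =
            from == source::get    ? get_vars
            : from == source::post ? post_vars
                                   : cookies;
        auto it = vars.find(name);
        return it == vars.end() ? nullptr : &it->second;
    }
};
```

With this shape, req.find(source::cookie, "session") and
req.find(source::get, "session") are visibly different questions, and
the same name can hold a different value in each source.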

Phil.

