From: Darren Garvey (lists.drrngrvy_at_[hidden])
Date: 2007-04-06 11:55:12


Hi Phil,

On 06/04/07, Phil Endecott <spam_from_boost_dev_at_[hidden]> wrote:
>
> I have some GPL code that does this sort of thing; you're welcome to
> look at it, and I'm not fussy about the license if you want to re-use
> any of it for a Boost submission. This code has evolved to meet my
> needs, and isn't the sort of thing that you would write if you were
> starting from scratch.
>

Thank you, sir! I'm aiming for a middle-ground, so seeing your code is very
helpful.

[snip]
>
> I think this is more "boostified"! For example, it has a
> get_as<T>(name) method that will lexical_cast the parameter to the
> required type.

I suppose it could be argued that a wrapper for lexical_cast is
out-of-scope, but it's likely I'll include one - unless there's strong
criticism - due to the amount of use it'd probably get.
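
A minimal sketch of what such a wrapper might look like. The `request` class and its `set()` member are hypothetical; only the get_as<T>(name) idea comes from the thread. std::istringstream stands in for boost::lexical_cast here so the sketch builds without Boost.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical request type, for illustration only.
class request {
public:
    void set(const std::string& name, const std::string& value) {
        vars_[name] = value;
    }

    // Convert the named parameter to T, throwing if it is missing or if
    // the value is not entirely consumed by the conversion.
    template <typename T>
    T get_as(const std::string& name) const {
        std::map<std::string, std::string>::const_iterator it = vars_.find(name);
        if (it == vars_.end())
            throw std::out_of_range("no such parameter: " + name);

        std::istringstream in(it->second);
        T result;
        char extra;
        // Stand-in for boost::lexical_cast<T>(it->second).
        if (!(in >> result) || (in >> extra))
            throw std::invalid_argument("bad value for parameter: " + name);
        return result;
    }

private:
    std::map<std::string, std::string> vars_;
};
```

With this, r.get_as<int>("id") yields an int for a stored "42" and throws for "4x2". Note that the stream-based stand-in splits on whitespace, so it is not suitable for T = std::string.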

> I have also used the Apache module API, and have written standalone
> HTTP servers. If I was doing all this from scratch I'd try to do
> something that would be equally applicable in any of these situations
> (and also things like FCGI as someone else has suggested). I.e. you
> want to define an 'HTTP Request' object, which has ways of accessing
> the form data, not an explicitly 'CGI' object. (There is actually an
> HttpRequest class in
> http://svn.chezphil.org/libpbe/trunk/include/HttpRequest.hh, but I
> haven't used it in combination with form data. There is also a Spirit
> parser for HTTP requests in
> http://svn.chezphil.org/libpbe/trunk/src/Request.cc.)

I completely agree here. I think the library should really be separated into
(for example) a cgi::service - which handles the protocol specifics - and
cgi::request objects. I don't have a proof-of-concept yet, but I have high hopes that a
good cgi::service template would allow the library to be extended to handle
arbitrary cgi-based protocols, including 'standalone HTTP', almost
transparently to user code.
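
One way the split could be sketched is with the protocol as a template parameter of the service. Everything here is hypothetical (the `cgi_sketch` namespace, the `plain_cgi` and `standalone_http` policies, the `load()` and `accept()` names); it only illustrates how user code could stay the same while the protocol changes.

```cpp
#include <map>
#include <sstream>
#include <string>

namespace cgi_sketch {

// Protocol-independent view of a request.
struct request {
    std::string method;
    std::string query_string;
};

// Plain CGI: metadata arrives as environment-style name/value pairs.
struct plain_cgi {
    typedef std::map<std::string, std::string> source_type;
    static request load(const source_type& env) {
        request r;
        source_type::const_iterator it = env.find("REQUEST_METHOD");
        if (it != env.end()) r.method = it->second;
        it = env.find("QUERY_STRING");
        if (it != env.end()) r.query_string = it->second;
        return r;
    }
};

// Standalone HTTP: metadata comes from the request line,
// e.g. "GET /path?a=1 HTTP/1.1".
struct standalone_http {
    typedef std::string source_type;
    static request load(const source_type& request_line) {
        request r;
        std::string target;
        std::istringstream in(request_line);
        in >> r.method >> target;
        std::string::size_type q = target.find('?');
        if (q != std::string::npos) r.query_string = target.substr(q + 1);
        return r;
    }
};

// The service handles the protocol specifics; callers only see request.
template <typename Protocol>
class service {
public:
    request accept(const typename Protocol::source_type& source) const {
        return Protocol::load(source);
    }
};

} // namespace cgi_sketch
```

Code that consumes a `request` does not need to know which `service<Protocol>` produced it.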

> Of particular interest:
> > *should GET/POST variables be parsed by default?
>
> So the issue is can you be more efficient in the case when the variable
> is not used by not parsing it? Well, if you're concerned about
> efficiency in that case then you would be better off not sending the
> thing in the first place. So I suggest parsing everything immediately,
> or at least on the first use of any variable.

I'd agree in theory, but automatic parsing would make it easy for a
malicious user to cripple the server by just POSTing huge files, wouldn't it?
There are also situations where a CGI program accepts large files and possibly
parses them on the fly, or encrypts them or sends them in chunks to a
database. As a real-world example, if you attach an exe to a gmail message,
you have to wait for the whole file to be sent first before the server
returns the reply that it's an invalid file type. I may be missing something
but this seems like a significant problem, despite it largely being ignored
(I think).
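
A rough sketch of one possible compromise: leave the body unread until the program first asks for it, and refuse to buffer more than a caller-chosen limit. The `lazy_body` class and its members are hypothetical names. A real implementation would probably also check the CONTENT_LENGTH variable before reading anything.

```cpp
#include <cstddef>
#include <sstream>    // std::istream, plus std::istringstream for trying it out
#include <stdexcept>
#include <string>

// Hypothetical lazily-read request body with a size cap.
class lazy_body {
public:
    lazy_body(std::istream& in, std::size_t max_bytes)
        : in_(in), max_bytes_(max_bytes), loaded_(false) {}

    // True once the body has been read in full.
    bool loaded() const { return loaded_; }

    // Read the body on first use; throw rather than buffer an oversized one.
    const std::string& data() {
        if (!loaded_) {
            char c;
            while (in_.get(c)) {
                if (data_.size() == max_bytes_)
                    throw std::length_error("request body exceeds limit");
                data_ += c;
            }
            loaded_ = true;
        }
        return data_;
    }

private:
    std::istream& in_;
    std::size_t max_bytes_;
    bool loaded_;
    std::string data_;
};
```

A program that never calls data() never pays for reading the body, and one that streams a large upload itself can bypass the class and read the stream directly.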

> *how should GET/POST variables be accessed?
>
> As a map<string,string>, or similar.

Noted. I'm curious if unordered_map would be more efficient, but that's an
implementation detail. I'll have to see.

> *should cookie variables be accessible just like GET/POST vars, or
> > separately?
>
> Separately, but again in a map-like name/value thing, e.g.
>
> struct HttpRequest {
> map<string,string> cgi_vars;
> map<string,string> cookies;
> map<string,string> http_headers;
> ...
> }

Ok. Although I think direct access is important, I'm tempted to include a
helper function like:
cgi::param( /*name*/ ) // returns 'value'
That would iterate over the GET/POST vars _as well as_ the cookie vars. I'll
keep my eye open for objections to the idea.
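
As a sketch of that helper, assuming a struct shaped like the HttpRequest quoted above (renamed `http_request` here) and a free function `param()`. Checking the form variables before the cookies is an arbitrary choice made for this example, as is returning an empty string when nothing matches.

```cpp
#include <map>
#include <string>

// Hypothetical request type mirroring the quoted HttpRequest struct.
struct http_request {
    std::map<std::string, std::string> cgi_vars;  // GET/POST variables
    std::map<std::string, std::string> cookies;
};

// Look a name up in the GET/POST variables first, then in the cookies.
// Returns an empty string if the name is in neither.
std::string param(const http_request& req, const std::string& name) {
    std::map<std::string, std::string>::const_iterator it = req.cgi_vars.find(name);
    if (it != req.cgi_vars.end()) return it->second;

    it = req.cookies.find(name);
    if (it != req.cookies.end()) return it->second;

    return std::string();
}
```

The separate maps stay available for callers who care where a value came from.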

> > *should the CGI environment variables each have explicit functions for their
> > access, or should (eg.) a generic cgi::get_env() function be used?
>
> I listed them all explicitly in my CgiVars implementation, rather than
> adding another map<string,string> to the HttpRequest. I think that my
> motivation was to get a compile-time error if I mis-remembered the
> variable name (i.e. vars.remote_host vs. vars["REMOTE_HOST"]).

That's the way I'm leaning too.
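
For illustration, a small sketch of the explicit-member approach. The struct name `cgi_vars` and the helpers `env_or_empty()` and `load_cgi_vars()` are made up for this example; REMOTE_HOST, REQUEST_METHOD and QUERY_STRING are standard CGI variables.

```cpp
#include <cstdlib>
#include <string>

// One named member per CGI variable, so a misspelt name is a compile-time
// error instead of a silently empty lookup.
struct cgi_vars {
    std::string remote_host;
    std::string request_method;
    std::string query_string;
};

// Read one environment variable, treating "unset" as an empty string.
std::string env_or_empty(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string();
}

// Fill the struct from the process environment once, up front.
cgi_vars load_cgi_vars() {
    cgi_vars vars;
    vars.remote_host    = env_or_empty("REMOTE_HOST");
    vars.request_method = env_or_empty("REQUEST_METHOD");
    vars.query_string   = env_or_empty("QUERY_STRING");
    return vars;
}
```

Writing vars.remote_hots fails to compile, whereas a generic get_env("REMOTE_HOTS") would just return nothing at run time.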

> *url decoding functions are needed for GET/POST variables.
>
> Internal to your code, of course. I don't want to see them.

Of course. ;)
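
A minimal sketch of the decoding that would sit behind the map<string,string> interface. The function names `url_decode()` and `parse_query()` are placeholders. It assumes form-style encoding, where '+' means a space and %XX is a hex-encoded byte, and it passes malformed escapes through unchanged.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Value of a hex digit, or -1 if c is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode '+' and %XX escapes; malformed escapes are copied as-is.
std::string url_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '+') {
            out += ' ';
        } else if (in[i] == '%' && i + 2 < in.size()
                   && hex_value(in[i + 1]) >= 0 && hex_value(in[i + 2]) >= 0) {
            out += static_cast<char>(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
            i += 2;
        } else {
            out += in[i];
        }
    }
    return out;
}

// Split "a=1&b=2" into decoded name/value pairs. A later duplicate name
// overwrites an earlier one, which is a simplification.
std::map<std::string, std::string> parse_query(const std::string& query) {
    std::map<std::string, std::string> vars;
    std::size_t start = 0;
    while (start <= query.size()) {
        std::size_t end = query.find('&', start);
        if (end == std::string::npos) end = query.size();
        const std::string pair = query.substr(start, end - start);
        if (!pair.empty()) {
            const std::size_t eq = pair.find('=');
            const std::string name = url_decode(pair.substr(0, eq));
            const std::string value =
                eq == std::string::npos ? std::string() : url_decode(pair.substr(eq + 1));
            vars[name] = value;
        }
        start = end + 1;
    }
    return vars;
}
```

User code then only ever sees the decoded map, for example parse_query(query_string)["name"].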

> Should url encoding functions also be provided?
>
> If you want, but in a separate header file. (And be sure you know
> exactly what encoding you're doing...)

Noted. This is tricky but I suppose it's a non-vital component. Adding it
sounds like fun, unfortunately...
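
On knowing exactly which encoding is being done: the sketch below takes one specific choice, percent-encoding every byte outside the RFC 3986 "unreserved" set (letters, digits, '-', '.', '_', '~'). The name `url_encode()` is a placeholder, and the form-encoding convention of writing spaces as '+' is deliberately not used here.

```cpp
#include <cstddef>
#include <string>

// Percent-encode every byte that is not an RFC 3986 unreserved character.
std::string url_encode(const std::string& in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];    // high nibble
            out += hex[c & 0x0F];  // low nibble
        }
    }
    return out;
}
```

It works on raw bytes, so the caller is still responsible for deciding which character set those bytes are in before encoding.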

> *how transparent should user code be to internationalization/different
> > character sets?
>
> I think that in the case of url-encoded data it's hard to be certain of
> the character set in use. In the MIME case that data is explicitly
> available, and you should make it accessible to the user. I think you
> can also get content-type information. I'm not sure how best to fit
> that into the map<string,string> scheme. We really need:
>
> class string_with_charset {
> string s;
> charset_t charset;
> };

Sounds about right. I think an awareness of content-types would be very
useful too. I'll have to be careful to not stray out of this library's scope
(which should really be quite tight, imo), but awareness of the issue should
be included at least.

> and then your CGI parameters can be map<string,string_with_charset>.
> Or something more complex to handle content-types as well. Is there a
> Boost wrapper for iconv yet? What about MIME handling? I don't think
> either has been done; maybe you'd like to do those too.

I don't think either of those has been 'boosted' yet. Having access to them
might make my life easier later on, but we'll see. :)

Thanks for the input,
Darren

