From: Carl Daniel (cpdaniel_at_[hidden])
Date: 2002-02-26 00:16:19
Cool script :) Too bad it won't work on Windows :(
It reads the raw response from the HTTP socket into a string and extracts the <xhtml><tt> tag (same as I did), so
the spaces are being removed at the source, AFAICT. (Although I don't know the Perl HTML package at all; perhaps your
idea would work. I'd try it, but it won't run on any system I have.)
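
For reference, the preprocessing idea quoted below could be sketched roughly like this. It is only an illustration in Python rather than Perl: the function name protect_leading_spaces is made up for this example, and it assumes that only the leading run of spaces on a line needs protecting (replacing every space would also hit the spaces inside tags). It can only help if the leading spaces are still present in the raw response.

```python
import re

NBSP = "&nbsp;"  # HTML entity for a non-breaking space


def protect_leading_spaces(page: str) -> str:
    """Replace each leading space on a line with &nbsp;.

    Hypothetical helper: lines that do not start with a space are
    left untouched, and spaces after the first non-space character
    are not changed.
    """
    def to_entities(match: re.Match) -> str:
        # One entity per original space keeps the indentation width.
        return NBSP * len(match.group(0))

    # re.MULTILINE makes ^ match at the start of every line.
    return re.sub(r"^ +", to_entities, page, flags=re.MULTILINE)
```

The returned string would then be handed to the HTML parser (HTML::TokeParser in the Perl script) in place of the raw page.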
----- Original Message -----
From: "David Abrahams" <david.abrahams_at_[hidden]>
Sent: Monday, February 25, 2002 8:19 PM
Subject: Re: [boost] http savvy?
> OK, got any Perl savvy?
> We already have a way to download all the messages thanks to Carl Daniel,
> but unfortunately the whitespace following <BR> is always stripped by the
> time it gets to his site, which kills the readability of source code. At my
> site I get the whitespace, but I can't get his download software. I tried
> the script, but it kills whitespace, too. A little poking at it (I have ZERO
> Perl expertise) tells me that it's nontrivial. My guess is that the best way
> is to grab the whole page as a string and use a regexp to preprocess it by
> replacing space characters with &nbsp; in any line that starts with a space,
> then pass it through HTML::TokeParser.
> ----- Original Message -----
> From: "Douglas Gregor" <gregod_at_[hidden]>
> To: <boost_at_[hidden]>
> Sent: Monday, February 25, 2002 10:06 PM
> Subject: Re: [boost] http savvy?
> > On Tuesday 19 February 2002 06:57 pm, you wrote:
> > > It's being handled. I'll have them all in mbox format by tomorrow
> > >
> > > Here's a reference on the mbox format:
> > >
> > > -cd
> > It's probably all finished by now, but I just ran across this:
> > http://www.lpthe.jussieu.fr/~zeitlin/yahoo2mbox.html
> > Doug
> > Info: http://www.boost.org Send unsubscribe requests to:
> > Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/
Boost list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk