From: David Abrahams (david.abrahams_at_[hidden])
Date: 2002-02-19 09:35:52
Hi,
Beman and I are trying to prepare Boost for a move from YahooGroups to
Mailman, and I'm afraid I need some help from someone who knows a bit more
about www protocols than I do.
I want to collect an archive of past messages. A few months ago Beman wrote a
simple Python script that downloaded the yahoogroups web pages for the
messages using urllib. Unfortunately, yahoogroups has since added periodic
redirection to pages containing advertisements, so the old script no longer
works. The nature of the beast is that if you visit
http://groups.yahoo.com/group/boost/message/1000 in your web browser, you'll
often end up at
http://groups.yahoo.com/group/boost/auth?done=%2Fgroup%2Fboost%2Fmessage%2F1000
instead, a page containing an advertisement. The latter page contains a
link to "/group/boost/message/1000" which always takes you to the right
place. It looks to me as though that link has to be followed in the context
of the ad page in order to work properly, because I can't figure out how to
make urllib retrieve the right page. I'm sure I'm just missing something simple.
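To make the redirect concrete, here is a minimal sketch in present-day Python 3 (urllib.request rather than the urllib of the original script). It rests on assumptions the message does not confirm: that the "context of the ad page" amounts to cookies set by that page plus a Referer header, and that the ad page can be recognized by a path ending in "/auth" with a "done" query parameter, as in the URL quoted above. The names interstitial_target and fetch_message are hypothetical helpers, not part of Beman's script.

```python
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import parse_qs, urljoin, urlsplit

BASE = "http://groups.yahoo.com"  # host taken from the URLs quoted above


def interstitial_target(final_url):
    """Return the originally requested path if final_url looks like the
    ad ("auth") page, otherwise None.

    Assumption: the ad page is identified by a path ending in "/auth"
    and carries the real destination in its "done" query parameter.
    """
    parts = urlsplit(final_url)
    if not parts.path.endswith("/auth"):
        return None
    done = parse_qs(parts.query).get("done")  # parse_qs decodes %2F to "/"
    return done[0] if done else None


def fetch_message(number, opener=None):
    """Fetch one message page, retrying once if the ad page comes back."""
    if opener is None:
        # Assumption: cookies set by the ad page are what let the
        # follow-up request through, so keep them in a shared jar.
        opener = urllib.request.build_opener(
            urllib.request.HTTPCookieProcessor(CookieJar()))
    url = f"{BASE}/group/boost/message/{number}"
    with opener.open(url) as resp:
        final_url, body = resp.geturl(), resp.read()
    target = interstitial_target(final_url)
    if target is None:
        return body  # no redirect happened; this is the message page
    # Follow the ad page's link with the same cookies and a Referer header.
    retry = urllib.request.Request(urljoin(BASE, target),
                                   headers={"Referer": final_url})
    with opener.open(retry) as resp:
        return resp.read()
```

Only interstitial_target can be checked without network access; given the ad-page URL quoted above it yields the path "/group/boost/message/1000". Whether the retry in fetch_message actually gets past the advertisement depends on how the site tracks visitors, which is exactly the open question here.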
Any help appreciated,
Dave
--
+---------------------------------------------------------------+
  David Abrahams
      C++ Booster (http://www.boost.org)               O__  ==
      Pythonista (http://www.python.org)              c/ /'_ ==
  resume: http://users.rcn.com/abrahams/resume.html  (*) \(*) ==
          email: david.abrahams_at_[hidden]
+---------------------------------------------------------------+
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk