Python to Save Web Pages


This is probably a very simple task, but I cannot find any help. I have a website whose URLs take the form www.xyz.com/somestuff/ID, and I have a list of the IDs I need information from. I was hoping to have a simple script that goes to the site and downloads the (complete) web page for each ID, saving it in a specific folder under a simple name of the form ID_whatever_the_default_save_name_is.

Can I run a simple Python script to do this for me? I could do it by hand, since it is only 75 different pages, but I was hoping to use this to learn how to do things like this in the future.


<a href="https://pypi.python.org/pypi/mechanize/" rel="nofollow">Mechanize</a> is a great package for crawling the web with Python. A simple example for your issue would be:

    import mechanize

    br = mechanize.Browser()
    # open() needs a full URL, including the scheme
    response = br.open("http://www.xyz.com/somestuff/ID")
    # read() returns the page body
    print(response.read())

This simply fetches your URL and prints the response from the server.


This can be done simply in Python using the urllib module. Here is a simple example in Python 3:

    import urllib.request

    # urlopen needs a full URL, including the scheme
    url = 'http://www.xyz.com/somestuff/ID'
    req = urllib.request.Request(url)
    page = urllib.request.urlopen(req)
    # read() returns the page body as bytes
    src = page.read()
    print(src)

For more information on the urllib module, see <a href="http://docs.python.org/3.3/library/urllib.html" rel="nofollow">http://docs.python.org/3.3/library/urllib.html</a>
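
To cover the list of IDs from the question, the same urllib call can be put in a loop. This is only a minimal sketch under some assumptions: the function name `save_pages`, the `.html` extension, and the `ID.html` file naming are my own choices and not anything the site or the browser defines.

```python
import os
import urllib.request


def save_pages(ids, base_url, out_dir):
    """Download base_url + ID for every ID and save it as out_dir/ID.html.

    save_pages and the ID.html naming are hypothetical choices for this sketch.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the target folder if needed
    saved = []
    for page_id in ids:
        url = base_url + str(page_id)
        # urlopen returns a file-like response; read() gives the raw bytes
        with urllib.request.urlopen(url) as response:
            data = response.read()
        path = os.path.join(out_dir, "{}.html".format(page_id))
        with open(path, "wb") as f:  # binary mode, since data is bytes
            f.write(data)
        saved.append(path)
    return saved
```

You would call it with something like `save_pages(my_ids, "http://www.xyz.com/somestuff/", "pages")`, where `my_ids` is your own list. Note that this saves only the HTML the server returns; images, stylesheets, and scripts referenced by the page are not downloaded.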


Do you want just the HTML code for the website? If so, just create a URL variable with the host site and append the page number as you go. Here is an example using <a href="http://www.notalwaysright.com" rel="nofollow">http://www.notalwaysright.com</a>

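    import os
    import urllib.request

    url = "http://www.notalwaysright.com/page/"
    os.makedirs("Page", exist_ok=True)  # the output folder must exist
    for x in range(1, 71):
        newurl = url + str(x)  # x is an int, so convert it before concatenating
        response = urllib.request.urlopen(newurl)
        # read() returns bytes, so open the file in binary mode
        with open("Page/" + str(x), "wb") as p:
            p.write(response.read())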
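
One thing to watch with a loop like this: a single page that fails to load raises an exception and stops the whole run. Below is a rough sketch of one way to skip bad pages and pause between requests. The name `fetch_all`, the one-second default delay, and the 30-second timeout are assumptions of mine, not requirements of urllib.

```python
import time
import urllib.request


def fetch_all(urls, delay=1.0):
    """Return (pages, failed): pages maps URL -> bytes, failed lists bad URLs.

    fetch_all and its default delay are hypothetical choices for this sketch.
    """
    pages, failed = {}, []
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                pages[url] = response.read()
        except (OSError, ValueError) as err:
            # URLError and HTTPError are OSError subclasses;
            # ValueError is raised for a URL with no scheme
            print("skipping {}: {}".format(url, err))
            failed.append(url)
        time.sleep(delay)  # short pause between requests
    return pages, failed
```

The `failed` list can be passed back in for a second attempt once the run finishes.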


