
Python 3.5 urllib.request 403 Forbidden Error

Question:

import urllib.request
import urllib
from bs4 import BeautifulSoup

url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")
print(soup.title)

<b>I was trying to fetch the site above, and the code keeps failing with a 403 Forbidden error.</b>

Any ideas?

<blockquote>

C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
  File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in
    page = urllib.request.urlopen(url)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open
    response = meth(req, response)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error
    return self._call_chain(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
    result = func(*args)
  File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

</blockquote>

Answer1:

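import requests
from bs4 import BeautifulSoup

url = "https://www.brightscope.com/ratings"
# Send a browser-like User-Agent instead of the library default.
headers = {'User-Agent': 'Mozilla/5.0'}
# The headers dict must actually be passed to the request.
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")
print(soup.title)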

out:

<title>BrightScope Ratings</title>

First, use requests rather than urllib.

Then, pass headers with the request. If you don't, the site will block you, because the default User-Agent identifies the client as a script or crawler rather than a browser, which the site does not like.
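If you would rather stay with urllib, the same idea should work there too. Below is a minimal sketch, not tested against this particular site: the helper names (default_user_agent, build_request, fetch_html) are made up for illustration, and 'Mozilla/5.0' is just the value used above. It first shows what urllib sends by default, then builds a request with a browser-like User-Agent.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that urllib sends when none is specified."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs, e.g. ('User-agent', 'Python-urllib/3.x')
    return dict(opener.addheaders)["User-agent"]


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Fetch a page with the custom User-Agent and return its decoded text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

You would then pass the result of fetch_html("https://www.brightscope.com/ratings") to BeautifulSoup(html, "html.parser") exactly as in the question. Whether the site accepts a given User-Agent is up to the site, so treat the header value as an assumption.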
