
Web crawler for Facebook in Python

I am trying to use a web crawler in Python to print the number of Facebook recommends for a page. For example, this article from Sky News (http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine) has about 60 Facebook recommends, and I want my Python crawler to print that number. This is my attempt, but it doesn't print anything:

    import requests
    from bs4 import BeautifulSoup

    def get_single_item_data(item_url):
        source_code = requests.get(item_url)
        plain_text = source_code.text
        # name the parser explicitly to avoid bs4's "no parser specified" warning
        soup = BeautifulSoup(plain_text, "html.parser")
        # look for every span that should hold the Facebook recommend count
        for item_name in soup.find_all('span', {'class': 'pluginCountTextDisconnected'}):
            try:
                print(item_name.string)
            except Exception:
                print("error")

    get_single_item_data("http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine")
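To see why the loop above prints nothing, it can help to check what the fetched HTML actually contains. The following is a minimal sketch that uses the standard library's html.parser (swapped in for BeautifulSoup only so it runs without third-party packages). The PluginScanner class, the scan helper and the two sample HTML strings are hypothetical, heavily simplified stand-ins for the real pages. It records every iframe src and the text of any span with class pluginCountTextDisconnected:

```python
from html.parser import HTMLParser


class PluginScanner(HTMLParser):
    """Collect iframe src attributes and the text of count spans (hypothetical helper)."""

    def __init__(self, span_class="pluginCountTextDisconnected"):
        super().__init__()
        self.span_class = span_class
        self.iframe_srcs = []  # src attribute of every <iframe> seen
        self.counts = []       # text inside each matching <span>
        self._in_span = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe" and attrs.get("src"):
            self.iframe_srcs.append(attrs["src"])
        elif tag == "span" and self.span_class in (attrs.get("class") or "").split():
            self._in_span = True
            self.counts.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_span = False

    def handle_data(self, data):
        # text may arrive in several chunks, so append rather than overwrite
        if self._in_span:
            self.counts[-1] += data


def scan(html):
    """Return (iframe_srcs, counts) found in one HTML document."""
    scanner = PluginScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.iframe_srcs, scanner.counts


# Hypothetical, simplified markup standing in for the article and the plugin page.
article_html = ('<p>Story text</p>'
                '<iframe src="//www.facebook.com/plugins/like.php?href=x"></iframe>')
plugin_html = '<div><span class="pluginCountTextDisconnected">12</span></div>'

print(scan(article_html))  # (['//www.facebook.com/plugins/like.php?href=x'], [])
print(scan(plugin_html))   # ([], ['12'])
```

If the article HTML yields an iframe src but no counts, the number lives in the iframe's own document, which has to be fetched separately.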

Answer1:

The Facebook recommend count is loaded in an iframe. You can follow the iframe's src attribute to that page and then read the text of span.pluginCountTextDisconnected:

    import requests
    from bs4 import BeautifulSoup

    url = 'http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine'
    r = requests.get(url)  # get the page through requests
    soup = BeautifulSoup(r.text, 'html.parser')  # create a BeautifulSoup object from the page's HTML
    url = soup('iframe')[0]['src']  # search for the iframe element and get its src attribute
    r = requests.get('http://' + url[2:])  # get the next page from requests with the iframe URL
    soup = BeautifulSoup(r.text, 'html.parser')  # create another BeautifulSoup object
    print(soup.find('span', class_='pluginCountTextDisconnected').string)  # get the desired information

The second requests.get call is written this way because the src attribute is a protocol-relative URL (it has no scheme): //www.facebook.com/plugins/like.php?href=http%3A%2F%2Fnews.sky.com%2Fstory%2F1330046&send=false&layout=button_count&width=120&show_faces=false&action=recommend&colorscheme=light&font=arial&height=21. I dropped the leading // and prepended http://.
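A slightly more general way to turn that protocol-relative src into an absolute URL is urllib.parse.urljoin, which borrows the scheme of the page the iframe appears on, so it also works on https pages. This is a minimal sketch; resolve_iframe_src and liked_url are hypothetical helper names, and the src value is the one quoted in the answer. The second helper decodes the href query parameter to show which article the plugin is counting for:

```python
from urllib.parse import parse_qs, urljoin, urlparse


def resolve_iframe_src(page_url, src):
    # urljoin resolves "//host/path" against the page URL, keeping the page's scheme
    return urljoin(page_url, src)


def liked_url(plugin_url):
    # the like plugin carries the article URL in its percent-encoded "href" parameter
    params = parse_qs(urlparse(plugin_url).query)
    return params.get("href", [None])[0]


page = "http://news.sky.com/story/1330046/are-putins-little-green-men-back-in-ukraine"
src = ("//www.facebook.com/plugins/like.php?href=http%3A%2F%2Fnews.sky.com%2Fstory%2F1330046"
       "&send=false&layout=button_count&width=120&show_faces=false&action=recommend"
       "&colorscheme=light&font=arial&height=21")

plugin_url = resolve_iframe_src(page, src)
print(plugin_url)
print(liked_url(plugin_url))  # http://news.sky.com/story/1330046
```

The resolved URL can be passed to requests.get in place of 'http://' + url[2:]; on an http page both produce the same string.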


See the BeautifulSoup documentation and the Requests documentation.

Answer2:

Facebook recommends are loaded dynamically by JavaScript, so they won't be available to your HTML parser. You will need to use the Graph API and FQL to get your answer directly from Facebook. (FQL has since been retired by Facebook, so today only the Graph API applies.)

Facebook's Graph API Explorer is a web console where you can explore queries once you have generated an access token.
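For the Graph API route, a request could be sketched as below. This is a hedged sketch, not a verified recipe: it assumes the URL node can be looked up with an id parameter, that it exposes an engagement field containing share_count, and that you already have a valid access token ("TOKEN" is a placeholder). Field names and API versions change over time, so confirm them in the Graph API Explorer first. engagement_request_url and share_count are hypothetical helpers, GRAPH_ROOT is an assumed endpoint, and no network call is made here:

```python
import json
from urllib.parse import urlencode

GRAPH_ROOT = "https://graph.facebook.com/"  # assumption: unversioned Graph API root


def engagement_request_url(article_url, access_token):
    # Hypothetical query: look up the URL node by "id" and ask for "engagement".
    query = urlencode({"id": article_url, "fields": "engagement",
                       "access_token": access_token})
    return GRAPH_ROOT + "?" + query


def share_count(response_text):
    # Assumed response shape: {"engagement": {"share_count": N, ...}, "id": "..."}
    data = json.loads(response_text)
    return data.get("engagement", {}).get("share_count")


print(engagement_request_url("http://news.sky.com/story/1330046", "TOKEN"))
# Sample response text, made up purely to exercise the parser.
print(share_count('{"engagement": {"share_count": 12}, "id": "x"}'))  # 12
```

To query Facebook for real, pass the built URL to requests.get and feed the response's .text to share_count().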
