
How to find all elements on a webpage through scrolling using Selenium WebDriver and Python

Question:

I can't seem to get all the elements on a webpage, no matter what I try with Selenium. I'm sure I'm missing something. Here's my code. The URL has at least 30 elements, yet whenever I scrape it only 6 elements are returned. What am I missing?

    import requests
    import webbrowser
    import time
    from bs4 import BeautifulSoup as bs
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import NoSuchElementException

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
    url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
    res = requests.get(url, headers = headers)
    page_soup = bs(res.text, "html.parser")

    containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})
    print(len(containers))

    #for each container find shoe model
    shoe_colors = []
    for container in containers:
        if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
            shoe_model = container.div.div.img["title"]
            review = container.find('div', {'class':'gl-product-card__reviews-number'})
            review = int(review.text)

    driver = webdriver.Chrome()
    driver.get(url)
    time.sleep(5)
    shoe_prices = driver.find_elements_by_css_selector('.gl-price')
    for price in shoe_prices:
        print(price.text)
    print(len(shoe_prices))

Answer1:

The results differ when I run your <em>code trial</em>:

<ul><li>You found <strong>30</strong> items with <strong>requests</strong> and <strong>6</strong> items with <strong>Selenium</strong></li> <li>Whereas I found <strong>40</strong> items with <strong>requests</strong> and <strong>4</strong> items with <strong>Selenium</strong></li> </ul>

The items on this website are dynamically generated through <a href="https://en.wikipedia.org/wiki/Lazy_loading" rel="nofollow">Lazy Loading</a>, so you have to scroll down and wait for the new elements to render within the <a href="https://www.w3schools.com/js/js_htmldom.asp" rel="nofollow">HTML DOM</a>. You can use the following solution:

<ul><li>

Code Block:

    import requests
    import webbrowser
    from bs4 import BeautifulSoup as bs
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    from selenium.common.exceptions import NoSuchElementException, TimeoutException

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
    url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
    res = requests.get(url, headers = headers)
    page_soup = bs(res.text, "html.parser")
    containers = page_soup.findAll("div", {"class": "gl-product-card-container show-variation-carousel"})
    print(len(containers))

    shoe_colors = []
    for container in containers:
        if container.find("div", {'class': 'gl-product-card__reviews-number'}) is not None:
            shoe_model = container.div.div.img["title"]
            review = container.find('div', {'class':'gl-product-card__reviews-number'})
            review = int(review.text)

    options = Options()
    options.add_argument('start-maximized')
    options.add_argument('disable-infobars')
    options.add_argument('--disable-extensions')
    # Selenium 3 style arguments; Selenium 4 uses options= and a Service object instead.
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get(url)

    # Wait for the first batch of prices so that `titles` is always defined.
    titles = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "span.gl-price")))
    myLength = len(titles)

    # Scroll in 400 px steps until no new prices appear within 20 seconds.
    while True:
        driver.execute_script("window.scrollBy(0,400)", "")
        try:
            WebDriverWait(driver, 20).until(lambda driver: len(driver.find_elements(By.CSS_SELECTOR, "span.gl-price")) > myLength)
            titles = driver.find_elements(By.CSS_SELECTOR, "span.gl-price")
            myLength = len(titles)
        except TimeoutException:
            break

    print(myLength)
    for title in titles:
        print(title.text)
    driver.quit()

</li> <li>

Console Output:

47 $100 $100 $100 $100 $100 $100 $180 $180 $180 $180 $130 $180 $180 $130 $180 $130 $200 $180 $180 $130 $60 $100 $30 $65 $120 $100 $85 $180 $150 $130 $100 $100 $80 $100 $120 $180 $200 $130 $130 $100 $120 $120 $100 $180 $90 $140 $100 </li> </ul>
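The scroll-and-wait idea above does not depend on this particular site, so it can be pulled out into a small helper. The following is only a sketch under assumptions: the name `scroll_until_stable` and its parameters are hypothetical, and the browser is abstracted behind two callables so the loop can be tried without Selenium installed.

```python
import time
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    scroll_step: Callable[[], None],
    timeout: float = 20.0,
    poll: float = 0.5,
) -> int:
    """Scroll until no new items show up within `timeout` seconds.

    count_items: returns how many target elements are currently in the DOM.
    scroll_step: scrolls the page down by one step.
    Returns the final item count.
    """
    seen = count_items()
    while True:
        scroll_step()
        deadline = time.monotonic() + timeout
        grew = False
        # Poll the page until the count increases or the timeout expires.
        while time.monotonic() < deadline:
            current = count_items()
            if current > seen:
                seen = current
                grew = True
                break
            time.sleep(poll)
        if not grew:
            # Nothing new was rendered after the last scroll, so stop.
            return seen


# Hypothetical wiring, assuming an already created Selenium `driver`:
#   from selenium.webdriver.common.by import By
#   total = scroll_until_stable(
#       count_items=lambda: len(driver.find_elements(By.CSS_SELECTOR, "span.gl-price")),
#       scroll_step=lambda: driver.execute_script("window.scrollBy(0, 400)"),
#   )
```

As in the answer's loop, the final timeout is always paid once at the end, because that is how the helper detects that the page has stopped growing.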

Answer2:

You have to scroll down the page slowly. The site only requests price data via AJAX when a product comes into view.

    import time
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument('--start-maximized')
    driver = webdriver.Chrome(options=options)
    url = 'https://www.adidas.com/us/men-shoes-new_arrivals'
    driver.get(url)

    # Products are laid out four per row, so scroll once per row.
    scroll_times = len(driver.find_elements(By.CLASS_NAME, 'col-s-6')) / 4
    scrolled = 0
    scroll_size = 400

    while scrolled < scroll_times:
        driver.execute_script('window.scrollTo(0, arguments[0]);', scroll_size)
        scrolled += 1
        scroll_size += 400
        time.sleep(1)  # give the AJAX price request time to finish

    shoe_prices = driver.find_elements(By.CLASS_NAME, 'gl-price')
    for price in shoe_prices:
        print(price.text)
    print(len(shoe_prices))
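The arithmetic behind this loop can be checked on its own. Because the product count is divided by 4 without rounding, a partially filled last row still gets a scroll step, which is the same as rounding up. The sketch below assumes the answer's values (4 products per row, 400 px per step); the function name `scroll_offsets` is hypothetical.

```python
import math
from typing import List


def scroll_offsets(product_count: int, columns: int = 4, step: int = 400) -> List[int]:
    """Return the absolute scroll positions in pixels, one per product row."""
    if columns <= 0 or step <= 0:
        raise ValueError("columns and step must be positive")
    # A partially filled last row still needs its own scroll step.
    rows = math.ceil(product_count / columns)
    return [step * (i + 1) for i in range(rows)]


print(scroll_offsets(10))  # 10 products in rows of 4 -> 3 rows
```

Each value could then be passed to `window.scrollTo(0, arguments[0])` with a short pause in between, as in the loop above.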
