Recently I got a question:
How do I get past the dynamic “load more” button using a Python web scraper?
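We often encounter pages whose content is loaded dynamically through dynamic web elements, such as the job listings page at https://execthread.com/listings?q=all&sort=most%20relevant.
Looking at the page source, one can see that the site is driven by the React JS library. A Python scraper only downloads the HTML; it does not make the website run its on-site JavaScript.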
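To make that concrete, here is a small sketch using only the standard library. The HTML string is a made-up stand-in for what a React-driven page typically returns to a plain HTTP client (it is not the actual markup of the site above): an empty mount point plus a script tag. Parsing it finds no visible text at all, which is why a scraper that does not run JavaScript sees no listings and no button.

```python
from html.parser import HTMLParser

# Hypothetical page source as a non-JS client would receive it:
# an empty mount point and a script bundle, no rendered listings.
STATIC_HTML = """
<html>
  <body>
    <div id="root"></div>
    <script src="/static/js/main.js"></script>
  </body>
</html>
"""


class TextCollector(HTMLParser):
    """Collect visible text, ignoring the contents of <script> tags."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_script:
            self.chunks.append(text)


def visible_text(html):
    """Return the list of non-empty text chunks found outside scripts."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(visible_text(STATIC_HTML))
```

The same parser does find the button text once the markup has actually been rendered, which is what a real browser does for us.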
Moreover, when one presses “load more”, the site’s JS issues an HTTP POST request, and the server’s response carries the header
access-control-allow-origin: https://execthread.com
One picture is worth a thousand words:
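Is it possible to make a similar request ourselves and spoof the target server? From a page served by another origin, no: the browser’s Same Origin Policy forbids it. A standalone script is not bound by that policy, but it has to reproduce the request’s headers, cookies, and payload by hand.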
The solution is to drive a browser to perform all the actions.
Selenium comes to the rescue
Here is the code that drives the Firefox browser with Selenium in Python:
from time import sleep

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
url = "https://execthread.com/listings?q=all&sort=most%20relevant"
driver.get(url)

while True:
    # Scroll to the bottom of the page so the button comes into view.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        sleep(1.5)  # time in seconds
        btn = driver.find_element(By.XPATH, "//*[text()='load more jobs']")
        print('btn[load more]:', btn, '\n')
        ActionChains(driver).move_to_element(btn).click(btn).perform()
        print('btn is clicked')
    except Exception as e:
        print('Click Error:', e)
    click = input('want more to click? (y/n)')
    if click != 'y':
        break

# input('Press Enter to close')
driver.close()
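The loop above asks the user before every click. If you would rather let it run unattended, the control flow can be sketched as below. The names `click_until_gone` and `selenium_demo` are hypothetical helpers, not part of Selenium. The button lookup and the click are passed in as functions so the loop itself can be tried without a browser, and the cap on clicks is an assumption added to guard against endless pages.

```python
from time import sleep


def click_until_gone(find_button, click, max_clicks=50, pause=1.5):
    """Click the "load more" button until it disappears or the cap is hit.

    find_button() returns the button, or None when it is not on the page.
    click(button) performs the click. Returns the number of clicks made.
    """
    clicks = 0
    while clicks < max_clicks:
        btn = find_button()
        if btn is None:
            break  # no button left: everything is loaded
        click(btn)
        clicks += 1
        sleep(pause)  # give the site's JS time to append new items
    return clicks


def selenium_demo(url):
    """Wire the loop to Selenium (assumes Selenium 4 and geckodriver)."""
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.implicitly_wait(5)  # let find_element wait for late-rendered nodes

    def find_button():
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        try:
            return driver.find_element(By.XPATH, "//*[text()='load more jobs']")
        except NoSuchElementException:
            return None

    def click(btn):
        ActionChains(driver).move_to_element(btn).click(btn).perform()

    try:
        driver.get(url)
        return click_until_gone(find_button, click)
    finally:
        driver.quit()
```

Calling `selenium_demo(url)` with the listings URL would then click until the button is gone or 50 clicks have been made, whichever comes first.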
Here are two alternative ways to scroll down to the button, provided btn has already been located by the driver:
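from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

btn = driver.find_element(By.XPATH, "//*[text()='load more jobs']")

# 1. Move the mouse pointer to the button.
ActionChains(driver).move_to_element(btn).perform()

# 2. Scroll the button into view with JavaScript.
driver.execute_script("arguments[0].scrollIntoView();", btn)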
If the element is not yet present on the page when the driver looks for it, use an explicit wait with an Expected Condition:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Wait up to 10 seconds for the button to appear in the DOM.
btn = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//*[text()='load more jobs']"))
)
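For intuition, an explicit wait is essentially a polling loop: call a condition repeatedly until it returns something truthy or the timeout expires. The sketch below mimics that idea in plain Python. `wait_until` is a hypothetical helper for illustration, not Selenium's implementation, and it raises the built-in `TimeoutError` where Selenium raises its own `TimeoutException`.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Usage with a stand-in for "is the button in the DOM yet?":
# the fake lookup only succeeds on the third attempt.
attempts = []


def fake_lookup():
    attempts.append(1)
    return "load more jobs" if len(attempts) >= 3 else None


print(wait_until(fake_lookup, timeout=2.0, poll=0.01))
```

The timeout is what lets a scraper tell "not rendered yet" apart from "not there at all": if the wait expires, the button most likely no longer exists on the page.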
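If an element is ‘stale’, the reference you hold no longer points to a node attached to the page, usually because the site’s JS has re-rendered it; locating the element again before each click often helps. If the element still does not operate, it seems the site’s JS restricts the number of times that element can be used. My suggestion would be to reload the page, make another search (with other input data), and try that element again. If JS limits the number of clicks on the button, we can’t do much…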
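One thing worth trying before reloading the page: look the button up again right before each click, and retry if the reference has gone stale. The sketch below shows that retry pattern. `click_with_retry` is a hypothetical helper, and the exception type is passed in so the logic can be exercised without a browser; with Selenium you would pass `selenium.common.exceptions.StaleElementReferenceException`.

```python
def click_with_retry(locate, click, stale_error, attempts=3):
    """Locate the element afresh and click it, retrying if it went stale.

    locate() returns a fresh element reference; click(element) clicks it.
    stale_error is the exception class that signals a stale reference.
    Returns True on success, False if every attempt hit a stale element.
    """
    for _ in range(attempts):
        element = locate()  # never reuse a reference from a previous attempt
        try:
            click(element)
            return True
        except stale_error:
            continue  # the page re-rendered; look the element up again
    return False
```

With a live driver, `locate` could be a lambda wrapping `driver.find_element(By.XPATH, "//*[text()='load more jobs']")` and `click` a lambda calling `el.click()`. A `False` result after several attempts suggests the problem is not just re-rendering.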
See also a post on scraping a dynamic website using regular Python requests to the site’s API.
3 replies on “How do I get past dynamic “load more” btn?”
Wow amazing post about selenium very impressive
I’m pretty sure the chrome-headless Python library is now the standard and Selenium is not the best way to do this anymore.
James,
I just googled and found this project: https://github.com/miyakogi/pyppeteer. Where can I find the official Chrome-headless Python library? Thanks!