Recently I got a question:
How do I get past the dynamic “load more” button using a Python web scraper?
We often encounter dynamic web content with dynamic web elements, as at the following URL.
Moreover, when one presses “load more”, the HTTP POST request is generated by the site’s JS with
A picture is worth a thousand words:
Is it possible to make a similar request to spoof the target server? No, it’s not. See the Same Origin Policy here.
The solution is to drive a browser to perform all the actions.
Selenium comes to the rescue
Here is the code, with the Firefox browser driven by Selenium from Python:
    from time import sleep

    from selenium import webdriver
    from selenium.webdriver import ActionChains
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    url = "https://execthread.com/listings?q=all&sort=most%20relevant"
    driver.get(url)

    while True:
        # Scroll to the bottom so the button is rendered and in view.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        try:
            sleep(1.5)  # time in seconds; give the page a moment to load
            btn = driver.find_element(By.XPATH, "//*[text()='load more jobs']")
            print('btn[load more]:', btn, '\n')
            ActionChains(driver).move_to_element(btn).click(btn).perform()
            print('btn is clicked')
        except Exception as e:
            print('Click Error:', e)
        click = input('want more to click? (y/n)')
        if click != 'y':
            break

    # b = input('Press any button to close')
    driver.close()
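The loop above asks for confirmation after every click. A hands-off variant could look roughly like the sketch below. The helper `click_until_gone` and its `max_clicks` and `pause` parameters are hypothetical names introduced here for illustration, and the sketch assumes the button simply disappears from the page once there is nothing more to load, which may not hold for every site.

```python
from time import sleep

LOAD_MORE_XPATH = "//*[text()='load more jobs']"


def click_until_gone(driver, xpath=LOAD_MORE_XPATH, max_clicks=20, pause=1.5):
    """Click the element matching `xpath` until it disappears or max_clicks is hit.

    Hypothetical helper; returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # Scroll to the bottom so the button is rendered and in view.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)
        # "xpath" is the value of Selenium's By.XPATH constant. find_elements
        # returns an empty list (instead of raising) when nothing matches.
        buttons = driver.find_elements("xpath", xpath)
        if not buttons:
            break
        buttons[0].click()
        clicks += 1
    return clicks


def main():
    # Imported here so the helper above can be tried without a browser.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("https://execthread.com/listings?q=all&sort=most%20relevant")
        print("clicked", click_until_gone(driver), "times")
    finally:
        driver.quit()
```

The `max_clicks` cap is there so that a button which never goes away cannot keep the scraper looping forever.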
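Alternative ways to scroll down to the button, provided btn has already been located by the driver:

    from selenium.webdriver.common.by import By

    btn = driver.find_element(By.XPATH, "//*[text()='load more jobs']")

    # 1. Move the mouse pointer to the element.
    from selenium.webdriver import ActionChains
    ActionChains(driver).move_to_element(btn).perform()

    # 2. Let the browser scroll the element into view; btn is passed to the script as arguments[0].
    driver.execute_script("arguments[0].scrollIntoView();", btn)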
If the element is not yet present on the page, wait for it with an Expected Condition:
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By

    # Wait up to 10 seconds for the button to appear in the DOM.
    btn = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//*[text()='load more jobs']"))
    )
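Presence in the DOM does not guarantee that the button is visible or enabled yet. Since `WebDriverWait.until` accepts any callable that takes the driver, a custom condition can check for that as well. The function `load_more_ready` below is a hypothetical name used only for this sketch; Selenium also ships `EC.element_to_be_clickable`, which covers a similar check for a single locator.

```python
def load_more_ready(xpath):
    """Build a wait condition: the first visible, enabled element matching xpath.

    Hypothetical helper. The returned callable yields the element when one is
    ready and False otherwise, which is what WebDriverWait.until polls for.
    """
    def condition(driver):
        # "xpath" is the value of Selenium's By.XPATH constant.
        for element in driver.find_elements("xpath", xpath):
            if element.is_displayed() and element.is_enabled():
                return element
        return False

    return condition
```

It can be handed to the same kind of wait: `btn = WebDriverWait(driver, 10).until(load_more_ready("//*[text()='load more jobs']"))`.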
If an element is ‘stale’ (no longer operable), it seems the site’s JS restricts how many times that element can be used. My suggestion would be to reload the page, run another search (with other input data), and try that element again. If the JS limits the number of clicks on the button, we can’t do much.
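Before reloading, one thing that may be worth trying: Selenium raises `StaleElementReferenceException` when a previously found element is no longer attached to the DOM, which also happens when the page simply re-renders the button. Assuming that is the cause here (the page in question may behave differently), looking the element up again right before each click can help. The helper `click_fresh` and its parameters are hypothetical names for this sketch.

```python
def click_fresh(locate, stale_errors, attempts=3):
    """Look the element up right before every click and retry if it went stale.

    Hypothetical helper.
    locate       -- zero-argument callable returning a freshly located element
    stale_errors -- exception class (or tuple of classes) signalling staleness
    Returns the number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        try:
            locate().click()
            return attempt
        except stale_errors:
            # Give up only after the last attempt; otherwise locate again.
            if attempt == attempts:
                raise
```

With Selenium, `stale_errors` would be `selenium.common.exceptions.StaleElementReferenceException` and `locate` something like `lambda: driver.find_element(By.XPATH, "//*[text()='load more jobs']")`. If every attempt still fails, the click limit described above is the more likely explanation.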