How do I get past a dynamic “load more” button?

Recently I got a question:

How do I get past the dynamic “load more” button using a Python web scraper?

We often encounter dynamic web content with dynamic web elements, as at the following URL.

Looking at the page source, one can see that the site is driven by the React JS library. A Python scraper does not make the website run its on-site JavaScript.
Moreover, when one presses “load more”, the site’s JS issues an HTTP POST request, and the response carries the header

access-control-allow-origin: https://execthread.com

A picture is worth a thousand words: access-control-allow-origin
Is it possible to make a similar request to spoof the target server? No, it’s not. See the Same Origin Policy here.

If the Same Origin Policy is not imposed, you might try to emulate the “load more” button with a regular scraping library. See an example.
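As a rough sketch of what emulating the button could look like, the snippet below builds the kind of POST request the site's JS might send. The endpoint URL, the `page` and `per_page` fields, and both function names are assumptions made up for illustration; the real values have to be read from the browser's network tab. It uses only the standard library (`urllib`) rather than a third-party scraping library.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL seen in the browser's network tab.
API_URL = "https://example.com/api/listings"


def build_load_more_request(page, per_page=20):
    """Build (but do not send) a JSON POST request for one page of results."""
    # Hypothetical payload fields; the real site may expect different ones.
    payload = json.dumps({"page": page, "per_page": per_page}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_page(page):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_load_more_request(page), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_page(2)` would then stand in for the second click on “load more”, provided the server accepts requests that do not come from its own pages.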

The solution is to drive a browser to perform all the actions.

Selenium comes to the rescue

See the code below, with the Firefox browser driven by Selenium in Python:

from time import sleep

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
url = "https://execthread.com/listings?q=all&sort=most%20relevant"
driver.get(url)

while True:
    # Scroll to the bottom of the page so the button comes into view.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        sleep(1.5)  # time in seconds, gives the page a moment to load
        btn = driver.find_element(By.XPATH, "//*[text()='load more jobs']")
        print('btn[load more]:', btn, '\n')
        ActionChains(driver).move_to_element(btn).click(btn).perform()
        print('btn is clicked')
    except Exception as e:
        print('Click Error:', e)
    click = input('want more to click? (y/n)')
    if click != 'y':
        break

# input('Press Enter to close')
driver.quit()  # close the browser and end the WebDriver session
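The loop above asks the user before every click. A hands-off variant could keep clicking until the button disappears or a click limit is reached. The sketch below is one way to do that; `load_all`, `locate_load_more` and the `max_clicks` cap are names and choices of mine, not part of Selenium, and it uses a plain `btn.click()` instead of `ActionChains`.

```python
import time


def load_all(driver, locate_button, max_clicks=20, pause=1.5, sleep=time.sleep):
    """Click the button returned by locate_button(driver) until it is gone.

    Stops after max_clicks clicks at the latest and returns the click count.
    """
    clicks = 0
    while clicks < max_clicks:
        # Scroll to the bottom so the button comes into view.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give the page time to load the next batch
        btn = locate_button(driver)
        if btn is None:
            break  # no button left: everything is loaded
        btn.click()
        clicks += 1
    return clicks


def locate_load_more(driver):
    """Return the 'load more jobs' element, or None if it is not on the page."""
    # Imported here so load_all itself does not depend on Selenium.
    from selenium.webdriver.common.by import By

    # find_elements returns an empty list instead of raising when nothing matches.
    found = driver.find_elements(By.XPATH, "//*[text()='load more jobs']")
    return found[0] if found else None
```

With a live driver this would be called as `load_all(driver, locate_load_more)`; the cap guards against a button that never goes away.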

Alternative ways to scroll down to the button, provided btn is defined in the driver’s scope:

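from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

btn = driver.find_element(By.XPATH, "//*[text()='load more jobs']")
# 1. Move the mouse pointer to the element.
ActionChains(driver).move_to_element(btn).perform()
# 2. Scroll the element into view with JavaScript.
driver.execute_script("arguments[0].scrollIntoView();", btn)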

If the element is not yet present in the driver’s scope, use an Expected Condition to wait for it:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
btn = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[text()='load more jobs']")))
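Under the hood, an explicit wait like the one above boils down to polling: call a condition again and again until it returns something truthy or the timeout expires. Here is a minimal plain-Python sketch of that idea. `wait_until` is a hypothetical helper, not Selenium's implementation; the real `WebDriverWait` raises Selenium's own `TimeoutException` and by default also ignores `NoSuchElementException` while polling.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)  # wait a little before checking again
```

With a live driver, a condition such as `lambda: driver.find_elements(By.XPATH, "//*[text()='load more jobs']")` would work here, since `find_elements` returns an empty (falsy) list while nothing matches.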

If an element turns ‘stale’ (no longer operable), it seems the site’s JS restricts the number of times that element can be used. My suggestion would be to reload the page, make another search (with other input data) and try that element again. If the JS limits the number of clicks on the button, we can’t do much…
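Before reloading the whole page, it may be worth trying something cheaper: look the element up again and retry the click, since a stale reference often just means the page re-rendered the button. The sketch below shows that retry pattern; `click_with_retry` and its parameters are hypothetical names of mine, and whether it helps on this particular site is an assumption.

```python
def click_with_retry(locate, attempts=3, retry_on=(Exception,)):
    """Locate the element afresh and click it, retrying on the given errors.

    `locate` is a zero-argument callable returning a clickable element.
    Returns the 1-based number of the attempt that succeeded.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # Re-locating on every attempt yields a fresh element reference.
            locate().click()
            return attempt
        except retry_on as exc:
            last_error = exc
    raise last_error  # every attempt failed: surface the last error
```

With Selenium one would pass `retry_on=(StaleElementReferenceException,)`, imported from `selenium.common.exceptions`, and a `locate` such as `lambda: driver.find_element(By.XPATH, "//*[text()='load more jobs']")`.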

One may also explore a post on scraping a dynamic website using regular [Python] requests to the site’s API.

3 replies on “How do I get past a dynamic “load more” button?”

Wow amazing post about selenium very impressive

I’m pretty sure the chrome-headless Python library is now the standard, and Selenium is no longer the best way to do this.

James,

I just googled and found this project: https://github.com/miyakogi/pyppeteer. Where can I find the official Chrome-headless Python library? Thanks!
