How do I get past a dynamic “load more” button?

Recently I got a question:

How do I get past the dynamic “load more” button using a Python web scraper?

We often encounter dynamic web content with dynamic web elements, as on the following URL.

Looking at the page source, one can see that the site is driven by the React JS library. A Python scraper does not run the site’s on-page JavaScript.
Moreover, when one presses “load more”, the site’s JS generates an HTTP POST request with

access-control-allow-origin: https://execthread.com

A picture is worth a thousand words: [image: access-control-allow-origin]
Is it possible to make a similar request to spoof the target server? No, it is not. See the Same Origin Policy here.

If the Same Origin Policy is not imposed, you might try to emulate the “load more” button with a regular scraping library. See an example.
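
As a rough illustration of that idea, here is a minimal sketch of the POST request a “load more” button might send, built with the standard library only. The endpoint `API_URL` and the payload fields `page` and `size` are hypothetical placeholders, not the real API of any site; the actual URL and body have to be read from the browser’s network tab.

```python
import json
import urllib.request

# Hypothetical endpoint and payload shape: inspect the browser's network tab
# to find the real ones for the site you are scraping.
API_URL = "https://example.com/api/listings"


def build_load_more_request(page, page_size=20):
    """Build the POST request that a "load more" button might send."""
    payload = json.dumps({"page": page, "size": page_size}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_page(page):
    """Send the request and decode the JSON body (performs network I/O)."""
    request = build_load_more_request(page)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_page(2) would then stand in for the second press of the button, provided the server accepts such a request.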

The solution is to drive a browser to perform all the actions.

Selenium comes to the rescue

Here is the code, with the Firefox browser driven by Selenium in Python:

from time import sleep

from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
url = "https://execthread.com/listings?q=all&sort=most%20relevant"
driver.get(url)

while True:
    # Scroll to the bottom so the "load more" button comes into view.
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        sleep(1.5)  # time in seconds; lets the page finish rendering
        btn = driver.find_element(By.XPATH, "//*[text()='load more jobs']")
        print("btn[load more]:", btn, "\n")
        ActionChains(driver).move_to_element(btn).click(btn).perform()
        print("btn is clicked")
    except Exception as e:
        print("Click Error:", e)
    click = input("want more to click? (y/n) ")
    if click != "y":
        break

# input("Press Enter to close")
driver.quit()  # close the browser and end the WebDriver session
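
If you would rather not confirm every click by hand, the loop can stop on its own once the button is gone. The following is a minimal sketch under a few assumptions: the helper name `click_until_gone` and its parameters `max_clicks` and `pause` are made up for illustration, and the button is assumed to disappear from the page when there is nothing more to load.

```python
from time import sleep

LOAD_MORE_XPATH = "//*[text()='load more jobs']"


def click_until_gone(driver, max_clicks=50, pause=1.5):
    """Click the "load more" button until it disappears or max_clicks is hit.

    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        # "xpath" is the string value of Selenium's By.XPATH constant.
        buttons = driver.find_elements("xpath", LOAD_MORE_XPATH)
        if not buttons:
            break  # no button left: assume everything is loaded
        btn = buttons[0]
        driver.execute_script("arguments[0].scrollIntoView();", btn)
        btn.click()
        clicks += 1
        sleep(pause)  # let the new items render before looking again
    return clicks
```

With the driver from the example above, click_until_gone(driver) would replace the whole while loop; max_clicks guards against a page that never runs out of items.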

There are alternative ways to scroll down to the button, provided it is already present in the driver’s scope:

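from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

btn = driver.find_element(By.XPATH, "//*[text()='load more jobs']")
# 1. Move the mouse pointer to the button, which scrolls it into view.
ActionChains(driver).move_to_element(btn).perform()
# 2. Ask the browser to scroll the button into view via JavaScript.
driver.execute_script("arguments[0].scrollIntoView();", btn)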

If the element is not yet present in the driver’s scope, an explicit wait with an Expected Condition helps:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
btn = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[text()='load more jobs']")))

If an element is ‘stale’ (it no longer responds), the reference held by the driver no longer matches an element in the current page; it seems the site’s JS may also restrict the number of times that element can be used. My suggestion would be to reload the page, run another search (with other input data), and try that element again. If the JS limits the number of clicks on the button, we can’t do much…
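
Before reloading the page, it may be worth simply locating the button again, since a fresh lookup returns a fresh reference. Below is a minimal sketch of that retry pattern; the helper `click_with_relocate` and its parameters are hypothetical names for illustration, and it assumes the stale reference comes from the page re-rendering rather than from a hard limit imposed by the site.

```python
from time import sleep


def click_with_relocate(locate, retry_on, attempts=3, pause=0.5):
    """Locate the element afresh and click it, retrying on `retry_on` errors.

    `locate` is a zero-argument callable returning the element, so each retry
    works with a new reference instead of reusing a stale one.
    Returns True on a successful click, False if every attempt failed.
    """
    for _ in range(attempts):
        try:
            locate().click()
            return True
        except retry_on:
            sleep(pause)  # give the page time to finish re-rendering
    return False
```

With Selenium, locate could be lambda: driver.find_element(By.XPATH, "//*[text()='load more jobs']") and retry_on could be StaleElementReferenceException from selenium.common.exceptions.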

One may also explore a post on scraping a dynamic website using regular [Python] requests to the site’s API.
