Scrape a JS lazy-load page with Python requests

JS lazy-loading pages are usually scraped with Selenium or another browser emulator. Yet for a certain shopping website we've
found a way to perform the scrape with pure Python requests.

The website performs simple GET XHR (Ajax) requests to load more data onto the page as the user scrolls down. The data come back as JSON, which greatly simplifies parsing:
[Screenshot: the lazy-load XHR request and its JSON response in the browser's dev tools]
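
Before writing the full loop, it is worth confirming that the endpoint really returns paginated JSON. A minimal check against the same category endpoint (assuming the response shape, with its 'pagination', 'totalPages' and 'products' fields, has not changed) might look like this:

import requests

# Hedged sketch: peek at one page of the JSON feed to see what the scraper
# below will work with.
url = 'https://www.ajio.com/api/category/830216001?fields=SITE&currentPage=1&pageSize=45&format=json&query=%3Arelevance%3Abrickpattern%3AWashed&sortBy=relevance&gridColumns=3&facets=brickpattern%3AWashed&advfilter=true'
res = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
data = res.json()
print(data['pagination']['totalPages'])  # 111 pages in this category
print(len(data['products']))             # 45 products per page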

Here is the full code:

import requests

def get_source(page_num=1):
    # Each call fetches one page of the category listing as JSON (45 items per page).
    url = 'https://www.ajio.com/api/category/830216001?fields=SITE&currentPage={}&pageSize=45&format=json&query=%3Arelevance%3Abrickpattern%3AWashed&sortBy=relevance&gridColumns=3&facets=brickpattern%3AWashed&advfilter=true'
    res = requests.get(url.format(page_num), headers={'User-Agent': 'Mozilla/5.0'})
    if res.status_code == 200:
        return res.json()  # caller gets None if the request failed

# data = get_source(page_num=1)
# total_pages = data['pagination']['totalPages']  # 111 pages in total

prodpage = []
for i in range(1, 112):
    print(f'Getting page {i}')
    data = get_source(page_num=i)['products']
    for item in data:
        prodpage.append('https://www.ajio.com{}'.format(item['url']))
    if i == 3:
        break  # demo only: stop after 3 pages; remove to crawl all 111

print(len(prodpage))  # 135 product URLs for 3 pages (45 per page)
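
From here you might want to persist the collected URLs before scraping the individual product pages. A small, hedged follow-up (the filename is just an illustrative choice):

# Illustrative only: dump the collected product URLs to a text file so the
# listing pass and the product-page pass can run separately.
with open('ajio_product_urls.txt', 'w') as f:
    f.write('\n'.join(prodpage))

If you do fetch all 111 pages, adding a short time.sleep() between requests keeps the scrape polite.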

The code was provided by Ahmed Soliman.


You might want to read a post on how to scrape a dynamic website through regular [Python Scrapy] requests to its API.
