Categories
Development

Node.js to automate a browser XHR (Ajax)

Lately I needed to scrape some data that are dynamically loaded by “Load more” button. A website JavaScript invokes XHR (or Ajax request) to fetch a next data portion. So, the need was to re-run those XHR with some POST parameters as variables.

So, how to make it in Node.js?

1. Replay XHR and fetch it from Dev. tools

We copy the XHR from Developer tools (F12) as a Node.js fetch() function call:

The result will be something like the following:

fetch("https://www.xxx.com/graphql", {
  "headers": {
    "accept": "application/json, text/plain, */*",
    "accept-language": "en-US,en;q=0.9,ru;q=0.8",
    "content-type": "application/json",
    "sec-ch-ua": "\" Not;A Brand\";v=\"99\", \"Google Chrome\";v=\"91\", \"Chromium\";v=\"91\"",
    "sec-ch-ua-mobile": "?0",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-origin", 
    "x-original-url": "https://www.xxx.com/search/page/2?q=yyy", 
    "cookie": "wlw_client_id=rBEAAmC2BOKcHADzAxVeAg==;..."
  },
  "referrer": "https://www.xxx.com/search/page/2?q=B%C3%BCrom%C3%B6bel&supplierTypes=Hersteller",
  "referrerPolicy": "origin-when-cross-origin",
  "body": "{\"query\":\"query( $ufsSessionId: String, $searchMethod: String, $q: String, $conceptSlug: String, $citySlug: String, $filters: [FilterInput!], $page: Int, $sort: String, $tracking: Boolean, $disableCityExtraction: Boolean, $cityExtractionRadius: String     }    }\",\"variables\":{\"ufsSessionId\":\"9dcdece22e1f0a694730ea638e4996bb\",\"searchMethod\":\"direct\",\"locale\":\"de\",\"q\":\"Büromöbel\",\"filters\":[{\"name\":\"Lieferantentyp\",\"selected\":[\"Hersteller\"]}],\"page\":2,\"sort\":\"best\",\"tracking\":true,\"disableCityExtraction\":false,\"cityExtractionRadius\":\"50km\"}}",
  "method": "POST",
  "mode": "cors"
});

2. The full code

Wrap up that fetch() call with an asynchronous function. We also change the parameter value using the wrapper function input variable, named i. The data is written into a file using a file stream.

async function  getdata(i){ 
	fetch("https://www.xxx.com/graphql", {
	  "headers": {
		"accept": "application/json, text/plain, */*"
	  }, 
	  "variables ":{ "page":"+ i + "}",
	  "method": "POST",
	  "mode": "cors"
	}).then(response => response.json())
	  .then((value) => {
			let companies = value.data.search.companies;
			//console.log( companies );			
			let names=[]; // we fetch names only
			console.log( '\nPage: ', i);
			for(c of companies){
				console.log( c['name']);
				names.push( c['name']);
			} 
			names.forEach( function (item, index) {
				stream.write(item + "\n");
			});  
	  }); 
}

The whole code you might load below.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.