Tag: Node.js

Node.js to automate a browser XHR (Ajax)

Post author By admin
Post date 23.09.2023

Lately I needed to scrape some data that is loaded dynamically by a “Load more” button. The website's JavaScript issues an XHR (Ajax request) to fetch the next portion of data. So, the need was to re-run those XHRs with some of the POST parameters as variables.

So, how do we do that in Node.js?
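One possible approach is to replay the request directly from Node.js instead of clicking the button in a browser. The sketch below assumes Node.js 18+ (for the global fetch), a form-encoded POST body, and a hypothetical endpoint and parameter names (ENDPOINT, page, per_page); the real ones have to be copied from the browser's Network tab.

```javascript
// Sketch: replay a "Load more" XHR as a plain POST request from Node.js.
// ENDPOINT and the parameter names are hypothetical placeholders.
const ENDPOINT = 'https://example.com/ajax/load-more';

// Build the fetch() arguments for one portion of data.
function buildLoadMoreRequest(page, perPage = 20) {
  const body = new URLSearchParams({ page: String(page), per_page: String(perPage) });
  return {
    url: ENDPOINT,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        // Some sites check this header to recognise Ajax calls.
        'X-Requested-With': 'XMLHttpRequest',
      },
      body: body.toString(),
    },
  };
}

// Re-run the XHR for pages 1..lastPage.
// fetchFn is injectable so the loop can be tried without a network.
async function loadAllPortions(lastPage, fetchFn = fetch) {
  const portions = [];
  for (let page = 1; page <= lastPage; page++) {
    const { url, options } = buildLoadMoreRequest(page);
    const response = await fetchFn(url, options);
    if (!response.ok) {
      throw new Error(`Request for page ${page} failed: ${response.status}`);
    }
    portions.push(await response.text());
  }
  return portions;
}
```

Calling loadAllPortions(5) would request five portions one after another; the requests are kept sequential on purpose to stay gentle on the target site.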

Tags automation, Node.js

Challenge Development

Node.js, Python & Ruby Bots Zoo repo

Post author By admin
Post date 08.03.2023

Today, I got in touch with the Node.js [and Python] bots garden/zoo providing modern bots with different kinds of browsers (Firefox, Chrome, Headless/not headless) using different automation frameworks (Puppeteer, Selenium, Playwright) in several programming languages.

Tags Node.js, Python, scrape detection

Development

Puppeteer async scraper with browsers number to be tuned based on CPU capacity

Post author By admin
Post date 09.02.2023

Recently we got a tricky website with dynamic content to scrape. The data is loaded through XHRs into each part of the DOM (HTML markup). So, the task was to develop an efficient scraper that works asynchronously while using a reasonable amount of CPU resources.
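The idea of tuning the number of parallel browsers to the CPU can be sketched without Puppeteer itself. In the sketch below, pickBrowserCount, scrapeAll and the limits (one core left free, at most 8 browsers) are assumptions for illustration, and scrapeFn stands for any async function that scrapes one URL, for example by driving a Puppeteer page.

```javascript
const os = require('os');

// Pick how many browsers to run in parallel: one per CPU core, minus one core
// left for the system, capped by an upper limit (both numbers are assumptions).
function pickBrowserCount(cpuCount = os.cpus().length, maxBrowsers = 8) {
  return Math.max(1, Math.min(cpuCount - 1, maxBrowsers));
}

// Run scrapeFn over all urls with at most `concurrency` tasks in flight.
// Results are returned in the same order as the input urls.
async function scrapeAll(urls, scrapeFn, concurrency = pickBrowserCount()) {
  const results = new Array(urls.length);
  let next = 0;

  async function worker() {
    while (next < urls.length) {
      const index = next++; // no race: this line runs synchronously
      results[index] = await scrapeFn(urls[index]);
    }
  }

  const workerCount = Math.min(concurrency, urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Each worker takes the next URL as soon as it finishes the previous one, so a slow page does not hold up the others, and the CPU load stays bounded by the chosen browser count.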

Tags automation, Javascript, Node.js

Development

MERN Stack – Build a Film Hall Application

Post author By admin
Post date 14.12.2022

What is MERN?

The MERN stack (MongoDB, Express, React, Node.js) is a set of frameworks and tools used for developing a software product. They are specifically chosen to work together to build well-functioning software (see the MERN app code at the bottom of the post).

Tags Node.js, React.js

Development

Redirect Node.js console output into file

Post author By admin
Post date 22.03.2021

node.exe index.js > scrape.log 2>&1

When executing index.js this way, we redirect all the console.log() output from the console into the file scrape.log. The 2>&1 part sends the error stream (e.g. console.error() output) to the same file.
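When shell redirection is not convenient, a similar result can be approximated inside the script. This is a minimal sketch, not a replacement for a logging library: createFileLogger is a hypothetical helper, and it uses the blocking fs.appendFileSync, which is usually acceptable for a small scraper.

```javascript
const fs = require('fs');
const util = require('util');

// Returns a small console-like logger that appends every message to logPath.
// util.format gives the same %s / %d formatting that console.log supports.
function createFileLogger(logPath) {
  const write = (level, args) =>
    fs.appendFileSync(logPath, `[${level}] ${util.format(...args)}\n`);
  return {
    log: (...args) => write('log', args),
    error: (...args) => write('error', args),
  };
}
```

Usage would be along the lines of `const logger = createFileLogger('scrape.log'); logger.log('scraped %d items', 5);`.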

Tags Node.js

Development

Node.js Cheerio scraper, replace element

Post author By admin
Post date 23.02.2021

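// Replace every <br> inside tables with a space ($ is the loaded Cheerio document)
const table = $('table');
if (table.has('br').length) { // .has() returns a selection, which is always truthy
    table.find('br').replaceWith(' ');
}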

Tags Cheerio, Node.js

Development

Puppeteer Stealth to prevent detection

Post author By admin
Post date 05.02.2021

In the previous post we shared how to disguise Selenium Chrome automation against fingerprint checks. In this post we share how to do the same using puppeteer-extra with the Stealth plugin. The test results are available as HTML files and screenshots.

Tags Node.js, Puppeteer, scrape detection

Development

How to check if a target page loads data thru XHR (Ajax)

Post author By admin
Post date 22.01.2021

When performing web scraping, I first need to evaluate a site's difficulty level. That is, how difficult is it for the scraping procedures? Do its pages make extra XHR (Ajax) calls? Based on that, I choose whether to use (1) a request scraper (e.g. Cheerio) or (2) a browser automation scraper (e.g. Puppeteer).

So, I've discovered the Apify Web Page Analyzer, a free scraper agent that analyses a target site and returns comprehensive JSON data about the target web page. The presence of XHR (Ajax) calls helps me decide what type of crawler to use for scraping that website.
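A quick manual check can complement such an analyzer. This is not how the Apify tool works; it is a simple heuristic sketch: if text that is visible in the browser is missing from the raw HTML of the page, the data is probably injected later (for example through XHR). The function name chooseScraperType and its return shape are assumptions for illustration.

```javascript
// Compare the raw HTML (as fetched without a browser) with texts seen in the browser.
// Returns { type, missing }: type is 'request' when every expected text is already
// in the raw HTML, otherwise 'browser'; missing lists the texts that were not found.
function chooseScraperType(rawHtml, expectedTexts) {
  const missing = expectedTexts.filter((text) => !rawHtml.includes(text));
  return { type: missing.length === 0 ? 'request' : 'browser', missing };
}
```

The raw HTML would come from a plain HTTP request, and the expected texts would be a few values copied from the rendered page, such as a product name or a price.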

Tags Node.js, scraper

Development

Cheerio scraper escapes special symbols with html entities when performing .html()

Post author By admin
Post date 01.12.2020

As developers who scrape data off the web, we use Node.js along with the handy Cheerio scraper. When calling .html(), the Cheerio parser returns special symbols as HTML-encoded entities, e.g.:
ä as an entity such as &#xE4;
ß as an entity such as &#xDF;

The Cheerio developer's justification of the parser's behavior

(1) It’s not the job of a parser to preserve the original document.
(2) .html() returns an HTML representation of the parsed document, which doesn’t have to be equal to the original document.
source.
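If the original characters are needed, the entities can be decoded after calling .html(). The sketch below is a hand-written helper (decodeNumericEntities is a hypothetical name, not part of Cheerio) and it assumes the output contains numeric entities such as &#xE4; or &#228;. Depending on the Cheerio version, loading the document with the decodeEntities: false option may also avoid the encoding; that is worth verifying against the version in use.

```javascript
// Decode numeric character references such as &#xE4; (hex) or &#228; (decimal).
// Named entities (&amp;, &lt;, ...) are deliberately left untouched.
function decodeNumericEntities(html) {
  return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, code) => {
    const isHex = code[0] === 'x' || code[0] === 'X';
    const codePoint = parseInt(isHex ? code.slice(1) : code, isHex ? 16 : 10);
    // Leave references outside the Unicode range as they were.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}
```

Leaving &amp; and &lt; encoded keeps the result valid HTML, while the umlauts and ß come back as plain characters.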

Tags Cheerio, Javascript, Node.js

Development

Node.js, mariaDB, save data & bulk save

Post author By admin
Post date 19.11.2020

Install the mariadb package:

npm i mariadb

The code

const config = require("./config");
const db = config.database;
const mariadb = require('mariadb');

// One shared connection pool for the whole module
const pool = mariadb.createPool({
    host: db.host,
    user: db.user,
    password: db.password,
    database: db.database,
    connectionLimit: 5
});

// Insert a single value into test.string1
async function asyncSaveDataDB(data) {
    let conn;
    try {
        conn = await pool.getConnection();
        const rows = await conn.query("SELECT 1 as val");
        console.log(rows); // [ {val: 1}, meta: ... ]
        const res = await conn.query("INSERT INTO test (string1) VALUES (?)", [data]);
        console.log(res); // { affectedRows: 1, insertId: 1, warningStatus: 0 }
    } finally {
        // No `return` here: returning from finally would swallow a thrown error
        if (conn) await conn.end();
    }
}

// Insert many values with a single batch call
async function asyncSaveDataBulkDB(arr) {
    let conn;
    try {
        conn = await pool.getConnection();
        // One parameter array per row; awaited so the connection is not closed mid-batch
        const res = await conn.batch("INSERT INTO `test` (string1) VALUES (?)", arr.map(value => [value]));
        console.log(res); // e.g. affectedRows: 2 for two values
    } finally {
        if (conn) await conn.end();
    }
}

module.exports = { asyncSaveDataDB, asyncSaveDataBulkDB };

// Run a quick bulk insert when this file is executed directly
if (require.main === module) {
    asyncSaveDataBulkDB(['tt6', 'test 8'])
        .catch(console.error)
        .finally(() => pool.end());
}

config.js might look like the following:

module.exports = {
    database: {
        host: "185.221.154.249",
        user: "xxxxxxxxx",
        password: "xxxxxxxxx",
        database: "xxxxxxxxx"
    }
};
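For a large scraped data set it may be safer to split the values into chunks before the bulk save, so that a single batch does not grow without bound. The helper below is a sketch under assumptions: toBatchChunks is a hypothetical name and the default chunk size of 500 is arbitrary.

```javascript
// Split values into chunks of at most chunkSize rows.
// Each value is wrapped in its own array, i.e. one parameter array per row.
function toBatchChunks(values, chunkSize = 500) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks = [];
  for (let i = 0; i < values.length; i += chunkSize) {
    chunks.push(values.slice(i, i + chunkSize).map((value) => [value]));
  }
  return chunks;
}
```

Each chunk is an array of one-element arrays, the array-of-arrays form that conn.batch() accepts for positional placeholders, so the chunks can be passed to it one after another.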

Docs on MariaDB with Node.js

Tags MariaDb, Node.js
