Tag: Node.js

Crawlee library for fast crawler composure

Post author By admin
Post date December 5, 2024
No Comments on Crawlee library for fast crawler composure

Crawlee is a free web scraping and browser
automation library well suited to building Node.js (and Python) crawlers.

Tags crawling, library, Node.js, Playwright, Puppeteer, Python

Development

Node.js, Puppeteer, Apify for Web Scraping (Xing scrape) – part 2

Post author By admin
Post date February 8, 2024
2 Comments on Node.js, Puppeteer, Apify for Web Scraping (Xing scrape) – part 2

In this post we share the practical implementation (code) of the Xing company scraping project using Node.js, Puppeteer and the Apify library. The first post, which describes the project objectives, algorithm and results, is available here.

You can look at the scrape algorithm here.

Tags business directory, crawling, headless, Node.js

Development

Using Modern Tools such as Node.js, Puppeteer, Apify for Web Scraping (Xing scrape)

Post author By admin
Post date January 23, 2024
No Comments on Using Modern Tools such as Node.js, Puppeteer, Apify for Web Scraping (Xing scrape)

I want to share a practical implementation that uses modern tools to scrape JS-rendered websites (pages whose content is loaded dynamically by JavaScript). You can read more about scraping JS-rendered content here.

Tags business directory, headless, Node.js

Development

Node.js to automate a browser XHR (Ajax)

Post author By admin
Post date September 23, 2023
No Comments on Node.js to automate a browser XHR (Ajax)

Lately I needed to scrape some data that is loaded dynamically by a “Load more” button. The website's JavaScript sends an XHR (Ajax request) to fetch the next portion of data. So the task was to re-run that XHR with some of its POST parameters as variables.

So, how do we do that in Node.js?
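
One possible approach, sketched here under assumptions, is to replay the request directly from Node.js rather than click the button. The endpoint URL, the `offset` and `limit` parameter names, and the JSON-array response shape are hypothetical placeholders, not taken from the site in question. The sketch relies on the global `fetch` that ships with Node.js 18+ and accepts a replacement fetch function so it can be tried without network access.

```javascript
// Hypothetical endpoint and POST parameter names; substitute the ones
// seen in the browser's Network tab for the real "Load more" XHR.
const ENDPOINT = 'https://example.com/api/load-more';

// Build the fetch() options for one portion of data.
function buildLoadMoreRequest(offset, limit) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'X-Requested-With': 'XMLHttpRequest', // some backends check this header
    },
    body: new URLSearchParams({ offset: String(offset), limit: String(limit) }).toString(),
  };
}

// Re-run the XHR with a growing offset until the server returns an empty portion.
// Assumes the response body is a JSON array of items.
async function fetchAllPortions(limit = 20, maxPages = 50, fetchFn = fetch) {
  const items = [];
  for (let page = 0; page < maxPages; page++) {
    const response = await fetchFn(ENDPOINT, buildLoadMoreRequest(page * limit, limit));
    if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
    const portion = await response.json();
    if (portion.length === 0) break;
    items.push(...portion);
  }
  return items;
}
```

The `maxPages` cap is a safety net so that a server which never returns an empty portion cannot keep the loop running forever.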

Tags automation, Node.js

Challenge Development

Node.js, Python & Ruby Bots Zoo repo

Post author By admin
Post date March 8, 2023
No Comments on Node.js, Python & Ruby Bots Zoo repo

Today I came across the Node.js [and Python] bots garden/zoo, a repo that provides modern bots running on different kinds of browsers (Firefox, Chrome, headless or not) with different automation frameworks (Puppeteer, Selenium, Playwright) in several programming languages.

Tags Node.js, Python, scrape detection

Development

Puppeteer async scraper with browsers number to be tuned based on CPU capacity

Post author By admin
Post date February 9, 2023
1 Comment on Puppeteer async scraper with browsers number to be tuned based on CPU capacity

Recently we got a tricky website with dynamic content to scrape. The data is loaded through XHRs into each part of the DOM (HTML markup). So the task was to develop an efficient scraper that works asynchronously while using a reasonable amount of CPU resources.
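
The idea of tuning the number of parallel browsers to the CPU can be sketched with the Node.js standard library alone. This is a minimal illustration, not the scraper from the post: the helper names `pickBrowserCount` and `runWithConcurrency` are hypothetical, and the rule of one browser per core minus one reserved core is an assumed starting point to be tuned by measurement.

```javascript
const os = require('node:os');

// Choose how many browsers to run in parallel: one per CPU core,
// keeping `reserve` cores free for the OS and the Node.js process itself.
function pickBrowserCount(cpuCount = os.cpus().length, reserve = 1) {
  return Math.max(1, cpuCount - reserve);
}

// Run async tasks with at most `limit` of them in flight at once.
// `tasks` is an array of functions returning promises; results keep input order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const index = next++; // no race: this runs synchronously between awaits
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

In a real scraper each task would open a Puppeteer browser or page and scrape one URL, for example `runWithConcurrency(urls.map((url) => () => scrapeOne(url)), pickBrowserCount())`, where `scrapeOne` is a hypothetical function you would write yourself.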

Tags automation, browser-automation, Javascript, Node.js

Development

MERN Stack – Build a Film Hall Application

Post author By admin
Post date December 14, 2022
No Comments on MERN Stack – Build a Film Hall Application

What is MERN?

The MERN stack is a set of frameworks and tools (MongoDB, Express, React and Node.js) used for developing software products. They are chosen specifically to work together in building a well-functioning application (see the MERN app code at the bottom of the post).

Tags Node.js, React.js

Development

Redirect Node.js console output into file

Post author By admin
Post date March 22, 2021
No Comments on Redirect Node.js console output into file

node.exe index.js > scrape.log 2>&1

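When index.js is run this way, everything it prints to the console is redirected into the file scrape.log: the console.log() output (stdout) and, because of 2>&1, the console.error() output (stderr) as well.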
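
When a shell redirect is not convenient, a similar effect can be approximated inside the script itself. The sketch below is an assumption-laden alternative, not part of the original tip: `redirectConsoleToFile` is a hypothetical helper that swaps `console.log` and `console.error` for a function that appends to a file synchronously.

```javascript
const fs = require('node:fs');
const util = require('node:util');

// Send console.log() and console.error() output to a file instead of the terminal.
// Returns a function that restores the original console methods.
function redirectConsoleToFile(logPath) {
  const original = { log: console.log, error: console.error };
  const write = (...args) => {
    // util.format applies the same %s/%d formatting that console.log uses
    fs.appendFileSync(logPath, util.format(...args) + '\n');
  };
  console.log = write;
  console.error = write;
  return function restore() {
    console.log = original.log;
    console.error = original.error;
  };
}
```

Synchronous appends keep the sketch simple and preserve line order, at the cost of blocking on every write. Unlike the shell redirect, this does not capture output written directly to process.stdout or by child processes.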

Tags Node.js

Development

Node.js Cheerio scraper, replace element

Post author By admin
Post date February 23, 2021
No Comments on Node.js Cheerio scraper, replace element

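// $ is assumed to be the result of cheerio.load(html).
// .has('br') returns a Cheerio object, which is always truthy, so test its length.
let table = $('table');
if (table.has('br').length) {
    // Replace only the <br> elements inside the table with a space.
    table.find('br').replaceWith(' ');
}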

Tags Cheerio, Node.js

Development

Puppeteer Stealth to prevent detection

Post author By admin
Post date February 5, 2021
No Comments on Puppeteer Stealth to prevent detection

In the previous post we shared how to disguise Selenium Chrome automation against fingerprint checks. In this post we use puppeteer-extra with the Stealth plugin to do the same. The test results are available as HTML files and screenshots.

Tags Node.js, Puppeteer, scrape detection