In the previous post we shared how to disguise Selenium Chrome automation against Fingerprint checks. In this post we share the Puppeteer-extra with Stealth plugin to do the same. The test results are available as html files and screenshots.
In the post we share the practical implementation (code) of the Xing companies scrape project using Node.js, Puppeteer and the Apify library. The first post, describing the project objectives, algorithm and results, is available here. The scrape algorithm you can look at here.
How to handle cookie, user-agent, headers when scraping with JAVA? We’ll use for this a static class ScrapeHelper that easily handles all of this. The class uses Jsoup library methods to fetch from data from server and parse html into DOM document.
In a previous post we’ve considered the ways to disguise an automated Chrome browser by spoofing some of its parameters – Headless Chrome detection and anti-detection. Here we’ll share the practical results of Fingerprints testing against a benchmark for both human-operated and automated Chrome browsers.
In the post we summarize how to detect the headless Chrome browser and how to bypass the detection. The headless browser testing should be a very important part of todays web 2.0. If we look at some of the site’s JS, we find them to checking on many fields of a browser. They are similar […]
When performing web scaping I first need to evaluate a site’s difficulty level. That is how difficult is it for the scrape procedures? Do its pages make extra XHR (Ajax) calls? Based on that I choose whether to use (1) Request scraper (eg. Cheerio) or (2) Browser automation scraper (eg. Puppeteer). So, I’ve discovered an […]
When we use Selenium or Node.js + Puppeteer to run [headless] Chrome/Chromium we might need to add some extra functionality/conditions to launch browsers with. Below you’ll find all kinds of Conditions and their explanations. How to use command line switches? The Chromium Team has made a page on which they briefly explain how to use these switches.