Tag: headless

Headless Chrome detection and anti-detection

Post author By admin
Post date January 29, 2021

In this post we summarize how to detect the headless Chrome browser and how to bypass that detection. Testing for headless browsers has become an important part of today's web 2.0. If we look at the JavaScript of some sites, we find that it checks many fields of the browser, similar to those collected by fingerprintjs2.

So in this post we consider most of them and show both how to detect the headless browser by those attributes and how to bypass that detection by spoofing them.
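As a rough illustration of attribute-based detection, the sketch below inspects a navigator-like object for a few commonly cited headless signals. The function name `detectHeadlessSignals`, the signal labels, and the sample values are assumptions made for this example; they are not taken from any particular site's script or from fingerprintjs2.

```javascript
// Sketch: collect common headless-Chrome signals from a navigator-like object.
// `detectHeadlessSignals` and the signal labels are illustrative, not a standard API.
function detectHeadlessSignals(nav) {
  const signals = [];

  // Automation drivers typically set navigator.webdriver to true.
  if (nav.webdriver === true) signals.push('webdriver');

  // Headless Chrome has advertised itself in the user agent string.
  if (/HeadlessChrome/.test(nav.userAgent || '')) signals.push('userAgent');

  // An empty plugin list is a frequently checked attribute.
  if (!nav.plugins || nav.plugins.length === 0) signals.push('plugins');

  // An empty languages list is another frequently checked attribute.
  if (!nav.languages || nav.languages.length === 0) signals.push('languages');

  return signals;
}

// Made-up objects standing in for window.navigator.
const headlessLike = {
  webdriver: true,
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/88.0.4324.96 Safari/537.36',
  plugins: [],
  languages: [],
};

const regularLike = {
  webdriver: false,
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36',
  plugins: [{ name: 'Chrome PDF Plugin' }],
  languages: ['en-US', 'en'],
};

console.log(detectHeadlessSignals(headlessLike));
console.log(detectHeadlessSignals(regularLike));
```

In a browser, the same function could be called with `window.navigator`. No single signal is conclusive, so a real check would likely combine several of them.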

See the test results of disguising browser automation for both Selenium and Puppeteer extra.
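Disguising automation usually comes down to overriding the attributes that detection scripts read. The sketch below shows the general idea with `Object.defineProperty` on a stand-in object, so that it runs in plain Node.js. The names `fakeNavigator` and `spoofProperty` are hypothetical. In a real browser the same override would target `navigator` and would have to run before the page's own scripts, for example through Puppeteer's `page.evaluateOnNewDocument`.

```javascript
// Sketch: override getter-backed properties the way a spoofing script might.
// `fakeNavigator` stands in for window.navigator; its getters live on the prototype.
const fakeNavigator = Object.create({
  get webdriver() { return true; },
  get languages() { return []; },
});

// Hypothetical helper: define an own property that shadows the prototype getter.
function spoofProperty(target, name, value) {
  Object.defineProperty(target, name, {
    get: () => value,
    configurable: true,
    enumerable: true,
  });
}

spoofProperty(fakeNavigator, 'webdriver', false);
spoofProperty(fakeNavigator, 'languages', ['en-US', 'en']);

console.log(fakeNavigator.webdriver); // false
console.log(fakeNavigator.languages); // [ 'en-US', 'en' ]
```

The original getters are still reachable on the prototype, which is one reason this kind of override can itself be noticed by a more thorough detection script.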

Tags anti-scrape, headless, Javascript, scrape detection, scrape protection

Review

DataFlowKit review

Recently we encountered a new service that helps users scrape the modern web 2.0. It's a simple, convenient, easy-to-learn service: https://dataflowkit.com
Let’s first highlight some of its outstanding features:

Visual online scraper tool: point, click and extract.
JavaScript rendering: scrape any interactive site with headless Chrome running in the cloud
Open-source back-end
Scrape a website behind a login form
Web page interactions: Input, Click, Wait, Scroll, etc.
Proxy support, incl. Geo-target proxying
Scraper API
Follows the directives of robots.txt
Export results to Google Drive, Dropbox, MS OneDrive.

Tags headless, service, web scraping

Development

Node.js, Puppeteer, Apify for Web Scraping (Xing scrape) – part 2

Post author By admin
Post date October 8, 2019

In this post we share the practical implementation (code) of the Xing companies scrape project using Node.js, Puppeteer and the Apify library. The first post, describing the project objectives, algorithm and results, is available here.

You can look at the scrape algorithm here.

Tags business directory, crawling, headless, Node.js

Development

Using Modern Tools such as Node.js, Puppeteer, Apify for Web Scraping (Xing scrape)

Post author By admin
Post date August 23, 2019

I want to share with you the practical use of modern scraping tools for JS-rendered websites (pages loaded dynamically by JavaScript). You can read more about scraping JS-rendered content here.

Tags business directory, headless, Node.js

Development

Headless browser python scraper at pythonanywhere

Post author By admin
Post date February 13, 2017

Recently I decided to use pythonanywhere.com for running Python scripts against JS-heavy websites.

Originally I tried to use the dryscrape library, but I could not get it to work, and the helpful support team explained why: “…unfortunately dryscrape depends on WebKit, and WebKit doesn’t work with our virtualisation system.”

Tags headless, Python