Categories
Development

Puppeteer Stealth to prevent detection

In the previous post we shared how to disguise Selenium Chrome automation against Fingerprint checks. In this post we share the Puppeteer-extra with Stealth plugin to do the same. The test results are available as html files and screenshots.

Categories
Development

How to check if a target page loads data thru XHR (Ajax)

When performing web scaping I first need to evaluate a site’s difficulty level. That is how difficult is it for the scrape procedures? Do its pages make extra XHR (Ajax) calls? Based on that I choose whether to use (1) Request scraper (eg. Cheerio) or (2) Browser automation scraper (eg. Puppeteer). So, I’ve discovered an […]

Categories
Uncategorized

Chromium Command Line switches

When we use Selenium or Node.js + Puppeteer to run [headless] Chrome/Chromium we might need to add some extra functionality/conditions to launch browsers with. Below you’ll find all kinds of Conditions and their explanations. How to use command line switches? The Chromium Team has made a page on which they briefly explain how to use these switches.

Categories
Development Guest posting Web Scraping Software

Octoparse Alternatives

Let me tell you what you already know! Octoparse is a great web scraping tool! But like every great tool, it’s got its limitations. At times, you may wonder if there are any alternatives to Octoparse. We wondered the same and put together this blog to provide you a short list of Octoparse alternatives along […]

Categories
Development

Selenium Web Scraping in simple words

Question: What is Selenium web scraping? Answer: A picture is better than 1000 words: So, you make a program with Python, PHP, JAVA, Ruby and whatever language you use in order to browse(), select(), click(), submit(), save(), etc.,  target web pages.

Categories
Review

DataFlowKit review

Recently we encountered a new service that helps users to scrape the modern web 2.0. It’s a simple, comfortable, easy to learn service – https://dataflowkit.com Let’s first highlight some of its outstanding features: Visual online scraper tool: point, click and extract. Javascript rendering; any interactive site scrape by headless Chrome run in the cloud Open-source […]

Categories
Development

Scraping JavaScript protected content

Here we come to one new milestone: the JavaScript-driven or JS-rendered websites scrape. Recently a friend of mine got stumped as he was trying to get content of a website using PHP simplehtmldom library. He was failing to do it and finally found out the site was being saturated with JavaScript code. The anti-scrape JavaScript […]

Categories
Challenge

Is there any way to skip CAPTCHA?

JavaScript powered CAPTCHA Most of the answers to the question in internet forums are given by services that automatically solve captchas. They provide services to solve CAPTCHA rather than to fully skip it.

Categories
Development Guest posting

Captcha solving with Java and why you should avoid it

In this blog post we are going to show how you can solve [Re]captcha with Java and some third party APIs, and why you should probably avoid them in the first place. For the Python code (+ captcha API) see that post. The post author is Kevin Sahin from ScrapingNinja.co. Captcha solving “Completely Automated Public Turing test to tell Computers and […]

Categories
Development

Is there any way to skip CAPTCHA?

  Is there a way to skip CAPTCHA?