Categories
Development

Puppeteer Stealth to prevent detection

In the previous post we shared how to disguise Selenium Chrome automation against fingerprint checks. In this post we show how to do the same with puppeteer-extra and its Stealth plugin. The test results are available as HTML files and screenshots.

Categories
Development

Headless Chrome detection and anti-detection

In this post we summarize how to detect the headless Chrome browser and how to bypass the detection. Headless browser testing is an important part of today's web 2.0. If we look at the JS of some sites, we find them checking many fields of a browser. These fields are similar to those collected by fingerprintjs2.

So in this post we consider most of them and show both how to detect the headless browser by those attributes and how to bypass that detection by spoofing them.
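
As a rough illustration of attribute-based detection, here is a minimal Python sketch of a check that a site might run on values reported by its page script. The BrowserFingerprint structure, the headless_signals function and the choice of four attributes are assumptions made for this example, not the checks any particular site uses. The signals themselves are commonly cited ones: a user agent containing "HeadlessChrome", navigator.webdriver set to true, an empty plugin list and an empty language list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BrowserFingerprint:
    """Attributes a page script might report back (hypothetical structure)."""
    user_agent: str
    webdriver: bool = False          # value of navigator.webdriver
    plugin_count: int = 0            # navigator.plugins.length
    languages: List[str] = field(default_factory=list)  # navigator.languages


def headless_signals(fp: BrowserFingerprint) -> List[str]:
    """Return the names of the attributes that look like a headless browser."""
    signals = []
    if "HeadlessChrome" in fp.user_agent:
        signals.append("user_agent")
    if fp.webdriver:
        signals.append("webdriver")
    if fp.plugin_count == 0:
        signals.append("plugins")
    if not fp.languages:
        signals.append("languages")
    return signals


# An unmodified headless browser trips every check in this sketch.
headless = BrowserFingerprint(
    user_agent="Mozilla/5.0 HeadlessChrome/90.0", webdriver=True
)
print(headless_signals(headless))

# Spoofing the same attributes leaves this check with nothing to report.
spoofed = BrowserFingerprint(
    user_agent="Mozilla/5.0 Chrome/90.0",
    webdriver=False,
    plugin_count=3,
    languages=["en-US", "en"],
)
print(headless_signals(spoofed))
```

The second call shows why bypassing works: once each reported attribute is spoofed to a plausible value, a check of this kind has no signal left to act on.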

See the test results of disguising the browser automation for both Selenium and Puppeteer extra.

Categories
Development Guest posting

Captcha solving with Java and why you should avoid it

In this blog post we are going to show how you can solve [Re]captcha with Java and some third party APIs, and why you should probably avoid them in the first place.
For the Python code (+ captcha API) see that post.

The post author is Kevin Sahin from ScrapingNinja.co.

Captcha solving

“Completely Automated Public Turing test to tell Computers and Humans Apart” is what captcha stands for. Captchas are used to prevent bots from accessing and performing actions on websites or applications.

The last one is the most used captcha mechanism, Google ReCaptcha v2. That’s why we are going to see how to “break” these captchas.

Categories
Uncategorized

How to detect your site is being scraped?

In the age of the modern web there are a lot of data hunters, people who want to take the data that is on your website and re-use it. The reasons someone might want to scrape your site are incredibly varied, but regardless it is important for website owners to know if it is happening. You need to be able to identify any illegal bots and take necessary action to make sure they aren’t bringing down your site.

Categories
Miscellaneous Web Scraping Software

7 Ways to Protect Website from Scraping and How to Bypass this Protection

In this article I’d love to review a few well-known methods of protecting website content from automatic scraping. Each one has its advantages and disadvantages, so you need to make your choice based on the particular situation. None of these methods is ultimate, and each one has its own workarounds, which I will mention further.

Categories
Uncategorized

7 Ways to Protect Website from Scraping and How to Bypass this Protection (2)

In this article I’d love to review a few well-known methods of protecting website content from automatic scraping. Each one has its advantages and disadvantages, so you need to make your choice based on the particular situation. None of these methods is ultimate, and each one has its own workarounds, which I will mention further.

If you are interested in how to find out whether your site is being scraped, see this post: How to detect your site is being scraped?

Categories
Web Scraping Software

Scrape detection and how Visual Web Ripper can help deal with this problem

Recently we have encountered web scrape detection issues in some of our projects. Having consulted with the Sequentum developers, we present some points on this topic. Here are a few lines about web scraping detection and how Visual Web Ripper can help deal with this problem.

Categories
Uncategorized

How to alarm of your site being illegally scraped

Have you encountered the issue of your site being scraped and your online content being infringed? Yes, you’ve warned your content abuser with no response or you have received just some excuses. But, after Google indexing, your content does not stick out of the similar content heap of stolen material in search results? What can one do to set an alarm and enforce some consequences or even punishment? 

Categories
Uncategorized

Distil: Scrape Bot Protection Test

Testing the anti-scrape bot service has been my focus for some time now. How well can the Distil service protect a real website from scraping? The only answer comes from an actual active scrape. Here I will share the log results and conclusion of the test. In the previous post we briefly reviewed the service’s features, and now I will do the live test-drive analysis.

Categories
Review

Distil Review: Anti-Scrape-Bot Service

Are you thinking of protecting your website content from theft and illegal scraping? Do you suspect that some ‘innocent bots’ are continually visiting your web pages for data retrieval? Now we come to the anti-scraping bot software and services. In this post we want to briefly review the new anti-scrape bot service called Distil.