Captcha solving with Java and why you should avoid it

In this blog post we are going to show how you can solve [Re]captcha with Java and some third party APIs, and why you should probably avoid them in the first place.
For the Python code (+ captcha API) see that post.

The post author is Kevin Sahin from ScrapingNinja.co.

Captcha solving

“Completely Automated Public Turing test to tell Computers and Humans Apart” is what captcha stands for. Captchas are used to prevent bots from accessing and performing actions on websites or applications.

The last one is the most used captcha mechanism, Google ReCaptcha v2. That’s why we are going to see how to “break” these captchas.

How to detect your site is being scraped?

scrape_detectIn the age of the modern web there are a lot of data hunters people who want to take the data that is on your website and re-use it. The reasons someone might want to scrape your site are incredibly varied, but regardless it is important for website owners to know if it is happening. You need to be able to identify any illegal bots and take necessary action to make sure they aren’t bringing down your site.

7 Ways to Protect Website from Scraping and How to Bypass this Protection

stop-scrape In this article I’d love to revise few well-known methods of protecting website content from automatic scraping. Each one has its advantages and disadvantages, so you need to make your choice basing on the particular situation. None of these methods is ultimate and each one has its own ways around I will mention further.

7 Ways to Protect Website from Scraping and How to Bypass this Protection (2)

stop-scrapeIn this article I’d love to revise few well-known methods of protecting website content from automatic scraping. Each one has its advantages and disadvantages, so you need to make your choice basing on the particular situation. None of these methods is ultimate and each one has its own ways around I will mention further.

If you are interesting of how to find out if your site is being scraped, then turn to this post: How to detect your site is being scraped?

Scrape detection and how Visual Web Ripper can help deal with this problem

Recently we have encountered the web scrape detection issues in some of our projects. So as we’ve consulted with the Sequentum developers we present to you some points on this topic. Here are a few lines about web scraping detection and how Visual Web Ripper can help deal with this problem.

How to alarm of your site being illegally scraped

Have you encountered the issue of your site being scraped and your online content being infringed? Yes, you’ve warned your content abuser with no response or you have received just some excuses. But, after Google indexing, your content does not stick out of the similar content heap of stolen material in search results? What can one do to set an alarm and enforce some consequences or even punishment? 

Distil: Scrape Bot Protection Test

The anti scrape bot service test has been my focus for some time now. How well can the Distil service protect the real website from scrape? The only answer comes from an actual active scrape. Here I will share the log results and conclusion of the test. In the previous post we briefly reviewed the service’s features, and now I will do the live test-drive analysis.

Distil Review: Anti-Scrape-Bot Service

Are you thinking of protecting your website content from theft and nonlegal scraping? Are you suspecting that some ‘innocent bots’ are continually visiting your web pages for data retrieval? Now we come to the anti scraping bot software and services. In this post we want to briefly review the new anti scrape bot service called Distil