Categories
Challenge

Is there any way to skip CAPTCHA?

JavaScript powered CAPTCHA

Most answers to this question on internet forums come from services that solve CAPTCHAs automatically. They offer to solve the CAPTCHA rather than skip it entirely.

Categories
Monetize

Octoparse: how to extract GPS coordinates from Google Maps

Have you ever thought you could make money by knowing how many restaurants there are in a square mile? There is no free lunch. However, if you know how to use Google Maps, you can extract restaurants’ GPS coordinates and store them in your own database. With that information in hand and some math, you are on your way to creating a big data online service.
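As a rough illustration of the "math" mentioned above, here is a minimal Python sketch that counts how many stored coordinates fall within a circle of roughly one square mile around a point. The function names and the sample coordinates are made up for the example; it uses the standard haversine great-circle formula and is not taken from the Octoparse post itself.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def count_within(points, center, radius_miles):
    """Count the points lying within radius_miles of center."""
    return sum(
        1 for lat, lon in points
        if haversine_miles(center[0], center[1], lat, lon) <= radius_miles
    )

# A circle with an area of one square mile has radius sqrt(1 / pi).
ONE_SQ_MILE_RADIUS = math.sqrt(1 / math.pi)

# Hypothetical restaurant coordinates (not real data).
restaurants = [(40.0, -75.0), (40.005, -75.0), (41.0, -75.0)]
nearby = count_within(restaurants, (40.0, -75.0), ONE_SQ_MILE_RADIUS)
```

Dividing such a count by the area of the circle gives a simple density figure that can be compared across neighbourhoods.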

Categories
Guest posting Miscellaneous

Data extraction: web crawling vs. web scraping in E-commerce

Nowadays, when we have a question, it comes almost naturally to type it into a search bar and get helpful answers. But we rarely wonder how all that information becomes available and how it appears as soon as we start typing. Search engines provide easy access to information, but web crawling and scraping tools, which are less well-known players, have a crucial role in gathering online content.

Categories
Uncategorized

Crawler vs Scraper vs Parser

In this post we explain the differences between a crawler, a scraper and a parser.

Categories
Development

Simple JAVA email crawler

In this post we share the code of a simple Java email crawler. It collects email addresses from a given website, with unlimited crawling depth. A previous post presented a simple Python email crawler.
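To give an idea of what such a crawler does, here is a minimal Python sketch (not the Java code from the post). It assumes a caller-supplied `fetch` function that returns a page's HTML or `None`; the in-memory `PAGES` site is hypothetical, so the example runs without network access. Depth is unlimited, and a set of visited URLs keeps the crawl from looping forever.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Deliberately simple pattern; real-world email matching needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class LinkParser(HTMLParser):
    """Collects the href values of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_emails(start_url, fetch):
    """Breadth-first crawl of one host, returning every email address found."""
    host = urlparse(start_url).netloc
    seen, queue, emails = {start_url}, deque([start_url]), set()
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        emails.update(EMAIL_RE.findall(html))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]
            # Stay on the same host and never queue a page twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return emails

# Hypothetical two-page site used instead of real HTTP requests.
PAGES = {
    "http://example.test/": '<a href="/about">About</a> info@example.test',
    "http://example.test/about": '<a href="/">Home</a> <a href="mailto:jane@example.test">Jane</a>',
}
found = crawl_emails("http://example.test/", PAGES.get)
```

In a real crawl, `fetch` would wrap an HTTP client and should add timeouts, politeness delays and error handling.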

Categories
Uncategorized

Death By Captcha new feature Recaptcha v3 support

After a great deal of work, the Death By Captcha developers have finally released their new feature: Recaptcha v3 support.

As you may already know, the Recaptcha v3 API is similar in many ways to the previous one used to manage tokens (Recaptcha v2). In Recaptcha v3, the system scores each user to estimate whether it is a bot or a human, then uses the score to decide whether to accept requests from that user. Lower scores are identified as bots. Follow this link to read the API documentation and download client-based sample code.
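The score-based decision described above can be sketched in Python from the site owner's side. This is only an illustration: the function name, the 0.5 threshold and the sample responses are assumptions made for the example, and the dictionaries mimic the `success`, `score` and `action` fields of a Recaptcha v3 verification response rather than calling any real API.

```python
def is_probably_human(verify_response, min_score=0.5, expected_action=None):
    """Decide whether to accept a request based on a Recaptcha v3 style score.

    verify_response is an already-parsed verification result (a dict).
    Scores range from 0.0 (likely a bot) to 1.0 (likely a human).
    """
    if not verify_response.get("success"):
        return False
    if expected_action is not None and verify_response.get("action") != expected_action:
        return False
    return float(verify_response.get("score", 0.0)) >= min_score

# Hypothetical verification results, for illustration only.
human_like = {"success": True, "score": 0.9, "action": "login"}
bot_like = {"success": True, "score": 0.1, "action": "login"}

accept_human = is_probably_human(human_like, expected_action="login")
accept_bot = is_probably_human(bot_like, expected_action="login")
```

A stricter site would raise `min_score`, which is why a solving service has to return tokens that score high enough for the target site.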

With very competitive pricing, Death By Captcha is at the cutting edge of the solving tools on the market. Check it out: you can receive free credit for testing from this LINK. Use the promo code below when you sign up to receive your free captchas.

Use the promo code “Scrapepro” and you’ll get credit for 3k captchas for free.

P. S. See the ReCaptcha v2 test results.

Categories
Web Scraping Software

The present trends in web scraping tools

 

Recently I got a question from one of the blog readers. After I replied to it, I decided to share it with a wider audience.
Question:

Hi,

I found your [web]scraping.pro site and found it very helpful, then realized the web scraper solutions rating was from 2014. What is the best solution for today? I have lots of sites I need to scrape, mainly search then drill-down sites. I would like to be able to schedule the scraping to run on a daily basis. Is there a direction you could point me? I’m a seasoned developer by trade but am seeing all these point and click solutions (e.g. import.io) and am wondering if I should stick with Node.JS or .NET or if I should investigate some of these GUI scrapers of today.
Categories
Development

Node.js, Puppeteer, Apify for Web Scraping (Xing scrape) – part 2

In this post we share the practical implementation (code) of the Xing companies scrape project using Node.js, Puppeteer and the Apify library. The first post, describing the project objectives, algorithm and results, is available here.

You can look at the scrape algorithm here.

Categories
Miscellaneous

Huge JSON files view and search tool with excellent performance

The results of scraping activities are most often stored as JSON data, which has many advantages over the .xml and .csv formats. Recently, in one of my projects, I had to deal with JSON files of over 6 MB. Even though I managed them in Notepad++, searching and counting could have been better.

Categories
Miscellaneous

Endcaptcha now solving Recaptcha V2!

So far, the latest developments from the services that build captchas (Google, NuCaptcha, etc.) are no match for the captcha bypassers, and Endcaptcha is living proof of it.
Endcaptcha developers have been working hard to make this new feature possible, and they are finally releasing Recaptcha V2 support!