Categories
Web Scraping Software

Mozenda web scraping and publishing of data to cloud storage

Mozenda is a cloud web scraping service (SaaS), and we’ve already reviewed it. Since our last review, Mozenda has added more utility features for data extraction. Besides multi-threaded extraction and smart data aggregation, Mozenda lets users publish extracted data to cloud storage such as Dropbox, Amazon, and Microsoft Azure. In this post we explain the new Mozenda extraction and integration capabilities.

Categories
Development

2captcha service to solve reCaptcha v2.0 (python)

In this post we show the code for automatically connecting to the 2captcha service to solve Google reCaptcha v2.0. Not long ago, Google drastically complicated the user-behavior reCaptcha (v2.0). This online service provides a method for solving it.
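The submit-then-poll flow of such a service can be sketched roughly as follows. This is a minimal sketch, not the code from the post: the endpoint URLs, parameter names (`method=userrecaptcha`, `googlekey`, `pageurl`) and the `OK|...` / `CAPCHA_NOT_READY` replies are assumptions based on 2captcha's public HTTP API and should be checked against the current documentation. The function name `solve_recaptcha` and its `fetch` and `sleep` parameters are hypothetical and exist only to make the sketch easy to try without network access.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints; verify against the service's current documentation.
API_IN = "http://2captcha.com/in.php"
API_RES = "http://2captcha.com/res.php"


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def solve_recaptcha(api_key, site_key, page_url, fetch=http_get,
                    poll_seconds=5, max_polls=24, sleep=time.sleep):
    """Submit a reCaptcha task, then poll until a token comes back."""
    submit = urlencode({
        "key": api_key,
        "method": "userrecaptcha",  # assumed method name
        "googlekey": site_key,      # the site key found on the target page
        "pageurl": page_url,
    })
    reply = fetch(API_IN + "?" + submit)
    if not reply.startswith("OK|"):
        raise RuntimeError("submit failed: " + reply)
    task_id = reply.split("|", 1)[1]

    query = urlencode({"key": api_key, "action": "get", "id": task_id})
    for _ in range(max_polls):
        sleep(poll_seconds)  # give the service time to solve the task
        reply = fetch(API_RES + "?" + query)
        if reply.startswith("OK|"):
            return reply.split("|", 1)[1]
        if reply != "CAPCHA_NOT_READY":  # assumed "still working" reply
            raise RuntimeError("solve failed: " + reply)
    raise TimeoutError("no answer after %d polls" % max_polls)
```

The returned token would typically be placed in the form's `g-recaptcha-response` field before the form is submitted.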

Categories
Miscellaneous

Octoparse review

Octoparse is a new, modern visual web data extraction tool. It gives users a point-and-click UI to develop extraction patterns, so that scrapers can apply these patterns to structured websites. Both experienced and inexperienced users find it easy to use Octoparse to bulk extract information from websites: most scraping tasks need no coding!

Categories
Uncategorized

Reliable rotating proxies for business directories scrape

We’ve already written about suitable proxy servers for web scraping. Now we want to focus on proxies for scraping huge quantities of data records, particularly from business directories. When you scrape business directories, their web servers can detect repetitive requests and put you on hold based on the IP address that sends the frequent HTTP requests. A proxy rotation web service repeatedly changes the IP address, so the target web server sees only a random IP address from the rotating proxy pool at each request.
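The idea of picking a different proxy for each request can be sketched as below. This is a minimal illustration under stated assumptions: the `ProxyRotator` class and the `POOL` addresses are hypothetical (the addresses come from a documentation-only IP range), and a real rotation service would usually expose a single gateway address and rotate the exit IP on its side.

```python
import random


class ProxyRotator:
    """Pick a random proxy from a fixed pool for each request."""

    def __init__(self, proxy_urls, seed=None):
        if not proxy_urls:
            raise ValueError("proxy pool must not be empty")
        self._pool = list(proxy_urls)
        self._rng = random.Random(seed)

    def next_proxies(self):
        """Return a mapping in the shape the `requests` library accepts."""
        url = self._rng.choice(self._pool)
        return {"http": url, "https": url}


# Hypothetical pool; replace with the addresses your provider gives you.
POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]

rotator = ProxyRotator(POOL)
```

With the `requests` library installed, each call could then use a fresh proxy, for example `requests.get(url, proxies=rotator.next_proxies(), timeout=10)`.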

Categories
Development Web Scraping Software

The worthy alternative to dissolving scraping Kimono API

Recently I was notified that the Kimono service is shutting down because the Kimono team is joining another project. Many data hunters who were using this prominent free API service are now in search of a good alternative.

Categories
Data Mining

Testing the Filter by TheWebMiner for advanced web content filtering

Recently I came across an interesting new tool from TheWebMiner called Filter. The Filter is an attempt by TheWebMiner to sort (categorize) indexed websites and deliver them to users as a content filtering service.

Categories
Featured Web Scraping Software

Dexi.io Review

Dexi.io is a powerful scraping suite. This cloud scraping service provides development, hosting and scheduling tools. The suite might be compared with Mozenda for building web scraping projects and running them in the cloud for user convenience. It also includes an API, with each scraper being a JSON definition, similar to other services like import.io, Kimono Labs and ParseHub.

Categories
Guest posting

EndCaptcha for fast CAPTCHA solving

From time to time, web users struggle with “CAPTCHA services” such as DeCaptcher and DBC. Although those services are reliable, they’re often “overloaded”, meaning the images to be solved get rejected or take a long time to decode (some services might even take 50 seconds to solve a single image!).

But I recently came across a new service that hopes to fill this fast CAPTCHA solving gap. EndCaptcha.com is a new image digitization service built to satisfy the needs of the most demanding consumers. It uses a dedicated team of operators assisted by a smart OCR system. That’s why it’s considered a premium CAPTCHA service.

Categories
Uncategorized

Writing next generation scraping scripts with Web Robots IDE

Most scraping solutions fall into two categories: visual scraping platforms targeted at non-programmers (Content Grabber, Dexi.io, Import.io, etc.), and scraping code libraries like Scrapy or PhantomJS, which require at least some knowledge of how to code.

Web Robots builds a scraping IDE that fills the gap in between. Code is not hidden but instead made simple to create, run and debug.

Categories
Web Scraping Software

Import.io Enters the Enterprise DaaS Market

Recently, import.io (a free online scraping tool) announced that they are adding another way to get data from the web: they’ll build it for you. This new “Data as a Service” program is targeted at businesses and organizations that need data but don’t have the time or resources to build it themselves with the import.io tool. For these clients, import.io will curate custom datasets based on their specific requirements, as well as develop custom data implementation solutions based on the organization’s in-house software.