Categories
Uncategorized

Reliable rotating proxies for scraping business directories

We’ve already written about suitable proxy servers for web scraping. Here we focus on proxies for scraping large volumes of data records, particularly from business directories. When you scrape a business directory, its web server can spot repetitive requests and put you on hold based on the IP address that is making the frequent HTTP requests. A proxy rotation web service is a means of repeatedly changing that IP address. Thus, the target web server sees only a random IP address from the rotating proxy pool at each request.
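As a rough illustration of the idea (not the method of any particular proxy service), here is a minimal Python sketch of client-side rotation. The proxy addresses are placeholders from a documentation-only IP range, and the names `PROXY_POOL`, `pick_proxy`, `opener_for` and `fetch` are hypothetical helpers introduced for this example.

```python
import random
import urllib.request

# Hypothetical proxy endpoints (documentation-only addresses);
# replace them with the pool supplied by your proxy provider.
PROXY_POOL = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def pick_proxy(pool, rng=random):
    """Return one proxy URL chosen at random from the pool."""
    return rng.choice(pool)


def opener_for(proxy_url):
    """Build a urllib opener that routes HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, pool=PROXY_POOL, timeout=10):
    """Fetch url through a randomly chosen proxy.

    A new proxy is picked on every call, so consecutive requests
    may reach the target server from different IP addresses.
    """
    proxy = pick_proxy(pool)
    with opener_for(proxy).open(url, timeout=timeout) as response:
        return proxy, response.read()
```

Calling `fetch` in a loop over directory pages would send each request through a proxy drawn at random from the pool. A commercial rotation service typically does the same thing behind a single gateway address, so the client code stays even simpler.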

Categories
Development Web Scraping Software

A worthy alternative to the closing Kimono scraping API

Recently I was notified that the Kimono service is shutting down because the Kimono team is joining another project. Many data hunters who were using this prominent free API service are now searching for a good alternative.

Categories
Data Mining

Testing the Filter by TheWebMiner for advanced web content filtering

Recently I came across an interesting new tool from TheWebMiner called Filter. The Filter is an attempt by TheWebMiner to sort (categorize) indexed websites and deliver them to users as a content filtering service.

Categories
Featured Web Scraping Software

Dexi.io Review

Dexi.io is a powerful scraping suite. This cloud scraping service provides development, hosting and scheduling tools. The suite might be compared with Mozenda for building web scraping projects and running them in the cloud for user convenience. It also includes an API, with each scraper being a JSON definition, similar to other services such as import.io, Kimono Labs and ParseHub.

Categories
Guest posting

EndCaptcha for fast CAPTCHA solving

From time to time, web users struggle with “CAPTCHA services” such as DeCaptcher and DBC. Although those services are reliable, they are often “overloaded”, meaning the images to be solved get rejected or take a long time to decode (some services might even take 50 seconds to solve a single image!).

But I recently came across a new service that hopes to fill this gap in fast CAPTCHA solving. EndCaptcha.com is a new image digitization service built to satisfy the needs of the most demanding consumers. It uses a dedicated team of operators assisted by a smart OCR system, which is why it is considered a premium CAPTCHA service.

Categories
Uncategorized

Writing next generation scraping scripts with Web Robots IDE

Most scraping solutions fall into two categories: visual scraping platforms targeted at non-programmers (Content Grabber, Dexi.io, Import.io, etc.), and scraping code libraries like Scrapy or PhantomJS, which require at least some knowledge of how to code.

Web Robots builds a scraping IDE that fills the gap in between. Code is not hidden; instead, it is made simple to create, run and debug.

Categories
Web Scraping Software

Import.io Enters the Enterprise DaaS Market

Recently, import.io (a free online scraping tool) announced that they are adding another way to get data from the web: they’ll build it for you. This new “Data as a Service” program is targeted at businesses and organizations that need data but don’t have the time or resources to build it themselves with the import.io tool. For these clients, import.io will curate custom datasets based on their specific requirements, as well as develop custom data implementation solutions based on the organization’s in-house software.

Categories
Monetize

My Experience in Choosing a Web Scraping Service

Recently I decided to outsource a web scraping project to another company. I typed “web scraping service” into Google, chose six services from the first two search result pages and sent the project specifications to all of them to get quotes. Eventually I decided to go another way and did not order the services, but my experience may be useful for others who want to entrust web scraping jobs to third-party services.

Categories
Miscellaneous

What is import•io from the user’s point of view?

Import•io is a big data cloud platform that has the ambitious goal of turning the web into a database. It was founded in March 2012, and a year later it received $1.3M in seed funding from Wellington Partners, Louis Monier and Emmanuel Javal.

Categories
Web Scraping Software

Free Online Web Scrapers

Free online web scrapers are a useful tool for gathering information and putting it into useable form. The contents of a given URL can be placed in a spreadsheet and expanded over time into a data-set. With an online web service, collected data can be merged into a new or existing database.