Categories
Miscellaneous

Octoparse review

Octoparse is a new visual web data extraction tool. It gives users a point-and-click UI for developing extraction patterns, which scrapers then apply to structured websites. Both experienced and inexperienced users find it easy to use Octoparse to bulk-extract information from websites, and most scraping tasks need no coding!

Categories
Miscellaneous

Data Scraping Studio review

Data Scraping Studio (DSS) is a new, free, multi-threaded studio for effective data extraction. It consists of two parts: (1) a Google Chrome extension with a point-and-click interface for setting up a web scraping agent and (2) a desktop app for executing scraping agents.

Categories
Development

Solve ReCaptcha with Selenium (Python)

I’ve already written about how the new No CAPTCHA ReCaptcha works, and even had some success breaking it with iMacros browser automation. But the latest scraping tools are, for the most part, driven by Python, so now I want to try the same experiment with Selenium + Python.

Categories
Guest posting, Web Scraping Software

Turn any interactive website into an API with ParseHub

Anyone should be able to pull data from the web and access it in the format they want. If a website does not have an API available, scraping is one of the few options for getting the data you need. But figuring out how to scrape data from complicated HTML is a pain. ParseHub is […]

Categories
Uncategorized

Writing next generation scraping scripts with Web Robots IDE

Most scraping solutions fall into two categories: visual scraping platforms targeted at non-programmers (Content Grabber, Dexi.io, Import.io, etc.), and scraping code libraries like Scrapy or PhantomJS, which require at least some knowledge of how to code. Web Robots builds a scraping IDE that fills the gap in between. Code is not hidden but instead made simple to create, […]

Categories
Review

OutWit Hub Review

OutWit Hub is software that provides simple data extraction without requiring any programming skills or advanced technical knowledge. What impressed me about OutWit Hub is its general approach to data gathering: harvest everything (links, text, images, etc.) and then let the user choose what is needed (sift by scrapers). The program is apt to browse over links […]

Categories
Development

How to Write a Captcha Solver that uses DeathByCaptcha service

Let’s look at a practical example of how to solve CAPTCHAs using the DeathByCaptcha service. This example is written in C#, but you can get it in Java as well.

Categories
Web Scraping Software

TEST DRIVE: AJAX

The new stage of the Web Scraper Test Drive is on: AJAX upload. Here we’ll check whether the scrapers are able to extract AJAX-supplied data. This is not an easy task for scraper software.

Categories
Data Mining

Clustering in Data Mining

Clustering is a data mining process where data are viewed as points in a multidimensional space. Points that are “close” in this space are assigned to the same cluster.
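
To make the idea of "close" points concrete, here is a minimal sketch in Python. It assumes Euclidean distance and a fixed distance threshold, and it merges any two points within that threshold into one cluster (a simple single-linkage grouping). The function name cluster_by_distance, the threshold value, and the sample points are illustrative assumptions, not taken from the post.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def cluster_by_distance(points, threshold):
    """Group points so that any two points within `threshold` share a cluster.

    Hypothetical helper for illustration: a naive O(n^2) single-linkage
    grouping built on a small union-find structure.
    """
    parent = list(range(len(points)))  # each point starts as its own cluster

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points that are "close".
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= threshold:
                parent[find(j)] = find(i)

    # Collect points by their cluster representative.
    clusters = {}
    for i, point in enumerate(points):
        clusters.setdefault(find(i), []).append(point)
    return list(clusters.values())


# Sample 2-D points (assumed data): two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.3, 4.8), (9.0, 0.0)]
clusters = cluster_by_distance(points, threshold=1.0)
for cluster in clusters:
    print(cluster)
```

With these sample points, the two tight pairs each form a cluster and the isolated point stays on its own. Practical clustering methods such as k-means or hierarchical clustering refine this basic notion of closeness in different ways.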
