Categories
Data Mining Web Scraping Software

Easy Data Visualisation with Silk.co

This post is outdated. The silk.co service is no longer available.

This is a guest post by Daniel Cave.

With the rise of social media sharing, collaboration and an increasingly interested market for data, more and more people want to ‘play with data’ and learn using some basic free tools. So recently I’ve been trying to find a technically advanced and interesting combination of free tools to collect and visualise web data that will allow enthusiasts and students to get those all-important initial quick and easy wins.

Categories
Development

Tutorial: How to use Headless Firefox for Scraping in Linux

I have already written several articles on how to use Selenium WebDriver for web scraping, and all those examples were for Windows. But what if you want to run your WebDriver-based scraper somewhere on a headless Linux server, for example on a Virtual Private Server with SSH-only access? Here I will show you how to do it in several simple steps.

Categories
Miscellaneous

What is import•io from the user’s point of view?

Import•io is a big data cloud platform that has the ambitious goal of turning the web into a database. It was founded in March 2012, and a year later it received $1.3M in seed funding from Wellington Partners, Louis Monier and Emmanuel Javal.

Categories
Development

An Example of Captcha Solver in Java

Recently I published an article on how to solve captcha in C# using DeathByCaptcha service, and I promised to offer you an example in other languages as well. In this post I’ll offer a Java project that does the same thing.

Categories
Development

How to improve your scraper with “Bypass CAPTCHA”

If you develop an application for web scraping then it would be really nice to upgrade it with automatic captcha recognition.  “Bypass CAPTCHA” service allows you to do this very easily since its focus is on use in third-party software. In this post I’ll show you how easy it is to extend your scraper using this service.

Categories
Development

How to Write a Captcha Solver that uses DeathByCaptcha service

Let’s look at a practical example of how to solve CAPTCHAs using the DeathByCaptcha service. This example is written in C#, but you can get it in Java as well.

Categories
Web Scraping Software

Captcha Breaker Review

GSA Captcha Breaker is CAPTCHA-solving software. It uses Optical Character Recognition algorithms for CAPTCHA decoding. Being a standalone program, it works independently of any online captcha recognition services (like DeathByCaptcha, BypassCaptcha, etc.). This means that once you have paid for the program you don’t need to pay for each recognition anymore, which saves money when you need to recognize a huge number of CAPTCHAs.

Categories
Web Scraping Software

How to extract emails and phones with GSA Email Spider

The task of email extraction is quite popular in the sphere of web scraping. Here I want to present you with a review of the GSA Email Spider, a useful program designed for collecting emails, phones and fax numbers from the web.
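To give a feel for what email and phone extraction involves, here is a minimal Python sketch. It is not how GSA Email Spider works internally; the regular expressions and the function name `extract_contacts` are my own assumptions, and the patterns are deliberately simple, so they will miss some valid addresses and numbers.

```python
import re

# Simplified patterns (assumptions for illustration, not a full RFC-compliant matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# A digit, then at least 7 digits/separators, then a digit; optional leading "+".
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_contacts(text):
    """Return (emails, phones) found in a block of page text, de-duplicated and sorted."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    phones = sorted(set(match.strip() for match in PHONE_RE.findall(text)))
    return emails, phones


if __name__ == "__main__":
    sample = "Contact sales@example.com or support@example.org, phone +1 (555) 123-4567."
    print(extract_contacts(sample))
```

In a real scraper the `text` argument would be the downloaded page body; fetching pages and following links is left out of this sketch.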

Categories
Web Scraping Software

Auto Website Submitter Review

In this post I want to offer you a brief review of GSA Auto Website Submitter. This application is designed to submit information about a web page (that includes backlinks, categories, description, etc.) to thousands of directories and dozens of search engines.

Categories
Web Scraping Software

Free Online Web Scrapers

Free online web scrapers are useful tools for gathering information and putting it into usable form. The contents of a given URL can be placed in a spreadsheet and expanded over time into a data set. With an online web service, collected data can be merged into a new or existing database.
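The idea of turning page contents into spreadsheet rows can be sketched in a few lines of Python using only the standard library. This is an illustration under my own assumptions rather than a description of any particular online scraper: the class `TableParser` and the function `html_table_to_csv` are hypothetical names, the input is assumed to be a simple HTML table without nested tables, and downloading the page from its URL is left out.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Convert the table rows found in an HTML string to CSV text."""
    parser = TableParser()
    parser.feed(html)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


if __name__ == "__main__":
    page = "<table><tr><th>Name</th><th>Price</th></tr><tr><td>Widget</td><td>9.99</td></tr></table>"
    print(html_table_to_csv(page))
```

The resulting CSV text can be saved to a file and opened in any spreadsheet program, or appended to an existing data set on later runs.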