Free online web scrapers are useful tools for gathering information and putting it into usable form. The contents of a given URL can be placed in a spreadsheet and expanded over time into a dataset. With an online web service, collected data can be merged into a new or existing database.
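To make the "URL contents into a spreadsheet" idea concrete, here is a minimal Python sketch that turns an HTML table into CSV text a spreadsheet can open. It is only an illustration: the class name `TableRows`, the helper `html_table_to_csv` and the sample HTML are made up for this example, and a real page would first have to be downloaded.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text chunks of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample data standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE_HTML))
```

Appending the output of repeated runs to the same CSV file is one simple way to grow such a table into a dataset over time.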
Category: Web Scraping Software
Meet the new OutWit Hub 4.0!
As it does every year, OutWit has just released a major upgrade of its extraction program, OutWit Hub. This version brings a number of interesting new features, some of which I’m going to cover in this post.
In the good old days, when dial-up was almost the only way to reach the riches of the Web and most people had to pay for every minute they spent online, offline browsers were very popular. Even today you may find such software useful, especially if you need to access a website from a place with a limited internet connection (e.g. on a plane). In this article I will tell you about a wonderful piece of software called Web2Disk, which can pack a website onto a CD for you to take wherever you go.
There are two extreme approaches to building a web scraper: make it highly flexible and customizable but understandable only to IT gurus, or make it nice, simple and handy but limited in what it can do. Scraping software developers usually try to find a golden mean between the two. In this article I want to introduce you to a relatively new startup, import•io, which says that anyone can scrape any data regardless of their IT skills.
As we mentioned before, it’s often necessary to use a proxy server when you gather information from the web. In this tutorial I’ll show you how to configure Visual Web Ripper to run its web requests through proxy servers.
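For readers who script their own requests, the same idea can be sketched in a few lines of Python using only the standard library. This is not Visual Web Ripper's configuration, just a generic illustration of routing requests through a proxy; the proxy address and the helper name `build_proxy_opener` are placeholders you would replace with your own.

```python
import urllib.request

# Hypothetical proxy address; substitute the host and port of your own proxy.
PROXY = "http://127.0.0.1:8080"


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


opener = build_proxy_opener(PROXY)

# With a working proxy in place, a page would be fetched like this:
# html = opener.open("http://example.com/", timeout=10).read()
```

Building a separate opener, rather than installing a global one, keeps the proxy setting local to the requests that need it.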
The HideMyAss proxy service has a wonderful feature called “Scheduled IP Change” that automatically changes your IP address at set time intervals. This may help you greatly if you are trying to scrape a website that may block the IP address you use for scraping. But does this feature work as well as advertised? Recently we received the following testimonial from one of our visitors:
As you scrape information from websites, it’s often necessary to keep your real IP hidden, quickly change your IP or simply access a website from a country other than your own. All these tasks are achieved by means of proxies, intermediaries between you and the target website. Though there are plenty of companies offering such services on the market today, in this post I’ll introduce you to CyberGhost, an affordable and nice-looking proxy.
Personally, I prefer using online tools for quick manipulation of data formats like JSON, XML, CSV and so on. They’re platform-independent and always within reach (since I mainly work in a browser). After we published an article about the 7 best JSON viewers, I was told about Knowledge Walls, a similar service containing many tools for text data manipulation.
Recently, while surfing the web, I stumbled upon a simple web scraping service named Web Scrape Master. It is a kind of RESTful web service that extracts data from a specified website and returns it to you in JSON format.
As we are talking about web scraping, it would be a pity not to mention Yahoo Pipes, an exciting service provided by Yahoo!. This tool provides users with an intuitive graphical interface to assist them in organizing their favorite feeds and webpages into a single stream of content.