As we are talking about web scraping, it would be a pity not to mention Yahoo Pipes, a service provided by Yahoo! that gives users an intuitive graphical interface for organizing their favorite feeds and web pages into a single stream of content. By pulling information from across the internet, Yahoo Pipes lets users receive everything they care about in one place, without the hassle of navigating between sites.
Launched in 2007, the tool received years of feature-driven development that made it one of the most capable content aggregation services on the web. It takes only a few clicks to set up a new feed that aggregates content from your favorite sources and displays it as a single stream on your website or home page. For dedicated users, a library of modules extends the functionality of Yahoo Pipes and allows extensive customization.
The library is organized into categories that make it easy to navigate, even for new users:
Sources are the Yahoo Pipes modules that fetch data from one or more sources on the web. For example, the XPath Fetch Page module applies an XPath expression to any web page and returns the matching elements.
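To get a feel for what an XPath-based source does, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. This is not how Yahoo Pipes was implemented. ElementTree supports only a subset of XPath and needs well-formed markup, and the `xpath_fetch` helper and sample page are hypothetical stand-ins for a fetched web page.

```python
import xml.etree.ElementTree as ET


def xpath_fetch(xhtml: str, path: str) -> list[str]:
    """Return the stripped text of every element that matches `path`.

    ElementTree understands only a subset of XPath and requires
    well-formed markup, so this stands in for a full XPath engine.
    """
    root = ET.fromstring(xhtml)
    return [(el.text or "").strip() for el in root.findall(path)]


# Hypothetical sample page; a real pipe would fetch it from a URL.
page = """<html><body>
  <ul>
    <li class="headline">First headline</li>
    <li class="ad">Sponsored link</li>
    <li class="headline">Second headline</li>
  </ul>
</body></html>"""

print(xpath_fetch(page, ".//li[@class='headline']"))
# ['First headline', 'Second headline']
```

Real-world HTML is rarely well-formed XML, so a practical scraper would usually parse it with an HTML parser first.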
User input modules let you add user input to a pipe. The input can be text, a location, a URL, a number, or a date.
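Conceptually, a user input module turns whatever the person running the pipe types into a typed parameter with a default value. The function below is a hypothetical sketch of that idea. `read_input` and its `kind` strings are made up for illustration. Location input is omitted because it would need a geocoding service.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlparse


def read_input(raw: Optional[str], kind: str, default):
    """Convert a raw user-supplied string into a typed pipe parameter.

    Empty input falls back to the default, like an input field
    that ships with a preset value.
    """
    if raw is None or raw.strip() == "":
        return default
    raw = raw.strip()
    if kind == "text":
        return raw
    if kind == "number":
        return float(raw)
    if kind == "date":
        return date.fromisoformat(raw)  # expects YYYY-MM-DD
    if kind == "url":
        parts = urlparse(raw)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        return raw
    raise ValueError(f"unsupported input kind: {kind!r}")


print(read_input("42", "number", 10))   # 42.0
print(read_input("", "text", "python"))  # python
```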
Operators offer plenty of tools for manipulating the data flow in your pipe, such as filter, location extractor, regex, reverse, split, tail, union, web service, count, loop, rename, sort, sub-element, truncate, and unique.
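Several of these operators map onto simple list operations. The sketch below reimplements filter, unique, sort, truncate, and count over a list of feed items represented as dictionaries. The function names, the `title`/`pubDate` fields, and the sample items are all assumptions for illustration, not the Yahoo Pipes internals.

```python
import re

# Made-up feed items standing in for the output of a source module.
items = [
    {"title": "Python 3.12 released", "pubDate": "2023-10-02"},
    {"title": "Weekly links", "pubDate": "2023-09-29"},
    {"title": "Python 3.12 released", "pubDate": "2023-10-02"},
    {"title": "Python packaging news", "pubDate": "2023-10-05"},
]


def filter_items(items, field, pattern):
    """Keep items whose `field` matches the regular expression `pattern`."""
    return [it for it in items if re.search(pattern, it[field])]


def unique(items, field):
    """Drop items whose `field` value was already seen, keeping the first."""
    seen, out = set(), []
    for it in items:
        if it[field] not in seen:
            seen.add(it[field])
            out.append(it)
    return out


def sort_items(items, field, descending=False):
    """Sort items by `field`; ISO dates sort correctly as plain strings."""
    return sorted(items, key=lambda it: it[field], reverse=descending)


def truncate(items, count):
    """Keep only the first `count` items."""
    return items[:count]


python_items = filter_items(items, "title", r"Python")
latest = truncate(sort_items(unique(python_items, "title"), "pubDate", descending=True), 1)
print(latest)             # the newest unique Python item
print(len(python_items))  # the count operator: 3
```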
The URL Builder module is one of the most important Yahoo Pipes modules. It constructs a URL from parts: some parts you type in, while others you wire in from Text User Input modules.
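Assembling a URL from parts can be illustrated with Python's `urllib.parse`. In this sketch, the hypothetical `build_url` helper joins a scheme, host, path, and query parameters. The `search_term` variable plays the role of a value wired in from a text input. The host and parameter names are placeholders.

```python
from urllib.parse import urlencode, urlunsplit


def build_url(scheme: str, host: str, path: str, params: dict) -> str:
    """Assemble a URL from its parts, percent-encoding the query string."""
    return urlunsplit((scheme, host, path, urlencode(params), ""))


search_term = "yahoo pipes"  # imagine this wired in from a text input
print(build_url("https", "example.com", "/search", {"q": search_term, "n": 10}))
# https://example.com/search?q=yahoo+pipes&n=10
```

Letting `urlencode` handle escaping avoids hand-building query strings, which easily breaks on spaces or ampersands in user input.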
String modules either combine or modify strings. They include string builder, string replace, term extractor, string regex, sub string, and translate.
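The purely local string modules have close Python equivalents. The helpers below are hypothetical analogues of string builder, string replace, string regex, and sub string. Term extractor and translate are left out because they would depend on external services.

```python
import re


def string_builder(*parts) -> str:
    """Concatenate any number of parts into one string."""
    return "".join(str(p) for p in parts)


def string_replace(text: str, old: str, new: str) -> str:
    """Replace every literal occurrence of `old` with `new`."""
    return text.replace(old, new)


def string_regex(text: str, pattern: str, replacement: str) -> str:
    """Replace every match of a regular expression."""
    return re.sub(pattern, replacement, text)


def sub_string(text: str, start: int, length: int) -> str:
    """Return `length` characters beginning at index `start`."""
    return text[start:start + length]


title = string_regex("  Yahoo   Pipes  tutorial ", r"\s+", " ").strip()
print(title)                                       # Yahoo Pipes tutorial
print(sub_string(title, 6, 5))                     # Pipes
print(string_builder("Part ", 2, ": ", title))     # Part 2: Yahoo Pipes tutorial
print(string_replace(title, "tutorial", "guide"))  # Yahoo Pipes guide
```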
The date category contains only two modules: date formatter and date builder. Date builder converts text to dates, while date formatter takes dates and renders them in the desired format.
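The split between building and formatting dates mirrors Python's `datetime.strptime` and `datetime.strftime`. This is a minimal sketch. The `date_builder` and `date_formatter` names are hypothetical, and the builder only accepts text in one explicit format, so it may be stricter than the original module.

```python
from datetime import datetime


def date_builder(text: str, fmt: str = "%Y-%m-%d") -> datetime:
    """Turn text into a datetime, assuming it follows `fmt` exactly."""
    return datetime.strptime(text, fmt)


def date_formatter(moment: datetime, fmt: str) -> str:
    """Render a datetime in the desired strftime format."""
    return moment.strftime(fmt)


moment = date_builder("2012-03-15 14:30", "%Y-%m-%d %H:%M")
print(date_formatter(moment, "%d.%m.%Y"))           # 15.03.2012
print(date_formatter(moment, "%H:%M on %Y/%m/%d"))  # 14:30 on 2012/03/15
```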
The location module converts a description of a place into geographic data. It recognizes addresses, zip codes, airport codes, city/country names, and U.S. city/state names.
The math module performs simple arithmetic: it applies an operation to the numbers fed into it and outputs the result. The supported operations are addition, subtraction, multiplication, division, modulo, and powers.
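A module that applies one of a fixed set of arithmetic operations is essentially a lookup from an operation name to a function. The sketch below uses Python's `operator` module. The operation names and the `simple_math` function are assumptions chosen to match the list above.

```python
import operator

# Map operation names to Python's built-in operator functions.
OPERATIONS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
    "modulo": operator.mod,
    "power": operator.pow,
}


def simple_math(value: float, op: str, operand: float) -> float:
    """Apply one named arithmetic operation to `value` and `operand`."""
    if op not in OPERATIONS:
        raise ValueError(f"unknown operation: {op!r}")
    return OPERATIONS[op](value, operand)


print(simple_math(9, "divide", 2))  # 4.5
print(simple_math(7, "modulo", 3))  # 1
print(simple_math(2, "power", 10))  # 1024
```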
Putting it all together
All you need to do is choose the proper modules and connect them into a pipe.
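Connecting modules into a pipe amounts to function composition: the output of one module becomes the input of the next. The sketch below models that with `functools.reduce`. The `pipe` helper, the three small modules, and the sample feed are hypothetical and serve only to show the wiring.

```python
from functools import reduce


def pipe(*modules):
    """Wire modules together: each one's output becomes the next one's input."""
    def run(data):
        return reduce(lambda acc, module: module(acc), modules, data)
    return run


# Hypothetical modules operating on a list of feed items (dicts).
def keep_python(items):
    return [it for it in items if "Python" in it["title"]]


def newest_first(items):
    return sorted(items, key=lambda it: it["pubDate"], reverse=True)


def top_two(items):
    return items[:2]


my_pipe = pipe(keep_python, newest_first, top_two)

feed = [
    {"title": "Python tips", "pubDate": "2023-10-01"},
    {"title": "Rust news", "pubDate": "2023-10-03"},
    {"title": "Python 3.12 released", "pubDate": "2023-10-02"},
    {"title": "Python packaging", "pubDate": "2023-09-30"},
]
print([it["title"] for it in my_pipe(feed)])
# ['Python 3.12 released', 'Python tips']
```

Because each module only sees its input, modules can be reordered or swapped without touching the others, which is what makes the visual wiring approach work.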
You can also browse pipes made by other users; you will probably find that many tasks have already been done for you.