We’d like to introduce a new SCRAPER TEST DRIVE stage called ‘Text list’. This seemingly simple test case hides an unusual structure. This time the HTML DOM structure is so plain that it leaves you scratching your head, wondering how to approach it. Yet the off-the-shelf products showed their best features, extracting even the smallest items from seemingly plain content.
Data Mining with Google Refine
Google Refine is a free data processing tool that stands alongside other free Google data analysis tools. Because of its close association with web scraping, we want to shed some light on it.
If you use Microsoft SQL Server to store and process your data, you are probably in the market for a convenient backup and restore tool. In this post, I’d like to share a very nice tool for backing up and restoring your MS SQL database in “three clicks”.
TEST DRIVE: Blocks
The next stage in the Scraper Test Drive is to test the scrapers on their ability to parse a block layout. This test evaluates the ability of different scrapers to cope with difficult block layouts, especially those in which there is no direct HTML association among the data presented on screen.
In this video I share how you might automate data extraction using the iMacros plugin for the IE browser.
The iMacros plugin for IE has the most visual interface compared to the equivalent iMacros plugins for the FF or Chrome browsers. Yet the same macro can be run in the iMacros plugin of any of these browsers. Data extraction is only one of the niches where the plugin is of use; see the short description of all its uses here. The code of the macro from the video above is shown below:
VERSION BUILD=8021970
TAB T=1
TAB CLOSEALLOTHERS
SET !EXTRACT_TEST_POPUP NO
URL GOTO=http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/summary/summary-indices-constituents.html?index=AIM1
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:1
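The macro above repeats the same extract, save and click-Next steps once per page. If the number of pages changes, one option is to generate the macro text instead of editing it by hand. The following Python sketch is not part of the original post: the function name `build_macro` and its parameters are hypothetical, and it only assembles the same commands that already appear in the macro above.

```python
def build_macro(url, pages, table_class="table_dati",
                folder=r"c:\iMacros", filename="table.csv", wait_seconds=2):
    """Return iMacros macro text that extracts one table per results page.

    Hypothetical helper: it reuses only the commands from the hand-written
    macro and repeats the per-page block `pages` times.
    """
    if pages < 1:
        raise ValueError("pages must be at least 1")

    # Header: same setup commands as the hand-written macro.
    lines = [
        "VERSION BUILD=8021970",
        "TAB T=1",
        "TAB CLOSEALLOTHERS",
        "SET !EXTRACT_TEST_POPUP NO",
        f"URL GOTO={url}",  # the whole URL stays on one line
    ]

    for page in range(1, pages + 1):
        # Extract the table on the current page and save it to the CSV file.
        lines.append(f"TAG POS=1 TYPE=TABLE ATTR=CLASS:{table_class} EXTRACT=TXT")
        lines.append(f"SAVEAS TYPE=EXTRACT FOLDER={folder} FILE={filename}")
        if page < pages:
            # Move to the next page and give it time to load.
            lines.append("TAG POS=1 TYPE=A ATTR=TXT:Next")
            lines.append(f"WAIT SECONDS={wait_seconds}")
        else:
            # After the last page, return to page 1 as the original macro does.
            lines.append("TAG POS=1 TYPE=A ATTR=TXT:1")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    macro_url = ("http://www.londonstockexchange.com/exchange"
                 "/prices-and-markets/stocks/indices/summary"
                 "/summary-indices-constituents.html?index=AIM1")
    print(build_macro(macro_url, pages=5))
```

Called with `pages=5`, this produces the same sequence of commands as the macro above; the output can be saved as a macro file and run in the plugin.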
Recently, a friend of mine asked me for a simple free tool to detect new releases of the phpMyAdmin software. Since I had recently done some research on website change tracking, I immediately recommended ChangeDetection.com.
TEST DRIVE: Table Report
In this post, I’ll start to share our experiences with different web scrapers on the Testing Ground project. The first test, which I thought would be the simplest one, proved to be irksome and discouraging. With a struggle, I completed the test drive on Table Report. This test evaluates the ability of different scrapers to cope with difficult tables, like merged tables, missing values and so on.
When I needed to extract dictionary word definitions, I chose Python and the lxml library. In this tutorial, I’ll review the steps of scraping the Webster online dictionary using lxml in Python.