Categories
Web Scraping Software

TEST DRIVE: Text list

We’d like to introduce the new SCRAPER TEST DRIVE stage, called ‘Text list‘. This seemingly simple test case hides within itself a non-ordinary structure. This time the HTML DOM structure is so plain, making you scratch your head, wondering how to approach to it. Yet, those off-the-shelf products have shown their best features extracting even a smallest thing from seemingly plain content.

Categories
Data Mining

Data Mining with Google Refine

Google Refine is a free tool for data processing, it standing in line with some other free Google data analysis tools. Because of its close association with web scraping, we want to shed some light on it. 

Categories
Miscellaneous

An easy way to backup your MS SQL Server database

If you use Microsoft SQL Server to store and process your data, you are probably in the market for a convenient backup and restore tool. In this post, I’d like to share a very nice tool for backup and restoring of your MS SQL database in “three clicks”.

Categories
Web Scraping Software

TEST DRIVE: Blocks

The next stage in the Scraper Test Drive is to test the scrapers on their ability to parse Block layout. This test evaluates the ability of different scrapers to cope with difficult blocks layouts, especially those in which there is no direct HTML association among the data presented on a screen. 

Categories
Miscellaneous

Data extraction using iMacros plugin for IE

In this video i share how you might automate data extraction using the iMacros plugin for IE browser.

The iMacros plugin for IE has the most visual interface compare to equal iMacros plugins for FF or Chrome browsers. Yet, the same macro might be run at the iMacros plugins at any of the browsers. A data extraction is only one of the niches the plugin is of use, see the short description of all its usage here. The code of the macro from the video above you might see down here:

VERSION BUILD=8021970
TAB T=1
TAB CLOSEALLOTHERS
SET !EXTRACT_TEST_POPUP NO
URL GOTO=http://www.londonstockexchange.com/exchange
/prices-and-markets/stocks/indices/summary
/summary-indices-constituents.html?index=AIM1
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:1
Categories
Web Scraping Software

How to monitor phpMyAdmin new release

Recently, a friend of my asked me for a simple free tool to detect new releases of the phpMyAdmin software. Since I recently did some research on website change tracking, I immediately recommended ChangeDetection.com.

Categories
Web Scraping Software

TEST DRIVE: Table Report

In this post, I’ll start to share our experiences with different web scrapers on the Testing Ground project. The first test, which I thought would be the simplest one, proved to be irksome and discouraging. With a struggle, I completed the test drive on Table Report. This test evaluates the ability of different scrapers to cope with difficult tables, like merged tables, missing values and so on.

Categories
Development

How to scrape an online dictionary using Python and lxml library

When I needed to extract dictionary words’ definitions I chose Python and lxml library. In this tutorial, I’ll review the steps of scraping Webster online dictionary using lxml in Python.

Categories
Development

How to scrape an online dictionary using Python and lxml library

When I needed to extract dictionary words’ definitions I chose Python and lxml library. In this tutorial, I’ll review the steps of scraping Webster online dictionary using lxml in Python.

Categories
Web Scraping Software

XPather Review

A developer often needs to build XPath of a web element. XPather, a FireFox add-on, comes to the rescue. XPather quickly generates XPath of an element, allowing more functionality too.