Author: admin

SEO and Growth Hacking

OutWit: Scrape search results for SEO Audit

Post author By admin
Post date December 3, 2012

In this video, Dale Stokdyk explains how to scrape search engine results using OutWit Hub with a custom scraper.

Tags Google

SaaS

80legs Review – Crawler for rent in the sky

Post author By admin
Post date December 1, 2012

80legs offers a crawling service that allows users to (1) easily compose crawl jobs and (2) run those crawl jobs in the cloud over a distributed computer network.

Mining the modern web for information requires a huge amount of processing power. How can a start-up or a small business do comprehensive data crawling without building the giant server farms used by major search engines?

Tags crawling, service

Development

Scraping in PHP with cURL

Post author By admin
Post date November 24, 2012
19 Comments on Scraping in PHP with cURL

In this post, I’ll explain how to do a simple web page extraction in PHP using cURL, the ‘Client URL library’.

PHP's cURL functions are built on libcurl, a library that lets you connect to servers over many different protocols, including HTTP and HTTPS. Fetching data from the web this way handles headers, cookies and errors more reliably than a simple file_get_contents() call. If the cURL extension is not installed, you can read here for Windows or here for Linux.
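The post itself works in PHP with cURL. As a rough illustration of the same idea (set request headers, keep cookies, and report errors instead of failing silently), here is a minimal sketch using only Python's standard library. It is not the code from the post: the function name `fetch`, the `user_agent` value and the returned tuple layout are assumptions made for this example.

```python
import urllib.error
import urllib.request
from http.cookiejar import CookieJar


def fetch(url, user_agent="example-scraper/0.1", timeout=10):
    """Fetch a URL and return (status, body_text, error_message).

    Hypothetical helper for illustration; not part of any library.
    """
    # The cookie jar stores cookies set by HTTP responses for this opener.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )
    # Send an explicit User-Agent header with the request.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            # Non-HTTP schemes (file:, data:) may report no status code.
            return getattr(response, "status", None), body, None
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return exc.code, "", str(exc)
    except urllib.error.URLError as exc:
        # No usable answer at all: DNS failure, refused connection, bad scheme.
        return None, "", str(exc.reason)


if __name__ == "__main__":
    # A data: URL keeps the demo offline; swap in a real page URL to try it.
    print(fetch("data:text/plain,hello"))
```

Returning the error as a value rather than raising keeps a scraping loop running when one page fails, which is the kind of control a bare one-line fetch does not give you.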

Tags Curl, PHP, Regex

SEO and Growth Hacking

How to leverage Web Scraping for SEO

Post author By admin
Post date November 22, 2012

Eppie Vojt speaks at the SEOmoz Meetup about leveraging web scraping for site SEO. Techniques covered: XPath and regex in Google Docs to fetch links and more. The post also links to the sample Twitter scraper developed by Eppie Vojt.
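To make the two techniques concrete, here is a small sketch that pulls link targets out of a page fragment, once with an XPath-style query and once with a regular expression. It is written in Python rather than Google Docs, and it is not taken from the talk: the sample markup and the function names are made up for this example. The standard-library parser used here only accepts well-formed markup, so real-world HTML would usually need a more lenient parser.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup used only for this illustration.
SAMPLE = """<html><body>
<p><a href="https://example.com/a">First</a></p>
<p><a href="https://example.com/b">Second</a> and <a name="anchor">no link</a></p>
</body></html>"""


def links_by_xpath(markup):
    """Return href values using ElementTree's limited XPath support."""
    root = ET.fromstring(markup)
    # './/a[@href]' selects every <a> element that carries an href attribute.
    return [a.get("href") for a in root.findall(".//a[@href]")]


def links_by_regex(markup):
    """Return href values using a regular expression on the raw text."""
    # Matches double-quoted href attributes inside an opening <a ...> tag.
    return re.findall(r'<a\s[^>]*href="([^"]+)"', markup)


if __name__ == "__main__":
    print(links_by_xpath(SAMPLE))
    print(links_by_regex(SAMPLE))
```

The XPath version follows the document structure, while the regex version works on raw text and is easier to break with unusual quoting or attribute order.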

Tags Regex, SEO, Xpath

Web Scraping Software

TEST DRIVE: Text list

We’d like to introduce the new SCRAPER TEST DRIVE stage, called ‘Text list‘. This seemingly simple test case hides an unusual structure. This time the HTML DOM structure is so plain that it leaves you scratching your head, wondering how to approach it. Yet the off-the-shelf products have shown their best features, extracting even the smallest item from seemingly plain content.

Tags Regex, Xpath

Data Science

Data Mining with Google Refine

Post author By admin
Post date November 13, 2012

Google Refine is a free tool for data processing that stands alongside other free Google data analysis tools. Because of its close association with web scraping, we want to shed some light on it.

Tags Google, Regex

Miscellaneous

An easy way to backup your MS SQL Server database

Post author By admin
Post date November 8, 2012

If you use Microsoft SQL Server to store and process your data, you are probably in the market for a convenient backup and restore tool. In this post, I’d like to share a very nice tool for backing up and restoring your MS SQL database in “three clicks”.

Web Scraping Software

TEST DRIVE: Blocks

The next stage in the Scraper Test Drive tests the scrapers on their ability to parse a block layout. This test evaluates how well different scrapers cope with difficult block layouts, especially those in which there is no direct HTML association among the data presented on the screen.

Miscellaneous

Data extraction using iMacros plugin for IE

Post author By admin
Post date October 26, 2012

In this video I share how you can automate data extraction using the iMacros plugin for the IE browser.

The iMacros plugin for IE has the most visual interface compared to the equivalent iMacros plugins for Firefox or Chrome. Still, the same macro can be run in the iMacros plugin for any of these browsers. Data extraction is only one of the uses of the plugin; see the short description of all its uses here. The code of the macro from the video above is shown below:

VERSION BUILD=8021970
TAB T=1
TAB CLOSEALLOTHERS
SET !EXTRACT_TEST_POPUP NO
URL GOTO=http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/summary/summary-indices-constituents.html?index=AIM1
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:Next
WAIT SECONDS=2
TAG POS=1 TYPE=TABLE ATTR=CLASS:table_dati EXTRACT=TXT
SAVEAS TYPE=EXTRACT FOLDER=c:\iMacros FILE=table.csv
TAG POS=1 TYPE=A ATTR=TXT:1
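
The macro above repeats the same extract, save and "Next" steps once per page. As a rough sketch of how that repetition could be generated rather than copied by hand, the Python helper below builds the macro text for any number of pages. The helper `build_macro` and its parameters are hypothetical and not part of iMacros; it only emits commands that already appear in the macro above, and it leaves out the VERSION line and the final click on page "1".

```python
def build_macro(url, pages, table_class="table_dati",
                folder="c:\\iMacros", filename="table.csv", wait_seconds=2):
    """Return iMacros text that extracts one table per page.

    Hypothetical generator for illustration; it mirrors the hand-written
    macro above instead of using any iMacros API.
    """
    lines = [
        "TAB T=1",
        "TAB CLOSEALLOTHERS",
        "SET !EXTRACT_TEST_POPUP NO",
        f"URL GOTO={url}",
    ]
    for page in range(pages):
        # Extract the table on the current page and save it to the CSV file.
        lines.append(f"TAG POS=1 TYPE=TABLE ATTR=CLASS:{table_class} EXTRACT=TXT")
        lines.append(f"SAVEAS TYPE=EXTRACT FOLDER={folder} FILE={filename}")
        if page < pages - 1:
            # Move to the next page and give it time to load.
            lines.append("TAG POS=1 TYPE=A ATTR=TXT:Next")
            lines.append(f"WAIT SECONDS={wait_seconds}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder URL; substitute the page you actually want to scrape.
    print(build_macro("http://example.com/prices.html", pages=5))
```

With `pages=5` the output has five extract/save pairs and four "Next" clicks, matching the structure of the macro from the video.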

Tags scraping tool