In this post, I’ll explain how to do a simple web page extraction in PHP using cURL, the ‘Client URL library’.
PHP’s cURL extension is built on libcurl, a library that lets you connect to servers over many different protocols, including HTTP and HTTPS. Fetching data from the web this way is more robust than a plain file_get_contents() call, because cURL gives you control over headers, cookies and error handling. If the cURL extension is not installed, you can read here for Windows or here for Linux.
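As a rough illustration of the header, cookie and error handling mentioned above, here is a minimal sketch. The helper name `fetch_page()`, the user agent string and the timeout values are assumptions made for this example, not something taken from the original post.

```php
<?php
// Hypothetical helper (the name is illustrative): fetch a page with cURL and
// report the HTTP status, the body, and any transport error.
function fetch_page(string $url, int $timeoutSeconds = 10): array
{
    $ch = curl_init($url);
    if ($ch === false) {
        return ['status' => 0, 'body' => null, 'error' => 'curl_init() failed'];
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,                // but not forever
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,  // limit time spent connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds,  // limit total transfer time
        CURLOPT_COOKIEFILE     => '',               // empty string turns on in-memory cookie handling
        CURLOPT_USERAGENT      => 'example-fetcher/0.1', // assumed value, pick your own
    ]);

    $body   = curl_exec($ch);                       // false on transport failure
    $error  = ($body === false) ? curl_error($ch) : null;
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return [
        'status' => $status,                        // 0 when no HTTP response was received
        'body'   => ($body === false) ? null : $body,
        'error'  => $error,
    ];
}
```

Calling `fetch_page('https://example.com/')` returns an array whose `error` entry is null on success. Unlike file_get_contents(), a failed request can be inspected through `error` and `status` rather than surfacing only as a PHP warning.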
Month: November 2012
Eppie Vojt speaks at the SEOmoz Meetup on leveraging scraping for site SEO. Techniques: XPath and Regex in Google Docs to fetch links and more. Includes the link to the sample Twitter Scraper developed by Eppie Vojt.
TEST DRIVE: Text list
We’d like to introduce the new SCRAPER TEST DRIVE stage, called ‘Text list’. This seemingly simple test case hides an unusual structure. This time the HTML DOM structure is so plain that it leaves you scratching your head, wondering how to approach it. Yet the off-the-shelf products have shown their best features, extracting even the smallest items from seemingly plain content.
Data Mining with Google Refine
Google Refine is a free tool for data processing that stands alongside other free Google data analysis tools. Because of its close association with web scraping, we want to shed some light on it.
If you use Microsoft SQL Server to store and process your data, you are probably in the market for a convenient backup and restore tool. In this post, I’d like to share a very nice tool for backing up and restoring your MS SQL database in “three clicks”.
TEST DRIVE: Blocks
The next stage in the Scraper Test Drive tests the scrapers on their ability to parse a block layout. This test evaluates how well different scrapers cope with difficult block layouts, especially those in which there is no direct HTML association among the data presented on screen.