Handy Web Extractor

Handy Web ExtractorHandy Web Extractor is a simple tool for everyday web content monitoring. It will periodically download the web page, extract the necessary content and display it in the window on your desktop. One may consider it as the data extraction software, taking its own nitch in the scraping software and plugins.

What is Crawlera?

I came across this tool a few weeks ago, and wanted to share it with you. So far I have not tested it myself, but it is a simple concept- Safely download web pages without the fear of overloading websites or getting banned. You write a crawler script using scruping hub, and they will run through there IP proxies and take care of the technical problems of crawling.

Scrape with Google App Script

In this post I want to let you how I ve managed to complete the challenge of scraping a site with Google Apps Script (GAS).

Extracting sequential HTML elements with XPath and Regex

Often, we need to extract some HTML elements ordered sequentially rather than in hierarhical order.

About XPath

XPath is a formal language that is used to navigate through and query elements and attributes in XML documents. While this notation is being used in XSL and XQuery, it is very useful for DOM data access and extraction. XML documents and also HTML/XHTML documents are objects of DOM parsing while using XPath.

Pros and Cons of using Selenium WebDriver for Website Scraping

Since Selenium WebDriver is created for browser automation, it can be easily used for scraping data from the web. In this post we will consider some advantages and drawbacks of using WebDriver for web scraping.

What is Web Scraping?

Web scraping (a.k.a. web data mining, web data processing, web data extraction, web content extraction, web harvesting, web screen scraping, web crawling, web ripping, web content extraction, etc.) is a process of extracting useful information from the web.

How to scrape Yellow Pages with ScreenScraper Chrome Extension

Recently I was asked to help with the job of scraping company information from the Yellow Pages website using the ScreenScraper Chrome Extension. After working with this simple scraper, I decided to create a tutorial on how to use this Google Chrome Extension for scraping pages similar to this one. Hopefully, it will be useful to many of you.

Cookie browser-server workflow

Dexi Pipes

dexi-pipes-logoToday I want to share my experience with Dexi Pipes. Pipes is a new kind of robot introduced by Dexi.io to integrate web data extraction and web data processing into a single seamless workflow. The main focus of the testing is to show how Dexi might leverage multi-threaded jobs for extraction of data from a retail website.
NB Pipes robots are available starting from PROFESSIONAL plans.