Categories
Web Scraping Software

HttpWatch Review

HttpWatch is a developer’s tool that captures a wide range of HTTP-related data. It lets users see exactly which HTTP traffic is triggered when a web page is accessed. The program integrates with Internet Explorer and Firefox as a plugin, and it works well for showing page load behavior and performance for any website. The full list of the network analysis tools is available here.

Categories
Development

HTTP: Hypertext Transfer Protocol (2)

This post continues the first post on HTTP, the Hypertext Transfer Protocol.

Categories
Development

HTTP: Hypertext Transfer Protocol (1)

HTTP is a client-server, text-based data transfer protocol used in a wide range of applications.
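To illustrate what "text-based" means in practice, here is a minimal Python sketch that builds a raw HTTP/1.1 GET request and splits a raw response into status code, headers and body. It opens no network connection; the host `example.com` and the sample response are illustrative assumptions, and a real client such as the standard library's `http.client` handles far more (chunked transfer encoding, redirects, malformed input).

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, target, version
        f"Host: {host}",         # the Host header is mandatory in HTTP/1.1
        "Connection: close",
        "",                      # an empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split(" ", 2)[1])  # e.g. "HTTP/1.1 200 OK"
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return status_code, headers, body


print(build_get_request("example.com"))
sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello"
status, headers, body = parse_response(sample)
print(status, headers["content-type"], body)  # 200 text/html b'hello'
```

Both messages are plain lines of text separated by CRLF, with a blank line marking the end of the headers; that is the property tools like HttpWatch display when they capture traffic.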

Categories
Web Scraping Software

Microsys A1 Website Scraper Review

The A1 scraper by Microsys is a program mainly used to scrape websites and extract data in large quantities for later use in web services. The scraper extracts text, URLs, etc. using multiple Regexes and saves the output to a CSV file. This tool can be compared with other web harvesting and web scraping services.
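The regex-to-CSV workflow described above can be sketched in a few lines of Python. This is not A1's implementation: the two patterns, the field names and the sample page are hypothetical, and regexes are only a rough fit for arbitrary HTML.

```python
import csv
import io
import re

# Hypothetical extraction rules: one regex per output field.
PATTERNS = {
    "title": re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL),
    "url": re.compile(r'href="(https?://[^"]+)"', re.IGNORECASE),
}


def extract(html: str) -> list[tuple[str, str]]:
    """Apply every pattern to the page and collect (field, value) rows."""
    rows = []
    for field, pattern in PATTERNS.items():
        for match in pattern.findall(html):
            rows.append((field, match.strip()))
    return rows


def to_csv(rows: list[tuple[str, str]], stream) -> None:
    """Write the rows, with a header line, to any text stream."""
    writer = csv.writer(stream)
    writer.writerow(["field", "value"])
    writer.writerows(rows)


page = ('<html><title>Demo</title>'
        '<a href="https://example.com/a">A</a>'
        '<a href="https://example.com/b">B</a></html>')
buffer = io.StringIO()  # swap for open("output.csv", "w", newline="") to write a file
to_csv(extract(page), buffer)
print(buffer.getvalue())
```

Keeping one regex per field makes it easy to add or drop columns in the CSV without touching the extraction loop.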

Categories
Development

Email validation Regexes

Now we want to review some email validation Regexes. We’ve chosen Regexes based on readability, complexity and relevance to the RFC standards. For online Regex testing tools, refer here.
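As a reference point for the trade-off between readability and RFC coverage, here is one simple pattern in Python. It is an illustrative assumption, not one of the Regexes reviewed in the post, and it deliberately ignores much of the RFC 5322 address grammar.

```python
import re

# A deliberately readable pattern (illustrative assumption):
#   local part: letters, digits and . _ % + -
#   domain: one or more dot-terminated labels, then a TLD of 2+ letters.
# Not covered: quoted local parts, IP-literal domains, internationalized
# addresses. It also accepts some invalid forms, e.g. "a..b@example.com".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def looks_like_email(address: str) -> bool:
    """Return True if the whole string matches the simple pattern."""
    return EMAIL_RE.fullmatch(address) is not None


for candidate in ["user.name+tag@example.co.uk", "user@localhost", "no-at-sign.example.com"]:
    print(candidate, looks_like_email(candidate))
```

Using `fullmatch` avoids the common mistake of accepting strings that merely contain an address somewhere inside them.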

Categories
Development, Web Scraping Software

7+ Best JSON Viewers

In this post we share JSON viewers, both online tools and plugins for browsers and the Notepad++ editor.

Categories
Uncategorized

How to raise an alarm when your site is illegally scraped

Have you found your site scraped and your online content infringed? Perhaps you’ve warned the content abuser and received no response, or only excuses. And after Google indexes everything, your original content no longer stands out in search results from the heap of similar, stolen material. What can you do to raise an alarm and enforce some consequences, or even punishment?

Categories
Development

Exception handling in PHP scrapers

Suppose we want to set a single exception handler function for all exceptions in a scraper program. Such a handler can also serve a multi-level program. Here is how it works in PHP.
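In PHP the single handler is registered with `set_exception_handler()`. To keep all new examples in one language, here is the same idea sketched in Python, whose closest counterpart is assigning a function to `sys.excepthook`. The functions `fetch_page` and `crawl` are hypothetical stand-ins for the levels of a scraper.

```python
import sys
import traceback


def describe(exc_type, exc_value) -> str:
    """Build the one-line message the global handler logs."""
    return f"Scraper stopped: {exc_type.__name__}: {exc_value}"


def scraper_exception_handler(exc_type, exc_value, exc_tb):
    """Single top-level handler for any exception that no code caught."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl+C fall back to Python's default behaviour.
        sys.__excepthook__(exc_type, exc_value, exc_tb)
        return
    print(describe(exc_type, exc_value), file=sys.stderr)
    traceback.print_tb(exc_tb, file=sys.stderr)


# Installed once at start-up, it covers every level of the program.
sys.excepthook = scraper_exception_handler


def fetch_page(url: str) -> str:
    """Hypothetical low-level step that fails."""
    raise ConnectionError(f"could not fetch {url}")


def crawl(urls: list[str]) -> None:
    """Hypothetical top-level step with no try/except of its own."""
    for url in urls:
        fetch_page(url)
```

If a script calls `crawl(["https://example.com"])` at the top level, the `ConnectionError` raised two levels down reaches `scraper_exception_handler` with no `try`/`except` in between. As with PHP's handler, execution stops after the handler runs, so it serves as a last-resort logger rather than a way to resume scraping.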

Categories
Data Mining

Distributed File System Implementations and MapReduce strategy

In the previous post we introduced the MapReduce style of distributed computation for data analysis on computing clusters. Here we look more closely at how this strategy is implemented on distributed hardware.
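For readers new to the model, the classic word-count example shows its three stages (map, shuffle/group by key, reduce) in a single Python process. This is a teaching sketch only: it neither distributes work across nodes nor uses a distributed file system, which is exactly what frameworks such as Hadoop add.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def map_reduce(documents):
    # On a cluster, each split would be mapped on a different node,
    # and each key group would be sent to one reducer.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["the quick brown fox", "the lazy dog", "The fox"]
print(map_reduce(splits))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

Because the mapper and reducer only see their own split or key group, they can run independently; the shuffle is the one step that needs data movement between machines.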

Categories
Review

Inspyder Power Search Review

Inspyder Power Search is a crawling and scraping application aimed at straightforward scraping, using both XPath and Regex. The program has a simple, clean interface that makes it easy to learn and use.
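To illustrate the difference between the two extraction methods, here is a small Python sketch that pulls product names with an XPath expression and prices with a regex. It does not reflect Inspyder's internals; the page markup and class names are hypothetical, and the standard library's `xml.etree.ElementTree` understands only a subset of XPath and requires well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical product listing; real pages are often not well-formed XML,
# which is why dedicated tools use forgiving HTML parsers.
PAGE = """<html><body>
<div class="product"><span class="name">Widget</span><span class="price">$9.99</span></div>
<div class="product"><span class="name">Gadget</span><span class="price">$24.50</span></div>
</body></html>"""


def names_by_xpath(page: str) -> list[str]:
    """Select elements by document structure and attributes."""
    root = ET.fromstring(page)
    # ElementTree supports attribute predicates such as [@class='name'].
    return [el.text for el in root.findall(".//span[@class='name']")]


def prices_by_regex(page: str) -> list[float]:
    """Match raw text patterns, ignoring document structure."""
    return [float(p) for p in re.findall(r"\$(\d+(?:\.\d{2})?)", page)]


print(list(zip(names_by_xpath(PAGE), prices_by_regex(PAGE))))
```

XPath is robust to formatting changes as long as the element structure stays the same, while a regex keeps working on malformed markup but breaks more easily when the text layout changes.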

Inspyder is designed for multiple purposes: