Categories
Development

Example of Scraping with Selenium WebDriver in C#

In this article I will show you how easy it is to scrape a website using Selenium WebDriver. I will guide you through a sample project written in C# that uses WebDriver together with the Chrome browser to log in to a test page and scrape text from the private area of the website.

Categories
Web Scraping Software

Offline Browsing with Web2Disk

In the good old days, when dial-up was almost the only way to reach the riches of the Web and most people had to pay for every minute they spent online, offline browsers were very popular. Even today you may find such software useful, especially if you need to access a website from a place with a limited internet connection (e.g. on a plane). In this article I will tell you about a wonderful piece of software called Web2Disk, which can pack a website onto a CD for you to take wherever you go.

Categories
Uncategorized

What is Selenium WebDriver?

If you are interested in browser automation or web application testing, you may have already heard of Selenium. Since there is a lot of terminology related to this framework, it is easy to get lost, especially if you are coming to Selenium for the first time. In this article I want to save the day by providing a short and clear explanation of what is what in the Selenium project.

What is Selenium?

Selenium is a web application testing framework that allows you to write tests in many programming languages, such as Java, C#, Groovy, Perl, PHP, Python and Ruby. Selenium runs on Windows, Linux and macOS.

It is an open-source project, released under the Apache 2.0 license, so you can download and use it without charge.

Categories
Web Scraping Software

Import•io: the First Impression

There are two extreme approaches to building a web scraper: make it highly flexible and customizable but understandable only to IT gurus, or make it nice, simple and handy but limited in usage. Most scraping software developers try to find a golden mean between these two approaches. In this article I want to introduce you to a relatively new startup, import•io, which claims that anyone can scrape any data regardless of his or her IT skills.

Categories
Development

SQL Dump Splitter

This nice free program saved the day when I had to transfer a big database from one MySQL server to another. If you need to restore a big database using phpMyAdmin, you will find it useful as well.

Categories
Web Scraping Software

How to use a proxy in Visual Web Ripper

As we mentioned before, it’s often necessary to use a proxy server when you gather information from the web. In this tutorial I’ll show you how to configure Visual Web Ripper to route its web requests through proxy servers.

Categories
Web Scraping Software

Does HideMyAss’s “Scheduled IP Change” feature really work?

The HideMyAss proxy service has a wonderful feature called “Scheduled IP Change” that automatically changes your IP address at set time intervals. This may help you greatly if you are trying to scrape a website that may block the IP address you use for scraping. But does this feature work as well as advertised? Recently we received the following testimonial from one of our visitors:

Categories
Development

Using Regex Lookaround for HTML element extraction

Yes, I’m aware that using regex for HTML parsing is not the best idea. Still, when I need to quickly extract a small portion of a web page, I find myself applying regex more often than executing an XPath query, and its lookahead and lookbehind constructs can be quite helpful.
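As a rough illustration of the idea, here is a minimal Python sketch that pulls the text of one element out of a page using a lookbehind and a lookahead. The sample markup, the `total` id and the `extract_total` helper are hypothetical and chosen only for this example. Note that Python's `re` module requires the lookbehind to be fixed-width, which holds for a literal opening tag.

```python
import re
from typing import Optional

# Hypothetical page fragment used only for this illustration.
SAMPLE_HTML = '<div class="price"><span id="total">42.50</span></div>'

# (?<=...) is a lookbehind: the match must be preceded by the opening tag.
# (?=...)  is a lookahead: the match must be followed by the closing tag.
# Neither tag becomes part of the match itself.
TOTAL_PATTERN = re.compile(r'(?<=<span id="total">).*?(?=</span>)')


def extract_total(page: str) -> Optional[str]:
    """Return the text inside <span id="total">, or None if it is absent."""
    match = TOTAL_PATTERN.search(page)
    return match.group(0) if match else None


print(extract_total(SAMPLE_HTML))
```

Because the tags sit inside lookaround groups, `match.group(0)` already holds only the element text, so no capture group or post-processing is needed. This works for small, predictable fragments; for anything nested or irregular, a real HTML parser remains the safer choice.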

Categories
Uncategorized

About Proxy Servers

It’s frequently necessary to hide your actual IP address when doing web scraping or, alternatively, to access a website from different countries. That’s why we have anonymizers, also called anonymous proxies. These days, it is possible to find an abundance of proxy software and services. The following is a general summary of proxy fundamentals:

Categories
Web Scraping Software

CyberGhost: a nice freemium proxy

As you scrape information from websites, it’s often necessary to keep your real IP hidden, quickly change your IP or simply access a website from a country that differs from your own. All these tasks are achieved by means of proxies, intermediaries between you and the target website. Though there are plenty of companies offering such services on the market today, in this post I’ll introduce you to CyberGhost, an affordable and nice-looking proxy.
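To make the idea of a proxy as an intermediary more concrete, here is a minimal Python sketch that sends requests through a proxy using only the standard library. It is a generic illustration, not specific to CyberGhost: the proxy address `http://127.0.0.1:8080` and the `build_proxy_opener` helper are assumptions made up for this example, so substitute the address your own proxy service gives you.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({
        "http": proxy_url,
        "https": proxy_url,
    })
    return urllib.request.build_opener(handler)


# Hypothetical local proxy address, used only for illustration.
opener = build_proxy_opener("http://127.0.0.1:8080")

# With a proxy actually listening on that address, the target site would
# see the proxy's IP instead of yours:
# response = opener.open("http://example.com/", timeout=10)
# print(response.status)
```

The request line is left commented out because it only succeeds when a proxy is really running at the given address.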