Example of Scraping with Selenium WebDriver in C#

In this article I will show you how easy it is to scrape a website using Selenium WebDriver. I will walk you through a sample project written in C# that uses WebDriver together with the Chrome browser to log in to a test page and scrape text from the private area of the website.

Downloading the WebDriver

First of all we need to get the latest version of Selenium Client & WebDriver Language Bindings and the Chrome Driver. Of course, you can download WebDriver bindings for any language (Java, C#, Python, Ruby), but within the scope of this sample project I will use the C# binding only. In the same manner, you can use any browser driver, but here I will use Chrome.

After downloading the libraries and the browser driver, we need to include them in our Visual Studio solution.

Creating the scraping program

In order to use the WebDriver in our program we need to add its namespaces:

using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

Then, in the main function, we need to initialize the Chrome Driver:

using (var driver = new ChromeDriver())

This piece of code searches for the chromedriver.exe file. If the file is located in a directory other than the one our program runs from, we need to explicitly specify its location in the ChromeDriver constructor.
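Here is a minimal sketch of passing the driver location explicitly. It assumes chromedriver.exe lives in a "drivers" subfolder next to the executable; both the folder name and the DriverLocator helper are hypothetical and not part of the original project. The sketch relies on the ChromeDriver constructor overload that takes the directory containing chromedriver.exe:

```csharp
using System;
using System.IO;
using OpenQA.Selenium.Chrome;

static class DriverLocator
{
    // Hypothetical helper: builds the directory that holds chromedriver.exe.
    // The "drivers" folder name is an assumption made for this sketch.
    public static string GetDriverDirectory(string baseDirectory)
    {
        return Path.Combine(baseDirectory, "drivers");
    }
}

class Program
{
    static void Main()
    {
        string driverDirectory =
            DriverLocator.GetDriverDirectory(AppDomain.CurrentDomain.BaseDirectory);

        // Pass the directory that contains chromedriver.exe, not the file itself.
        using (var driver = new ChromeDriver(driverDirectory))
        {
            Console.WriteLine("Chrome started using the driver in " + driverDirectory);
        }
    }
}
```

Keeping the path logic in a small helper means the location can later come from a configuration file without touching the scraping code.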

When an instance of ChromeDriver is created, a new Chrome browser will be started. Now we can control this browser via the driver variable. Let’s navigate to the target URL first:

driver.Navigate().GoToUrl("http://testing-ground.webscraping.pro/login");

Then we can find the web page elements we need to log in to the private area of the website:

var userNameField = driver.FindElement(By.Id("usr"));
var userPasswordField = driver.FindElement(By.Id("pwd"));
var loginButton = driver.FindElement(By.XPath("//input[@value='Login']"));

Here we search for the user name and password fields and the login button and store them in the corresponding variables. Once we have found them, we can type in the user name and password and click the login button:

userNameField.SendKeys("admin");
userPasswordField.SendKeys("12345");
loginButton.Click();

At this point the new page is loaded into the browser. Once it is done, we can scrape the text we need and save it to a file:

var result = driver.FindElement(By.XPath("//div[@id='case_login']/h3")).Text;
File.WriteAllText("result.txt", result);
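The lookup above runs immediately after the click, so on a slow connection the heading may not exist yet. Here is a minimal sketch of an explicit wait using WebDriverWait from the OpenQA.Selenium.Support.UI namespace. The WaitHelpers class, the WaitForText method name and the 10-second timeout in the usage line are assumptions chosen for illustration, not part of the original project:

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

static class WaitHelpers
{
    // Hypothetical helper: polls until the element matched by the XPath
    // can be found, then returns its text. If the element does not appear
    // within the timeout, WebDriverWait throws WebDriverTimeoutException.
    public static string WaitForText(IWebDriver driver, string xpath, TimeSpan timeout)
    {
        var wait = new WebDriverWait(driver, timeout);

        // WebDriverWait ignores "element not found" errors while polling,
        // so the lambda is retried until FindElement succeeds.
        IWebElement element = wait.Until(d => d.FindElement(By.XPath(xpath)));
        return element.Text;
    }
}
```

With this helper in place, the scraping line could be written as var result = WaitHelpers.WaitForText(driver, "//div[@id='case_login']/h3", TimeSpan.FromSeconds(10));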

That’s it! Finally, as a bonus, here is how to save a screenshot of the current page to a file (ImageFormat comes from the System.Drawing.Imaging namespace):

driver.GetScreenshot().SaveAsFile(@"screen.png", ImageFormat.Png);

The complete program listing

using System.Drawing.Imaging; // ImageFormat, used when saving the screenshot
using System.IO;
using System.Text;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

namespace WebDriverTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize the Chrome Driver
            using (var driver = new ChromeDriver())
            {
                // Go to the login page
                driver.Navigate().GoToUrl("http://testing-ground.webscraping.pro/login");

                // Get the page elements
                var userNameField = driver.FindElement(By.Id("usr"));
                var userPasswordField = driver.FindElement(By.Id("pwd"));
                var loginButton = driver.FindElement(By.XPath("//input[@value='Login']"));

                // Type user name and password
                userNameField.SendKeys("admin");
                userPasswordField.SendKeys("12345");

                // and click the login button
                loginButton.Click();

                // Extract the text and save it into result.txt
                var result = driver.FindElement(By.XPath("//div[@id='case_login']/h3")).Text;
                File.WriteAllText("result.txt", result);

                // Take a screenshot and save it into screen.png
                driver.GetScreenshot().SaveAsFile(@"screen.png", ImageFormat.Png);
            }
        }
    }
}

Get the whole project.

Conclusion

I hope you are impressed with how easy it is to scrape web pages using the WebDriver. You can press keys and click buttons just as you would when working with the browser yourself. You don’t even need to understand which HTTP requests are sent or which cookies are stored; the browser handles all of that for you. This makes the WebDriver a wonderful tool in the hands of a web scraping specialist.

19 replies on “Example of Scraping with Selenium WebDriver in C#”

Can you please help me?

While executing the above code I am getting the exception below:

System.InvalidOperationException occurred
Message=unknown error: unable to discover open pages
(Driver info: chromedriver=2.4.226107,platform=Windows NT 6.1 SP1 x86_64)
Source=WebDriver
StackTrace:
at OpenQA.Selenium.Remote.RemoteWebDriver.UnpackAndThrowOnError(Response errorResponse) in c:\Projects\WebDriver\trunk\dotnet\src\webdriver\Remote\RemoteWebDriver.cs:line 1015
InnerException:

Please help me

Great Example, but the http://testing-ground.webscraping.pro/login page seems to be unavailable.

Additionally, if you want to build this example from scratch using only NuGet packages, you’ll need the following:

Selenium Remote Control (RC)
Selenium WebDriver Support Classes
Selenium WebDriver
Selenium.WebDriver.ChromeDriver (contains chromedriver.exe, which needs to be in the release directory. It ships chromedriver.exe 2.20.0.0, and I’ve found other NuGet packages with different versions)
WebDriver-backed Selenium

I only expected to find out whether or not it was possible to screen scrape with Selenium. I came across your article and it exceeded my expectations! Thank you.

Looks Interesting!!

Here I have one scenario.
We have 10,000+ similar web pages.
We need to test all of the pages from a single page, so we have to create a web app.
We have created a web app that includes the Selenium DLLs and installed it in IIS. But when I try to run the tests, no browser opens (it’s breaking).

We can run the tests when we launch the web app locally from Visual Studio.
The web app is developed in ASP.NET MVC5.

Please suggest a solution for deploying to IIS and making it work.
