We have already shown an example of using WebDriver with C#. In this post we will see how to extract web data using Selenium WebDriver with Java, the native language of Selenium WebDriver.
Selenium is an open source tool for web automation. It provides APIs through which we can perform user events programmatically. For more information about Selenium read here.
To extract data using Selenium you will need to install the following tools and libraries on your computer. If you already have them, you can skip the following two sections.
1. Download Selenium WebDriver
From http://www.seleniumhq.org/download/ you need to download the following libraries and extract them somewhere on your computer:
- Selenium server (http://selenium.googlecode.com/files/selenium-server-standalone-2.38.0.jar)
- Selenium client & WebDriver language binding for Java (http://selenium.googlecode.com/files/selenium-java-2.38.0.zip)
2. Install Eclipse Java IDE and configure Selenium WebDriver
- Open http://www.eclipse.org/downloads/packages/eclipse-ide-java-developers/keplersr1 and download the ZIP specific to your operating system
- Extract the ZIP file to some location and create a shortcut to “eclipse.exe”
Create the Java project and configure Selenium:
- Open the Eclipse IDE from the shortcut and create a new Java project by navigating to File > New > Java Project
- Provide the project name and click the “Finish” button
Then you need to provide paths to the libraries you recently downloaded:
- Right click on the project we created and go to Build Path > Configure Build Path
- Open the Libraries tab, click “Add External JARs...” and select the JAR files mentioned above
- If you have done everything correctly, you will see those two JAR files under Referenced Libraries in the project
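If you want to confirm the build path from code rather than by eye, a small check like the one below can help. It is only a sketch: the class name ClasspathCheck and its method are hypothetical helpers, not part of Selenium, and the two class names it looks for are the ones imported by the scraper in this post.

```java
// Hypothetical helper (not part of Selenium or the original tutorial):
// reports whether the Selenium classes used in this post can be found.
final class ClasspathCheck {

    /** Returns true if the named class can be located on the class path. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.openqa.selenium.WebDriver",
            "org.openqa.selenium.firefox.FirefoxDriver"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```

Run it as a Java application inside the same project. A MISSING line suggests the JARs were not added to the build path.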
Now we are ready to start writing the program.
3. Write the Scraper
I’ll show you a program that does the following:
- Opens Firefox
- Goes to http://testing-ground.scraping.pro/login
- Submits the form using a username and password
- Extracts the message text and saves it to the status.txt file
- Takes a screenshot of the website and saves it to the screenshot.png file
- Closes Firefox
Here is the complete code of the program. Below it you can find an explanation of how it works.
    import java.io.*;

    import org.apache.commons.io.FileUtils;
    import org.openqa.selenium.By;
    import org.openqa.selenium.OutputType;
    import org.openqa.selenium.TakesScreenshot;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.firefox.FirefoxDriver;

    public class WebScrapper {

        public WebDriver driver = new FirefoxDriver();

        /**
         * Opens the test website.
         */
        public void openTestSite() {
            driver.navigate().to("http://testing-ground.scraping.pro/login");
        }

        /**
         * Logs in to the website by entering the provided username and
         * password.
         *
         * @param username
         * @param Password
         */
        public void login(String username, String Password) {
            WebElement userName_editbox = driver.findElement(By.id("usr"));
            WebElement password_editbox = driver.findElement(By.id("pwd"));
            WebElement submit_button = driver.findElement(By.xpath("//input[@value='Login']"));

            userName_editbox.sendKeys(username);
            password_editbox.sendKeys(Password);
            submit_button.click();
        }

        /**
         * Grabs the status text and saves it to the status.txt file.
         *
         * @throws IOException
         */
        public void getText() throws IOException {
            String text = driver.findElement(By.xpath("//div[@id='case_login']/h3")).getText();
            Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("status.txt"), "utf-8"));
            writer.write(text);
            writer.close();
        }

        /**
         * Saves the screenshot.
         *
         * @throws IOException
         */
        public void saveScreenshot() throws IOException {
            File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            FileUtils.copyFile(scrFile, new File("screenshot.png"));
        }

        public void closeBrowser() {
            driver.close();
        }

        public static void main(String[] args) throws IOException {
            WebScrapper webSrcapper = new WebScrapper();
            webSrcapper.openTestSite();
            webSrcapper.login("admin", "12345");
            webSrcapper.getText();
            webSrcapper.saveScreenshot();
            webSrcapper.closeBrowser();
        }
    }
4. How it works
In this section I will explain the purpose of all the functions in the code.
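openTestSite() navigates the browser to the test website. Firefox itself is launched slightly earlier, when the WebScrapper object is created and its driver field is initialized with new FirefoxDriver():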
    public void openTestSite() {
        driver.navigate().to("http://testing-ground.scraping.pro/login");
    }
login() enters the provided username and password into the corresponding fields and submits the form. It finds the two input fields by their HTML ids and the submit button by an XPath expression, sends the characters to the fields, and clicks the button:
    public void login(String username, String Password) {
        WebElement userName_editbox = driver.findElement(By.id("usr"));
        WebElement password_editbox = driver.findElement(By.id("pwd"));
        WebElement submit_button = driver.findElement(By.xpath("//input[@value='Login']"));

        userName_editbox.sendKeys(username);
        password_editbox.sendKeys(Password);
        submit_button.click();
    }
getText() grabs the message that appears after login and saves it to the status.txt file:
    public void getText() throws IOException {
        String text = driver.findElement(By.xpath("//div[@id='case_login']/h3")).getText();
        Writer writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("status.txt"), "utf-8"));
        writer.write(text);
        writer.close();
    }
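One weakness of the version above is that the writer stays open if write() throws. The file-saving part could be sketched with try-with-resources instead, assuming Java 7 or later. The StatusFile class and its save method are hypothetical names introduced for illustration, and the text written in main is a placeholder, not the real message from the test site.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not from the original post): writes text as UTF-8
// and closes the writer even when write() fails.
final class StatusFile {

    static void save(String text, Path target) throws IOException {
        // try-with-resources closes the writer automatically
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder text; in the scraper this would be the string
        // returned by driver.findElement(...).getText().
        save("example status message", Paths.get("status.txt"));
    }
}
```

Inside getText() the call would then be StatusFile.save(text, Paths.get("status.txt")).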
saveScreenshot() takes a screenshot of the web page and saves it to the screenshot.png file:
    public void saveScreenshot() throws IOException {
        File scrFile = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(scrFile, new File("screenshot.png"));
    }
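FileUtils comes from Apache Commons IO, not from the JDK. If that library is not on your build path, the copy step can be sketched with the standard java.nio.file API instead, assuming Java 7 or later. ScreenshotCopy and copyTo are hypothetical names used only for this example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not from the original post): copies the temporary
// screenshot file to a target path using only the JDK.
final class ScreenshotCopy {

    static void copyTo(File source, String targetName) throws IOException {
        // REPLACE_EXISTING overwrites a screenshot left over from an earlier
        // run; without it Files.copy throws FileAlreadyExistsException.
        Files.copy(source.toPath(), Paths.get(targetName),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

In saveScreenshot() the second line would then become ScreenshotCopy.copyTo(scrFile, "screenshot.png").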
closeBrowser() closes the current Firefox window (calling driver.quit() instead would also end the whole WebDriver session):
    public void closeBrowser() {
        driver.close();
    }
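Note that main() only reaches closeBrowser() if every earlier step succeeds. If, say, findElement throws because an element is missing, the browser window stays open. A common remedy is try/finally. The sketch below shows the pattern with plain Runnable objects so that it runs without a browser. The Cleanup class and runThenCleanup are hypothetical names, and it assumes Java 8 for the lambda syntax.

```java
// Hypothetical helper (not from the original post) illustrating
// "always close the browser" with try/finally.
final class Cleanup {

    /** Runs the steps, then always runs the cleanup, even if a step throws. */
    static void runThenCleanup(Runnable steps, Runnable cleanup) {
        try {
            steps.run();
        } finally {
            cleanup.run();
        }
    }

    public static void main(String[] args) {
        // With the tutorial's class this could look roughly like:
        //   WebScrapper s = new WebScrapper();
        //   runThenCleanup(() -> { s.openTestSite(); s.login("admin", "12345"); },
        //                  s::closeBrowser);
        // getText() and saveScreenshot() throw IOException, so inside a
        // Runnable they would need their own try/catch.
        runThenCleanup(
                () -> System.out.println("scraping steps run here"),
                () -> System.out.println("browser closed"));
    }
}
```

The same effect can be had by writing the try/finally directly in main(), which avoids the IOException wrapping.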
5. Run the program
To run the program, select your project and choose Run > Run As > Java Application. The program opens the Firefox browser, and execution finishes once the browser is closed. To check the screenshot and the text file, right-click the project in Eclipse and click “Refresh”. Inside the project folder you will see one text file and one PNG file.
That’s it. If you have any comments or questions feel free to ask! For a real-world example of scraping with WebDriver in Java look at this article.
12 replies on “How to use Selenium WebDriver with Java”
This tutorial is really helpful and fruitful
How can WebDriver manage cookies while scraping sites?
WebDriver simply controls the web browser (Firefox in our case), and the web browser, in turn, takes care of the cookies.
Thanks a lot! Could you provide the logic? I have to test all the links (the links on all pages) of a website after logging in using WebDriver.
I guess, you need to find all links in the page and click them, for example:
    for (WebElement link : driver.findElements(By.tagName("a"))) link.click();
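One caveat with the loop above: the first click usually navigates away from the page, after which the remaining WebElement objects are likely to be stale. A common workaround is to read every href first and then visit the URLs one by one. The sketch below covers the bookkeeping part with the JDK only. LinkFilter and sameHostLinks are hypothetical names, and the Selenium calls in the comments assume that getAttribute("href") returns absolute URLs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not from the original post): keeps each link once
// and only if it points at the given host.
final class LinkFilter {

    static List<String> sameHostLinks(List<String> hrefs, String host) {
        Set<String> unique = new LinkedHashSet<>(); // preserves page order
        for (String href : hrefs) {
            if (href == null || href.isEmpty()) {
                continue; // anchors without an href
            }
            try {
                // getHost() is null for javascript: and mailto: links
                if (host.equalsIgnoreCase(URI.create(href).getHost())) {
                    unique.add(href);
                }
            } catch (IllegalArgumentException e) {
                // malformed URL: skip it
            }
        }
        return new ArrayList<>(unique);
    }
}

// Possible use with WebDriver (untested sketch):
//   List<String> hrefs = new ArrayList<>();
//   for (WebElement link : driver.findElements(By.tagName("a")))
//       hrefs.add(link.getAttribute("href"));
//   for (String url : LinkFilter.sameHostLinks(hrefs, "testing-ground.scraping.pro"))
//       driver.get(url);
```

Filtering by host keeps the crawl on the site being tested. Drop that check if external links should be visited too.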
Compilation error at “userName_editbox.sendKeys(username);”
CharSequence cannot be resolved
This is a known bug if you are using Eclipse Indigo with Java 8. Upgrade Eclipse to a newer version.
Your code will do the scraping, I accept.
But, for web scraping, there are other frameworks based on Java like HTTPUnit and HTMLUnit.
Selenium, if used purely for automation testing, has more advantages than when it is used for web scraping.
Just my 2 cents.
This was very helpful. Thank you!
This was a very helpful tutorial. Thank you!
Precise and very useful tutorial. It really helped me get started quickly on a web scraping tool using Selenium. Thanks!
What is the WebScrapper class? It is showing an error.