Recently I published an article on how to solve a captcha in C# using the DeathByCaptcha service, and I promised to offer examples in other languages as well. In this post I’ll offer a Java project that does the same thing.
A Shortcut
You can download the project right away.
How it works
In short, this program uses Selenium WebDriver to get a CAPTCHA picture, sends it to the DeathByCaptcha service, receives the recognized text, types it in and gets to the secured page. As an example of a captcha-protected webpage, I use my Web Scraper Testing Ground.
Let’s have a tour of the code now.
1. Opening the Webpage
First we need to initialize the WebDriver and open the target webpage. Let’s use the Firefox driver for this:
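// Start Firefox and wait up to 1 second when looking up elements
FirefoxDriver driver = new FirefoxDriver();
driver.manage().timeouts().implicitlyWait(1, TimeUnit.SECONDS);
// Open the captcha-protected test page
driver.navigate().to("http://testing-ground.scraping.pro/captcha");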
2. Getting the Captcha Image
To get the image, we take a screenshot of the page and then cut the captcha out of it according to the element's dimensions and location. After that the image is written out in PNG format (into an in-memory stream in this snippet) for sending to the DeathByCaptcha service:
// Take a screenshot as PNG bytes and decode it into an image
byte[] arrScreen = driver.getScreenshotAs(OutputType.BYTES);
BufferedImage imageScreen = ImageIO.read(new ByteArrayInputStream(arrScreen));
// Find the captcha element and read its size and position
WebElement cap = driver.findElementById("captcha");
Dimension capDimension = cap.getSize();
Point capLocation = cap.getLocation();
// Cut the captcha out of the screenshot
BufferedImage imgCap = imageScreen.getSubimage(capLocation.x, capLocation.y, capDimension.width, capDimension.height);
// Encode the cropped image as PNG
ByteArrayOutputStream os = new ByteArrayOutputStream();
ImageIO.write(imgCap, "png", os);
You may ask why I use such a complicated solution, taking a screenshot and extracting the image from it. Why not download the image directly by its URL? The problem is that every time you request the image, the server returns a new, randomly generated CAPTCHA. To enter a valid code, you need the very image that was generated for the page on which you enter the code.
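One caveat about the cropping step: getSubimage() throws a RasterFormatException when the requested rectangle extends beyond the image, which can happen if the captcha element lies partly outside the area the screenshot covers. Here is a minimal sketch of a guard against that, using only the standard library. The class and method names (CaptchaCrop, cropClamped) are hypothetical and not part of the project, and clamping is just one possible way to handle the problem:

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

class CaptchaCrop {

    /**
     * Crops the region (x, y, width, height) out of the screenshot, first
     * clamping it to the screenshot bounds so that getSubimage() is never
     * asked for pixels outside the image.
     */
    static BufferedImage cropClamped(BufferedImage screen, int x, int y, int width, int height) {
        Rectangle bounds = new Rectangle(0, 0, screen.getWidth(), screen.getHeight());
        Rectangle wanted = new Rectangle(x, y, width, height);
        Rectangle visible = bounds.intersection(wanted);
        if (visible.isEmpty()) {
            // Nothing of the element is inside the screenshot at all
            throw new IllegalArgumentException(
                    "Element " + wanted + " lies outside the screenshot " + bounds);
        }
        return screen.getSubimage(visible.x, visible.y, visible.width, visible.height);
    }

    public static void main(String[] args) {
        // A blank 800x600 image stands in for a real screenshot
        BufferedImage screen = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        // The requested 200x100 region sticks out past the right and bottom edges
        BufferedImage cap = cropClamped(screen, 700, 550, 200, 100);
        System.out.println(cap.getWidth() + "x" + cap.getHeight()); // prints 100x50
    }
}
```

With the variables from the snippet above, the call would be cropClamped(imageScreen, capLocation.x, capLocation.y, capDimension.width, capDimension.height). Keep in mind that a clamped crop may cut off part of the captcha, so making sure the element is fully visible before taking the screenshot is probably still worthwhile.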
3. Requesting the DeathByCaptcha Service
Now that we have the captcha image extracted, we can send it to DeathByCaptcha for recognition. It takes a couple of lines of code:
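// Connect with your DeathByCaptcha account details
SocketClient client = new SocketClient("user", "password");
// Upload the PNG bytes and wait for the recognized text
Captcha res = client.decode(new ByteArrayInputStream(os.toByteArray()));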
Note that you need to replace “user” and “password” with your real DeathByCaptcha account details.
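Hard-coding the credentials is fine for a quick test, but you may prefer to keep them out of the source. Here is a minimal sketch, assuming the account details are stored in two environment variables. The variable names DBC_USERNAME and DBC_PASSWORD and the DbcCredentials class are my own invention for illustration, not something the DeathByCaptcha API defines:

```java
import java.util.Map;

class DbcCredentials {
    final String username;
    final String password;

    DbcCredentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /**
     * Reads the credentials from the given environment map.
     * DBC_USERNAME and DBC_PASSWORD are hypothetical variable names.
     */
    static DbcCredentials fromEnvironment(Map<String, String> env) {
        String user = env.get("DBC_USERNAME");
        String pass = env.get("DBC_PASSWORD");
        if (user == null || user.isEmpty() || pass == null || pass.isEmpty()) {
            throw new IllegalStateException("Set DBC_USERNAME and DBC_PASSWORD first");
        }
        return new DbcCredentials(user, pass);
    }

    public static void main(String[] args) {
        try {
            DbcCredentials creds = fromEnvironment(System.getenv());
            System.out.println("Loaded credentials for " + creds.username);
        } catch (IllegalStateException e) {
            // Missing variables: report the problem instead of connecting
            System.err.println(e.getMessage());
        }
    }
}
```

The values could then be passed along as new SocketClient(creds.username, creds.password).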
4. Typing the Recognized Captcha In
As soon as we get the response from DeathByCaptcha, we can type the recognized text into the form and submit it to reach the secured part of the page:
// Type the recognized text into the captcha field
driver.findElementByXPath("//input[@name='captcha_code']").sendKeys(res.text);
// Submit the form
driver.findElementByXPath("//input[@name='submit']").click();
That’s it! Note, though, that in these snippets I have omitted several minor details that are present in the whole project, but are not so important here.
Libraries
Here I’d like to briefly mention some crucial libraries, packages and classes used in the project:
- Selenium WebDriver for working with a webpage
- java.awt.image.BufferedImage for working with an image (extracting a part of it as a separate image)
- javax.imageio for writing the captcha image to disk
- DeathByCaptchaAPI for accessing the service to turn the captcha image into text
- java.util.logging.Logger for logging the whole process
9 replies on “An Example of Captcha Solver in Java”
A web scraper can get its IP blocked by the webmaster.
Could you please make a tutorial on how to change the IP with WebDriver/Selenium?
Done, look at examples at http://scraping.pro/change-webdrivers-ip-address/. They are in Python, but I think you can guess how to do the same in Java.
I just want to display a captcha from a website in my program to make a bot. Could you please give me some advice? (The captchas are from Google and Solve Media.) Thank you!
I was just curious if your java project is up to date. I’m having some problems adding it to a Maven project. Thanks!
We develop software in VB.NET/C# for extracting data from a website. The website added captcha characters that must be entered to receive detailed data, and through our software we have to enter the captcha characters one by one. Can you help us read the captcha image?
Thanks
Keep up the good work ! Bravo
The getScreenshotAs() method is missing in the driver class; I get a "no method" error. Please suggest where I can get this method.
Hi,
When I run the code, I am getting an exception:
Exception in thread "main" java.awt.image.RasterFormatException: (y + height) is outside of Raster
Me too. When I run the code, I am getting an exception:
java.awt.image.RasterFormatException: (y + height) is outside of Raster