Recently I published an article on how to solve a captcha in C# using the DeathByCaptcha service, and I promised to offer an example in other languages as well. In this post I’ll offer a Java project that does the same thing.
You can download the project right away.
How it works
In short, this program uses Selenium WebDriver to get a CAPTCHA picture, sends it to the DeathByCaptcha service, receives the recognized text, types it in and gets to the secured page. As an example of a captcha-protected webpage, I use my Web Scraper Testing Ground.
Let’s have a tour of the code now.
1. Opening the Webpage
First we need to initialize the WebDriver and open the target webpage. Let’s use the Firefox driver for this:
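    // Start Firefox and wait up to 1 second when looking up elements
    FirefoxDriver driver = new FirefoxDriver();
    driver.manage().timeouts().implicitlyWait(1, TimeUnit.SECONDS);
    // Open the captcha-protected test page
    driver.navigate().to("http://testing-ground.scraping.pro/captcha");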
2. Getting the Captcha Image
To get the image we take a screenshot of the page and then cut the captcha out of it according to the element's dimensions and location. After that the cropped image is encoded in PNG format so it can be sent to the DeathByCaptcha service:
    // Take a screenshot of the page and decode it into an image
    byte[] arrScreen = driver.getScreenshotAs(OutputType.BYTES);
    BufferedImage imageScreen = ImageIO.read(new ByteArrayInputStream(arrScreen));
    // Locate the captcha element and read its size and position
    WebElement cap = driver.findElementById("captcha");
    Dimension capDimension = cap.getSize();
    Point capLocation = cap.getLocation();
    // Cut the captcha out of the screenshot
    BufferedImage imgCap = imageScreen.getSubimage(capLocation.x, capLocation.y, capDimension.width, capDimension.height);
    // Encode the cropped image as PNG into an in-memory buffer
    ByteArrayOutputStream os = new ByteArrayOutputStream();
    ImageIO.write(imgCap, "png", os);
You may ask why I use such a complicated approach of taking a screenshot and extracting the image from it. Why not simply download the image by its URL? The problem is that every time you request the image, the server returns a new, randomly generated CAPTCHA. To enter a valid code, you need to use the very image that was generated for the page on which you enter the code.
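The cropping step can be tried in isolation, without a browser. The following is a minimal sketch that uses only the standard library: a blank synthetic image stands in for the real screenshot, and the class and method names (CaptchaCropper, clampToImage, cropToPng) are hypothetical and not part of the downloadable project. It also clamps the requested rectangle to the screenshot bounds, because BufferedImage.getSubimage throws a RasterFormatException when the rectangle reaches outside the image.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Hypothetical helper for illustration; not taken from the original project. */
final class CaptchaCropper {

    /** Intersects the element rectangle with the screenshot bounds. */
    static Rectangle clampToImage(BufferedImage screenshot, int x, int y, int width, int height) {
        Rectangle imageBounds = new Rectangle(0, 0, screenshot.getWidth(), screenshot.getHeight());
        Rectangle clamped = imageBounds.intersection(new Rectangle(x, y, width, height));
        if (clamped.isEmpty()) {
            throw new IllegalArgumentException("Element lies outside the screenshot");
        }
        return clamped;
    }

    /** Cuts the (clamped) rectangle out of the screenshot and encodes it as PNG bytes. */
    static byte[] cropToPng(BufferedImage screenshot, int x, int y, int width, int height) {
        Rectangle r = clampToImage(screenshot, x, y, width, height);
        BufferedImage crop = screenshot.getSubimage(r.x, r.y, r.width, r.height);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(crop, "png", out)) {
                throw new IllegalStateException("No PNG writer available");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A blank 800x600 image stands in for the real page screenshot.
        BufferedImage fakeScreenshot = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        // The requested rectangle overlaps the bottom-right edge and gets clamped.
        byte[] png = cropToPng(fakeScreenshot, 700, 550, 200, 100);
        System.out.println("PNG bytes: " + png.length);
    }
}
```

In the real program the x, y, width and height values would come from the captcha element's location and size, as in the snippet above.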
3. Requesting the DeathByCaptcha Service
Now that we have the captcha image extracted, we can send it to DeathByCaptcha for recognition. It’s done in a couple of lines of code:
    // Log in to DeathByCaptcha and upload the PNG bytes for recognition
    SocketClient client = new SocketClient("user", "password");
    Captcha res = client.decode(new ByteArrayInputStream(os.toByteArray()));
Note that you need to replace “user” and “password” with your real DeathByCaptcha account details.
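If you would rather not hard-code the account details in the source, one option is to read them from environment variables. The sketch below is only an illustration under assumptions: the DbcCredentials class and the variable names DBC_USERNAME and DBC_PASSWORD are hypothetical and do not come from the project or from the DeathByCaptcha API.

```java
import java.util.Map;

/** Hypothetical holder for the account details; not part of the original project. */
final class DbcCredentials {
    final String username;
    final String password;

    DbcCredentials(String username, String password) {
        this.username = username;
        this.password = password;
    }

    /** Reads both values from the given environment map and fails fast if one is missing. */
    static DbcCredentials fromEnvironment(Map<String, String> env) {
        return new DbcCredentials(required(env, "DBC_USERNAME"), required(env, "DBC_PASSWORD"));
    }

    private static String required(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            DbcCredentials c = fromEnvironment(System.getenv());
            // c.username and c.password would replace "user" and "password" in the client constructor.
            System.out.println("Loaded DeathByCaptcha credentials for user " + c.username);
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Passing the environment as a map keeps the lookup easy to test, since a plain HashMap can stand in for System.getenv().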
4. Typing the Recognized Captcha In
As soon as we get the response from DeathByCaptcha, we can type the recognized text into the page and access the secured part.
That’s it! Note, though, that in these snippets I have omitted several minor details that are present in the whole project, but are not so important here.
Here I’d like to briefly mention some crucial libraries, packages and classes used in the project:
- Selenium Webdriver for working with a webpage
- java.awt.image.BufferedImage for working with an image (extracting a part of it as a separate image)
- javax.imageio for writing the captcha image to disk
- DeathByCaptchaAPI for accessing the service that turns the captcha image into text
- java.util.logging.Logger for logging the whole process