Categories
Challenge Development

Bypass GoDaddy Firewall thru VPN & browser automation

Recently we encountered a website that worked as usual, yet when composing and running scraping script/agent it has put up blocking measures.

In this post we’ll take a look at how the scraping process went and the measures we performed to overcome that.

A scraping agent

At first we composed the scraping agent with JAVA using JSoup library. At the test phase the agent worked fine to reach both category and product pages.

The firewall blockage

Then when we added all other necessary selectors for parsing information to the agent and ran it, a firewall stopped the scraping process, namely the agent got no more than 17 product pages. We couldn’t access the site at all. The return page was as follows:

I took a look at the GoDaddy Firewall service. The advanced firewall plan included the following (I’ve bolded the aspects pertaining to the web scraping):

  • Firewall prevents hackers.
  • DDoS protection, and Content Deliver Network (CDN) speed boost.
  • SSL certificate included in firewall.
  • Protects one website.
  • Malware scanning.
  • Unlimited site cleanups

No details are given of what particularly the firewall performs. As we’ve searched for CDN hints in the website code, we’ve found none of those.


VPN & Browser Automation

We turned to the VPN usage when the site access was restored. The free VPN account has provided only a limited access to the target site. 🙁
Additionally we have switched from mere HTTP GET requests (by JSoup) to the browser automation.

We reached a full site access with the scrape agent by using a paid VPN plan (shuffling servers several times) and having a delay of 1.5 seconds.

Recap

The modern hustings now leverage hosting firewalls, yet they basically do general protection, based on the multiple access from the same IP. Managing VPN and browser automation makes it possible to overcome that blockage and bring in successful data harvest.

Leave a Reply

Your email address will not be published. Required fields are marked *


The reCAPTCHA verification period has expired. Please reload the page.

This site uses Akismet to reduce spam. Learn how your comment data is processed.