
Experience with Cloudflare bypass

As of March 2024, anti-bot systems are widely used to protect web data. Some of them, along with their characteristics and bypass methods, can be seen here. If you are interested, take a look at the table of bot-protected websites. In this post we share our real-world experience of getting past Cloudflare protection.

Cloudflare is a multifunctional platform that includes a CDN, firewalls, application security, bot management and more. When we ran into it on swappie.com and airwise.com, we tried several ways to bypass it, with little success at first.

Local execution: successful bypass in headful mode

We managed to get past the Cloudflare protection page when we used a residential proxy* together with a browser in headful mode (both Chrome driven by Selenium and Chromium driven by Playwright). However, our scraper had to run on a Linux server, where we assumed the browser could only work in headless mode.

*We used Ultra Residential Proxies from MarsProxies.
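The post does not include its scraper code, so the following is only a minimal sketch of the headful-plus-proxy setup, not the authors' code. It assumes Playwright's Python sync API; the helper names `proxy_settings` and `fetch_headful` are hypothetical, and the proxy address and credentials you pass in stand for whatever your residential proxy provider gives you.

```python
from typing import Optional

# Hypothetical helpers illustrating a headful browser behind a proxy;
# not the authors' original scraper.


def proxy_settings(server: str, username: Optional[str] = None,
                   password: Optional[str] = None) -> dict:
    """Build the proxy dict accepted by Playwright's launch(proxy=...)."""
    settings = {"server": server}
    if username is not None:
        settings["username"] = username
    if password is not None:
        settings["password"] = password
    return settings


def fetch_headful(url: str, proxy: dict) -> str:
    """Open `url` in a visible Chromium window through `proxy`, return the HTML.

    Playwright is imported here so proxy_settings() works without it installed.
    """
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # headless=False opens a real browser window, so a display is required
        # (a desktop session locally, or a virtual display on a server).
        browser = p.chromium.launch(headless=False, proxy=proxy)
        try:
            page = browser.new_page()
            # Wait until network activity settles; a challenge page may
            # still need extra time before the real content appears.
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()
```

A call might look like `fetch_headful("https://example.com", proxy_settings("http://proxy.example.com:8000", "user", "pass"))`, where the proxy host and credentials are placeholders. Passing the proxy at launch keeps it in effect for every page the browser opens.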

Xvfb (X virtual framebuffer) to help

When we turned to the community, we got help.

There’s a Linux tool called Xvfb**, which creates a virtual display, so a browser can run in normal (headful) mode on a server without a physical screen instead of having to use Chrome’s headless mode. Various frameworks have an option to use it.

**Xvfb stands for “X virtual framebuffer”.

We installed Xvfb on the Linux server.
Now we can run the browser scraper in headful mode with the command:

xvfb-run [command]

where [command] is the original command that launches the scraper.
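If the scraper is started from another Python process (a scheduler, for example), the same wrapping can be done programmatically. This is a minimal sketch under that assumption; the function names `xvfb_command` and `run_under_xvfb` are hypothetical. It uses two real `xvfb-run` options: `-a` picks a free display number, and `-s` passes arguments to the Xvfb server, here a single 1920×1080 screen at 24-bit colour depth.

```python
import subprocess
from typing import List, Sequence

# Hypothetical helpers that wrap an existing scraper command in xvfb-run.


def xvfb_command(command: Sequence[str], width: int = 1920,
                 height: int = 1080, depth: int = 24) -> List[str]:
    """Return `command` prefixed with xvfb-run and a virtual screen size."""
    # Xvfb server argument: screen 0 with the given size and colour depth.
    server_args = f"-screen 0 {width}x{height}x{depth}"
    # -a: choose a free display number automatically; -s: Xvfb server args.
    return ["xvfb-run", "-a", "-s", server_args, *command]


def run_under_xvfb(command: Sequence[str]) -> int:
    """Run `command` under xvfb-run and return its exit code."""
    return subprocess.run(xvfb_command(command), check=False).returncode
```

For a hypothetical `scraper.py`, `xvfb_command(["python", "scraper.py"])` corresponds to the shell command `xvfb-run -a -s "-screen 0 1920x1080x24" python scraper.py`. Setting the screen size explicitly avoids relying on the xvfb-run default, which may be smaller than a typical desktop.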

Screen max-size emulation with Playwright

Later we also managed to scrape the data with Playwright running the browser in headless mode, by emulating a 1920×1080 screen size and using a residential proxy.
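The post does not say exactly which Playwright settings were used for the screen emulation. A plausible minimal sketch, assuming Playwright's Python sync API, sets both `viewport` (the page's visible area) and `screen` (what `window.screen` reports to page scripts) on a new browser context. The helper names `context_options` and `fetch_headless` are hypothetical.

```python
# Hypothetical helpers: headless Chromium with an emulated 1920x1080 screen,
# behind a proxy. A sketch, not the authors' original code.


def context_options(width: int = 1920, height: int = 1080) -> dict:
    """Keyword arguments for browser.new_context() emulating a screen size."""
    size = {"width": width, "height": height}
    return {
        "viewport": dict(size),  # size of the page's visible area
        "screen": dict(size),    # size reported by window.screen
    }


def fetch_headless(url: str, proxy: dict) -> str:
    """Load `url` in headless Chromium through `proxy`, return the HTML."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # The proxy is set at launch so it applies to every context.
        browser = p.chromium.launch(headless=True, proxy=proxy)
        try:
            context = browser.new_context(**context_options())
            page = context.new_page()
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()
```

Here `proxy` is a dict such as `{"server": "http://proxy.example.com:8000", "username": "user", "password": "pass"}`, with placeholder values. Without these options, Playwright's default viewport is 1280×720, a size that may stand out against typical desktop browsers.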

Read more on Choosing reliable residential proxies for web scraping
