
How to bypass PerimeterX

You’ve found the website you need to scrape, set up your scraper and fired it, just to sadly realize PerimeterX has blocked you.

PerimeterX’s complex, dynamic bot detection system relies on server-side and client-side checks to distinguish humans from bots. It deploys several layers of protection and, for the most part, does its job without interrupting the user experience.

But don’t fall into despair! There are a couple of things you can try to bypass PerimeterX (called HUMAN now) before giving up on your goal of scraping that delicious data.

Here are the specific steps you must follow to bypass PerimeterX:

  1. Analyze network logs.
  2. Identify the PerimeterX cookies and understand their sequence.
  3. Deobfuscate the PerimeterX JavaScript challenge script.
  4. Analyze the deobfuscated script and the subsequent checks.
  5. Calculate the correct values to bypass all the tests built into the script.
  6. Use premium proxies.

Naturally, deobfuscating the challenge script and analyzing its data points is the most complex part of the task. We cover it in great detail and with specific examples, together with how PerimeterX works, so that you can understand the logic.

Let’s dive right in!

How Does PerimeterX Work?

PerimeterX has been providing security services for websites since it was founded in 2014, so they know what they’re doing when it comes to blocking bots.

Active and Passive Bot Detection

PerimeterX uses both passive and active bot detection. Passive bot detection refers to the checks PerimeterX performs on its servers when it receives a request from a visitor. Active bot detection refers to the scripts it runs in the visitor’s browser to gather information and detect bots.

What’s more, PerimeterX claims its detection system protects websites from bots with minimal impact on the user experience. In other words, it tries not to bother human users with CAPTCHAs or a verification screen unless it suspects that a request comes from a bot.

PerimeterX Bot Detection Techniques

The list below is not exhaustive, but it covers the most aggressive defenses PerimeterX deploys. We’ll touch on how they work and then focus on how to overcome them.

1. IP Filtering

Security companies like PerimeterX usually have huge lists of IPs known to be used by bots. They can also identify groups of IPs that belong to data centers, proxies, or VPN providers. Web Application Firewalls (WAFs) usually assign a score or reputation to each IP that tries to access the protected website. If the IP your bot is using has a bad reputation, you will probably get blocked.

2. Checking HTTP Request Headers

Lots of bots use libraries or other non-browser agents like python-requests or Axios. These agents usually don’t send some of the headers that typical browsers add to their requests. This is one of the simplest ways anti-bot systems like PerimeterX Bot Defender use to identify and block bots. Luckily, it’s easy to add HTTP headers to your requests to bypass this protection.
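As a minimal sketch, assuming Node.js 18+ (which ships a global fetch), the hypothetical helper below builds request options carrying headers a desktop Chrome browser typically sends. The header values are illustrative assumptions; copy the exact ones from your own browser’s Network tab. Keep in mind that headers alone don’t change lower-level signals such as your client’s TLS fingerprint.

```javascript
// Hypothetical helper: returns fetch() options with browser-like headers.
// The values below are illustrative; mirror what your real browser sends.
function browserLikeInit(extraHeaders = {}) {
  return {
    method: "GET",
    headers: {
      "User-Agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36",
      Accept:
        "text/html,application/xhtml+xml,application/xml;q=0.9," +
        "image/avif,image/webp,*/*;q=0.8",
      "Accept-Language": "en-US,en;q=0.9",
      "Upgrade-Insecure-Requests": "1",
      // Caller-supplied headers (e.g. Referer) override or extend the defaults.
      ...extraHeaders,
    },
  };
}

// Usage (inside an async function):
// const res = await fetch("https://www.ssense.com/en-ca", browserLikeInit());
```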

3. Behavioral Analysis through AI

PerimeterX is very proud of its use of Machine Learning algorithms for behavioral analysis, which allows it to identify bots based on their behavior. For example, their system has learned that IPs that make hundreds of requests in a short amount of time are usually bots. When they detect this type of behavior, they usually block access to a protected web page.
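A simple way to avoid bursts of requests is to pace them. The sketch below uses hypothetical helper names and delay bounds that you would need to tune yourself; PerimeterX doesn’t publish the thresholds its models use.

```javascript
// Returns a random integer delay in [minMs, maxMs]. `random` is injectable for testing.
function jitteredDelay(minMs, maxMs, random = Math.random) {
  return minMs + Math.floor(random() * (maxMs - minMs + 1));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetches URLs one at a time, pausing a random interval between requests.
async function fetchSequentially(urls, fetchFn, minMs = 2000, maxMs = 6000) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    results.push(await fetchFn(urls[i]));
    // No need to wait after the last request.
    if (i < urls.length - 1) await sleep(jitteredDelay(minMs, maxMs));
  }
  return results;
}
```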

4. Fingerprinting and Blacklisting

Some of the methods we mentioned, like behavioral analysis or checking HTTP request headers, can be combined with others like TLS fingerprinting to identify visitors even if they use different IPs. Once the Web Application Firewall (WAF) identifies a visitor as a bot, it adds the visitor to a blacklist to block access on future visits.

To learn some anti-block techniques against Passive Bot Detection, check out our article on how to bypass anti-scrape techniques.

If PerimeterX still detects you after you apply the techniques for bypassing passive bot defenses, its active bot detection script is probably what’s flagging your bot. If you’re ready to create a PerimeterX captcha bypass, prepare to get your hands dirty with obfuscated JavaScript code and reverse engineering strategies.

How to Create a PerimeterX Bypass

One way to bypass their Bot Defender system is to add a PerimeterX bypass to your existing scraper. To code this PerimeterX bypass, it is important to understand its internals, and the first step to do that is to reverse engineer their client-side bot detection script.

In this example, we’ll analyze the antibot implementation on SSENSE. That website seems like a good example because a lot of e-commerce sites use PerimeterX. We will use mainly JavaScript, but the techniques you will learn in this tutorial will allow you to code your PerimeterX bypass in Python or any other language. Ready? Let’s begin!


Step 1: Analyze the Network Log

First, open up the developer tools for the web browser of your choice, and switch to the Network tab.

Next, while leaving the developer tools open, navigate to SSENSE.

As the page loads, you’ll notice many requests appear in the network log. These are the important ones to take note of, in chronological order:

An initial GET request to https://www.ssense.com/en-ca. Looking at the response, you’ll see a Set-Cookie header for _pxhd. This is an important cookie: it acts as a session indicator, and will also be used in future requests. Your PerimeterX bypass will need some data from this cookie to calculate correct values that will be sent for validation to the server.


Also note that the response body’s HTML contains a <script> tag that fetches the PerimeterX challenge script:

<script type="text/javascript"> 
	(function () { 
		window._pxAppId = "PX58Asv359"; 
		if (window._pxAppId) { 
			var p = document.getElementsByTagName("script")[0], 
				s = document.createElement("script"); 
			s.async = 1; 
			s.src = "/" + window._pxAppId.substring(2) + "/init.js"; 
			p.parentNode.insertBefore(s, p); 
		} 
	})(); 
</script>

A GET request to /<_pxAppId>/init.js, where <_pxAppId> is the value of window._pxAppId without its "PX" prefix (the snippet above calls substring(2), so here it’s 58Asv359). This returns the script PerimeterX uses for client-side bot detection. It’s obfuscated and minified, so you won’t be able to understand much of it for now.


Then, a POST request to /<_pxAppId>/xhr/api/v2/collector happens. The request payload is a string with content-type application/x-www-form-urlencoded, and contains the following data:

  • <payload> is an encrypted and Base64 encoded string.
  • <appId> is the previously defined value of window._pxAppId.
  • <tag> is a version tag (static per site), ex. v8.0.2.
  • <uuid> is a randomly generated UUID, ex. 4420aff0-351d-11ed-95d0-c137f4896ca9.
  • <ft> is an integer (static per site), ex. 278.
  • <seq> has the value 0.
  • <en> has the value NTA.
  • <pc> is an integer, ex. 3195683956001701.
  • <pxhd> is the value of the _pxhd cookie.
  • <rsc> has the value 1.
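When analyzing captured traffic, it helps to split the collector body into named fields. Here’s a small sketch using the standard URLSearchParams API; the function name is hypothetical, and the sample body is shortened with a made-up payload value.

```javascript
// Splits a captured application/x-www-form-urlencoded collector body into fields.
function parseCollectorBody(body) {
  // URLSearchParams decodes a literal "+" as a space, which would corrupt
  // unescaped Base64 values, so escape "+" first (assumes no real spaces).
  const params = new URLSearchParams(body.replace(/\+/g, "%2B"));
  const fields = Object.fromEntries(params.entries());
  // seq and rsc behave like counters, so expose them as numbers.
  if ("seq" in fields) fields.seq = Number(fields.seq);
  if ("rsc" in fields) fields.rsc = Number(fields.rsc);
  return fields;
}

const sample =
  "payload=aUkOFUAq&appId=PX58Asv359&tag=v8.0.2" +
  "&uuid=4420aff0-351d-11ed-95d0-c137f4896ca9&ft=278&seq=0&en=NTA&rsc=1";
console.log(parseCollectorBody(sample));
```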

The response body is a JSON object, with a single top-level field: do. The do field contains an array of strings. The format is as follows:

{ 
	"do": [ 
		"sid|<sid>", // a string, ex. 4415dfc2-351d-11ed-a66d-7275714f5843 
		"pnf|cu", 
		"cls|<cls>", // an integer, ex. 85062563435994268828 
		"sts|<sts>", // a UNIX timestamp in milliseconds, ex. 1663263533114 
		"wcs|<wcs>", // a string, ex. cchm6ba3onsi8miotj00 
		"drc|<drc>", // an integer, ex. 4460 
		"cts|<cts>|true", // a string, ex. 4415e33e-351d-11ed-a66d-7275714f5843 
		"cs|<cs>", // a SHA-256 hash, ex. dd2d5dc601445d684b2c4249a4c68f300048446afd4fece93c44ae41f62bdda3 
		"vid|<vid1>|<vid2>|true", // a string and an integer, ex. 43c15b2f-351d-11ed-97ec-797549415148 and 31536000 
		"sff|cc|60|<sff>" // a base64-encoded string, ex. U2FtZVNpdGU9TGF4Ow== 
	] 
}

Finally, a second POST request to /<_pxAppId>/xhr/api/v2/collector. The payload has the same content-type as before and a similar format, with a few added fields:

  • <payload> is a much longer, encrypted + Base64 encoded string.
  • <appId>, <tag>, <uuid>, <ft>, and <pxhd> are the same as in the previous request.
  • <seq> has the value 1.
  • <en> has the value NTA.
  • <cs> is a SHA-256 hash, ex. dd2d5dc601445d684b2c4249a4c68f300048446afd4fece93c44ae41f62bdda3
  • <pc> is an integer, ex. 1670315818019117
  • <sid> is a string, ex. 4415dfc2-351d-11ed-a66d-7275714f5843
  • <vid> is a string, ex. 43c15b2f-351d-11ed-97ec-797549415148
  • <cts> is a string, ex. 4415e33e-351d-11ed-a66d-7275714f5843
  • <rsc> has the value 2.

If you take a closer look, you’ll see that the cs, sid, vid, and cts fields are taken directly from the JSON object returned by the first POST request.

Additionally, the values of seq and rsc have each incremented by 1 relative to the first POST request. This pattern continues for all following POST requests, so these fields appear to act as request counters.
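The relationship between the two requests can be sketched as follows. This assumes, as in the examples above, that the value reused in the second request is the first argument after each instruction name; the function names are hypothetical.

```javascript
// Pulls the fields reused by the second collector request out of the
// first response's "do" array (format: "name|value|...").
function extractSessionFields(response) {
  const wanted = new Set(["cs", "sid", "vid", "cts"]);
  const fields = {};
  for (const instruction of response.do) {
    const [name, value] = instruction.split("|");
    if (wanted.has(name)) fields[name] = value; // first argument holds the value
  }
  return fields;
}

// seq starts at 0 and rsc at 1; both grow by 1 with every collector POST.
function countersFor(postIndex) {
  return { seq: postIndex, rsc: postIndex + 1 };
}

const firstResponse = {
  do: [
    "sid|4415dfc2-351d-11ed-a66d-7275714f5843",
    "pnf|cu",
    "cts|4415e33e-351d-11ed-a66d-7275714f5843|true",
    "cs|dd2d5dc601445d684b2c4249a4c68f300048446afd4fece93c44ae41f62bdda3",
    "vid|43c15b2f-351d-11ed-97ec-797549415148|31536000|true",
  ],
};
console.log({ ...extractSessionFields(firstResponse), ...countersFor(1) });
```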

PerimeterX sends another JSON object in the response body, once again containing an array of strings:

{ 
	"do": ["bake|_px2|330|<jwt>|true|300", "pnf|cu"] 
	 // where <jwt> is a token (Base64-encoded JSON rather than a signed JWT), ex. eyJ1IjoiNDQyMGFmZjAtMzUxZC0xMWVkLTk1ZDAtYzEzN2Y0ODk2Y2E5IiwidiI6IjQzYzE1YjJmLTM1MWQtMTFlZC05N2VjLTc5NzU0OTQxNTE0OCIsInQiOjE2NjMyNjM4MzQxMjIsImgiOiIwNzUzZDJhYTU1OWEzZDFhYjM5YjcyOGFmZDA0MDUyYWFlNDQ2MmU1NjMxNjZkNjM4MjM0NjZkNmNjMzIwY2ZlIn0= 
}
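Despite the <jwt> label, the example value above has no JWT header or signature segments: it’s plain Base64-encoded JSON, so you can inspect it without any key. A quick Node.js sketch (the function name is hypothetical):

```javascript
// Decodes a Base64-encoded JSON token such as the example _px2 value above.
function decodePxToken(token) {
  return JSON.parse(Buffer.from(token, "base64").toString("utf8"));
}

const exampleToken =
  "eyJ1IjoiNDQyMGFmZjAtMzUxZC0xMWVkLTk1ZDAtYzEzN2Y0ODk2Y2E5IiwidiI6IjQzYzE1YjJm" +
  "LTM1MWQtMTFlZC05N2VjLTc5NzU0OTQxNTE0OCIsInQiOjE2NjMyNjM4MzQxMjIsImgiOiIwNzUz" +
  "ZDJhYTU1OWEzZDFhYjM5YjcyOGFmZDA0MDUyYWFlNDQ2MmU1NjMxNjZkNjM4MjM0NjZkNmNjMzIw" +
  "Y2ZlIn0=";
const decoded = decodePxToken(exampleToken);
console.log(decoded);
```

Decoding the example yields u, which matches the uuid sent in the collector requests; v, which matches the vid from the first response; t, a millisecond timestamp; and h, a 64-character hex string whose derivation isn’t covered here.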

You may have noticed that none of the POST requests contains a Set-Cookie response header. Typically, once a browser has passed bot-detection checks, an antibot system will set special cookies or headers for use in future requests. Then, once a client makes a request to a protected endpoint, those headers/cookies from the request get validated on the server side.

So, how does this work in the case of PerimeterX? If you make a request to an endpoint protected by PerimeterX, you won’t see any unusual headers. You will, however, notice what appear to be PerimeterX-related cookies. For a cleaner overview, you can view all the cookies on the site and filter by the keyword px.


These are PerimeterX’s clearance cookies. They are checked on the server side to determine whether a request should be blocked or forwarded to the origin. But remember, there’s no record of these cookies being set with a Set-Cookie header in the network log. So, where are they coming from?

You might recognize the cookie names and values from the response bodies of the POST requests. This suggests the cookies are being set directly through JavaScript, which is consistent with none of the PerimeterX cookies carrying the HttpOnly flag (JavaScript can’t create HttpOnly cookies).

Note: Depending on the security level of a PerimeterX-protected site, your browser, and your device, the behavior of the challenge script and its requests may differ slightly. At the time of writing, SSENSE only requires the two POST requests to /<_pxAppId>/xhr/api/v2/collector described above. The second POST yields a _px2 cookie, the main clearance cookie that grants unblocked access to a site. Higher-security sites may require additional POST requests to /<_pxAppId>/xhr/api/v2/collector to obtain a _px3 cookie, which acts as a required clearance cookie on those sites. Don’t worry, though: the techniques we discuss here are also useful for bypassing PerimeterX on sites with a high security level.

Okay, great job! By analyzing the requests, we learned a lot about how PerimeterX behaves. Unfortunately, we’re still missing a lot of information. We still don’t know what data is contained in the encrypted payload field, how some other fields are generated, and what client-side bot detection checks the script performs. If you want to bypass PerimeterX, that knowledge is crucial.

If we want to answer those remaining questions, we have no choice but to directly consult the PerimeterX challenge script to figure out exactly how it works.

Step 2: Deobfuscate the PerimeterX JavaScript Challenge

To make the script unreadable to reverse engineers, PerimeterX obfuscates its JavaScript challenge. Here are a few of the techniques it uses (the list is not exhaustive):

String Concealing. This technique replaces all references to string literals with a call to a decoder function. In the case of PerimeterX, strings are either Base64 encoded or additionally encrypted with a simple XOR cipher.

// String concealing example from the PerimeterX script 
 
// Creates an empty lookup cache for use in the decoding function 
var o = Object.create(null); 
 
/* ... */ 
 
// XOR Decryptor function 
// Returns the decoded string. 
// This function references some external variables and functions. 
// The n() and r() functions are related to recording timestamps, and are irrelevant to the decoding function. 
// The i() function is a polyfill function for atob (base64 decoding) 
// The o variable is defined earlier in the script as a cache. 
 
function c(t) { 
	// n() is irrelevant to the decoding 
	var a = n(), 
		e = o[t]; 
	if (e) u = e; // Try to look up the decoding string in the cache 
	else { 
		// i() is a polyfill function for atob 
		// Base64 decodes the input string 
		var c = i(t); 
 
		var u = ""; 
		// XOR decryption 
		for (var f = 0; f < c.length; ++f) { 
			var A = "dDqXfru".charCodeAt(f % 7); 
			u += String.fromCharCode(A ^ c.charCodeAt(f)); 
		} 
		// Store the result in the cache 
		o[t] = u; 
	} 
	return r(a), u; // r(a) is irrelevant to the decoding. 
} 
 
/* ... */ 
 
// Later on in the script, it's used like this: 
 
c("NBxAaVZGQg"); // => "PX11047"
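To check decoded strings outside the browser, the decoder can be reimplemented as a standalone function. This Node.js sketch uses Buffer in place of the script’s atob polyfill; the XOR key "dDqXfru" comes from this version of the script and may change when PerimeterX updates it. The function name is hypothetical.

```javascript
// Base64-decode, then XOR each character with the repeating key,
// mirroring the c() function above (minus its timing calls and cache).
function decodeConcealedString(encoded, key = "dDqXfru") {
  // "latin1" maps each decoded byte to one character, like atob() does.
  const bytes = Buffer.from(encoded, "base64").toString("latin1");
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(key.charCodeAt(i % key.length) ^ bytes.charCodeAt(i));
  }
  return out;
}

console.log(decodeConcealedString("NBxAaVZGQg")); // "PX11047"
console.log(decodeConcealedString("NBxAaVZERw")); // "PX11062"
```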

Proxy Variables/Functions. This technique replaces direct references to a variable/function’s identifier with an intermediate variable.

/* Proxy function example */ 
 
// Decoding function from above 
function c(t) { 
	/* ... */ 
} 
 
// Intermediate variable declaration 
var r = c; 
 
// Calling r() instead of c() directly 
r("NBxAaVZERw"); // => "PX11062" 
 
/* Proxy variable example */ 
 
// Intermediate variable for the identifier "window" 
var F = window; 
 
// Referencing "F" instead of "window" directly 
F.performance.now();

Unary Expressions. Rather than using boolean literals or undefined directly, this technique relies on JavaScript’s automatic type coercion in the unary ! and void operators.

var o = !0; // equivalent to o = true 
var c = !1; // equivalent to c = false 
void 0 === this.channels; // equivalent to undefined === this.channels

Though the PerimeterX challenge script’s obfuscation may not be as sophisticated as that of other bot detection vendors, it still requires specialized reverse-engineering skills to convert it to a readable state. Simply pasting it in a general JavaScript deobfuscator won’t produce easily understandable code.

To deobfuscate the PerimeterX script, you’ll need to create a custom deobfuscator. This step can be difficult, but it’s essential for creating a PerimeterX bypass!

Hint: Try using abstract syntax tree (AST) manipulation.
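To get a feel for what a deobfuscation pass does, here’s a deliberately naive, regex-based sketch. It is a stand-in for the AST approach suggested above, not an implementation of it: regexes can’t tell code apart from string literals or comments, so treat the output as a reading aid only. decoderNames lists the decoder function and its proxy aliases (c and r in the examples above), and decode is your own reimplementation of the script’s string decoder; all names here are hypothetical.

```javascript
// Escapes characters that have special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function naiveDeobfuscate(source, decoderNames, decode) {
  const names = decoderNames.map(escapeRegExp).join("|");
  // Matches calls such as r("NBxAaVZERw") that aren't property accesses (obj.r(...)).
  const callPattern = new RegExp(
    `(?<![\\w$.])(?:${names})\\("([A-Za-z0-9+/=]+)"\\)`,
    "g"
  );
  return source
    // Inline each concealed string as a quoted literal.
    .replace(callPattern, (_, encoded) => JSON.stringify(decode(encoded)))
    // Rewrite the unary idioms back to their literal values.
    .replace(/(?<![\w$.])!0(?![\w.])/g, "true")
    .replace(/(?<![\w$.])!1(?![\w.])/g, "false")
    .replace(/(?<![\w$.])void 0(?![\w.])/g, "undefined");
}
```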

Once you’ve deobfuscated the PerimeterX challenge script, you can read it to determine what bot detection checks are performed, and how to replicate the challenge-solving behavior. In the next step, we’re going to go over the deobfuscated script and try to extract critical information about its internals.

Step 3: Analyze the Deobfuscated PerimeterX Script

Let’s start by figuring out how the payload is encrypted!

PerimeterX’s Payload Encryption

To figure out how the payload is encrypted so we can code our custom PerimeterX bypass, we’re going to work backward. First, we find where the payload is set by searching for the string "payload=" in the deobfuscated script:

var B = { 
	vid: cn, 
	tag: ff.Bn, 
	appID: ff.J, 
	cu: Uo, 
	cs: f, 
	pc: A, 
}; 
var N = Wc(n, B); 
var l = [ 
	"payload=" + N, 
	"appId=" + ff.J, 
	"tag=" + ff.Bn, 
	"uuid=" + Uo, 
	"ft=" + ff.Nn, 
	"seq=" + Uu++, 
	"en=NTA", 
];

The final value for payload is stored in the variable N. Looking at the definition of N, we can determine that the Wc function is responsible for payload encryption. Wc takes in two parameters:

  • n: a JavaScript object that stores the raw payload data.
  • B: a JavaScript object that stores some values used as keys in the encryption process.

Let’s look up the definition of Wc:

var Wc = function (n, r) { 
	var t; 
	var a = n.slice(); 
	t = nc || "1604064986000"; 
	var e = zr(Un(t), 10); 
	var i = z(a); 
	a = Un(zr(i, 50)); 
 
	var c = (function (n, r, t) { 
		var a, e, i, o, c; 
		var u = zr(Un(t), 10); 
		var f = []; 
		var A = -1; 
 
		for (var B = 0; B < n.length; B++) { 
			/* ... */ 
		} 
 
		for (var v = 0; n.length > v; v++) { 
			/* ... */ 
		} 
 
		return f.sort(function (n, r) { 
			return n - r; 
		}); 
	})(e, a.length, r[Hc]); 
 
	a = (function (n, r, t) { 
		/* ... */ 
		return (a += r.substring(e)); 
	})(e, a, c); 
 
	return a; 
};

This is PerimeterX’s encryption cipher. The original function is quite long and references many external variables/functions. For the sake of practicality, we’ve truncated it.

However, there are some important things you can learn about this cipher by looking at the fully deobfuscated PerimeterX script:

  • The payload uses two encryption keys: the values of uuid and sts.
  • uuid appears in every POST request, while sts appears in the 2nd POST request onwards. In the case of the 1st POST request, where sts is absent, "1604064986000" is used in place of it.
  • This is a symmetric-key algorithm. Therefore, as long as you have the original sts and uuid values, you can decrypt any encrypted PerimeterX payload. This is useful for analyzing the payload your browser sends, since both keys are visible in the traffic: uuid in every POST request, and sts in the response to the first one.

How PerimeterX Sets Cookies

We previously concluded that all PerimeterX-related cookies were set by the actual script itself. Recall that the raw value of the _px2 cookie first appeared inside of a JSON-formatted response body (as <jwt>):

{ 
	"do": ["bake|_px2|330|<jwt>|true|300", "pnf|cu"] 
}

The field name, do, turns out to be quite literal: its value is an array of instructions. Each string is split on every | into an array. For the first string in the do array, that looks like this:

// The first instruction 
var processedInstruction1 = "bake|_px2|330|<jwt>|true|300".split("|"); // => ["bake","_px2","330","<jwt>","true","300"]

The first element of the resulting array determines the function to be executed, while the remaining elements are taken as the arguments for the function. In this case, bake is the name of the function to be executed.
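The dispatch logic can be sketched like this. The handlers object is a hypothetical stand-in for the script’s own handler table; here the handlers only record what they receive, which is handy when you’re cataloguing the instructions found in captured responses.

```javascript
// Splits each instruction on "|" and calls the handler named by the first element.
// Returns the names of instructions that have no handler yet.
function runInstructions(doArray, handlers) {
  const unknown = [];
  for (const instruction of doArray) {
    const [name, ...args] = instruction.split("|");
    // Only own properties count, so names like "constructor" aren't dispatched.
    if (Object.prototype.hasOwnProperty.call(handlers, name)) {
      handlers[name](...args);
    } else {
      unknown.push(name);
    }
  }
  return unknown;
}

const seen = [];
const unhandled = runInstructions(["bake|_px2|330|<jwt>|true|300", "pnf|cu"], {
  bake: (name, maxAge, value) => seen.push({ name, maxAge, value }),
});
// seen      → [{ name: "_px2", maxAge: "330", value: "<jwt>" }]
// unhandled → ["pnf"]
```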

Searching for bake in the deobfuscated PerimeterX script, we discover the cu object. This cu object holds the handler for the bake instruction:

var cu = { 
	/** 
	 * @param n = "_px2" 
	 * @param r = "330" 
	 * @param t = "<jwt>" 
	 * @param a = "true" 
	 * @param e = "300" 
	 */ 
	bake: function (n, r, t, a, e) { 
		if (ff.J === window._pxAppId) { 
			wt(n, r, t, a); 
		} 
		/* ... */ 
	}, 
	/* ... */ 
};

The arguments n, r, t, a, and e take on the values "_px2", "330", "<jwt>", "true", and "300", respectively.

The bake method calls a function, wt. Let’s look up the definition of that too:

/** 
 * @param n = "_px2" 
 * @param r = "330" 
 * @param t = "<jwt>" 
 * @param a = "true" 
 */ 
 
function wt(n, r, t, a) { 
	/* ...*/ 
 
	try { 
		var i; 
		// Creates the expiry date of the cookie, based on the "r" parameter. 
		if (r !== null) { 
			i = new Date(+new Date() + 1000 * r).toUTCString().replace(/GMT$/, "UTC"); 
		} 
		// Initialize the _px2 cookie string 
		var o = n + "=" + t + "; expires=" + i + "; path=/"; 
		var c = (a === true || a === "true") && bt(); 
		// Append the site domain to the cookie, then write it to document.cookie.
		// (e comes from code not shown here.)
		if (c) o = o + "; domain=" + c;
		document.cookie = o + "; " + e;
		return true; 
	} catch (n) { 
		return false; 
	} 
}

So, it looks like the bake instruction directly sets the _px2 cookie! It’s also a play on words, as in baking cookies.
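To see exactly what string wt() writes, here’s a pure-function sketch of the same assembly. The function name is hypothetical, the domain argument stands in for whatever bt() returns (its code isn’t shown), and the trailing "; " + e part is left out because e comes from code not shown here.

```javascript
// Builds the cookie string the way wt() does: name=value, an expiry
// maxAgeSeconds from now (with "GMT" swapped for "UTC"), path, optional domain.
function buildPxCookie(name, maxAgeSeconds, value, domain, nowMs = Date.now()) {
  const expires = new Date(nowMs + 1000 * Number(maxAgeSeconds))
    .toUTCString()
    .replace(/GMT$/, "UTC");
  let cookie = `${name}=${value}; expires=${expires}; path=/`;
  if (domain) cookie += `; domain=${domain}`;
  return cookie;
}

console.log(buildPxCookie("_px2", "330", "<jwt>", "example.com", 0));
// "_px2=<jwt>; expires=Thu, 01 Jan 1970 00:05:30 UTC; path=/; domain=example.com"
```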

Congrats! You found where in the code their main antibot cookie is being set! The next step will be to calculate values for it that make sense to PerimeterX so your bot does not get flagged as suspicious.

You should note that the cu object contains handlers for all other possible do instructions, too! To create a PerimeterX bypass, you need to reverse engineer the functionality of each do instruction.

Let’s learn how to break some of the security checks you might find inside this do array.

WebGL Fingerprinting

In the snippet below, PerimeterX uses WebGL APIs to create and render an image. The hash of the image is then stored in canvasfp:

// This function creates, renders, and hashes the image to construct "canvasfp". 
 
function A() { 
	return new T(function (c) { 
		setTimeout(function () { 
			try { 
				a = n.createBuffer(); 
				n.bindBuffer(n.ARRAY_BUFFER, a); 
				var u = new Float32Array([ 
					-0.2, -0.9, 0, 0.4, -0.26, 0, 0, 0.732134444, 0, 
				]); 
				n.bufferData(n.ARRAY_BUFFER, u, n.STATIC_DRAW);
				a.itemSize = 3;
				a.numItems = 3;
				e = n.createProgram();
				i = n.createShader(n.VERTEX_SHADER);
				n.shaderSource( 
					i, 
					"attribute vec2 attrVertex;varying vec2 varyinTexCoordinate;uniform vec2 uniformOffset;void main(){varyinTexCoordinate=attrVertex+uniformOffset;gl_Position=vec4(attrVertex,0,1);}" 
				); 
				/* Some more transformations on the canvas image... */ 
				/* ... */ 
				n.drawArrays(n.TRIANGLE_STRIP, 0, a.numItems);
				// In() computes a hash of the generated image
				r.canvasfp = n.canvas === null ? "no_fp" : In(n.canvas.toDataURL());
				r.extensions = n.getSupportedExtensions() || ["no_fp"];
			} catch (n) { 
				r.errors.push("PX10703"); 
			} 
			/* ... */ 
		}, 1); 
	}); 
}

This is useful for fingerprinting because, even when instructed to draw the exact same image, slight differences in hardware or low-level software (e.g., graphics drivers or the operating system) produce a different output, and thus a different hash. This makes WebGL fingerprinting a good way to classify devices.
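The sketch below illustrates the hashing side of this idea. PerimeterX’s In() hash isn’t shown, so SHA-256 from Node’s crypto module is used as a stand-in, and the pixel arrays are made-up values rather than real WebGL output.

```javascript
const crypto = require("crypto");

// Stand-in for In(): hashes raw pixel bytes with SHA-256 (hex output).
function fingerprint(pixels) {
  return crypto.createHash("sha256").update(Buffer.from(pixels)).digest("hex");
}

const renderA = new Uint8Array([12, 200, 34, 255, 90, 17]);
const renderB = new Uint8Array([12, 200, 34, 255, 91, 17]); // one value off by 1
console.log(fingerprint(renderA));
console.log(fingerprint(renderB)); // a completely different hash
```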

PerimeterX also collects various other WebGL properties to better classify your device. By feeding this data to machine learning models, they can detect whether you’re spoofing WebGL properties or rendering.

The computed canvasfp, along with the additional WebGL properties, is added to the payload object in the snippet below:

// Adding the collected WebGL data to the POST request payload 
 
(function (t) { 
	a.PX10061 = t.canvasfp;
	a.PX11016 = t.webglVendor;
	a.PX10529 = t.errors;
	a.PX10279 = t.webglRenderer;
	a.PX10753 = t.webGLVersion;
	a.PX10246 = t.extensions;
	a.PX11232 = In(t.extensions);
	a.PX10871 = t.webglParameters;
	a.PX11231 = In(t.webglParameters);
	a.PX11077 = t.unmaskedVendor;
	a.PX10165 = t.unmaskedRenderer;
	a.PX10244 = t.shadingLangulageVersion;
	tt("PX11223"); 
	r(a); 
});

Automated Browser Checks

Below, PerimeterX is checking for the existence of automated-browser-specific properties:

try { 
	n.PX10010 = !!window.emit;
	n.PX10225 = !!window.spawn;
	n.PX10855 = !!window.fmget_targets;
	n.PX11065 = !!window.awesomium;
	n.PX10456 = !!window.__nightmare;
	n.PX10441 = Xr(window.RunPerfTest);
	n.PX10098 = !!window.geb;
	n.PX10557 = !!window._Selenium_IDE_Recorder;
	n.PX10170 = !!window._phantom || !!window.callPhantom;
	n.PX10824 = !!document.__webdriver_script_fn;
	n.PX10087 = !!window.domAutomation || !!window.domAutomationController;
	n.PX11042 =
		window.hasOwnProperty("webdriver") ||
		!!window["webdriver"] ||
		document.getElementsByTagName("html")[0].getAttribute("webdriver") === "true";
} catch (n) {}
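When testing your own automated browser setup, a small diagnostic can report which of these markers are exposed. This sketch is hypothetical: it flags a window property if it exists or is truthy (slightly broader than the script, which uses hasOwnProperty only for webdriver), skips the Xr(window.RunPerfTest) check, and takes the window and document objects as parameters so it can also run in Node with mocks.

```javascript
// Returns the names of automation markers (from the list above) that are present.
function findAutomationMarkers(win, doc) {
  const windowProps = [
    "emit", "spawn", "fmget_targets", "awesomium", "__nightmare", "geb",
    "_Selenium_IDE_Recorder", "_phantom", "callPhantom",
    "domAutomation", "domAutomationController", "webdriver",
  ];
  const found = windowProps.filter(
    (p) => Object.prototype.hasOwnProperty.call(win, p) || !!win[p]
  );
  if (doc.__webdriver_script_fn) found.push("document.__webdriver_script_fn");
  const html = doc.getElementsByTagName("html")[0];
  if (html && html.getAttribute("webdriver") === "true") found.push("html[webdriver]");
  return found;
}

// In a browser console: findAutomationMarkers(window, document)
```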

Sandboxing Checks

PerimeterX checks for the existence of NodeJS-only APIs to determine if the script is being sandboxed:

var n; 
// The process object only exists in NodeJS. 
try { 
	n = 
		n || 
		((typeof process == "undefined" ? "undefined" : A(process)) === "object" && 
			String(process) === "[object process]"); 
} catch (n) {} 
 
try { 
	n = n || /node|io\.js/.test(process.release.name) === true; 
} catch (n) {}
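Running the same two checks under Node.js shows why executing the challenge script in a plain Node sandbox is easy to detect. In a minimal sketch (the function name is hypothetical), both checks succeed under Node, while in a browser the first returns false and the second throws a ReferenceError that is caught.

```javascript
// Reproduces the two sandbox checks above.
function looksLikeNode() {
  let detected = false;
  try {
    // typeof is safe even when process is undeclared (as in browsers).
    detected = typeof process === "object" && String(process) === "[object process]";
  } catch (e) {}
  try {
    detected = detected || /node|io\.js/.test(process.release.name);
  } catch (e) {}
  return detected;
}

console.log(looksLikeNode()); // true when run with Node.js
```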

To make sure built-in functions haven’t been modified (i.e. monkey-patched), PerimeterX calls typeof and an implicit toString on them:

// A() acts as a wrapper for "typeof" 
 
function A(n) { 
	A = 
		typeof Symbol == "function" && typeof Symbol.iterator == "symbol" 
			? function (n) { 
					return typeof n; 
				} 
			: function (n) { 
					return n && 
						typeof Symbol == "function" && 
						n.constructor === Symbol && 
						n !== Symbol.prototype 
						? "symbol" 
						: typeof n; 
				}; 
	return A(n); 
} 
// 
function Xr(n) { 
	// typeof returns "function" for any function, so this part only rules out non-functions.
	// "" + n is an implicit toString() 
	// An unmodified built-in function will always include "[native code]" in the result. 
	return A(n) === "function" && /\{\s*\[native code\]\s*\}/.test("" + n); 
} 
 
/* ... */ 
 
// Later used like this: 
 
n.PX10213 = Xr(window.EventSource); 
n.PX10283 = Xr(Function.prototype.bind); 
n.PX10116 = Xr(window.setInterval); 
 
// If they haven't been modified, all the above calls should return true.
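The same native-code test can be reproduced in a few lines to see how a monkey-patched function gives itself away. The isNative name is hypothetical; the regular expression is the one Xr() uses. The patched function is kept in a local variable, so nothing global is modified.

```javascript
// A built-in stringifies to "function name() { [native code] }";
// a JavaScript replacement exposes its own source instead.
function isNative(fn) {
  return typeof fn === "function" && /\{\s*\[native code\]\s*\}/.test("" + fn);
}

const originalBind = Function.prototype.bind;
const patchedBind = function bind(...args) {
  return originalBind.apply(this, args);
};

console.log(isNative(originalBind)); // true
console.log(isNative(patchedBind)); // false
```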

User Input Event Tracking

PerimeterX collects behavioral biometrics, such as mouse movements, keyboard presses, and touch movements. The collected data can then be analyzed with machine learning to determine if the inputs are human-like, or generated by a bot.

In this snippet, PerimeterX records the position and timing of mouse and touch movement events:

{ 
	(function (n, r) { 
		_i.length < 10 && 
			_i.push( 
				+n.movementX.toFixed(2) + "," + +n.movementY.toFixed(2) + "," + wr(r) 
			); 
 
		if (n && n.movementX && n.movementY) { 
			if (Pi.length < 50) { 
				Pi.push( 
					(function (n) { 
						var r = n.touches || n.changedTouches; 
						var t = r && r[0]; 
						var a = +(t ? t.clientX : n.clientX).toFixed(0); 
						var e = +(t ? t.clientY : n.clientY).toFixed(0); 
 
						var i = (function (n) { 
							return +(n.timestamp || n.timeStamp || 0).toFixed(0); 
						})(n); 
 
						return "".concat(a, ",").concat(e, ",").concat(i); 
					})(n) 
				); 
			} 
		} 
	})(n, t); 
}
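The strings stored in Pi have the form "x,y,timestamp". The sketch below reproduces that formatting and the 50-entry cap with mock event objects; the function names are hypothetical.

```javascript
const MAX_POINTS = 50; // same cap as Pi.length < 50 above

// Formats one event as "x,y,timestamp", preferring the first touch point if present.
function toPoint(event) {
  const touches = event.touches || event.changedTouches;
  const t = touches && touches[0];
  const x = +(t ? t.clientX : event.clientX).toFixed(0);
  const y = +(t ? t.clientY : event.clientY).toFixed(0);
  const ts = +(event.timestamp || event.timeStamp || 0).toFixed(0);
  return `${x},${y},${ts}`;
}

// Appends a point only while the buffer is below the cap.
function record(buffer, event) {
  if (buffer.length < MAX_POINTS) buffer.push(toPoint(event));
  return buffer;
}

console.log(toPoint({ clientX: 10.6, clientY: 20.2, timeStamp: 1234.7 })); // "11,20,1235"
```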

The Smart Way to Bypass PerimeterX

At this point, maybe you are thinking “Isn’t there simply any existing PerimeterX bypass that works?”.

The harsh reality is that in 2023 it’s very difficult to bypass the PerimeterX antibot service using public software, like the libraries you can find on GitHub (though you can try some of them, such as Puppeteer Stealth). Also, standard headless browsers such as Chrome, Chromium, or Firefox, and automation tools like Selenium, need very specific configurations to work.

Because the source code of such software is public, PerimeterX developers can update their antibot system to detect it. One option is to code your own PerimeterX bypass, although the easiest way to deal with their antibot protection is to use private software designed to bypass PerimeterX.

Conclusion

As you can see, to bypass the PerimeterX JavaScript challenge, you need to (1) deobfuscate the challenge script and then (2) carefully calculate the correct values for all of the different types of checks the script performs:

  • Input event tracking, like mouse events.
  • Sandboxing (NodeJS, Python, etc.).
  • Automated-browser checks, such as for headless Chrome or Selenium.
  • WebGL fingerprinting combined with their machine learning algorithms.
  • Cookies with correct values.

Keep in mind that after you get past PerimeterX (HUMAN) Bot Defender once, your job isn’t completely done. You need to check your scraper regularly, because if PerimeterX updates its script, your custom bypass can be detected and blocked.
