Categories
Challenge Development

TLS (SSL) Fingerprinting

In the post we share about web fingerprinting and particularly TLS fingerprinting. First let’s categorize the fingerprinting.

Passive Fingerprinting

This kind of Fingerprinting takes place at 3 levels:

  • TCP
  • TLS/SSL*
  • HTTP

*TLS – Transport Layer Security, wiki. TLS is actually just a more recent version of SSL. It fixes some security vulnerabilities in the earlier SSL protocols.

Active Fingerprinting

  1. Browser fp – a server sends something to you (browser) and watches how you/browser react to that. This might be further categorized as the following:
    – Canvas fingerprinting
    – Behavior fingerprinting

TLS fingerprinting

Visit the website browserleaks.com/ssl to see your TLS data. When hashed, these will be TLS fingerprinting. Different browsers and robots run from the same machine will cause the TLS to differ.

  • One User-Agent (UA) has only one TLS fingerprint.
  • 1 TLS fingerprint for many UAs.

Anti-scrape tools match your TLS fingerprint to the UA that you are sending thru the request.

Bypass TLS fingerprinting [for web scraping]

  • Randomize TLS Cipher [suite] parameters and randomize TLS version. Thus you send a different TLS code with each request. The challenge is to make random Cipher parameters match the TLS connection version.
  • Align all the TLS parameters with UA that we send. Yet for this we need a huge UA DB, that being the challenge.
    The TLS manipulation is a thing in itself. The dealing with a combination of symmetric and asymmetric cryptography requires reading lots of RFQs. See more here.
  • Use real browsers with their real UA. Eg. you load different Chrome revisions with their corresponding UAs.

Conclusion

Let’s see how to react for web scraping as far as the TLS alignment is concerned.

Case 1. No need to deal with TLS alignment

• Many domains scrape

• One domain is not crucial if scrape fails

• Not much time for development

Case 2. TLS alignment realized

• Specific domain(s) scrape

• Data is crucial, it should not be compromised

• Bringing scraping bots to the next level

The TLS alignment for web scraping comparison table

No need to deal with TLS alignmentTLS alignment realized
Scrape scope Many domains scrapeSpecific domain(s) scrape
Critical data One domain is not crucial if its scrape fails Data is crucial, no data to miss
Timing Not much time for developmentDecent timing for TLS alignment
DevelopmentNoYes, incl. studying HTTP Secure level crypto
Bot levelRegularBringing scraping bots to the next level

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.