Categories
Development

How to find out that website is Distil protected?

Given: a webpage to scrape.
If you inspect the DOM tree of that page you will find that quite a few tags are having the keyword dist. As an example:

  • <link rel="shortcut icon" type="image/x-icon" href="/wcsstore/ColesResponsiveStorefrontAssetStore/dist/30e70cfc76bf73d384beffa80ba6cbee/img/favicon.ico">
  • <link rel="stylesheet" href="/wcsstore/ColesResponsiveStorefrontAssetStore/dist/30e70cfc76bf73d384beffa80ba6cbee/css/google/fonts-Source-Sans-Pro.css" type="text/css" media="screen">
Categories
Challenge

How Imperva protects against scraping bots

Imperva (that includes the former Distil anti-bot management) is a service providing many kinds of website protections. The present Imperva services include the following ones:

  1. Cloud Web Application Firewall (WAF)
  2. Bot Protection service (formerly Distil Networks)
  3. IP Reputation Intelligence
  4. Content Delivery Network (CDN)
  5. Attack Analytics solution (eg. DDoS)

As to the protection of the bot scraping activities we mention the following.

Categories
Challenge

Most popular web scraping targets and how to scrape them

  1. Online marketplaces
    In the marketplaces people offer their products for sale. Similar to garage sales, but online. (eg. eCrater, www.1188.no).
    Easy to scrape since they are usually free and do not tend to protect their data.
  2. Business directories
    The usually huge online directories targeted at the general audience. (eg. Yellow Pages). They do protect their data to avoid duplication and loss of audience. See some posts on this.
Categories
Challenge Development

Scraping a Javascript-dependent website with puppeteer

In today’s web 2.0 many business websites utilize JavaScript to protect their content from web scraping or any other undesired bot visits. In this article we share with you the theory and practical fulfillment of how to scrape js-dependent/js-protected websites.

Categories
Development

Bypass Distil

The Distil scrape protection is a prominent one in the modern anti-scrape techniques. So, now we want to share with you some tips of how to bypass it. If you are interested, please make an inquiry to the following email: igor[dot]savinkin[at]gmail[dot]com


Categories
Uncategorized

Distil: Scrape Bot Protection Test

The anti scrape bot service test has been my focus for some time now. How well can the Distil service protect the real website from scrape? The only answer comes from an actual active scrape. Here I will share the log results and conclusion of the test. In the previous post we briefly reviewed the service’s features, and now I will do the live test-drive analysis.

Categories
Review

Distil Review: Anti-Scrape-Bot Service

Are you thinking of protecting your website content from theft and nonlegal scraping? Are you suspecting that some ‘innocent bots’ are continually visiting your web pages for data retrieval? Now we come to the anti scraping bot software and services. In this post we want to briefly review the new anti scrape bot service called Distil