In the post we share about web fingerprinting and particularly TLS fingerprinting. First let’s categorize the fingerprinting.
Category: Challenge
Today, I got in touch with the Node.js [and Python] bots garden/zoo providing modern bots with different kinds of browsers (Firefox, Chrome, Headless/not headless) using different automation frameworks (Puppeteer, Selenium, Playwright) in several programming languages.
The original AWSCodeStarFullAccess policy is full [to provide access to AWS CodeStar via the AWS Management Console] yet it still does not grant enough access to create a Code connection for IAM at CodePipeline. So we had to manage to create a custom policy based on AWS tutorial suggestions.
Bot protected websites
AirTable scrape challenge
The problem is that data are loaded highly dynamically. HTML contains only the information that you currently see on the browser screen.
If there are a lot of records then it is difficult to collect such a table. One of the possible ways is to calculate the size of the screen and rows in the table. Then using the browser automation and use to make a script that will scroll through it bit by bit and collect data.
Is there any other feasible way to get data of a table? For example there is a HTTP requests coding way to get dynamic data.
JS infinite scroll does not work for AirTable either.
Please comment down here if having some tips, hints.
Recently we’ve performed the Yelp business directory scrape for acquiring high quality B2B leads (company + CEO info). This forced us to apply many techniques like proxying, external company site scrape, email verification and more.
Recently we encountered a website that worked as usual, yet when composing and running scraping script/agent it has put up blocking measures.
In this post we’ll take a look at how the scraping process went and the measures we performed to overcome that.
Recently we encountered a new powerful scraping service called Web Scraper IDE [of Bright Data]. The life-test and thorough drill-in are coming soon. Yet now we want to highlight its main features that has badly (in positive sense, strongly) impressed us.
“Out of the 15 bank customers to whom the manager offered to connect autopayments, four agreed. Service activation is a binary feature that can be described by the Bernoulli distribution.”.
Let’s find the maximum likelihood estimate for the parameter p out of such a sample.
1) Likelihood function:
L(Xn, p) = ∏ p[Xi=1]*(1−p)[Xi=0] = p^4 * (1-p)^11
2) We find the maximum likelihood estimate for the parameter p.
We logarithm L(Xn, p) and get the following:
ln(p^4 * (1-p)^11) = 4*ln(p) + 11*ln(1-p)
3) Now we take its derivative and equate it to zero to find p.
[4ln(p) + 11ln(1-p)]` = 4 (ln(p))` + 11 (ln(1-p))` = 4/p + 11/(1-p) * (-1) = 0
Following: 4/p = 11/(1-p) => 4(1-p) = 11p => 15p = 4 => p = 4/15 =~ 0.26667.
We’ll also interpret the found linear dependencies. That means we check whether the discovered pattern corresponds to common sense. The main purpose of the task is to show and explain by example what causes overfitting and how to overcome it.