How to handle cookies, user agents, and headers when scraping with Java? For this we'll use a static class, ScrapeHelper, that handles all of this easily. The class uses Jsoup library methods to fetch data from the server and parse the HTML into a DOM document.
Suppose we have the following array:
arr = [[ 5.60241616e+02, 1.01946349e+03, 8.61527813e+01],
       [ 4.10969632e+02, 9.77019409e+02, -5.34489688e+01],
       [ 6.10031512e+02, 9.10689615e+01, 1.45066095e+02]]
How do we print it with rounded elements using map() and lambda functions?
l = list(map(lambda i: list(map(lambda j: round(j, 2), i)), arr))
print(l)
The result will be the following:
[[560.24, 1019.46, 86.15],
[410.97, 977.02, -53.45],
[610.03, 91.07, 145.07]]
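The same result can be obtained with a nested list comprehension, which many readers find easier to scan than nested map()/lambda calls; this is just an equivalent sketch of the snippet above:

```python
arr = [[5.60241616e+02, 1.01946349e+03, 8.61527813e+01],
       [4.10969632e+02, 9.77019409e+02, -5.34489688e+01],
       [6.10031512e+02, 9.10689615e+01, 1.45066095e+02]]

# Round every element to 2 decimal places, keeping the nested structure
l = [[round(j, 2) for j in i] for i in arr]
print(l)  # [[560.24, 1019.46, 86.15], [410.97, 977.02, -53.45], [610.03, 91.07, 145.07]]
```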

Sequentum Enterprise is a powerful, multi-featured enterprise data pipeline platform and web data extraction solution. Sequentum's CEO Sarah Mckenna doesn't like to call it web scraping because, in its description, web scraping refers to many different types of unmanaged and non-compliant techniques for obtaining web-based datasets.
The docs on requestQueue.getInfo().
After some unsuccessful tries, I managed to get the requestQueue info output. Note that we run the function inside the Apify runtime environment:
Apify.main(async () => { ... });
Solution 1
We make the function async and add await to the getInfo() Promise call:
async function printRequestQueue(requestQueue) {
    const { totalRequestCount, handledRequestCount, pendingRequestCount } = await requestQueue.getInfo();
    console.log('Request Queue info:');
    console.log(' - handled :', handledRequestCount);
    console.log(' - pending :', pendingRequestCount);
    console.log(' - total:', totalRequestCount);
}
with the following result:
Request Queue info:
- handled : 479
- pending : 312
- total: 791
Solution 2, using then/catch
In this case we do not need to make our function async, since we handle the getInfo() promise result through .then(response).
function printRequestQueue(requestQueue) {
    requestQueue.getInfo()
        .then((response) => {
            console.log('total:', response.totalRequestCount);
            console.log('handled:', response.handledRequestCount);
            console.log('pending:', response.pendingRequestCount);
            console.log('\nFull response:\n', response);
        })
        .catch((error) => console.log(error));
}
with the following result:
total: 791
handled: 479
pending: 312
Full response:
{ id: 'queue-name',
name: 'queue-name',
userId: null,
createdAt: 2021-02-26T11:57:00.453Z,
modifiedAt: 2021-02-26T11:58:47.988Z,
accessedAt: 2021-02-26T11:58:47.989Z,
totalRequestCount: 791,
handledRequestCount: 479,
pendingRequestCount: 312
}
let table = $('table');
// .has('br') returns a jQuery object, which is always truthy — check .length instead;
// also scope the replacement to the table rather than the whole page
if (table.has('br').length) {
    table.find('br').replaceWith(' ');
}
Often we need to select certain HTML DOM elements while excluding ones with certain names, attributes, or attribute values. Let's show how to do that.
In this post we'll show how to build linear classification models using the sklearn.linear_model module.
The code as an IPython notebook
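As a minimal sketch of the kind of model the post builds (the dataset choice and max_iter value here are illustrative assumptions, not the post's exact setup), a logistic regression classifier from sklearn.linear_model can be fit in a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load a standard dataset: 150 iris samples, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# Fit a linear classification model; max_iter raised so the solver converges
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

print(clf.score(X, y))  # mean accuracy on the training data
```

LogisticRegression is a linear model despite its name: it learns one weight vector per class, so clf.coef_ has shape (3, 4) here.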
In this post we will show how to generate model data and load standard datasets using the sklearn.datasets module in Python 3.
The code as an IPython notebook
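A short sketch of both tasks described above, assuming scikit-learn is installed; the make_classification parameters are illustrative choices, not values from the post:

```python
from sklearn.datasets import load_iris, make_classification

# Load a standard dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)

# Generate synthetic model data for a classification task
X_gen, y_gen = make_classification(n_samples=200, n_features=5,
                                   n_informative=3, random_state=0)
print(X_gen.shape)  # (200, 5)
```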

In the previous post we shared how to disguise Selenium Chrome automation against fingerprint checks. In this post we share how Puppeteer-extra with the Stealth plugin does the same. The test results are available as HTML files and screenshots.
In a previous post we considered ways to disguise an automated Chrome browser by spoofing some of its parameters (Headless Chrome detection and anti-detection). Here we'll share the practical results of fingerprint testing against a benchmark for both human-operated and automated Chrome browsers.