Categories
Data Mining

Linear models, Sklearn.linear_model, Regression

In this post we’ll show how to build linear regression models using the sklearn.linear_model module.

See also the post on linear classification models using the sklearn.linear_model module.

Categories
Development

How to print out requestQueue info (Apify) at run time

See the docs on requestQueue.getInfo().

After a few unsuccessful attempts I managed to print the requestQueue info. Note that the function runs inside the Apify runtime environment:

Apify.main(async () => { ... });

Solution 1

We make the function async and await the Promise returned by getInfo():

async function printRequestQueue(requestQueue) {
  // getInfo() returns a Promise, so await it before destructuring the counters
  const { totalRequestCount, handledRequestCount, pendingRequestCount } = await requestQueue.getInfo();
  console.log('Request Queue info:');
  console.log(' - handled :', handledRequestCount);
  console.log(' - pending :', pendingRequestCount);
  console.log(' - total:', totalRequestCount);
}

with the following result:

Request Queue info:
 - handled : 479
 - pending : 312
 - total: 791

Solution 2, using then/catch

In this case we do not need to make our function async, since we receive the result of the getInfo() Promise through .then(response) and handle failures in .catch(error).

function printRequestQueue(requestQueue) {
  requestQueue.getInfo()
    .then((response) => {
      console.log('total:', response.totalRequestCount);
      console.log('handled:', response.handledRequestCount);
      console.log('pending:', response.pendingRequestCount);
      console.log('\nFull response:\n', response);
    })
    .catch((error) => console.log(error));
}

with the following result:

total: 791
handled: 479
pending: 312

Full response:
 { id: 'queue-name',
  name: 'queue-name',
  userId: null,
  createdAt: 2021-02-26T11:57:00.453Z,
  modifiedAt: 2021-02-26T11:58:47.988Z,
  accessedAt: 2021-02-26T11:58:47.989Z,
  totalRequestCount: 791,
  handledRequestCount: 479,
  pendingRequestCount: 312 
}
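
To try either solution outside an Apify run, a stub object with a getInfo() method can stand in for the queue. This is only a sketch: fakeQueue and describeRequestQueue are hypothetical names that are not part of the Apify SDK, and the counters are copied from the output above.

```javascript
// Hypothetical stand-in for a request queue: only getInfo() is mimicked,
// resolving to the counters shown in the output above.
const fakeQueue = {
  getInfo: async () => ({
    totalRequestCount: 791,
    handledRequestCount: 479,
    pendingRequestCount: 312,
  }),
};

// Builds the report text instead of printing it, so it can be logged or checked.
async function describeRequestQueue(requestQueue) {
  const info = await requestQueue.getInfo();
  return [
    'Request Queue info:',
    ` - handled : ${info.handledRequestCount}`,
    ` - pending : ${info.pendingRequestCount}`,
    ` - total: ${info.totalRequestCount}`,
  ].join('\n');
}

describeRequestQueue(fakeQueue).then(console.log).catch(console.error);
```

Returning a string rather than logging directly keeps the helper usable with either the async/await or the then/catch style.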

Categories
Development

Node.js Cheerio scraper, replace element

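const table = $('table');
// .has() returns a selection, which is always truthy, so test .length instead
if (table.find('br').length > 0) {
    // replace only the <br> elements inside the table with a space
    table.find('br').replaceWith(' ');
}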

Categories
Development

DOM selector excluding certain elements

Often we need to select certain HTML DOM elements while excluding those with certain names, attributes, or attribute values. Let’s show how to do that.
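
One common way to express such exclusions is the CSS :not() pseudo-class. The sketch below only builds the selector string; selectorExcluding and the sample class and attribute names are hypothetical and chosen for illustration.

```javascript
// Appends one :not(...) clause per exclusion to a base CSS selector.
function selectorExcluding(base, exclusions) {
  return base + exclusions.map((sel) => `:not(${sel})`).join('');
}

// Links, except those with class "ad", rel="nofollow", or an in-page anchor href.
const selector = selectorExcluding('a', ['.ad', '[rel="nofollow"]', '[href^="#"]']);
console.log(selector);

// In a browser the string can be passed to document.querySelectorAll(selector);
// with Cheerio, to $(selector).
```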

Categories
Data Mining Development

Linear models, Sklearn.linear_model, Classification

In this post we’ll show how to build linear classification models using the sklearn.linear_model module.

Categories
Data Mining

Adding regularization into Linear Regression model

Regularization applies a penalty to large parameter values in order to reduce overfitting. When you train a model such as a logistic regression model, you choose the parameters that give the best fit to the data, that is, the ones that minimize the error between what the model predicts for the dependent variable and its actual values. The penalty is added to that error, so the fit has to trade accuracy on the training data against the size of the parameters.
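
As a rough illustration of the idea, here is a plain JavaScript sketch of an L2-penalized (ridge-style) loss for a linear model. It does not use Sklearn; the function names, the toy data, and the choice to leave the bias unpenalized are assumptions made for this example.

```javascript
// Mean squared error of a linear model: prediction = b + sum_j w[j] * x[j].
function mse(X, y, w, b) {
  let sum = 0;
  for (let i = 0; i < X.length; i++) {
    const pred = X[i].reduce((acc, xij, j) => acc + xij * w[j], b);
    sum += (pred - y[i]) ** 2;
  }
  return sum / X.length;
}

// L2-regularized loss: MSE + lambda * sum_j w[j]^2 (bias b is not penalized here).
function ridgeLoss(X, y, w, b, lambda) {
  const penalty = w.reduce((acc, wj) => acc + wj * wj, 0);
  return mse(X, y, w, b) + lambda * penalty;
}

// Toy data lying exactly on y = 2x.
const X = [[1], [2], [3]];
const y = [2, 4, 6];

console.log(ridgeLoss(X, y, [2], 0, 0));   // perfect fit, no penalty
console.log(ridgeLoss(X, y, [2], 0, 0.1)); // same fit, now charged for the weight size
console.log(ridgeLoss(X, y, [1], 0, 0.1)); // smaller weight, but a worse fit
```

A larger lambda makes large weights more expensive, which pushes the optimum toward smaller parameter values.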

Categories
Data Mining

Cross-validation strategies and their application

In this post we’ll get to know the cross-validation strategies provided by the Sklearn module and show how to perform k-fold cross-validation. All the IPython notebook code works with Python 3.6.
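
To make the k-fold idea concrete, here is a small plain JavaScript sketch that splits sample indices into k consecutive folds without shuffling. It is not Sklearn code; kFoldSplits is a hypothetical helper written for illustration.

```javascript
// Splits indices 0..n-1 into k consecutive folds (no shuffling).
// The first n % k folds get one extra element so all samples are used once as test data.
function kFoldSplits(n, k) {
  if (k < 2 || k > n) throw new RangeError('k must be between 2 and n');
  const indices = Array.from({ length: n }, (_, i) => i);
  const splits = [];
  let start = 0;
  for (let fold = 0; fold < k; fold++) {
    const size = Math.floor(n / k) + (fold < n % k ? 1 : 0);
    const test = indices.slice(start, start + size);
    const train = [...indices.slice(0, start), ...indices.slice(start + size)];
    splits.push({ train, test });
    start += size;
  }
  return splits;
}

for (const { train, test } of kFoldSplits(6, 3)) {
  console.log('train:', train, 'test:', test);
}
```

Each sample lands in exactly one test fold, and a model would be trained k times, once per train/test pair.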

Categories
Data Mining Development

Work with inbuilt datasets of Sklearn and Seaborn libraries

In this post we’ll show how to generate model data and load standard datasets using the sklearn.datasets module. We use sklearn.datasets with Python 3.

Categories
Data Mining

Linear regression and Stochastic Gradient Descent

In this post we’ll show how to build a linear regression model for a data set and use stochastic gradient descent to optimize the model parameters. As in a previous post, we’ll calculate the MSE (mean squared error) and minimize it.
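
The technique can be sketched in a few lines of plain JavaScript for a single-feature model y ≈ w * x + b. This is an illustrative sketch, not the post's notebook code: the function names, the seeded random generator, the learning rate, the step count, and the noise-free toy data are all assumptions.

```javascript
// Small seeded PRNG (linear congruential generator) so runs are repeatable.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

// Fits y ≈ w * x + b by stochastic gradient descent on the squared error,
// using one randomly chosen sample per step.
function sgdLinearRegression(xs, ys, { learningRate = 0.01, steps = 20000, seed = 42 } = {}) {
  const rng = makeRng(seed);
  let w = 0;
  let b = 0;
  for (let step = 0; step < steps; step++) {
    const i = Math.floor(rng() * xs.length);
    const error = w * xs[i] + b - ys[i];
    // Gradient of error^2 with respect to w and b.
    w -= learningRate * 2 * error * xs[i];
    b -= learningRate * 2 * error;
  }
  return { w, b };
}

// MSE of the fitted model over the whole data set.
function meanSquaredError(xs, ys, { w, b }) {
  return xs.reduce((acc, x, i) => acc + (w * x + b - ys[i]) ** 2, 0) / xs.length;
}

const xs = [0, 1, 2, 3, 4];
const ys = xs.map((x) => 3 * x + 1); // noise-free toy data on the line y = 3x + 1
const model = sgdLinearRegression(xs, ys);
console.log(model, meanSquaredError(xs, ys, model));
```

On this toy data the fitted w and b should approach 3 and 1; with noisy data the estimates would keep fluctuating unless the learning rate is decreased over time.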

Categories
Development

Puppeteer Stealth to prevent detection

In the previous post we shared how to disguise Selenium Chrome automation against fingerprint checks. In this post we share how to do the same with puppeteer-extra and its Stealth plugin. The test results are available as HTML files and screenshots.