Categories
Challenge Featured Review SaaS

Data collectors to scrape tough websites

We recently came across a powerful new scraping service, Data Collector [by Bright Data]. A live test and an in-depth review are coming soon. For now, we want to highlight its main features, which have strongly impressed us.

Categories
Development

Vesta CP install SSL certificate for a domain

I’ve set up a subdomain that works over HTTP on a VPS running CentOS 7. Over HTTPS, however, it returns the main domain’s content.

Given

  1. The SSL certificate that we’ve bought for the domain does not cover the subdomain.
  2. There is a redirect solution for the SSL subdomain.

How do we set up a certificate for the subdomain?

A step-by-step guide to ACME SSL certification using dehydrated.

Adding a Let’s Encrypt certificate using VestaCP

First you need to update Vesta CP:

# sudo v-update-sys-vesta-all

If you get the error

v-update-sys-vesta-all command not found

run the following:

# source ~/.bash_profile
# export VESTA=/usr/local/vesta/

Now open VestaCP and, in the WEB tab, select the domain to edit.

Scroll down to SSL Support and tick it. Tick the Let’s Encrypt checkbox as well, then wait up to 5 minutes for the SSL certificate to be generated. The result should appear shortly.

Once the SSL certificate has been successfully loaded from Let’s Encrypt, the following files should be present in the /home/admin/conf/web directory:

ssl.sm.webscraping.pro.ca
ssl.sm.webscraping.pro.crt
ssl.sm.webscraping.pro.key
ssl.sm.webscraping.pro.pem
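
A quick way to confirm this is a small script that checks for the four files. This is only a sketch: the helper names are made up for illustration, and it assumes the ssl.<domain>.<extension> naming pattern generalises from the single listing above.

```python
from pathlib import Path

# Extensions seen in the listing above. Treating "ssl.<domain>.<ext>" as the
# general naming pattern is an assumption based on that one example.
SSL_EXTENSIONS = ("ca", "crt", "key", "pem")


def expected_ssl_files(domain, conf_dir="/home/admin/conf/web"):
    """Return the certificate file paths expected for `domain` (hypothetical helper)."""
    return [Path(conf_dir) / f"ssl.{domain}.{ext}" for ext in SSL_EXTENSIONS]


def missing_ssl_files(domain, conf_dir="/home/admin/conf/web"):
    """Return the expected certificate files that are not present on disk."""
    return [p for p in expected_ssl_files(domain, conf_dir) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_ssl_files("sm.webscraping.pro")
    if missing:
        print("Missing:", ", ".join(str(p) for p in missing))
    else:
        print("All certificate files are present.")
```

Run it on the server as a user that can read the conf directory; an empty "missing" list means all four files were generated.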

Categories
Development

Subdomain at Centos 7 with Laravel project

This post walks through the steps to create a subdomain (CentOS 7 and Vesta CP) and map a [Laravel] project folder to it.

Categories
Development

How to add a Git Personal Access Token (PAT) in the git console

  1. Remove the previous git origin:
git remote remove origin
  2. Add a new origin with the PAT (<TOKEN>):
git remote add origin https://<TOKEN>@github.com/<USERNAME>/<REPO>.git
  3. Push once with --set-upstream:
git push --set-upstream origin main

Now you can push changes to the remote repo without adding the PAT to the push command every time.

If you need to create a PAT, use the following tutorial.

Categories
Data Mining

Random Forest vs Gradient boosting

The objective of the task is to build a model that relates molecular information to an actual biological response as well as this data allows.

The data is shared in comma-separated values (CSV) format. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological response: the molecule was seen to elicit this response (1) or not (0). The remaining columns represent molecular descriptors (D1 through D1776); these are calculated properties that capture some characteristics of the molecule, for example size, shape, or elemental constitution. The descriptor matrix has been normalized.
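
To give a feel for the comparison, here is a minimal sketch that fits both model families on the same split. The molecule CSV is not used: make_classification generates a synthetic stand-in, and the hyperparameters and the choice of accuracy and log loss as metrics are our assumptions for illustration, not settings taken from the post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the molecule data: binary target, numeric descriptors.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=10, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Illustrative settings only; they are not tuned.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1
    scores[name] = {
        "accuracy": model.score(X_test, y_test),
        "log_loss": log_loss(y_test, proba),
    }

for name, s in scores.items():
    print(f"{name}: accuracy={s['accuracy']:.3f}, log_loss={s['log_loss']:.3f}")
```

Which model wins depends on the data and on tuning, so the numbers printed here say nothing about the real descriptor matrix.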

Categories
Data Mining

Bagging and Random Forest

In this post we work through several tasks with the Bagging and Random Forest classifiers.

We gradually develop a Bagging classifier on randomized trees that, in its final stage, matches the Random Forest algorithm.

We’ll also build the RandomForestClassifier of sklearn.ensemble and examine how its quality depends on (1) the number of trees, (2) the max features used for each tree node, and (3) the max tree depth.

Bagging is an ensemble algorithm that fits multiple models on different subsets of a training dataset, then combines the predictions from all models.

Random forest is an extension of bagging that also randomly selects the subset of features considered at each split of each tree.
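
The two definitions above can be sketched in plain Python. This is a toy illustration, not the sklearn implementation: the base model is a one-split "stump" rather than a full tree, all function names are ours, and the data is synthetic. Setting max_features below the number of features gives the random-forest-style feature sampling (with a stump, "per split" and "per tree" coincide).

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Try every (feature, threshold, orientation) and keep the split
    with the fewest training errors."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in X}):
            for left_label in (0, 1):
                errors = sum(
                    (left_label if row[f] <= threshold else 1 - left_label) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, left_label)
    return best[1:]  # (feature, threshold, left_label)


def predict_stump(stump, row):
    f, threshold, left_label = stump
    return left_label if row[f] <= threshold else 1 - left_label


def fit_bagging(X, y, n_models=25, max_features=None, seed=0):
    """Fit one stump per bootstrap sample of the training rows."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    k = n_features if max_features is None else max_features
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        features = rng.sample(range(n_features), k)  # random feature subset
        models.append(fit_stump(Xb, yb, features))
    return models


def predict_bagging(models, row):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Toy data: feature 0 decides the class, feature 1 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(60)]
y = [1 if row[0] > 0.5 else 0 for row in X]

bagged = fit_bagging(X, y)                       # plain bagging: all features per split
forest_like = fit_bagging(X, y, max_features=1)  # one random feature per split

print(predict_bagging(bagged, [0.9, 0.2]), predict_bagging(bagged, [0.1, 0.8]))
```

With only two features and one split per model, the forest-like variant often lands on the noise feature, so it is weaker on this toy data; the benefit of feature sampling shows up with deeper trees and many correlated features.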

Categories
Data Mining

Sklearn, Random Forest

The objective of the task is to build a model that relates molecular information to an actual biological response as well as this data allows.

The data is shared in comma-separated values (CSV) format. Each row in this data set represents a molecule. The first column contains experimental data describing an actual biological response: the molecule was seen to elicit this response (1) or not (0). The remaining columns represent molecular descriptors (D1 through D1776); these are calculated properties that capture some characteristics of the molecule, for example size, shape, or elemental constitution. The descriptor matrix has been normalized.

Source.

Categories
Data Mining

Sklearn Decision trees

We show how to work with decision trees in the Sklearn library.

Sklearn.tree
Sklearn tree examples
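
As a starting point, a minimal sketch of fitting and inspecting a tree might look like this. The iris data set, the depth limit, and the split ratio are our choices for illustration and are not taken from the post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# A shallow tree keeps the printed rules readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

accuracy = tree.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")

# Text rendering of the learned decision rules.
print(export_text(tree, feature_names=list(iris.feature_names)))
```

Raising max_depth usually fits the training data more closely at the cost of a harder-to-read tree and a higher risk of overfitting.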

Categories
Development

Cheerio.js, get items from html table into object

Suppose there is a table like the one below (one data row only):

Blows Minute (BPM) | Speed (RPM) | Power, PSI | Flow, PSI    | Tool Sys
0-2500             | 0-250       | 1.8 HP     | 2.6-13.2 GPM | SDS Max

How to scrape it using cheerio.js as a parser?

Case 1 (1 row only)

Categories
Data Mining

Bike Sharing Demand Problem, part 2 – Sklearn SGD regression model, scaling, transformation chain and Random Forest nonlinear model

The Bike Sharing Demand problem requires using historical data on weather conditions and bicycle rental to predict the number of occupied bicycles (rentals) for a certain hour of a certain day.

In the original problem statement, there are 11 features available. The feature set contains real-valued, categorical, and binary data. For the demonstration, we use a training sample, bike_sharing_demand.csv, taken from the original data.

See Bike Sharing Demand, part 1, where we performed some initial analysis of the problem.
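
The scaling-plus-SGD chain named in the title can be sketched as a sklearn Pipeline. This is a rough illustration under our own assumptions: the data is synthetic (bike_sharing_demand.csv is not loaded), the three numeric columns and their coefficients are invented, and mean absolute error is our choice of metric.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the rental data: three numeric features on
# different scales and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(loc=[20.0, 60.0, 10.0], scale=[8.0, 20.0, 5.0], size=(500, 3))
y = 5.0 * X[:, 0] - 1.5 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(scale=10.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# SGD is sensitive to feature scale, so scaling goes first in the chain.
pipeline = Pipeline([
    ("scaling", StandardScaler()),
    ("regression", SGDRegressor(max_iter=1000, tol=1e-3, random_state=0)),
])
pipeline.fit(X_train, y_train)

mae = mean_absolute_error(y_test, pipeline.predict(X_test))
print(f"MAE with scaling + SGD: {mae:.2f}")
```

Keeping the scaler inside the pipeline means it is fitted on the training split only, so no information from the test rows leaks into the transformation.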