
Sklearn, Classification and Regression metrics

In this post we’ll review a number of metrics for evaluating classification and regression models, using functions from the sklearn library. We’ll learn how to generate model data, how to train linear models, and how to evaluate their quality.

The code as an IPython notebook

sklearn.metrics

Docs

import warnings
warnings.filterwarnings("ignore")
  
from sklearn import model_selection, datasets, linear_model, metrics 
from matplotlib.colors import ListedColormap
%pylab inline
  

Datasets generation

Since we’re solving two problems, classification and regression, we’ll need two datasets.

For that we use the make_classification and make_regression functions.

For each problem we generate a dataset with 2 features and visualize it. In the classification problem both features are informative, while in the regression problem only 1 of the 2 features is informative.

clf_data, clf_target = datasets.make_classification(n_features = 2, n_informative = 2, n_classes = 2, 
                                                    n_redundant = 0, n_clusters_per_class = 1, 
                                                    random_state = 7)
  
reg_data, reg_target = datasets.make_regression(n_features = 2, n_informative = 1, n_targets = 1, 
                                                noise = 5., random_state = 7)
  
Classification dataset display
colors = ListedColormap(["red", "blue"])
pylab.scatter(clf_data[:,0], clf_data[:,1], c = clf_target, cmap = colors)
pylab.rcParams["figure.figsize"] = [10, 7]
  
Regression dataset display
pylab.scatter(reg_data[:,1], reg_target, color = "r")
pylab.scatter(reg_data[:,0], reg_target, color = "b")
pylab.rcParams["figure.figsize"] = [10, 7]
  

We split the data into train and test sets.

clf_train_data, clf_test_data, clf_train_labels, clf_test_labels = model_selection.train_test_split(clf_data, clf_target,
                                                                                     test_size = 0.3, random_state = 1)
  
reg_train_data, reg_test_data, reg_train_labels, reg_test_labels = model_selection.train_test_split(reg_data, reg_target,
                                                                                     test_size = 0.3, random_state = 1)
  

Quality metrics in classification tasks

Classification model training

We’ll use SGDClassifier. It is a linear classifier based on stochastic gradient descent.

  • Probabilistic classifier (loss function: loss = ‘log’; in newer scikit-learn releases this loss is named ‘log_loss’)
classifier = linear_model.SGDClassifier(loss = "log", random_state = 1, max_iter=1000)
classifier.fit(clf_train_data, clf_train_labels)
  
SGDClassifier(loss="log", random_state=1)
  

Generate classifier predicted labels:

predictions = classifier.predict(clf_test_data)
  

Generate predictions in the form of the probability that an object belongs to the zero class or to the first class.

probability_predictions = classifier.predict_proba(clf_test_data)
  
# original dataset labels
print(clf_test_labels)
  
[1 0 0 1 0 1 1 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0]
  
# predicted labels
print(predictions)
  
[1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0]
  

Probabilities that an object belongs to the zero class or the first class.

In each row, the first number is the probability that the object belongs to the zero class and the second number is the probability that it belongs to the first class.

print(probability_predictions[:10])
  
[[0.00000000e+00 1.00000000e+00]
 [9.99999997e-01 2.90779994e-09]
 [9.99990982e-01 9.01818055e-06]
 [0.00000000e+00 1.00000000e+00]
 [1.00000000e+00 7.01333183e-14]
 [5.16838702e-07 9.99999483e-01]
 [6.66133815e-16 1.00000000e+00]
 [1.00000000e+00 6.21822808e-13]
 [0.00000000e+00 1.00000000e+00]
 [9.99999998e-01 2.30155106e-09]]
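
Each row of probability_predictions sums to 1, and the predicted label is simply the class with the larger probability. A quick sanity check (a minimal sketch, using nothing beyond what was computed above):

# rows of predict_proba sum to 1; the predicted label should be the more probable class
print(probability_predictions.sum(axis = 1)[:5])
print((probability_predictions.argmax(axis = 1) == predictions).all())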
  

We’ve done the preparatory work. Now we move on to calculating the metrics.

Accuracy

Accuracy is a metric that shows how close the predictions are to the true labels: the proportion of correctly classified objects.

  • pair[0] – true label
  • pair[1] – predicted label
  • len(clf_test_labels) – number of objects
# computing accuracy by hand with plain Python
acc1 = sum([1. if pair[0] == pair[1] else 0. for pair in zip(clf_test_labels, predictions)])/len(clf_test_labels)
# inbuilt accuracy score
acc2 = metrics.accuracy_score(clf_test_labels, predictions)
print (acc1, acc2)
  
0.9333333333333333 0.9333333333333333
  

Confusion matrix

A confusion matrix is an NxN matrix, where N is the number of classes (N=2 in our case). Many statistical metrics can be computed from the confusion matrix.

It shows the main characteristics of the fitted model:
  1. True positive (TP) eqv. with hit
  2. True negative (TN) eqv. with correct rejection
  3. False positive (FP) eqv. with false alarm, type I error or overestimation
  4. False negative (FN) eqv. with miss, type II error or underestimation
matrix = metrics.confusion_matrix(clf_test_labels, predictions)
print("Confusion matrix\n",  matrix)
  
Confusion matrix
 [[17  1]
 [ 1 11]]
  
# manual count of correctly classified objects (TP + TN)
sum([1 if pair[0] == pair[1] else 0 for pair in zip(clf_test_labels, predictions)])
  
28
  
matrix.diagonal().sum()
  
28
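
For a binary problem scikit-learn arranges the confusion matrix as [[TN, FP], [FN, TP]], so the individual counts can be unpacked directly:

# unpack the cells of the binary confusion matrix: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = matrix.ravel()
print("TN =", tn, "FP =", fp, "FN =", fn, "TP =", tp)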
  

Precision

Precision is the fraction of objects assigned to a class that actually belong to it: TP / (TP + FP).

First we estimate the precision for the zero class. We call the precision_score() function and pass it the correct class labels and the predicted class labels. Since the positive label is 1 by default, we need to say explicitly that here we evaluate precision with respect to the zero class. To do this, we use the pos_label argument and set it to 0.

metrics.precision_score(clf_test_labels, predictions, pos_label = 0)
  
0.9444444444444444
  

Now we estimate the precision for the first class. The default is pos_label = 1.

metrics.precision_score(clf_test_labels, predictions)
  
0.9166666666666666
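
The same value can be recomputed by hand from the confusion matrix, since precision for the first class is TP / (TP + FP):

# precision for class 1 from the confusion matrix: TP / (TP + FP)
tp, fp = matrix[1, 1], matrix[0, 1]
print(tp / (tp + fp))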
  

Recall

Recall is the fraction of objects of a class that the model correctly recognized as such: TP / (TP + FN).

metrics.recall_score(clf_test_labels, predictions, pos_label = 0)
  
0.9444444444444444
  
metrics.recall_score(clf_test_labels, predictions)
  
0.9166666666666666
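
Again this agrees with a hand calculation from the confusion matrix, since recall for the first class is TP / (TP + FN):

# recall for class 1 from the confusion matrix: TP / (TP + FN)
tp, fn = matrix[1, 1], matrix[1, 0]
print(tp / (tp + fn))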
  

F-score

metrics.f1_score(clf_test_labels, predictions, pos_label = 0)
  
0.9444444444444444
  
metrics.f1_score(clf_test_labels, predictions)
  
0.9166666666666666
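
The F1 score is the harmonic mean of precision and recall, so it can be reproduced from the two metrics above (a small check):

# F1 as the harmonic mean of precision and recall
p = metrics.precision_score(clf_test_labels, predictions)
r = metrics.recall_score(clf_test_labels, predictions)
print(2 * p * r / (p + r))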
  

Classification report

We use the classification_report function to get the summary table for each class.

print(metrics.classification_report(clf_test_labels, predictions))
  
              precision    recall  f1-score   support

           0       0.94      0.94      0.94        18
           1       0.92      0.92      0.92        12

    accuracy                           0.93        30
   macro avg       0.93      0.93      0.93        30
weighted avg       0.93      0.93      0.93        30
  

ROC curve – Receiver Operating Characteristic curve

An ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different classification thresholds. We use probability_predictions in our case.

  • probability_predictions[:,1] is the probability that an object belongs to the first class.
fpr, tpr, _ = metrics.roc_curve(clf_test_labels, probability_predictions[:,1])
# _ contains the thresholds; we are not using them here
  
pylab.plot(fpr, tpr, label = "Linear model classifier")
pylab.plot([0, 1], [0, 1], "--", color = "grey", label = "Random classifier")
pylab.xlim([-0.05, 1.05])
pylab.ylim([-0.05, 1.05])
pylab.xlabel("False Positive Rate")
pylab.ylabel("True Positive Rate")
pylab.title("ROC curve")
pylab.legend(loc = "lower right") 
pylab.rcParams["figure.figsize"] = [10, 7]
  

ROC AUC

ROC AUC is the area under the ROC curve. Passing hard predicted labels and passing predicted probabilities give different values; the probability-based score is the one usually reported.

metrics.roc_auc_score(clf_test_labels, predictions)
  
0.9305555555555554
  
metrics.roc_auc_score(clf_test_labels, probability_predictions[:,1])
  
0.9907407407407407
  

PR AUC – area under the Precision-Recall curve

metrics.average_precision_score(clf_test_labels, predictions)
  
0.873611111111111
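
As with ROC, the full precision-recall curve can be drawn from the predicted probabilities. The sketch below mirrors the ROC plot above; note that average_precision_score, like roc_auc_score, is normally fed probabilities rather than hard labels:

# precision-recall curve built from predicted probabilities
precision, recall, _ = metrics.precision_recall_curve(clf_test_labels, probability_predictions[:,1])
pylab.plot(recall, precision, label = "Linear model classifier")
pylab.xlim([-0.05, 1.05])
pylab.ylim([-0.05, 1.05])
pylab.xlabel("Recall")
pylab.ylabel("Precision")
pylab.title("PR curve")
pylab.legend(loc = "lower left")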
  

log_loss – logistic loss (cross-entropy), the loss minimized by logistic regression

metrics.log_loss(clf_test_labels, probability_predictions[:,1])
  
0.21767621111290084
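
Log loss is just the negative mean log-likelihood of the true labels under the predicted probabilities, so it can be written out by hand (a sketch; probabilities are clipped to avoid log(0), much as sklearn does internally):

import numpy as np
# manual log loss: -mean(y*log(p) + (1-y)*log(1-p)), with clipping to avoid log(0)
p = np.clip(probability_predictions[:, 1], 1e-15, 1 - 1e-15)
y = clf_test_labels
print(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))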
  

Metrics in the Regression problem

Training the regression model

regressor = linear_model.SGDRegressor(random_state = 1, max_iter = 20)
regressor.fit(reg_train_data, reg_train_labels)
  
SGDRegressor(max_iter=20, random_state=1)
  
reg_predictions = regressor.predict(reg_test_data)
  
print(reg_test_labels)
  
[   2.67799047    7.06525927  -56.43389936   10.08001896  -22.46817716
  -19.27471232   59.44372825  -21.60494574   32.54682713  -41.89798772
  -18.16390935   32.75688783   31.04095773    2.39589626   -5.04783924
  -70.20925097   86.69034305   18.50402992   32.31573461 -101.81138022
   15.14628858   29.49813932   97.282674     25.88034991  -41.63332253
  -92.11198201   86.7177122     2.13250832  -20.24967575  -27.32511755]
  
print(reg_predictions)
  
[ -1.46503565   5.75776789 -50.13234306   5.05646094 -24.09370893
  -8.34831546  61.77254998 -21.98350565  30.65112022 -39.25972497
 -17.19337022  30.94178225  26.98820076  -6.08321732  -3.46551
 -78.9843398   84.80190097  14.80638314  22.91302375 -89.63572717
  14.5954632   31.64431951  95.81031534  21.5037679  -43.1101736
 -95.06972123  86.70086546   0.47837761 -16.44594704 -22.72581879]
  

MAE – Mean Absolute Error

metrics.mean_absolute_error(reg_test_labels, reg_predictions)
  
3.748761311885298
  

MSE – Mean Squared Error

metrics.mean_squared_error(reg_test_labels, reg_predictions)
  
24.114925597460914
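
Both metrics are easy to verify by hand: MAE is the mean absolute difference and MSE the mean squared difference between targets and predictions (a quick numpy check):

import numpy as np
# manual MAE and MSE
errors = reg_test_labels - reg_predictions
print(np.mean(np.abs(errors)), np.mean(errors ** 2))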
  

RMSE – Root Mean Squared Error (the square root of MSE)

sqrt(metrics.mean_squared_error(reg_test_labels, reg_predictions))
  
4.91069502183356
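
Here sqrt comes from the %pylab namespace. In scikit-learn 0.22 and later the same value can also be requested directly (newer releases additionally expose a separate root_mean_squared_error function):

# RMSE straight from sklearn (squared=False is available in scikit-learn >= 0.22)
metrics.mean_squared_error(reg_test_labels, reg_predictions, squared = False)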
  

Coefficient of determination – R2 score

The closer the R2 score is to 1, the more accurate the model.

metrics.r2_score(reg_test_labels, reg_predictions)
  
0.989317615054695
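
R2 compares the model’s squared error with that of a trivial predictor that always outputs the mean target, so it can be checked by hand as 1 - SS_res / SS_tot:

import numpy as np
# manual R^2: 1 - residual sum of squares / total sum of squares
ss_res = np.sum((reg_test_labels - reg_predictions) ** 2)
ss_tot = np.sum((reg_test_labels - np.mean(reg_test_labels)) ** 2)
print(1 - ss_res / ss_tot)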
  
