In this post we'll review a number of metrics for evaluating classification and regression models, using the functions provided by the sklearn library. We'll learn how to generate model data, how to train linear models, and how to evaluate their quality.
The code is also available as an IPython notebook.
sklearn.metrics
import warnings
warnings.filterwarnings("ignore")
from sklearn import model_selection, datasets, linear_model, metrics
from matplotlib.colors import ListedColormap
%pylab inline
Dataset generation
Since we are going to solve two problems, classification and regression, we'll need two datasets.
For that we use the make_classification and make_regression functions.
For each problem we generate a dataset with 2 features and visualize it. In the classification problem both features are informative, while in the regression problem only 1 of the 2 features is informative.
clf_data, clf_target = datasets.make_classification(n_features = 2, n_informative = 2, n_classes = 2, n_redundant = 0, n_clusters_per_class = 1, random_state = 7)
reg_data, reg_target = datasets.make_regression(n_features = 2, n_informative = 1, n_targets = 1, noise = 5., random_state = 7)
colors = ListedColormap(["red", "blue"])
pylab.scatter(clf_data[:,0], clf_data[:,1], c = clf_target, cmap = colors)
pylab.rcParams["figure.figsize"] = [10, 7]
pylab.scatter(reg_data[:,1], reg_target, color = "r")
pylab.scatter(reg_data[:,0], reg_target, color = "b")
pylab.rcParams["figure.figsize"] = [10, 7]
We split the data into train and test sets.
clf_train_data, clf_test_data, clf_train_labels, clf_test_labels = model_selection.train_test_split(clf_data, clf_target, test_size = 0.3, random_state = 1)
reg_train_data, reg_test_data, reg_train_labels, reg_test_labels = model_selection.train_test_split(reg_data, reg_target, test_size = 0.3, random_state = 1)
Quality metrics in classification tasks
Classification model training
We'll use SGDClassifier, a linear classifier trained with stochastic gradient descent.
- Probabilistic classifier (loss function: loss = "log")
classifier = linear_model.SGDClassifier(loss = "log", random_state = 1, max_iter = 1000)
classifier.fit(clf_train_data, clf_train_labels)
SGDClassifier(loss="log", random_state=1)
Generate the classifier's predicted labels:
predictions = classifier.predict(clf_test_data)
Generate predictions in the form of probabilities that each object belongs to the zero class or the first class.
probability_predictions = classifier.predict_proba(clf_test_data)
# original dataset labels
print(clf_test_labels)
[1 0 0 1 0 1 1 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0]
# predicted labels
print(predictions)
[1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0]
These are the probabilities that each object belongs to the zero class or the first class: the first column is the probability of the zero class, the second column is the probability of the first class.
print(probability_predictions[:10])
[[0.00000000e+00 1.00000000e+00]
 [9.99999997e-01 2.90779994e-09]
 [9.99990982e-01 9.01818055e-06]
 [0.00000000e+00 1.00000000e+00]
 [1.00000000e+00 7.01333183e-14]
 [5.16838702e-07 9.99999483e-01]
 [6.66133815e-16 1.00000000e+00]
 [1.00000000e+00 6.21822808e-13]
 [0.00000000e+00 1.00000000e+00]
 [9.99999998e-01 2.30155106e-09]]
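As a quick sanity check (not part of the original notebook), the column order of predict_proba follows classifier.classes_, and each row of the probability matrix sums to 1:

# column order of predict_proba follows classifier.classes_
print(classifier.classes_)
# every row should sum to (approximately) 1
print(probability_predictions.sum(axis = 1)[:5])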
We've done the preparatory work. Now we can calculate the metrics.
Accuracy
Accuracy is a metric showing the share of correctly classified objects, i.e. how close the predictions are to the true labels.
- pair[0] – correct label
- pair[1] – predicted label
- len(clf_test_labels) – number of test objects
# calculating it manually with plain Python
acc1 = sum([1. if pair[0] == pair[1] else 0. for pair in zip(clf_test_labels, predictions)]) / len(clf_test_labels)
# built-in accuracy score
acc2 = metrics.accuracy_score(clf_test_labels, predictions)
print(acc1, acc2)
0.9333333333333333 0.9333333333333333
Confusion matrix
The confusion matrix is an NxN matrix, where N is the number of classes (N = 2 in our case). Many statistical metrics can be derived from it.
It shows the main characteristics of the built model:
- True positive (TP) eqv. with hit
- True negative (TN) eqv. with correct rejection
- False positive (FP) eqv. with false alarm, type I error or overestimation
- False negative (FN) eqv. with miss, type II error or underestimation
matrix = metrics.confusion_matrix(clf_test_labels, predictions)
print("Confusion matrix\n", matrix)
Confusion matrix
 [[17  1]
 [ 1 11]]
# manual calculation of true positives plus true negatives
sum([1 if pair[0] == pair[1] else 0 for pair in zip(clf_test_labels, predictions)])
28
matrix.diagonal().sum()
28
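The individual cells can also be unpacked directly. A small sketch (not part of the original notebook), relying on sklearn's layout where rows are true labels and columns are predicted labels:

# for a binary problem, ravel() returns the cells in the order TN, FP, FN, TP
tn, fp, fn, tp = matrix.ravel()
print("TN =", tn, "FP =", fp, "FN =", fn, "TP =", tp)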
Precision
Precision is the share of objects predicted as a given class that actually belong to that class: TP / (TP + FP).
First we estimate the precision for the zero class. We call the precision_score() function and pass it the correct class labels and the predicted class labels. Since the positive label is 1 by default, we need to state explicitly that in this case we are measuring precision for the zero class. To do this we use the pos_label argument and set it to 0.
metrics.precision_score(clf_test_labels, predictions, pos_label = 0)
0.9444444444444444
Now we estimate the precision for the first class. By default pos_label = 1, so no extra argument is needed.
metrics.precision_score(clf_test_labels, predictions)
0.9230769230769231
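Going back to the confusion matrix cells unpacked earlier, the zero-class precision can be reproduced by hand (a sketch, not part of the original notebook): among the objects predicted as class 0, the share that are truly class 0.

# precision for the zero class computed from the confusion matrix cells
print(tn / (tn + fn))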
Recall
Recall is the share of objects of a given class that the model correctly identifies: TP / (TP + FN). The picture below gives a vivid illustration of precision and recall:
metrics.recall_score(clf_test_labels, predictions, pos_label = 0)
0.9444444444444444
metrics.recall_score(clf_test_labels, predictions)
0.9166666666666666
The F1 score combines precision and recall into a single number:

metrics.f1_score(clf_test_labels, predictions, pos_label = 0)
0.9444444444444444
metrics.f1_score(clf_test_labels, predictions)
0.9166666666666666
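As a quick cross-check of the harmonic-mean formula behind F1 (a sketch, not part of the original notebook), recomputing it for the zero class from its precision and recall:

p0 = metrics.precision_score(clf_test_labels, predictions, pos_label = 0)
r0 = metrics.recall_score(clf_test_labels, predictions, pos_label = 0)
print(2 * p0 * r0 / (p0 + r0))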
Classification report
We use the classification_report function to get the summary table for each class.
print(metrics.classification_report(clf_test_labels, predictions))
              precision    recall  f1-score   support

           0       0.94      0.94      0.94        18
           1       0.92      0.92      0.92        12

    accuracy                           0.93        30
   macro avg       0.93      0.93      0.93        30
weighted avg       0.93      0.93      0.93        30
ROC curve – Receiver Operating Characteristic curve
An ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different classification thresholds. In our case we use probability_predictions.
- probability_predictions[:,1] is the probability that object is of the first class.
# the third return value contains the thresholds; we are not using them here
fpr, tpr, _ = metrics.roc_curve(clf_test_labels, probability_predictions[:,1])
pylab.plot(fpr, tpr, label = "Linear model classifier")
pylab.plot([0, 1], [0, 1], "--", color = "grey", label = "Random classifier")
pylab.xlim([-0.05, 1.05])
pylab.ylim([-0.05, 1.05])
pylab.xlabel("False Positive Rate")
pylab.ylabel("True Positive Rate")
pylab.title("ROC curve")
pylab.legend(loc = "lower right")
pylab.rcParams["figure.figsize"] = [10, 7]
ROC AUC
ROC AUC is the area under the ROC curve.
metrics.roc_auc_score(clf_test_labels, predictions)
0.9305555555555554
metrics.roc_auc_score(clf_test_labels, probability_predictions[:,1])
0.9907407407407407
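The same value can be obtained by integrating the curve we computed earlier with roc_curve; a small cross-check (not part of the original notebook):

# area under the (fpr, tpr) points via the trapezoidal rule;
# should match roc_auc_score on the probability predictions
print(metrics.auc(fpr, tpr))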
PR AUC – area under the Precision-Recall curve. We use average_precision_score, which summarizes the precision-recall curve.
metrics.average_precision_score(clf_test_labels, predictions)
0.873611111111111
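The underlying precision-recall curve itself can be built with metrics.precision_recall_curve; a minimal sketch (not part of the original notebook), reusing the probability predictions and the pylab conventions from above:

precision, recall, _ = metrics.precision_recall_curve(clf_test_labels, probability_predictions[:,1])
pylab.plot(recall, precision, label = "Linear model classifier")
pylab.xlabel("Recall")
pylab.ylabel("Precision")
pylab.title("Precision-Recall curve")
pylab.legend(loc = "lower left")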
log_loss – logistic loss (cross-entropy), the loss minimized by logistic regression
metrics.log_loss(clf_test_labels, probability_predictions[:,1])
0.21767621111290084
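For reference, the same value can be reproduced with the standard formula -mean(y*log(p) + (1-y)*log(1-p)); a sketch (not part of the original notebook), clipping the probabilities so that log(0) never occurs:

import numpy as np

# manual log loss; should closely match metrics.log_loss above
p = np.clip(probability_predictions[:,1], 1e-15, 1 - 1e-15)
y = np.asarray(clf_test_labels)
print(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))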
Metrics in the Regression problem
Training the regression model
regressor = linear_model.SGDRegressor(random_state = 1, max_iter = 20)
regressor.fit(reg_train_data, reg_train_labels)
SGDRegressor(max_iter=20, random_state=1)
reg_predictions = regressor.predict(reg_test_data)
print(reg_test_labels)
[ 2.67799047 7.06525927 -56.43389936 10.08001896 -22.46817716 -19.27471232 59.44372825 -21.60494574 32.54682713 -41.89798772 -18.16390935 32.75688783 31.04095773 2.39589626 -5.04783924 -70.20925097 86.69034305 18.50402992 32.31573461 -101.81138022 15.14628858 29.49813932 97.282674 25.88034991 -41.63332253 -92.11198201 86.7177122 2.13250832 -20.24967575 -27.32511755]
print(reg_predictions)
[ -1.46503565 5.75776789 -50.13234306 5.05646094 -24.09370893 -8.34831546 61.77254998 -21.98350565 30.65112022 -39.25972497 -17.19337022 30.94178225 26.98820076 -6.08321732 -3.46551 -78.9843398 84.80190097 14.80638314 22.91302375 -89.63572717 14.5954632 31.64431951 95.81031534 21.5037679 -43.1101736 -95.06972123 86.70086546 0.47837761 -16.44594704 -22.72581879]
MAE – Mean Absolute Error
metrics.mean_absolute_error(reg_test_labels, reg_predictions)
3.748761311885298
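The same number by hand (a sketch, not part of the original notebook): the mean of the absolute residuals.

import numpy as np

# manual MAE; should match metrics.mean_absolute_error above
print(np.mean(np.abs(reg_test_labels - reg_predictions)))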
MSE – Mean Squared Error
metrics.mean_squared_error(reg_test_labels, reg_predictions)
24.114925597460914
RMSE – Root Mean Squared Error (the square root of MSE)
# sqrt is available in the namespace thanks to %pylab inline (it is numpy.sqrt)
sqrt(metrics.mean_squared_error(reg_test_labels, reg_predictions))
4.91069502183356
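Depending on the sklearn version installed (this is an assumption about your environment), RMSE can also be requested directly: mean_squared_error accepts squared = False since sklearn 0.22, and sklearn 1.4 adds metrics.root_mean_squared_error.

# works in sklearn >= 0.22; in sklearn >= 1.4 metrics.root_mean_squared_error can be used instead
print(metrics.mean_squared_error(reg_test_labels, reg_predictions, squared = False))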
Coefficient of determination – R2 score
The closer the R2 score is to 1, the better the model.
metrics.r2_score(reg_test_labels, reg_predictions)
0.989317615054695
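R2 can also be computed from its definition, 1 - SS_res / SS_tot; a small sketch (not part of the original notebook):

import numpy as np

# residual sum of squares vs. total sum of squares around the mean target
ss_res = np.sum((reg_test_labels - reg_predictions) ** 2)
ss_tot = np.sum((reg_test_labels - np.mean(reg_test_labels)) ** 2)
print(1 - ss_res / ss_tot)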