
Finding Classifier parameters on the grid, Sklearn.grid_search

Let’s answer two questions: how do a model’s parameters affect its quality, and how can we select the optimal parameters for the task at hand? We will use sklearn’s grid search tools (the functionality of the old grid_search module now lives in sklearn.model_selection) and learn how to select model parameters over a grid.

import warnings
warnings.filterwarnings("ignore")
from sklearn import model_selection, datasets, linear_model, metrics
import numpy as np
import pandas as pd

Loading the dataset

iris = datasets.load_iris()
train_data, test_data, train_labels, test_labels = model_selection.train_test_split(
    iris.data, iris.target, test_size = 0.3, random_state = 0)

Set up Classifier model

classifier = linear_model.SGDClassifier(random_state = 0, tol=1e-3)

Checking the classifier's tunable parameters

classifier.get_params().keys()
dict_keys(["alpha",
"average",
"class_weight",
"early_stopping",
"epsilon",
"eta0",
"fit_intercept",
"l1_ratio",
"learning_rate",
"loss",
"max_iter",
"n_iter",
"n_iter_no_change",
"n_jobs",
"penalty",
"power_t",
"random_state",
"shuffle",
"tol",
"validation_fraction",
"verbose",
"warm_start"])

Choosing parameters_grid

We want to obtain the classifier score for each combination of parameter values, compare them, and choose the combination with the optimal parameters. Let’s define parameters_grid for the optimization that follows.

parameters_grid = {
    "loss" : ["hinge", "log", "squared_hinge", "squared_loss"],
    "penalty" : ["l1", "l2"],
    "max_iter" : np.arange(5,10),
    "alpha" : np.linspace(0.0001, 0.001, num = 5),
}
cv = model_selection.StratifiedShuffleSplit(n_splits=10, test_size = 0.2, random_state = 0)
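
As a quick sanity check (this snippet is an addition to the original code), we can count how many parameter combinations the grid defines using sklearn’s ParameterGrid; the exhaustive search below evaluates every one of them on every cross-validation split.

from sklearn.model_selection import ParameterGrid
# 4 losses * 2 penalties * 5 max_iter values * 5 alpha values = 200 combinations
print(len(ParameterGrid(parameters_grid)))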

Selection of parameters and quality assessment

grid_cv is a grid-search object that looks for the optimal classifier parameters. A search consists of:

  • an estimator (regressor or classifier, SGDClassifier in our case);
  • a parameter space (grid);
  • a method for searching or sampling candidates;
  • a cross-validation scheme; and
  • a score function.
grid_cv = model_selection.GridSearchCV(classifier, parameters_grid, scoring = "accuracy", cv = cv)
%%time
grid_cv.fit(train_data, train_labels)
CPU times: user 3.53 s, sys: 0 ns, total: 3.53 s
Wall time: 3.53 s
GridSearchCV(cv=StratifiedShuffleSplit(n_splits=10, random_state=0, test_size=0.2,
            train_size=None),
       error_score="raise-deprecating",
       estimator=SGDClassifier(alpha=0.0001, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate="optimal", loss="hinge", max_iter=None,
       n_iter=None, n_iter_no_change=5, n_jobs=None, penalty="l2",
       power_t=0.5, random_state=0, shuffle=True, tol=0.001,
       validation_fraction=0.1, verbose=0, warm_start=False),
       fit_params=None, iid="warn", n_jobs=None,
       param_grid={"loss": ["hinge", "log", "squared_hinge", "squared_loss"], "penalty": ["l1", "l2"], "max_iter": array([5, 6, 7, 8, 9]), "alpha": array([0.0001 , 0.00032, 0.00055, 0.00078, 0.001  ])},
       pre_dispatch="2*n_jobs", refit=True, return_train_score="warn",
       scoring="accuracy", verbose=0)
grid_cv.best_estimator_
SGDClassifier(alpha=0.000325, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate="optimal", loss="squared_hinge",
       max_iter=9, n_iter=None, n_iter_no_change=5, n_jobs=None,
       penalty="l1", power_t=0.5, random_state=0, shuffle=True, tol=0.001,
       validation_fraction=0.1, verbose=0, warm_start=False)
print(grid_cv.best_score_)
print(grid_cv.best_params_)
0.9047619047619048
{"alpha": 0.000325, "loss": "squared_hinge", "max_iter": 9, "penalty": "l1"}

The full grid_cv results are the following:

grid_cv.cv_results_
{"mean_fit_time": array([0.00093603, 0.00101256, 0.0009665 , 0.00084591, 0.00089548,
        0.0008132 , 0.00095484, 0.00081444, 0.00089443, 0.00084577,
        0.00088301, 0.00091863, 0.00105679, 0.00101361, 0.00109994,
		...

You can find all the grid search results in the attached file at the bottom of the post.
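
For a quick look without opening the attachment, a small sketch (not part of the original code) loads cv_results_ into a pandas DataFrame, pandas being already imported above, and sorts the parameter combinations by their mean test score:

results_df = pd.DataFrame(grid_cv.cv_results_)
# show the best-scoring parameter combinations first
print(results_df.sort_values("mean_test_score", ascending=False)[
    ["params", "mean_test_score", "std_test_score"]].head())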

Random grid search

A randomized grid search allows us to save resources when searching for classifier parameters on big datasets and/or large parameter grids.

Here the n_iter parameter is added. It sets the number of iterations of randomized_grid_cv. For the original grid_cv we had 200 iterations, the Cartesian product of all parameter values from parameters_grid. Here we’ll do only 20 iterations with random combinations drawn from parameters_grid.
randomized_grid_cv = model_selection.RandomizedSearchCV(classifier, parameters_grid, scoring = "accuracy", cv = cv, n_iter = 20, 
                                                   random_state = 0)
%%time
randomized_grid_cv.fit(train_data, train_labels)
CPU times: user 353 ms, sys: 0 ns, total: 353 ms
Wall time: 351 ms
RandomizedSearchCV(cv=StratifiedShuffleSplit(n_splits=10, random_state=0, test_size=0.2,
            train_size=None),
          error_score="raise-deprecating",
          estimator=SGDClassifier(alpha=0.0001, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate="optimal", loss="hinge", max_iter=None,
       n_iter=None, n_iter_no_change=5, n_jobs=None, penalty="l2",
       power_t=0.5, random_state=0, shuffle=True, tol=0.001,
       validation_fraction=0.1, verbose=0, warm_start=False),
          fit_params=None, iid="warn", n_iter=20, n_jobs=None,
          param_distributions={"loss": ["hinge", "log", "squared_hinge", "squared_loss"], "penalty": ["l1", "l2"], "max_iter": array([5, 6, 7, 8, 9]), "alpha": array([0.0001 , 0.00032, 0.00055, 0.00078, 0.001  ])},
          pre_dispatch="2*n_jobs", random_state=0, refit=True,
          return_train_score="warn", scoring="accuracy", verbose=0)
print(randomized_grid_cv.best_score_)
print(randomized_grid_cv.best_params_)
0.8666666666666667
{"penalty": "l1", "max_iter": 9, "loss": "log", "alpha": 0.00055}

Analysis

The random grid search score (0.867) is only slightly lower than the exhaustive grid search score (0.905). Now let’s compare the parameters. The alpha coefficient and the loss function have changed, while the type of regularization and the number of iterations have remained the same. From here we can either continue the optimization process, or stop at the found set of parameters.
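
If we continue, one option is a second, narrower grid search around the parameters found so far. The sketch below is only an illustration, and the refined ranges are assumptions rather than tuned recommendations (max_iter = 9 sits at the edge of the original range, so it is worth extending):

# illustrative refined ranges centred on the best parameters found above
refined_grid = {
    "loss" : ["log", "squared_hinge"],
    "penalty" : ["l1"],
    "max_iter" : np.arange(8, 13),
    "alpha" : np.linspace(0.0002, 0.0006, num = 5),
}
refined_cv = model_selection.GridSearchCV(classifier, refined_grid, scoring = "accuracy", cv = cv)
refined_cv.fit(train_data, train_labels)
print(refined_cv.best_score_, refined_cv.best_params_)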

Conclusion

In this post, we learned how to select model parameters using “grid search” and “random grid search”. This means that we can now not only generate model data, build models, and evaluate their quality, but also optimize models.
