Let’s answer the question: how do the parameters of the model affect its quality? And how can we select the optimal parameters for the task to be solved? We will look at the grid search tools in the model_selection module of the sklearn library (the older grid_search module has been removed) and learn how to select model parameters from a grid.
import warnings
warnings.filterwarnings("ignore")
from sklearn import model_selection, datasets, linear_model, metrics
import numpy as np
import pandas as pd
iris = datasets.load_iris()
train_data, test_data, train_labels, test_labels = model_selection.train_test_split(
    iris.data, iris.target, test_size = 0.3, random_state = 0)
classifier = linear_model.SGDClassifier(random_state = 0, tol=1e-3)
classifier.get_params().keys()
dict_keys(["alpha", "average", "class_weight", "early_stopping", "epsilon", "eta0", "fit_intercept", "l1_ratio", "learning_rate", "loss", "max_iter", "n_iter", "n_iter_no_change", "n_jobs", "penalty", "power_t", "random_state", "shuffle", "tol", "validation_fraction", "verbose", "warm_start"])
Choosing parameters_grid
We want to get the classifier score for each combination of parameter values, compare the scores, and choose the combination that performs best. Let’s define parameters_grid for the optimization that follows.
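parameters_grid = {
    # "log" and "squared_loss" are the names used by the scikit-learn version in this post;
    # newer releases call them "log_loss" and "squared_error"
    "loss" : ["hinge", "log", "squared_hinge", "squared_loss"],
    "penalty" : ["l1", "l2"],
    "max_iter" : np.arange(5,10),
    "alpha" : np.linspace(0.0001, 0.001, num = 5),
}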
cv = model_selection.StratifiedShuffleSplit(n_splits=10, test_size = 0.2, random_state = 0)
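Before running a search, it can help to check how large the grid actually is. The sketch below enumerates the grid above with ParameterGrid and multiplies the number of candidates by the number of cross-validation splits. The variable names n_candidates and n_fits are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn import model_selection

# Same grid as in the post. ParameterGrid only enumerates combinations,
# so the loss names are not validated at this point.
parameters_grid = {
    "loss": ["hinge", "log", "squared_hinge", "squared_loss"],
    "penalty": ["l1", "l2"],
    "max_iter": np.arange(5, 10),
    "alpha": np.linspace(0.0001, 0.001, num=5),
}

candidates = model_selection.ParameterGrid(parameters_grid)
n_candidates = len(candidates)  # 4 losses * 2 penalties * 5 max_iter * 5 alphas

cv = model_selection.StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
n_fits = n_candidates * cv.get_n_splits()  # one fit per candidate per split

print(n_candidates, n_fits)  # 200 2000
print(candidates[0])         # the first parameter combination as a dict
```

An exhaustive search over this grid therefore trains the classifier 2000 times, plus one final refit of the best candidate.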
Exhaustive Grid Search
grid_cv is a grid search object that looks for the optimal classifier parameters. A search consists of:
- an estimator (regressor or classifier, SGDClassifier in our case);
- a parameter space (grid);
- a method for searching or sampling candidates;
- a cross-validation scheme; and
- a score function.
grid_cv = model_selection.GridSearchCV(classifier, parameters_grid, scoring = "accuracy", cv = cv)
%%time
grid_cv.fit(train_data, train_labels)
CPU times: user 3.53 s, sys: 0 ns, total: 3.53 s
Wall time: 3.53 s
GridSearchCV(cv=StratifiedShuffleSplit(n_splits=10, random_state=0, test_size=0.2, train_size=None), error_score="raise-deprecating", estimator=SGDClassifier(alpha=0.0001, average=False, class_weight=None, early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate="optimal", loss="hinge", max_iter=None, n_iter=None, n_iter_no_change=5, n_jobs=None, penalty="l2", power_t=0.5, random_state=0, shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0, warm_start=False), fit_params=None, iid="warn", n_jobs=None, param_grid={"loss": ["hinge", "log", "squared_hinge", "squared_loss"], "penalty": ["l1", "l2"], "max_iter": array([5, 6, 7, 8, 9]), "alpha": array([0.0001 , 0.00032, 0.00055, 0.00078, 0.001 ])}, pre_dispatch="2*n_jobs", refit=True, return_train_score="warn", scoring="accuracy", verbose=0)
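Conceptually, an exhaustive search amounts to a loop over all parameter combinations, with each one scored by cross-validation. The sketch below is a simplified illustration of that idea, not the library's internal implementation. It uses a reduced, hypothetical grid (small_grid, 8 candidates) so that it runs quickly and works with both older and newer scikit-learn releases.

```python
import warnings
import numpy as np
from sklearn import datasets, linear_model, model_selection

warnings.filterwarnings("ignore")  # silence possible convergence warnings

iris = datasets.load_iris()
train_data, test_data, train_labels, test_labels = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Reduced grid chosen for illustration only (an assumption, not the post's grid)
small_grid = {
    "loss": ["hinge", "squared_hinge"],
    "penalty": ["l1", "l2"],
    "alpha": [0.0001, 0.001],
}
cv = model_selection.StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

best_score, best_params = -np.inf, None
for params in model_selection.ParameterGrid(small_grid):
    clf = linear_model.SGDClassifier(random_state=0, tol=1e-3, **params)
    # Mean accuracy over the 10 stratified shuffle splits
    scores = model_selection.cross_val_score(
        clf, train_data, train_labels, scoring="accuracy", cv=cv)
    if scores.mean() > best_score:
        best_score, best_params = scores.mean(), params

print(best_score, best_params)
```

GridSearchCV adds conveniences on top of such a loop, for example refitting the best candidate on the whole training set and collecting per-candidate statistics in cv_results_.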
grid_cv.best_estimator_
SGDClassifier(alpha=0.000325, average=False, class_weight=None, early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate="optimal", loss="squared_hinge", max_iter=9, n_iter=None, n_iter_no_change=5, n_jobs=None, penalty="l1", power_t=0.5, random_state=0, shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0, warm_start=False)
print(grid_cv.best_score_)
print(grid_cv.best_params_)
0.9047619047619048
{"alpha": 0.000325, "loss": "squared_hinge", "max_iter": 9, "penalty": "l1"}
The full grid_cv results are the following:
grid_cv.cv_results_
{"mean_fit_time": array([0.00093603, 0.00101256, 0.0009665 , 0.00084591, 0.00089548, 0.0008132 , 0.00095484, 0.00081444, 0.00089443, 0.00084577, 0.00088301, 0.00091863, 0.00105679, 0.00101361, 0.00109994, ...
You can find all the grid search results in the file attached at the bottom of the post.
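The raw cv_results_ dictionary is hard to read, so one option is to load it into a pandas DataFrame and sort the candidates by rank. The sketch below does this for a small, hypothetical grid (small_grid); the names search and summary are illustrative and do not come from the original notebook.

```python
import warnings
import pandas as pd
from sklearn import datasets, linear_model, model_selection

warnings.filterwarnings("ignore")  # silence possible convergence warnings

iris = datasets.load_iris()
train_data, _, train_labels, _ = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Reduced grid (4 candidates) chosen for illustration only
small_grid = {"penalty": ["l1", "l2"], "alpha": [0.0001, 0.001]}
cv = model_selection.StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

search = model_selection.GridSearchCV(
    linear_model.SGDClassifier(random_state=0, tol=1e-3),
    small_grid, scoring="accuracy", cv=cv)
search.fit(train_data, train_labels)

# One row per candidate; keep the most informative columns
results = pd.DataFrame(search.cv_results_)
summary = results[["params", "mean_test_score", "std_test_score", "rank_test_score"]]
summary = summary.sort_values("rank_test_score").reset_index(drop=True)
print(summary)
```

The first row of summary corresponds to best_params_ and best_score_, and the std_test_score column gives a sense of how stable each candidate is across the splits.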
Randomized Grid Search
Such a search evaluates only a random sample of the candidates, which saves resources when looking for classifier parameters on large datasets and/or large parameter grids.
randomized_grid_cv = model_selection.RandomizedSearchCV(classifier, parameters_grid, scoring = "accuracy", cv = cv, n_iter = 20, random_state = 0)
%%time
randomized_grid_cv.fit(train_data, train_labels)
CPU times: user 353 ms, sys: 0 ns, total: 353 ms
Wall time: 351 ms
RandomizedSearchCV(cv=StratifiedShuffleSplit(n_splits=10, random_state=0, test_size=0.2, train_size=None), error_score="raise-deprecating", estimator=SGDClassifier(alpha=0.0001, average=False, class_weight=None, early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate="optimal", loss="hinge", max_iter=None, n_iter=None, n_iter_no_change=5, n_jobs=None, penalty="l2", power_t=0.5, random_state=0, shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0, warm_start=False), fit_params=None, iid="warn", n_iter=20, n_jobs=None, param_distributions={"loss": ["hinge", "log", "squared_hinge", "squared_loss"], "penalty": ["l1", "l2"], "max_iter": array([5, 6, 7, 8, 9]), "alpha": array([0.0001 , 0.00032, 0.00055, 0.00078, 0.001 ])}, pre_dispatch="2*n_jobs", random_state=0, refit=True, return_train_score="warn", scoring="accuracy", verbose=0)
print(randomized_grid_cv.best_score_)
print(randomized_grid_cv.best_params_)
0.8666666666666667
{"penalty": "l1", "max_iter": 9, "loss": "log", "alpha": 0.00055}
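To see which candidates a randomized search would try, the grid can be sampled directly with ParameterSampler. The sketch below draws 20 of the 200 combinations; with the same n_iter and random_state it should correspond to the candidates evaluated by the randomized search above, although that is an assumption about the library's internals rather than something shown in the post.

```python
import numpy as np
from sklearn import model_selection

# Same grid as in the post; the sampler only draws combinations,
# it does not validate the loss names.
parameters_grid = {
    "loss": ["hinge", "log", "squared_hinge", "squared_loss"],
    "penalty": ["l1", "l2"],
    "max_iter": np.arange(5, 10),
    "alpha": np.linspace(0.0001, 0.001, num=5),
}

# When every entry is a list or array, sampling is done without replacement
sampled = list(model_selection.ParameterSampler(
    parameters_grid, n_iter=20, random_state=0))
total = len(model_selection.ParameterGrid(parameters_grid))

print(len(sampled), "of", total, "candidates")  # 20 of 200 candidates
for params in sampled[:3]:
    print(params)
```

Only a tenth of the grid is evaluated, which is consistent with the roughly tenfold drop in fitting time reported above.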
Analysis
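The randomized search score (about 0.867) is roughly 0.04 lower than the exhaustive grid search score (about 0.905), a difference in the second decimal place, but the search evaluated only 20 of the 200 candidates and ran about ten times faster (353 ms versus 3.53 s). Now we analyze the parameters. The type of regularization (l1) and the number of iterations (9) have remained the same, while the alpha coefficient (0.00055 instead of 0.000325) and the loss function (log instead of squared_hinge) have changed. Now we can either continue the optimization process from this point, or stop at the found set of parameters.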
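If we decide to stop, a natural last step is to check the selected model on the held-out test split, which has not been used during the search. The sketch below is a minimal illustration with a reduced, hypothetical grid (small_grid); the names search and test_accuracy are not from the original notebook.

```python
import warnings
from sklearn import datasets, linear_model, metrics, model_selection

warnings.filterwarnings("ignore")  # silence possible convergence warnings

iris = datasets.load_iris()
train_data, test_data, train_labels, test_labels = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Reduced grid chosen for illustration only
small_grid = {"penalty": ["l1", "l2"], "alpha": [0.0001, 0.001]}
cv = model_selection.StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

search = model_selection.GridSearchCV(
    linear_model.SGDClassifier(random_state=0, tol=1e-3),
    small_grid, scoring="accuracy", cv=cv)
search.fit(train_data, train_labels)

# With refit=True (the default), best_estimator_ is retrained on all training data
test_predictions = search.best_estimator_.predict(test_data)
test_accuracy = metrics.accuracy_score(test_labels, test_predictions)

print("cross-validation accuracy:", search.best_score_)
print("held-out test accuracy:", test_accuracy)
```

A large gap between the two numbers would suggest that the search has overfit the cross-validation splits, which is one reason to keep the test split out of the optimization loop.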
Conclusion
In this post, we learned how to select model parameters using “grid search” and “random grid search”. This means that we can now not only generate model data, build models, and evaluate their quality, but also optimize models.