Scikit-Learn API (tune.sklearn)

class ray.tune.sklearn.TuneGridSearchCV(estimator, param_grid, early_stopping=None, scoring=None, n_jobs=None, cv=5, refit=True, verbose=0, error_score='raise', return_train_score=False, max_iters=10, use_gpu=False)

Exhaustive search over specified parameter values for an estimator.

Important members are fit, predict.

TuneGridSearchCV implements “fit” and “score” methods. It also implements “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if the underlying estimator implements them.

The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
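
To make "grid search over a parameter grid" concrete, the following is a minimal sketch of how a param_grid (a dict, or a list of dicts) expands into candidate settings. The helper name expand_grid and the example grid are hypothetical and are not part of tune.sklearn; the sketch only illustrates the enumeration, not how Tune schedules the trials.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every parameter combination spanned by param_grid."""
    # Accept a single dict or a list of dicts, as the param_grid description allows.
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    configs = []
    for grid in grids:
        names = sorted(grid)  # fixed key order so the output is reproducible
        for values in product(*(grid[name] for name in names)):
            configs.append(dict(zip(names, values)))
    return configs


# Hypothetical grid, for illustration only.
example_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.01, 0.1]},
]
configs = expand_grid(example_grid)
print(len(configs))  # 2 settings from the first dict plus 4 from the second
```

Each dict in the list spans its own grid, so the total number of candidates is the sum of the per-dict products rather than one product over all keys.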

Parameters
  • estimator (estimator) – Object that implements the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.

  • param_grid (dict or list of dict) – Dictionary with parameter names (strings) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.

  • early_stopping (bool, str or TrialScheduler, optional) –

    Option to stop fitting to a hyperparameter configuration if it performs poorly. Possible inputs are:

    • If True, defaults to ASHAScheduler.

    • A string corresponding to the name of a Tune Trial Scheduler (e.g., “ASHAScheduler”). To specify parameters of the scheduler, pass in a scheduler object instead of a string.

    • A TrialScheduler object used to execute fit with early stopping. Only a subset of schedulers is currently supported. The scheduler is only used if the estimator supports partial fitting.

    • If None or False, early stopping will not be used.

  • scoring (str, callable, list, tuple, dict or None) – A single string or a callable to evaluate the predictions on the test set. For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values. NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each. If None, the estimator’s score method is used. Defaults to None.

  • n_jobs (int) – Number of jobs to run in parallel. None or -1 means using all processors. Defaults to None.

  • cv (int, cross-validation generator or iterable) –

    Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the default 5-fold cross validation,

    • integer, to specify the number of folds in a (Stratified)KFold,

    • An iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. Defaults to 5.

  • refit (bool, str, or callable) – Refit an estimator using the best found parameters on the whole dataset. For multiple metric evaluation, this needs to be a string denoting the scorer used to find the best parameters for refitting the estimator at the end. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this TuneGridSearchCV instance. Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ are only available if refit is set, and all of them are determined with respect to this specific scorer. best_score_ is not returned if refit is callable. See the scoring parameter for more about multiple metric evaluation. Defaults to True.

  • verbose (int) – Controls the verbosity: 0 = silent, 1 = only status updates, 2 = status and trial results. Defaults to 0.

  • error_score ('raise' or int or float) – Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, a FitFailedWarning is issued instead. This parameter does not affect the refit step, which will always raise the error. Defaults to 'raise'.

  • return_train_score (bool) – If False, the cv_results_ attribute will not include training scores. Training scores give insight into how different parameter settings affect the overfitting/underfitting trade-off. However, computing them can be expensive and is not strictly required to select the parameters that yield the best generalization performance. Defaults to False.

  • max_iters (int) – Indicates the maximum number of epochs to run for each hyperparameter configuration sampled. This parameter is used for early stopping. Defaults to 10.
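
The interaction between early_stopping and max_iters can be illustrated with a simplified, synchronous successive-halving loop. This is only a sketch of the general idea: ASHAScheduler is asynchronous and its actual implementation differs, and the names successive_halving, toy_score and the "quality" key are hypothetical.

```python
def successive_halving(configs, score_after, max_iters=10, reduction=2):
    """Keep the best 1/reduction of the configs each round (reduction must be >= 2).

    score_after(config, iters) is assumed to return the validation score of
    `config` after `iters` epochs of partial fitting (higher is better).
    """
    survivors = list(configs)
    iters = 1
    while True:
        ranked = sorted(survivors, key=lambda c: score_after(c, iters), reverse=True)
        # Stop once a single config is left or the epoch budget is exhausted.
        if len(ranked) == 1 or iters >= max_iters:
            return ranked[0], iters
        survivors = ranked[: max(1, len(ranked) // reduction)]
        iters = min(max_iters, iters * reduction)


def toy_score(config, iters):
    # Toy learning curve: the score approaches config["quality"] as epochs grow.
    return config["quality"] * (1 - 0.5 ** iters)


candidates = [{"quality": q} for q in (0.2, 0.9, 0.5, 0.7)]
best, epochs_used = successive_halving(candidates, toy_score, max_iters=10)
print(best, epochs_used)
```

Poorly performing settings are dropped after few epochs, and no setting is ever trained for more than max_iters epochs.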

property best_params_

Parameter setting that gave the best results on the held-out data.

For multi-metric evaluation, this is present only if refit is specified.

Type

dict

property best_score_

Mean cross-validated score of the best_estimator_.

For multi-metric evaluation, this is present only if refit is specified.

Type

float
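
As a rough illustration of how best_params_ and best_score_ relate, the sketch below picks the setting with the highest mean score across folds. The fold scores are made-up numbers and select_best is a hypothetical helper, not a tune.sklearn function.

```python
from statistics import mean

# Hypothetical per-fold test scores for three candidate settings.
fold_scores = [
    ({"C": 1}, [0.80, 0.82, 0.78]),
    ({"C": 10}, [0.85, 0.88, 0.84]),
    ({"C": 100}, [0.83, 0.80, 0.81]),
]


def select_best(results):
    """Return (index, params, mean score) of the best-scoring setting."""
    mean_scores = [mean(scores) for _, scores in results]
    best_index = max(range(len(results)), key=mean_scores.__getitem__)
    return best_index, results[best_index][0], mean_scores[best_index]


best_index, best_params, best_score = select_best(fold_scores)
print(best_index, best_params, round(best_score, 4))
```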

property classes_

Get the list of unique classes found in the target y.

Type

list

property decision_function

Get decision_function on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports decision_function.

Type

function

fit(X, y=None, groups=None, **fit_params)

Run fit with all sets of parameters.

tune.run is used to perform the fit procedure.

Parameters
  • X (array-like (shape = [n_samples, n_features])) – Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like) – Shape of array expected to be [n_samples] or [n_samples, n_output]. Target relative to X for classification or regression; None for unsupervised learning.

  • groups (array-like (shape (n_samples,)), optional) – Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold).

  • **fit_params (dict of str) – Parameters passed to the fit method of the estimator.

Returns

TuneBaseSearchCV child instance, after fitting.
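
During fit, every parameter setting is evaluated on the cross-validation splits. One form the cv argument accepts is an iterable yielding (train, test) index arrays; the sketch below shows what such an iterable could look like for plain, unshuffled K-fold. The function kfold_indices is hypothetical, and it does not stratify or use groups; scikit-learn's own splitter classes cover those cases.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train, test) index lists for contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


splits = list(kfold_indices(10, n_splits=3))
for train, test in splits:
    print(len(train), test)
```

Every sample appears in exactly one test fold, so each setting is scored once on every sample.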

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

property inverse_transform

Get inverse_transform on the estimator with the best found parameters.

Only available if the underlying estimator implements inverse_transform and refit=True.

Type

function

property predict

Get predict on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict.

Type

function

property predict_log_proba

Get predict_log_proba on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict_log_proba.

Type

function

property predict_proba

Get predict_proba on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict_proba.

Type

function

score(X, y=None)

Compute the score(s) of an estimator on a given test set.

Parameters
  • X (array-like (shape = [n_samples, n_features])) – Input data, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like) – Shape of array is expected to be [n_samples] or [n_samples, n_output]. Target relative to X for classification or regression. You can also pass in None for unsupervised learning.

Returns

computed score

Return type

float

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object
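
The <component>__<parameter> naming used by set_params can be sketched as a simple key split. The helper split_nested_params below is hypothetical and only shows how such names group by component; it is not how scikit-learn or tune.sklearn implement set_params.

```python
def split_nested_params(params):
    """Separate top-level parameters from '<component>__<parameter>' ones."""
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore only; deeper levels stay in
        # sub_key and would be resolved by the component itself.
        component, sep, sub_key = key.partition("__")
        if sep:
            nested.setdefault(component, {})[sub_key] = value
        else:
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"cv": 3, "estimator__C": 10, "estimator__kernel": "rbf"}
)
print(own)
print(nested)
```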

property transform

Get transform on the estimator with the best found parameters.

Only available if the underlying estimator supports transform and refit=True.

Type

function

class ray.tune.sklearn.TuneSearchCV(estimator, param_distributions, early_stopping=None, n_iter=10, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, random_state=None, error_score=nan, return_train_score=False, max_iters=10, search_optimization='random', use_gpu=False)

Generic, non-grid search on hyper parameters.

Randomized search is invoked with search_optimization set to "random" and behaves like scikit-learn’s RandomizedSearchCV.

Bayesian search is invoked with search_optimization set to "bayesian" and behaves like scikit-optimize’s BayesSearchCV.

TuneSearchCV implements a “fit” and a “score” method. It also implements “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if they are implemented in the estimator used.

The parameters of the estimator used to apply these methods are optimized by cross-validated search over parameter settings.

In contrast to GridSearchCV, not all parameter values are tried out, but rather a fixed number of parameter settings is sampled from the specified distributions. The number of parameter settings that are tried is given by n_iter.
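
A minimal sketch of the randomized case: n_iter settings are drawn, with lists sampled uniformly and distribution objects sampled through their rvs method. The Uniform class is a toy stand-in for a scipy.stats distribution (its rvs takes a random.Random here, which is an assumption of this sketch), and sample_settings is a hypothetical helper.

```python
import random


class Uniform:
    """Toy distribution exposing rvs, standing in for a scipy.stats one."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def rvs(self, random_state=None):
        rng = random if random_state is None else random_state
        return rng.uniform(self.low, self.high)


def sample_settings(param_distributions, n_iter=10, seed=None):
    """Draw n_iter parameter settings from lists or rvs-style distributions."""
    rng = random.Random(seed)
    settings = []
    for _ in range(n_iter):
        setting = {}
        for name, dist in sorted(param_distributions.items()):
            if hasattr(dist, "rvs"):
                setting[name] = dist.rvs(random_state=rng)
            else:
                setting[name] = rng.choice(list(dist))  # uniform over the list
        settings.append(setting)
    return settings


space = {"C": Uniform(0.1, 10.0), "kernel": ["linear", "rbf"]}
settings = sample_settings(space, n_iter=5, seed=0)
print(len(settings))
```

Unlike the exhaustive grid, the number of evaluated settings is fixed by n_iter regardless of how large the search space is.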

Parameters
  • estimator (estimator) – This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.

  • param_distributions (dict or list) –

    Serves as the param_distributions parameter in scikit-learn’s RandomizedSearchCV or as the search_spaces parameter in BayesSearchCV.

    For randomized search: a dictionary with parameter names (strings) as keys and distributions or lists of parameter settings to try as values. Distributions must provide an rvs method for sampling (such as those from scipy.stats.distributions). If a list is given, it is sampled uniformly. If a list of dicts is given, first a dict is sampled uniformly, and then a parameter setting is sampled using that dict as above.

    For Bayesian search: a dictionary with parameter names (strings) as keys. Values can be

    • a (lower_bound, upper_bound) tuple (for Real or Integer dimensions),

    • a (lower_bound, upper_bound, “prior”) tuple (for Real dimensions),

    • a list of categories (for Categorical dimensions), or

    • an instance of a Dimension object (Real, Integer or Categorical).

    See https://scikit-optimize.github.io/stable/modules/classes.html#module-skopt.space.space

  • early_stopping (bool, str or TrialScheduler, optional) –

    Option to stop fitting to a hyperparameter configuration if it performs poorly. Possible inputs are:

    • If True, defaults to ASHAScheduler.

    • A string corresponding to the name of a Tune Trial Scheduler (e.g., “ASHAScheduler”). To specify parameters of the scheduler, pass in a scheduler object instead of a string.

    • A TrialScheduler object used to execute fit with early stopping. Only a subset of schedulers is currently supported. The scheduler is only used if the estimator supports partial fitting.

    • If None or False, early stopping will not be used.

  • n_iter (int) – Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution. Defaults to 10.

  • scoring (str, callable, list, tuple, dict or None) – A single string (see the scikit-learn documentation on the scoring parameter) or a callable to evaluate the predictions on the test set. For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values. NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each. If None, the estimator’s score method is used. Defaults to None.

  • n_jobs (int) – Number of jobs to run in parallel. None or -1 means using all processors. Defaults to None.

  • refit (bool, str, or callable) – Refit an estimator using the best found parameters on the whole dataset. For multiple metric evaluation, this needs to be a string denoting the scorer used to find the best parameters for refitting the estimator at the end. The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this TuneSearchCV instance. Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ are only available if refit is set, and all of them are determined with respect to this specific scorer. best_score_ is not returned if refit is callable. See the scoring parameter for more about multiple metric evaluation. Defaults to True.

  • cv (int, cross-validation generator or iterable) –

    Determines the cross-validation splitting strategy. Possible inputs for cv are:

    • None, to use the default 5-fold cross validation,

    • integer, to specify the number of folds in a (Stratified)KFold,

    • An iterable yielding (train, test) splits as arrays of indices.

    For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. Defaults to None.

  • verbose (int) – Controls the verbosity: 0 = silent, 1 = only status updates, 2 = status and trial results. Defaults to 0.

  • random_state (int or RandomState) – Pseudo-random number generator state used for random uniform sampling from lists of possible values instead of scipy.stats distributions. If an int, random_state is the seed used by the random number generator. If a RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random. Defaults to None. Ignored when doing Bayesian search.

  • error_score ('raise' or int or float) – Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, a FitFailedWarning is issued instead. This parameter does not affect the refit step, which will always raise the error. Defaults to np.nan.

  • return_train_score (bool) – If False, the cv_results_ attribute will not include training scores. Training scores give insight into how different parameter settings affect the overfitting/underfitting trade-off. However, computing them can be expensive and is not strictly required to select the parameters that yield the best generalization performance. Defaults to False.

  • max_iters (int) – Indicates the maximum number of epochs to run for each hyperparameter configuration sampled (specified by n_iter). This parameter is used for early stopping. Defaults to 10.

  • search_optimization ("random" or "bayesian") – If “random”, uses randomized search over the param_distributions. If “bayesian”, uses Bayesian optimization from scikit-optimize (https://scikit-optimize.github.io/stable/index.html) to search for hyperparameters.
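
The error_score behaviour described in the list above can be sketched with a small wrapper around a single fit. The names score_or_fallback and failing_fit are hypothetical, and RuntimeWarning stands in for scikit-learn's FitFailedWarning so the sketch needs no third-party imports.

```python
import math
import warnings


def score_or_fallback(fit_and_score, error_score=math.nan):
    """Run one fit; on failure re-raise or return error_score with a warning."""
    try:
        return fit_and_score()
    except Exception as exc:
        if error_score == "raise":
            raise
        # Stand-in for FitFailedWarning: report the failure, keep searching.
        warnings.warn(
            f"Fit failed, score set to {error_score}: {exc!r}", RuntimeWarning
        )
        return error_score


def failing_fit():
    # Simulates an estimator that rejects a parameter setting.
    raise ValueError("bad setting")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fallback = score_or_fallback(failing_fit, error_score=0.0)
print(fallback, len(caught))
```

With a numeric error_score, one failing setting does not abort the whole search; it simply receives the fallback score.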

property best_params_

Parameter setting that gave the best results on the held-out data.

For multi-metric evaluation, this is present only if refit is specified.

Type

dict

property best_score_

Mean cross-validated score of the best_estimator_.

For multi-metric evaluation, this is present only if refit is specified.

Type

float

property classes_

Get the list of unique classes found in the target y.

Type

list

property decision_function

Get decision_function on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports decision_function.

Type

function

fit(X, y=None, groups=None, **fit_params)

Run fit with all sets of parameters.

tune.run is used to perform the fit procedure.

Parameters
  • X (array-like (shape = [n_samples, n_features])) – Training vector, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like) – Shape of array expected to be [n_samples] or [n_samples, n_output]. Target relative to X for classification or regression; None for unsupervised learning.

  • groups (array-like (shape (n_samples,)), optional) – Group labels for the samples used while splitting the dataset into train/test set. Only used in conjunction with a “Group” cv instance (e.g., GroupKFold).

  • **fit_params (dict of str) – Parameters passed to the fit method of the estimator.

Returns

TuneBaseSearchCV child instance, after fitting.

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

property inverse_transform

Get inverse_transform on the estimator with the best found parameters.

Only available if the underlying estimator implements inverse_transform and refit=True.

Type

function

property predict

Get predict on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict.

Type

function

property predict_log_proba

Get predict_log_proba on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict_log_proba.

Type

function

property predict_proba

Get predict_proba on the estimator with the best found parameters.

Only available if refit=True and the underlying estimator supports predict_proba.

Type

function

score(X, y=None)

Compute the score(s) of an estimator on a given test set.

Parameters
  • X (array-like (shape = [n_samples, n_features])) – Input data, where n_samples is the number of samples and n_features is the number of features.

  • y (array-like) – Shape of array is expected to be [n_samples] or [n_samples, n_output]. Target relative to X for classification or regression. You can also pass in None for unsupervised learning.

Returns

computed score

Return type

float

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters

**params (dict) – Estimator parameters.

Returns

self – Estimator instance.

Return type

object

property transform

Get transform on the estimator with the best found parameters.

Only available if the underlying estimator supports transform and refit=True.

Type

function