[Machine Learning] Five Hyperparameter Optimization Techniques / 开普饭

Reposted from: 我不爱机器学习

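Hyperparameters are the values that control the learning process, and they have a significant effect on the performance of a machine learning model.

Hyperparameter optimization is the process of finding the combination of hyperparameter values that achieves the best performance on the data in a reasonable amount of time.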

1 Data Processing

import pandas as pd

import numpy as np

from sklearn.ensemble import RandomForestRegressor

from sklearn import metrics

from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')
filepath = './AEP_hourly.csv'

data = pd.read_csv(filepath, index_col=0)
data.shape  # (121273, 1)
data.head()
#                       AEP_MW

# Datetime

# 2004-12-31 01:00:00  13478.0

# 2004-12-31 02:00:00  12865.0

# 2004-12-31 03:00:00  12577.0

# 2004-12-31 04:00:00  12517.0

# 2004-12-31 05:00:00  12670.0
# show data properties

data.describe()

# show data information

data.info()
# check if it has missing values

data.isnull().sum()
# build features: the hour of day plus the previous 24 hourly values as lag features

def create_features(data):

    data.index = pd.to_datetime(data.index)

    data['hour'] = data.index.hour

    for i in range(1, 25):

        data[f'AEP_MW_lag{i}'] = data['AEP_MW'].shift(i)

    data.dropna(inplace=True)
create_features(data)

# split data into features and target

X = data.drop('AEP_MW', axis=1).values[-500:]

y = data['AEP_MW'].values[-500:]
# standardize the feature variables

scaler = StandardScaler()

X_scaled = scaler.fit_transform(X)
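# set different parameter values to tune
# note: 'mse' and 'mae' are the criterion names used by older scikit-learn releases;
# they were renamed to 'squared_error' and 'absolute_error' in scikit-learn 1.0
# and the old names were removed in 1.2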

param_grid = {

    'n_estimators': [100, 200],

    'max_depth': [1, 3],

    'criterion': ['mse', 'mae'],

}

# create the regressor
rf_regressor = RandomForestRegressor(n_jobs=-1)
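To make the lag features built by create_features more concrete, here is a minimal plain-Python sketch on a toy series. The readings and the helper name make_lag_rows are illustrative assumptions, not part of the original pipeline; the idea mirrors shift(i) followed by dropna().

```python
# Toy illustration of lag features (made-up data, plain Python).
# For each time step t, the features are the n_lags previous values and
# the target is the value at t. Time steps without a full history are
# skipped, which is what dropna() does after shift() in the pandas version.

def make_lag_rows(series, n_lags):
    rows = []
    for t in range(n_lags, len(series)):
        # lag 1 is the most recent past value, lag 2 the one before it, ...
        lags = [series[t - i] for i in range(1, n_lags + 1)]
        rows.append((lags, series[t]))
    return rows


load = [10.0, 12.0, 11.0, 13.0, 14.0]  # hypothetical hourly readings
rows = make_lag_rows(load, n_lags=2)
for lags, target in rows:
    print(lags, '->', target)
```

With 24 lags, as in the article, the first 24 rows have no complete history and are dropped in the same way.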

2 Grid Search

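Grid search works by trying every possible combination of the parameter values in the grid. As a result, the whole search can take a long time and is computationally expensive.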
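To get a feel for the cost, the sketch below enumerates the grid with itertools.product and counts the model fits for 5-fold cross-validation. It only illustrates the counting and is not how GridSearchCV is implemented.

```python
from itertools import product

# Same grid as param_grid in the data processing section.
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [1, 3],
    'criterion': ['mse', 'mae'],
}

# Every combination of values is one candidate configuration.
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv = 5  # number of cross-validation folds
n_fits = len(candidates) * cv

print(len(candidates))  # number of candidate configurations
print(n_fits)           # number of model fits during the search
```

Even this small grid gives 8 candidates and 40 fits, and every extra value added to any parameter multiplies the total.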

# set up the grid search
model = GridSearchCV(
    estimator=rf_regressor, param_grid=param_grid, cv=5, verbose=2, n_jobs=1
)

# train the model with GridSearchCV
model.fit(X_scaled, y)

# print the best score and estimator
# best_score_ is the mean cross-validated R^2, the default score for a regressor
print(model.best_score_)
# 0.7891695734212759
print(model.best_estimator_.get_params())
# {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'mae', 'max_depth': 3,
# 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0,
# 'min_impurity_split': None, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0,
# 'n_estimators': 200, 'n_jobs': -1, 'oob_score': False, 'random_state': None, 'verbose': 0, 'warm_start': False}

3 Random Search

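This method works a little differently: random combinations of hyperparameter values are tried in order to find the best solution for the model.

The drawback of random search is that it can sometimes miss important points (values) in the search space.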
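The drawback is easy to see with a small plain-Python sketch: drawing 5 of the 8 possible combinations leaves 3 that are never evaluated. This is an illustration of the sampling idea only, with a fixed seed chosen for repeatability; it does not reproduce the internals of RandomizedSearchCV.

```python
import random
from itertools import product

param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [1, 3],
    'criterion': ['mse', 'mae'],
}

names = list(param_grid)
all_candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_iter = 5
sampled = rng.sample(all_candidates, n_iter)  # draw without replacement

# Whatever was not drawn is never trained or scored.
skipped = [c for c in all_candidates if c not in sampled]
print(len(sampled), 'tried,', len(skipped), 'never evaluated')
```

If the best configuration happens to be among the skipped ones, the search cannot find it, which is the trade-off for the lower cost.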

model = RandomizedSearchCV(

    estimator=rf_regressor, param_distributions=param_grid, n_iter=5, cv=5, verbose=2, n_jobs=1, random_state=42

)

# train the model with RandomizedSearchCV

model.fit(X_scaled, y)

# print the best score and estimator

print(model.best_score_)

# 0.7878893378170959

print(model.best_estimator_.get_params())

# {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'mae', 'max_depth': 3, 'max_features': 'auto',

# 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_impurity_split': None,

# 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 200, 'n_jobs': -1,

# 'oob_score': False, 'random_state': None, 'verbose': 0, 'warm_start': False}

4 Hyperopt

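Hyperopt uses a form of Bayesian optimization for parameter tuning, which helps you find good parameters for a given model. It can optimize models with hundreds of parameters at scale.

Hyperopt has four important features: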

1 Search Space

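Hyperopt provides different functions for specifying the ranges of the input parameters. These are called stochastic search spaces. The most common options are:

hp.choice(label, options): for categorical parameters. It returns one of the options, which should be given as a list or tuple. Example: hp.choice('criterion', ['gini', 'entropy'])
hp.randint(label, upper): for integer parameters. It returns a random integer in the range [0, upper). Example: hp.randint('max_features', 50)
hp.uniform(label, low, high): returns a value drawn uniformly between low and high. Example: hp.uniform('max_leaf_nodes', 1, 10)
hp.normal(label, mu, sigma): returns a real value that is normally distributed with mean mu and standard deviation sigma.
hp.qnormal(label, mu, sigma, q): returns a value like round(normal(mu, sigma) / q) * q
hp.lognormal(label, mu, sigma): returns exp(normal(mu, sigma))
hp.qlognormal(label, mu, sigma, q): returns round(exp(normal(mu, sigma)) / q) * q

Every optimizable stochastic expression takes a label (for example, n_estimators) as its first argument. These labels are used to return the parameter choices to the caller during optimization.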
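The formulas for the quantized and log-normal variants are easier to read with a few samples in front of you. The sketch below mimics them with the standard library only; the helper names qnormal, lognormal, and qlognormal are hypothetical stand-ins written for this illustration, not Hyperopt functions, and the mu, sigma, and q values are arbitrary.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def qnormal(mu, sigma, q):
    # round(normal(mu, sigma) / q) * q: a normal draw snapped to a multiple of q
    return round(rng.gauss(mu, sigma) / q) * q


def lognormal(mu, sigma):
    # exp(normal(mu, sigma)): always positive
    return math.exp(rng.gauss(mu, sigma))


def qlognormal(mu, sigma, q):
    # round(exp(normal(mu, sigma)) / q) * q: a log-normal draw snapped to a multiple of q
    return round(math.exp(rng.gauss(mu, sigma)) / q) * q


q_samples = [qnormal(mu=100, sigma=20, q=10) for _ in range(5)]
log_samples = [lognormal(mu=0, sigma=1) for _ in range(5)]
ql_samples = [qlognormal(mu=3, sigma=1, q=5) for _ in range(5)]

print(q_samples)    # multiples of 10
print(log_samples)  # positive real numbers
print(ql_samples)   # non-negative multiples of 5
```

The quantized forms are handy for parameters such as tree counts, where only a coarse set of values is worth trying.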

2 Objective Function

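This is the function to minimize. It receives hyperparameter values from the search space as input and returns the loss.

During optimization, we train the model with the selected hyperparameter values and predict the target feature. We then evaluate the prediction error and return it to the optimizer.

The optimizer decides which values to check next and iterates again.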

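3 fmin

fmin is the optimization function that iterates over different sets of algorithms and their hyperparameters and minimizes the objective function.

Its inputs are the objective function to minimize, the defined search space, the search algorithm to use (random search, TPE (Tree-structured Parzen Estimator), or Adaptive TPE), the maximum number of evaluations, and an optional trials object.

hyperopt.rand.suggest and hyperopt.tpe.suggest provide the logic for a sequential search of the hyperparameter space.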
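The shape of that loop can be sketched without the library. The function toy_fmin below is a hypothetical random-search stand-in, not Hyperopt's code: it suggests parameters, calls the objective, records each trial, and stops after max_evals evaluations. The quadratic objective is also made up for the example.

```python
import random


def toy_fmin(fn, space, max_evals, trials=None, seed=0):
    """Random-search stand-in for an fmin-style loop (illustration only)."""
    rng = random.Random(seed)
    trials = [] if trials is None else trials
    while len(trials) < max_evals:
        # "suggest" step: draw one value for every hyperparameter
        params = {name: rng.choice(options) for name, options in space.items()}
        result = fn(params)  # the objective returns a dict with a 'loss' entry
        trials.append({'params': params, 'loss': result['loss']})
    best = min(trials, key=lambda t: t['loss'])
    return best['params'], trials


def objective(params):
    # Hypothetical loss with its minimum at x = 3, y = -1.
    return {'loss': (params['x'] - 3) ** 2 + (params['y'] + 1) ** 2}


space = {'x': list(range(-5, 6)), 'y': list(range(-5, 6))}
best, trials = toy_fmin(objective, space, max_evals=200)
print(best, len(trials))
```

A smarter suggest step such as TPE would use the recorded trials to decide where to sample next; here the history is only stored. Passing the returned trials list back in with a larger max_evals (and a different seed) continues the search instead of starting over.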

4 Trials Object

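The Trials object stores all the hyperparameters, losses, and other information from the search. You can access it after running the optimization.

Trials can also save important information so that it can be loaded later to resume the optimization process.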

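Steps

Initialize the space to search over
Define the objective function
Choose the search algorithm
Run the hyperopt function (fmin)
Analyze the evaluation outputs stored in the trials object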

from sklearn.model_selection import cross_val_score
from hyperopt import tpe, hp, fmin, STATUS_OK, Trials
from hyperopt.pyll.base import scope

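# step 1: initialize the space to search over
# hp.quniform returns floats, so scope.int casts max_depth to the integer the model expects
space = {'n_estimators': hp.choice('n_estimators', [100, 200]),
         'max_depth': scope.int(hp.quniform('max_depth', 1, 4, 1)),
         'criterion': hp.choice('criterion', ['mse', 'mae'])}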

# step 2: define the objective function
# https://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
def hyperparameter_tuning(params):
    reg = RandomForestRegressor(**params, n_jobs=-1)
    # the scorer returns the negated MAE, so flip the sign to get a loss to minimize
    neg_mae = cross_val_score(reg, X_scaled, y, scoring='neg_mean_absolute_error').mean()
    return {'loss': -neg_mae, 'status': STATUS_OK}

# step 3: initialize the trials object and run the search
trials = Trials()

best = fmin(
    fn=hyperparameter_tuning,
    space=space,
    algo=tpe.suggest,
    max_evals=100,
    trials=trials
)
print(f'best:{best}')
# 100%|██████████| 100/100 [03:52<00:00,  2.33s/trial, best loss: 477.21263500000003]
# best:{'criterion': 1, 'max_depth': 4.0, 'n_estimators': 1}
# for hp.choice parameters, fmin reports the index of the chosen option, not the option itself

# step 4: analyze the results using the trials object

# show the list of dictionaries returned by the objective during the search
trials.results
# [{'loss': 511.437781538007, 'status': 'ok'},
#  {'loss': 538.1389298379975, 'status': 'ok'},
# ...]

# show the list of losses (a float for each 'ok' trial)
trials.losses()

# show the list of status strings
trials.statuses()

The trials object can be saved, or passed to the built-in plotting routines.
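One detail in the fmin output above is easy to misread: for parameters defined with hp.choice, the reported value is the position of the selected option, so 'criterion': 1 means the second entry of the list. The sketch below decodes the printed dictionary by hand. The choices mapping is written out manually for this illustration; Hyperopt also offers space_eval(space, best) for the same purpose.

```python
# Option lists in the same order as in the hp.choice calls of the search space.
choices = {
    'n_estimators': [100, 200],
    'criterion': ['mse', 'mae'],
}

# The dictionary printed by fmin in the run above.
best = {'criterion': 1, 'max_depth': 4.0, 'n_estimators': 1}

decoded = {}
for name, value in best.items():
    if name in choices:
        decoded[name] = choices[name][value]  # index -> actual option
    else:
        decoded[name] = int(value)            # quniform reports a float
print(decoded)
# {'criterion': 'mae', 'max_depth': 4, 'n_estimators': 200}
```

So the best configuration found in that run used 200 trees, a depth of 4, and the 'mae' criterion.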

5 Scikit-Optimize

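Scikit-optimize is another open-source Python library for hyperparameter optimization.

It implements several methods for sequential model-based optimization.

The library is easy to use and provides a general toolkit for Bayesian optimization that can be used for hyperparameter tuning.

It also supports tuning the hyperparameters of the machine learning algorithms provided by scikit-learn.

Scikit-optimize is built on top of SciPy, NumPy, and scikit-learn.

To run a first optimization, you need to know at least four important features of scikit-optimize: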

1 Space

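Scikit-optimize provides different functions for defining an optimization space with one or more dimensions. The most common search space options are:

Real: a search space dimension that can take any real value. You need to define the lower and upper bounds, both of which are inclusive. Example: Real(low=0.2, high=0.9, name='min_samples_leaf')
Integer: a search space dimension that can take integer values. Example: Integer(low=3, high=25, name='max_features')
Categorical: a search space dimension that can take categorical values. Example: Categorical(['gini', 'entropy'], name='criterion')

In each search space dimension, you have to define the hyperparameter name with the name argument so that it can be optimized.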

2 BayesSearchCV

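The BayesSearchCV class provides an interface similar to GridSearchCV or RandomizedSearchCV, but it performs Bayesian optimization over the hyperparameters.

BayesSearchCV implements fit and score methods, along with other common methods such as predict(), predict_proba(), decision_function(), transform(), and inverse_transform(), when the underlying estimator provides them.

Unlike GridSearchCV, not all parameter values are tried. Instead, a fixed number of parameter settings is sampled from the specified distributions. The number of parameter settings tried is given by n_iter.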

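3 Objective Function

This is the function that the search process calls. It receives hyperparameter values from the search space as input and returns the loss (the lower the better).

During optimization, we train the model with the selected hyperparameter values and predict the target feature. We then evaluate the prediction error and return it to the optimizer.

The optimizer decides which values to check next and iterates again.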

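4 Optimizer

This is the function that performs the Bayesian hyperparameter optimization process. The optimization function iterates over each model and the search space to minimize the objective function.

Scikit-optimize provides different optimization functions, for example:

dummy_minimize: random search by uniform sampling within the given bounds.
forest_minimize: sequential optimization using decision trees.
gbrt_minimize: sequential optimization using gradient boosted trees.
gp_minimize: Bayesian optimization using Gaussian processes.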

First approach:

from skopt.searchcv import BayesSearchCV

from skopt.space import Integer, Real, Categorical

from skopt.utils import use_named_args

from skopt import gp_minimize
# define search space

params = {

    'n_estimators': [100, 300],

    'max_depth': (1, 9),

    'criterion': ['mse', 'mae']

}
# define the search

search = BayesSearchCV(

    estimator=rf_regressor,

    search_spaces=params,

    n_jobs=1,

    cv=5,

    n_iter=30,

    scoring='neg_mean_absolute_error',

    verbose=4,

    random_state=42

)
# perform the search

search.fit(X_scaled, y)

# report the best result
print(search.best_score_)
# -469.6407772534216
print(search.best_params_)
# OrderedDict([('criterion', 'mse'), ('max_depth', 6), ('n_estimators', 100)])

Second approach:

# define the space of hyperparameters to search
search_space = []
search_space.append(Categorical([100, 200], name='n_estimators'))
search_space.append(Categorical(['mse', 'mae'], name='criterion'))
search_space.append(Integer(1, 9, name='max_depth'))

# define the function used to evaluate a given configuration
@use_named_args(search_space)
def evaluate_model(**params):
    # configure the model with specific hyperparameters
    reg = RandomForestRegressor(**params, n_jobs=-1)
    mae = cross_val_score(reg, X_scaled, y, scoring='neg_mean_absolute_error').mean()
    return -mae

# perform optimization
result = gp_minimize(
    func=evaluate_model,
    dimensions=search_space,
    n_calls=30,
    random_state=42,
    verbose=True,
    n_jobs=1,
)

# summarize the findings
# result.fun is the lowest mean absolute error found, not an accuracy

print('Best MAE: %.3f' % (result.fun))
# Best MAE: 480.179

print('Best Parameters: %s' % (result.x))
# Best Parameters: [200, 'mse', 9]

# print function values
print(result.func_vals)

# plot convergence traces
from skopt.plots import plot_convergence

plot_convergence(result)

The plot shows the function value at different iterations of the optimization process.
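In the second approach, result.x is a plain list whose order follows the order in which the dimensions were added to search_space. A short sketch of mapping it back to named parameters follows. The decorator named_args is a hypothetical stand-in that only illustrates the idea behind use_named_args (turning a positional list into keyword arguments); it is not the library's implementation.

```python
# Dimension names in the order they were appended to search_space.
names = ['n_estimators', 'criterion', 'max_depth']

# result.x as printed in the run above.
best_x = [200, 'mse', 9]

best_params = dict(zip(names, best_x))
print(best_params)
# {'n_estimators': 200, 'criterion': 'mse', 'max_depth': 9}


def named_args(names):
    # Hypothetical helper: pass a positional list of values as keyword arguments.
    def decorator(func):
        def wrapper(values):
            return func(**dict(zip(names, values)))
        return wrapper
    return decorator


@named_args(names)
def describe(n_estimators, criterion, max_depth):
    return f'{n_estimators} trees, {criterion}, depth {max_depth}'


print(describe(best_x))
# 200 trees, mse, depth 9
```

This is why every dimension needs a name: the optimizer works with positions, while the model constructor expects keywords.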

6 Optuna

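Optuna is another open-source Python framework for hyperparameter optimization. It uses Bayesian methods to search the hyperparameter space automatically.

The framework was developed by Preferred Networks, a Japanese AI company.

Optuna is easier to implement and use than Hyperopt. You can also specify how long the optimization process should last.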
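In Optuna the time limit is given in seconds through the timeout argument of study.optimize(). The library-free sketch below shows what a time-budgeted search amounts to; search_for, the sampler, and the toy score are hypothetical names made up for this illustration.

```python
import random
import time


def search_for(seconds, objective, sample):
    """Keep evaluating random candidates until the time budget runs out."""
    deadline = time.monotonic() + seconds
    best_params, best_score, n_trials = None, float('-inf'), 0
    while time.monotonic() < deadline:
        params = sample()
        score = objective(params)
        n_trials += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_trials


rng = random.Random(0)


def sample():
    # draw one candidate value for the single toy hyperparameter x
    return {'x': rng.uniform(-10, 10)}


def objective(params):
    # hypothetical score to maximize, best at x = 2
    return -(params['x'] - 2) ** 2


best_params, best_score, n_trials = search_for(0.05, objective, sample)
print(best_params, best_score, n_trials)
```

The number of trials completed depends on the machine, which is the point of a time budget: the search stops when the clock says so rather than after a fixed number of evaluations.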

Five important features of Optuna:

1 Search spaces

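Optuna provides different options for all hyperparameter types. The most common ones are:

Categorical parameters: trial.suggest_categorical() takes the parameter name and its choices.
Integer parameters: trial.suggest_int() takes the parameter name, a low value, and a high value.
Float parameters: trial.suggest_float() takes the parameter name, a low value, and a high value.
Continuous parameters: trial.suggest_uniform() takes the parameter name, a low value, and a high value.
Discrete parameters: trial.suggest_discrete_uniform() takes the parameter name, a low value, a high value, and a discretization step.

Note that suggest_uniform() and suggest_discrete_uniform() were deprecated in Optuna 3.0 in favor of suggest_float(), which accepts an optional step argument.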
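A discretization step restricts the suggestions to low, low + step, low + 2 * step, and so on up to high. The helper stepped_values below is a hypothetical function written only to list those values; the two calls use the same bounds and steps as the suggest_int calls in the objective function later in this section.

```python
def stepped_values(low, high, step):
    """All values low, low + step, low + 2 * step, ... that do not exceed high."""
    count = (high - low) // step + 1
    return [low + k * step for k in range(count)]


print(stepped_values(100, 300, 100))  # candidate values for n_estimators
# [100, 200, 300]
print(stepped_values(1, 9, 1))        # candidate values for max_depth
# [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

So the n_estimators parameter can only take three values, while max_depth can take nine.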

2 Optimization methods (samplers)

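Optuna offers different methods for carrying out the hyperparameter optimization process. The most common ones are:

GridSampler: uses grid search. Over the course of the study, the trials suggest all combinations of parameters in the given search space.
RandomSampler: uses random sampling. This sampler is based on independent sampling.
TPESampler: uses the TPE (Tree-structured Parzen Estimator) algorithm.
CmaEsSampler: uses the CMA-ES algorithm.

3 Objective Function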

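The objective function works the same way as in the Hyperopt and scikit-optimize techniques. The only difference is that Optuna lets you define the search space and the objective in a single function.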

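4 Study

A study corresponds to an optimization task (a set of trials). To start the optimization process, create a study object and pass the objective function to its optimize() method.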
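A study also has a direction, and it has to match what the objective returns. The toy sketch below, with made-up MAE values, shows that minimizing the MAE and maximizing the negated MAE (which is what scoring='neg_mean_absolute_error' returns) select the same candidate.

```python
# Hypothetical cross-validated MAE for three candidate settings.
mae_by_candidate = {'a': 510.2, 'b': 482.8, 'c': 495.0}

# Minimizing the MAE itself ...
best_min = min(mae_by_candidate, key=mae_by_candidate.get)

# ... picks the same candidate as maximizing the negated MAE.
neg_mae = {name: -mae for name, mae in mae_by_candidate.items()}
best_max = max(neg_mae, key=neg_mae.get)

print(best_min, best_max)  # the same candidate both times
# b b
print(-neg_mae[best_max])  # flip the sign back to read the MAE
# 482.8
```

This is why a study that maximizes the negated MAE reports a negative best value: its absolute value is the error.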

5 Visualization

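The visualization module in Optuna provides different methods for creating figures of the optimization results.

plot_contour(): plots the parameter relationships in a study as a contour plot.
plot_intermediate_values(): plots the intermediate values of all trials in a study.
plot_optimization_history(): plots the optimization history of all trials in a study.
plot_param_importances(): plots the hyperparameter importances and their values.
plot_edf(): plots the EDF (empirical distribution function) of the objective values of a study.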

import joblib

import optuna

from optuna.samplers import TPESampler
# define the search space and the objective function

def objective(trial):

    # define the search space

    criterion = trial.suggest_categorical('criterion', ['mse', 'mae'])

    max_depths = trial.suggest_int('max_depth', 1, 9, step=1)

    n_estimators = trial.suggest_int('n_estimators', 100, 300, step=100)

    reg = RandomForestRegressor(n_estimators=n_estimators,

                                criterion=criterion,

                                max_depth=max_depths,

                                n_jobs=-1)

    # cross-validation scores each candidate on held-out folds, which guards against overfitting to a single split

    score = cross_val_score(reg, X_scaled, y, scoring='neg_mean_absolute_error').mean()

    return score
# create a study object

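# create_study() lets you choose whether to maximize or minimize the objective function

# unlike Optuna, the other optimization packages covered here always minimize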
study = optuna.create_study(study_name='randomForest_optimization',

                            direction='maximize',

                            sampler=TPESampler())
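# The study is named randomForest_optimization. The direction is maximize (the higher the score, the better), and the sampler is TPESampler().
# tune the model

# To run the optimization, pass the objective function and the number of trials to the optimize() method of the study object.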

# pass the objective function to method optimize()

study.optimize(objective, n_trials=10)

# show the best hyperparameters values selected

print(study.best_params)

# {'criterion': 'mae', 'max_depth': 4, 'n_estimators': 300}

# show the best score or accuracy

print(study.best_value)

# -482.8129100000002

# plot the optimization history of all trials in a study

optuna.visualization.plot_optimization_history(study)
# Plot the high-dimensional parameter relationships in a study.

optuna.visualization.plot_parallel_coordinate(study, params=['criterion', 'max_depth', 'n_estimators'])
# Plot hyperparameter importances.
optuna.visualization.plot_param_importances(study)

# save your hyperparameter searches (the optuna_searches/ directory must already exist)
joblib.dump(study, 'optuna_searches/study.pkl')
# load your hyperparameter searches
study = joblib.load('optuna_searches/study.pkl')

# print the study name
study.study_name

Reference: https://www.freecodecamp.org/news/hyperparameter-optimization-techniques-machine-learning/

WeChat official account: AI蜗牛车
