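ML regression: building the ensemble model AvgModelsR from Lasso, ElasticNet, GBDT and other algorithms to predict housing prices on the Shanghai June 2020 dataset [12+1] from a Chinese listing platform (model evaluation, model inference)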

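Related articles
ML regression: building the ensemble model AvgModelsR from Lasso, ElasticNet, GBDT and other algorithms to predict housing prices on the Shanghai June 2020 dataset [12+1] (model evaluation, model inference)
ML regression: building the ensemble model AvgModelsR from Lasso, ElasticNet, GBDT and other algorithms to predict housing prices on the Shanghai June 2020 dataset [12+1] (model evaluation, model inference): implementation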

Regression with the AvgModelsR ensemble (Lasso, ElasticNet, GBDT) on the Shanghai June 2020 housing price dataset [12+1]: model evaluation and model inference

1. Basic dataset information

 (3000, 13) 13 3000

total_price         object
unit_price          object
roomtype            object
height              object
direction           object
decorate            object
area                object
age                float64
garden              object
district            object
total_price_Num    float64
unit_price_Num       int64
area_Num           float64
dtype: object

 Index(['total_price', 'unit_price', 'roomtype', 'height', 'direction',
       'decorate', 'area', 'age', 'garden', 'district', 'total_price_Num',
       'unit_price_Num', 'area_Num'],
      dtype='object')

   total_price unit_price roomtype  ... total_price_Num unit_price_Num area_Num
0        290万  46186元/平米     2室1厅  ...           290.0          46186    62.79
1        599万  76924元/平米     2室1厅  ...           599.0          76924    77.87
2        420万  51458元/平米     2室1厅  ...           420.0          51458    81.62
3      269.9万  34831元/平米     2室2厅  ...           269.9          34831    77.49
4        383万  79051元/平米     1室1厅  ...           383.0          79051    48.45

[5 rows x 13 columns]

      total_price unit_price roomtype  ... total_price_Num unit_price_Num area_Num
2995        230万  43144元/平米     1室1厅  ...           230.0          43144    53.31
2996        372万  75016元/平米     1室1厅  ...           372.0          75016    49.59
2997        366万  49973元/平米     2室1厅  ...           366.0          49973    73.24
2998        365万  69103元/平米     2室1厅  ...           365.0          69103    52.82
2999        420万  49412元/平米     2室2厅  ...           420.0          49412    85.00

[5 rows x 13 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3000 entries, 0 to 2999
Data columns (total 13 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   total_price      3000 non-null   object
 1   unit_price       3000 non-null   object
 2   roomtype         3000 non-null   object
 3   height           3000 non-null   object
 4   direction        3000 non-null   object
 5   decorate         3000 non-null   object
 6   area             3000 non-null   object
 7   age              2888 non-null   float64
 8   garden           3000 non-null   object
 9   district         3000 non-null   object
 10  total_price_Num  3000 non-null   float64
 11  unit_price_Num   3000 non-null   int64
 12  area_Num         3000 non-null   float64
dtypes: float64(3), int64(1), object(9)
memory usage: 304.8+ KB

                age  total_price_Num  unit_price_Num     area_Num
count  2888.000000      3000.000000     3000.000000  3000.000000
mean   2001.453601       631.953450    58939.028333   102.180667
std       9.112425       631.308855    25867.208297    62.211662
min    1911.000000        90.000000    11443.000000    17.050000
25%    1996.000000       300.000000    40267.500000    67.285000
50%    2003.000000       437.000000    54946.000000    89.230000
75%    2008.000000       738.000000    73681.250000   119.035000
max    2018.000000      9800.000000   250813.000000   801.140000
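
The listings above look like the output of standard pandas inspection calls. The following is a minimal sketch of how such output could be produced, not the author's actual script. The two sample rows are copied from the `head()` output; the way `total_price_Num` and `unit_price_Num` are derived (stripping the unit suffix and casting) is an assumption. Note that `age` has 2888 non-null values out of 3000, so 112 entries are missing, and its range (1911 to 2018) suggests it stores a construction year.

```python
import pandas as pd

# Two sample rows copied from the head() output above; the real data has 3000 rows.
df = pd.DataFrame({
    "total_price": ["290万", "269.9万"],
    "unit_price": ["46186元/平米", "34831元/平米"],
    "area_Num": [62.79, 77.49],
})

# Assumed derivation of the numeric columns: strip the unit suffix, then cast.
df["total_price_Num"] = (
    df["total_price"].str.replace("万", "", regex=False).astype(float)
)
df["unit_price_Num"] = (
    df["unit_price"].str.replace("元/平米", "", regex=False).astype(int)
)

# Inspection calls that print output of the kind listed above.
print(df.shape, df.shape[1], df.shape[0])  # (rows, cols), column count, row count
print(df.dtypes)
print(df.columns)
print(df.head())
print(df.tail())
df.info()
print(df.describe())
```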

2. Model output

AvgModelsR(models=(Pipeline(steps=[('robustscaler', RobustScaler()),
                                   ('lasso',
                                    Lasso(alpha=0.001, random_state=1))]),
                   Pipeline(steps=[('robustscaler', RobustScaler()),
                                   ('elasticnet',
                                    ElasticNet(alpha=0.001, l1_ratio=0.9,
                                               random_state=3))]),
                   GradientBoostingRegressor(random_state=5)))
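
The repr above shows `AvgModelsR` wrapping two scaled linear pipelines and a `GradientBoostingRegressor`. The class body is not shown in this post, so the following is only a sketch of one common way to write such an averaging ensemble: fit a clone of each base model and return the mean of their predictions. The synthetic data is a stand-in for the housing features and is purely illustrative.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


class AvgModelsR(RegressorMixin, BaseEstimator):
    """Assumed implementation: average the predictions of several regressors."""

    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        # Fit clones so the estimators passed in remain unfitted.
        self.models_ = [clone(m).fit(X, y) for m in self.models]
        return self

    def predict(self, X):
        # One column per base model, then the row-wise mean.
        preds = np.column_stack([m.predict(X) for m in self.models_])
        return preds.mean(axis=1)


# Base models with the hyperparameters shown in the repr above.
lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.001, random_state=1))
enet = make_pipeline(
    RobustScaler(), ElasticNet(alpha=0.001, l1_ratio=0.9, random_state=3)
)
gbdt = GradientBoostingRegressor(random_state=5)
avg_model = AvgModelsR(models=(lasso, enet, gbdt))

# Synthetic data standing in for the real features (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

avg_model.fit(X, y)
pred = avg_model.predict(X)
print(avg_model)
```

Equal-weight averaging is the simplest choice; a weighted average or a stacked meta-model would be alternatives, but nothing in the output indicates that they were used.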
R2_res [0.9944881811696309, 0.000626615309319283, array([0.99470591, 0.99512495, 0.99435729, 0.99491104, 0.99334171])]
MAE_res [-0.004994183753322101, 0.0001083601234287803, array([-0.00493338, -0.005202  , -0.00489054, -0.00498097, -0.00496404])]
RMSE_res [-8.323227156546791e-05, 9.870911328329942e-06, array([-8.14778066e-05, -7.79621763e-05, -7.93078692e-05, -7.49049128e-05,
       -1.02508593e-04])]
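
Each `*_res` list above appears to hold `[mean, standard deviation, per-fold scores]` from 5-fold cross-validation. The negative MAE and RMSE values are consistent with scikit-learn's `neg_*` scorers, and the magnitude of `RMSE_res` (about 8.3e-05) is close to the MSE reported further down, which suggests those values may be negated MSE rather than RMSE. Both points are inferences from the numbers, not something the post states. A minimal sketch under those assumptions, with a hypothetical helper `cv_summary`, a single Lasso pipeline in place of the full ensemble, and synthetic data:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def cv_summary(model, X, y, scoring, n_splits=5):
    """Hypothetical helper: return [mean, std, per-fold scores]."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    scores = cross_val_score(model, X, y, scoring=scoring, cv=cv)
    return [scores.mean(), scores.std(), scores]


# Synthetic data standing in for the real features (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=300)

model = make_pipeline(RobustScaler(), Lasso(alpha=0.001, random_state=1))

R2_res = cv_summary(model, X, y, "r2")
MAE_res = cv_summary(model, X, y, "neg_mean_absolute_error")
MSE_res = cv_summary(model, X, y, "neg_mean_squared_error")

print("R2_res", R2_res)
print("MAE_res", MAE_res)
print("MSE_res", MSE_res)
```

The `neg_*` scorers return the negated error so that larger is always better; flip the sign to read them as ordinary errors.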
AvgModelsR(models=(Pipeline(steps=[('robustscaler', RobustScaler()),
                                   ('lasso',
                                    Lasso(alpha=0.001, random_state=1))]),
                   Pipeline(steps=[('robustscaler', RobustScaler()),
                                   ('elasticnet',
                                    ElasticNet(alpha=0.001, l1_ratio=0.9,
                                               random_state=3))]),
                   GradientBoostingRegressor(random_state=5)))
Avg_Best_models Score value: 0.9947618159336031
Avg_Best_models R2    value: 0.9947618159336031
Avg_Best_models MAE   value: 0.0064209273962331555
Avg_Best_models MSE   value: 9.023779248949011e-05
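
The Score and R2 values above are identical, which is what one would expect if the score comes from a scikit-learn regressor's `score` method, since that method returns R². The sketch below shows how the four numbers could be computed; the hold-out split, the stand-in Lasso model, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the real features (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=300)

# Assumed evaluation setup: a simple hold-out split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = Lasso(alpha=0.001, random_state=1).fit(X_train, y_train)
y_pred = model.predict(X_test)

score = model.score(X_test, y_test)        # R^2 for any RegressorMixin
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("Score value:", score)
print("R2    value:", r2)
print("MAE   value:", mae)
print("MSE   value:", mse)
```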

Avg_Best_models elapsed time: 0:06:14.344069
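
The elapsed-time value has the `H:MM:SS.ffffff` form that Python prints for a `datetime.timedelta`. A minimal sketch of how such a line could be produced, assuming the run was timed with `datetime.now()`:

```python
from datetime import datetime, timedelta

start = datetime.now()
# ... model fitting and evaluation would run here ...
elapsed = datetime.now() - start
print("Avg_Best_models elapsed time:", elapsed)

# A timedelta built from the value in the log prints in the same format.
example = timedelta(minutes=6, seconds=14, microseconds=344069)
print(example)
```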