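ML with RF: Using a Pipeline (customer age/job/marital status/education/default/balance/housing, etc.) to predict whether a customer will buy the bank's product, a binary classification task (prediction, inference)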

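Related articles
ML with RF: Using a Pipeline (customer age/job/marital status/education/default/balance/housing, etc.) to predict whether a customer will buy the bank's product, a binary classification task (prediction, inference)
ML with RF: Using a Pipeline (customer age/job/marital status/education/default/balance/housing, etc.) to predict whether a customer will buy the bank's product, a binary classification task (prediction, inference): full code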

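Using a Pipeline (customer age/job/marital status/education/default/balance/housing, etc.) to predict whether a customer will buy the bank's product (binary classification)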

Data description

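This dataset comes from the marketing campaigns of a Portuguese banking institution. The campaigns were generally phone-based: the bank's agents contacted each customer at least once to confirm whether the customer was willing to buy the bank's product (a term deposit). The goal is to predict whether a customer will buy the product.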

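No.	Field	Data type	Description
1	ID	Int	Unique customer identifier
2	age	Int	Customer age
3	job	String	Customer's occupation
4	marital	String	Marital status
5	education	String	Education level
6	default	String	Whether the customer has a default record
7	balance	Int	Average yearly account balance
8	housing	String	Whether the customer has a housing loan
9	loan	String	Whether the customer has a personal loan
10	contact	String	Communication channel used to contact the customer
11	day	Int	Date of the last contact (day of the month)
12	month	String	Date of the last contact (month)
13	duration	Int	Duration of the last contact
14	campaign	Int	Number of contacts with this customer during this campaign
15	pdays	Int	Time elapsed since this customer was last contacted in the previous campaign (999 means never contacted)
16	previous	Int	Number of contacts with this customer before this campaign
17	poutcome	String	Outcome of the previous campaign
18	y	Int	Whether the customer subscribes to the term deposit (the prediction target)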

Data reference: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

Output

Viewing the data distribution

Analyzing the data

 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   ID         25317 non-null  int64
 1   age        25317 non-null  int64
 2   job        25317 non-null  object
 3   marital    25317 non-null  object
 4   education  25317 non-null  object
 5   default    25317 non-null  object
 6   balance    25317 non-null  int64
 7   housing    25317 non-null  object
 8   loan       25317 non-null  object
 9   contact    25317 non-null  object
 10  day        25317 non-null  int64
 11  month      25317 non-null  object
 12  duration   25317 non-null  int64
 13  campaign   25317 non-null  int64
 14  pdays      25317 non-null  int64
 15  previous   25317 non-null  int64
 16  poutcome   25317 non-null  object
 17  y          25317 non-null  int64
dtypes: int64(9), object(9)
memory usage: 3.5+ MB
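A summary like the one above is what pandas' `DataFrame.info()` prints. The following is a minimal sketch on a tiny stand-in frame; the frame `sample` and its values are made up for illustration, and the article's actual loading code is not shown here. It also splits the columns into numeric and non-numeric groups, which mirrors the int64/object split in the summary.

```python
import io

import pandas as pd

# Tiny stand-in for the training data (hypothetical values, not the real file).
sample = pd.DataFrame({
    "ID": [1, 2, 3],
    "age": [43, 42, 47],
    "job": ["management", "technician", "admin."],
    "balance": [291, 5076, 104],
    "y": [0, 0, 1],
})

# DataFrame.info() writes to stdout by default; a buffer lets us keep the text.
buf = io.StringIO()
sample.info(buf=buf)
info_text = buf.getvalue()
print(info_text)

# Separate numeric columns from the string-valued (categorical) ones.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()
print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

On the real data this split would presumably give the nine integer fields and the nine string fields listed in the summary.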

Correlation coefficients with y, computed on the training set:
 y           1.000000
ID          0.556627
duration    0.394746
pdays       0.107565
previous    0.088337
campaign    0.075173
balance     0.057564
day         0.031886
age         0.029916
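A ranking of this kind can be obtained by correlating every numeric column with the label and sorting the result. The sketch below assumes Pearson correlation (the pandas default) and uses a small made-up frame named `train`; neither the frame nor its values come from the article.

```python
import pandas as pd

# Hypothetical miniature training set; only the numeric columns enter the correlation.
train = pd.DataFrame({
    "duration": [100, 200, 300, 400, 500, 600],
    "age": [30, 50, 40, 35, 45, 60],
    "job": ["admin.", "services", "student", "retired", "admin.", "services"],
    "y": [0, 0, 0, 1, 1, 1],
})

# Keep numeric columns, build the correlation matrix, and take the "y" column.
corr_with_y = (
    train.select_dtypes(include="number")
    .corr()["y"]
    .sort_values(ascending=False)
)
print(corr_with_y)
```

String-valued fields such as job are left out because a correlation coefficient is only defined for numeric columns, which is why the listing above contains only the integer fields.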

Proportion of positive y labels in the training set: 0.11695698542481336
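For a 0/1 label, the share of positives is simply the mean of the column. A minimal sketch with a made-up label series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical 0/1 labels; 1 means the customer subscribed.
y = pd.Series([0, 0, 0, 1, 0, 0, 1, 0], name="y")

positive_rate = y.mean()                      # mean of a 0/1 column = share of 1s
class_shares = y.value_counts(normalize=True)  # share of each class
print("positive share:", positive_rate)
print(class_shares)
```

With roughly 11.7% positives, as reported above, the classes are imbalanced, so plain accuracy can look good even for a model that always predicts 0.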
Categories of each categorical field, printed first for the training set and then for the test set:
job ['admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown']
job ['admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown']
marital ['divorced', 'married', 'single']
marital ['divorced', 'married', 'single']
education ['primary', 'secondary', 'tertiary', 'unknown']
education ['primary', 'secondary', 'tertiary', 'unknown']
default ['no', 'yes']
default ['no', 'yes']
housing ['no', 'yes']
housing ['no', 'yes']
loan ['no', 'yes']
loan ['no', 'yes']
contact ['cellular', 'telephone', 'unknown']
contact ['cellular', 'telephone', 'unknown']
month ['apr', 'aug', 'dec', 'feb', 'jan', 'jul', 'jun', 'mar', 'may', 'nov', 'oct', 'sep']
month ['apr', 'aug', 'dec', 'feb', 'jan', 'jul', 'jun', 'mar', 'may', 'nov', 'oct', 'sep']
poutcome ['failure', 'other', 'success', 'unknown']
poutcome ['failure', 'other', 'success', 'unknown']
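The paired lines above can be produced by printing the sorted unique values of each categorical column, once for the training set and once for the test set. The sketch below uses two small made-up frames (`train` and `test` are hypothetical names) and also collects any category that appears only in the test set, since such a value would be unknown to an encoder fitted on the training data.

```python
import pandas as pd

# Hypothetical miniature train/test frames with two categorical fields.
train = pd.DataFrame({
    "marital": ["married", "single", "divorced", "married"],
    "default": ["no", "no", "yes", "no"],
})
test = pd.DataFrame({
    "marital": ["single", "married", "divorced"],
    "default": ["no", "yes", "no"],
})

categorical_cols = ["marital", "default"]
for col in categorical_cols:
    # Same layout as the listing: training categories, then test categories.
    print(col, sorted(train[col].unique().tolist()))
    print(col, sorted(test[col].unique().tolist()))

# Categories present in the test set but absent from the training set.
unseen = {col: sorted(set(test[col]) - set(train[col])) for col in categorical_cols}
print(unseen)
```

In the listing above every field shows identical categories in both sets, so the equivalent of `unseen` would be empty for each column.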

Training log

Fitting 7 folds for each of 32 candidates, totalling 224 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV]  forst_reg__max_features=45, forst_reg__n_estimators=50, total=  31.1s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV]  forst_reg__max_features=45, forst_reg__n_estimators=50, total=  31.0s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV]  forst_reg__max_features=45, forst_reg__n_estimators=50, total=  31.7s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV]  forst_reg__max_features=45, forst_reg__n_estimators=50, total=  32.2s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=100 .........
[CV]  forst_reg__max_features=45, forst_reg__n_estimators=50, total=  27.1s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=100 .........
[CV]  forst_reg__max_features=45, forst_reg__n_estimators=50, total=  27.1s
[CV]  forst_reg__max_features=45, forst_reg__n_estimators=50, total=  26.6s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=100 .........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=100 .........
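The parameter names in the log follow scikit-learn's `<step name>__<parameter>` convention, which suggests a Pipeline whose final step is named `forst_reg` and is tuned with `GridSearchCV`. The sketch below is one way such a setup could look, not the article's actual code. The estimator (`RandomForestClassifier`), the one-hot preprocessing step, the scoring metric, the synthetic data, and the tiny grid are all assumptions chosen so the example runs quickly. The log's "7 folds for each of 32 candidates, totalling 224 fits" is 7 × 32 = 224; here 3 folds × 4 candidates gives 12 fits.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (assumption): two numeric and two categorical fields.
rng = np.random.default_rng(0)
n = 120
X = pd.DataFrame({
    "age": rng.integers(18, 90, n),
    "duration": rng.integers(0, 1000, n),
    "marital": rng.choice(["divorced", "married", "single"], n),
    "housing": rng.choice(["no", "yes"], n),
})
y = (X["duration"] > 500).astype(int)  # toy label driven by call duration

# One-hot encode the categorical fields and pass the numeric ones through.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), ["marital", "housing"])],
    remainder="passthrough",
)

# The step name "forst_reg" is taken from the log; the estimator is an assumption.
pipe = Pipeline([
    ("preprocess", preprocess),
    ("forst_reg", RandomForestClassifier(random_state=0)),
])

# 2 x 2 = 4 candidates; with cv=3 this totals 12 fits.
param_grid = {
    "forst_reg__max_features": [2, 4],
    "forst_reg__n_estimators": [10, 20],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="roc_auc", n_jobs=1, verbose=1)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Keeping the encoder inside the Pipeline means it is refitted on each training fold, so the held-out fold never influences the preprocessing during cross-validation.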