ML之RF:利用Pipeline(客户年龄/职业/婚姻/教育/违约/余额/住房等)预测客户是否购买该银行的产品二分类(预测、推理)
ML之RF:利用Pipeline(客户年龄/职业/婚姻/教育/违约/余额/住房等)预测客户是否购买该银行的产品二分类(预测、推理)
相关文章
ML之RF:利用Pipeline(客户年龄/职业/婚姻/教育/违约/余额/住房等)预测客户是否购买该银行的产品二分类(预测、推理)
ML之RF:利用Pipeline(客户年龄/职业/婚姻/教育/违约/余额/住房等)预测客户是否购买该银行的产品二分类(预测、推理)全部代码
利用Pipeline(客户年龄/职业/婚姻/教育/违约/余额/住房等)预测客户是否购买该银行的产品二分类(预测、推理)
数据说明
该数据集是葡萄牙银行机构进行营销活动所得。这些营销活动一般以电话为基础,银行的客服人员至少联系客户一次,以确认客户是否有意愿购买该银行的产品(定期存款)。目标是预测客户是否购买该银行的产品。
NO | 字段名称 | 数据类型 | 字段描述 |
---|---|---|---|
1 | ID | Int | 客户唯一标识 |
2 | age | Int | 客户年龄 |
3 | job | String | 客户的职业 |
4 | marital | String | 婚姻状况 |
5 | education | String | 受教育水平 |
6 | default | String | 是否有违约记录 |
7 | balance | Int | 每年账户的平均余额 |
8 | housing | String | 是否有住房贷款 |
9 | loan | String | 是否有个人贷款 |
10 | contact | String | 与客户联系的沟通方式 |
11 | day | Int | 最后一次联系的时间(几号) |
12 | month | String | 最后一次联系的时间(月份) |
13 | duration | Int | 最后一次联系的交流时长 |
14 | campaign | Int | 在本次活动中,与该客户交流过的次数 |
15 | pdays | Int | 距离上次活动最后一次联系该客户,过去了多久(999表示没有联系过) |
16 | previous | Int | 在本次活动之前,与该客户交流过的次数 |
17 | poutcome | String | 上一次活动的结果 |
18 | y | Int | 预测客户是否会订购定期存款业务 |
数据参考:Citation: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014
输出结果
查看数据分布
分析数据
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 25317 non-null int64
1 age 25317 non-null int64
2 job 25317 non-null object
3 marital 25317 non-null object
4 education 25317 non-null object
5 default 25317 non-null object
6 balance 25317 non-null int64
7 housing 25317 non-null object
8 loan 25317 non-null object
9 contact 25317 non-null object
10 day 25317 non-null int64
11 month 25317 non-null object
12 duration 25317 non-null int64
13 campaign 25317 non-null int64
14 pdays 25317 non-null int64
15 previous 25317 non-null int64
16 poutcome 25317 non-null object
17 y 25317 non-null int64
dtypes: int64(9), object(9)
memory usage: 3.5+ MB
训练集计算相关系数:
y 1.000000
ID 0.556627
duration 0.394746
pdays 0.107565
previous 0.088337
campaign 0.075173
balance 0.057564
day 0.031886
age 0.029916
训练集 y标签的比例: 0.11695698542481336
依次查看训练集、测试集中,类别型字段的细分类
job ['admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown']
job ['admin.', 'blue-collar', 'entrepreneur', 'housemaid', 'management', 'retired', 'self-employed', 'services', 'student', 'technician', 'unemployed', 'unknown']
marital ['divorced', 'married', 'single']
marital ['divorced', 'married', 'single']
education ['primary', 'secondary', 'tertiary', 'unknown']
education ['primary', 'secondary', 'tertiary', 'unknown']
default ['no', 'yes']
default ['no', 'yes']
housing ['no', 'yes']
housing ['no', 'yes']
loan ['no', 'yes']
loan ['no', 'yes']
contact ['cellular', 'telephone', 'unknown']
contact ['cellular', 'telephone', 'unknown']
month ['apr', 'aug', 'dec', 'feb', 'jan', 'jul', 'jun', 'mar', 'may', 'nov', 'oct', 'sep']
month ['apr', 'aug', 'dec', 'feb', 'jan', 'jul', 'jun', 'mar', 'may', 'nov', 'oct', 'sep']
poutcome ['failure', 'other', 'success', 'unknown']
poutcome ['failure', 'other', 'success', 'unknown']
输出训练过程
Fitting 7 folds for each of 32 candidates, totalling 224 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50, total= 31.1s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50, total= 31.0s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50, total= 31.7s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50 ..........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50, total= 32.2s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=100 .........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50, total= 27.1s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=100 .........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50, total= 27.1s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=50, total= 26.6s
[CV] forst_reg__max_features=45, forst_reg__n_estimators=100 .........
[CV] forst_reg__max_features=45, forst_reg__n_estimators=100 .........
导出推理结果
赞 (0)