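ML FE (feature engineering): in model training, split the training and test sets and separate features from labels with just two lines of code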
Splitting the training and test sets and separating features from labels with just two lines of code
Output
name              object
ID                object
age               object
age02              int64
age03             object
born      datetime64[ns]
sex               object
hobbey            object
money            float64
weight           float64
test01           float64
test02           float64
dtype: object
   name    ID  age age02 age03       born   sex hobbey  money weight
0   Bob     1  NaN    14    14        NaT     男    打篮球  200.0  140.5
1  LiSa     2   28    26    26 1990-01-01     女   打羽毛球  240.0  120.8
2  Mary         38    24    24 1980-01-01     女   打乒乓球  290.0  169.4
3  Alan  None          6     6        NaT  None         300.0  155.6

     test01    test02
0  1.000000  1.000000
1  2.123457  2.123457
2  3.123457  3.123457
3  4.123457  4.123457
0    140.5
1    120.8
2    169.4
Name: weight, dtype: float64
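   name    ID age age02 age03 born   sex hobbey  money    test01    test02
3  Alan  None         6     6  NaT  None         300.0  4.123457  4.123457
Implementation code
import pandas as pd
import numpy as np

# Toy dataset with deliberately mixed types and missing values
contents = {
    "name": ['Bob', 'LiSa', 'Mary', 'Alan'],
    "ID": [1, 2, ' ', None],              # ' ' shows as blank, None shows as None (object column)
    "age": [np.nan, 28, 38, ''],          # np.nan shows as NaN, '' shows as blank
    "age02": [14, 26, 24, 6],             # all ints -> int64
    "age03": [14, '26', '24', '6'],       # mixed int/str -> object
    "born": [pd.NaT, pd.Timestamp("1990-01-01"), pd.Timestamp("1980-01-01"), ''],  # pd.NaT and '' both show as NaT above
    "sex": ['男', '女', '女', None],        # None shows as None
    "hobbey": ['打篮球', '打羽毛球', '打乒乓球', ''],  # '' shows as blank
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],  # label (target) column
    "test01": [1, 2.123456789, 3.123456781011126, 4.123456789109999],  # printed rounded to 6 decimal places
    "test02": [1, 2.123456789, 3.123456781011126, 4.123456789109999],
}
data_frame = pd.DataFrame(contents)
# data_frame.to_excel("data_Frame.xlsx")  # optional: save to Excel (.xls writing was removed in pandas 2.0)
print(data_frame.dtypes)
print(data_frame)

# Split into training/test sets and separate the features (X) from the label (y)
train_test_split_Index = 3   # rows [0, 3) -> training set, rows [3, end) -> test set
label_col = 'weight'         # target column
# drop(columns=label_col) keeps the label out of X, so the features do not leak the target
train_X = data_frame.iloc[:train_test_split_Index].drop(columns=label_col)
train_y = data_frame.iloc[:train_test_split_Index][label_col]
test_X = data_frame.iloc[train_test_split_Index:].drop(columns=label_col)
test_y = data_frame.iloc[train_test_split_Index:][label_col]
print(train_y)
print(test_X)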
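The same positional split can be wrapped in a small reusable function. This is a sketch for illustration, not the author's code: the name `split_train_test_xy` and its `train_frac`, `shuffle` and `seed` parameters are hypothetical. It derives the split index from a training fraction instead of hard-coding 3, and can optionally shuffle the rows first with `DataFrame.sample`, because a plain "first N rows" split assumes the rows are already in random order.

```python
import pandas as pd


def split_train_test_xy(df, label_col, train_frac=0.75, shuffle=False, seed=None):
    """Hypothetical helper: positional train/test split plus X/y separation.

    The first int(len(df) * train_frac) rows form the training set and the
    remaining rows form the test set; label_col is dropped from X and
    returned separately as y. With shuffle=True, rows are shuffled first.
    """
    if label_col not in df.columns:
        raise KeyError(f"label column {label_col!r} not found")
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be between 0 and 1")
    if shuffle:
        # Random permutation of all rows; a fixed seed makes it reproducible
        df = df.sample(frac=1, random_state=seed)
    split_index = int(len(df) * train_frac)
    train, test = df.iloc[:split_index], df.iloc[split_index:]
    return (
        train.drop(columns=label_col), train[label_col],  # train_X, train_y
        test.drop(columns=label_col), test[label_col],    # test_X, test_y
    )


# Usage on two columns taken from the example data ("weight" is the label)
df = pd.DataFrame({
    "money": [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
})
train_X, train_y, test_X, test_y = split_train_test_xy(df, "weight")
print(train_X.shape, test_X.shape)   # (3, 1) (1, 1)
print("weight" in train_X.columns)   # False
print(test_y.tolist())               # [155.6]
```

With the default `shuffle=False` the result matches the hard-coded split above (4 rows × 0.75 = 3 training rows). Setting `shuffle=True` gives a random split instead; for time-ordered data, keeping the rows in order avoids training on records that come after the test records.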