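ML - FE (feature engineering): splitting the training and test sets and separating features from labels in just two lines of code during model training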


Splitting the training and test sets and separating features from labels in just two lines of code

Output

name              object
ID                object
age               object
age02              int64
age03             object
born      datetime64[ns]
sex               object
hobbey            object
money            float64
weight           float64
test01           float64
test02           float64
dtype: object
   name    ID  age  age02 age03       born   sex hobbey  money  weight  \
0   Bob     1  NaN     14    14        NaT     男    打篮球  200.0   140.5   
1  LiSa     2   28     26    26 1990-01-01     女   打羽毛球  240.0   120.8   
2  Mary         38     24    24 1980-01-01     女   打乒乓球  290.0   169.4   
3  Alan  None           6     6        NaT  None         300.0   155.6   

     test01    test02
0  1.000000  1.000000
1  2.123457  2.123457
2  3.123457  3.123457
3  4.123457  4.123457
0    140.5
1    120.8
2    169.4
Name: weight, dtype: float64
   name    ID age  age02 age03 born   sex hobbey  money  weight    test01  \
3  Alan  None          6     6  NaT  None         300.0   155.6  4.123457   

     test02
3  4.123457

Implementation code

import pandas as pd
import numpy as np

contents={"name": ['Bob',        'LiSa',                     'Mary',                       'Alan'],
          "ID":   [1,              2,                         ' ',                          None],    # None prints as None, ' ' prints as blank
          "age":  [np.nan,        28,                           38 ,                          '' ],   # np.nan prints as NaN, '' prints as blank
          "age02":  [14,           26,                           24 ,                          6],
          "age03":  [14,           '26',                      '24' ,                        '6'],
          "born": [pd.NaT,     pd.Timestamp("1990-01-01"),  pd.Timestamp("1980-01-01"),        ''],     # pd.NaT prints as NaT
          "sex":  ['男',          '女',                        '女',                        None,],   # None prints as None
          "hobbey":['打篮球',     '打羽毛球',                   '打乒乓球',                    '',],   # '' prints as blank
          "money":[200.0,                240.0,                   290.0,                     300.0],
          "weight":[140.5,                120.8,                 169.4,                      155.6],  # used as the label column below
          "test01":[1,    2.123456789,        3.123456781011126,   4.123456789109999],    # printed with 6 decimal places
          "test02":[1,    2.123456789,        3.123456781011126,   4.123456789109999],    # printed with 6 decimal places
          }
data_frame = pd.DataFrame(contents)
# data_frame.to_excel("data_Frame.xls")
print(data_frame.dtypes)
print(data_frame)

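# ML - FE: split the training and test sets and separate features from labels
train_test_split_Index=3   # rows before this position go to the training set
label_col='weight'         # name of the label (target) column
train_X = data_frame[:train_test_split_Index]
train_y = data_frame[:train_test_split_Index][label_col]
test_X  = data_frame[train_test_split_Index:]
test_y  = data_frame[train_test_split_Index:][label_col]
# Note: train_X and test_X still contain the label column (see the printed test_X)
print(train_y)
print(test_X)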
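
In the script above, train_X and test_X are plain row slices, so the 'weight' label column is still part of the feature frames. If the label should not appear among the features, one possible variant drops it after slicing. This is only a minimal sketch: the helper name split_by_position and the small two-column frame are assumptions made for illustration and are not part of the original script.

```python
import pandas as pd

def split_by_position(df, split_index, label_col):
    """Split df at a row position and separate the label column from the features.

    Hypothetical helper, shown only to illustrate dropping the label column.
    """
    train, test = df.iloc[:split_index], df.iloc[split_index:]
    # drop() returns a new frame, so the original df is left unchanged
    return (train.drop(columns=label_col), train[label_col],
            test.drop(columns=label_col), test[label_col])

# Small illustrative frame reusing the money/weight values from the example
df = pd.DataFrame({
    "money":  [200.0, 240.0, 290.0, 300.0],
    "weight": [140.5, 120.8, 169.4, 155.6],
})

train_X, train_y, test_X, test_y = split_by_position(df, 3, "weight")
print(list(train_X.columns))   # feature columns only
print(train_y.tolist())        # labels of the first three rows
print(test_y.tolist())         # label of the remaining row
```

Because the split is purely positional, the rows are not shuffled. That matches the slicing used in the script, but it may not suit data whose row order carries meaning.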