Python 3 pandas Series usage (basic notes)
Using the pandas Series
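- Three ways to construct/initialize a Series:
- 1) Build a Series from a list
- 1.a) By default pandas uses 0 to n-1 as the Series index, but you can also supply your own; think of the index as the keys of a dict
- 2) Build a Series from a dict, since a Series is essentially a key-value structure
- 3) Build a Series from a NumPy array
- Selecting data:
- 1) Treat a Series like a list and slice it in the usual ways
- 2) A Series also behaves like a dict; the index defined earlier is what you select data with
- 3) Boolean indexing, much like NumPy
- Assigning Series elements:
- 1) Assign directly by index label
- 2) Don't forget boolean indexing from above; it works in assignments too
- Mathematical operations
- Missing data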
Three ways to construct/initialize a Series:
1) Build a Series from a list
import pandas as pd

my_list = [7, 'Beijing', '19大', 3.1415, -10000, 'Happy']
s = pd.Series(my_list)
print(type(s))
print(s)

<class 'pandas.core.series.Series'>
0          7
1    Beijing
2        19大
3     3.1415
4     -10000
5      Happy
dtype: object
1.a) By default pandas uses 0 to n-1 as the Series index, but you can also supply your own. Think of the index as the keys of a dict.
s = pd.Series([7, 'Beijing', '19大', 3.1415, -10000, 'Happy'],
              index=['A', 'B', 'C', 'D', 'E', 'F'])
print(s)

A          7
B    Beijing
C        19大
D     3.1415
E     -10000
F      Happy
dtype: object
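To make the "index as dict keys" analogy concrete, here is a minimal sketch. The three-element Series is a shortened, hypothetical version of the one above, and the variable names are only for illustration.

```python
import pandas as pd

# Shortened, hypothetical sample with a custom index.
s = pd.Series([7, 'Beijing', 3.1415], index=['A', 'B', 'C'])

# The index plays the role of dict keys; the values are stored alongside it.
labels = list(s.index)   # the labels, in order
by_label = s['B']        # look up a value by its label
as_dict = s.to_dict()    # round-trip back to a plain dict

print(labels, by_label, as_dict)
```

Converting back with `to_dict()` is a quick way to see that labels and values really do pair up like keys and values.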
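2) Build a Series from a dict, since a Series is essentially a key-value structure
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')  # None is stored as NaN, so the dtype is float64
print(apts)

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64

Note: this output comes from an older pandas release, which sorted the dict keys. With pandas 0.23 or later on Python 3.6+, the Series keeps the dict's insertion order instead.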
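The following sketch isolates two details of dict construction: how `None` is stored, and what the optional `index=` argument does. The two-city dict and the names `with_gap` and `reordered` are made up for the example.

```python
import pandas as pd

prices = {'Beijing': 55000, 'Suzhou': None}   # hypothetical two-city sample
with_gap = pd.Series(prices, name='income')

# None is stored as NaN, which turns the integer values into float64.
print(with_gap.dtype)
print(with_gap.isna().tolist())

# Passing index= selects and orders the keys; labels absent from the dict become NaN.
reordered = pd.Series(prices, index=['Suzhou', 'Beijing', 'Hangzhou'])
print(reordered)
```

Passing `index=` explicitly is one way to get a predictable order regardless of how a given pandas version treats dict keys.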
3) Build a Series from a NumPy array
import numpy as np

# The values are random, so the numbers differ on every run.
d = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
print(d)

a   -0.329401
b   -0.435921
c   -0.232267
d   -0.846713
e   -0.406585
dtype: float64
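Because the example above prints different numbers each time, a seeded generator can help when you want repeatable results. This is only a sketch: the seed value is arbitrary, and `np.random.default_rng` is swapped in for the `np.random.randn` call used in the post.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily for this sketch
values = rng.standard_normal(5)
d = pd.Series(values, index=['a', 'b', 'c', 'd', 'e'])

print(d.dtype)        # the Series takes its dtype from the array
print(d.to_numpy())   # the same numbers, back as an ndarray
```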
Selecting data:
1) Treat a Series like a list and slice it in the usual ways
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts)

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
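print(apts.iloc[3])  # .iloc selects by position; plain apts[3] on a label index is deprecated in recent pandas

60000.0

print(apts.iloc[[3, 4, 1]])  # a list of positions returns a Series

Shanghai     60000.0
Suzhou           NaN
Guangzhou    45000.0
Name: income, dtype: float64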
print(apts[1:])

Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64

print(apts[:-2])

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Name: income, dtype: float64

print(apts[1:] + apts[:-1])  # values are aligned by label; labels present on only one side give NaN

Beijing           NaN
Guangzhou     90000.0
Hangzhou      40000.0
Shanghai     120000.0
Suzhou            NaN
shenzhen          NaN
Name: income, dtype: float64
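The last result above can look surprising, so here is a smaller sketch of the same idea: positional slicing followed by label-aligned addition. The three-city Series is hypothetical, and `sort_index()` is used only to pin down the order so positions are predictable.

```python
import pandas as pd

# Small hypothetical sample; sort_index() fixes the order: Beijing, Hangzhou, Shanghai.
apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000, 'Hangzhou': 20000}).sort_index()

first = apts.iloc[0]    # by position: Beijing
tail = apts.iloc[1:]    # Hangzhou, Shanghai
head = apts.iloc[:-1]   # Beijing, Hangzhou

# Addition lines values up by label, not by position.
# Only Hangzhou appears on both sides, so the other two labels come out as NaN.
total = tail + head
print(total)
```

A plain Python list would concatenate here; a Series adds element by element after matching labels.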
2) A Series also behaves like a dict; the index defined earlier is what you select data with
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts['Shanghai'])  # look up a value by its label

60000.0

print('Hangzhou' in apts)  # membership is tested against the index labels

True

print('Chongqing' in apts)

False
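One detail worth spelling out is that `in` looks at the labels, not the values. The sketch below uses a hypothetical two-city Series and also shows `Series.get`, which mirrors `dict.get`.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Shanghai': 60000}, name='income')

label_found = 'Beijing' in apts         # membership tests the labels
value_as_label = 55000 in apts          # a value is not a label
value_found = 55000 in apts.values      # search the values explicitly
fallback = apts.get('Chongqing', 0)     # like dict.get: a default instead of a KeyError

print(label_found, value_as_label, value_found, fallback)
```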
3) Boolean indexing, much like NumPy
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
less_than_50000 = (apts <= 50000)  # boolean mask; note the <=, so 50000 itself is included
print(apts[less_than_50000])

Guangzhou    45000.0
Hangzhou     20000.0
shenzhen     50000.0
Name: income, dtype: float64
Note: the usual NumPy-style aggregations (mean, median, max, min) are available as Series methods, and they skip NaN by default.
print(apts.mean())

46000.0
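A short sketch of how masks combine and how NaN behaves in filters and aggregations. The four-city Series and the thresholds are invented for the example.

```python
import numpy as np
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'Hangzhou': 20000,
                  'Suzhou': np.nan, 'Shanghai': 60000})

# Combine masks with & (and) or | (or); each comparison needs its own parentheses.
mid_range = apts[(apts >= 30000) & (apts < 60000)]

# Comparisons against NaN are False, so Suzhou never passes a filter.
cheap = apts[apts <= 50000]

# Aggregations skip NaN by default: (55000 + 20000 + 60000) / 3.
avg = apts.mean()

print(mid_range, cheap, avg, sep='\n')
```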
Assigning Series elements:
1) Assign directly by index label
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
print(apts)
print('Old income of shenzhen:{}'.format(apts['shenzhen']))

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     50000.0
Name: income, dtype: float64
Old income of shenzhen:50000.0

apts['shenzhen'] = 70000  # overwrite the value stored under an existing label
print(apts)
print('New income of shenzhen:{}'.format(apts['shenzhen']))

Beijing      55000.0
Guangzhou    45000.0
Hangzhou     20000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
New income of shenzhen:70000.0
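Label assignment also works for labels that do not exist yet, in which case the Series grows. The sketch below assumes a hypothetical two-city Series; the 'Tianjin' entry and all values are made up.

```python
import pandas as pd

apts = pd.Series({'Beijing': 55000, 'shenzhen': 50000}, name='income')

apts['shenzhen'] = 70000   # existing label: the value is overwritten
apts['Tianjin'] = 30000    # new label: the Series grows by one entry

# .loc with a list of labels updates several entries at once.
apts.loc[['Beijing', 'Tianjin']] = [56000, 31000]

print(apts)
```

This is the same behavior as a dict, where assigning to a missing key inserts it.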
2) Don't forget boolean indexing from above; it works in assignments too
import pandas as pd

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000 = (apts < 50000)  # boolean mask; NaN compares as False
print(less_than_50000)
apts[less_than_50000] = 40000     # assign to every element where the mask is True
print(apts)

New income of shenzhen:70000.0
Beijing      False
Guangzhou     True
Hangzhou      True
Shanghai     False
Suzhou       False
shenzhen     False
Name: income, dtype: bool
Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64
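Two common variations on masked assignment, sketched on a hypothetical three-city Series: `.loc` with a mask for in-place updates, and `Series.mask` when you would rather get a new Series back. Neither appears in the original post; they are offered as alternatives.

```python
import numpy as np
import pandas as pd

apts = pd.Series({'Beijing': 55000.0, 'Hangzhou': 20000.0, 'Suzhou': np.nan})

# In place: .loc with a boolean mask makes the intent explicit.
raised = apts.copy()
raised.loc[raised < 50000] = 40000

# Not in place: mask() returns a new Series and leaves the original untouched.
raised_copy = apts.mask(apts < 50000, 40000)

print(raised)
print(raised_copy)
```

In both cases the NaN entry is left alone, because the comparison is False for it.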
Mathematical operations
import pandas as pd
import numpy as np

cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
print('New income of shenzhen:{}'.format(apts['shenzhen']))
less_than_50000 = (apts < 50000)
apts[less_than_50000] = 40000
print(apts)
print(apts / 2)      # element-wise division
print(apts ** 1.5)   # element-wise power
print(np.log(apts))  # NumPy functions apply element by element
apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000,
                   'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
print(apts2)
print(apts + apts2)  # addition aligns on index labels
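Since the block above has no printed output, here is a smaller sketch whose results are easy to check by hand. The two Series are hypothetical, and `Series.add` with `fill_value` is an addition not used in the original post.

```python
import numpy as np
import pandas as pd

apts = pd.Series({'Beijing': 40000.0, 'Shanghai': 60000.0})
apts2 = pd.Series({'Beijing': 10000, 'Tianjin': 40000})

half = apts / 2       # element-wise arithmetic
logs = np.log(apts)   # NumPy ufuncs return a Series with the same index

# + aligns on labels; a label missing from either side yields NaN.
combined = apts + apts2

# add(fill_value=0) treats a label missing on one side as 0 instead.
filled = apts.add(apts2, fill_value=0)

print(combined)
print(filled)
```

Whether NaN or a filled value is the right answer depends on what a missing city means in your data, so the plain `+` behavior is a reasonable default.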
Missing data
cities = {'Beijing': 55000, 'Shanghai': 60000, 'shenzhen': 50000,
          'Hangzhou': 20000, 'Guangzhou': 45000, 'Suzhou': None}
apts = pd.Series(cities, name='income')
apts['shenzhen'] = 70000
less_than_50000 = (apts < 50000)
apts[less_than_50000] = 40000
print(apts)

Beijing      55000.0
Guangzhou    40000.0
Hangzhou     40000.0
Shanghai     60000.0
Suzhou           NaN
shenzhen     70000.0
Name: income, dtype: float64

apts2 = pd.Series({'Beijing': 10000, 'Shanghai': 8000, 'shenzhen': 6000,
                   'Tianjin': 40000, 'Guangzhou': 7000, 'Chongqing': 30000})
print(apts2)

Beijing      10000
Chongqing    30000
Guangzhou     7000
Shanghai      8000
Tianjin      40000
shenzhen      6000
dtype: int64
print('Hangzhou' in apts)   # label present in apts
print('Hangzhou' in apts2)  # but absent from apts2

True
False

print(apts.notnull())  # boolean mask: True where a value is present

Beijing       True
Guangzhou     True
Hangzhou      True
Shanghai      True
Suzhou       False
shenzhen      True
Name: income, dtype: bool

print(apts.isnull())  # boolean mask: True where a value is missing

Beijing      False
Guangzhou    False
Hangzhou     False
Shanghai     False
Suzhou        True
shenzhen     False
Name: income, dtype: bool

print(apts[apts.isnull()])  # use the missing-value mask to select elements

Suzhou   NaN
Name: income, dtype: float64
apts = apts + apts2  # labels missing from either Series become NaN in the sum
print(apts)

Beijing      65000.0
Chongqing        NaN
Guangzhou    47000.0
Hangzhou         NaN
Shanghai     68000.0
Suzhou           NaN
Tianjin          NaN
shenzhen     76000.0
dtype: float64

apts[apts.isnull()] = apts.mean()  # fill the missing positions with the mean (NaN is ignored when averaging)
print(apts)

Beijing      65000.0
Chongqing    64000.0
Guangzhou    47000.0
Hangzhou     64000.0
Shanghai     68000.0
Suzhou       64000.0
Tianjin      64000.0
shenzhen     76000.0
dtype: float64
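The same mean-fill can be written with `fillna`, and `dropna` is the alternative when missing entries should simply be discarded. A minimal sketch on a hypothetical three-city Series:

```python
import numpy as np
import pandas as pd

apts = pd.Series({'Beijing': 65000.0, 'Chongqing': np.nan, 'Guangzhou': 47000.0})

# mean() ignores NaN, so the fill value is (65000 + 47000) / 2.
filled = apts.fillna(apts.mean())

# Alternatively, drop the missing entries instead of filling them.
dropped = apts.dropna()

print(filled)
print(dropped)
```

Both methods return a new Series, so the original `apts` keeps its NaN unless you assign the result back.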