012 Stata: Sorting and Grouping
This post covers three commands: sort, gsort, and bysort.
1. sort: ascending order
sort varlist [in] [, stable]
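sysuse auto, clear
keep make price mpg length foreign

* Sort on a single variable
sort make           // sort by make in ascending order
list
sort mpg            // sort by mpg in ascending order
list

* Sort on several variables
sort mpg length     // sort by mpg first, then by length within equal mpg
list
sort length mpg     // sort by length first, then by mpg within equal length
list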
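The stable option in the syntax line is not exercised above. A minimal sketch, assuming the auto dataset that ships with Stata: with stable, observations tied on the sort key keep the relative order they had before the sort, which makes the result reproducible. The variable orig is a hypothetical helper introduced only for this illustration.

```stata
sysuse auto, clear
keep make price mpg length foreign

gen long orig = _n          // hypothetical helper: remember the original row order
sort mpg, stable            // ties in mpg keep their previous relative order

* Within each mpg value, orig is still increasing
by mpg: assert orig > orig[_n-1] if _n > 1
list mpg orig make in 1/10
```

Without stable, Stata makes no promise about how tied observations are ordered, so two runs of the same do-file can list them differently.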
2. gsort: ascending or descending order
gsort [+|-] varname [[+|-] varname ...] [, generate(newvar) mfirst]
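sysuse auto, clear
keep make price mpg length foreign

* Sort on a single variable
gsort +make             // ascending; equivalent to gsort make and to sort make
list
gsort -make             // descending
list

* Sort on several variables
gsort -mpg -length      // mpg descending, then length descending within equal mpg
list
gsort -length mpg       // length descending, then mpg ascending within equal length
list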
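The two options in the gsort syntax line, mfirst and generate(), can be sketched as follows. This assumes the auto dataset, in which rep78 has some missing values; the variable name grp is illustrative only.

```stata
sysuse auto, clear

* rep78 contains missing values in the auto data
gsort -rep78                 // descending; missing values are placed last
list make rep78 in -5/l      // the last five rows

gsort -rep78, mfirst         // descending; missing values are placed first
list make rep78 in 1/5

* generate() numbers the groups in sorted order: 1 for the first group, 2 for the next, ...
gsort -foreign, generate(grp)
tabulate foreign grp
```

The variable created by generate() can then be used in a later by operation to process the groups in the order gsort produced.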
3. bysort: grouping and sorting
bysort varlist: stata_cmd
bysort varlist1 [(varlist2)] [, rc0]: stata_cmd
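A common use of the parenthesized form is to pick the first or last observation within each group. A minimal sketch on the auto dataset; the variable names rank_in_group and cheapest are hypothetical and chosen only for this illustration.

```stata
sysuse auto, clear

* Group by foreign; within each group, order by price (price is not part of the grouping)
bysort foreign (price): gen rank_in_group = _n      // 1 = cheapest car in the group
bysort foreign (price): gen cheapest = (_n == 1)    // flag the cheapest car per group

list make foreign price if cheapest
```

Inside a by group, _n is the observation's position within the group and _N is the group size, so _n == 1 marks the first observation and _n == _N the last.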
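sysuse auto, clear
by foreign: sum price mpg length weight rep78     // summary statistics by group (auto ships already sorted by foreign)
sort rep78
by rep78: sum price mpg length weight             // by requires the data to be sorted on the grouping variable first
by rep78, sort: sum price mpg length weight
bysort rep78: sum price mpg length weight
* bysort is shorthand for by ..., sort and can be abbreviated to bys

clear
input str2 v1 v2
A 3
B 4
A 1
A 1
A 2
B 5
end
bysort v1 v2: gen num1 = _N      // sort and group by v1 and v2; num1 = number of observations in each (v1, v2) group
bysort v1 (v2): gen num2 = _N    // group by v1 only, sorted by v2 within group; num2 = number of observations in each v1 group

* Example 1
* Event study: drop events that fall inside a trading suspension
use 事件列表, clear
joinby stkcd using 停复牌                          // pair every event with every suspension spell of the same stock
gen date1 = date(date, "YMD")
bysort stkcd date: gen num1 = _N                  // pairs per event before dropping
drop if date1 >= startdate & date1 <= enddate     // drop pairs where the event date lies within the suspension
bysort stkcd date: gen num2 = _N                  // pairs per event after dropping
drop if num1 != num2                              // the event overlapped at least one suspension
keep stkcd date
duplicates drop

* Example 2
* Computing the Herfindahl index (HHI)
use 赫芬达尔指数, clear
* egen with total() (older name: sum()) returns the group total on every row,
* whereas gen with sum() returns a running (cumulative) sum.
bysort year industry: egen sumsize = total(资产总计)    // 资产总计 = total assets
gen ratio = (资产总计/sumsize)^2
bysort year industry: egen HHI = total(ratio)
drop sumsize ratio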
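Because the HHI example depends on a data file that is not included here, the same steps can be checked on a small made-up dataset. This is a sketch under stated assumptions: the data and the variable names assets, total_assets, share_sq, and hhi are invented for illustration and are not from the original file.

```stata
clear
input year str1 industry assets
2020 "A" 50
2020 "A" 50
2020 "B" 100
end

bysort year industry: egen total_assets = total(assets)   // group total repeated on every row
gen share_sq = (assets / total_assets)^2                  // squared share of each firm
bysort year industry: egen hhi = total(share_sq)          // sum of squared shares within the group

list, sepby(year industry)
* Industry A: 0.5^2 + 0.5^2 = 0.5; industry B: 1^2 = 1
```

An industry with a single firm has an HHI of 1, and the index falls toward 0 as assets are spread over more firms of similar size.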
Sorting city names
use 城市列表, clear
replace city = ustrfrom(city, "gb18030", 1)    // convert the text from GB18030 to UTF-8
sort city                                      // in Stata 14 and 15, Chinese characters sort by their UTF-8 encoding order
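Sorting by encoding order does not match dictionary order for Chinese. One possible workaround, sketched here with made-up data, is to sort on a locale-aware key built with ustrsortkey(). The assumption that the "zh" locale collates by pinyin is mine and should be checked on your Stata installation; the variable name key is illustrative.

```stata
clear
input str12 city
"上海"
"北京"
"广州"
"成都"
end

sort city                               // encoding (UTF-8 byte) order
list

gen key = ustrsortkey(city, "zh")       // assumption: the "zh" collation orders by pinyin
sort key
list city
```

The key is a binary string meant only for sorting, so it can be dropped once the data are in the desired order.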