想要对大数据集,分组后取percentile筛票统计,该如何写呢

如下,剔除大于percentile的取平均,结果不准确,该怎么写?

tb_2 = select secu_code,trade_date, avg(yield) as factor_value from tb_1 where yield <= percentile(yield,40)
group by trade_date,secu_code order by trade_date,secu_code

tt1= select * from tb_2 where trade_date = 2017.01.12 and secu_code =`600000

tt2= select secu_code,trade_date, avg(yield) as factor_value from tb_1 where trade_date = 2017.01.12 and secu_code =`600000 and yield <= percentile(yield,40) 
group by trade_date,secu_code trade_date,secu_code order by trade_date,secu_code

unionAll(tt1, tt2)
请先 登录 后评论

1 个回答

mhxiang


用aggrTopN函数

tb=select *from  loadText("/home/data.csv")
tt=select aggrTopN(avg, funcArgs=yield, sortingCol=yield, top=0.4, ascending=true) as factor_value from tb group by trade_date,secu_code
请先 登录 后评论