The simulated data are generated as follows:
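login(`admin, `123456)               // log in as admin
pnodeRun(clearAllCache)              // clear caches on all nodes so earlier runs do not skew the timings
undef all                            // remove all variables from the session
syms = format(1..3000, "SH000000")   // 3000 stock symbols
N = 10000                            // number of rows per symbol
// cross join: each symbol is paired with the same N random (price, volume) rows, 30 million rows in total
t = cj(table(syms as symbol), table(rand(100.0, N) as price, rand(10000, N) as volume))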
Method 1: context by, which takes about 3.3 s.
// 4-row moving volume-weighted average of price, computed separately within each symbol
timer result1 = select mwavg(price, volume, 4) from t context by symbol
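To make the semantics of Method 1 concrete, here is a minimal Python sketch (not DolphinDB) of what the query computes. It assumes that mwavg(price, volume, 4) is the trailing volume-weighted average sum(price*volume)/sum(volume) over the last 4 rows and is empty (NULL in DolphinDB) until 4 rows are available. The helper names mwavg and context_by are hypothetical; the point is that grouping happens once and the moving window never crosses from one symbol into the next.

```python
import math
from collections import defaultdict


def mwavg(prices, volumes, window):
    # Assumption: trailing weighted average sum(p*v)/sum(v) over `window` rows,
    # NaN (standing in for DolphinDB's NULL) until the window is full.
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(math.nan)
            continue
        lo = i + 1 - window
        p = prices[lo:i + 1]
        v = volumes[lo:i + 1]
        total = sum(v)
        # Sketch-only choice: a window whose weights are all zero yields NaN instead of raising.
        out.append(sum(a * b for a, b in zip(p, v)) / total if total else math.nan)
    return out


def context_by(rows, window):
    # Partition (symbol, price, volume) rows by symbol in one pass, keeping row order,
    # then apply mwavg inside each group so windows never span two symbols.
    groups = defaultdict(lambda: ([], []))
    for sym, price, volume in rows:
        groups[sym][0].append(price)
        groups[sym][1].append(volume)
    return {sym: mwavg(p, v, window) for sym, (p, v) in groups.items()}


rows = [("A", 1.0, 1), ("B", 10.0, 1), ("A", 2.0, 1), ("A", 3.0, 2), ("B", 20.0, 3)]
print(context_by(rows, 2))
```

Note that the rows of symbol B interleaved between A's rows do not enter A's windows; that per-group isolation is what context by provides without any explicit loop.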
Method 2: for loop, which takes about 25 min.
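arr = array(ANY, syms.size())          // one slot per symbol for its result vector
timer {
    for(i in 0 : syms.size()) {
        // each exec statement scans the whole table t to find this symbol's rows
        price_vec = exec price from t where symbol = syms[i]
        volume_vec = exec volume from t where symbol = syms[i]
        arr[i] = mwavg(price_vec, volume_vec, 4)
    }
    res = reduce(join, arr)            // concatenate the per-symbol results into one vector
}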
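The two methods differ in speed by a factor of more than 400 (about 1,500 s versus 3.3 s). context by partitions all stocks into groups in a single pass and then performs the calculation on each group. The for loop, by contrast, scans the entire 30-million-row table on every iteration (twice, in fact, since each iteration runs two exec statements) just to retrieve the 10,000 records of one stock, which is why it takes so long.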
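The cost argument can be checked with a small Python sketch (again not DolphinDB; the function names are hypothetical) that counts how many rows each strategy visits. It models an unindexed in-memory table, which is the assumption behind the full-scan explanation.

```python
from collections import defaultdict


def per_symbol_scan(rows, symbols):
    # for-loop style: every symbol triggers a full scan of the table.
    visited = 0
    groups = {}
    for s in symbols:
        picked = []
        for row in rows:
            visited += 1
            if row[0] == s:
                picked.append(row)
        groups[s] = picked
    return groups, visited


def group_once(rows):
    # context-by style: a single pass puts each row into its symbol's group.
    visited = 0
    groups = defaultdict(list)
    for row in rows:
        visited += 1
        groups[row[0]].append(row)
    return dict(groups), visited


symbols = ["A", "B", "C"]
rows = [(s, i) for s in symbols for i in range(4)]  # 3 symbols x 4 rows each
loop_groups, loop_visits = per_symbol_scan(rows, symbols)
once_groups, once_visits = group_once(rows)
print(loop_visits, once_visits)  # 36 12
```

With S symbols and n rows per symbol, the loop style visits S × (S·n) rows versus S·n for a single grouping pass, so the gap grows linearly with the number of symbols. For the data above (S = 3,000, n = 10,000) that is 9 × 10^10 row visits for one exec per symbol, doubled to 1.8 × 10^11 because the loop runs two exec statements per symbol. Actual timings also depend on DolphinDB's vectorized filtering, so this count explains the trend rather than predicting the exact ratio.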