如何将一条数据根据起始、终止日期填充或扩容为多条记录?

转换前:

attachments-2022-09-zItPHkAT631afcd6c5d48.png
转换后:

attachments-2022-09-efoED6ua6320234b92a9d.jpg
比如股票A,希望在ENTRY_DT与REMOVE_DT之间数据填充出来,填充之后变成三列,S_INFO_WINDCODE、SW_IND_CODE 字段与原始那条数据保持一致。



请先 登录 后评论

2 个回答

Juntao Wang

思路:原始数据为每只股票入市、退市日期记录,数据量不大,可以逐条处理,使用交叉连接生成每只股票相应的记录,再unionAll合并。

示例代码如下:

def f(t, i) {
	windCode = t[i][`S_INFO_WINDCODE]
	code = t[i][`SW_IND_CODE]
	entryDate = t[i][`ENTRY_DT]
	removeDate = t[i][`REMOVE_DT]
	days = entryDate..removeDate
	days = days[weekday(days) between 1:5]
	return cj(table(windCode as S_INFO_WINDCODE, code as SW_IND_CODE), table(days as DT))
}
t = table(`A`B`C as S_INFO_WINDCODE, `6112010200`6112010200`6112010200 as SW_IND_CODE, 1994.09.07 1998.11.11 1999.05.27 as ENTRY_DT, 2011.09.03 2010.10.08 2011.09.30 as REMOVE_DT)
unionAll(each(f{t}, 1..size(t) - 1), false)
请先 登录 后评论
OliviaH

原始数据为每只股票入市、退市日期记录,需要将交易日期的数据填充进去。

生成表格:

t = table(`A`B`C as S_INFO_WINDCODE, `6112010200`6112010200`6112010200 as SW_IND_CODE, 1994.09.07 1998.11.11 1999.05.27 as ENTRY_DT, 2011.09.03 2010.10.08 2011.09.30 as REMOVE_DT)

方案一:peach+unionAll

def f(t, i) {
    marketDay = getMarketCalendar("SSE", t[i][`ENTRY_DT], t[i][`REMOVE_DT]) 
    return table(take(t[i][`S_INFO_WINDCODE], size(marketDay)) as S_INFO_WINDCODE, take(t[i][`SW_IND_CODE], size(marketDay)) as SW_IND_CODE, marketDay as DT)
}

result = unionAll(peach(f{t}, 1..size(t) - 1), false)

方案二:cj+each+unionAll

def f(t, i) {
	windCode = t[i][`S_INFO_WINDCODE]
	code = t[i][`SW_IND_CODE]
	entryDate = t[i][`ENTRY_DT]
	removeDate = t[i][`REMOVE_DT]
	days = entryDate..removeDate
	days = days[weekday(days) between 1:5]
	return cj(table(windCode as S_INFO_WINDCODE, code as SW_IND_CODE), table(days as DT))
}
timer unionAll(each(f{t}, 1..size(t) - 1), false)

性能测试:测试数据如上,记录单次运行耗时,结果如下

attachments-2024-02-J93oOMza65bb6c5de4da9.png






请先 登录 后评论