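The current design, which deduplicates on the sort key alone, is very counterintuitive and can easily cause unintended data overwrites. I suggest that deduplication consider both the partitioning columns and the sort key, or that the sort key include the partitioning columns by default.
I designed a narrow table following the best-practice guide. The sort key shown in the documentation does not include the partitioning column, and this caused data to be overwritten.
Reproduction code: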
// Composite partitioning: HASH on factor (20 buckets) and VALUE on month
d_db1 = database("", HASH, [SYMBOL, 20])
d_db2 = database("", VALUE, 2010.01M..2030.01M)
d_db = database("dfs://test", COMPO, [d_db1, d_db2], engine="TSDB")
// Narrow table schema: one row per (time, instrument, factor)
day_col_names = `time`instrument`factor`value
day_col_types = [DATE, SYMBOL, SYMBOL, FLOAT]
factor_day_table = table(1:0, day_col_names, day_col_types)
// Partitioned by factor and time, but sortColumns holds only instrument and time
d_db.createPartitionedTable(factor_day_table, `factor_day, `factor`time, compressMethods={time:`delta}, sortColumns=`instrument`time, keepDuplicates=LAST)
// Two rows that differ only in factor (`open vs `low)
t1 = table(2010.01.01 as time, `test as instrument, `open as factor, 1 as value)
t2 = table(2010.01.01 as time, `test as instrument, `low as factor, 1 as value)
t = loadTable("dfs://test", `factor_day)
t.append!(t1)
t.append!(t2)
select * from t
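Expected two rows, but the result contains only one. The two rows differ only in factor, which is a partitioning column but not part of sortColumns, so with keepDuplicates=LAST the second append overwrites the first.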
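
As a possible workaround until the behavior changes, the partitioning column can be added to sortColumns so that the deduplication key becomes (factor, instrument, time). The following is a minimal sketch, not a confirmed fix: the database path dfs://test_fix is a hypothetical name chosen to avoid clashing with the repro above, and it assumes that a SYMBOL column is accepted in the sort key as long as the time column stays last.

```
// Hypothetical path, used only for this sketch; any existing database there is dropped
if(existsDatabase("dfs://test_fix")) dropDatabase("dfs://test_fix")
db1 = database("", HASH, [SYMBOL, 20])
db2 = database("", VALUE, 2010.01M..2030.01M)
db = database("dfs://test_fix", COMPO, [db1, db2], engine="TSDB")
schema = table(1:0, `time`instrument`factor`value, [DATE, SYMBOL, SYMBOL, FLOAT])
// Only change from the repro: factor is now part of sortColumns (time remains the last sort column)
db.createPartitionedTable(schema, `factor_day, `factor`time, sortColumns=`factor`instrument`time, keepDuplicates=LAST)
t = loadTable("dfs://test_fix", `factor_day)
t.append!(table(2010.01.01 as time, `test as instrument, `open as factor, float(1) as value))
t.append!(table(2010.01.01 as time, `test as instrument, `low as factor, float(1) as value))
// If the assumption above holds, the `open and `low rows should no longer collide
select * from t
```

One trade-off to keep in mind: adding a column to the sort key increases the number of distinct sort key values per partition, which may affect write and query performance, so this is worth testing against real data volumes.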