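Implementation Options
Option 1: with hash partitioning, use dropPartition with an explicit partition path. The function below reads the affected hash partition into memory, removes the target stock's rows, drops the partition, and appends the remaining rows back: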
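def dropCodeData(dbName, tableName, code, day){
    // dbName is passed to both loadTable and dropPartition, so a database handle is the safe choice.
    // Nothing to do if the stock has no rows on this day.
    cnt = exec count(*) from loadTable(dbName, tableName) where date=day, ID=code
    if(cnt==0) return
    // Hash bucket (out of 5, as in the original query) that holds this stock code.
    hashNumber = hashBucket(code, 5)
    // Load the whole hash partition for the day into memory and remove the stock's rows.
    t = select * from loadTable(dbName, tableName) where date=day, hashBucket(ID, 5)=hashNumber
    delete from t where ID=code
    // Drop the partition by path (of the form "/yyyyMMdd/KeyN"), then write the remaining rows back.
    dropPartition(dbName, "/"+temporalFormat(day,"yyyyMMdd")+"/Key"+string(hashNumber), tableName=tableName)
    loadTable(dbName, tableName).append!(t)
}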
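The performance test later in this section times dropCodeData over several days with loop and ploop, but the calling code is not shown. A minimal sketch of how those calls might look, assuming the database path and table name used in the other options ("dfs://compodb", "pt"), a placeholder stock code `c2, and the same date range; none of these values are confirmed as the ones actually used in the test:

```dolphindb
// Assumed values for illustration only; replace with your own database, table, code and dates.
db = database("dfs://compodb")   // dropPartition expects a database handle
tableName = "pt"
code = `c2
days = 2017.08.07..2017.08.11

// Serial: call dropCodeData once per day on a single thread.
// dropCodeData{db, tableName, code} fixes the first three arguments (partial application).
timer loop(dropCodeData{db, tableName, code}, days)

// Parallel: the same per-day calls, distributed across worker threads.
// Run this against freshly loaded data: after the serial run above,
// the rows are already gone and each call returns immediately.
timer ploop(dropCodeData{db, tableName, code}, days)
```

Each day maps to a different partition, so the per-day calls do not touch the same partition when they run in parallel.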
Option 2: delete the data with sqlDelete:
sqlDelete (loadTable("dfs://compodb","pt"), < ID = `c2 and date in (2017.08.07..2017.08.11)>).eval()
Option 3: use dropPartition with partition conditions instead of a path:
dropPartition(db,[2017.08.07..2017.08.11,`c2] ,tableName)
Option 4: delete the data with the delete statement:
delete from pt where ID = `c2 and date in (2017.08.07..2017.08.11) map;
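Performance test:
Option                                                                  Elapsed time
ploop: multi-threaded parallel deletion of one stock over several days  54.901 ms
loop: single-threaded deletion of one stock over several days           98.867 ms
Deleting one stock's records for a single day                           32.025 ms
sqlDelete                                                               31.995 ms
dropPartition with partition conditions                                 27.332 ms
delete                                                                  29.387 ms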