DolphinDB中如何删除某只股票某段时间内的数据

我在dolphindb中创建了一个这样的database和table:

login("admin", "123456")

n=1000000
ID=rand(`a1`a2`a3`a4`a5`a6`a7`a8`b1`b2`b3`c1`c2`c3, n)
dates=2017.08.07..2017.08.11
date=rand(dates, n)
x=rand(10.0, n)
t=table(ID, date, x)

if(existsDatabase("dfs://compodb")){
    dropDatabase("dfs://compodb")
}

dbDate = database(, VALUE, 2017.08.07..2017.08.11)
dbID = database(, HASH,  [STRING, 5]);
db = database("dfs://compodb", COMPO, [dbDate, dbID])
pt = db.createPartitionedTable(t, "pt", `date`ID)

我现在想删除2017.08.07至2017.08.11中的ID=`c2的股票的数据,请问应该如何操作比较高效?

请先 登录 后评论

2 个回答

Jax Wu

可以先定义一个删除某只股票某天的函数,然后用loop或ploop函数,去删除指定时间段内的数据:

def dropCodeData(dbName, tableName, code,day){
    cnt = exec count(*) from loadTable(dbName, tableName) where date=day, ID=code
    if(cnt==0) return;
    hashNumber = hashBucket(code, 5)
    t = select * from loadTable(dbName, tableName) where date=day, hashBucket(ID, 5)=hashNumber
    delete from t where ID = code
    dropPartition(dbName,"/"+temporalFormat(day,"yyyyMMdd")+"/Key"+string(hashNumber))
    loadTable(dbName, tableName).append!(t)
}

//ploop多线程并行删除某只股票多天的记录
ploop(dropCodeData{database("dfs://compodb"), "pt", `c2}, 2017.08.07..2017.08.11)

//loop单线程删除某只股票多天的记录
loop(dropCodeData{database("dfs://compodb"), "pt", `c2}, 2017.08.07..2017.08.11)

//删除某天某只股票的记录
dropCodeData(database("dfs://compodb"), "pt", `c2, 2017.08.10)
请先 登录 后评论
谭华

实施方案

方案一:hash分区dropPartition指定路径删除分区:
def dropCodeData(dbName, tableName, code,day){
    cnt = exec count(*) from loadTable(dbName, tableName) where date=day, ID=code
    if(cnt==0) return;
    t = select * from loadTable(dbName, tableName) where date=day, hashBucket(ID, 5)=hashBucket(code, 5)
    delete from t where ID = code
    dropPartition(dbName,"/"+temporalFormat(day,"yyyyMMdd")+"/Key"+string(hashNumber),tableName = tableName)
    loadTable(dbName, tableName).append!(t)
}
方案二:用sqlDelete删除数据:
sqlDelete (loadTable("dfs://compodb","pt"), < ID = `c2  and date in (2017.08.07..2017.08.11)>).eval()
方案三:dropPartition可以指定条件进行删除:
 dropPartition(db,[2017.08.07..2017.08.11,`c2] ,tableName)
方案四:用delete删除数据:
delete from pt where ID = `c2  and date in (2017.08.07..2017.08.11) map;

性能测试:

                       实施方案                                                                耗时

ploop多线程并行删除某只股票多天的记录                             54.901 ms

loop单线程删除某只股票多天的记录                                       98.867 ms

删除某天某只股票的记录                                                         32.025 ms

sqlDelete                                                                                 31.995 ms

dropPartition指定条件删除分区                                               27.332 ms

delete                                                                                       29.387 ms

请先 登录 后评论