step1: choose a sensible partition scheme based on the data volume, and create the database
Demo: the data volume is roughly 10 million records per month. Following the rule of about 1 million records per smallest partition (10 million per month ÷ 10 hash buckets ≈ 1 million), we design a two-level composite partition: the first level is a VALUE partition by month, and the second level is a HASH partition that spreads the data evenly across 10 buckets.
Code:
//undef all
login("admin", "123456")

dbPath = "dfs://min_k"
tbName = "min_table"
filePath = "G:/Data/2020-1-good/000001.SZ.csv"

// infer the column names and types from a sample csv file
schemaTB = extractTextSchema(filePath)
col1 = exec name from schemaTB
col2 = exec type from schemaTB
t = table(10:0, col1, col2)

// drop any existing database, then create the two-level COMPO partition:
// level 1 is a VALUE partition by month, level 2 a HASH partition with 10 buckets
if(existsDatabase(dbPath)){
    dropDB(dbPath)
}
dbMonth = database("", VALUE, 2021.01M..2021.02M)
dbSymbol = database("", HASH, [SYMBOL, 10])
db = database(dbPath, COMPO, [dbMonth, dbSymbol])
createPartitionedTable(db, t, tbName, `datetime`symbol)
min_table = loadTable(dbPath, tbName)
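Before loading any data, it is worth sanity-checking what was just created. A minimal sketch, assuming step 1 ran successfully; schema is a standard DolphinDB function, though the exact layout of its output may vary by version:

// reopen the database and inspect the two-level partition scheme
db = database("dfs://min_k")
print db.schema()
// inspect the column names and types the table inherited from the csv schema
pt = loadTable("dfs://min_k", "min_table")
print pt.schema().colDefs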
step2:
First read the csv files in batches (for example, 100 files per batch, matching the cut(100) call in the code below), load each file into an in-memory table with the multi-threaded ploadText function, and then append the in-memory table to the DFS database table.
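The batching relies on the built-in cut function, which splits a vector into a tuple of sub-vectors. A small, hypothetical illustration (the values are made up); note that cut throws when the batch size is not strictly between 1 and the vector length, which matters for the code below:

x = 1..250
fs = x.cut(100)      // tuple of 3 sub-vectors: 1..100, 101..200, 201..250
print fs.size()      // 3
// x.cut(250) or x.cut(300) would throw:
// "The cut size must be greater than one and less than the vector size"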
Code:
//undef all

// read a single csv into an in-memory table with the multi-threaded ploadText
def loadOrderBook(path, filename){
    t = ploadText(path + "/" + filename)
    return t
}

// load every csv under one directory in batches and append them to the DFS table
def loadOrderDir(mutable tb, path){
    fileList = exec filename from files(path, "%.csv")
    // fix: cut requires 1 < batch size < vector length, so treat small
    // directories (100 files or fewer) as a single batch instead
    if(fileList.size() > 100)
        fs = fileList.cut(100)
    else
        fs = [fileList]
    for(i in 0:fs.size()){
        t = table(500000:0, tb.schema().colDefs['name'], tb.schema().colDefs['typeString'])
        for(f in fs[i]){
            t.append!(loadOrderBook(path, f))
        }
        tb.append!(t)
    }
}

// walk the subdirectories of dir and submit one background write job per directory
def loopLoadOrderBook(dir, mutable tb){
    dirs = exec filename from files(dir) where isDir = true
    for(path in dirs){
        path = dir + "/" + path
        print path
        submitJob("writing", "loadOrderDir" + path, loadOrderDir{tb, path})
    }
}

login("admin", "123456")
dbPath = "dfs://min_k"
tbName = "min_table"
dir = "G:/Data"
orderbooktb = loadTable(dbPath, tbName)
loopLoadOrderBook(dir, orderbooktb)
// check submitted jobs: getRecentJobs()
// cancel a submitted job: cancelJob(`sdn_3101202103020002)
// select count(*) from orderbooktb

Note: without the size guard in loadOrderDir, any directory holding 100 or fewer csv files makes the job fail with:
loadOrderDir: fs = ::cut(fileList, 100) => The cut size must be greater than one and less than the vector size
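The submitted jobs run in the background, so it helps to watch their status and surface any errors from the same session. A minimal sketch using the standard getRecentJobs and getJobReturn functions (the jobId filter is based on the "writing" prefix passed to submitJob above; the commented-out id is a placeholder):

// list recent background jobs; errorMsg is empty for jobs that succeeded
jobs = getRecentJobs()
select jobId, jobDesc, startTime, endTime, errorMsg from jobs where jobId like "writing%"
// fetch the return value of one finished job, or re-raise its error;
// pass a jobId reported by submitJob or getRecentJobs
// getJobReturn("your_job_id_here")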