python从其他数据库读取到某张表的数据,现在想直接利用这个DATAFRAME新建数据库,如何实现?
如果先本地生成CSV会发现DolphinDB解析的类型格式,会与DATAFRAME实际得到的不一致,导致后续新数据APPEND失败。
以下是一个直接利用DataFrame新建数据库的case,可以参考以下
def create_table_by_df(self, dbPath=None, tableName=None, partitionColumns=None): """ 通过pandas读取一个样例文件,通过样例数据创建一个表 :param dbPath: :param tableName: :return: """ dataPath = "D://work//data//stock//test//second//20191212.csv" colfloat = ['ask1', 'ask2', 'ask3', 'ask4', 'ask5', 'bid1', 'bid2', 'bid3', 'bid4', 'bid5'] colint = ['asize1', 'asize2', 'asize3', 'asize4', 'asize5', 'bsize1', 'bsize2', 'bsize3', 'bsize4', 'bsize5'] cols = ['ask1', 'ask2', 'ask3', 'ask4', 'ask5', 'bid1', 'bid2', 'bid3', 'bid4', 'bid5', 'asize1', 'asize2', 'asize3', 'asize4', 'asize5', 'bsize1', 'bsize2', 'bsize3', 'bsize4', 'bsize5'] df = pd.read_csv(dataPath, encoding='utf-8') df['symbol'] = df['symbol'].astype(str) to_datetime64_for_df(df, ['DateTime', 'Date']) df['Time'] = df['Time'].apply(lambda x: datetime.strptime(str(x), "%H:%M:%S.%f")) df[colfloat] = df[colfloat].astype('float32') df[colint] = df[colint].astype('int64') s.upload({"newdf": df}) s.run("""dbt=database("{}")""".format(dbPath)) s.run( """newdf=select symbol(symbol) as symbol,nanotimestamp(DateTime) as DateTime,date(Date) as Date,nanotime(Time) as Time ,{} from newdf""".format( ','.join(cols))) s.run("""t=dbt.createPartitionedTable(newdf,"{}",`{})""".format(tableName, '`'.join(partitionColumns))) s.run("t.append!(newdf)")
后续导入到此表的,先在python中进行初步的数据转化,再通过pythonAPI的函数tableAppender导入,参考文档https://gitee.com/dolphindb/api_python3#614-%E4%BD%BF%E7%94%A8-tableappender-%E5%AF%B9%E8%B1%A1%E8%BF%BD%E5%8A%A0%E6%95%B0%E6%8D%AE%E6%97%B6%E8%87%AA%E5%8A%A8%E8%BD%AC%E6%8D%A2%E6%97%B6%E9%97%B4%E7%B1%BB%E5%9E%8B
def import_df_to_dolphindb_tableAppender(self, dbPath=None, tableName=None, allFiles=None): """ 本例为利用tableAppender把文件通过DataFrame加载到数据表中的样例; 通过tableAppender导入数据,使用tableAppender对象追加数据时自动转换时间类型, 但是其他的数据类型转换需要在python中显示转换为相应的类型, 如表中存储为int, 则在python端需要显示转化为int32 如表中存储为long, 则在python端需要显示转化为int64 :param dbPath: :param tableName: :param allFiles: """ ###2.通过tableAppender导入数据,使用tableAppender对象追加数据时自动转换时间类型 colfloat = ['ask1', 'ask2', 'ask3', 'ask4', 'ask5', 'bid1', 'bid2', 'bid3', 'bid4', 'bid5'] colint = ['asize1', 'asize2', 'asize3', 'asize4', 'asize5', 'bsize1', 'bsize2', 'bsize3', 'bsize4', 'bsize5'] # # # Appender = ddb.tableAppender(dbPath=dbPath, tableName=tableName, ddbSession=s, action="fitColumnType") for ifile in allFiles: df = pd.read_csv(ifile, encoding='utf-8') df['symbol'] = df['symbol'].astype(str) to_datetime64_for_df(df, ['DateTime', 'Date']) df['Time'] = df['Time'].apply(lambda x: datetime.strptime(str(x), "%H:%M:%S.%f")) df[colfloat] = df[colfloat].astype('float32') df[colint] = df[colint].fillna(0).astype('int64') Appender.append(df)