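How can I sample the rows returned by a query at a fixed interval, taking one row out of every N, so that 1,000 rows are returned in the end?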

1 answer

Juntao Wang

Option 1: load the queried data into an in-memory table first, then sample that in-memory table at a fixed interval.

ids = ["ID000234"]

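// Load the matching rows into an in-memory table, ordered by time
result = select * from loadTable("dfs://Test", "Test") where time between 2022.03.01T00:00:00.000 : 2022.03.01T23:00:00.000, id in ids order by time asc
// Rows per bucket (integer division; this assumes result holds at least 1000 rows, otherwise gap is 0)
gap = result.size() / 1000
// Keep the first row of every bucket of gap consecutive rows, capped at 1000 rows
select top 1000 id, time, v, q from (select first(id) as id, first(time) as time, first(v) as v, first(q) as q from result group by rowNo(id) / gap)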
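
To try the bucketing step of option 1 without a distributed database, here is a minimal sketch on a made-up in-memory table. The table `t`, its 10,000 rows, and the random `v` and `q` values are assumptions for illustration only; the grouping expression is the one used in option 1.

```dolphindb
// Hypothetical test data: 10,000 rows for one id, one millisecond apart
rows = 10000
t = table(take("ID000234", rows) as id, 2022.03.01T00:00:00.000 + (1..rows) as time, rand(100.0, rows) as v, rand(10, rows) as q)

// Integer division: 10000 / 1000 gives 10 rows per bucket
gap = t.size() / 1000

// rowNo(id) / gap maps rows 0..9 to bucket 0, rows 10..19 to bucket 1, and so on;
// first(...) keeps the first row of each bucket
sampled = select first(id) as id, first(time) as time, first(v) as v, first(q) as q from t group by rowNo(id) / gap

// Number of sampled rows
sampled.size()
```

If the input has fewer than 1,000 rows, gap becomes 0, so a real query would probably need a guard for that case.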

Option 2: sample at a fixed interval directly within each chunk of the distributed table.

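n = 1000
timeRange = 2022.03.01T00:00:00.000 : 2022.03.10T10:00:00.000
// Number of calendar days covered by the time range
days = date(timeRange)[1] - date(timeRange)[0] + 1
// ids is defined as in option 1. The map keyword runs the query on each partition separately,
// keeping about n/days evenly spaced rows per id in each one (this appears to assume daily partitions)
t = select * from loadTable("dfs://Test", "Test")  where time between timeRange, id in ids context by id having rowNo(id) in (rowNo(id).max()/(n/days)*1..(n/days)) map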