Approach 1: first load the data into an in-memory table, then sample that in-memory table at a fixed interval.
ids = ["ID000234"]
result = select * from loadTable("dfs://Test", "Test") where time between 2022.03.01T00:00:00.000 : 2022.03.01T23:00:00.000, id in ids order by time asc
gap = result.size() / 1000
select top 1000 id, time, v, q from (select first(id) as id, first(time) as time, first(v) as v, first(q) as q from result group by rowNo(id) / gap)
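The sampling logic of this approach can be sketched outside the database. The Python below is only an illustration, not DolphinDB code: `sample_by_gap` and `target` are hypothetical names, and returning all rows when there are fewer rows than `target` is an assumption made here, not something the script above does. Grouping by `rowNo / gap` and taking the first row of each group amounts to keeping every `gap`-th row, and `top 1000` then caps the result.

```python
def sample_by_gap(rows, target=1000):
    """Keep every gap-th row, capped at `target` rows (illustrative sketch)."""
    # Integer division, mirroring gap = result.size() / 1000 on integers.
    gap = len(rows) // target
    if gap == 0:
        # Assumption for this sketch: fewer rows than the target, keep them all.
        return list(rows)
    # The first row of each group of `gap` consecutive rows sits at 0, gap, 2*gap, ...
    firsts = rows[::gap]
    # Equivalent of "top 1000": cap the number of sampled rows.
    return firsts[:target]


if __name__ == "__main__":
    sampled = sample_by_gap(list(range(2500)))
    print(len(sampled), sampled[0], sampled[1], sampled[-1])
```

Note that when the row count is not a multiple of the target, the cap drops the tail: with 2500 rows the gap is 2, so the 1000 sampled rows cover only the first 2000 rows.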
Approach 2: sample the data at a fixed interval directly within each chunk of the distributed table.
n = 1000
timeRange = 2022.03.01T00:00:00.000 : 2022.03.10T10:00:00.000
days = date(timeRange)[1] - date(timeRange)[0] + 1
t = select * from loadTable("dfs://Test", "Test") where time between timeRange, id in ids context by id having rowNo(id) in (rowNo(id).max()/(n/days)*1..(n/days)) map
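To make the per-chunk arithmetic concrete, here is a rough Python sketch of which row numbers the `having` clause would keep in one chunk. It rests on two assumptions that the text does not state: that each chunk holds one day of data (suggested by the `n/days` term), and that the expression is read as `step * (1..k)` with `step = maxRowNo / k` and `k = n / days`. The function name `chunk_sample_indices` is hypothetical.

```python
def chunk_sample_indices(chunk_rows, n=1000, days=10):
    """Row numbers kept in one chunk of `chunk_rows` rows (illustrative sketch)."""
    # Rows to keep per chunk, e.g. 1000 / 10 = 100.
    per_chunk = n // days
    # rowNo is 0-based, so the largest row number is chunk_rows - 1.
    step = (chunk_rows - 1) // per_chunk
    # step * 1, step * 2, ..., step * per_chunk; duplicates collapse when step is 0.
    return sorted({step * i for i in range(1, per_chunk + 1)})


if __name__ == "__main__":
    idx = chunk_sample_indices(1001)
    print(len(idx), idx[:3], idx[-1])
```

Under this reading, a chunk with fewer rows than `n / days` gets a step of 0, so only row 0 is kept for that chunk.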