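Select each unique ID with the latest timestamp
Question:
I have a table in BigQuery containing IDs, timestamps and distances, and I would like to select one record per ID: the one with the latest timestamp.
E.g. the table looks like: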
ID|timestamp|distance
A|100|2
A|90|3
B|110|5
D|100|4
A|80|2
B|10|2
The query should return something like:
A|100|2
B|110|5
D|100|4
A query that works in PostgreSQL looks like this, but is there no DISTINCT ON in BigQuery?
SELECT * FROM (
SELECT DISTINCT ON (ID)
id, timestamp, distance
FROM ranking
ORDER BY ID, timestamp DESC
) AS latest_dtg
ORDER BY distance
Answer
How about this?
SELECT a.*
FROM yourtable AS a
INNER JOIN (
SELECT id, MAX(timestamp) AS newesttimestamp
FROM yourtable
GROUP BY id
) AS b
ON a.id = b.id AND a.timestamp = b.newesttimestamp
ORDER BY a.id
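To sanity-check the join above without a BigQuery project, here is a minimal sketch that runs the same query shape against the sample rows from the question using Python's built-in sqlite3. This is an assumption-laden stand-in, not BigQuery itself: the table is named ranking as in the question, the columns are listed explicitly instead of a.*, and the helper latest_per_id is a hypothetical name introduced only for this example.

```python
import sqlite3

# Sample rows from the question: (id, timestamp, distance).
ROWS = [
    ("A", 100, 2),
    ("A", 90, 3),
    ("B", 110, 5),
    ("D", 100, 4),
    ("A", 80, 2),
    ("B", 10, 2),
]

# Same shape as the join in the answer, with "yourtable" replaced by "ranking".
QUERY = """
SELECT a.id, a.timestamp, a.distance
FROM ranking AS a
INNER JOIN (
    SELECT id, MAX(timestamp) AS newesttimestamp
    FROM ranking
    GROUP BY id
) AS b
ON a.id = b.id AND a.timestamp = b.newesttimestamp
ORDER BY a.id
"""


def latest_per_id(rows):
    """Load rows into an in-memory SQLite table and run the join (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ranking (id TEXT, timestamp INTEGER, distance INTEGER)"
        )
        conn.executemany("INSERT INTO ranking VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_id(ROWS):
        print(row)
```

One thing to keep in mind with this approach: if two rows for the same id share the maximum timestamp, the join returns both of them, so it only yields exactly one row per id when the (id, timestamp) pairs are unique.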
Answer
Here is one idea:
#standardSQL
WITH ranking AS
(SELECT 'A' id, 100 ts, 2 distance UNION ALL
SELECT 'A', 90, 3 UNION ALL
SELECT 'B', 110, 5 UNION ALL
SELECT 'D', 100, 4 UNION ALL
SELECT 'B', 10, 2 UNION ALL
SELECT 'A', 80, 2)
SELECT id, ARRAY_AGG(STRUCT(ts, distance) ORDER BY ts DESC LIMIT 1)[SAFE_OFFSET(0)]
FROM ranking
GROUP BY id
Answer
Below is for BigQuery Standard SQL:
#standardSQL
SELECT row.* FROM (
SELECT ARRAY_AGG(r ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)] AS row
FROM ranking AS r
GROUP BY id
)
You can test / play with the query above using the dummy data from your question, as below:
#standardSQL
WITH ranking AS (
SELECT 'A' AS id, 100 AS timestamp, 2 AS distance UNION ALL
SELECT 'A', 90, 3 UNION ALL
SELECT 'B', 110, 5 UNION ALL
SELECT 'D', 100, 4 UNION ALL
SELECT 'B', 10, 2 UNION ALL
SELECT 'A', 80, 2
)
SELECT row.* FROM (
SELECT ARRAY_AGG(r ORDER BY timestamp DESC LIMIT 1)[OFFSET(0)] AS row
FROM ranking AS r
GROUP BY id
)