如何优化具有多个左连接的SQL查询

如何优化具有多个左连接的SQL查询

问题描述:

我有用于存储用户网页查看的日志表。如何优化具有多个左连接的SQL查询

表结构:

CREATE TABLE page_views (
    id integer NOT NULL, 
    user_id integer, 
    ip character varying(255), 
    url character varying, 
    title character varying(255), 
    user_agent character varying(255), 
    created_at timestamp(0) with time zone 
); 

CREATE SEQUENCE analytics_id_seq 
    START WITH 1 
    INCREMENT BY 1 
    NO MINVALUE 
    NO MAXVALUE 
    CACHE 1; 

ALTER SEQUENCE analytics_id_seq OWNED BY page_views.id; 

ALTER TABLE ONLY page_views ALTER COLUMN id SET DEFAULT nextval('analytics_id_seq'::regclass); 

ALTER TABLE ONLY page_views ADD CONSTRAINT page_views_pk PRIMARY KEY (id); 

CREATE INDEX page_views_url_title_index ON page_views USING btree (url, title); 

CREATE INDEX page_views_user_id_index ON page_views USING btree (user_id); 

我想拉约每页面访问量计的信息。

  • 总计数
  • 伯爵在本周
  • 在上周
  • 计数
  • 本月
  • 计数
  • 在上个月计数

的页面必须通过分组网址和标题。

查询我写了这个样子的:

SELECT count(page_views.id) AS "total_count", "page_views"."url", "page_views"."title", "current_week"."count" AS "current_week_count", "prev_week"."count" AS "prev_week_count", "current_month"."count" AS "current_month_count", "prev_month"."count" AS "prev_month_count" FROM "page_views" 

LEFT JOIN (
    SELECT count(id) AS "count", "url", "title" FROM "page_views" 
    WHERE (extract(year from page_views.created_at) = extract(year from current_date)) AND (extract(week from page_views.created_at) = extract(week from current_date)) 
    GROUP BY "url", "title" 
) "current_week" ON page_views.url = current_week.url AND page_views.title = current_week.title 

LEFT JOIN (
    SELECT count(id) AS "count", "url", "title" FROM "page_views" 
    WHERE (extract(year from page_views.created_at) = extract(year from current_date)) AND (extract(week from page_views.created_at) = extract(week from current_date - interval '1 week')) 
    GROUP BY "url", "title" 
) "prev_week" ON page_views.url = prev_week.url AND page_views.title = prev_week.title 

LEFT JOIN (
    SELECT count(id) AS "count", "url", "title" FROM "page_views" 
    WHERE (extract(year from page_views.created_at) = extract(year from current_date)) AND (extract(month from page_views.created_at) = extract(month from current_date)) 
    GROUP BY "url", "title" 
) "current_month" ON page_views.url = current_month.url AND page_views.title = current_month.title 

LEFT JOIN (
    SELECT count(id) AS "count", "url", "title" FROM "page_views" 
    WHERE (extract(year from page_views.created_at) = extract(year from current_date)) AND (extract(month from page_views.created_at) = extract(month from current_date - interval '1 month')) 
    GROUP BY "url", "title" 
) "prev_month" ON page_views.url = prev_month.url AND page_views.title = prev_month.title 


GROUP BY "page_views"."url", "page_views"."title", "current_week_count", "prev_week_count", "current_month_count", "prev_month_count" 

ORDER BY "total_count" DESC 

LIMIT 25 

在首先需要的只有总数。为url和标题对添加索引增加了查询性能。但是现在再增加一点,执行时间再次变大。

平均查询执行时间:17秒。

使用该查询的页面加载大约需要45秒(该查询再执行一次以获得记录总数)。

我认为左连接是原因,因为5个单个查询执行速度非常快。

EXPLAIN ANALYZE给出了下面的输出:

QUERY PLAN 
Limit (cost=93923.09..93923.16 rows=25 width=136) (actual time=18779.707..18779.722 rows=25 loops=1) 
-> Sort (cost=93923.09..93961.56 rows=15387 width=136) (actual time=18779.701..18779.707 rows=25 loops=1) 
Sort Key: (count(page_views.id)) 
Sort Method: top-N heapsort Memory: 21kB 
-> GroupAggregate (cost=90257.65..93488.88 rows=15387 width=136) (actual time=16884.156..18768.617 rows=17778 loops=1) 
-> Sort (cost=90257.65..90642.32 rows=153868 width=136) (actual time=16711.488..18631.405 rows=153879 loops=1) 
Sort Key: page_views.url, page_views.title, (count(page_views_1.id)), (count(page_views_2.id)), (count(page_views_3.id)), (count(page_views_4.id)) 
Sort Method: external merge Disk: 20968kB 
-> Hash Left Join (cost=44767.14..55958.90 rows=153868 width=136) (actual time=3531.133..4874.422 rows=153879 loops=1) 
Hash Cond: (((page_views.url)::text = (page_views_4.url)::text) AND ((page_views.title)::text = (page_views_4.title)::text)) 
-> Hash Left Join (cost=33575.36..43613.10 rows=153868 width=128) (actual time=2730.681..3609.524 rows=153879 loops=1) 
Hash Cond: (((page_views.url)::text = (page_views_3.url)::text) AND ((page_views.title)::text = (page_views_3.title)::text)) 
-> Hash Left Join (cost=22383.57..31267.29 rows=153868 width=120) (actual time=2103.744..2738.826 rows=153879 loops=1) 
Hash Cond: (((page_views.url)::text = (page_views_2.url)::text) AND ((page_views.title)::text = (page_views_2.title)::text)) 
-> Hash Left Join (cost=11191.79..18921.49 rows=153868 width=112) (actual time=1157.999..1538.455 rows=153879 loops=1) 
Hash Cond: (((page_views.url)::text = (page_views_1.url)::text) AND ((page_views.title)::text = (page_views_1.title)::text)) 
-> Seq Scan on page_views (cost=0.00..6575.68 rows=153868 width=104) (actual time=0.027..91.389 rows=153879 loops=1) 
-> Hash (cost=11191.77..11191.77 rows=1 width=108) (actual time=1157.958..1157.958 rows=3072 loops=1) 
Buckets: 1024 Batches: 1 Memory Usage: 581kB 
-> HashAggregate (cost=11191.75..11191.76 rows=1 width=104) (actual time=1150.583..1153.481 rows=3072 loops=1) 
-> Seq Scan on page_views page_views_1 (cost=0.00..11191.72 rows=4 width=104) (actual time=573.709..1105.539 rows=21945 loops=1) 
Filter: ((date_part('year'::text, created_at) = date_part('year'::text, (('now'::cstring)::date)::timestamp without time zone)) AND (date_part('week'::text, created_at) = date_part('week'::text, (('now'::cstring)::date)::timestamp without time zone))) 
Rows Removed by Filter: 131934 
-> Hash (cost=11191.77..11191.77 rows=1 width=108) (actual time=945.707..945.707 rows=4093 loops=1) 
Buckets: 1024 Batches: 1 Memory Usage: 756kB 
-> HashAggregate (cost=11191.75..11191.76 rows=1 width=104) (actual time=938.578..941.428 rows=4093 loops=1) 
-> Seq Scan on page_views page_views_2 (cost=0.00..11191.72 rows=4 width=104) (actual time=291.257..889.204 rows=29781 loops=1) 
Filter: ((date_part('year'::text, created_at) = date_part('year'::text, (('now'::cstring)::date)::timestamp without time zone)) AND (date_part('week'::text, created_at) = date_part('week'::text, (('now'::cstring)::date - '7 days'::interval)))) 
Rows Removed by Filter: 124098 
-> Hash (cost=11191.77..11191.77 rows=1 width=108) (actual time=626.909..626.909 rows=3072 loops=1) 
Buckets: 1024 Batches: 1 Memory Usage: 581kB 
-> HashAggregate (cost=11191.75..11191.76 rows=1 width=104) (actual time=621.925..623.915 rows=3072 loops=1) 
-> Seq Scan on page_views page_views_3 (cost=0.00..11191.72 rows=4 width=104) (actual time=284.954..598.724 rows=21945 loops=1) 
Filter: ((date_part('year'::text, created_at) = date_part('year'::text, (('now'::cstring)::date)::timestamp without time zone)) AND (date_part('month'::text, created_at) = date_part('month'::text, (('now'::cstring)::date)::timestamp without time zone))) 
Rows Removed by Filter: 131934 
-> Hash (cost=11191.77..11191.77 rows=1 width=108) (actual time=800.412..800.412 rows=10871 loops=1) 
Buckets: 1024 Batches: 4 (originally 1) Memory Usage: 1025kB 
-> HashAggregate (cost=11191.75..11191.76 rows=1 width=104) (actual time=776.933..785.074 rows=10871 loops=1) 
-> Seq Scan on page_views page_views_4 (cost=0.00..11191.72 rows=4 width=104) (actual time=0.028..680.576 rows=84245 loops=1) 
Filter: ((date_part('year'::text, created_at) = date_part('year'::text, (('now'::cstring)::date)::timestamp without time zone)) AND (date_part('month'::text, created_at) = date_part('month'::text, (('now'::cstring)::date - '1 mon'::interval)))) 
Rows Removed by Filter: 69634 
Total runtime: 19112.027 ms 

目前有153 868条记录在生产服务器上。 在我的开发服务器上,只有3 463条记录,查询执行速度非常快。

我认为你可以做,在一个单一的查询:

SELECT url, 
     title, 
     count(*) as total_count, 
     count(case when date_trunc('week', current_date) = date_trunc('week', created_at) then 1 else null end) as current_week_count, 
     count(case when date_trunc('week', current_date - interval '1 week') = date_trunc('week', created_at) then 1 else null end) as prev_week_count, 
     count(case when date_trunc('month', current_date) = date_trunc('month', created_at) then 1 else null end) as this_month_count, 
     count(case when date_trunc('month', current_date - interval '1' month) = date_trunc('month', created_at) then 1 else null end) as prev_month_count 
FROM page_views 
GROUP BY url, title 
+0

对不起,你是什么意思“......你的图片......”是什么意思? – arogachev 2014-12-05 08:47:45

+0

@arogachev:我懒得重复上个月的表达式,因为这与其他表达式只是用不同的比较完全相同。 – 2014-12-05 08:49:45

+0

经过测试,错误出现:错误:语法错误在或当“时”LINE 4:计数(当date_trunc('week',current_date)= date_tr ... – arogachev 2014-12-05 08:51:13