Multi-column SQL query: SUM(CASE WHEN x THEN 1 ELSE 0 END)
Question:
I'm looking to see whether there is a better way to write the query below. What I'm trying to do is create a summary report that compiles statistics by date.
SELECT CAST(Detail.ReceiptDate AS DATE) AS 'DATE'
, SUM(CASE WHEN Detail.Type = 'TotalMailed' THEN 1 ELSE 0 END) AS 'TOTALMAILED'
, SUM(CASE WHEN Detail.Type = 'TotalReturnMail' THEN 1 ELSE 0 END) AS 'TOTALUNDELINOTICESRECEIVED'
, SUM(CASE WHEN Detail.Type = 'TraceReturnedMail' THEN 1 ELSE 0 END) AS 'TRACEUNDELNOTICESRECEIVED'
FROM
(
select SentDate AS 'ReceiptDate', 'TotalMailed' AS 'Type'
from MailDataExtract
where sentdate is not null
UNION ALL
select MDE.ReturnMailDate AS 'ReceiptDate', 'TotalReturnMail' AS 'Type'
from MailDataExtract MDE
where MDE.ReturnMailDate is not null
UNION ALL
select MDE.ReturnMailDate AS 'ReceiptDate', 'TraceReturnedMail' AS 'Type'
from MailDataExtract MDE
inner join DTSharedData.dbo.ScanData SD ON SD.ScanDataID = MDE.ReturnScanDataID
where MDE.ReturnMailDate is not null AND SD.ReturnMailTypeID = 1
) AS Detail
GROUP BY CAST(Detail.ReceiptDate AS DATE)
ORDER BY 1
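This is just a sample of the query (which is used in a report); there are several other columns, and the logic for the other statistics is far more complicated. Is there a more elegant way to get this kind of information or write this kind of report?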
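Answer
I would change the query in the following ways:
- Do the aggregation in the subqueries. This lets the optimizer use more information about the table when optimizing the group by.
- Combine the second and third subqueries. They aggregate on the same column. This requires a left outer join so that all the data is still available.
- By using count(<fieldname>) you can eliminate the comparisons to is null. This matters for the second and third calculated values.
- To combine the second and third queries, you need to count an ID from the mde table. These use mde.mdeid.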
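To illustrate the count(<fieldname>) point above, here is a minimal sketch in T-SQL. The inline VALUES list and the alias v are made up for the example and are not part of the original schema. It shows that COUNT(*) counts every row while COUNT(column) skips rows where that column is NULL.

```sql
-- Hypothetical three-row sample: two real dates and one NULL.
SELECT COUNT(*)                 AS AllRows,       -- counts every row: 3
       COUNT(v.ReturnMailDate)  AS NonNullDates   -- skips the NULL row: 2
FROM (VALUES (CAST('2013-01-01' AS DATE)),
             (NULL),
             (CAST('2013-01-02' AS DATE))
     ) AS v(ReturnMailDate);
```

Note that this only affects what gets counted. A GROUP BY on a nullable date column still produces a group for NULL unless a where clause filters it out.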
The following version follows your example by using union all:
SELECT CAST(Detail.ReceiptDate AS DATE) AS "Date",
SUM(TOTALMAILED) as TotalMailed,
SUM(TOTALUNDELINOTICESRECEIVED) as TOTALUNDELINOTICESRECEIVED,
SUM(TRACEUNDELNOTICESRECEIVED) as TRACEUNDELNOTICESRECEIVED
FROM ((select SentDate AS "ReceiptDate", COUNT(*) as TotalMailed,
NULL as TOTALUNDELINOTICESRECEIVED, NULL as TRACEUNDELNOTICESRECEIVED
from MailDataExtract
where SentDate is not null
group by SentDate
) union all
(select MDE.ReturnMailDate AS ReceiptDate, 0,
COUNT(distinct mde.mdeid) as TOTALUNDELINOTICESRECEIVED,
SUM(case when sd.ReturnMailTypeId = 1 then 1 else 0 end) as TRACEUNDELNOTICESRECEIVED
from MailDataExtract MDE left outer join
DTSharedData.dbo.ScanData SD
ON SD.ScanDataID = MDE.ReturnScanDataID
group by MDE.ReturnMailDate
)
) detail
GROUP BY CAST(Detail.ReceiptDate AS DATE)
ORDER BY 1;
A similar version that uses full outer join instead:
SELECT coalesce(sd.ReceiptDate, mde.ReceiptDate) AS "Date",
sd.TotalMailed, mde.TOTALUNDELINOTICESRECEIVED,
mde.TRACEUNDELNOTICESRECEIVED
FROM (select cast(SentDate as date) AS "ReceiptDate", COUNT(*) as TotalMailed
from MailDataExtract
where SentDate is not null
group by cast(SentDate as date)
) sd full outer join
(select cast(MDE.ReturnMailDate as date) AS ReceiptDate,
COUNT(distinct mde.mdeID) as TOTALUNDELINOTICESRECEIVED,
SUM(case when sd.ReturnMailTypeId = 1 then 1 else 0 end) as TRACEUNDELNOTICESRECEIVED
from MailDataExtract MDE left outer join
DTSharedData.dbo.ScanData SD
ON SD.ScanDataID = MDE.ReturnScanDataID
group by cast(MDE.ReturnMailDate as date)
) mde
on sd.ReceiptDate = mde.ReceiptDate
ORDER BY 1;
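One thing to watch with a full outer join: a date that appears on only one side comes back with NULL in the other side's columns. If the report should show 0 instead, the counts can be wrapped in COALESCE. The sketch below uses made-up inline data (the aliases s and r and their values are hypothetical) to show the effect.

```sql
-- Hypothetical data: mail was sent on 2013-01-01 only, returned on 2013-01-02 only.
SELECT COALESCE(s.ReceiptDate, r.ReceiptDate) AS [Date],
       COALESCE(s.TotalMailed, 0)             AS TotalMailed,
       COALESCE(r.TotalReturned, 0)           AS TotalReturned
FROM (VALUES (CAST('2013-01-01' AS DATE), 5)) AS s(ReceiptDate, TotalMailed)
FULL OUTER JOIN
     (VALUES (CAST('2013-01-02' AS DATE), 2)) AS r(ReceiptDate, TotalReturned)
  ON s.ReceiptDate = r.ReceiptDate
ORDER BY 1;
-- Rows returned: (2013-01-01, 5, 0) and (2013-01-02, 0, 2)
```

Without the COALESCE calls on the counts, the two zeros would be NULL.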
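Answer
I think you should do the grouping in a subquery. That way each inner subquery returns only a small number of rows and no CASE expression is needed, so I think this will be faster: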
select Detail.ReceiptDate AS 'DATE',
SUM(TotalMailed),
SUM(TotalReturnMail),
SUM(TraceReturnedMail)
from
(
select SentDate AS 'ReceiptDate',
count('TotalMailed') AS TotalMailed,
0 as TotalReturnMail,
0 as TraceReturnedMail
from MailDataExtract
where sentdate is not null
GROUP BY SentDate
UNION ALL
select MDE.ReturnMailDate AS 'ReceiptDate',
0 AS TotalMailed,
count(*) as TotalReturnMail,
0 as TraceReturnedMail
from MailDataExtract MDE
where MDE.ReturnMailDate is not null
GROUP BY MDE.ReturnMailDate
UNION ALL
select MDE.ReturnMailDate AS 'ReceiptDate',
0 AS TotalMailed,
0 as TotalReturnMail,
count(*) as TraceReturnedMail
from MailDataExtract MDE
inner join DTSharedData.dbo.ScanData SD
ON SD.ScanDataID = MDE.ReturnScanDataID
where MDE.ReturnMailDate is not null AND SD.ReturnMailTypeID = 1
GROUP BY MDE.ReturnMailDate
) as Detail
GROUP BY Detail.ReceiptDate
ORDER BY 1
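Is this in a procedure or a view, or something else? Basically, can you introduce variables and run multiple statements, or is it just one big 'select' statement?
This is a procedure that will be used for an SSRS report, so it will basically be a select statement, since I need to return a result set (right?) – MickJuice
Yes, you will end up with one big 'select' at the end, but since it is in a proc you can break the query into smaller, simpler pieces and assign the values to variables. This can make a big difference in readability. For example, rather than unioning or grouping those three subqueries, you could run three small independent queries up front, assign the aggregated results to variables, and then just use those variables in your returning query. It would probably be easier to read and understand, and might perform better.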
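A rough sketch of what that could look like inside the procedure. Because the results are per date rather than single values, this uses table variables in place of plain variables, which is a substitution on my part. The names @Mailed and @Returned are hypothetical, and the sketch assumes ScanDataID is unique in ScanData, so the left outer join does not duplicate MailDataExtract rows.

```sql
-- Hypothetical table variables holding the per-day intermediate results.
DECLARE @Mailed   TABLE (ReceiptDate DATE PRIMARY KEY, TotalMailed INT NOT NULL);
DECLARE @Returned TABLE (ReceiptDate DATE PRIMARY KEY,
                         TotalReturned INT NOT NULL,
                         TraceReturned INT NOT NULL);

-- Step 1: mail sent per day.
INSERT INTO @Mailed (ReceiptDate, TotalMailed)
SELECT CAST(SentDate AS DATE), COUNT(*)
FROM MailDataExtract
WHERE SentDate IS NOT NULL
GROUP BY CAST(SentDate AS DATE);

-- Step 2: returned mail per day, plus the subset with ReturnMailTypeID = 1.
-- Assumes at most one ScanData row per MailDataExtract row.
INSERT INTO @Returned (ReceiptDate, TotalReturned, TraceReturned)
SELECT CAST(MDE.ReturnMailDate AS DATE),
       COUNT(*),
       SUM(CASE WHEN SD.ReturnMailTypeID = 1 THEN 1 ELSE 0 END)
FROM MailDataExtract MDE
LEFT OUTER JOIN DTSharedData.dbo.ScanData SD
  ON SD.ScanDataID = MDE.ReturnScanDataID
WHERE MDE.ReturnMailDate IS NOT NULL
GROUP BY CAST(MDE.ReturnMailDate AS DATE);

-- Step 3: the single result set that the SSRS report consumes.
SELECT COALESCE(m.ReceiptDate, r.ReceiptDate) AS [DATE],
       COALESCE(m.TotalMailed, 0)             AS TOTALMAILED,
       COALESCE(r.TotalReturned, 0)           AS TOTALUNDELINOTICESRECEIVED,
       COALESCE(r.TraceReturned, 0)           AS TRACEUNDELNOTICESRECEIVED
FROM @Mailed m
FULL OUTER JOIN @Returned r
  ON r.ReceiptDate = m.ReceiptDate
ORDER BY 1;
```

Whether this performs better than a single statement depends on the data volume and indexes, so it is worth comparing the execution plans before committing to either form.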