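SQL - Query for complex dynamic row selection
I need to retrieve ListingId values from the table below based on search criteria. What is the best way to write this query?
Note on the conditions: a ListingId can have any number of ExtrafieldId rows, so the search for a ListingId is based on a dynamic set of ExtrafieldId conditions.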
If (ExtrafieldId = 1 and Value = 1) => OUTPUT - 20, 22
If (ExtrafieldId = 1 and Value = 1) and (ExtrafieldId = 2 and Value = 7) => OUTPUT - 21
If (ExtrafieldId = 4 and Value = 1999) => OUTPUT - 20, 21, 23
and so on...
ListingId  ExtraFieldId  Value
20         1             1
20         2             4
20         3
20         4             1990
21         1             2
21         2             7
21         3
21         4             1990
22         1             1
22         2             4
22         3
22         4             2000
23         1             NULL
23         2             NULL
23         4             1999
SELECT
    t1.ListingID
FROM
    TableX AS t1
  JOIN                                -- 2nd JOIN
    TableX AS t2
      ON t2.ListingID = t1.ListingID
  JOIN                                -- 3rd JOIN
    TableX AS t3
      ON t3.ListingID = t1.ListingID
WHERE
    (t1.ExtraFieldID, t1.Value) = (@ExtraFieldID_search1, @Value_search1)
  -- 2nd condition
  AND
    (t2.ExtraFieldID, t2.Value) = (@ExtraFieldID_search2, @Value_search2)
  -- 3rd condition
  AND
    (t3.ExtraFieldID, t3.Value) = (@ExtraFieldID_search3, @Value_search3)
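Each condition needs its own copy of the table: for 3 conditions, as above, the table is joined to itself so that it appears 3 times in total.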
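Because the number of conditions is dynamic, the self-join has to be generated rather than written by hand. The following is a minimal sketch of that idea, assuming Python with SQLite as a stand-in database; `build_self_join` and `find_listings` are hypothetical helper names, and the `?` placeholders replace the `@..._search` variables used above.

```python
import sqlite3

# Sample rows copied from the question: (ListingId, ExtraFieldId, Value).
ROWS = [
    (20, 1, "1"), (20, 2, "4"), (20, 3, ""), (20, 4, "1990"),
    (21, 1, "2"), (21, 2, "7"), (21, 3, ""), (21, 4, "1990"),
    (22, 1, "1"), (22, 2, "4"), (22, 3, ""), (22, 4, "2000"),
    (23, 1, None), (23, 2, None), (23, 4, "1999"),
]

def build_self_join(conditions):
    """Return (sql, params) with one alias of TableX per (field, value) pair."""
    if not conditions:
        raise ValueError("at least one condition is required")
    joins = ["TableX AS t1"]
    where, params = [], []
    for n, (field_id, value) in enumerate(conditions, start=1):
        if n > 1:
            # Every condition after the first adds one more self-join.
            joins.append(f"JOIN TableX AS t{n} ON t{n}.ListingID = t1.ListingID")
        where.append(f"(t{n}.ExtraFieldID = ? AND t{n}.Value = ?)")
        params += [field_id, value]
    sql = ("SELECT t1.ListingID FROM " + " ".join(joins)
           + " WHERE " + " AND ".join(where)
           + " ORDER BY t1.ListingID")
    return sql, params

def find_listings(conn, conditions):
    sql, params = build_self_join(conditions)
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableX (ListingID INTEGER, ExtraFieldID INTEGER, Value TEXT)")
conn.executemany("INSERT INTO TableX VALUES (?, ?, ?)", ROWS)

print(find_listings(conn, [(1, "1")]))            # [20, 22]
print(find_listings(conn, [(1, "2"), (2, "7")]))  # [21]
```

Only the alias numbers are interpolated into the SQL text; the search values travel as bound parameters, so user input never ends up in the statement itself.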
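Use HAVING instead of the self-join. No joins are needed and only one table scan is required, so it is more efficient. It also means that each additional condition only adds one expression to the HAVING clause rather than an extra join.
For example, your second example: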
SELECT  ListingID
FROM    [YourTable]
GROUP BY ListingID
HAVING  COUNT(CASE WHEN ExtrafieldId = 1 AND Value = 1 THEN 1 END) > 0
    AND COUNT(CASE WHEN ExtrafieldId = 2 AND Value = 7 THEN 1 END) > 0
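With this form, a dynamic list of conditions only changes the number of COUNT(CASE ...) tests. Here is a rough sketch of generating the query, assuming Python with SQLite as a stand-in database; `build_having_query` is a hypothetical helper name and the table is populated with the sample rows from the question.

```python
import sqlite3

# Sample rows copied from the question: (ListingId, ExtraFieldId, Value).
ROWS = [
    (20, 1, "1"), (20, 2, "4"), (20, 3, ""), (20, 4, "1990"),
    (21, 1, "2"), (21, 2, "7"), (21, 3, ""), (21, 4, "1990"),
    (22, 1, "1"), (22, 2, "4"), (22, 3, ""), (22, 4, "2000"),
    (23, 1, None), (23, 2, None), (23, 4, "1999"),
]

def build_having_query(conditions):
    """Return (sql, params) with one COUNT(CASE ...) test per (field, value) pair."""
    if not conditions:
        raise ValueError("at least one condition is required")
    test = "COUNT(CASE WHEN ExtraFieldId = ? AND Value = ? THEN 1 END) > 0"
    sql = ("SELECT ListingId FROM YourTable GROUP BY ListingId HAVING "
           + " AND ".join([test] * len(conditions))
           + " ORDER BY ListingId")
    # Flatten [(field, value), ...] into [field, value, field, value, ...].
    params = [item for condition in conditions for item in condition]
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ListingId INTEGER, ExtraFieldId INTEGER, Value TEXT)")
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", ROWS)

sql, params = build_having_query([(1, "2"), (2, "7")])
print([row[0] for row in conn.execute(sql, params)])  # [21]
```

Rows whose Value is NULL never satisfy a test, because the CASE expression yields NULL for them and COUNT ignores NULLs.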
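Addendum
The efficiency claim above is completely wrong. I still think the HAVING version is slightly easier on the eye, but the self-join below is more efficient.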
SELECT  t1.ListingID
FROM    Listing AS t1
        INNER JOIN Listing AS t2
            ON t2.ListingID = t1.ListingID
        INNER JOIN Listing AS t3
            ON t3.ListingID = t1.ListingID
        INNER JOIN Listing AS t4
            ON t4.ListingID = t1.ListingID
WHERE   (t1.ExtraFieldID = 1 AND t1.Value = 1)
AND     (t2.ExtraFieldID = 2 AND t2.Value = 7)
AND     (t3.ExtraFieldID = 3 AND t3.Value = '')
AND     (t4.ExtraFieldID = 4 AND t4.Value = 1999)
To prove this, I ran the following test script:
DECLARE @Iterations INT, @Listings INT
/*******************************************************************************************************
SET THE PARAMETERS FOR THE TEST HERE, @Listings IS THE NUMBER OF ListingIDs TO INSERT INTO THE SAMPLE
TABLE. EACH LISTING GETS 4 RECORDS SO 10,000 LISTINGS WILL GENERATE A SAMPLE OF 40,000 RECORDS ETC.
@Iterations IS THE NUMBER OF SELECTS TO PERFORM TO TEST THE PERFORMANCE OF EACH METHOD.
*******************************************************************************************************/
SET @Iterations = 500
SET @Listings = 1000000
/*******************************************************************************************************/
/*******************************************************************************************************/
-- DROP ANY TEMP TABLES LEFT OVER FROM A PREVIOUS RUN IN THIS SESSION
IF OBJECT_ID('tempdb..#Listing') IS NOT NULL
BEGIN
    DROP TABLE #Listing
END
CREATE TABLE #Listing (ListingID INT NOT NULL, ExtraFieldID TINYINT NOT NULL, Value VARCHAR(4), PRIMARY KEY (ListingID, ExtraFieldID))
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
BEGIN
    DROP TABLE #Results
END
CREATE TABLE #Results (GroupBy INT, SelfJoin INT)
DECLARE @i INT, @Time DATETIME, @Time2 DATETIME, @t INT
SET @i = ISNULL((SELECT MAX(ListingID) + 1 FROM #Listing), 0)
-- FILL LISTING TABLE WITH RANDOM VALUES
WHILE @i < @Listings
BEGIN
    INSERT #Listing VALUES (@i, 1, ROUND(RAND() * 4, 0))
    INSERT #Listing VALUES (@i, 2, ROUND(RAND() * 20, 0))
    INSERT #Listing VALUES (@i, 3, CASE WHEN ROUND(RAND(), 0) = 0 THEN '' ELSE CONVERT(VARCHAR(4), ROUND(RAND(), 3) * 1000) END)
    INSERT #Listing VALUES (@i, 4, DATEPART(YEAR, DATEADD(YEAR, (RAND()-1) * 100, GETDATE())))
    SET @i = @i + 1
END
CREATE NONCLUSTERED INDEX #IX_Listing_Value ON #Listing (Value) WITH (FILLFACTOR = 100)
SET @i = 0
-- PERFORM BOTH METHODS X NUMBER OF TIMES TO GET AN AVERAGE EXECUTION TIME
WHILE @i < @Iterations
BEGIN
    -- METHOD 1: GROUP BY / HAVING
    SET @Time = GETDATE()
    SELECT  @t = COUNT(*)
    FROM    (   SELECT  ListingID
                FROM    #Listing
                GROUP BY ListingID
                HAVING  COUNT(CASE WHEN ExtrafieldId = 1 AND Value = 1 THEN 1 END) > 0
                    AND COUNT(CASE WHEN ExtrafieldId = 2 AND Value = 7 THEN 1 END) > 0
                    AND COUNT(CASE WHEN ExtrafieldId = 3 AND Value = '' THEN 1 END) > 0
                    AND COUNT(CASE WHEN ExtrafieldId = 4 AND Value = 1999 THEN 1 END) > 0
            ) D
    -- METHOD 2: SELF JOIN
    SET @Time2 = GETDATE()
    SELECT  @t = COUNT(*)
    FROM    (   SELECT  t1.ListingID
                FROM    #Listing AS t1
                        JOIN #Listing AS t2
                            ON t2.ListingID = t1.ListingID
                        JOIN #Listing AS t3
                            ON t3.ListingID = t1.ListingID
                        JOIN #Listing AS t4
                            ON t4.ListingID = t1.ListingID
                WHERE   (t1.ExtraFieldID = 1 AND t1.Value = 1)
                AND     (t2.ExtraFieldID = 2 AND t2.Value = 7)
                AND     (t3.ExtraFieldID = 3 AND t3.Value = '')
                AND     (t4.ExtraFieldID = 4 AND t4.Value = 1999)
            ) D
    -- RECORD THE ELAPSED TIME OF EACH METHOD FOR THIS ITERATION
    INSERT INTO #Results
    SELECT  DATEDIFF(MICROSECOND, @Time, @Time2) [GroupBy],
            DATEDIFF(MICROSECOND, @Time2, GETDATE()) [SelfJoin]
    SET @i = @i + 1
END
-- #OverallResults IS KEPT BETWEEN RUNS SO THAT DIFFERENT SAMPLE SIZES CAN BE COMPARED
IF OBJECT_ID('tempdb..#OverallResults') IS NULL
BEGIN
    CREATE TABLE #OverallResults (GroupBy INT NOT NULL, SelfJoin INT NOT NULL, Iterations INT NOT NULL, Listings INT NOT NULL)
END
INSERT INTO #OverallResults
SELECT  AVG(GroupBy) [Group By],
        AVG(SelfJoin) [Self Join],
        COUNT(*) [Iterations],
        @Listings
FROM    #Results
-- RESULTS OF THIS RUN
SELECT  AVG(GroupBy) [Group By],
        AVG(SelfJoin) [Self Join],
        COUNT(*) [Iterations],
        CONVERT(DECIMAL(5, 4), (AVG(GroupBy) - AVG(SelfJoin))/1000000.0) [Difference (Seconds)],
        CONVERT(DECIMAL(4, 2), 100 * (1 - (1.0 * AVG(SelfJoin)/AVG(GroupBy)))) [Percent Faster]
FROM    #Results
DROP TABLE #Listing
DROP TABLE #results
-- RESULTS OF ALL RUNS SO FAR, WEIGHTED BY ITERATIONS AND GROUPED BY SAMPLE SIZE
SELECT  Records,
        Iterations,
        GroupBy [Group By],
        SelfJoin [Self Join],
        CONVERT(DECIMAL(5, 4), (GroupBy - SelfJoin)/1000000.0) [Difference (Seconds)],
        CONVERT(DECIMAL(4, 2), 100 * (1 - (1.0 * SelfJoin/GroupBy))) [Percent Faster]
FROM    (   SELECT  Listings * 4 [Records],
                    SUM(Iterations) [Iterations],
                    SUM(GroupBy * Iterations)/SUM(Iterations) [GroupBy],
                    SUM(SelfJoin * Iterations)/SUM(Iterations) [SelfJoin]
            FROM    #OverallResults
            GROUP BY Listings
        ) a
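This can be run repeatedly with different variables. I ran it for 100, 1,000, 10,000, 100,000 and 1,000,000 listings, with 500 SELECT statements each to get an average execution time. The self-join was on average about 60% faster, rising to 95% faster at 1,000,000 listings. The self-join approach is clearly the winner on performance.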
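A timing comparison is only meaningful if both queries return the same listings. The sketch below checks that on randomly generated data, assuming Python with SQLite as a stand-in database and only two of the four extra fields. It makes no attempt to reproduce the timings, which depend on the SQL Server optimizer and indexes.

```python
import random
import sqlite3

random.seed(0)  # fixed seed so the run is repeatable

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Listing (ListingID INTEGER NOT NULL, ExtraFieldID INTEGER NOT NULL, "
    "Value TEXT, PRIMARY KEY (ListingID, ExtraFieldID))"
)

# Two rows per listing, using the same value ranges as fields 1 and 2 above.
rows = []
for listing_id in range(5000):
    rows.append((listing_id, 1, str(random.randint(0, 4))))
    rows.append((listing_id, 2, str(random.randint(0, 20))))
conn.executemany("INSERT INTO Listing VALUES (?, ?, ?)", rows)

GROUP_BY_SQL = """
SELECT  ListingID
FROM    Listing
GROUP BY ListingID
HAVING  COUNT(CASE WHEN ExtraFieldID = 1 AND Value = '1' THEN 1 END) > 0
    AND COUNT(CASE WHEN ExtraFieldID = 2 AND Value = '7' THEN 1 END) > 0
"""

SELF_JOIN_SQL = """
SELECT  t1.ListingID
FROM    Listing AS t1
        JOIN Listing AS t2
            ON t2.ListingID = t1.ListingID
WHERE   (t1.ExtraFieldID = 1 AND t1.Value = '1')
AND     (t2.ExtraFieldID = 2 AND t2.Value = '7')
"""

group_by_ids = sorted(row[0] for row in conn.execute(GROUP_BY_SQL))
self_join_ids = sorted(row[0] for row in conn.execute(SELF_JOIN_SQL))
print(group_by_ids == self_join_ids)  # True
```

The primary key on (ListingID, ExtraFieldID) is what keeps the self-join from returning a listing more than once.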
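Gareth, how can I join the listingId of the MasterItem list [Listing] to the ListingId above without using an 'IN' clause, or in some other optimal way? – 2012-02-21 12:29:24
You can use 'SELECT * FROM ListingMasterTable a INNER JOIN ([MyAnswer]) b ON a.ListingID = b.ListingID'. The following would also work: 'SELECT * FROM ListingMasterTable WHERE ListingID IN ([MyAnswer])', but it may not be the most efficient way. There are many articles discussing the relative merits of IN and JOIN. http://stackoverflow.com/questions/2577174/join-vs-subquery – GarethD 2012-02-21 12:47:39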
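To make the two options in the comment above concrete, here is a small sketch assuming Python with SQLite as a stand-in database. The `Title` column and the sample rows are invented for illustration, and `MATCHING_IDS` stands in for whichever ListingId query plays the role of [MyAnswer].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ListingMasterTable (ListingID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE TableX (ListingID INTEGER, ExtraFieldID INTEGER, Value TEXT);
INSERT INTO ListingMasterTable VALUES (20, 'Listing A'), (21, 'Listing B'), (22, 'Listing C');
INSERT INTO TableX VALUES (20, 1, '1'), (20, 2, '4'),
                          (21, 1, '2'), (21, 2, '7'),
                          (22, 1, '1'), (22, 2, '4');
""")

# Placeholder for [MyAnswer]: any query that returns the matching ListingIDs.
MATCHING_IDS = "SELECT ListingID FROM TableX WHERE ExtraFieldID = 1 AND Value = '1'"

# Option 1: join the master table to the ID query as a derived table.
join_sql = f"""
SELECT  a.ListingID, a.Title
FROM    ListingMasterTable AS a
        INNER JOIN ({MATCHING_IDS}) AS b
            ON a.ListingID = b.ListingID
ORDER BY a.ListingID
"""

# Option 2: filter the master table with IN.
in_sql = f"""
SELECT  ListingID, Title
FROM    ListingMasterTable
WHERE   ListingID IN ({MATCHING_IDS})
ORDER BY ListingID
"""

joined = conn.execute(join_sql).fetchall()
filtered = conn.execute(in_sql).fetchall()
print(joined)    # [(20, 'Listing A'), (22, 'Listing C')]
print(filtered)  # [(20, 'Listing A'), (22, 'Listing C')]
```

The two forms agree here because the ID query returns each ListingID once. If it could return duplicates, the join would repeat master rows while IN would not.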
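This query will need a full table scan or some full index scan, plus a GROUP BY with counts. You may say it is more efficient, but in most cases the query with many JOINs and no GROUP BY will be more efficient than this, because it only needs a few index seeks (not over the whole index, only the relevant parts, which may be small compared with the whole index). – 2012-02-21 12:59:59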
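You can do this easily with UNION and DISTINCT. If you are using the ListingId as input to another query through an IN clause, you don't need to worry about duplicates; otherwise you can wrap the query like this:
SELECT DISTINCT ListingId FROM (
    SELECT
        ListingId
    ... -- the rest from below
) AS Data
Here is the query to get the listings (possibly with duplicates!):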
SELECT
    ListingID
FROM
    TABLE_NAME
WHERE
    ExtrafieldId = 1 AND Value = 1
UNION ALL
SELECT
    ListingID
FROM
    TABLE_NAME
WHERE
    ExtrafieldId = 1 AND Value = 1 AND ExtrafieldId = 2 AND Value = 7
UNION ALL
SELECT
    ListingID
FROM
    TABLE_NAME
WHERE
    ExtrafieldId = 4 AND Value = 1999
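The middle SELECT is pointless. 'WHERE ExtrafieldId = 1 AND Value = 1 AND ExtrafieldId = 2 AND Value = 7' will never return any results, because if ExtrafieldId = 1 then it cannot also equal 2, so the condition can never be satisfied. You are also using 'UNION' to separate independent WHERE clauses; 'OR' and parentheses would do the same job with better performance. – GarethD 2012-02-21 12:03:55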
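Both points in this comment can be checked on a few rows from the question. This is a minimal sketch assuming Python with SQLite as a stand-in database; the table name TableX is borrowed from the first answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableX (ListingID INTEGER, ExtraFieldID INTEGER, Value TEXT);
INSERT INTO TableX VALUES
    (20, 1, '1'), (20, 4, '1990'),
    (21, 1, '2'), (21, 2, '7'), (21, 4, '1990'),
    (22, 1, '1'), (22, 4, '2000'),
    (23, 1, NULL), (23, 4, '1999');
""")

# A single row cannot have ExtraFieldID = 1 and ExtraFieldID = 2 at once,
# so this WHERE clause can never match anything.
IMPOSSIBLE_SQL = """
SELECT ListingID FROM TableX
WHERE ExtraFieldID = 1 AND Value = '1' AND ExtraFieldID = 2 AND Value = '7'
"""

# Independent conditions combined with OR and parentheses in one pass over
# the table, instead of one SELECT per condition glued together with UNION ALL.
OR_SQL = """
SELECT DISTINCT ListingID FROM TableX
WHERE  (ExtraFieldID = 1 AND Value = '1')
    OR (ExtraFieldID = 4 AND Value = '1999')
ORDER BY ListingID
"""

impossible_ids = [row[0] for row in conn.execute(IMPOSSIBLE_SQL)]
or_ids = [row[0] for row in conn.execute(OR_SQL)]
print(impossible_ids)  # []
print(or_ids)          # [20, 22, 23]
```

Note that OR returns listings matching any one of the conditions. Requiring all conditions on the same listing still needs the self-join or HAVING approach from the other answers.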
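Hi, thanks for your answer. Could you give the same query for 3 or 4 tables? I am a beginner and can't write the query that joins a third table. At most I have 4 or 5 tables to join like this. – 2012-02-21 11:39:43
+1 As I commented in my answer, this is a more efficient solution than the one I posted. – GarethD 2012-02-21 14:02:52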