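How to identify and delete duplicate rows, keeping only the most recent
I'm working in HeidiSQL and trying to figure out how to delete all duplicate rows except the most recent one. There are slight differences between the "duplicates", but a row counts as a duplicate whenever four specific values match (namely UserID, ContactID, SMSID, and EventID). Within each such group I need to keep the most recent row (identified by CreatedDate) and delete the rest.
The query below identifies these rows: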
SELECT a.UserID, a.ContactID, a.SMSID, a.EventID, CreatedDate
FROM WhenToText a
JOIN (SELECT UserID, ContactID, SMSID, EventID
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID
HAVING COUNT(*) > 1) b
ON a.UserID = b.UserID
AND a.ContactID = b.ContactID
AND a.SMSID = b.SMSID
AND a.EventID = b.EventID
ORDER BY UserID, ContactID, SMSID, EventID, CreatedDate DESC
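However, I don't know how to delete these duplicates once I have identified them.
Here is some sample data (posted as a screenshot):
Here is one way to do it: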
DELETE w1 FROM WhenToText w1  -- a multi-table DELETE must name the alias to delete from
INNER JOIN
(
    -- latest CreatedDate for each (UserID, ContactID, SMSID, EventID) group
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS MaxDate
    FROM WhenToText
    GROUP BY UserID, ContactID, SMSID, EventID
) w2
    ON w1.UserID = w2.UserID AND w1.ContactID = w2.ContactID AND w1.SMSID = w2.SMSID
    AND w1.EventID = w2.EventID
    AND w1.CreatedDate != w2.MaxDate
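This deletes every record whose CreatedDate is not the latest within its (UserID, ContactID, SMSID, EventID) group. Keep in mind that if several records in a group share the latest CreatedDate, this can leave more than one record for that group.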
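The tie case can be checked with a small query. This is a sketch rather than part of the original answer, and the alias TiedRows is only an illustrative name. Run after the delete, any row it returns is a group that still holds more than one record because the records share the same CreatedDate.

```sql
-- Lists groups in which the same CreatedDate occurs more than once.
-- TiedRows is an illustrative alias, not a column of the table.
SELECT UserID, ContactID, SMSID, EventID, CreatedDate, COUNT(*) AS TiedRows
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID, CreatedDate
HAVING COUNT(*) > 1;
```

An empty result means every group is down to a single row.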
If you want to test the query first and see which records would be targeted for deletion, replace the line that begins with DELETE with SELECT w1.* FROM WhenToText w1.
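Written out in full, the preview might look like the following. It is a sketch of the substitution described above, with an ORDER BY added only to make the output easier to read.

```sql
-- Preview only: returns the rows the DELETE would remove, without changing data.
SELECT w1.*
FROM WhenToText w1
INNER JOIN
(
    -- latest CreatedDate for each duplicate group
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS MaxDate
    FROM WhenToText
    GROUP BY UserID, ContactID, SMSID, EventID
) w2
    ON  w1.UserID = w2.UserID
    AND w1.ContactID = w2.ContactID
    AND w1.SMSID = w2.SMSID
    AND w1.EventID = w2.EventID
    AND w1.CreatedDate != w2.MaxDate
ORDER BY w1.UserID, w1.ContactID, w1.SMSID, w1.EventID, w1.CreatedDate DESC;
```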
Here is a link to a SQL Fiddle that demonstrates how the query identifies the records to delete:
SQLFiddle
Thank you very much, this is exactly what I was looking for. – David
Here is a complete demo solution using DELETE ... FROM ... JOIN with your data.
SQL:
-- Data preparation
create table WhenToText(UserID int, ContactID int, SMSID int, EventID int, CreatedDate datetime);
insert into WhenToText values
(4, 25, 7934, 7407, '2016-02-10 00:00:11'),
(4, 25, 7934, 7407, '2016-02-09 00:00:12'),
(4, 29, 5132, 7407, '2016-02-10 00:00:11'),
(4, 29, 5132, 7407, '2016-02-09 00:00:12'),
(4, 31, 12944, 7405, '2016-02-10 07:03:02'),
(4, 31, 12944, 7405, '2016-02-10 05:03:02'),
(4, 146, 12908, 7405, '2016-02-10 06:52:02'),
(4, 146, 12908, 7405, '2016-02-10 04:52:02'),
(15, 63, 12964, 7401, '2016-02-10 03:42:04'),
(15, 63, 12964, 7401, '2016-02-10 03:41:04'),
(15, 64, 12326, 7401, '2016-02-07 03:01:03'),
(15, 64, 12326, 7401, '2016-02-07 03:00:03');
SELECT * FROM WhenToText;
-- SQL needed
DELETE a FROM
WhenToText a INNER JOIN
(
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) CreatedDate
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID
) b
USING(UserID, ContactID, SMSID, EventID)
WHERE
a.CreatedDate != b.CreatedDate;
SELECT * FROM WhenToText;
Output:
mysql> SELECT * FROM WhenToText;
+--------+-----------+-------+---------+---------------------+
| UserID | ContactID | SMSID | EventID | CreatedDate         |
+--------+-----------+-------+---------+---------------------+
|      4 |        25 |  7934 |    7407 | 2016-02-10 00:00:11 |
|      4 |        25 |  7934 |    7407 | 2016-02-09 00:00:12 |
|      4 |        29 |  5132 |    7407 | 2016-02-10 00:00:11 |
|      4 |        29 |  5132 |    7407 | 2016-02-09 00:00:12 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 07:03:02 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 05:03:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 06:52:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 04:52:02 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:42:04 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:41:04 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:01:03 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:00:03 |
+--------+-----------+-------+---------+---------------------+
12 rows in set (0.00 sec)
mysql>
mysql> -- SQL needed
mysql> DELETE a FROM
-> WhenToText a INNER JOIN
-> (
-> SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) CreatedDate
-> FROM WhenToText
-> GROUP BY UserID, ContactID, SMSID, EventID
-> ) b
-> USING(UserID, ContactID, SMSID, EventID)
-> WHERE
-> a.CreatedDate != b.CreatedDate;
Query OK, 6 rows affected (0.00 sec)
mysql>
mysql> SELECT * FROM WhenToText;
+--------+-----------+-------+---------+---------------------+
| UserID | ContactID | SMSID | EventID | CreatedDate         |
+--------+-----------+-------+---------+---------------------+
|      4 |        25 |  7934 |    7407 | 2016-02-10 00:00:11 |
|      4 |        29 |  5132 |    7407 | 2016-02-10 00:00:11 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 07:03:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 06:52:02 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:42:04 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:01:03 |
+--------+-----------+-------+---------+---------------------+
6 rows in set (0.00 sec)
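This should give you the solution you are looking for, given that CreatedDate is a date data type. It also rests on the assumption that the most recent row is, technically, the one with the latest CreatedDate.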
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
FROM WhenToText
GROUP BY 1, 2, 3, 4;
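The positional form GROUP BY 1, 2, 3, 4 is accepted by MySQL but not by every database engine, and it silently changes meaning if the SELECT list is reordered. As a sketch of an equivalent query, the grouping columns can be spelled out by name:

```sql
-- Same result as the positional version, with the grouping columns named explicitly.
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID;
```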
With these values you can simply overwrite the WhenToText table. It would look something like this:
CREATE TABLE tmp_table LIKE WhenToText;
INSERT INTO tmp_table (SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
FROM WhenToText
GROUP BY 1, 2, 3, 4);
TRUNCATE WhenToText;
INSERT INTO WhenToText (SELECT * FROM tmp_table);
DROP TABLE tmp_table;
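In MySQL, TRUNCATE cannot be rolled back, so between the TRUNCATE and the second INSERT the only copy of the data is in tmp_table. A possible variation, sketched here under the assumption that the table has exactly the five columns shown in this thread, is to swap the tables with RENAME TABLE and keep the original until the result has been checked. The names WhenToText_dedup and WhenToText_old are hypothetical.

```sql
-- Build the deduplicated copy; WhenToText_dedup is a hypothetical name.
CREATE TABLE WhenToText_dedup LIKE WhenToText;

INSERT INTO WhenToText_dedup (UserID, ContactID, SMSID, EventID, CreatedDate)
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate)
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID;

-- Swap both names in a single statement; the original rows stay in WhenToText_old.
RENAME TABLE WhenToText TO WhenToText_old,
             WhenToText_dedup TO WhenToText;

-- After verifying the new contents, the old table can be removed:
-- DROP TABLE WhenToText_old;
```

Like the answer's version, this keeps only the five grouped and aggregated columns, so it would lose data if the real table had additional columns.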
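I just tried this, and it keeps saying that I cannot "group on CreatedDate". – David
Also, I tried a few tweaks, such as replacing GROUP BY with ORDER BY, and the query returned only a single row (i.e. the MAX(CreatedDate)). – David
First question: @David, I'm not sure how you are getting a GROUP BY error, because 1, 2, 3, 4 only refer to the first four items in the SELECT list (UserID, ContactID, SMSID, EventID). Second question: you need the GROUP BY clause for the aggregate MAX function to work correctly; using an aggregate with an ORDER BY clause instead of GROUP BY returns only one row. – TomDobbs
When you say "based on the most recent row", do you mean that, in the case of duplicates, you want to keep the most recent record? Can you show us some sample data? –
Yes, that's correct. I only want to keep the most recent of the duplicates, so I'm interested in the latest CreatedDate. I added a screenshot of some sample data to the original post. Thanks again. – David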