问题描述:有一张表因为没有加重复判断,导致会有些重复数据存在,现在需要根据重复规则去掉重复的记录,只保留第一条。解决方案:方案1:采用rowid的方式
DELETE FROM tmp101 A WHERE ROWID NOT IN (SELECT MIN(ROWID) FROM tmp101 b WHERE A.col1 = b.col1 AND A.col2 = b.col2 AND A.col3 = b.col3 AND A.col4 = b.col4);
如果表中没有外键约束,上述方式可以很好的解决。如果有外键的话可能会遇到如下问题。
这部分记录已经在别的表里面有引用了,如果要删除,需要先删除子表中的数据。
-- 删除子表 delete from tmp1011 where col in( select col FROM tmp101 A WHERE ROWID NOT IN (SELECT MIN(ROWID) FROM tmp101 b WHERE A.col1 = b.col1 AND A.col2 = b.col2 AND A.col3 = b.col3 AND A.col4 = b.col4)); -- 删除主表 DELETE FROM tmp101 A WHERE ROWID NOT IN (SELECT MIN(ROWID) FROM tmp101 b WHERE A.col1 = b.col1 AND A.col2 = b.col2 AND A.col3 = b.col3 AND A.col4 = b.col4);
方案2:采用row_number()over(partition by order by)
-- 删除子表 DELETE FROM tmp1011 WHERE col IN ( SELECT col FROM ( SELECT col,col1,col2,col3,col4 row_number()OVER(PARTITION BY col1,col2,col3,col4 ORDER BY col) rn FROM tmp101) WHERE rn <> 1 ); -- 删除主表 DELETE FROM tmp101 WHERE col IN ( SELECT col FROM ( SELECT col,col1,col2,col3,col4 row_number()OVER(PARTITION BY col1,col2,col3,col4 ORDER BY col) rn FROM tmp101) WHERE rn <> 1 );
如果数据量大导致的删除困难,记得加下循环。
-- 按照日期进行循环 DECLARE bdate DATE; edate DATE; BEGIN edate := TRUNC(SYSDATE); SELECT NVL(TRUNC(MIN(col5)),TRUNC(SYSDATE)) INTO bdate FROM tmp101; WHILE bdate < edate LOOP DELETE FROM tmp1011 WHERE col IN ( SELECT col FROM ( SELECT col,col1,col2,col3,col4 row_number()OVER(PARTITION BY col1,col2,col3,col4 ORDER BY col) rn FROM tmp101 WHERE col5 >= bdate AND col5 < bdate + 1) WHERE rn <> 1 ); COMMIT; bdate := bdate + 1; END LOOP; END; /
