So, first of all, we will create a table and insert some records in it. Then we will write a query to delete the duplicate records from the table. In this article, we will show an example of deleting duplicate records from a table using three different techniques as using GROUP BY and HAVING, an analytical function such as row_number, and the third one by using the correlated subquery. The reader can decide on which way he is comfortable with doing the job.
They key to delete duplicate records lies in the following 3 steps:
- Decide on your Primary Key(How will you consider a record duplicate, which column(s) data if repeated will be considered as duplicate)
- Find the duplicate records
- Delete them using rowid
So let’s begin.
STEP 1: Create a table
Lets create a dummy table first:
-- Create table CREATE TABLE COUNTRY_MAS ( COUNTRY_ID NUMBER, COUNTRY_NAME VARCHAR2(50) );
STEP 2 Insert some records in the table
-- insert some records with duplicates in the table INSERT ALL INTO COUNTRY_MAS VALUES(100,'INDIA') INTO COUNTRY_MAS VALUES(101,'UK') INTO COUNTRY_MAS VALUES(102,'USA') INTO COUNTRY_MAS VALUES(103,'KSA') INTO COUNTRY_MAS VALUES(100,'INDIA') INTO COUNTRY_MAS VALUES(104,'UAE') INTO COUNTRY_MAS VALUES(101,'UK') INTO COUNTRY_MAS VALUES(105,'CANADA') INTO COUNTRY_MAS VALUES(106,'GERMANY') SELECT * FROM DUAL; COMMIT;
Here I will consider a record as duplicate if the country_name column data is being repeated. Now, as you can see you have two records with COUNTRY_NAME as ‘INDIA’ and ‘UK’.
1. DELETE DUPLICATE RECORD USING GROUP BY AND HAVING CLAUSE
So we would write a query to find the duplicate with GROUP BY and HAVING clause.
SELECT COUNTRY_NAME, COUNT(1) FROM COUNTRY_MAS GROUP BY COUNTRY_NAME; --- OUTPUT COUNTRY_NAME COUNT(1) ----------------------------------- GERMANY 1 USA 1 CANADA 1 INDIA 2 UAE 1 UK 2 KSA 1
Now, we know that for COUNTRY_NAME such as ‘UK’ and ‘INDIA’ the count(1) is 2, meaning they appear twice in the table. So now we can get all the records with count(1) > 1 and put them into an IN clause and delete them as below:
DELETE FROM COUNTRY_NAME WHERE COUNTRY_NAME IN ('UK','INDIA');
If you want to take the game to the next level, instead of manually writing the COUNTRY_NAME in the IN clause, we can target the ROWID of all the duplicate records and delete them, What we will do is we will take the MIN(ROWID) for all the unique COUNTRY_NAME records.
SELECT MIN(ROWID) FROM COUNTRY_MAS GROUP BY COUNTRY_NAME; -- output ROWID ------------ AAAU3NAAEAAAAndAAI AAAU3NAAEAAAAndAAC AAAU3NAAEAAAAndAAH AAAU3NAAEAAAAndAAA AAAU3NAAEAAAAndAAF AAAU3NAAEAAAAndAAB AAAU3NAAEAAAAndAAD
As you can see we got 7 rowid, one for each record (there are 9 records in the table). Now we can delete all the extra records from the table as follows:
DELETE FROM COUNTRY_MAS WHERE ROWID NOT IN ( SELECT MIN(ROWID) FROM COUNTRY_MAS GROUP BY COUNTRY_NAME); -- 2 records deleted
2. DELETE DUPLICATE RECORD USING CORRELATED SUBQUERY
The user can find the duplicate even by using the correlated subquery such as below:
SELECT COUNTRY_NAME FROM COUNTRY_MAS C2 WHERE ROWID IN ( SELECT MIN(ROWID) FROM COUNTRY_MAS C1 WHERE C1.COUNTRY_NAME = C2.COUNTRY_NAME);
Now delete the duplicate:
DELETE FROM COUNTRY_MAS WHERE ROWID NOT IN ( SELECT ROWID FROM COUNTRY_MAS CM WHERE ROWID IN ( SELECT MIN(ROWID) FROM COUNTRY_MAS C WHERE CM.COUNTRY_ID = C.COUNTRY_ID AND CM.COUNTRY_NAME = C.COUNTRY_NAME));
3. DELETE DUPLICATE USING ANALYTICAL FUNCTION:
The user can find the duplicate even by using the analytical function such as below:
SELECT ROWID FROM ( SELECT ROWID, ROW_NUMBER() OVER(PARTITION BY COUNTRY_ID, COUNTRY_NAME ORDER BY COUNTRY_ID) RANK FROM COUNTRY_MAS) WHERE RANK=2;
Now delete the duplicate:
DELETE FROM COUNTRY_MAS WHERE ROWID IN ( SELECT ROWID FROM ( SELECT ROWID, ROW_NUMBER() OVER(PARTITION BY COUNTRY_ID, COUNTRY_NAME ORDER BY COUNTRY_ID) RANK FROM COUNTRY_MAS) WHERE RANK=2);