• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar
  • Skip to footer navigation

The Geek Diary

  • OS
    • Linux
    • CentOS/RHEL
    • Solaris
    • Oracle Linux
    • VCS
  • Interview Questions
  • Database
    • oracle
    • oracle 12c
    • ASM
    • mysql
    • MariaDB
  • DevOps
    • Docker
    • Shell Scripting
  • Big Data
    • Hadoop
    • Cloudera
    • Hortonworks HDP

Deleting duplicate records from a table

by admin

So, first of all, we will create a table and insert some records in it. Then we will write a query to delete the duplicate records from the table. In this article, we will show an example of deleting duplicate records from a table using three different techniques as using GROUP BY and HAVING, an analytical function such as row_number, and the third one by using the correlated subquery. The reader can decide on which way he is comfortable with doing the job.

They key to delete duplicate records lies in the following 3 steps:

  • Decide on your Primary Key(How will you consider a record duplicate, which column(s) data if repeated will be considered as duplicate)
  • Find the duplicate records
  • Delete them using rowid

So let’s begin.

STEP 1: Create a table

Lets create a dummy table first:

-- Create  table

CREATE TABLE COUNTRY_MAS
( 
COUNTRY_ID NUMBER,
COUNTRY_NAME VARCHAR2(50)
);

STEP 2 Insert some records in the table

--  insert some records with duplicates in the table
INSERT ALL
INTO COUNTRY_MAS VALUES(100,'INDIA')
INTO COUNTRY_MAS VALUES(101,'UK')
INTO COUNTRY_MAS VALUES(102,'USA')
INTO COUNTRY_MAS VALUES(103,'KSA')
INTO COUNTRY_MAS VALUES(100,'INDIA')
INTO COUNTRY_MAS VALUES(104,'UAE')
INTO COUNTRY_MAS VALUES(101,'UK')
INTO COUNTRY_MAS VALUES(105,'CANADA')
INTO COUNTRY_MAS VALUES(106,'GERMANY')
SELECT * FROM DUAL;

COMMIT;

Here I will consider a record as duplicate if the country_name column data is being repeated. Now, as you can see you have two records with COUNTRY_NAME as ‘INDIA’ and ‘UK’.

1. DELETE DUPLICATE RECORD USING GROUP BY AND HAVING CLAUSE

So we would write a query to find the duplicate with GROUP BY and HAVING clause.

SELECT COUNTRY_NAME, COUNT(1) FROM COUNTRY_MAS GROUP BY COUNTRY_NAME;

--- OUTPUT

COUNTRY_NAME COUNT(1) 
-----------------------------------
GERMANY              1
USA                  1
CANADA               1
INDIA                2
UAE                  1
UK                   2
KSA                  1

Now, we know that for COUNTRY_NAME such as ‘UK’ and ‘INDIA’ the count(1) is 2, meaning they appear twice in the table. So now we can get all the records with count(1) > 1 and put them into an IN clause and delete them as below:

DELETE FROM COUNTRY_NAME WHERE COUNTRY_NAME IN ('UK','INDIA');

If you want to take the game to the next level, instead of manually writing the COUNTRY_NAME in the IN clause, we can target the ROWID of all the duplicate records and delete them, What we will do is we will take the MIN(ROWID) for all the unique COUNTRY_NAME records.

SELECT MIN(ROWID)
FROM COUNTRY_MAS
GROUP BY COUNTRY_NAME;

-- output

ROWID
------------
AAAU3NAAEAAAAndAAI
AAAU3NAAEAAAAndAAC
AAAU3NAAEAAAAndAAH
AAAU3NAAEAAAAndAAA
AAAU3NAAEAAAAndAAF
AAAU3NAAEAAAAndAAB
AAAU3NAAEAAAAndAAD

As you can see we got 7 rowid, one for each record (there are 9 records in the table). Now we can delete all the extra records from the table as follows:

DELETE FROM COUNTRY_MAS WHERE ROWID NOT IN (
SELECT MIN(ROWID)
FROM COUNTRY_MAS
GROUP BY COUNTRY_NAME);

-- 2 records deleted

2. DELETE DUPLICATE RECORD USING CORRELATED SUBQUERY

The user can find the duplicate even by using the correlated subquery such as below:

SELECT COUNTRY_NAME FROM COUNTRY_MAS C2 WHERE ROWID IN ( 
SELECT MIN(ROWID)
FROM COUNTRY_MAS C1
WHERE C1.COUNTRY_NAME = C2.COUNTRY_NAME);

Now delete the duplicate:

DELETE FROM COUNTRY_MAS WHERE ROWID NOT IN (
SELECT ROWID FROM COUNTRY_MAS CM WHERE ROWID IN (
SELECT MIN(ROWID)
FROM COUNTRY_MAS C
WHERE CM.COUNTRY_ID = C.COUNTRY_ID
AND CM.COUNTRY_NAME = C.COUNTRY_NAME));

3. DELETE DUPLICATE USING ANALYTICAL FUNCTION:

The user can find the duplicate even by using the analytical function such as below:

SELECT ROWID FROM (
SELECT ROWID, ROW_NUMBER() OVER(PARTITION BY COUNTRY_ID, COUNTRY_NAME ORDER BY COUNTRY_ID) RANK
FROM COUNTRY_MAS)
WHERE RANK=2;

Now delete the duplicate:

DELETE FROM COUNTRY_MAS WHERE ROWID IN (
SELECT ROWID FROM (
SELECT ROWID, ROW_NUMBER() OVER(PARTITION BY COUNTRY_ID, COUNTRY_NAME ORDER BY COUNTRY_ID) RANK
FROM COUNTRY_MAS)
WHERE RANK=2);

Filed Under: oracle

Some more articles you might also be interested in …

  1. Oracle RMAN Backup Shell Script Example
  2. How to Perform Manual Archiving in Oracle Database
  3. Oracle – How To Check if Autoextensible Datafiles Set To Maxsize Unlimited
  4. How to Use Udev Rules to Create oracleasm Disks in CentOS/RHEL 8
  5. Adding Users to Oracle Passwordfile (Oracle Passwordfile Authentication)
  6. What are different Oracle Database Vault Roles
  7. Difference between SQL and SQL*Plus Statements
  8. How to Enable or Disable Veritas ODM for Oracle database 11g
  9. How to Check if a table is Indexed in Oracle
  10. Establish Connection in CDB and PDB

You May Also Like

Primary Sidebar

Recent Posts

  • protonvpn-cli Command Examples in Linux
  • protonvpn-cli connect Command Examples
  • procs Command Examples in Linux
  • prlimit: command not found

© 2023 · The Geek Diary

  • Archives
  • Contact Us
  • Copyright