Hi,
I have a situation where I have duplicate records in my table. A record counts as a duplicate based on a combination of a few natural/business keys.
I know how to identify whether there are duplicates, using the query below:
SELECT COL1,COL2,COL3,COUNT(*)
FROM TABLE
GROUP BY COL1,COL2,COL3
HAVING COUNT(*)>1
Let's assume that this returns 2.6 million records.
My requirement is to retrieve the complete records from the table that are duplicated and NOT JUST the duplicate keys.
I coded the SQL below:
SELECT A.COL1,A.COL2,A.COL3,A.COL4,A.COL5
FROM
(
SELECT COL1,COL2,COL3,COL4,COL5
FROM TABLE
) A
INNER JOIN
(
SELECT DISTINCT COL1,COL2,COL3
FROM TABLE
GROUP BY COL1,COL2,COL3
HAVING COUNT(*)>1
) B
ON
A.COL1=B.COL1 AND
A.COL2=B.COL2 AND
A.COL3=B.COL3
This SQL returns only 2.9 million records. If 2.6 million key combinations each have more than one record, then joining them back to the table should return at least 2.6 million * 2 = 5.2 million records (assuming each duplicated key appears exactly twice), or more. It cannot be fewer.
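
To make the expectation above concrete, here is a minimal sketch on a tiny made-up table. It uses SQLite through Python's sqlite3 module purely as a stand-in (the actual database may behave differently), and the table name t and all sample rows are hypothetical. It compares the row count implied by the GROUP BY query with the row count the join returns. The sample data deliberately includes a key containing NULL, because GROUP BY puts NULLs into one group while an equality join never matches NULL to NULL. That is only one possible way the join could come back short, not a confirmed diagnosis of the numbers above.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1, col2, col3, col4, col5)")
rows = [
    (1, "a", "x", 10, 100),   # key (1, a, x) appears twice
    (1, "a", "x", 11, 101),
    (2, "b", "y", 20, 200),   # key (2, b, y) appears three times
    (2, "b", "y", 21, 201),
    (2, "b", "y", 22, 202),
    (3, "c", "z", 30, 300),   # unique key, not a duplicate
    (4, None, "w", 40, 400),  # key containing NULL appears twice
    (4, None, "w", 41, 401),
]
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)

# Step 1: duplicated keys and how many rows each one has.
dup_keys = conn.execute(
    """
    SELECT col1, col2, col3, COUNT(*)
    FROM t
    GROUP BY col1, col2, col3
    HAVING COUNT(*) > 1
    """
).fetchall()

# Rows the join "should" return if every duplicated key matched itself.
expected_rows = sum(cnt for *_key, cnt in dup_keys)

# Step 2: same shape as the join in the post.
joined = conn.execute(
    """
    SELECT a.col1, a.col2, a.col3, a.col4, a.col5
    FROM t a
    INNER JOIN (
        SELECT col1, col2, col3
        FROM t
        GROUP BY col1, col2, col3
        HAVING COUNT(*) > 1
    ) b
    ON a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 = b.col3
    """
).fetchall()

print("duplicated keys:", len(dup_keys))
print("rows expected from the keys:", expected_rows)
print("rows returned by the join:", len(joined))
```

In this sketch the join returns fewer rows than the GROUP BY counts imply, and the missing rows are exactly the ones whose key contains a NULL. Without NULLs in the key columns, the two numbers agree.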
I understand that I am not getting the desired results. Could you please tell me whether the above approach is correct? If it is, could you please share your thoughts on why I am not getting the expected output?