Forum Moderators: open
Anyways, I have this table with two int columns.
The data looks like this:
1 3
1 4
2 6
2 7
3 1
3 4
3 6
3 8
...
14225 838827
14225 838828
14225 838892
...
it goes on like that... for over 12 million rows.
Anyways, I don't need duplicates in there. And there are plenty.
Not only are there multiple identical rows like this,
1 2
1 2
1 2
but there are "flipped" rows, which (for my purposes) are identical, like:
1 3
1 3
3 1
3 1
what's the easiest way to dedupe this table?
delete from table where column1 = 10 and column2 = 25 otherwise it will delete the dupe and the original.
I think you might have to select out dupes using GROUP BY, write down which ones have dupes, then manually delete them.
OR
Create a third column as a rowId and then write a small program to loop through the table and remove the duplicates via their rowId.
Personally, I'd choose the second option and then just removed the rowId column and setup the primary key correctly.
write down which ones have dupes, then manually delete them
I figure to do that with 12 million records, if I could do one every second, working 24h a day without sleep or interruption, I'd be finished in just a little over 9 years. I'd probably also go through a few thousand books of paper and ball-point pens
piatkow's solution seems pretty solid... I'll try it. I wonder how long it'll take to run that one!
isn't there a query I can do using DISTINCT and HAVING? hmm
It perplexes me why isn't this built into the SQL language