MySQL offers a virtually endless variety of options for dealing with duplicated data. Most can be updated or removed using one statement. However, there are times when multiple commands must be issued to get the job done. Today’s blog will present a solution that employs a temporary table along with a SELECT DISTINCT query.
Permanent vs. Temporary Tables in MySQL
It should be noted that the temporary table that we will be creating here today differs from a true temporary table in MySQL, in that we are not adding the TEMPORARY keyword to the CREATE TABLE statement.
In MySQL, a temporary table is a special type of table that allows you to store a temporary result set, which you can reuse several times in a single session. A temporary table comes in handy when it’s impossible or expensive to query data using a single SELECT statement. Like a temporary table created using the TEMPORARY keyword, our “temporary” table will store the immediate result of a SELECT query, so that we can issue one or more additional queries to fully process the data. We will then replace the target table with the temp table.
Removing Duplicate Rows from the amalgamated_actors Table
In the How to Delete Duplicate Rows with Different IDs in MySQL (Part 3) blog, we successfully removed rows that contained duplicate names. However, that still left rows whose IDs and names were the same, in other words, where entire rows were duplicated. For instance, we can see in the result set below that “22 JENNIFER DAVIS” appears twice:
id | first_name | last_name | ||
--------------------------------------------------- | ||||
10 | PENELOPE | GUINESS | ||
14 | ED | CHASE | ||
22 | JENNIFER | DAVIS | ||
23 | JOHNNY | LOLLOBRIGIDA | ||
27 | BETTE | NICHOLSON | ||
34 | GRACE | MOSTEL | ||
41 | NICK | WAHLBERG | ||
39 | JOE | SWANK | ||
23 | CHRISTIAN | GABLE | ||
22 | JENNIFER | DAVIS |
This is an ideal candidate for the temp table approach.
MySQL offers the special CREATE TABLE ... LIKE command to create an empty table based on the definition of another table, including any column attributes and indexes defined in the original table.
Hence, we can create a table based on the amalgamated_actors table like so:
-- Create temporary table
CREATE TABLE wp.temp_table LIKE wp.amalgamated_actors;
Here’s the statement to copy all of the data from the amalgamated_actors table into temp_table:
INSERT INTO wp.temp_table
SELECT DISTINCT * FROM wp.amalgamated_actors;
The SELECT DISTINCT clause is key to removing duplicate rows.
Finally, we need to rename the original table, so that we can replace it with the temp table, and drop the original table:
-- Rename and drop
RENAME TABLE wp.amalgamated_actors TO wp.old_amalgamated_actors,
wp.temp_table TO wp.amalgamated_actors;
DROP TABLE wp.old_amalgamated_actors;
Now there is only one row with JENNIFER DAVIS:
id | first_name | last_name | ||
--------------------------------------------------- | ||||
10 | PENELOPE | GUINESS | ||
14 | ED | CHASE | ||
22 | JENNIFER | DAVIS | ||
23 | JOHNNY | LOLLOBRIGIDA | ||
27 | BETTE | NICHOLSON | ||
34 | GRACE | MOSTEL | ||
41 | NICK | WAHLBERG | ||
39 | JOE | SWANK | ||
23 | CHRISTIAN | GABLE |
Our original amalgamated_actors table is no more, having been replaced by the “temp” table.
Removing Duplicate Rows using the UNIQUE Constraint
In the next blog on handling duplicate data, we’ll employ the UNIQUE constraint to delete rows with duplicate name fields, regardless of whether or not the IDs are duplicated.