We are sourcing some data externally and I'm working on importing these into our database on a regular basis. The sourced data however isn't always perfect and I need to be able to deal with duplicate records. These come in two flavors:
- Records that are just repeated; all fields have the same values.
- Records that have the same primary key but different values in some other fields.
The first case shouldn't cause us too much headache, but we still want to report back to where the data came from so errors can be corrected. In the second case we'll probably involve the user, to whom the data makes more sense, perhaps allowing the user
to choose to import one of the received versions of a given record.
To my surprise, however, I am having difficulty just identifying
the records that are identical (in all fields). What I'm doing is:
- Create a staging table to hold the data as-received. Let's call this [MyTable]. It has a RowID INT IDENTITY(1,1) column in addition to the columns it needs to hold the received data.
- Insert the received data.
- Get the set of identical rows as follows: