We have a data warehouse into which we import data obtained from our outside vendors. The data are then used to build data cubes. Lately we have started to get a duplicate key error on the "County" attribute in our "Location" dimension.
The problem appears to be caused by these two values returned by the query that returns the distinct list of counties:
These are German counties and they are really the same county, I believe. Our data warehouse stores County as a varchar. If I cast the stored value as nvarchar, the query that returns the distinct list of counties returns only the 2nd value and
so should avoid the duplicate key error. (I haven't tried it yet, just speculating.)
This post suggests the root cause of the problem:
We don't have any control over the data before our vendors give it to us, and it comes from all over the world. What is the best way to avoid occurrences of the duplicate attribute key error? Should we be storing all our character data as nvarchar?
Would it be enough to cast the stored data as nvarchar when the cubes are being built? Is there a better approach? I'd like to get an idea of what the best practice would be. Our warehouse
View Complete Post