Dirty data is gonna dog you from day one and keep biting you even months after you go live – not exaggerating.
“You may be through with the past, but the past is not through with you.”
In short, you’ve got dirty data debt.
Here’s the big problem with dirty data: the time you spend on a problem has almost nothing to do with how often it occurs. A problem that shows up on one record can take as long to fix as one that shows up on a thousand. Every problem needs its own solution.
Worse, some of these problems sit in the edge cases of company policy, and you need people whose time is valuable and scarce to resolve them.
But let’s say you know how to fix a given problem.
Do you fix it in the current production database?
Will that introduce bugs in production somehow? Or will it require code changes to stop more bad data from being generated?
Or will you fix it on the fly in the migration transformation process?
You can do either – choose wisely!
There are so many examples of problems with migrating dirty data that the topic warrants a book, or ten.
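To make the "fix it on the fly" option a bit more concrete, here is a minimal Python sketch of a cleaning step inside a migration transformation. Everything in it is an assumption for illustration: the field names (`email`, `country`), the `COUNTRY_FIXES` mapping, and the `clean_record` function are hypothetical, not taken from any real system. The idea is that the legacy record is never modified, known problems get fixed in a copy, and anything the code can't decide gets flagged for a human.

```python
# Hypothetical mapping of messy legacy values to clean codes (illustrative only).
COUNTRY_FIXES = {"usa": "US", "u.s.": "US", "united states": "US", "uk": "GB"}
VALID_COUNTRIES = set(COUNTRY_FIXES.values())


def clean_record(legacy: dict) -> tuple[dict, list[str]]:
    """Return a cleaned copy of one legacy record plus notes on what happened.

    The legacy record itself is left untouched, so production data stays as-is.
    """
    cleaned = dict(legacy)  # shallow copy; the source record is not changed
    notes = []

    # Fix 1: trim stray whitespace and lowercase email addresses.
    raw_email = legacy.get("email")
    if raw_email is not None:
        email = raw_email.strip().lower()
        if email != raw_email:
            cleaned["email"] = email
            notes.append("email normalized")

    # Fix 2: map free-text country values to a code when we recognize them.
    raw_country = legacy.get("country")
    if raw_country is not None:
        key = raw_country.strip().lower()
        if key in COUNTRY_FIXES:
            cleaned["country"] = COUNTRY_FIXES[key]
            notes.append(f"country mapped to {COUNTRY_FIXES[key]}")
        elif raw_country not in VALID_COUNTRIES:
            # No rule applies: pass the value through and flag it for a person.
            notes.append("country needs review")

    return cleaned, notes


legacy_row = {"id": 1, "email": "  Bob@Example.COM ", "country": "U.S."}
cleaned_row, fix_notes = clean_record(legacy_row)
print(cleaned_row)
print(fix_notes)
```

Keeping the notes alongside each cleaned record gives you an audit trail of what the migration changed, and the "needs review" entries become the work list for the policy experts mentioned earlier. The trade-off is that production keeps generating the same bad values until the source is fixed too.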
What are some of your favorite examples of dirty data you have had to wrestle with?