Saturday, August 25, 2012

Data Quality: Take it to the limit

During my studies, I had the good fortune of experiencing two different approaches to teaching. The first was to address a wide range of problems, with the focus on practising and mastering specific techniques, while the other was to explore the subject area in terms of its concepts and rules. This is rather similar, for instance, to learning what you need to do in order to catch a flight from point "A" to point "B". One approach would be to board many flights, while the other is to question how governments, airport authorities and airlines operate together to provide the service we call "a flight".

The analogy above is somewhat biased, as most of us would regard boarding multiple flights for the sake of learning as impractical (although appealing). The point is that each approach has its advantages and shortfalls.

The one idea I would like to highlight here is what I call "taking it to the limit". One of the ways to truly appreciate the nature of a system is by testing its behaviour, especially close to its boundaries (or limits). Believe it or not, anyone who has dealt with data quality has been dealing with a boundary problem, or should have been.

Now, if you try to run through the security check at the airport (without letting the authorities check your belongings), you will very quickly learn to appreciate how the system works. When it comes to data quality, it is not that much different. If the sales department captures the wrong number of zeros at the end of their reporting figures, they quickly learn not only how people react, but also the likely impact on the company (and on their jobs).

I am NOT advocating that you insert errors into your data just to "test the system", but rather that data quality analysis should explore, with rigour, the various scenarios for the data - especially close to the data's boundaries. Moreover, the process of testing those boundaries together with the people who control them adds value of its own, and is important for building a sustainable data quality solution.
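To make the idea concrete, here is a minimal sketch of a boundary-focused check. The function, column limits and tolerance are all hypothetical, chosen only to illustrate the point: values are not merely flagged as in or out of range, but also as suspiciously close to a limit, which is exactly the region worth discussing with the people who control those limits.

```python
def check_boundaries(values, lower, upper, tolerance=0.05):
    """Classify each value as 'out_of_range', 'near_boundary', or 'ok'.

    A value is 'near_boundary' when it falls within `tolerance`
    (expressed as a fraction of the allowed range) of either limit.
    All names and thresholds here are illustrative assumptions.
    """
    span = upper - lower
    margin = tolerance * span
    results = []
    for v in values:
        if v < lower or v > upper:
            results.append("out_of_range")
        elif v <= lower + margin or v >= upper - margin:
            results.append("near_boundary")
        else:
            results.append("ok")
    return results

# Hypothetical example: monthly sales figures expected between 0 and 1,000,000.
flags = check_boundaries([-5, 10, 999_500, 500_000], 0, 1_000_000)
# flags == ['out_of_range', 'near_boundary', 'near_boundary', 'ok']
```

The "near boundary" category is the interesting one: those records are technically valid, yet they are precisely the cases to review with the business owners of the data before trusting the numbers.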

By understanding the value of the (quality) controls, people not only feel part of something bigger, but also feel empowered to take proactive steps to advance the common goals around data quality.