In any kind of business, pragmatism demands a balance between quality, cost and time. This applies to data management as much as to any other type of effort.
When you need statistical significance and want to understand trends, it seems almost intuitive that you do not need the highest quality of data. However, one has to be careful. Some dimensions of quality will be critical to your usage, others will need to comply with certain boundary rules, and still others might be completely irrelevant. It all depends on the impact of each dimension on your measure, in other words, how sensitive your measure is to it.
So to be pragmatic, you need to consider these sensitivities and decide what you can, or should, accept from a practical perspective. I know that, ideally, having beautiful, defect-free data sounds like a dream come true (sorry, but I am a data geek!), but you may have objectives that do not require the data to be perfect at all.
For example, consider a sample of data depicting the volume of traffic on a road at various days and times of the week. The precise number of cars may be irrelevant if you are evaluating how well the road is designed to handle the traffic, but it may be critical if the road is a toll road and you need to bill its users. Even then, it may be worth investing in a cheaper license plate recognition technology that identifies 90% of road users, rather than buying and maintaining a system with a 99% success rate that costs you a lot more in the long run. It all depends on the practical constraints of your operation.
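To make the toll road trade-off concrete, here is a minimal Python sketch comparing the yearly net value of two recognition systems. Everything in it is an assumption for illustration: the `RecognitionSystem` type and the volume, toll and cost figures are hypothetical, and the model deliberately ignores real-world factors such as the cost of following up vehicles that were not recognised.

```python
# Minimal sketch: compare the yearly net value of two hypothetical
# license plate recognition systems for a toll road.
# All names and figures are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionSystem:
    name: str
    recognition_rate: float  # share of vehicles successfully identified (0..1)
    annual_cost: float       # amortised purchase + maintenance per year


def annual_net_value(system: RecognitionSystem, vehicles_per_year: int, toll: float) -> float:
    """Revenue from the vehicles that can be billed, minus the system's yearly cost."""
    billed_revenue = system.recognition_rate * vehicles_per_year * toll
    return billed_revenue - system.annual_cost


def better_system(a: RecognitionSystem, b: RecognitionSystem,
                  vehicles_per_year: int, toll: float) -> RecognitionSystem:
    """Pick whichever system yields the higher net value (first one on a tie)."""
    return max((a, b), key=lambda s: annual_net_value(s, vehicles_per_year, toll))


if __name__ == "__main__":
    # Hypothetical figures, chosen only to illustrate the trade-off.
    cheap = RecognitionSystem("90% system", 0.90, 50_000)
    premium = RecognitionSystem("99% system", 0.99, 400_000)
    vehicles, toll = 1_000_000, 2.0
    for s in (cheap, premium):
        print(f"{s.name}: {annual_net_value(s, vehicles, toll):,.0f}")
    print("Better choice:", better_system(cheap, premium, vehicles, toll).name)
```

With these made-up numbers the cheaper system comes out ahead (1,750,000 versus 1,580,000). The premium system only pays for itself once the 9-point gap in recognition covers its extra 350,000 in cost, which here means potential toll revenue above roughly 3.9 million a year.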
The larger the data, the more the boundaries (or thresholds) matter rather than the exact values, while for small data, every bit counts (literally).
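The interplay between sensitivity and thresholds can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not a prescribed method: each quality dimension is tagged as critical (zero tolerance), threshold-bound (a maximum defect rate) or irrelevant (not checked), mirroring the three kinds of dimension described earlier. The dimension names (`timestamp`, `vehicle_count`, `weather_tag`) and the 2% limit are hypothetical.

```python
# Minimal sketch: judge a dataset against per-dimension quality rules.
# Dimension names and limits are hypothetical examples.
from enum import Enum


class Sensitivity(Enum):
    CRITICAL = "critical"      # any defect is unacceptable
    THRESHOLD = "threshold"    # defects tolerated up to a maximum rate
    IRRELEVANT = "irrelevant"  # not checked at all


def assess(defect_counts: dict[str, int], total_records: int,
           rules: dict[str, tuple[Sensitivity, float]]) -> dict[str, bool]:
    """Return pass/fail per checked dimension; irrelevant dimensions are skipped."""
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    results = {}
    for dimension, (sensitivity, max_rate) in rules.items():
        if sensitivity is Sensitivity.IRRELEVANT:
            continue
        defects = defect_counts.get(dimension, 0)
        if sensitivity is Sensitivity.CRITICAL:
            results[dimension] = defects == 0
        else:
            results[dimension] = defects / total_records <= max_rate
    return results


if __name__ == "__main__":
    rules = {
        "timestamp": (Sensitivity.CRITICAL, 0.0),
        "vehicle_count": (Sensitivity.THRESHOLD, 0.02),
        "weather_tag": (Sensitivity.IRRELEVANT, 0.0),
    }
    # Large sample: 150 miscounts out of 10,000 records (1.5%).
    print(assess({"vehicle_count": 150, "weather_tag": 900}, 10_000, rules))
    # Small sample: a single miscount out of 20 records (5%).
    print(assess({"vehicle_count": 1}, 20, rules))
```

The same 2% limit tolerates 150 miscounts in 10,000 records, yet a single miscount in a 20-record sample already fails the check, which is one way to read "every bit counts" for small data.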
At the end of the day, what matters is what you do with the data. As the old saying goes: knowledge is power, but how you apply it makes all the difference.