Thoughts on Data Management: December 2012

Every decision, every activity and every interaction is data-driven. Even intuition, spontaneous responses and instincts rely on data. In other words, some set of conditions, or information, stimulates, allows or invokes them. We exist in an ecosystem of triggers and responses. This is natural and happens everywhere in the universe – all the time. The second law of thermodynamics states that the entropy (the level of disorder or uncertainty) of an isolated system never decreases. Loosely speaking, the complexity in the world is ever increasing. With the associated, ever-increasing amount of information generated by anything and everything all the time – why and how should this information be probed?

The “how” question is being answered more quickly than the “why”. We really do need to ensure that we invest in understanding data only for justifiable reasons. Information is an asset after all and should be used effectively. What are these questions? Why and how should they be answered? Is analyzing endless petabytes of data really the right way to go? Is it safe for us? Is it safe for the owners of the data?

I have discussed some thoughts relating to data ownership before. With the increasing amounts of data being probed, I believe there is still a gap between the intention of policy makers and the realization of data management measures. The biggest gap is the poor ability to identify and assign accountability, and to give data stakeholders appropriate levers to control data.

Let’s look at two examples: Firstly, you must have heard how a big retailer in the US knew a teenager was pregnant before her father did by analyzing her shopping habits. Do they have the right to analyze their consumer information in this manner and make assumptions or deduce such personal facts? You may argue that by agreeing to shop in their stores – shoppers consent to this type of personal probing. This can obviously lead to more effective marketing, but as a consumer, would you consider this an acceptable practice?

My second example is something a lot more familiar. I often moan that search engines try to force me to use their local search site. In a typical environment, the global site picks up my location (using GPS or internet provider identification). It then redirects my query automatically to the local search site, which then customizes the results based on my location. It makes the default assumption that this is what I want. I do know this can be customized, and I do know that these services are free – so I can’t really complain. Nonetheless it is critical that what I NEED, WANT and GET are carefully balanced. The last thing you want is to upset the user or customer by making assumptions or appearing to invade their privacy.

What is the gap that needs to be addressed to support the intention of policy makers? I believe it is clearer lines between what we NEED to know, DO know, CAN know and WANT to know.

In a typical environment, you might find that what we DO know is more than what we NEED to know, but less than what we WANT to know. In turn, what we WANT to know is less than what we CAN know (as we can often learn things we do not necessarily want to know). Simply put:

Can know > Want to know > Do know > Need to know

If the world really operated in this fashion – everything would be quite simple. However, we struggle with data ownership because we fail to control the boundaries between what we need to know, do know, can know and want to know.

Let's look at some of the problematic scenarios:

If we DO know less than what we NEED to know – we are dysfunctional. We are forced to guess missing information which can be dangerous. In this situation we often make mistakes purely out of ignorance.

If we WANT to know more than we CAN know – we are chasing windmills. The question here is whether what we seek to know can be known (i.e. is it discoverable or not).

If we DO know all that we WANT to know – we may be crossing the line. The information beyond what we NEED to know may not be something we are entitled to know. This is where information arbitrage is used to take advantage of a situation (e.g. to try to win over customers). This then really becomes a question of ethics.
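
The three scenarios above can be read as simple checks on four sets of information. The sketch below is only an illustration under my own assumptions: it treats each category as a set of attribute names and reads "less than" as "is contained in". The function name `diagnose` and the attribute names are hypothetical, invented for this example.

```python
def diagnose(need, do, want, can):
    """Return the problematic scenarios that apply to four sets of attributes.

    Assumption: "A is less than B" is modelled as "A is a subset of B".
    """
    problems = []
    if not need <= do:
        # We DO know less than we NEED to know.
        problems.append("dysfunctional")
    if not want <= can:
        # We WANT to know something that cannot be known.
        problems.append("chasing windmills")
    if want and want <= do:
        # We DO know all that we WANT to know.
        problems.append("may be crossing the line")
    return problems


# Hypothetical attribute names, loosely based on the retailer example.
need = {"name", "delivery_address"}
do = need | {"purchase_history"}
want = do | {"household_size"}
can = want | {"health_inferences"}

print(diagnose(need, do, want, can))        # healthy ordering, no problems
print(diagnose(need, {"name"}, want, can))  # missing something we need
print(diagnose(need, want, want, can))      # we already know all we want
```

An empty result corresponds to the simple ordering Can know > Want to know > Do know > Need to know. Each non-empty result names a boundary that is not being controlled.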

To me it makes sense that owners of information should have the right to control who may access their information and why. This obviously needs to be protected in the context of the relevant legal and social rules.

My question is: what do we need to do to ensure this right of data owners is protected?

Monday, December 31, 2012

The basic right of data owners