Monday, December 31, 2012

The basic right of data owners

Every decision, every activity and every interaction is data-driven. Even intuition, spontaneous responses and instincts rely on data. In other words, some set of conditions, or information, stimulates, allows or invokes them. We exist in an ecosystem of triggers and responses. This is natural and happens everywhere in the universe – all the time. The second law of thermodynamics states that entropy, the level of disorder or uncertainty, always increases. In other words, the complexity of the world is ever increasing. With the associated, ever-increasing amount of information generated by anything and everything, all the time – why and how should this information be probed?

The “how” question is being answered quicker than the “why”. We really do need to ensure that we invest in understanding data only for justifiable reasons. Information is an asset, after all, and should be used effectively. What are the right questions? Why and how should they be answered? Is analyzing endless petabytes of data really the right way to go? Is it safe for us? Is it safe for the owners of the data?

I have discussed some thoughts relating to data ownership before. With the increasing amounts of data being probed, I believe there is still a gap between the intention of policy makers and the realization of data management measures. The biggest gap is the poor ability to identify and assign accountability, and to give data stakeholders appropriate levers to control data.

Let’s look at two examples: Firstly, you must have heard how a big retailer in the US knew a teenager was pregnant before her father did by analyzing her shopping habits. Do they have the right to analyze their consumer information in this manner and make assumptions or deduce such personal facts? You may argue that by agreeing to shop in their stores – shoppers consent to this type of personal probing. This can obviously lead to more effective marketing, but as a consumer, would you consider this an acceptable practice?

My second example is something a lot more familiar. I often moan that search engines try to force me to use their local search site. In a typical environment, the global site picks up my location (using GPS or my internet provider's address). It then redirects my query automatically to the local search site, which then customizes the results based on my location. It makes the default assumption that this is what I want. I do know this can be customized, and I do know that these services are free – so I can't really complain. Nonetheless, it is critical that what I NEED, WANT and GET is carefully balanced. The last thing you want is to upset the user or customer by making assumptions or appearing to invade their privacy.
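As a thought experiment, here is a minimal sketch of what a more respectful redirect policy could look like: the user's stored preference (what I WANT) wins over the geo-detected default (what the engine assumes I want). Everything here – the site names, the mapping, the function – is hypothetical, not any real search engine's API.

```python
from typing import Optional

# Hypothetical mapping from a detected country to a local search site.
GEO_TO_LOCAL_SITE = {"DE": "example.de", "FR": "example.fr"}

def resolve_site(user_pref: Optional[str], geo_country: Optional[str]) -> str:
    """Pick the site to serve: explicit user preference first, geo-guess second."""
    if user_pref:                          # the user told us what they WANT
        return user_pref
    if geo_country in GEO_TO_LOCAL_SITE:   # fall back to a location-based guess
        return GEO_TO_LOCAL_SITE[geo_country]
    return "example.com"                   # global default: no assumptions made

print(resolve_site(None, "DE"))           # example.de  (a guess, clearly a default)
print(resolve_site("example.com", "DE"))  # example.com (the preference wins)
```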

What is the gap that needs to be addressed to support the intention of policy makers? I believe it is clearer lines between what we NEED to know, DO know, CAN know and WANT to know.

In a typical environment, you might find that what we DO know is more than what we NEED to know, but less than what we WANT to know. At the same time, what we WANT to know is less than what we CAN know (we can often know things we do not necessarily want to know). Simply put:

Can know > Want to know > Do know > Need to know

If the world really operated in this fashion – everything would be quite simple. However, we struggle with data ownership because we fail to control the boundaries between what we need to know, do know, can know and want to know.

Let's look at some of the problematic scenarios:

If we DO know less than what we NEED to know – we are dysfunctional. We are forced to guess the missing information, which can be dangerous. In this situation we often make mistakes purely out of ignorance.

If we WANT to know more than we CAN know – we are tilting at windmills. The question here is whether what we seek to know can be known at all (i.e. is it discoverable or not).

If we DO know all that we WANT to know – we may be crossing the line. The additional information may not be something we are entitled to know. This is where information arbitrage is used to take advantage of a situation (e.g. trying to win over customers). This then really becomes a question of ethics.
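As a toy illustration, if we model each category as a set of facts, the healthy ordering becomes a chain of subset checks, and each of the scenarios above is a violated containment. The sets and "facts" below are hypothetical, purely to make the boundaries concrete.

```python
# Hypothetical knowledge sets for a retailer and one customer.
need = {"delivery address"}
do   = {"delivery address", "purchase history"}
want = {"delivery address", "purchase history", "income"}
can  = {"delivery address", "purchase history", "income", "browsing habits"}

healthy           = need <= do <= want <= can  # Can > Want > Do > Need holds
dysfunctional     = not (need <= do)           # DO < NEED: guessing missing facts
chasing_windmills = not (want <= can)          # WANT > CAN: undiscoverable
crossing_the_line = want <= do                 # DO >= WANT: an ethics question

print(healthy, dysfunctional, chasing_windmills, crossing_the_line)
# True False False False - this particular retailer stays inside the lines
```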

To me it makes sense that owners of information should have the right to control who may access their information and why. This obviously needs to be protected within the context of the relevant legal and social rules.

My question is: what do we need to do to ensure this right of data owners is protected?

Friday, November 30, 2012

You can't have my data - it's worthless

When kids fight over toys, they're not only trying to gain play-time with them. Anyone with kids will tell you it is really about ownership, or more accurately – entitlement. If you do not have kids, think about your siblings when you were young, or your friends from an early age.

Now, hardly anyone thinks about what is the best or optimal use of data. Most people are concerned with how the data can best serve their objectives. This is rightfully so. You are rewarded if you achieve certain goals, and the data is there to serve you. But never make the mistake – you do not, EVER, own valuable data!

Let me explain. Most people would agree that data is meaningless without context. A context is only valuable if it relates to some sort of communication need, which usually relates one entity to another. Hence data has no value without information, and information has no value without communication. So you can own data, but it is meaningless and worthless to the rest of the world unless it exists in the context of some sort of communication – which means you do not have exclusive rights to it.

The uncertainty principle states that you cannot know certain pairs of properties with absolute confidence at the same time. In the world of data, you cannot have absolute ownership and absolute utilization. If you declare an absolute set of stakeholders as the owners of the data, you are limiting the opportunity for the data to be used to its fullest extent (you are likely to inhibit access for some of its rightful users). On the other hand, if you allow the data to be used by all its rightful (some unknown) users, you cannot conclude absolute ownership, as you are uncertain of the full set of stakeholders in the data value chain.

Bottom line:

You can have "my" data. It is invaluable. But know this: It is not my data and you can only have it if we agree to cap its value and only if our joint entitlement helps me achieve my goals.

Tuesday, October 30, 2012

The Big Data-Fall

Have you ever stood by a big waterfall? Somewhere like the Niagara or Iguazu falls? What resonates with me the most, while visiting such places, is the sheer volume of water that keeps streaming down. If you stand there for five minutes – well ok, great. It's big, it's noisy, and very wet. But if you stand there for an hour, it all of a sudden dawns on you: "This unbelievable amount of water just keeps coming down". It keeps coming the whole day, week, year. Wait a minute – this waterfall has been here for decades and centuries...

And think about the energy that is being dispersed... Now... Think about the volumes of information being created and dispersed around the world... Take a step back... Nature has been generating big data since the dawn of time. Have you heard of chaos theory (the butterfly effect)? Now this is starting to get interesting.

Can a single act by one person lead to fundamental changes in society? Well, yes. There are examples of this in history. Can a single tweet change the world? Possibly!

The speed and volume of information have changed drastically. We are no longer looking at little streams of information trickling across the plain. We have transformed our world into gigantic cascades of information falls, where data flows in all directions.

Step back again, and look at the waterfall. Think about its purpose. There are reasons why it flows the way it does. There are so many ways to exploit its energy, and so much potential for what you can do with it.

Don't forget to also marvel at its beauty, and don't forget the risks of abusing or mishandling it.



Maybe it's time to start thinking of information as a constant flow. Maybe it's time to think about its patterns and its dynamic behavior. Do we really need to store all this data? Maybe it's just about using its power and then letting it go. Is this perhaps what "big data" is really about?... I don't know. All I know is that information, like water, has the potential to create and the power to destroy.

Think about what you are doing with the power of the Data-Fall. Are you sailing down the river? Up the river? Are you on the bank, building a hydro-power (info-power) station? Are you merely a spectator, appreciating its beauty or simply ignoring it? Or perhaps you are operating a Data-Fall visitors' centre.

Whatever it is, just remember that you probably cannot change its nature (but don't forget the butterfly effect). Also, keep in mind that this Data-Fall will be here long after you have gone. Ask yourself - what is the legacy you want to leave? And how can you harness the power of the Data-Fall?

Friday, September 28, 2012

The two sides of data In-e-quality


Not all data is born the same. Its intrinsic characteristics differ based on its genetics, while its quality and value differ based on its environment.

In short - never judge data by the appearance of its raw statistical profile.

Bad-looking data may be economical and relevant to the business, while beautiful, complete data might actually carry high maintenance costs and, dare I say, redundancy...

But let's focus on the opening statement. I stand on giants' shoulders when I talk about the DNA and ecosystem of data. Data can be measured in various dimensions, from volume to change velocity, variety and extent of construct. Data is only as valuable as its fitness for use. Oh, and of course, different people use the same data in different ways – so what is really good data?

We say a person is good or bad, useful to have around or a burden. We choose friends differently than we choose teammates – or do we? This is perhaps the key to why people struggle so much with data quality. Maybe it's not just about how well the data fits its purpose. Maybe it's really about how comfortable we are with it.

Believability, you say? Nonsense! You like the data because you are either familiar with it, or it chimes with your view of the world! That is why people change data sources. This is why you will never find an absolute single version of the truth.

People use data, and people are subjective. It is part of what makes us risky and successful. So have we lost the war for data quality already?

The answer is no. We might each hold an individual view on data quality, but in the same way that society manages to create law and order to give each person an equal opportunity, so can data governance create data quality across the enterprise. I like to call this "data in enterprise quality", or data in-e-quality for short. Not only because data is different and needs to be handled differently, and not only because its quality is seen differently by different people, but because all data is born equal, with the right and opportunity to be in-e-quality.

Saturday, August 25, 2012

Data Quality: Take it to the limit

During my studies, I had the good fortune of experiencing two different approaches to teaching. The first was to address a wide range of problems, with the focus on practising and mastering specific techniques, while the other was to explore the subject area in terms of its concepts and rules. This is rather similar, for instance, to learning what you need to do in order to catch a flight from point "A" to point "B". One approach would be to board many flights, while the other is to question how the governments, airport authorities and airlines operate together to provide the service we call "a flight".

The analogy above is somewhat biased, as most of us would regard boarding multiple flights for the sake of learning as impractical (although appealing). The point is that each approach has its advantages and shortfalls.

The one difference I would like to highlight here is what I call "taking it to the limit". One of the ways to truly appreciate the nature of a system is by testing its behaviour, especially close to its boundaries (or limits). Believe it or not, anyone who has dealt with data quality has been, or should have been, dealing with a boundary problem.

Now, if you try to run through the security check at the airport (without letting the authorities check your belongings) – you will very quickly learn to appreciate how the system works. When it comes to data quality, it is not that much different. If the sales department captures the wrong number of zeros at the end of their reporting figures – they quickly learn not only how people react, but also what the likely impact is on the company (and their jobs).

I am NOT advocating that you should insert errors into your data just to "test the system", but rather that data quality analysis should explore, with rigour, the various scenarios for the data – especially close to the data's boundaries. Moreover, the process of testing the boundaries together with the people who control them adds value that is important for building a sustainable data quality solution.
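To make this concrete, here is a minimal sketch of boundary-focused data quality checks, assuming a hypothetical "order amount" field with an agreed business range. The point is to probe values at and just beyond each limit - not to seed errors into production data.

```python
# Assumed business rule for a hypothetical "order amount" field.
AMOUNT_MIN, AMOUNT_MAX = 0.01, 1_000_000.00

def amount_is_valid(amount: float) -> bool:
    """The quality control under test: amounts must fall inside the agreed range."""
    return AMOUNT_MIN <= amount <= AMOUNT_MAX

# Boundary scenarios: each side of every limit, plus the classic
# "wrong number of zeros" mistake from the sales example above.
boundary_cases = [
    (AMOUNT_MIN,      True),   # exactly on the lower boundary
    (0.0,             False),  # just below it
    (AMOUNT_MAX,      True),   # exactly on the upper boundary
    (AMOUNT_MAX * 10, False),  # one zero too many
]

for value, expected in boundary_cases:
    assert amount_is_valid(value) == expected, f"boundary check failed at {value}"
print("all boundary scenarios behave as agreed")
```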

By understanding the value of the (quality) controls, people not only feel part of something bigger, but also feel empowered to take proactive steps to advance the common goals around data quality.

Friday, May 25, 2012

Teneo Vulgo (commonly known) is bigger than big data


Big data enables the analysis of trends. It will only really become consistently reliable if validated over time. It can help capitalize on opportunities, since it allows the collection of large amounts of information quickly. However, does it really provide that elusive strategic advantage? Until the models for its usage mature and it becomes comparable to sound research methodologies – it is likely to be somewhat of a bull in a china shop. Would you use big data to decide on shifting your company's product focus? Well, certainly not as a single input.

Nonetheless, trend analysis always has been, and always will be, a great tool for understanding the behaviour of customers, and other trends important to an organization. Big data only really changes the “interface point”, i.e. where and how you collect trend data (and at what quality). Hence this type of “contract” (interface point) needs to be carefully assessed before subscribing to it and using it to make important decisions.

The next big shift in information management will be not in the way we analyse data, but in how data flows. Social media was a big shift in information management in the sense that it changed the behaviour patterns of how humans communicate. So were the internet, telephony and the postal service. All of these technologies affected the time taken for information to reach its audience (the delay) and the method of communication (the channel).

Big data is a method to harness social networking, mobile phones’ position information and similar “mass data”. However, to witness a big shift we need to see a change in the technologies affecting the behaviour associated with publishing and disseminating information.

The present modes of information flow are largely unstructured and loosely governed. It is no wonder that organizations find it challenging to manage information. This stems from the approach people take to engaging with information. It is not to say we cannot, or do not, manage some information properly – we most certainly do. However, this type of information is in fact a very small portion of the information that exists (there are petabytes of data out there). Managing information carefully requires laborious training and controls, due to its specific (unique) demands, its relevance to a particular purpose and the fact that people do not generally see information as an asset that needs to be managed. Makes you wonder whatever happened to the saying: knowledge is power.

The next big shift will affect the behaviour of people in terms of their perspective towards information they produce, own or consume.

The value-add for this change will originate from the need to become more precise (after decades of information chaos) in terms of our engagement with information. Why? Because this will make us more comfortable and confident about the information - allowing us to make quicker and better decisions based on informed choices over our information sources. This will apply to both personal and professional information uses.

This next big shift is Teneo Vulgo.

Saturday, April 28, 2012

The (Big Data) Café


Ever gone on a long road trip? The aircon (data issues management) is barely cooling you down. The windows (data optimization) are stuck, the water bottles (data migration project budgets) are empty, and you're not even halfway through your road(-map)...

But wait! What luck! A little town is coming up. It's got a funny name: "Big Data". Well, in any case you are sure to find a small café. Your data business case has dried up, your stewards are querying on bad-data fumes (again) and you need a fresh data strategy to demonstrate business value and fuel up your operational and strategic budget...

There it is: "The Big Data Café". It has a big sign saying: "Today's special: Get a Yotta (10^24) Data Warehouse with live links to ALL the universe's data feeds in real time. Buy one today and get 10 models with 2 million predictions on the hottest trends in the world for the next 100 years!".

Wow!!! You've got to have some of that, and what perfect timing as well. Needless to say, you stop and walk in. As you sit down at a conference table, you notice a lot of other (tired) drivers. They are all busy munching away at some work-stream and enjoying the flavor of data mining analytics. You now recall hearing about this café, and how people praise its wonderful deserts (no, this is not a misspelling). They are rich with Fata Morganas (promises of value-add), and if you navigate carefully, you will pick up some fruits from the various oases (gold nuggets of real value).

Then you remember something else, Teneo Vulgo - the destination.

Should you grab some takeaways (strategies, contacts and techniques), or settle in for a bit longer and maybe order that special? It will take longer to digest, and you wonder if it is really the right menu item for you.

What happens next is up to you. It depends on the business you support and their appetite. It also depends on what you need to deliver in terms of your road map.

Eventually, however, you will finish your meal and get back on the road. Is it better to speed up your journey, or slow down and take more value along the way? That is for you to answer...

In case you did not get my message – Big Data is not necessarily bad, but I do not believe we will remain there. Globalization and competitiveness will signal us on to Teneo Vulgo, where unstructured data will eventually become meaningless (not because we don't care, but because ALL relevant data will be qualified by default). My only suggestion is that we remain vigilant not to exclude opportunities to diversify our communal knowledge (in other words, not to discount alternative approaches to analyzing, representing and managing data). Based on the theory of evolution – species that do not diversify their gene pool tend to go extinct.

Enjoy the Café while it is still relevant to you, but I do not recommend trying the special....

Sunday, February 5, 2012

The Data Quality Accord - Initial Thoughts


Poor data quality affects the business. It forces higher costs and introduces unwanted risk. The cost is tangible (if you measure it) and affects the bottom line. The risk, however, is much harder to quantify in terms of how it affects the business. With data growing faster than the business itself [rubinworldwide] – this problem is not likely to go away anytime soon.

The costs of poor data quality include the cost of reacting to the presence of bad data, either by providing ad-hoc solutions, taking responsibility for the after-effects, or by fixing the problem (at source, of course). This can be measured in terms of delays, lost opportunities, the time and money needed to resolve issues and their consequences, as well as the cost of implementing a remedy.

The risk aspect of poor data quality, which is the real subject of this entry, may involve reputational, operational, regulatory and legal risk. Risk is commonly defined as the product of probability and outcome. The higher the impact, the more you would like to control and reduce the chances that something bad might happen. Ideally, the business will make an informed decision as to how much of a certain risk it wants to undertake. This is necessary to ensure managed boundaries which accommodate both opportunity and flexibility. Take a food market stall for example. The merchant knows, more or less, how much produce he might sell. If he offers too little, some customers will end up disappointed – but he will not waste any product. On the other hand, if he brings too much – he is likely to be left with unwanted produce. The merchant will therefore use his experience (amongst other means) to decide how much produce to offer.
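To put rough numbers to the merchant's trade-off, here is a small worked example with an entirely made-up demand distribution and prices: expected profit is computed for each stocking level, and the middle option wins because it balances wasted produce against lost sales.

```python
# Illustrative numbers only: demand (units) -> probability.
demand_probs = {20: 0.25, 30: 0.50, 40: 0.25}
price, cost = 3.0, 1.0   # sell at 3, buy at 1 (assumed)

def expected_profit(stock: int) -> float:
    """Average profit over the demand scenarios for a given stocking level."""
    profit = 0.0
    for demand, p in demand_probs.items():
        sold = min(stock, demand)                    # can't sell more than demand
        profit += p * (sold * price - stock * cost)  # unsold produce is wasted
    return profit

for stock in (20, 30, 40):
    print(stock, expected_profit(stock))
# 20 -> 40.0, 30 -> 52.5, 40 -> 50.0: 30 units is the informed choice
```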

Financial institutions make similar decisions in terms of the exposure they are willing to take towards certain clients and industries. The Basel Accord is one of the main tools used to evaluate the amount of financial risk a company takes and, more importantly, the level of safety a financial institution needs to employ. This ends up with regulators prescribing the amount of capital these institutions need to keep aside for a "rainy day". It becomes very complex, very quickly, but in a nutshell: the amount that is kept aside is a product of the likely amount of money that would not be recovered should borrowers fail to deliver on their obligations (default), and a function of the parameters which affect the chances of this risk materializing. In other words: impact X probability.
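For the intuition, a heavily simplified expected-loss calculation in the Basel spirit multiplies the probability of default (PD), the loss given default (LGD) and the exposure at default (EAD). The real capital rules are far more involved; the numbers below are purely illustrative.

```python
pd_ = 0.02        # probability of default: 2% chance in a year (assumed)
lgd = 0.45        # loss given default: 45% of exposure is lost (assumed)
ead = 1_000_000   # exposure at default, in currency units (assumed)

expected_loss = pd_ * lgd * ead   # impact x probability, in its simplest form
print(expected_loss)              # 9000.0 - set aside for a "rainy day"
```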

Here is the argument when it comes to data quality: it stands to reason that every company, not only financial ones, needs to seek some sort of safety against the impact of poor data quality. It is true that the Basel Accord looks at certain aspects relating to the quality of the data – but the risk is not limited to the financial products industry. Think, for example, about the impact of poor-quality medical information and its possible consequences... Enough said, I think!

A Data Quality Accord would define the level of provisions a business needs to employ to protect against the (un)likely(?) outcomes of poor data. Taking the risk model of quantifying risk, there are two aspects to consider. Firstly, what is the potential impact of poor data quality, and how well can the business respond to occurrences of poor data ("Loss Given DQ")? Secondly, how do we evaluate the probability of the risk materializing (represented by a probability function which we can refer to as "f")? The first can be addressed by analyzing the various scenarios of what would happen if a machine is misaligned, a doctor is misinformed or a business partner provides or receives poor data. The second dimension, the qualification of the probability, is a much more complex and interesting problem.

What affects the chances that poor data will bring negative outcomes to the business? Taking this question through a logical journey - here are some thoughts:

1. Impact of poor data requires poor data to be present (trivial). So the first parameter in the probability function is how likely it is that the data is poor (let "P" represent the presence of poor data).
2. The presence of poor data does not necessarily mean it will lead to undesired outcomes. The second parameter is therefore the level of protection the business has to avert the effects of poor data quality (let "DG" represent the level of data protection. Why DG? Because it is the level of successfully applied Data Governance and the implied data management functions).
3. These aspects need to be aggregated across the number of instances and the frequency at which the poor data might be used (let "data life" represent the places where the data might be used and might lead to a negative impact on the business).

Therefore I propose that Data Quality Capital be defined as:

DQ Capital = (Loss Given DQ) x f(DG, P, data life)
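Here is one sketch of how this could be computed. The probability function f is entirely hypothetical: it scales the presence of poor data (P) down by the level of data governance (DG) and up with the number of places the data lives (data life). All parameter values are illustrative.

```python
def f(dg: float, p: float, data_life: int) -> float:
    """Hypothetical probability that poor data causes a loss.
    dg and p are in [0, 1]; data_life counts the data's usage points."""
    per_use = p * (1.0 - dg)                    # governance deflects bad data
    return 1.0 - (1.0 - per_use) ** data_life   # chance of at least one hit

loss_given_dq = 500_000            # assumed impact of a data quality incident
dg, p, data_life = 0.8, 0.05, 12   # illustrative parameter values

dq_capital = loss_given_dq * f(dg, p, data_life)
print(round(dq_capital))           # ~56,808 to provision against poor data
```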

Monday, January 9, 2012

Good enough i-Governance


Some people call it data, some call it information. Some talk about structured vs. unstructured, and some say the difference is about the context. Some people call them fields, some call them business terms... We really need not care what it's called – but rather how it is managed.

There is no dispute that communication between two knowledge workers is fundamental to succeeding in any collaborative effort – from conversations, to legal contracts, to database fields used to feed the reports that influence decision making.

Information Governance, data governance, enterprise information management. Are they the same? It depends who you ask!

Now that we've got this out of the way, I would like to talk about good enough i-Governance. If you are a parent, you might have heard the term "good enough parenting". This basically means that, despite all your good intentions, you will never be able to be "the perfect parent". You have a certain ability to control your circumstances – but there are aspects you will never be able to control. Being a "good enough parent" is really about trying your best and achieving reasonable results.

In i-Governance, we have the same problem. For me, the scope of i-Governance is about the business end result: what is the impact of the communication between knowledge workers, and how can we optimize this communication for the business?

There are a few dimensions that we need to consider: business strategy, budgets /resources, maturity, awareness and capability (skills and technologies).

Depending on your personal family (I mean business) situation, you need to work with, around or against the reality which you are facing. Chances are, you cannot cast a magic spell and move everything into the right place in one stroke. It is a journey. When you raise a child, it takes years of teaching and a lot of patience to instill the right values and behavior. The same applies to changing the mindset and behavior within an organization. All of this is not impossible. What is rather more challenging, however, is steering and maintaining the direction throughout the turmoil of realities that your business needs to work through.

So my advice is: apply "good enough i-Governance". Keep your eyes on the goal, be patient and persevere. Listen to good advice and make sure you prioritize early value creation.