Monday, December 1, 2014

Data Mirages

There are all kinds of illusions in the world: those orchestrated by humans (magic) and those enforced by the laws of nature (mirages). Both share a common theme - they create a perception that something we believe to be true is in fact false.

This can play a significant role in information management. It can affect how data is accessed and what its true quality is in terms of its intended usage, and it ultimately impacts how data is governed.

Think about steganography. While encryption is an explicit way of hiding information, steganography does not tell you that information is being hidden. This gives you the illusion that there is no more than what you see, when in fact there is a hidden message. Only the people who know about the hidden information are likely to know how to extract it successfully from the concealing medium.
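
To make this concrete, here is a toy sketch in Python of the classic least-significant-bit trick, with a bytearray standing in for an image or audio carrier. Everything here is illustrative: the carrier looks untouched to a casual observer, yet a message rides along in its lowest bits.

```python
# Toy least-significant-bit (LSB) steganography sketch.
# The carrier could be the pixel bytes of an image; here it is just a bytearray.

def hide(carrier: bytearray, message: bytes) -> bytearray:
    """Write each bit of `message` into the lowest bit of successive carrier bytes."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    if len(bits) > len(carrier):
        raise ValueError("carrier too small to conceal the message")
    out = bytearray(carrier)
    for pos, bit in enumerate(bits):
        out[pos] = (out[pos] & 0xFE) | bit  # only the near-invisible lowest bit changes
    return out

def reveal(carrier: bytes, length: int) -> bytes:
    """Re-assemble `length` bytes from the lowest bits - but only if you know to look."""
    return bytes(
        sum(((carrier[b * 8 + i] & 1) << i) for i in range(8))
        for b in range(length)
    )

cover = bytearray(range(256))      # stand-in for "innocent" data
secret = hide(cover, b"hi")
assert reveal(secret, 2) == b"hi"  # readable only if you know it is there
```

The point of the illusion: a consumer of `secret` who compares it casually to `cover` sees nothing worth questioning.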

A data mirage, however, is more a matter of opinion, and what I mean by opinion is perception. What appears to one party as an accurate and complete account of an observation may in fact be partial from someone else's point of view. Like any natural mirage, this "opinion" is circumstantial and will depend on various "natural" factors, such as differing points of view, the taxonomy gap between topics, inconsistencies in data quality standards and differences in objectives between the parties involved in the information exchange.

To manage the risk of "seeing" a data mirage, make sure you understand the differences between your language and that of the partner you are communicating with (think: know your audience); ensure your service level agreements, or expectations, are explicit, not only in terms of the protocol being used but also in terms of the quality of the information as it relates to what is being measured and how. Finally, gain an understanding of the other party's priorities, and of what might be concealed from you, either intentionally or inadvertently.

Have you identified all the data mirages in your world? And are you sure you are truly separating data facts from data fiction?

Saturday, November 15, 2014

Out of the Box Data Governance

To implement effective data governance you need to think out of the box. I don't mean just being creative and finding new, innovative ways of doing things, which in its own right is great. Rather, I am referring to thinking out of the box of your responsibilities. You have a certain responsibility to look after the data under your custodianship - and that can easily be blurred by your team's performance indicators.

We all have a role in data governance, because we all manage data - whether you direct teams that implement solutions, manage resources that operate solutions, build new solutions or ensure solutions are operating as needed on a daily basis. What you do, and how you evaluate your success, has a direct impact on the fitness of the data for its usage. If your team's goals are not fully aligned to serve the intended usage of the data, your priorities will not best serve the effective and efficient usage of that data.

Take, for example, a business that sells products under warranty. The manufacturer has an interest in knowing who actually purchased the product, but the retailer might only care about sales and customer loyalty. The retailer will prioritize in-shop experience, product quality, variety and pricing, which makes it harder for the manufacturer to capture accurate data on the end consumers. To address this challenge, manufacturers have learned to rely on end users to provide purchase information as a means of maintaining the important connection between them and the consumer. This only works if the consumer sees value in registering their product. In other instances, the manufacturer has to depend on the retailer to collect this type of information. The only way this works is if the manufacturer provides a benefit to the retailer for collecting the information on their behalf. What we see in this second case is the manufacturer influencing responsibilities by aligning their information needs with their partner's objectives.

In any data handling operation, the level of fitness of the data for its intended usage is directly dependent on the knowledge workers' ability and motivation to support this usage. This is why data governance is important, and this is why understanding the context of the data from both a consumer and a manufacturer perspective is important.

If your knowledge workers think only inside the box, whether due to lack of motivation or constraints of your business operating model, ask yourself if you are really delivering the value proposition your information handling offers. If you cannot even answer this question, maybe it is time to think about how you measure the success of your information handling processes, and what you need to do to get people to think outside the box.

Friday, October 31, 2014

The Minister for Information Affairs

Governments have wisely coined the term "minister of communication". Whether or not the mandate covers all aspects related to information management is a separate issue, which relates to politics, semantics and priorities. Nonetheless, in my opinion, with that term in place, information management should certainly be part of this minister's portfolio.

But seeing that companies share a lot of similarities with governments, in terms of having to control a large pool of resources, products and services, it does beg the question: where is the minister of communication, or a "CCO", for companies?

Yes, we all know that the CIO, by definition, should cover the responsibilities of Information Management, but we also know that this role is often executed as a pure CTO role, with the Information Management piece disappearing despite good intentions.

Even in governments, the Information Management responsibilities are scattered across health and safety, security, internal affairs, finance and so forth. This is natural and should not actually be restricted. Every area in an entity needs the freedom to manage information.

While the role of the minister of communication might extend to information management, the focus there needs to be on governance rather than implementation. The role of the CIO, in terms of the same portfolio, needs to be executed in the same manner. The problem is that information is increasingly embedded in technology, and as the power of the CTO grows in terms of being able to influence how information is practically managed, the segregation of duties in information governance needs to grow with it. This not only helps the CTO focus on their domain and on servicing the business, but also nurtures a healthier and more trusted framework for managing information.

To put it bluntly: Information Management should not be the responsibility of the same person who is in charge of the tools used to control the information. An additional benefit is improving the focus of resources within the data management domain on collaboration and information exchange needs, rather than on the economy of information handling.

Ask yourself: who is looking after information management in your organization? By definition, and in reality?

Thursday, October 16, 2014

Max in / Max out - The Martial Art of Managing Data

In order to be effective, a data strategy needs to be implemented correctly at a micro level, that is, at a field definition and value validation level. At the same time, for the strategy to be effective, it needs to be implemented correctly at a macro level. Being able to switch between those two perspectives is crucial for the success of the strategy.

This applies not only to the data storage design, business rules and viewpoints designed to support decision making, but also to other dimensions of data management, including governance, quality control and metadata management (to name a couple). This may sound obvious in theory, but from an implementation perspective the challenges are countless. You need to worry about everything from macro issues, such as business and technology strategy alignment and internal politics, to micro level issues, such as resource prioritization and technical constraints.

How do you then navigate these rough waters to reach the shores of success? As the title of this post suggests - learn to max in / max out in terms of your influence on the implementation. To steer the strategy correctly, you need to consider your macro influences and, when the need arises, dive in to the detail to ensure implementation guidelines are followed sensibly. Conversely, if your work focuses on the detailed implementation, then aside from your detailed execution you need to "jump out", "step back" and look for the value proposition of your implementation from both a business and a data strategy perspective.

To "step back", you need to ask questions such as: does this storage design make sense in terms of being able to expand the company's products according to our strategy? (Of course, you need to know what the strategy is first.) Are the business rules defined to filter, validate and govern the data in place? Do they make sense in terms of our business model? In terms of the value proposition of our products? (Of course, you need to know what the business model and the value proposition of the products are first.)

To "dive in", ask your implementers to demonstrate examples that directly contribute to the benefits of the strategy. Ask them to quantify those, not in terms of money, but in terms of impact. For example: by applying a date validation on the transaction record, we are able to reduce invalid dates, which in turn provides us a more accurate view of the periodic sales amounts and hence allows us to better understand how our products are performing. It also increases our accuracy in financial and regulatory reporting. Then demonstrate the value by showing a metric. For example: after the initial application of the validation rule, we were able to increase transaction date accuracy by 20%, which resulted in a 5% increase in correct period reporting... and by the way, we were able to identify inefficiencies by isolating the specific cause of some of those invalid dates.
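
As a rough sketch of what such a validation rule and its metric might look like in Python - the field names and the plausibility window here are purely illustrative:

```python
from datetime import date, datetime

# Hypothetical transaction records; field names are invented for illustration.
transactions = [
    {"id": 1, "txn_date": "2014-09-30", "amount": 120.0},
    {"id": 2, "txn_date": "2014-13-01", "amount": 80.0},   # invalid month
    {"id": 3, "txn_date": "2099-01-01", "amount": 45.0},   # implausibly far in the future
]

def valid_txn_date(raw: str, earliest: date = date(2000, 1, 1)) -> bool:
    """A transaction date must parse and fall within a plausible business window."""
    try:
        d = datetime.strptime(raw, "%Y-%m-%d").date()
    except ValueError:
        return False
    return earliest <= d <= date.today()

valid = [t for t in transactions if valid_txn_date(t["txn_date"])]
accuracy = len(valid) / len(transactions)
print(f"transaction-date accuracy: {accuracy:.0%}")  # the metric you report upward
```

The number this prints is exactly the kind of before/after figure the implementers can put in front of the strategists.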

The art of max in / max out, which is analogous to zooming in and out of a picture, can go a long way, but can be hard to master. To complete the analogy, consider a famous painting. From a distance it has its meaning and its beauty. From a close-up one can appreciate the craftsmanship and complexity.


I argue that your data management implementation is only as good as the accumulation of implementers and guiding strategists throughout the history of your business. Are you employing the right mix of people to deliver and guide a high-quality data strategy implementation? And are they able to max in / max out effectively?

Tuesday, September 30, 2014

Lost Precision in Dataminea

Things are looking good, they say... Our data is getting more structured, and data mining has become a standard working tool in the business toolbox. We can analyse our customers, their behavior, the markets in which we operate and our own supply-chain environment.

With all this apparent maturity - you would think we are in a good place.

Initially we gave away a lot of our data without realizing it. Now, at least, we are aware of what data we share and, kind of, how it is used. It sounds sensible and fair, and for the most part this is true.

The danger we have opened ourselves to, however, is an increased sensitivity to information misconceptions. While we may have increased our precision in representing data, we have insufficient tools to control its accuracy.

Before going further, I think it is important to highlight the distinction between the two. Precision refers to the ability to generate results which are consistent and repeatable - meaning that our tool is reliable in generating the same result over and over again. Accuracy, on the other hand, is how close the measurement is to the actual truth.
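
A small numeric illustration of the difference - two hypothetical instruments measuring the same known quantity:

```python
import statistics

true_value = 100.0

# Two instruments measuring the same quantity five times each (made-up readings).
precise_but_biased = [109.9, 110.1, 110.0, 109.8, 110.2]  # tight spread, wrong centre
accurate_but_noisy = [92.0, 108.0, 99.0, 103.0, 98.0]     # loose spread, right centre

for name, readings in [("precise-but-biased", precise_but_biased),
                       ("accurate-but-noisy", accurate_but_noisy)]:
    spread = statistics.stdev(readings)                  # low spread = high precision
    bias = abs(statistics.mean(readings) - true_value)   # low bias = high accuracy
    print(f"{name}: precision spread={spread:.2f}, accuracy bias={bias:.2f}")
```

The first instrument is repeatable yet consistently wrong; the second scatters around the truth. Data mining tools tend to give us the first kind of comfort.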

Now as I have noted in the past, the truth can be perceived from different points of view, and while we may be able to generate more reliable results, they tend to serve a limited set of perspectives. This is no accident, as these views are used to satisfy specific measures and drive specific behavior. This is not new. Politicians, advertisers and a lot of other groups and individuals continue to use this ability to distort the view on certain realities and create an arbitrage in opinion to their advantage.

Some may call this the art of doing business, and perhaps that is what it is.

The bottom line is, however, that with all this Dataminea going on around us, we are becoming much more sensitive to data misrepresentation, which can be a good or a bad thing - depending on what you are trying to achieve, and how you manage your data.

My question to you is: do you understand the perspectives and the level of precision of the data you handle?

Sunday, September 14, 2014

Price Tagging Data

How much is data worth? Is it based on how rare it is? How hard it was to obtain? How it serves the business opportunities or risks of those who buy it? If other people sell your data, shouldn't you get a cut? What should that cut be? How do you price tag data?
 
The truth is that this is like any other economy of trade. As a producer of data, you would be interested in making a profit, and hence will optimize your costs and look to price your data at a level that fits the market in which you operate. As a consumer, you will be willing to pay based on the profit you believe the information is likely to deliver for you, coupled with the market conditions for obtaining the information. This, of course, is bundled with quality and timing.

How is the price of data affected differently than other commodities? Timing, for one, is a big factor. When access to certain information becomes pervasive, the cost of the information diminishes dramatically. No one will pay to know the name of our planet. Start talking about very old information, which is no longer pervasive, and the price will start going up. You may be willing to pay someone to learn certain facts about the history of a particular region. Although the underlying value is the access to information, the influencing criterion is time.

Quality is probably the second point worth mentioning. While I do not see a major difference in the impact of quality on data's value and cost, the objective valuation of the quality of data is harder to achieve than in other commodities. Many products are evaluated for quality against clear, predefined criteria. Gold has carats, cars have performance indicators and design styles, services have customer satisfaction and food has taste and presentation. Quality of information, by contrast, not only means different things to different people; businesses also define data quality differently, depending on how they intend to use it. This dual obscureness, across both people and organizations, leads to an incoherent set of standards against which to value information. While it could be plotted on a two-dimensional surface, in reality people tend to draw subjective conclusions on the quality of the data, and it almost becomes a matter of taste rather than a skilled practice.

So while experience counts, and some data will prove itself more valuable than others, are we simply in an open marketplace for information exchange? Without regulation for each data type, we will continue to buy data fruits while being blindfolded. While everyone else is doing this - I guess it is fair game. But I am not too sure that is the case.

How do you, or would you, put a price on the data you manage?

Sunday, August 31, 2014

Good Design or Fast Results? The Truth? Learn to Manage Expectations

I write about data management, but this topic is deliberately not data specific. It actually applies to many aspects of one's work and life - but since I am a data geek, the content and examples will focus on data. You can extrapolate and draw behavioral conclusions for other domains of productivity.

Let's get started. Some people would say that a good design is imperative for success, and sufficient time should be spent on that phase of the project. They are of course - right. Other people say you need to deliver results, respond to market changes and generate revenue and contain costs, otherwise the business does not survive. These people are of course - also right. So how can both groups be right? The answer is balance.

One of the most valuable lessons you will learn (if you have not done so already) is that balance is as important as substance. What I mean by this is that there is an optimal point, where over-engineering becomes a hindrance, while lack of design can lead to unacceptable results. We will get to data in a minute, but my favorite example on this life-lesson is from civil engineering: suppose you are tasked to build a bridge across a river. You need it long enough, but not too long. If you measure your materials to the millimeter - chances are you are over-engineering the process and will spend too much time and money getting the exact "right" measure for your bridge. On the other hand, if you just grab a few planks of wood and start building, not only do you have a risk of bridge instability, you might actually have too little wood, or too much left over. In all cases, you will probably need to extend the time and effort to complete the bridge in order to compensate for these shortcomings. What you need is a balance: make a "good-enough" plan quickly enough, and then build a bridge correctly with optimal time and effort.

Now let's talk data management. One of the typical problems in today's information chaos age is the need to build or upgrade the train bridge while the train is already speeding along the tracks. Another well-known metaphor is building a plane while it is already in flight. The overwhelming criterion in this type of operation is stability: as long as we do not lose business or cause harm. This often leads to patches, workarounds and compromises in good design, which eventually lead to issues in costs and strategic alignment. Raise your hand if you have felt frustrated at not being able to implement a good design because of time pressure or concerns about systemic risk. For example, not only is it expensive to change a data model (migration costs will make your executives cringe), but there often seems to be limited foresight as to the strategic benefits.

This is when the conversation needs to shift gears, and the balance needs to be struck by managing expectations. Understand where the business is going, and agree what is best from a technology and design perspective. Then start creating a balance by managing changes such that new parts are created ahead of time, and then brought to the bridge when opportunity allows it. Justify the "extra" efforts by identifying and quantifying the architectural benefits (in other words, what you stand to gain or lose in the longer term). You may conclude that your data model should not change right now, but every time it does change, it changes in a deliberate direction. This will also help you understand its limitations and sensitivities better, which helps with systemic risk management. The notion of managing expectations and creating a common architectural vision will also funnel independent views of resources in the business into a joint effort, and will allow you to harness many minds to improve progress in a common and agreed direction.

Now tell me, are you working towards implementing a strategic data management plan, or just doing localized data fire-fighting?

Friday, August 15, 2014

Motivating Data

What motivates people? A belief, a hope, a goal. The only thing that makes you read this is the belief or hope of finding something useful here. Something you can learn from, quench your curiosity or help you reach a certain goal in learning or performing.

What is motivating data? Well, data makes no decisions, and cannot apply any resources to a particular action. So the notion sounds like an absolute absurdity. Or not... While data cannot change the way it interacts with the world, people certainly can affect this. And that is the point.

When I first started writing about data management, the notion of people affecting how data is applied seemed very trivial. Of course people manage data, and of course we affect how it is being applied. But what I am realizing more and more is how pervasive the subjective motivation of data really is. Every day, all of us make dozens of decisions to disseminate information or block it from flowing. Whether it is of a personal nature, such as protecting loved ones from anxiety or pain, or professional, where we use our judgment to improve the outcome of our efforts and those of our teams.

On a larger scale, companies and organizations make conscious decisions as to what information to expose, when and to whom. This is really part of doing business, or of interacting with the world. There are many strategies one can apply to affect these types of results. You can provide too much information with the intent to overwhelm, or create an impression of sharing everything while carefully omitting certain bits of information. Whether it is right or wrong depends on the situation and the parties involved.

On the other side of the coin, we are information seekers. We look for information that can help us satisfy our beliefs, hopes and goals. We subscribe to channels of information in the hope of receiving the information we seek. We continuously fine-tune these subscriptions, replacing or changing the options on those channels. But in reference to what I said above, this is a tall order: without understanding your information provider's intention, you are subject to their information filters.

Strategies to combat disinformation of that sort include evaluating consistency and patterns in the details of the information you receive (sounds complicated, but we do this all the time naturally). A second strategy involves subscribing to multiple channels in the hope of verification, or of gaining a more comprehensive perspective.

As a small bolt in the global information system, we can originate or terminate information flows by applying some of the strategies I noted above. We also need to motivate other people to behave the same in order to reach a level of meaningful influence.

My question to you is how are you motivating "your" data, and more importantly - what is your intention?

Friday, August 1, 2014

The Data Ocean

Are we ever going to get tired of comparing data to water? We have heard of trickling data and data waterfalls, now people are talking about data lakes, and guess what - I am going to talk about a data ocean. Yes, I am referring to the notion of the largest bodies of data in the world, comprising a multitude of sources and consumers across many, many data domains. But what I really want to focus on is what these oceans currently look like, what they will look like in the future, and how we should prepare ourselves to maximize their value.
 
What makes water so powerful is its combined force, its chemistry and its consistent and predictable behavior. Our current way of handling data is more like trying to mix water with oil, mud, rocks, milk, sand and lots of other stuff. That is far from the elegant nature of water.
 
While each entity in the world feels the need to derive its own chemistry of data, the truth is that the nature of data is as pure as water. We perceive data as murky and hence treat it as such. And so, by our own actions, it molds and becomes difficult to manage.
 
What am I saying? We have no common (agreed) perceptions, models or governance on data. As our data management models mature, we will see more harmony in data.  The essence of the simplicity in data has always been there and the notion of a data lake, and a data ocean, never caught me by surprise.
 
To truly “see” these bodies of data, we need to ensure we are able to view them as such. True “global” harmonization of perceptions on datasets is key to drive governance, management and hence data chemistry.
 
The sooner and more broadly you can tune your organization and your business partners in to maturing and harmonizing data perceptions, the better prepared you will be for Teneo Vulgo.
 
I predict a world where ALL data flows through a central data delivery framework, probably centered around a few major providers. In this world entities which have prepared and invested in orchestration power over data will hold the advantage. This is not a world where data is fair (when has it ever been fair), but rather a world where data is managed to serve those who have the strongest ability to influence and exploit it.
 
Think about data tsunamis and data storms, as well as data seaside holiday homes, data sanitation systems and data feeding into our daily life. The power of information is only starting to emerge. For me, the famous saying comes to mind: may you live in interesting times…

Monday, July 14, 2014

The Sixth Data Sense

Initially, I thought of writing about the importance of our ability to sense, or be receptive to, information, but then it occurred to me that I could take this further and break the concept into six dimensions of "data sensory" categories. So here goes:

Our data sense is our ability to pick up data signals, filter and analyze them into meaningful information. While all our human senses do this all the time biologically in our body, there is a relevant analogy here to the "business body".

Companies are often compared to a human organism. We talk about the ability of various units within a company to communicate with each other; their ability to respond to change, and how their internal design and components affect their effectiveness and consumption of resources.

Similarly, we can look at the "data sensitivity" of a company as a fundamental sensory capability: the effectiveness with which a company absorbs and integrates information into its processes undoubtedly affects its ability to cope with its environment. So we can talk about over-sensitivity (information overload), information deficiencies (disabilities in information processing) as well as information blindness (which may sometimes be aided by information sight-aiding tools, such as consultants and investors).

This now leads us to a natural need to define a set of categories which can be used to assess how "data blind" (or not) your business might be.

1. Information Blindness: The company is unable to translate information from its environment into a "normal" response to stimulus. There are a lot of aspects to consider in this definition, from where the failure occurs (from leadership to field agents) to possibly broken means of integrating data into decision making.

2. Information blind spots: The company is able to absorb and process some data, but is acutely deficient in processing certain types of information, or in responding to certain environmental conditions. Again, there is a multitude of factors that may lead to this condition.

3. Information blurriness: Information is absorbed, analyzed and translated into responses, but the details are inaccurate, and the actions lead to an over- or under-estimation of the required behavior. Here you could expect either an issue in correctly translating data into information, or flawed decision-making processes, which might allow for too much subjectivity or restriction in response.

4. Information vision delays: Here we are looking at time-to-market of information. How old is the data being used for decisions? How well is the company adapting to the information it has received? You can expect a large and "clunky" company to struggle in this category.

5. Information distortion: Information is misrepresented to the business, or the business is responding to the information through an incorrect interpretation of data. This is an issue with the way data is filtered and/or delivered to the business.

6. Information sensitivity: This category looks at how expensive the sourcing and integration of information is to the company. If the costs are too high, and the results are poor - we have a condition of low sensitivity. If the company consumes too many sources, and struggles to respond consistently to its environment, we may be dealing with information over-sensitivity.

These definitions, while presented here to stimulate thought, could in fact prove helpful in characterizing information sensory issues and their causes, which, when treated correctly, could help your business succeed.
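
One possible starting point, purely as a sketch, is a simple scorecard over the six categories. The scores below are invented; in practice they would come from interviews, incident logs and process reviews:

```python
# A hypothetical "data vision assessment" scorecard.
# Scores: 0 = healthy, 5 = severe impairment (all values here are made up).
assessment = {
    "information blindness":     1,
    "information blind spots":   3,
    "information blurriness":    2,
    "information vision delays": 4,
    "information distortion":    2,
    "information sensitivity":   3,
}

worst = max(assessment, key=assessment.get)
overall = sum(assessment.values()) / (5 * len(assessment))
print(f"overall impairment: {overall:.0%}; start with '{worst}'")
```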

So what are you waiting for? Why don't you give your business a data vision assessment based on the categories above?

Monday, June 30, 2014

The Internet of Things? No, of Attention

The technology is here. It has been here for a while. Ever since the first microchip was built, you could, in principle, buy the right components and "smart up" your life. But we never really went that extra mile. Why? Because it was not practical, economical or really applicable to our lives.

What has changed since those days? The economy of technology. Processing power has become so cheap, you can install it in throw-away items. In addition, our lives have grown to depend heavily on technology. You cannot move without checking your e-mail, social media, your pulse, your geo-location or your calorie-burn rate. We have become so accustomed to digitizing everything that it has become second nature, if not first.

So the internet of things, as coined by some big companies out there, is a slogan that, to my understanding, relates to the ability to "smart up" devices and consumer needs that have been more remote from the internet and from automated integration into other consumer services. Now while I completely agree that we can, are, and will continue to "smart up" our "technologies", I am more skeptical about the true value-maturation on the consumer services side.

Time for an example. Suppose you buy a smart refrigerator that can tell you that you're low on milk, or out of eggs. Sounds amazing. Check your phone while you are doing your shopping, and voilà - you know what you need to buy. Really? Here is the problem I have with this proposition: prioritization. How many communication channels do you have open (types of information sources)? How many content items (messages, articles, tweets etc.) do you follow? There is no point avoiding the elephant in the room: your attention has become expensive. It is probably the highest-value commodity of modern times. When you get the alert to get the milk, you become the perfect marketing candidate for discounts and product information related to milk and milk products. Why would you install an app that will divert your attention further? You already recognized that you are likely to go get groceries, and milk was probably at the top of your list.

Integrating such information into our lives will require either automation or a real value-add to the consumer. Besides the initial buzz around this "new" capability, there is actually very little value in outsourcing your mind further in terms of this type of planning. Perhaps in the business world (or not). You may be going on holiday and not need more milk, or you might have stopped buying it because you prefer a different product. Whatever the case, the amount of effort needed to configure your life to consume this type of integrated technology falls short of leaving you with a sense of real value-add. This may change in the future when the true semantic web matures, but I do not see this happening for another generation.

While we continue to evolve our lives into a digital maturity (Matrix - here we come), let's not forget that digital is just second to, based on and completely dependent on the real world. Now what kind of house can you build on a poorly maintained foundation? Nothing I would ever want to brag about.

My message to you is: build new technologies, but do not forget to respect the priorities of consumers in terms of real needs which will always prevail (refer to topics such as Maslow’s hierarchy of needs).

Sunday, June 15, 2014

YOUR data is NOT being investigated

Depending on your perspective, you might be happy or unhappy to hear this, but the reality is that no one besides yourself really cares or understands what information you own, or need to have in order to reach your goals. In essence it is as personal as your own mind.

People might share some of your information, but there is always a limitation on how well they are able to understand you or help you achieve your tasks. This may be due to differences in responsibilities, access to information, ways of processing information or personal agendas (theirs or yours).

But what about the information that pertains to you? While you may feel that you own your personal data, in reality you only own the information you create. Over other bits of information you may have certain rights (and responsibilities), and for some of the information you technically own, you may have to accept rights that others have to access it, or even change it.
 
I do not intend to discuss information security in this post, although it is a hot and very interesting topic. My goal here is to emphasize the notion of information perspectives.
 
When others consume information about you, or provide and receive information from you - they engage with the data from their perspective. This includes their authority, responsibilities as well as the quality and channel through which they interact with this information, and let’s not forget their knowledge and experience.

The point is that you have to consider all of this when you make assumptions or engage with other stakeholders. Some people, in fact, are acutely aware of this data perspective paradigm and build an entire business model around these facts. Unfortunately, this is quite common in deceptive behavior.

Your data is often either ignored or misunderstood, and you need to understand that, and also that you are likely to misunderstand other people’s data.

In our quest for an evolution of data management, we need to accept this reality, learn to steer through information misconceptions and find ways to effectively accelerate the achievement of common goals. There may be something we can learn from people who do this for a living today, and perhaps we can turn their deceptive behavior into a tool for learning how to better prevent these misuses and for the development of Teneo Vulgo.

So go on, investigate not other people's data, but rather how others' data is being understood and used.

Saturday, May 31, 2014

What is the (data) point?

We have a growing ability to measure and record our environment, and we are continuously increasing our data entropy. However, our data still converges to a single decision point. We integrate several pieces of information and derive a conclusion which is, at the end of the day, only as good as the data it is based on.

Moreover, with today's evolving technologies, sophisticated data analysis tools and machine learning capabilities, we see a further increase in the automation and filtering of data, as well as in decision making. Think, for example, about navigation systems, which give you optimized driving directions, taking into account route and traffic information. This makes accurate and well-managed fault tolerance levels more important than ever.

A simple yet powerful example of how vulnerable we still are to the outcomes of poor data management is the recent disappearance of flight MH370. As you may know, this event led to weeks of extensive searches and to what has been referred to as the most expensive search operation in history. Yet, while it is astonishing how little information was available to analyze, it is even more incomprehensible how ineffective the global community was in interpreting the data.

Now while it may be true that the lack of sharing of information related to the disappearance of the plane stems from political and other reasons, it does show how far we are from servicing some of the most basic collaboration needs across our species to act towards a common purpose. Furthermore, while it may also be true that when we have multiple sources of information it is easier for us to collaborate, it is precisely when we have little information that the true quality of our ability to work towards a common purpose is exposed. These are the situations that lead to sometimes critical decisions, which can have definitive and far-reaching implications for individuals and communities.

To me this screams communal data management immaturity. It is almost ironic that while technologies have evolved to manage petabytes of data at the speed of light, our ability to tap into the real power of information remains in its infancy, especially when we cross communities and cultures.

But do not despair. We are still in the dawn of Teneo Vulgo, and we all know that a journey has to be completed one step at a time. We need to work on strengthening our close communal data management, and work towards bridging communal knowledge across isolated groups.

In conclusion, with the increase of data entropy, and our increasing need to apply information quickly and effectively – putting our head in the sand is simply not good enough.

Let’s work towards evolving our existence into a new level of social consciousness, where the exchange of information across communities becomes a force that helps us reach common goals.

Now did I hear someone saying data is boring?

Thursday, May 15, 2014

Data Diet

I was contemplating the title for this thought and initially got concerned with the notion that a diet often refers to reducing or eliminating types of food from one's regular consumption. However, what seems to be a more popular definition of the term is: the usual, or regulated, foods and drinks consumed by humans or animals [http://www.thefreedictionary.com/diet].

First of all, I would like to affirm that it is not my intention to suggest that you should necessarily consume less data, or less of certain types of data. Instead, I want to focus on the analogy, and on one of my usual themes which is tying business value to business case and to data management.

Why do we diet? Well, for starters, we need food to survive. Secondly, depending on what we care about (longevity, appearance etc.), we may choose to regulate our diet to support our health and our fit with societal norms. Now while the first reason carries naturally into the realm of data management, the second seems emotional and perhaps disconnected from the topic of data. However, I would argue that this is false.

Here is why: data is ultimately managed by people, and confidence that the data is well managed leads to an increase in trust in the data, which results in cheaper and faster data certification and adaptation to change. If the business is plagued with ambiguities, inconsistencies in data quality and overly complex data delivery solutions, the time and effort needed to address data consumers' requests is substantially greater, since clarity and confidence must first prevail. If the semantics are well understood and quality is adequately controlled, the time it takes to understand changes and to collate information is significantly reduced.

Now there are many strategies for managing your diet, and there are many strategies for managing your data. Depending on your business appetite, your daily business demands and the advantages you want to gain from your business level of fitness and fit in the markets in which you operate, you are likely to have a different set of constraints and preferences.

Yes, ultimately you need to choose which data you will consume, at what quantity and which data you will avoid. However, whatever you choose - make sure it fits your budget and your business case.

At the end of the day, being data-obese or data-anorexic are probably both extremes that will harm your business.

So eat wisely, consume from all the source-groups that you need, and do not indulge in data if you are running the risk of data inefficiencies.

Now I suppose the next question is what does "Data exercise” mean in this context... but this is probably a good topic for another post.

Tuesday, April 29, 2014

Fixing e-Logical Data

How important is a Logical Data Model? On the one hand you have a database, which holds information used by many products and many users. On the other, you have custom queries populating reports requested by specialized teams.

Yet, there are commonalities and obvious flows which live in between the two, and while some data may be stored more efficiently in a normalized form, the further you drift from the business reality in terms of your data representation, the harder it becomes to manage it.

This is exactly where a Logical Data Model comes in. The purpose of the model is to handle the complexity of mapping many models to one, and vice versa. Ideally, it should assist in optimizing the data structure for both the common data model and the dimensional data models, as well as assist in the correct translation of the model to "physical" (implemented) structures.
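
As a toy illustration of the "many to one" mapping (every entity, attribute and table name below is invented), consider one logical Customer entity mapped to both a normalized operational store and a flattened reporting mart:

```python
# One logical entity, two physical representations.
# A real LDM would live in a modeling tool; this is only a sketch.
LOGICAL_CUSTOMER = ["customer_id", "full_name", "postal_address", "segment"]

# Normalized operational store: attributes split across tables.
PHYSICAL_OLTP = {
    "customer_id":    "crm.customer.id",
    "full_name":      "crm.customer.name",
    "postal_address": "crm.address.line_1",   # one-to-many in reality
    "segment":        "crm.segment.code",
}

# Dimensional reporting model: the same attributes, flattened.
PHYSICAL_MART = {
    "customer_id":    "dw.dim_customer.customer_key",
    "full_name":      "dw.dim_customer.display_name",
    "postal_address": "dw.dim_customer.address",
    "segment":        "dw.dim_customer.segment",
}

# The LDM is the single vocabulary both sides map to - change it once,
# and every physical mapping that disagrees becomes visible.
for attr in LOGICAL_CUSTOMER:
    print(f"{attr}: {PHYSICAL_OLTP[attr]}  <->  {PHYSICAL_MART[attr]}")
```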

So what is e-logical here? And what is broken that needs fixing? Simply put, the low priority given to the LDM is illogical, and this becomes pervasive in electronic data storage and processing. In particular, as models become larger and more complex, the implicit impact grows. Customers put pressure on the business to provide them with accurate and relevant information, while the technology side is focused on optimizing costs. The business hence sees little value in a "shared" data model, and the database people care mostly about performance and storage cost.

One might think this is trivial and that a logical data model always exists and is well managed. In reality, however, an LDM exists only in part, scattered across multiple products and teams. By chance, or by the nature of the business, you will get similarities, and sometimes by design you will get some good synergies. However, even subtle differences in semantics can lead to an enormous amount of time and energy spent on resolving miscommunication caused by over- or under-estimating the meaning and the appropriate usage of the data.

The ultimate goal of an LDM is full data standardization. To visualize what a mature LDM framework would deliver, consider the level of standardization that exists with electric power and communication cables. A manufacturer of a new electronic device refers to the existing standards, and even existing components, to ensure the product they create is compatible with the standard electric and auxiliary connections they wish to offer their users. It makes the product more useful and appealing.

In a mature and governed LDM, the same notion of appeal and usefulness applies. Certain conventions become the standard, and a certain level of quality can be expected. The internet, built on layered standards such as the OSI model, manages the transport of data, but it remains largely context-less as it focuses on connectivity and delivery, not meaning. As a more visible example, think about the ISO country code standards and natural language. Without a standard for English letters and agreed language rules and word meanings, you would not be able to read this thought.

Now imagine your organization working at that level of data standardization across customer information, product details and supply management. Sounds great… and expensive.

While in the long run it becomes less expensive and mutually valuable to all users, it takes time and effort to get there. What you can do in the meantime is keep the goal in mind and use opportunities to evolve and mature the LDM in your business and industry. While context will always augment your data into a dimensional model, the need to collaborate will push for standardization.

Tuesday, April 15, 2014

Breaking the Data Chains

Change is hard, and frankly, people do not generally like change. We all enjoy having our routines. The same route to work, the same familiar faces, smells and sounds. It gives us comfort knowing what to expect. Moreover, there are many instances where good habits and predictable behaviour are helpful in maintaining a well-functioning society.

But change can be good, especially when it is the result of a well-thought-out plan. You may want to improve your well-being and start doing more exercise; you may want to improve your financial well-being, learn new skills and improve your contribution to the business.

Data chains, however, refer to the phenomenon where an organization fears systemic risk to its data, and in effect avoids changes that may "rock the boat". It is of course natural and sensible to mitigate risks; however, this should not come at the expense of opportunities, as it can lead to loss of competitive advantage and optimization.

As an example, consider a company that handles customer information. When customers complain that the system limits the amount of information they can input into an address field, the company responds by providing interactive assistance in abbreviating parts of the address - for example, helping the customer abbreviate "street" to "st.", "drive" to "dr." and so on.

You do not have to be an information architect to appreciate the adverse impact on your customers. Imagine you are the customer, and you want to update your service provider with a new address. Will you spend 10-15 minutes working out how to fit your 150-character street address into a 50-character space? I know I got very irritated when I did that. This is an example of a data chain.

The designers and developers of the system decided to add dubious functionality instead of correcting the flaws in managing the data requirement appropriately. Don't even get me started on the poor interoperability and lifetime cost of maintaining this solution.
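
To see why this is a chain rather than a fix, here is roughly what such a workaround looks like in code - the length limit and the abbreviation table are invented for illustration:

```python
# The workaround: squeeze the address instead of fixing the schema.
ABBREVIATIONS = {"street": "st.", "drive": "dr.", "avenue": "ave.", "road": "rd."}
MAX_LEN = 50  # the real constraint lives in the data model, not here

def squeeze_address(address: str) -> str:
    """Abbreviate words until the address fits - lossy, irreversible, user-hostile."""
    words = [ABBREVIATIONS.get(w.lower(), w) for w in address.split()]
    squeezed = " ".join(words)
    return squeezed[:MAX_LEN]  # and when abbreviating is not enough, just truncate

# Every consumer downstream now has to guess what "st." originally meant,
# and any address longer than the limit is silently corrupted.
```

The chain is not the 50-character column; it is the layer of cleverness built around it, which every future change now has to carry along.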

People compromise on sub-optimal data management solutions with a short- to medium-term outlook. This is a disservice to the business, tying the organization down to complexity and higher systemic risk.

To liberate your organization from data chains, you must create a clear vision and a capability to guide workers towards that vision. This will allow you to decide decisively how to design and implement data solutions. And in the event that time and money are constraints (now when did that ever happen…), factor in the long-term implications of the solution, implement a short-term fix, but secure the funding and commitment to revisit the problem and ensure the adverse long-term impact is addressed.

Now go on, set your data free...

Monday, March 31, 2014

You’ve Got Data-mail

Since the dawn of mankind, there has been a need to relay messages between people who are unable to communicate directly. This has now evolved to the extent that often the two individuals who effectively communicate do not even know who the other person is. Basically, the communication has become a product of an indirect relationship based on social and economic norms.


But what has not changed is the basic building blocks of the communication. There is a medium (channel), a language (protocol), and various steps in disseminating the information from one end to the other. Now here is the interesting part: we have done this with messengers, smoke signals, postal mail and electronic mail. There is a common, fundamental set of principles here that has never changed, and in my humble opinion - never will.


For this reason, I believe that by using a simple set of concepts and their relationships (aka a model), one can and should be able to describe any system of information exchange. This will result in a simplification of data management by trivializing the reference framework. My personal preference for such a model would be to reuse the concepts already used in one of the most classical forms of delivering information, specifically the postal service. You can then apply the framework to any system of information exchange, by identifying how the teams, systems and processes map to the "information postal service". This will further support the evolution of governance and data quality control practices.


I see this model as comprising three levels, namely: the business contract (relating those who HAVE the information with those who NEED it, at an agreement level); the information services layer, which underlies the steps in delivering the data (think mail delivery services); and finally the service management goals, comprising the parameters, or sensitivities, that need to be managed in order to ensure that the services operate efficiently and deliver the appropriate level of quality.


Other terms that come to mind include: "posting", "packaging", "gathering", "sorting", "distributing", "delivering" and "collecting" the data. As I mentioned, what these mean in your information exchange system will depend on how you design, configure and run the "system".
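
Purely as a sketch of how the three levels might hang together - the level and stage names mirror the post, everything else is illustrative:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class BusinessContract:
    """Level 1: relates those who HAVE the information to those who NEED it."""
    provider: str
    consumer: str
    quality_target: str  # e.g. "daily, complete, <1% rejected records"

@dataclass
class InformationService:
    """Level 2: one step in delivering the data (think mail delivery services)."""
    stage: str  # "posting", "packaging", "gathering", "sorting", ...
    handle: Callable[[object], object]

@dataclass
class ServiceManagement:
    """Level 3: the parameters watched to keep the services at the agreed quality."""
    metrics: List[str] = field(default_factory=lambda: ["latency", "loss", "rejects"])

def deliver(payload, services: List[InformationService]):
    for svc in services:  # the payload moves stage by stage, like a parcel
        payload = svc.handle(payload)
    return payload
```

Mapping your own teams, systems and processes onto these stages is the exercise; the model itself stays the same whether the "parcel" is a letter, a file or a message on a queue.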


So the next time when you send or receive an e-mail, or post a message on someone’s social media channel,  just think for a moment how your information HAVE (or NEED) relates to other forms, formats and volumes of information exchange.


End of transmission...

Saturday, March 15, 2014

Why Manage Vertical Data Lineage

Vertical data lineage refers to the alignment, appropriateness and visibility (in other words, the health of the connection) between the physical data models, the technical capability and the business processes and objectives across the entire enterprise architecture stack (see TOGAF). While technology lives solely for the purpose of supporting the business, too often it is out of touch with what the business really needs.

There are several reasons for this ailment, including lack of strategic planning (resourcing), subjective decisions over technical capabilities (politics) and poor management (skills). These sound like great areas to work on in order to evolve your business, but let's review how those issues affect the optimal use of information in the business.

When you choose an inappropriate technology or method, particularly on the data management side of things, you increase your data-business distance. This means that you have weaker control over how your data supports your business. Not only does your business struggle to get the right information to the right people on time, the technology group struggles to fit square requirements into triangle-shaped technical solutions. This in turn increases what I call "data wrinkles" (work-arounds which are cumbersome, unnecessary and expensive data flow solutions), which then lead to a natural increase in risk and operational costs. A classic example would be managers deciding to migrate data to a new platform based on a limited and/or subjective view of the solution's capabilities and its true ability to answer the business requirements (and in case you didn't know: migrating to a newer platform does not constitute a business requirement). It is more likely, in these cases, that the real issues stem from problems in process design or gaps in data governance.

In order to ensure your technology is driven to support the information requirements of your business, you need to charge someone with precisely this task. This might sound like a trivial statement, but do you actually have someone in your business who has this objective on their performance contract? This role involves understanding the information needs of the business and ensuring that processes are designed using well-selected data models. The models need to be adopted by everyone in the business to minimize data entropy and data wrinkles, and the technology decisions need to be in line with the technology, business and data strategies.

If you do not have someone in your business acting as an Enterprise Data Architect - I would strongly recommend you get someone assigned to these duties.

Friday, February 28, 2014

Data Pragmatism

In any kind of business, pragmatism demands a balance between quality, cost and time. This applies to data management as much as it does to any other type of effort.

When you need statistical significance and to understand trends, it seems almost intuitive that you do not need the highest quality of data. However, one has to be careful. Some dimensions of the quality will be critical to your usage, others will need to comply with some boundary rules, and other dimensions might be completely irrelevant. It all depends on the impact, or sensitivity, of your measure to those dimensions.

So to be pragmatic, you need to consider these sensitivities and decide what you can, or should accept from a practical perspective. I know that ideally, having beautiful, defect-free data sounds like a dream come true (sorry, but I am a data geek!), but you may have certain objectives which may not care about being data-perfect at all.

For example, consider a sample of data depicting the volume of traffic on a road at various days and times in the week. The precise number of cars may be irrelevant if you are evaluating how well the roads are designed to handle the traffic, or it may be critical if the road is a toll road and you need to bill the road users. Even then, it may be worth investing in cheaper license plate recognition technology that identifies 90% of the road users, rather than buying and maintaining a system that gives you a 99% success rate but costs a lot more in the long run. It all depends on the practical constraints of your operation.
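
To make the trade-off concrete, here is the back-of-the-envelope arithmetic, with every number invented for illustration:

```python
# Hypothetical numbers: which plate-recognition system pays off for a toll road?
vehicles_per_year = 10_000_000
avg_toll = 5.00

cheap = {"capture_rate": 0.90, "annual_cost": 1_000_000}
fancy = {"capture_rate": 0.99, "annual_cost": 6_000_000}

for name, system in [("cheap 90% system", cheap), ("fancy 99% system", fancy)]:
    revenue = vehicles_per_year * system["capture_rate"] * avg_toll
    net = revenue - system["annual_cost"]
    print(f"{name}: billed ${revenue:,.0f}, net ${net:,.0f}")

# cheap 90% system: billed $45,000,000, net $44,000,000
# fancy 99% system: billed $49,500,000, net $43,500,000
```

With these made-up numbers, the last 9% of capture costs more than it recovers; double the toll and the fancy system wins. The right data quality target is a business calculation, not a technical reflex.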

The larger the data, the more important are the boundaries (or thresholds), rather than the exact value, while for small data, every bit counts (literally).

At the end of the day, what matters is what you do with the data. As the old saying goes: knowledge is power, but how you apply it makes all the difference.

Saturday, February 15, 2014

Why Lack of Data Management is So Pervasive

People invest their time in what they believe is worthwhile. This is an overloaded statement. What is the definition of worthwhile? What affects the belief that certain outcomes will transpire?

A business, or any organization for that matter, is driven to realize a certain vision. When you look at what companies state as their mission, you find things like: to be the best...; to serve our clients...; to create...; to change...; it is all about making a difference and effectively dominating the servicing of a need. This is no secret - fulfill a certain need or want and be rewarded in return. These are the basic principles of trade.

So then we ask: why care about data? Even customer experience is heavily skewed by emotions and perceptions. So how much does the control of the data really matter? The answer is - as much as it affects the fulfillment of the mission.

Usually, new businesses capitalize on being rewarded for delivering to a need that has not been serviced before, or that is in great demand. Therefore the ability to deliver overrides efficiency and even quality (to an extent).

As the market for those needs matures, the organization finds itself crossing into a different realm, where efficiency and quality become more important. New competitors arrive; the market becomes more demanding and so forth.

By the time this happens, the servicing organization has grown to focus mostly on delivering speed and volume. The size of the organization has increased and priorities have started to shift. This movement increases the entropy of the organization, and in most cases the impact on data management is ignored. Management and operational efficiencies enjoy all the attention, and poor data becomes an unfortunate reality.

This might be very difficult to avoid, and that is why the lack of data management is so pervasive to begin with.

But why does it linger?

We realize we have a need to improve the data, and we start talking about a data strategy. In the common and unfortunate cases, organizations fall into believing you can fix the data by upgrading systems and cleaning the data. If you've been in the data management business for a bit of time - you know this is not enough.

So what's next? Data governance. Ah..... 99% of your organization runs away. More admin? More boring meetings? We don't have time to talk about the bad data - we just need to fix it.

So here starts the true challenge of data management: fix the data now, but make it last forever. Even without commitments to co-stewardship, make sure it lives across the entire value chain. This probably sounds familiar, but do not despair.

Realization and appropriate investment should prevail. With the growing maturity in data management and in the worthwhileness of common information consciousness, I am confident that the sensitivity and naturalization of data stewardship will prevail.

In the meantime, keep your eyes on the goal but make sure to stay pragmatic so that you remain relevant.

Friday, January 31, 2014

Metadata cognition

The good news is that our ability to manage our personal metadata has improved. The bad news is that our ability to work with a shared metadata has not.

It has become a lot easier to manage your personal multimedia collection, your address book and even the filing of bills and mail. Virtually all our data can be easily accessed on, and synchronized across, various devices. But the wheels fall off when you cross into a metadata parallel universe, or in other words, when you switch from one person to another. The way I organize my data is not the same as how you do it.

Now don't get me wrong, I am a big fan of diversity and individualism. The problem is that when you need to collaborate, you have to work from a common basis, which includes semantics and collaboration platforms.

Language, which is one of my favorite examples of a standard, is a fundamental tool used by animals and humans to communicate. Without a pre-defined and agreed set of rules it is impossible to communicate. You need to have a shared set of meanings represented by symbols (visual, audible, etc.).

However, it is also a well-known principle that a degree of freedom is necessary for a machine to operate optimally. Of course, if the parts are too tight, the machine cannot move at all, and if the degree of freedom is too large, excessive wear and tear can occur. So in practice, the degree of freedom is in effect a reflection of the level of control the operator, or manager, of a "machine" decides to apply.

In metadata management specifically, it is not uncommon to apply a wide brush-stroke approach. There is either too much or too little metadata management. This simply indicates that not enough effort has gone into designing and controlling it. And the cost? Information risk, whether through opportunity loss, information liabilities and/or productivity loss.

Any good data management program must take into account the impact of doing too little, or too much, in controlling metadata. Ask yourself: how much time is spent resolving issues rooted in misunderstandings? How compatible is your metamodel with external standards?

Semantic coherence and metadata cognition are simply modern terms for the classic story of the Tower of Babel. And I wonder if we will ever learn...

Wednesday, January 15, 2014

The Data-Business Distance

One of the things every data management professional will notice when they start working in an environment is what I like to call the Data-Business (DB) distance. This is the degree of separation the realities of the day-to-day data design and operations have in relation to the business value of the products or services the business provides.

In a small business, especially a new one, you will find a very short DB-Distance. The innovations are tightly coupled to customer needs, the revenue streams are volatile, and every little decision can potentially make or break the business. So naturally, the small team running the business is highly collaborative and sensitive to the impact each decision has on the business.

When the business grows, more people join the company, higher degrees of separation are introduced by creating more distinct layers of responsibility, and along the way, the distance between technical decisions around data management and the end customer grows. This is a major problem for a lot of organizations.

Take, for example, any sizable company where the technology department includes an infrastructure team, a database or warehousing support team, and a separate project team with analysts, architects and developers. Now inject all these different people and domains with specific outlooks on how to provide better service and efficiencies (think strategy, architecture and governance). You will quickly realize that the degree of separation between the customer experience and the technical implementation decisions is overwhelmed by many push-and-pull, political and otherwise domain-specific "machines".

To address this challenge, data management professionals, who must look through these layers, and protect the interest of the business value of the data, have to become creative, and get strong support from their managers and project sponsors.

How do you do this? Firstly, you need to recognize how far your data (management and design practices) is from the business value. You need to consider how many service component layers you have between the consumption and distribution of the data (is the scope of your project correct?); then you need to consider the strategies and governance structures that might divert your attention from ensuring the data will deliver its value promises (is the mandate to impact existing control structures correct?); then, you need to consider the people and mind-sets that you might need to overcome (or bypass) in order to persist the correct changes to the environment (do you have the right people change management component in place?).

Once you identify and qualify the DB-distance in terms of the dimensions mentioned above, your next step is to play out practical scenarios on how these dimensions can affect the outcome of the effort to improve the customers' experience.  This can end up looking quite scary and pessimistic, but the purpose is to build a strong case to empower the right mechanism to streamline the data value through these layers. By establishing a focused collaboration that is designed to reduce the distortion of the DB-distance layers - you can create an effective control mechanism, which serves as a management tool to help you ensure data-business success.

So what are you waiting for? Go identify your specific DB-Distance factors and do not shy away from pragmatic and real forces at work. Play the scenarios, get support and create a valuable tool for your business to control the distortions of your data-business distance.