Big data is a joke. Well, not in the sense that it is unimportant or that we cannot extract value from it, but rather in the sense that it is actually small, relatively speaking. You see, it is merely the tip of the iceberg when it comes to the total information that exists in the universe. Our ability to process information, even what we call big data, is insignificant compared to the amount of data the world around us is processing. If we were living in The Matrix, just imagine the processing power and the incomprehensibly large data stores that would be required. Now, if you know a bit about modern physics, you have probably come across the double-slit experiment. Its finding, that light behaves both as a wave and as a particle, has led some to wonder whether there is some form of communication between the different light particles. Regardless of how you believe the world was created, there is no doubt that there is an ongoing connection between large objects such as planets and minuscule particles such as atoms, quarks, strings, or whatever sits at the bottom of the physics chain.
Continuing with physics (apologies, it is one of my pet topics): the second law of thermodynamics states that the entropy of an isolated system never decreases. This means the level of disorder in the universe is ever increasing. So whatever information the universe had to process when you started reading this post has already increased immensely by the time you got to this sentence. Just think about the amount of information that exists in this ever more disordered universe: it is mind-boggling!
So yes, we can track the GPS locations of billions of users, calculate correlations between distant measures and generate previously unheard-of predictions based on relationships between seemingly unrelated phenomena. But is the universe really one perfectly orchestrated data management system?
The answer is yes, and, simply put, it is orchestrated by the laws of nature. What defines a measure of control is a frame of reference or, in other words, patterns. You can only control something that has some level of predictability. Even randomness can be regarded as a pattern, since it operates within a predictable range (the normal distribution is the classic example). So what am I saying here? Data management is a set of laws that help us describe, understand and control information. Furthermore, I believe there is a valid case for comparing information management with the theories of physics. Science has learned that different theories fit different scales. While Newtonian physics is well suited to describing everyday, human-scale realities, quantum physics best describes interactions at the atomic level. Whilst widely different, each naturally gains or loses validity as you change the scale of the phenomena you observe.
So my point is that big data (which is all about statistics, complex structures, correlations and finding patterns) requires one set of rules, whilst "little data" (which is all about quality and insight) requires a completely different set. The two sets should gain and lose validity as you change the scale of the phenomena you are observing, and together they should provide the simplicity and beauty that help us understand and appreciate how information is exchanged between different stakeholders.