Big Data: A Tool, Not an Answer
Big Data is nothing unless the data is properly digested, correlated, matched, and transformed across systems.
Consider the data, both historical and recently collected, about your network's status and use. Consider the data collected from social media applications. Consider the data collected in your call center. All this data together can produce Big Data.
The data is nothing until it is analyzed so that conclusions can be gleaned and its meaning determined. Most people speak of Big Data's value in helping businesses make predictions.
I most recently encountered a discussion of Big Data in Nate Silver's book "The Signal and the Noise: Why So Many Predictions Fail--but Some Don't." You may be familiar with Silver because of his accurate predictions in the 2008 and 2012 national elections. He had a nearly perfect record predicting how states would vote for president and which candidates would win Senate seats. He also predicted that the Seattle Seahawks would win the Super Bowl (though he didn't predict the blowout that the game turned out to be).
In the introduction to his book, Silver states, "Data-driven predictions can succeed--and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves."
New Correlations Can Be Made
Smaller data sets may be limited to known sets of elements, but when we can use Big Data analysis, bigger questions can be asked.
For example, a smaller data set can be used to analyze network traffic, and the results can then be used to predict future network requirements and performance. Now take another smaller data set, social media traffic, which ordinarily may seem less predictable. Correlating social media activity with network information may provide a better view of the impact of social media traffic--different social media events can produce very different traffic issues.
This is an example of multiple smaller data sets being combined to give network architects and operators greater insight, so they can be prepared for social media's impact. Still, even this broader data analysis cannot really be considered Big Data.
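The correlation described above can be sketched in a few lines of Python. This is only a minimal illustration, not a method taken from any particular tool: the function name `pearson` and the hourly sample values are invented for the example, and it assumes both series were sampled over the same time intervals.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical hourly samples, invented for illustration only.
social_posts = [120, 150, 900, 2400, 1100, 300]  # posts mentioning an event
traffic_gbps = [4.1, 4.3, 6.8, 11.9, 7.5, 4.9]   # network load in the same hours

r = pearson(social_posts, traffic_gbps)
print(f"correlation between social activity and traffic: {r:.2f}")
```

A coefficient near 1 suggests the two series rise and fall together, but it does not by itself show that one causes the other; that judgment still rests with the analyst.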
Big Data Defined
Big Data is defined as a collection of large, complex data sets that are difficult to process using most current database management tools or traditional data processing applications. The growth of large data sets has been driven by the fact that a single large set of related data yields more information under analysis than multiple small sets that add up to the same total amount of data. The correlations unearthed and the conclusions drawn from effective analysis have a wide range of applications, from identifying business trends to preventing disease.
Five Factors of Big Data
There are five factors that make working with big data difficult:
* Quantity--The amount of data produced is expanding rapidly. Data comes in both structured and unstructured forms, an example of the latter being social media. The declining cost of data storage has stimulated the collection and retention of more data than ever.
* Delivery Speed--The rate at which data arrives demands rapid processing. If data is left unanalyzed, its value may decrease to the point where it is only historical data, too old to be useful for forming predictions.
* Variability--Data is not created in a smooth, steady pattern. Surges in data volume may be caused by unexpected events as well as periodic ones.
* Many Formats--It would be nice if all the data were in a common format, but this is rarely if ever the case. The variety of formats is already significant, with new formats introduced every year. This makes analysis that much more complicated.
* Many Sources--Attempting to connect, link, match, and transform the data is quite a task. If correlations cannot be made, data relationships will be fragmented and the end result will be a loss of control.
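To make the "Many Formats" and "Many Sources" factors concrete, here is a small Python sketch of transforming records from two unconnected systems into a common key and then matching them. Everything in it is an assumption made for illustration: the CSV and JSON samples, the field names, and the helper functions `load_calls`, `load_latency`, and `match` are hypothetical and do not come from any specific product.

```python
import csv
import io
import json

# Hypothetical exports from two unconnected systems (invented sample data).
CALL_CENTER_CSV = """customer_id,call_date,issue
C001,2014-02-03,slow connection
C002,2014-02-03,billing
"""

NETWORK_JSON = """[
  {"cust": "c001", "date": "2014-02-03", "avg_latency_ms": 310},
  {"cust": "c003", "date": "2014-02-03", "avg_latency_ms": 45}
]"""

def load_calls(text):
    """Transform CSV rows into records keyed by (customer, date)."""
    rows = csv.DictReader(io.StringIO(text))
    return {(r["customer_id"].upper(), r["call_date"]): r["issue"] for r in rows}

def load_latency(text):
    """Transform JSON objects into records keyed the same way."""
    return {(r["cust"].upper(), r["date"]): r["avg_latency_ms"]
            for r in json.loads(text)}

def match(calls, latency):
    """Link records that share a key; report the keys that cannot be matched."""
    matched = {k: {"issue": calls[k], "avg_latency_ms": latency[k]}
               for k in calls.keys() & latency.keys()}
    unmatched = sorted(calls.keys() ^ latency.keys())
    return matched, unmatched

matched, unmatched = match(load_calls(CALL_CENTER_CSV), load_latency(NETWORK_JSON))
print("matched:", matched)
print("unmatched:", unmatched)
```

Even in this toy case, the two systems disagree on field names and on the letter case of the customer ID, and some records have no counterpart in the other source. Reporting the unmatched keys, rather than silently dropping them, is one way to see how fragmented the relationships are.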
Warnings When Working with Big Data
Three conclusions can be drawn from Silver's book:
1. The data collection process should be thought through well. Do not be impatient. Experience should be gained in steps, not all at once. Make sure you collect critical data. However, collecting massive amounts of data does not mean that analysis will automatically be fruitful.
2. Though you need to move quickly to take advantage of the analysis while the data is still fresh (see "Delivery Speed" above), there is risk in launching projects that are not well thought-out. You want to produce the conclusions ASAP, but too many false starts and aborted projects will only delay the end results.
3. IT has to keep its environment running smoothly. Be aware that IT has limited resources, and analyzing Big Data is another drain on these resources. Do not let the Big Data effort cause IT problems. On the other hand, don't pursue Big Data analysis without the support of IT.
Big Data is nothing unless the data is properly digested, correlated, matched, and transformed across systems. A problem for the staff in charge of Big Data is that most of the collecting systems are not connected--nor are they designed to be connected. Analyzing collected data will likely require training, experience, new system connections, and a lot of feedback to determine whether the right data was collected and whether the right questions were asked. Otherwise, the predictions will be useless and possibly dangerous to the health of the enterprise.
An excellent article on the subject from The New York Times is "The Age of Big Data."