Big Data

Tech buzzwords come and go as quickly as Justin Bieber gets into trouble, which is rather fast! The problem is that most of the buzz is short-lived and over-hyped, which leaves us, the not-so-techie people, wondering what’s next and how to prepare for it. So, are you ready for the next, greatest, most-bestest buzzword of today? Here you go...

“Big data” is the buzzword that everyone is talking about today. However, very few people know exactly what “big data” is and, more importantly, what to do with it. In this series of articles, we will explain what “big data” is, as well as how to get your organization’s data and technical infrastructure ready for it.

“Big data” is really nothing more than organizing as much data as possible and then applying statistical methods to find patterns and new information in that data. This new information is then used to make business decisions and reach desired business outcomes. It should be noted that none of this is new; plenty of organizations have applied statistical analysis to their business practices to improve efficiency and reduce costs. What is novel today is the scale of the data sets the analysis is run on. So much business is now transacted electronically that there is an accessible data record of virtually every aspect of an organization’s business. Beyond that, automation and the move to electronic internal record-keeping make it possible to correlate internal data from different business units directly (think HR data on employee commission structures, financial data on accounts payable and costs, Outlook schedules and meeting lengths). This enables a company with a well-designed infrastructure to examine virtually every aspect of its business in relation to efficiency and, ultimately, production.
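To make "applying statistical methods" a little more concrete, here is a minimal Python sketch of one of the simplest such methods, a Pearson correlation between two columns that come from different business units. The numbers and the names `meeting_hours` and `units_produced` are invented purely for illustration, and a real analysis would involve far more data and care than this.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / sqrt(var_x * var_y)


# Invented example data: one value per team.
meeting_hours = [2, 4, 6, 8, 10]        # e.g. taken from calendar data
units_produced = [50, 46, 41, 37, 30]   # e.g. taken from production data

r = pearson(meeting_hours, units_produced)
print(f"correlation between meeting hours and output: {r:.2f}")
```

A value near -1 or +1 suggests the two columns move together, while a value near 0 suggests no linear relationship. Keep in mind that a correlation is only a pattern worth investigating, not proof that one thing causes the other.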

And that’s where the real struggle with “big data” lies: a well-designed data infrastructure. More often than not, the data for each aspect of the business lives in a separate system. Usually, HR has one data-storage system, finance has another, and the company’s transactional data sits in yet another. The difficulty is two-fold: getting all the data from these different systems into one place so that it can be looked at together, and designing a database structure that gives the data a common representation so that the data sets are compatible. The process of moving data out of one system and consolidating it in another is usually called “ETL”, which stands for “Extract, Transform, and Load”. It describes extracting a given data set from the system it resides in, transforming it into a common standard that all of the other data will conform to, and then loading it into the ultimate destination system, usually a data warehouse. Once that’s done, the next hurdle is designing a database or warehouse system that can accommodate all the data and actually allow analysis to be run on it.
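For readers who like to see things in code, the three ETL steps can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the HR records, the finance CSV export, the column names, and the table layout are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source data; a real job would query or export the live systems.
HR_ROWS = [
    {"EmpID": "E-001", "Name": "Ada Smith", "HireDate": "03/15/2019"},
    {"EmpID": "E-002", "Name": "Bo Jones", "HireDate": "11/02/2021"},
]
FINANCE_CSV = "employee,amount_usd\n001,1200.50\n002,830.00\n001,99.50\n"


def extract_finance(text):
    """Extract: read the finance CSV export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform_hr(rows):
    """Transform: numeric employee id and ISO 8601 hire date."""
    return [
        (
            int(r["EmpID"].split("-")[1]),
            r["Name"],
            datetime.strptime(r["HireDate"], "%m/%d/%Y").date().isoformat(),
        )
        for r in rows
    ]


def transform_finance(rows):
    """Transform: same numeric employee id as HR, amount as a number."""
    return [(int(r["employee"]), float(r["amount_usd"])) for r in rows]


def load(conn, employees, payments):
    """Load: write both cleaned data sets into the shared destination."""
    conn.execute(
        "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, hire_date TEXT)"
    )
    conn.execute("CREATE TABLE payment (emp_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", employees)
    conn.executemany("INSERT INTO payment VALUES (?, ?)", payments)
    conn.commit()


def run_etl():
    """Run extract, transform, and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform_hr(HR_ROWS), transform_finance(extract_finance(FINANCE_CSV)))
    return conn


def total_paid_by_employee(conn):
    """Query across both sources, which only works because the ids now match."""
    return conn.execute(
        "SELECT e.name, SUM(p.amount) FROM employee e "
        "JOIN payment p ON p.emp_id = e.emp_id "
        "GROUP BY e.emp_id ORDER BY e.emp_id"
    ).fetchall()


if __name__ == "__main__":
    print(total_paid_by_employee(run_etl()))
```

Notice that the transform step does most of the work: the two sources write the same employee id differently ("E-001" versus "001"), and the join at the end is only possible once both have been converted to one common form.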

Ideally, the data warehouse is designed as the various other systems are brought online, but this is rarely the case. Analytics is usually something a company decides it wants to do after it has already accumulated all the data. In the next few articles in this series, we will cover the different design considerations for transactional and warehouse database systems, common ETL tools and routines, and the process of data cleaning, and then dive into some of the common pitfalls of data-warehouse design and how to avoid them.

So now that you know what big data is, do you have any big ideas about the next trend, hot buzzword, or new technology to knock us off our feet? Let us know...
