Taming Big Data
We all know the buzzword in the technology world today: “Big Data”.
Every technology discussion forum, academic or corporate, features ‘Big Data’ as the next big thing that can bring disruptive change to the way we conduct business.
Big Data analytics, applied to huge data sets such as social media interactions and online click-throughs, is helping organizations navigate the digital universe and gain better insights into their operations and their customers.
The projected market size for Big Data-related solutions and services is bound to interest any technology services provider as a potent business opportunity.
If you are such a provider, one detail that has surely caught your attention is the projected growth of the big data market.
If you have ever wondered how to catch the big data wave, here is something I would like to share. As a technology company interested in analytics, our primary problem when we wanted to get into big data was: how do we try out the big data technologies? Since almost all the technologies that make up the big data ecosystem are open source, license procurement was not a problem. The issues were more about:
- What problem are we solving, and who would give us the use cases?
- How do we get access to the volumes of highly variable data that big data is expected to handle?
The path we took was to resolve the second problem first. We searched for sources of big data dumps and finally zeroed in on the U.S. weather data sites. These sites carry huge dumps of different forms of weather data dating back 30 years. They also provide a neat data dictionary and documentation. We downloaded the data and worked out which use cases could be developed with it as the raw input. We came up with analytics linking the weather data with U.S. GDP and population. We successfully completed this first venture, and it gave us the confidence to delve deeper into big data.
For the second proof of concept, we once again decided to use information available on the web, and the site we chose was Stack Overflow. This site publishes huge datasets in XML format that can be the source for many forms of analytics. Using these datasets, exploiting Mahout, and developing MapReduce routines in Java, we built many interesting analytics, such as finding the most active tags and members.
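To give a flavour of the "most active tags" analytic, here is a minimal sketch of the map and reduce phases in plain Java, with no Hadoop dependency. It is not the code we ran. It assumes each post is a `<row .../>` element whose `Tags` attribute holds XML-escaped tags such as `&lt;java&gt;&lt;hadoop&gt;`; the class name `TagCount` and its methods are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: mimics the map and reduce phases in memory.
class TagCount {
    // Assumed row layout: <row Id="1" Tags="&lt;java&gt;&lt;hadoop&gt;" />
    private static final Pattern TAGS_ATTR = Pattern.compile("Tags=\"([^\"]*)\"");
    private static final Pattern TAG = Pattern.compile("&lt;(.+?)&gt;");

    // Map phase: emit one (tag, 1) pair for every tag found on a row.
    static List<Map.Entry<String, Integer>> map(String row) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        Matcher attr = TAGS_ATTR.matcher(row);
        if (attr.find()) {
            Matcher tag = TAG.matcher(attr.group(1));
            while (tag.find()) {
                pairs.add(new AbstractMap.SimpleEntry<>(tag.group(1), 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the emitted counts for each tag.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Driver: run the map phase over every row, then reduce the combined output.
    static Map<String, Integer> countTags(List<String> rows) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String row : rows) {
            pairs.addAll(map(row));
        }
        return reduce(pairs);
    }

    // Pick the tag with the highest total, or null when there are no tags.
    static String mostActiveTag(Map<String, Integer> totals) {
        return totals.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }
}
```

In a real Hadoop job the same two steps would live in a Mapper and a Reducer, and the framework would handle grouping the pairs by tag across machines; the in-memory list here only stands in for that shuffle.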
These two proofs of concept were our stepping stones into the big data domain. Thanks to weather data and Stack Overflow, we were able to explore big data technologies, win the confidence of our customers, and add greater value to their business intelligence systems.