Scrumptious Open Source Cuisines in the Big Data Landscape 2016
A food critic, a traveler or an adrenaline junkie would never limit themselves to a single cuisine. Food is not just a day-to-day habit. It is a culture and a depiction of the place you come from. If you want to get involved with a different community of people, you've got to taste their food first, and that is why a backpacker will never hold back from tasting a different cuisine on the other side of the world.
In this part of the big data landscape, I would love to showcase the most scrumptious open-source cuisines from around the world. How about that?
Framework: An Italian Pizza
In the big data context, these umbrella frameworks provide distributed storage and distributed processing of very large data sets in a computing environment built from commodity hardware. They typically include modules such as a distributed file system, a job scheduler, a resource manager, a streaming data processor and MapReduce.
- Hadoop HDFS
- Hadoop MapReduce
- Apache Kylin
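To make the MapReduce module a little more concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps, using the classic word count. It is only an illustration in plain Python: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this post and are not Hadoop APIs, and a real framework would spread each step across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on a list of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data pizza", "Big data duck"])
print(counts)
```

In a distributed setting the map calls would run where the data blocks live and the shuffle would move data over the network, but the shape of the computation stays the same.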
Query/Data Flow: The Chinese Roasted Duck
These are query engines that let you structure data and query it using a SQL-like language. For instance, Google Cloud Dataflow is a unified programming model and a managed service for developing and executing a wide range of data processing patterns, including ETL, batch computation and continuous computation.
- Apache Hive
- Apache Drill
- Google Cloud Dataflow
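For a taste of what "structuring data and querying it with a SQL-like language" looks like, here is a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in, since none of the engines above can be assumed to be installed; the `orders` table, its columns and the `top_cuisines` helper are invented for illustration. An engine such as Hive would accept a very similar aggregate query, but would run it over distributed data.

```python
import sqlite3


def top_cuisines(rows):
    """Load (cuisine, amount) rows into a table and total them per cuisine."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE orders (cuisine TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT cuisine, SUM(amount) AS total "
        "FROM orders GROUP BY cuisine ORDER BY total DESC"
    ).fetchall()
    conn.close()
    return result


rows = [("pizza", 12.0), ("sushi", 20.0), ("pizza", 9.0)]
print(top_cuisines(rows))
```

The point is the declarative style: you describe the result you want and the engine decides how to compute it.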
Data Access: A French Baguette
This category comprises (a) non-relational, distributed, scalable, high-performance big data stores such as HBase and MongoDB, and (b) frameworks that facilitate the collection and storage of data in real time, such as Flume and Kafka.
- Apache HBase
Coordination: A Spanish Prawn
Data coordination is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data coordination solution delivers trusted data from a variety of sources.
- Apache ZooKeeper
- Apache Ambari
Real-Time: Japanese Sushi
Real-time analytics is the use of, or the capacity to use, all available enterprise data and resources when they are needed. It consists of dynamic analysis and reporting, based on data entered into a system less than one minute before the actual time of use.
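The "less than one minute" idea can be sketched as a sliding window that only keeps recent events. This is a toy, single-process illustration: the `SlidingWindowCounter` class and its methods are hypothetical names, timestamps are plain seconds passed in by the caller, and events are assumed to arrive in time order.

```python
from collections import deque


class SlidingWindowCounter:
    """Keeps only the events recorded within the last `window_seconds` seconds."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record a new event and drop anything that is now too old."""
        self.events.append((timestamp, value))
        self._evict(timestamp)

    def _evict(self, now):
        # An event stays only while its age is strictly less than the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            self.events.popleft()

    def total(self, now):
        """Sum of the values of all events still inside the window at `now`."""
        self._evict(now)
        return sum(value for _, value in self.events)


window = SlidingWindowCounter(60.0)
window.add(0.0, 5)
window.add(30.0, 7)
print(window.total(59.0), window.total(61.0))
```

Real streaming engines add partitioning, out-of-order handling and fault tolerance on top, but the windowing idea is the same.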
Stat Tools: Indian Veg Curries
Statistics is an important part of big data analytics: it is needed to build and interpret appropriate models for data that is usually huge and complicated. This covers a wide collection of data mining and machine learning topics, ranging from regularization, support vector machines and boosting to more recent topics such as network analysis, recommendation systems and digital advertising. These tools make it easy to implement such concepts and are specifically capable of handling mammoth data volumes.
Machine Learning: A Greek Salad
Machine learning delivers on the promise of extracting value from big and disparate data sources with far less reliance on human direction. It is data driven and runs at machine scale. It is well suited to the complexity of dealing with disparate data sources and the huge variety of variables and amounts of data involved. And unlike traditional analysis, machine learning thrives on growing datasets. The more data fed into a machine learning system, the more it can learn and the higher the quality of the insights it produces.
- Apache SINGA
- MADlib
- scikit-learn
Search: A Thai Coconut Soup
Search analytics helps website owners understand and improve their performance on search engines, for example by identifying highly valuable site visitors or understanding user intent. Search analytics includes search volume trends and analysis, reverse searching (entering websites to see their keywords), keyword monitoring, search result and advertisement history, advertisement spending statistics, website comparisons, affiliate marketing statistics and multivariate ad testing.
Security: Mexican Tacos
Security Intelligence with Big Data provides exceptional threat and risk detection, combining deep security expertise with analytical insights on a massive scale.
- Apache Ranger
Visualization: An American Hotdog
Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs.
How was this mouth-watering range of open-source international cuisines? I hope you grabbed a few bites. Stay tuned for the last portion of the Big Data Landscape 2016.