Big Data is also the focus of one of our “Collaborate 13 Outtakes” that we are happy to bring you now.
Big Data and Hadoop Updates
Big Data and Hadoop are part of an ecosystem. When we talk about Big Data and Hadoop-related solutions, we come across many names: Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, HCatalog, Ambari, Mahout, Flume, and so on. You can combine one or more of these to create a specific solution.
Big Data and Hadoop systems are becoming more stable, and we’re seeing adoption spread as a result. More vendors now offer Big Data platforms, giving customers more choice. Some vendors, such as Hortonworks and Microsoft, are working on Windows versions of Big Data platforms based on Hadoop and related technologies.
Ambari is another important Apache tool, currently in the incubator stage, aimed at managing the Hadoop-based ecosystem. It can be used for provisioning, managing, and monitoring Apache Hadoop clusters, including HDFS, MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig, and Sqoop, through a Hadoop management web UI backed by RESTful APIs.
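To give a feel for what "backed by RESTful APIs" means in practice, here is a minimal Python sketch that builds (but does not send) a request to list the clusters an Ambari server knows about. The host name, port 8080, and the admin credentials are placeholders chosen for illustration, and the `build_ambari_request` helper is hypothetical, not part of Ambari. The `/api/v1/clusters` path follows Ambari's documented REST layout, but check it against the version you are running.

```python
import base64
import urllib.request


def build_ambari_request(host, path, user, password, port=8080):
    """Build a GET request for an Ambari REST resource without sending it.

    host, port, user and password are assumptions for this sketch;
    substitute the values for your own Ambari server.
    """
    url = f"http://{host}:{port}/api/v1/{path.lstrip('/')}"
    # Ambari's REST API uses HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    # Ambari expects this header on modifying requests; it is harmless on GET.
    req.add_header("X-Requested-By", "ambari")
    return req


if __name__ == "__main__":
    req = build_ambari_request("ambari.example.com", "clusters", "admin", "admin")
    print(req.get_method(), req.full_url)
    # To actually call the server, pass the request to urllib.request.urlopen(req).
```

The same pattern would apply to other resources the web UI works with, such as hosts or services, by changing the path.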
More and more tools and products now integrate with Hadoop-based tools to take advantage of their processing and analytical capabilities.
It’s a myth that Big Data-based solutions replace existing BI solutions. Rather, Big Data solutions augment them. For example, weblog files and data warehouse data can be analyzed using Big Data and BI tools respectively, and the results can then be combined to provide sentiment analysis of websites and their content.
When designing Big Data and Hadoop-based solutions, it is important to consider maintainability, ease of use, the learning curve, and so on. Also remember that high-level languages like Pig and Hive are almost always a better option than writing MapReduce programs yourself. Platforms like Oracle Exadata offer in-memory MapReduce, and writing the code is far, far simpler.
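To see why the high-level languages tend to win, it helps to look at how much plumbing even a trivial job needs when you spell out the MapReduce steps yourself. The sketch below is a single-process Python imitation of the map, shuffle, and reduce phases for a word count. It is not Hadoop code and all function names are made up for illustration. In Hive, assuming a hypothetical table `words` with a column `word`, the same result is roughly one query: `SELECT word, COUNT(*) FROM words GROUP BY word;`.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases together for an in-memory word count."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

A real MapReduce job adds job configuration, input and output formats, and serialization on top of this, which is exactly the overhead that Pig and Hive hide.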