Hadoop Integration
Top Ten Priorities for Hadoop Integration
These are recommendations, requirements, or rules that can guide you.
- Embrace the new tool and platform ecosystem of Hadoop.
- Know the 10 FACTS of Hadoop and communicate them daily.
  - Hadoop consists of multiple products.
  - Hadoop is open source from the Apache Software Foundation (apache.org), but available from vendors, too.
  - Hadoop is an ecosystem, not a single product.
  - The Hadoop Distributed File System (HDFS) is a file system, not a database management system.
  - HiveQL resembles SQL, but isn't standard SQL.
  - HDFS and MapReduce are related, but don’t require each other.
  - MapReduce provides control for analytics, not analytics per se.
  - Hadoop is about data diversity, not just data volume.
  - Hadoop complements a DW, rarely replaces one.
  - Hadoop enables many types of analytics, not just Web analytics.
- Don’t be fooled: Hadoop isn't free.
- Get training (and maybe new staff) for new Hadoop skills.
- Look for capabilities that make Hadoop data look relational.
- Expect to wait a while for certain Hadoop functionality to mature.
- Beware siloed analytics, including Hadoop implementations.
- Adjust your data warehouse (DW) architecture to make room for Hadoop.
- Set up a proof of concept (POC), if you haven’t already.
- Develop and apply a strategy for integrating Hadoop with business intelligence (BI) and the data warehouse (DW).
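
Two of the facts listed above are easier to see in code: MapReduce and HDFS don't require each other, and MapReduce provides control for analytics, not analytics per se. The sketch below is a minimal illustration in plain Python, not Hadoop code. It imitates the map, shuffle, and reduce steps in memory, with no HDFS involved. The function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration. The point is that the programming model only moves and groups key/value pairs; the analytic logic is whatever you write in the mapper and reducer.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Hypothetical mapper: emit a (word, 1) pair for each word in a line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Hypothetical reducer: sum the counts collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in memory over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    lines = ["Hadoop is an ecosystem", "Hadoop is not a database"]
    print(run_job(lines))
```

Swapping in a different mapper and reducer changes the analysis without touching the surrounding plumbing, which is the sense in which the model supplies control rather than analytics.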