Hadoop Integration

Top Ten Priorities for Hadoop Integration (On-Demand Replay)

The following priorities are recommendations, requirements, or rules that can guide your Hadoop integration effort.

  • Embrace the new tool and platform ecosystem of Hadoop.
  • Know the 10 FACTS of Hadoop and communicate them daily.
    1. Hadoop consists of multiple products.
    2. Hadoop is open source from the Apache Software Foundation (apache.org), but available from vendors, too.
    3. Hadoop is an ecosystem, not a single product.
    4. The Hadoop Distributed File System (HDFS) is a file system, not a database management system.
    5. HiveQL resembles SQL, but isn't standard SQL.
    6. HDFS and MapReduce are related, but don’t require each other.
    7. MapReduce provides control for analytics, not analytics per se.
    8. Hadoop is about data diversity, not just data volume.
    9. Hadoop complements a DW, rarely replaces one.
    10. Hadoop enables many types of analytics, not just Web analytics.
  • Don’t be fooled: Hadoop isn't free.
  • Get training (and maybe new staff) for Hadoop.
  • Look for capabilities that make Hadoop data look relational.
  • Expect to wait a while for certain Hadoop functionality to mature.
  • Beware siloed analytics, including Hadoop implementations.
  • Adjust your data warehouse (DW) architecture to make place(s) for Hadoop.
  • Set up a proof of concept (POC), if you haven’t already.
  • Develop and apply a strategy for integrating Hadoop with business intelligence (BI) and data warehousing (DW).

Issues That Trip Up Businesses When Implementing Hadoop

Ten Common Hadoopable Problems: Real-World Hadoop Use Cases