HDFS Alternatives

Cassandra (via DataStax Enterprise)


DataStax Enterprise Edition is a complete big data platform, built on a production-certified version of Apache Cassandra™, that is architected to manage real-time, batch analytic, and enterprise search data all in the same database cluster.

Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability.

Cleversafe enables organizations to store and manage large-scale digital content. According to the vendor, it:

  • Offers limitless scale
  • Secures data-in-motion and data-at-rest
  • 100 million times more reliable than RAID
  • Drives up to 90% of the storage costs out of the organization

GlusterFS is an open source, distributed file system capable of scaling to several petabytes (actually, 72 brontobytes!) and handling thousands of clients. GlusterFS clusters together storage building blocks over InfiniBand RDMA or TCP/IP interconnects, aggregating disk and memory resources and managing data in a single global namespace. GlusterFS is based on a stackable user-space design and can deliver exceptional performance for diverse workloads.

The IBM General Parallel File System™ (GPFS™), a high-performance enterprise file management platform, can help you move beyond simply adding storage to optimizing data management.

Efficient, cost-effective storage for Big Data from a single file system, serving I/O-intensive applications, storage, and nearline archives.

Lustre is an open source, high-performance file system that some claim can serve as an HDFS alternative where performance is a major concern.

NetApp Open Solution for Hadoop delivers a ready-to-deploy, flexible Hadoop cluster for handling big-data analytics.

Introducing QFS 1.0

Developed at Quantcast and now released to the open source community, the Quantcast File System (QFS) is an alternative to the Hadoop Distributed File System (HDFS) for large-scale batch data processing. It is a production-hardened, 100% open source distributed file system that is fully integrated with Hadoop and delivers significantly improved performance while consuming 50% less disk space.
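The "50% less disk space" figure can be checked with a little arithmetic. The sketch below assumes, and this is not stated in the blurb above, that the comparison is between 3-way replication (the usual HDFS default) and a 6+3 Reed-Solomon layout (six data stripes plus three parity stripes) on the QFS side. The function names are hypothetical and exist only for this illustration.

```python
def raw_bytes_replicated(logical_bytes: int, replicas: int = 3) -> float:
    """Raw disk consumed when every block is stored `replicas` times."""
    return float(logical_bytes * replicas)


def raw_bytes_erasure_coded(logical_bytes: int,
                            data_stripes: int = 6,
                            parity_stripes: int = 3) -> float:
    """Raw disk consumed when data is split into `data_stripes` stripes
    and protected by `parity_stripes` additional parity stripes."""
    return logical_bytes * (data_stripes + parity_stripes) / data_stripes


def savings_fraction(logical_bytes: int) -> float:
    """Fraction of raw disk saved by the erasure-coded layout relative to
    replication, under the assumed 3x and 6+3 parameters."""
    replicated = raw_bytes_replicated(logical_bytes)
    coded = raw_bytes_erasure_coded(logical_bytes)
    return 1.0 - coded / replicated


if __name__ == "__main__":
    one_tb = 10**12
    print(raw_bytes_replicated(one_tb) / one_tb)     # raw-to-logical ratio, replication
    print(raw_bytes_erasure_coded(one_tb) / one_tb)  # raw-to-logical ratio, 6+3 coding
    print(savings_fraction(one_tb))                  # fraction of raw disk saved
```

Under these assumptions one logical terabyte occupies 3 TB of raw disk when replicated and 1.5 TB when erasure coded, which is where a 50% saving would come from. Different replication factors or stripe counts change the result.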

Tachyon is a fault-tolerant distributed file system enabling reliable file sharing at memory speed across cluster frameworks, such as Spark and MapReduce. It achieves its high performance by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, and enables different jobs/queries and frameworks to access cached files at memory speed. Thus, Tachyon avoids going to disk to load datasets that are frequently read.
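The caching idea described above, keeping frequently read files in memory so that repeated reads skip the disk, can be illustrated with a toy read-through cache. This is only a minimal sketch and not Tachyon's API or implementation: the class name, the `load_from_disk` callback, and the least-recently-used eviction policy are all assumptions made for the example, and lineage-based recovery is not modeled at all.

```python
from collections import OrderedDict
from typing import Callable


class WorkingSetCache:
    """Toy in-memory read-through cache (hypothetical, for illustration)."""

    def __init__(self, load_from_disk: Callable[[str], bytes],
                 capacity_bytes: int) -> None:
        self._load = load_from_disk        # slow path: fetch file from disk
        self._capacity = capacity_bytes    # memory budget for cached files
        self._used = 0                     # bytes currently cached
        self._files: "OrderedDict[str, bytes]" = OrderedDict()
        self.disk_reads = 0                # counts how often the slow path ran

    def read(self, path: str) -> bytes:
        if path in self._files:
            # Cache hit: mark as most recently used and serve from memory.
            self._files.move_to_end(path)
            return self._files[path]
        # Cache miss: go to disk once, then try to keep the file in memory.
        data = self._load(path)
        self.disk_reads += 1
        self._admit(path, data)
        return data

    def _admit(self, path: str, data: bytes) -> None:
        if len(data) > self._capacity:
            return  # too large to cache at all; serve it uncached
        # Evict least recently used files until the new one fits.
        while self._used + len(data) > self._capacity:
            _, evicted = self._files.popitem(last=False)
            self._used -= len(evicted)
        self._files[path] = data
        self._used += len(data)


if __name__ == "__main__":
    fake_disk = {"/data/a": b"aaaa", "/data/b": b"bbbb"}
    cache = WorkingSetCache(lambda p: fake_disk[p], capacity_bytes=8)
    cache.read("/data/a")
    cache.read("/data/a")  # second read is served from memory
    print(cache.disk_reads)
```

Any job that shares the same cache object benefits from files another job already loaded, which is the effect the description attributes to sharing cached files across frameworks.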
