HDFS is another user-space filesystem that was originally developed for map/reduce.
Feature                         PlasmaFS                HDFS
---------------------------------------------------------------------------
Supported blocksizes            any                     any
                                recommended 64K-1M      recommended >= 64M
Blocksize can be set for
each file separately            no                      yes
System can allocate blocks
in contiguous regions           yes (all blocks are     no (each block is a
                                stored in a single      separate file)
                                file)
Number of datanodes is
limited by RAM in namenode      yes                     yes
Number of files is limited
by RAM in namenode              no                      yes
Replication can be set for
each file separately            yes                     yes
Client communicates directly
with datanodes                  yes                     yes
Block checksums                 no                      yes
Random read access to files     yes                     yes
Random write access to files    yes                     no
                                (blocks can only be     (files immutable after
                                replaced but not        creation)
                                overwritten)
Directory hierarchy             yes                     yes
Symbolic links                  yes                     no
POSIX file semantics            yes (few exceptions)    no
Authentication system           yes                     no
Encrypted data communication    optional                no
Authorization system            yes                     yes
                                                        (based on fake
                                                        authentication)
Several namenode operations
can be bundled in an
atomic transaction              yes                     no
Namenode crashes can lead to
data loss                       no (2-phase commit)     yes
Datanode crashes are handled
automatically (fail-over)       yes                     yes
Namenode crashes are handled
automatically (fail-over)       not yet                 no
                                (but planned)
                                so far: auto-selection
                                of live coordinator at
                                startup time
Datanode configuration can
be changed w/o restart
(e.g. add node, del node)       yes                     no
Namenodes profit from special
hardware                        yes (SSDs)              no
Filesystem can be mounted       yes (NFS bridge)        no
Rebalancing                     not yet                 yes
                                (but planned)
Communication to local
datanode servers via shared
memory                          yes                     no
Primary access method           SunRPC from any         ad-hoc protocol
                                language                (undocumented)
Clients available               Ocaml                   Java
                                Access from any
                                language via NFS