The question covered here is how accesses are secured in PlasmaFS.
PlasmaFS consists of a bunch of daemons running on several systems, and an open number of clients accessing the daemons. All the communication paths between these endpoints need to be secured in a reasonable way. As all communication is done via SunRPC, we can use the security options of this protocol to ensure that only permitted clients connect to servers, and that optionally even all SunRPC data is encrypted.
After establishing security for the RPC layer, the question remains how clients are identified to the filesystem, and which accesses are granted to them. The filesystem user ID may differ from the user ID in the RPC layer. We explain this idea in detail below.
The RPC layer uses SCRAM (RFC 5802) for authentication, and optionally for integrity protection and privacy. The SCRAM method uses simple passwords which are safely checked by a challenge-response protocol. SCRAM is enabled for SunRPC via GSS-API - this gives us some additional flexibility, because it is relatively easy to switch the authentication mechanism (and e.g. enable Kerberos as an even more secure method).
We use only two user IDs, called "proot" and "pnobody":
The passwords are stored in files such as etc/password_pnobody on each node (with mode 600, so only the Unix user running the daemons can access these passwords).
You may ask why the "pnobody" user exists. Couldn't we also allow that clients connect anonymously? The purpose of this user ID is to keep foreign hosts out of the PlasmaFS network. Also, RPC messages can only be encrypted when a user/password pair exists, and we want encryption at least for some communication paths.
Btw., SCRAM uses an SHA1-based HMAC for authentication. For encryption, AES-128 is employed.
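To make the challenge-response idea concrete, here is a minimal sketch of the key derivation and proof check that RFC 5802 defines for SCRAM with SHA-1. It is not PlasmaFS code: the function names are made up for illustration, the auth message is a placeholder string, and the GSS-API wrapping used by PlasmaFS is not shown.

```python
import hashlib
import hmac


def scram_keys(password: bytes, salt: bytes, iterations: int):
    """Derive the SCRAM-SHA-1 keys from a password (RFC 5802, section 3)."""
    # Hi() in the RFC is PBKDF2 with HMAC-SHA-1 and a 20-byte output.
    salted = hashlib.pbkdf2_hmac("sha1", password, salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha1).digest()
    stored_key = hashlib.sha1(client_key).digest()
    server_key = hmac.new(salted, b"Server Key", hashlib.sha1).digest()
    return client_key, stored_key, server_key


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def client_proof(client_key: bytes, stored_key: bytes, auth_message: bytes) -> bytes:
    """ClientProof = ClientKey XOR HMAC(StoredKey, AuthMessage)."""
    signature = hmac.new(stored_key, auth_message, hashlib.sha1).digest()
    return _xor(client_key, signature)


def server_accepts(stored_key: bytes, auth_message: bytes, proof: bytes) -> bool:
    """The server recovers ClientKey from the proof and compares its hash."""
    signature = hmac.new(stored_key, auth_message, hashlib.sha1).digest()
    recovered_client_key = _xor(proof, signature)
    return hmac.compare_digest(hashlib.sha1(recovered_client_key).digest(), stored_key)


if __name__ == "__main__":
    # Placeholder values; a real exchange uses random nonces and a random salt.
    message = b"client-first,server-first,client-final-without-proof"
    ck, sk, _ = scram_keys(b"secret", b"some-salt", 4096)
    print(server_accepts(sk, message, client_proof(ck, sk, message)))
```

The password itself never crosses the wire: the client only sends a proof that is bound to the nonces contained in the auth message, so a recorded proof cannot be replayed in another session.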
The plasma command-line utility allows one to set the RPC user ID with the -auth switch:
$ plasma ls / -auth proot
This command authenticates as proot, and one has to enter the password for it. As proot is superuser, it is possible to list the root directory.
$ plasma ls / -auth pnobody # fails
This command authenticates as pnobody, again by providing the password. This results in an EPERM error code, because pnobody is not allowed to perform any filesystem operation.
Any other user ID will not be able to successfully authenticate (Auth_failed). If you omit -auth, the plasma utility falls back to the default behavior, which is to ask the authentication daemon for a ticket.
The filesystem generally implements POSIX semantics, with only a few exceptions (and a few generalizations). For file accesses, one needs to have a user ID and a group ID.
Note that users and groups are always handled as names, and never as numeric IDs!
Each file has an owner, expressed as the owning user and the owning group. The file mode bits determine who can access the files. There are no ACLs.
$ plasma ls /
drwxr-xr-x gerd admin 0 2011-10-05 21:41 input
drwxr-xr-x gerd admin 0 2011-10-05 22:09 log
drwxr-xr-x gerd admin 0 2011-10-05 22:09 output
drwxr-xr-x gerd admin 0 2011-10-05 22:09 work
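The following sketch shows how a POSIX-style mode check can work when users and groups are names rather than numeric IDs. It only illustrates the owner/group/other rule; the function name is hypothetical, and any special treatment PlasmaFS gives to particular users is left out.

```python
def may_access(mode: int, owner: str, group: str,
               user: str, user_groups: set, want: str) -> bool:
    """Return True if `user` may perform all accesses in `want`.

    `want` is a combination of the letters "r", "w" and "x".
    As in POSIX, exactly one permission class applies: owner, then
    group, then other.
    """
    if user == owner:
        shift = 6          # owner bits
    elif group in user_groups:
        shift = 3          # group bits
    else:
        shift = 0          # other bits
    granted = (mode >> shift) & 0o7

    needed = 0
    for letter in want:
        needed |= {"r": 4, "w": 2, "x": 1}[letter]
    return granted & needed == needed


if __name__ == "__main__":
    # A directory like /log in the listing above: drwxr-xr-x gerd admin
    print(may_access(0o755, "gerd", "admin", "gerd", {"gerd"}, "w"))
    print(may_access(0o755, "gerd", "admin", "alice", {"admin"}, "w"))
```

With mode 755, the owner gerd may write, while a member of admin (here a made-up user alice) may only read and traverse the directory.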
You can change the mode bits with
$ plasma chmod 777 /log
You can change the owning user and group with
$ plasma chown auser /work -auth proot
$ plasma chown :agroup /work
$ plasma chown auser:agroup /work -auth proot
Note that changing the owning user is restricted to the superuser, hence we have to add -auth proot. The group can be set without that, provided the user is a member of the group.
Impersonation is the process of becoming a regular user of the filesystem. There are three ways of setting the user and group for a session:
Of course, the filesystem needs to know which users exist, and which groups are defined. For this reason, the files /etc/passwd and /etc/group also exist within PlasmaFS (stored in a database table). The versions in PlasmaFS have exactly the same format as those in /etc on each Unix system. One can only impersonate a user defined in passwd, and only become a member of a group where group allows this. (These special files can be read and written with the plasma utility.)
Pitfall: When authenticating as proot, nothing is said about who will be the owner of newly created files. Because of this,
$ plasma mkdir /dir -auth proot # fails
will fail (code EINVAL). The switches -user and -group can be given to specify an arbitrary owner:
$ plasma mkdir /dir -auth proot -user foo -group bar
Basically, an authentication ticket is a random number which is used as key in a special access-control table in the namenode. The number (called "verifier") is connected with a user name, a group name, and a list of supplementary groups.
Of course, "proot" can create such tickets arbitrarily. Normal users can only create such tickets for themselves. This is useful for passing the current access rights to further processes which can even be running on different machines (and for map/reduce we need this feature).
Many PlasmaFS programs accept such tickets in the environment variable PLASMAFS_AUTH_TICKET.
The lifetime of the tickets is limited (but can be extended).
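As a rough illustration of the access-control table described above, the sketch below keeps tickets in a dictionary keyed by a random verifier. All names (Ticket, TicketTable, the one-hour lifetime, the 63-bit verifier) are assumptions made for the example; the actual namenode stores this differently.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Ticket:
    user: str
    group: str
    supp_groups: tuple
    expires: float


class TicketTable:
    """Hypothetical in-memory stand-in for the namenode's ticket table."""

    def __init__(self, lifetime: float = 3600.0, clock=time.time):
        self._tickets = {}
        self._lifetime = lifetime
        self._clock = clock

    def create(self, requester: str, user: str, group: str, supp_groups=()) -> int:
        # "proot" may create any ticket; normal users only for themselves.
        if requester != "proot" and requester != user:
            raise PermissionError("tickets can only be created for oneself")
        verifier = secrets.randbits(63)  # unguessable random key
        self._tickets[verifier] = Ticket(
            user, group, tuple(supp_groups), self._clock() + self._lifetime)
        return verifier

    def lookup(self, verifier: int):
        ticket = self._tickets.get(verifier)
        if ticket is None:
            return None
        if ticket.expires <= self._clock():
            del self._tickets[verifier]  # expired tickets are dropped
            return None
        return ticket

    def extend(self, verifier: int) -> bool:
        ticket = self.lookup(verifier)
        if ticket is None:
            return False
        ticket.expires = self._clock() + self._lifetime
        return True
```

Because the verifier is the only thing needed to look up the identity, anyone who learns it can act as that user until the ticket expires.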
Example: One can request a ticket with
$ plasma auth_ticket
SCRAM-SHA1:cG5vYm9keQ==:eHh4:Z2VyZA==:Z2VyZA==:YWRt,YWRtaW4=,Y2Ryb20=,ZGlhbG91dA==,Z2VyZA==,bGlidmlydGQ=,bHBhZG1pbg==,bXl0aHR2,cGx1Z2Rldg==,c2FtYmFzaGFyZQ==:-7882205748013259769
The ticket contains various parts. Two of these are valuable login data: first, the ticket includes the password of pnobody, and second, the ticket includes a so-called verifier. These data are worth being protected. With a ticket, one can ssh to another machine, and run plasma as the same user. This works even if the Unix user on the other machine is different, and if there is no authentication daemon on the other machine.
$ PLASMAFS_AUTH_TICKET=`plasma auth_ticket` \
  ssh user@machine -o "SendEnv PLASMAFS_AUTH_TICKET" \
  <path>/plasma ls / -namenode ... -cluster ...
(Replace <path> with the directory of the plasma command on machine, and provide the namenode and cluster options - there is no ~/.plasmafs on remote machines.)
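The example ticket shown earlier can be taken apart to see why it deserves protection. The fields are colon-separated and mostly Base64-encoded. The field names in the sketch below are inferred from decoding that one example and are not an official format description.

```python
import base64

# The example ticket printed by "plasma auth_ticket" earlier in this text.
EXAMPLE_TICKET = (
    "SCRAM-SHA1:cG5vYm9keQ==:eHh4:Z2VyZA==:Z2VyZA==:"
    "YWRt,YWRtaW4=,Y2Ryb20=,ZGlhbG91dA==,Z2VyZA==,bGlidmlydGQ=,"
    "bHBhZG1pbg==,bXl0aHR2,cGx1Z2Rldg==,c2FtYmFzaGFyZQ==:"
    "-7882205748013259769"
)


def _b64(text: str) -> str:
    return base64.b64decode(text).decode("utf-8")


def decode_ticket(ticket: str) -> dict:
    """Split a ticket into its parts (field names are guesses)."""
    mech, rpc_user, rpc_password, user, group, supp, verifier = ticket.split(":")
    return {
        "mechanism": mech,
        "rpc_user": _b64(rpc_user),
        "rpc_password": _b64(rpc_password),
        "user": _b64(user),
        "group": _b64(group),
        "supp_groups": [_b64(g) for g in supp.split(",") if g],
        "verifier": verifier,
    }


if __name__ == "__main__":
    for key, value in decode_ticket(EXAMPLE_TICKET).items():
        print(key, "=", value)
```

Base64 is an encoding, not encryption: the RPC password and the verifier are readable by anyone who sees the ticket, for example in a process environment or a shell history.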
The authentication daemon can be reached via a Unix domain socket in /tmp. Such sockets reveal who is connected as client, i.e. the daemon can get the user and group ID of the client. The daemon creates an authentication ticket for this identity, which can then be used by the client for impersonation.
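On Linux, one way a server can learn the identity of a Unix domain socket peer is the SO_PEERCRED socket option. The sketch below demonstrates it on a socket pair within a single process. Whether the authentication daemon uses this exact mechanism is an assumption, and other operating systems offer different interfaces.

```python
import os
import socket
import struct


def peer_credentials(conn: socket.socket):
    """Return (pid, uid, gid) of the peer of a Unix domain socket (Linux only)."""
    if not hasattr(socket, "SO_PEERCRED"):
        raise OSError("SO_PEERCRED is not available on this platform")
    fmt = "3i"  # struct ucred: pid, uid, gid as three C ints
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    pid, uid, gid = struct.unpack(fmt, raw)
    return pid, uid, gid


if __name__ == "__main__":
    # Both ends live in this process, so the peer is ourselves.
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        print(peer_credentials(server_end))
    finally:
        server_end.close()
        client_end.close()
```

The kernel fills in these values, so the client cannot forge them. This is what allows a daemon to hand out a ticket without asking for a password.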
Essentially, this means one can access PlasmaFS without password on machines where this daemon is running. The authentication daemon is used by the plasma utility if the -auth switch is absent and the PLASMAFS_AUTH_TICKET variable does not contain a ticket.
Actually, the auth daemon is a broker who retrieves a new authentication ticket for the client, after it has checked the identity of the client.
The clients create separate connections to the datanodes, and the question is how these connections are protected.
On the RPC level, these connections authenticate as user "pnobody" for normal file I/O (and as "proot" for administrative operations).
The namenode.conf file includes a directive controlling this protection. By default this setting is "auth", meaning that the client has to provide the "pnobody" password to connect to the datanode, but otherwise the data are unprotected. One can set this to:
In addition to the protection on RPC level, there is a separate ticket system for authorizing data accesses, using special datanode tickets. Essentially, the namenode generates such tickets, and hands these out to the client. The client can only run the data I/O operations for which the client has permission, as evident from the ticket. The permissions are granted for each data block separately.
The described authentication system has a few weaknesses:
In addition to this, the current implementation may have further weaknesses or even large security holes. Please refer to the release notes for details! Also, there are configuration options affecting the level of security. For example, one can turn off encryption for datanode connections.