The question covered here is how accesses are secured in PlasmaFS.
PlasmaFS consists of a bunch of daemons running on several systems, and an open number of clients accessing the daemons. All the communication paths between these endpoints need to be secured in a reasonable way. As all communication is done via SunRPC, we can use the security options of this protocol to ensure that only permitted clients connect to servers, and that optionally even all SunRPC data is encrypted.
After establishing security for the RPC layer, there is the question of how clients are identified for the filesystem, and which accesses are granted to them. We allow the filesystem user ID to differ from the user ID in the RPC layer. This idea is explained in detail below.
The RPC layer
The RPC layer uses SCRAM (RFC 5802) for authentication, and optionally for integrity protection and privacy. The SCRAM method uses simple passwords which are safely checked by a challenge-response protocol. SCRAM is enabled for SunRPC via GSS-API - this gives us some additional flexibility, because it is relatively easy to switch the authentication mechanism (and e.g. enable Kerberos as an even more secure method).
We use only two user IDs, called "proot" and "pnobody". Their passwords are stored in the etc/password_proot and etc/password_pnobody files on each node (with mode 600, so only the Unix user running the daemons can access these passwords).
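A quick way to verify the permissions on a node is sketched below. Only the two file names and the mode 600 requirement come from the text above; the function name is_private and the idea of running such a check by hand are illustrative assumptions, not part of PlasmaFS.

```python
import os
import stat

def is_private(path):
    """Return True if the permission bits of the file are exactly 600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600

if __name__ == "__main__":
    # Paths are relative to the PlasmaFS installation directory (assumed).
    for name in ("etc/password_proot", "etc/password_pnobody"):
        if os.path.exists(name):
            print(name, "ok" if is_private(name) else "mode is not 600")
```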
You may ask why the "pnobody" user exists. Couldn't we also allow that clients connect anonymously? The purpose of this user ID is to keep foreign hosts out of the PlasmaFS network. Also, RPC messages can only be encrypted when a user/password pair exists, and we want encryption at least for some communication paths.
Btw., SCRAM uses a SHA1-based HMAC for authentication. For encryption, AES-128 is employed.
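To make the challenge-response idea concrete, here is a minimal sketch of the proof computation defined in RFC 5802, using SHA1 as mentioned above. It is not the PlasmaFS implementation: the function names are made up for this example, the auth_message is a stand-in for the real concatenation of the exchanged SCRAM messages, and the GSS-API framing and the AES-128 encryption are not modeled.

```python
import hashlib
import hmac
import os

def hmac_sha1(key, msg):
    return hmac.new(key, msg, hashlib.sha1).digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def client_key(password, salt, iterations):
    # Hi() in RFC 5802 is PBKDF2 with HMAC-SHA1.
    salted = hashlib.pbkdf2_hmac("sha1", password.encode(), salt, iterations)
    return hmac_sha1(salted, b"Client Key")

def stored_key(password, salt, iterations):
    # RFC 5802 StoredKey = H(ClientKey)
    return hashlib.sha1(client_key(password, salt, iterations)).digest()

def client_proof(password, salt, iterations, auth_message):
    # ClientProof = ClientKey XOR HMAC(StoredKey, AuthMessage)
    key = client_key(password, salt, iterations)
    signature = hmac_sha1(hashlib.sha1(key).digest(), auth_message)
    return xor(key, signature)

def server_verify(stored, proof, auth_message):
    # Recover ClientKey from the proof and compare its hash with StoredKey.
    signature = hmac_sha1(stored, auth_message)
    recovered = xor(proof, signature)
    return hmac.compare_digest(hashlib.sha1(recovered).digest(), stored)

if __name__ == "__main__":
    salt, iterations = os.urandom(16), 4096
    # Stand-in for the real AuthMessage, which includes both nonces.
    auth_message = b"n=pnobody,r=" + os.urandom(8).hex().encode()
    stored = stored_key("secret", salt, iterations)
    good = client_proof("secret", salt, iterations, auth_message)
    bad = client_proof("wrong", salt, iterations, auth_message)
    print(server_verify(stored, good, auth_message))
    print(server_verify(stored, bad, auth_message))
```

The password itself never crosses the wire; only the proof does, and the proof is bound to the nonces of one particular exchange.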
Example: The plasma command-line utility lets you set the RPC user ID with the -auth switch.
$ plasma ls / -auth proot
This command authenticates as proot, and one has to enter the password for it. As proot is superuser, it is possible to list /.
$ plasma ls / -auth pnobody # fails
This command authenticates as pnobody, again by providing the password. This results in an EPERM error code, because pnobody is not allowed to perform any filesystem operation.
Any other user ID will not be able to authenticate successfully (Auth_failed). If you omit -auth, the plasma utility falls back to the default behavior, which is to ask the authentication daemon for help.
The filesystem
The filesystem generally implements POSIX semantics, with only a few exceptions (and a few generalizations). For file accesses, one needs to have a user ID and a group ID.
Note that users and groups are always handled as names, and never as numeric IDs!
Each file has an owner, expressed as the owning user and the owning group. The file mode bits determine who can access the files. There are no ACLs.
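The mode-bit check with name-based owners can be pictured with the following simplified sketch. It follows the classic POSIX owner/group/other rule only; the function may_access is hypothetical, and neither the superuser bypass nor the PlasmaFS-specific exceptions mentioned above are modeled.

```python
READ, WRITE, EXECUTE = 4, 2, 1

def may_access(mode, owner, group, user, user_groups, want):
    """Decide access from the mode bits; users and groups are names.

    mode is an integer such as 0o755, want is a combination of
    READ, WRITE and EXECUTE.
    """
    if user == owner:
        bits = (mode >> 6) & 7      # owner class
    elif group in user_groups:
        bits = (mode >> 3) & 7      # group class
    else:
        bits = mode & 7             # everybody else
    return (bits & want) == want

if __name__ == "__main__":
    # A directory with mode drwxr-xr-x, owned by gerd:admin
    print(may_access(0o755, "gerd", "admin", "gerd", {"gerd"}, WRITE))
    print(may_access(0o755, "gerd", "admin", "auser", {"agroup"}, WRITE))
    print(may_access(0o755, "gerd", "admin", "auser", {"agroup"}, READ))
```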
Some points:
The mode bits, the owning user, and the owning group are shown by plasma ls:
$ plasma ls /
drwxr-xr-x gerd admin 0 2011-10-05 21:41 input
drwxr-xr-x gerd admin 0 2011-10-05 22:09 log
drwxr-xr-x gerd admin 0 2011-10-05 22:09 output
drwxr-xr-x gerd admin 0 2011-10-05 22:09 work
You can change the mode bits with plasma chmod:
$ plasma chmod 777 /log
You can change the owning user and group with plasma chown:
$ plasma chown auser /work -auth proot
$ plasma chown :agroup /work
$ plasma chown auser:agroup /work -auth proot
Note that changing the owning user is restricted to the superuser, hence we have to add -auth proot. The group can be set without that, provided the user is a member of the group.
Impersonation
Impersonation is the process of becoming a regular user of the filesystem. There are three ways of setting the user and group for a session:
Of course, the filesystem needs to know which users exist, and which groups are defined. For this reason, the files /etc/passwd and /etc/group also exist within PlasmaFS (stored in a database table). The versions in PlasmaFS have exactly the same format as those installed in /etc on each Unix system. One can only impersonate a user defined in passwd, and can only become a member of a group where group allows this. (These special files can be read and written with the plasma utility, see plasma admin_table.)
Pitfall: When authenticating as proot, nothing is said about who will be the owner of newly created files. Because of this,
$ plasma mkdir /dir -auth proot # fails
will fail (code EINVAL). The switches -user and -group can be given to specify an arbitrary owner:
$ plasma mkdir /dir -auth proot -user foo -group bar
Authentication tickets
Basically, an authentication ticket is a random number which is used as a key in a special access-control table in the namenode. The number (called the "verifier") is associated with a user name, a group name, and a list of supplementary groups.
Of course, "proot" can create such tickets arbitrarily. Normal users can only create such tickets for themselves. This is useful for passing the current access rights to further processes which can even be running on different machines (and for map/reduce we need this feature).
Many PlasmaFS programs accept such tickets in the environment variable PLASMAFS_AUTH_TICKET.
The lifetime of the tickets is limited (but can be extended).
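The access-control table described above can be modeled as a small in-memory structure. This is only a toy model under stated assumptions: the class TicketTable, the signed 64-bit range of the verifier, and the default lifetime of one hour are invented for illustration, and the real table lives in the namenode database.

```python
import secrets
import time

class TicketTable:
    """Toy access-control table keyed by a random verifier."""

    def __init__(self, lifetime=3600.0):
        self.lifetime = lifetime    # seconds; the real value is not documented here
        self.entries = {}           # verifier -> (user, group, supp_groups, expiry)

    def create(self, user, group, supp_groups, now=None):
        now = time.time() if now is None else now
        verifier = secrets.randbits(64) - 2**63     # assumed: signed 64-bit number
        self.entries[verifier] = (user, group, tuple(supp_groups), now + self.lifetime)
        return verifier

    def lookup(self, verifier, now=None):
        """Return (user, group, supp_groups), or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self.entries.get(verifier)
        if entry is None or entry[3] <= now:
            self.entries.pop(verifier, None)
            return None
        return entry[:3]

    def extend(self, verifier, now=None):
        """Restart the lifetime of a ticket that is still valid."""
        now = time.time() if now is None else now
        identity = self.lookup(verifier, now)
        if identity is None:
            return False
        self.entries[verifier] = identity + (now + self.lifetime,)
        return True

if __name__ == "__main__":
    table = TicketTable()
    v = table.create("gerd", "gerd", ["adm", "admin"])
    print(table.lookup(v))
```

Because the verifier is the only thing a client has to present, anyone who learns it can act as that user until the ticket expires.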
Example: One can request a ticket with plasma auth_ticket:
$ plasma auth_ticket
SCRAM-SHA1:cG5vYm9keQ==:eHh4:Z2VyZA==:Z2VyZA==:YWRt,YWRtaW4=,Y2Ryb20=,ZGlhbG91dA==,Z2VyZA==,bGlidmlydGQ=,bHBhZG1pbg==,bXl0aHR2,cGx1Z2Rldg==,c2FtYmFzaGFyZQ==:-7882205748013259769
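The example ticket is a colon-separated string whose middle fields are Base64-encoded. The sketch below decodes it. Note that the field names are inferred from this one example and from the surrounding description, not from a format specification, so treat them as assumptions.

```python
import base64

EXAMPLE_TICKET = (
    "SCRAM-SHA1:cG5vYm9keQ==:eHh4:Z2VyZA==:Z2VyZA==:"
    "YWRt,YWRtaW4=,Y2Ryb20=,ZGlhbG91dA==,Z2VyZA==,bGlidmlydGQ=,"
    "bHBhZG1pbg==,bXl0aHR2,cGx1Z2Rldg==,c2FtYmFzaGFyZQ==:"
    "-7882205748013259769"
)

def b64(text):
    return base64.b64decode(text).decode()

def decode_ticket(ticket):
    """Split a ticket into its parts (field names are assumed)."""
    mech, rpc_user, rpc_password, user, group, supp, verifier = ticket.split(":")
    return {
        "mechanism": mech,
        "rpc_user": b64(rpc_user),
        "rpc_password": b64(rpc_password),
        "user": b64(user),
        "group": b64(group),
        "supplementary_groups": [b64(g) for g in supp.split(",") if g],
        "verifier": int(verifier),
    }

if __name__ == "__main__":
    for key, value in decode_ticket(EXAMPLE_TICKET).items():
        print(key, "=", value)
```

Base64 is an encoding, not encryption, so everything in the ticket is readable by whoever obtains the string.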
The ticket contains various parts. Two of these are valuable login data: first, the ticket includes the password of pnobody. Second, the ticket includes a so-called verifier. These data are worth being protected, so:
With the ticket in the environment (PLASMAFS_AUTH_TICKET), one can ssh to another machine and run plasma as the same user. This works even if the Unix user on the other machine is different, and if there is no authentication daemon on the other machine.
$ PLASMAFS_AUTH_TICKET=`plasma auth_ticket` \
ssh user@machine -o "SendEnv PLASMAFS_AUTH_TICKET" \
<path>/plasma ls / -namenode ... -cluster ...
(Substitute <path> with the directory of the plasma command on machine, and provide the namenode and cluster options - there is usually no ~/.plasmafs on remote machines.)
The authentication daemon
This daemon can be reached via a Unix Domain socket in /tmp. Such sockets can reveal who is connected as client, i.e. we can get the user and group ID of the client. The daemon creates an authentication ticket for this identity, which can then be used by the client for impersonation.
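How a server learns the identity of a Unix Domain socket peer can be sketched as follows. The example uses the Linux-specific SO_PEERCRED socket option; whether the PlasmaFS daemon uses this option or another mechanism is not stated here, so this is an assumption, and the function peer_credentials is hypothetical.

```python
import os
import socket
import struct

def peer_credentials(sock):
    """Return (pid, uid, gid) of the process at the other end (Linux only)."""
    fmt = "iII"     # struct ucred: pid_t, uid_t, gid_t
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)

if __name__ == "__main__":
    # A connected pair stands in for daemon and client; both ends belong
    # to this process, so the reported IDs are our own.
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid, uid, gid = peer_credentials(server_end)
    print("peer pid", pid, "uid", uid, "gid", gid)
    server_end.close()
    client_end.close()
```

The kernel fills in these values, so the client cannot forge them. The numeric IDs would still have to be mapped to names, since the filesystem handles users and groups as names.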
Essentially, this means one can access PlasmaFS without a password on machines where this daemon is running. The authentication daemon is used by the plasma utility if there is neither the -auth switch nor a ticket in the PLASMAFS_AUTH_TICKET variable.
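The fallback rule can be written down as a small decision function. This is a sketch, not the logic of the plasma utility: the function choose_auth is hypothetical, and the relative order of the -auth switch and the ticket variable is an assumption, since the text only says that the daemon is used when both are absent.

```python
import os

def choose_auth(auth_switch=None, environ=None):
    """Pick the credential source for a session.

    auth_switch is the value given to -auth, or None if it was omitted.
    """
    environ = os.environ if environ is None else environ
    if auth_switch is not None:
        return ("rpc_user", auth_switch)        # password is asked for
    ticket = environ.get("PLASMAFS_AUTH_TICKET")
    if ticket:
        return ("ticket", ticket)
    return ("auth_daemon", None)                # ask the local daemon

if __name__ == "__main__":
    print(choose_auth("proot", {}))
    print(choose_auth(None, {"PLASMAFS_AUTH_TICKET": "SCRAM-SHA1:..."}))
    print(choose_auth(None, {}))
```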
Actually, the auth daemon is a broker that retrieves a new authentication ticket for the client, after it has checked the identity of the client.
Protecting the datanodes
The clients create separate connections to the datanodes, and the question is how these connections are protected.
On the RPC level, these connections authenticate as user "pnobody" for normal file I/O (and as "proot" for administrative operations).
The namenode.conf file includes a directive security_level. By default this setting is set to "auth", meaning that the client has to provide the "pnobody" password to connect to the datanode, but otherwise the data are unprotected. One can set this to:
In addition to the protection on RPC level, there is a separate ticket system for authorizing data accesses, using special datanode tickets. Essentially, the namenode generates such tickets, and hands these out to the client. The client can only run the data I/O operations for which the client has permission, as evident from the ticket. The permissions are granted for each data block separately.
Conceptual limitations
The described authentication system has a few weaknesses:
In addition to this, the current implementation may have further
weaknesses or even large security holes. Please refer to the release
notes for details! Also, there are configuration options affecting the
level of security. For example, one can turn off encryption for datanode
accesses.