PlasmaFS Authentication

This document explains how access to PlasmaFS is secured.

PlasmaFS consists of a number of daemons running on several systems, and an arbitrary number of clients accessing these daemons. All communication paths between these endpoints need to be secured in a reasonable way. As all communication is done via SunRPC, we can use the security options of this protocol to ensure that only permitted clients connect to servers and, optionally, that all SunRPC data is encrypted.

Once the RPC layer is secured, the remaining question is how clients are identified to the filesystem, and which accesses are granted to them. The filesystem user ID is allowed to differ from the user ID in the RPC layer. This idea is explained in detail below.

The RPC layer

The RPC layer uses SCRAM (RFC 5802) for authentication, and optionally for integrity protection and privacy. The SCRAM method uses simple passwords which are safely checked by a challenge-response protocol. SCRAM is enabled for SunRPC via GSS-API. This gives us some additional flexibility, because it is relatively easy to switch the authentication mechanism (for example, to enable Kerberos as an even more secure method).

We use only two user IDs, called "proot" and "pnobody":

The "proot" user has superuser privileges, and can do everything. This can be used for administration. Also, the PlasmaFS daemons communicate as "proot" with each other.
The "pnobody" user has initially no privileges. By providing an authentication ticket (see below) a client can get more rights. This user is the ID for normal, unprivileged accesses.

The passwords for these two user IDs are installed in the etc/password_proot and etc/password_pnobody files on each node (with mode 600, so only the Unix user running the daemons can access these passwords).

You may ask why the "pnobody" user exists. Couldn't we simply allow clients to connect anonymously? The purpose of this user ID is to keep foreign hosts out of the PlasmaFS network. Also, RPC messages can only be encrypted when a user/password pair exists, and we want encryption at least for some communication paths.

Incidentally, SCRAM uses a SHA-1-based HMAC for authentication. For encryption, AES-128 is employed.
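To make the challenge-response idea more concrete, here is a minimal Python sketch of the proof computation defined in RFC 5802 (SCRAM-SHA-1). It is not taken from the PlasmaFS sources: the function names are hypothetical, the auth_message is treated as an opaque byte string, and the password normalization step (SASLprep) is left out. The point is only that the password itself never crosses the wire, while the server can still check that the client knows it.

```python
import hashlib
import hmac


def hmac_sha1(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha1).digest()


def client_key(password: str, salt: bytes, iterations: int) -> bytes:
    # RFC 5802: SaltedPassword = Hi(password, salt, i), which is PBKDF2
    # with HMAC-SHA-1; ClientKey = HMAC(SaltedPassword, "Client Key").
    salted = hashlib.pbkdf2_hmac("sha1", password.encode(), salt, iterations)
    return hmac_sha1(salted, b"Client Key")


def stored_key(password: str, salt: bytes, iterations: int) -> bytes:
    # StoredKey = H(ClientKey); enough for a server to verify a proof.
    return hashlib.sha1(client_key(password, salt, iterations)).digest()


def client_proof(password: str, salt: bytes, iterations: int,
                 auth_message: bytes) -> bytes:
    # ClientProof = ClientKey XOR HMAC(StoredKey, AuthMessage)
    key = client_key(password, salt, iterations)
    signature = hmac_sha1(hashlib.sha1(key).digest(), auth_message)
    return bytes(a ^ b for a, b in zip(key, signature))


def server_verify(stored: bytes, proof: bytes, auth_message: bytes) -> bool:
    # Recover the claimed ClientKey and compare its hash with StoredKey.
    signature = hmac_sha1(stored, auth_message)
    claimed_key = bytes(a ^ b for a, b in zip(proof, signature))
    return hmac.compare_digest(hashlib.sha1(claimed_key).digest(), stored)
```

In a real exchange the auth_message is built from the nonces and messages of the current handshake, so a recorded proof cannot simply be replayed in a later session.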

Example: The plasma command-line utility lets you set the RPC user ID with the -auth switch.

$ plasma ls / -auth proot

This command authenticates as proot, and one has to enter the password for it. As proot is superuser, it is possible to list /.

$ plasma ls / -auth pnobody                      # fails

This command authenticates as pnobody, again by providing the password. This results in an EPERM error code, because pnobody is not allowed to perform any filesystem operation.

Any other user ID will not be able to successfully authenticate (Auth_failed). If you omit -auth, the plasma utility falls back to the default behavior, which is to ask the authentication daemon for help.

The filesystem

The filesystem generally implements POSIX semantics, with only a few exceptions (and a few generalizations). For file accesses, one needs to have a user ID and a group ID.

Note that users and groups are always handled as names, and never as numeric IDs!

Each file has an owner, expressed as the owning user and the owning group. The file mode bits determine who can access the files. There are no ACLs.

Some points:

BSD-style group inheritance is not implemented (i.e. new files in a directory are not automatically assigned to the same group as the directory).
The "x" mode bit is ignored for directories. Enforcing it would be pointless because the API allows access to files by inode number, so one could easily circumvent the restrictions introduced by the "x" bit.
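The mode-bit check described above can be sketched in a few lines of Python. This is an illustration of the usual POSIX class selection (owner first, then group, then others), applied to user and group names rather than numeric IDs. The function names are hypothetical, and the sketch is an assumption about the semantics, not the PlasmaFS implementation; it also does not model path traversal, which is where the directory "x" bit would normally come into play.

```python
R, W, X = 4, 2, 1  # permission bits within one rwx triple


def class_bits(mode: int, owner: str, group: str,
               user: str, groups: set) -> int:
    """Return the rwx triple that applies to `user` (POSIX class selection)."""
    if user == owner:
        return (mode >> 6) & 0o7   # owner class
    if group in groups:
        return (mode >> 3) & 0o7   # group class
    return mode & 0o7              # everybody else


def may_access(mode: int, owner: str, group: str,
               user: str, groups: set, want: int) -> bool:
    """True if all requested bits in `want` are granted to `user`."""
    return class_bits(mode, owner, group, user, groups) & want == want
```

For a directory with mode 750 owned by gerd:admin, may_access(0o750, "gerd", "admin", "alice", {"admin"}, R) is true, while the same call with W is false.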

Examples: The file mode bits and the owner are shown by plasma ls:

$ plasma ls /
drwxr-xr-x gerd admin 0 2011-10-05 21:41 input 
drwxr-xr-x gerd admin 0 2011-10-05 22:09 log   
drwxr-xr-x gerd admin 0 2011-10-05 22:09 output
drwxr-xr-x gerd admin 0 2011-10-05 22:09 work

You can change the mode bits with plasma chmod:

$ plasma chmod 777 /log
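For readers less familiar with octal modes, the following Python snippet shows how the numeric argument of chmod relates to the symbolic string printed in the first column of the listing. It uses only the standard stat module and says nothing about how PlasmaFS stores modes internally.

```python
import stat


def symbolic_dir_mode(octal_mode: int) -> str:
    """Render permission bits the way a directory appears in an ls listing."""
    return stat.filemode(stat.S_IFDIR | octal_mode)


before = symbolic_dir_mode(0o755)   # the mode shown for /log in the listing
after = symbolic_dir_mode(0o777)    # the mode after "chmod 777"
```

Here before is "drwxr-xr-x" and after is "drwxrwxrwx".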

You can change the owning user and group with plasma chown:

$ plasma chown auser /work -auth proot
$ plasma chown :agroup /work
$ plasma chown auser:agroup /work -auth proot

Note that changing the owning user is restricted to the superuser, hence we have to add -auth proot. The group can be set without that, provided the user is a member of the group.

Impersonation

Impersonation is the process of becoming a regular user of the filesystem. There are three ways of setting the user and group for a session:

If the session is "proot" in the RPC layer, there are no further checks for impersonation. "proot" can become any user immediately.
If the client knows an authentication ticket, the client can impersonate with the rights granted by the ticket. (We discuss this below.)
Otherwise, the client can also contact the authentication daemon (which must be running on the same machine). This daemon checks the Unix ID of the client, and creates an authentication ticket for this ID. This means that all users are trusted on machines where this daemon is running.

The impersonation is done by calling a special RPC procedure of the namenode.

Of course, the filesystem needs to know which users exist, and which groups are defined. For this reason, the files /etc/passwd and /etc/group also exist within PlasmaFS (stored in a database table). The versions in PlasmaFS have exactly the same format as those installed in /etc on each Unix system. One can only impersonate a user that is defined in passwd, and can only become a member of a group if the group file allows it. (These special files can be read and written with the plasma utility, see plasma admin_table.)
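As an illustration of such a check, the following Python sketch parses text in the standard passwd and group formats and decides whether a user/group combination is acceptable. The rule used here (the group is either the user's primary group according to passwd, or lists the user as a member in group) is the usual Unix convention and an assumption on our part; the function names are hypothetical and the namenode's actual logic may differ.

```python
def parse_passwd(text: str) -> dict:
    """Map user name -> primary gid field (name:passwd:uid:gid:...)."""
    users = {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            fields = line.split(":")
            users[fields[0]] = fields[3]
    return users


def parse_group(text: str):
    """Return (gid -> group name, group name -> set of member names)."""
    gid_to_name, members = {}, {}
    for line in text.splitlines():
        if line and not line.startswith("#"):
            name, _pw, gid, member_list = line.split(":")[:4]
            gid_to_name[gid] = name
            members[name] = {m for m in member_list.split(",") if m}
    return gid_to_name, members


def may_impersonate(user: str, group: str,
                    passwd_text: str, group_text: str) -> bool:
    users = parse_passwd(passwd_text)
    if user not in users:
        return False                      # unknown users are rejected
    gid_to_name, members = parse_group(group_text)
    if gid_to_name.get(users[user]) == group:
        return True                       # primary group from passwd
    return user in members.get(group, set())
```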

Pitfall: Authenticating as proot does not by itself determine who will own newly created files. Because of this,

$ plasma mkdir /dir -auth proot                   # fails

will fail (code EINVAL). The switches -user and -group can be given to specify an arbitrary owner:

$ plasma mkdir /dir -auth proot -user foo -group bar

Authentication tickets

Basically, an authentication ticket is a random number which is used as key in a special access-control table in the namenode. The number (called "verifier") is connected with a user name, a group name, and a list of supplementary groups.

Of course, "proot" can create such tickets arbitrarily. Normal users can only create such tickets for themselves. This is useful for passing the current access rights to further processes which can even be running on different machines (and for map/reduce we need this feature).

Many PlasmaFS programs accept such tickets in the environment variable PLASMAFS_AUTH_TICKET.

The lifetime of the tickets is limited (but can be extended).
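The ticket mechanism described above can be pictured as a small table keyed by a random verifier. The Python sketch below is purely illustrative: the class and method names, the 64-bit verifier width, and the one-hour default lifetime are assumptions, and the real namenode keeps this information in its own access-control table.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Ticket:
    user: str
    group: str
    supp_groups: Tuple[str, ...]
    expires: float


class TicketTable:
    """Hypothetical in-memory access-control table keyed by verifier."""

    def __init__(self, lifetime: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lifetime = lifetime
        self._clock = clock
        self._tickets: Dict[int, Ticket] = {}

    def create(self, user: str, group: str, supp_groups=()) -> int:
        verifier = secrets.randbits(64)   # unpredictable random number
        self._tickets[verifier] = Ticket(
            user, group, tuple(supp_groups), self._clock() + self._lifetime)
        return verifier

    def lookup(self, verifier: int) -> Optional[Ticket]:
        ticket = self._tickets.get(verifier)
        if ticket is None:
            return None
        if ticket.expires <= self._clock():
            del self._tickets[verifier]   # expired tickets are dropped
            return None
        return ticket

    def extend(self, verifier: int) -> bool:
        ticket = self.lookup(verifier)
        if ticket is None:
            return False
        ticket.expires = self._clock() + self._lifetime
        return True
```

Whoever knows the verifier can look up the identity attached to it, which is why the verifier has to be treated like a password.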

Example: One can request a ticket with plasma auth_ticket:

$ plasma auth_ticket
SCRAM-SHA1:cG5vYm9keQ==:eHh4:Z2VyZA==:Z2VyZA==:YWRt,YWRtaW4=,Y2Ryb20=,ZGlhbG91dA==,Z2VyZA==,bGlidmlydGQ=,bHBhZG1pbg==,bXl0aHR2,cGx1Z2Rldg==,c2FtYmFzaGFyZQ==:-7882205748013259769

The ticket contains various parts. Two of these are valuable login data: first, the ticket includes the password of pnobody. Second, the ticket includes a so-called verifier. These data are worth being protected, so:

don't save tickets in files
don't pass tickets to programs in command-line arguments (use the environment variable PLASMAFS_AUTH_TICKET)
don't transfer tickets via insecure network connections
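To see why a ticket deserves this care, one can take the sample ticket shown above apart. It appears to consist of colon-separated fields, most of them Base64-encoded. The field names in the following Python sketch are our own interpretation of that sample (mechanism, RPC user, RPC password, filesystem user, group, supplementary groups, verifier) and are not taken from a format specification.

```python
import base64


def _b64(text: str) -> str:
    return base64.b64decode(text).decode()


def parse_ticket(ticket: str) -> dict:
    """Split a ticket into its parts; field names are assumptions."""
    mech, rpc_user, rpc_password, user, group, supp, verifier = ticket.split(":")
    return {
        "mechanism": mech,
        "rpc_user": _b64(rpc_user),
        "rpc_password": _b64(rpc_password),   # sensitive
        "user": _b64(user),
        "group": _b64(group),
        "supp_groups": [_b64(g) for g in supp.split(",") if g],
        "verifier": int(verifier),            # sensitive
    }


SAMPLE = ("SCRAM-SHA1:cG5vYm9keQ==:eHh4:Z2VyZA==:Z2VyZA==:"
          "YWRt,YWRtaW4=,Y2Ryb20=,ZGlhbG91dA==,Z2VyZA==,bGlidmlydGQ=,"
          "bHBhZG1pbg==,bXl0aHR2,cGx1Z2Rldg==,c2FtYmFzaGFyZQ==:"
          "-7882205748013259769")
fields = parse_ticket(SAMPLE)
```

Decoding the sample yields the RPC user "pnobody" together with its (here obviously replaced) password, and the filesystem user "gerd". Base64 is an encoding, not encryption, so anyone who sees the ticket sees these values.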

Example: We get a ticket, and transfer the ticket via ssh to another machine, and run plasma as the same user. This works even if the Unix user on the other machine is different, and if there is no authentication daemon on the other machine.

$ PLASMAFS_AUTH_TICKET=`plasma auth_ticket` \
    ssh user@machine -o "SendEnv PLASMAFS_AUTH_TICKET" \
      <path>/plasma ls / -namenode ... -cluster ...

(Substitute <path> with the directory of the plasma command on the remote machine, and provide the namenode and cluster options, as there is usually no ~/.plasmafs on remote machines.)
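The same rule applies when a program starts PlasmaFS tools as child processes: the ticket should travel in the child's environment, not in its argument list, because command lines are typically easier for other local users to inspect than environments. A minimal Python sketch follows; run_with_ticket is a hypothetical helper, and ECHO_TICKET is a stand-in child command used only for demonstration in place of a real PlasmaFS program.

```python
import os
import subprocess
import sys
from typing import List


def run_with_ticket(argv: List[str], ticket: str) -> subprocess.CompletedProcess:
    """Run argv with the ticket in PLASMAFS_AUTH_TICKET, never in argv."""
    env = dict(os.environ, PLASMAFS_AUTH_TICKET=ticket)
    return subprocess.run(argv, env=env, capture_output=True,
                          text=True, check=True)


# Stand-in for a PlasmaFS program: it just prints the variable it received.
ECHO_TICKET = [sys.executable, "-c",
               "import os; print(os.environ['PLASMAFS_AUTH_TICKET'])"]
```

Calling run_with_ticket(ECHO_TICKET, ticket) hands the ticket to the child without it ever appearing among the arguments.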

The authentication daemon

This daemon can be reached via a Unix domain socket in /tmp. Such sockets can reveal who is connected as client, i.e. we can get the user and group ID of the client. The daemon creates an authentication ticket for this identity, which can then be used by the client for impersonation.
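On Linux, one way a server can obtain the identity of the peer on a Unix domain socket is the SO_PEERCRED socket option. The Python sketch below shows that mechanism in isolation. Whether the PlasmaFS authentication daemon uses exactly this option is an assumption; other systems offer different interfaces for the same purpose, and peer_credentials is a hypothetical helper name.

```python
import socket
import struct

# Layout of struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid.
_UCRED = "iII"


def peer_credentials(conn: socket.socket):
    """Return (pid, uid, gid) of the process at the other end (Linux only)."""
    data = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                           struct.calcsize(_UCRED))
    return struct.unpack(_UCRED, data)
```

A daemon would call this on each accepted connection and then translate the numeric IDs into names (for example with the pwd and grp modules) before requesting a ticket for that identity.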

Essentially, this means one can access PlasmaFS without a password on machines where this daemon is running. The plasma utility uses the authentication daemon if the -auth switch is not given and the PLASMAFS_AUTH_TICKET variable does not contain a ticket.
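The resulting selection of a credential source can be summarized as a small decision function. This Python sketch is only a restatement of the rule above; the function name and the returned labels are hypothetical, and the precedence of -auth over the environment variable when both are present is an assumption.

```python
from typing import Mapping, Optional


def credential_source(auth_user: Optional[str],
                      environ: Mapping[str, str]) -> str:
    """Decide how a client would authenticate (illustrative only)."""
    if auth_user is not None:
        return "password"        # -auth given: ask for that user's password
    if environ.get("PLASMAFS_AUTH_TICKET"):
        return "ticket"          # use the ticket from the environment
    return "auth_daemon"         # fall back to the local auth daemon
```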

More precisely, the auth daemon is a broker that retrieves a new authentication ticket for the client after it has checked the identity of the client.

Protecting the datanodes

The clients create separate connections to the datanodes, and the question is how these connections are protected.

On the RPC level, these connections authenticate as user "pnobody" for normal file I/O (and as "proot" for administrative operations).

The namenode.conf file includes a directive security_level. By default it is set to "auth", meaning that the client has to provide the "pnobody" password to connect to the datanode, but the data are otherwise unprotected. One can set this to:

"int": Request integrity protection (signed RPC messages)
"priv": Request privacy (also enable encryption)

Of course, enabling the "int" or even "priv" level makes data accesses a lot slower, and this is why this is disabled by default.

In addition to the protection on RPC level, there is a separate ticket system for authorizing data accesses, using special datanode tickets. Essentially, the namenode generates such tickets, and hands these out to the client. The client can only run the data I/O operations for which the client has permission, as evident from the ticket. The permissions are granted for each data block separately.
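This text does not describe how datanode tickets are constructed. To give an idea of how per-block authorization of this kind can work in principle, here is a Python sketch of one common approach: the issuer signs the pair (block number, permission) with a secret it shares with the datanodes, and the datanode recomputes the signature before serving the request. Everything in it (the shared secret, the HMAC construction, the function names) is an assumption for illustration and should not be read as the PlasmaFS scheme.

```python
import hashlib
import hmac


def issue_block_ticket(secret: bytes, block: int, perm: str) -> str:
    """Issuer side: sign one (block, permission) pair."""
    message = f"{block}:{perm}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def check_block_ticket(secret: bytes, block: int, perm: str,
                       ticket: str) -> bool:
    """Datanode side: accept the request only if the ticket matches."""
    expected = issue_block_ticket(secret, block, perm)
    return hmac.compare_digest(expected, ticket)
```

With such a design a ticket for reading block 17 is useless for writing block 17 or for reading block 18, which matches the per-block granularity described above.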

Conceptual limitations

The described authentication system has a few weaknesses:

All users authenticate with the same credentials on the RPC level. This means that users can in principle decrypt and manipulate the messages of other users (e.g. they could decrypt messages containing authentication tickets, and steal the tickets). Of course, this requires superuser privileges on the machine (for tracing network traffic), which normal Unix users do not have. And a user who is able to sniff the local network is probably root, and has access to the "proot" password anyway. Essentially, this means that the system is relatively but not absolutely safe: some risk remains that a local security hole can be abused to gain more rights. Note that this risk only exists for legitimate users (who already have the "pnobody" credentials), but not for external attackers.
It is difficult to lock users out once they have obtained the "pnobody" credentials and can connect to the system. In particular, one cannot revoke the credentials for a single user, so the user can keep connecting to PlasmaFS indefinitely. The only way to revoke access rights is to delete the user from "passwd".
It is possible that freshly allocated data blocks are not zeroed out. Data may leak to users who don't have access permissions. There is currently no workaround. (It's on the "to-do" list.)
Of course, the usual dangers apply that arise for all password-based systems.

Basically, the system is safe against external intrusion attempts, but has some issues when restricting the access rights of legitimate users to what these users should be allowed to do.

In addition to this, the current implementation may have further weaknesses or even large security holes. Please refer to the release notes for details! Also, there are configuration options affecting the level of security. For example, one can turn off encryption for datanode accesses.

This web site is published by Informatikbüro Gerd Stolpmann
