Synopsis
nfs3d -conf file [-fg] [-pid file]
Description
This daemon acts as an NFS server and forwards requests to the PlasmaFS cluster (namenodes and datanodes). It implements the nfs and mountd programs of NFS version 3. The nlockmgr protocol is not supported yet.
For security reasons, this daemon should only be bound to the local loopback network (127.0.0.1). NFSv3 is inherently insecure, as there are no authentication verifiers. It is possible, and recommended, to run the daemon on every machine that wants to mount the filesystem. This avoids the security problems, because the unprotected data exchange then never leaves the local machine.
An instance of the NFS bridge can only connect to a single PlasmaFS cluster.
The NFS bridge can only be contacted over TCP. UDP is not supported, and there are no plans to add it. NFS runs well over TCP.
NFS clients can mount PlasmaFS volumes as follows (Linux syntax):
mount -o intr,port=2801,mountport=2800,nolock <host>:/<clustername> /mnt
Here, <host> is to be replaced by the machine running the NFS bridge (normally localhost). <clustername> is the name of the cluster. The port numbers might need adjustment; the command assumes the same numbers as in the examples.
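As an illustration of how the placeholders are filled in, the following Python sketch assembles the mount command line from its parts. The helper name mount_argv and the cluster name "mycluster" are hypothetical; the option string and the default ports are the ones shown above.

```python
def mount_argv(host, clustername, mountpoint, nfs_port=2801, mount_port=2800):
    """Build the argument list for the Linux mount command shown above.

    nfs_port and mount_port must match the ports the nfs3 and mount3
    protocols are bound to in the bridge configuration.
    """
    options = f"intr,port={nfs_port},mountport={mount_port},nolock"
    return ["mount", "-o", options, f"{host}:/{clustername}", mountpoint]


if __name__ == "__main__":
    # "mycluster" is a made-up cluster name used only for this example.
    print(" ".join(mount_argv("localhost", "mycluster", "/mnt")))
```

The resulting list can be passed to subprocess.run when the script runs with sufficient privileges; the sketch itself only prints the command.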
NFS (version 3) only implements weak cache consistency: an NFS client usually caches data as long as nothing is known about a possible modification, and modifications can only be recognized by changed metadata (i.e. the mtime in the inode changes after a write). Although NFS clients typically query metadata often, data modifications may still go unnoticed. This is a problem in the NFS protocol, not in the bridge.
The PlasmaFS protocol has better cache consistency semantics; in particular, it ensures that a change of data is also represented as an update of the metadata. However, the different semantics may nevertheless cause incompatibilities. For example, a PlasmaFS client is allowed to change data without changing the mtime in the inode. Within the PlasmaFS system this is not a big problem, because there are other means to reliably detect the change. An NFS client connected via this bridge might not see the update, though, and may continue to treat its cached copy as up to date. All in all, these problems are expected to be mostly theoretical and will usually not occur in practice.
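The mtime issue can be illustrated with a toy model. The sketch below is not how any real NFS client is implemented; ServerFile and MtimeCache are hypothetical names, and the cache revalidates solely by comparing the mtime, which is the assumption that makes a silent data change invisible.

```python
from dataclasses import dataclass


@dataclass
class ServerFile:
    """Stand-in for a file on the server: contents plus inode mtime."""
    data: bytes
    mtime: int


class MtimeCache:
    """Toy client cache that refetches only when the mtime differs."""

    def __init__(self):
        self.data = None
        self.mtime = None

    def read(self, f):
        if self.mtime != f.mtime:      # metadata changed: refetch
            self.data, self.mtime = f.data, f.mtime
        return self.data               # otherwise serve the cached copy


f = ServerFile(b"v1", mtime=100)
cache = MtimeCache()
first = cache.read(f)    # initial fetch
f.data = b"v2"           # data changes, mtime stays the same
stale = cache.read(f)    # cache still serves the old contents
f.mtime = 101            # metadata finally changes
fresh = cache.read(f)    # now the new contents are fetched
```

In this model the second read returns the old contents because nothing in the metadata signals the change.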
NFS version 3 can deal with large blocks in the protocol, and some client implementations also support that. For example, the Linux client supports block sizes up to 1M automatically, i.e. this is the maximum transmission unit for reads and writes. Independently of the client support, the NFS bridge translates the sizes of the data blocks used in the NFS protocol to what the PlasmaFS protocol requires. This means that the NFS bridge can handle the case that the client uses data sizes smaller than the PlasmaFS block size. There is a performance loss, though.
Especially for write accesses, the PlasmaFS block size should not be larger than the maximum block size the NFS client supports. Otherwise there might be an extreme performance loss. (Actually, this is a problem in NFS clients, and cannot be worked around in the server.)
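The relationship between the two sizes is simple arithmetic, sketched below in Python. This is an illustration only, not the bridge's actual translation code; the function names are hypothetical, and the 4M PlasmaFS block size in the example is an assumed value.

```python
def blocks_touched(offset, length, blocksize):
    """Return the range of block indices covered by a byte range."""
    if length <= 0:
        return range(0)
    first = offset // blocksize
    last = (offset + length - 1) // blocksize
    return range(first, last + 1)


def requests_per_block(blocksize, max_transfer):
    """Number of NFS requests needed to cover one full block (ceiling division)."""
    return -(-blocksize // max_transfer)


MiB = 1024 * 1024

if __name__ == "__main__":
    # Assumed example: 4M PlasmaFS blocks, a client limited to 1M transfers.
    print(requests_per_block(4 * MiB, 1 * MiB))
    # A 1M write starting at offset 3.5M crosses a block boundary.
    print(list(blocks_touched(3 * MiB + MiB // 2, 1 * MiB, 4 * MiB)))
```

When the client's maximum transfer size is at least the block size, a block-aligned write covers whole blocks in one request; otherwise each block is assembled from several smaller requests.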
PlasmaFS does not keep the count of the hard links a file has. Because of this, the NFS bridge always reports this count as 1.
Mapping principals
NFS uses numeric UIDs and GIDs to identify users and groups, while PlasmaFS prefers names. Because of this, the numeric identifiers need to be mapped to names (and vice versa).
The daemon just consults the local /etc/passwd and /etc/group files to do the mapping. This is correct if the filesystem is mounted via the local loopback network (i.e. for the recommended configuration), and it is still the best available option if the filesystem is mounted over a real network.
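The kind of lookup involved can be sketched as follows. This Python example is not the daemon's code; it only relies on the standard colon-separated layout of /etc/passwd and /etc/group, where the name is the first field and the numeric ID is the third. The function name parse_id_file and the sample entries are made up for the illustration.

```python
def parse_id_file(text):
    """Parse passwd- or group-style text into (id -> name, name -> id) maps."""
    id_to_name = {}
    name_to_id = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue                       # skip malformed entries
        name, num = fields[0], int(fields[2])
        id_to_name.setdefault(num, name)   # first entry wins for duplicate IDs
        name_to_id[name] = num
    return id_to_name, name_to_id


# Made-up sample entries in /etc/passwd format.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/sh
alice:x:1000:1000::/home/alice:/bin/sh
"""

uid_to_name, name_to_uid = parse_id_file(SAMPLE_PASSWD)
```

On a real system the same function could be applied to the contents of /etc/passwd and /etc/group.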
For simplicity, the daemon just believes the group memberships the NFS client claims, i.e. the memberships are not verified with the PlasmaFS namenode. Because of this, group memberships seen via NFS may differ from those seen when using the PlasmaFS protocol directly. (This might be fixed in a future release.)
Persistent mounts
The NFS bridge stores mounts in a PlasmaFS file under /.plasma/var/lib/nfs3. Because of this, the mounts survive restarts of the bridge (or other PlasmaFS components).
Options
-conf file: Reads the configuration from this file. See below for details.
-fg: Prevents the daemon from detaching from the terminal and putting itself into the background.
-pid file: Writes this pid file once the service process is forked.
The configuration file is in Netplex syntax, and also uses many features of this framework. See the documentation for Netplex, which is available as part of the Ocamlnet library package. There are also some explanations here: Cmd_plasmad.
The config file looks like:
netplex {
  controller {
    ... (* see plasmad documentation *)
  };
  namenodes {
    clustername = "<name>";
    node_list = "<nn_list>";
    port = 2730;
    buffer_memory = 134217728;
  };
  access {
    user { name = "proot"; password_file = "password_proot" };
    user { name = "pnobody"; password_file = "password_pnobody" };
  };
  service {
    name = "Nfs3";
    protocol {
      name = "mount3";
      address {
        type = "internet";
        bind = "127.0.0.1:2800"
      }
    };
    protocol {
      name = "nfs3";
      address {
        type = "internet";
        bind = "127.0.0.1:2801"
      }
    };
    processor {
      type = "nfs";
      nfs3 { };
      mount3 { };
    };
    workload_manager {
      type = "constant";
      threads = 1;
    };
  };
}
Parameters:
clustername is the name of the PlasmaFS cluster.
node_list is a text file containing the names of the namenodes, one hostname per line.
buffer_memory configures the size of the internal buffer that the NFS bridge uses. Bigger buffers improve performance.
How to shut down the daemon
First, unmount the filesystem on all NFS clients. An NFS server has no way to enforce unmounts (i.e. to make clients write all unsaved data).
The orderly way to shut down the daemon is the command:
netplex-admin -sockdir <socket_directory> -shutdown
netplex-admin is part of the Ocamlnet distribution. The socket directory must be the configured socket directory.
It is also possible to do a hard shutdown by sending SIGTERM to the process group whose ID is written to the pid file. There is no risk of data loss in the server because of the transactional design. However, clients may well be confused when their connections are dropped abruptly.
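A hard shutdown of this kind can be scripted. The Python sketch below is a minimal illustration, assuming the pid file contains a single decimal number (the process group ID) and nothing else; the function names and the path in the usage comment are hypothetical.

```python
import os
import signal


def read_pgid(pid_file):
    """Read the process group ID from the pid file (assumed: one integer)."""
    with open(pid_file) as fh:
        return int(fh.read().strip())


def hard_shutdown(pid_file):
    """Send SIGTERM to the whole process group named in the pid file."""
    pgid = read_pgid(pid_file)
    os.killpg(pgid, signal.SIGTERM)


# Usage (the path is an assumption; use the file passed to -pid):
#   hard_shutdown("/var/run/nfs3d.pid")
```

os.killpg signals every process in the group, which corresponds to the manual procedure of sending SIGTERM to the process group rather than to a single process.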