This guide explains how to get a running system as quickly as possible. It also describes which parts of Plasma are needed for which types of applications:
Working OS (tested):
Whether it runs on other operating systems is unknown. There is a chance it could work on
If you are new to Plasma, do not try one of these untested systems. Stick to 64-bit Linux (all CPUs should work). If you are an experienced Plasma user, feedback is welcome on which problems occur on which OS, and on which systems work without problems.
Before you can do anything, you need to build Plasma, and install the resulting libraries and binaries.
If you are using a cluster of machines, note that you need to install the libraries and binaries on a single machine only. We call this machine the operator node. The programs must eventually be copied to the other machines as well, but this process is called deployment, and there are scripts that support it.
Essentially, there are three options for the build:
- Follow INSTALL, included in the tar ball. This method has the advantage that you see what is effectively happening. However, there is the downside that you need a lot of prerequisite software before you can even start. Currently (January 2012), it is not possible to get all prerequisites in binary form as downloads (no debs or rpms that are recent enough). In short: this is the hard way. Don't do it unless you want to help develop Plasma.
- Use godi-plasma. Just install it, and you are done. (How to get GODI? See get_godi.html.) Well, actually it is not that simple. You also need a few prerequisites, but they are normally available from your Linux distro. This includes a working C compiler, the PostgreSQL database, and a few libraries (in particular the development packages for pcre and for postgresql (in Debian called libpq)).
- See Sess_plasma_install.
The result of the build is that the software is installed under a certain path prefix <prefix>, especially:
- <prefix>/bin contains executables
- <prefix>/lib/ocaml/pkg-lib contains libraries, especially plasmaclient and mr_framework
- <prefix>/doc/godi-plasma contains documentation and examples
- Anything else under <prefix> is just a side-effect of the build.
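To confirm that a build produced the directories listed above, a small helper script can be used. The following is a minimal sketch in Python and is not part of Plasma: the function name check_prefix is hypothetical, and it only checks the three subdirectories named in this guide.

```python
from pathlib import Path

# Subdirectories that this guide says the build creates below <prefix>.
EXPECTED_SUBDIRS = ["bin", "lib/ocaml/pkg-lib", "doc/godi-plasma"]


def check_prefix(prefix):
    """Return the expected subdirectories that are missing below prefix."""
    root = Path(prefix)
    return [sub for sub in EXPECTED_SUBDIRS if not (root / sub).is_dir()]
```

Calling check_prefix with your own installation prefix returns an empty list when all three directories exist. A non-empty result names the directories that were not found.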
Things you should not do: Do not try to find shortcuts for the build. This creates more problems than it solves. For example, don't try to use the OCaml compiler that comes with your Linux distro. OCaml libraries built with different versions of the compiler cannot be mixed, and attempts to do so lead to checksum mismatches.
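One way to notice an accidental mix of compilers early is to check which ocamlc is found first on the search path. The sketch below is a Python helper and not part of Plasma. It assumes that the compiler built for Plasma is installed in <prefix>/bin, which this guide does not state explicitly, and the function name compiler_from_prefix is hypothetical.

```python
import os
import shutil


def compiler_from_prefix(prefix, search_path=None):
    """Return True if the first ocamlc on the search path is in <prefix>/bin.

    search_path defaults to the PATH environment variable.
    """
    found = shutil.which("ocamlc", path=search_path)
    if found is None:
        return False  # no compiler on the search path at all
    expected_dir = os.path.realpath(os.path.join(prefix, "bin"))
    # Compare directories only, so a symlinked ocamlc binary still counts.
    return os.path.realpath(os.path.dirname(found)) == expected_dir
```

If this returns False for your prefix, the shell would pick up a different compiler (for example the one from the Linux distro), which is the situation the warning above describes.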
Since Plasma-0.6, it is possible to run map/reduce jobs without PlasmaFS. The data files are just stored in the local Unix filesystem. Of course, you are then restricted to just a single computer. This mode especially exists for trying out map/reduce for the first time.
So, if this applies to you, you can skip the PlasmaFS deployment. Remember that the map/reduce configuration file must explicitly disable PlasmaFS. E.g. if your map/reduce program is called my_prog, there is a configuration file my_prog.conf, and it must conform to:
netplex {
  namenodes {
    disabled = true; (* required *)
  };
  mapred {
    node { addr = "localhost" }; (* only one node "localhost" *)
    ... (* other settings *)
  };
  mapredjob {
    ... (* other settings *)
  };
}
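Because several configuration files look alike, it can help to check mechanically that the job configuration really disables PlasmaFS. The following Python sketch is not part of Plasma and is not a full parser for the configuration syntax. It assumes comments are written as (* ... *) without nesting and that no string value contains braces. All function names are hypothetical.

```python
import re


def strip_comments(text):
    """Remove (* ... *) comments (non-nested) from the configuration text."""
    return re.sub(r"\(\*.*?\*\)", "", text, flags=re.DOTALL)


def section_body(text, name):
    """Return the text inside the first `name { ... }` section, or None."""
    match = re.search(r"\b" + re.escape(name) + r"\s*\{", text)
    if match is None:
        return None
    depth, start = 1, match.end()
    for pos in range(start, len(text)):
        if text[pos] == "{":
            depth += 1
        elif text[pos] == "}":
            depth -= 1
            if depth == 0:
                return text[start:pos]
    return None  # unbalanced braces


def plasmafs_disabled(conf_text):
    """True if the namenodes section contains `disabled = true`."""
    body = section_body(strip_comments(conf_text), "namenodes")
    return body is not None and re.search(r"\bdisabled\s*=\s*true\b", body) is not None
```

Reading my_prog.conf and passing its contents to plasmafs_disabled gives a quick yes/no answer before the job is started.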
Caveat: There are many configuration files that look similar. We refer here to the file configuring the map/reduce job.
Read more about map/reduce in these two documents:
- Plasmamr_howto: Explains how to run a job in classic mode.
- Plasmamr_toolkit: The advanced functional toolkit. Recommended for real FP programmers and type enthusiasts.
If you want to run map/reduce on top of PlasmaFS, you should read the instructions in Plasmafs_deployment.
In short, you need
For running a map/reduce job, you need to know two PlasmaFS settings:
netplex {
  namenodes {
    clustername = "the name of the PlasmaFS cluster";
    node { addr = "namenode host:namenode port" };
  };
  mapred {
    ... (* other settings *)
  };
  mapredjob {
    ... (* other settings *)
  };
}
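To see what the namenodes section looks like with concrete values filled in, the following Python sketch renders it from a cluster name, a host, and a port. It is an illustration only and not part of Plasma. The function name namenodes_section is hypothetical, and the cluster name, host name, and port in the usage note are made-up example values.

```python
def namenodes_section(clustername, host, port):
    """Render the namenodes section of the job configuration file."""
    port = int(port)
    if not clustername or '"' in clustername:
        raise ValueError("cluster name must be non-empty and contain no quotes")
    if not host or ":" in host or '"' in host:
        raise ValueError("host must be a plain host name")
    if not 0 < port < 65536:
        raise ValueError("port out of range")
    return (
        "namenodes {\n"
        '  clustername = "%s";\n'
        '  node { addr = "%s:%d" };\n'
        "};\n"
    ) % (clustername, host, port)
```

For example, print(namenodes_section("test", "office3", 2730)) outputs a section with clustername = "test" and addr = "office3:2730". The mapred and mapredjob sections are left out here because their settings are not covered in this guide.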
It is not necessary to configure anything on the computers running map/reduce tasks. They will automatically get the required settings together with the other task parameters.
This application allows you to store large files in a replicated way. Also, PlasmaFS is, to some degree, fault-tolerant, and gets you close to high availability. Finally, PlasmaFS can be configured to be highly secure.
This case is very similar to the previous application: read the instructions in Plasmafs_deployment.
Remember that there are several ways of accessing PlasmaFS:
- plasma utility (see Cmd_plasma).
- plasmaclient library, and especially the Plasma_client module in it
- Plasma_netfs
In Plasmafs_deployment it is explained that there is a file authnode.hosts where you can add the host names of all machines running this daemon.
~/.plasmafs (explained in Plasma_client_config).
Plasmafs_nfs.