
Release Notes For Plasma

This is version: 0.6 "Rechenknecht". This is a beta release intended for broader testing.

Changes

Changed in 0.6.2

New platform:

Plasma now works on Mac OS X. Some optional features remain unimplemented, though, notably the shared memory transport for PlasmaFS. You need Ocamlnet-3.6.1 or later. Mac OS X is mainly interesting as a development platform, so do not expect exceptional performance.

Fixes:

Detecting ocamlnet-3.6 (and later), and requiring netstring-pcre as an additional findlib dependency
OS X (and probably BSD): reformulated a sed regexp that was broken on these platforms
OS X: fixing some shell scripts
OS X: eliminating the invocation of syscalls unsupported on OS X

New features:

The map/reduce framework has a new option map_whole_files. If set, input files are no longer split into parts, i.e. one file is completely processed by a single map process. If not set (default), the previous behavior is active, namely that input files can be split into parts processed by different processes in order to improve data locality.
New options argument of Mapred_main.exec_job
One can now specify the root of the NFS-mounted filesystem
New flags for write_file
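
The effect of map_whole_files on input planning can be pictured with a small sketch. This is an illustration only, not Plasma's scheduler: the type work_unit, the function plan_inputs, and the split_size parameter are hypothetical names, and the sketch ignores the data locality considerations that guide splitting in the real framework.

```ocaml
(* Hypothetical illustration, not part of the Plasma API. *)
type work_unit = { file : string; offset : int; length : int }

(* Turn a list of (file name, size in bytes) into units of work for
   map processes. With ~map_whole_files:true every file becomes exactly
   one unit; otherwise files larger than split_size are cut into parts. *)
let plan_inputs ~map_whole_files ~split_size files =
  if split_size <= 0 then invalid_arg "plan_inputs: split_size must be positive";
  let plan_file (name, size) =
    if map_whole_files || size <= split_size then
      [ { file = name; offset = 0; length = size } ]
    else
      let rec go off acc =
        if off >= size then List.rev acc
        else
          let len = min split_size (size - off) in
          go (off + len) ({ file = name; offset = off; length = len } :: acc)
      in
      go 0 []
  in
  List.concat (List.map plan_file files)
```

In this sketch, a 10-byte file with split_size 4 yields three units of lengths 4, 4, and 2 when the option is not set, and a single unit of length 10 when it is set.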

Changed in 0.6.1

Fixes:

Handling of file holes and EIO errors in Plasma_client.copy_out. Fixes errors like Invalid_argument("fd_layer_of_buf ...")
Fixing access of local empty files via the filesystem abstraction (exception Invalid_argument("Plasma_util.RangeMap.sub"))

Changed in 0.6

New features in PlasmaFS:

none

Implementation improvements in PlasmaFS:

Fixed a performance bug in the plasma utility that occurred when many files are removed. Removal is no longer done in a single transaction but in several, and the number of files that can be removed in one transaction is limited.
Fixed a consistency bug that may affect the database (wrong data is written to the blockalloc table when two transactions commit at the same time to the same rows of this table)
Added a consistency check for tracking down bugs like the previous one.
It is again possible to use unicast for datanode discovery; the multicast mode is still the default, though. In unicast mode the (possible) addresses of datanodes must be specified via
```
 discovery { addr="<ip>" } 
```
entries in the datanodes section of the namenode configuration.
Improved error path for the case that a datanode becomes unavailable.
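
The idea behind the first fix in this list, removing many files in several bounded transactions instead of one large one, can be sketched as follows. This is not the code of the plasma utility: batches, remove_in_batches, max_per_txn, and the run_txn callback are hypothetical names standing in for whatever the utility does internally.

```ocaml
(* Hypothetical illustration, not the plasma utility's implementation. *)

(* Split off at most n elements from the front of a list. *)
let rec take n = function
  | x :: rest when n > 0 ->
      let (hd, tl) = take (n - 1) rest in
      (x :: hd, tl)
  | l -> ([], l)

(* Cut a list into consecutive batches of at most max_per_txn elements. *)
let batches ~max_per_txn items =
  if max_per_txn <= 0 then invalid_arg "batches: max_per_txn must be positive";
  let rec go acc = function
    | [] -> List.rev acc
    | l ->
        let (batch, rest) = take max_per_txn l in
        go (batch :: acc) rest
  in
  go [] items

(* Call run_txn once per batch; run_txn is assumed to remove the given
   files inside one transaction and commit it. *)
let remove_in_batches ~max_per_txn ~run_txn files =
  List.iter run_txn (batches ~max_per_txn files)
```

Bounding the batch size keeps each transaction small, at the price that the removal as a whole is no longer atomic.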

New features in the map/reduce framework:

New "toolkit" (see Plasmamr_toolkit). The toolkit is an abstraction layer on top of the job execution engine. It allows map/reduce jobs to be formulated in a functional way, and complex jobs to be composed from elementary ones.
Concept of an abstract filesystem. This makes it possible to run map/reduce on filesystems other than PlasmaFS. In this release, there is support for local filesystems. In particular, for trying map/reduce out it is now possible to run it without a PlasmaFS deployment, although only on a single computer. The abstract filesystem is defined in Mapred_fs.
New fixed-size and variable-size file formats. These are binary formats, and the records can include any data. This overcomes the limitation that records cannot include LF bytes when LF is used as line terminator. There is extensive documentation in Plasmamr_file_formats.
The sorting criterion can be defined by the user (new sorter in Mapred_def.mapred_job)
Support for combiners (new combiner in Mapred_def.mapred_job)
Support for map-only jobs, and "map+sort but no reduce" jobs
New module Mapred_fields for easy parsing of fields.
The RPC channel from the job control program to the task servers is now privacy-protected
The map/reduce jobs can now collect statistics ("counters")
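
Why a binary, length-delimited format lifts the "no LF inside a record" restriction can be seen in a minimal sketch. The layout used here (a 4-byte big-endian length followed by the payload) is an assumption chosen for illustration and is not the on-disk format of Plasma; the actual formats are documented in Plasmamr_file_formats. The function names encode and decode are hypothetical.

```ocaml
(* Hypothetical length-prefixed record format, for illustration only. *)

(* Append one record: 4-byte big-endian length, then the raw bytes. *)
let encode_record buf (r : string) =
  let n = String.length r in
  if n > 0x3fffffff then invalid_arg "encode_record: record too long";
  Buffer.add_char buf (Char.chr ((n lsr 24) land 0xff));
  Buffer.add_char buf (Char.chr ((n lsr 16) land 0xff));
  Buffer.add_char buf (Char.chr ((n lsr 8) land 0xff));
  Buffer.add_char buf (Char.chr (n land 0xff));
  Buffer.add_string buf r

let encode records =
  let buf = Buffer.create 64 in
  List.iter (encode_record buf) records;
  Buffer.contents buf

(* Read records back. Record boundaries come from the length headers,
   so the payload may contain any byte, including LF. *)
let decode (s : string) =
  let len = String.length s in
  let byte i = Char.code s.[i] in
  let rec go pos acc =
    if pos = len then List.rev acc
    else if pos + 4 > len then failwith "decode: truncated length header"
    else begin
      let n =
        ((byte pos) lsl 24) lor ((byte (pos + 1)) lsl 16)
        lor ((byte (pos + 2)) lsl 8) lor (byte (pos + 3))
      in
      if pos + 4 + n > len then failwith "decode: truncated record"
      else go (pos + 4 + n) (String.sub s (pos + 4) n :: acc)
    end
  in
  go 0 []
```

A record such as "a\nb" survives the round trip through encode and decode unchanged, which a line-oriented reader could not guarantee.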

Implementation improvements in the map/reduce framework:

The scheduling algorithm includes a number of performance optimizations. There is now support for "greedy" execution, trying to maximize the number of runnable tasks for best parallelization.
The sort and emap tasks write the output files directly to the nodes where the follow-up tasks will be executed. This way, CPU and network utilization are better interleaved.
The record reader (in Mapred_io) can now read the following blocks in a separate kernel thread while the current blocks are still being processed, resulting in better resource utilization.
The command start_task_servers automatically kills any old instance of the task server that is still running
Optimized and unified implementation of sorting

Compatibility:

Existing PlasmaFS filesystems for Plasma-0.5 are compatible, and can be retained.
The PlasmaFS protocol is exactly the same as before.
There are incompatible protocol changes in the map/reduce protocol, though.

What is working and not working in PlasmaFS

Generally, PlasmaFS works as described in the documentation. Crashes have not been observed for quite some time now, but occasionally one might see critical exceptions in the log file.

PlasmaFS has so far only been tested on 64-bit systems, and only on Linux as the operating system. There are known issues on 32-bit machines: in particular, the blocksize must not be larger than 4M, and certain buffers are restricted to 16M.

Data safety: Cannot be guaranteed. It is not recommended to put valuable data into PlasmaFS.

Known problems:

The command plasma put -stdin is very slow if the input pipe is frequently flushed. Workaround: Use "dd bs=1M iflag=fullblock" to write in chunks of 1M to plasma, e.g.
```
 cat file | dd bs=1M iflag=fullblock | plasma put -stdin /file 
```
(Actually, this is not a bug in Plasma but in Ocamlnet 3.4.1, and will be resolved there.)
It is still unclear whether the timeout settings are acceptable.
The generated inode numbers are not necessarily unique after namenode restarts.
Some namenode operations do not reduce the blocklimit metadata field when it is possible.
It is not yet possible to limit the number of connections the namenode accepts. When it hits the OS limit, an exception will occur, and the namenode is left in an inconsistent state. This is of course not acceptable.
Writing large files via the NFS bridge may result in performance problems when the NFS client does not respect the block boundaries.
There is still the security problem that allocated but not written blocks remain in an undefined state (information leak possible)
The recursive removal of large directory trees in a single transaction runs into a performance problem. Workaround: Use several transactions.

Not implemented features:

There are too many hard-coded constants.
The file name read/lookup functions should never return ECONFLICT errors. (This has been improved in 0.2, though.)
Support for checksums
Support for "host groups", so that it is easier to control which machines may store which blocks. Semantics have yet to be specified.
Define how blocks are handled that are allocated but never written.
Recognition of the death of the coordinator, and restart of the election algorithm.
Lock manager (so that clients do not have to busy-wait on locks)
Restoration of missing replicas
Rebalancing of the cluster
Automated copying of the namenode database to freshly added namenode slaves
No IPv6 support yet.

What is working and not working in Plasma MapReduce

Generally, Plasma MapReduce works as described in the documentation.

Not implemented features:

Dynamically extensible task servers
Nice web interface
Restart/relocation of failed tasks
Recomputation of intermediate files that are no longer accessible due to node failures
Speculative execution of tasks
Support for job management (remembering which jobs have been run, etc.)

What we will never implement:

Jobs consisting only of reduce but no map cannot be supported due to the task scheme. (Reason: Input files for sort tasks must not exceed sort_limit.) Workaround: Use the identity as map.
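
The workaround can be illustrated with a toy, in-memory pipeline. This sketch does not use the Plasma API: run_job, identity, and the record representation as (key, value) pairs are hypothetical and only show that a map step which passes records through unchanged leaves all the work to sorting and reducing.

```ocaml
(* Toy in-memory map/sort/reduce pipeline; hypothetical, not Plasma code. *)
let run_job ~map ~reduce records =
  let mapped = List.map map records in
  let sorted = List.sort compare mapped in
  (* Group adjacent records that share a key. *)
  let rec group acc = function
    | [] -> List.rev acc
    | (k, v) :: rest ->
        (match acc with
         | (k', vs) :: acc' when k' = k -> group ((k', v :: vs) :: acc') rest
         | _ -> group ((k, [v]) :: acc) rest)
  in
  List.map (fun (k, vs) -> reduce k (List.rev vs)) (group [] sorted)

(* The identity map: every input record is emitted unchanged. *)
let identity record = record

(* A "reduce-only" job expressed with the identity as map step:
   sum the values per key. *)
let sums =
  run_job ~map:identity
    ~reduce:(fun k vs -> (k, List.fold_left (+) 0 vs))
    [("b", 1); ("a", 2); ("b", 3)]
```

Here sums evaluates to [("a", 2); ("b", 4)], the same result a reduce-only job would be expected to produce.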

This web site is published by Informatikbüro Gerd Stolpmann
