module Mapred_toolkit: sig .. end
Plasmamr_toolkit.
Registered functions are used to name functions that are filled into
the placeholders of the map/reduce algorithm scheme (such as map and
reduce, but also a few more).
Because the registration must happen at initialization time, it is effectively only possible to register globally defined functions, and not local functions defined inside other functions. (This limitation cannot currently be removed; a workaround is to pass all data via arguments.)
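The following is a minimal sketch of name-based registration, intended only to illustrate why registration happens at initialization time and how the argument-passing workaround looks. It is not the Plasma implementation: the names registry, register and lookup are hypothetical, and the table is restricted to one function type to keep the example small.

```ocaml
(* Toy model only; registry, register and lookup are hypothetical names. *)
let registry : (string, int -> int -> int) Hashtbl.t = Hashtbl.create 16

(* Store [f] under [name] and return the name as a handle. *)
let register name f =
  if Hashtbl.mem registry name then failwith ("duplicate name: " ^ name);
  Hashtbl.add registry name f;
  name

(* Find the function that was registered under [name]. *)
let lookup name = Hashtbl.find registry name

(* Globally defined, so it is registered while the module is initialized.
   Every process running the same program then knows the name "scale". *)
let scale = register "scale" (fun factor x -> factor * x)

let () =
  (* Workaround for the missing closures: the value a local function
     would have captured (here the factor 3) is passed as an argument. *)
  let f = lookup scale in
  Printf.printf "%d\n" (f 3 5)   (* prints 15 *)
```

In this model a function registered later, from inside another function, would only exist in the process that happened to run that code, which is one way to picture the limitation described above.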
There is a camlp4 preprocessor helping to define registered functions. Use it like
let my_function =
<:rfun< larg1 larg2 ... largM @ rarg1 rarg2 ... rargN -> body >>
The "@" and "->" characters need to occur literally here. The function
arguments before "@" are local arguments and can be omitted. The
arguments after "@" are remote arguments, and at least one of these
is mandatory. Remember that there is a local caller, and a task server
executing the function. A local argument comes from the caller, and
is sent to the task server (using marshalling). The remote arguments
are, in contrast, supplied with values from the task server (e.g.
a value previously computed in the task server). The type of
my_function is something like
my_function : L1 -> ... -> LM -> (R1 -> ... -> RN -> T) Mapred_rfun.rfun
(when the local arguments have types Li and the remote arguments have
types Ri).
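The split between local and remote arguments can be pictured with the standard Marshal module. This is a sketch under stated assumptions, not how the toolkit transports values: send_local, body and run_on_server are hypothetical names, and both the "caller" and the "task server" live in one process here.

```ocaml
(* Caller side: the local argument is serialized before it is sent. *)
let send_local (larg : int list) : string = Marshal.to_string larg []

(* The function body takes the local argument first and the remote
   argument second, mirroring "larg @ rarg -> body". *)
let body (larg : int list) (rarg : int) : int =
  List.fold_left (fun acc x -> acc + (x * rarg)) 0 larg

(* Task-server side: decode the local argument from the wire and combine
   it with a value that only exists on the server. *)
let run_on_server (wire : string) (rarg : int) : int =
  let larg : int list = Marshal.from_string wire 0 in
  body larg rarg

let () =
  let wire = send_local [1; 2; 3] in   (* comes from the caller *)
  let server_value = 10 in             (* computed on the task server *)
  Printf.printf "%d\n" (run_on_server wire server_value)   (* prints 60 *)
```

Because the local argument crosses a marshalling boundary, it has to be a value that can be marshalled; the remote argument never leaves the server and has no such constraint in this model.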
The camlp4 extension is activated if you compile with
ocamlfind ocamlc -syntax camlp4o -package mr_framework.toolkit ...
(or use the preprocessor directly: camlp4 pa_toolkit.cma).
If there are no local arguments, you can also define the function without camlp4 as
let my_function =
Mapred_rfun.register name (fun rarg1 ... rargN -> body)
Here, name needs to be a unique identifier for the function. Use
Mapred_rfun.apply_partially to get the effect of local arguments.
Registered functions can, as a consequence of the value restriction,
only be monomorphic. (The usual workaround of eta-expanding the
functions is not applicable here.)
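The effect of the value restriction can be reproduced with a small stand-in module. Rfun_model below is a hypothetical model with an abstract rfun type; it only mirrors the shape of the register and invoke signatures shown on this page and says nothing about how Mapred_rfun is implemented.

```ocaml
(* Hypothetical stand-in; not the real Mapred_rfun. *)
module Rfun_model : sig
  type 'a rfun
  val register : string -> 'a -> 'a rfun
  val name : 'a rfun -> string
  val invoke : ('a -> 'b) rfun -> 'a -> 'b
end = struct
  type 'a rfun = { name : string; f : 'a }
  let register name f = { name; f }
  let name r = r.name
  let invoke r x = r.f x
end

(* [register ...] is a function application, not a syntactic value, so
   its result is not generalized. Without the annotations, the identity
   function below would get a weak type like ('_weak1 -> '_weak1) rfun
   and could be used at one type only. Hence one registration per type. *)
let id_int : (int -> int) Rfun_model.rfun =
  Rfun_model.register "id_int" (fun x -> x)

let id_string : (string -> string) Rfun_model.rfun =
  Rfun_model.register "id_string" (fun x -> x)

let () =
  Printf.printf "%d %s\n"
    (Rfun_model.invoke id_int 42)
    (Rfun_model.invoke id_string "ok")   (* prints 42 ok *)
```

Eta-expanding the definition (for example let id () = register ...) would restore polymorphism in ordinary code, but it would also turn the registration into something that runs on each call instead of once at initialization time, which is a plausible reason why that workaround does not help here.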
val invoke : ('a -> 'b) Mapred_rfun.rfun -> 'a -> 'b
type format =
[ `Auto_input | `Fixed_size of int | `Line_structured | `Var_size ]
See Plasmamr_file_formats for detailed explanations:
`Line_structured: A record is a line terminated by an LF byte
`Fixed_size n: A record has a size of exactly n bytes
`Var_size: This is a binary format allowing records of variable size
`Auto_input: Recognize the format automatically from the file name. If you specify this format, only reading files is supported, and writing files will raise an exception.
module Place: sig .. end
module Store: sig .. end
module Seq: sig .. end
module DSeq: sig .. end
val toolkit_job : Mapred_def.mapred_env -> Mapred_def.mapred_job
Mapred_toolkit.DSeq.
class toolkit_job : Mapred_def.mapred_env -> Mapred_def.mapred_job