Registered functions give names to the functions that are filled into
the placeholders of the map/reduce algorithm scheme (such as
reduce, but also a few more).
Because the registration must happen at initialization time, it is effectively only possible to register globally defined functions, and not local functions defined inside other functions. (This limitation cannot currently be removed; a workaround is to pass all data via arguments.)
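The restriction to global functions can be illustrated with a toy registry. This is only a sketch under assumptions: the table `registry`, the helpers `register` and `call`, and the function name "scale" are hypothetical stand-ins and not part of Mapred_rfun. It shows a registration that runs during module initialization, and the workaround of passing data via arguments instead of capturing it in a local closure.

```ocaml
(* Toy registry for illustration only; this is not the Mapred_rfun code. *)
let registry : (string, int -> int -> int) Hashtbl.t = Hashtbl.create 16

(* Hypothetical helper: store [f] under [name]; the name serves as handle. *)
let register (name : string) (f : int -> int -> int) : string =
  if Hashtbl.mem registry name then invalid_arg ("duplicate name: " ^ name);
  Hashtbl.add registry name f;
  name

(* Global definition: evaluated during module initialization, so every
   process that starts this program builds the same table. *)
let scale = register "scale" (fun factor x -> factor * x)

(* A registration inside another function would only run when that
   function is called, i.e. after initialization and only in the calling
   process, so other processes would not know the name:
     let make_scaler factor = register "scaler" (fun _ x -> factor * x) *)

(* Workaround: the data ([factor]) travels as an ordinary argument. *)
let call (handle : string) (factor : int) (x : int) : int =
  (Hashtbl.find registry handle) factor x

let () = assert (call scale 3 14 = 42)
```

In this model the handle is just a string, which is why it can be agreed upon by independent processes that ran the same initialization code.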
A camlp4 preprocessor helps to define registered functions. Use it like this:
let my_function = <:rfun< larg1 larg2 ... largM @ rarg1 rarg2 ... rargN -> body >>
The "@" and "->" characters need to occur literally here. The function
arguments before "@" are local arguments and can be omitted. The
arguments after "@" are remote arguments, and at least one of these
is mandatory. Remember that there is a local caller, and a task server
executing the function. A local argument comes from the caller, and
is sent to the task server (using marshalling). The remote arguments
are, in contrast, supplied with values from the task server (e.g.
a value previously computed in the task server). The type of
my_function is something like
my_function : L1 -> ... -> LM -> (R1 -> ... -> RN -> T) Mapred_rfun.rfun
(when the local arguments have types Li and the remote arguments have types Rj).
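The split between local and remote arguments can be sketched with a small model. Everything here is an assumption made for illustration: the `call` record, the `registry` table, and the functions `count_prefix` and `run_on_server` are hypothetical and do not reproduce what the camlp4 extension generates. The local argument is marshalled by the caller, and the remote argument is supplied where the function runs.

```ocaml
(* What travels from the local caller to the task server in this model:
   the registered name plus the marshalled local arguments. *)
type call = { name : string; payload : string }

(* Server-side table: marshalled local args -> remote arg -> result. *)
let registry : (string, string -> string -> int) Hashtbl.t = Hashtbl.create 16

let starts_with (prefix : string) (s : string) : bool =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Registration at initialization time, under a unique name. *)
let () =
  Hashtbl.add registry "count_prefix"
    (fun payload line ->
       (* Server side: recover the local argument from the payload. *)
       let (prefix : string) = Marshal.from_string payload 0 in
       if starts_with prefix line then 1 else 0)

(* Caller side: applying the local argument yields a value that can be
   shipped; compare the shape L1 -> (R1 -> T) rfun. *)
let count_prefix (prefix : string) : call =
  { name = "count_prefix"; payload = Marshal.to_string prefix [] }

(* Server side: the remote argument comes from the server's own data. *)
let run_on_server (c : call) (remote_arg : string) : int =
  (Hashtbl.find registry c.name) c.payload remote_arg

let () =
  let c = count_prefix "ab" in
  assert (run_on_server c "abc" = 1);
  assert (run_on_server c "xyz" = 0)
```

Marshalling only the local arguments keeps the transferred data small, since the function body itself is never sent.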
The camlp4 extension is activated if you compile with
ocamlfind ocamlc -syntax camlp4o -package mr_framework.toolkit ...
(or use directly the preprocessor
If there are no local arguments, you can also define the function without camlp4 as
let my_function = Mapred_rfun.register name (fun rarg1 ... rargN -> body)
name needs to be a unique identifier for the function. Use
Mapred_rfun.apply_partially to get the effect of local arguments.
Registered functions can, as a consequence of the value restriction,
only be monomorphic. (The usual workaround of eta-expanding the
functions is not applicable here.)
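The effect of the value restriction can be reproduced with a stand-in type. This is a minimal sketch, assuming a simplified `rfun` record and toy `register` and `invoke` functions that are not the real Mapred_rfun definitions. Because `register ...` is a function application and not a syntactic value, its result is not generalized.

```ocaml
(* Simplified stand-in for a registered function handle. *)
type 'a rfun = { name : string; fn : 'a }

let register (name : string) (f : 'a) : 'a rfun = { name; fn = f }

let invoke (r : ('a -> 'b) rfun) (x : 'a) : 'b = r.fn x

(* The argument is polymorphic, but the result of the application gets
   a weak type, ('_weak1 -> '_weak1) rfun, not a polymorphic one. *)
let ident = register "ident" (fun x -> x)

(* The first use fixes the weak type variable to int. *)
let n = invoke ident 41 + 1

(* A second use at another type is then rejected by the type checker:
     let s = invoke ident "text" *)

let () = assert (n = 42)
```

Eta-expansion would turn `ident` into a function that calls `register` on every use, which defeats a registration that has to happen exactly once at initialization time.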
val invoke :
('a -> 'b) Mapred_rfun.rfun -> 'a -> 'b
[ `Auto_input | `Fixed_size of int | `Line_structured | `Var_size ]
See Plasmamr_file_formats for detailed explanations:
`Line_structured: A record is a line terminated by an LF byte
`Fixed_size n: A record has a size of exactly n bytes
`Var_size: This is a binary format allowing records of variable size
`Auto_input: Recognize the format automatically from the file name. If you specify this format, only reading files is supported, and writing files will raise an exception.
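The two simplest formats can be sketched as a record splitter over an in-memory string. This is only an illustration: `split_records` is a hypothetical helper and not part of the toolkit, it assumes that the size of `Fixed_size` is counted in bytes, and it does not model `Var_size` or `Auto_input`, whose details are not given here.

```ocaml
type format = [ `Auto_input | `Fixed_size of int | `Line_structured | `Var_size ]

(* Hypothetical helper: cut [data] into records according to [fmt]. *)
let split_records (fmt : format) (data : string) : string list =
  match fmt with
  | `Line_structured ->
      (* Every record ends with an LF byte, so splitting on LF leaves an
         empty piece after the final terminator; drop it. An unterminated
         last piece is kept as it is. *)
      (match List.rev (String.split_on_char '\n' data) with
       | "" :: rest -> List.rev rest
       | pieces -> List.rev pieces)
  | `Fixed_size n ->
      if n <= 0 || String.length data mod n <> 0 then
        invalid_arg "split_records: data is not a whole number of records";
      List.init (String.length data / n) (fun i -> String.sub data (i * n) n)
  | `Var_size | `Auto_input ->
      invalid_arg "split_records: format not modelled in this sketch"

let () =
  assert (split_records `Line_structured "a\nb\n" = ["a"; "b"]);
  assert (split_records (`Fixed_size 2) "aabbcc" = ["aa"; "bb"; "cc"])
```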
val toolkit_job :
Mapred_def.mapred_env -> Mapred_def.mapred_job
class toolkit_job :