module Mapred_def: sig
..end
class type mapred_env = object
..end
type designation =
[ `Deep_dir | `File | `Flat_dir ]
How to interpret input_dir:
`File: Interpret input_dir as a single file (not as a directory).
`Flat_dir: Take the files in input_dir, except those that start with a dot or an underscore.
`Deep_dir: Take the files in input_dir and all inner directories. Again, files are ignored when they start with a dot or an underscore. Directories starting with a dot or an underscore are completely ignored.
type phases =
[ `Map | `Map_sort | `Map_sort_reduce ]
`Map: Only the map phase is executed. The output directory will contain files mapped_#_# where "#" is a number.
`Map_sort: The mapped files are sorted. This generates files sorted_# where "#" is a number.
`Map_sort_reduce: The sorted files are shuffled and reduced. This generates files partition_# where "#" is a number. This is the default.
class type mapred_job_config = object
..end
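The selection rules of designation and the output file names of phases described above can be sketched as follows. This is only an illustration under stated assumptions: the two variant types are redeclared locally so that the code compiles without Plasma, and the entry type, is_hidden, input_files and output_prefix are hypothetical names that are not part of Mapred_def. The directory tree is modelled in memory instead of being read from PlasmaFS.

```ocaml
(* Local copies of the two variant types, so the sketch compiles without Plasma. *)
type designation = [ `Deep_dir | `File | `Flat_dir ]
type phases = [ `Map | `Map_sort | `Map_sort_reduce ]

(* Hypothetical in-memory stand-in for a directory listing. *)
type entry =
  | File of string
  | Dir of string * entry list

(* True if a name starts with a dot or an underscore. *)
let is_hidden name =
  String.length name > 0 && (name.[0] = '.' || name.[0] = '_')

(* Paths selected for [input_dir], whose contents are given by [entries]. *)
let input_files (d : designation) input_dir entries =
  let rec walk deep prefix entries =
    List.concat_map
      (function
        | File name when not (is_hidden name) -> [ prefix ^ "/" ^ name ]
        | Dir (name, inner) when deep && not (is_hidden name) ->
            (* Only `Deep_dir descends, and hidden directories are skipped. *)
            walk deep (prefix ^ "/" ^ name) inner
        | File _ | Dir _ -> [])
      entries
  in
  match d with
  | `File -> [ input_dir ]  (* input_dir itself is the single input file *)
  | `Flat_dir -> walk false input_dir entries
  | `Deep_dir -> walk true input_dir entries

(* Prefix of the files generated by the last executed phase. *)
let output_prefix : phases -> string = function
  | `Map -> "mapped_"
  | `Map_sort -> "sorted_"
  | `Map_sort_reduce -> "partition_"
```

With a tree that contains a.txt, .hidden, a directory sub holding b.txt, and a directory _logs, the `Flat_dir case yields only a.txt, while `Deep_dir additionally yields sub/b.txt and skips _logs entirely.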
class type sorter = object
..end
class type task_info = object
..end
class type mapred_job = object
..end
val get_rc : mapred_env -> int -> Mapred_io.record_config
get_rc me bigblock_size: Returns a record config for the given environment and the suggested size of the bigblock.
val get_job_local_dir : mapred_env -> mapred_job_config -> string
Returns the job-local directory where task_files can be found. The task implementations can also use this directory for other purposes, e.g. temporary files. The directory exists for the lifetime of the job.
Note that this directory is only created when needed.
Same as Mapred_taskfiles.taskfile_manager.local_directory.
val get_job_log_dir : mapred_env -> mapred_job_config -> string
Returns the log directory of the job. [...] log_dir in PlasmaFS when the job is finished.
Note that this directory is only created when needed.
Same as Mapred_taskfiles.taskfile_manager.log_directory.
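The remark that task implementations may use the job-local directory for temporary files can be illustrated with a small sketch. It assumes that the string returned by get_job_local_dir is passed in as a plain argument (local_dir) and that the directory already exists at that point; with_scratch_file is a hypothetical helper, not part of Plasma, and it uses only the OCaml standard library.

```ocaml
(* [local_dir] stands for the result of get_job_local_dir; it is a plain
   string here so that the sketch runs without Plasma.
   Assumption: the directory already exists when this is called. *)
let with_scratch_file ~local_dir f =
  (* Create a uniquely named empty file inside the given directory. *)
  let path = Filename.temp_file ~temp_dir:local_dir "scratch_" ".tmp" in
  (* Remove the file afterwards, even if [f] raises an exception. *)
  Fun.protect
    ~finally:(fun () -> if Sys.file_exists path then Sys.remove path)
    (fun () -> f path)
```

Removing the scratch file right after use is a design choice of this sketch. Since the directory exists for the lifetime of the job, a task could equally leave cleanup to the end of the job.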