Job runners
The runner is the fractal-server component that executes a job (based on a given workflow and dataset) on a computational resource.
Configuration
The runner configuration is defined in the jobs_runner_config property of a computational resource. The configuration schemas reported below apply to a local resource (see JobRunnerConfigLocal) and to a slurm_sudo/slurm_ssh resource (see JobRunnerConfigSLURM). Further details of the SLURM configuration are described in advanced SLURM configuration.
JobRunnerConfigLocal
Bases: BaseModel
Runner-configuration specifications, for a local resource.
A typical use case is setting parallel_tasks_per_job to a
small number (e.g. 1) to limit parallelism when executing tasks
that require a large amount of resources (e.g. memory) on a local machine.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| parallel_tasks_per_job | Maximum number of tasks to be run in parallel within a local runner. If [...] |
Source code in fractal_server/runner/config/_local.py
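The effect of a small parallel_tasks_per_job value can be illustrated with a minimal sketch that uses only the Python standard library. This is not the fractal-server implementation: the function run_tasks, the simulated task, and the use of a thread pool are assumptions made for illustration. The sketch only shows that capping the number of workers caps how many tasks run at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_tasks(items, parallel_tasks_per_job):
    """Run one simulated task per item; return (results, peak concurrency).

    Hypothetical helper, not part of fractal-server.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def simulated_task(item):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for a resource-heavy task
        with lock:
            running -= 1
        return item * 2

    # The pool size plays the role of the parallelism limit.
    with ThreadPoolExecutor(max_workers=parallel_tasks_per_job) as pool:
        results = list(pool.map(simulated_task, items))
    return results, peak


results, peak = run_tasks(range(6), parallel_tasks_per_job=1)
print(results, peak)
```

With a limit of 1 the tasks run strictly one after another, so the peak memory footprint is that of a single task, at the cost of a longer total run time.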
JobRunnerConfigSLURM
Bases: BaseModel
Runner-configuration specifications, for a slurm_sudo or
slurm_ssh resource.
Note: this is a common class, which is processed and transformed into more specific configuration objects during job execution.
Valid JSON example
    {
        "default_slurm_config": {
            "partition": "partition-name",
            "cpus_per_task": 1,
            "mem": "100M"
        },
        "gpu_slurm_config": {
            "partition": "gpu",
            "extra_lines": [
                "#SBATCH --gres=gpu:v100:1"
            ]
        },
        "user_local_exports": {
            "CELLPOSE_LOCAL_MODELS_PATH": "CELLPOSE_LOCAL_MODELS_PATH",
            "NAPARI_CONFIG": "napari_config.json"
        },
        "batching_config": {
            "target_cpus_per_job": 1,
            "max_cpus_per_job": 1,
            "target_mem_per_job": 200,
            "max_mem_per_job": 500,
            "target_num_jobs": 2,
            "max_num_jobs": 4
        }
    }
| ATTRIBUTE | DESCRIPTION |
|---|---|
| default_slurm_config | Common default options for all tasks. |
| gpu_slurm_config | Default configuration for all GPU tasks. |
| batching_config | Configuration of the batching strategy. |
| user_local_exports | Key-value pairs to be included as [...] |
Source code in fractal_server/runner/config/_slurm.py
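Before storing a configuration like the JSON example above, it can help to check its top-level keys against the four attributes listed for JobRunnerConfigSLURM. The following is a minimal sketch using only the standard json module. The helper unknown_top_level_keys is hypothetical, and whether fractal-server itself rejects unknown keys is not stated here, so this is only a local pre-check and not a replacement for the model's own validation.

```python
import json

# Top-level attributes documented for JobRunnerConfigSLURM.
KNOWN_KEYS = {
    "default_slurm_config",
    "gpu_slurm_config",
    "batching_config",
    "user_local_exports",
}


def unknown_top_level_keys(raw_json: str) -> set:
    """Return top-level keys that are not documented attributes.

    Hypothetical helper, not part of fractal-server.
    """
    config = json.loads(raw_json)
    if not isinstance(config, dict):
        raise ValueError("The runner configuration must be a JSON object.")
    return set(config) - KNOWN_KEYS


example = """
{
    "default_slurm_config": {
        "partition": "partition-name",
        "cpus_per_task": 1,
        "mem": "100M"
    },
    "gpu_slurm_config": {
        "partition": "gpu",
        "extra_lines": ["#SBATCH --gres=gpu:v100:1"]
    }
}
"""

print(unknown_top_level_keys(example))
```

A misspelled key such as "gpu_slurm_confg" would show up in the returned set, which makes typos visible before the configuration reaches the server.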
Runners
The three runner implementations (for the local, SLURM/sudo and SLURM/SSH cases) are constructed based on the following class hierarchy:
- BaseRunner is the base class for all runners, which notably includes the submit and multisubmit methods (to be overridden in child classes).
    - LocalRunner is the runner implementation for a local computational resource.
    - BaseSlurmRunner inherits from BaseRunner and adds the common part of SLURM runners:
        - SlurmSudoRunner is the runner implementation for a slurm_sudo resource.
        - SlurmSSHRunner is the runner implementation for a slurm_ssh resource.
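The hierarchy above can be sketched in plain Python as follows. Only the class names and the submit and multisubmit method names come from this page. The method signatures, the method bodies, and the access_mode attribute are assumptions made for illustration and do not reflect the actual fractal-server code.

```python
from abc import ABC, abstractmethod


class BaseRunner(ABC):
    """Base class: declares the two methods that child classes override."""

    @abstractmethod
    def submit(self, func, parameters):
        """Run a single unit of work (hypothetical signature)."""

    @abstractmethod
    def multisubmit(self, func, list_parameters):
        """Run several units of work (hypothetical signature)."""


class LocalRunner(BaseRunner):
    """Toy local runner: calls the function directly in this process."""

    def submit(self, func, parameters):
        return func(parameters)

    def multisubmit(self, func, list_parameters):
        return [func(parameters) for parameters in list_parameters]


class BaseSlurmRunner(BaseRunner):
    """Common part of the SLURM runners (left as placeholders here)."""

    access_mode = "unset"  # hypothetical attribute, for illustration only

    def submit(self, func, parameters):
        raise NotImplementedError(
            f"SLURM submission via {self.access_mode} is not sketched here"
        )

    def multisubmit(self, func, list_parameters):
        raise NotImplementedError(
            f"SLURM submission via {self.access_mode} is not sketched here"
        )


class SlurmSudoRunner(BaseSlurmRunner):
    access_mode = "sudo"


class SlurmSSHRunner(BaseSlurmRunner):
    access_mode = "ssh"


runner = LocalRunner()
print(runner.multisubmit(lambda x: x + 1, [1, 2, 3]))
```

Placing the shared SLURM logic in an intermediate class means the sudo and SSH variants only need to differ in how they reach the cluster.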