Job runners

The runner is the fractal-server component that executes a job (based on a given workflow and dataset) on a computational resource.

Configuration

The runner configuration is defined in the jobs_runner_config property of a computational resource. The configuration schemas reported below apply to a local resource (see JobRunnerConfigLocal) and to a slurm_sudo/slurm_ssh resource (see JobRunnerConfigSLURM). Further details specific to SLURM configurations are described under advanced SLURM configuration.

JobRunnerConfigLocal

Bases: BaseModel

Runner-configuration specifications, for a local resource.

A typical use case: setting parallel_tasks_per_job to a small number (e.g. 1) limits parallelism when executing tasks that require a large amount of resources (e.g. memory) on a local machine.

Attributes:

  • parallel_tasks_per_job (int | None): Maximum number of tasks to be run in parallel within a local runner. If None, then all tasks may start at the same time.

Source code in fractal_server/runner/config/_local.py
class JobRunnerConfigLocal(BaseModel):
    """
    Runner-configuration specifications, for a `local` resource.

    The typical use case is that setting `parallel_tasks_per_job` to a
    small number (e.g. 1) will limit parallelism when executing tasks
    requiring a large amount of resources (e.g. memory) on a local machine.

    Attributes:
        parallel_tasks_per_job:
            Maximum number of tasks to be run in parallel within a local
            runner. If `None`, then all tasks may start at the same time.
    """

    model_config = ConfigDict(extra="forbid")
    parallel_tasks_per_job: int | None = None

    @property
    def batch_size(self) -> int:
        return self.parallel_tasks_per_job or 0
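
The relation between parallel_tasks_per_job and batch_size can be illustrated with a minimal, standard-library-only sketch. The names LocalConfigSketch and iter_batches are hypothetical and exist only for this illustration. Treating a batch_size of 0 as "one batch containing every task" is an assumption derived from the attribute description, not a statement about how the fractal-server local runner is implemented.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass


@dataclass
class LocalConfigSketch:
    """Hypothetical stand-in for JobRunnerConfigLocal (no pydantic validation)."""

    parallel_tasks_per_job: int | None = None

    @property
    def batch_size(self) -> int:
        # Same expression as in the class above: None (or 0) maps to 0.
        return self.parallel_tasks_per_job or 0


def iter_batches(tasks: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `batch_size` tasks.

    Assumption: a non-positive `batch_size` means "no limit", so all
    tasks end up in a single batch.
    """
    if batch_size <= 0:
        if tasks:
            yield list(tasks)
        return
    for start in range(0, len(tasks), batch_size):
        yield tasks[start : start + batch_size]


tasks = ["task-0", "task-1", "task-2", "task-3", "task-4"]

# At most two tasks per batch.
limited = list(
    iter_batches(tasks, LocalConfigSketch(parallel_tasks_per_job=2).batch_size)
)
# No limit configured: every task lands in the same batch.
unlimited = list(iter_batches(tasks, LocalConfigSketch().batch_size))

print(limited)
print(unlimited)
```

With a limit of 2, the five tasks are split into three groups. With no limit, they form a single group, which matches the statement that all tasks may start at the same time.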

JobRunnerConfigSLURM

Bases: BaseModel

Runner-configuration specifications, for a slurm_sudo or slurm_ssh resource.

Note: this is a common class, which is processed and transformed into more specific configuration objects during job execution.

Valid JSON example

{
    "default_slurm_config": {
        "partition": "partition-name",
        "cpus_per_task": 1,
        "mem": "100M"
    },
    "gpu_slurm_config": {
        "partition": "gpu",
        "extra_lines": [
            "#SBATCH --gres=gpu:v100:1"
        ]
    },
    "user_local_exports": {
        "CELLPOSE_LOCAL_MODELS_PATH": "CELLPOSE_LOCAL_MODELS_PATH",
        "NAPARI_CONFIG": "napari_config.json"
    },
    "batching_config": {
        "target_cpus_per_job": 1,
        "max_cpus_per_job": 1,
        "target_mem_per_job": 200,
        "max_mem_per_job": 500,
        "target_num_jobs": 2,
        "max_num_jobs": 4
    }
}

Attributes:

  • default_slurm_config (SlurmConfigSet): Common default options for all tasks.
  • gpu_slurm_config (SlurmConfigSet | None): Default configuration for all GPU tasks.
  • batching_config (BatchingConfigSet): Configuration of the batching strategy.
  • user_local_exports (DictStrStr): Key-value pairs to be included as export-ed variables in SLURM submission script, after prepending values with the user's cache directory.
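
How user_local_exports might translate into lines of a submission script can be sketched as follows. The helper name build_export_lines, the example cache directory, and the exact joining and quoting rules are assumptions made for illustration. Only the general idea (each value is prefixed with the user's cache directory and emitted as an export statement) comes from the description of user_local_exports.

```python
from __future__ import annotations

import posixpath
import shlex


def build_export_lines(
    user_local_exports: dict[str, str],
    user_cache_dir: str,
) -> list[str]:
    """Return one `export NAME=VALUE` line per configured variable.

    Assumption: "prepending" means joining the cache directory and the
    configured value as POSIX path components.
    """
    lines = []
    for name, relative_value in user_local_exports.items():
        full_value = posixpath.join(user_cache_dir, relative_value)
        # shlex.quote only adds quotes when the value needs them.
        lines.append(f"export {name}={shlex.quote(full_value)}")
    return lines


# Values taken from the JSON example; the cache directory is hypothetical.
export_lines = build_export_lines(
    {
        "CELLPOSE_LOCAL_MODELS_PATH": "CELLPOSE_LOCAL_MODELS_PATH",
        "NAPARI_CONFIG": "napari_config.json",
    },
    user_cache_dir="/home/alice/.cache/fractal",
)
for line in export_lines:
    print(line)
```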

Source code in fractal_server/runner/config/_slurm.py
class JobRunnerConfigSLURM(BaseModel):
    """
    Runner-configuration specifications, for a `slurm_sudo` or
    `slurm_ssh` resource.

    Note: this is a common class, which is processed and transformed into more
    specific configuration objects during job execution.

    Valid JSON example
    ```json
    {
        "default_slurm_config": {
            "partition": "partition-name",
            "cpus_per_task": 1,
            "mem": "100M"
        },
        "gpu_slurm_config": {
            "partition": "gpu",
            "extra_lines": [
                "#SBATCH --gres=gpu:v100:1"
            ]
        },
        "user_local_exports": {
            "CELLPOSE_LOCAL_MODELS_PATH": "CELLPOSE_LOCAL_MODELS_PATH",
            "NAPARI_CONFIG": "napari_config.json"
        },
        "batching_config": {
            "target_cpus_per_job": 1,
            "max_cpus_per_job": 1,
            "target_mem_per_job": 200,
            "max_mem_per_job": 500,
            "target_num_jobs": 2,
            "max_num_jobs": 4
        }
    }
    ```

    Attributes:
        default_slurm_config:
            Common default options for all tasks.
        gpu_slurm_config:
            Default configuration for all GPU tasks.
        batching_config:
            Configuration of the batching strategy.
        user_local_exports:
            Key-value pairs to be included as `export`-ed variables in SLURM
            submission script, after prepending values with the user's cache
            directory.
    """

    model_config = ConfigDict(extra="forbid")

    default_slurm_config: SlurmConfigSet
    gpu_slurm_config: SlurmConfigSet | None = None
    batching_config: BatchingConfigSet
    user_local_exports: DictStrStr = Field(default_factory=dict)
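
Because of model_config = ConfigDict(extra="forbid"), pydantic rejects a configuration that contains unknown top-level keys, and the two fields without defaults (default_slurm_config and batching_config) must be present. The standard-library-only sketch below mimics just that top-level check, so that a JSON document can be pre-screened without importing fractal-server. The helper check_top_level_keys is hypothetical, and it does not validate the nested SlurmConfigSet or BatchingConfigSet contents.

```python
from __future__ import annotations

import json

# Field names copied from the JobRunnerConfigSLURM class above.
REQUIRED_KEYS = {"default_slurm_config", "batching_config"}
OPTIONAL_KEYS = {"gpu_slurm_config", "user_local_exports"}


def check_top_level_keys(raw_json: str) -> list[str]:
    """Return a list of problems found in the top-level keys (empty if none)."""
    config = json.loads(raw_json)
    if not isinstance(config, dict):
        return ["top-level JSON value must be an object"]
    keys = set(config)
    problems = []
    for key in sorted(REQUIRED_KEYS - keys):
        problems.append(f"missing required key: {key}")
    for key in sorted(keys - REQUIRED_KEYS - OPTIONAL_KEYS):
        problems.append(f"unknown key: {key}")
    return problems


valid = json.dumps(
    {
        "default_slurm_config": {"partition": "partition-name"},
        "batching_config": {"target_num_jobs": 2, "max_num_jobs": 4},
    }
)
invalid = json.dumps({"default_slurm_config": {}, "slurm_config": {}})

valid_problems = check_top_level_keys(valid)
invalid_problems = check_top_level_keys(invalid)
print(valid_problems)
print(invalid_problems)
```

A real deployment would rely on the pydantic model itself for validation. This sketch only shows which top-level keys the schema accepts.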

Runners

The three runner implementations (for the local, SLURM/sudo and SLURM/SSH cases) follow this class hierarchy:

  • BaseRunner is the base class for all runners; it notably includes the submit and multisubmit methods, which are to be overridden in child classes.
    • LocalRunner is the runner implementation for a local computational resource.
    • BaseSlurmRunner inherits from BaseRunner and adds the common part of SLURM runners: