Job runners¶

The runner is the fractal-server component that executes a job (based on a given workflow and dataset) on a computational resource.

Configuration¶

The runner configuration is defined in the jobs_runner_config property of a computational resource. The configuration schemas below apply to a local resource (see JobRunnerConfigLocal) and to a slurm_sudo/slurm_ssh resource (see JobRunnerConfigSLURM). Further details of the SLURM configuration are described in advanced SLURM configuration.

`JobRunnerConfigLocal` ¶

Bases: BaseModel

Runner-configuration specifications, for a local resource.

The typical use case is that setting parallel_tasks_per_job to a small number (e.g. 1) will limit parallelism when executing tasks requiring a large amount of resources (e.g. memory) on a local machine.

ATTRIBUTE	DESCRIPTION
`parallel_tasks_per_job`	Maximum number of tasks to be run in parallel within a local runner. If `None`, then all tasks may start at the same time. TYPE: `int | None`

Source code in fractal_server/runner/config/_local.py

class JobRunnerConfigLocal(BaseModel):
    """
    Runner-configuration specifications, for a `local` resource.

    The typical use case is that setting `parallel_tasks_per_job` to a
    small number (e.g. 1) will limit parallelism when executing tasks
    requiring a large amount of resources (e.g. memory) on a local machine.

    Attributes:
        parallel_tasks_per_job:
            Maximum number of tasks to be run in parallel within a local
            runner. If `None`, then all tasks may start at the same time.
    """

    model_config = ConfigDict(extra="forbid")
    parallel_tasks_per_job: int | None = None

    @property
    def batch_size(self) -> int:
        return self.parallel_tasks_per_job or 0

`JobRunnerConfigSLURM` ¶

Bases: BaseModel

Runner-configuration specifications, for a slurm_sudo or slurm_ssh resource.

Note: this is a common class, which is processed and transformed into more specific configuration objects during job execution.

Valid JSON example

{
    "default_slurm_config": {
        "partition": "partition-name",
        "cpus_per_task": 1,
        "mem": "100M"
    },
    "gpu_slurm_config": {
        "partition": "gpu",
        "extra_lines": [
            "#SBATCH --gres=gpu:v100:1"
        ]
    },
    "user_local_exports": {
        "CELLPOSE_LOCAL_MODELS_PATH": "CELLPOSE_LOCAL_MODELS_PATH",
        "NAPARI_CONFIG": "napari_config.json"
    },
    "batching_config": {
        "target_cpus_per_job": 1,
        "max_cpus_per_job": 1,
        "target_mem_per_job": 200,
        "max_mem_per_job": 500,
        "target_num_jobs": 2,
        "max_num_jobs": 4
    }
}

ATTRIBUTE	DESCRIPTION
`default_slurm_config`	Common default options for all tasks. TYPE: `SlurmConfigSet`
`gpu_slurm_config`	Default configuration for all GPU tasks. TYPE: `SlurmConfigSet | None`
`batching_config`	Configuration of the batching strategy. TYPE: `BatchingConfigSet`
`user_local_exports`	Key-value pairs to be included as `export`-ed variables in the SLURM submission script, after prepending values with the user's cache directory. TYPE: `DictStrStr`
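
The description of `user_local_exports` can be illustrated with a small sketch. The function name `render_export_lines`, the example cache directory, and the exact shape of the generated lines are assumptions made for illustration; the actual submission-script template used by fractal-server may differ.

```python
from pathlib import PurePosixPath


def render_export_lines(
    user_local_exports: dict[str, str],
    user_cache_dir: str,
) -> list[str]:
    """Build `export` lines, prepending the cache directory to each value.

    Hypothetical helper, for illustration only.
    """
    lines = []
    for variable, relative_path in user_local_exports.items():
        # Each value is interpreted as a path relative to the cache directory.
        full_path = PurePosixPath(user_cache_dir) / relative_path
        lines.append(f"export {variable}={full_path}")
    return lines


export_lines = render_export_lines(
    {
        "CELLPOSE_LOCAL_MODELS_PATH": "CELLPOSE_LOCAL_MODELS_PATH",
        "NAPARI_CONFIG": "napari_config.json",
    },
    user_cache_dir="/home/user/.cache",  # assumed example location
)
```

Under these assumptions, the `NAPARI_CONFIG` entry from the JSON example would become `export NAPARI_CONFIG=/home/user/.cache/napari_config.json`.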

Source code in fractal_server/runner/config/_slurm.py

class JobRunnerConfigSLURM(BaseModel):
    """
    Runner-configuration specifications, for a `slurm_sudo` or
    `slurm_ssh` resource.

    Note: this is a common class, which is processed and transformed into more
    specific configuration objects during job execution.

    Valid JSON example
    ```json
    {
        "default_slurm_config": {
            "partition": "partition-name",
            "cpus_per_task": 1,
            "mem": "100M"
        },
        "gpu_slurm_config": {
            "partition": "gpu",
            "extra_lines": [
                "#SBATCH --gres=gpu:v100:1"
            ]
        },
        "user_local_exports": {
            "CELLPOSE_LOCAL_MODELS_PATH": "CELLPOSE_LOCAL_MODELS_PATH",
            "NAPARI_CONFIG": "napari_config.json"
        },
        "batching_config": {
            "target_cpus_per_job": 1,
            "max_cpus_per_job": 1,
            "target_mem_per_job": 200,
            "max_mem_per_job": 500,
            "target_num_jobs": 2,
            "max_num_jobs": 4
        }
    }
    ```

    Attributes:
        default_slurm_config:
            Common default options for all tasks.
        gpu_slurm_config:
            Default configuration for all GPU tasks.
        batching_config:
            Configuration of the batching strategy.
        user_local_exports:
            Key-value pairs to be included as `export`-ed variables in SLURM
            submission script, after prepending values with the user's cache
            directory.
    """

    model_config = ConfigDict(extra="forbid")

    default_slurm_config: SlurmConfigSet
    gpu_slurm_config: SlurmConfigSet | None = None
    batching_config: BatchingConfigSet
    user_local_exports: DictStrStr = Field(default_factory=dict)
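
Because the model uses `extra="forbid"` and declares `default_slurm_config` and `batching_config` without defaults, a configuration with an unknown or missing top-level key is expected to be rejected. The sketch below mimics only that top-level check with the standard library; `check_top_level_keys` is a hypothetical helper, not part of fractal-server, and it does not validate the nested `SlurmConfigSet` or `BatchingConfigSet` objects.

```python
import json

# Top-level fields taken from the class definition above.
REQUIRED_KEYS = {"default_slurm_config", "batching_config"}
OPTIONAL_KEYS = {"gpu_slurm_config", "user_local_exports"}


def check_top_level_keys(raw_json: str) -> dict:
    """Parse a JSON string and check its top-level keys.

    Hypothetical helper: raises ValueError for missing or unknown keys.
    """
    config = json.loads(raw_json)
    if not isinstance(config, dict):
        raise ValueError("The configuration must be a JSON object.")
    keys = set(config)
    missing = REQUIRED_KEYS - keys
    unknown = keys - REQUIRED_KEYS - OPTIONAL_KEYS
    if missing:
        raise ValueError(f"Missing required keys: {sorted(missing)}")
    if unknown:
        raise ValueError(f"Unknown keys: {sorted(unknown)}")
    return config


valid = check_top_level_keys(
    '{"default_slurm_config": {"partition": "partition-name"},'
    ' "batching_config": {"target_num_jobs": 2, "max_num_jobs": 4}}'
)
```

For actual validation, the Pydantic model itself remains the reference, since it also checks the types and contents of the nested objects.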

Runners¶

The three runner implementations (for the local, SLURM/sudo and SLURM/SSH cases) are constructed based on the following class hierarchy:

- BaseRunner is the base class for all runners, which notably includes the submit and multisubmit methods (to be overridden in child classes).
  - LocalRunner is the runner implementation for a local computational resource.
  - BaseSlurmRunner inherits from BaseRunner and adds the common part of SLURM runners:
    - SlurmSudoRunner is the runner implementation for a slurm_sudo resource.
    - SlurmSSHRunner is the runner implementation for a slurm_ssh resource.
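
The hierarchy above can be sketched as a toy set of classes. Only the class names and the existence of `submit` and `multisubmit` come from this page; the method signatures and bodies below are simplified assumptions and do not reflect the real fractal-server code.

```python
from typing import Any, Callable


class BaseRunner:
    """Toy stand-in for the base class of all runners."""

    def submit(self, func: Callable[..., Any], parameters: dict) -> Any:
        # Hypothetical signature: run a single task.
        raise NotImplementedError

    def multisubmit(
        self, func: Callable[..., Any], list_parameters: list[dict]
    ) -> list[Any]:
        # Hypothetical signature: run the same task over several inputs.
        raise NotImplementedError


class LocalRunner(BaseRunner):
    """Toy local runner: calls the function in the current process."""

    def submit(self, func, parameters):
        return func(**parameters)

    def multisubmit(self, func, list_parameters):
        return [func(**parameters) for parameters in list_parameters]


class BaseSlurmRunner(BaseRunner):
    """Toy stand-in for the part shared by the two SLURM runners."""


class SlurmSudoRunner(BaseSlurmRunner):
    """Toy stand-in for the slurm_sudo runner."""


class SlurmSSHRunner(BaseSlurmRunner):
    """Toy stand-in for the slurm_ssh runner."""


runner = LocalRunner()
single = runner.submit(lambda x: x + 1, {"x": 1})
many = runner.multisubmit(lambda x: x * 2, [{"x": 1}, {"x": 2}])
```

In this sketch, code that receives a runner can rely only on the `BaseRunner` interface and does not need to know which resource type is in use.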
