Fractal task-execution¶

This page describes how fractal-server runs a sequence of Fractal tasks and processes the metadata they produce.

NOTE: The process of defining a single full specification for this interface is still ongoing.

The description below is based on concepts and definitions which are part of fractal-server. For the specific case of the Fractal image list, a more detailed description is available at https://fractal-analytics-platform.github.io/image_list. For clarifications about other terms or definitions, the starting point is the execute_tasks function in the runner.py Python module.

Within fractal-server, a Fractal task is associated with a TaskV2 object, which has a non-parallel component, a parallel component, or both (tasks with both are compound tasks). The command_non_parallel and command_parallel attributes, when set, are the command-line executables used to run the task. As an example, if command_non_parallel = "/path/to/python /path/to/my_task.py", then the command that is executed will look like

/path/to/python /path/to/my_task.py --args-json /path/to/args.json --out-json /path/to/out.json

For Fractal tasks that are developed in Python, fractal-task-tools exposes a helper tool to implement this command-line interface.
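To make the command-line contract concrete, here is a minimal hand-written sketch of a script that accepts the --args-json and --out-json options shown above. It is not the fractal-task-tools helper. The my_task function and the shape of its return value are hypothetical, and the sketch assumes that the arguments file holds a JSON object whose keys are the task's keyword arguments.

```python
import argparse
import json
from pathlib import Path
from typing import Optional, Sequence


def my_task(**kwargs) -> dict:
    """Hypothetical task body: report the names of the arguments it received."""
    return {"received_args": sorted(kwargs)}


def main(argv: Optional[Sequence[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--args-json", required=True, help="JSON file with the task arguments")
    parser.add_argument("--out-json", required=True, help="JSON file where the task output is written")
    parsed = parser.parse_args(argv)

    # Load the task arguments from the file passed via --args-json
    # (assumption: the file contains a single JSON object).
    task_args = json.loads(Path(parsed.args_json).read_text())

    # Run the task, then write its output metadata to the --out-json path.
    task_output = my_task(**task_args)
    Path(parsed.out_json).write_text(json.dumps(task_output))
```

In a real script, main() would be called under the usual `if __name__ == "__main__":` guard, so that the file can be used as the executable referenced in command_non_parallel or command_parallel.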

The main entrypoint for task execution in fractal-server is the execute_tasks function, which executes a list of tasks (that is, part of a Fractal workflow). Its input arguments include:

- a Fractal dataset (which also contains an image list),
- a list of workflow tasks (each one associated to a TaskV2 object),
- filters based on image types or attributes, set by the user upon job submission.

In the following parts of this page we provide a high-level description of the execute_tasks flow. Some aspects which are not covered here are:

- Validation procedures and error handling.
- Fractal-job statuses and history tracking.
- Advanced status-based image filtering.

Initialization phase¶

Before starting the execution of the tasks, fractal-server initializes some relevant variables.

Variables that are extracted from the current dataset state:
- zarr_dir
- The current image list
Variables that are extracted from user-provided job-submission parameters:
- Image-type filters to apply to the image list.

After this preliminary phase, the following three steps (pre-execution, execution, post-execution) are repeated for each task in the list.

Pre-task-execution phase¶

If the task is a converter, it does not receive any OME-Zarr image as input. For non-converter tasks, however, fractal-server prepares a list of images that will be part of either zarr_urls (for non-parallel or compound tasks) or of the individual zarr_url arguments (for parallel tasks).
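The difference between zarr_urls and zarr_url can be sketched as follows. This is an illustration only: it assumes each image is a dictionary with a zarr_url key, and the two helper functions are hypothetical names, not part of fractal-server.

```python
def build_non_parallel_args(input_images: list) -> dict:
    """A non-parallel (or compound) task runs once and sees all input images."""
    return {"zarr_urls": [image["zarr_url"] for image in input_images]}


def build_parallel_args(input_images: list) -> list:
    """A parallel task runs once per image, each unit with its own zarr_url."""
    return [{"zarr_url": image["zarr_url"]} for image in input_images]


# Hypothetical input images
input_images = [
    {"zarr_url": "/data/plate.zarr/A/01/0"},
    {"zarr_url": "/data/plate.zarr/A/02/0"},
]
non_parallel_args = build_non_parallel_args(input_images)
parallel_args = build_parallel_args(input_images)
```

With two input images, the non-parallel variant yields one argument set holding both URLs, while the parallel variant yields two argument sets with one URL each.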

The input image list is constructed by applying two sets of filters to the current dataset image list:

- Image-type filters obtained as a combination of current type filters, the task input types and the user-specified workflow-task type filters.
- Image-attribute filters specified by the user upon job submission.

This procedure leads to a filtered_images list, with all OME-Zarr images that should be used as input for the task.
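The filtering step can be sketched as below. The sketch makes several assumptions that are not stated on this page: images are dictionaries with types and attributes entries, type filters map a type name to a required boolean, a type missing from an image counts as False, attribute filters map an attribute name to a list of allowed values, and later filter sets override earlier ones when combined. All function names are hypothetical.

```python
def combine_type_filters(current: dict, task_input_types: dict, wftask_type_filters: dict) -> dict:
    # Assumption: on conflicting keys, later sets override earlier ones.
    return {**current, **task_input_types, **wftask_type_filters}


def image_matches(image: dict, type_filters: dict, attribute_filters: dict) -> bool:
    # Every type filter must match; a type missing from the image is treated as False.
    for key, required in type_filters.items():
        if image["types"].get(key, False) != required:
            return False
    # Every filtered attribute must take one of the allowed values.
    for key, allowed_values in attribute_filters.items():
        if image["attributes"].get(key) not in allowed_values:
            return False
    return True


def filter_image_list(images: list, type_filters: dict, attribute_filters: dict) -> list:
    return [img for img in images if image_matches(img, type_filters, attribute_filters)]


# Hypothetical dataset image list
images = [
    {"zarr_url": "/z/A01/0", "types": {"is_3D": True}, "attributes": {"well": "A01"}},
    {"zarr_url": "/z/A01/0_mip", "types": {"is_3D": False}, "attributes": {"well": "A01"}},
    {"zarr_url": "/z/B02/0", "types": {"is_3D": True}, "attributes": {"well": "B02"}},
    {"zarr_url": "/z/B03/0", "types": {}, "attributes": {"well": "B03"}},
]
type_filters = combine_type_filters({"is_3D": True}, {}, {})
filtered_images = filter_image_list(images, type_filters, {"well": ["A01", "B03"]})
```

Here only the 3D image in well A01 passes both sets of filters.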

Task execution¶

This part is covered by task-type-specific code blocks like

if task.type in [TaskType.NON_PARALLEL, TaskType.CONVERTER_NON_PARALLEL]:
    outcomes_dict, num_tasks = run_task_non_parallel(
        images=filtered_images,
        zarr_dir=zarr_dir,
        wftask=wftask,
        task=task,
        dataset_id=dataset.id,
        task_type=task.type,
        # ...
    )
elif task.type == TaskType.PARALLEL:
    outcomes_dict, num_tasks = run_task_parallel(
        # ...
    )
elif task.type in [TaskType.COMPOUND, TaskType.CONVERTER_COMPOUND]:
    outcomes_dict, num_tasks = run_task_compound(
        # ...
    )

where each value of outcomes_dict is a SubmissionOutcome object and may have a task_output attribute which is a TaskOutput object.
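As a rough picture of this structure, the sketch below uses stand-in dataclasses and collects the task outputs that are present. Only the task_output attribute comes from the description above. The integer keys, the exception attribute and the collect_task_outputs helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskOutput:
    """Stand-in for the real TaskOutput object; its fields are omitted here."""


@dataclass
class SubmissionOutcome:
    task_output: Optional[TaskOutput] = None
    # Assumption: a failed unit records the error it raised.
    exception: Optional[BaseException] = None


def collect_task_outputs(outcomes_dict: dict) -> tuple:
    """Return the available task outputs and the keys of the failed units."""
    outputs = []
    failed_keys = []
    for key, outcome in outcomes_dict.items():
        if outcome.exception is not None:
            failed_keys.append(key)
        elif outcome.task_output is not None:
            outputs.append(outcome.task_output)
    return outputs, failed_keys


# Hypothetical outcomes for three units, keyed by unit index
outcomes_dict = {
    0: SubmissionOutcome(task_output=TaskOutput()),
    1: SubmissionOutcome(exception=RuntimeError("unit failed")),
    2: SubmissionOutcome(),  # unit succeeded but produced no metadata
}
outputs, failed_keys = collect_task_outputs(outcomes_dict)
```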

The inner workings of functions such as run_task_non_parallel are not described here; the actual execution is implemented in a specific job runner.

Post-task-execution phase¶

- Metadata outputs from all units are merged into a single TaskOutput object.
- If there are no images to be created or updated, all input images in filtered_images are flagged as "to be updated", so that they will be updated e.g. with the new types set by the task.
- For each image that should be created or updated, the image attributes, types and origin properties are updated as appropriate.
- All images marked as "to be removed" are removed from the image list.
- The current type filters are updated based on the task's output_types.
- In the database, the existing dataset image list is replaced with the new one.
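The post-execution steps can be sketched in memory as follows. This is a simplified illustration, not the fractal-server implementation: the TaskOutput field names (image_list_updates, image_list_removals), the dictionary layout of images and updates, and both helper functions are assumptions, and the final database write is replaced by returning the new values.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class TaskOutput:
    """Stand-in for the real TaskOutput; field names are assumptions."""
    image_list_updates: list = field(default_factory=list)
    image_list_removals: list = field(default_factory=list)


def merge_outputs(outputs: list) -> TaskOutput:
    """Merge the metadata outputs of all units into a single TaskOutput."""
    merged = TaskOutput()
    for out in outputs:
        merged.image_list_updates += out.image_list_updates
        merged.image_list_removals += out.image_list_removals
    return merged


def apply_output(images, filtered_images, merged, output_types, type_filters):
    """Return the new image list and the new type filters."""
    # Work on a copy, indexed by zarr_url, so that the input list is not modified.
    new_images = {img["zarr_url"]: copy.deepcopy(img) for img in images}

    # With no explicit updates, flag every input image as "to be updated".
    updates = merged.image_list_updates
    if not updates:
        updates = [{"zarr_url": img["zarr_url"]} for img in filtered_images]

    # Create or update images: attributes, types and (for new images) origin.
    for update in updates:
        url = update["zarr_url"]
        if url not in new_images:
            new_images[url] = {"zarr_url": url, "origin": update.get("origin")}
        image = new_images[url]
        image.setdefault("attributes", {}).update(update.get("attributes", {}))
        image.setdefault("types", {}).update(update.get("types", {}))
        image["types"].update(output_types)

    # Drop the images marked as "to be removed".
    for url in merged.image_list_removals:
        new_images.pop(url, None)

    # The task output types become part of the current type filters.
    new_type_filters = {**type_filters, **output_types}
    return list(new_images.values()), new_type_filters
```

Returning new objects instead of mutating the inputs mirrors the last step above, where the stored image list is replaced as a whole rather than edited in place.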