Fractal task-execution¶
This page describes how fractal-server runs a sequence of Fractal tasks and processes the metadata they produce.
NOTE: The process of defining a single full specification for this interface is still ongoing.
The description below is based on concepts and definitions which are part of fractal-server. For the specific case of the Fractal image list, a more detailed description is available at https://fractal-analytics-platform.github.io/image_list. For clarifications about other terms or definitions, the starting point is the execute_tasks function in the runner.py Python module.
Within fractal-server, a Fractal task is associated with a TaskV2 object, which has a non-parallel component, a parallel component, or both (tasks with both components are called compound tasks).
The command_non_parallel and command_parallel attributes, when set, represent the command-line executables which are used to run the task. As an example, if command_non_parallel = "/path/to/python /path/to/my_task.py", then the command that is executed will look like
/path/to/python /path/to/my_task.py --args-json /path/to/args.json --out-json /path/to/out.json
fractal-task-tools exposes a helper tool to implement this command-line interface.
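To make the interface above concrete, the following is a minimal sketch of a task script that accepts the --args-json and --out-json flags using only the standard-library argparse module. It does not use the fractal-task-tools helper, and the my_task function, its threshold parameter and the shape of its output are hypothetical, chosen only for illustration.

```python
import argparse
import json
from pathlib import Path
from typing import Optional


def my_task(*, zarr_url: str, threshold: int = 0) -> dict:
    # Hypothetical task body: a real task would process the OME-Zarr image
    # at `zarr_url`; this sketch only echoes its arguments back as metadata.
    return {"processed": zarr_url, "threshold": threshold}


def main(argv: Optional[list[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--args-json", required=True)
    parser.add_argument("--out-json", required=True)
    ns = parser.parse_args(argv)

    # Read the task arguments from the JSON file prepared by the caller.
    task_args = json.loads(Path(ns.args_json).read_text())
    # Run the task and write its metadata output to the requested file.
    output = my_task(**task_args)
    Path(ns.out_json).write_text(json.dumps(output))
```

A script meant to be used as command_non_parallel or command_parallel would call main() at module level (typically under an `if __name__ == "__main__":` guard), so that the flags are read from the command line.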
The main entrypoint for task execution in fractal-server is the execute_tasks function, which executes a list of tasks (that is, part of a Fractal workflow). Its input arguments include:
- a Fractal dataset (which also contains an image list),
- a list of workflow tasks (each one associated with a TaskV2 object),
- filters based on image types or attributes, set by the user upon job submission.
In the following parts of this page we provide a high-level description of the execute_tasks flow. Some aspects which are not covered here are:
- Validation procedures and error handling.
- Fractal-job statuses and history tracking.
- Advanced status-based image filtering.
Initialization phase¶
Before starting the execution of the tasks, fractal-server initializes some relevant variables.
- Variables that are extracted from the current dataset state:
  - zarr_dir
  - The current image list
- Variables that are extracted from user-provided job-submission parameters:
  - Image-type filters to apply to the image list.
After this preliminary phase, the following three steps (pre-execution, execution, post-execution) are repeated for each task in the list.
Pre-task-execution phase¶
If the task is a converter, it does not receive any OME-Zarr image as input.
For non-converter tasks, however, fractal-server prepares a list of images that will be part of either zarr_urls (for non-parallel or compound tasks) or of the individual zarr_url arguments (for parallel tasks).
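The difference between the zarr_urls and zarr_url arguments can be sketched as follows. This is an illustration rather than the fractal-server implementation: the build_task_arguments helper is hypothetical, images are assumed to be dictionaries with a "zarr_url" key, and passing zarr_dir alongside zarr_urls is also an assumption.

```python
from typing import Any


def build_task_arguments(
    images: list[dict[str, Any]],
    zarr_dir: str,
    parallel: bool,
) -> list[dict[str, Any]]:
    """Return one argument dict per unit of work (hypothetical helper)."""
    zarr_urls = [image["zarr_url"] for image in images]
    if parallel:
        # Parallel tasks: one unit per image, each with its own `zarr_url`.
        return [{"zarr_url": url} for url in zarr_urls]
    # Non-parallel tasks: a single unit that receives the full list.
    # Including `zarr_dir` here is an assumption of this sketch.
    return [{"zarr_urls": zarr_urls, "zarr_dir": zarr_dir}]
```

With an empty images list, the non-parallel branch yields a single unit with an empty zarr_urls list, which matches the converter case where no OME-Zarr image is given as input.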
The input image list is constructed by applying two sets of filters to the current dataset image list:
- Image-type filters obtained as a combination of current type filters, the task input types and the user-specified workflow-task type filters.
- Image-attribute filters specified by the user upon job submission.
This procedure leads to a filtered_images list, with all OME-Zarr images that should be used as input for the task.
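The two filtering steps above can be sketched as follows. The sketch rests on several assumptions that are not stated on this page: images are dictionaries with "zarr_url", "types" and "attributes" keys, type filters map a type name to a boolean, a type that is missing from an image counts as False, attribute filters map an attribute name to a list of accepted values, and contradictory type filters raise an error. All function names are hypothetical.

```python
from typing import Any

Image = dict[str, Any]


def merge_type_filters(*filters: dict[str, bool]) -> dict[str, bool]:
    """Combine several type-filter dicts, refusing contradictory values."""
    merged: dict[str, bool] = {}
    for current in filters:
        for key, value in current.items():
            if key in merged and merged[key] != value:
                raise ValueError(f"Conflicting type filter for {key!r}")
            merged[key] = value
    return merged


def match_image(
    image: Image,
    type_filters: dict[str, bool],
    attribute_filters: dict[str, list[Any]],
) -> bool:
    # A type that is missing from the image is treated as False (assumption).
    for key, value in type_filters.items():
        if image.get("types", {}).get(key, False) != value:
            return False
    # Each attribute filter lists the accepted values for one attribute.
    for key, accepted in attribute_filters.items():
        if image.get("attributes", {}).get(key) not in accepted:
            return False
    return True


def filter_images(
    images: list[Image],
    type_filters: dict[str, bool],
    attribute_filters: dict[str, list[Any]],
) -> list[Image]:
    return [
        image
        for image in images
        if match_image(image, type_filters, attribute_filters)
    ]
```

In this sketch, the three sources of type filters (current filters, task input types, workflow-task filters) would be passed together to merge_type_filters, and the result would be given to filter_images along with the attribute filters from job submission.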
Task execution¶
This part is covered by task-type specific code blocks like
if task.type in [TaskType.NON_PARALLEL, TaskType.CONVERTER_NON_PARALLEL]:
    outcomes_dict, num_tasks = run_task_non_parallel(
        images=filtered_images,
        zarr_dir=zarr_dir,
        wftask=wftask,
        task=task,
        dataset_id=dataset.id,
        task_type=task.type,
        # ...
    )
elif task.type == TaskType.PARALLEL:
    outcomes_dict, num_tasks = run_task_parallel(
        # ...
    )
elif task.type in [TaskType.COMPOUND, TaskType.CONVERTER_COMPOUND]:
    outcomes_dict, num_tasks = run_task_compound(
        # ...
    )
Each value in outcomes_dict is a SubmissionOutcome object, which may have a task_output attribute holding a TaskOutput object.
The inner workings of functions such as run_task_non_parallel are not described here; the actual execution is implemented in a specific job runner.
Post-task-execution phase¶
- Metadata outputs from all units are merged into a single TaskOutput object.
- If there are no images to be created or updated, all input images in filtered_images are flagged as "to be updated", so that they will be updated e.g. with the new types set by the task.
- For each image that should be created or updated, the image attributes, types and origin properties are updated as appropriate.
- All images marked as "to be removed" are removed from the image list.
- The current type filters are updated based on the task output_types.
- The existing dataset image list is replaced with the new one, in the database.
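The post-execution steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the fractal-server code: task outputs are represented as dictionaries with "image_list_updates" and "image_list_removals" keys, images are dictionaries with "zarr_url", "origin", "attributes" and "types" keys, and the order in which types are applied (update-specific types first, then the task output_types) is a choice made for this sketch. The function names are hypothetical, and the final database write is left out.

```python
from copy import deepcopy
from typing import Any

Image = dict[str, Any]


def merge_outputs(outputs: list[dict[str, Any]]) -> dict[str, list]:
    """Concatenate the updates and removals declared by each unit."""
    merged: dict[str, list] = {
        "image_list_updates": [],
        "image_list_removals": [],
    }
    for output in outputs:
        merged["image_list_updates"] += output.get("image_list_updates", [])
        merged["image_list_removals"] += output.get("image_list_removals", [])
    return merged


def apply_task_output(
    image_list: list[Image],
    filtered_images: list[Image],
    merged: dict[str, list],
    output_types: dict[str, bool],
    type_filters: dict[str, bool],
) -> tuple[list[Image], dict[str, bool]]:
    new_list = deepcopy(image_list)  # leave the caller's list untouched
    updates = merged["image_list_updates"]
    # No created/updated images: flag every input image as "to be updated".
    if not updates:
        updates = [{"zarr_url": im["zarr_url"]} for im in filtered_images]
    by_url = {im["zarr_url"]: im for im in new_list}
    for update in updates:
        url = update["zarr_url"]
        image = by_url.get(url)
        if image is None:
            # New image: start from empty attributes and types.
            image = {
                "zarr_url": url,
                "origin": update.get("origin"),
                "attributes": {},
                "types": {},
            }
            new_list.append(image)
            by_url[url] = image
        image["attributes"].update(update.get("attributes", {}))
        image["types"].update(update.get("types", {}))
        image["types"].update(output_types)
    # Drop the images marked as "to be removed".
    removed = set(merged["image_list_removals"])
    new_list = [im for im in new_list if im["zarr_url"] not in removed]
    # The task output types become part of the current type filters.
    new_filters = {**type_filters, **output_types}
    return new_list, new_filters
```

The returned image list and type filters correspond to the state that the next task in the list would start from.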