⚠️ OBSOLETE: How to write a Fractal-compatible custom task¶
⚠️⚠️ These instructions are kept here as a reference, but they refer to legacy versions of `fractal-server`. While the overall structure of the instructions is still valid, several details are now obsolete and won't work. ⚠️⚠️
The `fractal-tasks-core` repository is the reference implementation for Fractal tasks and for Fractal task packages, but the Fractal platform can also be used to execute custom tasks.

This page lists the Fractal-compatibility requirements for a single custom task or for a task package. Note that these specifications evolve frequently; see e.g. the discussion at https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/151.

Note: while the contents of this page remain valid, the recommended procedure to get up to speed and build a Python package of Fractal-compatible tasks is to use the template available at https://github.com/fractal-analytics-platform/fractal-tasks-template.
A Fractal task is mainly formed by two components:

- A set of metadata, which are stored in the `task` table of the database of a `fractal-server` instance; see Task metadata.
- An executable command, which can take some specific command-line arguments (see Command-line interface); the standard example is a Python script.
In the following we explain the Fractal-compatibility requirements for a single task, and then for a task package.
Single custom task¶
We describe how to define the multiple aspects of a task, and provide a Full task example.
Task metadata¶
Each task must be associated to some metadata, so that it can be used in Fractal. The full specification is here, and the required attributes are:

- `name`: the task name, e.g. `"Create OME-Zarr structure"`;
- `command`: a command that can be executed from the command line;
- `input_type`: this can be any string (typical examples: `"image"` or `"zarr"`); the special value `"Any"` means that Fractal won't perform any check of the `input_type` when applying the task to a dataset;
- `output_type`: same logic as `input_type`;
- `source`: this is meant to be as close as possible to a unique task identifier; for custom tasks it can be anything (e.g. `"my_task"`), but for tasks that are collected automatically from a package (see Task package) this attribute will have a very specific form (e.g. `"pip_remote:fractal_tasks_core:0.10.0:fractal-tasks::convert_yokogawa_to_ome-zarr"`);
- `meta`: a JSON object (similar to a Python dictionary) with some additional information, see Task meta-parameters.
There are multiple ways to get the appropriate metadata into the database, including a POST request to the `fractal-server` API (see the Tasks section in the `fractal-server` API documentation) or the automated addition of a whole set of tasks through specific API endpoints (see Task package).
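As a concrete illustration, such metadata could be assembled as follows; this is a sketch where the attribute names come from the list above, while the specific values (name, path, types) are hypothetical:

```python
import json

# Hypothetical metadata for a custom task; the attribute names follow the
# required fields listed above, while the values are examples only.
task_metadata = {
    "name": "Create OME-Zarr structure",
    "command": "python3 /some/path/my_task.py",
    "input_type": "image",
    "output_type": "zarr",
    "source": "my_task",
    "meta": {},
}

# Such a payload could then be sent as the JSON body of a POST request to
# the fractal-server tasks endpoint (see the API documentation).
print(json.dumps(task_metadata, indent=2))
```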
Command-line interface¶
Some examples of task commands may look like:

- `python3 /some/path/my_task.py`,
- `/some/absolute/path/python3.10 /some/other/absolute/path/my_task.py`,
- `/some/path/my_executable_task.py`,
- any other executable command (not necessarily based on Python).

Given a task command, Fractal will add two additional command-line arguments to it:

- `-j /some/path/input-arguments.json`
- `--metadata-out /some/path/output-metadata-update.json`

Therefore the task command must accept these additional command-line arguments.
If the task is a Python script, this can be achieved easily by using the `run_fractal_task` function, which is available as part of `fractal_tasks_core.tasks._utils`.
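For illustration, here is a minimal sketch of the kind of wrapper that handles these two arguments. This is not the actual `run_fractal_task` implementation, and the function/argument names are hypothetical; it only shows the role such a wrapper plays:

```python
import argparse
import json


def run_task_from_cli(task_function, argv=None):
    """Minimal sketch of a run_fractal_task-like wrapper (NOT the
    fractal_tasks_core implementation): parse the two Fractal-appended
    arguments, run the task, and write the metadata update to disk."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-j", dest="args_path",
                        help="Path of the input-arguments JSON file")
    parser.add_argument("--metadata-out", dest="metadata_out_path",
                        help="Path of the output metadata-update JSON file")
    args = parser.parse_args(argv)

    # Load the keyword arguments for the task function ...
    with open(args.args_path) as f:
        task_kwargs = json.load(f)

    # ... run the task ...
    metadata_update = task_function(**task_kwargs)

    # ... and write the metadata update where Fractal expects to find it.
    with open(args.metadata_out_path, "w") as f:
        json.dump(metadata_update, f)
```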
Task meta-parameters¶
The `meta` attribute of tasks (see the corresponding item in Task metadata) is where we specify some requirements on how the task should be run. This notably includes:

- If the task has to be run in parallel (e.g. over multiple wells of an OME-Zarr dataset), then `meta` should include a key-value pair like `{"parallelization_level": "well"}`. If the `parallelization_level` key is missing, the task is considered as non-parallel.
- If Fractal is configured to run on a SLURM cluster, `meta` may include additional information on the SLURM requirements (more info on the Fractal SLURM backend here).
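For instance, the two cases above correspond to `meta` objects like the following (the `"well"` value follows the example above; SLURM-related keys are not shown here, see the backend documentation):

```python
# meta for a non-parallel task: no parallelization_level key is present
meta_non_parallel = {}

# meta for a task to be run in parallel over the wells of an OME-Zarr dataset
meta_parallel = {"parallelization_level": "well"}
```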
Task input parameters¶
When a task is run via Fractal, its input parameters (i.e. the ones in the file specified via the `-j` command-line option) will always include a set of keyword arguments with specific names:

- `input_paths`
- `output_path`
- `metadata`
- `component` (only for parallel tasks)
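For a parallel task, the contents of the `-j` file could therefore look like the following; this is a hypothetical example, where all values (including the `component` format and the task-specific argument) depend on the actual task and dataset:

```python
import json

# Hypothetical contents of the input-arguments JSON file; Fractal fills in
# the reserved arguments, while the remaining ones are task-specific.
input_arguments = {
    "input_paths": ["/some/path/input.zarr"],
    "output_path": "/some/path/output.zarr",
    "metadata": {"some-key": "some-value"},
    "component": "B/03/0",  # only present for parallel tasks
    "argument_A": 42,       # a task-specific argument
}
print(json.dumps(input_arguments, indent=2))
```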
Task output¶
The only task output which will be visible to Fractal is what goes in the output metadata-update file (i.e. the one specified through the `--metadata-out` command-line option). Note that this only holds for non-parallel tasks, while (for the moment) Fractal fully ignores the output of parallel tasks.

IMPORTANT: This means that each task must always write any output to disk before ending.
Advanced features¶
The description of other advanced features is not yet available on this page.

- Other attributes of the Task metadata also exist, and they would be recognized by other Fractal components (e.g. `fractal-server` or `fractal-web`). These include JSON Schemas for input parameters and additional documentation-related attributes.
- In `fractal-tasks-core`, we use `pydantic v1` to fully coerce and validate the input parameters into a set of given types.
Full task example¶
Here we describe a simplified example of a Fractal-compatible Python task (for more realistic examples see the `fractal-tasks-core` tasks folder). The script `/some/path/my_task.py` may look like:
```python
# Import a helper function from fractal_tasks_core
from fractal_tasks_core.tasks._utils import run_fractal_task


def my_task_function(
    # Reserved Fractal arguments
    input_paths,
    output_path,
    metadata,
    # Task-specific arguments
    argument_A,
    argument_B="default_B_value",
):
    # Do something, based on the task parameters
    print("Here we go, we are in `my_task_function`")
    with open(f"{output_path}/output.txt", "w") as f:
        f.write(f"argument_A={argument_A}\n")
        f.write(f"argument_B={argument_B}\n")
    # Compile the output metadata update and return it
    output_metadata_update = {"nothing": "to add"}
    return output_metadata_update


# This block is executed when running the Python script directly
if __name__ == "__main__":
    run_fractal_task(task_function=my_task_function)
```
Note that the final block wraps `my_task_function` into `run_fractal_task`, so that we don't have to take care of the command-line arguments.
Some valid metadata attributes for this task would be:
- `name="My Task"`
- `command="python3 /some/path/my_task.py"`
- `input_type="Any"`
- `output_type="Any"`
- `source="my_custom_task"`
- `meta={}`
Note that this was an example of a non-parallel task; to have a parallel one, we would also need to:

- Set `meta={"parallelization_level": "something"}`;
- Include `component` in the input arguments of `my_task_function`.
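A sketch of such a parallel variant could look as follows (hypothetical function and argument names, mirroring the example above):

```python
def my_parallel_task_function(
    # Reserved Fractal arguments (note the additional `component`)
    input_paths,
    output_path,
    metadata,
    component,
    # Task-specific arguments
    argument_A,
):
    # With meta={"parallelization_level": "something"}, Fractal calls this
    # function once per component (e.g. once per well of an OME-Zarr dataset).
    safe_name = component.replace("/", "_")
    with open(f"{output_path}/output_{safe_name}.txt", "w") as f:
        f.write(f"argument_A={argument_A}\n")
    # The output of parallel tasks is currently ignored by Fractal, so any
    # actual result must be written to disk before returning.
    return {}
```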
Task package¶
Given a set of Python scripts corresponding to Fractal tasks, it is useful to combine them into a single Python package, using the standard tools or other options (e.g. for `fractal-tasks-core` we use poetry).
Reasons¶
Creating a package is often a good practice, for reasons unrelated to Fractal:
- It makes it simple to assign a global version to the package, and to host it on a public index like PyPI;
- It may reduce code duplication:
- The scripts may have a shared set of external dependencies, which are defined in a single place for a package.
- The scripts may import functions from a shared set of auxiliary Python modules, which can be included in the package.
Moreover, having a single package also streamlines some Fractal-related operations. Given the package `MyTasks` (available on PyPI, or locally), the Fractal platform offers a feature that automatically:

- Downloads the wheel file of the `MyTasks` package (if it's on a public index, rather than a local file);
- Creates a Python virtual environment (venv) which is specific for a given version of the `MyTasks` package, and installs the `MyTasks` package in that venv;
- Populates all the corresponding entries in the `task` database table with the appropriate Task metadata, which are extracted from the package manifest.

This feature is currently exposed in the `/api/v1/task/collect/pip/` endpoint of `fractal-server` (see API documentation).
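As a sketch, the body of such a collection request could be as minimal as the following; note that the field name used here is an assumption, so check the `fractal-server` API documentation for the actual request schema:

```python
import json

# Hypothetical body of a POST request to /api/v1/task/collect/pip/ ;
# the "package" field name is an assumption (see the API documentation).
collection_request = {"package": "MyTasks"}
print(json.dumps(collection_request))
```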
Requirements¶
To be compatible with Fractal, a task package must satisfy some additional requirements:

- The package is built as a wheel file, and can be installed via `pip`.
- The `__FRACTAL_MANIFEST__.json` file is bundled in the package, in its root folder. If you are using `poetry`, no special operation is needed. If you are using a `setup.cfg` file, see this comment.
- Include JSON Schemas. The tools in `fractal_tasks_core.dev` are used to generate JSON Schemas for the input parameters of each task in `fractal-tasks-core`. They are meant to be flexible and re-usable to perform the same operation on an independent package, but they are not thoroughly documented/tested for more general use; feel free to open an issue if something is not clear.
- Include additional task metadata like `docs_info` or `docs_link`, which will be displayed in the Fractal web client. Note: this feature is not yet implemented.

The items in this list are the main requirements; if you hit unexpected behaviors, also have a look at https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/151 or open a new issue.