Table specifcations¶
Within fractal-tasks-core
, we make use of tables which are AnnData
objects
stored within OME-Zarr image groups. This page describes the different kinds of
tables we use, and it includes:
- A core table specification, valid for all tables;
- The definition of tables for regions of interests (ROIs);
- The definition of masking ROI tables, namely ROI tables that are linked e.g. to labels;
- A feature-table specification, to store measurements.
Note: The specifications below are largely inspired by a proposed update to OME-NGFF specs. This update is currently on hold, and
fractal-tasks-core
will evolve as soon as an official NGFF table specs is adopted - see also the Outlook section.
Specifications (V1)¶
In this section we describe version 1 (V1) of the Fractal table specifications;
for the moment, only V1 exists.
Note that V1 specifications are only implemented as os of version 0.14.0 of
fractal-tasks-core
.
Core tables¶
The core-table specification consists in the definition of the required Zarr
structure and attributes, and of the AnnData
table format.
AnnData
table format
We store tabular data into Zarr groups as AnnData
("Annotated Data") objects;
the anndata
Python library provides the
definition of this format and the relevant tools. Quoting from the anndata
documentation:
AnnData
is specifically designed for matrix-like data. By this we mean that we have \(n\) observations, each of which can be represented as \(d\)-dimensional vectors, where each dimension corresponds to a variable or feature. Both the rows and columns of this \(n \times d\) matrix are special in the sense that they are indexed.(https://anndata.readthedocs.io/en/latest/tutorials/notebooks/getting-started.html)
Note that AnnData
tables are easily transformed from/into pandas.DataFrame
objects - see e.g. the AnnData.to_df
method.
Zarr structure and attributes
The structure of Zarr groups is based on the image
specification in NGFF
0.4, with an
additional tables
group and the corresponding subgroups (similar to
labels
):
image.zarr # Zarr group for a NGFF image
|
├── 0 # Zarr array for multiscale level 0
├── ...
├── N # Zarr array for multiscale level N
|
├── labels # Zarr subgroup with a list of labels associated to this image
| ├── label_A # Zarr subgroup for a given label
| ├── label_B # Zarr subgroup for a given label
| └── ...
|
└── tables # Zarr subgroup with a list of tables associated to this image
├── table_1 # Zarr subgroup for a given table
├── table_2 # Zarr subgroup for a given table
└── ...
The Zarr attributes of the tables
group must include the key tables
,
pointing to the list of all tables (this simplifies discovery of tables
associated to the current NGFF image), as in
{
"tables": ["table_1", "table_2"]
}
The Zarr attributes of each specific-table group must include the version of
the table specification (currently version 1), through the
fractal_table_version
attribute. Also note that the anndata
function to
write an AnnData
object into a Zarr group automatically sets additional
attributes. Here is an example of the resulting Zarr attributes:
{
"fractal_table_version": "1",
"encoding-type": "anndata", // Automatically added by anndata 0.11
"encoding-version": "0.1.0", // Automatically added by anndata 0.11
}
ROI tables¶
In fractal-tasks-core
, a ROI table defines regions of space which are
three-dimensional (see also the Outlook section about
dimensionality flexibility) and box-shaped.
Typical use cases are described here.
Zarr attributes
The specification of a ROI table is a subset of the core table
one. Moreover, the table-group Zarr attributes must include the
type
attribute with value roi_table
, as in
{
"fractal_table_version": "1",
"type": "roi_table",
"encoding-type": "anndata",
"encoding-version": "0.1.0",
}
Table columns
The var
attribute
of a given AnnData
object indexes the columns of the table. A
fractal-tasks-core
ROI table must include the following six columns:
x_micrometer
,y_micrometer
,z_micrometer
: the lower bounds of the XYZ intervals defining the ROI, in micrometers;len_x_micrometer
,len_y_micrometer
,len_z_micrometer
: the XYZ edge lengths, in micrometers.
Notes:
- The axes origin for the ROI positions (e.g. for
x_micrometer
) corresponds to the top-left corner of the image (for the YX axes) and to the lowest Z plane.- ROIs are defined in physical coordinates, and they do not store information on the number or size of pixels.
ROI tables may also include other columns, beyond the required ones. Here are
the ones that are typically used in fractal-tasks-core
(see also the Use
cases section):
x_micrometer_original
andy_micrometer_original
, which are a copy ofx_micrometer
andy_micrometer
taken before applying some transformation;translation_x
,translation_y
andtranslation_z
, which are used during registration of multiplexing acquisitions;label
, which is used to link a ROI to a label (either for masking ROI tables or for feature tables).
Masking ROI tables¶
Masking ROI tables are a specific instance of the basic ROI tables described above, where each ROI must also be associated to a specific label of a label image.
Motivation
The motivation for this association is based on the following use case:
- By performing segmentation of a NGFF image, we identify N objects and we store them as a label image (where the value at each pixel correspond to the label index);
- We also compute the three-dimensional bounding box of each segmented object,
and store these bounding boxes into a
masking
ROI table; - For each one of these ROIs, we also include information that link it to both the label image and a specific label index;
- During further processing we can load/modify specific sub-regions of the ROI,
based on information contained in the label image. This kind of operations
are
masked
, as they only act on the array elements that match a certain condition on the label value.
Zarr attributes
For this kind of tables, fractal-tasks-core
closely follows the proposed
NGFF update mentioned above. The
requirements on the Zarr attributes of a given table are:
- Attributes must contain a
type
key, with valuemasking_roi_table
2. - Attributes must contain a
region
key; the corresponding value must be an object with apath
key and a string value (i.e. the path to the data the table is annotating). - Attributes must include a key
instance_key
, which is the key inobs
that denotes which instance inregion
the row corresponds to.
Here is an example of valid Zarr attributes
{
"fractal_table_version": "1",
"type": "masking_roi_table",
"region": { "path": "../labels/label_DAPI" },
"instance_key": "label",
"encoding-type": "anndata",
"encoding-version": "0.1.0",
}
AnnData
table attributes
On top of the required ROI-table colums, the masking-ROI-table AnnData
object
must have an attribute obs
with a key matching to the instance_key
zarr
attribute. For instance if instance_key="label"
then table.obs["label"]
must exist, with its items matching the labels in the image in
"../labels/label_DAPI"
.
Feature tables¶
Motivation
The typical use case for feature tables is to store measurements related to segmented objects, while mantaining a link to the original instances (e.g. labels). Note that the current specification is aligned to the one of masking ROI tables, since they both need to relate a table to a label image, but the two may diverge in the future.
As part of the current fractal-tasks-core
tasks, measurements can be
performed e.g. via regionprops
from scikit-image
, as wrapped in
napari-skimage-regionprops).
Zarr attributes
For this kind of tables, fractal-tasks-core
closely follows the proposed
NGFF update mentioned above. The
requirements on the Zarr attributes of a given table are:
- Attributes must contain a
type
key, with valuefeature_table
2. - Attributes must contain a
region
key; the corresponding value must be an object with apath
key and a string value (i.e. the path to the data the table is annotating). - Attributes must include a key
instance_key
, which is the key inobs
that denotes which instance inregion
the row corresponds to.
Here is an example of valid Zarr attributes
{
"fractal_table_version": "1",
"type": "feature_table",
"region": { "path": "../labels/label_DAPI" },
"instance_key": "label",
"encoding-type": "anndata",
"encoding-version": "0.1.0",
}
AnnData
table attributes
The feature-table AnnData
object must have an attribute obs
with a key
matching to the instance_key
zarr attribute. For instance if
instance_key="label"
then table.obs["label"]
must exist, with its items
matching the labels in the image in "../labels/label_DAPI"
.
Examples¶
Use cases for ROI tables¶
OME-Zarr creation¶
OME-Zarrs created via fractal-tasks-core
(e.g. by parsing Yokogawa images via
the
create_ome_zarr
or
create_ome_zarr_multiplex
tasks) always include two specific ROI tables:
- The table named
well_ROI_table
, which covers the NGFF image corresponding to the whole well1; - The table named
FOV_ROI_table
, which lists all original fields of view (FOVs).
Each one of these two tables includes ROIs that span the whole image size along the Z axis. Note that this differs, e.g., from ROIs which are the bounding boxes of three-dimensional segmented objects, and which may cover only a part of the image Z size.
OME-Zarr import¶
When working with an externally-generated OME-Zarr, one may use the
import_ome_zarr
task
to make it compatible with fractal-tasks-core
. This task optionally adds two
ROI tables to the NGFF images:
- The table named
image_ROI_table
, which covers the whole image; - A table named
grid_ROI_table
, which splits the whole-image ROI into a YX rectangular grid of smaller ROIs. This may correspond to original FOVs (in case the image is a tiled well1), or it may simply be useful for applying downstream processing to smaller arrays and avoid large memory requirements.
As for the case of well_ROI_table
and FOV_ROI_table
described
above, also these two tables include ROIs spanning the
whole image extension along the Z axis.
OME-Zarr processing¶
ROI tables are also used and updated during image processing, e.g as in:
- The FOV ROI table may undergo transformations during processing, e.g. FOV
ROIs may be shifted to avoid overlaps; in this case, we use the optional
columns
x_micrometer_original
andy_micrometer_original
to store the values before the transformation. - The FOV ROI table is also used to store information on the registration of
multiplexing acquisitions, via the
translation_x
,translation_y
andtranslation_z
optional columns. - Several tasks in
fractal-tasks-core
take an existing ROI table as an input and then loop over the ROIs defined in the table. This makes the task more flexible, as it can be used to process e.g. a whole well, a set of FOVs, or a set of custom regions of the array.
Reading/writing tables¶
The anndata
library offers a set of functions for input/output of AnnData
tables, including functions specifically targeting the Zarr format.
Reading a table¶
To read an AnnData
table from a Zarr group, one may use the read_zarr
function.
In the following example a NGFF image was created by stitching together two
field of views, where each one is made of a stack of five Z planes with 1 um
spacing between the planes.
The FOV_ROI_table
has information on the XY position and size of the two
original FOVs (named FOV_1
and FOV_2
):
import anndata as ad
table = ad.read_zarr("/somewhere/image.zarr/tables/FOV_ROI_table")
print(table)
# `AnnData` object with n_obs × n_vars = 2 × 8
print(table.obs_names)
# Index(['FOV_1', 'FOV_2'], dtype='object', name='FieldIndex')
print(table.var_names)
# Index([
# 'x_micrometer',
# 'y_micrometer',
# 'z_micrometer',
# 'len_x_micrometer',
# 'len_y_micrometer',
# 'len_z_micrometer',
# 'x_micrometer_original',
# 'y_micrometer_original'
# ],
# dtype='object')
print(table.X)
# [[ 0. 0. 0. 416. 351. 5. -1448.3 -1517.7]
# [ 416. 0. 0. 416. 351. 5. -1032.3 -1517.7]]
df = table.to_df() # Convert to pandas DataFrame
print(df)
# x_micrometer y_micrometer z_micrometer ... len_z_micrometer x_micrometer_original y_micrometer_original
# FieldIndex ...
# FOV_1 0.0 0.0 0.0 ... 2.0 -1448.300049 -1517.699951
# FOV_2 416.0 0.0 0.0 ... 2.0 -1032.300049 -1517.699951
#
# [2 rows x 8 columns]
In this case, the second FOV (labeled FOV_2
) is defined as the three-dimensional region such that
- X is between 416 and 832 micrometers;
- Y is between 0 and 351 micrometers;
- Z is between 0 and 5 - which means that all the five available Z planes are included.
Writing a table¶
The anndata.experimental.write_elem
function provides the required
functionality to write an AnnData
object to a Zarr group. In
fractal-tasks-core
, the write_table
helper function wraps the anndata
function and includes additional functionalities -- see its
documentation.
With respect to the wrapped anndata
function, the main additional features of write_table
are
- The boolean parameter
overwrite
(defaulting toFalse
), that determines the behavior in case of an already-existing table at the given path. - The
table_attrs
parameter, as a shorthand for updating the Zarr attributes of the table group after its creation.
Here is an example of how to use write_table
:
import numpy as np
import zarr
import anndata as ad
from fractal_tasks_core.tables import write_table
table = ad.AnnData(X=np.ones((10, 10))) # Generate a dummy `AnnData` object
image_group = zarr.open_group("/tmp/image.zarr")
table_name = "MyTable"
table_attrs = {
"type": "feature_table",
"region": {"path": "../labels/MyLabel"},
"instance_key": "label",
}
write_table(
image_group,
table_name,
table,
overwrite=True,
table_attrs=table_attrs,
)
$ tree /tmp/image.zarr/tables/ # View folder structure
/tmp/image.zarr/tables/
└── MyTable
├── layers
├── obs
│ └── _index
│ └── 0
├── obsm
├── obsp
├── uns
├── var
│ └── _index
│ └── 0
├── varm
├── varp
└── X
└── 0.0
12 directories, 3 files
$ cat /tmp/image.zarr/tables/.zattrs # View tables atributes
{
"tables": [
"MyTable"
]
}
$ cat /tmp/image.zarr/tables/MyTable/.zattrs # View single-table attributes
{
"encoding-type": "anndata",
"encoding-version": "0.1.0",
"fractal_table_version": "1",
"instance_key": "label",
"region": {
"path": "../labels/MyLabel"
},
"type": "feature_table"
}
Outlook¶
These specifications may evolve (especially based on the future NGFF updates),
eventually leading to breaking changes in future versions.
fractal-tasks-core
will aim at mantaining backwards-compatibility with V1 for
a reasonable amount of time.
Here is an in-progress list of aspects that may be reviewed:
- We aim at removing the use of hard-coded units from the column names (e.g.
x_micrometer
), in favor of a more general definition of units. - The
z_micrometer
andlen_z_micrometer
columns are currently required in all ROI tables, even when the ROIs actually define a two-dimensional XY region; in that case, we setz_micrometer=0
andlen_z_micrometer
is such that the whole Z size is covered (that is,len_z_micrometer
is the product of the spacing between Z planes and the number of planes). In a future version, we may introduce more flexibility and also accept ROI tables which only include X and Y axes, and adapt the relevant tools so that they automatically expand these ROIs into three-dimensions when appropriate. - Concerning the use of
AnnData
tables or other formats for tabular data, our plan is to follow whatever serialised table specification becomes part of the NGFF standard. For the record, Zarr does not natively support storage of dataframes (see e.g. https://github.com/zarr-developers/numcodecs/issues/452), which is one aspect in favor of sticking with theanndata
library.
-
Within
fractal-tasks-core
, NGFF images represent whole wells; this still complies with the NGFF specifications, as of an approved clarification in the specs. This explains the reason for storing the regions corresponding to the original FOVs in a specific ROI table, since one NGFF image includes a collection of FOVs. Note that this approach does not rely on the assumption that the FOVs constitute a regular tiling of the well, but it also covers the case of irregularly placed FOVs. ↩↩ -
Note that the table types
masking_roi_table
andfeature_table
closely resemble thetype="ngff:region_table"
specification in the previous proposed NGFF table specs. ↩↩