
compress_folder

Wrap the `tar` compression command.

This module is used both locally (in the environment where fractal-server is running) and remotely (as a standalone Python module, executed over SSH).

This is a twin module of `extract_archive.py`.

We call the `tar` command via `subprocess` rather than Python's built-in `tarfile` library because of performance issues we observed with `tarfile` when handling files that had just been created within a SLURM job on a CephFS filesystem.
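The following is a minimal sketch of the subprocess-based approach. It assumes a `tar` executable (with gzip support) is available on `PATH`. The helper name `tar_folder` is hypothetical and is not part of fractal-server. It archives the contents of a folder by running `tar -czf` through `subprocess.run` instead of using the `tarfile` library.

```python
import subprocess
from pathlib import Path


def tar_folder(folder: Path, tarfile_path: Path) -> None:
    """Hypothetical helper: create a gzipped archive of `folder`'s contents."""
    cmd = [
        "tar",
        "-c",  # create a new archive
        "-z",  # compress it with gzip
        "-f", str(tarfile_path),  # output archive path
        "-C", str(folder),  # change into the folder first...
        ".",  # ...and archive everything it contains
    ]
    # check=True raises CalledProcessError if tar exits with a non-zero code
    subprocess.run(cmd, check=True, capture_output=True, text=True)
```

Because the arguments are passed as a list, no shell is involved, so paths containing spaces need no extra quoting.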

compress_folder(subfolder_path, filelist_path)

Compress a folder, e.g. `/path/archive`, into `/path/archive.tar.gz`.

If `/path/archive.tar.gz` already exists, it is overwritten.
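The source code below derives the archive path by appending `.tar.gz` to the folder name within the same parent directory. It also places a temporary `<name>_copy` folder next to the original. The following sketch reproduces only these two path computations. The helper names `archive_path_for` and `temp_copy_path_for` are hypothetical.

```python
from pathlib import Path


def archive_path_for(subfolder_path: Path) -> str:
    """Hypothetical helper mirroring how compress_folder builds `tarfile_path`."""
    return (subfolder_path.parent / f"{subfolder_path.name}.tar.gz").as_posix()


def temp_copy_path_for(subfolder_path: Path) -> Path:
    """Hypothetical helper mirroring the temporary-copy location."""
    return subfolder_path.parent / f"{subfolder_path.name}_copy"


print(archive_path_for(Path("/path/archive")))  # /path/archive.tar.gz
print(temp_copy_path_for(Path("/path/archive")))  # /path/archive_copy
```

`pathlib` drops a trailing slash, so `Path("/path/archive/")` yields the same archive path.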

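Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `subfolder_path` | `Path` | Absolute path to the folder to compress. | required |
| `filelist_path` | `str \| None` | Optional path to a file listing the files to include in the tar.gz archive; forwarded to `_create_tar_archive`. | required |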
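The way `filelist_path` is consumed is handled by `_create_tar_archive`, which is not shown on this page. The following is a minimal sketch under one assumption: `filelist_path` points to a plain-text file with one path per line, relative to the folder being archived. In that case, `tar` can restrict the archive to the listed entries with its `-T` (files-from) option. The helpers `tar_selected_files` and `write_filelist` are hypothetical and are not fractal-server's implementation.

```python
import subprocess
from pathlib import Path


def write_filelist(filelist_path: Path, names: list[str]) -> None:
    """Hypothetical: write one relative path per line."""
    filelist_path.write_text("".join(f"{name}\n" for name in names))


def tar_selected_files(
    folder: Path, tarfile_path: Path, filelist_path: Path
) -> None:
    """Hypothetical: archive only the files listed in `filelist_path`.

    Assumes each listed path is relative to `folder`.
    """
    subprocess.run(
        [
            "tar", "-c", "-z",
            # Absolute paths so they do not depend on the -C directory change
            "-f", str(tarfile_path.resolve()),
            "-C", str(folder),  # listed names are resolved inside this folder
            "-T", str(filelist_path.resolve()),  # read member names from file
        ],
        check=True,
        capture_output=True,
        text=True,
    )
```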

Returns:

| Type | Description |
|---|---|
| `str` | Absolute path to the tar.gz archive. |

Source code in fractal_server/app/runner/compress_folder.py
```python
import sys
from pathlib import Path

# `set_logger`, `_copy_subfolder`, `_create_tar_archive` and
# `_remove_temp_subfolder` are defined or imported elsewhere in
# fractal_server/app/runner/compress_folder.py (not shown here).


def compress_folder(
    subfolder_path: Path,
    filelist_path: str | None,
) -> str:
    """
    Compress e.g. `/path/archive` into `/path/archive.tar.gz`

    Note that `/path/archive.tar.gz` may already exist. In this case, it will
    be overwritten.

    Args:
        subfolder_path: Absolute path to the folder to compress.
        filelist_path: Optional path to a file listing the files to include
            in the tar.gz archive; forwarded to `_create_tar_archive`.

    Returns:
        Absolute path to the tar.gz archive.
    """

    logger_name = "compress_folder"
    logger = set_logger(logger_name)

    logger.debug("START")
    logger.debug(f"{subfolder_path=}")
    parent_dir = subfolder_path.parent
    subfolder_name = subfolder_path.name
    tarfile_path = (parent_dir / f"{subfolder_name}.tar.gz").as_posix()
    logger.debug(f"{tarfile_path=}")

    # The archive is built from a temporary sibling copy (`<name>_copy`),
    # not from the original folder
    subfolder_path_tmp_copy = (
        subfolder_path.parent / f"{subfolder_path.name}_copy"
    )
    try:
        _copy_subfolder(
            subfolder_path,
            subfolder_path_tmp_copy,
            logger_name=logger_name,
        )
        _create_tar_archive(
            tarfile_path,
            subfolder_path_tmp_copy,
            logger_name=logger_name,
            filelist_path=filelist_path,
        )
        return tarfile_path

    except Exception as e:
        # Log the error and terminate with a non-zero exit code
        logger.debug(f"ERROR: {e}")
        sys.exit(1)

    finally:
        # Always remove the temporary copy, on success or on failure
        # (this also runs when sys.exit raises SystemExit)
        _remove_temp_subfolder(
            subfolder_path_tmp_copy, logger_name=logger_name
        )
```
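The flow of `compress_folder` (copy to a temporary sibling folder, archive the copy, always remove the copy) can be approximated with the standard library and a `tar` executable. This sketch is an assumption-laden stand-in, not fractal-server's code. `shutil.copytree` and `shutil.rmtree` replace the private helpers `_copy_subfolder` and `_remove_temp_subfolder`, whose implementations are not shown here. The `filelist_path` option is omitted, and errors propagate instead of triggering `sys.exit(1)`. The name `compress_folder_sketch` is hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def compress_folder_sketch(subfolder_path: Path) -> str:
    """Approximate the copy -> tar -> cleanup flow of compress_folder."""
    parent_dir = subfolder_path.parent
    tarfile_path = (parent_dir / f"{subfolder_path.name}.tar.gz").as_posix()
    tmp_copy = parent_dir / f"{subfolder_path.name}_copy"
    try:
        # Stand-in for _copy_subfolder: duplicate the folder next to itself
        shutil.copytree(subfolder_path, tmp_copy)
        # Stand-in for _create_tar_archive: tar -c overwrites any old archive
        subprocess.run(
            ["tar", "-c", "-z", "-f", tarfile_path,
             "-C", tmp_copy.as_posix(), "."],
            check=True,
            capture_output=True,
        )
        return tarfile_path
    finally:
        # Stand-in for _remove_temp_subfolder: runs on success and on failure
        shutil.rmtree(tmp_copy, ignore_errors=True)
```

Note that `shutil.copytree` raises `FileExistsError` if the `_copy` folder already exists.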