compress_folder

Wrap tar compression command.

This module is used both locally (in the environment where fractal-server is running) and remotely (as a standalone Python module, executed over SSH).

This is a twin-module of extract_archive.py.

The tar command is invoked via subprocess, rather than through Python's built-in tarfile library, because of performance issues we observed when handling files that had just been created within a SLURM job on a CephFS filesystem.
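As a rough illustration of what wrapping tar via subprocess can look like, here is a minimal sketch. The helper name `tar_czf` and the exact flags (`-czf`, `-C`) are assumptions made for this example; they are not taken from the module itself.

```python
import subprocess
from pathlib import Path


def tar_czf(folder: Path) -> Path:
    """Archive `folder` as `<folder>.tar.gz` next to it, using the tar CLI.

    Hypothetical helper, for illustration only.
    """
    archive = folder.parent / f"{folder.name}.tar.gz"
    # -C switches into the folder, so member paths are relative to it.
    # An archive that already exists at this path is overwritten by tar.
    cmd = ["tar", "-czf", str(archive), "-C", str(folder), "."]
    # check=True raises CalledProcessError if tar exits with a non-zero code.
    subprocess.run(cmd, check=True, capture_output=True)
    return archive
```

Calling `tar_czf(Path("/path/archive"))` would write `/path/archive.tar.gz`, provided a `tar` executable is available on the `PATH`.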

compress_folder(subfolder_path, remote_to_local=False)

Compress e.g. /path/archive into /path/archive.tar.gz

Note that /path/archive.tar.gz may already exist. In this case, it will be overwritten.

Parameters:

Name | Type | Description | Default
subfolder_path | Path | Absolute path to the folder to compress. | required
remote_to_local | bool | If True, exclude some files from the tar.gz archive. | False

Returns:

Type | Description
str | Absolute path to the tar.gz archive.
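The documentation above does not say which files are excluded when remote_to_local is True. The sketch below only shows one way such a flag could translate into tar `--exclude` options. The patterns in `EXAMPLE_EXCLUDES` and the helper `build_tar_command` are hypothetical and chosen purely for illustration.

```python
from pathlib import Path

# Hypothetical patterns; the real module may exclude different files.
EXAMPLE_EXCLUDES = ["*.tmp", "*.lock"]


def build_tar_command(
    archive: Path, folder: Path, remote_to_local: bool = False
) -> list[str]:
    """Build a tar command line, optionally with --exclude options."""
    cmd = ["tar", "-czf", str(archive), "-C", str(folder)]
    if remote_to_local:
        # Exclusion options are placed before the "." operand they apply to.
        cmd += [f"--exclude={pattern}" for pattern in EXAMPLE_EXCLUDES]
    cmd.append(".")
    return cmd
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`. Building the command as a list, rather than as a shell string, avoids quoting problems with paths that contain spaces.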

Source code in fractal_server/app/runner/compress_folder.py
def compress_folder(
    subfolder_path: Path, remote_to_local: bool = False
) -> str:
    """
    Compress e.g. `/path/archive` into `/path/archive.tar.gz`

    Note that `/path/archive.tar.gz` may already exist. In this case, it will
    be overwritten.

    Args:
        subfolder_path: Absolute path to the folder to compress.
        remote_to_local: If `True`, exclude some files from the tar.gz archive.

    Returns:
        Absolute path to the tar.gz archive.
    """

    logger_name = "compress_folder"
    logger = set_logger(logger_name)

    logger.debug("START")
    logger.debug(f"{subfolder_path=}")
    parent_dir = subfolder_path.parent
    subfolder_name = subfolder_path.name
    tarfile_path = (parent_dir / f"{subfolder_name}.tar.gz").as_posix()
    logger.debug(f"{tarfile_path=}")

    subfolder_path_tmp_copy = (
        subfolder_path.parent / f"{subfolder_path.name}_copy"
    )
    try:
        copy_subfolder(
            subfolder_path, subfolder_path_tmp_copy, logger_name=logger_name
        )
        create_tar_archive(
            tarfile_path,
            subfolder_path_tmp_copy,
            logger_name=logger_name,
            remote_to_local=remote_to_local,
        )
        return tarfile_path

    except Exception as e:
        logger.debug(f"ERROR: {e}")
        sys.exit(1)

    finally:
        remove_temp_subfolder(subfolder_path_tmp_copy, logger_name=logger_name)
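The function above combines `sys.exit(1)` in the `except` branch with cleanup in `finally`. The self-contained sketch below shows why the temporary copy is still removed on failure: `sys.exit` raises `SystemExit`, and the `finally` block runs before that exception leaves the function. The helper `archive_with_cleanup` and its `fail` switch are hypothetical stand-ins, and `shutil` is used here in place of the module's own copy and remove helpers.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def archive_with_cleanup(folder: Path, fail: bool = False) -> str:
    """Copy `folder`, pretend to archive the copy, always remove the copy."""
    tmp_copy = folder.parent / f"{folder.name}_copy"
    try:
        shutil.copytree(folder, tmp_copy)
        if fail:
            # Stand-in for a failing tar invocation.
            raise RuntimeError("simulated archiving failure")
        return (folder.parent / f"{folder.name}.tar.gz").as_posix()
    except Exception as e:
        print(f"ERROR: {e}", file=sys.stderr)
        # Raises SystemExit; `finally` runs before it propagates.
        sys.exit(1)
    finally:
        # Runs on success, on failure, and on SystemExit alike.
        shutil.rmtree(tmp_copy, ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "archive"
        src.mkdir()
        print(archive_with_cleanup(src))
        print((Path(d) / "archive_copy").exists())
```

One consequence of this pattern is that a caller in the same process sees `SystemExit` rather than the original exception, which suits a module that is also run as a standalone script over SSH.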