extract_archive

Wrap tar extraction command.

This module is used both locally (in the environment where fractal-server is running) and remotely (as a standalone Python module, executed over SSH).

This is a twin-module of compress_folder.py.

The tar command is invoked via subprocess, rather than through Python's built-in tarfile library, because of performance issues we observed when handling files that had just been created within a SLURM job on a CephFS filesystem.

extract_archive(archive_path)

Extract e.g. /path/archive.tar.gz archive into /path/archive folder

Note that /path/archive may already exist. In this case, files with the same name are overwritten and new files are added.
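
The overwrite-and-add behavior described above can be checked with a small standalone sketch. It is not the fractal-server implementation: `extract_into` and `demo` are hypothetical names, the logging helpers are replaced by plain `subprocess.run`, the verbose flag is dropped, and a `tar` executable is assumed to be available on the PATH.

```python
import shlex
import subprocess
import tempfile
from pathlib import Path


def extract_into(archive_path: Path, target: Path) -> None:
    """Hypothetical helper: extract a .tar.gz archive into an existing or new folder."""
    target.mkdir(exist_ok=True)
    cmd = (
        f"tar -xzf {shlex.quote(archive_path.as_posix())} "
        f"--directory={shlex.quote(target.as_posix())} ."
    )
    # Split the command string into an argument list, so no shell is involved
    subprocess.run(shlex.split(cmd), check=True, capture_output=True)


def demo() -> dict[str, str]:
    """Extract into a folder that already holds files, then report its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)

        # Build an archive whose members are relative to "." (as the command expects)
        src = base / "src"
        src.mkdir()
        (src / "a.txt").write_text("new")
        archive = base / "archive.tar.gz"
        subprocess.run(
            ["tar", "-czf", archive.as_posix(), f"--directory={src.as_posix()}", "."],
            check=True,
        )

        # The target folder already exists, with one clashing and one unrelated file
        target = base / "archive"
        target.mkdir()
        (target / "a.txt").write_text("old")
        (target / "b.txt").write_text("kept")

        extract_into(archive, target)
        return {p.name: p.read_text() for p in sorted(target.iterdir())}


if __name__ == "__main__":
    print(demo())
```

In this sketch, `a.txt` should end up with the archived content while `b.txt`, which is not part of the archive, is left untouched.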

Parameters:

Name | Type | Description | Default
archive_path | Path | Absolute path to the archive file. | required

Source code in fractal_server/app/runner/extract_archive.py
def extract_archive(archive_path: Path):
    """
    Extract e.g. `/path/archive.tar.gz` archive into `/path/archive` folder

    Note that `/path/archive` may already exist. In this case, files with
    the same name are overwritten and new files are added.

    Arguments:
        archive_path: Absolute path to the archive file.
    """

    logger_name = "extract_archive"
    logger = set_logger(logger_name)

    logger.debug("START")
    logger.debug(f"{archive_path.as_posix()=}")

    # Check archive_path is valid
    if not archive_path.exists():
        sys.exit(f"Missing file {archive_path.as_posix()}.")

    # Prepare subfolder path
    parent_dir = archive_path.parent
    subfolder_name = _remove_suffix(string=archive_path.name, suffix=".tar.gz")
    subfolder_path = parent_dir / subfolder_name
    logger.debug(f"{subfolder_path.as_posix()=}")

    # Create subfolder
    subfolder_path.mkdir(exist_ok=True)

    # Run tar command
    cmd_tar = (
        f"tar -xzvf {archive_path} "
        f"--directory={subfolder_path.as_posix()} "
        "."
    )
    logger.debug(f"{cmd_tar=}")
    run_subprocess(cmd=cmd_tar, logger_name=logger_name)

    logger.debug("END")
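
The listing above relies on `_remove_suffix`, whose body is not shown here. As a minimal sketch of how the target folder is derived from the archive path, the following uses a hypothetical stand-in, `remove_suffix`; raising `ValueError` for a non-matching name is an assumption, and the real helper may behave differently.

```python
from pathlib import Path


def remove_suffix(string: str, suffix: str) -> str:
    """Hypothetical stand-in for `_remove_suffix`; the real helper may differ."""
    if not string.endswith(suffix):
        # Assumption: a name without the expected suffix is treated as an error
        raise ValueError(f"Cannot remove {suffix=} from {string=}.")
    return string[: len(string) - len(suffix)]


def subfolder_for(archive_path: Path) -> Path:
    """Return the folder that an archive would be extracted into."""
    name = remove_suffix(string=archive_path.name, suffix=".tar.gz")
    return archive_path.parent / name


if __name__ == "__main__":
    print(subfolder_for(Path("/path/archive.tar.gz")))
```

Stripping the full `.tar.gz` suffix from the name, rather than using `Path.stem`, matters because `Path("archive.tar.gz").stem` only drops the last extension and would yield `archive.tar`.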