Containers

Overview

Containers such as Docker and Apptainer address two issues:

  • making pipelines reproducible,
  • running software on HPC clusters, where we do not have root access.

Using an existing container

The nextflow.config file provided in nf-nest includes an example container configuration:

head -n 28 experiment_repo/nextflow.config
profiles {
    standard {  
        docker.enabled = true
        process {
            withLabel:containerized {
                container = 'alexandrebouchardcote/default:0.1.6'
            }
        }
    }
    cluster {
        apptainer.enabled = true
        process {
            scratch = true
            executor = 'slurm'
            cpus = 1
            memory = 4.GB
            time = '2h'
            clusterOptions = "--nodes=1 --account $ALLOCATION_CODE"
            withLabel:containerized {
                container = 'alexandrebouchardcote/default:0.1.6'
                module = 'apptainer'
            }
            withLabel:gpu {
                clusterOptions = "--nodes=1 --account ${ALLOCATION_CODE}-gpu --gpus 1"
            }
        }
    }
}

There are two profiles: standard, which Nextflow uses by default and which runs containers with Docker, and cluster, which runs them with Apptainer. Both profiles attach the container to a process label, here called containerized.
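The profile is chosen on the command line with the -profile option. As a minimal sketch, the snippet below picks one of the two profiles depending on whether the SLURM client is available. Treating the presence of sbatch as a sign that we are on the cluster is an assumption made for illustration, not something nf-nest does:

```shell
# Pick a Nextflow profile based on whether the SLURM client is available.
# Assumption: having `sbatch` on the PATH means we are on the cluster.
if command -v sbatch >/dev/null 2>&1; then
    PROFILE=cluster    # Apptainer + SLURM executor
else
    PROFILE=standard   # Docker on the local machine
fi

# Print the command rather than running it, so the choice can be inspected first.
echo "./nextflow run nf-nest/examples/stan_example.nf -profile $PROFILE"
```

Omitting -profile altogether has the same effect as -profile standard.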

To make a process in the workflow use that container, add the directive label 'containerized' to it.

Here are two example processes, one that compiles a Stan program and one that runs it; Stan is included in the above container:

process compile_stan {
    label 'containerized' 

    input:
        path stan_file
    output:
        path "${stan_file.baseName}"
        
    """
    # name of stan file without extension
    base_name=`basename $stan_file .stan`

    # need to run Stan's make file from the CMDSTAN dir
    CUR=`pwd`
    cd \$CMDSTAN
    make \$CUR/\$base_name
    """
}

process run_stan {
    label 'containerized' 

    input:
        path stan_exec 
        path data

    """
    ./$stan_exec sample \
        data file=$data \
        output file=samples.csv 

    # Compute ESS from Stan's stansummary utility
    \$CMDSTAN/bin/stansummary samples.csv --csv_filename ess.csv
    """
}

Here is an example workflow using these containerized processes:

include { compile_stan; run_stan; } from "./stan.nf"

def stanModel = file("https://raw.githubusercontent.com/Julia-Tempering/Pigeons.jl/refs/heads/main/examples/stan/bernoulli.stan")
def data = file("https://raw.githubusercontent.com/Julia-Tempering/Pigeons.jl/refs/heads/main/examples/stan/bernoulli.data.json")

workflow {
    compiled = compile_stan(stanModel)
    run_stan(compiled, data)
}

To run it:

cd experiment_repo
./nextflow run nf-nest/examples/stan_example.nf -profile cluster
N E X T F L O W  ~  version 24.10.0
Launching `nf-nest/examples/stan_example.nf` [nice_hopper] DSL2 - revision: 8cef9f29d6
WARN: Apptainer cache directory has not been defined -- Remote image will be stored in the path: /arc/burst/st-alexbou-1/abc/nf-nest-doc/experiment_repo/work/singularity -- Use the environment variable NXF_APPTAINER_CACHEDIR to specify a different location
Pulling Apptainer image docker://alexandrebouchardcote/default:0.1.6 [cache /arc/burst/st-alexbou-1/abc/nf-nest-doc/experiment_repo/work/singularity/alexandrebouchardcote-default-0.1.6.img]
[0e/d322db] Submitted process > compile_stan
[0a/6f5de1] Submitted process > run_stan
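The warning in the log above notes that, with no cache directory defined, the pulled image is stored under the work directory. A minimal sketch of setting NXF_APPTAINER_CACHEDIR (the variable named in the warning) before launching follows. The path used here is a hypothetical choice, not one prescribed by nf-nest:

```shell
# Assumption: a cache directory under $HOME; any writable location
# with enough space would do. An existing value is left untouched.
export NXF_APPTAINER_CACHEDIR="${NXF_APPTAINER_CACHEDIR:-$HOME/apptainer_cache}"

# Create the directory if it does not exist yet.
mkdir -p "$NXF_APPTAINER_CACHEDIR"

# Runs launched from this shell afterwards, e.g.
#   ./nextflow run nf-nest/examples/stan_example.nf -profile cluster
# should then store pulled images in this directory instead of under work/.
```

Keeping the cache outside the work directory means the image does not need to be pulled again after work is cleaned up.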

Creating containers

There are many references online for creating containers in general, but far fewer on creating cross-platform containers that work on both x86 and Apple Silicon.

We have created some scripts to help with this; see the code and instructions at this page.
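For orientation, one common way to build an image for both architectures is Docker's buildx plugin with the --platform flag. The sketch below only assembles and prints such a command. The image name is hypothetical, a Dockerfile is assumed to be in the current directory, and this is not necessarily what the scripts mentioned above do:

```shell
# Hypothetical image name and tag; replace with your own repository.
IMAGE="myuser/myimage:0.1.0"

# Target both x86 (linux/amd64) and Apple Silicon (linux/arm64).
PLATFORMS="linux/amd64,linux/arm64"

# Assemble the build command; --push uploads the multi-platform image to the registry.
BUILD_CMD="docker buildx build --platform $PLATFORMS -t $IMAGE --push ."

# Print the command for inspection; run it with: eval "$BUILD_CMD"
echo "$BUILD_CMD"
```

Pushing to a registry lets both Docker and Apptainer pull the variant matching the host architecture, as in the docker:// pull shown in the log earlier.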