Intro

Context

Large scale numerical experiments are central to much of contemporary scientific and mathematical research. Performing these numerical experiments in a valid, reproducible and scalable fashion is not easy. Even a small typical project may need to run 1000s of executions, and 10k+ is not uncommon. It is crucial to have good tools to coordinate and organize these experiments.

nf-nest

This webpage documents nf-nest, a collection of small but powerful utilities built on nextflow to help accomplish this. Link to github repository..

Aspects taken into account:

  • Automating cross-product of input parameters in experiments.
  • Automating creation of submission scripts and ordering of jobs (taking care of moving across file system, dynamic memory requirements, etc).
  • Automating the gathering of results from many runs.
  • Caching already ran jobs.
  • Robustness to failure.
  • Reproducibility via apptainer and docker, supporting both x86 and apple silicon.
  • Support for GPU programming.

Technology stack

nf-nest uses the following open source projects:

  • Nextflow: can be thought of as an “operating system” for coordinating numerical experiments.
  • Julia: a programming language to unlock full access to high performance computation on both CPUs and GPUs.

While some features of nf-nest are Julia specific, other parts are language agnostic.

Background

Scientific workflow

A scientific workflow is a directed acyclic graph where each node is a process and each edge between node \(n\) to \(n'\) denote that at least one output of process \(n\) is fed as an input to process \(n'\).

Here is an example from a workflow covered later in this tutorial:

flowchart TB
    subgraph " "
    v0["julia_env"]
    v1["toml"]
    v4["Channel.fromList"]
    v7["julia_env"]
    v8["toml"]
    v12["plot_script"]
    end
    v2([instantiate_process])
    v3([precompile])
    v5([run_julia])
    subgraph combine_workflow
    v9([instantiate_process])
    v10([precompile])
    v11([combine_process])
    end
    v13([plot])
    subgraph " "
    v14["combined_csvs_folder"]
    v15[" "]
    end
    v6(( ))
    v0 --> v2
    v1 --> v2
    v2 --> v3
    v3 --> v5
    v3 --> v13
    v4 --> v5
    v5 --> v6
    v7 --> v9
    v8 --> v9
    v9 --> v10
    v10 --> v11
    v6 --> v11
    v11 --> v13
    v12 --> v13
    v13 --> v15
    v13 --> v14

In a nutshell, nextflow will submit one or several SLURM job for each node in this graph, gather results, and produce some nice reports.

More information

Both nextflow and Julia have excellent and extensive documentation.

See also this nextflow tutorial.