Misc

Process parameters

To declare a command-line argument for your nextflow script, say n_rounds with default value 5, use:

params.n_rounds = 5

Here is an example:

include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'
include { combine_csvs; } from '../combine.nf'

def julia_env = file(moduleDir/'julia_env')
def plot_script = file(moduleDir/'plot.jl')

params.n_rounds = 5

def variables = [
    seed: 1..10,
    n_chains: [10, 20], 
]

workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables)
    combined = run_julia(compiled_env, configs) | combine_csvs
    plot(compiled_env, plot_script, combined)
}

process run_julia {
    input:
        path julia_env 
        val config 
    output:
        path "${filed(config)}"
    """
    ${activate(julia_env)}

    # run your code
    using Pigeons 
    using CSV 
    pt = pigeons(
            target = toy_mvn_target(1000), 
            n_chains = ${config.n_chains}, 
            seed = ${config.seed},
            n_rounds = ${params.n_rounds})

    mkdir("${filed(config)}")
    CSV.write("${filed(config)}/summary.csv", pt.shared.reports.summary)
    CSV.write("${filed(config)}/swap_prs.csv", pt.shared.reports.swap_prs)
    """
}

process plot {
    input:
        path julia_env 
        path plot_script
        path combined_csvs_folder 
    output:
        path '*.png'
        path combined_csvs_folder
    publishDir "${deliverables(workflow, params)}", mode: 'copy', overwrite: true
    """
    ${activate(julia_env)}

    include("$plot_script")
    create_plots("$combined_csvs_folder")
    """
}

When running it, note that a parameter declared as params.my_arg is set using --my_arg value (in contrast, nextflow’s own options use a single dash, as in -profile cluster):

cd experiment_repo
./nextflow run nf-nest/examples/params.nf -profile cluster --n_rounds 6

N E X T F L O W  ~  version 24.10.0
Launching `nf-nest/examples/params.nf` [extravagant_swirles] DSL2 - revision: 093b673fc8
[8c/a117a8] Cached process > instantiate_process
[5f/997727] Cached process > combine_workflow:instantiate_process
[72/cc3bb2] Cached process > precompile
[b9/57c5b2] Cached process > combine_workflow:precompile
[72/2cc9e6] Submitted process > run_julia (12)
[cb/0f4ef3] Submitted process > run_julia (9)
[7b/b152ab] Submitted process > run_julia (3)
[e1/afa275] Submitted process > run_julia (8)
[0f/43c83a] Submitted process > run_julia (15)
[1e/01aeed] Submitted process > run_julia (14)
[a0/e7e0ae] Submitted process > run_julia (4)
[7d/d461f8] Submitted process > run_julia (2)
[88/d0a075] Submitted process > run_julia (1)
[a6/5caec5] Submitted process > run_julia (10)
[20/581486] Submitted process > run_julia (13)
[86/f2ad4d] Submitted process > run_julia (11)
[42/6b0ba5] Submitted process > run_julia (5)
[88/275b85] Submitted process > run_julia (6)
[d6/1cfa52] Submitted process > run_julia (16)
[1e/635557] Submitted process > run_julia (7)
[e0/12866b] Submitted process > run_julia (17)
[0e/b48681] Submitted process > run_julia (19)
[09/a9a9e5] Submitted process > run_julia (20)
[dd/afd459] Submitted process > run_julia (18)
[82/3f2da5] Submitted process > combine_workflow:combine_process
[bd/575d2c] Submitted process > plot

Notice that the function deliverables(workflow, params) takes the command-line parameters into account when naming the output directory, so that runs with different settings are kept apart in the deliverables directory:

tree experiment_repo/deliverables

experiment_repo/deliverables
├── scriptName=full.nf
│   ├── output
│   │   ├── summary.csv
│   │   └── swap_prs.csv
│   ├── plot.png
│   └── runName.txt
└── scriptName=params.nf___n_rounds=6
    ├── output
    │   ├── summary.csv
    │   └── swap_prs.csv
    ├── plot.png
    └── runName.txt

4 directories, 8 files
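
The directory names in the listing above suggest a simple convention: the script name, followed by each parameter given on the command line as key=value, joined with three underscores. The shell function below only mimics that convention for illustration. deliverable_name is a hypothetical helper, not the deliverables() function from cross.nf, whose actual logic may differ.

```shell
# Hypothetical helper mimicking the naming seen in the tree output above.
# Usage: deliverable_name SCRIPT [KEY=VALUE ...]
deliverable_name() {
    name="scriptName=$1"
    shift
    # Append each key=value pair, separated by three underscores.
    for kv in "$@"; do
        name="${name}___${kv}"
    done
    printf '%s\n' "$name"
}

deliverable_name params.nf n_rounds=6   # prints scriptName=params.nf___n_rounds=6
deliverable_name full.nf                # prints scriptName=full.nf
```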

Dry runs

A useful application of process parameters is a “dry run switch” that runs a quick version of the pipeline to speed up debugging.

Here is an example. Notice that we pass the dry run option to crossProduct(); instead of emitting all values in the cross product, it then emits only one:

include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'

def julia_env = file(moduleDir/'julia_env')

params.dryRun = false
params.n_rounds = params.dryRun ? 1 : 5

def variables = [
    seed: 1..10,
    n_chains: [10, 20], 
]

workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables, params.dryRun)
    run_julia(compiled_env, configs) 
}

process run_julia {
    input:
        path julia_env 
        val config 
    """
    ${activate(julia_env)}

    # run your code
    using Pigeons 
    using CSV 
    pt = pigeons(
            target = toy_mvn_target(1000), 
            n_chains = ${config.n_chains}, 
            seed = ${config.seed},
            n_rounds = ${params.n_rounds})
    """
}

To run it in dry run mode:

cd experiment_repo
./nextflow run nf-nest/examples/dry_run.nf -profile cluster --dryRun

N E X T F L O W  ~  version 24.10.0
Launching `nf-nest/examples/dry_run.nf` [boring_borg] DSL2 - revision: c7304bca14
[8c/a117a8] Cached process > instantiate_process
[72/cc3bb2] Cached process > precompile
[84/b007e6] Submitted process > run_julia (1)
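
The effect of the dry run flag on the cross product can be pictured with a plain shell loop. This is only a sketch of the behaviour described above, not the crossProduct() implementation from cross.nf: cross_product is a hypothetical function, the seed range is shortened, and which configuration is kept in dry run mode (here, the first) is an assumption.

```shell
# Hypothetical stand-in for crossProduct(variables, dryRun).
# Prints one configuration per line; stops after the first one
# when the dry run flag is "true".
cross_product() {
    dry_run="$1"
    for seed in 1 2 3; do
        for n_chains in 10 20; do
            echo "seed=$seed n_chains=$n_chains"
            if [ "$dry_run" = "true" ]; then
                return 0
            fi
        done
    done
}

echo "full run:"
cross_product false
echo "dry run:"
cross_product true
```

With the flag set to false the loop prints all six combinations; with the flag set to true it prints a single one, which matches the single run_julia task in the log above.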

What to commit?

Standard git guidelines suggest never committing “derived files”. We recommend deviating slightly from these conventions and committing a bit more than the strict minimum:

Manifest.toml is derived from Project.toml, but commit it anyway, as it contains the precise version information needed for reproducibility (package developers, in contrast, would only commit Project.toml, but a numerical experiment is not the same as a package!).
Once there are experiments you plan to include in a paper, commit the corresponding sub-folder of deliverables, including the CSV files used to produce the figures. This way the tex repo can just use the experiment repo as a submodule and the authors can compile the paper right away. Having the CSV files there also means that plot aesthetics can be quickly tweaked later on (e.g., the night before a talk).

Updating code

Numerical experiments are often based on code you are developing along the way. When the code is updated, with a bit of organization, nextflow can figure out which subset of the workflow needs to be re-run. We present two models for doing this: one for lightweight code such as plotting/analysis, and one for more substantial code, e.g. a method you are developing.

Lightweight code

Include the .jl file in the nextflow repo, feed it to the node as input, and use a Julia include() on it. We have already used that pattern for the plot node in an earlier example.

If you have several Julia files, put them in a directory, and use the syntax:

include { activate; } from "../pkg.nf"

def julia_files = file(moduleDir/"julia/*.jl")

workflow {
    run_julia(julia_files)
}

process run_julia {
    debug true
    input:
        file julia_files
    """
    ${activate()}
    include("a.jl")
    include("b.jl")
    """
}

cd experiment_repo
./nextflow run nf-nest/examples/includes.nf

N E X T F L O W  ~  version 24.10.0
Launching `nf-nest/examples/includes.nf` [spontaneous_lorenz] DSL2 - revision: f263a35a46
[47/6d05cf] Submitted process > run_julia
hello
world

Warning

Passing a directory as input (rather than a collection of files, as done above) is not ideal in this context. This is because nextflow does not currently recurse inside the directory to compute the checksum used to determine if the cache can be used. Recall that in unix, the modification time of a directory only changes when a file is added or deleted under it, and not when a file under it is edited!
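
The directory timestamp behaviour mentioned in the warning can be checked from a shell. The sketch below works in a temporary directory and uses the -nt ("newer than") file test, which is available in bash and most sh implementations. All file names are made up for the demonstration.

```shell
# Show that editing a file leaves its parent directory's modification
# time untouched, whereas adding a file updates it.
work=$(mktemp -d)
mkdir "$work/julia"
echo 'println("hello")' > "$work/julia/a.jl"
touch "$work/marker"    # reference timestamp, taken after the directory is set up

sleep 1
echo 'println("hello again")' >> "$work/julia/a.jl"   # edit an existing file
if [ "$work/julia" -nt "$work/marker" ]; then
    after_edit="changed"
else
    after_edit="unchanged"
fi

sleep 1
echo 'println("world")' > "$work/julia/b.jl"          # add a new file
if [ "$work/julia" -nt "$work/marker" ]; then
    after_add="changed"
else
    after_add="unchanged"
fi

echo "directory mtime after editing a.jl: $after_edit"
echo "directory mtime after adding b.jl:  $after_add"
rm -rf "$work"
```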

Library

If the code you include is more complex, and/or might be used outside of the context of one nextflow script, it is better to package it.

In Julia, creating a package is very simple and it can be published right away and for free on github: see this tutorial.

For example, we wrote a small Julia package, CombineCSVs.jl, to perform the CSV combination in this earlier section.

To add or update a package, you can use a script of this form:

ENV["JULIA_PKG_PRECOMPILE_AUTO"]=0
using Pkg 
Pkg.activate("julia_env")
Pkg.add(url = "https://github.com/UBC-Stat-ML/CombineCSVs")

where you would replace the URL with that of the git repo you are using. Note that add will also update an already installed package to the head of the main branch.

This updates the “Manifest.toml” file, which in turn signals to nextflow that the instantiate utility in our “pkg.nf” needs to be rerun:

params.nPrecompileThreads = 10

params.julia_env = 'julia_env'
julia_env = file(params.julia_env)
julia_env.mkdir()

// Can be used as standalone, but typically used inside a user nf file
workflow  {
    instantiate(julia_env) | precompile
}

def instantiate(julia_env) { instantiate_process(julia_env, file(julia_env/"Manifest.toml"))}

process instantiate_process {
    executor 'local' // we need internet access
    scratch false // we want changes in Manifest.toml to be saved
    input: 
        path julia_env
        path toml // needed for correct cache behaviour under updates
    output:
        path julia_env

    """
    ${activate(julia_env)}

    ENV["JULIA_PKG_PRECOMPILE_AUTO"]=0
    using Pkg
    Pkg.instantiate()
    """
}

process precompile {
    input:
        path julia_env
    output:
        path julia_env
    cpus params.nPrecompileThreads 
    memory 15.GB
    """
    ${activate(julia_env, params.nPrecompileThreads)}

    using Pkg 
    Pkg.offline(true) 
    Pkg.precompile()
    """
}

// Start Julia with the provided environment and, optionally, a number of threads (1 by default)
// Needs to be the very first line of the process script
def activate(julia_env, nThreads = 1) {
    return "#!/usr/bin/env julia --threads=${nThreads} --project=$julia_env"
}

def activate() {
    return "#!/usr/bin/env julia --threads=1"
}

Report

Following the nextflow documentation, we have set nextflow.config so that report.html, timeline.html and dag.html are automatically created.

To preview them in VS Code, add a VS Code Extension allowing html preview, for example Live Server. Then right click on the html file and select Show Preview.