include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'
include { combine_csvs; } from '../combine.nf'
def julia_env = file(moduleDir/'julia_env')
def plot_script = file(moduleDir/'plot.jl')
params.n_rounds = 5
def variables = [
    seed: 1..10,
    n_chains: [10, 20],
]
workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables)
    combined = run_julia(compiled_env, configs) | combine_csvs
    plot(compiled_env, plot_script, combined)
}
process run_julia {
    input:
        path julia_env
        val config
    output:
        path "${filed(config)}"
    """
    ${activate(julia_env)}
    # run your code
    using Pigeons
    using CSV
    pt = pigeons(
            target = toy_mvn_target(1000),
            n_chains = ${config.n_chains},
            seed = ${config.seed},
            n_rounds = ${params.n_rounds})
    mkdir("${filed(config)}")
    CSV.write("${filed(config)}/summary.csv", pt.shared.reports.summary)
    CSV.write("${filed(config)}/swap_prs.csv", pt.shared.reports.swap_prs)
    """
}
process plot {
    input:
        path julia_env
        path plot_script
        path combined_csvs_folder
    output:
        path '*.png'
        path combined_csvs_folder
    publishDir "${deliverables(workflow, params)}", mode: 'copy', overwrite: true
    """
    ${activate(julia_env)}
    include("$plot_script")
    create_plots("$combined_csvs_folder")
    """
}
Misc
Process parameters
To declare a command line argument for your nextflow script, say n_rounds with default value 5, use:
params.n_rounds = 5
Here is an example:
To run it, note that a parameter declared as params.my_arg is set on the command line with two dashes, as in --my_arg value (in contrast, nextflow's own options use a single dash, as in -profile cluster):
cd experiment_repo
./nextflow run nf-nest/examples/params.nf -profile cluster --n_rounds 6
N E X T F L O W ~ version 25.04.6
Launching `nf-nest/examples/params.nf` [deadly_pasteur] DSL2 - revision: 093b673fc8
[a1/01de97] Cached process > combine_workflow:instantiate_process
[7f/c48687] Cached process > instantiate_process
[18/b3eff4] Cached process > precompile
[62/e5f07c] Cached process > combine_workflow:precompile
[02/d406f5] Submitted process > run_julia (12)
[94/86641f] Submitted process > run_julia (2)
[84/9166ff] Submitted process > run_julia (10)
[73/859689] Submitted process > run_julia (5)
[ee/dea556] Submitted process > run_julia (13)
[2a/fce377] Submitted process > run_julia (8)
[61/a98f2b] Submitted process > run_julia (4)
[ff/7b1a68] Submitted process > run_julia (11)
[5b/d6f5cb] Submitted process > run_julia (1)
[ae/00a054] Submitted process > run_julia (7)
[04/d3b4f3] Submitted process > run_julia (3)
[2d/00705f] Submitted process > run_julia (9)
[e3/de0446] Submitted process > run_julia (6)
[43/eda189] Submitted process > run_julia (14)
[1c/474835] Submitted process > run_julia (15)
[81/49b08f] Submitted process > run_julia (16)
[c2/f05ac8] Submitted process > run_julia (17)
[6c/2f23bd] Submitted process > run_julia (18)
[11/5a5ed4] Submitted process > run_julia (19)
[f7/5c018b] Submitted process > run_julia (20)
[0e/5886b4] Submitted process > combine_workflow:combine_process
[a6/000d86] Submitted process > plot
Notice that the function deliverables(workflow, params) takes the parameter values into account, so that the deliverables directory is organized by script and parameters:
tree experiment_repo/deliverables
experiment_repo/deliverables
├── scriptName=full.nf
│ ├── output
│ │ ├── summary.csv
│ │ └── swap_prs.csv
│ ├── plot.png
│ └── runName.txt
└── scriptName=params.nf___n_rounds=6
├── output
│ ├── summary.csv
│ └── swap_prs.csv
├── plot.png
└── runName.txt
4 directories, 8 files
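The two conventions above are that double-dash arguments become params, and that the run's parameters end up in the deliverables folder name. Here is a small Python sketch that makes them concrete. It is not nf-nest's code. `parse_params` and `deliverables_dir` are hypothetical helpers. The naming rule is only inferred from the two folder names in the tree above: scriptName=<script>, followed by ___name=value for each parameter given on the command line, sorted by name. The real deliverables() may differ, for example by comparing against default values. As in nextflow, a double-dash flag given without a value, such as --dryRun, is treated as true.

```python
def parse_params(argv):
    """Collect pipeline parameters from a nextflow-style argument list.

    `--name value` sets params.name = "value"; a bare `--name` sets it to True.
    Single-dash options belong to nextflow itself and are skipped. Assumption:
    each single-dash option takes exactly one value (true for `-profile`,
    not for every nextflow option).
    """
    params = {}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--"):
            name = arg[2:]
            has_value = i + 1 < len(argv) and not argv[i + 1].startswith("-")
            params[name] = argv[i + 1] if has_value else True
            i += 2 if has_value else 1
        elif arg.startswith("-"):
            i += 2  # skip a nextflow option and its value
        else:
            i += 1  # positional argument, e.g. `run` or the script path
    return params


def deliverables_dir(script_name, cli_params):
    """Hypothetical naming rule inferred from the tree above."""
    parts = [f"scriptName={script_name}"]
    parts += [f"{k}={v}" for k, v in sorted(cli_params.items())]
    return "___".join(parts)


argv = ["run", "nf-nest/examples/params.nf", "-profile", "cluster", "--n_rounds", "6"]
params = parse_params(argv)
print(params)                                 # {'n_rounds': '6'}
print(deliverables_dir("params.nf", params))  # scriptName=params.nf___n_rounds=6
print(deliverables_dir("full.nf", {}))        # scriptName=full.nf
print(parse_params(["--dryRun"]))             # {'dryRun': True}
```

In this sketch, values stay strings. Whether and how nextflow converts them to numbers is not modelled.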
Dry runs
A useful application of process parameters is a “dry run switch”, which runs a quick, reduced version of the pipeline to speed up debugging.
Here is an example. Notice that we pass the dry run option to crossProduct(), which then emits only one combination instead of the whole cross product, and that n_rounds drops to 1:
include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'
def julia_env = file(moduleDir/'julia_env')
params.dryRun = false
params.n_rounds = params.dryRun ? 1 : 5
def variables = [
    seed: 1..10,
    n_chains: [10, 20],
]
workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables, params.dryRun)
    run_julia(compiled_env, configs)
}
process run_julia {
    input:
        path julia_env
        val config
    """
    ${activate(julia_env)}
    # run your code
    using Pigeons
    using CSV
    pt = pigeons(
            target = toy_mvn_target(1000),
            n_chains = ${config.n_chains},
            seed = ${config.seed},
            n_rounds = ${params.n_rounds})
    """
}
To run it in dry run mode:
cd experiment_repo
./nextflow run nf-nest/examples/dry_run.nf -profile cluster --dryRun
N E X T F L O W ~ version 25.04.6
Launching `nf-nest/examples/dry_run.nf` [spontaneous_colden] DSL2 - revision: c7304bca14
[7f/c48687] Cached process > instantiate_process
[18/b3eff4] Cached process > precompile
[a3/b887a1] Submitted process > run_julia (1)
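The log confirms that only one run_julia task was submitted. The Python sketch below spells out the switch under stated assumptions. `cross_product` is a hypothetical stand-in for nf-nest's crossProduct(). Keeping the first combination in dry-run mode is a guess, since the text above only says that one combination is emitted. Without the switch, the grid has 10 seeds × 2 values of n_chains, which gives 20 combinations and matches the 20 run_julia tasks in the earlier log.

```python
from itertools import islice, product

# Same grid as the `variables` map in dry_run.nf (Groovy's 1..10 is inclusive)
variables = {"seed": list(range(1, 11)), "n_chains": [10, 20]}


def cross_product(variables, dry_run=False):
    """Hypothetical stand-in for crossProduct(variables, dryRun): one config
    (a dict) per combination of values; only the first one in dry-run mode."""
    keys = list(variables)
    combos = (dict(zip(keys, values))
              for values in product(*(variables[k] for k in keys)))
    return list(islice(combos, 1)) if dry_run else list(combos)


def n_rounds(dry_run):
    # Mirrors `params.n_rounds = params.dryRun ? 1 : 5`
    return 1 if dry_run else 5


print(len(cross_product(variables)))           # 20
print(cross_product(variables, dry_run=True))  # [{'seed': 1, 'n_chains': 10}]
print(n_rounds(True), n_rounds(False))         # 1 5
```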
What to commit?
Standard git guidelines recommend never committing “derived files”. We recommend deviating slightly from this convention and committing a bit more than the strict minimum:
- Manifest.toml is derived from Project.toml, but commit it: it contains the precise version information needed for reproducibility. Package developers, in contrast, would only commit Project.toml, but a numerical experiment is not a package!
- Once there are experiments you plan to include in a paper, commit the corresponding sub-folder of deliverables, including the CSV files used to produce each figure. This way the tex repo can simply use the experiment repo as a submodule, and the authors can compile the paper right away. Having the CSV files there also means that plot aesthetics can be quickly tweaked later on (e.g., the night before a talk).
Updating code
Numerical experiments are often based on code you are developing along the way. When that code is updated, nextflow can, with a bit of organization, figure out which subset of the workflow needs to be re-run. We present two models for doing this: one for lightweight code such as plotting/analysis, and one for more substantial code, e.g. a method you are developing.
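How does nextflow tell what changed? It hashes each task's inputs, together with the task script, into a cache key, and re-runs a task only when that key differs from the previous run. According to the documentation of the cache directive, the default mode indexes an input file by its metadata (name, size and last-update timestamp) rather than its content; the 'deep' mode hashes content instead. The Python sketch below imitates the default mode on a toy scale. `cache_key` is a hypothetical helper, far simpler than nextflow's actual hashing, and plot.jl here is just an illustrative file.

```python
import hashlib
import os
import tempfile


def cache_key(script, input_paths):
    """Toy cache key: hash the task script plus each input file's
    name, size and modification time (metadata only, no content)."""
    h = hashlib.sha256(script.encode())
    for p in sorted(input_paths):
        st = os.stat(p)
        meta = f"{os.path.basename(p)}:{st.st_size}:{st.st_mtime_ns}"
        h.update(meta.encode())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as d:
    plot_jl = os.path.join(d, "plot.jl")  # illustrative analysis file
    with open(plot_jl, "w") as fh:
        fh.write("create_plots(dir) = nothing\n")
    script = 'include("plot.jl")'

    key_before = cache_key(script, [plot_jl])
    key_again = cache_key(script, [plot_jl])  # nothing changed

    with open(plot_jl, "w") as fh:  # edit the analysis code
        fh.write("create_plots(dir) = println(dir)\n")
    key_after = cache_key(script, [plot_jl])

print(key_before == key_again)  # True: the task can be reused
print(key_before == key_after)  # False: the task must be re-run
```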
Lightweight code
Include the .jl file in the nextflow repo, feed it to the node as input, and use a Julia include() on it. We have already used that pattern for the plot node in an earlier example.
If you have several Julia files, put them in a directory, and use the syntax:
include { activate; } from "../pkg.nf"
def julia_files = file(moduleDir/"julia/*.jl")
workflow {
    run_julia(julia_files)
}
process run_julia {
    debug true
    input:
        path julia_files // `path` is the current replacement for the deprecated `file` qualifier
    """
    ${activate()}
    include("a.jl")
    include("b.jl")
    """
}
cd experiment_repo
./nextflow run nf-nest/examples/includes.nf
N E X T F L O W ~ version 25.04.6
Launching `nf-nest/examples/includes.nf` [loquacious_edison] DSL2 - revision: f263a35a46
[34/6a94ea] Submitted process > run_julia
hello
world
Passing a directory as input (rather than a collection of files, as done above) is not ideal in this context, because nextflow does not currently recurse inside the directory when computing the checksum used to decide whether the cache can be used. Recall that on Unix, the modification time of a directory only changes when a file under it is added, deleted or renamed, not when a file under it is edited!
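The Unix behaviour behind this caveat is easy to check. The Python sketch below uses only the standard library, and its file names are illustrative. It records a directory's modification time, edits a file inside it, then adds a new file. Only the addition moves the directory's timestamp, so a cache key built from the directory's own metadata would miss the edit. The one-second pauses are there because some filesystems store timestamps at coarse resolution.

```python
import os
import tempfile
import time


def dir_mtime(path):
    # Modification time of the directory entry itself, in nanoseconds
    return os.stat(path).st_mtime_ns


with tempfile.TemporaryDirectory() as d:
    a_jl = os.path.join(d, "a.jl")
    with open(a_jl, "w") as fh:
        fh.write('println("hello")\n')
    before = dir_mtime(d)

    time.sleep(1.1)
    with open(a_jl, "w") as fh:  # edit an existing file in place
        fh.write('println("hello, edited")\n')
    after_edit = dir_mtime(d)

    time.sleep(1.1)
    with open(os.path.join(d, "b.jl"), "w") as fh:  # add a new file
        fh.write('println("world")\n')
    after_add = dir_mtime(d)

print(after_edit == before)  # True: editing a.jl left the directory untouched
print(after_add > before)    # True: adding b.jl updated the directory
```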
Library
If the code you include is more complex, and/or might be used outside of the context of one nextflow script, it is better to package it.
In Julia, creating a package is simple, and the package can be published right away, for free, on GitHub: see this tutorial.
For example, we wrote a small Julia package, CombineCSV.jl, to perform the CSV combination described in an earlier section.
To add or update a package, you can use a script of this form:
# precompilation is left to the precompile process run by nextflow
ENV["JULIA_PKG_PRECOMPILE_AUTO"]=0
using Pkg
Pkg.activate("julia_env")
Pkg.add(url = "https://github.com/UBC-Stat-ML/CombineCSVs")
where you would replace the URL by that of the git repo you are using. Note that add also updates the package to the head of the main branch.
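Running such a script updates the Manifest.toml file, which in turn signals to the instantiate utility in our pkg.nf that it needs to be re-run by nextflow. For this to work, instantiate() passes Manifest.toml to instantiate_process as a separate input next to the julia_env directory, since nextflow does not look inside directories when checking the cache (see above):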
params.nPrecompileThreads = 10
params.julia_env = 'julia_env'
julia_env = file(params.julia_env)
julia_env.mkdir()
// Can be used as standalone, but typically used inside a user nf file
workflow {
    instantiate(julia_env) | precompile
}
def instantiate(julia_env) { instantiate_process(julia_env, file(julia_env/"Manifest.toml")) }
process instantiate_process {
    executor 'local' // we need internet access
    scratch false    // we want changes in Manifest.toml to be saved
    input:
        path julia_env
        path toml // needed for correct cache behaviour under updates
    output:
        path julia_env
    """
    ${activate(julia_env)}
    ENV["JULIA_PKG_PRECOMPILE_AUTO"]=0
    using Pkg
    Pkg.instantiate()
    """
}
process precompile {
    input:
        path julia_env
    output:
        path julia_env
    cpus params.nPrecompileThreads
    memory 15.GB
    """
    ${activate(julia_env, params.nPrecompileThreads)}
    using Pkg
    Pkg.offline(true)
    Pkg.precompile()
    """
}
// Start Julia with the provided environment and, optionally, a number of threads (1 by default)
// Needs to be the very first line of the process script
def activate(julia_env, nThreads = 1) {
    return "#!/usr/bin/env julia --threads=${nThreads} --project=$julia_env"
}
// Variant without an explicit --project, using a single thread
def activate() {
    return "#!/usr/bin/env julia --threads=1"
}
Report
Following the nextflow documentation, we have set nextflow.config so that report.html, timeline.html and dag.html are automatically created.
To preview them in VS Code, add a VS Code extension that supports HTML preview, for example Live Server. Then right-click the HTML file and select Show Preview.