Misc
Process parameters
To declare a command-line argument for your nextflow script, say n_rounds with default value 5, use:
params.n_rounds = 5
Here is an example:
include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'
include { combine_csvs; } from '../combine.nf'
def julia_env = file(moduleDir/'julia_env')
def plot_script = file(moduleDir/'plot.jl')
params.n_rounds = 5
def variables = [
    seed: 1..10,
    n_chains: [10, 20],
]
workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables)
    combined = run_julia(compiled_env, configs) | combine_csvs
    plot(compiled_env, plot_script, combined)
}
process run_julia {
    input:
        path julia_env
        val config
    output:
        path "${filed(config)}"
    """
    ${activate(julia_env)}
    # run your code
    using Pigeons
    using CSV
    pt = pigeons(
            target = toy_mvn_target(1000),
            n_chains = ${config.n_chains},
            seed = ${config.seed},
            n_rounds = ${params.n_rounds})
    mkdir("${filed(config)}")
    CSV.write("${filed(config)}/summary.csv", pt.shared.reports.summary)
    CSV.write("${filed(config)}/swap_prs.csv", pt.shared.reports.swap_prs)
    """
}
process plot {
    input:
        path julia_env
        path plot_script
        path combined_csvs_folder
    output:
        path '*.png'
        path combined_csvs_folder
    publishDir "${deliverables(workflow, params)}", mode: 'copy', overwrite: true
    """
    ${activate(julia_env)}
    include("$plot_script")
    create_plots("$combined_csvs_folder")
    """
}
To run it, note that an argument declared as params.my_arg is passed on the command line as --my_arg value, with a double dash (in contrast, nextflow's own options use a single dash, as in -profile cluster):
cd experiment_repo
./nextflow run nf-nest/examples/params.nf -profile cluster --n_rounds 6
N E X T F L O W ~ version 24.10.0
Launching `nf-nest/examples/params.nf` [extravagant_swirles] DSL2 - revision: 093b673fc8
[8c/a117a8] Cached process > instantiate_process
[5f/997727] Cached process > combine_workflow:instantiate_process
[72/cc3bb2] Cached process > precompile
[b9/57c5b2] Cached process > combine_workflow:precompile
[72/2cc9e6] Submitted process > run_julia (12)
[cb/0f4ef3] Submitted process > run_julia (9)
[7b/b152ab] Submitted process > run_julia (3)
[e1/afa275] Submitted process > run_julia (8)
[0f/43c83a] Submitted process > run_julia (15)
[1e/01aeed] Submitted process > run_julia (14)
[a0/e7e0ae] Submitted process > run_julia (4)
[7d/d461f8] Submitted process > run_julia (2)
[88/d0a075] Submitted process > run_julia (1)
[a6/5caec5] Submitted process > run_julia (10)
[20/581486] Submitted process > run_julia (13)
[86/f2ad4d] Submitted process > run_julia (11)
[42/6b0ba5] Submitted process > run_julia (5)
[88/275b85] Submitted process > run_julia (6)
[d6/1cfa52] Submitted process > run_julia (16)
[1e/635557] Submitted process > run_julia (7)
[e0/12866b] Submitted process > run_julia (17)
[0e/b48681] Submitted process > run_julia (19)
[09/a9a9e5] Submitted process > run_julia (20)
[dd/afd459] Submitted process > run_julia (18)
[82/3f2da5] Submitted process > combine_workflow:combine_process
[bd/575d2c] Submitted process > plot
Notice that the function deliverables(workflow, params) takes the parameter values into account, so that the deliverables directory is organized correctly:
tree experiment_repo/deliverables
experiment_repo/deliverables
├── scriptName=full.nf
│   ├── output
│   │   ├── summary.csv
│   │   └── swap_prs.csv
│   ├── plot.png
│   └── runName.txt
└── scriptName=params.nf___n_rounds=6
    ├── output
    │   ├── summary.csv
    │   └── swap_prs.csv
    ├── plot.png
    └── runName.txt
4 directories, 8 files
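The directory names in the listing above encode the script name and the parameter values. As a minimal sketch, assuming the parts are always key=value pairs joined by three underscores (a convention inferred from this one listing, not from the deliverables() source), a name can be split back into its parts in bash. The helper split_run_dir is hypothetical and not part of nf-nest:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of nf-nest): print each key=value part of a
# deliverables directory name on its own line.
# Assumption: parts are joined by "___", as in the listing above.
split_run_dir() {
  local name=$1
  local newline=$'\n'
  printf '%s\n' "${name//___/$newline}"
}

split_run_dir 'scriptName=params.nf___n_rounds=6'
```

This can help when looking for the run that used a given parameter value among many sub-folders of deliverables.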
Dry runs
A useful application of process parameters is a “dry run switch” that runs a quick version of the pipeline to speed up debugging.
Here is an example. Notice that we pass the dry run option to crossProduct(); instead of emitting all values in the cross product, it then emits only one:
include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'
def julia_env = file(moduleDir/'julia_env')
params.dryRun = false
params.n_rounds = params.dryRun ? 1 : 5
def variables = [
    seed: 1..10,
    n_chains: [10, 20],
]
workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables, params.dryRun)
    run_julia(compiled_env, configs)
}
process run_julia {
    input:
        path julia_env
        val config
    """
    ${activate(julia_env)}
    # run your code
    using Pigeons
    using CSV
    pt = pigeons(
            target = toy_mvn_target(1000),
            n_chains = ${config.n_chains},
            seed = ${config.seed},
            n_rounds = ${params.n_rounds})
    """
}
To run it in dry run mode:
cd experiment_repo
./nextflow run nf-nest/examples/dry_run.nf -profile cluster --dryRun
N E X T F L O W ~ version 24.10.0
Launching `nf-nest/examples/dry_run.nf` [boring_borg] DSL2 - revision: c7304bca14
[8c/a117a8] Cached process > instantiate_process
[72/cc3bb2] Cached process > precompile
[84/b007e6] Submitted process > run_julia (1)
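To make the effect of the switch concrete, here is a small bash sketch of the idea. It is an illustration only, not the crossProduct() implementation from cross.nf: the function name cross_product_sketch is made up, and which single combination the real utility emits in dry run mode is not stated in this page (the sketch simply stops after the first one). It uses the same variables as the example, seed in 1..10 and n_chains in 10 and 20:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative only: NOT the crossProduct() implementation from cross.nf.
# Prints one "seed=... n_chains=..." line per configuration.
# With dry_run=true, it stops after the first configuration.
cross_product_sketch() {
  local dry_run=$1
  local seed n_chains
  for seed in {1..10}; do
    for n_chains in 10 20; do
      echo "seed=${seed} n_chains=${n_chains}"
      if [ "$dry_run" = true ]; then
        return 0
      fi
    done
  done
}

echo "full run: $(cross_product_sketch false | wc -l | tr -d ' ') configurations"
echo "dry run:  $(cross_product_sketch true | wc -l | tr -d ' ') configuration"
```

The counts agree with the logs shown in this page: 20 run_julia tasks in the full run, and a single one with --dryRun.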
What to commit?
Standard git guidelines suggest never committing “derived files”. We recommend deviating slightly from this convention and committing a bit more than the strict minimum:
- Manifest.toml is derived from Project.toml, but commit it anyway, as it contains the precise version information needed for reproducibility. (Package developers, in contrast, would only commit Project.toml, but a numerical experiment is not the same as a package!)
- Once there are experiments you plan to include in a paper, commit the corresponding sub-folder of deliverables, including the CSV files used to produce each figure. This way the tex repo can just use the experiment repo as a submodule and the authors can compile the paper right away. Having the CSV files there also means that plot aesthetics can be quickly tweaked later on (e.g., the night before a talk).
Updating code
Numerical experiments are often based on code you are developing along the way. When the code is updated, with a bit of organization, nextflow can figure out which subset of the workflow needs to be re-run. We present two models for doing this: one for lightweight code such as plotting/analysis, and one for more substantial code, e.g. a method you are developing.
Lightweight code
Include the .jl file in the nextflow repo, feed it to the node as input, and use a Julia include() on it. We have already used that pattern for the plot node in an earlier example.
If you have several Julia files, put them in a directory, and use the syntax:
include { activate; } from "../pkg.nf"
def julia_files = file(moduleDir/"julia/*.jl")
workflow {
    run_julia(julia_files)
}
process run_julia {
    debug true
    input:
        file julia_files
    """
    ${activate()}
    include("a.jl")
    include("b.jl")
    """
}
cd experiment_repo
./nextflow run nf-nest/examples/includes.nf
N E X T F L O W ~ version 24.10.0
Launching `nf-nest/examples/includes.nf` [spontaneous_lorenz] DSL2 - revision: f263a35a46
[47/6d05cf] Submitted process > run_julia
hello
world
Passing a directory as input (rather than a collection of files, as done above) is not ideal in this context, because nextflow does not currently recurse inside the directory to compute the checksum used to decide whether the cache can be used. Recall that in unix, the modification time of a directory only changes when a file is added or deleted under it, and not when a file under it is edited!
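The unix behaviour mentioned above can be checked directly. The following bash sketch edits a file in place inside a temporary directory, then adds a second file, and compares modification times after each step. It assumes GNU coreutils stat (on macOS/BSD, stat -f %m plays the same role) and a filesystem with timestamps of one-second resolution or better. The file names are only placeholders:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: GNU coreutils `stat` (Linux). On macOS/BSD use `stat -f %m`.
mtime() { stat -c %Y "$1"; }

dir=$(mktemp -d)
echo 'println("hello")' > "$dir/a.jl"
dir_initial=$(mtime "$dir")
file_initial=$(mtime "$dir/a.jl")

sleep 1
echo 'println("hello again")' > "$dir/a.jl"   # edit an existing file in place
dir_after_edit=$(mtime "$dir")
file_after_edit=$(mtime "$dir/a.jl")

sleep 1
echo 'println("world")' > "$dir/b.jl"         # add a new file
dir_after_add=$(mtime "$dir")

# Each line prints 1 if the modification time moved forward, 0 otherwise.
echo "file changed by edit:      $(( file_after_edit > file_initial ))"
echo "directory changed by edit: $(( dir_after_edit > dir_initial ))"
echo "directory changed by add:  $(( dir_after_add > dir_initial ))"

rm -rf "$dir"
```

Only the in-place edit leaves the directory's modification time untouched, which is why passing the individual files gives more reliable cache invalidation. Note that some editors save by writing a new file and renaming it over the old one, which does update the directory, so the behaviour can depend on how the file was edited.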
Library
If the code you include is more complex, and/or might be used outside of the context of one nextflow script, it is better to package it.
In Julia, creating a package is very simple and it can be published right away and for free on github: see this tutorial.
For example, we wrote a small Julia package, CombineCSV.jl to perform the CSV combination in this earlier section.
To add or update the package, you can use a script of this form:
ENV["JULIA_PKG_PRECOMPILE_AUTO"]=0
using Pkg
Pkg.activate("julia_env")
Pkg.add(url = "https://github.com/UBC-Stat-ML/CombineCSVs")
where you would replace the URL with that of the git repo you are using. Note that add will also update to the head of the main branch.
This updates the Manifest.toml file, which in turn signals to our pkg.nf instantiate utility that it needs to be rerun by nextflow:
params.nPrecompileThreads = 10
params.julia_env = 'julia_env'
julia_env = file(params.julia_env)
julia_env.mkdir()
// Can be used as standalone, but typically used inside a user nf file
workflow {
    instantiate(julia_env) | precompile
}
def instantiate(julia_env) { instantiate_process(julia_env, file(julia_env/"Manifest.toml")) }
process instantiate_process {
    executor 'local' // we need internet access
    scratch false // we want changes in Manifest.toml to be saved
    input:
        path julia_env
        path toml // needed for correct cache behaviour under updates
    output:
        path julia_env
    """
    ${activate(julia_env)}
    ENV["JULIA_PKG_PRECOMPILE_AUTO"]=0
    using Pkg
    Pkg.instantiate()
    """
}
process precompile {
    input:
        path julia_env
    output:
        path julia_env
    cpus params.nPrecompileThreads
    memory 15.GB
    """
    ${activate(julia_env, params.nPrecompileThreads)}
    using Pkg
    Pkg.offline(true)
    Pkg.precompile()
    """
}
// Start Julia with the provided environment and, optionally, number of threads (1 by default)
// Needs to be the very first line of the process script
def activate(julia_env, nThreads = 1) {
    return "#!/usr/bin/env julia --threads=${nThreads} --project=$julia_env"
}
def activate() {
    return "#!/usr/bin/env julia --threads=1"
}
Report
Following the nextflow documentation, we have set nextflow.config so that report.html, timeline.html and dag.html are automatically created.
To preview them in VS Code, add a VS Code Extension allowing html preview, for example Live Server. Then right click on the html file and select Show Preview.