// includes are relative to the .nf file, should always start with ./ or ../
include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'
include { combine_csvs; } from '../combine.nf'

// in contrast, file(..) is relative to `pwd`, use projectDir/
// to make it relative to main .nf file, or moduleDir for the .nf file
def julia_env = file(moduleDir/'julia_env')
def plot_script = file(moduleDir/'plot.jl')

def variables = [
    seed: 1..10,
    n_chains: [10, 20],
]
workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables)
    combined = run_julia(compiled_env, configs) | combine_csvs
    plot(compiled_env, plot_script, combined)
}
process run_julia {
    input:
        path julia_env
        val config
    output:
        path "${filed(config)}"
    """
    ${activate(julia_env)}

    # run your code
    using Pigeons
    using CSV
    pt = pigeons(
            target = toy_mvn_target(1000),
            n_chains = ${config.n_chains},
            seed = ${config.seed})

    # organize output as follows:
    # - create a directory with name controlled by filed(config)
    #   to keep track of input configuration
    # - put any number of CSV in there
    mkdir("${filed(config)}")
    CSV.write("${filed(config)}/summary.csv", pt.shared.reports.summary)
    CSV.write("${filed(config)}/swap_prs.csv", pt.shared.reports.swap_prs)
    """
}
process plot {
    input:
        path julia_env
        path plot_script
        path combined_csvs_folder
    output:
        path '*.png'
        path combined_csvs_folder
    publishDir "${deliverables(workflow, params)}", mode: 'copy', overwrite: true
    """
    ${activate(julia_env)}
    include("$plot_script")
    create_plots("$combined_csvs_folder")
    """
}
Combine outputs
Overview
Now that we know how to run many jobs, the next question is how to combine the outputs of all these jobs so that we can analyze them together.
Example
We will run Pigeons on the cross product formed by calling crossProduct(variables) with:
def variables = [
    seed: 1..10,
    n_chains: [10, 20],
]
Suppose we want to create a plot from the output of these 20 Julia processes (10 seeds × 2 values of n_chains).
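The count of 20 comes from taking every combination of the values listed for each variable. As a rough illustration only, here is a Python sketch of such a cross product; `cross_product` is a hypothetical stand-in written for this page, not nf-nest's Groovy crossProduct, and the order in which it enumerates the configurations is an assumption.

```python
from itertools import product

# Same variables as in the Nextflow script: 10 seeds x 2 chain counts.
variables = {
    "seed": range(1, 11),   # Groovy's 1..10 includes both ends
    "n_chains": [10, 20],
}


def cross_product(variables):
    """Return one dict per combination of values (hypothetical sketch of crossProduct)."""
    keys = list(variables)
    return [dict(zip(keys, values)) for values in product(*variables.values())]


configs = cross_product(variables)
print(len(configs))  # 20
print(configs[0])    # {'seed': 1, 'n_chains': 10}
```

Each of these dicts plays the role of one `config` received by a run_julia task.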
Strategy
Each Julia process will create a folder whose name is generated automatically to encode the inputs used (seed and n_chains). That name is provided by nf-nest’s filed() function. Each process then writes its CSV files into that folder.
Then, once all Julia processes are done, another utility from nf-nest, combine_csvs, will merge all the CSVs while adding columns for the inputs (here, seed and n_chains).
Finally, we will pass the merged CSVs to a plotting process.
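To make the first two steps of this strategy concrete, here is a minimal Python sketch: encode a configuration into a folder name, then merge the per-folder CSVs while prepending the decoded inputs as columns. Everything here is hypothetical: joining `key=value` pairs with `___` is an assumed naming format, not necessarily what filed() produces, and `combine` only mimics the behaviour described for combine_csvs rather than reproducing its implementation.

```python
import csv
import tempfile
from pathlib import Path


def filed_name(config):
    """Encode a config dict as a folder name (assumed format: key=value___key=value)."""
    return "___".join(f"{key}={value}" for key, value in config.items())


def parse_filed_name(name):
    """Decode a folder name produced by filed_name back into a dict of strings."""
    return dict(pair.split("=", 1) for pair in name.split("___"))


def combine(root, csv_name):
    """Merge root/<folder>/<csv_name> across folders, prepending the decoded inputs.

    Assumes every CSV with this name shares the same header.
    """
    header, rows = None, []
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        inputs = parse_filed_name(folder.name)
        with open(folder / csv_name, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            if header is None:
                header = list(inputs) + file_header
            rows.extend(list(inputs.values()) + row for row in reader)
    return header, rows


# Simulate two Julia processes, each writing summary.csv into its own folder.
with tempfile.TemporaryDirectory() as root:
    for seed, n_chains in [(10, 10), (10, 20)]:
        folder = Path(root) / filed_name({"seed": seed, "n_chains": n_chains})
        folder.mkdir()
        (folder / "summary.csv").write_text(f"round,n_scans\n1,{n_chains}\n")
    header, rows = combine(root, "summary.csv")

print(header)  # ['seed', 'n_chains', 'round', 'n_scans']
print(rows)    # [['10', '10', '1', '10'], ['10', '20', '1', '20']]
```

The design relies on the folder name being decodable: as long as each name round-trips to its inputs, the merge step needs no side channel to know which configuration produced which CSV.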
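Nextflow script
The complete script used in this example, nf-nest/examples/full.nf, is listed at the top of this page.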
Running the nextflow script
cd experiment_repo
./nextflow run nf-nest/examples/full.nf -profile cluster
N E X T F L O W ~ version 24.10.0
Launching `nf-nest/examples/full.nf` [golden_poitras] DSL2 - revision: a68c131baa
[8c/a117a8] Submitted process > instantiate_process
[5f/997727] Submitted process > combine_workflow:instantiate_process
[72/cc3bb2] Submitted process > precompile
[b9/57c5b2] Submitted process > combine_workflow:precompile
[42/f159a5] Submitted process > run_julia (13)
[bc/98503a] Submitted process > run_julia (10)
[4d/bd3f1f] Submitted process > run_julia (1)
[c6/c5473f] Submitted process > run_julia (7)
[9c/4051ca] Submitted process > run_julia (5)
[c1/2cb9ec] Submitted process > run_julia (11)
[6d/af7d1f] Submitted process > run_julia (16)
[a2/78b5dc] Submitted process > run_julia (8)
[47/65c526] Submitted process > run_julia (14)
[9b/e1998e] Submitted process > run_julia (12)
[65/c7c28f] Submitted process > run_julia (6)
[07/9f3d4c] Submitted process > run_julia (2)
[13/27eb7d] Submitted process > run_julia (3)
[95/88b5e1] Submitted process > run_julia (9)
[33/81cb06] Submitted process > run_julia (19)
[e4/7dc064] Submitted process > run_julia (17)
[50/cb53b6] Submitted process > run_julia (4)
[44/c2c487] Submitted process > run_julia (15)
[b9/05c738] Submitted process > run_julia (20)
[c9/0e2573] Submitted process > run_julia (18)
[cf/4b324f] Submitted process > combine_workflow:combine_process
[ff/ab4541] Submitted process > plot
Accessing the output
Each Nextflow task (one execution of a process) runs in its own work directory, so that tasks do not interfere with each other. Here we cover two ways to quickly get at the outputs stored in these work directories.
Quick inspection
A quick way to find the output of a nextflow process that we just ran is to use:
cd experiment_repo
nf-nest/nf-open
This lists the work folders for the last nextflow job.
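If all you have is the console output, note that the bracketed tag Nextflow prints for each task, such as [42/f159a5], is the start of that task's hash, and, assuming the default work directory (work/ in the launch folder), the task ran in work/42/ in a folder whose name starts with f159a5. The Python sketch below resolves such tags with a glob; `work_dirs_for` is a hypothetical helper for illustration, not part of nf-nest.

```python
import re
import tempfile
from pathlib import Path

# Matches console lines such as "[42/f159a5] Submitted process > run_julia (13)".
TAG = re.compile(r"^\[([0-9a-f]{2})/([0-9a-f]{6})\] Submitted process > (.+)$")


def work_dirs_for(log_lines, work_root="work"):
    """Map each task label in the console output to the work directories matching its tag."""
    found = {}
    for line in log_lines:
        match = TAG.match(line.strip())
        if match:
            prefix, rest, label = match.groups()
            # The console shows only the first characters of the hash; glob for the rest.
            found[label] = sorted(Path(work_root, prefix).glob(rest + "*"))
    return found


with tempfile.TemporaryDirectory() as work_root:
    # Fake task directory standing in for a real 32-character hash (illustrative value).
    Path(work_root, "42", "f159a5" + "0" * 24).mkdir(parents=True)
    lines = ["[42/f159a5] Submitted process > run_julia (13)"]
    result = work_dirs_for(lines, work_root)
    print({label: [p.name for p in paths] for label, paths in result.items()})
```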
Organizing the output with a publishDir
A better approach is to use the publishDir directive, combined with nf-nest’s deliverables() utility, as illustrated in the plot process above. This automatically copies the outputs of the process carrying the directive into a sub-directory of experiment_repo/deliverables.
cd experiment_repo
tree deliverables
deliverables
└── scriptName=full.nf
    ├── output
    │   ├── summary.csv
    │   └── swap_prs.csv
    ├── plot.png
    └── runName.txt

2 directories, 4 files
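Conceptually, publishDir with mode: 'copy' and overwrite: true copies each declared output out of the task's work directory into the target folder, replacing any existing file of the same name; unlike the default symlink mode, the published copies survive if work/ is later cleaned up. The Python sketch below only mimics those semantics on plain directories; `publish_copy` and the fake task folder are hypothetical, and this is not how Nextflow implements the directive.

```python
import shutil
import tempfile
from pathlib import Path


def publish_copy(outputs, publish_dir):
    """Copy each output (file or folder) into publish_dir, overwriting existing entries."""
    publish_dir = Path(publish_dir)
    publish_dir.mkdir(parents=True, exist_ok=True)
    for src in map(Path, outputs):
        dest = publish_dir / src.name
        if src.is_dir():
            shutil.copytree(src, dest, dirs_exist_ok=True)  # requires Python 3.8+
        else:
            shutil.copy2(src, dest)  # replaces dest if it already exists


with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp, "work", "ff", "ab4541")  # stand-in for the plot task's work dir
    (task_dir / "output").mkdir(parents=True)
    (task_dir / "output" / "summary.csv").write_text("seed,n_chains\n10,10\n")
    (task_dir / "plot.png").write_bytes(b"new")
    target = Path(tmp, "deliverables", "scriptName=full.nf")
    target.mkdir(parents=True)
    (target / "plot.png").write_bytes(b"old")  # stale copy from an earlier run
    publish_copy([task_dir / "plot.png", task_dir / "output"], target)
    published = sorted(str(p.relative_to(target)) for p in target.rglob("*"))
    plot_bytes = (target / "plot.png").read_bytes()

print(published)   # ['output', 'output/summary.csv', 'plot.png']
print(plot_bytes)  # b'new'
```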
Here, the contents of runName.txt can be used with Nextflow’s log command to obtain more information on the run; in this case it matches the last row of the log.
cat deliverables/scriptName=full.nf/runName.txt
golden_poitras
./nextflow log
TIMESTAMP DURATION RUN NAME STATUS REVISION ID SESSION ID COMMAND
2024-11-12 12:05:21 5.8s angry_northcutt OK 9d1a692a7e 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/examples/hello.nf
2024-11-12 12:05:33 8.6s wise_goldberg OK 9d1a692a7e 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/examples/hello.nf -profile cluster
2024-11-12 12:06:36 1m 20s ridiculous_volta OK fc0374e695 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/pkg.nf -profile cluster
2024-11-12 12:08:22 1m 39s elegant_fermi OK aa082b1978 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/examples/many_jobs.nf -profile cluster
2024-11-12 12:10:07 6.5s tiny_elion OK d9de661ecc 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/examples/filter.nf
2024-11-12 12:10:24 2m 19s nice_hopper OK 8cef9f29d6 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/examples/stan_example.nf -profile cluster
2024-11-12 12:13:41 34.7s clever_neumann OK 713b74ac4a 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/pkg_gpu.nf
2024-11-12 12:14:22 1m 50s voluminous_coulomb OK 9be41cea49 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/examples/gpu.nf -profile cluster
2024-11-12 12:16:23 6m 26s golden_poitras OK a68c131baa 2e019c7d-7c4e-42e5-b142-1dd6770fcb61 nextflow run nf-nest/examples/full.nf -profile cluster
Finally, the merged CSV shows that the columns seed and n_chains were indeed added on the left:
head -n 2 deliverables/scriptName=full.nf/output/summary.csv
seed,n_chains,round,n_scans,n_tempered_restarts,global_barrier,global_barrier_variational,last_round_max_time,last_round_max_allocation,stepping_stone
10,10,1,2,,8.998207738433418,,0.000288568,13536.0,-1173.429270641805
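Because the inputs are now ordinary columns, downstream analysis can filter or group on them directly. As a small illustration using only the two lines shown above, Python's standard csv module reads each row into a dict keyed by column name; note that every value comes back as a string, and empty fields (here n_tempered_restarts) come back as ''.

```python
import csv
import io

# The two lines printed by `head -n 2` above, inlined so the example is self-contained.
text = """seed,n_chains,round,n_scans,n_tempered_restarts,global_barrier,global_barrier_variational,last_round_max_time,last_round_max_allocation,stepping_stone
10,10,1,2,,8.998207738433418,,0.000288568,13536.0,-1173.429270641805
"""

reader = csv.DictReader(io.StringIO(text))
rows = list(reader)

print(reader.fieldnames[:2])  # ['seed', 'n_chains']
first = rows[0]
print(int(first["seed"]), int(first["n_chains"]))  # 10 10
print(repr(first["n_tempered_restarts"]))  # ''
```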