Combine outputs

Overview

Now that we know how to run many jobs, the next question is how to combine their outputs so they can be analyzed together.

Example

We will run Pigeons on the cross product formed by calling crossProduct(variables) with:

def variables = [
    seed: 1..10,
    n_chains: [10, 20], 
]
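Conceptually, crossProduct enumerates every combination of the listed values and yields one configuration per combination. The Python sketch below mimics that behaviour with itertools.product. It is only an illustration of the idea, not nf-nest's implementation (which is written in Nextflow/Groovy), and the function name cross_product is hypothetical.

```python
from itertools import product

def cross_product(variables):
    """Return one dict per combination of the values in `variables`.

    Hypothetical stand-in for nf-nest's crossProduct(); key order is preserved.
    """
    keys = list(variables)
    return [dict(zip(keys, combo)) for combo in product(*variables.values())]

variables = {
    "seed": range(1, 11),   # same values as Groovy's inclusive range 1..10
    "n_chains": [10, 20],
}

configs = cross_product(variables)
print(len(configs))  # 20  (10 seeds x 2 values of n_chains)
print(configs[0])    # {'seed': 1, 'n_chains': 10}
```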

Suppose we want to create a plot from the output of these 20 Julia processes (10 seeds × 2 values of n_chains).

Strategy

Each Julia process will create a folder whose name automatically encodes the inputs used (seed and n_chains). That name is provided by nf-nest’s filed() function. The process then writes its CSV files into that folder.
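The exact naming scheme produced by filed() is defined by nf-nest. The sketch below simply assumes a key=value encoding joined by a hypothetical "___" separator, which is enough to show why such a name can later be turned back into input columns. Both helper names are hypothetical.

```python
SEP = "___"  # hypothetical separator between key=value pairs

def folder_name(config):
    """Encode a configuration dict as a folder name, e.g. seed=3___n_chains=20."""
    return SEP.join(f"{key}={value}" for key, value in config.items())

def parse_folder_name(name):
    """Invert folder_name(); values come back as strings."""
    return dict(part.split("=", 1) for part in name.split(SEP))

name = folder_name({"seed": 3, "n_chains": 20})
print(name)                     # seed=3___n_chains=20
print(parse_folder_name(name))  # {'seed': '3', 'n_chains': '20'}
```

In this sketch the encoding is invertible, so the folder name alone is enough to recover the inputs; no separate metadata file is needed.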

Then, once all Julia processes are done, another nf-nest utility, combine_csvs, merges the CSV files that share a name across all these folders, adding columns for the inputs (here, seed and n_chains).
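To make the merging step concrete, here is a minimal Python sketch of what such a combination does. It is not nf-nest's combine_csvs; it assumes, hypothetically, that each run folder is named like seed=1___n_chains=10, decodes that name into input columns, and concatenates same-named CSV files with those columns prepended.

```python
import csv
import os
from collections import defaultdict

SEP = "___"  # hypothetical separator between key=value pairs in folder names

def parse_folder_name(name):
    """Recover the inputs from a folder name such as seed=1___n_chains=10."""
    return dict(part.split("=", 1) for part in name.split(SEP))

def combine_csvs(root, out_dir):
    """Merge same-named CSV files from every sub-folder of `root` into `out_dir`.

    Inputs decoded from each sub-folder's name are prepended as columns.
    Returns the sorted list of merged file names.
    """
    merged = defaultdict(list)  # CSV file name -> merged rows (as dicts)
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue
        inputs = parse_folder_name(folder)
        for fname in sorted(os.listdir(folder_path)):
            if not fname.endswith(".csv"):
                continue
            with open(os.path.join(folder_path, fname), newline="") as f:
                for row in csv.DictReader(f):
                    merged[fname].append({**inputs, **row})  # input columns first
    os.makedirs(out_dir, exist_ok=True)
    for fname, rows in merged.items():
        with open(os.path.join(out_dir, fname), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
    return sorted(merged)
```

Grouping by file name means each kind of table (for instance summary.csv and swap_prs.csv) ends up in its own merged file.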

Finally, we will pass the merged CSVs to a plotting process.

Nextflow script

// includes are relative to the .nf file, should always start with ./ or ../
include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'
include { combine_csvs; } from '../combine.nf'

// in contrast, file(..) is relative to `pwd`; use projectDir/ 
//   to make it relative to the main .nf file, or moduleDir/ for this .nf file
def julia_env = file(moduleDir/'julia_env')
def plot_script = file(moduleDir/'plot.jl')

def variables = [
    seed: 1..10,
    n_chains: [10, 20], 
]

workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables)
    combined = run_julia(compiled_env, configs) | combine_csvs
    plot(compiled_env, plot_script, combined)
}

process run_julia {
    input:
        path julia_env 
        val config 
    output:
        path "${filed(config)}"
    """
    ${activate(julia_env)}

    # run your code
    using Pigeons 
    using CSV 
    pt = pigeons(
            target = toy_mvn_target(1000), 
            n_chains = ${config.n_chains}, 
            seed = ${config.seed})

    # organize output as follows:
    #   - create a directory with name controlled by filed(config)
    #     to keep track of input configuration
    #   - put any number of CSV in there
    mkdir("${filed(config)}")
    CSV.write("${filed(config)}/summary.csv", pt.shared.reports.summary)
    CSV.write("${filed(config)}/swap_prs.csv", pt.shared.reports.swap_prs)
    """
}

process plot {
    input:
        path julia_env 
        path plot_script
        path combined_csvs_folder 
    output:
        path '*.png'
        path combined_csvs_folder
    publishDir "${deliverables(workflow, params)}", mode: 'copy', overwrite: true
    """
    ${activate(julia_env)}

    include("$plot_script")
    create_plots("$combined_csvs_folder")
    """
}

Running the Nextflow script

cd experiment_repo
./nextflow run nf-nest/examples/full.nf -profile cluster

N E X T F L O W  ~  version 25.04.6
Launching `nf-nest/examples/full.nf` [peaceful_planck] DSL2 - revision: a68c131baa
[7f/c48687] Submitted process > instantiate_process
[a1/01de97] Submitted process > combine_workflow:instantiate_process
[62/e5f07c] Submitted process > combine_workflow:precompile
[18/b3eff4] Submitted process > precompile
[e1/4a97fb] Submitted process > run_julia (1)
[94/2bb620] Submitted process > run_julia (12)
[fd/c65f33] Submitted process > run_julia (8)
[fe/8349e2] Submitted process > run_julia (5)
[86/d2188b] Submitted process > run_julia (11)
[dd/a1da28] Submitted process > run_julia (10)
[1f/e7cb21] Submitted process > run_julia (7)
[2e/1f9bba] Submitted process > run_julia (6)
[df/cd6375] Submitted process > run_julia (9)
[1b/a2c88c] Submitted process > run_julia (4)
[78/d434da] Submitted process > run_julia (3)
[66/9afc5a] Submitted process > run_julia (13)
[a8/b2cf25] Submitted process > run_julia (2)
[12/b66506] Submitted process > run_julia (14)
[6b/75f684] Submitted process > run_julia (16)
[41/96a420] Submitted process > run_julia (15)
[9a/bb5545] Submitted process > run_julia (17)
[0c/fbbb9a] Submitted process > run_julia (20)
[82/949f45] Submitted process > run_julia (18)
[99/a84917] Submitted process > run_julia (19)
[71/2877e4] Submitted process > combine_workflow:combine_process
[0a/b73e43] Submitted process > plot

Accessing the output

Each Nextflow task (one execution of a process, e.g., one of the 20 run_julia calls) runs in its own work directory, so that tasks do not interfere with each other. Here we cover two ways to quickly access these work directories.

Quick inspection

A quick way to find the outputs of the Nextflow run we just launched is:

cd experiment_repo 
nf-nest/nf-open

This lists the work folders of the most recent Nextflow run.

Organizing the output with a publishDir

A better approach is to use the publishDir directive together with nf-nest’s deliverables() utility, as illustrated in the plot process above. This automatically copies the outputs of the process carrying the directive into a sub-directory of experiment_repo/deliverables.

cd experiment_repo
tree deliverables

deliverables
└── scriptName=full.nf
    ├── output
    │   ├── summary.csv
    │   └── swap_prs.csv
    ├── plot.png
    └── runName.txt

2 directories, 4 files

The run name stored in runName.txt can be looked up with Nextflow’s log command to obtain more information on the run.

cat deliverables/scriptName=full.nf/runName.txt

peaceful_planck

./nextflow log

TIMESTAMP           DURATION    RUN NAME            STATUS  REVISION ID SESSION ID                              COMMAND                                                    
2025-07-03 17:01:17 2.9s        small_rosalind      OK      9d1a692a7e  d108fc2d-a202-40d8-b7ca-4e9fc18872fb    nextflow run nf-nest/examples/hello.nf                     
2025-07-03 17:01:23 7.3s        boring_ride         OK      9d1a692a7e  d108fc2d-a202-40d8-b7ca-4e9fc18872fb    nextflow run nf-nest/examples/hello.nf -profile cluster    
2025-07-03 17:02:17 1m 18s      romantic_engelbart  OK      fc0374e695  d108fc2d-a202-40d8-b7ca-4e9fc18872fb    nextflow run nf-nest/pkg.nf -profile cluster               
2025-07-03 17:03:59 1m 48s      tiny_lagrange       OK      aa082b1978  d108fc2d-a202-40d8-b7ca-4e9fc18872fb    nextflow run nf-nest/examples/many_jobs.nf -profile cluster
2025-07-03 17:05:50 5.2s        gigantic_heisenberg OK      d9de661ecc  d108fc2d-a202-40d8-b7ca-4e9fc18872fb    nextflow run nf-nest/examples/filter.nf                    
2025-07-03 17:06:01 6m 24s      peaceful_planck     OK      a68c131baa  d108fc2d-a202-40d8-b7ca-4e9fc18872fb    nextflow run nf-nest/examples/full.nf -profile cluster

Finally, we can check that the columns seed and n_chains were indeed prepended (added on the left) in the merged CSV:

head -n 2 deliverables/scriptName=full.nf/output/summary.csv

seed,n_chains,round,n_scans,n_tempered_restarts,global_barrier,global_barrier_variational,last_round_max_time,last_round_max_allocation,stepping_stone
10,10,1,2,,8.998207738433418,,0.000233774,13536.0,-1173.429270641805
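Because every row now carries its inputs, the merged file can be analyzed directly. As a hypothetical example (the actual plot.jl is not shown here), the Python sketch below keeps, for each (seed, n_chains) run, the row from its last round, then averages global_barrier over seeds for each value of n_chains. Column names follow the header above; the function name mean_final_barrier is hypothetical.

```python
import csv
from collections import defaultdict
from statistics import mean

def mean_final_barrier(path):
    """Average the last-round global_barrier over seeds, per n_chains value."""
    last = {}  # (seed, n_chains) -> row with the highest round seen so far
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["seed"], row["n_chains"])
            if key not in last or int(row["round"]) > int(last[key]["round"]):
                last[key] = row
    by_chains = defaultdict(list)  # n_chains -> final barriers, one per seed
    for (_, n_chains), row in last.items():
        by_chains[int(n_chains)].append(float(row["global_barrier"]))
    return {n: mean(values) for n, values in sorted(by_chains.items())}
```

For instance, mean_final_barrier("deliverables/scriptName=full.nf/output/summary.csv") would return one averaged value per n_chains setting.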