Combine outputs

Overview

Now that we know how to run many jobs, the next question is how to combine the output of all these jobs to analyze it.

Example

We will run Pigeons on the cross product formed by calling crossProduct(variables) with:

def variables = [
    seed: 1..10,
    n_chains: [10, 20], 
]

Suppose we want to create a plot from the output of these 20 Julia processes.
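
As a quick sanity check on that count, the same grid can be enumerated by hand. The shell function below is only an illustration of a cross product: list_configs is a hypothetical helper introduced for this sketch, and it is not how crossProduct is implemented.

```shell
# Hypothetical helper: print one line per (seed, n_chains) combination.
list_configs() {
    for seed in $(seq 1 10); do        # seed: 1..10
        for n_chains in 10 20; do      # n_chains: [10, 20]
            echo "seed=${seed},n_chains=${n_chains}"
        done
    done
}
```

Piping list_configs into wc -l gives 20 lines, one per Julia process (10 seeds times 2 values of n_chains).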

Strategy

Each Julia process will create a folder whose name automatically encodes the inputs used (seed and n_chains). That name is provided by nf-nest’s filed() function. In that folder, we will put CSV files.

Then, once all Julia processes are done, another utility from nf-nest, combine_csvs, will merge all CSVs while adding columns for the inputs (here, seed and n_chains).

Finally, we will pass the merged CSVs to a plotting process.
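
To make the effect of the merge step concrete, here is a minimal shell sketch of the idea for a single file name. It is not the combine_csvs implementation. The folder layout runs/<seed>_<n_chains>/summary.csv and the helper name merge_summaries are made up for this illustration; filed() uses its own naming scheme.

```shell
# Hypothetical layout, for this sketch only: <runs_dir>/<seed>_<n_chains>/summary.csv
merge_summaries() {
    runs_dir="$1"
    first=1
    for dir in "$runs_dir"/*/; do
        name=$(basename "$dir")
        seed=${name%%_*}        # text before the underscore
        n_chains=${name##*_}    # text after the underscore
        csv="${dir}summary.csv"
        [ -f "$csv" ] || continue
        if [ "$first" -eq 1 ]; then
            # Emit the header once, with the input columns prepended.
            echo "seed,n_chains,$(head -n 1 "$csv")"
            first=0
        fi
        # Prefix every data row with the inputs recovered from the folder name.
        tail -n +2 "$csv" | sed "s/^/${seed},${n_chains},/"
    done
}
```

Each output row then carries the inputs of the process that produced it, so a single table can be grouped or filtered by seed and n_chains.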

Nextflow script

// include paths are relative to this .nf file and should always start with ./ or ../
include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'
include { combine_csvs; } from '../combine.nf'

// in contrast, file(..) is relative to `pwd`; use projectDir/ 
//   to make it relative to the main .nf file, or moduleDir/ for the current .nf file
def julia_env = file(moduleDir/'julia_env')
def plot_script = file(moduleDir/'plot.jl')

def variables = [
    seed: 1..10,
    n_chains: [10, 20], 
]

workflow {
    compiled_env = instantiate(julia_env) | precompile
    configs = crossProduct(variables)
    combined = run_julia(compiled_env, configs) | combine_csvs
    plot(compiled_env, plot_script, combined)
}

process run_julia {
    input:
        path julia_env 
        val config 
    output:
        path "${filed(config)}"
    """
    ${activate(julia_env)}

    # run your code
    using Pigeons 
    using CSV 
    pt = pigeons(
            target = toy_mvn_target(1000), 
            n_chains = ${config.n_chains}, 
            seed = ${config.seed})

    # organize output as follows:
    #   - create a directory with name controlled by filed(config)
    #     to keep track of input configuration
    #   - put any number of CSV files in there
    mkdir("${filed(config)}")
    CSV.write("${filed(config)}/summary.csv", pt.shared.reports.summary)
    CSV.write("${filed(config)}/swap_prs.csv", pt.shared.reports.swap_prs)
    """
}

process plot {
    // directives such as publishDir are declared before the input/output blocks
    publishDir "${deliverables(workflow, params)}", mode: 'copy', overwrite: true
    input:
        path julia_env 
        path plot_script
        path combined_csvs_folder 
    output:
        path '*.png'
        path combined_csvs_folder
    """
    ${activate(julia_env)}

    include("$plot_script")
    create_plots("$combined_csvs_folder")
    """
}

Running the Nextflow script

cd experiment_repo
./nextflow run nf-nest/examples/full.nf -profile cluster 
N E X T F L O W  ~  version 24.10.6
Launching `nf-nest/examples/full.nf` [romantic_wing] DSL2 - revision: a68c131baa
[df/c5a10f] Submitted process > combine_workflow:instantiate_process
[00/0ab92c] Submitted process > instantiate_process
[31/97015a] Submitted process > combine_workflow:precompile
[48/c6c78a] Submitted process > precompile
[38/327da8] Submitted process > run_julia (3)
[b4/accae5] Submitted process > run_julia (1)
[ba/b6f9f9] Submitted process > run_julia (4)
[82/421652] Submitted process > run_julia (2)
[d5/404e35] Submitted process > run_julia (5)
[9d/c5acd5] Submitted process > run_julia (7)
[bb/c8f236] Submitted process > run_julia (6)
[e6/a8195d] Submitted process > run_julia (8)
[df/5456d3] Submitted process > run_julia (9)
[25/7c66e0] Submitted process > run_julia (10)
[f1/880a94] Submitted process > run_julia (11)
[08/c2005a] Submitted process > run_julia (13)
[33/970057] Submitted process > run_julia (12)
[f9/d6123a] Submitted process > run_julia (14)
[fb/913ad5] Submitted process > run_julia (15)
[20/f4a6b3] Submitted process > run_julia (16)
[45/72e15b] Submitted process > run_julia (17)
[8e/8f2ba2] Submitted process > run_julia (18)
[60/2b6bc0] Submitted process > run_julia (19)
[54/d8e5b0] Submitted process > run_julia (20)
[3c/ec105c] Submitted process > combine_workflow:combine_process
[4c/80d2c6] Submitted process > plot

Accessing the output

Each Nextflow process runs in its own unique work directory, which ensures that processes do not interfere with each other. Here we cover two ways to quickly access the output stored in these work directories.

Quick inspection

A quick way to find the output of a Nextflow process that we just ran is to use:

cd experiment_repo 
nf-nest/nf-open

This lists the work folders for the most recent Nextflow run.

Organizing the output with a publishDir

A better approach is to use the publishDir directive, combined with nf-nest’s deliverables() utility, as illustrated in the plot process above. This automatically copies the output of the process carrying the directive into a subdirectory of experiment_repo/deliverables.

cd experiment_repo
tree deliverables
deliverables
└── scriptName=full.nf
    ├── output
    │   ├── summary.csv
    │   └── swap_prs.csv
    ├── plot.png
    └── runName.txt

2 directories, 4 files

Here the contents of runName.txt can be used with Nextflow’s log command to obtain more information on the run.

cat deliverables/scriptName=full.nf/runName.txt 
romantic_wing
./nextflow log
TIMESTAMP           DURATION    RUN NAME            STATUS  REVISION ID SESSION ID                              COMMAND                                                    
2025-04-28 22:23:50 2.9s        furious_bhaskara    OK      9d1a692a7e  8a9edb72-f01d-44e8-821a-46007329be15    nextflow run nf-nest/examples/hello.nf                     
2025-04-28 22:23:56 7.4s        sleepy_cuvier       OK      9d1a692a7e  8a9edb72-f01d-44e8-821a-46007329be15    nextflow run nf-nest/examples/hello.nf -profile cluster    
2025-04-28 22:25:01 47.1s       pensive_leibniz     OK      fc0374e695  8a9edb72-f01d-44e8-821a-46007329be15    nextflow run nf-nest/pkg.nf -profile cluster               
2025-04-28 22:26:11 43.2s       gigantic_brown      OK      aa082b1978  8a9edb72-f01d-44e8-821a-46007329be15    nextflow run nf-nest/examples/many_jobs.nf -profile cluster
2025-04-28 22:26:57 6s          goofy_lattes        OK      d9de661ecc  8a9edb72-f01d-44e8-821a-46007329be15    nextflow run nf-nest/examples/filter.nf                    
2025-04-28 22:27:09 4m 33s      romantic_wing       OK      a68c131baa  8a9edb72-f01d-44e8-821a-46007329be15    nextflow run nf-nest/examples/full.nf -profile cluster     
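
To drill into the specific run that produced a deliverables folder, the run name can be passed to the log command together with a list of fields via -f. The following is a sketch, assuming it is executed from experiment_repo, where the ./nextflow launcher lives; show_run_tasks is a hypothetical helper name.

```shell
# Print one line per task of the run recorded in <deliverables_dir>/runName.txt.
show_run_tasks() {
    run_name=$(cat "$1/runName.txt")
    # name, status, duration and workdir are fields of the log command's -f option
    ./nextflow log "$run_name" -f name,status,duration,workdir
}
```

For the run above, the call would be show_run_tasks "deliverables/scriptName=full.nf", which queries the tasks of romantic_wing along with their work directories.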

The merged CSV confirms that the seed and n_chains columns were added as the leftmost columns:

head -n 2 deliverables/scriptName=full.nf/output/summary.csv 
seed,n_chains,round,n_scans,n_tempered_restarts,global_barrier,global_barrier_variational,last_round_max_time,last_round_max_allocation,stepping_stone
10,10,1,2,,8.998207738433418,,0.000223134,13536.0,-1173.429270641805
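
A further check is to count how many distinct input configurations appear in the merged file. The sketch below assumes that the first two columns are seed and n_chains, as in the header above, and that they contain no quoted commas (true here, since both are numeric); count_configs is a hypothetical helper name.

```shell
# Count distinct (seed, n_chains) pairs in the first two columns, skipping the header.
count_configs() {
    tail -n +2 "$1" | cut -d, -f1,2 | sort -u | wc -l | tr -d ' '
}
```

Running count_configs deliverables/scriptName=full.nf/output/summary.csv should report 20 if every Julia process contributed at least one row to the merged CSV.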