Job cross products

Overview

Suppose you are interested in running a piece of code with many different inputs, with each execution performed on a different compute node of a cluster.

This page shows a streamlined way to do so.

Example

As a toy example, suppose we want to compute every sum a + b where a and b are integers from 1 to 3, and also every product a * b. This requires \(3 \times 3 \times 2 = 18\) calculations.

We can describe the inputs as the cross product \(\{1, 2, 3\} \times \{1, 2, 3\} \times \{+, *\}\).
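To make the cross product concrete, here is a minimal sketch in plain Python that enumerates the same 18 combinations. It is only an illustration of the idea: the helper name `cross_product` is hypothetical and is not the nf-nest `crossProduct` utility used in the Nextflow script.

```python
from itertools import product

# Same variables as in the toy example.
variables = {
    "first": [1, 2, 3],
    "second": [1, 2, 3],
    "operation": ["+", "*"],
}

def cross_product(variables):
    """Hypothetical helper: return one dict per combination of the listed values."""
    names = list(variables)
    return [dict(zip(names, values)) for values in product(*variables.values())]

configs = cross_product(variables)
print(len(configs))  # 18
print(configs[0])    # {'first': 1, 'second': 1, 'operation': '+'}
```

Each dict plays the role of one `config`: its fields are read by name, in the same way the process script reads `config.first`, `config.operation`, and `config.second`.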

Nextflow script

The script below builds this cross product and runs one Julia process for each combination.

// we use utilities in the nf-nest submodule
// in user scripts, path would be './nf-nest/cross.nf' 
include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'

def variables = [
    first: 1..3,
    second: 1..3,
    operation: ["+", "*"]
]

// specifies the order of operations
workflow {
    // look at all combinations of variables
    configs = crossProduct(variables)
    // run Julia on 18 nodes!
    run_julia(configs)

    // equivalent syntax:
    // crossProduct(variables) | run_julia
}

process run_julia {
    debug true // by default, standard out is not shown, use this to show it
    
    // information used when submitting job to queue
    time 2.min
    cpus 1 
    memory 5.GB

    input:
        val config 
    """
    ${activate()}
    # ^ this is just a shortcut for:
    #!/usr/bin/env julia --threads=1

    @show ${config.first} ${config.operation} ${config.second}
    """
}

For more information:

Running the script

Running it with the -profile cluster option will:

  • build the cross product of the values in variables
  • for each combination, automatically create a submission script
  • run these Julia processes and show their standard output.

From the command line, running the script is done as follows:

cd experiment_repo
./nextflow run nf-nest/examples/many_jobs.nf -profile cluster
N E X T F L O W  ~  version 24.10.6
Launching `nf-nest/examples/many_jobs.nf` [gigantic_brown] DSL2 - revision: aa082b1978
[aa/75d1ff] Submitted process > run_julia (1)
[1b/b98b98] Submitted process > run_julia (4)
[7b/1fb8fe] Submitted process > run_julia (2)
[66/177b02] Submitted process > run_julia (3)
[0f/82ebdf] Submitted process > run_julia (5)
[9f/5220ac] Submitted process > run_julia (6)
[a4/17aa5a] Submitted process > run_julia (7)
[07/c610a5] Submitted process > run_julia (8)
[05/9de84f] Submitted process > run_julia (9)
[3d/8925e3] Submitted process > run_julia (10)
[99/d5169a] Submitted process > run_julia (12)
[02/91bf80] Submitted process > run_julia (11)
[97/d9aebe] Submitted process > run_julia (13)
[a0/60d127] Submitted process > run_julia (14)
[7a/56fbdb] Submitted process > run_julia (15)
[42/dcbb4e] Submitted process > run_julia (16)
[bf/0eee7a] Submitted process > run_julia (17)
[07/dc672d] Submitted process > run_julia (18)
1 + 1 = 2
1 + 2 = 3
1 * 1 = 1
1 * 2 = 2
1 + 3 = 4
1 * 3 = 3
2 + 1 = 3
2 * 1 = 2
2 + 2 = 4
2 * 2 = 4
2 * 3 = 6
2 + 3 = 5
3 + 1 = 4
3 * 1 = 3
3 + 2 = 5
3 * 2 = 6
3 + 3 = 6
3 * 3 = 9
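
The jobs run in parallel, so the 18 result lines above are not printed in the order of the cross product. As a rough sanity check, the expected lines can be generated locally and compared as a set, ignoring order. This is a sketch in plain Python; the function name `expected_lines` is hypothetical and not part of nf-nest.

```python
import operator
from itertools import product

# Map each operation symbol to the function that evaluates it.
OPS = {"+": operator.add, "*": operator.mul}

def expected_lines():
    """Hypothetical helper: build the set of 'a op b = result' lines for the toy example."""
    lines = set()
    for a, b, op in product([1, 2, 3], [1, 2, 3], ["+", "*"]):
        lines.add(f"{a} {op} {b} = {OPS[op](a, b)}")
    return lines

expected = expected_lines()
print(len(expected))            # 18
print("2 * 3 = 6" in expected)  # True
```

Comparing this set with the lines captured from the run checks that every combination ran exactly once, whatever order the jobs finished in.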

Filtering

In some cases we want to run only a subset of the cross product. For example, suppose we want only the runs of the form a * a and a + a. This can be done using the filter operator in Nextflow:

// we use utilities in the nf-nest submodule
// in user scripts, path would be './nf-nest/cross.nf' 
include { crossProduct; filed; deliverables } from '../cross.nf'
include { instantiate; precompile; activate } from '../pkg.nf'

def variables = [
    first: 1..3,
    second: 1..3,
    operation: ["+", "*"]
]

// specifies the order of operations
workflow {
    configs = crossProduct(variables).filter{ config -> config.first == config.second }
    run_julia(configs)

    // equivalent pipe syntax:
    // crossProduct(variables) | filter{ config -> config.first == config.second } | run_julia
}

process run_julia {
    debug true // by default, standard out is not shown, use this to show it
    input:
        val config 
    """
    ${activate()}

    @show ${config.first} ${config.operation} ${config.second}
    """
}

Running this

cd experiment_repo
./nextflow run nf-nest/examples/filter.nf  
N E X T F L O W  ~  version 24.10.6
Launching `nf-nest/examples/filter.nf` [goofy_lattes] DSL2 - revision: d9de661ecc
[9b/d60378] Submitted process > run_julia (1)
[c0/a403ff] Submitted process > run_julia (3)
[04/795984] Submitted process > run_julia (4)
[26/d6901e] Submitted process > run_julia (2)
[d9/92f578] Submitted process > run_julia (5)
2 + 2 = 4
1 + 1 = 2
[db/52fbbf] Submitted process > run_julia (6)
2 * 2 = 4
1 * 1 = 1
3 + 3 = 6
3 * 3 = 9
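
The six results above match what the filter predicate should keep: three equal-operand pairs times two operations. A minimal Python sketch of the same selection, independent of Nextflow, is:

```python
from itertools import product

# All 18 configurations of the toy example.
configs = [
    {"first": a, "second": b, "operation": op}
    for a, b, op in product([1, 2, 3], [1, 2, 3], ["+", "*"])
]

# Keep only the configurations where both operands are equal.
diagonal = [c for c in configs if c["first"] == c["second"]]

print(len(diagonal))  # 6
```

The list comprehension condition corresponds to the closure `config -> config.first == config.second` passed to the filter in the script.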

Log scales

To create a list of parameters in log scale, use:

(0..2).collect{ i -> Math.pow(2.0, i)}

This returns the list [1.0, 2.0, 4.0], that is, 2.0^0, 2.0^1, 2.0^2.
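
For comparison, here is the same grid as a Python sketch, with a small generalization to an arbitrary base and exponent range. The helper name `log_grid` is an assumption made for illustration and does not exist in nf-nest.

```python
# Powers of two for exponents 0, 1, 2, mirroring the Groovy expression above.
log_scale = [2.0 ** i for i in range(3)]
print(log_scale)  # [1.0, 2.0, 4.0]

def log_grid(base, lo, hi):
    """Hypothetical helper: base**lo, ..., base**hi with both ends included."""
    return [float(base) ** i for i in range(lo, hi + 1)]

print(log_grid(2, -2, 2))  # [0.25, 0.5, 1.0, 2.0, 4.0]
```

A list like this can be used as the values of one variable in the cross product, so that a parameter is swept geometrically rather than linearly.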