module load git
Setting up your HPC account
Overview
You will need to go through these setup instructions before using nf-nest. This needs to be done only once per user and per cluster.
Prerequisites
- Working knowledge of unix (files, permissions, processes, environment variables).
- Working knowledge of git.
There are countless web resources to learn those things. Tedious but worth the investment.
Optional: Password-less SSH
It is imperative for your sanity to avoid entering your password and do 2FA every time a SSH connection is needed.
For example, on Sockeye, the closest you can get to that is with SSH Connection Sharing. This means that you open a first terminal window and perform 2FA in it. As long as that window is open, other SSH connections can be established without password and 2FA. Minimize that first window and do not touch it.
Optional: Install VS Code
VS Code makes it easier to develop on a remote server. For example, it simplifies file editing, resuming your session, manages github credential for you, etc.
- On your laptop, install VSCode.
- Follow two steps documented in the VS Code website
- Install the SSH extension
- Click on the Remote SSH icon in the lower left, this will let you
Connect to Remote Host
.
Use New Terminal
to open a terminal. Once you have a project (git repo) where you develop ready, use Open Folder...
to point VS Code to the root of your project. VS Code is also able to manage your github authentification.
Understanding bash
When you login to HPC, it starts a process allowing you to interact with the server in a scripting language, which is typically bash.
It is a good investment to learn the basics of bash. See the software carpentry open source tutorial.
Understanding environment variables
Any UNIX process owns a special set of global variables called its environment variables. Each variable holds a string, possibly empty.
An important convention is the PATH
environment variable holding a :
-separated list of paths where executable such as julia
or python
will be searched for.
For example, in a bash process, to display the value of an environment variable called MY_ENV_VAR
, use
echo $MY_ENV_VAR
In a Julia process:
ENV["MY_ENV_VAR"]
When a process starts a child process, the parent process decides what the child’s environment variable will be.
For example, consider what happens when you start Julia from bash. It is bash that will decide. Bash does it as follows: it uses the command export
to mark a variable as something we want to propagate to children. This is why you will see export PATH
in a start up file (next section).
Nextflow will take a different approach than bash: the environment variables are specified explicitly to ensure better reproducibility.
Understanding start up files
Bash is used in several different ways, with terminology for each kind.
In one axis: bash is called either on login or not:
login
: when you login to a machine (via SSH, or start-up a linux machine).non-login
after being logged in, bash can be called by another process (e.g. bash, VS Code).
In another axis: bash is called either:
interactively
: when a user will interact with the processnon-interactively
: when it is another program in control.
The following files will be sourced automatically in the following circumstances:
~/.bashrc
, loaded when the shell is interactive and non-login,~/.bash_profile
, loaded for login shell.
A common pattern is to use the following in ~/.bash_profile
# put that in .bash_profile so that anything in .bashrc also gets loaded
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
and then to put environment variables in ~/.bash_profile
and aliases in ~/.bashrc
.
Recall that any environment variables with export
are passed in to child process. So no need to set them in non-login shells. On the other hand, anything with alias
is not propagated to a child process.
One annoyance with this strategy is that after an update of the PATH variable, if you have several windows open, you may have to logout and log back in for the change to take effect. Moreover, even if you disconnect to VSCode, it maintains a process on the server, so you have to kill that server using
ps aux | grep .vscode-server | awk '{print $2}' | xargs kill
then log out, and then log back in.
Git
We assume the command git
is available.
In some HPC such as UBC Sockeye, this command is not available by default, instead a module has to be loaded:
Allocation code
In some HPC such as UBC Sockeye, we need an allocation code to submit to the job queue. Scripts in nf-nest use an ENV variable called ALLOCATION_CODE
to find the allocation code.
To set the variable in your current session, use:
export ALLOCATION_CODE=my-alloc-code
Optional: disable nextflow fancy output
It can be useful to avoid nextflow’s fancy progress output, for example they do not work well in notebook or in screen
.
To disable in the current session, use:
export NXF_ANSI_LOG=false
Java
Nextflow needs Java. Follow these instructions taken from the nextflow website:
First, Install SDKMAN
curl -s https://get.sdkman.io | bash
Second, open a new terminal, and install Java:
sdk install java 17.0.10-tem
Finally, confirm that Java in installed correctly:
java -version
Install Julia
At the time of writing, Julia 1.11 is causing various issues, so install Julia 1.10 using the “Linux and FreeBSD” instructions but using the URL for the Long-term support version 1.10.6.
You do not need root privileges, e.g., install in a folder at ~/bin/
and add to your path.
Julia depot
Julia has a package manager called Pkg
. The Julia depot folder is a location where Pkg
puts files it downloads and compiles. Each user should have their own. The path to the depot is controlled via the JULIA_DEPOT_PATH
environment variable.
In a HPC setup, the JULIA_DEPOT_PATH
should be accessible read/write by all nodes, and ideally be on fast storage.
For example, on Sockeye the first choice if you allocation has it would be to use burst storage, i.e. path of the form
export JULIA_DEPOT_PATH=/arc/burst/[allocation_code]/[username_in_alloc]/depot
A second choice would be to use the scratch space
export JULIA_DEPOT_PATH=/scratch/[allocation_code]/[username_in_alloc]/depot
Optional: useful aliases
Bash supports shortcuts called alias for commonly used commands. We show two use cases here.
A very common patter in Julia is to activate the environment in the current directory and load Revise.jl. To do so, create a Julia script in your home directory called julia-start.jl
and put that in it:
println("Active project: $(Base.active_project())")
println("Loading Revise...")
using Revise
Then add the following to .bashrc
:
alias j='julia --banner=no --load /home/[your_alloc]/julia-start.jl --project=@. '
You will need to install Revise.jl in the global environment:
julia
]
activate
add Revise
Second, most HPC has some ways to start an interactive allocation. Create an alias to be able to do it quicly. Here is an example for interactive CPU and GPU nodes respectively on Sockeye:
alias interact='salloc --time=3:00:00 --mem=16G --nodes=1 --ntasks=2 --account=[your_alloc]'
alias ginteract='salloc --account=[your_alloc-gpu] --partition=interactive_gpu --time=3:00:00 -N 1 -n 2 --mem=16G --gpus=1'