
ARCC HPC Quickstart

This Quickstart introduces ARCC's current flagship HPC resource. It only briefly touches on how to log in, navigate, load software, and run compute jobs. There is far more to learn if you have never used an HPC cluster before, but this quickstart covers the essentials of using ARCC's system. It is useful both for brand-new users and for experienced researchers who know HPC but are using ARCC's system for the first time.

Disclaimer

The video below is several years old, but does contain some useful material. Please see the written guide below for up-to-date information.


Assumptions

  1. This quickstart assumes that a PI has already requested a project, the project has been created, and user accounts have been added to it.
  2. It also assumes that Duo Mobile has been configured on the phone of anyone trying to log in.
  3. ARCC's current flagship HPC system is called MedicineBow. This quickstart only covers MedicineBow; it may be helpful for other systems, but be aware that there may be differences.

Step 1: Login to the Web Portal

ARCC uses Open OnDemand to provide the gateway to MedicineBow. Please use a web browser and navigate to https://medicinebow.arcc.uwyo.edu.

  • If you have a uwyo.edu account, you can enter your UWyo username and password then click 'sign in'.

    Invalid username or password

    If you get this error, the problem may not actually be your username or password. If you do not have Duo configured to automatically send a push notification, then this is a second-factor authentication timeout error.

    To force a push notification, please enter your password with this format:

    password,push
    

    That is, a comma after your password, then the word "push", with no spaces in between.

  • If you have an ARCC-Only account, click the button that says "ARCC-Only/Wildiris" to get to your sign-in page and follow the setup instructions.


Step 2: Navigate the Open OnDemand Dashboard

There are a lot of options here (interactive desktops, Jupyter notebooks, file management, etc.), but this quickstart focuses only on shell access because it covers everything new users need to get started.

Open OnDemand Dashboard

Click on the MedicineBow Shell Access button to get to a terminal in your web browser.

Web Terminal

Prefer to use a local terminal?

If you prefer to use your own terminal to interact with the system, you will need to go to the SSH Key Manager to generate and download your keys for SSH access to the system.

SSH keys on MedicineBow expire annually. You will have to repeat this process one year after you first download them.


Step 3: Navigate Linux Filesystem Directories

Upon login you'll notice a few things.

  1. The message of the day. This is where we share important information on upcoming maintenance windows, how to get help, etc.
  2. The directories your user has access to.

    Path                   Purpose                                          Quota
    /home/username         Configuration files                              50 GB
    /project/projectname   Sharing data/software with your project members  5 TB
    /gscratch/username     Working directory for storage during analysis    5 TB
  3. Finally at the bottom, you will see your username, an '@' symbol, either mblog1 or mblog2, a tilde, and a dollar sign.

    [username@mblog1 ~]$
    

    This means you are logged into a login node and are currently in your /home/username directory as a non-root user.

    sudo or root

    HPC users will not be able to use sudo or root, and never will.

    See the ARCC policies

You will have to use Linux commands to navigate the HPC filesystem, run commands, edit files, etc. If you are unfamiliar with Linux, please see our introductory tutorial, Intro to Linux.
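As a quick sketch, here are a few common commands for moving between the directories listed above (the projectname and username portions of these paths are placeholders for your own):

```shell
pwd                          # show the directory you are currently in
ls -lh                       # list files here with human-readable sizes
cd /project/projectname      # move to your project's shared space
cd /gscratch/username        # move to your scratch working directory
cd ~                         # return to your home directory
```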


Step 4: Transfer data

You may have research data stored on your laptop/desktop, or perhaps on a different server, that you need to use in your research. Transferring this data to HPC can be done in multiple ways:

If the data files are under 5 GB in size, the Open OnDemand 'Files' app may be an option for you.

OnDemand Files

Danger

Do not attempt this if a file is larger than 5 GB. It could cause problems.

Globus File Manager with Globus Connect Personal is ARCC's recommended data transfer method.

Globus File Manager

How to Use Globus

If you have never used Globus before, please see the Intro to Globus Introductory Tutorial to get started.

Command line tools: ARCC prefers rclone, but SFTP, rsync, etc. are all viable options.
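For example, here is a sketch of a command line transfer with rsync from your own machine. The hostname and remote path below are assumptions for illustration; check ARCC's documentation for the actual SSH endpoint, and remember that SSH access requires the keys from Step 2.

```shell
# Copy a local folder into your scratch directory on the cluster.
# The hostname and remote path are illustrative placeholders.
rsync -avP ./mydata/ username@medicinebow.arcc.uwyo.edu:/gscratch/username/mydata/
```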


Step 5: Discover and Install Software

Whether you primarily use Python, MATLAB, R, and their various libraries, or some other software in your research, there are a couple of different options:

  • Use what is available on the system by loading modules.
  • Install it yourself if possible.

ARCC maintains a minimal software stack that is available via modules (1). To check whether the software you want to use is available, you can use the command 'module spider'.

  1. 🙋‍♂️ ARCC provides a small set of core compilers, languages, libraries, and applications, which will be updated on a semi-regular basis. When updating, we try to use the latest versions and typically will not support older versions. The GNU family of compilers (i.e. gcc) is our primary focus, but we also provide Intel's oneAPI suite of compilers/libraries as well as NVIDIA's hpc-sdk toolset. Please refer to the Software List for what is available and/or use the module spider command to search. ARCC will consider adding to this list on a case-by-case basis.
module spider softwarename

This searches the module system for what is installed and what else needs to be loaded for it to work. For example, searching for rclone:

[username@mblog2 ~]$ module spider rclone
----------------------------------------------------------------------------
rclone: rclone/1.63.1
----------------------------------------------------------------------------
    You will need to load all module(s) on any one of the lines below before the "rclone/1.63.1" module is available to load.
    arcc/1.0  gcc/13.2.0
    arcc/1.0  gcc/14.2.0

    Help:
    Rclone is a command line program to sync files and directories to and
    from various cloud storage providers

Then, in order to use this software, you must load its dependencies and then the module itself. For the example above, we need a version of both arcc and gcc:

module load arcc/1.0 gcc/14.2.0
module load rclone/1.63.1

Then we can run the rclone commands we need.
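You can confirm what is currently loaded with 'module list', and unload everything with 'module purge' when you want a clean slate:

```shell
module list      # show currently loaded modules (here: arcc, gcc, rclone)
module purge     # unload all modules and start fresh
```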

Installing software yourself (1) on the cluster is often the best way to ensure you get the version you want, and it helps you become more independent by not waiting for ARCC to install something for you.

  1. 🙋‍♂️ ARCC's software policy is written with the following goals: 1) helping our users become better HPC users; 2) reflecting systems and processes similar to other HPC centers; and 3) assisting, via consultation and training, in developing transferable skills and good practices, making it easier to transition to other HPC centers.

There are various ways to install software yourself on the cluster, but in general we recommend three options:

  • Downloading binaries
  • Software environments e.g. conda
  • Apptainer containers

Which one to use, and when, varies widely, so we don't provide examples here, but other tutorials may cover these topics.

Where to install software

Since the quota in your home directory is small, ARCC provides an additional 250 GB software directory within each project directory as the recommended place to install software. Navigate to /project/projectname/software to see for yourself.

This provides the additional benefit that all project members can use what one member installs! No need for every person to install the same software over and over again.
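As one illustrative sketch, you could install a conda environment into the project software directory so every project member can use it. The module name below is an assumption for illustration; use 'module spider' to find the real one on MedicineBow.

```shell
# Hypothetical module name; check `module spider conda` first.
module load miniconda3

# Create the environment under the shared project software directory.
conda create --prefix /project/projectname/software/envs/myenv python=3.12

# Any project member can then activate it by its full path.
conda activate /project/projectname/software/envs/myenv
```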


Step 6: Running Compute Jobs

Running Code on Login Nodes

When you log in, you may think you are ready to run your code, but running code on the login nodes is forbidden by ARCC policy. HPC users must use the job scheduler to allocate compute resources for this work.

ARCC uses the Slurm Workload Manager on HPC systems. Slurm uses directives to allocate compute resources, for example, CPUs, GPUs, and/or memory. But Slurm directives can also allocate time, send emails, and much more.

Required Slurm Directive - Account

The Slurm directive 'account' is the only one that ARCC currently requires for any job to run. The account tells Slurm where to direct its accounting.

On ARCC HPC systems, the account = your project name.

For example:

--account=projectname

With Slurm, there are two ways to run jobs:

  • Interactive jobs
  • Batch jobs

Interactive jobs are useful for short running jobs, debugging, or code development. As such, ARCC policy limits interactive jobs to run no longer than 8 hours. Any of the interactive apps found on the Open OnDemand dashboard run interactive jobs, but interactive jobs can also be launched by the command line using 'salloc'.

For Example:

[username@mblog1 ~]$ salloc --account=projectname
salloc: Granted job allocation 48508763
salloc: Waiting for resource configuration
salloc: Nodes t468 are ready for job
[username@t468 ~]$ 
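salloc accepts the same resource directives as batch jobs. For example, here is a sketch requesting a few CPUs and some memory for a two-hour interactive session (the values are illustrative):

```shell
salloc --account=projectname --cpus-per-task=4 --mem=16G --time=02:00:00
# ...work on the compute node...
exit    # leave the compute node and release the allocation
```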

Batch jobs can run much longer processes in the background. There are several different queues that batch jobs can fall into, depending on your time and priority requirements. Please see the ARCC HPC job policies to learn what they are.

In order to run batch jobs, you will need to use your text editor of choice to create a bash script with all of your Slurm directives that you want to use to submit your job.

Here is an example of a simple Python batch job submission script that will run for up to 1 day and 1 hour and asks for 8 GB of memory.

#!/bin/bash
#SBATCH --account=projectname
#SBATCH --time=1-01:00:00
#SBATCH --mem=8G
#SBATCH --job-name=sequential_run
#SBATCH --mail-type=ALL
#SBATCH --mail-user=cowboyjoe@uwyo.edu

module load python/3.12.0

python my_job_sequential_steps.py

After making all the needed edits to this job submission script, you can then use Slurm's 'sbatch' command to send it to the job scheduler.

Example:

[username@mblog1 ~]$ sbatch jobscript.sh
Submitted batch job 48511808
[username@mblog1 ~]$
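Once a job is submitted, standard Slurm commands let you monitor or cancel it; the job ID below is the one echoed by sbatch in the example above:

```shell
squeue --me                   # list your pending and running jobs
scontrol show job 48511808    # detailed information for one job
scancel 48511808              # cancel the job if needed
```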

Things to consider when running jobs
  1. You only get the hardware you ask for. E.g., if you don't ask for multiple CPUs, you won't get them.
  2. Some Slurm directives conflict with each other, e.g. time and qos (Quality of Service). Make certain you know which ones you want to use.
  3. Jobs are queued. There may be times when your job won't run as quickly as you expect; there are multiple factors that can leave a job pending.

Success

This quickstart covered:

  1. How to login
  2. How to navigate the Open OnDemand Dashboard
  3. What directories you have available
  4. How to transfer data
  5. How to look for or install software
  6. How to use the job scheduler