Low priority jobs

  1. Submitting low priority GPU jobs
  2. Resource limits
  3. Notes for SAFE project managers

Users who have exhausted their resource budget for the current allocation period may be able to run jobs on a low priority basis if the Sulis hardware is not fully utilised and servers are available. Currently this is enabled only for GPU jobs. Information on submitting low priority GPU jobs, which do not consume GPU resource budget, is given below.

Submitting low priority GPU jobs

In order to run low priority GPU jobs you must first be given permission by whoever manages your project via SAFE. See the support page for information on who to contact to have this enabled on your user account.

Jobs should be submitted to the gpulowpri partition rather than the gpu partition. In addition, they should specify a special budget code rather than their own GPU budget code. Each HPC Midlands+ site project and GPU-using EPSRC Access to HPC project should have such a budget code, named suxxx-gpulowpri, where xxx is your SAFE project number. An example job submission script is given below.

gpu.slurm

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=42
#SBATCH --mem-per-cpu=3850
#SBATCH --gres=gpu:ampere_a100:1
#SBATCH --partition=gpulowpri       # low priority partition rather than gpu
#SBATCH --time=08:00:00             # must fit within the 24 hour limit (see below)
#SBATCH --account=suxxx-gpulowpri   # low priority budget code, not your GPU budget

module purge
module load GCC/13.2.0 CUDA/12.4.0

srun ./a.out
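
The script can then be submitted in the usual way:

sbatch gpu.slurm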

Other examples from the GPU jobs page can be modified accordingly to run low priority GPU jobs.
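
In each case only the partition and budget code directives need changing, as below (the suxxx placeholder stands for your own SAFE project number):

#SBATCH --partition=gpulowpri       # replaces --partition=gpu
#SBATCH --account=suxxx-gpulowpri   # replaces your usual GPU budget code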

It is also possible to submit interactive jobs to the gpulowpri partition in order to work in (for example) Jupyter notebooks without consuming any GPU resource budget.
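
As an illustrative sketch only (the resource request mirrors the single-GPU script above and should be adjusted to your needs), an interactive session could be requested with salloc:

salloc --account=suxxx-gpulowpri --partition=gpulowpri \
       --nodes=1 --ntasks-per-node=1 --cpus-per-task=42 \
       --mem-per-cpu=3850 --gres=gpu:ampere_a100:1 --time=02:00:00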

Resource limits

In order to limit the time which higher priority users (those with remaining budget) spend queueing behind lower priority jobs, there are some restrictions on use of the low priority GPU partition compared to the standard GPU partition.

  • The walltime limit is shorter (24 hours rather than 48)
  • The maximum number of queued and running jobs per user is smaller (50 and 10 respectively).

Other limits are the same as for the standard gpu partition documented on the resource limits page.
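
One way to check how close you are to the per-user job limits, using only standard squeue options (nothing Sulis-specific), is to count your jobs in the partition by state:

squeue -u $USER -p gpulowpri -t pending -h | wc -l   # queued jobs (limit 50)
squeue -u $USER -p gpulowpri -t running -h | wc -l   # running jobs (limit 10)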

Notes for SAFE project managers

Low priority GPU jobs consume a special class of resource named SulisLowPri within SAFE. Each top-level Sulis project that needs access to GPUs should have a SAFE project group named suxxx-gpulowpri where xxx is the SAFE Sulis project number. This contains an essentially infinite amount of SulisLowPri resource in an allocation that runs until the anticipated end date of the Sulis service. Users should be able to run jobs against this budget indefinitely without exhausting it.

Project managers should be aware of two details.

  1. Users will need to be added as members of the appropriate suxxx-gpulowpri project group in SAFE by their project manager in order to use the gpulowpri partition. This is left to the discretion of project managers and PIs who may wish to restrict access at the project/institutional level.

  2. As with standard SulisGPU resource and the gpu SLURM partition, the suxxx-gpulowpri project group must contain a positive quantity of SulisCPU resource. This will not be consumed by jobs running in the gpulowpri partition, but must nonetheless be present in order for jobs to start. Jobs accidentally submitted to the compute partition against a suxxx-gpulowpri budget will consume this CPU resource and should be avoided; a quick check is sketched below.
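
As an illustrative check using standard squeue options (again, suxxx is your SAFE project number), any jobs charging the low priority budget in the compute partition can be listed with:

squeue --account=suxxx-gpulowpri --partition=compute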