High memory jobs

  1. Requesting increased memory with SLURM
  2. High memory nodes
  3. Very high memory nodes

Many scientific workloads benefit from access to servers with large amounts of RAM, and Sulis is equipped to support these. However, we frequently encounter cases where a large RAM requirement is the result of poorly written code or a misunderstanding of the inputs to software. Please do not be offended if we ask to sanity check such things.

Requesting increased memory with SLURM

Limits on memory are usually derived from the amount of physically available memory within the node, sometimes with an adjustment to ensure there is sufficient memory for non-job processes, such as file system cache. For example, the compute partition servers have a total of 512 GB per node and 128 cores. After the operating system and other RAM overheads are considered, this leaves 3850 MB per core available to SLURM jobs.
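
The per-node core count and memory that SLURM actually makes available can be checked with sinfo. A minimal example (the %c and %m fields report the CPUs and the memory per node in MB; the compute partition name is taken from the example above):

[user@login01(sulis) ~]$ sinfo --partition=compute --Node --format="%N %c %m"

The memory reported here is the amount configured for SLURM jobs, which is somewhat less than the physical 512 GB per node once operating system overheads are set aside.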

SLURM enforces memory limits such that if a job attempts to use more memory than it requested, it will be killed. This typically results in an out-of-memory error message in the job output, similar to:

   slurmstepd: error: Detected 1 oom-kill event(s) in step <job_id>.batch cgroup.
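
To see how much memory a job actually used, SLURM's accounting records can be queried once the job has finished. A minimal example (replace <job_id> with the ID of your job):

[user@login01(sulis) ~]$ sacct -j <job_id> --format=JobID,ReqMem,MaxRSS,State,Elapsed

Comparing MaxRSS (the peak resident memory used by any task in the job) with ReqMem shows whether a larger memory request is genuinely needed.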

Jobs will sometimes require more than the default amount of memory per core. In this case one approach is to request more cores and simply leave them idle, thereby providing increased effective memory per core. This is done using SLURM’s --cpus-per-task resource directive. For example, to request 7700 MB for a serial job on the compute partition:

extra_ram.slurm

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=3850
#SBATCH --time=08:00:00
#SBATCH --account=suxxx-somebudget

module purge
module load GCC/13.2.0

./a.out

In this example, the job is requesting 2 (--cpus-per-task) * 3850 (--mem-per-cpu) = 7700 MB. Although two cores are assigned to the job, one of them will be left idle provided our executable (a.out) runs purely sequential code.
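
More generally, the number of cores to request is the memory requirement divided by 3850 MB, rounded up. As a quick sketch using shell arithmetic for a hypothetical requirement of 30000 MB:

[user@login01(sulis) ~]$ echo $(( (30000 + 3849) / 3850 ))
8

Requesting --cpus-per-task=8 with --mem-per-cpu=3850 would then provide 8 * 3850 = 30800 MB.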

The above method is preferred to requesting 1 CPU with higher mem-per-cpu, as it ensures any CPUs on the server not being used by your jobs have the expected 3850 MB per core available to other users. This avoids confusion that arises from servers that appear to be only partly utilised while jobs are queued.

CPU resource budgets are charged for the number of CPUs requested and for the duration of the job. In the example above the budget suxxx-somebudget will be charged 16 CPUh (assuming it runs for the full 8 hours requested) and not 8.

Users should interpret 1 CPUh as the cost of accessing 1 core and its associated RAM.

In some cases it may be possible to make some use of the otherwise idle CPU by enabling threading in your software.
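
For example, if a.out were built with OpenMP support (an assumption for illustration), the job script above could set the thread count to match the cores allocated by --cpus-per-task:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./a.out

SLURM sets SLURM_CPUS_PER_TASK in the job environment when --cpus-per-task is specified, so the thread count follows the resource request automatically.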

High memory nodes

For jobs that need larger amounts of memory, we have 4 servers available to all users via the hmem partition. These have approximately 1 TB (terabyte) of RAM per server, i.e. twice that of the standard compute nodes. See the resource limits page for details.

They can be accessed by requesting the hmem partition explicitly in your SLURM job script, for example:

hmem_job.slurm

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=7700
#SBATCH --partition=hmem
#SBATCH --time=08:00:00
#SBATCH --account=suxxx-somebudget

module purge
module load GCC/13.2.0

./a.out

where more memory can be requested by increasing the number of CPUs, up to approximately 1 TB.

It may be desirable to use the high memory nodes for interactive jobs, in which case add the hmem partition to your resource request, e.g.

[user@login01(sulis) ~]$ salloc --account=suxxx-somebudget --partition=hmem -N 1 -n 64 --mem-per-cpu=7700 --time=8:00:00

requests half the CPUs and half the memory (approximately 512 GB) on one of the high memory nodes for an interactive job.
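
Once the allocation is granted, programs can be launched on the allocated node from within the salloc session using srun, e.g.

[user@login01(sulis) ~]$ srun ./a.out

This is generic SLURM behaviour rather than anything specific to the hmem partition; the same pattern applies to the very high memory nodes below.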

There are only 4 nodes in the hmem partition and they are often in high demand. Please do not use these nodes for workloads that could be executed on the standard compute nodes in the compute partition. This will be policed!
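
Current demand for these nodes can be checked before submitting using standard SLURM queries, e.g.

[user@login01(sulis) ~]$ sinfo --partition=hmem
[user@login01(sulis) ~]$ squeue --partition=hmem

sinfo reports whether the nodes are allocated, mixed or idle, and squeue lists the jobs currently running on or queued for the partition.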

Very high memory nodes

For jobs that need extreme amounts of memory (e.g. metagenomics sequence reconstruction), we have 3 servers available to all users via the vhmem partition. These have approximately 4 TB (terabytes) of RAM per server, i.e. eight times that of the standard compute nodes. See the resource limits page for details.

They can be accessed by requesting the vhmem partition explicitly in your SLURM job script, for example:

vhmem_job.slurm

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=64000
#SBATCH --partition=vhmem
#SBATCH --time=08:00:00
#SBATCH --account=suxxx-somebudget

module purge
module load GCC/13.2.0

./a.out

where more memory can be requested by increasing the number of CPUs, up to approximately 4 TB.

It may be desirable to use the very high memory nodes for interactive jobs, in which case add the vhmem partition to your resource request, e.g.

[user@login01(sulis) ~]$ salloc --account=suxxx-somebudget --partition=vhmem -N 1 -n 32 --mem-per-cpu=64000 --time=8:00:00

requests half the CPUs and half the memory (2 TB) on one of the very high memory nodes for an interactive job.

There are only 3 nodes in the vhmem partition and they are often in high demand. Please do not use these nodes for workloads that could be executed on the standard compute nodes in the compute partition. This will be policed!