PyTorch on Sulis

  1. PIP-PyTorch module (PyTorch 2.0 and later)
  2. Legacy (pre-2.0 PyTorch modules)

These notes are a brief guide to using PyTorch on Sulis, with emphasis on using the GPU hardware. They may be updated occasionally as newer software is deployed.

PIP-PyTorch module (PyTorch 2.0 and later)

The PIP-PyTorch modules provide an environment that includes a PyTorch build installed using pip, following the instructions at pytorch.org. This is the recommended way of using PyTorch 2.0 and later for GPU-enabled computation on Sulis.

We no longer build GPU-enabled PyTorch 2 from source. The versions distributed via pytorch.org are already built to target all relevant GPU hardware, so performance should be no worse than that of a custom build.

To query the versions available via this mechanism:

[user@login01(sulis) ~]$ module spider PIP-PyTorch

This will provide a list of available versions. Those with a CUDA version suffix support GPU-accelerated computation. Querying one of these specifically shows the modules that must be loaded first:

[user@login01(sulis) ~]$ module spider PIP-PyTorch/2.4.0-CUDA-12.4.0

-----------------------------------------------------------------------------
   PIP-PyTorch: PIP-PyTorch/2.4.0-CUDA-12.4.0
-----------------------------------------------------------------------------
    Description:
      Tensors and Dynamic neural networks in Python with strong GPU 
      acceleration. PyTorch is a deep learning framework that puts Python first.
      
    You will need to load all module(s) on any one of the lines below before
    the "PIP-PyTorch/2.4.0-CUDA-12.4.0" module is available to load.

      GCC/13.2.0 OpenMPI/4.1.6

These modules also provide torchvision, torchaudio, etc.

Be aware that attempting to test a GPU-enabled PyTorch script on a login node will fail, because the login nodes have no GPUs. Use an interactive session on a GPU node instead.
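A quick way to confirm that a session really is on a GPU node is to ask PyTorch what it can see. The following is a minimal sketch rather than an official Sulis diagnostic: the helper name gpu_report is made up for illustration, and it relies only on torch.__version__, torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count() and torch.cuda.get_device_name(). It assumes a PIP-PyTorch module has already been loaded, and reports that no CUDA device is visible instead of raising an error when none is found.

```python
import importlib.util


def gpu_report():
    """Return a list of human-readable lines describing what PyTorch can see.

    gpu_report is a hypothetical helper name, not part of PyTorch or Sulis.
    """
    # If no PyTorch module is loaded, say so instead of raising ImportError.
    if importlib.util.find_spec("torch") is None:
        return ["torch is not importable: load a PyTorch module first"]

    import torch

    lines = [
        "torch version: {}".format(torch.__version__),
        # torch.version.cuda is None for CPU-only builds.
        "built against CUDA: {}".format(torch.version.cuda),
    ]
    if torch.cuda.is_available():
        for index in range(torch.cuda.device_count()):
            lines.append("GPU {}: {}".format(index, torch.cuda.get_device_name(index)))
    else:
        # This is the expected branch on a node without GPUs.
        lines.append("no CUDA device visible to PyTorch")
    return lines


if __name__ == "__main__":
    print("\n".join(gpu_report()))
```

Running this first in an interactive session on a GPU node, before launching a longer job, should show at least one GPU line if the environment is set up as intended.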

Legacy (pre-2.0 PyTorch modules)

To search for an older version of PyTorch:

[user@login01(sulis) ~]$ module spider PyTorch

This will list PyTorch builds that can be added to your environment, for example PyTorch/1.9.0. Querying this version specifically will provide information on prerequisite modules that must first be loaded:

[user@login01(sulis) ~]$ module spider PyTorch/1.9.0

-----------------------------------------------------------------------------
  PyTorch: PyTorch/1.9.0
-----------------------------------------------------------------------------
    Description:
      Tensors and Dynamic neural networks in Python with strong GPU 
      acceleration. PyTorch is a deep learning framework that puts Python 
      first.

    You will need to load all module(s) on any one of the lines below before
    the "PyTorch/1.9.0" module is available to load.

      GCC/10.2.0  CUDA/11.1.1  OpenMPI/4.0.5
      GCC/10.2.0  OpenMPI/4.0.5

In this case there are two sets of possible prerequisite modules. The first includes CUDA and should be used for running PyTorch on the Sulis GPU nodes. The second is for use on the standard compute nodes.
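Because the same PyTorch version can be loaded with or without CUDA, it can be convenient to write scripts that select the device at run time. The sketch below is illustrative only: choose_device_name and run_small_matmul are hypothetical helper names, and it assumes that torch.cuda.is_available() returns False when PyTorch is loaded with the CPU-only prerequisites, which has not been verified for these particular builds.

```python
import importlib.util


def choose_device_name(cuda_available):
    """Map PyTorch's CUDA availability flag to a device string."""
    return "cuda" if cuda_available else "cpu"


def run_small_matmul():
    """Multiply two small random matrices on whichever device is available.

    Returns the device name used, or None if torch cannot be imported.
    Both function names here are hypothetical, chosen for this example.
    """
    if importlib.util.find_spec("torch") is None:
        return None

    import torch

    name = choose_device_name(torch.cuda.is_available())
    device = torch.device(name)

    # Create the inputs directly on the chosen device.
    a = torch.rand(4, 4, device=device)
    b = torch.rand(4, 4, device=device)
    product = a @ b  # runs on the GPU only when one is visible

    assert product.shape == (4, 4)
    return name


if __name__ == "__main__":
    print("ran on:", run_small_matmul())
```

Keeping the device choice in one place means the rest of a script does not need to change between the GPU and standard compute nodes.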

To add PyTorch to your environment for GPU computation, load the first set of prerequisite modules followed by PyTorch itself:

[user@login01(sulis) ~]$ module purge
[user@login01(sulis) ~]$ module load GCC/10.2.0 CUDA/11.1.1  OpenMPI/4.0.5
[user@login01(sulis) ~]$ module load PyTorch/1.9.0

Loading a PyTorch/1.9.0 module in this way also adds further prerequisites to your terminal and Python environment. For example, in this case the Python 3.8.6 module is loaded, along with a SciPy-bundle module that provides NumPy 1.19.4, SciPy 1.5.4 and pandas 1.1.4. There is no need to install these via pip; all dependencies of PyTorch itself are provided.
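To check which versions the loaded modules actually provide, the package metadata can be queried from Python. This is a minimal sketch using the standard-library importlib.metadata (available from Python 3.8); installed_versions is a hypothetical helper name, and it assumes the module-provided packages ship standard metadata that Python can find on its path.

```python
import sys
from importlib import metadata  # standard library from Python 3.8


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}.

    installed_versions is a hypothetical helper name used for illustration.
    """
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not visible in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    found = installed_versions(["torch", "numpy", "scipy", "pandas"])
    for name, version in found.items():
        print(name, version if version is not None else "not found")
```

If a package shows as "not found" after loading the modules, it is worth checking the output of module list before installing anything with pip.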

NOTE: Attempting to use PyTorch loaded in this way will fail unless running on a GPU-enabled node, either in an interactive session or in a SLURM job script.