Installing PyTorch with MPI support on ABCI

To get MPI backend for torch distributed working you need to recompile PyTorch.

On ABCI to get this working, you need to load these modules (some of them might be not needed, I just grabbed a modules.sh file):

module load gcc/9.3.0
module load cuda/11.2/11.2.2
module load cudnn/8.1/8.1.1
module load nccl/2.8/2.8.4-1
module load openmpi/4.0.5
module load python/3.8/3.8.7
module load cmake/3.19

After this we just need to clone the PyTorch repo:

git clone git@github.com:pytorch/pytorch.git

and build it:

python3 setup.py develop --user

This overwrites your current PyTorch installation, and you need to use --upgrade --forece-reinstall with pip3 to install the original one.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Flattening loops of combinations
  • Continuous benchmarking on supercomputers
  • Polyhedral compilation: part 1
  • How to set up a website like this part 2
  • How to set up a website like this