Installing PyTorch with MPI support on ABCI
To get MPI backend for torch distributed
working you need to recompile PyTorch.
On ABCI to get this working, you need to load these modules (some of them might be not needed, I just grabbed a modules.sh
file):
module load gcc/9.3.0
module load cuda/11.2/11.2.2
module load cudnn/8.1/8.1.1
module load nccl/2.8/2.8.4-1
module load openmpi/4.0.5
module load python/3.8/3.8.7
module load cmake/3.19
After this we just need to clone the PyTorch repo:
git clone git@github.com:pytorch/pytorch.git
and build it:
python3 setup.py develop --user
This overwrites your current PyTorch installation, and you need to use --upgrade --forece-reinstall
with pip3
to install the original one.
Enjoy Reading This Article?
Here are some more articles you might like to read next: