I am having problems installing PyLith from source because the dependency builds keep failing during installation.
For now, I would be happy just being able to run the PyLith binary with multiple processes. Is there a way to configure the pylithapp file to run on multiple processes using the SLURM workload manager instead of PBS, as shown in the PyLith manual?
The Pyre package which PyLith uses for job submission does not currently have support for the SLURM scheduler. The workaround is to use the LSF or PBS scheduler with the --scheduler.dry command line option. This will print the bash script used to submit a job to stdout. You can edit and save the script and then run it to submit a job to the SLURM scheduler.
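For example, a minimal sketch of that workflow (step01.cfg is a placeholder for your own parameter file, and your pylithapp settings must already select the PBS or LSF scheduler):

```bash
# Print the generated submission script instead of submitting the job;
# --scheduler.dry writes it to stdout, so capture it in a file.
pylith step01.cfg --scheduler.dry > submit.sh

# Edit submit.sh to replace the PBS/LSF directives with SLURM ones,
# then submit it to the SLURM scheduler.
sbatch submit.sh
```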
IMPORTANT: If you are using the PyLith binary, you can only run on a single node of a cluster. If you submit a job running on multiple nodes, each node will just be running a duplicate version of the job.
We can provide help with building PyLith from source on a cluster, but we need you to provide detailed information about the packages the cluster system administrator has installed and that you intend to use, including version information for the compilers, MPI, etc. It is important that you use the MPI installed by the system administrator so that you can make use of the cluster's specific interconnect hardware. If an installation using the PyLith installer utility fails, we need the full configure and make logs to diagnose the problem: the configure/make logs for the installer itself as well as those for the underlying builds that failed.
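As a first step, it is worth confirming which MPI the cluster provides before configuring the build. A sketch, assuming an environment-modules setup (the module name is a placeholder; your cluster's will differ):

```bash
# Load the cluster's MPI module (the name is site-specific).
module load openmpi

# Verify that the compiler wrappers and the launcher come from the
# system-installed MPI rather than some other installation.
which mpicc mpicxx mpif90
mpirun --version
```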
But I don't know how to edit this in order to submit the job on the cluster. Can you please direct me on what to do from here in order to submit the job… Thanks.
Also, regarding this command line:
command = mpirun -np ${nodes} -machinefile ${PBS_NODEFILE}
How are the values of nodes and PBS_NODEFILE passed to the command? Is it automatic, or do I have to put values in manually? Thanks
When you run with --scheduler.dry, you should see the variable ${nodes} replaced by the number of compute nodes. ${PBS_NODEFILE} is an environment variable that PBS sets automatically when the job runs under the PBS scheduler, so you do not need to supply a value yourself.
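For illustration only (the exact command PyLith appends to the template will differ), a job configured for 2 compute nodes would produce a launch line in the generated script along these lines:

```bash
mpirun -np 2 -machinefile $PBS_NODEFILE pylith step01.cfg
```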
If the PBS scheduler works, then just remove --scheduler.dry and the job should be submitted. If you need to make changes to the bash script to get it to work, capture the generated script to a file, edit it as necessary, and then submit it manually.
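For SLURM, the edited script might end up looking something like this sketch (all directive values are assumptions to adapt to your cluster, and whether to launch with srun or mpirun depends on how your MPI was built):

```bash
#!/bin/bash
#SBATCH --nodes=2              # number of compute nodes (assumed value)
#SBATCH --ntasks-per-node=16   # MPI ranks per node (assumed value)
#SBATCH --time=01:00:00        # wall-clock limit (assumed value)

# SLURM has no $PBS_NODEFILE; srun takes the node list from the job
# allocation, so no -machinefile option is needed.
srun pylith step01.cfg
```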
I have a question about this problem. You state that Pyre does not work with SLURM and that you need to use the output of the --scheduler.dry command to work around this. I did that, and right now I can run PyLith on 1 node with a varying number of cores, but as soon as I try to increase the number of nodes I get a GMRES error.
Does the important warning you give in this answer also apply when using the workaround on SLURM-based systems? That is, can I use multiple cores for my simulations, but never more than 1 node?
It looks like you are trying to use the Linux binary on a cluster. It will not work across multiple compute nodes of a cluster, because it uses an MPI implementation intended for use on a single machine. You will need to work with your cluster system administrator to build PyLith from source using the installer. See Installation – PyLith 3.0.3 documentation and Linux clusters – PyLith Installer v3.0.3-0 documentation.