Is it possible to run the PyLith binary on a SLURM cluster with multiple processes?

I am having problems installing PyLith from source because the dependencies keep failing during the installation.

For now, I would be happy just to run the PyLith binary with multiple processes. Is there a way to configure the pylithapp file to run on multiple processes using the SLURM workload manager, instead of PBS as shown in the PyLith manual?


The Pyre package which PyLith uses for job submission does not currently have support for the SLURM scheduler. The workaround is to use the LSF or PBS scheduler with the --scheduler.dry command line option. This will print the bash script used to submit a job to stdout. You can edit and save the script and then run it to submit a job to the SLURM scheduler.
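As a rough illustration of the editing step, the sketch below rewrites a PBS header of the kind the dry run prints into SLURM directives. The function name pbs_to_slurm is hypothetical and not part of PyLith or Pyre. It assumes the header uses only the -S, -N, -o, -e, and -l nodes=N:ppn=M forms, and it drops the -V/mail line, so any mail settings would have to be added by hand (for example with --mail-type and --mail-user).

```shell
# Hypothetical helper (not part of PyLith/Pyre): translate a few common
# PBS header lines into their SLURM equivalents. Reads stdin, writes stdout.
pbs_to_slurm() {
  sed -E \
    -e 's/^#PBS -S (.*)$/#!\1/' \
    -e 's/^#PBS -N (.*)$/#SBATCH --job-name=\1/' \
    -e 's/^#PBS -o (.*)$/#SBATCH --output=\1/' \
    -e 's/^#PBS -e (.*)$/#SBATCH --error=\1/' \
    -e 's/^#PBS -l nodes=([0-9]+):ppn=([0-9]+)$/#SBATCH --nodes=\1 --ntasks-per-node=\2/' \
    -e '/^#PBS -V/d'
}

# Example: convert a saved PBS script, then submit it with sbatch.
# pbs_to_slurm < job_pbs.sh > job_slurm.sh
# sbatch job_slurm.sh
```

The shell line is turned into a shebang because sbatch expects the script to start with one. The launch command in the script would still need to be adapted by hand, since ${PBS_NODEFILE} is set by PBS and is not available under SLURM.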

IMPORTANT: If you are using the PyLith binary, you can only run on a single node of a cluster. If you submit a job running on multiple nodes, each node will just be running a duplicate version of the job.

We can provide help with building PyLith from source on a cluster, but we need you to provide detailed information about what packages (including version information) the cluster system administrator has installed and that you intend to use (including compilers, MPI, etc). It is important that you use the MPI installed by the system administrator for the cluster in order to make use of the specific interconnect hardware. If an installation using the PyLith installer utility fails, then we need the full configure and make logs to diagnose problems. These include the configure/make logs for the installer as well as the underlying builds that failed.

Dear Baagaard,

I have managed to install PyLith on our cluster, and it seems I can actually submit a job using PBS. I used

pylith --scheduler.dry --nodes=5 --scheduler.ppn=20

with the pylithapp settings

scheduler = pbs
shell = /bin/bash
qsub-options = -V -m bea -M
command = mpirun -np {nodes} -machinefile {PBS_NODEFILE}

to generate the bash script

#PBS -S /bin/bash
#PBS -N jobname
#PBS -o stdout.txt
#PBS -e stderr.txt
#PBS -l nodes=1:ppn=20
#PBS -V -m bea -M

/opt-ictp/pytlith/2.2.2-1/bin/nemesis --pyre-start /opt-ictp/pytlith/2.2.2-1/bin:/opt-ictp/pytlith/2.2.2-1/lib/python2.7/site-packages:/opt-ictp/pytlith/2.2.2-1/lib64/python2.7/site-packages:/opt-ictp/pytlith/2.2.2-1/lib/ pythia pyre.schedulers:jobstart --scheduler-class=pyre.schedulers.SchedulerPBS:SchedulerPBS pylith.apps.PyLithApp:PyLithApp --scheduler.dry --nodes=5 --scheduler.ppn=20 --nodes=5 --macros.nodes=5

~~~~ comments ~~~~

[mpich] command: mpirun -np {nodes} -machinefile {PBS_NODEFILE} /opt-ictp/pytlith/2.2.2-1/bin/mpinemesis --pyre-start /opt-ictp/pytlith/2.2.2-1/bin:/opt-ictp/pytlith/2.2.2-1/lib/python2.7/site-packages:/opt-ictp/pytlith/2.2.2-1/lib64/python2.7/site-packages:/opt-ictp/pytlith/2.2.2-1/lib/ pythia mpi:mpistart pylith.apps.PyLithApp:PyLithApp --scheduler.dry --nodes=5 --scheduler.ppn=20 --nodes=5 --macros.nodes=5

~~~~ submit command ~~~~

qsub < [script]

But I don't know how to edit this in order to submit the job on the cluster. Can you please tell me what to do from here? Thanks.

Also, regarding the command setting:
command = mpirun -np {nodes} -machinefile {PBS_NODEFILE}

How are the values of nodes and PBS_NODEFILE passed to the command? Is it automatic, or do I have to put in the values manually? Thanks.

It looks like your pylithapp.launcher.command needs fixing. It should be something like

command = mpirun -np ${nodes} -machinefile=${PBS_NODEFILE}

When you run with --scheduler.dry, you should see the variable ${nodes} replaced by the number of compute nodes. ${PBS_NODEFILE} is a PBS variable that should be replaced automatically on job submission when using the PBS scheduler.

If the PBS scheduler works, then just remove --scheduler.dry and the job should be submitted. If you need to make changes to the bash script to get it to work, manually copy or capture the generated script to a file, edit it as necessary, and then submit it.
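One way to capture the script is sketched below. It assumes the dry-run output has the layout shown earlier in this thread, with the batch script first, followed by a "~~~~ comments ~~~~" section. The function name extract_job_script and the file names are hypothetical.

```shell
# Hypothetical helper: keep only the batch-script portion of captured
# dry-run output by deleting everything from the "comments" marker onward.
extract_job_script() {
  sed '/^~~~~ comments ~~~~$/,$d'
}

# Example usage (file names are placeholders):
# pylith --scheduler.dry --nodes=5 --scheduler.ppn=20 > dryrun.txt
# extract_job_script < dryrun.txt > job.sh
# (edit job.sh as necessary)
# qsub < job.sh
```

If the marker text differs on your installation, adjust the pattern or simply copy the script portion by hand.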