I am having problems installing PyLith from source because the dependency builds keep failing during installation.
For now, I would be happy just being able to run the PyLith binary with multiple processes. Is there a way to configure the pylithapp file to run on multiple processes using the SLURM workload manager instead of PBS, as shown in the PyLith manual?
The Pyre package which PyLith uses for job submission does not currently have support for the SLURM scheduler. The workaround is to use the LSF or PBS scheduler with the --scheduler.dry command line option. This will print the bash script used to submit a job to stdout. You can edit and save the script and then run it to submit a job to the SLURM scheduler.
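For example, a minimal sketch of that workflow (step01.cfg is a placeholder for your own parameter file, and your pylithapp settings must already select the PBS or LSF scheduler):

```bash
# Print the generated submission script instead of submitting the job;
# --scheduler.dry writes it to stdout, so capture it in a file.
pylith step01.cfg --scheduler.dry > submit.sh

# Edit submit.sh to replace the PBS/LSF directives with SLURM ones,
# then submit it to the SLURM scheduler.
sbatch submit.sh
```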
IMPORTANT: If you are using the PyLith binary, you can only run on a single node of a cluster. If you submit a job running on multiple nodes, each node will just be running a duplicate version of the job.
We can provide help with building PyLith from source on a cluster, but we need you to provide detailed information about the packages the cluster system administrator has installed and that you intend to use, including version information for the compilers, MPI, etc. It is important that you use the MPI installed by the system administrator so that you can make use of the cluster's specific interconnect hardware. If an installation using the PyLith installer utility fails, we need the full configure and make logs to diagnose the problem: the configure/make logs for the installer itself as well as those for the underlying builds that failed.
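As a first step, it is worth confirming which MPI the cluster provides before configuring the build. A sketch, assuming an environment-modules setup (the module name is a placeholder; your cluster's will differ):

```bash
# Load the cluster's MPI module (the name is site-specific).
module load openmpi

# Verify that the compiler wrappers and the launcher come from the
# system-installed MPI rather than some other installation.
which mpicc mpicxx mpif90
mpirun --version
```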
But I don't know how to edit this in order to submit the job on the cluster. Can you please direct me on what to do from here in order to submit the job… Thanks.
Also, regarding this command line:
command = mpirun -np ${nodes} -machinefile ${PBS_NODEFILE}
How are the values of nodes and PBS_NODEFILE passed to the command? Is it automatic, or do I have to put values in manually? Thanks
When you run with --scheduler.dry, you should see the variable ${nodes} replaced by the number of compute nodes. ${PBS_NODEFILE} is an environment variable that PBS sets automatically when the job runs under the PBS scheduler, so you do not need to supply a value yourself.
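For illustration only (the exact command PyLith appends to the template will differ), a job configured for 2 compute nodes would produce a launch line in the generated script along these lines:

```bash
mpirun -np 2 -machinefile $PBS_NODEFILE pylith step01.cfg
```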
If the PBS scheduler works, then just remove --scheduler.dry and the job should be submitted. If you need to make changes to the bash script to get it to work, capture the generated script to a file, edit it as necessary, and then submit it manually.
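For SLURM, the edited script might end up looking something like this sketch (all directive values are assumptions to adapt to your cluster, and whether to launch with srun or mpirun depends on how your MPI was built):

```bash
#!/bin/bash
#SBATCH --nodes=2              # number of compute nodes (assumed value)
#SBATCH --ntasks-per-node=16   # MPI ranks per node (assumed value)
#SBATCH --time=01:00:00        # wall-clock limit (assumed value)

# SLURM has no $PBS_NODEFILE; srun takes the node list from the job
# allocation, so no -machinefile option is needed.
srun pylith step01.cfg
```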
I have a question about this problem. You state that Pyre does not work with SLURM and that you need to use the output of the --scheduler.dry command to work around this. I did that, and right now I can run PyLith on 1 node with a varying number of cores, but as soon as I try to increase the number of nodes I get a GMRES error.
Does the important warning you give in this answer also apply when using the workaround on SLURM-based systems? That is, can I use multiple cores for my simulations, but never more than 1 node?
It looks like you are trying to use the Linux binary on a cluster. It will not work across multiple compute nodes of a cluster, because it uses an MPI implementation intended for use on a single machine. You will need to work with your cluster system administrator to build PyLith from source using the installer. See Installation – PyLith 3.0.3 documentation and Linux clusters – PyLith Installer v3.0.3-0 documentation.