Running multiple PyLith simulations on the same machine results in longer runtime


Hi all,

As you know, PyLith is usually run on multiple cores to speed up a simulation. We are having trouble running several PyLith simulations at the same time.

For example:

If one person runs a PyLith simulation with “--nodes=8” on a server (whose CPUs have 72 cores in total), the simulation takes 1 hour to finish.
But if another person runs PyLith with “--nodes=8” on the same server at the same time, both simulations take twice as long (about 2 hours), even though there are 72 cores in total.

I’m wondering whether several PyLith simulations can run at the same time without slowing down the calculation when there are enough CPU cores.

I hope someone can offer some suggestions.

Best regards


The runtime for most finite-element and finite-difference calculations, including PyLith, is controlled by the memory bandwidth (number of memory sockets and speed of the memory bus), not the number of cores. When running a single simulation, once you saturate the memory bus for each socket, using more cores will not decrease the runtime; the runtime usually increases a little due to the increased communication overhead associated with using more cores. Similarly, with multiple simulations running simultaneously, once the memory bus is saturated, the simulations will compete for it, slowing each other down.
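
To make the saturation argument concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the function name, the per-core bandwidth (10 GB/s), the bus bandwidth (80 GB/s), and the amount of data moved are all hypothetical numbers chosen so that 8 cores exactly saturate the bus. It treats the machine as a single shared memory bus and ignores multiple sockets, caches, and communication overhead.

```python
def estimated_runtime(work_gb, cores_per_sim, n_sims, per_core_bw, bus_bw):
    """Toy runtime estimate (seconds) for each of n_sims identical,
    concurrently running, purely bandwidth-bound simulations.

    All arguments are hypothetical illustration values:
      work_gb       data each simulation must move through memory (GB)
      cores_per_sim cores used by each simulation
      n_sims        number of simulations running at the same time
      per_core_bw   bandwidth one core can request (GB/s)
      bus_bw        total bandwidth the memory bus can deliver (GB/s)
    """
    # Total bandwidth all cores would like to use.
    demand = cores_per_sim * n_sims * per_core_bw
    # The memory bus caps what is actually delivered.
    delivered = min(demand, bus_bw)
    # Identical simulations share the delivered bandwidth equally.
    per_sim_bw = delivered / n_sims
    return work_gb / per_sim_bw


if __name__ == "__main__":
    work = 288000.0  # GB, chosen so the baseline run takes one hour
    for cores, sims in [(4, 1), (8, 1), (16, 1), (8, 2)]:
        t = estimated_runtime(work, cores, sims, per_core_bw=10.0, bus_bw=80.0)
        print(f"{sims} simulation(s) x {cores} cores: {t / 3600:.1f} h each")
```

With these made-up numbers, one 8-core run takes one hour, doubling to 16 cores gives no improvement, and two simultaneous 8-core runs take two hours each, which matches the behavior described in the question. Real machines have several sockets and memory channels, so the actual saturation point has to be measured.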

See the PETSc FAQ topic “What kind of parallel computers or clusters are needed to use PETSc? Or why do I get little speedup?” for more information and how to estimate the scaling with the number of cores. Also consult the documentation for your MPI implementation to find any flags that help control how the processes are distributed among the cores on the machine.


As Brad mentioned, your MPI implementation may have some flags that could help. For example, one of my pylithapp.cfg files has the following:


command = mpirun -np ${nodes} --bind-to core --map-by socket --report-bindings --mca mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0

The important part of this command (specific to OpenMPI) is the --bind-to core flag, which locks each process onto a specific core rather than letting the operating system move it between cores, which is what generally happens otherwise. In some cases, this may improve the performance.
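
If you want to check what the binding flags actually do on your machine, a small script like the following can help. This is a hedged sketch, not part of PyLith: the function name is made up, and it relies on os.sched_getaffinity, which is only available on Linux (on other systems the script reports that the information is unavailable).

```python
import os


def describe_affinity():
    """Return (pid, cores), where cores is the sorted list of CPU ids this
    process is allowed to run on, or None if the platform cannot report it."""
    pid = os.getpid()
    if hasattr(os, "sched_getaffinity"):
        # Argument 0 means "the calling process"; Linux only.
        cores = sorted(os.sched_getaffinity(0))
    else:
        cores = None
    return pid, cores


if __name__ == "__main__":
    pid, cores = describe_affinity()
    if cores is None:
        print(f"pid {pid}: CPU affinity not available on this platform")
    else:
        print(f"pid {pid}: allowed on {len(cores)} CPU(s): {cores}")
```

Saved under a name of your choice (for example show_affinity.py) and launched with something like "mpirun -np 4 --bind-to core python3 show_affinity.py", each process should report only a small set of CPUs when binding is active, and typically all CPUs of the machine when it is not. The --report-bindings flag in the command above gives similar information directly from OpenMPI.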



Charles Williams | Geodynamic Modeler
GNS Science | Te Pū Ao



Thank you very much, I’ll try that and see if it works.

Best regards