I’m running a global mantle convection model using ASPECT. However, the program always hangs while assembling the advection system. Specifically, it stops making progress but neither terminates on its own nor returns any error message. I have consulted the supercomputer administrator, but the underlying cause has not been identified.
To pinpoint the cause of the issue, I inserted additional diagnostic statements into the code and found that the program gets stuck in the assemble_advection_system function. The relevant code snippet is provided as follows:
What cluster is this? Do you have a choice of a different MPI installation (different version or vendor)? This sounds like a hardware/software MPI issue and not something that is incorrect inside ASPECT.
I use 3 supercomputers, with CPUs AMD EPYC 7742, Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz, and Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz. The MPI installations are mpi/openmpi/openmpi-4.1.5-gcc9.3.0, mpi/intel/18.0.2-thc, and mpich-4.1.2-gcc-9.4.0-y4npw2j, respectively. However, this problem occurs frequently on all 3 supercomputers, especially recently. In addition, the supercomputer administrators did not detect any anomalies when the issue occurred.
I noticed this in the ASPECT documentation. Let me try the latest version.
Fixed: ASPECT used several features that could fail in MPI communication in case an assert was triggered. The model would end up in a communication deadlock (a model that hangs without output) instead of correctly crashing and producing an error message. This was fixed.
(Rene Gassmoeller, 2025/09/16)
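For anyone wondering why a failed assert on one process can show up as a silent hang rather than a crash, here is a minimal sketch of the mechanism. It is only an analogy and is not ASPECT or MPI code: Python threads stand in for MPI ranks, `threading.Barrier` stands in for a collective call, and `run_ranks`, `failing_rank`, and the status strings are hypothetical names made up for the illustration. The timeout exists only so the demo terminates; a real blocking collective would simply wait.

```python
import threading


def run_ranks(n_ranks, failing_rank, timeout=0.5):
    """Simulate n_ranks 'processes' that must all reach one collective call.

    failing_rank (hypothetical parameter) trips a local assertion and never
    reaches the collective; pass None for a run where nothing fails.
    """
    barrier = threading.Barrier(n_ranks)  # stand-in for a collective operation
    status = {}

    def rank_main(rank):
        try:
            if rank == failing_rank:
                # Local failure: this rank leaves before the collective.
                raise AssertionError(f"assert failed on rank {rank}")
            # The other ranks wait here for everyone to arrive. The timeout
            # is only for the demo; without it they would block forever.
            barrier.wait(timeout=timeout)
            status[rank] = "completed"
        except AssertionError:
            status[rank] = "assert"
        except threading.BrokenBarrierError:
            # A rank ends up here when the barrier was never completed.
            status[rank] = "hung"

    threads = [threading.Thread(target=rank_main, args=(r,))
               for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


if __name__ == "__main__":
    # Rank 2 fails locally; the remaining ranks wait on a call it never joins.
    print(run_ranks(4, failing_rank=2))
```

In the sketch, the rank that hits the assertion stops, while every other rank keeps waiting for it inside the collective. Without the demo timeout, that wait would look exactly like the symptom described above: no progress, no exit, and no error message.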