Stuck when running some cookbooks with >1 MPI processes and release build

Hello! I recently started using ASPECT and have been testing different cookbook examples since then. Most of them work fine with any execution configuration, but at least the cookbook example crustal_model_2D.prm gets stuck at the beginning when I try to run it with the release build and more than one MPI process (each process is a physical CPU core in my case).

By getting stuck I mean it never gets past the initial output that reports the ASPECT version, number of MPI processes, etc. However, the debug build executable works as expected with crustal_model_2D.prm for any number of MPI processes. Also, the release build works with the other examples I have tried, including with more than one MPI process.

I suspect this might be related to the issue mentioned in section 4.5.3 of the ASPECT manual, though the output files are rather small. The only tip I found for that was setting Number of grouped files to 1, but that didn’t help either. Our cluster uses an InfiniBand interconnect with the MVAPICH2 implementation of MPI and has 16 CPU cores per node, if that matters.
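For reference, this is roughly how that setting looks in the parameter file. It is a minimal sketch, assuming the cookbook uses the visualization postprocessor and that all other postprocessing settings stay as they are:

```
subsection Postprocess
  subsection Visualization
    # Assumption: with a value of 1, all processes write the
    # visualization output of a time step into one file via MPI I/O.
    set Number of grouped files = 1
  end
end
```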

I ran into this issue just today, so I’m going to keep investigating, but if anyone has any (known) solutions or tips, I would appreciate hearing them.


Can you show us your detailed.log in the ASPECT build directory and the output you get from ASPECT (to see where it is stuck)?

Sure. When it gets stuck, the output is nothing but

-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
--     . version 2.3.0
--     . using deal.II 9.3.3
--     .       with 32 bit indices and vectorization level 1 (128 bits)
--     . using Trilinos 12.18.1
--     . using p4est 2.3.2
--     . running in OPTIMIZED mode
--     . running with 2 MPI processes

detailed.log for release build:

#  ASPECT configuration:
#        ASPECT_VERSION:            2.3.0
#        GIT REVISION:               ()
#        CMAKE_BUILD_TYPE:          Release
#        DEAL_II_DIR:               /data/home/leevi/software/dealii-candi/deal.II-v9.3.3/lib/cmake/deal.II
#        DEAL_II VERSION:           9.3.3
#        ASPECT_USE_PETSC:          OFF
#        ASPECT_HAVE_LINK_H:        ON
#        ASPECT_WITH_LIBDAP:        OFF
#        ASPECT_WITH_WORLD_BUILDER: ON /data/home/leevi/aspect-2.3.0/contrib/world_builder
#        ASPECT_UNITY_BUILD:        ON
#        CMAKE_INSTALL_PREFIX:      /usr/local
#        CMAKE_SOURCE_DIR:          /data/home/leevi/aspect-2.3.0
#        CMAKE_BINARY_DIR:          /data/home/leevi/aspect-2.3.0/build
#        CMAKE_CXX_COMPILER:        Intel on platform Linux x86_64
#                                   /apps/local/mvapich2-2.3.7/bin/mpicxx
#        LINKAGE:                   DYNAMIC
#        COMPILE_FLAGS:             
#        _WITH_CXX14:               ON
#        _WITH_CXX17:               FALSE
#        _MPI_VERSION:              3.1
#        _WITH_64BIT_INDICES:       OFF

…and for debug build:

#  ASPECT configuration:
#        ASPECT_VERSION:            2.3.0
#        GIT REVISION:               ()
#        CMAKE_BUILD_TYPE:          Debug
#        DEAL_II_DIR:               /data/home/leevi/software/dealii-candi/deal.II-v9.3.3/lib/cmake/deal.II
#        DEAL_II VERSION:           9.3.3
#        ASPECT_USE_PETSC:          OFF
#        ASPECT_HAVE_LINK_H:        ON
#        ASPECT_WITH_LIBDAP:        OFF
#        ASPECT_WITH_WORLD_BUILDER: ON /data/home/leevi/aspect-2.3.0/contrib/world_builder
#        ASPECT_UNITY_BUILD:        ON
#        CMAKE_INSTALL_PREFIX:      /usr/local
#        CMAKE_SOURCE_DIR:          /data/home/leevi/aspect-2.3.0
#        CMAKE_BINARY_DIR:          /data/home/leevi/aspect-2.3.0/build
#        CMAKE_CXX_COMPILER:        Intel on platform Linux x86_64
#                                   /apps/local/mvapich2-2.3.7/bin/mpicxx
#        LINKAGE:                   DYNAMIC
#        COMPILE_FLAGS:             
#        _WITH_CXX14:               ON
#        _WITH_CXX17:               FALSE
#        _MPI_VERSION:              3.1
#        _WITH_64BIT_INDICES:       OFF

Nothing looks wrong at first glance. Does mpirun -n 2 ./aspect -v exit correctly after printing the header? Is there a different filesystem on your machine that you can try running on? Did you try the “convection-box.prm” cookbook?

Okay, I got it working. The solution was to increase X repetitions and to define Y repetitions. Both values must be increased further to get crustal_model_2D running with a larger number of MPI processes.
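In the parameter file, the change looks something like the sketch below. It assumes the cookbook uses the box geometry model, and the numbers are illustrative placeholders only, not the values from the cookbook or from my runs:

```
subsection Geometry model
  subsection Box
    # Illustrative values (assumptions): more coarse cells in each
    # direction, so the coarse mesh is finer before any refinement.
    set X repetitions = 10
    set Y repetitions = 2
  end
end
```

The right values will depend on the domain extents and on how many MPI processes are used.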

So I believe this is fundamentally a hardware issue, i.e. too much offset between the interconnect and the disks. A grid that is too coarse then somehow causes the MPI communication to get stuck.