# Initial global refinement not producing the expected number of cells

I noticed a problem when running the viscoelastic bending beam benchmark, that the number of cells was not what I expected it to be. The attached picture shows Paraview with surface with edges visualization, which shows 13 groups of 4 cells, where 16 should exist. I simplified the parameter file significantly to the following, and still am able to reproduce the bug using deal.II 9.3.3(installed using Candi today) and ASPECT main (also from today). Most strange is so far I have only seen the bug when using 64 or more MPI processes. When I ran with 1, 4, 16, 32, or 63 it did not reproduce the error.

A minimal parameter file that still shows the bug

# Global parameters
set Dimension                              = 2
set Start time                             = 0
set Maximum time step                      = 250 # 1e3
set Pressure normalization                 = surface
set Surface pressure                       = 0.
set Output directory = test
set End time                               = 0e3

# Model geometry (7.5x5 km, 0.1 km spacing)
subsection Geometry model
set Model name = box
subsection Box
set X repetitions = 75
set Y repetitions = 50
set X extent      = 7.5e3
set Y extent      = 5e3
end
end

# Mesh refinement specifications
subsection Mesh refinement
set Initial adaptive refinement        = 0
set Initial global refinement          = 1
set Time steps between mesh refinement = 0
end

# Material model
subsection Material model
set Model name = simple
end

# Gravity model with zero gravity
subsection Gravity model
set Model name = vertical
subsection Vertical
set Magnitude = 0.
end
end



The output when running this parameter file. Note that there should be 15,000 cells from the model being 75\cdot50 refined once to be 150\cdot 100 cells, not 14,997.

-----------------------------------------------------------------------------
-- This is ASPECT, the Advanced Solver for Problems in Earth's ConvecTion.
--     . version 2.4.0-pre (main, 229edb808)
--     . using deal.II 9.3.3
--     .       with 32 bit indices and vectorization level 2 (256 bits)
--     . using Trilinos 12.18.1
--     . using p4est 2.3.2
--     . running in DEBUG mode
--     . running with 64 MPI processes
-----------------------------------------------------------------------------

-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
--   https://aspect.geodynamics.org/citing.html?ver=2.4.0-pre&sha=229edb808&src=code
-----------------------------------------------------------------------------
Number of active cells: 14,997 (on 2 levels)
Number of degrees of freedom: 196,741 (120,994+15,250+60,497)

*** Timestep 0:  t=0 years, dt=0 years
Skipping temperature solve because RHS is zero.
Rebuilding Stokes preconditioner...
Solving Stokes system... 0+0 iterations.

Postprocessing:

Termination requested by criterion: end time

+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |      8.42s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |         1 |     0.907s |        11% |
| Assemble temperature system     |         1 |      0.42s |         5% |
| Build Stokes preconditioner     |         1 |     0.856s |        10% |
| Initialization                  |         1 |      2.36s |        28% |
| Postprocessing                  |         1 |    0.0116s |      0.14% |
| Setup dof systems               |         1 |     0.554s |       6.6% |
| Setup initial conditions        |         1 |      1.65s |        20% |
| Setup matrices                  |         1 |     0.201s |       2.4% |
| Solve Stokes system             |         1 |    0.0899s |       1.1% |
+---------------------------------+-----------+------------+------------+

-- Total wallclock time elapsed including restarts:8s
-----------------------------------------------------------------------------
-- For information on how to cite ASPECT, see:
--   https://aspect.geodynamics.org/citing.html?ver=2.4.0-pre&sha=229edb808&src=code
-----------------------------------------------------------------------------


Please let me know if this is a known issue of some kind or if you have any suggestions for fixing this behavior.
Thank you,
Mack

I want to emphasize that this issue / bug has nothing to do with the Bending Beam problem. It seems to be a function of MPI processes (> 63 MPI processes sets it off) and possibly the length and height of the domain. (This remains to be explored.) Mack’s accompanying *.prm file has no problem, no material properties, no particles; etc. just a grid. Yet these clusters of cells that are missing 4 ‘sub cells’ each appear immediately; namely at t = 0 . Try it yourself. It takes almost no time to run this *.prm file.

I think that this is a serious issue / bug.

Hi Mack, Hi Gerry,

Thanks for reporting this issue. At first glance, this seems like potentially an MPI-related issue.

Can you describe the configuration of the system you are running - Is this a cluster a large desktop? How many cores per node? Are you using threading? etc?

If it is a cluster, do the results change depending on how many nodes you use? For example, one could not use all the available cores on a node for a given run.

Cheers,
John

This is likely a bug in the refinement and mesh smoothing not working correctly with many MPI ranks. This bug report is very useful to track it down.

I tried fixing this in communicate refinement flags on ghost cells by tjhei · Pull Request #3595 · geodynamics/aspect · GitHub
See Initial global refinement and maximum/minimum refinement function varies with number of cores · Issue #3262 · geodynamics/aspect · GitHub for an old bug report that mentions that the mesh is different depending on the number of MPI ranks.

It might be back because of Do initial global refinement correctly in parallel by jdannberg · Pull Request #4513 · geodynamics/aspect · GitHub but I am not sure.

This has been on my list to fix and use cell flag communication for initial refinement · Issue #4524 · geodynamics/aspect · GitHub might fix it (but I didn’t have time to write it down).

The workaround for you would be to use fewer MPI ranks for now.

I am running on the Peloton cluster at UC Davis. The problem appears when running with all 64 processes on one node, with 64 processes from 2 nodes (each providing 32 processes), with 64 processes from 4 nodes (each providing 16 processes), or as I originally found it with 192 processes from 3 nodes (each providing 64 processes). The problem did not occur when I used 63 processes on one node, nor 62 processes from 2 nodes (each providing 31 processes). The nodes are made from 2 AMD EPYC 7351 16-Core Processors that are hyper threaded. Each of the occurences of the problem on this prm file have been with 14997 cells where there should be 15000 cells. I didn’t check every time that I have ran the problem, but the location of the missing cell seems to be consistent. I am using OpenMPI version 4.1.0 provided through a modulefile by the cluster.

These are from the minimal parameter file I showed with partition visualization.

I don’t think we’ve met, but I’m a professor in EPS at UC Davis and my entire group runs Aspect on Peloton.
Max Rudolph’s group also runs Aspect on Peloton.

In order to run Aspect on the Peloton cluster using MPI-IO you need to set-up your work-space in a particular way
The default set-up with the NFS server is incompatible with mpi-io. Then you also need to use the correct openmpi module when compiling deal.II and Aspect.

You need to do two things:

1. You need to ask HPC support to create a work space for you that is set-up with the same as /group/billengrp-mpi-io
2. You need to compile using mpi-io enabled version of MPI

I’ve cc’d Haoyuan Li and Menno Fraters as they are most familiar with the current modules needed for compiling.
(Also, I just saw that Haoyuan is having a library issue on Peloton when running on more than one node, but that might only be an issue on our nodes, which are separate from the rest of peloton).

Here’s the notes I found in our shared doc, but I don’t know when these were last updated.