[RESOLVED] Issue Decomposing Very Large Mesh




I am attempting to decompose a very large mesh (22M nodes) to run on 768 cores, but I consistently get a Fortran segmentation fault. When I run with “--debug-enabled” configured, I receive the message:

“At line 116 of file src/decompose_mesh/part_decompose_mesh.F90
Fortran runtime error: Index ‘-2088155680’ of dimension 1 of array ‘adjncy’ below lower bound of 0”

From what I gather, this occurs when decompose_mesh is checking which elements are neighbors. I figured the error was related to a memory issue, so I reconfigured using the “-mcmodel=medium” flag, but I am still receiving the same error.

Has anyone encountered this error before, or do you know how to avoid it? For reference, I compiled using gcc version 7.1.0.

Thank you,

Ian Stone


So I think I was able to answer my own question. I traced the error to the mesh2dual_ncommonnodes subroutine in the part_decompose_mesh.F90 source file. This subroutine is invoked when decompose_mesh is trying to figure out the maximum number of neighbors a node has in the mesh. To do this, the code purposely over-estimates the maximum number of neighbors a node will have (the “sup_neighbor” variable), and then creates an array with sup_neighbor entries for each node; this array, “adjncy”, has a length of sup_neighbor*nnodes. Not surprisingly, this array is extremely large for my mesh (more than 2 billion entries). That is beyond what a default 32-bit integer index can hold (2,147,483,647), which is consistent with the index in the error message wrapping around to a negative number, and decompose_mesh crashes.
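As a quick sanity check on the numbers, here is a small arithmetic sketch in Python (not part of the Fortran code). It assumes that the array index is a default 4-byte signed integer; the helper names wrap_int32 and unwrap_once are made up for illustration, and the node count is the rounded “22M” figure from my first post.

```python
INT32_MAX = 2**31 - 1  # largest value a 4-byte signed integer can hold

def wrap_int32(value):
    """Return value as a two's-complement 32-bit signed integer would store it."""
    return (value + 2**31) % 2**32 - 2**31

def unwrap_once(wrapped):
    """Undo a single 32-bit wraparound of a negative value."""
    return wrapped + 2**32

sup_neighbor = 102          # over-estimate computed by the code for this mesh
nnodes_approx = 22_000_000  # "22M nodes", rounded (assumption)

size = sup_neighbor * nnodes_approx
print(size, size > INT32_MAX)   # 2244000000 True
print(wrap_int32(size))         # -2050967296

# The index reported in the runtime error, unwrapped once.
reported_index = -2088155680
true_index = unwrap_once(reported_index)
print(true_index)                        # 2206811616
print(divmod(true_index, sup_neighbor))  # (21635408, 0)
```

The unwrapped index is an exact multiple of 102, and the quotient (about 21.6 million) is in line with the size of my mesh. That fits the picture of an index of the form sup_neighbor*nnodes overflowing, although this is an inference from the arithmetic rather than something I verified in a debugger.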

To get around this, I set an upper bound on sup_neighbor. It had been calculated to be 102, so I imposed a rule such that if sup_neighbor>100, then sup_neighbor=90. This still allocates enough room for the mesh, while keeping the size of adjncy below 2B entries. I knew from previous tests with smaller meshes (built with the same meshing scheme, i.e., same mesh size and triplication scheme) that the max_neighbor value is closer to 54, so I could make the sup_neighbor value even smaller if I wanted to, as long as the meshing scheme stays the same. For instance, I may have to reduce the sup_neighbor value further if I decompose a much larger mesh.
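For anyone wanting to pick a cap for their own mesh, the rule can be sketched as follows. This is Python used as a calculator, not the actual Fortran change. The function names capped_sup_neighbor and largest_safe_sup_neighbor are hypothetical, and it assumes the 2B limit is the signed 32-bit maximum and that the node count is the rounded 22M figure.

```python
INT32_MAX = 2**31 - 1  # assumed limit on the length of adjncy

def capped_sup_neighbor(sup_neighbor, threshold=100, cap=90):
    """The rule described above: above the threshold, fall back to the cap."""
    return cap if sup_neighbor > threshold else sup_neighbor

def largest_safe_sup_neighbor(nnodes):
    """Largest per-node allowance whose product with nnodes fits in 32 bits."""
    return INT32_MAX // nnodes

nnodes = 22_000_000        # rounded node count (assumption)
observed_max_neighbor = 54 # approximate value seen on smaller meshes

capped = capped_sup_neighbor(102)
print(capped, capped * nnodes, capped * nnodes <= INT32_MAX)
# 90 1980000000 True

print(largest_safe_sup_neighbor(nnodes))  # 97

# The cap only makes sense if it stays above the real neighbor count.
print(capped >= observed_max_neighbor)    # True
```

The cap has to sit between the true maximum neighbor count (about 54 for my meshing scheme) and the largest value that keeps sup_neighbor*nnodes in range. For a much larger mesh the upper end of that window shrinks, so the cap would need to be recomputed.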