Dear developers,
I am facing a runtime issue with ASPECT on a new HPC cluster (Cosma) and would appreciate any insight. Below I describe the installation history and the problem in detail.
1. Initial problem: Trilinos build failure in Candi
I am installing deal.II and ASPECT using Candi. During the Trilinos configuration step, I encountered the following error:
“Tpetra: Tpetra_INST_FLOAT is ON, but HAVE_TEUCHOS_BLASFLOAT is OFF.
This means that you are linking with a BLAS library that lacks float (S) support.
Tpetra needs a BLAS implementation that supports float.
-- Configuring incomplete, errors occurred!
Failure with exit status: 1”
So Trilinos could not be configured because the BLAS library available on the cluster does not provide single-precision (float) support.
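One way to confirm this diagnosis is to check whether the BLAS library actually exports single-precision routines. Below is a minimal Python sketch, not part of Candi or Trilinos. It assumes the library follows the common Fortran naming convention with a trailing underscore (for example sdot_ and sgemm_), which may not hold for every BLAS build. The helper name missing_symbols and the script name in the usage message are hypothetical, and the library path is whatever BLAS the Trilinos build picked up.

```python
import ctypes
import sys

# Assumption: Fortran-style symbol names with a trailing underscore.
SINGLE_PRECISION_ROUTINES = ["sdot_", "saxpy_", "sgemm_"]
DOUBLE_PRECISION_ROUTINES = ["ddot_", "daxpy_", "dgemm_"]


def missing_symbols(library, names):
    """Return the names that cannot be resolved through the given shared library.

    `library` is the path to a shared object, or None to search the symbols
    already visible in the running process.
    """
    lib = ctypes.CDLL(library)
    missing = []
    for name in names:
        try:
            getattr(lib, name)  # ctypes raises AttributeError if the lookup fails
        except AttributeError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python check_blas.py /path/to/libblas.so")
    else:
        blas_path = sys.argv[1]  # path to the BLAS library used by the build
        print("missing single precision:",
              missing_symbols(blas_path, SINGLE_PRECISION_ROUTINES))
        print("missing double precision:",
              missing_symbols(blas_path, DOUBLE_PRECISION_ROUTINES))
```

If the single-precision list comes back non-empty while the double-precision list is empty, that would be consistent with the Tpetra message above.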
2. Workaround suggested by cluster support
The cluster support team suggested modifying the Candi Trilinos configuration by disabling float instantiations:
“-D Trilinos_ENABLE_FLOAT:BOOL=OFF” instead of “-D Trilinos_ENABLE_FLOAT:BOOL=ON”
They also suggested some additional changes in PETSc related to 64-bit indices and MPI compiler settings.
After applying these changes and rebuilding from a clean clone, I was able to install Trilinos, deal.II, and ASPECT successfully.
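For reference, the flag change described above can be sketched as a small text substitution. This is only an illustration of the edit. The function name disable_float is hypothetical, and the configure options are passed in as a string because this post does not state which Candi file contains them.

```python
import re


def disable_float(config_text, option="Trilinos_ENABLE_FLOAT"):
    """Switch `-D <option>:BOOL=ON` to OFF.

    Returns a tuple (new_text, number_of_replacements).
    """
    pattern = re.compile(r"(-D\s*" + re.escape(option) + r":BOOL=)ON\b")
    return pattern.subn(r"\g<1>OFF", config_text)


if __name__ == "__main__":
    # Illustrative configure options, not the actual Candi contents.
    original = "-D Trilinos_ENABLE_FLOAT:BOOL=ON -D CMAKE_BUILD_TYPE:STRING=RELEASE"
    updated, count = disable_float(original)
    print(updated)
    print("replacements:", count)
```

Reporting the number of replacements makes it easy to notice when the option was not present in the text at all, in which case nothing was actually changed.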
3. New problem: ASPECT runtime crash (Floating Point Exception)
Although compilation succeeded, ASPECT crashes at runtime with a floating point exception.
The simulation starts normally, creates the output directory, and prints mesh and DoF information:
“-- This is ASPECT --
-- The Advanced Solver for Planetary Evolution, Convection, and Tectonics. --
-- . version 3.1.0-pre (main, 0b101bb19)
-- . using deal.II 9.7.0
-- . with 32 bit indices
-- . with vectorization level 3 (AVX512, 8 doubles, 512 bits)
-- . using Trilinos 16.1.0
-- . using p4est 2.3.6
-- . using Geodynamic World Builder 1.0.0
-- . running in DEBUG mode
-- . running with 1 MPI process
The output directory <output-continental_extension/> provided in the input file appears not to exist.
ASPECT will create it for you.
-- For information on how to cite ASPECT, see:
-- The ASPECT mantle convection code: How to cite?
Number of active cells: 3,200 (on 4 levels)
Number of degrees of freedom: 107,649 (26,082+3,321+13,041+13,041+13,041+13,041+13,041+13,041)
Number of mesh deformation degrees of freedom: 6,642”
The job then crashes with:
“[m5001:2709721:0:2709721] Caught signal 8 (Floating point exception: floating-point invalid operation)
==== backtrace (tid:2709721) ====
0 0x000000000003ebf0 _GI___sigaction() :0
1 0x0000000000004e85 ddot() ???:0
2 0x0000000000194b03 ML_gdot() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Utils/ml_utils.c:1600
3 0x000000000013eb13 ML_CG_ComputeEigenvalues() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Krylov/ml_cg.c:316
4 0x0000000000143de0 ML_Krylov_Solve() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Krylov/ml_krylov.c:358
5 0x00000000000a5fee ML_AGG_Gen_Prolongator() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Coarsen/ml_agg_genP.c:511
6 0x00000000000a98ef ML_MultiLevel_Gen_Prolongator() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Coarsen/ml_agg_genP.c:3594
7 0x00000000000a109c ML_Gen_MultiLevelHierarchy() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Coarsen/ml_agg_genP.c:3153
8 0x00000000000a3f24 ML_Gen_MultiLevelHierarchy_UsingAggregation() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Coarsen/ml_agg_genP.c:2994
9 0x00000000001e3c2c ML_Epetra::MultiLevelPreconditioner::ComputePreconditioner() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Utils/ml_MultiLevelPreconditioner.cpp:2413
10 0x00000000001e71aa ML_Epetra::MultiLevelPreconditioner::MultiLevelPreconditioner() /cosma/apps/durham/dc-roy3/softwares/candi_9.5.1-r1/candi/build/tmp/unpack/Trilinos-trilinos-release-16-1-0/packages/ml/src/Utils/ml_MultiLevelPreconditioner.cpp:356
11 0x000000001b685c62 dealii::TrilinosWrappers::PreconditionAMG::initialize() ???:0
12 0x000000001b685a5d dealii::TrilinosWrappers::PreconditionAMG::initialize() ???:0
13 0x000000001b6859db dealii::TrilinosWrappers::PreconditionAMG::initialize() ???:0
14 0x00000000047059f9 aspect::MeshDeformation::MeshDeformationHandler<2>::compute_mesh_displacements() ???:0
15 0x0000000004700e5f aspect::MeshDeformation::MeshDeformationHandler<2>::setup_dofs() ???:0
16 0x0000000003eed63d aspect::Simulator<2>::setup_dofs() ???:0
17 0x0000000003ef0549 aspect::Simulator<2>::run() ???:0”
I would like to ask:
- Is disabling Trilinos_ENABLE_FLOAT compatible with deal.II and ASPECT?
- Could the floating point exception be caused by this modification to Trilinos?
- Is there any way around this so that I can install deal.II and ASPECT?
Any leads would be really appreciated!
Best regards,
Poulami Roy