ASPECT 3.1.0: exp_2_high_resolution.prm (Buiter et al. 2016) crashes at timestep 470

Hi all,
I’m running the Buiter et al. (2016) benchmark model exp_2_high_resolution.prm from the ASPECT benchmarks/ folder with ASPECT 3.1.0. The model consistently crashes at timestep 470 with the same error message.

Error:

*** Timestep 470: t=7190.1 seconds, dt=15.3615 seconds

Solving temperature system… retrying linear solve with different preconditioner…

TimerOutput objects finalize timed values printed to the screen by communicating over MPI in their destructors. Since an exception is currently uncaught, this synchronization (and subsequent output) will be skipped to avoid a possible deadlock.


Exception ‘ExcMessage (exception_message.str())’ on rank 0 on processing:


An error occurred in line <2843> of file </home/aspect/source/utilities.cc> in function
void aspect::Utilities::throw_linear_solver_failure_exception(const string&, const string&, const std::vector<dealii::SolverControl>&, const std::exception&, MPI_Comm, const string&)
Additional information:
The iterative advection solver in Simulator::solve_advection did not
converge.

The initial residual was: -nan
The final residual is: -nan
The required residual for convergence is: 3.601534e-07
See output-Brittle_thrust_wedge_exp2/solver_history.txt for the full
convergence history.

The solver reported the following error:

--------------------------------------------------------
An error occurred in line <2092> of file
</home/deal.II-v9.7.0/include/deal.II/lac/solver_gmres.h> in
function
void dealii::SolverGMRES<VectorType>::solve(const MatrixType&,
VectorType&, const VectorType&, const PreconditionerType&) [with
MatrixType = dealii::TrilinosWrappers::SparseMatrix;
PreconditionerType = dealii::TrilinosWrappers::PreconditionILU;
VectorType = dealii::TrilinosWrappers::MPI::Vector]
The violated condition was:
iteration_state == SolverControl::success
Additional information:
Iterative method reported convergence failure in step 0. The residual
in the last step was -nan.

This error message can indicate that you have simply not allowed a
sufficiently large number of iterations for your iterative solver to
converge. This often happens when you increase the size of your
problem. In such cases, the last residual will likely still be very
small, and you can make the error go away by increasing the allowed
number of iterations when setting up the SolverControl object that
determines the maximal number of iterations you allow.

The other situation where this error may occur is when your matrix is
not invertible (e.g., your matrix has a null-space), or if you try to
apply the wrong solver to a matrix (e.g., using CG for a matrix that
is not symmetric or not positive definite). In these cases, the
residual in the last iteration is likely going to be large.

What I’m running

  • ASPECT version: 3.1.0

  • Model: benchmarks/buiter_2016/exp_2_high_resolution.prm (unmodified)

  • Command: mpirun -np 16 ~/aspect/build/aspect-release exp_2_high_resolution.prm

  • Mesh/Resolution: as in the high_resolution parameter file

  • Observed failure point: timestep 470 (reproducible)

Environment

  • OS / Distro: Ubuntu 22.04 under WSL2 on Windows 11

What I’ve checked

  • Parameter file is unchanged from the repository

  • Re-ran from scratch; crash repeats at the same timestep

Has anyone seen this failure point with the high-resolution Buiter 2016 experiment on ASPECT 3.1.0? Any hints on parameters or build/runtime settings that could cause a crash at a specific timestep would be greatly appreciated.

Thanks!

— Ramadhan

Hi @adhitama - Thank you for posting the question to the forum, and for the detailed summary of the issue.

In short, this benchmark was developed quite a few years ago, so it is not entirely surprising that some change to the code in the intervening years now causes it to crash at some stage in its evolution (it is a fairly complex and somewhat fragile model).

My initial guess is that the model was originally run using the AMG solver prior to when the GMG solver became the default for the Stokes system. As the Stokes solver subsection does not specify the solver, the model you ran will use the GMG solver.

Can you try modifying the model to use the AMG solver (see the parameter snippet below) and see if that works? If not, we’ll work through other options to produce reliable convergence and results.

Cheers,

John

subsection Solver parameters
  subsection Stokes solver parameters
    set Stokes solver type = block AMG
    set Linear solver tolerance = 1e-8
    set Number of cheap Stokes solver steps = 0

    # A higher restart length makes the solver more robust for large viscosity contrasts
    set GMRES solver restart length = 200
  end
end

Thanks for the guidance, John. I’ve switched the Stokes solver to block AMG as suggested and am re-running the model now. I’ll report back with the results once it completes.

Best,
Ramadhan

I switched to the block AMG Stokes solver and re-ran the case. It successfully passed timestep 470; however, the simulation then terminated at timestep 568 with the same type of error. I’ve attached the relevant log snippet below for reference.

The initial residual was: -nan
The final residual is: -nan
The required residual for convergence is: 3.924943e-07
See output-Brittle_thrust_wedge_exp2_amg/solver_history.txt for the
full convergence history.

The solver reported the following error:

--------------------------------------------------------
An error occurred in line <2092> of file
</home/deal.II-v9.7.0/include/deal.II/lac/solver_gmres.h> in
function
void dealii::SolverGMRES<VectorType>::solve(const MatrixType&,
VectorType&, const VectorType&, const PreconditionerType&) [with
MatrixType = dealii::TrilinosWrappers::SparseMatrix;
PreconditionerType = dealii::TrilinosWrappers::PreconditionILU;
VectorType = dealii::TrilinosWrappers::MPI::Vector]
The violated condition was:
iteration_state == SolverControl::success
Additional information:
Iterative method reported convergence failure in step 0. The residual
in the last step was -nan.

This error message can indicate that you have simply not allowed a
sufficiently large number of iterations for your iterative solver to
converge. This often happens when you increase the size of your
problem. In such cases, the last residual will likely still be very
small, and you can make the error go away by increasing the allowed
number of iterations when setting up the SolverControl object that
determines the maximal number of iterations you allow.

The other situation where this error may occur is when your matrix is
not invertible (e.g., your matrix has a null-space), or if you try to
apply the wrong solver to a matrix (e.g., using CG for a matrix that
is not symmetric or not positive definite). In these cases, the
residual in the last iteration is likely going to be large.

@adhitama - Apologies for the delay in getting back to you about this failing benchmark. I unfortunately won’t have time to work on this in the short term, but here are a few suggestions for what to try next:

  1. Reduce the CFL number (0.5 → 0.25 → 0.125)
  2. Try using particles instead of DG compositional fields to track the composition. An example of how to do this can be found in this cookbook, but for stability I suggest using set Interpolation scheme = cell average.
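As a sketch of suggestion 1 (assuming the benchmark’s parameter file is otherwise left unchanged), the CFL reduction is a one-line change at the top level of the PRM file:

```
# Halve the maximum advection timestep; reduce further (0.125) if the crash persists
set CFL number = 0.25
```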

As an aside, I’ve created a new issue as a reminder that this benchmark needs to be updated.

John

Hi John — thanks for the follow-up.

Reducing the CFL from 0.5 to 0.25 did the trick: the exp_2_high_resolution.prm run now passes timestep 470 and completes without crashing (ASPECT 3.1.0; Stokes solver = block GMG; other settings unchanged).

Appreciate you opening the issue to track updates to this benchmark. Thanks again!

— Ramadhan

Hi @adhitama,

Excellent! Just to confirm: the benchmark runs successfully with only a change of the CFL value from 0.5 to 0.25 (i.e., a switch to the AMG solver is not also required).

If this is correct, I will submit a patch to update the PRM file.

Cheers,

John