Question on setting up Stokes solver parameters using Aspect 3.1.0-pre, solver fails to converge

Good afternoon all,

I have some simulations that I previously ran successfully using Aspect 2.5.0 and I am now trying to run them using Aspect 3.1.0-pre. I am encountering an issue where the Stokes solver fails to converge on the first ‘expensive’ solver step, after a given number of ‘cheap’ solver steps. The simulations are global 3D mantle convection simulations, set up similarly to those in Goldberg & Holt 2024 (https://doi.org/10.1029/2023GC011134), though with different input .txt files for the temperature initial condition and the composition. There is no adaptive refinement based on temperature or viscosity, only on compositional fields (which I use to define regions close to subduction zones for high refinement). I’m using the ‘visco plastic’ material model, and I have a compositional field that defines plate boundaries so that we can impose a weak viscosity layer through the material model.

Here is part of the error message I get (full stacktrace and solver history are attached).

```
An error occurred in line <2844> of file </work2/10752/ghobson/stampede3/software/aspect/source/utilities.cc> in function
void aspect::Utilities::throw_linear_solver_failure_exception(const std::string &, const std::string &, const std::vector<SolverControl> &, const std::exception &, const MPI_Comm, const std::string &)
The violated condition was:
false
Additional information:
The iterative Stokes solver in
StokesMatrixFreeHandlerLocalSmoothingImplementation::solve did not
converge.

The initial residual was: 1.094939e+20
The final residual is: 1.082211e+19
The required residual for convergence is: 1.094939e+17
See
/scratch/10752/ghobson/Subduction_Model_3D/lumi_run/solver_history.txt
for the full convergence history.

The solver reported the following error:

--------------------------------------------------------
An error occurred in line <2844> of file
</work2/10752/ghobson/stampede3/software/aspect/source/utilities.cc>
in function
void aspect::Utilities::throw_linear_solver_failure_exception(const
std::string &, const std::string &, const std::vector<SolverControl>
&, const std::exception &, const MPI_Comm, const std::string &)
The violated condition was:
false
Additional information:
The iterative (bottom right) solver in
BlockSchurGMGPreconditioner::vmult did not converge.

The initial residual was: 9.996222e-01
The final residual is: 2.212388e-04
The required residual for convergence is: 9.996222e-07

The solver reported the following error:

--------------------------------------------------------
An error occurred in line <1338> of file
</work2/10752/ghobson/stampede3/software/deal.II-v9.5.2/deal.II-v9.5.2/include/deal.II/lac/solver_cg.h>
in function
void
dealii::SolverCG<dealii::LinearAlgebra::distributed::Vector<double,
dealii::MemorySpace::Host>>::solve(const MatrixType &, VectorType &,
const VectorType &, const PreconditionerType &) [VectorType =
dealii::LinearAlgebra::distributed::Vector<double,
dealii::MemorySpace::Host>, MatrixType =
aspect::MatrixFreeStokesOperators::MassMatrixOperator<3, 1,
GMGNumberType>, PreconditionerType = dealii::PreconditionMG<3,
dealii::LinearAlgebra::distributed::Vector<double,
dealii::MemorySpace::Host>, dealii::MGTransferMatrixFree<3, double>>]
The violated condition was:
solver_state == SolverControl::success
Additional information:
Iterative method reported convergence failure in step 100. The
residual in the last step was 0.000221239.

This error message can indicate that you have simply not allowed a
sufficiently large number of iterations for your iterative solver to
converge. This often happens when you increase the size of your
problem. In such cases, the last residual will likely still be very
small, and you can make the error go away by increasing the allowed
number of iterations when setting up the SolverControl object that
determines the maximal number of iterations you allow.

The other situation where this error may occur is when your matrix is
not invertible (e.g., your matrix has a null-space), or if you try to
apply the wrong solver to a matrix (e.g., using CG for a matrix that
is not symmetric or not positive definite). In these cases, the
residual in the last iteration is likely going to be large.
```

And here is the part of the parameter file where I set up my solver parameters:

```
set Nonlinear solver scheme    = single Advection, iterated Stokes
set Nonlinear solver tolerance = 1.0e-2
set Max nonlinear iterations   = 30

subsection Solver parameters
  subsection Stokes solver parameters
    set Linear solver A block tolerance                  = 1e-1
    set Linear solver tolerance                          = 1e-3
    set Maximum number of expensive Stokes solver steps  = 4000
    set Number of cheap Stokes solver steps              = 300
    set GMRES solver restart length                      = 200
  end
  set Temperature solver tolerance = 1e-7
  set Composition solver tolerance = 1e-7
end
```

I am wondering if there is either (a) some difference between 2.5 and 3.1 in how I should set up my solver parameters, or (b) something wrong with my Aspect installation.

I am running these simulations on Stampede3 using version 3.1.0-pre (main, 513c8574f), deal.II 9.5.2, Trilinos 15.0.0, p4est 2.8.5, Geodynamic World Builder 1.0.0. I’m running in release mode on one node with 48 MPI processes, but I have run a bigger problem (460 million DOFs on 8 nodes) and encountered the same issue. I have also attached here the steps I took to install Aspect 3 on Stampede3, in case that is helpful.

If anyone has advice for how I should investigate this, I would really appreciate it.

Best regards,

Gabrielle Hobson

ASPECT_Installation_on_Stampede3.txt (9.2 KB)

solver_history.txt (4.6 KB)

lumi.e2254169.txt (15.5 KB)

lumi.o2254169.txt (2.5 KB)

simulation.prm (7.3 KB)

Gabrielle,

You are using the GMG solver. There is a known issue that sometimes occurs when ASPECT switches from cheap to expensive solves: “GMG with DC Picard diverges” (geodynamics/aspect issue #4563 on GitHub).

Therefore, I would suggest using either 0 cheap iterations or 0 expensive iterations and setting the other value to some large number. Can you check whether this fixes your issue?

Generally, I would suggest using only cheap GMG iterations, but that might fail for very hard problems (large viscosity contrasts). A rough sketch of both options is shown below.
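Something like the following in the parameter file (just a sketch based on the Stokes solver parameters block you posted; the iteration counts are placeholders to adjust):

```
subsection Solver parameters
  subsection Stokes solver parameters
    # Option 1: rely only on the cheap GMRES iterations
    set Number of cheap Stokes solver steps             = 2000
    set Maximum number of expensive Stokes solver steps = 0

    # Option 2 (alternative): skip the cheap iterations entirely
    # set Number of cheap Stokes solver steps             = 0
    # set Maximum number of expensive Stokes solver steps = 2000
  end
end
```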

Finally, can you enable Solver/Matrix Free/Output details (documented in the Solver parameters section of the ASPECT 3.1.0-pre manual)? This prints some additional information that might be helpful for diagnosing things.
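In the parameter file that would look roughly like this (a sketch; I am assuming the parameter lives in the Matrix Free subsection of Solver parameters, so please double-check the exact path against the manual):

```
subsection Solver parameters
  subsection Matrix Free
    # Print additional diagnostics about the matrix-free GMG solver
    set Output details = true
  end
end
```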

Hi @ghobson,

A quick follow-up on @tjhei’s message. First, thank you for providing all of the files and the well-formed question.

As @tjhei noted, the issue is definitely with the transition from the cheap to expensive Stokes solver steps, but I believe this is the first time we have seen that with the regular (not defect correction or Newton) Stokes solver.

As to what has changed between 2.5 and the current version, there have been various bug fixes and minor improvements to relevant parts of the code (the Visco Plastic material model, etc.), but nothing fundamentally different that I can recall. It is certainly possible that a small fix or change to the material model or the GMG solver could lead to different requirements for the solver parameters.

For the range of viscosity contrasts you are considering, it is almost always fastest to use the cheap Stokes solves with the GMG (or AMG) solver. I typically assign Number of cheap Stokes solver steps a value of 2000 or 4000 to make sure it never gets to the expensive Stokes iterations.

Please let us know if the suggestions help resolve the issue or if more questions/issues come up!

Cheers,

John

Hi Timo @tjhei and John @jbnaliboff,

Thank you both for your replies.

I have enabled the parameter Solver/Matrix Free/Output details. This prints a few additional pieces of information (shown below), but does not show the residuals as the iterations progress. The file solver_history.txt only gets written once the solver fails to converge (or presumably once it succeeds); it doesn’t help me while jobs are still running or after they have timed out. It would be very helpful to see the progress of the residuals to decide, for example, whether to keep increasing the number of cheap solves. Is there some other way I can enable this?

```
*** Timestep 0:  t=0 years, dt=0 years
   Solving temperature system... 0 iterations.
   Solving refine_near system ... 0 iterations.
   Solving refine_far system ... 0 iterations.
   Solving plbd system ... 0 iterations.
   Solving Stokes system (GMG)...
    GMG coarse size A: 2910, coarse size S: 150
    GMG n_levels: 9
    Viscosity range: 1.001e+19 - 1e+24
    GMG workload imbalance: 3.10084
    Stokes solver: 300+
```

As you suggested, I have increased the number of cheap solves from 300 to 2000, and set the number of expensive solves to 0. This also failed to converge. The solver_history.txt shows a residual of 1.43957e+17 after 2000 iterations, down from 1.09494e+20 initially.

I then increased the number of cheap solves to 4000. The solver_history.txt shows a residual of 1.43337e+17 after 4000 iterations (file is attached). Since this is not a substantial decrease relative to the 2000th iteration, I’m not confident that continuing to increase the number of cheap solves is a feasible approach, not least because the problem I’m testing here has 22 million DOFs, while the actual problem I’d like to solve for my study has ~460 million DOFs.

I have queued a job with the number of cheap solves set to 0 and the number of expensive solves set to 400. It’s spending a while in the queue; I’ll update here when that job has run.

When I ran this same problem back in November 2024 on a different machine, it typically did the 300 cheap solves and then ~100 expensive solves, or fewer at subsequent timesteps.

I do have strong viscosity contrasts, because I’m trying to resolve plate boundaries and subduction zones, with a weak viscosity zone to allow decoupling.

I’ve read these GitHub issue threads: https://github.com/geodynamics/aspect/issues/4563, https://github.com/geodynamics/aspect/issues/6002, and https://github.com/geodynamics/aspect/issues/6046, but I still don’t understand the root cause of the problem that prevents switching from cheap to expensive solves. Does anyone have further information or thoughts about this? I’m concerned that it may not be possible to run these simulations without switching.

If you have further ideas of things I can test, please let me know. I appreciate your help. And I’ll continue updating with any other information I get.

Best regards,

Gabrielle

Hello again,

Just adding the promised information to my previous message: the test with 0 cheap solves and 400 expensive solves also failed. The solver_history.txt file has only 1 line in it:

0 1.09494e+20 

The last part of the log file looks like this:

*** Timestep 0:  t=0 years, dt=0 years
   Solving temperature system... 0 iterations.
   Solving refine_near system ... 0 iterations.
   Solving refine_far system ... 0 iterations.
   Solving plbd system ... 0 iterations.
   Solving Stokes system (GMG)... 
    GMG coarse size A: 2910, coarse size S: 150
    GMG n_levels: 9
    Viscosity range: 1.001e+19 - 1e+24
    GMG workload imbalance: 3.10084
    Stokes solver: 0+

And the error output is attached to this message.

I’m confused about why it fails after 1 iteration rather than continuing for another 399. Does anyone know why that is?

error_output.txt (15.5 KB)

Best,

Gabrielle

Hi @ghobson,

Thank you for quickly following up on all of this. After a quick glance, it looks like one of the inner Stokes solvers did not converge, but @tjhei will be able to give a fully informed reply.

However, after looking at your PRM file again I’m a bit surprised the model is having this much trouble converging. The viscosity contrasts are never more than 4 orders of magnitude, which is normally fine for the GMG solver.

I won’t have a chance to look at this problem in detail until late tomorrow, so in the meantime here are a few ideas for things to test in the short term:

  1. Try using set Nonlinear solver scheme = no Advection, no Stokes, which will set up the initial conditions and run the post-processors. The strain rate provided to the material model on the first nonlinear iteration of the first time step is the reference value (default is 1e-15). This will at least allow you to see if anything looks off in the initial conditions.
  2. Following on point 1, it might be worth trying a different reference strain rate value than 1e-15 to see how that affects convergence.
  3. The Material averaging parameter is currently not defined in the material model section (this is outside of Visco Plastic; it would go right under subsection Material model), which means that the default averaging scheme is being used. Off the top of my head I’m not sure what the default is, but I suggest trying set Material averaging = harmonic average or set Material averaging = geometric average.
  4. I see you have already increased the GMRES solver restart length to 200 for the Stokes system, but perhaps try 300 or 400 to see how much farther you get with the cheap Stokes iterations?
  5. Within subsection Stokes solver parameters the parameter Use full A block as preconditioner is not defined, which means it has the default value of false. Using the full A block preconditioner helps quite a bit for models with large viscosity contrasts, so I suggest setting this parameter to true. A combined sketch of items 3-5 is shown after this list.
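To make those changes concrete, the relevant parameter-file edits could look roughly like the sketch below (the specific values and the choice of averaging scheme are starting points to experiment with, not definitive recommendations):

```
subsection Material model
  # Item 3: set an explicit averaging scheme (try harmonic or geometric)
  set Material averaging = harmonic average

  # ... existing Visco Plastic subsection stays unchanged ...
end

subsection Solver parameters
  subsection Stokes solver parameters
    # Item 4: allow a longer Krylov space before a GMRES restart
    set GMRES solver restart length        = 400

    # Item 5: use the full A block as preconditioner
    set Use full A block as preconditioner = true
  end
end
```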

I’ll try to come back to this sometime tomorrow, but please let us know if any of the above changes help. Thank you again for your patience; we should definitely be able to improve the convergence behavior of this model.

Cheers,

John