Hi,
I am running 3D subduction models on a server using mpirun -np 32. It works very well and relatively fast (3.2 Myr in 16 hours), but at timestep 101 the job is killed and I don't understand why. Below are the last lines of the nohup.out file. Thanks in advance for your help.

Postprocessing:
RMS, max velocity: 0.00813 m/year, 0.0719 m/year
Temperature min/avg/max: 293 K, 1525 K, 1652 K
Heat fluxes through boundary parts: 0 W, 0 W, 0 W, 0 W, 0.003238 W, 6.821e+10 W
Topography min/max: 0 m, 0 m
Number of advected particles: 41

Number of active cells: 920,487 (on 8 levels)
Number of degrees of freedom: 50,298,427 (24,615,201+1,068,025+8,205,067+8,205,067+8,205,067)

*** Timestep 101: t=3.19255e+06 years, dt=35871.7 years
Solving temperature system… 1 iterations.
Solving oceanic_crust system … 4 iterations.
Solving continental_crust system … 3 iterations.
Rebuilding Stokes preconditioner…
Solving Stokes system… --------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

mpirun noticed that process rank 15 with PID 0 on node winterc exited on signal 9 (Killed).

Can you post the actual error message? If you ran this job on a cluster with a submission system, there’s usually a separate error or log file that shows why the job terminated. That file will tell you what caused the model to crash (for example, maybe the Stokes solver did not converge).

Hi Juliane
Thank you for your reply. On our small server I am not running the simulation through a queue, but directly under my user account with nohup mpirun -np 34 …
I have checked the log.txt file created in the output directory for the solution, and it does not provide any information about a possible error (convergence…). Perhaps I would get more information running in debug mode instead of release? These are the last 25 lines of the log.txt file:
*** Timestep 100: t=3.15668e+06 years, dt=35776.4 years
Solving temperature system… 1 iterations.
Solving oceanic_crust system … 4 iterations.
Solving continental_crust system … 3 iterations.
Rebuilding Stokes preconditioner…
Solving Stokes system… 174+0 iterations.

Postprocessing:
RMS, max velocity: 0.00813 m/year, 0.0719 m/year
Temperature min/avg/max: 293 K, 1525 K, 1652 K
Heat fluxes through boundary parts: 0 W, 0 W, 0 W, 0 W, 0.003238 W, 6.821e+10 W
Topography min/max: 0 m, 0 m
Number of advected particles: 41

Number of active cells: 920,487 (on 8 levels)
Number of degrees of freedom: 50,298,427 (24,615,201+1,068,025+8,205,067+8,205,067+8,205,067)

*** Timestep 101: t=3.19255e+06 years, dt=35871.7 years
Solving temperature system… 1 iterations.
Solving oceanic_crust system … 4 iterations.
Solving continental_crust system … 3 iterations.
Rebuilding Stokes preconditioner…
Solving Stokes system…

Ana: Yes, if you encounter an error, always try again in debug mode. This might be painful in your case because debug mode is so much slower, but it is quite likely (I’m hoping at least) that you will actually get to see what the underlying error is.
Best
W.

Given that debug mode often takes about 10 times longer than release mode, and you mention that the model already took 16 hours in release mode, this might take quite some time.

So here is an alternative I would suggest trying first: restart from your last checkpoint (I hope you used checkpointing?) and run ASPECT without nohup. That will hopefully take only a few minutes, and it should give you the actual error message (which then may or may not be useful, but at least you don't have to wait for a week…).
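In case it helps, checkpointing is enabled with a few lines like the following. This is only a sketch; the exact parameter names and values should be checked against the manual for your ASPECT version:

```
subsection Checkpointing
  # Write a checkpoint every 50 timesteps
  set Steps between checkpoint = 50
end

# To restart from the last checkpoint, additionally set:
set Resume computation = true
```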

Hi Juliane,
Thank you. I did as you suggested but still didn't obtain a clear error message. I think the problem is a lack of RAM, as I have more than 50 million DoFs with my 3 (global) x 4 (adaptive) mesh refinement, while the model worked very well with a less dense grid of 3 (global) x 3 (adaptive). I've now tried 2 (global) x 4 (adaptive) and it works fine. Just a quick question: if I understand the manual correctly, when using a nonlinear rheology (viscoplastic) I need to use an iterative Stokes solver, and since I have strong viscosity contrasts (weak zones to decouple the 3 plates) it is better to use 0 cheap Stokes steps. Is this right?
Thanks again
Ana

Ana:
Yes, memory issues might be the cause. If you have 50M unknowns on 34 MPI processes, then that means ~1.5M unknowns per process. It takes somewhere in the range of 3-5kB per unknown, so you end up with needing 4-7 GB per process. It would not surprise me if your machines do not have that much: Most clusters have either 2 or 4 GB per processor core. That would also explain the output you see: It happens in the first step after a mesh refinement operation (in which I assume that the number of degrees of freedom has increased), and the job is terminated at the first operation where we allocate memory above what the machine has.

If you have models this large, you just need to use more MPI processes. I think a good number of processes to shoot for is so that every process has in the range of 100,000 to 200,000 DoFs.

I agree with Wolfgang's suggestion: 300,000 DoFs per MPI process is normally the rough maximum I use, and 100,000-200,000 is the ideal range to shoot for.
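For the model above, that target range works out to roughly (rounding the DoF count to 50.3 million):

\[
\frac{5.03 \times 10^{7}\ \text{DoFs}}{2 \times 10^{5}\ \text{DoFs/process}} \approx 250
\qquad \text{to} \qquad
\frac{5.03 \times 10^{7}\ \text{DoFs}}{1 \times 10^{5}\ \text{DoFs/process}} \approx 500\ \text{MPI processes},
\]

so far more than the 34 processes currently in use.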

If I understand the manual correctly, when using a nonlinear rheology (viscoplastic) I need to use an iterative Stokes solver, and since I have strong viscosity contrasts (weak zones to decouple the 3 plates) it is better to use 0 cheap Stokes steps. Is this right?

Yes, if you are using a nonlinear rheology (dislocation creep, Peierls creep, plasticity), an iterative Stokes solver is necessary to accurately solve the system of equations. We have found that using no cheap Stokes steps for the linear solver is fastest if pressure-dependent plasticity is used (i.e., if the friction angle > 0), but I suggest running a few tests to see what number of cheap Stokes solves produces the fastest run times!

Also, I suggest using the nonlinear solver scheme that employs the defect-correction Picard method: set Nonlinear solver scheme = single Advection, iterated defect correction Stokes

You can also potentially get a fair bit of speedup by using the Eisenstat Walker method in combination with the defect correction Picard Stokes solver. An example of how to use this is below.

subsection Solver parameters
  subsection Stokes solver parameters
    set Number of cheap Stokes solver steps = 0
    set Linear solver tolerance             = 1e-7
  end
  subsection Newton solver parameters
    set Maximum linear Stokes solver tolerance           = 1e-2
    set Use Eisenstat Walker method for Picard iterations = true
  end
end

Hi,
Many thanks, Wolfgang and John, for your advice. I'll try these methods (defect-correction Picard combined with Eisenstat Walker). Thanks also for the example.
Ana

In case increasing the number of cores is not an option, something else you could try (if you’re not doing that already, and if that works for your models) is using the new GMG solver. That should not require as much memory and is also faster in many cases. With the GMG, we have found that increasing the number of cheap iterations by a lot (for example to 1000) can be useful (since the expensive iterations aren’t that much better), but I haven’t run any models with a plastic rheology yet.
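To make this concrete, switching to the GMG solver looks roughly like the following. This is a sketch from memory, not a tested setup: the matrix-free GMG solver also requires averaging of material properties, and the parameter names and averaging choice should be verified against the manual for your ASPECT version:

```
subsection Solver parameters
  subsection Stokes solver parameters
    set Stokes solver type                  = block GMG
    # With GMG, many cheap iterations can be worthwhile
    set Number of cheap Stokes solver steps = 1000
  end
end

subsection Material model
  # The matrix-free GMG solver requires material averaging
  set Material averaging = harmonic average
end
```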

Hi Juliane,
Thank you for this advice. I’ll try this GMG solver. I can also increase the number of cores using other clusters, but they are shared by so many users that they are not always a good alternative. I’ll let you know how it works with a plastic rheology.
Cheers
Ana