Problem with restarting 3D model

Mainak · August 17, 2022, 4:47am

Hello,
I am doing 3D box model of subduction. Using checkpoints to restart the model some structural errors come.

This is before restarting the model. It shows a composition which is composed of particles by “initial composition” particle property.

This one is the same composition after restarting from checkpoint. Please let me know what can be done to rectify this.

Thanks,
Mainak

bangerth · August 17, 2022, 3:54pm

Mainak,
I don’t know what the underlying cause is, but let’s poke around a bit first:

What version of ASPECT are you using?
Are you running in debug or release mode?
How many processes are you using?
When you visualize the partitioning between processes, does that look similar to the artifacts you see?

Best
W.

Mainak · August 18, 2022, 7:06am

Hi Wolfgang. To answer your questions,
I am using ASPECT 2.3
I am running in release mode with 120 processors.
I suspected that some of the processors may not be able to reload the matrices properly and that may lead to this. But how do I visualize the partitioning between the processes?

Thanks,
Mainak

bangerth · August 18, 2022, 5:26pm

Mainak:
it’s useless to speculate what precisely goes wrong if you encounter a problem in release mode. Run the simulation in debug mode and check again whether you get the same result. More often than not, you will have gotten an error at some point or other, and the problem you see is just a downstream consequence of an error that happened earlier but that you didn’t see because you ran in release mode.

As for the partitioning: It’s one of the visualization postprocessors you can select in the input file.

Best
W.

Mainak · August 22, 2022, 12:01pm

A few observations. I am using two custom plugins for my models. I am also doing 2d subduction modeling using these plugins.

When I compile the plugins in release mode no error or warning shows up. In 2d when restarting from checkpoints in release mode no such structural errors are observed.
When I compile the plugins in debug mode no error or warning shows up as well. But whether I try to run 2d or 3d model in debug mode I get this kind of error at the start.

I restarted the 3d model in release mode this time with partition visualization parameter on. Then overlayed the partition and the structural anomaly but there is not much correlation between them.

Now I am not sure why the model won’t even start in debug mode when compiling the plugins no error comes up, and same plugins compile and run in release mode. I gave a sample run in debug mode which doesn’t use custom plugins and it runs ok, meaning the issue is not with the installed debug build.

Mainak

bangerth · August 23, 2022, 5:43pm

Mainak:
I don’t know what to suggest. Does your model run on one process in debug mode? If necessary with fewer mesh refinement steps? You might have to find out what causes the error, but I continue to think that whatever happens in release mode is not worth chasing until you know that things work correctly in debug mode – this is what debug mode is there for.
Best
W.

Mainak · August 24, 2022, 9:27am

I understand. Finding the issue would take trial and error and be a bit like taking shots in the dark since no error comes up while compiling these plugins. But yes, it should run in debug mode, of course, I’ll work on that.

Thanks,
Mainak

Mainak · September 2, 2022, 7:20am

I may have found the issue why it was not running in debug mode. When I ran with one processor this error came up.

In the calculation of one of my “out.reaction_terms” there is a term “this->get_timestep()”. When I am trying to print its value in debug mode by this line…

this->get_pcout() << " \n " << this->get_timestep();

…it always shows this error, even when I am making all reaction terms zero in the calculation. By removing all traces of “this->get_timestep()” from the code makes it run in debug mode.
But when I try to print this value in release mode it shows “nan” values in the beginning, then it shows zero after some time, and runs further steps as usual.

That is for when I used constant viscosity. For using dislocation creep with this code

    const bool use_reference_strainrate = (this->get_timestep_number() == 0) &&
                                          (strain_rate.norm() <= std::numeric_limits<double>::min());

    double edot_ii;
    if (use_reference_strainrate)
      edot_ii = ref_strain_rate;
    else
    edot_ii = std::max(std::sqrt(std::max(-second_invariant(deviator(strain_rate)), 0.)),
                         min_strain_rate);

It runs in release mode but not in debug mode.

Here the first column values are printed values of “this->get_timestep_number()” in the first time step.

bangerth · September 2, 2022, 2:02pm

Mainak:
I believe that the issue you are running into is the following: When you are computing material properties in the first time step, the velocity has not been computed yet, and as a consequence the time step size has not been computed yet. To make sure people don’t inadvertantly introduce bugs in their program, the get_time_step() function returns NaN in such cases: You just can’t use the time step size before it has been computed.

Is this happening in a plugin you wrote yourself, or are you just using ASPECT’s own plugins?

Best
W.

Mainak · September 2, 2022, 3:13pm

Yes, that is true all the issues come up in the first timestep. This is a material model plugin that I have written. I have taken existing formulations from separate plugins and modified them in order to do so. I understand get_timestep() function can’t return a value there, but get_timestep_number() function also returns a value like 4249… instead of 0 at the beginning of the code. This messes up the condition check for edot_ii calculation. Though after that initial period get_timestep_number() returns usual values 0, 1, 2… in subsequent timesteps.

Not only that, in.temperature and in.pressure sometimes return 0 value at the first timestep, though this issue does not recur afterward. I can make a condition to correct for 0 values of temperature and pressure by providing dummy values in the first timestep for material value calculations. Somehow all these issues were sidetracked in release mode. But what should be done to get past the beginning values of get_timestep() and get_timestep_number(). Should I state a check isnan() for get_timestep() and continue calculations by imposing a dummy value for the first timestep.

Mainak

bangerth · September 2, 2022, 3:57pm

Mainak,
the material model is also called before the first time step, for example to compute the adiabatic profile. At that point, the time step size is not available yet, and get_time_step() returns numbers::invalid_unsigned_int (which has a numeric representation of 4269...).

The thing to do is that if get_time_step()==numbers::invalid_unsigned_int you can simply not access the time step length, and anything that relies on the time step length should not be computed. If you want to be extra safe, put numbers::signaling_nan<double>() into all fields that you cannot compute to make sure that these fields cannot be used for anything else downstream.

Best
W.

Mainak · September 2, 2022, 4:16pm

I understand now. Thank you I will implement this.

Mainak

Topic		Replies	Views
3D model restart issues ASPECT	9	106	April 8, 2025
Particle-advection error occurs only in debug mode but not in release mode ASPECT	2	9	July 30, 2025
Problems restarting model when using mesh coarsening and refinement ASPECT	3	81	April 18, 2024
ASPECT stops at same time steps ASPECT	2	30	September 9, 2024
Problems in session 3 of CIG ASPECT tutorials ASPECT	9	257	July 24, 2024

Problem with restarting 3D model

Related topics