High and inconsistent memory usage of advection solver

Hi forum,

I’ve been using ASPECT for a while now for complex models. I hadn’t run into any particular memory issues previously, but once I added the compositional fields noninitial_plastic_strain and plastic_strain, memory usage went through the roof.

I did some experiments with different resource combinations but had to stop at 400 cores and 2 GB of RAM per core. Even in that case my job ran out of RAM, which is rather surprising, as the model had only ~20 million DOFs at most. Previously, before adding those two compositional fields, I was using 180 GB in total for roughly the same number of DOFs. According to Slurm’s seff tool, my jobs were peaking at 70-90% of the allocated RAM.

The out-of-memory crashes happen specifically in the advection solver, not the Stokes solver, which is even more mysterious to me. There is no further pattern beyond that: the crash can happen while solving for any of the compositional fields.

Is the described behavior expected, a known issue, or something unexpected, like a possible memory leak?

best,
Leevi

@leevit,

No particular memory issues have been around previously but once I added compositional fields noninitial_plastic_strain and plastic_strain, memory usage went through the roof.

That’s certainly odd, I’ve never encountered that issue and don’t recall anyone else reporting a similar issue.

Can you post the full PRM file or simplified version of it? Likewise, can you also post the log.txt and statistics files for one of the model runs?

Ideally, I would simplify the PRM file as much as possible to see if you can reproduce the issue for different levels of complexity.

Cheers,
John

Hi Leevi,

I have a similar issue when using the plastic_strain and viscous_strain fields. I have a smaller model, ~500 thousand DOFs in total across 5 compositional fields. Initially, the memory usage is reasonable, but it gradually increases, and within a few hours the job fills all the available memory.

Do you think it might be the same problem? Did you find a solution to the memory issue in your models?

Best,

Petra

What are the number and types of compositional fields? Can you turn on the “memory statistics” postprocessor and report what it shows (and the output about the degrees of freedom on screen)?
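For reference, the postprocessor can be enabled with something like the following in the PRM file; the other entries in the list are placeholders for whatever postprocessors a model already uses:

```
subsection Postprocess
  # Append "memory statistics" to your existing postprocessor list:
  set List of postprocessors = memory statistics
end
```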

@maipe - In addition to the suggestion from @tjhei, would you mind posting your PRM file as well for testing?

Thanks,

John

Hi,

The input file is attached. I can simplify the setup further if needed. In principle, it is a rifting model with several compositional fields and a visco-plastic rheology that depends on the plastic strain and viscous strain.

From the statistics file, it seems that only the "Peak virtual memory usage (VmPeak) (MB)" column increases.
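For anyone who wants to track how such a column evolves over the run: a minimal Python sketch, assuming the usual ASPECT statistics layout (comment header lines starting with `#` that number the columns, then whitespace-separated data rows). The function name and the exact header wording matched against are illustrative, not part of ASPECT itself.

```python
import re

def read_statistics_column(path, name_substring):
    """Return the values of the column of an ASPECT-style statistics file
    whose header description contains name_substring.

    Assumes header lines like '# 12: Peak virtual memory usage (VmPeak) (MB)'
    followed by whitespace-separated data rows.
    """
    col = None
    values = []
    with open(path) as f:
        for line in f:
            if line.startswith("#"):
                # Header line: '# <1-based column number>: <description>'
                m = re.match(r"#\s*(\d+):\s*(.*)", line)
                if m and name_substring in m.group(2):
                    col = int(m.group(1)) - 1  # convert to 0-based index
            elif line.strip() and col is not None:
                values.append(float(line.split()[col]))
    return values
```

Plotting those values against the time step number makes a steady VmPeak growth (as opposed to a one-off spike) easy to spot.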

Thanks a lot for looking into this issue!

Petra

input-memory.prm (8.0 KB)

@maipe - Thank you for sending over the PRM file. It seems like a number of collaborators and students are also running into this issue.

I’ll try to do a bit of testing tomorrow to see if I can isolate where the issue is, as it appears to also occur in models that do not track plastic or viscous strain on particles.

Thank you again for posting the PRM file and the issue to the forum.

John

OK, thank you!

@maipe - A quick update - we discussed the issue at the ASPECT user meeting on Monday, and @tjhei was able to identify a potential cause of the issue using a model with particles. One of us will send an update after a bit more testing. Thank you again (and to @leevit) for identifying the issue and posting it to the forum.

Hi all,

I have encountered a similar issue when running a 3D model. Even though my single compute node has 1 TB of memory, I have had to restart the simulation multiple times due to memory problems after just a few time steps. I performed some basic memory leak checks using Valgrind but have not investigated the problem in depth yet.

Looking forward to any updates or insights!

I created an issue to track the progress of the fix: Memory Leak in certain ASPECT simulations · Issue #6874 · geodynamics/aspect · GitHub


Thanks for looking into this everyone!

@maipe I sort of forgot about this issue, because I moved to a high-memory cluster right after creating this thread and was able to cope with it there. Ultimately, I would probably have run into the memory leak later this spring while running production models with >50 million DOFs.

My models use fields instead of particles to track the strain, but as far as I understand, the fix should help in that case as well.

@tjhei was able to find the reason for the memory usage and devised a fix in Fix memory leak in NonMatching::MappingInfo by tjhei · Pull Request #19328 · dealii/dealii · GitHub and work around deal.II memory leak by tjhei · Pull Request #6877 · geodynamics/aspect · GitHub .

The fix is merged into the latest version of deal.II and a workaround was merged into ASPECT (for older deal.II versions). Could some of you update to the latest ASPECT version and report back if the issue is solved?

Thanks for diving into this and providing the information needed to fix the issue.


Hi @gassmoeller and @tjhei ,

Thank you very much for your continued support and for the fix you provided.
I have updated my code to the latest version including your fixes, but unfortunately, the program still crashes unexpectedly.

For a 2D example, I initially see memory usage below 150 GB at the start, but after running a few thousand time steps, it eventually exceeds my local node’s 1 TB memory limit and causes a crash. I am not sure if this kind of increase in memory consumption is expected behavior or if it indicates another underlying issue.

Would it be helpful if I provide additional debug information or logs from my runs? Please let me know if there is anything specific you’d like me to test or collect.

Thanks again for your help!

@tiannh7 - if you are using candi, did you remember to fetch deal.II from the master branch? By default, candi downloads deal.II version 9.7.0, unless a different version is specified via the DEAL_II_VERSION parameter in the candi.cfg or local.cfg file.
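Concretely, that would be a line like the following in candi.cfg (or as an override in local.cfg); the exact file contents will vary per setup, so treat this as a sketch:

```
# Pin candi to the deal.II development branch instead of the default release:
DEAL_II_VERSION=master
```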

I now have the most recent commits of ASPECT & deal.II installed, but I haven’t been able to benchmark memory usage yet.

@tiannh7 @leevit @gassmoeller @tjhei - we found that the recent patches significantly delayed the time until the memory spike, but it still eventually occurred. (Edit: this test used the patch in ASPECT, but not the one in deal.II. We will test that next.)

Should we move this discussion over to the corresponding ASPECT issue (#6874) to discuss further, share results and logs, etc.?

Thanks again Timo and Rene for your work on this, and @leevit and @tiannh7 for testing!

-John

Yes, please post your findings in the issue on GitHub.

Please confirm whether this is fixed by the deal.II patch first.