Problem running continental extension cookbook on recently installed version of ASPECT

Hi,

Recently I’ve been working with the ARCHER2 help desk and tried installing the latest version of ASPECT on the ARCHER2 supercomputer (https://www.archer2.ac.uk/), but I’m having a problem with the latest version of ASPECT (ver. 2.6-pre, f75d95afe).

Firstly, I tried installing ASPECT 2.5 (bafd9df3e) from the geodynamics/aspect repository on GitHub (at aspect-2.5). Long story short, after a lot of back and forth with the helpdesk I managed to get this version of ASPECT working, yay. Attached are the working instructions for installing ASPECT (ver. 2.5, bafd9df3e) on ARCHER2 (without LAPACK or BLAS).

Subsequently, I tried updating to the main branch (2.6-pre, f75d95afe when I last tried), and after finding out how to install SUNDIALS, the installation went fine with no error messages (the installation instructions I used are again attached). However, when I started running a handful of cookbook models, it became noticeable that they were taking significantly longer than they should. To test what was wrong, I ran the continental extension cookbook model to 10 Myr on the working version of 2.5 and on this recently installed version of 2.6-pre. Overall, both models look relatively similar (images below), so nothing drastically unexpected happened, but ver. 2.5 took ~21 minutes while 2.6-pre took ~250 minutes on one node (128 cores). After looking at the timing output, it became apparent that something wasn’t right (highlighted in bold are the key differences between the two models):

Using ASPECT 2.5:
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |   1.3e+03s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |       502 |      9.24s |      0.71% |
| Assemble Stokes system Picard   |       580 |      10.6s |      0.82% |
| Assemble Stokes system rhs      |        78 |      1.23s |         0% |
| Assemble composition system     |      2510 |      46.9s |       3.6% |
| Assemble temperature system     |       502 |      12.4s |      0.95% |
| Build Stokes preconditioner     |       580 |      21.7s |       1.7% |
| Build composition preconditioner|      2508 |      2.12s |      0.16% |
| Build temperature preconditioner|       502 |     0.436s |         0% |
| Initialization                  |         1 |       1.1s |         0% |
| Mesh deformation                |       502 |      17.1s |       1.3% |
| Mesh deformation initialize     |         2 |      1.58s |      0.12% |
| Postprocessing                  |       501 |      35.8s |       2.8% |
| Refine mesh structure, part 1   |         1 |    0.0429s |         0% |
| Refine mesh structure, part 2   |         1 |     0.056s |         0% |
| Setup dof systems               |         2 |      1.98s |      0.15% |
| Setup initial conditions        |         2 |     0.294s |         0% |
| Setup matrices                  |       502 |      64.5s |         5% |
| Solve Stokes system             |       580 |  1.05e+03s |        81% |
| Solve composition system        |      2508 |      15.5s |       1.2% |
| Solve temperature system        |       502 |      2.99s |      0.23% |
+---------------------------------+-----------+------------+------------+

Using ASPECT 2.6-pre:
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |  1.51e+04s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |       502 |      8.91s |         0% |
| Assemble Stokes system Picard   |      3998 |      71.9s |      0.48% |
| Assemble Stokes system rhs      |      3496 |      59.1s |      0.39% |
| Assemble composition system     |      2510 |      61.2s |      0.41% |
| Assemble temperature system     |       502 |      12.5s |         0% |
| Build Stokes preconditioner     |      3998 |       147s |      0.98% |
| Build composition preconditioner|      2508 |      2.03s |         0% |
| Build temperature preconditioner|       502 |     0.411s |         0% |
| Initialization                  |         1 |      1.07s |         0% |
| Mesh deformation                |       502 |      17.8s |      0.12% |
| Mesh deformation initialize     |         2 |      1.21s |         0% |
| Postprocessing                  |       501 |      43.5s |      0.29% |
| Refine mesh structure, part 1   |         1 |    0.0369s |         0% |
| Refine mesh structure, part 2   |         1 |    0.0441s |         0% |
| Setup dof systems               |         2 |      1.58s |         0% |
| Setup initial conditions        |         2 |     0.164s |         0% |
| Setup matrices                  |         2 |     0.264s |         0% |
| Solve Stokes system             |      3998 |  1.46e+04s |        97% |
| Solve composition system        |      2508 |      15.5s |       0.1% |
| Solve temperature system        |       502 |      2.93s |         0% |
+---------------------------------+-----------+------------+------------+
Full outputs are attached as well.

My main question is: what’s causing this, and is there a fix? I noticed that the main branch now lists SUNDIALS as a requirement, so could it be that SUNDIALS isn’t set up correctly? Since ARCHER2 is a Cray system, we’re avoiding installing and configuring LAPACK and BLAS; could not having LAPACK and BLAS be another potential issue here?

Any help is much appreciated.

Thanks,

Luke


installing_aspect_archer2.pdf (90.9 KB)
slurm_ver.2.6.txt (2.6 MB)
slurm_ver.2.5.txt (1.7 MB)

Luke:
The difference between the two logs is that in the 2.5 version, ASPECT only ever runs a single nonlinear Stokes iteration per time step after the first time step, whereas the 2.6 version runs ~8 on average. Because the Stokes assembly+solver is the most expensive part of the overall run time, this explains the ~10x slowdown.
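As a rough check of this reading, the sketch below recomputes the ratios from the two timing summaries posted above. It is only arithmetic on the quoted numbers; treating the 502 calls to "Assemble temperature system" as the number of time steps is an assumption, and the dictionary keys are made up for illustration.

```python
# Values copied from the two timing summaries earlier in this thread.
# Assumption: 502 calls to "Assemble temperature system" = number of time steps.
runs = {
    "2.5": {
        "time_steps": 502,
        "stokes_solves": 580,      # "Solve Stokes system" call count
        "stokes_solve_s": 1.05e3,  # "Solve Stokes system" wall time
        "total_s": 1.3e3,          # total wallclock time
    },
    "2.6-pre": {
        "time_steps": 502,
        "stokes_solves": 3998,
        "stokes_solve_s": 1.46e4,
        "total_s": 1.51e4,
    },
}


def summarize(run):
    """Return (Stokes solves per time step, seconds per Stokes solve)."""
    solves_per_step = run["stokes_solves"] / run["time_steps"]
    seconds_per_solve = run["stokes_solve_s"] / run["stokes_solves"]
    return solves_per_step, seconds_per_solve


for name, run in runs.items():
    per_step, per_solve = summarize(run)
    print(f"{name}: {per_step:.2f} Stokes solves per time step, "
          f"{per_solve:.2f} s per solve")

slowdown = runs["2.6-pre"]["total_s"] / runs["2.5"]["total_s"]
print(f"overall slowdown: {slowdown:.1f}x")
```

The per-solve time is printed alongside the iteration count because the total cost is the product of the two, so both are worth checking when comparing versions.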

I don’t know whether the issue with the number of nonlinear iterations is because of a difference in input file, because we fixed a bug, because we introduced a bug, because we changed the default for a parameter you do not explicitly list in the input file, or something else. But that is the starting point for where you need to look.

Best
W.

Hi,

Thanks a lot for the quick response! I figured there was a problem with the number of Stokes iterations, so thanks for the info and for confirming it.

I ran a diff on the input files I used and there’s no difference between the two models I ran, just the version of ASPECT that was used. Likewise, the only differences between the models I ran and the continental extension cookbook model were the name of the output folder and the end time.

I also attached the .prm files.

I’m not sure what else I should try, so if you have any ideas or troubleshooting tips let me know.

Thanks again,

Luke

FYI: continental_extension.prm = continental extension cookbook model
cookbook_extension_model_2 = continental extension cookbook model run until 10 Myr on ver. 2.6
cookbook_extension_model_3 = continental extension cookbook model run until 10 Myr on ver. 2.5

cookbook_extension_model_2.prm (13.5 KB)
continental_extension.prm (13.5 KB)
cookbook_extension_model_3.prm (13.5 KB)

Hi Luke,

That is really strange behavior, and I’m not sure what could be causing the difference.

Here are some options for how to proceed with testing to diagnose the issue:

  1. Modify the PRM file to have an end time of 0 and only allow for a maximum of 1 nonlinear iteration. This should run quickly, and then you can see if there is any noticeable difference in the wall clock time for the Stokes solver.
  2. Instead of the defect correction Stokes, try using just regular Picard iterations for the nonlinear solver: set Nonlinear solver scheme = single Advection, iterated Stokes. This test should pinpoint whether the defect correction Stokes solver behaves differently between the two versions.
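The two test variants in the list above can be generated from one base input file rather than edited by hand. The following is a minimal sketch, assuming the three parameters (End time, Max nonlinear iterations, Nonlinear solver scheme) are set at the top level of the .prm file; the helper `override_prm` and the sample input are hypothetical, and the parser ignores comments and line continuations.

```python
def override_prm(text, overrides):
    """Return prm text with top-level `set <name> = <value>` entries replaced.

    Only lines outside any `subsection ... end` block are touched. Overrides
    that are not found at the top level are appended at the end of the file.
    """
    depth = 0
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("subsection "):
            depth += 1
        elif stripped == "end":
            depth -= 1
        elif depth == 0 and stripped.startswith("set ") and "=" in stripped:
            # Normalize whitespace in the parameter name before comparing.
            name = " ".join(stripped[4:].split("=", 1)[0].split())
            if name in overrides:
                line = f"set {name} = {overrides[name]}"
                seen.add(name)
        out.append(line)
    for name, value in overrides.items():
        if name not in seen:
            out.append(f"set {name} = {value}")
    return "\n".join(out) + "\n"


# Illustrative input only, not the actual cookbook file.
SAMPLE_PRM = """\
set End time = 5e6
set Nonlinear solver scheme = single Advection, iterated defect correction Stokes
subsection Geometry model
  set Model name = box
end
"""

# Test 1: a single time step with a single nonlinear iteration.
TEST_1 = {"End time": "0", "Max nonlinear iterations": "1"}
# Test 2: plain Picard iterations instead of defect correction.
TEST_2 = {"Nonlinear solver scheme": "single Advection, iterated Stokes"}

print(override_prm(SAMPLE_PRM, TEST_1))
print(override_prm(SAMPLE_PRM, TEST_2))
```

Writing each variant to its own file and output directory keeps the runs on the two ASPECT versions directly comparable with a plain diff.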

Do you by chance also have access to a separate computer on which you can run these and the prior tests with the same versions of ASPECT, to confirm whether this finding is reproducible across different OS/machines?

Cheers,
John

Hi,

Thanks for the advice! I’ve had a bit of a play with the parameters you suggested to see if I can find out what’s causing the problem:

  1. I first ran a model (on ver. 2.5 and 2.6) where the end time = 0 and then another model where the end time = 0 and Max nonlinear iterations = 1. The end time = 0 model had a very similar wallclock time on both 2.5 (62s total) and 2.6 (63.6s total) with an identical number of calls. Likewise, the end time = 0 and Max nonlinear iterations = 1 model had a very similar wallclock time on 2.5 (18.9s total) and 2.6 (10s total) with an identical number of calls. The full slurm outputs for all the models I talk about in this post are also attached.

  2. I next changed the nonlinear solver scheme from single Advection, iterated defect correction Stokes to single Advection, iterated Stokes. On 2.6, the iterated Stokes model took much less time to run than the iterated defect correction Stokes model, with a runtime similar to that of ASPECT ver. 2.5 (966s on 2.6 and 1025s on 2.5), and gave very similar looking outputs. So the main problem appears to be with iterated defect correction Stokes, as I’ve had no major differences using the other solver scheme.

Version 2.6 - Continental extension cookbook model changed to using single Advection, iterated Stokes:
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |       966s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |       575 |      10.3s |       1.1% |
| Assemble composition system     |      1260 |        31s |       3.2% |
| Assemble temperature system     |       252 |      6.66s |      0.69% |
| Build Stokes preconditioner     |       575 |      20.6s |       2.1% |
| Build composition preconditioner|      1258 |      1.05s |      0.11% |
| Build temperature preconditioner|       252 |     0.256s |         0% |
| Initialization                  |         1 |      1.11s |      0.11% |
| Mesh deformation                |       252 |      9.15s |      0.95% |
| Mesh deformation initialize     |         2 |       4.8s |       0.5% |
| Postprocessing                  |       251 |      71.2s |       7.4% |
| Refine mesh structure, part 1   |         1 |     0.166s |         0% |
| Refine mesh structure, part 2   |         1 |     0.134s |         0% |
| Setup dof systems               |         2 |      5.69s |      0.59% |
| Setup initial conditions        |         2 |     0.389s |         0% |
| Setup matrices                  |         2 |     0.315s |         0% |
| Solve Stokes system             |       575 |       795s |        82% |
| Solve composition system        |      1258 |      7.58s |      0.78% |
| Solve temperature system        |       252 |      1.47s |      0.15% |
+---------------------------------+-----------+------------+------------+

Version 2.5 - Continental extension cookbook model changed to using single Advection, iterated Stokes:
+---------------------------------------------+------------+------------+
| Total wallclock time elapsed since start    |  1.03e+03s |            |
|                                             |            |            |
| Section                         | no. calls |  wall time | % of total |
+---------------------------------+-----------+------------+------------+
| Assemble Stokes system          |       548 |      9.97s |      0.97% |
| Assemble composition system     |      1260 |      22.9s |       2.2% |
| Assemble temperature system     |       252 |      4.88s |      0.48% |
| Build Stokes preconditioner     |       548 |      18.5s |       1.8% |
| Build composition preconditioner|      1258 |         1s |         0% |
| Build temperature preconditioner|       252 |     0.265s |         0% |
| Initialization                  |         1 |      1.47s |      0.14% |
| Mesh deformation                |       252 |      8.92s |      0.87% |
| Mesh deformation initialize     |         2 |      2.49s |      0.24% |
| Postprocessing                  |       251 |      79.6s |       7.8% |
| Refine mesh structure, part 1   |         1 |     0.224s |         0% |
| Refine mesh structure, part 2   |         1 |    0.0688s |         0% |
| Setup dof systems               |         2 |      3.11s |       0.3% |
| Setup initial conditions        |         2 |     0.336s |         0% |
| Setup matrices                  |         2 |     0.302s |         0% |
| Solve Stokes system             |       548 |       860s |        84% |
| Solve composition system        |      1258 |      7.78s |      0.76% |
| Solve temperature system        |       252 |      1.48s |      0.14% |
+---------------------------------+-----------+------------+------------+

Unfortunately, and it’s a bit of a pain, I don’t have a separate computer that I can readily install ASPECT on. My original post did include information on how I installed ASPECT as well as the system I installed it on, if that’s of any help.

For now I’m still working on my project as normal; I’m just avoiding single Advection, iterated defect correction Stokes while I work on adding/changing parameters. If you want me to try any more tests I can, and if you have any idea what’s causing ASPECT to act like this, let me know.

Thanks,

Luke

slurm-2.6_End time = 0.txt (30.4 KB)
slurm-2.5_End time = 0.txt (30.1 KB)
slurm-2.6_End time = 0 - Max nonlinear iterations = 1.txt (8.3 KB)
slurm-2.5_End time = 0 - Max nonlinear iterations = 1.txt (8.1 KB)
slurm-2.6_single Advection, iterated Stokes.txt (880.2 KB)
slurm-2.5_single Advection, iterated Stokes.txt (875.1 KB)

Hi Luke,

Thanks for running these additional tests. As a side note, 128 processors is a bit too many for a problem of this size (100-200K DOF), and the models will probably run faster using 8 or 16 processors. My recollection is that for 2D you never want to go below 10K DOF per processor (or something along those lines).
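To make that rule of thumb concrete, here is a small sketch of the arithmetic. The 10K DOF per processor threshold is the recollection quoted above, not a measured value, and the function name is made up for illustration.

```python
def max_useful_processes(n_dofs, min_dofs_per_process=10_000):
    """Largest process count that keeps at least `min_dofs_per_process`
    degrees of freedom on every process (never fewer than one process)."""
    return max(1, n_dofs // min_dofs_per_process)


# Problem size range mentioned in this thread: 100-200K DOF.
for n_dofs in (100_000, 200_000):
    print(f"{n_dofs} DOF -> at most {max_useful_processes(n_dofs)} processes")
```

Under that assumption, a problem of this size is best served by roughly 10 to 20 processes, which is in line with the suggestion of 8 or 16 rather than a full 128-core node.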

Interestingly, aside from minor variations I am not seeing any significant difference in the model outputs between the two versions (2.5 versus 2.6) when using iterated defect correction Stokes or iterated Stokes.

For example, the runs in the files slurm-2.6_End time = 0.txt and slurm-2.5_End time = 0.txt take the same number of nonlinear iterations.

Am I recalling correctly that the number of nonlinear iterations in v2.5 and v2.6 with defect correction Stokes only diverges after the first time step?

“The iterated Stokes model took much less time to run compared to using the iterated defect correction Stokes with somewhat similar runtimes to ASPECT”

“If you want me to try anymore tests I can or if you have any idea what’s causing ASPECT to act like this let me know.”

Is this comparing to the tests you ran in the previous post to the full end time? I confess I am a bit confused now, as the tests you just posted for a single time step (End time = 0) actually don’t show much variation between version 2.5 and 2.6 for defect correction Stokes. Would you be willing to run the models using iterated Stokes in combination with an end time of 0 so we can do another round of comparisons?

“For now I’m still working on my project as normal, I’m just avoiding using the single Advection, iterated defect correction Stokes while I work on adding/chaning parameters”

I think that is a good plan for now. There is an open issue for the use of defect correction Stokes with the GMG solver, but you are using AMG here (is that correct?).

Are you by chance free next Monday from 11 am - 12 pm Pacific for the regular ASPECT user meeting? That would be a good opportunity to discuss these results live.

@MFraters - This may be of interest.

Thanks!

Cheers,
John

Hi again,

Now that I have a better understanding of what the problem is, and to avoid any confusion, I’ve created a little table outlining each test I ran with the continental extension cookbook model, and attached all the output slurm files.

It’s interesting that the issue you linked reports no problems with the AMG solver and single Advection, iterated defect correction Stokes, as that’s exactly the combination I seem to be having trouble with. Since I don’t have LAPACK and BLAS installed I don’t believe I can use the GMG solver, so I have been using the AMG solver.

Hopefully this summarises the problems I’m having a little better. I’ll also look at raising this issue at the next ASPECT meeting. I assume joining is just a matter of opening the Zoom link on the Regular User Meeting pinned topic?

Big thanks again,

Luke

slurm-2.5.txt (876.4 KB)
slurm-2.5_End time = 0 - Max nonlinear iterations = 1.txt (8.1 KB)
slurm-2.5_End time = 0.txt (30.1 KB)
slurm-2.5_End time = 10e6.txt (1.7 MB)
slurm-2.5_single Advection, iterated Stokes - End time = 0.txt (21.2 KB)
slurm-2.5_single Advection, iterated Stokes.txt (875.1 KB)
slurm-2.6.txt (378.8 KB)
slurm-2.6_End time = 0 - Max nonlinear iterations = 1.txt (8.3 KB)
slurm-2.6_End time = 0.txt (30.4 KB)
slurm-2.6_End time = 10e6.txt (2.6 MB)
slurm-2.6_single Advection, iterated Stokes - End time = 0.txt (21.4 KB)
slurm-2.6_single Advection, iterated Stokes.txt (880.2 KB)

Hi Luke,

Thanks for the summary of all the tests and results via the table, that is incredibly helpful.

It is very odd that the difference between the two versions so far only occurs after the first time step with iterated defect correction Stokes.

If you have a chance, can you add a post to the aforementioned issue summarizing your findings? The underlying problems may or may not be related, but I think it makes sense to add to that issue initially.

On my end, I will try the continental extension cookbooks with the two different versions and iterated defect correction Stokes on a local computer sometime in the coming days, to see if I can reproduce the issue.

Indeed, to join just open the Zoom link from that pinned topic.

Thanks again for pointing out this issue and conducting the additional tests.

Cheers,
John

Hi Luke,

I finally had a chance to run a few tests, and can confirm similar differences in the iterated defect correction Stokes solver behavior between the two versions (2.5, 2.6-pre) when building/running ASPECT on a standard Linux workstation. In detail, in version 2.5 only 1 nonlinear iteration is required after the first time step, while in version 2.6-pre quite a few nonlinear iterations are required (I capped it at 10).

Similarly, using iterated Stokes only produces very minor variations between the two versions (exact nonlinear residuals, etc.).
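For anyone wanting to repeat this comparison on their own logs, the sketch below counts nonlinear iterations per time step. It assumes that each time step starts with a line beginning `*** Timestep N:` and that each nonlinear iteration reports a residual on a line containing `after nonlinear iteration K:`; the exact wording may differ between versions and solver schemes, and the sample excerpt is made up for illustration, not copied from the attached files.

```python
import re

# Assumed log line formats; adjust the patterns if your log differs.
TIMESTEP_RE = re.compile(r"^\*\*\* Timestep (\d+):")
ITERATION_RE = re.compile(r"after nonlinear iteration (\d+):")


def nonlinear_iterations_per_timestep(log_text):
    """Map time step number -> highest nonlinear iteration index seen."""
    counts = {}
    current = None
    for line in log_text.splitlines():
        step = TIMESTEP_RE.match(line.strip())
        if step:
            current = int(step.group(1))
            counts[current] = 0
            continue
        iteration = ITERATION_RE.search(line)
        if iteration and current is not None:
            counts[current] = max(counts[current], int(iteration.group(1)))
    return counts


# Hypothetical excerpt in the assumed format.
SAMPLE_LOG = """\
*** Timestep 0:  t=0 years, dt=0 years
   Relative nonlinear residual (total system) after nonlinear iteration 1: 1
   Relative nonlinear residual (total system) after nonlinear iteration 2: 0.1
*** Timestep 1:  t=20000 years, dt=20000 years
   Relative nonlinear residual (total system) after nonlinear iteration 1: 1e-5
"""

print(nonlinear_iterations_per_timestep(SAMPLE_LOG))
```

Running it on the logs from both versions and comparing the two dictionaries shows at which time step the iteration counts start to differ.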

I propose we proceed as follows:

  1. For now, continue to use iterated Stokes instead of iterated defect correction Stokes (I recommend others do this as well for similar classes of models).
  2. We move this discussion over to a new GitHub issue, where I will summarize the issue and my new test results.
  3. We come back to this forum post after discussion on the GitHub issue.

Does this sound like a reasonable path forward?

Cheers,
John

Hi John,

Thanks for running the tests!

When I first came across this issue I assumed it was going to be another problem with my ASPECT installation, so from my point of view this is good news as I don’t have to rebuild ASPECT for what feels like the 100th time trying to find out what’s wrong.

Yes, if you could update this post when the bug is fixed that would be great, thanks.

I’ve also attached a slightly updated guide to installing ASPECT on ARCHER2 if you want to add it to the ASPECT wiki.

Big thanks again for all the help, you’ve been ace,

Luke

installing_aspect_archer2.pdf (92.3 KB)