Mapping out/documenting the level of testing for different code pieces

Hello,
I was talking with other modelers about code development and the pros and cons of different codes. One challenge that came up in this discussion is that a code like ASPECT has some parts that have been very broadly and deeply tested and verified, while others have not. The problem this creates is that a new (or even experienced) user does not know which parts of the code they can use with a very high level of confidence and which parts they cannot. As a specific example, the DG method for composition is very well tested, with a published paper, but the DG method for temperature was put in with a test and nothing was ever published demonstrating the kinds of problems it works for. Even the DG method for composition was tested for Cartesian geometry but not spherical.

So, my question is: could there be some method for "marking" parts of the code in terms of the level of confidence, perhaps with specific information such as which geometry it has been tested for? I could see this being done with a number/letter code: for example, 1c/2s could mean the highest level of confidence for Cartesian but only a moderate level for spherical (an illustrative example is below). Depending on the part of the code, the relevant distinction might not be geometry but rather compressible versus incompressible. These codes could be added to the documentation, so that when users look at the relevant subsection they would see them. This would also be a good way to assess the overall state of the code and identify specific parts that need further testing or work.
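For instance, an entry in the manual could look something like the following, where (say) 1 = published verification, 2 = regression tests only, 3 = untested, and c/s = Cartesian/spherical. The specific ratings here are purely illustrative:

```
DG advection (composition):  1c / 3s   (well tested and published for Cartesian; untested in spherical)
DG advection (temperature):  3c / 3s   (implemented, but no published verification)
```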

Hi Magali,

We ran DGBP for temperature and composition together in the following paper:

"New numerical approaches for modeling thermochemical convection in a compositionally stratified fluid" (Puckett et al., PEPI 2018)

There are two tests with DGBP for temperature and composition together. We took a model of LLSVPs that Don Turcotte created and computed it for Ra = 1e5 and buoyancy ratios B = 0, 0.1, …, 1.0. We made the same 11 computations with four methods in ASPECT (FEM-EV, DGBP, Particles, and VOF) and plotted the composition and temperature fields (eight panels) together in one figure for each case, so they are easy to compare to one another in the "picture norm" (Figures 12-22). It is clear that the temperature fields for DGBP, Particles, and VOF are nearly indistinguishable for B > 0.6, but FEM-EV is … well, you should look at these figures.

You will also be very interested in a direct comparison Ying made between DGBP and FEM-EV for B = 1.0 in Figures 25 & 26. In Fig. 26 she plotted the composition and entropy viscosity for FEM-EV on a grid with 192 × 64 cells and on one with 768 × 256 cells. The FEM advection algorithm with entropy viscosity is so diffusive that one must use 16 times as many cells and 4 times as many time steps, i.e., 64 times as much work, to produce output that looks roughly like that of the other three methods. Although Ying didn't plot the temperature field in this figure, one can infer that it is too diffusive compared to DGBP on the grid with 192 × 64 cells, since the temperature is what is driving the problem.
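To spell out the arithmetic: (768 × 256) / (192 × 64) = 16 times as many cells, and since the time step is limited by the CFL condition and the cell size shrinks by a factor of 4 in each direction, roughly 4 times as many time steps, for a total of 16 × 4 = 64 times the work.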

This is one test of DGBP computing the temperature, and I think it is fair to say that for this particular problem it is quite extensive.

Hi Magali,
I think that is actually a really good question, but I don't have a good answer. In essence, what you propose is a big spreadsheet in which each of the many features of ASPECT appears as both a row and a column, and each cell records whether that combination of features has been tested jointly. In your example, the row would be "DG for the temperature equation" and the column would be "spherical geometry". Each cell would then document whether there are regression tests, publications, or other ways in which the combination was tested. Gerry's answer would be an example of something that could show up in one of the cells. I think we could pretty easily find on the order of 50-100 features that would form the rows and columns of such a spreadsheet.
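Just to make the shape of this concrete, the "spreadsheet" could be as simple as a map from pairs of features to whatever evidence we know of for that combination, something like the following (the entries are only meant as an illustration):

```python
# Illustrative sketch of the feature-versus-feature coverage table: each cell
# lists the evidence (tests, papers) that the two features work together.
coverage = {
    ("DG composition", "Cartesian geometry"): ["Puckett et al., PEPI 2018"],
    ("DG composition", "spherical geometry"): [],   # empty cell = needs testing
    ("DG temperature", "spherical geometry"): [],
}
```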

I think we would all agree that something like this would be nice to have, but it is difficult to do retroactively: for example, there are now 924 tests. Figuring out what each of these tests does, and then translating that into the spreadsheet, would surely take 5 minutes per test, that is, roughly two weeks of (rather boring) full-time work. One of the things I've learned about big software systems is that you can't ever do things retroactively, simply because it takes so incredibly long to go through it all.

There are people in computer science who are working on systems that can automatically generate or enhance documentation (for example, by extracting typical use cases of functions from the test suite and attaching them to the documentation of those functions), or automatically generate tests based on which parts of the software are already covered or not yet covered by existing tests (there are tools that annotate which lines are executed, and how many times, when running the test suite). I find these efforts really interesting because they have the potential to vastly expand documentation or test suites with relatively modest effort, even for very large projects, but the results they show are almost always not very impressive from a practical perspective, or require extensive human cleanup. One could imagine that these people could also come up with ways to make the generation of your spreadsheet simpler, for example by already figuring out which tests cover which cells of the spreadsheet. But I think that would only cover a small part of what you're really after.

So, in essence, I think that what you ask for is a totally reasonable thing. But we don't have it, and in all likelihood never will, given how much work it would be. I wish we had thought of it ten years ago: it would have been possible to build and keep up to date right from the start, but I don't see how it can be done after the fact.

Best
Wolfgang

Hi Wolfgang,
I definitely see your point about the person-hours and tedium of doing this retroactively in some super-complete fashion. But I do think there may be a semi-automatic process, which an undergrad with Python experience could try out, that would provide a very good retrospective view for specific questions, such as which solvers have been tested with which geometries.

Here's my idea: the hundreds of tests that have been created along the way each have a complete parameter file. A Python script (or other code) could be written to systematically read each of these parameter files and pull out specific information, like which solver was turned on with which geometry. This could be done at the section/subsection level of the parameter file as a first pass. I don't think you would want to do this for the "giant" spreadsheet of everything versus everything, but rather to answer specific questions that are related to the core operations of the program; what we find might then point us to further tests that are needed, or to "warning" flags to include in the code. A rough sketch of the kind of script I have in mind is below.
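Something along these lines, for example. The parameter names here are written from memory and would need to be checked against the manual; this is only meant to show the approach, not a finished tool:

```python
# Sketch only: walk the tests directory, read each .prm file, and tally which
# geometry / DG-advection combinations actually occur. Parameter names are
# assumptions and should be checked against the ASPECT manual.
import collections
import pathlib

def parse_prm(path):
    """Minimal .prm reader: returns {(subsection path, parameter): value}.
    Handles only 'subsection', 'set', and 'end' lines; ignores everything else."""
    values = {}
    stack = []
    for raw in pathlib.Path(path).read_text(errors="ignore").splitlines():
        line = raw.split("#", 1)[0].strip()          # drop comments
        if not line:
            continue
        if line.startswith("subsection "):
            stack.append(line[len("subsection "):].strip())
        elif line == "end":
            if stack:
                stack.pop()
        elif line.startswith("set ") and "=" in line:
            name, value = line[len("set "):].split("=", 1)
            values[("/".join(stack), name.strip())] = value.strip()
    return values

def summarize(test_dir="tests"):
    """Count (geometry, DG temperature, DG composition) combinations over all tests."""
    combos = collections.Counter()
    for prm in sorted(pathlib.Path(test_dir).rglob("*.prm")):
        v = parse_prm(prm)
        geometry = v.get(("Geometry model", "Model name"), "box (default?)")
        dg_temp = v.get(("Discretization", "Use discontinuous temperature discretization"), "false")
        dg_comp = v.get(("Discretization", "Use discontinuous composition discretization"), "false")
        combos[(geometry, dg_temp, dg_comp)] += 1
    for (geometry, dg_temp, dg_comp), count in combos.most_common():
        print(f"{count:4d}  geometry={geometry:20s} DG temperature={dg_temp:5s} DG composition={dg_comp}")

if __name__ == "__main__":
    summarize()
```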

I'd love to hear people's thoughts on how useful (or not) this would be, and whether it's worthwhile.

A second suggestion would be to add references to the papers that have been published on methods that were added to the code (after the original papers), so that those come up in the relevant subsections of the appendices in the manual. This is a much smaller number of papers, but it would really help people figure out what they should read to understand what has been demonstrated to work in the code.
Cheers,
Magali

Gerry and I exchanged a few e-mails offline and I just want to add this here for anyone interested in this thread. It turns out that the tests that Gerry was referring to in the Puckett et al., PEPI 2018 paper only used the DG method for the composition advection; the FEM-EV method was used for the energy equation in all the tests. Therefore, as far as I know, the DG method in ASPECT has not been broadly used or tested for the energy equation, and I don't know of any paper published using it or demonstrating its accuracy. The DG method for composition has been carefully tested in Cartesian coordinates, but not in spherical geometry. I do not know of any reason why it should not work in spherical coordinates, but I am not expert enough on the method to know whether it could be an issue in ASPECT.

Magali:

> Here's my idea: the hundreds of tests that have been created along the way each have a complete parameter file. A Python script (or other code) could be written to systematically read each of these parameter files and pull out specific information, like which solver was turned on with which geometry. This could be done at the section/subsection level of the parameter file as a first pass. I don't think you would want to do this for the "giant" spreadsheet of everything versus everything, but rather to answer specific questions that are related to the core operations of the program; what we find might then point us to further tests that are needed, or to "warning" flags to include in the code.

Yes, that could work. You'd have to have a map from "feature" to parameter name or value, and then parsing the input files for combinations should not be very difficult. In fact, I think it would be interesting to annotate all of the tests by the features they employ.

I think this would be useful, assuming there is someone who is willing to put the time in to write such a script.
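A minimal sketch of what such a map could look like (the parameter names and values are again only placeholders to be checked against the manual):

```python
# Illustrative map from "feature" to the (subsection, parameter, value) that
# switches it on. The names/values below are placeholders, not checked against
# the actual parameter documentation.
FEATURES = {
    "DG temperature":     ("Discretization", "Use discontinuous temperature discretization", "true"),
    "DG composition":     ("Discretization", "Use discontinuous composition discretization", "true"),
    "spherical geometry": ("Geometry model", "Model name", "spherical shell"),
    "box geometry":       ("Geometry model", "Model name", "box"),
}

def features_of(prm_values):
    """Given the {(subsection, parameter): value} dict for one test's .prm file,
    return the set of feature labels that test exercises."""
    return {label for label, (subsection, parameter, value) in FEATURES.items()
            if prm_values.get((subsection, parameter), "").strip().lower() == value}
```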

> A second suggestion would be to add references to the papers that have been published on methods that were added to the code (after the original papers), so that those come up in the relevant subsections of the appendices in the manual. This is a much smaller number of papers, but it would really help people figure out what they should read to understand what has been demonstrated to work in the code.

Yes, that too. One of the issues there is that it is also an "after the fact" effort: people write the code first and get it merged, then write the paper, then see it published. We (or someone) would have to go back through all of the papers in the database to identify which features each one uses. For some this may be easier because parameter files are available, but for most I suspect that one has to figure this out implicitly by reading the paper. Given that there are 90 papers at "The ASPECT mantle convection code: Publications", this is definitely another effort that can only be measured on a scale from "several days" to "a couple of weeks".

Best
W.