Artifacts with HDF5 output

Hello everyone. I am having some issues with the visualization of HDF5 output files in ASPECT. The model I am examining is 3D with a Cartesian box geometry.

  • There is no adaptive mesh refinement.
  • The model uses periodic boundary conditions in both the x- and y-directions.

The visualization output looks almost correct, except for an artifact connecting the top-right and bottom-left corners of the attached image. The artifact appears both in the latest VisIt 3.1.4 and in the older VisIt 2.13.3.

Has anyone else run into something similar?

Thank you for your help.

This is the ASPECT version info:
– This is ASPECT, the Advanced Solver for Problems in Earth’s ConvecTion.
– . version 2.3.0-pre (master, 826c25583)
– . using deal.II 9.3.0-pre (master, 250eae6824)
– . with 32 bit indices and vectorization level 1 (128 bits)
– . using Trilinos 12.10.1
– . using p4est 2.0.0
– . running in OPTIMIZED mode
– . running with 128 MPI processes

Max – what happens if you visualize the mesh? Does it have one warped cell?
Best
W.

Wolfgang – thanks for your very quick response. Good idea to look at the mesh. There is an artifact in the upper (+x, +y) corner. It does not appear to be a single warped cell, but perhaps it corresponds to a single warped cell at a coarser refinement level?

Max – what’s the coarsest mesh with which you can reproduce this? If you reduce global mesh refinement to a minimum and simplify the setup, can you come up with a minimal testcase that shows the issue?
Best
Wolfgang

I ran a number of test cases and found that the problem depends on the number of compute nodes, not on the number of MPI processes used in the calculation. As soon as the computation is distributed across two or more nodes, the visualization output is incorrect. I believe this is caused by using parallel HDF5 on an NFS file system, the same issue we discussed here:

and here

I re-ran a test model on 4 nodes with different values of the Open MPI fs_ufs_lock_algorithm MCA parameter:

  • export OMPI_MCA_fs_ufs_lock_algorithm=1 produces corrupted output.
  • export OMPI_MCA_fs_ufs_lock_algorithm=3 produces the correct mesh output.

The only workaround that I can see is to allow ASPECT (and deal.II) to be configured with serial HDF5 even when running in parallel.

But that isn’t going to work easily either. The code that uses HDF5 assumes that each processor can write its own data to the file correctly while the other processors are doing the same with theirs. Some of the HDF5 operations may also be collective.

So it isn’t just a matter of allowing configuration with a sequential HDF5 library; the code that uses these interfaces would also have to send its data to a central processor that then does the writing, and similarly for reading. That would be a lot of work, I’m afraid.
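
To illustrate what that would mean, here is a rough sketch of the kind of funnel one would have to write: gather everything on rank 0, which is then the only process that touches the file, using serial (non-MPI) HDF5 calls. The function name and dataset layout are made up for the example; this is not the code that actually lives in deal.II.

#include <hdf5.h>
#include <mpi.h>
#include <vector>

// Hypothetical sketch: every rank sends its piece of the output to rank 0,
// which is the only rank that ever opens the file, using serial HDF5 calls.
void write_through_rank_0(const std::vector<double> &local_data, MPI_Comm comm)
{
  int rank, n_ranks;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &n_ranks);

  // Rank 0 needs to know how much data every rank contributes.
  const int local_n = static_cast<int>(local_data.size());
  std::vector<int> counts(n_ranks), offsets(n_ranks);
  MPI_Gather(&local_n, 1, MPI_INT, counts.data(), 1, MPI_INT, 0, comm);

  std::vector<double> global_data;
  if (rank == 0)
    {
      int total = 0;
      for (int i = 0; i < n_ranks; ++i)
        {
          offsets[i] = total;
          total += counts[i];
        }
      global_data.resize(total);
    }

  // Funnel all data to rank 0.
  MPI_Gatherv(local_data.data(), local_n, MPI_DOUBLE,
              global_data.data(), counts.data(), offsets.data(), MPI_DOUBLE,
              0, comm);

  // Only rank 0 touches the file, with plain (non-parallel) HDF5.
  if (rank == 0)
    {
      const hid_t file = H5Fcreate("solution.h5", H5F_ACC_TRUNC,
                                   H5P_DEFAULT, H5P_DEFAULT);
      const hsize_t dims[1] = {global_data.size()};
      const hid_t space = H5Screate_simple(1, dims, nullptr);
      const hid_t dset  = H5Dcreate2(file, "/nodes", H5T_NATIVE_DOUBLE, space,
                                     H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
      H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT,
               global_data.data());
      H5Dclose(dset);
      H5Sclose(space);
      H5Fclose(file);
    }
}

Doing this for every dataset that is written (and again for reading) is what makes it a substantial change.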

Can’t you just always run with export OMPI_MCA_fs_ufs_lock_algorithm=3?

Best
W.

Hi Max,

I agree with Wolfgang. If you want to use serial HDF5, you will have to rewrite all the code inside deal.II that currently uses parallel HDF5.

HDF5 uses MPI-IO, and MPI-IO might be talked into using a single writer (which is inefficient, but you might not care that much). Did you look into the "cb_config_list" and "cb_nodes" hints?
Maybe MPI_Info_set(info, "cb_nodes", "1"); works?
See https://wgropp.cs.illinois.edu/courses/cs598-s15/lectures/lecture33.pdf
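
For illustration, here is a minimal sketch of how such hints could be attached to the HDF5 file access property list in a parallel HDF5 build; the file name is only an example, and whether "cb_nodes" = "1" actually avoids the NFS problem is something you would have to test:

#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  // MPI-IO hints: ask the collective-buffering layer to use a single
  // aggregator, so only one process performs the actual writes.
  MPI_Info info;
  MPI_Info_create(&info);
  MPI_Info_set(info, "cb_nodes", "1");

  // Attach the communicator and the hints to the file access property list.
  const hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
  H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);

  const hid_t file = H5Fcreate("hints_test.h5", H5F_ACC_TRUNC,
                               H5P_DEFAULT, fapl);
  // ... create datasets and write collectively as usual ...

  H5Fclose(file);
  H5Pclose(fapl);
  MPI_Info_free(&info);
  MPI_Finalize();
  return 0;
}

This only helps, of course, if there is a way to hand the MPI_Info object to the code that actually opens the file.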

As a side comment (I hope you don’t mind):
It looks like you have spent hours and hours on MPI I/O and file system issues on that cluster. NFS is not a parallel file system and I doubt you can make things work well. I would cut my losses and move on: ask a sysadmin to set up a real, working parallel file system or use a different machine. You could start with an XSEDE allocation, for example.