Highlights from the 2021 CIG Developers Workshop
CIG held its first Developers Workshop on February 23 and 25, 2021. The workshop focused on:
- Expanding the CIG software developer community;
- Making CIG software more accessible to new users;
- Identifying and leveraging common infrastructure; and
- Leveraging collective wisdom to make CIG software better and easier to develop and maintain.
The 43 participants represented developers of CIG community codes, those interested in contributing to the community codes, and researchers interested in improving their own codes. The workshop was held online and consisted of six discussion sessions.
Session I: Developer Tools
Integrated Development Environments (IDEs)
- CIG should encourage use of IDEs.
- CIG should provide resources showing people how to set up an IDE and use IDEs in tutorials and hackathons.
- CMake provides much better integration with IDEs than autotools.
Continuous Integration Tools
- CIG should provide hardware for test runners.
- CIG should consider replacing its Jenkins server with GitHub Actions or GitLab Pipelines and just focus on providing test runners.
Containers and Binary Packages
- Singularity containers are likely the best path forward for helping people get CIG code installed on clusters.
- An intermediate step to Singularity containers would be to make available existing Docker containers used in CI tests.
- Spack is another method of installing code that might be useful.
- There are ways to make binaries that provide optimized code for multiple architectures. CIG should consider looking into the feasibility of doing this.
Session II: CIG Software Development Best Practices
Software Development Plan
- A development plan document is nice, but GitHub issues with milestones also work well.
- Developer teams also need to convey what is out of scope.
- Developers should engage the community at least once a year to update development priorities.
- CIG should investigate which checkpointing tool(s) are worth using.
- Online documentation has several advantages over PDF documents.
- CIG should encourage codes to provide online documentation and a PDF version.
- MyST: Markedly Structured Text (Markdown)
- MyST+Jupyter notebooks: Markedly Structured Text and Jupyter Notebooks
Session III: Should CIG specify a standard output format?
Standard Layout of Output Files
- CIG should form a working group to develop standard layouts for VTK, HDF5+Xdmf, and netCDF files.
- CIG should leverage standards that have already been developed (e.g., cfconventions.org).
- CIG may want to consider web interfaces for simple visualization and inspection.
- CIG should identify possible common post-processing algorithms/scripts.
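As a rough illustration of what a standard HDF5+Xdmf layout involves, the sketch below generates the small Xdmf (XML) file that tells visualization tools where the mesh and fields live inside an HDF5 file. The HDF5 dataset paths (`/geometry/vertices`, `/topology/cells`, `/fields/<name>`), the function name `build_xdmf`, and the file name are hypothetical placeholders chosen for this example; they are not a layout agreed on at the workshop.

```python
import xml.etree.ElementTree as ET


def build_xdmf(h5_file, n_vertices, n_cells, field_name):
    """Return Xdmf XML text describing a 2D triangle mesh stored in an HDF5 file.

    The dataset paths used here are hypothetical placeholders, not a CIG standard.
    """
    root = ET.Element("Xdmf", Version="3.0")
    domain = ET.SubElement(root, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="mesh", GridType="Uniform")

    # Cell connectivity: one row of three vertex indices per triangle.
    topology = ET.SubElement(
        grid, "Topology", TopologyType="Triangle", NumberOfElements=str(n_cells)
    )
    cells = ET.SubElement(
        topology, "DataItem", Dimensions=f"{n_cells} 3", NumberType="Int", Format="HDF"
    )
    cells.text = f"{h5_file}:/topology/cells"

    # Vertex coordinates: one (x, y) pair per vertex.
    geometry = ET.SubElement(grid, "Geometry", GeometryType="XY")
    vertices = ET.SubElement(
        geometry, "DataItem", Dimensions=f"{n_vertices} 2",
        NumberType="Float", Precision="8", Format="HDF",
    )
    vertices.text = f"{h5_file}:/geometry/vertices"

    # One scalar field defined at the vertices.
    attribute = ET.SubElement(
        grid, "Attribute", Name=field_name, AttributeType="Scalar", Center="Node"
    )
    values = ET.SubElement(
        attribute, "DataItem", Dimensions=str(n_vertices),
        NumberType="Float", Precision="8", Format="HDF",
    )
    values.text = f"{h5_file}:/fields/{field_name}"

    return ET.tostring(root, encoding="unicode")


xdmf_text = build_xdmf("output.h5", n_vertices=4, n_cells=2, field_name="temperature")
print(xdmf_text)
```

If codes agreed on the dataset paths, a single helper of this kind could serve several codes, which is the sort of reuse a working group would need to assess.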
Improving Simulation Output
- CIG should investigate other projects (Alpine/Conduit) that provide support for higher order discretizations.
- CIG should investigate seamless compression of data, e.g., lossy compression via (custom?) HDF5 filters.
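The idea behind error-bounded lossy compression can be sketched with the Python standard library alone: quantize values to a user-chosen tolerance, then apply a lossless compressor. This is only a toy illustration of the principle and is not an HDF5 filter; the function names and the tolerance value are assumptions made for this example.

```python
import math
import zlib
from array import array


def compress_lossy(values, tolerance):
    """Quantize values so the absolute error stays within tolerance, then deflate."""
    step = 2.0 * tolerance
    # Rounding to the nearest multiple of step loses at most step / 2 = tolerance.
    quantized = array("i", (round(v / step) for v in values))
    return zlib.compress(quantized.tobytes()), step


def decompress_lossy(blob, step):
    """Invert compress_lossy up to the quantization error."""
    quantized = array("i")
    quantized.frombytes(zlib.decompress(blob))
    return [q * step for q in quantized]


# A smooth synthetic field standing in for simulation output.
field = [math.sin(0.001 * i) for i in range(10000)]
tolerance = 1.0e-3

blob, step = compress_lossy(field, tolerance)
restored = decompress_lossy(blob, step)

lossless_size = len(zlib.compress(array("d", field).tobytes()))
max_error = max(abs(a - b) for a, b in zip(field, restored))
print(f"lossless: {lossless_size} bytes, lossy: {len(blob)} bytes, max error: {max_error:.2e}")
```

The trade-off is explicit: the user picks the tolerance, and the reader of the file needs the quantization step to restore the values, which is the kind of metadata a filter would have to store alongside the data.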
Other community efforts related to standardizing output
- ASDF: HDF5 layout for seismic data (station waveforms)
- Johns Hopkins Turbulence Databases
Session IV: Should CIG adopt a standard interface for specifying values for boundary conditions and material properties?
- World Builder (ASPECT) defines volumes within a domain using simple spatial features and assigns parameterizations/properties to the features.
- SpatialData (PyLith) provides an API for querying values in space along with several implementations; it supports georeferencing and unit conversion.
- easi (SeisSol) provides an API for mapping values from R^n to R^m; it supports several model types, which can be composed together.
- SpatialData and easi are quite similar and it would not be difficult to merge them.
- Specifying values for initial and boundary conditions usually involves point queries.
- Specifying material properties can be much more complicated and involve additional information, e.g., composition, state variables.
- An API and implementations for standard output layouts would allow using output from one code as input to another code.
- CIG should form a working group to assess use cases, scope, and outcomes of an API and corresponding library.
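The kind of interface discussed in this session, a point query that maps coordinates to values and can be built up by composing simple models, might look roughly like the sketch below. It is a generic illustration only: the names `compose`, `depth_from_xyz`, and `layered_model` and the property values are hypothetical and do not reflect the actual SpatialData, easi, or World Builder APIs.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Model = Callable[[Point], Point]  # maps a point in R^n to values in R^m


def compose(*models: Model) -> Model:
    """Chain models left to right: the output of one is the input of the next."""
    def composed(point: Point) -> Point:
        for model in models:
            point = model(point)
        return point
    return composed


def depth_from_xyz(point: Point) -> Point:
    """Map (x, y, z), with z positive upward, to a 1-tuple holding depth."""
    _, _, z = point
    return (-z,)


def layered_model(layers: Sequence[Tuple[float, Point]]) -> Model:
    """Build a model from (bottom_depth, values) pairs ordered shallow to deep."""
    def query(point: Point) -> Point:
        (depth,) = point
        for bottom_depth, values in layers:
            if depth <= bottom_depth:
                return values
        raise ValueError(f"depth {depth} is below the deepest layer")
    return query


# Hypothetical two-layer model returning (density, shear wave speed).
crust_mantle = layered_model([
    (30.0e3, (2700.0, 3500.0)),
    (float("inf"), (3300.0, 4500.0)),
])
material = compose(depth_from_xyz, crust_mantle)

print(material((0.0, 0.0, -10.0e3)))
print(material((5.0e3, 0.0, -50.0e3)))
```

Because every model shares the same call signature, a simulation code only needs to know how to issue a point query; how the values are produced stays behind the interface.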
Session V: Improving Modeling Workflow
- Many CIG codes are used within a larger framework or run many times for sensitivity analyses, uncertainty quantification, and inversions.
- Most users struggle with troubleshooting simulations, preparing simulation inputs, and post-processing.
- There are many useful workflow-related tools associated with the Pangeo project (see list of links below).
- Data in the cloud
- Storytelling with code
- Several people (about 9 of 30) are already using Jupyter notebooks in their workflow and many users are using Python for pre- and post-processing.
- CIG should form a working group focused on assessing and improving simulation workflow management, including identifying use cases for Jupyter notebooks.
- Lindsey Heagy’s Pangeo talk
- JupyterLab is the next-generation web-based user interface for Project Jupyter.
- Zarr is a format for the storage of chunked, compressed, N-dimensional arrays.
- Intake is a lightweight package for finding, investigating, loading and disseminating data.
- Pangeo Forge is an open source tool for data Extraction, Transformation, and Loading (ETL). The goal of Pangeo Forge is to make it easy to extract data from traditional data repositories and deposit in cloud object storage in analysis-ready, cloud-optimized (ARCO) format.
- Ipygany: 3D ParaView-like visualization
- Storytelling with code (article)
- Elyra: AI-centric extensions to JupyterLab Notebooks
- Interactive Workflows for C++ with Jupyter
- GeoSci.xyz: portal for information and computational resources for geoscientists
- EarthCube Peer-Reviewed Jupyter Notebooks
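The repeated-run workflows noted in this session (sensitivity analyses, uncertainty quantification, inversions) usually start by enumerating parameter combinations and writing one input description per run. A minimal sketch using only the Python standard library follows; the parameter names and values are made up for illustration, and a real workflow would translate each record into a code-specific input file.

```python
import itertools
import json


def parameter_sweep(parameters):
    """Return one dict per combination of the given parameter values."""
    names = sorted(parameters)  # fixed order keeps run numbering reproducible
    runs = []
    combinations = itertools.product(*(parameters[name] for name in names))
    for index, combination in enumerate(combinations):
        run = dict(zip(names, combination))
        run["run_id"] = f"run_{index:03d}"
        runs.append(run)
    return runs


# Hypothetical parameters for a sensitivity study.
sweep = parameter_sweep({
    "viscosity": [1.0e20, 1.0e21, 1.0e22],
    "resolution": [32, 64],
})

for run in sweep:
    # Each JSON record could be saved next to the simulation output for provenance.
    print(json.dumps(run))
```

The same records can be loaded in a Jupyter notebook to match outputs to inputs during post-processing.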
Session VI: Growing the CIG Developer Community
Users and Developers
- Building a broad user base through good support is an integral step to sustainable development.
- Hackathons provide users with exposure to contributing small changes; encouragement and mentoring are important.
- Clean plug-in interfaces greatly facilitate accepting community contributions.
- Extended, close collaboration with developers/maintainers is key to progressing from user to developer; graduate students and postdocs are the main conduit.
- CIG’s postdoc program and other postdoc programs are critical to growing the CIG developer community.
- CIG should encourage developer teams to acknowledge community contributions to codes in ways that help contributors document their impact.
- CIG should maintain a catalog of recommended building blocks, e.g., libraries and packages, for potential use in developing new codes.
- CIG should maintain a list of teaching, coding, and numerical modeling resources.
- CIG should forge partnerships with organizations running software carpentry workshops and help develop ones with a focus on geoscience.
- CIG should provide Research Experiences for Undergraduates (REUs) in geodynamics modeling with an emphasis on increasing diversity.
Software Carpentry: Teaching Basic Lab Skills for Scientific Computing.
Earth Data Analytics at CU Boulder: This site contains open tutorials and course materials covering topics including data integration, GIS, and data-intensive science.
Computers, Waves, Simulations: A Practical Introduction to Numerical Methods using Python
Other modeling/software projects that might be of interest
Contributed by Developers Survey submissions
Library for 2D pencil decomposition and distributed Fast Fourier Transform.
HealPix - Hierarchical Equal Area isoLatitude Pixelization of a sphere
Triangle: A Two-Dimensional Quality Mesh Generator and Delaunay Triangulator
LaMEM - Lithosphere and Mantle Evolution Model: A parallel 3D numerical code that can be used to model various thermomechanical geodynamical processes such as mantle-lithosphere interaction for rocks that have visco-elasto-plastic rheologies. The code is built on top of the PETSc package, and the current version of the code uses a marker-in-cell approach with a staggered finite difference discretization. A range of (Galerkin) multigrid and iterative solvers are available, for both linear and non-linear rheologies, using Picard and quasi-Newton solvers (provided through the PETSc interface).
pTatin3D is a software package designed for studying long time-scale processes relevant to geodynamics. The original motivation for this development was to provide the community with an open-source toolkit capable of studying high-resolution, three-dimensional models of lithospheric deformation. Unique to this package is that we provide fast, parallel scalable matrix-free definitions for the Stokes operators which are utilized by a hybrid geometric-algebraic multi-grid preconditioner.
There are also a number of infrastructure projects that should be more widely used in CIG projects, many of which come out of DOE and are part of the Exascale Computing Project. Examples include HDF5 and netCDF, as well as tools for checkpoint/restart and data compression. We don’t need to reinvent these wheels if we need this kind of technology. But often we think that we either don’t need the technology, or we know that we do but don’t have the manpower to implement things ourselves. In those cases, it would be useful to have a bigger knowledge base of what’s out there and of people we know who already use these tools, so we know who to ask questions. In other words, it would be good if CIG projects in general used a bigger part of xSDK, for example.
AxiSEM3D: We will soon release a new version of this novel method and code. We have already discussed with Carl Tape whether this could be added to CIG.
ENKI, FEniCS, Firedrake, HippyLIB, LeoPART, ExaHype, ECOMAN