Specfem3D_globe for PKIKP phase

Hi there,

We plan to use SPECFEM3D_GLOBE to study the CMB with velocity tomography models. We have already installed the package and succeeded in computing synthetic waveforms with NPROC_XI(ETA) = 8 and NEX_XI(ETA) = 512, due to our limited computational power. However, for the PKIKP phase we would like a resolution of ~2 Hz, which requires a larger NEX_XI(ETA). Which other parameters should we change to make this work within our computational limits?
Any suggestions would be appreciated.



you are looking for a 2 Hz global simulation with 3D earth models using a spectral-element code? then you are probably out of luck with the SPECFEM3D_GLOBE code, as there is currently no supercomputer big enough to achieve that. however, CIG also hosts the AxiSEM code, which might help you out with this.



Well, maybe not that big. :slight_smile:
Actually, we only need a 3D velocity model in a small region around the inner/outer core boundary; the rest can be a 1D earth model for the PKIKP phase.
This should work, right?


the runtime of a SPECFEM3D_GLOBE simulation is independent of whether the model is 1D or 3D. that is, running a simulation for a 1D model is as expensive as running it for a 3D model, so it doesn’t matter that you only want a portion of the Earth to be 3D.

you could only save time by turning off some of the physics (attenuation, oceans, gravity, rotation) or by restricting the simulation to a single chunk, i.e., a regional simulation.
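as a sketch, the corresponding switches in DATA/Par_file would look like the following (flag names as in the example Par_file shipped with the code; a 1-chunk run additionally needs the chunk width/center parameters set):

```
NCHUNKS                 = 1          # regional 1-chunk run instead of the full globe
ATTENUATION             = .false.
OCEANS                  = .false.
GRAVITY                 = .false.
ROTATION                = .false.
```

check the comments in your version’s example Par_file, since flag names can change between releases.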

concerning the maximum frequency range for global simulations, Laura went down to 2 s in 2008 on Ranger (High-frequency simulations of global seismic wave propagation using SPECFEM3D_GLOBE on 62K processors - IEEE Conference Publication), and Seiji went down to 1.2 s in 2016 on the K computer (https://journals.sagepub.com/doi/full/10.1177/1094342016632596). a global 2 Hz simulation is not achievable yet with current HPC systems, mostly due to memory limitations.
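as a rough rule of thumb, the shortest resolved period in SPECFEM3D_GLOBE scales inversely with NEX, roughly 17 s at NEX = 256 (this matches the 0.85 s at NEX = 5120 reported by the mesher further down). a quick back-of-the-envelope sketch, keeping in mind that the mesher’s values_from_mesher.h is the authoritative number for a concrete Par_file:

```python
import math

# rough rule-of-thumb resolution scaling for SPECFEM3D_GLOBE:
# shortest resolved period ~ 17 s * 256 / NEX_XI

def min_period(nex):
    """Approximate shortest resolved period in seconds for a given NEX_XI."""
    return 17.0 * 256.0 / nex

def nex_for_period(t_min):
    """Smallest NEX resolving periods down to t_min seconds
    (before rounding up to the mesher's divisibility constraints)."""
    return math.ceil(17.0 * 256.0 / t_min)

print(min_period(512))      # the original NEX = 512 setup -> 8.5 s
print(nex_for_period(1.0))  # -> 4352 (= 256 * 17) for a 1 Hz run
print(nex_for_period(0.5))  # -> 8704 for a 2 Hz run
```

note that NEX must also satisfy the mesher’s divisibility constraints with respect to NPROC_XI, so in practice you round these numbers up to a valid value.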

try looking into AxiSEM and AxiSEM3D (GitHub - kuangdai/AxiSEM3D), which might help you with your 2 Hz setup (https://academic.oup.com/gji/article/217/3/2125/5333339).

I’m curious about that… which kind of machine could fulfill this requirement? a machine similar to Leonardo (for example), or much, much bigger? it seems like a potential benchmark for DT in SE… Piero

For a 1 Hz simulation, NEX should be at least 4352 (= 256 * 17). with a setup of NPROC_XI = 128 and NEX_XI = 5120, the bin/xcreate_header_file tool reports:

./SPECFEM3D_GLOBE$ cp DATA/Par_file.1Hz DATA/Par_file; ./bin/xcreate_header_file

 creating file OUTPUT_FILES/values_from_mesher.h to compile solver with correct values

 edit file OUTPUT_FILES/values_from_mesher.h to see
 some statistics about the mesh

 number of processors =        98304

 maximum number of points per region =     29322721

 total elements per slice =       260606
 total points per slice =     34608523

 the time step of the solver will be DT =    5.34909358E-03  (s)
 the (approximate) minimum period resolved will be =   0.850000024      (s)

 current record length is =    120.000000     min
 current minimum number of time steps will be =      1346100

 on NEC SX, make sure "loopcnt=" parameter
 in Makefile is greater than max vector length =     87968163

 approximate static memory needed by the solver:

 (lower bound, usually the real amount used is 5% to 10% higher)

 (you can get a more precise estimate of the size used per MPI process
  by typing "size -d bin/xspecfem3D"
  after compiling the code with the DATA/Par_file you plan to use)

 size of static arrays per slice =    15360.350440000000       MB
                                 =    14648.771705627441       MiB
                                 =    15.360350439999999       GB
                                 =    14.305441118776798       GiB

    (should be below 80% or 90% of the memory installed per core)
    (if significantly more, the job will not run by lack of memory)
    (note that if significantly less, you waste a significant amount
     of memory per processor core)
    (but that can be perfectly acceptable if you can afford it and
     want faster results by using more cores)

 size of static arrays for all slices =    1509983.8896537600       GB
                                      =    1406282.0837402344       GiB
                                      =    1509.9838896537599       TB
                                      =    1373.3223474025726       TiB

usually, for such big simulations, the limiting factor is memory usage. with the above setup, the run requires ~100,000 GPUs and fills them up to ~16 GB per card. for adjoint kernel simulations this doubles, i.e., up to 32 GB per card.
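these numbers can be cross-checked with simple arithmetic from the quoted mesher output (6 cubed-sphere chunks at NPROC_XI = NPROC_ETA = 128, and ~15.36 GB of static arrays per slice):

```python
# cross-check the mesher numbers quoted above with simple arithmetic
nchunks = 6        # full-globe mesh = 6 cubed-sphere chunks
nproc_xi = 128     # = NPROC_ETA in this setup
nprocs = nchunks * nproc_xi * nproc_xi
print(nprocs)      # -> 98304 MPI slices, i.e. ~100,000 GPUs (one slice per card)

mem_per_slice_gb = 15.360350440   # static arrays per slice, from values_from_mesher.h
total_tb = mem_per_slice_gb * nprocs / 1000.0
print(round(total_tb))            # -> 1510 TB in total, matching the output above

# the estimate is a lower bound (real usage is ~5-10% higher), hence ~16 GB
# per card for a forward run, and about double that for adjoint kernel runs
print(round(mem_per_slice_gb * 1.05, 1))  # -> 16.1 GB per card
```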

the storage needed for writing the model output to disk is probably quite big as well (~ a few 100 to 1,000 TB?).

On Leonardo we should have on the order of 14K A100 (or similar) GPUs with 32 GB each. It’s still unfeasible, yes. The most interesting aspect, in my opinion, is that such a big run would probably fail, not always, but with high probability, due to hardware failures. For simulations this extreme, fault tolerance would be a must.