We plan to use SPECFEM3D_GLOBE to study the CMB with velocity tomography models. We have already installed the package and succeeded in getting synthetic waveforms with NPROC_XI(ETA) = 8 and NEX_XI(ETA) = 512, due to limited computational power. However, for PKIKP we would like a resolution of ~2 Hz, which requires a larger NEX_XI(ETA). What other parameters should we change to make this work within our computational limits?
Any suggestions would be appreciated.
you are looking for a 2 Hz global simulation with 3D Earth models using a spectral-element code? then you are probably out of luck with the SPECFEM3D_GLOBE code, as there is currently no supercomputer big enough to achieve that. however, CIG also hosts the AxiSEM code, which might help you out with this.
Well, maybe not that big.
Actually, we only need a 3D velocity model in a small region around the inner/outer core boundary; the rest is a 1D Earth model for the PKIKP phase.
This should work, right?
the runtime of a SPECFEM3D_GLOBE simulation is independent of having a 1D or 3D model. that is, running a simulation for a 1D model is as expensive as running it for a 3D model. so it doesn’t matter if you only want a portion of the Earth to be 3D.
you can only save time by turning off some of the physics (attenuation, oceans, gravity, rotation) or by limiting the run to a single-chunk, regional simulation.
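for reference, these are the kind of DATA/Par_file flags involved (parameter names as in recent SPECFEM3D_GLOBE versions; please double-check against your own Par_file):

```
# DATA/Par_file -- switch off expensive physics and restrict to one chunk
NCHUNKS                         = 1
ATTENUATION                     = .false.
OCEANS                          = .false.
GRAVITY                         = .false.
ROTATION                        = .false.
```

note that with NCHUNKS = 1 you also need to define the chunk geometry (ANGULAR_WIDTH_XI_IN_DEGREES, ANGULAR_WIDTH_ETA_IN_DEGREES, and the chunk center/rotation parameters).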
Hi,
I’m curious about that… which kind of machine could fulfill this problem? a machine similar to Leonardo (for example), or much, much bigger? it seems like a potential benchmark for DT in SE…
Piero
For a 1 Hz simulation, NEX should be at least 4352 (= 256 * 17). with a setup of NPROC_XI = 128 and NEX_XI = 5120, the bin/xcreate_header_file tool reports:
*******************************
./SPECFEM3D_GLOBE$ cp DATA/Par_file.1Hz DATA/Par_file; ./bin/xcreate_header_file
creating file OUTPUT_FILES/values_from_mesher.h to compile solver with correct values
edit file OUTPUT_FILES/values_from_mesher.h to see
some statistics about the mesh
number of processors = 98304
maximum number of points per region = 29322721
total elements per slice = 260606
total points per slice = 34608523
the time step of the solver will be DT = 5.34909358E-03 (s)
the (approximate) minimum period resolved will be = 0.850000024 (s)
current record length is = 120.000000 min
current minimum number of time steps will be = 1346100
on NEC SX, make sure "loopcnt=" parameter
in Makefile is greater than max vector length = 87968163
approximate static memory needed by the solver:
----------------------------------------------
(lower bound, usually the real amount used is 5% to 10% higher)
(you can get a more precise estimate of the size used per MPI process
by typing "size -d bin/xspecfem3D"
after compiling the code with the DATA/Par_file you plan to use)
size of static arrays per slice = 15360.350440000000 MB
= 14648.771705627441 MiB
= 15.360350439999999 GB
= 14.305441118776798 GiB
(should be below 80% or 90% of the memory installed per core)
(if significantly more, the job will not run by lack of memory)
(note that if significantly less, you waste a significant amount
of memory per processor core)
(but that can be perfectly acceptable if you can afford it and
want faster results by using more cores)
size of static arrays for all slices = 1509983.8896537600 GB
= 1406282.0837402344 GiB
= 1509.9838896537599 TB
= 1373.3223474025726 TiB
*******************************
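these numbers can be cross-checked against the usual SPECFEM3D_GLOBE rules of thumb: shortest resolved period ~ 17 s * (256 / NEX), and total MPI processes = NCHUNKS * NPROC_XI * NPROC_ETA. a quick back-of-the-envelope sketch in Python (the helper names are just for illustration):

```python
# Back-of-the-envelope checks against the mesher output above.

def min_period(nex):
    """Approximate shortest resolved period in seconds:
    ~17 s at NEX = 256, scaling inversely with NEX."""
    return 256.0 / nex * 17.0

def total_processes(nproc_xi, nchunks=6):
    """NCHUNKS * NPROC_XI * NPROC_ETA, assuming square slices
    (NPROC_ETA = NPROC_XI) on a full 6-chunk global mesh."""
    return nchunks * nproc_xi * nproc_xi

print(min_period(4352))      # 1.0 s  -> minimum NEX for a 1 Hz run
print(min_period(5120))      # 0.85 s -> matches the mesher output
print(total_processes(128))  # 98304  -> matches "number of processors"

# memory per slice times the number of slices gives the total static-array size
print(15.36035044 * total_processes(128) / 1000.0)  # ~1510 TB, matching the total
```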
usually, for such big simulations, the limiting factor is memory usage. with the above setup, the run requires ~100,000 GPUs and fills each of them up to ~16 GB. for adjoint kernel simulations the memory doubles, i.e., up to 32 GB per card.
the storage needed for writing the model output to disk is probably quite large as well (~ a few hundred to 1,000 TB?).
On Leonardo we should have on the order of 14K A100 (or similar) GPUs with 32 GB each. It’s still unfeasible, yes. The most interesting aspect, in my opinion, is that such a big run would probably fail… not always, but with high probability, due to hardware failures. For simulations this large, fault tolerance would be a must.