Hello. This is Sungho Lee.
I’m using a small cluster (160 cores in total: 1 master and 15 compute nodes) to test ASPECT.
On the master node, or on any single node of the cluster, increasing the number of processors shortens the wall time with ASPECT 2.1.
However, distributing the problem across multiple nodes does not shorten the wall time; instead, it increases it.
Is this normal (i.e., problem dependent), or is something wrong on my end?
I’ve tested different network fabrics (tmi, ofi, etc.), but it doesn’t help.
One thing I suspect is that my executables and libraries (ASPECT and deal.II) were compiled with Intel’s MKL and the Intel compilers.
Could this be the cause of my problem?
Any comments would be helpful.
Performance is typically a balance between the cost of doing computations and the cost of communicating. If you run on a single machine, communication is cheap and fast, but if you involve other machines, you need to involve the network and send data from one machine to another – so it becomes expensive. Whether a particular computation is accelerated is then a question of how much computing you do versus how much communicating. In other words, for small simulations with a few thousand degrees of freedom, it’s often not worth running on multiple machines because there is not enough computation to amortize the cost of communication. Parallelization is often only useful if you have on the order of 50,000 unknowns per processor.
Can you say how large the computation was that you tried?
Thank you for your comment.
My problem has around 1,000,000 degrees of freedom.
By that estimate, my problem would need about 20 processors at most.
Now I understand why performance decreases beyond around 16–18 cores.
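The arithmetic above can be written as a quick back-of-the-envelope check. A minimal Python sketch, assuming the ~50,000-unknowns-per-core figure from the reply above (a heuristic, not a hard limit; the actual crossover depends on the problem and the network):

```python
# Heuristic from the discussion above: parallelization pays off only
# while each core keeps roughly this many unknowns (assumed threshold).
MIN_DOFS_PER_CORE = 50_000

def recommended_max_cores(total_dofs: int) -> int:
    """Largest core count that still keeps at least
    MIN_DOFS_PER_CORE unknowns on every core."""
    return max(1, total_dofs // MIN_DOFS_PER_CORE)

# Example: the ~1,000,000-DoF problem discussed in this thread.
print(recommended_max_cores(1_000_000))  # -> 20
```

For this problem size, the estimate lands at about 20 cores, which matches the observed falloff beyond 16–18 cores; the 160 cores of the full cluster would leave only ~6,250 unknowns per core, so communication dominates.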