Error when running PyLith on a cluster with PBS

Hello, I have compiled PyLith on the cluster successfully, but when I try to run it using qsub, I get errors. The following is the standard error output:
Traceback (most recent call last):
File “/home/node/pylith/bin/pylith”, line 25, in
from pylith.apps.PyLithApp import PyLithApp
ImportError: No module named pylith.apps.PyLithApp

It seems that the program couldn't find the location of PyLithApp. However, before submitting the job, I executed the command "source ${HOME}/build/pylith/setup.sh".

The following is my PBS job script:
#!/bin/bash
#PBS -N case01
#PBS -l nodes=cu01:ppn=12+cu02:ppn=12+cu03:ppn=12+cu04:ppn=12
#PBS -l walltime=480:00:00
#PBS -q batch1
#PBS -e my.err
#PBS -o my.out

EXEC=/home/node/pylith/bin/pylith
INPUT=/home/node/pengzhai/script/step02_7.cfg
MONITOR=/home/node/pengzhai/script/step02_7.log
NP=48

nohup ${EXEC} ${INPUT} --nodes=${NP}

So how should I configure my PBS job script to fix this problem?

The problem is that the environment you set when you submit the job to the queue is not being used when the job actually starts. See the -V option of the qsub command:

  -V    Declares that all environment variables in the qsub
        command's environment are to be exported to the
        batch job.
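
As a rough sketch of the idea (not a tested configuration for this cluster), the job script can request -V and also source the PyLith setup script itself, so that it does not depend on the shell that ran qsub. The helper name load_pylith_env is hypothetical; the paths are the ones from the original post, and the guard assumes the batch system sets PBS_JOBID and PBS_O_WORKDIR inside a job, as PBS/Torque normally does.

```shell
#!/bin/bash
#PBS -N case01
#PBS -l nodes=cu01:ppn=12+cu02:ppn=12+cu03:ppn=12+cu04:ppn=12
#PBS -q batch1
#PBS -e my.err
#PBS -o my.out
#PBS -V

# Hypothetical helper: source a setup script inside the job so the job
# does not rely on the environment of the submitting shell.
load_pylith_env() {
    setup_script="$1"
    if [ -f "${setup_script}" ]; then
        . "${setup_script}"
    else
        echo "setup script not found: ${setup_script}" >&2
        return 1
    fi
}

# Only launch PyLith when actually running inside a batch job
# (assumption: PBS sets PBS_JOBID and PBS_O_WORKDIR for the job).
if [ -n "${PBS_JOBID:-}" ]; then
    load_pylith_env "${HOME}/build/pylith/setup.sh" || exit 1
    cd "${PBS_O_WORKDIR}" || exit 1
    /home/node/pylith/bin/pylith /home/node/pengzhai/script/step02_7.cfg --nodes=48
fi
```

Sourcing the setup script inside the job is deliberately redundant with -V: if the site disables or filters exported variables, the job still sets its own PATH, PYTHONPATH and LD_LIBRARY_PATH.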

Note that PyLith can automatically submit a job to a PBS batch system. See the PyLith manual and pyre/schedulers/SchedulerPBS.py (part of the pyre package).

Thanks for your advice!
I can execute "pylith step02_7.cfg" on the command line successfully. But when I add the following four lines to the pylithapp.cfg file:
[pylithapp]
scheduler = pbs
[pylithapp.launcher]
command = mpirun -np ${nodes} -machinefile hosts
and run the following on the command line:
pylith step02_7.cfg --job.queue=batch1 --job.name=step02_7 --job.stdout=step02_7.log --job.stderr=step02_7.err --job.walltime=400*hour --nodes=48 --scheduler.ppn=12
it doesn't work, and step02_7.err contains the following:

Traceback (most recent call last):
File “”, line 1, in
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/applications/init.py”, line 19, in
from Shell import Shell
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/applications/Shell.py”, line 25, in
class Shell(Configurable):
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/applications/Shell.py”, line 39, in Shell
from Preprocessor import Preprocessor
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/applications/Preprocessor.py”, line 17, in
import os, getpass, socket
File “/home/node/pylith/lib/python2.7/socket.py”, line 47, in
import _socket
ImportError: /home/node/pylith/lib/python2.7/lib-dynload/_socket.so: undefined symbol: PyUnicodeUCS2_FromFormat

So why did this happen? :thinking:
My hosts file includes:
cu01
cu02
cu03
cu04
Using the qsub-options setting, I also receive this email:
PBS Job Id: 1209.mu01
Job Name: step02_7
Exec host: cu04/0-11+cu03/0-11+cu02/0-11+cu01/0-11
Execution terminated
Exit_status=1
resources_used.cput=00:00:00
resources_used.vmem=0kb
resources_used.walltime=00:00:01
resources_used.mem=0kb
resources_used.energy_used=0
Error_Path: mu01:/home/node/pengzhai/script/step02_7.err
Output_Path: mu01:/home/node/pengzhai/script/step02_7.log

It still looks like your environment is not set up consistently: the environment in which you built PyLith and run it interactively differs from the one on the compute nodes. You can check this by running ldd /home/node/pylith/lib/python2.7/lib-dynload/_socket.so both on the login node and in a session on a compute node. Make sure that all libraries are found and that the paths are the same in both cases.
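
A minimal sketch of that comparison, assuming ldd is available on both nodes: the hypothetical helpers below strip the load addresses from ldd output and sort it, so the results from the login node and a compute node can be compared with diff. The output file names in the usage note are also just examples.

```shell
#!/bin/bash

# Normalize ldd-style output read on stdin: drop the trailing load
# address "(0x...)", which changes from run to run, and sort the lines.
normalize_ldd() {
    sed 's/ (0x[0-9a-fA-F]*)$//' | sort
}

# Print the host name followed by the normalized ldd output of one file.
report_libs() {
    lib="$1"
    echo "host: $(hostname)"
    ldd "${lib}" | normalize_ldd
}
```

For example, run report_libs /home/node/pylith/lib/python2.7/lib-dynload/_socket.so > login.txt on the login node and the same command with > compute.txt in an interactive job (qsub -I), then diff login.txt compute.txt. Lines containing "not found", or the same library resolving to different paths, point to the mismatch.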

Hi Brad,


I compiled PyLith again today and ran into an error when installing h5py. It seems that something is wrong with pkgconfig.py. According to the pkgconfig page on PyPI, pkgconfig-1.5.4 no longer supports Python 2!
So I tried executing the following commands before continuing with the h5py installation:

pip install cftime==1.1.1
pip install Cython==0.29.16
pip install pkgconfig==1.5.0

After that I continued with 'make', and everything looks OK. However, I don't know whether this is the right approach. :sweat_smile:

However, a new error occurs when I run PyLith on multiple nodes as before, while running PyLith on a single node works.

Traceback (most recent call last):
File “”, line 1, in
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/schedulers/init.py”, line 53, in jobstart
kwds = kwds)
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/applications/init.py”, line 41, in start
shell.run(**kwds)
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/applications/Shell.py”, line 143, in run
method(*args, **kwds)
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/applications/SuperScript.py”, line 47, in execute
self.subscript = self.createSubscript(subscriptName)
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/applications/AppRunner.py”, line 25, in createSubscript
cls = loadObject(name)
File “/home/node/pylith/lib/python2.7/site-packages/pythia-0.8.1.19-py2.7.egg/pyre/util/init.py”, line 54, in loadObject
obj = import(module, globals(), globals(), [‘name’])
File “/home/node/pylith/lib/python2.7/site-packages/pylith/apps/PyLithApp.py”, line 23, in
from PetscApplication import PetscApplication
File “/home/node/pylith/lib/python2.7/site-packages/pylith/apps/PetscApplication.py”, line 27, in
class PetscApplication(Application):
File “/home/node/pylith/lib/python2.7/site-packages/pylith/apps/PetscApplication.py”, line 41, in PetscApplication
from pylith.utils.PetscManager import PetscManager
File “/home/node/pylith/lib/python2.7/site-packages/pylith/utils/PetscManager.py”, line 29, in
import pylith.utils.petsc as petsc
File “/home/node/pylith/lib/python2.7/site-packages/pylith/utils/petsc.py”, line 28, in
_petsc = swig_import_helper()
File “/home/node/pylith/lib/python2.7/site-packages/pylith/utils/petsc.py”, line 24, in swig_import_helper
_mod = imp.load_module(‘_petsc’, fp, pathname, description)
ImportError: libsz.so.2: cannot open shared object file: No such file or directory

Thanks,
Zhai

The PyLith installer v2.2.2-1 will install cftime==1.1.1 and Cython==0.29.16, so those versions should work.

Once again, if you can build and run the tests but you get error messages about libraries not being found when you run on compute nodes, then your environment is not the same on the compute nodes as when you build. I recommend consulting with the system administrators for the cluster. They are in a much better position to help you than us.
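
Before contacting the administrators, it may help to record where (if anywhere) the missing library is visible on each node. The sketch below uses a hypothetical helper, find_in_path_list, that searches a colon-separated list of directories such as LD_LIBRARY_PATH for a given file name. It only checks the directories it is given; the dynamic linker also consults other locations, so an empty result is a hint rather than proof.

```shell
#!/bin/bash

# Hypothetical helper: print every match for a file name in a
# colon-separated list of directories (for example LD_LIBRARY_PATH).
find_in_path_list() {
    name="$1"
    path_list="$2"
    old_ifs="${IFS}"
    IFS=':'
    for dir in ${path_list}; do
        # Skip empty entries produced by leading or doubled colons.
        if [ -n "${dir}" ] && [ -e "${dir}/${name}" ]; then
            echo "${dir}/${name}"
        fi
    done
    IFS="${old_ifs}"
}
```

Running find_in_path_list libsz.so.2 "${LD_LIBRARY_PATH}" on the login node and again inside a job on a compute node shows whether the directory that provides libsz.so.2 is present, and on the search path, in both places.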