Issue using qsub on a cluster


#1

Dear all,

I have installed ASPECT on my cluster and would like to run it using qsub.

Here is the version information for my ASPECT installation:

ASPECT 2.1.0-pre
deal.II 9.0.1-pre
Trilinos 12.10.1
p4est 2.0.0
petsc-3.7.

The executable is located at /opt/bin/aspect/build/aspect
I uploaded convection-box-3d.prm to a shared folder, /data2/liu/aspect/

I also set Output directory = /data2/liu/aspect/output-convection-box-3d in convection-box-3d.prm
But it didn’t work

Here is my job script, pbs_sub.pbs:

#!/bin/sh -f
#PBS -N test-aspect
#PBS -q cu
#PBS -l nodes=cu10:ppn=20
#PBS -l walltime=2400:00:00
#PBS -V
#PBS -S /bin/bash
#PBS -o /data2/liu/aspect/my.out
#PBS -e /data2/liu/aspect/my.err

# Switch to current working directory

cd $PBS_O_WORKDIR

mpirun --mca btl openib,self -np $nprocs -hostfile $PBS_NODEFILE /opt/bin/aspect/build/aspect /data2/liu/aspect/convection-box-3d.prm

Let me know if you need further information.
Many thanks for any suggestions.

liu


#2

Hi Liu,
Could you be a bit more specific about what

But it didn’t work

means? Did you get an error message? If so, what does the error message say? At this point it could be any number of problems (e.g. in your submission script, in your file paths, in the parameter file, or in the ASPECT binary).

Best,
Rene
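
A few of the candidate problems listed above (the binary, the file paths, the output location) can be checked by hand before submitting. The following is only a minimal sketch, not something posted in this thread: `preflight` is a hypothetical helper name, and checking the parent of the output directory rests on the assumption that the directory itself may be created at run time.

```shell
#!/bin/sh
# preflight BINARY PRM OUTDIR  (hypothetical helper, for illustration only)
# Prints one line per problem found and returns the number of problems.
preflight() {
    problems=0
    # The binary must be a regular file with the execute bit set.
    if [ ! -f "$1" ] || [ ! -x "$1" ]; then
        echo "binary not found or not executable: $1"
        problems=$((problems + 1))
    fi
    # The parameter file must be readable.
    if [ ! -r "$2" ]; then
        echo "parameter file not readable: $2"
        problems=$((problems + 1))
    fi
    # Assumption: the output directory may not exist yet,
    # so only its parent directory is checked for write access.
    if [ ! -w "$(dirname "$3")" ]; then
        echo "cannot write to parent of output directory: $3"
        problems=$((problems + 1))
    fi
    return "$problems"
}
```

With the paths from the first post, the call would be `preflight /opt/bin/aspect/build/aspect /data2/liu/aspect/convection-box-3d.prm /data2/liu/aspect/output-convection-box-3d`. Note that a check run on the login node says nothing certain about what the compute nodes can see, so running the same check inside a short test job may be more informative.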


#3

So what concretely happens? There is really very little we can guess about what might be wrong without knowing what actually happens; you have to give us a bit more information :)

Best
W.


#4

First of all, thanks for your reply.

What I mean is that I have only run ASPECT in serial on a PC before. Now I would like to run it on a CentOS cluster, but I don’t know how to write a script to run ASPECT in a PBS environment. I wrote some scripts based on examples I found online, but I may have made a mistake in some part of them. I submit the job with the command qsub pbs_sub.pbs. Unfortunately, when I check the status a few seconds later, the job is already in state E (exiting).

[liu@ln01 data2]$ cd /data2/liu/aspect/
[liu@ln01 aspect]$ ls
convection-box-3d.prm  pbs_sub_impi.pbs     pbs_sub-test-03.pbs  pbs_sub-test-06.pbs
pbs_sub_impi_01.pbs    pbs_sub-test-01.pbs  pbs_sub-test-04.pbs  pbs_sub-test-07.pbs
pbs_sub_impi_02.pbs    pbs_sub-test-02.pbs  pbs_sub-test-05.pbs  pbs_sub-test.pbs
[liuze@ln01 aspect]$ qsub pbs_sub_impi_02.pbs
2257.mu01
[liu@ln01 aspect]$ qstat
Job ID     Name         User   Time Use  S  Queue
2257.mu01  test-aspect  liuze  00:00:00  E  cu
[liu@ln01 aspect]$ qstat
Job ID     Name         User   Time Use  S  Queue
2257.mu01  test-aspect  liuze  00:00:00  E  cu
[liu@ln01 aspect]$

Here is my script, pbs_sub_impi_02.pbs:

#!/bin/sh -f
#PBS -N test-aspect
#PBS -q cu
#PBS -l nodes=cu12:ppn=20
#PBS -l walltime=2400:00:00
#PBS -V
#PBS -S /bin/bash
cd $PBS_O_WORKDIR
NP=`cat $PBS_NODEFILE | wc -l`
mpirun -genv I_MPI_DEVICE rdma -machinefile $PBS_NODEFILE -np $NP ./opt/bin/aspect/build/aspect convection-box-3d.prm > 2D01.log 2>&1

Can you give me some advice?
Many thanks for any suggestions.

liu
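
The script above derives the process count from $PBS_NODEFILE. As an illustration of that one step only, here is a minimal sketch using command substitution. `count_procs` is a hypothetical helper name, and the sketch assumes the node file contains one line per allocated core, which may not hold for every PBS variant or configuration.

```shell
#!/bin/sh
# count_procs NODEFILE  (hypothetical helper, for illustration only)
# Prints the number of lines in NODEFILE.
count_procs() {
    lines=$(wc -l < "$1")
    # Arithmetic expansion strips any padding that wc may add.
    echo $((lines))
}

# PBS sets PBS_NODEFILE inside a running job; outside a job this block is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_procs "$PBS_NODEFILE")
    echo "node file lists $NP entries"
fi
```

Printing the value into the job's log before calling mpirun makes it easy to confirm later that the variable was actually set.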


#5

I’m afraid this is still not enough information. There will generally be two files for each submitted job: one that shows the regular output of the program and one that shows any error output. These two files should contain information about what happened to your job. In an earlier post, you showed that you had these two lines in your job script:

#PBS -o /data2/liu/aspect/my.out
#PBS -e /data2/liu/aspect/my.err

So these are the files you need to look into.
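
As a rough illustration of where to look, the sketch below prints the tail of every non-empty log file in a job directory. `show_job_logs` is a hypothetical helper, not part of PBS or ASPECT. Besides the my.out and my.err names from the first script, it also looks for files named like jobname.o2257 and jobname.e2257 (the usual PBS default when no -o/-e lines are given, as in the second script) and for *.log files such as the 2D01.log that script redirects into.

```shell
#!/bin/sh
# show_job_logs DIR  (hypothetical helper, for illustration only)
# Prints the last 20 lines of each non-empty job log found in DIR.
show_job_logs() {
    dir="${1:-.}"
    for f in "$dir"/my.out "$dir"/my.err \
             "$dir"/*.o[0-9]* "$dir"/*.e[0-9]* "$dir"/*.log; do
        # Skip missing files, unmatched glob patterns, and empty files.
        [ -s "$f" ] || continue
        echo "==> $f <=="
        tail -n 20 "$f"
    done
}
```

For this thread the call would be `show_job_logs /data2/liu/aspect`. If nothing is printed, either the files are empty or they were written somewhere else, for example to the directory the job was submitted from.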