Generally, you should submit jobs that request the full resources of a node (all CPU cores),
so that the jobs run efficiently and you do not have to queue repeatedly for the same computation.
Request as many resources as possible:
for CPU nodes, set cpu cores == 40;
for GPU nodes, set cpu cores == 20;
for amd nodes, set cpu cores == 94.
# For CPU/himem
-n 40 --ntasks=40
# For GPU
-n 20 --ntasks=20
# For amd
-n 94 --ntasks=94
Urgent jobs
For urgent tasks, you may share cores with other users so that you do not have to queue for a long time.
In this case, request as few CPU cores as possible, so that you can share a node with other users instead of waiting for a whole node to become free.
Please refer to the Hardware overview for the latest updates.
The maximum number of cores per node is listed below:
# CPU/HIMEM nodes
MAX_CPU_CORE=40
# GPU nodes
MAX_CPU_CORE=20
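As a sketch, an urgent-job script under these limits might request only a few cores (the partition, core count, and job name here are illustrative, not prescribed values):

```shell
#!/bin/bash
# Hypothetical urgent job: request only 4 of a CPU node's 40 cores on the
# shared partition, so the job can start on a partially used node.
#SBATCH -p cpu-share
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -t 2:00:00
#SBATCH -J urgentJob
```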
You need to check the available nodes with the following command:
sinfo
Then you can submit your jobs to those nodes.
Find the idle nodes in each partition:
sinfo -a | awk 'NR==1 || /idle/'
# Output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
jiy up infinite 2 idle hhnode-ib-[32-33]
cpu up infinite 12 idle hhnode-ib-[95-98,100,237-238,240-242,251-252]
himem up infinite 1 idle hhnode-ib-103
gpu3090 up infinite 9 idle hhnode-ib-[186-187,190-196]
isd up infinite 1 idle hhnode-ib-233
dbm up infinite 1 idle hhnode-ib-234
amd up infinite 4 idle hhnode-ib-[253-256]
Find the mixed (partially used) node partitions:
sinfo -a | awk 'NR==1 || /mix/'
# Output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu up infinite 15 mix hhnode-ib-[16,36,46-47,56,69,86,99,201,231,239,243,245-246,249]
cpu-share up infinite 9 mix hhnode-ib-[16,36,46-47,56,69,86,201,231]
gpu up infinite 4 mix hhnode-ib-[106-107,140,142]
gpu-share up infinite 4 mix hhnode-ib-[106-107,140,142]
x-gpu up infinite 2 mix hhnode-ib-[153,159]
x-gpu-share up infinite 2 mix hhnode-ib-[153,159]
gpu3090 up infinite 1 mix hhnode-ib-189
math up infinite 2 mix hhnode-ib-[235-236]
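To see how many cores are actually free on each mixed node, you can inspect sinfo's CPU-state field. A minimal sketch, assuming sinfo's `%C` format prints allocated/idle/other/total CPUs per node (standard Slurm behavior; the helper name is ours):

```shell
# Print "nodename idle_cpus" for each node, given "nodename A/I/O/T" lines
# on stdin (i.e. the output of: sinfo -p cpu-share -N -h -o "%n %C").
parse_idle_cpus() {
  awk '{ split($2, c, "/"); print $1, c[2] }'
}

# Example with sample data in the %C format (8 allocated, 32 idle, 40 total):
echo "hhnode-ib-86 8/32/0/40" | parse_idle_cpus
```

On the cluster you would pipe real data in: `sinfo -p cpu-share -N -h -o "%n %C" | parse_idle_cpus`.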
You can use srun/sbatch/salloc to request the resources of a full node.
srun on CPU/HIMEM nodes: # -N 1 -n 40
Note that srun does not read #SBATCH-style directives from a script; pass the options on the command line instead:
# -p cpu-share           partition (use himem-share for HIMEM nodes)
# -J testJobName         Slurm job name
# -t 24:00:00            maximum runtime of 24 hours
# --mail-user            update your email address
# --pty bash             start an interactive shell; replace with your script to run commands instead
srun -p cpu-share -N 1 -n 40 --exclusive -J testJobName -t 24:00:00 \
     --mail-user=user_name@ust.hk --mail-type=begin --mail-type=end --pty bash
srun on GPU nodes: # -N 1 -n 20
# same options, with the gpu-share partition and 20 cores
srun -p gpu-share -N 1 -n 20 --exclusive -J testJobName -t 24:00:00 \
     --mail-user=user_name@ust.hk --mail-type=begin --mail-type=end --pty bash
srun on amd nodes: # -N 1 -n 94
# same options, with the amd partition and 94 cores
srun -p amd -N 1 -n 94 --exclusive -J testJobName -t 24:00:00 \
     --mail-user=user_name@ust.hk --mail-type=begin --mail-type=end --pty bash
sbatch
on CPU/HIMEM nodes:# -N 1 -n 40
sbatch sbatchCommand.sh
cat sbatchCommand.sh
#!/bin/bash
#SBATCH -p cpu-share # himem-share
#SBATCH -N 1
#SBATCH -n 40
#SBATCH --exclusive
#SBATCH --gres-flags=enforce-binding
#SBATCH -J testJobName #Slurm job name
#SBATCH -t 24:00:00 #Maximum runtime of 24 hours
#SBATCH --mail-user=user_name@ust.hk #Update your email address
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
# your job commands go here
sbatch
on GPU nodes:# -N 1 -n 20
sbatch sbatchCommand.sh
cat sbatchCommand.sh
#!/bin/bash
#SBATCH -p gpu-share
#SBATCH -N 1
#SBATCH -n 20
#SBATCH --exclusive
#SBATCH --gres-flags=enforce-binding
#SBATCH -J testJobName #Slurm job name
#SBATCH -t 24:00:00 #Maximum runtime of 24 hours
#SBATCH --mail-user=user_name@ust.hk #Update your email address
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
# your job commands go here
sbatch
on amd nodes:# -N 1 -n 94
sbatch sbatchCommand.sh
cat sbatchCommand.sh
#!/bin/bash
#SBATCH -p amd
#SBATCH -N 1
#SBATCH -n 94
#SBATCH --exclusive
#SBATCH --gres-flags=enforce-binding
#SBATCH -J testJobName #Slurm job name
#SBATCH -t 24:00:00 #Maximum runtime of 24 hours
#SBATCH --mail-user=user_name@ust.hk #Update your email address
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
# your job commands go here
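When sbatch accepts a script it prints a line like "Submitted batch job <id>"; capturing that ID makes follow-up commands easier. A sketch (the helper name is ours):

```shell
# Print the numeric job ID from sbatch's "Submitted batch job <id>" line on stdin.
extract_job_id() {
  awk '/^Submitted batch job/ { print $4 }'
}

# On the cluster you would use it like:
#   jobid=$(sbatch sbatchCommand.sh | extract_job_id)
#   squeue -j "$jobid"
# Example with a sample sbatch output line:
echo "Submitted batch job 1030063" | extract_job_id
```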
First, check the partition and the number of cores available:
# check mixed nodes
sinfo -a | awk 'NR==1 || /mix/'
# Output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu up infinite 15 mix hhnode-ib-[16,36,46-47,56,69,86,99,201,231,239,243,245-246,249]
cpu-share up infinite 9 mix hhnode-ib-[16,36,46-47,56,69,86,201,231]
gpu up infinite 4 mix hhnode-ib-[106-107,140,142]
gpu-share up infinite 4 mix hhnode-ib-[106-107,140,142]
x-gpu up infinite 2 mix hhnode-ib-[153,159]
x-gpu-share up infinite 2 mix hhnode-ib-[153,159]
gpu3090 up infinite 1 mix hhnode-ib-189
math up infinite 2 mix hhnode-ib-[235-236]
# check the cores currently in use on hhnode-ib-86
squeue -o "%.18i %.9P %.5D %.5C %.8j %.8u %.6g %.2t %.10M %R" | awk 'NR==1 || /hhnode-ib-86/'
# Output:
JOBID PARTITION NODES CPUS NAME USER GROUP ST TIME NODELIST(REASON)
1030063 cpu-share 1 8 CuPt hlwongac keztlu R 2:11:28 hhnode-ib-86
Second, submit the job, specifying the number of cores:
# 8 of the node's 40 cores are in use, so at most 40-8=32 cores can be requested
# for example, use srun to submit the job
srun -p cpu-share -N 1 -n 32 -J testMixedNode -w hhnode-ib-86 srunCommand.sh
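The 40-8=32 arithmetic above can be scripted. A sketch, assuming squeue is run with `-h -o "%i %P %C %N"` so each line holds the job ID, partition, CPU count, and node (the helper name and the 40-core total are taken from the CPU-node layout above):

```shell
# Given "jobid partition cpus node" lines on stdin, print how many of the
# named node's cores are still free.
free_cores() {
  node="$1"; total="$2"
  awk -v node="$node" -v total="$total" '
    $4 == node { used += $3 }
    END { print total - used }'
}

# Example with the squeue output shown above (job 1030063 uses 8 cores):
echo "1030063 cpu-share 8 hhnode-ib-86" | free_cores hhnode-ib-86 40
```

On the cluster: `squeue -h -o "%i %P %C %N" | free_cores hhnode-ib-86 40`.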
First, get your job ID with squeue, then use the scancel command to cancel the jobs.
# use squeue to get the job ID
squeue -u yourUserName
# Output:
1025035 gpu-share 1 1 jobName1 yourUserName R 3:03:31 hhnode-ib-145
1025034 gpu-share 1 1 jobName2 yourUserName R 9:57:32 hhnode-ib-145
# cancel the job
scancel 1025035 1025034
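To cancel several jobs at once you do not have to list the IDs by hand: `scancel -u` cancels all of a user's jobs (a standard Slurm flag), and a small filter can select IDs by job name. The helper below is a sketch:

```shell
# Print the IDs of jobs whose name matches, given "jobid jobname" lines on
# stdin (i.e. the output of: squeue -u yourUserName -h -o "%i %j").
jobs_named() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# Example with the squeue output shown above:
printf '1025035 jobName1\n1025034 jobName2\n' | jobs_named jobName1
```

On the cluster: `squeue -u yourUserName -h -o "%i %j" | jobs_named jobName1 | xargs scancel`, or simply `scancel -u yourUserName` to cancel everything.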