Generally, you should submit jobs that request the full resources of a node (all CPU cores),
so that the jobs run efficiently and you do not have to queue repeatedly for the same computation.
Request as many resources as possible:
for CPU nodes, set cpu cores == 40;
for GPU nodes, set cpu cores == 20;
for amd nodes, set cpu cores == 94.
# For CPU/himem
-n 40 --ntasks=40
# For GPU
-n 20 --ntasks=20
# For amd
-n 94 --ntasks=94
Urgent jobs
For urgent tasks, you may share cores with other users so that you do not have to queue for a long time.
In this case, request as few CPU cores as possible, so that you can share a node with other users instead of waiting for a whole node to become free.
Please refer to the Hardware overview for the latest updates.
The maximum number of cores per node is listed below:
# CPU/HIMEM nodes
MAX_CPU_CORE=40
# GPU nodes
MAX_CPU_CORE=20
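As a sketch, an urgent-job script under these limits might request only a few cores (the partition, core count, and job name here are illustrative, not prescribed values):

```shell
#!/bin/bash
# Hypothetical urgent job: request only 4 of a CPU node's 40 cores on the
# shared partition, so the job can start on a partially used node.
#SBATCH -p cpu-share
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -t 2:00:00
#SBATCH -J urgentJob
```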
You need to check the available nodes with the following command:
sinfo
Then you can submit your jobs to those nodes.
Find the idle nodes in each partition:
sinfo -a | awk 'NR==1 || /idle/'
# Output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
jiy up infinite 2 idle hhnode-ib-[32-33]
cpu up infinite 12 idle hhnode-ib-[95-98,100,237-238,240-242,251-252]
himem up infinite 1 idle hhnode-ib-103
gpu3090 up infinite 9 idle hhnode-ib-[186-187,190-196]
isd up infinite 1 idle hhnode-ib-233
dbm up infinite 1 idle hhnode-ib-234
amd up infinite 4 idle hhnode-ib-[253-256]
Find the mixed (partially used) node partitions:
sinfo -a | awk 'NR==1 || /mix/'
# Output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu up infinite 15 mix hhnode-ib-[16,36,46-47,56,69,86,99,201,231,239,243,245-246,249]
cpu-share up infinite 9 mix hhnode-ib-[16,36,46-47,56,69,86,201,231]
gpu up infinite 4 mix hhnode-ib-[106-107,140,142]
gpu-share up infinite 4 mix hhnode-ib-[106-107,140,142]
x-gpu up infinite 2 mix hhnode-ib-[153,159]
x-gpu-share up infinite 2 mix hhnode-ib-[153,159]
gpu3090 up infinite 1 mix hhnode-ib-189
math up infinite 2 mix hhnode-ib-[235-236]
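To see how many cores are actually free on each mixed node, you can inspect sinfo's CPU-state field. A minimal sketch, assuming sinfo's `%C` format prints allocated/idle/other/total CPUs per node (standard Slurm behavior; the helper name is ours):

```shell
# Print "nodename idle_cpus" for each node, given "nodename A/I/O/T" lines
# on stdin (i.e. the output of: sinfo -p cpu-share -N -h -o "%n %C").
parse_idle_cpus() {
  awk '{ split($2, c, "/"); print $1, c[2] }'
}

# Example with sample data in the %C format (8 allocated, 32 idle, 40 total):
echo "hhnode-ib-86 8/32/0/40" | parse_idle_cpus
```

On the cluster you would pipe real data in: `sinfo -p cpu-share -N -h -o "%n %C" | parse_idle_cpus`.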
You can use srun/sbatch/salloc to request the resources of a full node.
srun on CPU/HIMEM nodes: # -N 1 -n 40
Note that srun does not read #SBATCH-style directives from a script; pass the options on the command line instead:
# -p cpu-share           partition (use himem-share for HIMEM nodes)
# -J testJobName         Slurm job name
# -t 24:00:00            maximum runtime of 24 hours
# --mail-user            update your email address
# --pty bash             start an interactive shell; replace with your script to run commands instead
srun -p cpu-share -N 1 -n 40 --exclusive -J testJobName -t 24:00:00 \
     --mail-user=user_name@ust.hk --mail-type=begin --mail-type=end --pty bash
srun on GPU nodes: # -N 1 -n 20
# same options, with the gpu-share partition and 20 cores
srun -p gpu-share -N 1 -n 20 --exclusive -J testJobName -t 24:00:00 \
     --mail-user=user_name@ust.hk --mail-type=begin --mail-type=end --pty bash
srun on amd nodes: # -N 1 -n 94
# same options, with the amd partition and 94 cores
srun -p amd -N 1 -n 94 --exclusive -J testJobName -t 24:00:00 \
     --mail-user=user_name@ust.hk --mail-type=begin --mail-type=end --pty bash
sbatch
on CPU/HIMEM nodes:# -N 1 -n 40
sbatch sbatchCommand.sh
cat sbatchCommand.sh
#!/bin/bash
#SBATCH -p cpu-share # himem-share
#SBATCH -N 1
#SBATCH -n 40
#SBATCH --exclusive
#SBATCH --gres-flags=enforce-binding
#SBATCH -J testJobName #Slurm job name
#SBATCH -t 24:00:00 #Maximum runtime of 24 hours
#SBATCH --mail-user=user_name@ust.hk #Update your email address
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
# your job commands go here
sbatch
on GPU nodes:# -N 1 -n 20
sbatch sbatchCommand.sh
cat sbatchCommand.sh
#!/bin/bash
#SBATCH -p gpu-share
#SBATCH -N 1
#SBATCH -n 20
#SBATCH --exclusive
#SBATCH --gres-flags=enforce-binding
#SBATCH -J testJobName #Slurm job name
#SBATCH -t 24:00:00 #Maximum runtime of 24 hours
#SBATCH --mail-user=user_name@ust.hk #Update your email address
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
# your job commands go here
sbatch
on amd nodes:# -N 1 -n 94
sbatch sbatchCommand.sh
cat sbatchCommand.sh
#!/bin/bash
#SBATCH -p amd
#SBATCH -N 1
#SBATCH -n 94
#SBATCH --exclusive
#SBATCH --gres-flags=enforce-binding
#SBATCH -J testJobName #Slurm job name
#SBATCH -t 24:00:00 #Maximum runtime of 24 hours
#SBATCH --mail-user=user_name@ust.hk #Update your email address
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
# your job commands go here
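When sbatch accepts a script it prints a line like "Submitted batch job <id>"; capturing that ID makes follow-up commands easier. A sketch (the helper name is ours):

```shell
# Print the numeric job ID from sbatch's "Submitted batch job <id>" line on stdin.
extract_job_id() {
  awk '/^Submitted batch job/ { print $4 }'
}

# On the cluster you would use it like:
#   jobid=$(sbatch sbatchCommand.sh | extract_job_id)
#   squeue -j "$jobid"
# Example with a sample sbatch output line:
echo "Submitted batch job 1030063" | extract_job_id
```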
First, check the partition and the number of cores available:
# check mixed nodes
sinfo -a | awk 'NR==1 || /mix/'
# Output:
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu up infinite 15 mix hhnode-ib-[16,36,46-47,56,69,86,99,201,231,239,243,245-246,249]
cpu-share up infinite 9 mix hhnode-ib-[16,36,46-47,56,69,86,201,231]
gpu up infinite 4 mix hhnode-ib-[106-107,140,142]
gpu-share up infinite 4 mix hhnode-ib-[106-107,140,142]
x-gpu up infinite 2 mix hhnode-ib-[153,159]
x-gpu-share up infinite 2 mix hhnode-ib-[153,159]
gpu3090 up infinite 1 mix hhnode-ib-189
math up infinite 2 mix hhnode-ib-[235-236]
# check the cores currently in use on hhnode-ib-86
squeue -o "%.18i %.9P %.5D %.5C %.8j %.8u %.6g %.2t %.10M %R" | awk 'NR==1 || /hhnode-ib-86/'
# Output:
JOBID PARTITION NODES CPUS NAME USER GROUP ST TIME NODELIST(REASON)
1030063 cpu-share 1 8 CuPt hlwongac keztlu R 2:11:28 hhnode-ib-86
Second, submit the job, specifying the number of cores:
# 8 of the node's 40 cores are in use, so at most 40-8=32 cores can be requested
# for example, use srun to submit the job
srun -p cpu-share -N 1 -n 32 -J testMixedNode -w hhnode-ib-86 srunCommand.sh
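The 40-8=32 arithmetic above can be scripted. A sketch, assuming squeue is run with `-h -o "%i %P %C %N"` so each line holds the job ID, partition, CPU count, and node (the helper name and the 40-core total are taken from the CPU-node layout above):

```shell
# Given "jobid partition cpus node" lines on stdin, print how many of the
# named node's cores are still free.
free_cores() {
  node="$1"; total="$2"
  awk -v node="$node" -v total="$total" '
    $4 == node { used += $3 }
    END { print total - used }'
}

# Example with the squeue output shown above (job 1030063 uses 8 cores):
echo "1030063 cpu-share 8 hhnode-ib-86" | free_cores hhnode-ib-86 40
```

On the cluster: `squeue -h -o "%i %P %C %N" | free_cores hhnode-ib-86 40`.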
First, get your job ID with squeue, then use the scancel command to cancel the jobs.
# use squeue to get the job ID
squeue -u yourUserName
# Output:
1025035 gpu-share 1 1 jobName1 yourUserName R 3:03:31 hhnode-ib-145
1025034 gpu-share 1 1 jobName2 yourUserName R 9:57:32 hhnode-ib-145
# cancel the job
scancel 1025035 1025034
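To cancel several jobs at once you do not have to list the IDs by hand: `scancel -u` cancels all of a user's jobs (a standard Slurm flag), and a small filter can select IDs by job name. The helper below is a sketch:

```shell
# Print the IDs of jobs whose name matches, given "jobid jobname" lines on
# stdin (i.e. the output of: squeue -u yourUserName -h -o "%i %j").
jobs_named() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# Example with the squeue output shown above:
printf '1025035 jobName1\n1025034 jobName2\n' | jobs_named jobName1
```

On the cluster: `squeue -u yourUserName -h -o "%i %j" | jobs_named jobName1 | xargs scancel`, or simply `scancel -u yourUserName` to cancel everything.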