The job dependency feature of SLURM is useful when you need to run multiple jobs in a particular order. A standard example is a workflow in which the output of one job is used as the input to the next. Rather than repeatedly checking whether one job has ended and then manually submitting the next, you can submit all the jobs in the workflow at once. SLURM then runs them in the proper order based on the dependency conditions supplied.
Syntax:
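sbatch -d afterok:YOUR_JOBID YOUR_SLURM_SCRIPT
Here -d is the short form of --dependency, and afterok means the new job may start only after job YOUR_JOBID has finished successfully (exit code 0).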
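When a job depends on more than one earlier job, the job IDs are joined with colons after the dependency type. The small helper below is only an illustrative sketch of how that string is assembled; the name dep_spec is hypothetical and is not part of SLURM.

```shell
# Build a dependency specification such as "afterok:20821:20822".
# dep_spec is a hypothetical helper for illustration, not a SLURM command.
# Usage: dep_spec TYPE JOBID [JOBID...]
dep_spec() {
    # Require a dependency type and at least one job ID.
    [ "$#" -ge 2 ] || return 1
    local type=$1
    shift
    # With IFS set to ":", "$*" joins the remaining arguments with colons.
    local IFS=:
    echo "${type}:$*"
}
```

It could then be used as, for example, sbatch --dependency="$(dep_spec afterok 20821 20822)" JobD.submit, which is equivalent to writing the colon-separated list by hand.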
Example:
This example is usually referred to as a “diamond” workflow. There are four jobs in total, labeled A through D. Job A runs first. Jobs B and C both depend on Job A completing before they can run. Job D then depends on both Jobs B and C completing.
The SLURM submit files for each step are below.
JobA.submit
#!/bin/sh
#SBATCH --job-name=JobA
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobA.stdout
#SBATCH --error=JobA.stderr
echo "I'm job A"
echo "Sample job A output" > jobA.out
sleep 120
JobB.submit
#!/bin/sh
#SBATCH --job-name=JobB
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobB.stdout
#SBATCH --error=JobB.stderr
echo "I'm job B"
echo "I'm using output from job A"
cat jobA.out >> jobB.out
echo "" >> jobB.out
echo "Sample job B output" >> jobB.out
sleep 120
JobC.submit
#!/bin/sh
#SBATCH --job-name=JobC
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobC.stdout
#SBATCH --error=JobC.stderr
echo "I'm job C"
echo "I'm using output from job A"
cat jobA.out >> jobC.out
echo "" >> jobC.out
echo "Sample job C output" >> jobC.out
sleep 120
JobD.submit
#!/bin/sh
#SBATCH --job-name=JobD
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobD.stdout
#SBATCH --error=JobD.stderr
echo "I'm job D"
echo "I'm using output from jobs B and C"
cat jobB.out >> jobD.out
echo "" >> jobD.out
cat jobC.out >> jobD.out
echo "" >> jobD.out
echo "Sample job D output" >> jobD.out
sleep 120
Now we submit Job A first. Once Job A finishes successfully, Jobs B and C will start automatically. Job D will start once both Job B and Job C finish successfully.
#Submit Job A
[jzhan61@clogin01 dependence]$ sbatch JobA.submit
Submitted batch job 20820
#Submit Job B and Job C depending on Job A
[jzhan61@clogin01 dependence]$ sbatch -d afterok:20820 JobB.submit
Submitted batch job 20821
[jzhan61@clogin01 dependence]$ sbatch -d afterok:20820 JobC.submit
Submitted batch job 20822
#Submit Job D depending on Job B and Job C
[jzhan61@clogin01 dependence]$ sbatch -d afterok:20821:20822 JobD.submit
Submitted batch job 20823
#Check job status
[jzhan61@clogin01 dependence]$ squeue -u jzhan61
JOBID  PARTITION  NAME  USER     ST  TIME  NODES  NODELIST(REASON)
20821  short-cpu  JobB  jzhan61  PD  0:00  1      (Dependency)
20822  short-cpu  JobC  jzhan61  PD  0:00  1      (Dependency)
20823  short-cpu  JobD  jzhan61  PD  0:00  1      (Dependency)
20820  short-cpu  JobA  jzhan61  R   0:48  1      node22
Note that the NODELIST(REASON) column for Jobs B, C, and D shows (Dependency), which means these jobs are pending until the jobs they depend on have completed.
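Copying job IDs by hand, as in the session above, can be avoided by letting a script capture them. The following is a minimal sketch, assuming the four submit files are in the current directory; the function names submit and submit_diamond are hypothetical. It relies on the --parsable option of sbatch, which prints just the job ID (followed by ";" and the cluster name on multi-cluster setups) instead of the "Submitted batch job ..." message.

```shell
# Sketch: submit the diamond workflow without copying job IDs by hand.
# The function names here are illustrative, not part of SLURM.

# Submit a script and print only its numeric job ID.
submit() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    # Strip a trailing ";cluster" suffix if one is present.
    echo "${out%%;*}"
}

# Submit Jobs A through D with the same dependencies as the manual session.
submit_diamond() {
    local a b c d
    a=$(submit JobA.submit) || return 1
    # B and C each wait for A to finish successfully.
    b=$(submit -d "afterok:${a}" JobB.submit) || return 1
    c=$(submit -d "afterok:${a}" JobC.submit) || return 1
    # D waits for both B and C to finish successfully.
    d=$(submit -d "afterok:${b}:${c}" JobD.submit) || return 1
    echo "Submitted A=${a} B=${b} C=${c} D=${d}"
}
```

After sourcing these definitions, running submit_diamond from the directory that holds the submit files queues all four jobs; if any submission fails, the function stops and returns a non-zero status rather than submitting jobs with an empty dependency.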