SLURM Job Dependencies

The job dependency feature of SLURM is useful when you need to run multiple jobs in a particular order.  A standard example is a workflow in which the output from one job is used as the input to the next.  Rather than repeatedly checking whether one job has ended and then manually submitting the next, you can submit all the jobs in the workflow at once.  SLURM then runs them in the proper order based on the dependency conditions supplied.

Syntax:

sbatch -d afterok:YOUR_JOBID YOUR_SLURM_SCRIPT
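In the syntax above, -d is the short form of --dependency, and afterok is only one of the dependency types sbatch accepts. As a rough sketch, the helper below (dep_spec is a hypothetical name, not part of SLURM) joins a type and one or more job IDs into the type:jobid[:jobid...] form; the comments list a few other types from the sbatch man page. The job IDs and the script name cleanup.submit are placeholders.

```shell
#!/bin/bash
# A few dependency types accepted by sbatch -d / --dependency:
#   afterok:ID     start only if job ID completed with exit code zero
#   afternotok:ID  start only if job ID ended in a failed state
#   afterany:ID    start once job ID has ended, regardless of outcome
# Several job IDs can be chained with colons, e.g. afterok:ID1:ID2.

# Hypothetical helper: build "type:id1:id2..." from its arguments.
dep_spec() {
    local type=$1
    shift
    local IFS=:          # makes "$*" join the remaining IDs with ':'
    echo "$type:$*"
}

dep_spec afterok 20821 20822    # prints afterok:20821:20822

# Possible use (commented out because it needs a SLURM cluster):
# sbatch -d "$(dep_spec afterany 20820)" cleanup.submit
```

Choosing afterany instead of afterok is one way to run a cleanup step whether or not the earlier job succeeded.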

Example:

This example is usually referred to as a “diamond” workflow.  There are four jobs in total, labeled A through D.  Job A runs first.  Jobs B and C both depend on Job A completing before they can run.  Job D then depends on both Jobs B and C completing.

The SLURM submit files for each step are below.

JobA.submit

#!/bin/sh
#SBATCH --job-name=JobA
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobA.stdout
#SBATCH --error=JobA.stderr
echo "I'm job A"
echo "Sample job A output" > jobA.out
sleep 120

JobB.submit

#!/bin/sh
#SBATCH --job-name=JobB
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobB.stdout
#SBATCH --error=JobB.stderr
echo "I'm job B"
echo "I'm using output from job A"
cat jobA.out >> jobB.out
echo "" >> jobB.out
echo "Sample job B output" >> jobB.out
sleep 120

JobC.submit

#!/bin/sh
#SBATCH --job-name=JobC
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobC.stdout
#SBATCH --error=JobC.stderr
echo "I'm job C"
echo "I'm using output from job A"
cat jobA.out >> jobC.out
echo "" >> jobC.out
echo "Sample job C output" >> jobC.out
sleep 120

JobD.submit

#!/bin/sh
#SBATCH --job-name=JobD
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --output=JobD.stdout
#SBATCH --error=JobD.stderr
echo "I'm job D"
echo "I'm using output from jobs B and C"
cat jobB.out >> jobD.out
echo "" >> jobD.out
cat jobC.out >> jobD.out
echo "" >> jobD.out
echo "Sample job D output" >> jobD.out
sleep 120

Submit Job A first, then submit the remaining jobs with dependencies on the job IDs that sbatch prints. Once Job A finishes successfully, Jobs B and C become eligible to start automatically. Job D becomes eligible once both Job B and Job C finish successfully.

#Submit Job A
[jzhan61@clogin01 dependence]$ sbatch JobA.submit
Submitted batch job 20820

#Submit Job B and Job C depending on Job A
[jzhan61@clogin01 dependence]$ sbatch -d afterok:20820 JobB.submit
Submitted batch job 20821
[jzhan61@clogin01 dependence]$ sbatch -d afterok:20820 JobC.submit
Submitted batch job 20822

#Submit Job D depending on Job B and Job C
[jzhan61@clogin01 dependence]$ sbatch -d afterok:20821:20822 JobD.submit
Submitted batch job 20823

#Check job status
[jzhan61@clogin01 dependence]$ squeue -u jzhan61
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
             20821 short-cpu     JobB  jzhan61 PD       0:00      1 (Dependency) 
             20822 short-cpu     JobC  jzhan61 PD       0:00      1 (Dependency) 
             20823 short-cpu     JobD  jzhan61 PD       0:00      1 (Dependency) 
             20820 short-cpu     JobA  jzhan61  R       0:48      1 node22 

Note that the NODELIST(REASON) column for jobs B, C and D shows (Dependency), which means these jobs are pending because a job they depend on has not yet completed.
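
The manual steps above can also be scripted so that each job ID is captured rather than copied by hand. The following is a minimal sketch, assuming bash and the four submit files from this example in the current directory. It relies on the --parsable option of sbatch, which prints only the job ID (followed by ;cluster on multi-cluster setups). The function names submit and submit_diamond and the SBATCH_CMD variable are hypothetical and exist only for this illustration.

```shell
#!/bin/bash
# Sketch: submit the whole diamond workflow (A -> B,C -> D) in one go.
# SBATCH_CMD is a hypothetical override that lets the functions be
# exercised with a stand-in command when no cluster is available.

# Run sbatch with --parsable and print only the numeric job ID.
submit() {
    local out
    out=$("${SBATCH_CMD:-sbatch}" --parsable "$@") || return 1
    echo "${out%%;*}"    # drop a trailing ";cluster" if present
}

# Submit jobs A-D, chaining each to the IDs it depends on.
submit_diamond() {
    local a b c d
    a=$(submit JobA.submit) || return 1
    b=$(submit -d "afterok:$a" JobB.submit) || return 1
    c=$(submit -d "afterok:$a" JobC.submit) || return 1
    d=$(submit -d "afterok:$b:$c" JobD.submit) || return 1
    echo "A=$a B=$b C=$c D=$d"
}
```

After sourcing this file, running submit_diamond submits all four jobs and prints their IDs. If any sbatch call fails, the function stops without submitting the later jobs.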