R job example – BIOS at RSPH

Below is an example of submitting a batch R job to the cluster. If you have questions about creating a SLURM file, submitting a SLURM job, or running R jobs interactively, please check the links below:

This R example reads an Operational Taxonomic Unit (OTU) table (otu_table.csv, shipped in the example archive as otu_table.csv.gz) of microbial abundance counts and normalizes it by removing missing entries, replacing zero values with nominal values, and then scaling all values. The normalized table is then written to a new file, normalized_otu_matrix.csv.

Contents of the SLURM file:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4G
#SBATCH --time=00:10:00
#SBATCH --job-name=R_normalize
#SBATCH --error=R_normalize.%J.err
#SBATCH --output=R_normalize.%J.out

module load R/4.0.2
Rscript normalize.R

This script requests 1 node, 1 core, 4 GB of memory, and a 10-minute run time limit from the SLURM job scheduler. The job name is R_normalize. In --error=R_normalize.%J.err and --output=R_normalize.%J.out, %J is replaced by the job ID once the job starts to run, so job 14059 writes to R_normalize.14059.err and R_normalize.14059.out.
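The file naming can be previewed before submitting anything. The snippet below is only an illustration of the substitution, not how SLURM implements it: expand_pattern is a hypothetical helper, and the job ID 14059 is taken from the walkthrough on this page.

```shell
#!/bin/bash
# Hypothetical helper: imitate how %J in a file name pattern becomes the job ID.
# This is plain text substitution for illustration; SLURM does the real expansion.
expand_pattern() {
    # $1 = file name pattern, $2 = job ID
    printf '%s\n' "$1" | sed "s/%J/$2/"
}

job_id=14059   # example job ID from the walkthrough
err_file="$(expand_pattern 'R_normalize.%J.err' "$job_id")"
out_file="$(expand_pattern 'R_normalize.%J.out' "$job_id")"

echo "$err_file"
echo "$out_file"
```

Knowing the names in advance makes it easier to find the right log files when several jobs have run in the same directory.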

To download the example files, use the command below:

wget https://scholarblogs.emory.edu/rsph-hpc/files/2020/09/R_example.zip

To submit this job to the cluster, use the command sbatch SLURM_R.submit.
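When the submission is scripted, it can help to keep the job ID in a variable for later squeue calls. The following is a minimal sketch, assuming sbatch prints a confirmation of the form "Submitted batch job <id>" as it does in the walkthrough on this page; extract_job_id is a hypothetical helper, not part of SLURM.

```shell
#!/bin/bash
# Hypothetical helper: pull the job ID out of sbatch's confirmation message.
extract_job_id() {
    # Keep only the last space-separated word of the message.
    printf '%s\n' "${1##* }"
}

# Sample message copied from the walkthrough on this page.
job_id="$(extract_job_id 'Submitted batch job 14059')"
echo "$job_id"

# On the cluster, the same helper could wrap the real submission:
#   job_id="$(extract_job_id "$(sbatch SLURM_R.submit)")"
#   squeue -j "$job_id"
```

The last two lines are left as comments so the snippet also runs on a machine without SLURM installed.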

Below is a full walkthrough:

#Create an R_example directory
[jzhan61@clogin01 ~]$ mkdir R_example
[jzhan61@clogin01 ~]$ cd R_example/

#Download example files
[jzhan61@clogin01 R_example]$ wget https://scholarblogs.emory.edu/rsph-hpc/files/2020/09/R_example.zip
--2020-09-24 15:04:04--  https://scholarblogs.emory.edu/rsph-hpc/files/2020/09/R_example.zip
Resolving scholarblogs.emory.edu (scholarblogs.emory.edu)... 34.196.187.114, 34.198.138.92
Connecting to scholarblogs.emory.edu (scholarblogs.emory.edu)|34.196.187.114|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 797805 (779K) [application/zip]
Saving to: ‘R_example.zip’

R_example.zip                                               100%[========================================================================================================================================>] 779.11K  --.-KB/s    in 0.1s

2020-09-24 15:04:05 (5.16 MB/s) - ‘R_example.zip’ saved [797805/797805]

#Unzip example files
[jzhan61@clogin01 R_example]$ unzip R_example.zip
Archive:  R_example.zip
  inflating: normalize.R
  inflating: otu_table.csv.gz
  inflating: README.md
  inflating: SLURM_R.submit

#Check the list of files
[jzhan61@clogin01 R_example]$ ls
normalize.R  otu_table.csv.gz  README.md  R_example.zip  SLURM_R.submit

#Print the contents of SLURM_R.submit to screen
[jzhan61@clogin01 R_example]$ cat SLURM_R.submit
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem=4G
#SBATCH --time=00:10:00
#SBATCH --job-name=R_normalize
#SBATCH --error=R_normalize.%J.err
#SBATCH --output=R_normalize.%J.out

module load R/4.0.2
Rscript normalize.R

#Submit this R job to the cluster using the sbatch command
[jzhan61@clogin01 R_example]$ sbatch SLURM_R.submit
Submitted batch job 14059

#Check the job status using the job ID
[jzhan61@clogin01 R_example]$ squeue -j 14059
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
             14059 short-cpu R_normal  jzhan61  R       0:14      1 node8

#Once the job has completed, squeue no longer lists it and only the header line is printed
[jzhan61@clogin01 R_example]$ squeue -j 14059
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

#Check the list of files in the folder again. The result file normalized_otu_matrix.csv has been generated successfully.
[jzhan61@clogin01 R_example]$ ls
normalized_otu_matrix.csv  normalize.R  otu_table.csv.gz  README.md  R_example.zip  R_normalize.14059.err  R_normalize.14059.out  SLURM_R.submit
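The final check can also be scripted rather than read off the ls output. The sketch below assumes the file names used in this example; check_job_outputs is a hypothetical helper, and the demonstration runs on stand-in files in a scratch directory, not on real job output. Note that a non-empty error log does not necessarily mean failure, since R can write ordinary messages to standard error.

```shell
#!/bin/bash
# Hypothetical helper: report whether the job's expected files are present.
check_job_outputs() {
    # $1 = directory the job ran in, $2 = job ID
    result="$1/normalized_otu_matrix.csv"
    err_log="$1/R_normalize.$2.err"

    # -s is true only when the file exists and is not empty.
    if [ ! -s "$result" ]; then
        echo "missing or empty: $result"
        return 1
    fi
    if [ -s "$err_log" ]; then
        echo "result present; review $err_log"
    else
        echo "result present; error log empty"
    fi
}

# Demonstration with stand-in files (placeholders, not real job output).
demo_dir="$(mktemp -d)"
printf 'placeholder\n' > "$demo_dir/normalized_otu_matrix.csv"
: > "$demo_dir/R_normalize.14059.err"
status_msg="$(check_job_outputs "$demo_dir" 14059)"
echo "$status_msg"
rm -rf "$demo_dir"
```

In the example directory itself, the equivalent call would be check_job_outputs . 14059, using the job ID reported by sbatch.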