Part 3. Test your cluster

22.

Go back to the Terminal window you used to log in to your cluster’s head node. Check that submitted jobs can access the data: start by editing the simple shell script we copied over before logging in:

[ec2-user@ip-10-0-0-201 data] $ nano submit.sh

23.

Enter the text below at the end of the file (be sure to replace xxxxxx in the TopicArn with your own account number):

cp /data/manual_entry.txt /data/manual_entry2.txt 

aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxx:awsnotify --message "JobDone"

Exit and save the file.
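After editing, the finished submit.sh might look something like the sketch below. The #SBATCH directives and any lines above the ones you just added are assumptions based on the output filenames shown later in this step (slurm.<node>.<jobid>.out) — keep whatever was already in your copy of the script, and substitute your own account number in the TopicArn:

```shell
#!/bin/bash
#SBATCH --job-name=example-job
#SBATCH --output=slurm.%N.%j.out
#SBATCH --error=slurm.%N.%j.err

# Copy the test file, then publish an SNS notification when the job is done.
cp /data/manual_entry.txt /data/manual_entry2.txt
aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxx:awsnotify --message "JobDone"
```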

24.

Submit the job and check the status:

[ec2-user@ip-10-0-0-201 data] $ sbatch submit.sh

[ec2-user@ip-10-0-0-201 data] $ squeue

Output like this means the job is still waiting to run:

  • PD: Pending
  • CF: Configuring
JOBID  PARTITION      NAME       USER  ST       TIME  NODES  NODELIST(REASON)
    2    compute  example-   ec2-user  CF       0:05      1  compute-dy-t2micro-1

You should then receive an SNS notification from your EC2 CloudWatch rule once the compute node is set up, but it may still take some time before the job starts running.

Output like this means the job is running:

  • R: Running
JOBID  PARTITION      NAME      USER  ST       TIME  NODES  NODELIST(REASON)
    2    compute  example-  ec2-user   R       0:05      1  compute-dy-t2micro-1
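Rather than re-running squeue by hand, you can poll the queue in a small loop until the job disappears. This is a sketch; the job ID 2 is taken from the example output above, so substitute your own:

```shell
# Poll the queue every 30 seconds until the job is no longer listed.
job_id=2
while squeue -j "$job_id" 2>/dev/null | grep -q "^ *$job_id "; do
    sleep 30
done
echo "Job $job_id has left the queue"
```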

25.

Once the job disappears from the queue (or you receive the email saying the job is done), you can check the output. Listing the files in the /data directory should show:

lost+found         miniconda3  slurm.compute-dy-t2micro-1.2.err
manual_entry2.txt  picasso     slurm.compute-dy-t2micro-1.2.out
manual_entry.txt   RCP         submit.sh

To check the job’s output (in slurm.compute-dy-t2micro-1.2.out) and errors (in slurm.compute-dy-t2micro-1.2.err), run:

[ec2-user@ip-10-0-0-201 data] $ cat slurm*

If everything went well, this should show only the output of the aws sns publish command:

{
    "MessageId": "89229119-cbbf-5450-be6b-c8a7cb8300a9"
}
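If you want to extract the MessageId from that response programmatically (for example, to log it), a sed one-liner is enough. In this sketch the response is embedded as a literal for illustration; on the cluster you would read it from the .out file instead:

```shell
# Example SNS publish response (literal copy of the output above).
response='{
    "MessageId": "89229119-cbbf-5450-be6b-c8a7cb8300a9"
}'
# Pull out the value of the MessageId field.
msg_id=$(printf '%s' "$response" | sed -n 's/.*"MessageId": "\([^"]*\)".*/\1/p')
echo "$msg_id"
```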

Task

Take a screenshot of your Terminal window showing the directory listing of all the files above and include it in your lab report.

Task

Take a screenshot of the notification email from AWS saying the job is done and include it in your lab report.

26.

Before we exit the cluster, we will copy the ~/.aws directory, which contains the AWS keys, to /data so that it is backed up for future use:

[ec2-user@ip-10-0-0-201 data] $ cp -r ~/.aws /data/aws
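To confirm the backup succeeded, you can compare the two directories recursively; diff prints nothing when they match:

```shell
# Compare the original and the backup; report the result either way.
if diff -r ~/.aws /data/aws >/dev/null 2>&1; then
    echo "Backup matches the original"
else
    echo "Backup differs or is missing; re-run the cp command" >&2
fi
```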

27.

Exit the cluster (you can also press Ctrl-D):

[ec2-user@ip-10-0-0-201 data] $ exit