Part 3. Test your cluster
22.
Go back to the Terminal window you used to log in to your cluster’s head node. Check that submitted jobs can access the data by first editing the simple shell script we copied before logging in:
[ec2-user@ip-10-0-0-201 data] $ nano submit.sh
23.
Enter the text below at the end of the file (be sure to replace the topic ARN with your own):
cp /data/manual_entry.txt /data/manual_entry2.txt
aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxx:awsnotify --message "JobDone"
Save the file and exit.
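The two lines added above run one after the other regardless of outcome, so the "JobDone" message is published even if the copy fails. A minimal sketch of a stricter variant, assuming you would prefer a notification only on success, is shown below. The function name copy_then_notify is hypothetical and is not part of the lab's submit.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original submit.sh):
# copy a file, then run the notification command only if the copy succeeded.
copy_then_notify() {
    src="$1"
    dst="$2"
    shift 2
    # "$@" now holds the notification command and its arguments.
    cp "$src" "$dst" && "$@"
}

# Example call inside submit.sh, using the same placeholder ARN as above:
# copy_then_notify /data/manual_entry.txt /data/manual_entry2.txt \
#     aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxx:awsnotify --message "JobDone"
```

With this variant, a missing email is itself a hint that the copy step failed, and the .err file should say why.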
24.
Submit the job and check the status:
[ec2-user@ip-10-0-0-201 data] $ sbatch submit.sh
[ec2-user@ip-10-0-0-201 data] $ squeue
Something like this means the job is still waiting to be run:
PD: Pending
CF: Configuring
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 compute example- ec2-user CF 0:05 1 compute-dy-t2micro-1
You should then receive an SNS notification from your EC2 CloudWatch rule once the compute node has been set up, but it might still take some time before the job starts running.
Something like this means the job is running:
R: Running
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2 compute example- ec2-user R 0:05 1 compute-dy-t2micro-1
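Rather than rerunning squeue by hand, the status check can be scripted. The sketch below is one possible approach and makes some assumptions: the function names describe_state and wait_for_job are hypothetical, only the three state codes shown above are translated, and the job is treated as finished once squeue no longer lists it.

```shell
#!/bin/sh
# Translate the compact state codes that appear in the ST column.
describe_state() {
    case "$1" in
        PD) echo "Pending" ;;
        CF) echo "Configuring" ;;
        R)  echo "Running" ;;
        *)  echo "Other ($1)" ;;
    esac
}

# Poll squeue until the given job ID no longer appears in the queue.
# Usage: wait_for_job JOBID [SECONDS_BETWEEN_CHECKS]
wait_for_job() {
    jobid="$1"
    interval="${2:-10}"
    # -h drops the header, -j selects the job, -o %t prints only the compact state.
    while state=$(squeue -h -j "$jobid" -o %t 2>/dev/null) && [ -n "$state" ]; do
        echo "Job $jobid: $(describe_state "$state")"
        sleep "$interval"
    done
    echo "Job $jobid has left the queue"
}

# Example: wait_for_job 2
```

The loop stops either when squeue prints nothing for the job or when squeue itself reports an error, which is what typically happens once a finished job has been cleared from the queue.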
25.
Once the job disappears from the queue (or if you receive the email saying that the job is done), you can check the output. If you list the files in the /data directory, you should see:
lost+found miniconda3 slurm.compute-dy-t2micro-1.2.err
manual_entry2.txt picasso slurm.compute-dy-t2micro-1.2.out
manual_entry.txt RCP submit.sh
To check the job's output in the file slurm.compute-dy-t2micro-1.2.out and any errors in the file slurm.compute-dy-t2micro-1.2.err, run:
[ec2-user@ip-10-0-0-201 data] $ cat slurm*
which should just show you the output of the aws sns publish command if everything went well:
{
"MessageId": "89229119-cbbf-5450-be6b-c8a7cb8300a9"
}
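The listing and the MessageId show that the job ran, but not that the copy is intact. As a hedged extra check, the sketch below compares the two files byte for byte with cmp. The function name verify_copy is hypothetical; the paths in the example call are the ones used in submit.sh.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both files exist and have identical contents.
verify_copy() {
    src="$1"
    dst="$2"
    # cmp -s is silent and returns non-zero if the files differ or one is missing.
    if cmp -s "$src" "$dst"; then
        echo "OK: $dst matches $src"
    else
        echo "MISMATCH: $dst differs from $src or is missing"
        return 1
    fi
}

# Example: verify_copy /data/manual_entry.txt /data/manual_entry2.txt
```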
Task
Take a screenshot of your Terminal window showing the directory listing of all the files above and include it in your lab report.
Task
Take a screenshot of the notification email from AWS saying the job is done and include it in your lab report.
26.
Before we exit the cluster, we will copy the ~/.aws directory, which contains the AWS keys, to /data so that it is backed up for future use:
[ec2-user@ip-10-0-0-201 data] $ cp -r ~/.aws /data/aws
27.
Exit the cluster (you can also press Ctrl-D):
[ec2-user@ip-10-0-0-201 data] $ exit