Part 2. Create your cluster

16.

You are now ready to create a cluster, which you can do by using the following command:

(aws) $ pcluster create-cluster --cluster-configuration ~/cluster-config.yaml --cluster-name MyCluster01 

{
  "cluster": {
    "clusterName": "MyCluster01",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws:cloudformation:ap-southeast-1:018084650241:stack/MyCluster01/83b84c40-1516-11ec-a595-0a9f0bd03f38",
    "region": "ap-southeast-1",
    "version": "3.0.0",
    "clusterStatus": "CREATE_IN_PROGRESS"
  }
}

The command can be abbreviated by using "-c" for "--cluster-configuration" and "-n" for "--cluster-name" so the command becomes "pcluster create-cluster -c ~/cluster-config.yaml -n MyCluster01".

You can check on the status of the cluster creation by doing:

(aws) $ pcluster describe-cluster -n MyCluster01

which will return something similar to:

{
  "creationTime": "2021-09-14T04:44:58.676Z",
  "headNode": {
    "launchTime": "2021-09-14T04:48:02.000Z",
    "instanceId": "i-0b5957cf05fcabb63",
    "publicIpAddress": "18.141.164.218", <-- [take note of this]
    "instanceType": "t3a.nano",
    "state": "running",
    "privateIpAddress": "10.0.4.176"
  },
  "version": "3.0.0",
  "clusterConfiguration": {
    "url": "https://parallelcluster-1524b7ec17c70fc0-v1-do-not-delete.s3.ap-southeast-1.amazonaws.com/parallelcluster/3.0.0/clusters/mycluster01-m0vc1fkvm0xb9cl5/configs/cluster-config.yaml?versionId=kqVdsexc1mKyEX9V4PGf1fD8FSFvQvrI&AWSAccessKeyId=AKIAQINPN3UAVMJDB4G5&Signature=h%2BE3dnCFQUhfQTM0kgm4fejrsso%3D&Expires=1631598655"
  },
  "tags": [
    {
      "value": "3.0.0",
      "key": "parallelcluster:version"
    }
  ],
  "cloudFormationStackStatus": "CREATE_IN_PROGRESS",
  "clusterName": "MyCluster01",
  "computeFleetStatus": "UNKNOWN",
  "cloudformationStackArn": "arn:aws:cloudformation:ap-southeast-1:018084650241:stack/MyCluster01/83b84c40-1516-11ec-a595-0a9f0bd03f38",
  "lastUpdatedTime": "2021-09-14T04:44:58.676Z",
  "region": "ap-southeast-1",
  "clusterStatus": "CREATE_IN_PROGRESS" <-- [this is the cluster status]
}

Take note of the publicIpAddress (your value will be different) as you will need this later.

If you encounter an error, this might be because the instance type for the head node that you specified is not available. You will have to first delete the cluster (which should only take a couple of minutes so you do not need to set up a notification):

(aws) $ pcluster delete-cluster -n MyCluster01 

Check that there are no clusters running:

(aws) $ pcluster list-clusters

You can then try recreating the cluster after editing the config file to use these other instance types:

  • t3a.large
  • c5.large
  • m3.medium
  • c5ad.large
  • t3.large
  • m5a.large

Or other instance types listed here.

You can scroll to the right of the table, and click on the On-Demand Linux pricing to sort by price so you can select the cheaper instances so you do not use up your credits too quickly. You will have to use instance types with x86_64 architecture to avoid running into software errors when using the code we have prepared for you to use on the cluster.

17.

You can take a break and return after you receive the notification email from CloudWatch notifying you that the EC2 instance that is your head node is up and running. Before you login to the cluster, we will copy a file to your cluster first (replace the xx.xx.xx.xx below with the public IP address you obtained above, and replace the PyHipp directory with the one on your computer):

Note

Windows users: Before running the command below, you will need to convert the Windows DOS format of slurm.sh into unix format by running the following commands (replace the PyHipp directory with the one on your computer):

$ sudo apt-get update
$ sudo apt-get install dos2unix
$ dos2unix ~/Documents/EE3801/PyHipp/slurm.sh
(aws) $ scp -i ~/MyKeyPair.pem ~/Documents/EE3801/PyHipp/slurm.sh ec2-user@xx.xx.xx.xx:/data/submit.sh

In order to make sure you are specifying the correct path, you can hit the Tab key after you type a few letters of a directory or filename, e.g.

(aws) $ cd ~/Docu

[Hit Tab will get you]
$ cd ~/Documents/

after you type scp -i ~/My, you can hit Tab for the bash shell to auto-complete with the names of files or directories starting with My in your home directory. If you have multiple items starting with My, the auto-completion may not work, but you can hit Tab again to see a list of all the items with names starting with My. You can then continue typing and hit Tab when there is only 1 item that can be auto-completed. If you do not see any items when you hit Tab twice, you are probably specifying the wrong directory.

You should see something similar to the following warning:

The authenticity of host '54.251.188.19 (54.251.188.19)' can't be established.
ECDSA key fingerprint is SHA256:/inHly2x+BukdsWr3kVgVLL2CA/nMSnF8TIHvLl5Pdk.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Type “yes” and hit Return. You should then see something like:

Warning: Permanently added '54.251.188.19' (ECDSA) to the list of known hosts.
slurm.sh                                      100%  410    75.3KB/s   00:00  

This means the file was successfully transferred to your cluster.

If you see the following error instead:

ssh: connect to host 54.251.5.144 port 22: Connection refused

This is probably because the ssh server on the head node is still starting up, so you should just retry the scp command.

18.

Login to the cluster’s head node by using the key pair file you created in Lab 1:

(aws) $ pcluster ssh -i ~/MyKeyPair.pem -n MyCluster01

You should see the following:

Last login: Tue Aug  3 02:52:02 2021

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
[ec2-user@ip-10-0-0-201 ~]$ 

This means you are now logged into a Linux computer on AWS.

If you see the following error instead:

ssh: connect to host 54.251.5.144 port 22: Connection refused

This is probably because the ssh server on the head node is still starting up, so you should just retry the ssh command.

19.

You should first enter your AWS credentials using:

[ec2-user@ip-10-0-0-201 ~] $ aws configure

Note that commands preceded by

[ec2-user@ip-10-0-0-201 ~] $

means you should be typing the rest in the shell on AWS which is

aws configure

You should enter your Key ID, Access Key, and the following when prompted:

AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: ap-southeast-1
Default output format [None]: json

In order to copy and paste your Key ID and Access Key, open another Terminal window on your computer and get your AWS keys by doing the following:

(aws) $ cat ~/.aws/credentials

20.

Check that your AWS credentials are working properly by sending a notification (remember to replace the xxxxxxxxportion below with your AWS account number):

[ec2-user@ip-10-0-0-201 ~] $ aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxxxx:awsnotify --message "clustertest"

{
  "MessageId": "2fb0a38d-2aeb-5336-a517-fa374b3c7ce4"
}

21.

Check that you are able to access the data directory with the files and directories from the EBS snapshot:

[ec2-user@ip-10-0-0-201 ~] $ cd /data

If you get an error saying that the directory does not exist, it is possible that the volume has not been mounted yet on your head node, so you can retry after waiting a couple of minutes.

[ec2-user@ip-10-0-0-201 data] $ ls

lost+found  manual_entry.txt  miniconda3  picasso  RCP  src  submit.sh

You should also see the submit.sh file you transferred earlier. Check that you are able to write to the /data directory by modifying a file using the touch command:

[ec2-user@ip-10-0-0-201 data] $ touch manual_entry.txt

[ec2-user@ip-10-0-0-201 data] $ ls

lost+found  manual_entry.txt  miniconda3  picasso  RCP  submit.sh