Instructions

These are some basic instructions on how to read and use this document.

There are 3 main indicators that you need to take note of in the document.

They are:

Task

This contains all the information about what you are supposed to do and include in your report, so look out for these, as a missing task means lost points.

Note

This contains additional notes that you should look out for, such as extra information or further explanation. These usually pertain to quirks or possible errors you might face, so please read through them as well.

Warning

This contains special information that needs your attention. Ignoring it could break something, cause something to fail to run, or worse, COST YOU A LOT OF MONEY.

Part A

This part will consist of:

  1. Install and configure ParallelCluster (Estimated time: 30 mins)
  2. Create your cluster (Estimated time: 15 mins)
  3. Test your cluster (Estimated time: 15 mins)
  4. Shut down your cluster (Estimated time: 15 mins)
  5. Track your AWS usage (Estimated time: 5 mins)

Task

Submit the tasks below as your lab report for Part A before the end of the lab session at 2 pm on August 25th (Monday), in PDF format only, to the Lab 3A folder, and name the file Lab3A_YourName.pdf

Part 1. Install and configure ParallelCluster

Note

For WINDOWS USERS:

Please proceed with the rest of this lab in WSL, unless stated otherwise. Install the Linux version of miniconda in WSL, create a conda environment called aws using the command in Step 6 of Lab 2, and activate the environment before starting on this lab. If a new Terminal window does not show the "(base)" prompt after you have installed miniconda, you might have to do the following:

cd ~/

ln -s ~/.bashrc ~/.bash_profile

before creating a new Terminal window again.

1.

Use pip to install the AWS ParallelCluster software:

(aws) $ pip install aws-parallelcluster --upgrade --user

2.

You will also need to install nvm, a version manager for Node.js, because Node.js is used by the AWS ParallelCluster CLI.

In the nvm documentation, scroll to the section on Installing and Updating to install nvm. You might need to reload your .bashrc file (or the equivalent file if you are not using the bash shell) after installation by doing:

(aws) $ source ~/.bashrc

Once you are done, you can do the following to check that nvm is installed properly:

(aws) $ nvm --version

0.40.3

You can now use nvm to install Node.js:

(aws) $ nvm install node

You can do the following to make sure it is installed properly:

(aws) $ node --version

v24.6.0

3.

Edit your shell’s config file (e.g. ~/.bash_profile, ~/.zshrc, ~/.bashrc) using the nano editor:

(aws) $ nano ~/.bash_profile

to add pcluster’s directory ~/.local/bin to your shell’s $PATH variable by adding the following line at the end of the file:

export PATH="/Users/shihcheng/.local/bin:$PATH"

Note

Replace /Users/shihcheng/ with the path to your own home directory. To find that path, do:

$ cd ~
$ pwd

/your/home/directory/path

Save and exit the nano editor by typing Ctrl-x, followed by y to save the file, and Return to use the same filename.
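As an alternative to hard-coding your home directory, the line you add to the config file can use $HOME and skip the change if the directory is already on the path. This is only a sketch: path_prepend is a hypothetical helper name, not something provided by pcluster or conda.

```shell
# Prepend a directory to PATH only if it is not already listed.
# path_prepend is a hypothetical helper name (an assumption for illustration).
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;     # otherwise put it at the front
    esac
}

# $HOME expands to your home directory, so no manual path editing is needed.
path_prepend "$HOME/.local/bin"
export PATH
```

Because the helper checks before adding, reloading the config file several times does not keep lengthening $PATH.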

4.

Reload your shell’s config file (replace with .zshrc or .bashrc as necessary):

(aws) $ source ~/.bash_profile

or:

(aws) $ . ~/.bash_profile

This will reset your conda environment, so you should reactivate your aws environment:

(base) $ conda activate aws

5.

Check that you can run pcluster, and get the version number returned:

(aws) $ pcluster version

{
  "version": "3.11.1"
}
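If pcluster is not found, it can help to check which of the tools installed so far are actually visible on your $PATH. The sketch below uses only the standard command -v lookup; check_tools is a hypothetical helper name introduced here for illustration.

```shell
# Report which of the named commands can be found on PATH.
# check_tools is a hypothetical helper name (an assumption for illustration).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Return non-zero if at least one tool was not found.
    return "$missing"
}

# Example: check_tools pcluster node aws
```

A tool reported as missing usually means the shell config file was not reloaded or the aws environment was not reactivated.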

6.

You will now create a configuration file for pcluster in your home directory:

(aws) $ pcluster configure --config ~/cluster-config.yaml

Select these answers to the prompts:

AWS Region ID: ap-southeast-1
EC2 Key Pair Name: MyKeyPair
Scheduler: slurm
Operating System: alinux2
Head node instance type: t2.micro
Number of queues: 1
Name of queue 1: queue1
Number of compute resources for queue1: 1
Compute instance type for compute resource 1 in queue1: t2.micro
Maximum instance count: 10
Automate VPC creation? (y/n): y
Availability Zone: ap-southeast-1a
Network Configuration: Head node and compute fleet in the same public subnet

Wait until the program completes the setup.

7.

Edit the configuration file created by the command above:

(aws) $ nano ~/cluster-config.yaml

8.

In the HeadNode section, find the entry for InstanceType:

HeadNode:
  InstanceType: t2.micro

9.

In order to avoid running out of one particular type of instance, insert the instance type from the table below according to the last number before the letter in your student number (e.g. use the number 4 if your student number is A0171234X):

Number  Instance Type  Number  Instance Type  Number  Instance Type
0       t3a.nano       4       t3.micro       8       t3.medium
1       t3.nano        5       t3a.small      9       c5a.large
2       t2.nano        6       t3.small
3       t3a.micro      7       t3a.medium

10.

Do the same for Name and InstanceType in the Scheduling section:

Scheduling:
  Scheduler: slurm
  SlurmQueues:
  - Name: queue1
    ComputeResources:
    - Name: t3a-nano
      InstanceType: t3a.nano

Warning

The Name field can only take letters, digits, and hyphens, so replace the period in the instance type with a hyphen when entering it into the Name field, e.g. t3a-nano instead of t3a.nano.
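The table lookup and the period-to-hyphen rule can be sketched as two small shell functions. Both function names are hypothetical and exist only for illustration; the mapping itself is copied from the table above.

```shell
# Look up the instance type for a student number, following the table above.
# instance_for_student and resource_name are hypothetical helper names.
instance_for_student() {
    # Drop the trailing letter, then keep the last remaining character.
    digit=$(printf '%s' "$1" | sed 's/[^0-9]*$//; s/.*\(.\)$/\1/')
    case "$digit" in
        0) echo t3a.nano ;;
        1) echo t3.nano ;;
        2) echo t2.nano ;;
        3) echo t3a.micro ;;
        4) echo t3.micro ;;
        5) echo t3a.small ;;
        6) echo t3.small ;;
        7) echo t3a.medium ;;
        8) echo t3.medium ;;
        9) echo c5a.large ;;
        *) echo "no digit found in: $1" >&2; return 1 ;;
    esac
}

# Turn an instance type into a valid Name by replacing periods with hyphens.
resource_name() {
    printf '%s\n' "$1" | tr '.' '-'
}

# Example: resource_name "$(instance_for_student A0171234X)"
```

The first function gives the value for InstanceType, and passing that value through the second gives the matching value for Name.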

11.

Add the following after the Scheduling section (use copy and paste to avoid typos):

SharedStorage:
  - MountDir: data
    Name: ee3801
    StorageType: Ebs
    EbsSettings:
      Size: 1000
      Encrypted: false
      SnapshotId: 

Leave the SnapshotId field empty for now. You will enter this information later, after you have made your own copy of the snapshot we have created for you. This will give you access to the data you will be processing for the remaining labs, which is stored as an Elastic Block Store (EBS) snapshot on AWS. The MountDir setting means that the files and directories in the snapshot can be found in the /data directory on your cluster.

12.

Some AWS Organization accounts block creating Route 53 hosted zones. ParallelCluster tries to create a private hosted zone by default, which will cause Step 16 to fail with AWS::Route53::HostedZone CREATE_FAILED (AccessDenied).

To prevent this, disable managed DNS in your config now:

  1. Edit ~/cluster-config.yaml.
  2. Under Scheduling, insert the SlurmSettings block exactly as shown below (indentation matters):

Scheduling:
  Scheduler: slurm
  SlurmSettings:
    Dns:
      DisableManagedDns: true
      UseEc2Hostnames: true

Note

Heads-up: after disabling managed DNS, when you start an interactive compute shell with srun --pty /bin/bash in step 47, your prompt will show the default EC2 hostname (e.g. ip-10-0-10-43) instead of a Slurm node alias (e.g. queue1-dy-m54xlarge-1).

Check your config file by doing:

(aws) $ cat ~/cluster-config.yaml

which should look like the following (except the instance type should follow the instructions from above):

Region: ap-southeast-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: t3a.nano
  Networking:
    SubnetId: subnet-xxxxxxxxxxxxxxxxx
  Ssh:
    KeyName: MyKeyPair
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    Dns:
      DisableManagedDns: true
      UseEc2Hostnames: true
  SlurmQueues:
  - Name: queue1
    ComputeResources:
    - Name: t3a-nano
      InstanceType: t3a.nano
      MinCount: 0
      MaxCount: 10
    Networking:
      SubnetIds:
      - subnet-xxxxxxxxxxxxxxxxx
SharedStorage:
  - MountDir: data
    Name: ee3801
    StorageType: Ebs
    EbsSettings:
      Size: 1000
      Encrypted: false
      SnapshotId:

Check carefully for typos as they will cause problems in the subsequent steps.
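A few of the most common typos can be caught with a rough text check before you move on. The sketch below is an assumption-laden helper (check_config is a hypothetical name): it only greps for a handful of strings from the listing above and is not a YAML parser, so passing it does not guarantee the file is valid.

```shell
# Rough sanity check for cluster-config.yaml using grep (not a YAML parser).
# check_config is a hypothetical helper name (an assumption for illustration).
check_config() {
    cfg="$1"
    status=0
    # Settings that the steps above asked you to add.
    for key in "DisableManagedDns: true" "UseEc2Hostnames: true" \
               "MountDir: data" "StorageType: Ebs"; do
        if ! grep -q "$key" "$cfg"; then
            echo "missing: $key"
            status=1
        fi
    done
    # List-item Name fields (queue and compute resource names) must not
    # contain a period.
    if grep -E '^[[:space:]]*-[[:space:]]*Name:.*\.' "$cfg" >/dev/null; then
        echo "a Name field contains a period"
        status=1
    fi
    return "$status"
}

# Example: check_config ~/cluster-config.yaml && echo "basic checks passed"
```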

13.

Before we create a cluster, which might take 10 minutes or more depending on how busy AWS is, we will use the Simple Notification Service (SNS) on AWS to notify you when the cluster is ready. Start by creating a notification topic:

(aws) $ aws sns create-topic --name awsnotify

which should return something like the following (with a different AWS ID number):

{
    "TopicArn": "arn:aws:sns:ap-southeast-1:123456789012:awsnotify"
}

Copy the TopicArn number (123456789012 in the example above) created for you so you can use it in the subsequent commands.
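To see which part of the TopicArn is the number you need, it may help to split the ARN on its colons. The sketch below assumes the ARN has the same six colon-separated fields as the example output above; arn_field is a hypothetical helper name.

```shell
# Split an SNS topic ARN of the form
#   arn:aws:sns:<region>:<account-number>:<topic-name>
# into its colon-separated fields.
# arn_field is a hypothetical helper name (an assumption for illustration).
arn_field() {
    printf '%s\n' "$1" | cut -d: -f"$2"
}

# The example ARN from above; substitute the one returned for your account.
arn="arn:aws:sns:ap-southeast-1:123456789012:awsnotify"
echo "region:  $(arn_field "$arn" 4)"
echo "account: $(arn_field "$arn" 5)"
echo "topic:   $(arn_field "$arn" 6)"
```

The fifth field is the number that differs between accounts, so that is the part to replace in the commands that follow.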

Subscribe to the topic by using the following command, which should be entered on one line with no line breaks. Paste the TopicArn number from above into the command, and change the email address abcde@nus.edu.sg to your own email address:

(aws) $ aws sns subscribe --topic-arn arn:aws:sns:ap-southeast-1:123456789012:awsnotify --protocol email --notification-endpoint abcde@nus.edu.sg

{
    "SubscriptionArn": "pending confirmation"
}

You should then receive a subscription confirmation email with a link that you can click to confirm your subscription.

You can then send a test message by doing (replace the TopicArn number):

(aws) $ aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:123456789012:awsnotify --message "test"

{
    "MessageId": "2fb0a38d-2aeb-5336-a517-fa374b3c7ce4"
}

Check your email to see that you received the test message successfully to verify that the notification system has been set up properly.

We can create a shell function to make it easier to send notifications by adding the following to the end of your shell config file (e.g. ~/.bash_profile, ~/.bashrc, ~/.zshrc, etc.):

awsnotify() {
	aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:123456789012:awsnotify --message "$1"
}

Reload the configuration file by doing:

(aws) $ source ~/.bash_profile

(base) $ conda activate aws

and test it by doing:

(aws) $ awsnotify test2 

If you receive an email with the message test2 after a few minutes, it means you have successfully created a shell function to make sending AWS notifications easier.

14.

We will now use the EventBridge service in AWS to notify you once your cluster is set up, so log in to your AWS console.

Type Amazon EventBridge into the search bar at the top, and click on Amazon EventBridge in the drop-down list.

Click on Rules in the panel on the left.

Click on the Create rule button.

Enter something like EC2Running in the Name field and click Next.

Scroll down to the Event pattern section. Under the AWS service section, select EC2 from the drop-down list, EC2 Instance State-change Notification from the Event Type drop-down list, select the Specific state(s) option, and select running from the drop-down list, and click Next.

Under the Target section, select SNS topic from the first drop-down list and awsnotify from the Topic drop-down list, then click Next.

Click Next for the Configure tags section, then click Create rule.

You will now receive an email notification whenever an EC2 instance switches state into the running state, which is when the instance is ready for use.

While you are here, set up another rule to notify you when EC2 snapshots are completed, which you will use later to back up your work. Click on the Create rule button again from the Rules landing page. This time, type SnapshotComplete for the rule name, and select EC2 for Service Name, EBS Snapshot Notification for Event Type, createSnapshot for Specific event(s), and succeeded for Specific result(s). Check that the same options as above are selected under the Target section, then click on the Create rule button.

You will now also receive an email when a snapshot is completed.

15.

The final thing you will have to do before you create your cluster is to make a copy of the snapshot containing the data that you will be working with for the remaining labs. You can find the snapshot by typing EC2 into the search bar and selecting EC2 in the drop-down list.

Click Snapshots in the left panel.

Click on the Owned By Me button to reveal the drop-down menu and select Public Snapshots. Copy and paste the following snapshot id: snap-0e87df4e208383a50, and then hit Return. You should see a snapshot with the description condaenvs listed and selected.

Click on the Actions button, and select the Copy command. In the window that appears, replace the description with data, make sure the Encrypt this snapshot option is NOT selected, and then click the Copy button. You should see a message that the snapshot is being copied.

Click on the Public Snapshots button to select the Owned By Me option, and then click on the x icon in the search field to remove the snapshot id. You should now see a snapshot with a size of 1000 GiB with the description data. If the snapshot is not selected, select it, and you should see more information for the snapshot shown in the panel at the bottom of the window. Move your cursor over the text Snapshot ID, which will cause a Copy to clipboard icon to appear. Click on it to copy the Snapshot ID.

Use nano to edit the cluster-config.yaml file and paste the Snapshot ID into the SnapshotId field (be sure to add a space after SnapshotId:):

SharedStorage:
  - MountDir: data
    Name: ee3801
    StorageType: Ebs
    EbsSettings:
      Size: 1000
      Encrypted: false
      SnapshotId: snap-xxxxxxxxxxxxxxxxx

Task

Take a screenshot of your nano window showing the SnapshotId above and include it in your lab report.

Save the file and exit nano.
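For later labs, when the snapshot ID changes, the same edit could be scripted instead of typed. The sketch below is one possible approach, not part of the lab's own tooling: set_snapshot_id is a hypothetical helper name, and it assumes the file has a single SnapshotId line as in the listing above. You would still need to open the file in nano for the screenshot task.

```shell
# Write a snapshot ID into the SnapshotId field of a config file.
# set_snapshot_id is a hypothetical helper name (an assumption for
# illustration). It writes to a temporary file and then renames it, which
# avoids the differences between GNU and BSD "sed -i".
set_snapshot_id() {
    cfg="$1"
    snap="$2"
    # Keep the indentation and the key, replace whatever follows the colon.
    sed "s/^\([[:space:]]*SnapshotId:\).*/\1 $snap/" "$cfg" > "$cfg.tmp" &&
        mv "$cfg.tmp" "$cfg"
}

# Example: set_snapshot_id ~/cluster-config.yaml snap-xxxxxxxxxxxxxxxxx
```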

Part 2. Create your cluster

16.

You are now ready to create a cluster, which you can do by using the following command:

(aws) $ pcluster create-cluster --cluster-configuration ~/cluster-config.yaml --cluster-name MyCluster01 

{
  "cluster": {
    "clusterName": "MyCluster01",
    "cloudformationStackStatus": "CREATE_IN_PROGRESS",
    "cloudformationStackArn": "arn:aws:cloudformation:ap-southeast-1:018084650241:stack/MyCluster01/83b84c40-1516-11ec-a595-0a9f0bd03f38",
    "region": "ap-southeast-1",
    "version": "3.0.0",
    "clusterStatus": "CREATE_IN_PROGRESS"
  }
}

The command can be abbreviated by using "-c" for "--cluster-configuration" and "-n" for "--cluster-name" so the command becomes "pcluster create-cluster -c ~/cluster-config.yaml -n MyCluster01".

You can check on the status of the cluster creation by doing:

(aws) $ pcluster describe-cluster -n MyCluster01

which will return something similar to:

{
  "creationTime": "2021-09-14T04:44:58.676Z",
  "headNode": {
    "launchTime": "2021-09-14T04:48:02.000Z",
    "instanceId": "i-0b5957cf05fcabb63",
    "publicIpAddress": "18.141.164.218", <-- [take note of this]
    "instanceType": "t3a.nano",
    "state": "running",
    "privateIpAddress": "10.0.4.176"
  },
  "version": "3.0.0",
  "clusterConfiguration": {
    "url": "https://parallelcluster-1524b7ec17c70fc0-v1-do-not-delete.s3.ap-southeast-1.amazonaws.com/parallelcluster/3.0.0/clusters/mycluster01-m0vc1fkvm0xb9cl5/configs/cluster-config.yaml?versionId=kqVdsexc1mKyEX9V4PGf1fD8FSFvQvrI&AWSAccessKeyId=AKIAQINPN3UAVMJDB4G5&Signature=h%2BE3dnCFQUhfQTM0kgm4fejrsso%3D&Expires=1631598655"
  },
  "tags": [
    {
      "value": "3.0.0",
      "key": "parallelcluster:version"
    }
  ],
  "cloudFormationStackStatus": "CREATE_IN_PROGRESS",
  "clusterName": "MyCluster01",
  "computeFleetStatus": "UNKNOWN",
  "cloudformationStackArn": "arn:aws:cloudformation:ap-southeast-1:018084650241:stack/MyCluster01/83b84c40-1516-11ec-a595-0a9f0bd03f38",
  "lastUpdatedTime": "2021-09-14T04:44:58.676Z",
  "region": "ap-southeast-1",
  "clusterStatus": "CREATE_IN_PROGRESS" <-- [this is the cluster status]
}

Take note of the publicIpAddress (your value will be different) as you will need this later.
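Rather than reading through the whole output each time, a single field can be picked out with grep and sed. This is a minimal sketch that treats the output as plain text, not a JSON parser; json_field is a hypothetical helper name, and it assumes each key sits on its own line as in the example output above.

```shell
# Print the value of one quoted field from describe-cluster output on stdin.
# json_field is a hypothetical helper name (an assumption for illustration).
# It is a plain text match and returns only the first matching line.
json_field() {
    grep "\"$1\":" | head -n 1 | sed 's/.*: *"\([^"]*\)".*/\1/'
}

# Examples:
#   pcluster describe-cluster -n MyCluster01 | json_field clusterStatus
#   pcluster describe-cluster -n MyCluster01 | json_field publicIpAddress
```

Running the first example every minute or so shows when the status changes from CREATE_IN_PROGRESS.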

If you encounter an error, this might be because the instance type for the head node that you specified is not available. You will have to first delete the cluster (which should only take a couple of minutes so you do not need to set up a notification):

(aws) $ pcluster delete-cluster -n MyCluster01 

Check that there are no clusters running:

(aws) $ pcluster list-clusters

You can then try recreating the cluster after editing the config file to use these other instance types:

  • t3a.large
  • c5.large
  • m3.medium
  • c5ad.large
  • t3.large
  • m5a.large

Or other instance types listed here.

You can scroll to the right of the table, and click on the On-Demand Linux pricing to sort by price so you can select the cheaper instances so you do not use up your credits too quickly. You will have to use instance types with x86_64 architecture to avoid running into software errors when using the code we have prepared for you to use on the cluster.

17.

You can take a break and return after you receive the notification email, sent through SNS by the EventBridge rule you created in Step 14, telling you that the EC2 instance that is your head node is up and running. Before you log in to the cluster, we will first copy a file to your cluster (replace the xx.xx.xx.xx below with the public IP address you obtained above, and replace the PyHipp directory with the one on your computer):

Note

Windows users: Before running the command below, you will need to convert the Windows DOS format of slurm.sh into unix format by running the following commands (replace the PyHipp directory with the one on your computer):

$ sudo apt-get update
$ sudo apt-get install dos2unix
$ dos2unix ~/Documents/EE3801/PyHipp/slurm.sh

(aws) $ scp -i ~/MyKeyPair.pem ~/Documents/EE3801/PyHipp/slurm.sh ec2-user@xx.xx.xx.xx:/data/submit.sh

In order to make sure you are specifying the correct path, you can hit the Tab key after you type a few letters of a directory or filename, e.g.

(aws) $ cd ~/Docu

[Hitting Tab will give you]
(aws) $ cd ~/Documents/

After you type scp -i ~/My, you can hit Tab for the bash shell to auto-complete with the names of files or directories starting with My in your home directory. If you have multiple items starting with My, the auto-completion may not work, but you can hit Tab again to see a list of all the items with names starting with My. You can then continue typing and hit Tab when there is only 1 item that can be auto-completed. If you do not see any items when you hit Tab twice, you are probably specifying the wrong directory.

You should see something similar to the following warning:

The authenticity of host '54.251.188.19 (54.251.188.19)' can't be established.
ECDSA key fingerprint is SHA256:/inHly2x+BukdsWr3kVgVLL2CA/nMSnF8TIHvLl5Pdk.
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Type “yes” and hit Return. You should then see something like:

Warning: Permanently added '54.251.188.19' (ECDSA) to the list of known hosts.
slurm.sh                                      100%  410    75.3KB/s   00:00  

This means the file was successfully transferred to your cluster.

If you see the following error instead:

ssh: connect to host 54.251.5.144 port 22: Connection refused

This is probably because the ssh server on the head node is still starting up, so you should just retry the scp command.
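Retrying by hand works, but the retry can also be wrapped in a small loop. The sketch below is a generic helper, not something provided by pcluster or ssh: retry is a hypothetical name, and the attempt count and delay in the example are arbitrary choices.

```shell
# Retry a command a fixed number of times with a pause between attempts.
# retry is a hypothetical helper name (an assumption for illustration).
# Usage: retry <attempts> <delay-seconds> <command> [arguments...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    # Run the command; stop as soon as it succeeds.
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example (the scp command from above, up to 5 attempts, 15 seconds apart):
# retry 5 15 scp -i ~/MyKeyPair.pem ~/Documents/EE3801/PyHipp/slurm.sh ec2-user@xx.xx.xx.xx:/data/submit.sh
```

The same helper can be used with the ssh login in the next step if the head node is still starting up.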

18.

Log in to the cluster’s head node by using the key pair file you created in Lab 1:

(aws) $ pcluster ssh -i ~/MyKeyPair.pem -n MyCluster01

You should see the following:

Last login: Tue Aug  3 02:52:02 2021

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
[ec2-user@ip-10-0-0-201 ~]$ 

This means you are now logged into a Linux computer on AWS.

If you see the following error instead:

ssh: connect to host 54.251.5.144 port 22: Connection refused

This is probably because the ssh server on the head node is still starting up, so you should just retry the ssh command.

19.

You should first enter your AWS credentials using:

[ec2-user@ip-10-0-0-201 ~] $ aws configure

Note that a command preceded by the prompt

[ec2-user@ip-10-0-0-201 ~] $

should be typed in the shell on AWS. Type only the part after the prompt, which in this case is

aws configure

You should enter your Key ID, Access Key, and the following when prompted:

AWS Access Key ID [None]:
AWS Secret Access Key [None]:
Default region name [None]: ap-southeast-1
Default output format [None]: json

In order to copy and paste your Key ID and Access Key, open another Terminal window on your computer and get your AWS keys by doing the following:

(aws) $ cat ~/.aws/credentials

20.

Check that your AWS credentials are working properly by sending a notification (remember to replace the xxxxxxxx portion below with your AWS account number):

[ec2-user@ip-10-0-0-201 ~] $ aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxxxx:awsnotify --message "clustertest"

{
  "MessageId": "2fb0a38d-2aeb-5336-a517-fa374b3c7ce4"
}

21.

Check that you are able to access the data directory with the files and directories from the EBS snapshot:

[ec2-user@ip-10-0-0-201 ~] $ cd /data

If you get an error saying that the directory does not exist, it is possible that the volume has not been mounted yet on your head node, so you can retry after waiting a couple of minutes.

[ec2-user@ip-10-0-0-201 data] $ ls

lost+found  manual_entry.txt  miniconda3  picasso  RCP  src  submit.sh

You should also see the submit.sh file you transferred earlier. Check that you are able to write to the /data directory by modifying a file using the touch command:

[ec2-user@ip-10-0-0-201 data] $ touch manual_entry.txt

[ec2-user@ip-10-0-0-201 data] $ ls

lost+found  manual_entry.txt  miniconda3  picasso  RCP  submit.sh

Part 3. Test your cluster

22.

Go back to the Terminal window you used to log in to your cluster’s head node. Check that submitted jobs are able to access the data by first editing the simple shell script we copied before logging in:

[ec2-user@ip-10-0-0-201 data] $ nano submit.sh

23.

Enter the text below at the end of the file (be sure to replace the TopicArn number):

cp /data/manual_entry.txt /data/manual_entry2.txt 

aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxx:awsnotify --message "JobDone"

Exit and save the file.

24.

Submit the job and check the status:

[ec2-user@ip-10-0-0-201 data] $ sbatch submit.sh

[ec2-user@ip-10-0-0-201 data] $ squeue

Something like this means the job is still waiting to be run:

  • PD: Pending
  • CF: Configuring

JOBID  PARTITION      NAME       USER  ST       TIME  NODES  NODELIST(REASON)
    2    compute  example-   ec2-user  CF       0:05      1  compute-dy-t2micro-1

You should then receive an SNS notification from your EC2 EventBridge rule once the compute node gets set up, but it might still take some time before the job starts running.

Something like this means the job is running:

  • R: Running

JOBID  PARTITION      NAME      USER  ST       TIME  NODES  NODELIST(REASON)
    2    compute  example-  ec2-user   R       0:05      1  compute-dy-t2micro-1
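The codes in the ST column can be turned into words with a small lookup. This sketch covers only the three codes described above and reports anything else as "Other"; describe_state is a hypothetical helper name.

```shell
# Translate a code from the ST column of squeue into words.
# describe_state is a hypothetical helper name (an assumption for
# illustration); only the codes described above are covered.
describe_state() {
    case "$1" in
        PD) echo "Pending" ;;
        CF) echo "Configuring" ;;
        R)  echo "Running" ;;
        *)  echo "Other ($1)" ;;
    esac
}

# Example: print the state of each queued job without the header line.
# squeue -h -o '%t' | while read -r st; do describe_state "$st"; done
```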

25.

Once the job disappears from the queue (or if you receive the email saying that the job is done), you can check the output. If you list the files in the /data directory, you should see:

lost+found         miniconda3  slurm.compute-dy-t2micro-1.2.err
manual_entry2.txt  picasso     slurm.compute-dy-t2micro-1.2.out
manual_entry.txt   RCP         submit.sh

You can do the following to check the output contained in the file slurm.compute-dy-t2micro-1.2.out and errors in the file slurm.compute-dy-t2micro-1.2.err from the job:

[ec2-user@ip-10-0-0-201 data] $ cat slurm*

which should just show you the output of the aws sns publish command if everything went well:

{
    "MessageId": "89229119-cbbf-5450-be6b-c8a7cb8300a9"
}

Task

Take a screenshot of your Terminal window showing the directory listing of all the files above and include it in your lab report.

Task

Take a screenshot of the notification email from AWS saying the job is done and include it in your lab report.

26.

Before we exit the cluster, we will copy the ~/.aws directory which contains the AWS keys to /data so that it will be backed up for future use:

[ec2-user@ip-10-0-0-201 data] $ cp -r ~/.aws /data/aws

27.

Exit the cluster (you can also type Ctrl-d):

[ec2-user@ip-10-0-0-201 data] $ exit

Part 4. Shut down your cluster

28.

Since you have modified some of the files in the /data directory, you will want to save a new snapshot of the /data directory so you can continue from where you left off the next time you create a new cluster. You can do this by running the update_snapshot.sh script from a Terminal window on your computer (replace ~/Documents/EE3801/PyHipp with the path to your PyHipp repository):

Note

Windows users: Before running the command below, you will need to convert the Windows DOS format into unix format by doing:

$ dos2unix ~/Documents/EE3801/PyHipp/update_snapshot.sh

(aws) $ chmod a+x ~/Documents/EE3801/PyHipp/update_snapshot.sh

(aws) $ ~/Documents/EE3801/PyHipp/update_snapshot.sh data 2 MyCluster01

The chmod command makes the script executable, while the update_snapshot.sh command starts the process of creating the snapshot.

The first argument, data, is the name of the snapshot; the second argument, 2, specifies how many similarly named snapshots to keep; and the last argument, MyCluster01, specifies the name of the cluster you want to base the snapshot on.

Keep in mind that each snapshot you keep will use up some of your AWS credits. This command might take a while, so you can return once you receive the email notification.

29.

Check that the snapshot was created properly using the following command:

(aws) $ aws ec2 describe-snapshots --owner-ids self  --query 'Snapshots[]'

You should see something like the following:

[
    {
        "Description": "data",
        "Encrypted": false,
        "OwnerId": "xxxxxxxxxxxx",
        "Progress": "100%",
        "SnapshotId": "snap-xxxx",
        "StartTime": "2020-11-04T11:10:40.838Z",
        "State": "completed",
        "VolumeId": "vol-xxxx",
        "VolumeSize": 1000
    }
]

30.

You can add the update_snapshot.sh script to your shell’s path by adding the following line to your shell’s config file (substitute in the path to your PyHipp directory):

export PATH="/Users/your-user-name/Documents/EE3801/PyHipp:$PATH"

Note

If you are not sure what your full path is, cd to the location of your PyHipp directory and do:

$ pwd 

31.

Once you have received the email notifying you that the snapshot has been completed, you can delete the cluster:

(aws) $ pcluster delete-cluster -n MyCluster01

Check that you have no clusters running:

(aws) $ pcluster list-clusters

This should return an empty list. If it does not, you will have to retry your cluster deletion command.
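The check for leftover clusters can be reduced to a single number. The sketch below assumes that each cluster in the list-clusters output carries one "clusterName" key, as in the create-cluster output shown earlier; count_clusters is a hypothetical helper name.

```shell
# Count the clusters in list-clusters output read from stdin.
# count_clusters is a hypothetical helper name (an assumption for
# illustration). It counts lines containing the "clusterName" key.
count_clusters() {
    # grep -c prints 0 but exits non-zero when nothing matches, so the
    # "|| true" keeps the function's exit status at zero in that case.
    grep -c '"clusterName"' || true
}

# Example: pcluster list-clusters | count_clusters
```

A result of 0 means no clusters are left; anything else means the deletion should be retried or is still in progress.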

32.

In the future, you will just need to do the following to create a cluster, wait till you receive the notification that the head node is running, and then ssh to the head node:

(aws) $ pcluster create-cluster -c ~/cluster-config.yaml -n MyCluster01 

(aws) $ pcluster ssh -i ~/MyKeyPair.pem -n MyCluster01 

When you are done, you can do the following to create a snapshot, wait till you receive the notification that the snapshot has been completed, and then delete the cluster:

(aws) $ update_snapshot.sh data 2 MyCluster01

(aws) $ pcluster delete-cluster -n MyCluster01

Part 5. Track your AWS usage

33.

In order to track your AWS usage, go to the following link.

34.

First, click the “Services” drop-down menu in the top left, and then click “EC2”:

This will take you to your EC2 Dashboard:

Check “Instances (running)”, and make sure that you do not have any instances running. If you do, you might have forgotten to delete a cluster, so you should run the following command from the Terminal:

(aws) $ pcluster list-clusters

Followed by (replace MyCluster01 below with the name of any clusters you see listed above):

(aws) $ pcluster delete-cluster -n MyCluster01

Task

Once you have made sure that you do not have any running instances, take a screenshot of your EC2 dashboard, and include it in your lab report.

Warning

Please remember to do this for all subsequent labs. This is to ensure that you do not have any running instance that could potentially cost you a lot of money.

35.

While you are at your EC2 Dashboard, check your virtual CPU (vCPU) limits by typing “Service quotas” in the top search bar and clicking on the “Service Quotas” service that appears:

Click on “Amazon Elastic Compute Cloud (Amazon EC2)” in the dashboard:

Type “instances” in the search bar, and check your Applied quota value for Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances. If your limit is not 64, select that option, and click the Request quota increase button in orange in the top right:

Enter 64 in the Change quota value text box, and then click the Request button in the bottom right:

36.

Next, find the link to “My Billing Dashboard”:

Click the Bills option from the Billing drop-down menu:

In the drop-down menu for “Billing Period”, select the current month:

This will list the charges you have incurred.

Task

Take a screenshot of the screen above to show the charges and include it in your lab report. It is fine if there are no charges listed yet.

Warning

Please remember to do this for all subsequent labs. This is to ensure that your bill is within expectation.

37.

In order to avoid being surprised by unexpected charges, you can use AWS Budgets to warn you when your credits are running low. First, navigate to the Cost Explorer option under Cost Management.

It will open up a new tab and you should see the following:

You should be able to see your costs for the current month. In our experience, this view is usually updated more frequently than the billing view, so you should also check it after every lab.

Task

Take a screenshot of the screen above to show the charges and include it in your lab report. It is fine if there are no charges listed yet.

Warning

Please remember to do this for all subsequent labs. This is to ensure that your bill is within expectation.

38.

Remember to always check your budgets from this lab onward, as they will tell you if you have accidentally incurred an exorbitant charge that could cost you an arm and a leg. In the next section, we will set up a budget that sends us an email or notification before we exceed our spending limit, which will help us manage our costs better.

Part B

This part will consist of:

  1. Familiarize yourself with the command line (Estimated time: 15 mins) [OPTIONAL]
  2. Installing and testing software on AWS (Estimated time: 60 mins)
  3. Setting up the data pipeline (Estimated time: 10 mins)
  4. Executing the data pipeline (Estimated time: 20 mins)
  5. Checking the output (Estimated time: 15 mins)
  6. Wrapping up (Estimated time: 10 mins)

Task

Submit the tasks below as your lab report for Part B before 9 pm on August 27th (Wednesday), in PDF format only, to the Lab 3B folder, and name the file Lab3B_YourName.pdf

Part 6. Familiarize yourself with the command line [OPTIONAL]

39.

Note

This section is entirely OPTIONAL and WILL NOT BE GRADED. It is meant to give you more practice in using the command line to navigate around and read the contents of your folders and files. It might be helpful to you in later labs.

You will be using the bash command line throughout the remaining labs, so familiarize yourself with the bash command line on your computer by reading through Section 1 (i.e. 1.1, 1.2, and 1.3) of this cheatsheet before answering the questions.

  • Change directory to your home directory from any other directory
  • List the files and directories in the current directory
  • List all files and directories including hidden files and directories with . in the filename, e.g. .bash_profile
  • List all items (excluding hidden items) in the current directory with file sizes
  • Show the manual for the ls command
  • List all items (excluding hidden items) in the current directory with file sizes sorted by their modification time
  • Change directory to the Documents subdirectory
  • Return the full path for the current directory
  • Create a new subdirectory named ee3801
  • Change directory to the parent directory from the Documents directory
  • Change directory to the ee3801 subdirectory you created earlier in one command
  • Create a file named test.txt from the command line (i.e. without using an editor)
  • Change directory back to your previous directory in one command
  • Remove the ee3801 subdirectory you created and all its contents
  • Display the entire shell config file (e.g. .bash_profile/.bashrc/.zshrc) in your home directory
  • Display the shell config file part-by-part
  • Display the first 10 lines of the shell config file
  • Display the last 10 lines of the shell config file
  • Copy the shell config file to a file named .bashrc-orig
  • Rename the .bashrc-orig file to .obashrc
  • Remove the .obashrc file
  • Display a count of only the number of lines in the shell config file
  • Look for the word conda in the shell config file
  • Show how much disk space you have available in human readable form (i.e. kilobytes, megabytes, gigabytes, etc.)
  • Show the cumulative disk usage of all the items in your ~/Documents directory in human readable form (if your Documents directory is empty, use another directory that has multiple files like ~/miniconda3)
  • Show the current date and time

Part 7. Installing and testing software on AWS

40.

First, we will create a Terminal window, activate your aws conda environment, and edit the config file to modify the instance type for the head and compute nodes (Windows users please be sure to do everything in the WSL environment):

(base) $ conda activate aws

(aws) $ nano ~/cluster-config.yaml

Note

Your operating system's command line usually ships with a few text editors. One of them is nano. Another famous one that you have probably heard of is vim. vim is much more powerful, but also has a steeper learning curve than nano. Both vim and nano are shipped with macOS, Linux, and Windows WSL. You are welcome to use vim instead of nano for the rest of this course.

Modify the following in the “HeadNode” section:

HeadNode:
  InstanceType: t3a.nano

In order to spread out the load, use the following table to find out the instance type you should use according to the last number before the letter in your student number (e.g. use the number 4 if your student number is A0171234X):

Head Node

Last Digit  Type         Last Digit  Type          Last Digit  Type
0           t3a.large    4           m5.large      8           m5d.large
1           t3.large     5           m6i.large     9           m5n.large
2           m5a.large    6           m4.large
3           t2.large     7           m5ad.large

and the table below to fill in the “Name” and “InstanceType” in the Scheduling section (replace the . in InstanceType with - when entering the Name):

Scheduling:
  Scheduler: slurm
  SlurmQueues:
  - Name: queue1
    ComputeResources:
    - Name: t3a-nano
      InstanceType: t3a.nano

Compute Node

Last Digit  Type           Last Digit  Type           Last Digit  Type
0           m5.4xlarge     4           r5n.2xlarge    8           r5.2xlarge
1           z1d.2xlarge    5           r5b.2xlarge    9           r5a.2xlarge
2           m5a.4xlarge    6           r5d.2xlarge
3           r5dn.2xlarge   7           r5ad.2xlarge
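
The lookup described above can be sketched in Python. This is only an illustration: the function names (last_digit, cluster_settings) are made up for this example and are not part of pcluster, and the two dictionaries simply restate the Head Node and Compute Node tables.

```python
import re

# Restated from the Head Node table above.
HEAD_NODE_TYPES = {
    0: "t3a.large", 1: "t3.large", 2: "m5a.large", 3: "t2.large", 4: "m5.large",
    5: "m6i.large", 6: "m4.large", 7: "m5ad.large", 8: "m5d.large", 9: "m5n.large",
}

# Restated from the Compute Node table above.
COMPUTE_NODE_TYPES = {
    0: "m5.4xlarge", 1: "z1d.2xlarge", 2: "m5a.4xlarge", 3: "r5dn.2xlarge",
    4: "r5n.2xlarge", 5: "r5b.2xlarge", 6: "r5d.2xlarge", 7: "r5ad.2xlarge",
    8: "r5.2xlarge", 9: "r5a.2xlarge",
}


def last_digit(student_number):
    """Return the last digit before the trailing letter, e.g. 4 for A0171234X."""
    digits = re.findall(r"\d", student_number)
    if not digits:
        raise ValueError("student number contains no digits")
    return int(digits[-1])


def cluster_settings(student_number):
    """Hypothetical helper: look up the instance types for a student number."""
    digit = last_digit(student_number)
    compute_type = COMPUTE_NODE_TYPES[digit]
    return {
        "head_instance_type": HEAD_NODE_TYPES[digit],
        "compute_instance_type": compute_type,
        # The Name field uses the instance type with "." replaced by "-".
        "compute_name": compute_type.replace(".", "-"),
    }


if __name__ == "__main__":
    print(cluster_settings("A0171234X"))
```

You would still enter the resulting values by hand in ~/cluster-config.yaml; the sketch only shows how the last digit maps to the three fields.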

You can find more about different instance types at:

In order to use pcluster, one of the limitations is that the head node and the compute nodes are required to have the same architecture. In order to keep things simple, we will be using “x86_64” architecture for both the head and compute nodes.

The head node is just used to submit jobs and handle file system requests, so we will usually find a relatively cheap instance type with the same architecture and 8 GB of memory (in order to handle the file system requests), which includes t3a.large, t3.large, m5a.large, etc. (see the description of the instance types in the first link above, and the prices in the second link, where you can scroll to the right of the table, and click on the “On-Demand Linux pricing” to sort by price). Avoid instance types that have their architecture listed as “i386, x86_64” as they are often incompatible with compute nodes that have architecture listed as “x86_64”.

For the compute nodes, we recommend instance types with a minimum of 64 GB of memory. This will allow us to have enough memory to read in and split the .ns5 neural data files, which are around 40 GB in size. Due to the AWS limit of 64 vCPUs per user, if you use an instance with a large number of vCPUs, e.g. m6g.12xlarge, which has 48 vCPUs, you will only be able to create 1 compute node with 48 vCPUs as creating 2 compute nodes will require 96 vCPUs, which will exceed your vCPU limit. On the other hand, if you use instance types with 8 vCPUs, you will be able to create 8 instances for a total of 64 vCPUs. So, choosing the right instance types will allow you to have more vCPUs available for your jobs.
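
The vCPU arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes the 64 vCPU limit quoted in the paragraph; max_nodes is a name made up for this example.

```python
VCPU_LIMIT = 64  # per-user limit quoted in the text above


def max_nodes(vcpus_per_instance, vcpu_limit=VCPU_LIMIT):
    """Largest number of instances that fits within the vCPU limit."""
    return vcpu_limit // vcpus_per_instance


if __name__ == "__main__":
    for vcpus in (48, 8):
        nodes = max_nodes(vcpus)
        print(f"{vcpus} vCPUs per instance: {nodes} node(s), {nodes * vcpus} vCPUs in use")
```

With 48 vCPUs per instance only one node fits (16 vCPUs are left unused), while with 8 vCPUs per instance all 64 vCPUs can be used.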

Finally, make sure MountDir is set to data and replace SnapshotId in the config file with the ID of the last snapshot named data in your AWS EC2 Snapshots Dashboard:

SharedStorage:
  - MountDir: data
    Name: ee3801
    StorageType: Ebs
    EbsSettings:
      Size: 1000
      Encrypted: false
      SnapshotId: snap-xxxxxxxxxxxxxxxxx

41.

Next, we will do a one-time set up of your environment on AWS ParallelCluster. We will start by creating a cluster on AWS (enter the command below all on the same line):

(aws) $ pcluster create-cluster -c ~/cluster-config.yaml -n MyCluster01

If you get an error like:

Beginning cluster creation for cluster: MyCluster01
Creating stack named: parallelcluster-MyCluster01
Status: parallelcluster-MyCluster01 - ROLLBACK_IN_PROGRESS                         
Cluster creation failed.  Failed events:
  - AWS::CloudFormation::WaitCondition MasterServerWaitCondition Received FAILURE signal with UniqueId i-0e06294577f6606d0

You might need to delete the cluster first by doing:

(aws) $ pcluster list-clusters

MyCluster01  ROLLBACK_COMPLETE 2.9.1

(aws) $ pcluster delete-cluster -n MyCluster01

You can then try re-creating the cluster using a different head node.

42.

Once you receive the notification email stating that your head node is running, login to the cluster:

(aws) $ pcluster ssh -i ~/MyKeyPair.pem -n MyCluster01

43.

Change to the directory where the snapshot you saved in Part A is mounted:

[ec2-user@ip-10-0-5-43 ~] $ cd /data

If you receive an error like:

-bash: cd: /data: No such file or directory

You might have to wait a couple of minutes for AWS to mount your /data volume before trying again.

44.

Miniconda has already been installed for you, along with an environment called env1. So you can initialize your conda environment by doing:

[ec2-user@ip-10-0-5-43 data] $ miniconda3/bin/conda init

This will add some commands to your ~/.bashrc, so you can reload it to get the conda commands to work:

[ec2-user@ip-10-0-5-43 data] $ source ~/.bashrc

(base) [ec2-user@ip-10-0-5-43 data] $ conda activate env1

(env1) [ec2-user@ip-10-0-5-43 data] $ 

Your prompt should now be prefixed by (env1) instead of (base).

You should also copy the /data/aws directory back to ~/.aws:

(env1) [ec2-user@ip-10-0-5-43 data] $ cp -r /data/aws ~/.aws

Check that your aws credentials are working properly by sending yourself a notification email (remember to replace the numbers in the topic-arn with your account number):

(env1) [ec2-user@ip-10-0-5-43 data] $ aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxx:awsnotify --message "ClusterTest"

45.

We will now clone the GitHub repository (replace the username for the PyHipp repository with your username):

(env1) [ec2-user@ip-10-0-5-43 data] $ cd /data/src

(env1) [ec2-user@ip-10-0-5-43 src] $ git clone https://github.com/your_user_name/PyHipp

If you list the files and directories in the "src" directory, you will see that the DataProcessingTools and pyedfread packages are already installed.

46.

We will install the PyHipp module using pip:

(env1) [ec2-user@ip-10-0-5-43 src] $ cd PyHipp

(env1) [ec2-user@ip-10-0-5-43 PyHipp] $ pip install -r requirements.txt

(env1) [ec2-user@ip-10-0-5-43 PyHipp] $ pip install -e .

(env1) [ec2-user@ip-10-0-5-43 PyHipp] $ cd /data

47.

In order to try processing the data, we will want to run on one of the compute nodes as it has at least 64 GB of memory. So do the following to login to a compute node:

(env1) [ec2-user@ip-10-0-5-43 data] $ srun --pty /bin/bash

If you get an error saying that the srun command is not found, you may have to log out and log back in. This is usually due to some network volumes not being mounted yet after the creation of the cluster.

It will take a little time for the compute node to be started up, after which you should receive an email. Your prompt should also change to something like (the portion after queue1-dy- should correspond to the instance type you have specified in your cluster config file):

(base) [ec2-user@queue1-dy-m54xlarge-1 ~] $  

Activate the env1 conda environment as usual, and change to the /data directory:

(base) [ec2-user@queue1-dy-m54xlarge-1 ~] $ conda activate env1

(env1) [ec2-user@queue1-dy-m54xlarge-1 ~] $ cd /data

48.

We will now start up ipython (which is a little easier to use than regular python) to test that everything was installed properly:

(env1) [ec2-user@queue1-dy-m54xlarge-1 data] $ ipython

In [ ]: import PyHipp as pyh
In [ ]: pyh.pyhcheck('hello')
hello
In [ ]: from pyedfread import edf
In [ ]: cd src/pyedfread
In [ ]: samples, events, messages = edf.pread('SUB001.EDF')
In [ ]: events.shape
Out[ ]: (485, 30)

Note that the In [ ]: prompt indicates that the command should be entered in ipython.

49.

We will now go through and test the software that has been installed. We will start by processing the Ripple .nev files containing the signals sent by Unity via the parallel port. We will use a Python class named RPLParallel defined in the PyHipp repository to create a RPLParallel object:

In [ ]: cd /data/picasso/20181105/session01
In [ ]: pyh.RPLParallel(saveLevel=1)

which should give you the following:

Object created
Opening .nev file, creating new RPLParallel object...
Object saved to file rplparallel_d41d.hkl
Out[ ]: <PyHipp.rplparallel.RPLParallel at 0x7fb0fcaa2160>

The saveLevel=1 argument tells the function to save the RPLParallel object into the current directory after it has been created. It is a feature common to all the classes defined in the PyHipp repository.

We will do the same for the eye fixation session:

In [ ]: cd ../sessioneye
In [ ]: pyh.RPLParallel(saveLevel=1)

50.

The navigation session and the fixation session each contain a .ns5 file that contains the neural data from 110 channels. We can separate out one of the channels (Channel 9) by doing:

In [ ]: cd ../session01
In [ ]: pyh.RPLSplit(channel=[9])

which will return the following showing that a RPLRaw object for Channel 9 was created (as the .ns5 is pretty large, the entire process might take 10 to 15 minutes):

Object created
.ns5 file loaded.
Processing channel 009
Calling RPLRaw for channel 009
Object created
Object saved to file rplraw_d41d.hkl
Channel 009 processed
Out[ ]: <PyHipp.rplsplit.RPLSplit at 0x7f07b25a7828>

We do not need to specify the saveLevel=1 argument as the primary purpose of RPLSplit is to create the appropriate RPLRaw objects, which are then saved by default.

If you would like to be notified when the function is done, you can follow these instructions to use CloudWatch to set up an alarm to notify you when the Head Node’s Network Out Bytes falls below 1,000,000.

If you check the current directory by doing:

In [ ]: ls

you will see a directory named array01. If you check that directory and its subdirectory:

In [ ]: ls array01
In [ ]: ls array01/channel009

you will see a file named rplraw_d41d.hkl that contains the raw data just for Channel 9.

51.

In order to generate the low-pass filtered signals, we will call the RPLLFP function from the channel directory:

In [ ]: cd array01/channel009
In [ ]: pyh.RPLLFP(saveLevel=1)

which will load the RPLRaw object and create a RPLLFP object:

Object loaded from file rplraw_d41d.hkl
Object created
Applying low-pass filter with frequencies 1.0 and 150.0 Hz
Object saved to file rpllfp_6eca.hkl
Out[ ]: <PyHipp.rpllfp.RPLLFP at 0x7f07b259f940>

52.

Similarly, to generate the high-pass filtered signals, we will call the RPLHighPass function:

In [ ]: pyh.RPLHighPass(saveLevel=1)

which will load the RPLRaw object and create a RPLHighPass object:

Object loaded from file rplraw_d41d.hkl
Object created
Applying high-pass filter with frequencies 500.0 and 7500.0 Hz
Object saved to file rplhighpass_b59f.hkl
Out[ ]: <PyHipp.rplhighpass.RPLHighPass at 0x7f07b23ca400>

You can see the two new files created in the current directory by doing:

In [ ]: ls

which will give you:

rplhighpass_b59f.hkl  rpllfp_6eca.hkl  rplraw_d41d.hkl

53.

We can process the Unity files by doing:

In [ ]: cd ../../
In [ ]: pyh.Unity(saveLevel=1)

which will return:

Object created
Object loaded from file rplparallel_d41d.hkl
Object saved to file unity_71bf.hkl
Out[ ]: <PyHipp.unity.Unity at 0x7facccd89400>

You will see that in creating the Unity object, the previously saved RPLParallel object was loaded to extract some information. If the RPLParallel object had not been saved previously, that information would have had to be recomputed from the raw data files. This reduction in unnecessary recomputation was one of the principles on which the objects in the PyHipp repository were designed.

54.

This next command will process the eye-tracking files for both the navigation (180702.edf) and fixation (P7_2.edf) sessions, create eyelink objects, and save them to the session01 and sessioneye directories:

In [ ]: cd ..
In [ ]: pyh.EDFSplit()

which should return the following:

Reading calibration edf file.
Object created
Object saved to file eyelink_24d5.hkl
Reading navigation edf file.
...
Object saved to file eyelink_24d5.hkl
Object created
Out[ ]: <PyHipp.eyelink.EDFSplit at 0x7faccce0f1d0>

Similar to the RPLSplit function above, we do not need to specify the saveLevel=1 argument as the primary function of EDFSplit is to create and save the eyelink objects.

You might get a few “SerializedWarning” or “DataFrame” warning messages, but it is safe to ignore them.

55.

In order to align the Ripple, Unity, and Eyelink data, we will call the following function:

In [ ]: cd session01
In [ ]: pyh.aligning_objects()

which should end with the following (you can ignore the messages that precede it):

finish aligning objects

56.

For performing the raycasting, you can do:

In [ ]: pyh.raycast(1)

You should see some text ending with:

Object loaded from file unity_71bf.hkl
Object loaded from file eyelink_24d5.hkl
Found path: /data/RCP/VirtualMaze.x86_64

This might take about 20 minutes to complete, so we will instead just check that it started running properly before stopping it. Create a new Terminal window, activate the aws conda environment, and ssh to your cluster:

(base) $ conda activate aws

(aws) $ pcluster ssh -i ~/MyKeyPair.pem -n MyCluster01

Change to the directory from which you ran the raycast function:

(base) [ec2-user@ip-10-0-5-43 ~] $ cd /data/picasso/20181105/session01

The raycast function writes to a log file as it is running, so we can check the log file’s contents by doing:

(base) [ec2-user@ip-10-0-5-43 session01] $ tail -f VirtualMazeBatchLog.txt

You have used the tail function before to look at the last few lines of a file in this lab, but by adding the -f flag, it will now continuously show you the last few lines of the log file. This is one of the ways to monitor a program as it is running. The command above should progressively show you something like:

Add "-logfile <log file location>.txt" to see unity logs during the data generation
There may be a need to copy the libraries found in the directory 'Plugins' to a new folder called 'Mono'

More command line arguments can be found at https://docs.unity3d.com/Manual/CommandLineArguments.html
Session List detected!
Setting density to : 220
Setting radius to : 1
Queuing /data/picasso/20181105/session01
1 sessions to be processed
Starting(1/1): /data/picasso/20181105/session01
5.037168%: Data Generation is still running. 10/3/2021 2:45:51 AM

The last line above indicates that the program has started running, so everything should be installed properly. At this point, you can terminate the tail program by typing Ctrl-c. You can go back to your first Terminal window that was running ipython, and type Ctrl-c to interrupt the raycast function. You can then exit ipython by typing Ctrl-d.

If ipython does not respond, you can do the following from your second Terminal window:

(base) [ec2-user@ip-10-0-5-43 session01] $ ps -ef | grep ipython

ec2-user 12958  7414  0 13:37 pts/0    00:00:00 /data/miniconda3/envs/env1/bin/python /data/miniconda3/envs/env1/bin/ipython
ec2-user 13459 12716  0 13:43 pts/1    00:00:00 grep --color=auto ipython

The ps command lists running processes, and its output is sent through the pipe | to the grep command, which searches for the string ipython. The process ID for ipython (12958 in the example above) can be used to terminate the process by doing the following (replace the process ID below with the one you obtained from the command above):

(base) [ec2-user@ip-10-0-5-43 session01] $ kill -9 12958

If the ipython process was terminated properly, you should see the bash prompt in the first Terminal window again:

(env1) [ec2-user@queue1-dy-m54xlarge-1 data] $  

You can then exit from the compute node by doing:

(env1) [ec2-user@queue1-dy-m54xlarge-1 data] $ exit

57.

From the second Terminal window, you can check that a few files have now been created in the session01 directory:

(base) [ec2-user@ip-10-0-5-43 session01] $ ls

181105_Block1.nev     eyelink_24d5.hkl  slist.txt
181105_Block1.ns5     logs.txt          unity_71bf.hkl
array01               missingData.csv   unityfile_eyelink.csv
binData.hdf           RawData_T1-400    VirtualMazeBatchLog.txt
rplparallel_d41d.hkl

Once you have verified the files have been created, you can type Ctrl-d to logout from the head node.

Task

Include a screenshot with the above output in your lab report.

58.

The last thing to do is to copy the following two files for spike sorting from the PyHipp directory to the /data/picasso directory using your first Terminal window (which should still be logged into your head node):

(env1) [ec2-user@ip-10-0-5-43 data] $ cp /data/src/PyHipp/geom.csv /data/picasso

(env1) [ec2-user@ip-10-0-5-43 data] $ cp /data/src/PyHipp/sort.sh.txt /data/picasso

59.

You can now delete the files we created to test the raycasting:

(env1) [ec2-user@ip-10-0-5-43 data] $ cd picasso/20181105

(env1) [ec2-user@ip-10-0-5-43 20181105] $ rm session*/eyelink*hkl

(env1) [ec2-user@ip-10-0-5-43 20181105] $ cd session01

(env1) [ec2-user@ip-10-0-5-43 session01] $ rm *.hkl *.csv bin*hdf VirtualMaze* *.txt

60.

At this point, you should take a snapshot of the /data volume from your computer, so you will not have to go through the set up again:

(aws) $ update_snapshot.sh data 2 MyCluster01

You can move on to the next section while waiting for the snapshot to be completed. Once the snapshot is completed, you should receive an email notification from AWS.

Part 8. Setting up the data pipeline

61.

We are now ready to set up the data pipeline. We will want to create the objects in this order:

  1. RPLParallel (for both session01 and sessioneye)
  2. RPLSplit to create a RPLRaw object for each of the 110 channels (for both session01 and sessioneye)
  3. RPLLFP (which needs the RPLRaw object) for each of the 110 channels (for both session01 and sessioneye)
  4. RPLHighPass (which needs the RPLRaw object) for each of the 110 channels (for both session01 and sessioneye)
  5. Spike sorting (which needs the RPLHighPass objects for both session01 and sessioneye) for each of the 110 channels
  6. Unity (needs RPLParallel object)
  7. EDFSplit to create Eyelink objects (needs RPLParallel, and Unity if available) (for both session01 and sessioneye)
  8. Aligning_objects (needs RPLParallel, Unity, and Eyelink objects)
  9. Raycasting (needs Unity and Eyelink objects)

62.

We will first create a script for submitting jobs to a queue by creating a copy of the script you used in Lab 2, and editing it:

(env1) [ec2-user@ip-10-0-5-43 data] $ cd /data/src/PyHipp

(env1) [ec2-user@ip-10-0-5-43 PyHipp] $ cp slurm.sh pipeline-slurm.sh

(env1) [ec2-user@ip-10-0-5-43 PyHipp] $ nano pipeline-slurm.sh

63.

We now want to put what we did above in ipython into the script, but processing only 8 channels instead of the full 110. What we did above involved changing directories numerous times, and that was to process just one of the 110 recorded neural channels; processing 8 or 110 channels would require many more directory changes. So instead, we will make use of a function called processDirs in the DataProcessingTools package that will automatically change to the appropriate directory in which to create the specified objects. In addition, we will want to take note of the time taken to process the data. You can copy and paste the following lines to the end of the file:

python -u -c "import PyHipp as pyh; \
import DataProcessingTools as DPT; \
import os; \
import time; \
t0 = time.time(); \
print(time.localtime()); \
DPT.objects.processDirs(dirs=None, objtype=pyh.RPLParallel, saveLevel=1); \
DPT.objects.processDirs(dirs=None, objtype=pyh.RPLSplit, channel=[9, 31, 34, 56, 72, 93, 119, 120]); \
DPT.objects.processDirs(dirs=None, objtype=pyh.RPLLFP, saveLevel=1); \
DPT.objects.processDirs(dirs=None, objtype=pyh.RPLHighPass, saveLevel=1); \
DPT.objects.processDirs(dirs=None, objtype=pyh.Unity, saveLevel=1); \
pyh.EDFSplit(); \
os.chdir('session01'); \
pyh.aligning_objects(); \
pyh.raycast(1); \
DPT.objects.processDirs(level='channel', cmd='import PyHipp as pyh; from PyHipp import mountain_batch; mountain_batch.mountain_batch(); from PyHipp import export_mountain_cells; export_mountain_cells.export_mountain_cells();'); \
print(time.localtime()); \
print(time.time()-t0);"

aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxx:awsnotify --message "JobDone"

When the processDirs function is called with level and cmd arguments, it will find all the subdirectories that are at the appropriate level in the data hierarchy (in this case channel directories), and run the specified command in those directories. This will create a job that will perform the spike sorting and save the appropriate spiketrain files into cell directories as discussed in the lecture.
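
To make the idea of running a command in every directory at a given level more concrete, here is a rough sketch in plain Python. It is NOT the DataProcessingTools implementation: run_in_channel_dirs is a hypothetical helper, and it assumes that channel directories can be recognised simply by a name starting with channel.

```python
import os
import tempfile
from pathlib import Path


def run_in_channel_dirs(root, action, prefix="channel"):
    """Call action() from inside every directory under root whose name starts with prefix.

    Hypothetical helper for illustration only; it is not part of DataProcessingTools.
    Returns the list of directories that were visited.
    """
    root = Path(root).resolve()
    start = Path.cwd()
    visited = []
    for dirpath, dirnames, _filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        if Path(dirpath).name.startswith(prefix):
            os.chdir(dirpath)
            try:
                action()
            finally:
                os.chdir(start)  # always return to where we started
            visited.append(dirpath)
    return visited


if __name__ == "__main__":
    # Build a tiny stand-in for the session/array/channel hierarchy.
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ("session01/array01/channel009", "session01/array01/channel031"):
            (Path(tmp) / rel).mkdir(parents=True)
        run_in_channel_dirs(tmp, lambda: print("working in", Path.cwd().name))
```

The point of the pattern is that the caller never has to change directories by hand: the helper finds the matching directories, runs the action in each one, and returns to the starting directory afterwards.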

The last Python command will take the difference in time between the start and the end of the job, and print out the difference in the form of number of seconds.

We will also edit the following line in the file to give the job more time (24 hours) to run:

#SBATCH --time=24:00:00   # walltime

as well as to make the job name and the slurm output files more distinct:

#SBATCH -J "pipe"   # job name

#SBATCH -o pipe-slurm.%N.%j.out # STDOUT
#SBATCH -e pipe-slurm.%N.%j.err # STDERR

64.

Save the file, and change directory to the 20181105 data directory:

(env1) [ec2-user@ip-10-0-5-43 PyHipp]$ cd /data/picasso/20181105

Part 9. Executing the data pipeline

65.

You are now ready to submit the script to the slurm queue:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ sbatch /data/src/PyHipp/pipeline-slurm.sh

66.

You can use the squeue function to watch the progress of the job:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ squeue

JOBID   PARTITION    NAME       USER    ST    TIME   NODES  NODELIST(REASON)
    2      queue1    pipe   ec2-user    PD    0:00       1        (Priority)

The last column will say (Priority) when the job is first queued, and switch to (Resources) when the compute node is starting up. When the job starts running on the compute node, the last column will now state the address of the compute node, and the ST (state) column will change from PD (pending), to CF (configuring), to R (running):

JOBID  PARTITION     NAME      USER   ST     TIME  NODES  NODELIST(REASON)
    2     queue1     pipe  ec2-user    R     0:00      1  ip-10-0-1-245

You can also use the following function to track the progress of the job by checking the slurm output file (the number in the name of the .out file corresponds to your Job ID above, so you may have to modify the command below to match your Job ID):

(env1) [ec2-user@ip-10-0-5-43 20181105] $ tail -f pipe-slurm.*.2.out

The tail function will show you the last 10 lines of the file or files you specify, and adding the -f argument will cause it to continue to monitor the files, and print out new lines as they are added to the files. This allows you to monitor the progress of the jobs by looking at their outputs. You can type Ctrl-C at any point to get out of the tail function.
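
As a rough illustration of what tail does without the -f argument, the sketch below keeps only the last n lines of a file. The function name tail and the demo file are made up for this example; following a growing file, as -f does, would additionally require keeping the file open and polling for new lines, which is not shown here.

```python
import os
import tempfile
from collections import deque


def tail(path, n=10):
    """Return the last n lines of a text file (without trailing newlines)."""
    with open(path, "r", errors="replace") as handle:
        # A deque with maxlen=n discards older lines as newer ones are read.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = os.path.join(tmp, "demo.out")
        with open(demo, "w") as handle:
            handle.write("".join(f"line {i}\n" for i in range(25)))
        print("\n".join(tail(demo)))
```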

67.

If you make a mistake, you can cancel jobs by specifying the job number:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ scancel 2

The command above will cancel job #2. You can cancel a range of jobs by doing:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ scancel {2..7}

or all jobs by doing:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ scancel --user=ec2-user

68.

If it is taking a long time (more than 10 mins) for the jobs to start running, it could be because AWS has no capacity available for the compute node instance type you specified. In this case, you can switch to one of the other instance types for your compute nodes that has at least 64 GB of memory, but do take note of the price differences, as a more expensive instance type will consume your AWS credits at a faster rate.

If you do not have any jobs running, you can use the following command on your computer to change the instance type of your compute nodes after editing the config file without having to delete and re-create the cluster:

(aws) $ pcluster update-compute-fleet --status STOP_REQUESTED -n MyCluster01

(aws) $ pcluster update-cluster -c ~/cluster-config.yaml -n MyCluster01

You can find more information about pcluster update from:

69.

It will take some time for the job to finish, so you can wait till you receive the email notification that your job has been completed to continue.

Part 10. Checking the output

70.

Once the job has been completed, you should see the following items in the day directory:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ ls

181105.edf  pipe-slurm.ip-xxx.2.err  sessioneye
mountains   pipe-slurm.ip-xxx.2.out
P11_5.edf   session01

You should see the following items in the session01 directory:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ ls session01

181105_Block1.nev  binData.hdf           slist.txt
181105_Block1.ns5  eyelink_24d5.hkl      unity_71bf.hkl
array01            logs.txt              unityfile_eyelink.csv
array02            missingData.csv       VirtualMazeBatchLog.txt
array03            RawData_T1-400
array04            rplparallel_d41d.hkl

In total, we expect the following .hkl files to be created:

session01: rplparallel, unity, eyelink

8 channel directories: rplraw, rpllfp, rplhighpass

sessioneye: rplparallel, eyelink

8 channel directories: rplraw, rpllfp, rplhighpass

which adds up to 53. There will also be some spiketrain .hkl files, but the number returned for each channel is not predictable.
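
The total of 53 can be reproduced with a short calculation. This sketch only restates the list above; the variable and function names are made up for this example.

```python
# Objects saved once per session directory (from the list above).
SESSION_LEVEL = {
    "session01": ["rplparallel", "unity", "eyelink"],
    "sessioneye": ["rplparallel", "eyelink"],
}

# Objects saved in every channel directory.
CHANNEL_LEVEL = ["rplraw", "rpllfp", "rplhighpass"]


def expected_hkl_count(n_channels):
    """Expected number of .hkl files, not counting spiketrain files."""
    return sum(
        len(objects) + n_channels * len(CHANNEL_LEVEL)
        for objects in SESSION_LEVEL.values()
    )


if __name__ == "__main__":
    print(expected_hkl_count(8))
```

For 8 channels this is (3 + 8 x 3) + (2 + 8 x 3) = 27 + 26 = 53.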

So we can do the following to find all the .hkl files:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ find . -name "*.hkl"

The find command is a very useful function that can help find files using different parameters. In this case, we are asking it to start from the current directory (the . given as its first argument) and look for files with names ending in .hkl.

In order to leave out the spiketrain .hkl files, we can send the output of the above command to the grep function using the pipe feature | in the shell. The grep function looks for lines containing matching strings (specified by the -e argument) inside a file or from the output of another function sent to it via a pipe. However, when we specify the -v argument, it will instead look for lines that do not contain the matching strings.

So in the command below, by specifying -v -e spiketrain -e mountains, the grep function will only return lines that contain neither “spiketrain” nor “mountains”, which will allow us to select only the files returned by the find function that do not have spiketrain or mountains anywhere in their paths:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ find . -name "*.hkl" | grep -v -e spiketrain -e mountains

We can count the number of files by again piping the output above to the wc command to make sure they were all created properly:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ find . -name "*.hkl" | grep -v -e spiketrain -e mountains | wc -l

53
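
For comparison, the same find, grep -v, and wc -l chain can be sketched in Python with pathlib. This is only an illustration under the assumption that matching on the path relative to the starting directory is equivalent to what grep sees; pipeline_hkl_files is a name made up for this example.

```python
from pathlib import Path

# Strings that exclude a file when they appear anywhere in its relative path.
EXCLUDE = ("spiketrain", "mountains")


def pipeline_hkl_files(root="."):
    """Return .hkl paths under root whose relative path contains none of EXCLUDE."""
    root = Path(root)
    matches = []
    for path in root.rglob("*.hkl"):  # like: find . -name "*.hkl"
        rel = path.relative_to(root).as_posix()
        if not any(word in rel for word in EXCLUDE):  # like: grep -v -e ... -e ...
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    files = pipeline_hkl_files(".")
    for path in files:
        print(f"{path.stat().st_size:>12d}  {path}")
    print(len(files))  # like: wc -l
```

Run from the 20181105 directory, the final number printed should agree with the count from the shell command above.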

We can also do the following to make sure that the file sizes look correct:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ find . -name "*.hkl" | grep -v -e spiketrain -e mountains | xargs ls -hl

-rw-rw-r-- 1 ec2-user ec2-user 129M Aug 10 17:20 ./session01/eyelink_24d5.hkl
-rw-rw-r-- 1 ec2-user ec2-user  61K Aug 10 13:55 ./session01/rplparallel_d41d.hkl
-rw-rw-r-- 1 ec2-user ec2-user  12M Aug 10 17:20 ./session01/unity_71bf.hkl
-rw-rw-r-- 1 ec2-user ec2-user 630M Aug 11 02:28 ./session01/array01/channel009/rplhighpass_b59f.hkl
-rw-rw-r-- 1 ec2-user ec2-user  22M Aug 11 02:27 ./session01/array01/channel009/rpllfp_6eca.hkl
-rw-rw-r-- 1 ec2-user ec2-user 630M Aug 11 02:18 ./session01/array01/channel009/rplraw_d41d.hkl

Task

Include a screenshot of your Terminal window with the file sizes above in your lab report. Make sure you increase the size of your Terminal window so that the size of all 53 files can be captured in the screenshot.

The xargs function used in the command above takes the output of the grep function, and appends them as arguments to the end of the call to the ls function. For instance, if the grep function returns:

$ find . -name "*.hkl" | grep -v -e spiketrain -e mountains

./session01/eyelink_24d5.hkl
./session01/rplparallel_d41d.hkl

Using xargs will be the equivalent of:

$ ls -hl ./session01/eyelink_24d5.hkl ./session01/rplparallel_d41d.hkl

For session01, the rplraw and rplhighpass files are typically over 600 MB, while the rpllfp files are around 20 MB. The unity files are typically around 10 MB, and the eyelink files are typically over 100 MB. The files in the sessioneye directory are typically quite a bit smaller.

In order to check the output of the spike sorting, we would expect one of these files to be created for each channel:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ find mountains -name "firings.mda" | wc -l

8

Task

Include a screenshot of your terminal window with the output above in your lab report.

71.

You can check the time taken to complete the job by looking for the printouts of time in the output file for the job:

(env1) [ec2-user@ip-10-0-5-43 20181105] $ tail pipe-slurm*.out

Task

Include the output of the command above in your lab report, and convert the time taken for the job to hours, minutes, and seconds so it is easy to understand. Extrapolate from the time taken for 8 channels to estimate how long it will take to process all 110 channels.
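
The conversion and extrapolation can be sketched as follows. The function names are made up for this example, and the elapsed value in the demo is a placeholder rather than a real measurement. The extrapolation assumes that the run time scales linearly with the number of channels, which is only a rough approximation because some steps (for example RPLParallel, Unity, and raycasting) do not depend on the channel count.

```python
def to_hms(seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return hours, minutes, secs


def extrapolate(seconds, channels_done=8, channels_total=110):
    """Linear estimate of the time needed for channels_total channels."""
    return seconds * channels_total / channels_done


if __name__ == "__main__":
    elapsed = 5025.7  # placeholder: use the number of seconds printed by your own job
    print("8 channels:   %d h %d min %d s" % to_hms(elapsed))
    print("110 channels: %d h %d min %d s" % to_hms(extrapolate(elapsed)))
```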

Part 11. Wrapping up

72.

Once you are done, you can exit your cluster, and then update your snapshot:

(aws) $ update_snapshot.sh data 2 MyCluster01

73.

Once you receive the email notification, you can delete the cluster:

(aws) $ pcluster delete-cluster -n MyCluster01

74.

In the next lab, we will look at ways to parallelize the data processing.

75.

In Part A, we started the process of setting up an AWS Budget to warn you when your credits are running low. You can now resume by going to your Billing Dashboard:

Click on Budget in the left panel:

Click on the Create a budget button:

Choose Customize (advanced) under Budget setup, select the Cost budget - Recommended option, and scroll down to click the Next button:

Scroll down and name the budget as follows:

Scroll down to the Set budget amount section, select Quarterly under Period, Expiring budget under Budget effective date, Q3 and 2025 under Start quarter, Q3 and 2025 under End quarter, Fixed under Choose how to budget, and enter 200.00 in Enter your budgeted amount ($):

Scroll down and click Next.

Click on the “Add an alert threshold” button:

Enter “80” under “Threshold”, fill in your email address under “Email recipients”, click the “Next” button at the bottom of the page:

Click on the “Next” button to skip the “Attach actions” section:

Review your entries, and click the “Create budget” button at the bottom of the page:

This will take you back to the budgets page to show you the budget you created:

You will now get an email warning when your charges exceed 80% of your credits.

Task

Include a screenshot of your budget page in your lab report.