Part 7. Installing and testing software on AWS
40.
First, we will create a Terminal window, activate your aws conda environment, and edit the config file to modify the instance type for the head and compute nodes (Windows users please be sure to do everything in the WSL environment):
(base) $ conda activate aws
(aws) $ nano ~/cluster-config.yaml
Note
Your operating system cli usually ship with a few text editor. One of the is
nano. Another famouse one that you will have probably heard of isvim.vimis much more powerful, but also has a higher learning curve thannano. Bothvimandnanoare shifted with MacOS, Linux and Window WSL. You are welcome to useviminstead ofnanofor the rest of this course.
Modify the following in the “HeadNode” section:
HeadNode:
InstanceType: t3a.nano
In order to spread out the load, use the following table to find out the instance type you should use according to the last number before the letter in your student number (e.g. use the number 4 if your student number is A0171234X):
Head Node
| Last Digit | Type | Last Digit | Type | Last Digit | Type |
|---|---|---|---|---|---|
| 0 | t3a.large | 4 | m5.large | 8 | m5d.large |
| 1 | t3.large | 5 | m6i.large | 9 | m5n.large |
| 2 | m5a.large | 6 | m4.large | ||
| 3 | t2.large | 7 | m5ad.large |
and the table below to fill in the “Name” and “InstanceType” in the Scheduling section (replace the . in InstanceType with - when entering the Name):
Scheduling:
Scheduler: slurm
SlurmQueues:
- Name: queue1
ComputeResources:
- Name: t3a-nano
InstanceType: t3a.nano
Compute Node
| Last Digit | Type | Last Digit | Type | Last Digit | Type |
|---|---|---|---|---|---|
| 0 | m5.4xlarge | 4 | r5n.2xlarge | 8 | r5.2xlarge |
| 1 | z1d.2xlarge | 5 | r5b.2xlarge | 9 | r5a.2xlarge |
| 2 | m5a.4xlarge | 6 | r5d.2xlarge | ||
| 3 | r5dn.2xlarge | 7 | r5ad.2xlarge |
You can find more about different instance types at:
In order to use pcluster, one of the limitations is that the head node and the compute nodes are required to have the same architecture. In order to keep things simple, we will be using “x86_64” architecture for both the head and compute nodes.
The head node is just used to submit jobs and handle file system requests, so we will usually find a relatively cheap instance type with the same architecture and 8 GB of memory (in order to handle the file system requests), which includes t3a.large, t3.large, m5a.large, etc. (see the description of the instance types in the first link above, and the prices in the second link, where you can scroll to the right of the table, and click on the “On-Demand Linux pricing” to sort by price). Avoid instance types that have their architecture listed as “i386, x86_64” as they are often incompatible with computer nodes that have architecture listed as “x86_64”.
For the compute nodes, we recommend instance types with a minimum of 64 GB of memory. This will allow us to have enough memory to read in and split the .ns5 neural data files, which are around 40 GB in size. Due to the AWS limit of 64 vCPUs per user, if you use an instance with a large number of vCPUs, e.g. m6g.12xlarge, which has 48 vCPUs, you will only be able to create 1 compute node with 48 vCPUs as creating 2 compute nodes will require 96 vCPUs, which will exceed your vCPU limit. On the other hand, if you use instance types with 8 vCPUs, you will be able to create 8 instances for a total of 64 vCPUs. So, choosing the right instance types will allow you to have more vCPUs available for your jobs.
Finally, make sure MountDir is set to data and replace SnapshotId in the config file with the ID of the last snapshot named data in your AWS EC2 Snapshots Dashboard:
SharedStorage:
- MountDir: data
Name: ee3801
StorageType: Ebs
EbsSettings:
Size: 1000
Encrypted: false
SnapshotId: snap-xxxxxxxxxxxxxxxxx
41.
Next, we will do a one-time set up of your environment on AWS ParallelCluster. We will start by creating a cluster on AWS (enter the command below all on the same line):
(aws) $ pcluster create-cluster -c ~/cluster-config.yaml -n MyCluster01
If you get an error like:
Beginning cluster creation for cluster: MyCluster01
Creating stack named: parallelcluster-MyCluster01
Status: parallelcluster-MyCluster01 - ROLLBACK_IN_PROGRESS
Cluster creation failed. Failed events:
- AWS::CloudFormation::WaitCondition MasterServerWaitCondition Received FAILURE signal with UniqueId i-0e06294577f6606d0
You might need to delete the cluster first by doing:
(aws) $ pcluster list-clusters
MyCluster01 ROLLBACK_COMPLETE 2.9.1
(aws) $ pcluster delete-cluster -n MyCluster01
You can then try re-creating the cluster using a different head node.
42.
Once you receive the notification email stating that your head node is running, login to the cluster:
(aws) $ pcluster ssh -i ~/MyKeyPair.pem -n MyCluster01
43.
Change to the directory where the snapshot you saved in Part A is mounted:
[ec2-user@ip-10-0-5-43 ~] $ cd /data
If you receive an error like:
-bash: cd: /data: No such file or directory
You might have to wait a couple of minutes for AWS to mount your /data volume before trying again.
44.
Miniconda has already been installed for you, along with an environment called env1. So you can initialize your conda environment by doing:
[ec2-user@ip-10-0-5-43 data] $ miniconda3/bin/conda init
This will add some commands to your ~/.bashrc, so you can reload it to get the conda commands to work:
[ec2-user@ip-10-0-5-43 data] $ source ~/.bashrc
(base) [ec2-user@ip-10-0-5-43 data] $ conda activate env1
(env1) [ec2-user@ip-10-0-5-43 data] $
Your prompt should now be prefixed by (env1) instead of (base).
You should also copy the /data/aws directory back to ~/.aws:
(env1) [ec2-user@ip-10-0-5-43 data] $ cp -r /data/aws ~/.aws
Check that your aws credentials are working properly by sending yourself a notification email (remember to replace the numbers in the topic-arn with your account number):
(env1) [ec2-user@ip-10-0-5-43 data] $ aws sns publish --topic-arn arn:aws:sns:ap-southeast-1:xxxxxx:awsnotify --message "ClusterTest"
45.
We will now clone the GitHub repository (replace the username for the PyHipp repository with your username):
(env1) [ec2-user@ip-10-0-5-43 x64] $ cd /data/src
(env1) [ec2-user@ip-10-0-5-43 src] $ git clone https://github.com/your_user_name/PyHipp
If you list the files and directories in the "src" directory, you will see that the DataProcessingTools and pyedfread packages are already installed.
46.
We will install the PyHipp module using pip:
(env1) [ec2-user@ip-10-0-5-43 src] $ cd PyHipp
(env1) [ec2-user@ip-10-0-5-43 PyHipp] $ pip install -r requirements.txt
(env1) [ec2-user@ip-10-0-5-43 PyHipp] $ pip install -e .
(env1) [ec2-user@ip-10-0-5-43 pyedfread] $ cd /data
47.
In order to try processing the data, we will want to run on one of the compute nodes as it has at least 64 GB of memory. So do the following to login to a compute node:
(env1) [ec2-user@ip-10-0-5-43 data] $ srun --pty /bin/bash
If you get an error saying that the srun command is not found, you may have to log out and log back in. This is usually due to some network volumes not being mounted yet after the creation of the cluster.
It will take a little time for the compute node to be started up, after which you should receive an email. Your prompt should also change to something like (the portion after queue1-dy- should correspond to the instance type you have specified in your cluster config file):
(base) [ec2-user@queue1-dy-m54xlarge-1 ~] $
Activate the env1 conda environment as usual, and change to the /data directory:
(base) [ec2-user@queue1-dy-m54xlarge-1 ~] $ conda activate env1
(env1) [ec2-user@queue1-dy-m54xlarge-1 ~] $ cd /data
48.
We will now start up ipython (which is a little easier to use than regular python) to test that everything was installed properly:
(env1) [ec2-user@queue1-dy-m54xlarge-1 data] $ ipython
In [ ]: import PyHipp as pyh
In [ ]: pyh.pyhcheck('hello')
hello
In [ ]: from pyedfread import edf
In [ ]: cd src/pyedfread
In [ ]: samples, events, messages = edf.pread('SUB001.EDF')
In [ ]: events.shape
Out[ ]: (485, 30)
Note that the In [ ]: prompt indicates that the command should be entered in iPython.
49.
We will now go through and test the software that has been installed. We will start by processing the Ripple .nev files containing the signals sent by Unity via the parallel port. We will use a Python class named RPLParallel defined in the PyHipp repository to create a RPLParallel object:
In [ ]: cd /data/picasso/20181105/session01
In [ ]: pyh.RPLParallel(saveLevel=1)
which should give you the following:
Object created
Opening .nev file, creating new RPLParallel object...
Object saved to file rplparallel_d41d.hkl
Out[ ]: <PyHipp.rplparallel.RPLParallel at 0x7fb0fcaa2160>
The saveLevel=1 argument tells the function to save the RPLParallel object into the current directory after it has been created. It is a feature common to all the classes defined in the PyHipp repository.
We will do the same for the eye fixation session:
In [ ]: cd ../sessioneye
In [ ]: pyh.RPLParallel(saveLevel=1)
50.
The navigation session and the fixation session each contain a .ns5 file that contains the neural data from 110 channels. We can separate out one of the channels (Channel 9) by doing:
In [ ]: cd ../session01
In [ ]: pyh.RPLSplit(channel=[9])
which will return the following showing that a RPLRaw object for Channel 9 was created (as the .ns5 is pretty large, the entire process might take 10 to 15 minutes):
Object created
.ns5 file loaded.
Processing channel 009
Calling RPLRaw for channel 009
Object created
Object saved to file rplraw_d41d.hkl
Channel 009 processed
Out[ ]: <PyHipp.rplsplit.RPLSplit at 0x7f07b25a7828>
We do not need to specify the saveLevel=1 argument as the primary function of the function is to create the appropriate RPLRaw objects, which are then saved by default.
If you would like to be notified when the function is done, you can follow these instructions to use CloudWatch to set up an alarm to notify you when the Head Node’s Network Out Bytes falls below 1,000,000.
If you check the current directory by doing:
In [ ]: ls
you will see a directory named array01. If you check that directory and its subdirectory:
In [ ]: ls array01
In [ ]: ls array01/channel009
you will see a file named rplraw_d41d.hkl that contains the raw data just for Channel 9.
51.
In order to generate the low-pass filtered signals, we will call the RPLLFP function from the channel directory:
In [ ]: cd array01/channel009
In [ ]: pyh.RPLLFP(saveLevel=1)
which will load the RPLRaw object and create a RPLLFP object:
Object loaded from file rplraw_d41d.hkl
Object created
Applying low-pass filter with frequencies 1.0 and 150.0 Hz
Object saved to file rpllfp_6eca.hkl
Out[7]: <PyHipp.rpllfp.RPLLFP at 0x7f07b259f940>
52.
Similarly, to generate the high-pass filtered signals, we will call the RPLHighPass function:
In [ ]: pyh.RPLHighPass(saveLevel=1)
which will load the RPLRaw object and create a RPLHighPass object:
Object loaded from file rplraw_d41d.hkl
Object created
Applying high-pass filter with frequencies 500.0 and 7500.0 Hz
Object saved to file rplhighpass_b59f.hkl
Out[7]: <PyHipp.rplhighpass.RPLHighPass at 0x7f07b23ca400>
You can see the two new files created in the current directory by doing:
In [ ]: ls
which will give you:
rplhighpass_b59f.hkl rpllfp_6eca.hkl rplraw_d41d.hkl
53.
We can process the Unity files by doing:
In [ ]: cd ../../
In [ ]: pyh.Unity(saveLevel=1)
which will return:
Object created
Object loaded from file rplparallel_d41d.hkl
Object saved to file unity_71bf.hkl
Out[ ]: <PyHipp.unity.Unity at 0x7facccd89400>
You will see that in creating the Unity object, the previously saved RPLParallel object was loaded to extract some information. If the RPLParallel was not saved previously, the information in the object will have to be recomputed from the raw data files. This reduction in unnecessary recomputation was one of the principles on which the objects in the PyHipp repository were designed.
54.
This next command will process the eye-tracking files for both the navigation (180702.edf) and fixation (P7_2.edf) sessions, create eyelink objects, and save them to the session01 and sessioneye directories:
In [ ]: cd ..
In [ ]: pyh.EDFSplit()
which should return the following:
Reading calibration edf file.
Object created
Object saved to file eyelink_24d5.hkl
Reading navigation edf file.
...
Object saved to file eyelink_24d5.hkl
Object created
Out[ ]: <PyHipp.eyelink.EDFSplit at 0x7faccce0f1d0>
Similar to the RPLSplit function above, we do not need to specify the saveLevel=1 argument as the primary function of EDFSplit is to create and save the eyelink objects.
You might get a few “SerializedWarning” or “DataFrame” warning messages, but it is safe to ignore them.
55.
In order to align the Ripple, Unity, and Eyelink data, we will call the following function:
In [ ]: cd session01
In [ ]: pyh.aligning_objects()
which should end with the following (you can ignore the messages that precede it):
finish aligning objects
56.
For performing the raycasting, you can do:
In [ ]: pyh.raycast(1)
You should see some text ending with:
Object loaded from file unity_71bf.hkl
Object loaded from file eyelink_24d5.hkl
Found path: /data/RCP/VirtualMaze.x86_64
This might take about 20 minutes to complete, so we will instead just check that it started running properly before stopping it. Create a new Terminal window, activate the aws conda environment, and ssh to your cluster:
(base) $ conda activate aws
(aws) $ pcluster ssh -i ~/MyKeyPair.pem -n MyCluster01
Change to the directory from which you ran the raycast function:
(base) [ec2-user@ip-10-0-5-43 ~] $ cd /data/picasso/20181105/session01
The raycast function writes to a log file as it is running, so we can check the log file’s contents by doing:
(base) [ec2-user@ip-10-0-5-43 session01] $ tail -f VirtualMazeBatchLog.txt
You have used the tail function before to look at the last few lines of a file in this lab, but by adding the -f flag, it will now continuously show you the last few lines of the log file. This is one of the ways to monitor a program as it is running. The command above should progressively show you something like:
Add "-logfile <log file location>.txt" to see unity logs during the data generation
There may be a need to copy the libraries found in the directory 'Plugins' to a new folder called 'Mono'
More command line arguments can be found at https://docs.unity3d.com/Manual/CommandLineArguments.html
Session List detected!
Setting density to : 220
Setting radius to : 1
Queuing /data/picasso/20181105/session01
1 sessions to be processed
Starting(1/1): /data/picasso/20181105/session01
5.037168%: Data Generation is still running. 10/3/2021 2:45:51 AM
The last line above indicates that the program has started running, so everything should be installed properly. At this point, you can terminate the tail program by typing Ctrl-c. You can go back to your first Terminal window that was running ipython, and type Ctrl-c to interrupt the raycast function. You can then exit ipython by typing Ctrl-d.
If ipython does not respond, you can do the following from your second Terminal window:
(base) [ec2-user@ip-10-0-5-43 session01] $ ps -ef | grep ipython
ec2-user 12958 7414 0 13:37 pts/0 00:00:00 /data/miniconda3/envs/env1/bin/python /data/miniconda3/envs/env1/bin/ipython
ec2-user 13459 12716 0 13:43 pts/1 00:00:00 grep --color=auto ipython
The ps command lists running processes, which are then sent using the pipe command | to the grep command to search for the string ipython. The process ID for ipython (12958 in the example above) can be used to terminate the process by doing the following (replace the process ID below with the one you obtained from the command above):
(base) [ec2-user@ip-10-0-5-43 session01] $ kill -9 12958
If the ipython process was terminated properly, you should see the bash prompt in the first Terminal window again:
(env1) [ec2-user@queue1-dy-m54xlarge-1 data] $
You can then exit from the compute node by doing:
(env1) [ec2-user@queue1-dy-m54xlarge-1 data] $ exit
57.
From the second Terminal window, you can check that a few files have now been created in the session01 directory:
(base) [ec2-user@ip-10-0-5-43 session01] $ ls
181105_Block1.nev eyelink_24d5.hkl s slist.txt
181105_Block1.ns5 logs.txt unity_71bf.hkl
array01 missingData.csv unityfile_eyelink.csv
binData.hdf RawData_T1-400 VirtualMazeBatchLog.txt
rplparallel_d41d.hkl
Once you have verified the files have been created, you can type Ctrl-d to logout from the head node.
Task
Include a screenshot with the above output in your lab report.
58.
The last thing to do is to copy the following two files for spike sorting from the PyHipp directory to the /data directory using your first Terminal window (which should still be logged into your head node):
(env1) [ec2-user@ip-10-0-5-43 data] $ cp /data/src/PyHipp/geom.csv /data/picasso
(env1) [ec2-user@ip-10-0-5-43 data] $ cp /data/src/PyHipp/sort.sh.txt /data/picasso
59.
You can now delete the files we created to test the raycasting:
(env1) [ec2-user@ip-10-0-5-43 data] $ cd picasso/20181105
(env1) [ec2-user@ip-10-0-5-43 20181105] $ rm session*/eyelink*hkl
(env1) [ec2-user@ip-10-0-5-43 20181105] $ cd session01
(env1) [ec2-user@ip-10-0-5-43 session01] $ rm *.hkl *.csv bin*hdf VirtualMaze* *.txt
60.
At this point, you should take a snapshot of the /data volume from your computer, so you will not have to go through the set up again:
(aws) $ update_snapshot.sh data 2 MyCluster01
You can move on to the next section while waiting for the snapshot to be completed. Once the snapshot is completed, you should receive an email notification from AWS.