PGInsider: Technical and How-to Articles
Getting Started with PGI on AWS
by Chris Parrott
PGI Community Edition compilers and tools for Linux/x86-64 are available as an Amazon Machine Image (AMI) on the AWS Marketplace, providing a low-cost option for those interested in doing GPU-accelerated computing using Amazon's extensive cloud computing resources. For as little as $3 per hour, you can create your own personal virtualized NVIDIA Volta V100 GPU-enabled system on Amazon's cloud. Just upload your application's source code, build it using the PGI compilers, and run it. This article guides you through the steps necessary to build and run an application using PGI compilers, and demonstrates how GPU-accelerated computing can be cost-effective on Amazon's cloud infrastructure.
Understanding Amazon's terminology is key to learning and effectively using Amazon Web Services' cloud computing infrastructure. It is worthwhile to take a few moments to define a few of these terms, as they are used extensively throughout the remainder of this article:
Amazon Elastic Compute Cloud (EC2) — the service that allows customers to rent virtual computers within Amazon's cloud computing infrastructure.
Amazon Machine Image (AMI) — a pre-configured virtual machine image containing an operating system plus the applications required for its intended purpose. AMIs based on several different operating systems are available, including Microsoft Windows and several distributions of Linux. The PGI Community Edition AMI is only available on Ubuntu Linux as of this writing.
Instance — a copy of an AMI running as a virtual computer. Users create instances from an AMI and customize them according to their particular needs.
Instance Type — pre-defined configurations that specify processor(s), memory, storage, network capability and usage cost. When creating an instance from an AMI, users choose from the various instance types made available for the particular AMI. See the on-demand pricing page for fees for each instance type. Note that the more powerful instance types are also more expensive.
Elastic Block Store (EBS) — provides persistent storage for use with EC2 instances. AMIs are normally configured with an EBS volume as the root storage device containing the operating system and applications. When creating an instance, users can increase the size of this root volume. Users can also create separate EBS volumes that can be mounted to instances. Note that the use of EBS incurs additional usage charges, typically $0.10-$0.12 per GiB per month for general purpose SSD volumes.
Regions — AWS hosts cloud computing resources at data centers in various geographic regions worldwide. Pricing varies by region. For this article, we will be using the "Oregon" or "us-west-2" region.
Signing in to AWS
To start, go to the Amazon Web Services page and click on the orange box in the upper right corner of the page. If you do not already have an AWS account, the text in the box will say Create an AWS Account. Click on this box and proceed through the steps as prompted to create your account. (Alternatively, you can use the Create an AWS Account page.) You will need to provide some personal information, including a credit card. Amazon prorates the hourly charges of EC2 resources by the minute, so you only pay for what you use.
Note that new accounts include "free tier" access for 12 months, though GPU-accelerated computing is not included in the free tier as of this writing. Amazon also offers grants to students, educators, and researchers to subsidize usage of EC2 compute resources for approved projects.
Once you have set up an account, AWS will save a cookie to your computer, so on future visits to the AWS portal page, the orange box will display the text Sign In to the Console instead. The AWS portal page should resemble something like Figure 1 below, with the orange box highlighted in red:
With your AWS account, you should sign in to the AWS console. Once you enter your credentials and click on the blue Sign In button, you will be taken to the AWS Console screen, which should resemble Figure 2 below:
Creating an Instance
Select the EC2 service from the main AWS console by clicking on All Services then EC2 as shown in Figure 2 above. Note that on future visits to the AWS console, a link to EC2 will also appear under the list of Recently visited services.
The EC2 dashboard should resemble Figure 3 below:
From here, click on the blue Launch Instance button to create a new instance as highlighted by the red box in Figure 3. You will next complete a series of steps to configure and bring online your AWS EC2 instance:
Choose an Amazon Machine Image (AMI) — you should now see a screen like Figure 4 below, indicating Step 1 at the top. In the left column, click on AWS Marketplace, type "PGI" in the search box, and then select the PGI Community Edition AMI. Figure 4 indicates these steps in order:
A pop-up window with details about the PGI AMI is presented next, showing available instance types and pricing for the AMI, as shown in Figure 5. Review the details, including the End User License Agreement, then press the blue "Continue" button to proceed.
Choose an Instance Type — We will be experimenting with several different instance types during the remainder of this article. For our initial experiment, you should choose the c5.xlarge instance type, which costs around $0.17 per hour. This is shown in Figure 6:
If you are satisfied with the defaults for this instance, you can select the blue Review and Launch button here. Otherwise, select the Next: Configure Instance Details button to customize some more configuration details about the instance you are about to create.
Configure Instance Details — this screen is shown below in Figure 7. We do not need to change anything here, though you can review these options to see what sorts of configuration settings are available; more advanced configurations might require tweaking some of them. For now, just click on the Next: Add Storage button.
Add Storage — this screen is shown below in Figure 8. The PGI AMI includes a 20 GiB General Purpose SSD EBS volume as the root storage device. If you need more, you can easily increase the size of this volume to something larger, or alternatively add a new volume. For example, you might want to create a volume that contains applications or data that is shared among multiple EC2 instances. Click on the Next: Add Tags button to proceed to the next screen.
Add Tags — this screen is shown in Figure 9. For now, you do not need to worry about adding any tags to your EC2 instance. Click on the Next: Configure Security Group button to proceed to the next screen.
Configure Security Group — this screen is shown in Figure 10. A security group is a set of firewall rules that define the connections that can be made to your instance. By default, SSH connections to port 22 (the default SSH port) from any IP address are allowed. You can restrict connections to be from your local IP addresses if you wish. Once you are satisfied with these settings, you should click on the Review and Launch button, as highlighted by the red box in Figure 10.
Review Instance Launch — this screen is shown in Figure 11. From this screen, you have one last opportunity to review all the settings for your instance.
When you click the Launch button, a window will pop up to allow you to select an existing SSH key pair for authenticating to your AWS instance, or create a new SSH key pair if you have not already done so. This window is shown in Figure 12 below. For security purposes, all logins to AWS instances require SSH key pairs, rather than sending cleartext passwords through SSH for authentication. This also allows you to access your instance from scripts without having to store an SSH password in the script.
Should you need to create a new SSH key pair for logging in to AWS EC2, pull down the menu item that says Choose an existing key pair and select Create a new key pair. Give your key pair a name in the following text entry box, and then click on the Download Key Pair button. The downloaded file should have a name with a .pem extension, e.g. MyKey.pem. Save this file in a safe location, because you will need it to log in to the AWS EC2 instances you create.
Logging into the Instance
Once you have created and launched an instance, you can view it from the EC2 Dashboard. Refer to Figure 13 below for an example. Note the DNS name or IP address of the running instance. You will use this information to log in to the running instance.
Before proceeding, make sure the "Instance State" field shows "running" and the "Status Checks" field shows "2/2 checks passed" or similar. If the "Status Checks" field still shows "Initializing" the instance is not yet ready to accept connections.
The PGI AMI comes with a user account named 'ubuntu' which has full sudo privileges, so you can create an alternate account for yourself using your preferred username.
With this information, you can now connect and login to your instance. For example, suppose your instance has been brought up on an IP address of 192.168.144.127, and you are using the private key stored in the file MyKey.pem. If you are using the OpenSSH client bundled with Linux, macOS, FreeBSD, or various other operating systems, you can issue the following command to log into your instance:
$ ssh -i MyKey.pem ubuntu@192.168.144.127
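If ssh instead rejects the key with a WARNING: UNPROTECTED PRIVATE KEY FILE! message, the key file's permissions are too open; OpenSSH requires that a private key be readable only by its owner. A quick fix (the touch line below is just a stand-in so the snippet is self-contained; on your machine the downloaded file already exists):

```shell
# Stand-in for the key file downloaded from AWS; skip this line on
# your own machine, where MyKey.pem already exists.
touch MyKey.pem

# OpenSSH refuses to use a private key that is readable by other users.
chmod 400 MyKey.pem
stat -c '%a' MyKey.pem   # prints 400 on Linux (GNU stat)
```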
If you are using the PuTTY client on Windows, you need to use the PuTTYgen tool, available as part of the complete PuTTY installation package, to convert your key to a .ppk file that PuTTY can use.
When you log in to your instance successfully, you should see a banner message and a prompt similar to the following:
Welcome to Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-1063-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

4 packages can be updated.
0 updates are security updates.

===================================
==     PGI Community Edition     ==
== with OpenACC and CUDA Fortran ==
===================================

PGI Community Edition version 18.10
Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.

Invoke the PGI Fortran, C or C++ compilers as follows:

    pgcc|pgc++|pgfortran [options]
For more information, see the online documentation at: https://www.pgroup.com/resources/docs/18.10/x86/
If you see a message like *** System restart required ***, the underlying Ubuntu Linux operating system has automatically downloaded an important security update, and the system needs to be rebooted in order to apply it. You should issue the following command to reboot your instance:
$ sudo shutdown -r now
Then wait a few moments, and log back in again.
Building and Running an Application on the Instance
This section guides you through building and running CloverLeaf, which is "a hydrodynamics mini-app to solve the compressible Euler equations in 2D, using an explicit, second-order method." Obtain the CloverLeaf source code by issuing the following command:
$ git clone --recurse-submodules https://github.com/UK-MAC/CloverLeaf.git
Once this command finishes, you should see a new directory named CloverLeaf.
First, log into the c5.xlarge instance you created above, and then build the serial version of CloverLeaf. This version runs on only one CPU core, and serves to provide a baseline time for performance of the application:
$ cd CloverLeaf
$ make serial COMPILER=PGI
When the build completes, you should see an executable named clover_leaf in the CloverLeaf_Serial subdirectory. Try running CloverLeaf using the clover_bm32.in input deck. CloverLeaf expects its input file to be named clover.in. Therefore, you should first move the existing clover.in file out of the way, and then copy the clover_bm32.in file from the InputDecks subdirectory to the current directory as clover.in:
$ cd CloverLeaf_Serial
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in
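This three-command shuffle repeats for every CloverLeaf variant built in this article, so you may prefer to wrap it in a small shell function (use_deck is a name made up here; it is not part of CloverLeaf):

```shell
# Hypothetical helper: install a named input deck from InputDecks/ as
# clover.in, backing up any existing clover.in first.
use_deck() {
    deck="InputDecks/$1"
    if [ ! -f "$deck" ]; then
        echo "no such input deck: $deck" >&2
        return 1
    fi
    [ -f clover.in ] && mv clover.in clover.in.bak
    cp "$deck" clover.in
}
```

With this defined in your shell, the setup in each build directory reduces to `use_deck clover_bm32.in`.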
Now, you should be able to run the serial version of CloverLeaf as follows:
$ ./clover_leaf
The serial version takes about three and one-half hours to complete. You should see a series of time steps printed to the screen, culminating in the final step:
Average time per cell 1.5531565209357159E-007
Step time per cell 1.6049411594091604E-007
Step 2955 time 2.1820157 control sound timestep 7.45E-04 1, 1 x 6.51E-04 y 1.30E-03
Wall clock 13535.59463596344
The total cost of running the serial version of CloverLeaf on the c5.xlarge instance is roughly $0.17 per hour x 3.760 hours = $0.64.
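The cost arithmetic used throughout this article is simply wall-clock hours multiplied by the hourly on-demand rate; a one-line helper (a sketch written for this article, not an AWS tool) makes the figures easy to reproduce:

```shell
# Compute an approximate EC2 charge from wall-clock seconds and an
# hourly on-demand rate.
ec2_cost() {
    awk -v s="$1" -v r="$2" 'BEGIN { printf "$%.2f\n", s / 3600 * r }'
}

ec2_cost 13536 0.17   # serial CloverLeaf run on c5.xlarge: prints $0.64
```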
Changing the Instance Type
You will need to change the instance type for subsequent experiments. To do this, stop your instance using the steps in Shutting Down Your Instances at the end of this article.
To change the instance type, navigate to the Instances page of the EC2 Dashboard as shown above in Figure 13. Select the instance, click the Actions button at the top or right-click on the instance, choose Instance Settings > Change Instance Type, and choose the new type from the drop-down in the Change Instance Type pop-up. These steps are shown below in Figure 14. For the next experiment, we shall use the c5.9xlarge instance type.
To re-start the instance, after clicking Actions or right-clicking on the instance, choose Instance State > Start.
Building and Running Parallel Applications on the Instance
OpenMP Parallel Version
Next, try building the OpenMP version of CloverLeaf. For this experiment, stop the instance and change the instance type to c5.9xlarge as described above. This instance type provides 36 virtualized CPU cores at a rate of $1.53 per hour, which should deliver a substantial speedup over the serial version.
Start the instance and log into it as before. Then change into the top-level CloverLeaf directory and issue the following command:
$ make openmp COMPILER=PGI
Once the build completes, you should change to the CloverLeaf_OpenMP subdirectory, and copy the same clover_bm32.in input file as described previously:
$ cd CloverLeaf_OpenMP
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in
You can now run the OpenMP version of CloverLeaf as follows:
$ OMP_NUM_THREADS=36 ./clover_leaf
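Hard-coding the thread count works here, but it will silently under- or oversubscribe the machine if you later pick a different instance type; `nproc` reports the number of virtual CPUs the instance actually has, so a more portable invocation looks like this (the actual run is commented out since it only makes sense on the instance):

```shell
# One OpenMP thread per virtual CPU, whatever the instance type.
threads=$(nproc)
echo "launching clover_leaf with $threads OpenMP threads"
# OMP_NUM_THREADS="$threads" ./clover_leaf   # run this on the instance
```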
The OpenMP version requires approximately 30 minutes to complete:
Average time per cell 2.4922225906677995E-008
Step time per cell 2.4786172111311719E-008
Step 2955 time 2.1820157 control sound timestep 7.45E-04 1, 1 x 6.51E-04 y 1.30E-03
Wall clock 2171.918267965317
The total cost of running the OpenMP version of CloverLeaf on the c5.9xlarge instance is roughly $1.53 per hour x 0.603 hours = $0.92.
As you can see, the OpenMP version is a big win over the serial version, completing in about one-sixth the time at only a modestly higher cost.
MPI Parallel Version
You can also build a parallel version of CloverLeaf using MPI. The PGI AMI includes a build of Open MPI that is bundled with the PGI compilers. For this experiment, we continue using the c5.9xlarge instance type from the previous section. Change back to the top-level CloverLeaf directory, and issue the following command:
$ make mpi COMPILER=PGI
$ cd CloverLeaf_MPI
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in
You can now run the MPI version of CloverLeaf as follows:
$ mpirun -np 36 ./clover_leaf
The MPI version of CloverLeaf seems to be a bit faster than the OpenMP version:
Average time per cell 2.4026006379229807E-008
Step time per cell 2.3560820005109740E-008
Step 2955 time 2.1820157 control sound timestep 7.45E-04 1, 1 x 6.51E-04 y 1.30E-03
Wall clock 2093.818086147308
The total cost of running the MPI version of CloverLeaf on the c5.9xlarge instance in the Oregon region is roughly $1.53 per hour x 0.582 hours = $0.89.
In this case, the MPI version is slightly faster than the OpenMP version, at a slightly lower cost.
OpenACC Parallel Multicore Version
It is also possible to build a parallel version of CloverLeaf that runs on multiple host CPU cores using OpenACC directives. This is potentially useful for testing applications with OpenACC when a GPU is not available on the system, or as a first step toward porting a given application to run on GPUs.
Before you can build the OpenACC version of CloverLeaf, you need to fix a couple of minor build issues with this version. Change back to the top-level CloverLeaf directory and issue the following commands:
$ ln -s CloverLeaf_OpenACC CloverLeaf_OpenACC_KERNELS
$ sed -i -e 's#-ta=nvidia,cc35#-ta=multicore#g' CloverLeaf_OpenACC/Makefile
Now invoke the build as follows:
$ make openacc_kernels COMPILER=PGI
Once the build completes, you should change to the CloverLeaf_OpenACC subdirectory, and copy the same clover_bm32.in input file as described previously:
$ cd CloverLeaf_OpenACC
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in
You can now run the OpenACC version of CloverLeaf on multiple host CPU cores as follows:
$ mpirun -np 1 ./clover_leaf
CloverLeaf should complete in around the same amount of time as the MPI version in the previous section:
Average time per cell 2.2886566252314620E-008
Step time per cell 2.2746551419711775E-008
Step 2955 time 2.1820157 control sound timestep 7.45E-04 1, 1 x 6.51E-04 y 1.30E-03
Wall clock 1994.515158891678
The total cost of running the OpenACC Multicore version of CloverLeaf on the c5.9xlarge instance in the Oregon region is roughly $1.53 per hour x 0.554 hours = $0.85.
So, in this case, running the OpenACC Multicore version of CloverLeaf is slightly cheaper than running the MPI version on the same hardware.
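If you want to study how the OpenACC multicore version scales with core count, the PGI runtime honors the ACC_NUM_CORES environment variable for code built with -ta=multicore; by default it uses all available cores. For example (the actual run belongs on the instance, so it is commented out here):

```shell
# Restrict the OpenACC multicore runtime to half of the 36 virtual cores.
export ACC_NUM_CORES=18
# mpirun -np 1 ./clover_leaf   # rerun on the instance and compare timings
echo "OpenACC multicore will use $ACC_NUM_CORES cores"
```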
OpenACC Parallel Version on 1 GPU
Stop your instance so we can prepare to use an NVIDIA Volta V100 GPU to accelerate CloverLeaf via OpenACC. AWS doesn't provide access to GPU-enabled instance types by default, so users must first request access. Check the EC2 Service Limits page to see if you have access to the p3.2xlarge and p3.8xlarge instance types. If not, submit a request to AWS via the Request limit increase link.
Once you’ve verified you have access to p3 instance types, change the instance type to p3.2xlarge, and then start the instance. The p3.2xlarge instance type provides eight virtualized CPU cores and one virtualized V100 GPU, which should provide a substantial speedup over the serial version. Amazon charges a higher rate for it accordingly: $3.06 per hour.
We are going to use a slightly different version of CloverLeaf for the next couple of experiments. This version has been modified to better support running CloverLeaf on multiple GPUs on a single system. To obtain this version of CloverLeaf, issue the following command:
$ git clone https://github.com/UoB-HPC/CloverLeaf-OpenACC
Once again, we need to fix up a few things in the Makefile:
$ sed -i -e 's#-ta=tesla,cc60#-ta=nvidia,cc35,cc60,cc70 -DUSE_CUDA_AWARE_MPI#g' CloverLeaf-OpenACC/Makefile
Now invoke the build as follows:
$ cd CloverLeaf-OpenACC
$ make COMPILER=PGI
Once the build completes, you should copy the same clover_bm32.in input file as described previously:
$ mv clover.in clover.in.bak
$ cp InputDecks/clover_bm32.in clover.in
You can now run the OpenACC version of CloverLeaf on a single V100 GPU as follows:
$ mpirun -np 1 ./clover_leaf
A single V100 completes an entire run of this application in just over three minutes:
Average time per cell 2.1036212802534272E-009
Step time per cell 2.0812310847557253E-009
Step 2955 time 2.1820157 control sound timestep 7.45E-04 1, 1 x 6.51E-04 y 1.30E-03
Wall clock 183.3258261680603
Even more impressively, the total cost of running the OpenACC version of CloverLeaf on the p3.2xlarge instance is roughly $3.06 per hour x 0.051 hours = $0.16.
So not only can GPU-accelerated computing save a lot of time when running an application, it can save a lot of money as well.
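It is worth quantifying just how large the win is. Dividing the serial wall-clock time by the V100 time (figures from the runs above):

```shell
# Ratio of two wall-clock times, rounded to the nearest whole multiple.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0fx\n", a / b }'
}

speedup 13536 183   # serial c5.xlarge vs. one V100: prints 74x
```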
OpenACC Parallel Version on 4 GPUs
Now we are going to try a really fun experiment to fully showcase the power of GPU-accelerated computing: we will harness the power of multiple GPUs in parallel to run the same CloverLeaf problem.
For this experiment, you will use not one but four V100 GPUs to accelerate CloverLeaf via OpenACC. Bring down your p3.2xlarge instance and change the instance type to p3.8xlarge. This instance type provides 32 virtualized CPU cores and four virtualized V100 GPUs. As this is one of the most powerful instance types AWS EC2 offers, its cost is reflected accordingly: Amazon charges around $12 per hour to use a p3.8xlarge instance. Fortunately, you will not be using it for very long at all.
We will reuse the same GPU-enhanced version of the CloverLeaf source code as in the previous section. Because changing the instance type preserves the root EBS volume, there is no need to download or rebuild anything here. Start the p3.8xlarge instance, log in as usual, then change to the CloverLeaf-OpenACC directory and run CloverLeaf on all four GPUs:
$ cd CloverLeaf-OpenACC
$ mpirun -np 4 ./clover_leaf
Notice that four GPUs can whiz through this CloverLeaf problem in about a minute:
Average time per cell 5.6823226477422412E-010
Step time per cell 5.6024065189477467E-010
Step 2955 time 2.1820157 control sound timestep 7.45E-04 1, 1 x 6.51E-04 y 1.30E-03
Wall clock 49.52016711235046
Not surprisingly, we get a nearly 4x speed-up over the single-GPU experiment. This improved performance mostly makes up for the more expensive multi-GPU instance type, as the cost is roughly the same: $12.24 per hour x 0.0138 hours = $0.17.
Multiple-GPU instance types can be very cost-effective, especially when running larger, time-consuming parallel-capable applications.
Below is a table summarizing all of our results.
| Version | Instance Type | Time (secs.) | Cost |
|---|---|---|---|
| 1 Skylake Core | c5.xlarge | 13,536 | $0.64 |
| 36 Skylake Cores (OpenMP) | c5.9xlarge | 2,171 | $0.92 |
| 36 Skylake Cores (MPI) | c5.9xlarge | 2,094 | $0.89 |
| 36 Skylake Cores (OpenACC) | c5.9xlarge | 1,995 | $0.85 |
| 1 V100 GPU (OpenACC) | p3.2xlarge | 183 | $0.16 |
| 4 V100 GPUs (OpenACC) | p3.8xlarge | 50 | $0.17 |
Shutting Down Your Instances
One important item that bears repeating is that your instance continues to accrue charges as long as it is running. You should shut down ("stop") your instances whenever you are not using them to avoid unnecessary fees. To do this, bring up the EC2 Dashboard, find the running instance you need to shut down in the list of instances, right-click on it, and select Instance State followed by Stop. Figure 15 below illustrates these steps:
This can take a few minutes, so verify that it has stopped before closing your browser. The EC2 Dashboard should resemble Figure 16 below when your instance has reached the Stopped state:
IMPORTANT NOTE: The default Terminate action means that the instance will be removed, and its associated root storage (EBS volume) will be deleted. Do not change your instance state to Terminate unless you are finished with your instance and wish to delete it.
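If you drive AWS from scripts, the same stop and start actions are available through the AWS Command Line Interface. This sketch assumes the AWS CLI is installed and configured with your credentials; the instance ID below is a placeholder for your own, which is visible on the EC2 Dashboard:

```shell
# Placeholder instance ID; substitute the one shown for your instance.
INSTANCE_ID=i-0123456789abcdef0

# Stop the instance and block until it has fully reached the stopped state.
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-stopped --instance-ids "$INSTANCE_ID"

# Later, start it again before logging back in:
aws ec2 start-instances --instance-ids "$INSTANCE_ID"
```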
Using the PGI AMI on AWS, you can access GPU-accelerated computing for very little investment. Using Amazon's EC2 cloud computing platform, we sped up a sample application from running in three and one-half hours on a single-core CPU, to just under a minute using four state-of-the-art NVIDIA Volta V100 GPUs. At the same time, accelerating the application with the GPU resulted in a significant cost savings.