Getting Started with PGI on AWS

by Chris Parrott

PGI Community Edition compilers and tools for Linux/x86-64 are available as an Amazon Machine Image (AMI) on the AWS Marketplace, providing a low-cost option for those interested in doing GPU-accelerated computing using Amazon's extensive cloud computing resources. For as little as $3 per hour, you can create your own personal virtualized NVIDIA Volta V100 GPU-enabled system on Amazon's cloud. Just upload your application's source code, build it using the PGI compilers, and run it. This article guides you through the steps necessary to build and run an application using PGI compilers, and demonstrates how GPU-accelerated computing can be cost-effective on Amazon's cloud infrastructure.

AWS Terminology

Understanding Amazon's terminology is key to learning and effectively using Amazon Web Services' cloud computing infrastructure. It is worthwhile to take a few moments to define a few of these terms, as they are used extensively throughout the remainder of this article:

  • Amazon Elastic Compute Cloud (EC2) — the service that allows customers to rent virtual computers within Amazon's cloud computing infrastructure.

  • Amazon Machine Image (AMI) — a pre-configured virtual machine image containing an operating system plus the applications required for its intended purpose. AMIs based on several different operating systems are available, including Microsoft Windows and several distributions of Linux. The PGI Community Edition AMI is only available on Ubuntu Linux as of this writing.

  • Instance — a copy of an AMI running as a virtual computer. Users create instances from an AMI and customize them according to their particular needs.

  • Instance Type — pre-defined configurations that specify processor(s), memory, storage, network capability and usage cost. When creating an instance from an AMI, users choose from the various instance types made available for the particular AMI. See the on-demand pricing page for fees for each instance type. Note that the more powerful instance types are also more expensive.

  • Elastic Block Store (EBS) — provides persistent storage for use with EC2 instances. AMIs are normally configured with an EBS volume as the root storage device containing the operating system and applications. When creating an instance, users can increase the size of this root volume. Users can also create separate EBS volumes that can be mounted to instances. Note that the use of EBS incurs additional usage charges, typically $0.10-$0.12 per GiB per month for general purpose SSD volumes.

  • Regions — AWS hosts cloud computing resources at data centers in various geographic regions worldwide. Pricing varies by region. For this article, we will be using the "Oregon" or "us-west-2" region.

Signing in to AWS

To start, go to the Amazon Web Services page and click on the orange box in the upper right corner of the page. If you do not already have an AWS account, the text in the box will say Create an AWS Account. Click on this box and proceed through the steps as prompted to create your account. (Alternately, you can use the Create an AWS Account page.) You will need to provide some personal information, including a credit card. Amazon prorates the hourly charges of EC2 resources by the minute, so you only pay for what you use.

Note that new accounts include "free tier" access for 12 months, though GPU-accelerated computing is not included in the free tier as of this writing. Amazon also offers grants to subsidize usage of EC2 compute resources to students, educators and researchers for approved projects.

Once you have set up an account, AWS will save a cookie to your computer, so on future visits to the AWS portal page, the orange box will display the text Sign In to the Console instead. The AWS portal page should resemble Figure 1 below, with the orange box highlighted in red:

AWS Portal
Figure 1. AWS Portal

With your AWS account, you should sign in to the AWS console. Once you enter your credentials and click on the blue Sign In button, you will be taken to the AWS Console screen, which should resemble Figure 2 below:

AWS Console
Figure 2. AWS Console

Creating an Instance

Select the EC2 service from the main AWS console by clicking on All Services then EC2 as shown in Figure 2 above. Note that on future visits to the AWS console, a link to EC2 will also appear under the list of Recently visited services.

The EC2 dashboard should resemble Figure 3 below:

AWS EC2 Dashboard
Figure 3. AWS EC2 Dashboard

From here, click on the blue Launch Instance button to create a new instance as highlighted by the red box in Figure 3. You will next complete a series of steps to configure and bring online your AWS EC2 instance:

  1. Choose an Amazon Machine Image (AMI) — you should now see a screen like Figure 4 below, indicating Step 1 at the top. In the left column, click on AWS Marketplace, type "PGI" in the search box, and then select the PGI Community Edition AMI. Figure 4 indicates these steps in order:

    AWS: Selecting the PGI Community Edition AMI
    Figure 4. Step 1: Selecting the PGI Community Edition AMI

    A pop-up window with details about the PGI AMI is presented next, showing available instance types and pricing for the AMI, as shown in Figure 5. Review the details, including the End User License Agreement, then press the blue "Continue" button to proceed.

    AWS: PGI Community Edition AMI Details
    Figure 5. PGI Community Edition AMI Details

  2. Choose an Instance Type — We will be experimenting with several different instance types during the remainder of this article. For our initial experiment, you should choose a c5.xlarge instance type, which costs about $0.17 per hour. This is shown in Figure 6:

    Choosing the AWS Instance Type
    Figure 6. Step 2: Choose an Instance Type

    If you are satisfied with the defaults for this instance, you can select the blue Review and Launch button here. Otherwise, select the Next: Configure Instance Details button to customize some more configuration details about the instance you are about to create.

  3. Configure Instance Details — this screen is shown below in Figure 7. For now, we do not need to change anything here. You can review these options to see what sorts of configuration settings are available, though. More advanced configurations might require tweaking some of these options. For now, just click on the Next: Add Storage button.

    Configuring Instance Details
    Figure 7. Step 3: Configure Instance Details

  4. Add Storage — this screen is shown below in Figure 8. The PGI AMI includes a 20 GiB General Purpose SSD EBS volume as the root storage device. If you need more, you can easily increase the size of this volume to something larger, or alternatively add a new volume. For example, you might want to create a volume that contains applications or data that is shared among multiple EC2 instances. Click on the Next: Add Tags button to proceed to the next screen.

    Adding Storage
    Figure 8. Step 4: Add Storage

  5. Add Tags — this screen is shown in Figure 9. For now, you do not need to worry about adding any tags to your EC2 instance. Click on the Next: Configure Security Group button to proceed to the next screen.

    Adding Tags
    Figure 9. Step 5: Add Tags

  6. Configure Security Group — this screen is shown in Figure 10. A security group is a set of firewall rules that define the connections that can be made to your instance. By default, SSH connections to port 22 (the default SSH port) from any IP address are allowed. You can restrict connections to be from your local IP addresses if you wish. Once you are satisfied with these settings, you should click on the Review and Launch button, as highlighted by the red box in Figure 10.

    Configuring Security Groups
    Figure 10. Step 6: Configure Security Groups

  7. Review Instance Launch — this screen is shown in Figure 11. From this screen, you have one last opportunity to review all the settings for your instance.

    Reviewing Instance Launch Settings
    Figure 11. Step 7: Review Instance Launch

    When you click the Launch button, a window will pop up to allow you to select an existing SSH key pair for authenticating to your AWS instance, or create a new SSH key pair if you have not already done so. This window is shown in Figure 12 below. For security purposes, all logins to AWS instances require SSH key pairs, rather than sending cleartext passwords through SSH for authentication. This also allows you to access your instance from scripts without having to store an SSH password in the script.

    Should you need to create a new SSH key pair for logging in to AWS EC2, pull down the menu item that says Choose an existing key pair and select Create a new key pair. Give your key pair a name in the following text entry box, and then click on the Download Key Pair button. The downloaded file should have a name with a .pem extension, e.g. MyKey.pem. Save this file in a safe location, because you will need it to log in to the AWS EC2 instances you create.

    Configuring SSH Key Pair
    Figure 12. Configuring an SSH Key Pair

    See the EC2 Key Pair documentation page for information on creating an SSH key pair. More information about using SSH to connect to your instance can be found in the EC2 User Guide.
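The security group rule from step 6 can also be tightened from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured on your local machine. The group ID and IP address are placeholders, and the helper name ssh_ingress_cmd is made up for illustration. It only prints the command, so you can review it before running it yourself.

```shell
# Build (but do not run) an AWS CLI command that opens port 22 to one address.
# Arguments: security group ID, then your public IPv4 address.
ssh_ingress_cmd() {
    group_id="$1"
    my_ip="$2"
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 22 --cidr %s/32\n' \
        "$group_id" "$my_ip"
}

# Placeholder values: substitute your own group ID and address.
ssh_ingress_cmd sg-0123456789abcdef0 203.0.113.25
```

Note that adding a narrower rule does not by itself remove the default rule that allows any address. That rule would also need to be deleted for the restriction to take effect.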

Logging into the Instance

Once you have created and launched an instance, you can view it from the EC2 Dashboard. Refer to Figure 13 below for an example. Note the DNS name or IP address of the running instance. You will use this information to log in to the running instance.

Before proceeding, make sure the "Instance State" field shows "running" and the "Status Checks" field shows "2/2 checks passed" or similar. If the "Status Checks" field still shows "Initializing", the instance is not yet ready to accept connections.

AWS EC2 Instance List
Figure 13. AWS EC2 Instances

The PGI AMI comes with a user account named 'ubuntu' which has full sudo privileges, so you can create an alternate account for yourself using your preferred username.

With this information, you can now connect and log in to your instance. For example, suppose your instance has been brought up on an IP address of 192.168.144.127, and you are using the private key stored in the file MyKey.pem. If you are using the OpenSSH client bundled with Linux, macOS, FreeBSD, or various other operating systems, you can issue the following command to log into your instance:

$ ssh -i MyKey.pem ubuntu@192.168.144.127
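OpenSSH typically refuses a private key file that other users can read, so it can help to tighten the permissions on the downloaded key before connecting. A small sketch follows. The function name secure_key is hypothetical, and MyKey.pem is the example key file used above.

```shell
# Make a private key readable only by its owner (mode 400).
# Argument: path to the key file.
secure_key() {
    key="$1"
    if [ ! -f "$key" ]; then
        echo "key file not found: $key" >&2
        return 1
    fi
    chmod 400 "$key"
}

# Example use (not run here):
#   secure_key MyKey.pem && ssh -i MyKey.pem ubuntu@192.168.144.127
```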

If you are using the PuTTY client on Windows, you need to use the PuTTYgen tool, available as part of the complete PuTTY installation package, to convert your key to a .ppk file that PuTTY can use.

When you log in to your instance successfully, you should see a banner message and a prompt similar to the following:



Welcome to Ubuntu 16.04.5 LTS (GNU/Linux 4.4.0-1063-aws x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

4 packages can be updated.
0 updates are security updates.

==================================
==     PGI Community Edition     ==
== with OpenACC and CUDA Fortran ==
===================================
 
PGI Community Edition version 18.10
 
Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.

Invoke the PGI Fortran, C or C++ compilers as follows:

    pgcc|pgc++|pgfortran [options] 

For more information, see the online documentation at:

    https://www.pgroup.com/resources/docs/18.10/x86/

If you see a message like *** System restart required ***, the underlying Ubuntu Linux operating system has automatically downloaded an important security update, and the system needs to be rebooted in order to apply it. You should issue the following command to reboot your instance:

    $ sudo shutdown -r now

Then wait a few moments, and log back in again.

Building and Running an Application on the Instance

This section guides you through building and running CloverLeaf, which is "a hydrodynamics mini-app to solve the compressible Euler equations in 2D, using an explicit, second-order method." Obtain the CloverLeaf source code by issuing the following command:

    $ git clone --recurse-submodules https://github.com/UK-MAC/CloverLeaf.git

Once this command finishes, you should see a new directory named CloverLeaf.

Serial Version

First, log into the c5.xlarge instance you created above, and then build the serial version of CloverLeaf. This version runs on only one CPU core, and serves to provide a baseline time for performance of the application:

    $ cd CloverLeaf
    $ make serial COMPILER=PGI

When the build completes, you should see an executable named clover_leaf in the CloverLeaf_Serial subdirectory. Try running CloverLeaf using the clover_bm32.in input deck. CloverLeaf expects its input file to be named clover.in. Therefore, you should first move the existing clover.in file out of the way, and then copy the clover_bm32.in file from the InputDecks subdirectory to the current directory as clover.in:

    $ cd CloverLeaf_Serial
    $ mv clover.in clover.in.bak
    $ cp InputDecks/clover_bm32.in clover.in

Now, you should be able to run the serial version of CloverLeaf as follows:

    $ ./clover_leaf
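Long runs such as this one can outlast an SSH session. One common approach, sketched here, is to start the program under nohup and send its output to a log file so that it keeps running if the connection drops. The helper name run_detached and the log file name clover.log are illustrative choices, not part of CloverLeaf.

```shell
# Start a command in the background, immune to hangups, logging to a file.
# Arguments: log file, then the command and its arguments.
run_detached() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "started PID $! (output in $log)"
}

# Example use (not run here):
#   run_detached clover.log ./clover_leaf
#   tail -f clover.log
```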

The serial version takes about three and three-quarter hours to complete. You should see a series of time steps printed to the screen, culminating in the final step:

 Average time per cell    1.5531565209357159E-007
 Step time per cell       1.6049411594091604E-007
 Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
 Wall clock     13535.59463596344

The total cost of running the serial version of CloverLeaf on the c5.xlarge instance is roughly $0.17 per hour x 3.760 hours = $0.64.
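The same arithmetic can be applied to any run. The sketch below converts a wall clock time in seconds and an hourly rate into hours and cost. The function name run_cost is made up for this example, and the rate is the one quoted above rather than a live price.

```shell
# Print elapsed hours and on-demand cost for a run.
# Arguments: wall clock seconds, hourly rate in dollars.
run_cost() {
    awk -v secs="$1" -v rate="$2" \
        'BEGIN { hours = secs / 3600; printf "%.3f hours, $%.2f\n", hours, hours * rate }'
}

run_cost 13535.59 0.17   # serial run on c5.xlarge: prints "3.760 hours, $0.64"
```

Because Amazon prorates charges by the minute, the actual bill also includes time spent building, copying input files, and waiting at the prompt, so treat the result as a lower bound.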

Changing the Instance Type

You will need to change the instance type for subsequent experiments. To do this, first stop your instance using the steps in Shutting Down Your Instances at the end of this article.

To change the instance type, navigate to the Instances page of the EC2 Dashboard as shown above in Figure 13. Select the instance, click the Actions button at the top or right-click on the instance, choose Instance Settings > Change Instance Type and choose from the drop-down in the Change Instance Type pop-up. These steps are shown below in Figure 14. For the next experiment, we shall use the c5.9xlarge instance type.

Changing AWS Instance Type
Figure 14. Changing the Instance Type

To re-start the instance, after clicking Actions or right-clicking on the instance, choose Instance State > Start.
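The stop, change and start sequence can also be scripted. The following is a sketch only, assuming the AWS CLI is installed and configured with permission to manage the instance. The instance ID is a placeholder, and the helper names run and change_instance_type are invented for illustration. By default it prints the commands instead of executing them.

```shell
# Print each command by default; set DRY_RUN=0 to execute for real.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop an instance, change its type, and start it again.
# Arguments: instance ID, new instance type.
change_instance_type() {
    id="$1"
    new_type="$2"
    run aws ec2 stop-instances --instance-ids "$id"
    run aws ec2 wait instance-stopped --instance-ids "$id"
    run aws ec2 modify-instance-attribute --instance-id "$id" --instance-type "Value=$new_type"
    run aws ec2 start-instances --instance-ids "$id"
}

# Placeholder instance ID: substitute the ID shown in the EC2 Dashboard.
change_instance_type i-0123456789abcdef0 c5.9xlarge
```

The wait step matters because the instance type can only be changed once the instance has fully stopped.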

Building and Running Parallel Applications on the Instance

OpenMP Parallel Version

Next, try building the OpenMP version of CloverLeaf. For this experiment, stop the instance and change the instance type to c5.9xlarge as described above. This instance type provides 36 virtualized CPU cores at a rate of $1.53 per hour, which should deliver a substantial speedup over the serial version.

Start the instance and log into it as before. Then change into the top-level CloverLeaf directory and issue the following command:

    $ make openmp COMPILER=PGI

Once the build completes, you should change to the CloverLeaf_OpenMP subdirectory, and copy the same clover_bm32.in input file as described previously:

    $ cd CloverLeaf_OpenMP
    $ mv clover.in clover.in.bak
    $ cp InputDecks/clover_bm32.in clover.in

You can now run the OpenMP version of CloverLeaf as follows:

    $ OMP_NUM_THREADS=36 ./clover_leaf

The OpenMP version requires approximately 36 minutes to complete:

 Average time per cell    2.4922225906677995E-008
 Step time per cell       2.4786172111311719E-008
 Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
 Wall clock     2171.918267965317

The total cost of running the OpenMP version of CloverLeaf on the c5.9xlarge instance is roughly $1.53 per hour x 0.603 hours = $0.92.

As you can see, the OpenMP version is a big win over the serial version in run time, finishing about six times faster at a somewhat higher cost.

MPI Parallel Version

You can also build a parallel version of CloverLeaf using MPI. The PGI AMI includes a build of Open MPI that is bundled with the PGI compilers. For this experiment, we continue using the c5.9xlarge instance type from the previous section. Change back to the top-level CloverLeaf directory, and issue the following command:

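    $ make mpi COMPILER=PGI

Once the build completes, change to the CloverLeaf_MPI subdirectory and copy the same clover_bm32.in input file as described previously:

    $ cd CloverLeaf_MPI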
    $ mv clover.in clover.in.bak
    $ cp InputDecks/clover_bm32.in clover.in

You can now run the MPI version of CloverLeaf as follows:

    $ mpirun -np 36 ./clover_leaf

The MPI version of CloverLeaf seems to be a bit faster than the OpenMP version:

 Average time per cell    2.4026006379229807E-008
 Step time per cell       2.3560820005109740E-008
 Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
 Wall clock     2093.818086147308

The total cost of running the MPI version of CloverLeaf on the c5.9xlarge instance in the Oregon region is roughly $1.53 per hour x 0.582 hours = $0.89.

In this case, the MPI version is slightly faster than the OpenMP version, at a slightly lower cost.

OpenACC Parallel Multicore Version

It is also possible to build a parallel version of CloverLeaf that runs on multiple host CPU cores using OpenACC directives. This is potentially useful for testing applications with OpenACC when a GPU is not available on the system, or as a first step toward porting a given application to run on GPUs.

Before you can build the OpenACC version of CloverLeaf, you need to issue two commands to work around minor build issues with this version. Change back to the top-level CloverLeaf directory and issue the following commands:

    $ ln -s CloverLeaf_OpenACC CloverLeaf_OpenACC_KERNELS
    $ sed -i -e 's#-ta=nvidia,cc35#-ta=multicore#g' CloverLeaf_OpenACC/Makefile
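To see what the sed command does before touching the real Makefile, you can try the same substitution on a sample line. The line used below is a made-up stand-in, not a copy of the actual CloverLeaf Makefile, and the function name retarget is hypothetical.

```shell
# Apply the same substitution as above to text on standard input.
retarget() {
    sed -e 's#-ta=nvidia,cc35#-ta=multicore#g'
}

# Made-up sample line; the real Makefile contents may differ.
echo 'FLAGS = -acc -ta=nvidia,cc35' | retarget   # prints: FLAGS = -acc -ta=multicore
```

The -ta=multicore target tells the PGI compilers to run OpenACC regions on the host CPU cores instead of a GPU. After editing, a command such as grep -n -- '-ta=' CloverLeaf_OpenACC/Makefile shows whether the change took effect.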

Now invoke the build as follows:

    $ make openacc_kernels COMPILER=PGI

Once the build completes, you should change to the CloverLeaf_OpenACC subdirectory, and copy the same clover_bm32.in input file as described previously:

    $ cd CloverLeaf_OpenACC
    $ mv clover.in clover.in.bak
    $ cp InputDecks/clover_bm32.in clover.in

You can now run the OpenACC version of CloverLeaf on multiple host CPU cores as follows:

    $ mpirun -np 1 ./clover_leaf

CloverLeaf should complete in around the same amount of time as the MPI version in the previous section:

 Average time per cell    2.2886566252314620E-008
 Step time per cell       2.2746551419711775E-008
 Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
 Wall clock     1994.515158891678     

The total cost of running the OpenACC Multicore version of CloverLeaf on the c5.9xlarge instance in the Oregon region is roughly $1.53 per hour x 0.554 hours = $0.85.

So, in this case, running the OpenACC Multicore version of CloverLeaf is slightly cheaper than running the MPI version on the same hardware.

OpenACC Parallel Version on 1 GPU

Stop your instance so we can prepare to use an NVIDIA Volta V100 GPU to accelerate CloverLeaf via OpenACC. AWS doesn’t provide access to GPU-enabled instance types by default, so users must first request access. Check the EC2 Service Limits page to see if you have access to p3.2xlarge and p3.8xlarge instance types. If not, submit a request to AWS via the Request limit increase link.

Once you’ve verified you have access to p3 instance types, change the instance type to p3.2xlarge, and then start the instance. The p3.2xlarge instance type provides eight virtualized CPU cores and one virtualized V100 GPU, which should provide a substantial speedup over the serial version. Amazon charges a higher rate for it accordingly: $3.06 per hour.
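After logging in to the p3.2xlarge instance, it can be worth confirming that the GPU is visible before building anything. This is a minimal sketch, assuming the NVIDIA driver utilities are on the PATH of the AMI, which has not been verified here. The function name check_gpu is hypothetical.

```shell
# List visible NVIDIA GPUs, or explain why none can be listed.
check_gpu() {
    if command -v nvidia-smi > /dev/null 2>&1; then
        nvidia-smi -L || echo "nvidia-smi could not list any GPUs"
    else
        echo "nvidia-smi not found; is this a GPU instance type?"
    fi
}

check_gpu
```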

We are going to use a slightly different version of CloverLeaf for the next couple of experiments. This version has been modified to better support running CloverLeaf on multiple GPUs on a single system. To obtain this version of CloverLeaf, issue the following command:

    $ git clone https://github.com/UoB-HPC/CloverLeaf-OpenACC

Once again, we need to fix up a few things in the Makefile:

    $ sed -i -e 's#-ta=tesla,cc60#-ta=nvidia,cc35,cc60,cc70 -DUSE_CUDA_AWARE_MPI#g' CloverLeaf-OpenACC/Makefile

Now invoke the build as follows:

    $ cd CloverLeaf-OpenACC
    $ make COMPILER=PGI

Once the build completes, you should copy the same clover_bm32.in input file as described previously:

    $ mv clover.in clover.in.bak
    $ cp InputDecks/clover_bm32.in clover.in

You can now run the OpenACC version of CloverLeaf on a single V100 GPU as follows:

    $ mpirun -np 1 ./clover_leaf

A single V100 completes an entire run of this application in just over three minutes:

 Average time per cell    2.1036212802534272E-009
 Step time per cell       2.0812310847557253E-009
 Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
 Wall clock     183.3258261680603     

Even more impressively, the total cost of running the OpenACC version of CloverLeaf on the p3.2xlarge instance is roughly $3.06 per hour x 0.051 hours = $0.16.

So not only can GPU-accelerated computing save a lot of time when running an application, it can save a lot of money as well.

OpenACC Parallel Version on 4 GPUs

Now we are going to try a really fun experiment to fully showcase the power of GPU-accelerated computing: we will harness the power of multiple GPUs in parallel to run the same CloverLeaf problem.

For this experiment, you will try using not one, but four V100 GPUs to accelerate CloverLeaf via OpenACC. Bring down your p3.2xlarge instance and change the instance type to p3.8xlarge. This instance type provides 32 virtualized CPU cores and four virtualized V100 GPUs. As this is one of the most powerful instance types AWS EC2 offers, its cost is reflected accordingly: Amazon charges around $12 per hour to use a p3.8xlarge instance. Fortunately, you will not be using this one for very long at all.

We will reuse the same GPU-enhanced build of CloverLeaf from the previous section, so there is no need to download or rebuild it here. Bring up the p3.8xlarge instance and log in as usual. Then change to the CloverLeaf-OpenACC directory and run CloverLeaf on all four GPUs as follows:

    $ cd CloverLeaf-OpenACC
    $ mpirun -np 4 ./clover_leaf

Notice that four GPUs can whiz through this CloverLeaf problem in under a minute:

 Average time per cell    5.6823226477422412E-010
 Step time per cell       5.6024065189477467E-010
 Step    2955 time   2.1820157 control    sound    timestep   7.45E-04       1,       1 x  6.51E-04 y  1.30E-03
 Wall clock     49.52016711235046 

Not surprisingly, we get a nearly 4x speed-up over the single-GPU experiment. This improved performance mostly makes up for the more expensive multi-GPU instance type, as the cost is roughly the same: $12.24 per hour x 0.0138 hours = $0.17.

Multiple-GPU instance types can be very cost-effective, especially when running larger, time-consuming parallel-capable applications.

Results Summary

Below is a table summarizing all of our results.

Version                      Instance Type   Time (secs.)   Cost
1 Skylake Core               c5.xlarge       13,536         $0.64
36 Skylake Cores (OpenMP)    c5.9xlarge      2,172          $0.92
36 Skylake Cores (MPI)       c5.9xlarge      2,094          $0.89
36 Skylake Cores (OpenACC)   c5.9xlarge      1,995          $0.85
1 V100 GPU (OpenACC)         p3.2xlarge      183            $0.16
4 V100 GPUs (OpenACC)        p3.8xlarge      50             $0.17
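The relative speedups implied by these timings can be worked out from the wall clock values reported by each run. A short sketch follows. The function name speedup is invented for illustration, and the timings are the wall clock figures printed in the runs above, rounded to two decimal places.

```shell
# Print how many times faster a run is than a baseline.
# Arguments: baseline seconds, run seconds.
speedup() {
    awk -v base="$1" -v secs="$2" 'BEGIN { printf "%.1fx\n", base / secs }'
}

serial=13535.59
speedup "$serial" 2171.92   # 36 cores, OpenMP
speedup "$serial" 183.33    # 1 V100 GPU
speedup "$serial" 49.52     # 4 V100 GPUs
```

By this measure, the four-GPU run is roughly 270 times faster than the serial run.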

Shutting Down Your Instances

One important item that bears repeating is that your instance continues to accrue charges as long as it is running. You should shut down (“stop”) your instances whenever you are not using them to avoid unnecessary fees. To do this, bring up the EC2 Dashboard, find the running instance you need to shut down in the list of instances, right click on it, and select Instance State followed by Stop. Figure 15 below illustrates these steps:

Stopping an Instance
Figure 15. Stopping a Running Instance

This can take a few minutes, so verify that it has stopped before closing your browser. The EC2 Dashboard should resemble Figure 16 below when your instance has reached the Stopped state:

Stopped Instance
Figure 16. Stopped Instance State

IMPORTANT NOTE: The default Terminate action means that the instance will be removed, and its associated root storage (EBS volume) will be deleted. Do not change your instance state to Terminate unless you are finished with your instance and wish to delete it.

Conclusion

Using the PGI AMI on AWS, you can access GPU-accelerated computing for very little investment. Using Amazon's EC2 cloud computing platform, we sped up a sample application from running in about three and three-quarter hours on a single CPU core to just under a minute using four state-of-the-art NVIDIA Volta V100 GPUs. At the same time, accelerating the application with GPUs resulted in significant cost savings.
