Accelerating CUDA C++ Applications with Multiple GPUs (ACCAMG) – Outline

Detailed Course Outline

Introduction

  • Meet the instructor.
  • Create an account at courses.nvidia.com/join.

Using JupyterLab

  • Get familiar with your GPU-accelerated interactive JupyterLab environment.

Application Overview

  • Orient yourself with a single-GPU CUDA C++ application that will be the starting point for the course.
  • Observe the current performance of the single-GPU CUDA C++ application using Nsight Systems.

Introduction to CUDA Streams

  • Learn the rules that govern concurrent CUDA stream behavior.
  • Use multiple CUDA streams to perform concurrent host-to-device and device-to-host memory transfers.
  • Use multiple CUDA streams to launch GPU kernels.
  • Observe multiple streams in the Nsight Systems visual profiler timeline view.

Copy/Compute Overlap with CUDA Streams

  • Learn the key concepts for effectively performing copy/compute overlap.
  • Explore robust indexing strategies for the flexible use of copy/compute overlap in applications.
  • Refactor the single-GPU CUDA C++ application to perform copy/compute overlap.
  • See copy/compute overlap in the Nsight Systems visual profiler timeline.
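
As a rough illustration of what a "robust indexing strategy" for copy/compute overlap can look like, the host-only C++ sketch below splits a buffer of n elements into contiguous chunks, one per stream. The names Chunk and make_chunks are hypothetical and not taken from the course materials. Ceiling division sizes the chunks and std::min clamps the last one, so n does not need to be a multiple of the chunk count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one contiguous slice of the data.
struct Chunk {
    std::size_t offset;  // index of the first element in the slice
    std::size_t size;    // number of elements in the slice
};

// Split n elements into at most num_chunks contiguous slices.
// Ceiling division picks the chunk size; std::min shortens the last
// slice when n is not evenly divisible.
std::vector<Chunk> make_chunks(std::size_t n, std::size_t num_chunks) {
    assert(num_chunks > 0);
    std::vector<Chunk> chunks;
    const std::size_t chunk_size = (n + num_chunks - 1) / num_chunks;
    for (std::size_t lower = 0; lower < n; lower += chunk_size) {
        chunks.push_back({lower, std::min(chunk_size, n - lower)});
    }
    return chunks;
}
```

In a CUDA application, each chunk's offset and size would typically parameterize one asynchronous copy and one kernel launch issued into that chunk's stream. Note that this sketch can return fewer than num_chunks slices when n is small.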

Multiple GPUs with CUDA C++

  • Learn the key concepts for effectively using multiple GPUs on a single node with CUDA C++.
  • Explore robust indexing strategies for the flexible use of multiple GPUs in applications.
  • Refactor the single-GPU CUDA C++ application to utilize multiple GPUs.
  • See multiple-GPU utilization in the Nsight Systems visual profiler timeline.

Copy/Compute Overlap with Multiple GPUs

  • Learn the key concepts for effectively performing copy/compute overlap on multiple GPUs.
  • Explore robust indexing strategies for the flexible use of copy/compute overlap on multiple GPUs.
  • Refactor the single-GPU CUDA C++ application to perform copy/compute overlap on multiple GPUs.
  • Observe the performance benefits of copy/compute overlap on multiple GPUs.
  • See copy/compute overlap on multiple GPUs in the Nsight Systems visual profiler timeline.
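
One way to picture the indexing for copy/compute overlap on multiple GPUs is as a two-level split: the data is first divided across GPUs, and each GPU's share is then divided across that GPU's streams. The host-only C++ sketch below assumes this layout. Range, sub_range, and stream_range are hypothetical names used for illustration, not the course's own code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type: a half-open range [begin, end) of element indices.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Return slice number `part` out of `parts` slices of `whole`.
// Slices are sized by ceiling division and clamped to `whole`,
// so trailing slices may be shorter or empty.
Range sub_range(Range whole, std::size_t part, std::size_t parts) {
    assert(parts > 0 && part < parts);
    const std::size_t n = whole.end - whole.begin;
    const std::size_t size = (n + parts - 1) / parts;
    const std::size_t begin = std::min(whole.begin + part * size, whole.end);
    const std::size_t end = std::min(begin + size, whole.end);
    return {begin, end};
}

// Range handled by stream `stream` on GPU `gpu`:
// split across GPUs first, then across that GPU's streams.
Range stream_range(std::size_t n,
                   std::size_t gpu, std::size_t num_gpus,
                   std::size_t stream, std::size_t num_streams) {
    const Range gpu_range = sub_range({0, n}, gpu, num_gpus);
    return sub_range(gpu_range, stream, num_streams);
}
```

For example, with 10 elements, 2 GPUs, and 2 streams per GPU, stream 1 on GPU 1 would be assigned elements [8, 10). In device code, the indices within a GPU's own allocation would be taken relative to the start of that GPU's range.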

Course Assessment

Final Review

  • Review key learnings.
  • Learn to build your own training environment from the DLI base environment container.
  • Complete the workshop survey.