Description: Present-day high-performance computing (HPC) and deep learning applications benefit from, and even require, cluster-scale GPU compute power. Writing CUDA® applications that can correctly and efficiently utilize GPUs across a cluster requires a distinct set of skills. In this workshop, you will learn the tools and techniques needed to write CUDA C++ applications that can scale efficiently to clusters of NVIDIA GPUs.
You’ll do this by working on code from several CUDA C++ applications in an interactive cloud environment backed by several NVIDIA GPUs. You’ll gain exposure to a handful of multi-GPU programming methods, including CUDA-aware Message Passing Interface (MPI), before proceeding to the main focus of this course, NVSHMEM™.
NVSHMEM is a parallel programming interface based on OpenSHMEM that provides efficient and scalable communication for NVIDIA GPU clusters. NVSHMEM creates a global address space for data that spans the memory of multiple GPUs and can be accessed with fine-grained GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams. NVSHMEM’s asynchronous, GPU-initiated data transfers eliminate synchronization overheads between the CPU and the GPU. They also enable long-running kernels that include both communication and computation, reducing overheads that can limit an application’s performance when strong scaling.
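To make the idea of GPU-initiated communication over a global address space more concrete, here is a minimal sketch modeled on the kind of "ring shift" example commonly used to introduce NVSHMEM. It is not taken from the workshop materials. The helper name `ring_shift_demo` is hypothetical, and the sketch assumes one process (PE) per GPU, an installed NVSHMEM library, and a launcher that starts all PEs together.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>
#include <nvshmem.h>
#include <nvshmemx.h>

// Each PE (processing element, one per GPU) writes its own rank into the
// symmetric buffer of its right-hand neighbour, directly from device code.
__global__ void ring_shift(int *destination) {
    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int peer = (mype + 1) % npes;
    nvshmem_int_p(destination, mype, peer);  // GPU-initiated single-element put
}

// Hypothetical helper for this sketch: runs the shift once and returns true
// if this PE received the rank of its left-hand neighbour.
bool ring_shift_demo() {
    nvshmem_init();

    // Pick a GPU based on this PE's rank within its node.
    int mype_node = nvshmem_team_my_pe(NVSHMEMX_TEAM_NODE);
    cudaSetDevice(mype_node);

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Symmetric allocation: every PE gets a buffer at the same symmetric
    // address, so a remote PE can target it by pointer plus PE number.
    int *destination = static_cast<int *>(nvshmem_malloc(sizeof(int)));
    assert(destination != nullptr);

    ring_shift<<<1, 1, 0, stream>>>(destination);

    // Stream-ordered barrier: all puts are complete and visible afterwards.
    nvshmemx_barrier_all_on_stream(stream);

    int msg = -1;
    cudaMemcpyAsync(&msg, destination, sizeof(int),
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    int mype = nvshmem_my_pe();
    int npes = nvshmem_n_pes();
    int expected = (mype + npes - 1) % npes;  // rank of the left-hand neighbour
    std::printf("PE %d of %d received %d\n", mype, npes, msg);

    nvshmem_free(destination);
    cudaStreamDestroy(stream);
    nvshmem_finalize();
    return msg == expected;
}
```

A program would call `ring_shift_demo()` once from `main` in every process. Build flags and the launcher (for example `nvshmrun` or `mpirun`) depend on the local NVSHMEM installation, so they are left out here.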
At the end of the workshop, participants can obtain an official certificate from the NVIDIA Deep Learning Institute.
Workflow: The workshop takes place remotely, in a browser, on AWS cloud infrastructure.
Difficulty: Basic
Language: English
Target audience: HPC developers who use CUDA on networked clusters or in the cloud.
Prerequisite knowledge: Intermediate experience writing CUDA C/C++ applications.
Skills to be gained:
By participating in this workshop, you’ll learn how to:
– Use concurrent CUDA Streams to overlap memory transfers with GPU computation.
– Use multiple GPUs on a single node to scale workloads across all available GPUs.
– Combine the use of copy/compute overlap with multiple GPUs.
– Use the NVIDIA® Nsight™ Systems Visual Profiler timeline to observe improvement opportunities and the impact of the techniques covered in the workshop.
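As a rough illustration of how the first three skills fit together, the following sketch splits an array across every visible GPU and, on each GPU, across several CUDA streams, so that host-to-device copies, kernels, and device-to-host copies from different chunks can overlap. It is not taken from the workshop materials. The names `scale`, `GpuState`, and `scale_on_all_gpus` are hypothetical, and the kernel is a trivial stand-in for real work.

```cuda
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Trivial stand-in for real work: multiply every element by `factor`.
__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Per-GPU resources (hypothetical helper type for this sketch).
struct GpuState {
    float *d_data = nullptr;
    std::vector<cudaStream_t> streams;
};

// Scales `n` floats in place using all visible GPUs and up to
// `streams_per_gpu` streams on each. Returns true on success.
bool scale_on_all_gpus(float *host, size_t n, float factor, int streams_per_gpu) {
    assert(streams_per_gpu > 0);
    int num_gpus = 0;
    if (cudaGetDeviceCount(&num_gpus) != cudaSuccess || num_gpus < 1) return false;
    if (n == 0) return true;

    bool ok = true;
    auto check = [&ok](cudaError_t e) { if (e != cudaSuccess) ok = false; };

    // Pinned host memory lets cudaMemcpyAsync overlap with kernels. If pinning
    // fails the result is still correct, but copies may not overlap.
    bool pinned = cudaHostRegister(host, n * sizeof(float),
                                   cudaHostRegisterPortable) == cudaSuccess;
    if (!pinned) cudaGetLastError();  // clear the recorded error

    const size_t per_gpu = (n + num_gpus - 1) / num_gpus;
    const size_t per_stream = (per_gpu + streams_per_gpu - 1) / streams_per_gpu;
    std::vector<GpuState> gpus(num_gpus);

    // Queue all work without waiting: GPU 0 starts while GPU 1 is still
    // being fed, and each stream's copy/compute/copy chain runs independently.
    for (int g = 0; g < num_gpus && ok; ++g) {
        const size_t gpu_off = g * per_gpu;
        if (gpu_off >= n) break;
        const size_t gpu_len = std::min(per_gpu, n - gpu_off);
        check(cudaSetDevice(g));
        check(cudaMalloc(&gpus[g].d_data, gpu_len * sizeof(float)));
        for (int s = 0; s < streams_per_gpu && ok; ++s) {
            const size_t off = s * per_stream;
            if (off >= gpu_len) break;
            const size_t len = std::min(per_stream, gpu_len - off);
            cudaStream_t stream;
            check(cudaStreamCreate(&stream));
            if (!ok) break;
            gpus[g].streams.push_back(stream);
            float *h = host + gpu_off + off;
            float *d = gpus[g].d_data + off;
            check(cudaMemcpyAsync(d, h, len * sizeof(float),
                                  cudaMemcpyHostToDevice, stream));
            scale<<<static_cast<unsigned>((len + 255) / 256), 256, 0, stream>>>(
                d, len, factor);
            check(cudaGetLastError());
            check(cudaMemcpyAsync(h, d, len * sizeof(float),
                                  cudaMemcpyDeviceToHost, stream));
        }
    }

    // Wait for every stream, then release resources.
    for (int g = 0; g < num_gpus; ++g) {
        if (!gpus[g].d_data && gpus[g].streams.empty()) continue;
        cudaSetDevice(g);
        for (cudaStream_t s : gpus[g].streams) {
            check(cudaStreamSynchronize(s));
            cudaStreamDestroy(s);
        }
        cudaFree(gpus[g].d_data);
    }
    if (pinned) cudaHostUnregister(host);
    return ok;
}
```

Calling `scale_on_all_gpus(v.data(), v.size(), 2.0f, 4)` on a `std::vector<float> v` doubles every element in place. Running the program under Nsight Systems should show, on the timeline, whether the copies and kernels of different streams and GPUs actually overlap. How much they do depends on the hardware and on the chunk sizes chosen.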