Data Parallelism: How to Train Deep Learning Models on Multiple GPUs (DPHTDLM) – Outline

Detailed Course Outline

Introduction

  • Meet the instructor.
  • Create an account at courses.nvidia.com/join

Stochastic Gradient Descent and the Effects of Batch Size

  • Learn the significance of stochastic gradient descent when training on multiple GPUs.
  • Understand the issues with sequential, single-threaded data processing and the theory behind speeding up applications with parallel processing.
  • Understand loss function, gradient descent, and stochastic gradient descent (SGD).
  • Understand the effect of batch size on accuracy and training time, with an eye toward its use on multi-GPU systems (see the sketch after this list).
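
For illustration, here is a minimal sketch of a mini-batch SGD training loop in PyTorch. The tiny linear model, synthetic data, learning rate, and batch size are assumptions chosen only to show where the batch size enters the loop, not values taken from the course.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic regression data (assumed purely for illustration).
X = torch.randn(1024, 20)
y = torch.randn(1024, 1)
dataset = TensorDataset(X, y)

batch_size = 64  # the hyperparameter whose accuracy/speed trade-off the module examines
loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

model = nn.Linear(20, 1)
loss_fn = nn.MSELoss()                                     # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent

for epoch in range(3):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)  # loss on one mini-batch
        loss.backward()                # stochastic estimate of the gradient
        optimizer.step()               # gradient descent update
```

Larger batches let each step process more data at once but change the noise in the gradient estimate, which is the accuracy-versus-training-time trade-off this module explores.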

Training on Multiple GPUs with PyTorch Distributed Data Parallel (DDP)

  • Learn to convert single-GPU training to multi-GPU training using PyTorch Distributed Data Parallel.
  • Understand how DDP coordinates training among multiple GPUs.
  • Refactor single-GPU training programs to run on multiple GPUs with DDP, as sketched below.
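
Below is a minimal sketch of that kind of refactor, assuming a single node where torchrun launches one process per GPU; the model, data, and hyperparameters are placeholders, not the course's lab code.

```python
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and data, assumed for illustration.
    model = nn.Linear(20, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # DDP all-reduces gradients across GPUs

    dataset = TensorDataset(torch.randn(1024, 20), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)        # each rank gets a distinct shard of the data
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(3):
        sampler.set_epoch(epoch)                 # reshuffle the shards each epoch
        for xb, yb in loader:
            xb, yb = xb.cuda(local_rank), yb.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()    # gradients are synchronized during backward
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # e.g. torchrun --nproc_per_node=<num_gpus> <script>.py
```

The single-GPU version differs mainly in the pieces marked above: process-group setup, wrapping the model in DDP, and sharding the data with DistributedSampler.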

Maintaining Model Accuracy when Scaling to Multiple GPUs

  • Understand and apply key algorithmic considerations to retain accuracy when training on multiple GPUs.
  • Understand what might cause accuracy to decrease when parallelizing training on multiple GPUs.
  • Learn techniques for maintaining accuracy when scaling training to multiple GPUs (one common technique is sketched below).
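
As one example of such a technique (a common one, named here only as an illustration rather than as the module's specific content), the sketch below scales the learning rate linearly with the number of GPUs and warms it up over the first few epochs; the base learning rate, model, and world size are assumed values.

```python
import torch

base_lr = 0.01        # learning rate tuned for single-GPU training (assumed)
world_size = 4        # number of GPUs contributing to the global batch (assumed)

# Linear scaling rule: the global batch is world_size times larger,
# so scale the learning rate proportionally.
scaled_lr = base_lr * world_size

model = torch.nn.Linear(20, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr)

# Gradual warmup: ramp from the unscaled rate up to the scaled rate
# over the first few epochs to avoid divergence early in training.
warmup_epochs = 5
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer,
    start_factor=base_lr / scaled_lr,  # start at the single-GPU rate
    end_factor=1.0,                    # end at the scaled rate
    total_iters=warmup_epochs,
)

for epoch in range(10):
    # (a real training epoch would run here; a dummy step stands in for it)
    optimizer.zero_grad()
    model(torch.randn(8, 20)).sum().backward()
    optimizer.step()
    warmup.step()  # advance the warmup schedule once per epoch
```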

Workshop Assessment

  • Use what you have learned during the workshop: complete the workshop assessment to earn a certificate of competency.

Final Review

  • Review key learnings and answer remaining questions.
  • Take the workshop survey.