Accelerating Data Engineering Pipelines (ADEP) – Outline

Detailed Course Outline

Introduction

  • Meet the instructor.
  • Create an account at courses.nvidia.com/join

Data on the Hardware Level

  • Explore the strengths and weaknesses of different hardware approaches to data and the frameworks that support them:
    • Pandas
    • cuDF
    • Dask

ETL with NVTabular

  • Learn how to scale an ETL pipeline from one GPU to many with NVTabular, using a big data recommender system as the working example.
    • Transform raw JSON into analysis-ready Parquet files.
    • Learn how to quickly add features to a dataset with operators such as Categorify and Lambda.
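
To give a feel for the two operators named above, here is a minimal plain-Python sketch of the ideas behind them. It does not use NVTabular's API. The helper names (fit_categories, encode, apply_lambda), the convention of reserving ID 0 for unseen values, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def fit_categories(values, freq_threshold=1):
    """Build a value -> integer ID mapping (hypothetical helper, not NVTabular).

    ID 0 is reserved here for values that are unseen or too rare.
    """
    counts = Counter(values)
    # Sort by descending frequency, then alphabetically, so IDs are deterministic.
    ordered = sorted(
        (v for v, c in counts.items() if c >= freq_threshold),
        key=lambda v: (-counts[v], v),
    )
    return {value: idx for idx, value in enumerate(ordered, start=1)}


def encode(values, mapping):
    """Replace each categorical value with its integer ID (0 if unknown)."""
    return [mapping.get(v, 0) for v in values]


def apply_lambda(values, fn):
    """Apply an arbitrary row-wise function, in the spirit of a lambda operator."""
    return [fn(v) for v in values]


# Made-up sample columns from a recommender-style dataset.
categories = ["books", "music", "books", "games", "books", "music"]
prices = [4.0, 19.5, 120.0]

mapping = fit_categories(categories)
encoded = encode(categories, mapping)
unseen = encode(["movies"], mapping)
is_expensive = apply_lambda(prices, lambda p: int(p > 50))

print(mapping)
print(encoded)
print(unseen)
print(is_expensive)
```

The split between fitting the mapping and applying it matters at scale: the mapping is computed once over the whole dataset, and the encoding can then run on each partition independently.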

Data Visualization

  • Step into the shoes of a meteorologist and learn how to plot precipitation data on a map.
  • Learn how to use descriptive statistics and plots such as histograms to assess data quality.
  • Learn how to manage memory effectively so that users can quickly filter data through a graphical interface.
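
As a rough illustration of the data quality check described above, the sketch below summarizes a small list of readings and bins them into a text histogram, using only the Python standard library. The readings, the -999 sentinel for a missing value, and the helper names are invented for this example and are not course material.

```python
import statistics
from collections import Counter


def summarize(values):
    """Return a few descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def histogram(values, bin_width):
    """Count values per bin; each bin is labeled by its lower edge."""
    return Counter((v // bin_width) * bin_width for v in values)


# Hypothetical daily precipitation readings in millimeters.
# -999 is a made-up sentinel standing in for a missing reading.
readings = [0, 2, 5, 0, 12, -999, 7, 3, 0, 48]

raw_summary = summarize(readings)
hist = histogram(readings, bin_width=10)

# A negative minimum is physically impossible for precipitation,
# so it points at a data quality problem rather than real weather.
clean = [v for v in readings if v >= 0]
clean_summary = summarize(clean)

print(raw_summary)
for edge in sorted(hist):
    print(f"{edge:>6} | {'#' * hist[edge]}")
print(clean_summary)
```

The summary and the histogram flag the same problem from two angles: the minimum exposes the sentinel directly, and the histogram shows it as an isolated bin far from the rest of the data.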

Final Project: Data Detective

  • Users are complaining that the dashboard is too slow. Apply the techniques learned in class to find and eliminate inefficiencies in the backend code.
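
One generic way to start hunting for a slow backend path is to profile it. The sketch below uses Python's built-in cProfile module on two versions of a filter function. The functions, the station_id field, and the data are hypothetical stand-ins, not the project's actual backend code.

```python
import cProfile
import io
import pstats


def filter_slow(rows, wanted_ids):
    # Membership tests against a list scan the list on every row.
    return [row for row in rows if row["station_id"] in wanted_ids]


def filter_fast(rows, wanted_ids):
    # Converting to a set once makes each membership test a hash lookup.
    wanted = set(wanted_ids)
    return [row for row in rows if row["station_id"] in wanted]


# Made-up data: 5,000 rows spread over 2,000 station IDs.
rows = [{"station_id": i % 2000, "value": i} for i in range(5000)]
wanted_ids = list(range(0, 2000, 2))

profiler = cProfile.Profile()
profiler.enable()
slow_result = filter_slow(rows, wanted_ids)
fast_result = filter_fast(rows, wanted_ids)
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Both functions return the same rows, so the profile isolates the cost of the lookup strategy. Measuring before changing anything is what lets the fix target the real hotspot.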

Final Review

  • Review key learnings and answer questions.
  • Complete the assessment and earn your certificate.
  • Complete the workshop survey.
  • Learn how to set up your own AI application development environment.