Zur Navigation springen (Enter drücken)
Zur Suche springen (Enter drücken)
Zum Kursangebot springen (Enter drücken)
Zum Seiteninhalt springen (Enter drücken)

+49 40 253346-10 Kontakt

DSH

Online Training

Dauer
4 Tage

Preis

2.970,– € (exkl. MwSt.)
3.534,30 € (inkl. 19% MwSt.)

Termine und Buchen

Termin anfragen

Classroom Training

Dauer
4 Tage

Preis

Deutschland:
2.970,– € (exkl. MwSt.)
3.534,30 € (inkl. 19% MwSt.)

Termine und Buchen

Termin anfragen

Onsite Training

Kurs anfragen

Cloudera

Cloudera Developer Training for Spark & Hadoop (DSH) – Details

Detaillierter Kursinhalt

Introduction to Apache Hadoop and the Hadoop Ecosystem

Introduction to Apache Hadoop and the Hadoop Ecosystem
Apache Hadoop Overview
Data Ingestion and Storage
Data Processing
Data Analysis and Exploration
Other Ecosystem Tools
Introduction to the Hands-On Exercises

Apache Hadoop File Storage

Apache Hadoop Cluster Components
HDFS Architecture
Using HDFS

Distributed Processing on an Apache Hadoop Cluster

YARN Architecture
Working With YARN

Apache Spark Basics

What is Apache Spark?
Starting the Spark Shell
Using the Spark Shell
Getting Started with Datasets and DataFrames
DataFrame Operations

Working with DataFrames and Schemas

Creating DataFrames from Data Sources
Saving DataFrames to Data Sources
DataFrame Schemas
Eager and Lazy Execution

Analyzing Data with DataFrame Queries

Querying DataFrames Using Column Expressions
Grouping and Aggregation Queries
Joining DataFrames

RDD Overview

RDD Overview
RDD Data Sources
Creating and Saving RDDs
RDD Operations

Transforming Data with RDDs

Writing and Passing Transformation Functions
Transformation Execution
Converting Between RDDs and DataFrames

Aggregating Data with Pair RDDs

Key-Value Pair RDDs
Map-Reduce
Other Pair RDD Operations

Querying Tables and Views with Apache Spark SQL

Querying Tables in Spark Using SQL
Querying Files and Views
The Catalog API
Comparing Spark SQL, Apache Impala, and Apache Hive-on-Spark

Working with Datasets in Scala

Datasets and DataFrames
Creating Datasets
Loading and Saving Datasets
Dataset Operations

Writing, Configuring, and Running Apache Spark Applications

Writing a Spark Application
Building and Running an Application
Application Deployment Mode
The Spark Application Web UI
Configuring Application Properties

Distributed Processing

Review: Apache Spark on a Cluster
RDD Partitions
Example: Partitioning in Queries
Stages and Tasks
Job Execution Planning
Example: Catalyst Execution Plan
Example: RDD Execution Plan

Distributed Data Persistence

DataFrame and Dataset Persistence
Persistence Storage Levels
Viewing Persisted RDDs

Common Patterns in Apache Spark Data Processing

Common Apache Spark Use Cases
Iterative Algorithms in Apache Spark
Machine Learning
Example: k-means

Apache Spark Streaming: Introduction to DStreams

Apache Spark Streaming Overview
Example: Streaming Request Count
DStreams
Developing Streaming Applications

Apache Spark Streaming: Processing Multiple Batches

Multi-Batch Operations
Time Slicing
State Operations
Sliding Window Operations
Preview: Structured Streaming

Apache Spark Streaming: Data Sources

Streaming Data Source Overview
Apache Flume and Apache Kafka Data Sources
Example: Using a Kafka Direct Data Source

Kontakt