Splunk for Analytics and Data Science (SADS) – Outline

Detailed Course Outline

Topic 1 – Analytics Workflow

  • Define terms related to analytics and data science
  • Describe the analytics workflow
  • Describe common usage scenarios
  • Navigate the Splunk Machine Learning Toolkit

Topic 2 – Training and Testing Models

  • Split data for testing and training using the sample command
  • Describe the fit and apply commands
  • Use the score command to evaluate models

Topic 3 – Regression: Predict Numerical Values

  • Differentiate predictions from estimates
  • Identify prediction algorithms and assumptions
  • Model numeric predictions in the MLTK and Splunk Enterprise

Topic 4 – Clean and Preprocess the Data

  • Define preprocessing and describe its purpose
  • Describe algorithms that preprocess data for use in models
  • Use FieldSelector to choose relevant fields
  • Normalize data with StandardScaler and RobustScaler
  • Preprocess text using Imputer, NPR, TF-IDF, and HashingVectorizer

Topic 5 – Clustering

  • Define clustering
  • Identify clustering methods, algorithms, and use cases
  • Use the Smart Clustering Assistant to cluster data
  • Evaluate clusters using silhouette score
  • Validate cluster coherence
  • Describe clustering best practices

Topic 6 – Forecasting Fields

  • Differentiate predictions from forecasts
  • Use the Smart Forecasting Assistant
  • Use the StateSpaceForecast algorithm
  • Forecast multivariate data
  • Account for periodicity in each time series

Topic 7 – Detect Anomalies

  • Define anomaly detection and outliers
  • Identify anomaly detection use cases
  • Use the Splunk Machine Learning Toolkit Smart Outlier Assistant
  • Detect anomalies using the DensityFunction algorithm
  • View results with the Distribution Plot visualization

Topic 8 – Classify: Predict Categorical Values

  • Define key classification terms
  • Identify when to use different classification algorithms
  • Evaluate classifier tradeoffs
  • Evaluate results of multiple algorithms