Big Data Essentials Immersive

This three-day course gives students a working knowledge of Hadoop, Spark, and NoSQL as used with Big Data. With these three technologies, attendees will be able to build systems that process massive amounts of data. The class also lays the foundation for proper analytics, so that attendees can extract insights from that data.

3 days - $1,895.00

The course is taught by an expert Big Data instructor.

Prerequisites:

Experience with at least one programming language is essential. Familiarity with a command-line interface is also required.

Course Outline

Big Data Overview
Big Data
Big Data Use Cases
Designing a Big Data System
Technologies: Hadoop
Technologies: NoSQL
Analytics
Putting It All Together

Hadoop Introduction
Introduction to Hadoop
The Future of Hadoop

HDFS and MapReduce Primer
HDFS
MapReduce
YARN
Future of Hadoop Processing Engines

Hive
Hadoop, Hive, and SQL
Hive Design and Architecture
HiveQL
First Look at Hive
Hive Partitions
Hive Joins
Hive UDFs
Text Analytics with Hive

Hive 2
Data Access
Feature Generation
Filter/Search/Transpose
Binning and Smoothing
Tez

Pig
Understand Apache Pig
Pig Concepts/History
Pig by Example
Pig as an ETL Pipeline

Hadoop Cluster Planning
Planning Hadoop Hardware
Planning Software Install

Hadoop Install and Configure
Different Installation Configurations in Hadoop
Install Hadoop
Configure Hadoop Cluster
Common Configuration Properties
Making Installation and Configuration Easier
Hadoop Advanced Configuration

Hadoop Data Ingest
Flume
Sqoop
REST
Import Best Practices

NoSQL Introduction
RDBMS and NoSQL
ACID in NoSQL
CAP Theorem
NoSQL Stores
Columnar Storage

Cassandra Introduction
Introduction & Architecture
Cassandra Use Cases
Data Organization
First Look at Cassandra
Replication & Consistency

Cassandra Data Modeling 1
Keyspaces and Tables
CQL Queries
Indexing

Cassandra Data Modeling 2
Collections
Composite Keys
Time Series Data
Counters
Lightweight Transactions

Cassandra Data Modeling Labs
MyFlix (Netflix)
YouTube
Online Shopping (Amazon)
User Activity (Facebook)

Scala Primer
Introduction
Collections
Functions/Methods
Class/Object/Trait

Introduction to Spark
Introduction
Spark vs. Hadoop
A First Look at Spark

Spark Data Model 1
Data Model Overview
RDD Concepts
Spark Workflow
Working with RDDs
Key-Value Pairs
Caching

Spark Data Model 2
DataFrames
Working with DataFrames
Spark SQL
Datasets
Spark and Hive
Data Formats

Spark API/Applications
Core API
Building and Running Applications
Application Lifecycle
Logging & Debugging

Machine Learning Primer
Machine Learning Concepts
Machine Learning Vocabulary
Text Mining
Recommendations

Spark Streaming
Streaming
Spark Streaming Overview
Architecture
Programming
Structured Streaming
Transformations
Apache Kafka