Big Data Essentials Immersive
This three-day course gives students the necessary knowledge of Hadoop, Spark, and NoSQL as used with Big Data. With these three technologies, attendees will be able to build systems that process massive amounts of data. The class also lays the foundation for proper analytics, allowing attendees to extract insights from data.
3 days - $1,895.00
Taught by an expert Big Data instructor.
Prerequisites:
Experience with at least one programming language is essential. Familiarity with a command-line interface is also required.
Course Outline
Big Data Overview
Big Data
Big Data Use Cases
Designing a Big Data System
Technologies: Hadoop
Technologies: NoSQL
Analytics
Putting It All Together
Hadoop Introduction
Introduction to Hadoop
The Future of Hadoop
HDFS and MapReduce Primer
HDFS
MapReduce
YARN
Future of Hadoop Processing Engines
Hive
Hadoop, Hive, and SQL
Hive Design and Architecture
HiveQL
First Look at Hive
Hive Partitions
Hive Joins
Hive UDFs
Text Analytics with Hive
Hive 2
Data Access
Feature Generation
Filter/Search/Transpose
Binning and Smoothing
Tez
Pig
Understand Apache Pig
Pig Concepts/History
Pig by Example
Pig as an ETL Pipeline
Hadoop Cluster Planning
Planning Hadoop Hardware
Planning Software Install
Hadoop Install and Configure
Different Installation Configurations in Hadoop
Install Hadoop
Configure Hadoop Cluster
Common Configuration Properties
Making Installation and Configuration Easier
Hadoop Advanced Configuration
Hadoop Data Ingest
Flume
Sqoop
REST
Import Best Practices
NoSQL Introduction
RDBMS and NoSQL
ACID in NoSQL
CAP Theorem
NoSQL Stores
Columnar Storage
Cassandra Introduction
Introduction & Architecture
Cassandra Use Cases
Data Organization
First Look at Cassandra
Replication & Consistency
Cassandra Data Modeling 1
Keyspaces and Tables
CQL Queries
Indexing
Cassandra Data Modeling 2
Collections
Composite Keys
Time Series Data
Counters
Lightweight Transactions
Cassandra Data Modeling Labs
MyFlix (Netflix)
YouTube
Online Shopping (Amazon)
User Activity (Facebook)
Scala Primer
Introduction
Collections
Functions/Methods
Class/Object/Trait
Introduction to Spark
Introduction
Spark vs. Hadoop
A First Look at Spark
Spark Data Model 1
Data Model Overview
RDD Concepts
Spark Workflow
Working with RDDs
Key-Value Pairs
Caching
Spark Data Model 2
DataFrames
Working with DataFrames
Spark SQL
Datasets
Spark and Hive
Data Formats
Spark API/Applications
Core API
Building and Running Applications
Application Lifecycle
Logging & Debugging
Machine Learning Primer
Machine Learning Concepts
Machine Learning Vocabulary
Text Mining
Recommendations
Spark Streaming
Streaming
Spark Streaming Overview
Architecture
Programming
Structured Streaming
Transformations
Apache Kafka