English | MP4 | AVC 1920×1080 | AAC 44 kHz 2ch | 29 Lessons (2h 20m) | 570 MB
Get hands-on with Apache Spark and PySpark by learning how to build scalable, high-performance data pipelines using the DataFrame API, Spark jobs, joins, aggregations, and more.
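To give a feel for the style of code the course teaches, here is a minimal PySpark sketch of a DataFrame join followed by an aggregation. The file names and column names (listings.parquet, reviews.parquet, host_id, price, and so on) are illustrative assumptions, not files from the course materials.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# A local session; the course also covers configuring Spark for larger setups.
spark = SparkSession.builder.appName("intro-sketch").master("local[*]").getOrCreate()

# Hypothetical inputs and columns, used only to illustrate the API.
listings = spark.read.parquet("listings.parquet")  # listing_id, host_id, price
reviews = spark.read.parquet("reviews.parquet")    # review_id, listing_id

# Join reviews onto listings, then aggregate per host.
per_host = (
    listings
    .join(reviews, on="listing_id", how="left")
    .groupBy("host_id")
    .agg(
        F.countDistinct("review_id").alias("review_count"),
        F.avg("price").alias("avg_price"),
    )
    .orderBy(F.desc("review_count"))
)

per_host.show(10)
```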
What you’ll learn
- Learn the skills and real-world tools used by Data Engineers and join the top 10% in your field
- Set up Apache Spark and configure your local or cloud environment for big data processing
- Write efficient PySpark code to handle, transform, and analyze large-scale datasets
- Use DataFrames to manipulate data in a distributed computing environment
- Build scalable data pipelines that integrate multiple transformation and aggregation steps (see the sketch after this list)
- Create a strong foundation for a career in Data Engineering, Data Science, and AI/ML
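As a rough illustration of the kind of pipeline these outcomes build toward, the sketch below chains a cleaning transformation, a filter, and an aggregation, then writes the result. The input path and columns (raw_listings.csv, price, room_type) are assumptions for illustration; the course itself works with the Inside Airbnb dataset.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("pipeline-sketch").master("local[*]").getOrCreate()

# Hypothetical input file and schema, chosen only for this example.
raw = spark.read.csv("raw_listings.csv", header=True, inferSchema=True)

cleaned = (
    raw
    # Strip currency symbols and cast the price column to a number.
    .withColumn("price", F.regexp_replace("price", "[$,]", "").cast("double"))
    # Drop rows without a usable price.
    .filter(F.col("price").isNotNull())
)

# Aggregate: nightly price statistics per room type.
stats = cleaned.groupBy("room_type").agg(
    F.count("*").alias("listings"),
    F.round(F.avg("price"), 2).alias("avg_price"),
)

# Writing is an action, so this is where Spark executes the lazily built plan.
stats.write.mode("overwrite").parquet("room_type_stats.parquet")
```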
Table of Contents
1 Introduction
2 [Optional] What Is a Virtualenv?
3 Apache Spark
4 How Spark Works
5 Spark Application
6 DataFrames
7 Installing Spark
8 Inside Airbnb Data
9 Writing Your First Spark Job
10 Lazy Processing
11 [Exercise] Basic Functions
12 [Exercise] Basic Functions – Solution
13 Aggregating Data
14 Joining Data
15 Aggregations and Joins with Spark
16 Complex Data Types
17 [Exercise] Aggregate Functions
18 [Exercise] Aggregate Functions – Solution
19 User Defined Functions
20 Data Shuffle
21 Data Accumulators
22 Optimizing Spark Jobs
23 Submitting Spark Jobs
24 Other Spark APIs
25 Spark SQL
26 [Exercise] Advanced Spark
27 [Exercise] Advanced Spark – Solution
28 Summary
29 Let’s Keep Learning Together!
