Apache Spark Fundamentals
Veenendaal, 18 and 19 February 2026, 09:00-16:00
Want to know more about the subjects covered and the required prior knowledge? Feel free to contact us.
Get started processing data with Apache Spark and PySpark
Description
With the rise of cloud computing, distributed storage and (big) data processing, many organisations are adopting Apache Spark for their data processes. Whether it is for data science, data analysis or data engineering, Apache Spark can be the right tool for the job. It is also one of the foundations of Azure Synapse Analytics, Microsoft Fabric and Databricks.
This training walks you through the fundamentals of working with Apache Spark, starting with what it is and how it works. You will then move on to reading, transforming and writing data using PySpark.
Finally, to make sure your code can be safely used in production, there will be an added focus on using development best practices.
Subjects
1: About Spark
What is Spark, where did it come from, why was it created? And how does it work? A minimal local-mode sketch follows this module.
Lessons
- History of Apache Spark
- Technical Architecture (Driver, Cluster Manager, Executors)
- RDDs and DataFrames
- PySpark
- Benefits of using Spark
- Running Spark locally
After completing this module, students will be able to:
- Explain how Spark works
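A minimal sketch of running Spark locally, assuming pyspark is installed (pip install pyspark). In local mode the driver and executor threads share a single process, so no cluster manager is needed:

```python
from pyspark.sql import SparkSession

# "local[*]" runs Spark in-process with one worker thread per CPU core;
# on a real cluster, master() would point at a cluster manager instead.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("spark-fundamentals")
         .getOrCreate())

# A DataFrame is a distributed, schema-aware collection of rows,
# built on top of the lower-level RDD API.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
df.show()
```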
2: Reading data
To work with data, we first need to retrieve it from wherever it is located. In Spark this is done through spark.read; a sketch follows this module.
Lessons
- spark.read
- read options
- read modes
- Using glob patterns in the file path(s)
Lab
- Read your first files in Spark
After completing this module, students will be able to:
- Read data using PySpark
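A minimal sketch of the spark.read pattern, using a hypothetical CSV path; header, inferSchema and the read mode are standard DataFrameReader options, and wildcards in the path expand to multiple files:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

# spark.read returns a DataFrameReader: chain format, options and mode,
# then load. Read modes PERMISSIVE (the default), DROPMALFORMED and
# FAILFAST control how malformed records are handled.
df = (spark.read
      .format("csv")
      .option("header", True)           # first line holds the column names
      .option("inferSchema", True)      # sample the data to derive column types
      .option("mode", "DROPMALFORMED")  # silently drop records that do not parse
      .load("data/sales_2024-*.csv"))   # hypothetical path; the * is a glob pattern

df.printSchema()
```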
3: Transforming data
After retrieving our data, we need to transform it. Operations such as joins, filters, grouping, aggregating, splitting and renaming are necessary in most data pipelines. How do they work in Spark? A sketch follows this module.
Lessons
- Filtering
- Narrow and wide transformations
- Column operations
- JSON transformations
- Window functions
- UDF and Lambdas
Lab
- Perform transformations with PySpark
After completing this module, students will be able to:
- Transform data using PySpark
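A minimal sketch of these transformation types on a small in-memory DataFrame; the column names are illustrative:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()
orders = spark.createDataFrame(
    [("c1", 100.0), ("c1", 250.0), ("c2", 80.0)],
    ["customer_id", "amount"])

# Narrow transformation: each partition is filtered independently, no shuffle.
big = orders.filter(F.col("amount") > 90)

# Wide transformation: groupBy shuffles rows so equal keys meet on one executor.
totals = big.groupBy("customer_id").agg(F.sum("amount").alias("total"))

# Window function: number each order within its customer, highest amount first.
w = Window.partitionBy("customer_id").orderBy(F.desc("amount"))
ranked = orders.withColumn("rank", F.row_number().over(w))

# UDF from a lambda. Prefer built-in functions where possible: a Python UDF
# ships every row to a Python worker, which is much slower.
double = F.udf(lambda x: x * 2, "double")
ranked.withColumn("amount_x2", double("amount")).show()
```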
4: Writing data
After completing the necessary transformations in memory, it is time to write the data to its target location. This may sound like a plain operation, but there are things to consider, such as file formats and partitioning; a sketch follows this module.
Lessons
- Common file formats
- Apache Parquet
- Delta Lake
- Data partitioning
- Bucketing
Lab
- Write data with PySpark, with partitions and buckets
After completing this module, students will be able to:
- Write data using PySpark
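A minimal sketch of partitioned and bucketed writes, with a tiny in-memory DataFrame and hypothetical output locations:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
sales = spark.createDataFrame(
    [("2024", "NL", 100.0), ("2024", "BE", 80.0), ("2025", "NL", 120.0)],
    ["year", "country", "amount"])

# Partitioning creates one directory per value (out/sales/year=2024/...),
# so readers that filter on year can skip whole directories.
(sales.write
    .mode("overwrite")
    .partitionBy("year")
    .parquet("out/sales"))  # hypothetical output path

# Bucketing hashes rows into a fixed number of files per partition. It needs
# saveAsTable, because the bucket layout is recorded as table metadata.
(sales.write
    .mode("overwrite")
    .bucketBy(4, "country")
    .sortBy("country")
    .saveAsTable("sales_bucketed"))
```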
5: Development best practices
Reading, transforming and writing is all we need to do with data, but the code that does it has to be maintained. For this we need development best practices: some are general, others are specific to Apache Spark. A sketch follows this module.
Lessons
- Notebooks for development, Python files for production
- Modularization
- Logging
- Error Handling
- Testing
- Continuous Integration
Lab
- Read, clean, transform and write data using development best practices for production ready code
After completing this module, students will be able to:
- Write PySpark code following development best practices
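A minimal sketch of the modularization and testing idea: keep transformation logic in pure DataFrame-in, DataFrame-out functions, so that a production script can import them and a test can exercise them on in-memory data. All names here are illustrative, not taken from the course materials:

```python
from pyspark.sql import SparkSession, DataFrame, functions as F

def clean_orders(df: DataFrame) -> DataFrame:
    """Drop rows without a customer and round amounts to two decimals."""
    return (df.filter(F.col("customer_id").isNotNull())
              .withColumn("amount", F.round("amount", 2)))

def test_clean_orders():
    # Build a tiny input DataFrame, run the function, assert on the result.
    spark = SparkSession.builder.master("local[1]").getOrCreate()
    raw = spark.createDataFrame(
        [("c1", 10.006), (None, 5.0)], ["customer_id", "amount"])
    rows = clean_orders(raw).collect()
    assert len(rows) == 1
    assert rows[0]["amount"] == 10.01

if __name__ == "__main__":
    test_clean_orders()
    print("ok")
```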
There are no frequently asked questions about this product yet. If you have a question, please contact our customer service.
