I spent 8 hours understanding Apache Spark’s memory management | by Vu Trinh | Feb, 2026 | Medium


I spent 8 hours understanding Apache Spark’s memory management

Here’s everything you need to know


Vu Trinh


Intro

In 2009, UC Berkeley’s AMPLab developed Spark.

At that time, MapReduce was the go-to choice for processing massive datasets across multiple machines. AMPLab observed that cluster computing had significant potential.

However, MapReduce made building large applications inefficient, especially for machine learning (ML) tasks requiring multiple data passes.

For example, an ML algorithm might need to make many passes over the data. With MapReduce, each pass must be written as a separate job and launched individually on the cluster.

To address this, they created Spark. Unlike MapReduce, which writes data to disk after every task, Spark relies on in-memory processing.

With a friendlier API, support for a wide range of use cases, and, above all, efficient in-memory processing, Spark has gained increasing attention and become the dominant solution in data processing.

But do you know how Spark manages memory?

This week, I will try to answer that question. We will revisit some Spark basics before diving into Spark’s memory management.

A Spark Application


