Running Spark on Amazon Web Services (AWS)

When you search the web for ways to run Apache Spark on AWS infrastructure, you will most likely be redirected to the documentation of AWS EMR (Elastic MapReduce), Amazon's Hadoop distribution tailored to the AWS cloud environment. It is quite an easy way to deploy your data pipelines, but bootstrapping a huge cluster just to perform a simple ad-hoc analysis can be a cumbersome task. As the saying goes:

"to a man with a hammer everything looks like a nail" :)

and we fell into this trap with EMR once.

The article below describes two other ways of running Apache Spark jobs on AWS-managed infrastructure, AWS Glue and AWS Fargate, that we use in our clients' data warehousing projects. It covers the key differences between these methods in terms of flexibility and pricing, and shows why there is no place for a "one service fits all" approach in the AWS world.

Check it out!

big data

spark

AWS

Amazon Web Services

Last updated: 18 December 2019

Written by

Mariusz Strzelecki

Data Engineer

