Amazon EMR provides a managed service to easily run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtime for Spark and Presto includes optimizations that provide over two times the performance of open-source Apache Spark and Presto.
With Amazon EMR release 6.7, you can now use Amazon Elastic Compute Cloud (Amazon EC2) C7g instances, which use AWS Graviton3 processors. These instances improve the price-performance of running Spark workloads on Amazon EMR by 7.93–13.35% over previous generation instances, depending on the instance size. In this post, we describe how we estimated the price-performance benefit.
Amazon EMR runtime performance with EC2 C7g instances
We ran TPC-DS 3 TB benchmark queries on Amazon EMR 6.9 using the Amazon EMR runtime for Apache Spark (compatible with Apache Spark 3.3) with C7g instances. Data was stored in Amazon Simple Storage Service (Amazon S3), and results were compared to equivalent C6g clusters from the previous generation instance family. We measured performance improvements using the total query runtime and the geometric mean of the query runtime across the TPC-DS 3 TB benchmark queries.
Our results showed a 13.65–18.73% improvement in total query runtime and a 16.98–20.28% improvement in geometric mean on EMR clusters with C7g compared to equivalent EMR clusters with C6g instances, depending on the instance size. In comparing costs, we observed a 7.93–13.35% reduction in cost on the EMR cluster with C7g compared to the equivalent cluster with C6g, depending on the instance size. We did not benchmark the C6g xlarge instance because it didn’t have sufficient memory to run the queries.
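As a rough illustration of the two performance metrics described above, the following Python sketch computes the percentage improvement in total runtime and in geometric mean runtime. The helper names `geometric_mean` and `improvement_pct` and the per-query runtimes are hypothetical and chosen only for illustration; the aggregate figures at the end are taken from the 16 XL column of the results table.

```python
import math


def geometric_mean(runtimes):
    """Geometric mean of positive runtimes, computed through logarithms."""
    return math.exp(sum(math.log(t) for t in runtimes) / len(runtimes))


def improvement_pct(baseline, candidate):
    """Percentage by which candidate is lower than baseline."""
    return (baseline - candidate) / baseline * 100


# Hypothetical per-query runtimes in seconds (illustrative, not benchmark data).
c6g_queries = [10.0, 40.0, 160.0]
c7g_queries = [8.0, 32.0, 128.0]

total_gain = improvement_pct(sum(c6g_queries), sum(c7g_queries))
geomean_gain = improvement_pct(
    geometric_mean(c6g_queries), geometric_mean(c7g_queries)
)

# Aggregate values reported for the 16 XL clusters in the results table.
table_total_gain = improvement_pct(2774.86205, 2396.22799)
table_geomean_gain = improvement_pct(22.2113, 18.43905)

print(f"total: {table_total_gain:.2f}%, geometric mean: {table_geomean_gain:.2f}%")
```

The total runtime is dominated by the longest queries, whereas the geometric mean weights every query's relative speedup equally, which is one reason to report both.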
The following table shows the results from running the TPC-DS 3 TB benchmark queries using Amazon EMR 6.9 on equivalent C7g and C6g instance EMR clusters.
| Instance Size | 16 XL | 12 XL | 8 XL | 4 XL | 2 XL |
| --- | --- | --- | --- | --- | --- |
| Total size of the cluster (1 leader + 5 core nodes) | 6 | 6 | 6 | 6 | 6 |
| Total query runtime on C6g (seconds) | 2774.86205 | 2752.84429 | 3173.08086 | 5108.45489 | 8697.08117 |
| Total query runtime on C7g (seconds) | 2396.22799 | 2336.28224 | 2698.72928 | 4151.85869 | 7249.58148 |
| Total query runtime improvement with C7g | 13.65% | 15.13% | 14.95% | 18.73% | 16.64% |
| Geometric mean query runtime on C6g (seconds) | 22.2113 | 21.75459 | 23.38081 | 31.97192 | 45.41656 |
| Geometric mean query runtime on C7g (seconds) | 18.43905 | 17.65898 | 19.01684 | 25.48695 | 37.43737 |
| Geometric mean query runtime improvement with C7g | 16.98% | 18.83% | 18.66% | 20.28% | 17.57% |
| EC2 C6g instance price ($ per hour) | $2.1760 | $1.6320 | $1.0880 | $0.5440 | $0.2720 |
| EMR C6g instance price ($ per hour) | $0.5440 | $0.4080 | $0.2720 | $0.1360 | $0.0680 |
| (EC2 + EMR) C6g instance price ($ per hour) | $2.7200 | $2.0400 | $1.3600 | $0.6800 | $0.3400 |
| Cost of running on C6g ($ per instance) | $2.09656 | $1.55995 | $1.19872 | $0.96493 | $0.82139 |
| EC2 C7g instance price ($ per hour) | $2.3200 | $1.7400 | $1.1600 | $0.5800 | $0.2900 |
| EMR C7g instance price ($ per hour) | $0.5800 | $0.4350 | $0.2900 | $0.1450 | $0.0725 |
| (EC2 + EMR) C7g instance price ($ per hour) | $2.9000 | $2.1750 | $1.4500 | $0.7250 | $0.3625 |
| Cost of running on C7g ($ per instance) | $1.930290 | $1.411500 | $1.086990 | $0.836140 | $0.729990 |
| Total cost reduction with C7g including performance improvement | -7.93% | -9.52% | -9.32% | -13.35% | -11.13% |
The following graph shows per-query improvements observed on C7g 2xlarge instances compared to equivalent C6g instances.

Benchmarking methodology
The benchmark used in this post is derived from the industry-standard TPC-DS benchmark, and uses queries from the Spark SQL Performance Tests GitHub repo with the following fixes applied.
We calculated TCO by multiplying the cost per hour by the number of instances in the cluster and the time taken to run the queries on the cluster. We used on-demand pricing in the US East (N. Virginia) Region for all instances.
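The cost calculation described above can be sketched in a few lines of Python. This is a minimal sketch, not the exact tooling used for the benchmark: the function name `run_cost` is hypothetical, and the prices and runtimes are the 16 XL values from the results table.

```python
def run_cost(price_per_hour, runtime_seconds, instances=1):
    """On-demand cost in dollars: hourly price x instance count x runtime in hours."""
    return price_per_hour * instances * runtime_seconds / 3600


# 16 XL column of the results table: combined (EC2 + EMR) hourly price
# per instance and total query runtime in seconds.
c6g_price, c6g_runtime = 2.7200, 2774.86205
c7g_price, c7g_runtime = 2.9000, 2396.22799

# Per-instance cost, matching the "Cost of running" rows of the table.
c6g_per_instance = run_cost(c6g_price, c6g_runtime)
c7g_per_instance = run_cost(c7g_price, c7g_runtime)

# Cost for the whole cluster (1 leader + 5 core nodes = 6 instances).
c6g_cluster = run_cost(c6g_price, c6g_runtime, instances=6)
c7g_cluster = run_cost(c7g_price, c7g_runtime, instances=6)

# Relative change in cost when moving from C6g to C7g (negative = cheaper).
change_pct = (c7g_cluster - c6g_cluster) / c6g_cluster * 100

print(f"C6g: ${c6g_per_instance:.5f}, C7g: ${c7g_per_instance:.5f}, change: {change_pct:.2f}%")
```

Because both clusters have the same number of instances, the instance count cancels out of the relative change, so the per-instance and per-cluster comparisons give the same percentage. The C7g hourly price is higher, but the shorter runtime more than offsets it.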
Conclusion
In this post, we described how we estimated the cost-performance benefit of using Amazon EMR with C7g instances compared to using equivalent previous generation instances. Using these new instances with Amazon EMR improves cost-performance by an additional 7–13%.
About the authors
Al MS is a product manager for Amazon EMR at Amazon Web Services.
Kyeonghyun Ryoo is a Software Development Engineer for EMR at Amazon Web Services. He primarily works on designing and building automation tools for internal teams and customers to maximize their productivity. Outside of work, he’s a retired world champion in professional gaming who still enjoys playing video games.
Yuzhou Sun is a Software Development Engineer for EMR at Amazon Web Services.
Steve Koonce is an Engineering Manager for EMR at Amazon Web Services.