Amazon EMR provides a managed service to easily run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtime for Spark and Presto includes optimizations that provide more than 2x performance improvements over open-source Apache Spark and Presto.
With Amazon EMR release 6.8, you can now use Amazon Elastic Compute Cloud (Amazon EC2) instances such as M6a and C6a, which use third-generation AMD EPYC processors. These instances improve the price-performance of running Spark workloads on Amazon EMR by 15–50 percent over previous-generation instances. In this blog post, we describe how we estimated this price-performance benefit.
Amazon EMR runtime performance with EC2 M6a instances
We ran TPC-DS 3 TB benchmark queries on Amazon EMR 6.8 using the Amazon EMR runtime for Apache Spark (compatible with Apache Spark 3.3) with M6a instances. Data was stored in Amazon Simple Storage Service (Amazon S3), and results were compared to equivalent clusters with M5a, the previous-generation instance family. We measured performance improvements using the total query runtime and the geometric mean of query runtime across the TPC-DS 3 TB benchmark queries.
Our results showed a 23.6–50.3 percent improvement in total query runtime and a 22.8–52.4 percent improvement in geometric mean query runtime on an EMR cluster with M6a instances compared to an equivalent EMR cluster with M5a instances. Comparing costs, we observed a 23.2–41.4 percent cost reduction on the EMR cluster with M6a compared to the equivalent cluster with M5a. M6a 48 XL and 32 XL instances weren't benchmarked because the M5a generation doesn't offer equivalent sizes.
The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 on equivalent EMR clusters of M6a and M5a instances.
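The two metrics can be sketched as follows, assuming you already have a list of per-query runtimes in seconds. The function names are hypothetical and not part of any EMR tooling. The geometric mean is computed in log space, and it is often reported alongside the total because it weights each query's relative change equally, so a few long-running queries don't dominate it.

```python
import math


def total_runtime(runtimes_s):
    """Sum of per-query runtimes, in seconds."""
    return sum(runtimes_s)


def geometric_mean(runtimes_s):
    """Geometric mean of per-query runtimes, computed in log space to avoid overflow."""
    if not runtimes_s or any(r <= 0 for r in runtimes_s):
        raise ValueError("runtimes must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in runtimes_s) / len(runtimes_s))


# Toy per-query runtimes in seconds (illustrative only, not benchmark data).
toy = [1.0, 4.0, 16.0]
print(total_runtime(toy))
print(round(geometric_mean(toy), 6))
```

On Python 3.8 and later, `statistics.geometric_mean` from the standard library computes the same quantity.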
| Instance size | 24 XL | 16 XL | 12 XL | 8 XL | 4 XL | 2 XL | XL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Total cluster size (1 leader + 5 core nodes) | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Total query runtime on M5a (seconds) | 6624.1713838714 | 5466.7251180433 | 5269.0578151495 | 5366.1486275129 | 7753.6218015794 | 12118.0922180235 | 21070.6905510002 |
| Total query runtime on M6a (seconds) | 3295.2894058371 | 3063.7807673078 | 3399.1509249577 | 3482.8401591909 | 4906.2216891762 | 9184.4366036450 | 16107.9707619002 |
| Total query runtime improvement with M6a | 50.25% | 43.96% | 35.49% | 35.10% | 36.72% | 24.21% | 23.55% |
| Geometric mean query runtime on M5a (seconds) | 51.1422829354 | 40.9550798753 | 38.4890223194 | 35.3863834186 | 44.8454957416 | 61.0454658020 | 92.6414502105 |
| Geometric mean query runtime on M6a (seconds) | 24.3406154481 | 22.3484713891 | 22.9913163520 | 23.0351017440 | 28.2855683398 | 46.4363267349 | 71.5498816854 |
| Geometric mean query runtime improvement with M6a | 52.41% | 45.43% | 40.27% | 34.90% | 36.93% | 23.93% | 22.77% |
| EC2 M5a instance price ($ per hour) | $4.12800 | $2.75200 | $2.06400 | $1.37600 | $0.68800 | $0.34400 | $0.17200 |
| EMR M5a instance price ($ per hour) | $0.27000 | $0.27000 | $0.27000 | $0.27000 | $0.17200 | $0.08600 | $0.04300 |
| (EC2 + EMR) M5a instance price ($ per hour) | $4.39800 | $3.02200 | $2.33400 | $1.64600 | $0.86000 | $0.43000 | $0.21500 |
| Cost of running on M5a ($ per instance) | $8.09253 | $4.58901 | $3.41611 | $2.45352 | $1.85225 | $1.44744 | $1.25839 |
| EC2 M6a instance price ($ per hour) | $4.14720 | $2.76480 | $2.07360 | $1.38240 | $0.69120 | $0.34560 | $0.17280 |
| EMR M6a instance price ($ per hour) | $1.03680 | $0.69120 | $0.51840 | $0.34560 | $0.17280 | $0.08640 | $0.04320 |
| (EC2 + EMR) M6a instance price ($ per hour) | $5.18400 | $3.45600 | $2.59200 | $1.72800 | $0.86400 | $0.43200 | $0.21600 |
| Cost of running on M6a ($ per instance) | $4.74522 | $2.94123 | $2.44739 | $1.67176 | $1.17749 | $1.10213 | $0.96648 |
| Total cost reduction with M6a, including performance improvement | -41.36% | -35.91% | -28.36% | -31.86% | -36.43% | -23.86% | -23.20% |
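The derived rows of the table can be reproduced from its raw rows. The following is a minimal sketch using the 24 XL column. The helper names are hypothetical, and the per-instance cost is taken to be the combined EC2 + EMR hourly price multiplied by the runtime in hours, which matches the table's values.

```python
def pct_improvement(before, after):
    """Percentage reduction from `before` to `after` (positive means faster or cheaper)."""
    return (before - after) / before * 100


def run_cost(price_per_hour, runtime_s):
    """Cost of one instance for the benchmark run: hourly price x runtime in hours."""
    return price_per_hour * runtime_s / 3600


# 24 XL column of the M5a vs. M6a table.
m5a_runtime, m6a_runtime = 6624.1713838714, 3295.2894058371
m5a_price = 4.128 + 0.27     # EC2 + EMR, $ per hour
m6a_price = 4.1472 + 1.0368  # EC2 + EMR, $ per hour

m5a_cost = run_cost(m5a_price, m5a_runtime)
m6a_cost = run_cost(m6a_price, m6a_runtime)
print(f"runtime improvement: {pct_improvement(m5a_runtime, m6a_runtime):.2f}%")  # 50.25%
print(f"cost on M5a: ${m5a_cost:.5f}, cost on M6a: ${m6a_cost:.5f}")        # $8.09253, $4.74522
print(f"cost change: {-pct_improvement(m5a_cost, m6a_cost):.2f}%")         # -41.36%
```

The cost reduction is computed on the cost of the whole run rather than on the hourly price. That is why M6a comes out cheaper even though its combined hourly price is higher than M5a's.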
The following graph shows per-query improvements observed on M6a 2 XL instances compared to the equivalent M5a generation. We observed that two queries take longer to run on M6a instance clusters than on M5a instance clusters: Q91 regressed by up to 6.64 percent and Q55 regressed by up to 1.86 percent on 4 XL instance clusters.
Amazon EMR runtime performance with EC2 R6a instances
R6a instances showed a similar performance improvement when running Apache Spark workloads compared to equivalent R5a instances. R6a 32 XL and 48 XL instances weren't benchmarked because R5a instances aren't available in 32 XL and 48 XL sizes. Across seven instance sizes within the family, our results showed a 16–58.22 percent improvement in total query runtime and a 20.04–59.59 percent improvement in geometric mean query runtime. Comparing costs, we observed a 15.85–50.07 percent cost reduction on R6a EMR clusters compared to R5a EMR clusters.
The following table shows the results from running TPC-DS 3 TB benchmark queries using Amazon EMR 6.8 on equivalent EMR clusters of R6a and R5a instances.
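Regressions like the ones on Q91 and Q55 can be flagged by comparing per-query runtimes between two clusters. The following is a minimal sketch, assuming runtimes are stored in dictionaries keyed by query name. The function names and sample values are hypothetical and are not the benchmark data.

```python
def per_query_change(baseline_s, candidate_s):
    """Percent runtime change per query; positive means slower on the candidate (a regression)."""
    return {
        q: (candidate_s[q] - baseline_s[q]) / baseline_s[q] * 100
        for q in baseline_s
        if q in candidate_s
    }


def regressions(baseline_s, candidate_s):
    """Queries that ran longer on the candidate cluster, worst regression first."""
    changes = per_query_change(baseline_s, candidate_s)
    return sorted(
        ((q, pct) for q, pct in changes.items() if pct > 0),
        key=lambda item: item[1],
        reverse=True,
    )


# Hypothetical per-query runtimes in seconds (illustrative only, not benchmark data).
m5a = {"q1": 10.0, "q2": 20.0, "q3": 5.0}
m6a = {"q1": 6.0, "q2": 21.0, "q3": 5.5}
for query, pct in regressions(m5a, m6a):
    print(f"{query}: +{pct:.2f}%")
```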
| Instance size | 24 XL | 16 XL | 12 XL | 8 XL | 4 XL | 2 XL | XL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Total cluster size (1 leader + 5 core nodes) | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Total query runtime on R5a (seconds) | 6934.22936 | 5530.74672 | 5834.32344 | 5718.72582 | 7615.58392 | 11431.37368 | 20688.58642 |
| Total query runtime on R6a (seconds) | 2897.44817 | 2906.49952 | 3017.85315 | 3488.83875 | 4661.32856 | 7717.33575 | 17378.49043 |
| Total query runtime improvement with R6a | 58.22% | 47.45% | 48.27% | 38.99% | 38.79% | 32.49% | 16.00% |
| Geometric mean query runtime on R5a (seconds) | 53.27574 | 41.76973 | 42.50324 | 37.62155 | 44.58173 | 58.88182 | 91.72095 |
| Geometric mean query runtime on R6a (seconds) | 21.52803 | 21.36831 | 19.94607 | 21.59493 | 26.90097 | 36.57557 | 73.3405 |
| Geometric mean query runtime improvement with R6a | 59.59% | 48.84% | 53.07% | 42.60% | 39.66% | 37.88% | 20.04% |
| EC2 R5a instance price ($ per hour) | $5.42400 | $3.61600 | $2.71200 | $1.80800 | $0.90400 | $0.45200 | $0.22600 |
| EMR R5a instance price ($ per hour) | $0.27000 | $0.27000 | $0.27000 | $0.27000 | $0.22600 | $0.11300 | $0.05700 |
| (EC2 + EMR) R5a instance price ($ per hour) | $5.69400 | $3.88600 | $2.98200 | $2.07800 | $1.13000 | $0.56500 | $0.28300 |
| Cost of running on R5a ($ per instance) | $10.96764 | $5.97013 | $4.83276 | $3.30098 | $2.39045 | $1.79409 | $1.62635 |
| EC2 R6a instance price ($ per hour) | $5.44320 | $3.62880 | $2.72160 | $1.81440 | $0.90720 | $0.45360 | $0.22680 |
| EMR R6a instance price ($ per hour) | $1.36080 | $0.90720 | $0.68040 | $0.45360 | $0.22680 | $0.11340 | $0.05670 |
| (EC2 + EMR) R6a instance price ($ per hour) | $6.80400 | $4.53600 | $3.40200 | $2.26800 | $1.13400 | $0.56700 | $0.28350 |
| Cost of running on R6a ($ per instance) | $5.47618 | $3.66219 | $2.85187 | $2.19797 | $1.46832 | $1.21548 | $1.36856 |
| Total cost reduction with R6a, including performance improvement | -50.07% | -38.66% | -40.99% | -33.41% | -38.58% | -32.25% | -15.85% |
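The ranges quoted for R6a are the smallest and largest per-size values in the table. The following sketch recovers them, along with the instance size that produces each extreme, from the improvement and cost rows. The values are copied from the table. The helper name `summarize` is hypothetical.

```python
# Per-size results copied from the R6a vs. R5a table.
SIZES = ["24 XL", "16 XL", "12 XL", "8 XL", "4 XL", "2 XL", "XL"]
runtime_gain = [58.22, 47.45, 48.27, 38.99, 38.79, 32.49, 16.00]
geomean_gain = [59.59, 48.84, 53.07, 42.60, 39.66, 37.88, 20.04]
cost_change = [-50.07, -38.66, -40.99, -33.41, -38.58, -32.25, -15.85]


def summarize(values):
    """Return ((size, low_pct), (size, high_pct)) using absolute percentages."""
    pairs = [(size, abs(v)) for size, v in zip(SIZES, values)]
    low = min(pairs, key=lambda p: p[1])
    high = max(pairs, key=lambda p: p[1])
    return low, high


for label, values in [("total runtime improvement", runtime_gain),
                      ("geometric mean improvement", geomean_gain),
                      ("cost reduction", cost_change)]:
    (low_size, low), (high_size, high) = summarize(values)
    print(f"{label}: {low:.2f} ({low_size}) to {high:.2f} ({high_size}) percent")
```

Taking absolute values keeps the cost range expressed as a reduction, for example 15.85–50.07 percent.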
Benchmarking methodology
The benchmark used in this post is derived from the industry-standard TPC-DS benchmark and uses queries from the Spark SQL Performance Tests GitHub repo with the following fixes applied.
We calculated TCO by multiplying the cost per hour by the number of instances in the cluster and by the time taken to run the queries on the cluster. We used on-demand pricing in the US East (N. Virginia) Region for all instances.
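The following worked example applies the TCO formula to the M5a 24 XL cluster. It assumes that all six nodes (1 leader + 5 core) use the same instance type and that the hourly cost is the combined EC2 + EMR on-demand price from the M5a/M6a table. The function name `cluster_tco` is hypothetical.

```python
def cluster_tco(ec2_price_per_hour, emr_price_per_hour, instance_count, runtime_s):
    """On-demand TCO: (EC2 + EMR hourly price) x number of instances x runtime in hours."""
    hourly_per_instance = ec2_price_per_hour + emr_price_per_hour
    return hourly_per_instance * instance_count * runtime_s / 3600


# M5a 24 XL: 6-node cluster, total query runtime in seconds from the table.
tco = cluster_tco(4.128, 0.27, 6, 6624.1713838714)
print(f"cluster TCO: ${tco:.2f}")

# With instance_count=1 this gives the table's "$ per instance" cost (about $8.09253).
per_instance = cluster_tco(4.128, 0.27, 1, 6624.1713838714)
print(f"per instance: ${per_instance:.5f}")
```

Every compared cluster has the same six nodes, so multiplying by the instance count scales both sides of each comparison equally. The cost-reduction percentages are therefore the same whether they are computed per instance or per cluster.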
Conclusion
In this post, we described how we estimated the price-performance benefit of using Amazon EMR with M6a and R6a instances compared to equivalent previous-generation instances. Using these new instances with Amazon EMR improves price-performance by 15–50%.
About the authors
Al MS is a product manager for Amazon EMR at Amazon Web Services.
Kyeonghyun Ryoo is a Software Development Engineer for EMR at Amazon Web Services. He primarily works on designing and building automation tools for internal teams and customers to maximize their productivity. Outside of work, he's a retired world champion in professional gaming who still enjoys playing video games.
