Big Data

Analyze Amazon S3 storage costs using AWS Cost and Usage Reports, Amazon S3 Inventory, and Amazon Athena



Since its launch in 2006, Amazon Simple Storage Service (Amazon S3) has experienced major growth, supporting multiple use cases such as hosting websites, creating data lakes, serving as object storage for consumer applications, storing logs, and archiving data. As application portfolios grow, customers tend to store data from multiple applications and different business functions in a single S3 bucket, which can grow the storage in S3 buckets to hundreds of TBs. The AWS Billing console provides a way to look at the total storage cost of data stored in Amazon S3, but IT organizations often need to understand the breakdown of costs in a particular S3 bucket by prefixes or objects corresponding to a particular user or application. There are many reasons to analyze the costs of S3 buckets, such as determining the spend breakdown, doing internal chargebacks, and understanding the cost breakdown by business unit and application. As of this writing, there is no easy way to break down the cost of an S3 bucket by objects and prefixes.

In this post, we discuss a solution that uses Amazon Athena to query AWS Cost and Usage Reports and Amazon S3 Inventory reports to analyze the cost by prefixes and objects in an S3 bucket.

Overview of solution

The following figure shows the architecture for this solution. First, we enable the AWS Cost and Usage Reports (AWS CUR) and Amazon S3 Inventory features, which save their output into two separate pre-created S3 buckets. We then use Athena to query these S3 buckets for AWS CUR data and S3 object inventory data to correlate and allocate the cost breakdown at the object or prefix level.

architecture diagram

To implement the solution, we complete the following steps:

  1. Create S3 buckets for AWS CUR, the S3 object inventory, and Athena results. Alternatively, you can create these buckets when enabling the respective individual features, but for the purpose of this post, we create all of them at the start.
  2. Enable Cost and Usage Reports.
  3. Enable the Amazon S3 Inventory configuration.
  4. Create AWS Glue Data Catalog tables for the CUR and S3 object inventory to query using Athena.
  5. Run queries in Athena.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Create S3 buckets

Amazon S3 is an object storage service offering industry-leading scalability, data availability, security, and performance. Customers of all sizes and industries can store and protect any amount of data for virtually any use case, such as data lakes, cloud-native applications, and mobile apps. With cost-effective storage classes and easy-to-use management features, you can optimize costs, organize data, and configure fine-tuned access controls to meet specific business, organizational, and compliance requirements.

For this post, we use the S3 bucket s3-object-cost-allocation as the primary bucket for cost allocation. This S3 bucket is conveniently modeled to contain multiple prefixes and objects of varying sizes for which cost allocation needs to be performed based on the overall cost of the bucket. In a real-world scenario, you should use a bucket that holds data for multiple teams and for which you need to allocate costs by prefix or object. Going forward, we refer to this bucket as the primary object bucket.

The following screenshot shows our S3 bucket and folders.

example Folders created

Now let's create the three additional operational S3 buckets to store the datasets generated to calculate costs for the objects. You can create the following buckets or use any existing buckets as needed:

  • cur-cost-usage-reports-<account_number> – This bucket is used to save the Cost and Usage Reports for the account.
  • s3-inventory-configurations-<account_number> – This bucket is used to save the inventory configurations of our primary object bucket.
  • athena-query-bucket-<account_number> – This bucket is used to save the query results from Athena.

Complete the following steps to create your S3 buckets:

  • On the Amazon S3 console, choose Buckets in the navigation pane.
  • Choose Create bucket.
  • For Bucket name, enter the name of your bucket (cur-cost-usage-reports-<account_number>).
  • For AWS Region, choose your preferred Region.
  • Leave all other settings at their defaults (or configure them according to your organization's standards).
  • Choose Create bucket.
    create s3 bucket
  • Repeat these steps to create s3-inventory-configurations-<account_number> and athena-query-bucket-<account_number>.

Enable Cost and Usage Reports

The AWS Cost and Usage Reports (AWS CUR) contain the most comprehensive set of cost and usage data available. You can use Cost and Usage Reports to publish your AWS billing reports to an S3 bucket that you own. You can receive reports that break down your costs by the hour, day, or month; by product or product resource; or by tags that you define yourself.

Complete the following steps to enable Cost and Usage Reports for your account:

  • On the AWS Billing console, in the navigation pane, choose Cost & Usage Reports.
  • Choose Create report.
  • For Report name, enter a name for your report, such as account-cur-s3.
  • For Additional report details, select Include resource IDs to include the IDs of each individual resource in the report. Including resource IDs creates individual line items for each of your resources. This can increase the size of your Cost and Usage Reports files significantly, which can affect the S3 storage costs for your CUR, depending on your AWS usage. We need this feature enabled for this post.
  • For Data refresh settings, select whether you want the Cost and Usage Reports to refresh if AWS applies refunds, credits, or support fees to your account after finalizing your bill. When a report refreshes, a new report is uploaded to Amazon S3.
  • Choose Next.
  • For S3 bucket, choose Configure.
  • For Configure S3 Bucket, select the bucket created in the previous section (cur-cost-usage-reports-<account_number>) and choose Next.
  • Review the bucket policy, select I have confirmed that this policy is correct, and choose Save. This default bucket policy gives Cost and Usage Reports access to write data to Amazon S3.
  • For Report path prefix, enter cur-data/account-cur-daily.
  • For Time granularity, choose Daily.
  • For Report versioning, choose Overwrite existing report.
  • For Enable report data integration for, select Amazon Athena.
  • Choose Next.
  • After you have reviewed the settings for your report, choose Review and Complete.
    create cost and usage report

The Cost and Usage Reports will be delivered to the S3 bucket within 24 hours.
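If you automate this setup, an equivalent report can be created with the AWS CLI's `aws cur put-report-definition` command. The following is a minimal sketch under the settings chosen above; the account number in the bucket name and the Region are placeholders, and note that the CUR API endpoint is only available in us-east-1.

```shell
# Sketch: create the same daily, resource-level, Athena-integrated CUR report.
# Replace the S3Bucket account number and S3Region with your own values.
aws cur put-report-definition --region us-east-1 --report-definition '{
  "ReportName": "account-cur-s3",
  "TimeUnit": "DAILY",
  "Format": "Parquet",
  "Compression": "Parquet",
  "AdditionalSchemaElements": ["RESOURCES"],
  "S3Bucket": "cur-cost-usage-reports-123456789012",
  "S3Prefix": "cur-data/account-cur-daily",
  "S3Region": "us-east-1",
  "AdditionalArtifacts": ["ATHENA"],
  "RefreshClosedReports": true,
  "ReportVersioning": "OVERWRITE_REPORT"
}'
```

The Athena integration requires Parquet format and the overwrite versioning mode, which is why those values are fixed here.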

The following sample CUR in CSV format shows different columns of the Cost and Usage Report, including bill_invoice_id, bill_invoicing_entity, bill_payer_account_id, and line_item_product_code, to name a few.

sample cost and usage report

Enable the Amazon S3 Inventory configuration

Amazon S3 Inventory is one of the tools Amazon S3 provides to help manage your storage. You can use it to audit and report on the replication and encryption status of your objects for business, compliance, and regulatory needs. Amazon S3 Inventory provides comma-separated values (CSV), Apache Optimized Row Columnar (ORC), or Apache Parquet output files that list your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (objects that have names beginning with a common string).

Complete the following steps to enable Amazon S3 Inventory on the primary object bucket:

  • On the Amazon S3 console, choose Buckets in the navigation pane.
  • Choose the bucket for which you want to configure Amazon S3 Inventory.
    This will be the existing bucket in your account that holds the data to be analyzed, such as your data lake or application S3 bucket. We created the bucket s3-object-cost-allocation with some sample data and folder structure.
  • Choose Management.
  • Under Inventory configurations, choose Create inventory configuration.
  • For Inventory configuration name, enter s3-object-cost-allocation.
  • For Inventory scope, leave Prefix blank.
    This ensures that all objects are covered by the report.
  • For Object versions, select Current version only.
  • For Report details, choose This account.
  • For Destination, choose the destination bucket we created (s3-inventory-configurations-<account_number>).
  • For Frequency, choose Daily.
  • For Output format, choose Apache Parquet.
  • For Status, choose Enable.
  • Keep server-side encryption disabled. To use server-side encryption, choose Enable and specify the encryption key.
  • For Additional fields, select the following to add to the inventory report:
    • Size – The object size in bytes.
    • Last modified date – The object creation date or the last modified date, whichever is the latest.
    • Multipart upload – Specifies that the object was uploaded as a multipart upload. For more information, see Uploading and copying objects using multipart upload.
    • Replication status – The replication status of the object. For more information, see Using the S3 console.
    • Encryption status – The server-side encryption used to encrypt the object. For more information, see Protecting data using server-side encryption.
    • Bucket key status – Indicates whether a bucket-level key generated by AWS KMS applies to the object.
    • Storage class – The storage class used for storing the object.
    • Intelligent-Tiering: Access tier – Indicates the access tier of the object if it was stored in Intelligent-Tiering.
      create s3 inventory
  • Choose Create.
    s3 inventory configuration

It might take up to 48 hours for Amazon S3 to deliver the first report.
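The console steps above correspond to an inventory configuration document that can also be applied with the AWS CLI's `aws s3api put-bucket-inventory-configuration` command. The following JSON is a sketch of that configuration under the settings chosen in this post; the account number in the destination bucket ARN is a placeholder.

```json
{
  "Id": "s3-object-cost-allocation",
  "IsEnabled": true,
  "IncludedObjectVersions": "Current",
  "Schedule": { "Frequency": "Daily" },
  "Destination": {
    "S3BucketDestination": {
      "Bucket": "arn:aws:s3:::s3-inventory-configurations-123456789012",
      "Format": "Parquet"
    }
  },
  "OptionalFields": [
    "Size", "LastModifiedDate", "IsMultipartUploaded", "ReplicationStatus",
    "EncryptionStatus", "BucketKeyStatus", "StorageClass",
    "IntelligentTieringAccessTier"
  ]
}
```

Saved as, for example, inventory.json, it would be applied with `aws s3api put-bucket-inventory-configuration --bucket s3-object-cost-allocation --id s3-object-cost-allocation --inventory-configuration file://inventory.json`.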

Create AWS Glue Data Catalog tables for the CUR and Amazon S3 Inventory reports

Wait up to 48 hours for the previous steps to generate the reports. In this section, we use Athena to create and define AWS Glue Data Catalog tables for the data that has been created by Cost and Usage Reports and Amazon S3 Inventory.

Athena is a serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats. Athena provides a simplified, flexible way to analyze petabytes of data where it lives.

Complete the following steps to create the tables using Athena:

  • Navigate to the Athena console.
  • If you're using Athena for the first time, you need to set up a query result location in Amazon S3. If you preconfigured this in Athena, you can skip this step.
    • Choose View settings.
      athena setup query bucket
    • Choose Manage.
    • In the section Query result location and encryption, choose Browse S3 and select the bucket that we created (athena-query-bucket-<account_number>).
    • Choose Save.
      Athena Config
    • Navigate back to the Athena query editor.
  • Run the following query in Athena to create a table for Cost and Usage Reports. Verify and update the <<LOCATION>> placeholder at the end of the query so that it points to the correct S3 bucket and path. Note that the new table name should be account_cur.
    CREATE EXTERNAL TABLE `account_cur`(
    `identity_line_item_id` string,
    `identity_time_interval` string,
    `bill_invoice_id` string,
    `bill_billing_entity` string,
    `bill_bill_type` string,
    `bill_payer_account_id` string,
    `bill_billing_period_start_date` timestamp,
    `bill_billing_period_end_date` timestamp,
    `line_item_usage_account_id` string,
    `line_item_line_item_type` string,
    `line_item_usage_start_date` timestamp,
    `line_item_usage_end_date` timestamp,
    `line_item_product_code` string,
    `line_item_usage_type` string,
    `line_item_operation` string,
    `line_item_availability_zone` string,
    `line_item_resource_id` string,
    `line_item_usage_amount` double,
    `line_item_normalization_factor` double,
    `line_item_normalized_usage_amount` double,
    `line_item_currency_code` string,
    `line_item_unblended_rate` string,
    `line_item_unblended_cost` double,
    `line_item_blended_rate` string,
    `line_item_blended_cost` double,
    `line_item_line_item_description` string,
    `line_item_tax_type` string,
    `line_item_legal_entity` string,
    `product_product_name` string,
    `product_availability` string,
    `product_description` string,
    `product_durability` string,
    `product_event_type` string,
    `product_fee_code` string,
    `product_fee_description` string,
    `product_free_query_types` string,
    `product_from_location` string,
    `product_from_location_type` string,
    `product_from_region_code` string,
    `product_group` string,
    `product_group_description` string,
    `product_location` string,
    `product_location_type` string,
    `product_message_delivery_frequency` string,
    `product_message_delivery_order` string,
    `product_operation` string,
    `product_platopricingtype` string,
    `product_product_family` string,
    `product_queue_type` string,
    `product_region` string,
    `product_region_code` string,
    `product_servicecode` string,
    `product_servicename` string,
    `product_sku` string,
    `product_storage_class` string,
    `product_storage_media` string,
    `product_to_location` string,
    `product_to_location_type` string,
    `product_to_region_code` string,
    `product_transfer_type` string,
    `product_usagetype` string,
    `product_version` string,
    `product_volume_type` string,
    `pricing_rate_code` string,
    `pricing_rate_id` string,
    `pricing_currency` string,
    `pricing_public_on_demand_cost` double,
    `pricing_public_on_demand_rate` string,
    `pricing_term` string,
    `pricing_unit` string,
    `reservation_amortized_upfront_cost_for_usage` double,
    `reservation_amortized_upfront_fee_for_billing_period` double,
    `reservation_effective_cost` double,
    `reservation_end_time` string,
    `reservation_modification_status` string,
    `reservation_normalized_units_per_reservation` string,
    `reservation_number_of_reservations` string,
    `reservation_recurring_fee_for_usage` double,
    `reservation_start_time` string,
    `reservation_subscription_id` string,
    `reservation_total_reserved_normalized_units` string,
    `reservation_total_reserved_units` string,
    `reservation_units_per_reservation` string,
    `reservation_unused_amortized_upfront_fee_for_billing_period` double,
    `reservation_unused_normalized_unit_quantity` double,
    `reservation_unused_quantity` double,
    `reservation_unused_recurring_fee` double,
    `reservation_upfront_value` double,
    `savings_plan_total_commitment_to_date` double,
    `savings_plan_savings_plan_a_r_n` string,
    `savings_plan_savings_plan_rate` double,
    `savings_plan_used_commitment` double,
    `savings_plan_savings_plan_effective_cost` double,
    `savings_plan_amortized_upfront_commitment_for_billing_period` double,
    `savings_plan_recurring_commitment_for_billing_period` double,
    `resource_tags_user_bucket_name` string,
    `resource_tags_user_cost_tracking` string)
    PARTITIONED BY (
    `year` string,
    `month` string)
    ROW FORMAT SERDE
    'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
    STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
    OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
    LOCATION
    '<<LOCATION>>'
  • Run the following query in Athena to create the table for Amazon S3 Inventory. Verify and update the <<LOCATION>> placeholder at the end of the query so that it points to the correct S3 bucket and path.
    • To get the exact value for the location, navigate to the bucket where the inventory configurations are saved and then to the hive folder path. Use the S3 URI to replace <<LOCATION>> in the query.
      query path location
      CREATE EXTERNAL TABLE s3_object_inventory(
               bucket string,
               key string,
               version_id string,
               is_latest boolean,
               is_delete_marker boolean,
         size bigint,
               last_modified_date bigint,
               storage_class string,
               is_multipart_uploaded boolean,
               replication_status string,
               encryption_status string,
               intelligent_tiering_access_tier string,
               bucket_key_status string
      ) PARTITIONED BY (
              dt string
      )
      ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
        STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
        LOCATION '<<LOCATION>>';
      
  • We need to refresh the partitions and add the new inventory lists to the tables. Use the following commands to load partitions into the CUR table and the Amazon S3 Inventory table:
    MSCK REPAIR TABLE `account_cur`;
    
    MSCK REPAIR TABLE s3_object_inventory;
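The inventory table's dt partition values are strings of the form YYYY-MM-DD-HH-MM (for example, 2023-09-01-01-00), which is why the queries in the next section convert them with date_parse(dt, '%Y-%m-%d-%H-%i'). As a quick sanity check of that format in Python (a hypothetical partition value; Python's strptime uses %M for minutes where Athena's date_parse uses %i):

```python
from datetime import datetime

# Hypothetical dt partition value, as written by a daily S3 Inventory delivery.
dt = "2023-09-01-01-00"

# Equivalent of Athena's date_parse(dt, '%Y-%m-%d-%H-%i').
parsed = datetime.strptime(dt, "%Y-%m-%d-%H-%M")

# The calendar date is what the queries compare against <<YYYY-MM-DD>>.
print(parsed.date())
```

This is why the queries cast '<<YYYY-MM-DD>>' to a timestamp before comparing: both sides become full timestamps at midnight-aligned granularity.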

Run queries in Athena to allocate the cost of objects in an S3 bucket

Now we can query the data we have available to get a cost allocation breakdown at the prefix level.

We need to provide some information in the following queries:

  • Update <<YYYY-MM-DD>> with the date for which you want to analyze the data
  • Update <<prefix>> with the prefix values for your bucket that need to be analyzed
  • Update <<bucket_name>> with the name of the bucket that needs to be analyzed

We use the following part of the query to calculate the size of the storage being used by the target prefix that we want to calculate the cost for:

select date_parse(dt,'%Y-%m-%d-%H-%i') dt, cast(sum(size) as double) targetPrefixBytes
from s3_object_inventory
where date_parse(dt,'%Y-%m-%d-%H-%i') = cast('<<YYYY-MM-DD>>' as timestamp)
and key like '<<prefix>>/%'
group by dt

Next, we calculate the total size of the bucket on that particular date:

select date_parse(dt,'%Y-%m-%d-%H-%i') dt, cast(sum(size) as double) totalBytes
from s3_object_inventory
where date_parse(dt,'%Y-%m-%d-%H-%i') = cast('<<YYYY-MM-DD>>' as timestamp)
group by dt

We query the CUR table to get the cost of a particular bucket on a particular date:

select line_item_usage_start_date as dt, sum(line_item_blended_cost) as line_item_blended_cost
from "account_cur"
where line_item_product_code = 'AmazonS3'
and product_servicecode = 'AmazonS3'
and line_item_operation = 'StandardStorage'
and line_item_resource_id = '<<bucket_name>>'
and line_item_usage_start_date = cast('<<YYYY-MM-DD>>' as timestamp)
group by line_item_usage_start_date

Putting all of this together, we can calculate the cost of a particular prefix (a folder or a file) on a specific date. The complete query is as follows:

with
cost as (select line_item_usage_start_date as dt, sum(line_item_blended_cost) as line_item_blended_cost
from "account_cur"
where line_item_product_code = 'AmazonS3'
and product_servicecode = 'AmazonS3'
and line_item_operation = 'StandardStorage'
and line_item_resource_id = '<<bucket_name>>'
and line_item_usage_start_date = cast('<<YYYY-MM-DD>>' as timestamp)
group by line_item_usage_start_date),
total as (select date_parse(dt,'%Y-%m-%d-%H-%i') dt, cast(sum(size) as double) totalBytes
from s3_object_inventory
where date_parse(dt,'%Y-%m-%d-%H-%i') = cast('<<YYYY-MM-DD>>' as timestamp)
group by dt),
target as (select date_parse(dt,'%Y-%m-%d-%H-%i') dt, cast(sum(size) as double) targetPrefixBytes
from s3_object_inventory
where date_parse(dt,'%Y-%m-%d-%H-%i') = cast('<<YYYY-MM-DD>>' as timestamp)
and key like '<<prefix>>/%'
group by dt)
select target.dt,
(target.targetPrefixBytes / total.totalBytes * 100) percentUsed,
cost.line_item_blended_cost totalCost,
cost.line_item_blended_cost * (target.targetPrefixBytes / total.totalBytes) as prefixCost
from target, total, cost
where target.dt = total.dt
and target.dt = cost.dt

The following screenshot shows the results table for the sample data we used in this post. We get the following information:

  • dt – The date
  • percentUsed – The percentage of prefix space compared to the overall bucket space
  • totalCost – The total cost of the bucket
  • prefixCost – The cost of the space used by the prefix

final result percentage
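The allocation the final query performs is a simple proration: a prefix is charged the bucket's total cost in proportion to the bytes it occupies. A minimal sketch of that arithmetic, using made-up sizes and a made-up daily bucket cost:

```python
def allocate_prefix_cost(total_cost: float, total_bytes: int, prefix_bytes: int):
    """Prorate a bucket's total storage cost to one prefix by its share of bytes.

    Returns (percentUsed, prefixCost), mirroring the columns of the Athena query.
    """
    share = prefix_bytes / total_bytes
    return share * 100, total_cost * share

# Hypothetical numbers: a 400 GiB bucket costing $9.20 for the day,
# with a target prefix holding 100 GiB of it.
percent_used, prefix_cost = allocate_prefix_cost(
    total_cost=9.20,
    total_bytes=400 * 1024**3,
    prefix_bytes=100 * 1024**3,
)
print(percent_used, prefix_cost)  # the prefix holds 25% of the bytes, so it carries 25% of the cost
```

Note that this allocates only what the CUR query selects (StandardStorage cost); request, transfer, and other storage-class charges would need their own line_item_operation filters.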

Clean up

To stop incurring costs, be sure to disable Amazon S3 Inventory and Cost and Usage Reports when you're done.

Delete the S3 buckets created for the Amazon S3 Inventory reports and Cost and Usage Reports to avoid storage costs.

Other methods for Amazon S3 storage analysis

Amazon S3 Storage Lens can provide a single view of object storage usage and activity across your entire Amazon S3 storage. With S3 Storage Lens, you can understand, analyze, and optimize storage with over 29 usage and activity metrics and interactive dashboards that aggregate data for your whole organization, specific accounts, Regions, buckets, or prefixes. All of this data is accessible on the Amazon S3 console or as raw data in an S3 bucket.

S3 Storage Lens doesn't provide cost analysis based on an object or prefix in a single bucket. If you want visibility of storage usage and trends across your entire storage footprint, along with recommendations on cost efficiency and data protection best practices, S3 Storage Lens is the right choice. But if you want a cost analysis of specific S3 buckets and are looking for ways to allocate the cost of S3 objects at the object or prefix level, the solution in this post would be the best fit.

Conclusion

In this post, we detailed how to create a cost breakdown model at the object or prefix level for S3 buckets that contain data for multiple business units and applications. We used Athena to query the reports and datasets produced by the AWS CUR and Amazon S3 Inventory features, which, when correlated, give us the cost allocation at the object and prefix level. This solution gives you an easy way to calculate costs for individual objects and prefixes, which can be used for internal chargebacks or simply to understand the per-object or per-prefix spending in a shared S3 bucket.


About the Authors


Dagar Katyal
is a Senior Solutions Architect at AWS, based in Chicago, Illinois. He works with customers and provides guidance on key strategic initiatives important for their business. Dagar has an MBA and has spent over 15 years working with customers on analytics strategy, roadmaps, and using data as a key differentiator. When not working with customers, Dagar spends time with his family and works on home improvement projects.


Saiteja Pudi
is a Solutions Architect at AWS, based in Dallas, TX. He has been with AWS for more than 3 years, helping customers derive the true potential of AWS by being their trusted advisor. He comes from an application development background and is passionate about data science and machine learning.
