Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that offers simple operations and high performance. It makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Today, Amazon Redshift is the most widely used cloud data warehouse.
Dynamic data masking (DDM) support (preview) in Amazon Redshift simplifies the process of protecting sensitive data in your Amazon Redshift data warehouse. You can now use DDM to protect data based on job role or permission rights and on the level of data sensitivity, all through a SQL interface. DDM support (preview) in Amazon Redshift lets you hide, obfuscate, or pseudonymize column values within the tables in your data warehouse without incurring additional storage costs. It’s configurable, so you can define consistent, format-preserving, and irreversible masked data values.
DDM support (preview) in Amazon Redshift provides a native feature for masking data to meet regulatory or compliance requirements, or to raise internal privacy standards. With static data masking, the underlying data at rest is permanently replaced or redacted. In contrast, DDM support (preview) in Amazon Redshift temporarily changes how sensitive data is displayed in transit at query time, based on user privilege, and leaves the original data at rest intact. You control access to data through masking policies that apply custom obfuscation rules to a given user or role. That way, you can respond to changing privacy requirements without altering the underlying data or editing SQL queries.
With DDM support (preview) in Amazon Redshift, you can do the following:
- Define masking policies that apply custom obfuscation rules (for example, masking policies to handle credit card numbers, PII entries, HIPAA or GDPR needs, and more)
- Transform the data at query time to apply masking policies
- Attach masking policies to roles or users
- Attach multiple masking policies with varying levels of obfuscation to the same column in a table and assign them to different roles with priorities to avoid conflicts
- Implement cell-level masking by using conditional columns when creating your masking policy
- Use masking policies to partially or completely redact data, or hash it by using user-defined functions (UDFs)
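To make the idea of a consistent, irreversible masked value more concrete, here is a minimal sketch in plain Python of a hash-based pseudonymization rule. It is not Redshift code and not the masking function used in this post: the function name pseudonymize, the salt value, and the 16-character truncation are all assumptions chosen for illustration.

```python
import hashlib


def pseudonymize(value, salt="demo-salt"):
    """Return a consistent token for a sensitive value (hypothetical helper).

    The same input always yields the same token, so masked values can still
    be grouped or counted, but the original value is not stored in the token.
    """
    if value is None:
        return None
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    # Truncating to 16 hex characters is an arbitrary choice for readability.
    return digest[:16]


# Made-up sample value; the same input produces the same token every time.
token_a = pseudonymize("4532015112830366")
token_b = pseudonymize("4532015112830366")
print(token_a == token_b)
```

A one-way hash only resists guessing if the salt is kept secret, because card numbers come from a small, predictable space. Treat the sketch as a way to reason about the behavior rather than as a production design.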
Here’s what our customers have to say about DDM support (private beta) in Amazon Redshift:
“Baffle delivers data-centric protection for enterprises via a data security platform that is transparent to applications and unique to data security. Our mission is to seamlessly weave data security into every data pipeline. Previously, to apply data masking to an Amazon Redshift data source, we had to stage the data in an Amazon S3 bucket. Now, by utilizing the Amazon Redshift Dynamic Data Masking capability, our customers can protect sensitive data throughout the analytics pipeline, from secure ingestion to responsible consumption, reducing the risk of breaches.”
-Ameesh Divatia, CEO & co-founder of Baffle
“EnergyAustralia is a leading Australian energy retailer and generator, with a mission to lead the clean energy transition for customers in a way that is reliable, affordable and sustainable for all. We enable all corners of our business with Data & Analytics capabilities that are used to optimize business processes and enhance our customers’ experience. Keeping our customers’ data safe is a top priority across our teams. In the past, this involved multiple layers of custom built security policies that could make it cumbersome for analysts to find the data they require. The new AWS dynamic data masking feature will significantly simplify our security processes so we continue to keep customer data safe, while also reducing the administrative overhead.”
-William Robson, Data Solutions Design Lead, EnergyAustralia
Use case
For our use case, a retail company wants to control how credit card numbers are shown to users based on their privileges, without duplicating the data for this purpose. The company has the following requirements:
- Users from Customer Service should be able to view the first six digits and the last four digits of the credit card for customer verification
- Users from Fraud Prevention should be able to view the raw credit card number only if it’s flagged as fraud
- Users from Auditing should be able to view the raw credit card number
- All other users should not be able to view the credit card number
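The four requirements above can be read as four small transformation rules. The following plain-Python sketch models them so the expected output for each group is easy to check by hand. It only illustrates the rules and is not the SQL used in this post. The function names, the "X" masking character, and the sample card number are assumptions.

```python
def mask_partial(card_number):
    """Customer Service: keep the first six and last four characters."""
    if card_number is None:
        return None
    if len(card_number) <= 10:
        # Too short to expose ten characters safely; hide everything.
        return "X" * len(card_number)
    hidden = len(card_number) - 10
    return card_number[:6] + "X" * hidden + card_number[-4:]


def mask_conditional(card_number, is_fraud):
    """Fraud Prevention: raw value only when the row is flagged as fraud."""
    return card_number if is_fraud else None


def mask_none(card_number):
    """Auditing: the raw value is returned unchanged."""
    return card_number


def mask_full(card_number):
    """All other users: no digit is visible."""
    return None if card_number is None else "X" * len(card_number)


sample = "4532015112830366"  # made-up sample value
print(mask_partial(sample))             # 453201XXXXXX0366
print(mask_conditional(sample, False))  # None
print(mask_full(sample))                # XXXXXXXXXXXXXXXX
```

Note that the Fraud Prevention rule needs a second input (the fraud flag), which is what makes it a conditional, cell-level rule rather than a column-wide one.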
Solution overview
The solution involves creating masking policies with varying masking rules and attaching one or more of them to the same role and table, each with an assigned priority to remove potential conflicts. These policies may pseudonymize results or selectively nullify results to comply with the retailer’s security requirements. We refer to multiple masking policies attached to a table as a multi-modal masking policy. A multi-modal masking policy consists of three parts:
- A data masking policy that defines the data obfuscation rules
- Roles with different access levels depending on the business case
- The ability to attach multiple masking policies to a user or role and table combination, with a priority for conflict resolution
The following diagram illustrates how DDM policies in Amazon Redshift work with roles and users for our retail use case.

For a user with multiple roles, the masking policy with the highest attachment priority is used. For example, Ken is part of the Public and FrdPrvnt roles. Because the FrdPrvnt role has a higher attachment priority, card_number_conditional_mask is applied.
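The priority rule can be sketched as a small lookup in plain Python. This models only the conflict-resolution idea described above, not Redshift's internal implementation. The policy name card_number_conditional_mask comes from the example; the name card_number_full_mask, the lowercase role names, and the numeric priorities are assumptions made for illustration.

```python
# (grantee, policy name, attachment priority); the values are hypothetical.
ATTACHMENTS = [
    ("public", "card_number_full_mask", 0),
    ("frdprvnt", "card_number_conditional_mask", 20),
]


def effective_policy(user_roles, attachments=ATTACHMENTS):
    """Return the policy with the highest priority among the user's roles."""
    candidates = [
        (priority, policy)
        for grantee, policy, priority in attachments
        if grantee in user_roles
    ]
    if not candidates:
        return None
    # Tuples compare by priority first, so max() picks the highest priority.
    return max(candidates)[1]


# Ken holds both roles, so the higher-priority attachment wins.
print(effective_policy({"public", "frdprvnt"}))  # card_number_conditional_mask
print(effective_policy({"public"}))              # card_number_full_mask
```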

Prerequisites
To implement this solution, you need to complete the following prerequisites:
- Have an AWS account.
- Have an Amazon Redshift cluster provisioned with DDM support (preview) or a serverless workgroup with DDM support (preview).
- Have superuser privilege, or the sys:secadmin role, on the Amazon Redshift data warehouse created in step 2.
Preparing the data
To set up our use case, complete the following steps:
- On the Amazon Redshift console, choose Query editor v2 in Explorer.
If you’re familiar with SQL Notebooks, you can download the Jupyter notebook for the demonstration and import it to get started quickly.
- Create the table and populate contents.
- Create users.
Implement the solution
To satisfy the security requirements, we need to make sure that each user sees the same data in different ways based on their granted privileges. To do that, we use user roles combined with masking policies as follows:
- Create user roles and grant different users to different roles:
- Create masking policies:
- Attach the masking policies on the table or column to the user or role:
Test the solution
Let’s confirm that the masking policies are created and attached.
- Check that the masking policies are created with the following code:
- Check that the masking policies are attached:
Now we can test that different users see the same data masked differently based on their roles.
- Test that the Customer Service agents can only view the first six digits and the last four digits of the credit card number:
- Test that the Fraud Prevention users can only view the raw credit card number when it’s flagged as fraud:
- Test that Auditor users can view the raw credit card number:
- Test that general users can’t view any digits of the credit card number:
Modify the masking policy
To modify an existing masking policy, you must first detach it from the role and then drop and recreate it.
In our use case, the business changed direction and decided that Customer Service agents should only be allowed to view the last four digits of the credit card number.
- Detach and drop the policy:
- Recreate the policy and reattach the policy on the table or column to the intended user or role. Note that this time we created a scalar Python UDF. You can create a SQL, Python, or Lambda UDF based on your use case.
- Test that Customer Service agents can only view the last four digits of the credit card number:
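The revised Customer Service rule, which shows only the last four digits, can be sketched as a plain Python function. This illustrates the logic only. It is not the UDF from this post, and the function name redact_credit_card, the "X" masking character, and the sample card number are assumptions.

```python
def redact_credit_card(card_number):
    """Hide everything except the last four characters (hypothetical helper)."""
    if card_number is None:
        return None
    if len(card_number) <= 4:
        # Nothing would stay hidden, so mask the whole value instead.
        return "X" * len(card_number)
    return "X" * (len(card_number) - 4) + card_number[-4:]


print(redact_credit_card("4532015112830366"))  # XXXXXXXXXXXX0366
```

The masked value keeps the original length, so downstream reports that depend on the column width are not affected.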
Clean up
When you’re done with the solution, clean up your resources:
- Detach the masking policies from the table:
- Drop the masking policies:
- Revoke and drop each user and role:
- Drop the function and table:
Considerations and best practices
Consider the following:
- Always create a default masking policy attached to the public user. Every new user then has at least a minimum policy attached, which enforces the intended security posture.
- Remember that DDM policies in Amazon Redshift always follow the invoker permissions convention, not definer (for more information, refer to Security and privileges for stored procedures). In other words, masking policies apply based on the user or role running the query.
- For best performance, create the masking functions using a scalar SQL UDF, if possible. In general, SQL UDFs outperform Python UDFs, which in turn outperform scalar Lambda UDFs.
- DDM policies in Amazon Redshift are applied ahead of any predicate or join operations. For example, if you join a masked column (per your access policy) to an unmasked column, the join results in a mismatch. This is expected behavior.
- Always detach a masking policy from all users or roles before dropping it.
- As of this writing, the solution has the following limitations:
  - You can apply a masking policy on tables and columns and attach it to a user or role, but groups aren’t supported.
  - You can’t create a masking policy on views, materialized views, or external tables.
  - DDM support (preview) in Amazon Redshift is available in the following Regions: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), Europe (Ireland), and Europe (Stockholm).
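The join consideration noted in the list above is easier to see with a small model. The following plain-Python sketch simulates a join in which one side is masked before the comparison. It illustrates the ordering effect only and does not reproduce how Amazon Redshift executes queries. The table contents, column names, and helper functions are all assumptions.

```python
def mask_full(value):
    """Hide every character of the value (hypothetical masking rule)."""
    return "X" * len(value)


# Made-up sample rows for two tables that share a card number.
cards = [{"customer_id": 1, "card_number": "4532015112830366"}]
payments = [{"card_number": "4532015112830366", "amount": 25.0}]


def join_on_card(cards, payments, mask=None):
    """Join the two row sets on card_number, masking the cards side first."""
    matched = []
    for card in cards:
        key = mask(card["card_number"]) if mask else card["card_number"]
        for payment in payments:
            if key == payment["card_number"]:
                matched.append((card["customer_id"], payment["amount"]))
    return matched


print(join_on_card(cards, payments))                  # [(1, 25.0)]
print(join_on_card(cards, payments, mask=mask_full))  # []
```

Because the masked value is what gets compared, the second call finds no match even though the underlying data is identical.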
Performance benchmarks
Based on various tests performed on TPC-H datasets, we’ve found built-in functions to be more performant than functions created externally using scalar Python or Lambda UDFs.
Extend the solution
You can take this solution further and set up a masking policy that restricts SSN and email address access as follows:
- Customer Service agents accessing pre-built dashboards may only view the last four digits of SSNs and full email addresses for correspondence
- Analysts can’t view SSNs or email addresses
- Auditing services may access raw values for SSNs as well as email addresses
For more information, refer to Use DDM support (preview) in Amazon Redshift for E-mail & SSN Masking.
Conclusion
In this post, we discussed how to use DDM support (preview) in Amazon Redshift to define configuration-driven, consistent, format-preserving, and irreversible masked data values. With DDM support (preview) in Amazon Redshift, you can control your data masking approach using familiar SQL. You can take advantage of the Amazon Redshift role-based access control capability to implement different levels of data masking. You can create a masking policy to identify which column needs to be masked, and you have the flexibility of choosing how to show the masked data. For example, you can completely hide the data, replace part of the real values with wildcard characters, or define your own way to mask the data using SQL expressions, Python, or Lambda UDFs. Additionally, you can apply conditional masking based on other columns, which selectively protects the column data in a table based on the values in one or more columns.
We encourage you to create your own user-defined functions for various use cases and achieve your desired security posture using dynamic data masking support in Amazon Redshift.
About the Authors
Rohit Vashishtha is a Senior Analytics Specialist Solutions Architect at AWS based in Dallas, TX. He has more than 16 years of experience architecting, building, leading, and maintaining big data platforms. Rohit helps customers modernize their analytic workloads using the breadth of AWS services and ensures that customers get the best price/performance with the utmost security and data governance.
Ahmed Shehata is a Senior Analytics Specialist Solutions Architect at AWS based in Toronto. He has more than 20 years of experience helping customers modernize their data platforms. Ahmed is passionate about helping customers build efficient, performant, and scalable analytic solutions.
Variyam Ramesh is a Senior Analytics Specialist Solutions Architect at AWS based in Charlotte, NC. He is an accomplished technology leader helping customers conceptualize, develop, and deliver innovative analytic solutions.
Yanzhu Ji is a Product Manager on the Amazon Redshift team. She has experience in product vision and strategy in industry-leading data products and platforms. She has outstanding skill in building substantial software products using web development, system design, database, and distributed programming techniques. In her personal life, Yanzhu likes painting, photography, and playing tennis.
James Moore is a Technical Lead at Amazon Redshift focused on SQL features and security. His work over the last 10 years has spanned distributed systems, machine learning, and databases. He is passionate about building scalable software that enables customers to solve real-world problems.