Synchronize your Salesforce and Snowflake information to hurry up your time to perception with Amazon AppFlow

This put up was co-written with Amit Shah, Principal Advisor at Atos.

Clients throughout industries search significant insights from the info captured of their Buyer Relationship Administration (CRM) programs. To attain this, they mix their CRM information with a wealth of data already obtainable of their information warehouse, enterprise programs, or different software program as a service (SaaS) functions. One extensively used method is getting the CRM information into your information warehouse and maintaining it updated by frequent information synchronization.

Integrating third-party SaaS functions is commonly difficult and requires important effort and growth. Builders want to know the appliance APIs, write implementation and check code, and keep the code for future API adjustments. Amazon AppFlow, which is a low-code/no-code AWS service, addresses this problem.

Amazon AppFlow is a completely managed integration service that lets you securely switch information between SaaS functions, like Salesforce, SAP, Zendesk, Slack, and ServiceNow, and AWS providers like Amazon Easy Storage Service (Amazon S3) and Amazon Redshift in only a few clicks. With Amazon AppFlow, you may run information flows at enterprise scale on the frequency you select—on a schedule, in response to a enterprise occasion, or on demand.

On this put up, we give attention to synchronizing your information from Salesforce to Snowflake (on AWS) with out writing code. This put up walks you thru the steps to arrange an information stream to handle full and incremental information load utilizing an instance use case.

Resolution overview

Our use case includes the synchronization of the Account object from Salesforce into Snowflake. On this structure, you utilize Amazon AppFlow to filter and switch the info to your Snowflake information warehouse.

You’ll be able to configure Amazon AppFlow to run your information ingestion in three other ways:

On-demand – You’ll be able to manually run the stream by the AWS Administration Console, API, or SDK name.
Occasion-driven – Amazon AppFlow can subscribe and hear to alter information seize (CDC) occasions from the supply SaaS software.
Scheduled – Amazon AppFlow can run schedule-triggered flows primarily based on a pre-defined schedule rule. With scheduled flows, you may select both full or incremental information switch:
- With full switch, Amazon AppFlow transfers a snapshot of all information on the time of the stream run from the supply to the vacation spot.
- With incremental switch, Amazon AppFlow transfers solely the information which have been added or modified because the final profitable stream run. To find out the incremental delta of your information, AppFlow requires you to specify a supply timestamp discipline to instruct how Amazon AppFlow identifies new or up to date information.

We use the on-demand set off for the preliminary load of information from Salesforce to Snowflake, as a result of it helps you pull all of the information, no matter their creation. To then synchronize information periodically with Snowflake, after we run the on-demand set off, we configure a scheduled set off with incremental switch. With this method, Amazon AppFlow pulls the information primarily based on a selected timestamp discipline from the Salesforce Account object periodically, primarily based on the time interval specified within the stream.

The Account_Staging desk is created in Snowflake to behave as a short lived storage that can be utilized to determine the info change occasions. Then the everlasting desk (Account) is up to date from the staging desk by working a SQL saved process that comprises the incremental replace logic. The next determine depicts the assorted parts of the structure and the info stream from the supply to the goal.

The information stream comprises the next steps:

First, the stream is run with on-demand and full switch mode to load the total information into Snowflake.
The Amazon AppFlow Salesforce connector pulls the info from Salesforce and shops it within the Account Knowledge S3 bucket in CSV format.
The Amazon AppFlow Snowflake connector masses the info into the Account_Staging desk.
A scheduled process, working at common intervals in Snowflake, triggers a saved process.
The saved process begins an atomic transaction that masses the info into the Account desk after which deletes the info from the Account_Staging desk.
After the preliminary information is loaded, you replace the stream to seize incremental updates from Salesforce. The stream set off configuration is modified to scheduled, to seize information adjustments in Salesforce. This permits Snowflake to get all updates, deletes, and inserts in Salesforce at configured intervals.
The stream makes use of the configured LastModifiedDate discipline to find out incremental adjustments.
Steps 3, 4, and 5 are run once more to load the incremental updates into the Snowflake Accounts desk.

Conditions

To get began, you want the next conditions:

A Salesforce consumer account with ample privileges to put in related apps. Amazon AppFlow makes use of a related app to speak with Salesforce APIs. Should you don’t have a Salesforce account, you may join a developer account.
A Snowflake account with ample permissions to create and configure the mixing, exterior stage, desk, saved procedures, and duties.
An AWS account with entry to AWS Identification and Entry Administration (IAM), Amazon AppFlow, and Amazon S3.

Arrange Snowflake configuration and Amazon S3 information

Full the next steps to configure Snowflake and arrange your information in Amazon S3:

Create two S3 buckets in your AWS account: one for holding the info coming from Salesforce, and one other for holding error information.

A greatest follow when creating your S3 bucket is to ensure you block public entry to the bucket to make sure your information shouldn’t be accessible by unauthorized customers.

Create an IAM coverage named snowflake-access that enables itemizing the bucket contents and studying S3 objects contained in the bucket.

Observe the directions for steps 1 and a couple of in Configuring a Snowflake Storage Integration to Entry Amazon S3 to create an IAM coverage and function. Substitute the placeholders along with your S3 bucket names.

Log in to your Snowflake account and create a brand new warehouse referred to as SALESFORCE and database referred to as SALESTEST.
Specify the format by which information will likely be obtainable in Amazon S3 for Snowflake to load (for this put up, CSV):

USE DATABASE SALESTEST;
CREATE or REPLACE file format my_csv_format
sort = csv
field_delimiter=","
Y skip_header = 1
null_if = ('NULL', 'null')
empty_field_as_null = true
compression = gzip;

Amazon AppFlow makes use of the Snowflake COPY command to maneuver information utilizing an S3 bucket. To configure this integration, comply with steps 3–6 in Configuring a Snowflake Storage Integration to Entry Amazon S3.

These steps create a storage integration along with your S3 bucket, replace IAM roles with Snowflake account and consumer particulars, and creates an exterior stage.

This completes the setup in Snowflake. Within the subsequent part, you create the required objects in Snowflake.

Create schemas and procedures in Snowflake

In your Snowflake account, full the next steps to create the tables, saved procedures, and duties for implementing the use case:

In your Snowflake account, open a worksheet and run the next DDL scripts to create the Account and Account_staging tables:

CREATE or REPLACE TABLE ACCOUNT_STAGING (
ACCOUNT_NUMBER STRING NOT NULL,
ACCOUNT_NAME STRING,
ACCOUNT_TYPE STRING,
ANNUAL_REVENUE NUMBER,
ACTIVE BOOLEAN NOT NULL,
DELETED BOOLEAN,
LAST_MODIFIED_DATE STRING,
major key (ACCOUNT_NUMBER)
);

CREATE or REPLACE TABLE ACCOUNT (
ACCOUNT_NUMBER STRING NOT NULL,
ACCOUNT_NAME STRING,
ACCOUNT_TYPE STRING,
ANNUAL_REVENUE NUMBER,
ACTIVE BOOLEAN NOT NULL,
LAST_MODIFIED_DATE STRING,
major key (ACCOUNT_NUMBER)
);

Create a saved process in Snowflake to load information from staging to the Account desk:

CREATE or REPLACE process sp_account_load( )
returns varchar not null
language sql
as
$$
start
Start transaction;
merge into ACCOUNT utilizing ACCOUNT_STAGING
on ACCOUNT.ACCOUNT_NUMBER = ACCOUNT_STAGING.ACCOUNT_NUMBER
when matched AND ACCOUNT_STAGING.DELETED=TRUE then delete
when matched then UPDATE SET
ACCOUNT.ACCOUNT_NAME = ACCOUNT_STAGING.ACCOUNT_NAME,
ACCOUNT.ACCOUNT_TYPE = ACCOUNT_STAGING.ACCOUNT_TYPE,
ACCOUNT.ANNUAL_REVENUE = ACCOUNT_STAGING.ANNUAL_REVENUE,
ACCOUNT.ACTIVE = ACCOUNT_STAGING.ACTIVE,
ACCOUNT.LAST_MODIFIED_DATE = ACCOUNT_STAGING.LAST_MODIFIED_DATE
when NOT matched then
INSERT (
ACCOUNT.ACCOUNT_NUMBER,
ACCOUNT.ACCOUNT_NAME,
ACCOUNT.ACCOUNT_TYPE,
ACCOUNT.ANNUAL_REVENUE,
ACCOUNT.ACTIVE,
ACCOUNT.LAST_MODIFIED_DATE
)
values(
ACCOUNT_STAGING.ACCOUNT_NUMBER,
ACCOUNT_STAGING.ACCOUNT_NAME,
ACCOUNT_STAGING.ACCOUNT_TYPE,
ACCOUNT_STAGING.ANNUAL_REVENUE,
ACCOUNT_STAGING.ACTIVE,
ACCOUNT_STAGING.LAST_MODIFIED_DATE
) ;

Delete from ACCOUNT_STAGING;
Commit;
finish;
$$
;

This saved process determines whether or not the info comprises new information that must be inserted or current information that must be up to date or deleted. After a profitable run, the saved process clears any information out of your staging desk.

Create a process in Snowflake to set off the saved process. Ensure that the time interval for this process is greater than the time interval configured in Amazon AppFlow for pulling the incremental adjustments from Salesforce. The time interval ought to be ample for information to be processed.

CREATE OR REPLACE TASK TASK_ACCOUNT_LOAD
WAREHOUSE = SALESFORCE
SCHEDULE = 'USING CRON 5 * * * * America/Los_Angeles'
AS
name sp_account_load();

Present the required permissions to run the duty and resume the duty:

As quickly as process is created will probably be suspended state so must resume it manually first time

ALTER TASK TASK_ACCOUNT_LOAD RESUME;

If the function which is assigned to us doesn’t have correct entry to renew/execute process must grant execute process privilege to that function

GRANT EXECUTE TASK, EXECUTE MANAGED TASK ON ACCOUNT TO ROLE SYSADMIN;

This completes the Snowflake a part of configuration and setup.

Create a Salesforce connection

First, let’s create a Salesforce connection that can be utilized by AppFlow to authenticate and pull information out of your Salesforce occasion. On the AWS console, ensure you are in the identical Area the place your Snowflake occasion is working.

On the Amazon AppFlow console, select Connections within the navigation pane.
From the record of connectors, choose Salesforce.
Select Create connection.
For Connection identify, enter a reputation of your alternative (for instance, Salesforce-blog).
Go away the remainder of the fields as default and select Proceed.
You’re redirected to a sign-in web page, the place you want to log in to your Salesforce occasion.
After you permit Amazon AppFlow entry to your Salesforce account, your connection is efficiently created.

Create a Snowflake connection

Full the next steps to create your Snowflake connection:

On the Connections menu, select Snowflake.
Select Create connection.
Present data for the Warehouse, Stage identify, and Bucket particulars fields.
Enter your credential particulars.

For Area, select the identical Area the place Snowflake is working.
For Connection identify, identify your connection Snowflake-blog.
Go away the remainder of the fields as default and select Join.

Create a stream in Amazon AppFlow

Now you create a stream in Amazon AppFlow to load the info from Salesforce to Snowflake. Full the next steps:

On the Amazon AppFlow console, select Flows within the navigation pane.
Select Create stream.
On the Specify stream particulars web page, enter a reputation for the stream (for instance, AccountData-SalesforceToSnowflake).
Optionally, present an outline for the stream and tags.
Select Subsequent.

On the Configure stream web page, for Supply identify¸ select Salesforce.
Select the Salesforce connection we created within the earlier step (Salesforce-blog).
For Select Salesforce object, select Account.
For Vacation spot identify, select Snowflake.
Select the newly created Snowflake connection.
For Select Snowflake object, select the staging desk you created earlier (SALESTEST.PUBLIC. ACCOUNT_STAGING).

Within the Error dealing with part, present your error S3 bucket.
For Select tips on how to set off the stream¸ choose Run on demand.
Select Subsequent.

Choose Manually map fields to map the fields between your supply and vacation spot.
Select the fields Account Quantity, Account Title, Account Kind, Annual Income, Lively, Deleted, and Final Modified Date.

Map every supply discipline to its corresponding vacation spot discipline.
Beneath Extra settings, go away the Import deleted information unchecked (default setting).

Within the Validations part, add validations for the info you’re pulling from Salesforce.

As a result of the schema for the Account_Staging desk in Snowflake database has a NOT NULL constraint for the fields Account_Number and Lively, information containing a null worth for these fields ought to be ignored.

Select Add Validation to configure validations for these fields.
Select Subsequent.

Go away every part else as default, proceed to the ultimate web page, and select Create Circulate.
After the stream is created, select Run stream.

When the stream run completes efficiently, it would carry all information into your Snowflake staging desk.

Confirm information in Snowflake

The information will likely be loaded into the Account_staging desk. To confirm that information is loaded in Snowflake, full the next steps:

Validate the variety of information by querying the ACCOUNT_STAGING desk in Snowflake.
Wait on your Snowflake process to run primarily based on the configured schedule.
Confirm that each one the info is transferred to the ACCOUNT desk and the ACCOUNT_STAGING desk is truncated.

Configure an incremental information load from Salesforce

Now let’s configure an incremental information load from Salesforce:

On the Amazon AppFlow console, choose your stream, and select Edit.
Go to the Edit configuration step and alter to Run stream on schedule.
Set the stream to run each 5 minutes, and supply a begin date of In the present day, with a begin time sooner or later.
Select Incremental switch and select the LastModifiedDate discipline.
Select Subsequent.
Within the Extra settings part, choose Import deleted information.

This ensures that deleted information from the supply are additionally ingested.

Select Save after which select Activate stream.

Now your stream is configured to seize all incremental adjustments.

Check the answer

Inside 5 minutes or much less, a scheduled stream will choose up your change and write the modified document into your Snowflake staging desk and set off the synchronization course of.

You’ll be able to see the main points of the run, together with variety of information transferred, on the Run Historical past tab of your stream.

Clear up

Clear up the sources in your AWS account by finishing the next steps:

On the Amazon AppFlow console, select Flows within the navigation pane.
From the record of flows, choose the stream AccountData-SalesforceToSnowflakeand delete it.
Enter delete to delete the stream.
Select Connections within the navigation pane.
Select Salesforce from the record of connectors, choose Salesforce-blog, and delete it.
Enter delete to delete the connector.
On the Connections web page, select Snowflake from the record of connectors, choose Snowflake-blog, and delete it.
Enter delete to delete the connector.
On the IAM console, select Roles within the navigation web page, then choose the function you created for Snowflake and delete it.
Select Insurance policies within the navigation pane, choose the coverage you created for Snowflake, and delete it.
On the Amazon S3 console, seek for the info bucket you created, select Empty to delete the objects, then delete the bucket.
Seek for the error bucket you created, select Empty to delete the objects, then delete the bucket.
Clear up sources in your Snowflake account:

Delete the duty TASK_ACCOUNT_LOAD:

ALTER TASK TASK_ACCOUNT_LOAD SUSPEND;
DROP TASK TASK_ACCOUNT_LOAD;

Delete the saved process sp_account_load:

DROP process sp_account_load();

Delete the tables ACCOUNT_STAGING and ACCOUNT:

DROP TABLE ACCOUNT_STAGING;
DROP TABLE ACCOUNT;

Conclusion

On this put up, we walked you thru tips on how to combine and synchronize your information from Salesforce to Snowflake utilizing Amazon AppFlow. This demonstrates how one can arrange your ETL jobs with out having to study new programming languages by utilizing Amazon AppFlow and your acquainted SQL language. This can be a proof of idea, however you may attempt to deal with edge circumstances like failure of Snowflake duties or perceive how incremental switch works by making a number of adjustments to a Salesforce document inside the scheduled time interval.

For extra data on Amazon AppFlow, go to Amazon AppFlow.

Concerning the authors

Ramesh Ranganathan is a Senior Accomplice Resolution Architect at AWS. He works with AWS clients and companions to supply steering on enterprise cloud adoption, software modernization and cloud native growth. He’s captivated with expertise and enjoys experimenting with AWS Serverless providers.

Kamen Sharlandjiev is an Analytics Specialist Options Architect and Amazon AppFlow knowledgeable. He’s on a mission to make life simpler for patrons who’re dealing with complicated information integration challenges. His secret weapon? Absolutely managed, low-code AWS providers that may get the job carried out with minimal effort and no coding.

Amit Shah is a cloud primarily based fashionable information structure knowledgeable and at the moment main AWS Knowledge Analytics follow in Atos. Primarily based in Pune in India, he has 20+ years of expertise in information technique, structure, design and growth. He’s on a mission to assist group change into data-driven.