Security considerations - Azure Data Factory

APPLIES TO: Azure Data Factory and Azure Synapse Analytics

This article describes basic security infrastructure that data movement services in Azure Data Factory use to help secure your data. Data Factory management resources are built on Azure security infrastructure and use all possible security measures offered by Azure.

In a Data Factory solution, you create one or more data pipelines. A pipeline is a logical grouping of activities that together perform a task. These pipelines reside in the region where the data factory was created.

Even though Data Factory is available in only a few regions, the data movement service is available globally to ensure data compliance, efficiency, and reduced network egress costs.

Azure Data Factory, including the Azure integration runtime and the self-hosted integration runtime, does not store any temporary data, cached data, or logs, except for linked service credentials for cloud data stores, which are encrypted by using certificates. With Data Factory, you create data-driven workflows to orchestrate the movement of data between supported data stores, and the processing of data by using compute services in other regions or in an on-premises environment. You can also monitor and manage workflows by using SDKs and Azure Monitor.

Data Factory has been certified for:

CSA STAR Certification
ISO 20000-1:2011
ISO 22301:2012
ISO 27001:2013
ISO 27017:2015
ISO 27018:2014
ISO 9001:2015
SOC 1, 2, 3
HIPAA BAA
HITRUST

If you're interested in Azure compliance and how Azure secures its own infrastructure, visit the Microsoft Trust Center. For the latest list of all Azure compliance offerings, see https://aka.ms/AzureCompliance.

In this article, we review security considerations in the following two data movement scenarios:

  • Cloud scenario: In this scenario, both your source and your destination are publicly accessible through the internet. These include managed cloud storage services such as Azure Storage, Azure Synapse Analytics, Azure SQL Database, Azure Data Lake Store, Amazon S3, and Amazon Redshift; SaaS services such as Salesforce; and web protocols such as FTP and OData. Find a complete list of supported data sources in Supported data stores and formats.
  • Hybrid scenario: In this scenario, either your source or your destination is behind a firewall or inside an on-premises corporate network. Or, the data store is in a private network or virtual network (most often the source) and is not publicly accessible. Database servers hosted on virtual machines also fall under this scenario.

Note

We recommend that you use the Azure Az PowerShell module to interact with Azure. See Install Azure PowerShell to get started. To learn how to migrate to the Az PowerShell module, see Migrate Azure PowerShell from AzureRM to Az.


Cloud scenarios

Securing data store credentials

  • Store encrypted credentials in an Azure Data Factory managed store. Data Factory helps protect your data store credentials by encrypting them with certificates managed by Microsoft. These certificates are rotated every two years (which includes certificate renewal and the migration of credentials). For more information about Azure Storage security, see Azure Storage security overview.
  • Store credentials in Azure Key Vault. You can also store the data store's credential in Azure Key Vault. Data Factory retrieves the credential during the execution of an activity. For more information, see Store credential in Azure Key Vault.
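As a sketch of the Key Vault option, the following PowerShell snippet defines a linked service whose password is resolved from a Key Vault secret at run time and registers it with Set-AzDataFactoryV2LinkedService. The resource names, the Key Vault linked service reference, and the secret name are placeholders, not values from this article:

# Linked service definition: the password is a reference to a Key Vault secret,
# so the credential itself is never written into the linked service definition.
$definition = @'
{
    "name": "AzureSqlLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<user>;Encrypt=True;Connection Timeout=30",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": { "referenceName": "AzureKeyVaultLinkedService", "type": "LinkedServiceReference" },
                "secretName": "SqlPassword"
            }
        }
    }
}
'@
Set-Content -Path .\AzureSqlLinkedService.json -Value $definition

# Register the linked service in the data factory (placeholder names).
Set-AzDataFactoryV2LinkedService -ResourceGroupName "<resource-group>" `
    -DataFactoryName "<data-factory>" -Name "AzureSqlLinkedService" `
    -DefinitionFile ".\AzureSqlLinkedService.json"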

Data encryption in transit

If the cloud data store supports HTTPS or TLS, all data transfers between data movement services in Data Factory and the cloud data store occur over a secure HTTPS or TLS channel.

Note

All connections to Azure SQL Database and Azure Synapse Analytics require encryption (SSL/TLS) while data is in transit to and from the database. When you're authoring a pipeline by using JSON, add the encryption property and set it to true in the connection string. For Azure Storage, you can use HTTPS in the connection string.
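For example, a hedged sketch of such connection strings, with placeholder server, database, and account values, might look like this:

# Azure SQL Database: Encrypt=True enforces encryption for data in transit.
$sqlConnectionString = "Server=tcp:<server>.database.windows.net,1433;Database=<database>;User ID=<user>;Password=<password>;Encrypt=True;Connection Timeout=30"

# Azure Storage: DefaultEndpointsProtocol=https forces the HTTPS endpoint.
$storageConnectionString = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"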

Note

To enable encryption in transit while moving data from Oracle, follow one of the options below:

  1. On the Oracle server, go to Oracle Advanced Security (OAS) and configure the encryption settings, which support Triple-DES Encryption (3DES) and Advanced Encryption Standard (AES); refer here for details. When establishing the connection to Oracle, ADF automatically negotiates the encryption method to use the one you configure in OAS.
  2. In ADF, you can add EncryptionMethod=1 in the connection string (in the linked service); this uses SSL/TLS as the encryption method. To use this option, disable the non-SSL encryption settings in OAS on the Oracle server side to avoid an encryption conflict. A sketch of such a connection string is shown after this list.
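The following is that sketch: an illustrative Oracle connection string held in a PowerShell variable, with placeholder host, SID, and credential values:

# Illustrative Oracle connection string for an ADF Oracle linked service.
# EncryptionMethod=1 requests SSL/TLS; the non-SSL OAS settings must be
# disabled on the Oracle server to avoid an encryption conflict.
$oracleConnectionString = "Host=<host>;Port=1521;Sid=<sid>;User Id=<user>;Password=<password>;EncryptionMethod=1"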

Note

The TLS version used is 1.2.

Data encryption at rest

Some data stores support encryption of data at rest. We recommend that you enable the data encryption mechanism for those data stores.

Azure Synapse Analytics

Transparent Data Encryption (TDE) in Azure Synapse Analytics helps protect against the threat of malicious activity by performing real-time encryption and decryption of your data at rest. This behavior is transparent to the client. For more information, see Secure a database in Azure Synapse Analytics.

Azure SQL Database

Azure SQL Database also supports transparent data encryption (TDE), which helps protect against the threat of malicious activity by performing real-time encryption and decryption of the data, without requiring changes to the application. This behavior is transparent to the client. For more information, see Transparent data encryption for SQL Database and Data Warehouse.
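If you manage the database with PowerShell, the following is a minimal sketch for turning TDE on, assuming the Az.Sql module; the resource names are placeholders:

# Enable transparent data encryption on an Azure SQL database (placeholder names).
Set-AzSqlDatabaseTransparentDataEncryption -ResourceGroupName "<resource-group>" `
    -ServerName "<server>" -DatabaseName "<database>" -State Enabled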

Azure Data Lake Store

Azure Data Lake Store also provides encryption for data stored in the account. When enabled, Data Lake Store automatically encrypts data before persisting and decrypts before retrieval, making it transparent to the client that accesses the data. For more information, see Security in Azure Data Lake Store.


Azure Blob storage and Azure Table storage

Azure Blob storage and Azure Table storage support Storage Service Encryption (SSE), which automatically encrypts your data before persisting to storage and decrypts before retrieval. For more information, see Azure Storage Service Encryption for Data at Rest.

Amazon S3

Amazon S3 supports both client and server encryption of data at rest. For more information, see Protecting Data Using Encryption.

Amazon Redshift

Amazon Redshift supports cluster encryption for data at rest. For more information, see Amazon Redshift Database Encryption.

Salesforce

Salesforce supports Shield Platform Encryption, which allows encryption of all files, attachments, and custom fields. For more information, see Understanding the Web Server OAuth Authentication Flow.

Hybrid scenarios

Hybrid scenarios require the self-hosted integration runtime to be installed in an on-premises network, inside a virtual network (Azure), or inside a virtual private cloud (Amazon). The self-hosted integration runtime must be able to access the local data stores. For more information about the self-hosted integration runtime, see How to create and configure self-hosted integration runtime.


The command channel allows communication between data movement services in Data Factory and self-hosted integration runtime. The communication contains information related to the activity. The data channel is used for transferring data between on-premises data stores and cloud data stores.

On-premises data store credentials

Credentials can be stored within Data Factory, or referenced by Data Factory at run time from Azure Key Vault. If credentials are stored within Data Factory, they are always stored encrypted on the self-hosted integration runtime.

  • Store credentials locally. If you directly use the Set-AzDataFactoryV2LinkedService cmdlet with the connection strings and credentials inline in the JSON, the linked service is encrypted and stored on the self-hosted integration runtime. In this case, the credentials flow through the Azure backend service to the self-hosted integration runtime machine, where they are finally encrypted and stored. The self-hosted integration runtime uses Windows DPAPI to encrypt the sensitive data and credential information.

  • Store credentials in Azure Key Vault. You can also store the data store's credential in Azure Key Vault. Data Factory retrieves the credential during the execution of an activity. For more information, see Store credential in Azure Key Vault.

  • Store credentials locally without flowing the credentials through the Azure backend to the self-hosted integration runtime. If you want to encrypt and store credentials locally on the self-hosted integration runtime without having to flow them through the Data Factory backend, follow the steps in Encrypt credentials for on-premises data stores in Azure Data Factory. All connectors support this option. The self-hosted integration runtime uses Windows DPAPI to encrypt the sensitive data and credential information.

  • Use the New-AzDataFactoryV2LinkedServiceEncryptedCredential cmdlet to encrypt linked service credentials and sensitive details in the linked service. You can then use the JSON returned (with the EncryptedCredential element in the connection string) to create a linked service by using the Set-AzDataFactoryV2LinkedService cmdlet.
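Putting the last two points together, the following is a hedged PowerShell sketch; the resource names and file paths are placeholders, and the definition file is assumed to contain the plain connection string and credentials:

# Encrypt the credentials in the linked service definition by using the
# self-hosted integration runtime named below; the cmdlet returns the linked
# service JSON with an EncryptedCredential element instead of plain credentials.
New-AzDataFactoryV2LinkedServiceEncryptedCredential `
    -ResourceGroupName "<resource-group>" -DataFactoryName "<data-factory>" `
    -IntegrationRuntimeName "<self-hosted-ir>" `
    -DefinitionFile ".\SqlServerLinkedService.json"

# After saving the returned JSON to a file (for example
# EncryptedSqlServerLinkedService.json), create the linked service from the
# encrypted definition.
Set-AzDataFactoryV2LinkedService -ResourceGroupName "<resource-group>" `
    -DataFactoryName "<data-factory>" -Name "SqlServerLinkedService" `
    -DefinitionFile ".\EncryptedSqlServerLinkedService.json"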

Ports used when encrypting a linked service on the self-hosted integration runtime

By default, when remote access from intranet is enabled, PowerShell uses port 8060 on the machine with the self-hosted integration runtime for secure communication. If necessary, this port can be changed from Integration Runtime Configuration Manager, on the Settings tab.


Encryption in transit

All data transfers are over a secure channel (HTTPS and TLS over TCP) to prevent man-in-the-middle attacks during communication with Azure services.

You can also use IPSec VPN or Azure ExpressRoute to further secure the communication channel between your on-premises network and Azure.

Azure Virtual Network is a logical representation of your network in the cloud. You can connect an on-premises network to your virtual network by setting up IPSec VPN (site-to-site) or ExpressRoute (private peering).

The following table summarizes the network and self-hosted integration runtime configuration recommendations based on different combinations of source and destination locations for hybrid data movement.

Source | Destination | Network configuration | Integration runtime setup
On-premises | Virtual machines and cloud services deployed in virtual networks | IPSec VPN (point-to-site or site-to-site) | The self-hosted integration runtime should be installed on an Azure virtual machine in the virtual network.
On-premises | Virtual machines and cloud services deployed in virtual networks | ExpressRoute (private peering) | The self-hosted integration runtime should be installed on an Azure virtual machine in the virtual network.
On-premises | Azure-based services that have a public endpoint | ExpressRoute (Microsoft peering) | The self-hosted integration runtime can be installed on-premises or on an Azure virtual machine.

The following diagrams show the use of the self-hosted integration runtime for moving data between an on-premises database and Azure services by using ExpressRoute and IPSec VPN (with Azure Virtual Network):

Express Route

[Diagram: data movement by using ExpressRoute]

IPSec VPN

[Diagram: data movement by using IPSec VPN]

Firewall configurations and allow list setup for IP addresses


Note

For details about data access strategies through Azure Data Factory, see this article.


Firewall requirements for on-premises/private network

In an enterprise, a corporate firewall runs on the central router of the organization. Windows Firewall runs as a daemon on the local machine on which the self-hosted integration runtime is installed.

The following table provides outbound port and domain requirements for corporate firewalls:

Domain names | Outbound ports | Description
*.servicebus.windows.net | 443 | Required by the self-hosted integration runtime for interactive authoring.
{datafactory}.{region}.datafactory.azure.net or *.frontend.clouddatahub.net | 443 | Required by the self-hosted integration runtime to connect to the Data Factory service. For a newly created data factory, find the FQDN in your self-hosted integration runtime key, which is in the format {datafactory}.{region}.datafactory.azure.net. For an older data factory, if you don't see the FQDN in your self-hosted integration runtime key, use *.frontend.clouddatahub.net instead.
download.microsoft.com | 443 | Required by the self-hosted integration runtime for downloading the updates. If you have disabled auto-update, you can skip configuring this domain.
*.core.windows.net | 443 | Used by the self-hosted integration runtime to connect to the Azure storage account when you use the staged copy feature.
*.database.windows.net | 1433 | Required only when you copy from or to Azure SQL Database or Azure Synapse Analytics; optional otherwise. Use the staged copy feature to copy data to SQL Database or Azure Synapse Analytics without opening port 1433.
*.azuredatalakestore.net and login.microsoftonline.com/<tenant>/oauth2/token | 443 | Required only when you copy from or to Azure Data Lake Store; optional otherwise.

Note

You might have to manage ports or set up an allow list for domains at the corporate firewall level, as required by the respective data sources. This table only uses Azure SQL Database, Azure Synapse Analytics, and Azure Data Lake Store as examples.
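One way to confirm that the corporate and machine firewalls actually permit these outbound connections from the self-hosted integration runtime machine is to probe the endpoints, for example with Test-NetConnection; the data factory FQDN below is a placeholder:

# Probe outbound port 443 to the Data Factory endpoint and the update endpoint
# from the self-hosted integration runtime machine.
Test-NetConnection -ComputerName "<datafactory>.<region>.datafactory.azure.net" -Port 443
Test-NetConnection -ComputerName "download.microsoft.com" -Port 443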

The following table provides inbound port requirements for Windows Firewall:

Inbound ports | Description
8060 (TCP) | Required by the PowerShell encryption cmdlet as described in Encrypt credentials for on-premises data stores in Azure Data Factory, and by the credential manager application to securely set credentials for on-premises data stores on the self-hosted integration runtime.
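If Windows Firewall on the self-hosted integration runtime machine blocks this port, one way to open it is with the built-in NetSecurity cmdlet shown below; the rule name is arbitrary and the sketch assumes the default Windows Firewall profiles:

# Allow inbound TCP 8060 on the self-hosted integration runtime machine only;
# this port does not need to be opened at the corporate firewall.
New-NetFirewallRule -DisplayName "ADF self-hosted IR credential manager (8060)" `
    -Direction Inbound -Protocol TCP -LocalPort 8060 -Action Allow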


IP configurations and allow list setup in data stores

Some data stores in the cloud also require that you allow the IP address of the machine accessing the store. Ensure that the IP address of the self-hosted integration runtime machine is allowed or configured in the firewall appropriately.

The following cloud data stores require that you allow the IP address of the self-hosted integration runtime machine. Some of these data stores, by default, might not require an allow list.
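As one example, for Azure SQL Database you can add a server-level firewall rule for the self-hosted integration runtime machine's public IP address. The following is a hedged sketch that assumes the Az.Sql module; the names and IP address are placeholders:

# Allow the self-hosted integration runtime machine's outbound public IP on the
# logical SQL server's firewall (placeholder values).
New-AzSqlServerFirewallRule -ResourceGroupName "<resource-group>" `
    -ServerName "<sql-server>" -FirewallRuleName "AllowSelfHostedIR" `
    -StartIpAddress "<shir-public-ip>" -EndIpAddress "<shir-public-ip>"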

Frequently asked questions

Can the self-hosted integration runtime be shared across different data factories?

Yes. More details here.

What are the port requirements for the self-hosted integration runtime to work?

The self-hosted integration runtime makes HTTP-based connections to access the internet. Outbound port 443 must be open for the self-hosted integration runtime to make these connections. Open inbound port 8060 only at the machine level (not at the corporate firewall level) for the credential manager application. If Azure SQL Database or Azure Synapse Analytics is used as the source or the destination, you need to open port 1433 as well. For more information, see the Firewall configurations and allow list setup for IP addresses section.

Next steps

For information about Azure Data Factory Copy Activity performance, see Copy Activity performance and tuning guide.
