How the Data Warehouse helps you meet GDPR Compliance

Katy Yuan
4 September 2024

Cloud data warehouses are a critical part of the modern data stack. They act as the single source of truth for your entire ETL pipeline and the central hub for ingestion, transformation, and activation. Beyond accessibility, speed, and ease of use, cloud warehouses also offer strong security and privacy features.

Replacing on-premises data warehouses with cloud data warehouses lets you take advantage of features that map closely to the requirements of regulations like GDPR. Once you can meet the standards GDPR sets, you also open up new ways to use your data, such as data activation platforms.

What is GDPR?

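GDPR stands for General Data Protection Regulation. It is a European Union privacy law that protects people’s rights over their personal data. Although it was created by the EU, its reach is global: it also applies to organizations outside the EU that offer goods or services to, or monitor the behavior of, people in the EU.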

The main principles of GDPR include:

  • Data minimization
  • Accuracy
  • Storage limitation
  • Security (integrity and confidentiality)

In this article, we will discuss how cloud data warehouses support these GDPR standards.

Top features of the data warehouse to help you meet compliance

User/Role-Based Access Control

User- and role-based access control limits access to data to only the people who need it, which supports data minimization. These controls let you maintain tight data governance over specific tables, views, schemas, and databases within your data warehouse. This is particularly important when multiple lines of business or teams share one data warehouse.

We recommend creating a user for every team member and every tool that accesses your data warehouse. This way, you can always see who is accessing each object in your warehouse, which makes data governance monitoring far more reliable. You can then create roles for specific groups of people and assign them to users according to the data they need.

For example, you can create a role called ANALYZER within Snowflake with specific permissions to only read from production data, except tables storing credit card numbers. 

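```
CREATE ROLE analyzer;
GRANT USAGE ON DATABASE prod TO ROLE analyzer;
GRANT USAGE ON SCHEMA prod.core TO ROLE analyzer;
GRANT SELECT ON ALL TABLES IN SCHEMA prod.core TO ROLE analyzer;
-- prod.core.payment_cards is a hypothetical table holding credit card numbers;
-- revoking it after the bulk grant keeps it off-limits to this role
REVOKE SELECT ON TABLE prod.core.payment_cards FROM ROLE analyzer;
```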

You can then assign this role to every user who is considered a data analyst on your team.

```
GRANT ROLE analyzer TO USER madison_the_analyst;
```

When you create a dedicated user for each tool in your data stack, that tool connects to your warehouse with its own credentials, so you control exactly which resources it can access. This helps ensure that external vendors never accidentally access confidential data.
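To make the user, role, and privilege chain concrete, here is a minimal Python sketch of how role-based access is resolved. It is a toy model, not Snowflake's implementation: the `AccessModel` class, the table names, and the `activation_tool` role and user are hypothetical, and the sketch ignores details such as database and schema USAGE privileges and role hierarchies.

```python
from dataclasses import dataclass, field


@dataclass
class AccessModel:
    """Toy model of role-based access control (hypothetical, not a Snowflake API)."""

    # role name -> tables that role may SELECT from
    role_grants: dict[str, set[str]] = field(default_factory=dict)
    # user name -> roles granted to that user
    user_roles: dict[str, set[str]] = field(default_factory=dict)

    def grant_select(self, role: str, tables: list[str]) -> None:
        self.role_grants.setdefault(role, set()).update(tables)

    def revoke_select(self, role: str, table: str) -> None:
        # discard() is a no-op if the role never had this table
        self.role_grants.get(role, set()).discard(table)

    def grant_role(self, role: str, user: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def can_select(self, user: str, table: str) -> bool:
        # A user may read a table only if at least one of their roles was granted it.
        return any(
            table in self.role_grants.get(role, set())
            for role in self.user_roles.get(user, set())
        )


# Hypothetical tables in the prod.core schema.
PROD_CORE = ["prod.core.orders", "prod.core.customers", "prod.core.payment_cards"]

model = AccessModel()
model.grant_select("analyzer", PROD_CORE)                   # like GRANT SELECT ON ALL TABLES
model.revoke_select("analyzer", "prod.core.payment_cards")  # keep card numbers off-limits
model.grant_role("analyzer", "madison_the_analyst")

# A dedicated user for an external tool sees only what its own role allows.
model.grant_select("activation_tool", ["prod.core.customers"])
model.grant_role("activation_tool", "activation_tool_user")

print(model.can_select("madison_the_analyst", "prod.core.orders"))         # True
print(model.can_select("madison_the_analyst", "prod.core.payment_cards"))  # False
print(model.can_select("activation_tool_user", "prod.core.orders"))        # False
```

The design point is that users never hold privileges directly; they inherit them from roles, so revoking a grant from one role immediately changes what every user holding that role can read.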

Searchable Data

With legacy data systems, it can be difficult to find all of the data you hold about a specific person, whether for an audit or because that person simply asks for all of the data you have on them. This is often a time-consuming process, especially when your data isn’t consolidated in one central location.

Luckily, data warehouses bring all of your data together in one location. They also support SQL, including DML (data manipulation language) statements such as UPDATE, DELETE, and TRUNCATE, which make it easy to find a person’s data and then correct or remove it. Because GDPR gives people the right to access, correct, and erase their personal data, and requires you to respond to those requests without undue delay, these processes must be streamlined and simple for data teams.
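As an illustration of how consolidated, SQL-queryable data simplifies these requests, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a warehouse. It assumes, hypothetically, that a person is identified by email address and that their data lives only in three made-up tables (`customers`, `orders`, `support_tickets`); a real warehouse would need its own inventory of where personal data is stored.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; these tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE support_tickets (ticket_id INTEGER PRIMARY KEY, email TEXT, body TEXT);
    INSERT INTO customers VALUES (1, 'ana@example.com', 'DE'), (2, 'ben@example.com', 'FR');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
    INSERT INTO support_tickets VALUES (100, 'ana@example.com', 'Where is my order?');
    """
)


def find_subject_data(conn: sqlite3.Connection, email: str) -> dict[str, list[tuple]]:
    """Gather every row tied to one person, e.g. to answer an access request."""
    return {
        "customers": conn.execute(
            "SELECT * FROM customers WHERE email = ?", (email,)
        ).fetchall(),
        "orders": conn.execute(
            "SELECT o.* FROM orders AS o "
            "JOIN customers AS c ON o.customer_id = c.customer_id "
            "WHERE c.email = ? ORDER BY o.order_id",
            (email,),
        ).fetchall(),
        "support_tickets": conn.execute(
            "SELECT * FROM support_tickets WHERE email = ?", (email,)
        ).fetchall(),
    }


def erase_subject_data(conn: sqlite3.Connection, email: str) -> None:
    """Delete the person's rows in one transaction, e.g. for an erasure request."""
    with conn:  # commits if every DELETE succeeds, rolls back otherwise
        # Delete child rows first: the subquery needs the customers row to still exist.
        conn.execute(
            "DELETE FROM orders WHERE customer_id IN "
            "(SELECT customer_id FROM customers WHERE email = ?)",
            (email,),
        )
        conn.execute("DELETE FROM support_tickets WHERE email = ?", (email,))
        conn.execute("DELETE FROM customers WHERE email = ?", (email,))


report = find_subject_data(conn, "ana@example.com")
print(report["orders"])  # [(10, 1, 25.0), (11, 1, 40.0)]
erase_subject_data(conn, "ana@example.com")
print(find_subject_data(conn, "ana@example.com"))  # every list is now empty
```

Running the deletes inside a single transaction means a failure partway through cannot leave the person's data half erased.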

Encryption

Cloud data warehouses provide top-notch security, encrypting your data at rest and in transit. This matters not only while data is ingested and transformed within your data warehouse, but also when you send data to external tools, which is one reason data activation and BI tools are safest when they work directly from the warehouse. Because security is a core GDPR principle, encryption is a simple thing to look for to ensure your data is always in good hands.

Specific Cloud Regions

For companies that require their data to be stored in a certain location, data warehouses offer the option to choose your cloud region. Cloud data warehouses have numerous locations all over the world where you can choose to host your warehouse. This means that the underlying infrastructure (which you don’t manage) is located in that specific region of the world. 

For example, Snowflake’s documentation lists all of the regions it offers. Snowflake even offers regions designed to meet strict government regulations on where data is located.

Secure Data Sharing 

Data warehouses have features that make it easy to securely send data between different organizations, or even different regions within the same tool.

Databricks offers Delta Sharing, an open protocol that lets you share data securely with recipients regardless of the platform they use.

Snowflake offers Secure Data Sharing, in which no data is copied or transferred; instead, the consumer queries the shared data in place through Snowflake’s metadata layer. Avoiding extra copies of personal data supports GDPR’s storage limitation principle.

Both of these data sharing features contain built-in data governance to audit the exact data that is moving between applications. This is extremely important as it's easy to send large amounts of data without truly monitoring why you are sending it. Features like this ensure everything is being done with intention.

Data Masking 

Data masking refers to hiding personally identifiable information (PII). Anyone who queries the data warehouse without the right permissions sees masked values instead of confidential data, such as health information protected under HIPAA, Social Security numbers, or credit card numbers. This helps to maintain the integrity of your company and build trust with your customers.

Many data warehouses let you mask this data from specific users and roles, so that only authorized people see confidential information. For example, you could hide the values of certain data fields from external tools, from people with specific roles within the company, or from everyone except the account admin. Redshift, Snowflake, and BigQuery all offer their own form of this, allowing increased privacy.

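In fact, creating a data masking policy can be quite simple! For example, in Redshift, you give the policy a name, declare the input column it takes, and supply an expression (a literal value or a function call) that produces the masked value. You then attach the policy to a table column for specific users or roles.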

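```
-- create a masking policy that replaces a credit card number with a fixed placeholder
CREATE MASKING POLICY mask_credit_card
WITH (credit_card VARCHAR(256))
USING ('000000XXXX0000'::TEXT);

-- attach the policy to a column; credit_cards is a hypothetical table name
ATTACH MASKING POLICY mask_credit_card
ON credit_cards(credit_card)
TO PUBLIC;
```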


Here, it’s as simple as replacing the field value with text that hides the credit card number within the data warehouse. Warehouses like Redshift also offer the ability to write more complex policies that mask the data on certain conditions, or only partially mask the data depending on what is needed by the end user. 
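To illustrate the conditional and partial masking described above, here is a minimal Python sketch of the decision such a policy makes. It is not Redshift's policy language: the function and the role names `payments_admin` and `support_agent` are hypothetical, and in practice the equivalent logic would be written as the policy's SQL expression so that unmasked values never leave the warehouse.

```python
def mask_card_number(card_number: str, role: str) -> str:
    """Return a card number as the given role should see it.

    The role names are hypothetical; in a warehouse this logic would live
    inside a masking policy rather than in application code.
    """
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if role == "payments_admin":
        # Full value for the one role that genuinely needs it.
        return digits
    if role == "support_agent":
        # Partial mask: keep only the last four digits for identity checks.
        return "X" * max(len(digits) - 4, 0) + digits[-4:]
    # Everyone else, including external tools, gets a fully masked value.
    return "X" * len(digits)


card = "4111 1111 1111 1111"  # a well-known test card number, not real data
print(mask_card_number(card, "payments_admin"))  # 4111111111111111
print(mask_card_number(card, "support_agent"))   # XXXXXXXXXXXX1111
print(mask_card_number(card, "analyzer"))        # XXXXXXXXXXXXXXXX
```

Defaulting to a full mask means any role the policy does not explicitly recognize, including a newly created one, sees nothing sensitive.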

Data Warehouses and Data Activation 

Data Activation platforms like Census help keep your data secure when sharing with other tools, such as Salesforce, HubSpot, or Braze. They allow you to feel confident in using data for personalized campaigns, sales outreach, and product-led growth, while following GDPR regulations. 

Census adds additional layers of security and compliance on top of the data warehouse with the Observability Toolkit, access controls, and configuration as code. You can even ingest data from or export data to external partners and customers by embedding Census within your application.

Because your data is encrypted in transit, it is protected from interception as it moves from your warehouse to external tools. And if things go wrong, Census helps you quickly and easily troubleshoot sync issues and fix data discrepancies.

Conclusion 

Data activation platforms help you unlock the benefits of your data warehouse and data stack. The best part is that you can do so while staying GDPR compliant, giving you more confidence than ever in the quality and standards of your data.

With Census, you automatically unlock security and governance features that let you minimize the data people can access while keeping full control over it. Together, data warehouses and data activation tools help ensure that you share data internally and externally in a GDPR-compliant manner.