Permissions aren’t new. They’re used in nearly every aspect of the software industry, including cloud infrastructures, application development, and data management, providing fine-grained access to actions and resources. Basically, permissions ensure that unauthorized users can’t access your sensitive data. 🙅
Luckily, data management systems and data warehouse tools like Amazon Redshift and Snowflake have authorization policies that allow database admins to grant custom privileges to users. With this feature, you can limit what users can access depending on their job responsibilities, helping you to secure your data environment.
If you’re a dbt-lover like us, you’ll be happy to hear that, like Redshift and Snowflake, dbt also offers various solutions to help you protect access to your most important data assets. If you’re new to dbt, here’s the TLDR: It allows data analysts to transform raw data within their data warehouse into refined datasets that reflect the core business logic.
dbt is powerful because it allows users to write modular code, broken up into multiple reusable models, and provides an easy way to document, test, and snapshot your data models. The best part? All that power translates to higher-quality datasets to be used by the business. ⬆️
In this article, we’re tackling the importance of permissions within dbt and how to configure them for the open-source product (using the profiles.yml file) and dbt Cloud.
The importance of dbt permissions
Because dbt is so intertwined in your modern data stack, it’s crucial to assign the correct permissions to the tools that it touches as well as the users modeling your data.
Data warehouse
dbt connects directly to your data warehouse, meaning it has access to all of your raw and transformed data. Configuring permissions for what it can and cannot do is essential for maintaining the integrity of your one source of truth: Your data warehouse. ⭐
So, you need to consider which databases, schemas, and tables dbt is allowed to read from and write to. You need to give the tool enough permissions to properly transform your data while minimizing potential problems.
For example, dbt should never be allowed to write to your raw data tables. These should only ever be read from and remain otherwise untouched. ✋ However, for data tables that are produced by dbt itself, dbt needs permission to both read and write to them in order to create, modify, and reference the tables.
Users
Like tools, users should only have the minimum access required to do their jobs. Depending on a user's function within the data team, they may be assigned different permissions within the dbt environment.
For example, a data analyst may only be allowed to read from various data tables while an analytics engineer can both read from and write to tables produced by dbt models. Similarly, you may have one person on your team who is well-versed in Github, so they should have permission to control Github access on dbt Cloud.
As a rule of thumb: It is always better to under-assign permissions and add them as needed to ensure a user only has access to what they should. 👍 Otherwise, a user may accidentally delete a row of data or modify a data model without even realizing it. In general, limiting user access prevents mistakes that otherwise would not occur.
Environments
A dbt project contains different profiles depending on your data models’ creation environment. At the very minimum, you should have one for development and production, and each of these profiles should only have access to the databases corresponding to their environment.
You don’t want to write development code to production accidentally or, on the flip side, send production code to your development database. Separating permissions based on the environment will ensure you always have the correct data models available. ✅
How to configure dbt permissions
Permissions in dbt can vary depending on whether you use the open source tool or cloud version. In this case, we’ll review setting up your profiles.yml within dbt as well as configuring specific access control in dbt Cloud.
Open source tool
Within dbt, your profiles.yml file controls the permissions specific to your data warehouse. This is where you create a profile for use in both development and production environments and provide them with the correct data warehouse credentials. You should create a target for every environment you wish to write your data models to (which is usually dev
and prod
).
Under each target, you’ll input the database name you wish to write to, a default schema, and your credentials. I highly recommend creating a user within your data warehouse specifically for dbt so you can control dbt permissions from your warehouse, too. Remember, you always want to give a user the minimum access to complete their job – and doing this will allow for that.
For instance, I call mine dbt_user
and assign it the role of transformer
. This transformer role will need read access to the dbt data model referenced tables in the data warehouse. It will also need write access to the tables you update and create with dbt.
I assign the permissions I need for dbt to the transformer role to the user. Your profiles.yml development target should look something like this 👇
# example profiles.yml file
madison_example:
target: dev
outputs:
dev:
type: postgres
host: localhost
user: dbt_user
password: <password>
port: 5432
dbname: data_mart_dev
schema: core
Notice how I’m using the dbt_user
I created for this target. This dbt_user
has read and write permissions to different tables within the data_mart_dev
database because of its assigned role as a transformer. Again (just for good measure), your user needs to have read permissions to the tables it is referencing in your models and write permissions to those you are updating or creating.
dbt is also configured to use target names as prefixes for the models built using dbt. For example, if your target schema is dev
like the above, the schema created by dbt will be dev
If your target were prod
, then the schema created would be called prod
.
This allows you to group your models based on the development environment that you’re using. You can even add custom schemas to create even more granular schema names. To do so, add this configuration to the top of your dbt model:
In this case, your resulting schema would be dev_marketing
. These custom schemas allow you to properly configure your data warehouse so you know exactly what tables are being created for various parts of the business (and to make sure that they’re in the proper environment).
dbt Cloud
dbt Cloud provides a fine-grained permission control, allowing admins the possibility of setting permissions to nearly every action that can be performed on dbt Cloud.
It uses two varieties of access to control user-level access in an account:
Every action performed on dbt Cloud is sensitive, so not every team member should have elevated privileges. Restricting actions to users based on their licenses and roles increases the security of your dbt Cloud application and your data warehouse. With different levels of access control, you know the actions a user can and cannot perform. 🚦
License-based Access Control
License-based access control configures user privileges via account-wide license types. Account-wide license types control specific parts of the dbt Cloud application. When a user is invited or added to a project, the user is granted a specific license type (and can only have one license at a time).
There are two categories of license types:
- Read-Only limits user privileges to only read permissions that are applied to all dbt Cloud resources, regardless of the role-based permissions that have been assigned to the user.
- Developer is not limited to read permissions and can be granted any permissions to perform various actions on the dbt Cloud application.
For a project with a minimal number of team users, you can limit permissions to only license-based access control. However, if your project requires fine-grained access control for each user, you should consider role-based access control.
Role-based Access Control
While role-based access is more secure, keep in mind that it is a feature only available for dbt Cloud accounts that are on the enterprise plan. Role-based access control allows you to set fine-grained permissions in dbt Cloud, whereby you can assign users several permissions to different projects within a dbt Cloud account.
These role-based permissions are applied to groups (a collection of users), and a user is able to be in multiple groups. For example, one user can be in the analysis and security group, while another user can be in the billing, security, and data group. Users in a group inherit permissions applied to the group.
In order to apply permissions to a group, you will require permission sets. According to the documentation, permission sets are “predefined collections of granular permissions.”
With a combination of low-level permissions, you can create, view, and delete accounts which can then be used to form different roles that can be assigned to a group. dbt Cloud supports a number of predefined permissions, including account admin, account viewer, Git admin, database admin, and analyst – each with its own predefined permissions configured.
For instance, the Git admin permission set controls who can see project details, modify repositories, and view different environments. Only someone with a high-level understanding of Git should have permissions like this.
Giving someone who doesn’t have Git knowledge this permission could result in poorly created repositories or accidental deletions. So, when using these predefined permissions, be sure you understand exactly what each entails before assigning it to a user. 💯
Control who accesses what data
Configuring permissions within dbt allows you to control who accesses different data assets. It prevents someone from deleting an important data model, altering data in a table, or accidentally pushing a development model to production.
In the open-source dbt tool, you use targets within the profiles.yml file to control access to the data in your data warehouse. In dbt Cloud, there are two major ways of configuring dbt permissions: License-based access control and role-based access control.
With the permissions that your warehouse, environments, and users each have with dbt, you can create a development environment that your team feels safe using — while still protecting the quality of your data.
dbt is a powerful tool when used correctly. Consider permissions from the very start, and you won't need to stress about permission-caused errors. 😮💨
👉 Then connect your dbt models directly with Census to streamline your sales and marketing tools! Book a demo with one of our product specialists to learn how we work with dbt to operationalize your analytics.