Skip to content

Snowmask is a dbt package to simplify managing personally identifiable data in Snowflake using native capabilities such as dynamic data masking.

License

Notifications You must be signed in to change notification settings

tropos-io/dbt-snowmask

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dbt-snowmask

GitHub issues GitHub GitHub Repo stars GitHub release (latest by date)

Make dbt manage dynamic data masking for Snowflake.

Why do I need something like this?

Odds are rather high that you deal with personally identifiable data (PII) in your cloud data platform. Laws and regulations have analytics engineers enforce policies to make sure data analysts only see identifyable data whenever they're authorized to. Managing the rulesets involved in larger data platforms can be a daunting task.

That's why we created Snowmask. A dbt library to govern dynamic data masking capabilities in Snowflake from the comfort of your own chair.

Introduction

This repo contains the following macros:

  • masking_policy.sql to create, alter and apply masking policies in Snowflake using DBT. Here, you can find some info on Dynamic Data Masking in Snowflake.

  • row_column_masked_view.sql to create an interactive view with row-level security.

  • create_security_integration.sql to create/check a security integration with tableau server and ask for the OAUTH endpoint.


IMPORTANT

Since dynamic data masking and row-level security are security measures, not every role or every user can create and alter the masking policies or RLS-views. This role has to have the privileges to define, manage, and apply masking policies to columns. It is best to have a specific role for this that is not used for anything else.


Masking policy (source)

This macro creates, alters and applies masking policies.

Installation instructions

New to dbt packages? Read more about them here.

  1. Include this package in your packages.yml — check here for the latest version number.
# packages.yml
packages:
  - git: "https://github.com/tropos-io/dbt-snowmask.git"
    revision: <latest version>
  1. Run dbt deps
  2. Include the macro in your dbt_project.yml. This can be added with a hook, on-run-end...
# dbt_project.yml
on-run-end:
  - "{{ dbt_snowmask.masking_policy([('<table_name','<column_name>'),('<table_name','<column_name>')], 
        ('<roles_1>'),
        ('<roles_2>')) }}"

In this package you have to add the column you want masked together with the table this column belongs to. You can add just one column or multiple columns. The roles are created in 2 levels. Every role in 'roles_1' has full access to the masked data, the roles in 'roles_2' have access to hashed values for string data. The roles that are not mentioned in this policy, will see '******' or NULL values.

  1. Execute dbt run – the policies will be set automatically!

Usage:

{{ dbt_snowmask.masking_policy([('table_name','column_name)], ('roles_1'), ('roles_2')) }}

'table_name': table name of the column you want masked
'colum_name': column name of the column you want masked
'roles_1': name(s) of the role(s) you want to have full access
'roles_2': name(s) of the role(s) you want to have limited access

Information:

  • It is good practice to create a specific role to create these masking policies. You can find more information on this here.
  • This macro uses the active database and schema defined in your profiles.yml.
  • You can add just one or multiple columns from different tables, as long as they are in the same schema and database.
  • You can add just one or multiple roles, and up to 2 masking levels.
  • There are 3 kinds of policies: 1 for string values, 1 for date values, and 1 for number values. These will automatically be set for the matching columns.
  • The roles have to exist before running this macro.

Example:

# dbt_project.yml
on-run-end:
  - "{{ dbt_snowmask.masking_policy([('IDENTIFIERS','NAME'), ('IDENTIFIERS','FIRST_NAME'), ('PRIVATE','ADDRESS')], 
        ('DIRECTOR','MANAGER'),
        ('JUNIOR')) }}"

In this example, the masking policy will be set on the name, first name and address columns. The director and manager roles will have full access to the masked columns (i.e., will see the values), the junior role will see hashed values for these columns, and the roles that are not mentioned will see '********'.


Row-level security (source)

This macro creates a row-level secured view from a table. It uses a csv file with the 'key | user' combination to decide which user can access which row.

Installation instructions

New to dbt packages? Read more about them here.

  1. Include this package in your packages.yml — check here for the latest version number.
# packages.yml
packages:
  - git: "https://github.com/tropos-io/dbt-snowmask.git"
    revision: <latest version>
  1. Run dbt deps
  2. Include the macro in your dbt_project.yml. This can be added with a hook, on-run-end...
# dbt_project.yml
on-run-end:
  - "{{ dbt_snowmask.row_column_masked_view('table_name', 'source_schema', 'source_table', 'key_table', 'key_csv', 'user_csv', 'control_user') }}"
  1. Include a sources.yml file in your ./models folder where you define the source of your csv file(s). You can find more info on this here.
# sources.yml
version: 2

sources:
  - name: <source_schema>
    tables:
      - name: <source_table>
  1. Execute dbt seed – the information in the csv file will be turned into a table in your active schema. The table automatically gets the name of your csv file. If your macro has been added with an on-run-end command, it will automatically be run after your dbt seed.
  2. Optionally, execute dbt run if you want to run a model.

Usage:

{{ dbt_snowmask.row_column_masked_view('table_name', 'source_schema', 'source_table', 'key_table', 'key_csv', 'user_csv', 'control_user') }}

'table_name': table name of the table you want to create your view on
'source_schema': schema name that has been defined in your sources.yml
'source_table': table name that has been defined in your sources.yml
'key_table': key of the table you want to filter on
'key_csv': column name of the key in the csv file
'user_csv': column name of the users in the csv file
'control_user': a user that is allowed access to all rows

Information:

  • This macro uses the active database and schema defined in your profiles.yml.
  • Your csv file should be in your ./data folder.

Example:

# dbt_project.yml
on-run-end:
  - "{{ dbt_snowmask.row_column_masked_view('BILLS', 'SALES', 'RLS_RULES', 'ID', 'ID_NUMBER', 'USER', 'USER_000') }}"

In this example, the row-level security will be set on the combination of the ID and USER. For example, if the csv file would look like the table below, USER_AAA would only see the rows of the table 'BILLS' that have the ID 111. In the same way USER_BBB would only see the rows with the ID 222. USER_000 (e.g., the supervisor), would see all rows with ID's defined in the csv file.

ID_NUMBER USER
111 USER_AAA
222 USER_BBB

Combining masking policy and row-level security

Ideally you would run both macro's together.

# dbt_project.yml
on-run-end:
  - "{{ dbt_snowmask.masking_policy([('<table_name','<column_name>'),('<table_name','<column_name>')], 
        ('<roles_1>'),
        ('<roles_2>')) }}"
  - "{{ dbt_snowmask.row_column_masked_view('table_name', 'source_schema', 'source_table', 'key_table', 'key_csv', 'user_csv', 'control_user') }}"

You first mask your columns based on the roles you have defined. Then you would use row-level security to create views on these masked tables for extra security and privacy. This way you have masking based on roles and users.


Creating security integrations (source)

This macro creates a security integration for tableau server when no current security integration for tableau server exists. It also returns the OAUTH endpoint of this newly created security integration. When a security integration already exists, it returns the OAUTH endpoint of the existing security integration.

IMPORTANT

This macro can only be run by a user with the privileges to use the role ACCOUNTADMIN. Otherwise, a message saying 'YOU DO NOT HAVE THE PRIVILEGES TO CREATE OR DESCRIBE SECURITY INTEGRATIONS' will be returned.

Installation instructions

New to dbt packages? Read more about them here.

  1. Include this package in your packages.yml — check here for the latest version number.
# packages.yml
packages:
  - git: "https://github.com/tropos-io/dbt-snowmask.git"
    revision: <latest version>
  1. Run dbt deps
  2. Include the macro in your dbt_project.yml. This can be added with a hook, on-run-end...
# dbt_project.yml
on-run-end:
 - "{{ dbt_snowmask.create_security_integration('new_integration_name') }}"
  1. Execute dbt run – the security integration will be created/checked and the OAUTH endpoint will be returned (if you have the right privileges)!

Usage:

{{ dbt_snowmask.create_security_integration('new_integration_name') }}

'new_integration_name': name of the new security integration for tableau server you want to create

Information:

  • This macro will only work when you have the privileges to use the role ACCOUNTADMIN!
  • This macro is only designed for tableau server security integrations.

Example:

# dbt_project.yml
on-run-end:
  - "{{ dbt_snowmask.create_security_integration('new_tableau_server_integration') }}"

In this example the macro will first check whether the current user can use the role ACCOUNTADMIN. If they can't, the message 'YOU DO NOT HAVE THE PRIVILEGES TO CREATE OR DESCRIBE SECURITY INTEGRATIONS' will be returned. If they can use the role ACCOUNTADMIN, the macro will check whether a security integration for tableau server already exists. If it does, the OAUTH endpoint for the existing security integration will be returned. If it doesn't exist, a new security integration will be created with the name 'new_tableau_server_integration' and the OAUTH endpoint of this new security integration will be returned.


Database support

This package has been tested on Snowflake.

License

Licensed under the Apache License 2.0 (see LICENSE.md file for more details).

About

Snowmask is a dbt package to simplify managing personally identifiable data in Snowflake using native capabilities such as dynamic data masking.

Topics

Resources

License

Stars

Watchers

Forks