Google’s BigQuery Introduces Column-Level Encryption Features and Dynamic Data Masking

Google recently released new features for its SaaS data warehouse BigQuery, including column-level encryption functions and dynamic data masking. These features add a second layer of defense on top of access control to secure and manage sensitive data.

Dynamic data masking is particularly suited to real-time transactions, while column-level encryption provides additional security for data at rest or in motion where real-time usability is not required.

These new features could be useful for companies that store personally identifiable information (PII) and other sensitive data, such as credit card and biometric information. They can also help companies that store and analyze data in countries where data regulation and privacy mandates are evolving. Such companies are constantly at risk from data breaches and data leaks and need to control data access.

Column-level encryption enables encryption and decryption at the column level, so the administrator can select which columns are encrypted and which are not. It supports the AES-GCM (non-deterministic) and AES-SIV (deterministic) encryption algorithms. The AES-SIV functions allow grouping, aggregation, and joins on encrypted data. The feature enables several new use cases, such as data that is encrypted natively in BigQuery and decrypted when accessed, or data that is encrypted externally, stored in BigQuery, and then decrypted when accessed.
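The difference between non-deterministic and deterministic encryption can be illustrated with a small Python sketch. It does not use AES-GCM or AES-SIV: the cipher below is a toy built from HMAC-SHA256 in the standard library, and every name in it is hypothetical. It only shows why equal plaintexts must produce equal ciphertexts before grouping or joining on encrypted values can work.

```python
import hashlib
import hmac
import secrets
from collections import Counter

NONCE_LEN = 16


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate subkeys so the IV derivation and the keystream never share a key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode: a toy stream cipher, NOT AES.
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def encrypt(key: bytes, plaintext: bytes, deterministic: bool) -> bytes:
    if deterministic:
        # Synthetic IV derived from the plaintext: equal inputs give equal outputs.
        nonce = hmac.new(_subkey(key, b"iv"), plaintext, hashlib.sha256).digest()[:NONCE_LEN]
    else:
        # Random nonce: equal inputs give different outputs on every call.
        nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)
zip_codes = [b"94105", b"10001", b"94105"]

det = [encrypt(key, z, deterministic=True) for z in zip_codes]
rnd = [encrypt(key, z, deterministic=False) for z in zip_codes]

# Grouping works directly on deterministic ciphertexts: the two equal
# zip codes fall into the same group without being decrypted.
det_groups = sorted(Counter(det).values())

# Randomized ciphertexts of equal values do not match, so every row
# ends up in its own group.
rnd_groups = sorted(Counter(rnd).values())
```

The trade-off is that deterministic ciphertexts reveal which rows hold the same value, which is why the non-deterministic mode remains the safer choice when no grouping or joining is needed.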

Column-level encryption is integrated with Cloud Key Management Service (Cloud KMS) to give the administrator more control: encryption keys are managed in KMS, retrieved securely on access, and every use is logged in detail. Cloud KMS can be used to generate the KEK (Key Encryption Key) that encrypts the DEK (Data Encryption Key), which in turn encrypts the data in BigQuery columns. Cloud KMS uses IAM (Identity and Access Management) to define roles and permissions. The KEK is a symmetric encryption keyset that is stored in Cloud KMS. Referencing an encrypted keyset in BigQuery reduces the risk of key exposure.
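The KEK/DEK hierarchy described above is commonly called envelope encryption. The following Python sketch models it under loud assumptions: `ToyKms`, `toy_cipher`, `run_query`, the key path, and the principals are all hypothetical, and the cipher is an HMAC-based toy rather than real AEAD. It only illustrates the flow: the KEK never leaves the key service, the wrapped DEK can be stored next to the data, and unwrapping is permission-checked and logged.

```python
import hashlib
import hmac
import secrets


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with an HMAC-SHA256 keystream. Applying it twice with the same key
    # and nonce returns the input. A stand-in for real AEAD, nothing more.
    stream = b"".join(
        hmac.new(key, nonce + i.to_bytes(4, "big"), hashlib.sha256).digest()
        for i in range((len(data) + 31) // 32)
    )
    return bytes(d ^ s for d, s in zip(data, stream))


class ToyKms:
    """Hypothetical in-memory stand-in for a key management service."""

    def __init__(self):
        self._keks = {}      # resource path -> KEK bytes (never returned to callers)
        self._allowed = {}   # resource path -> principals allowed to unwrap
        self.audit_log = []  # (principal, resource path, outcome)

    def create_kek(self, path, allowed_principals):
        self._keks[path] = secrets.token_bytes(32)
        self._allowed[path] = set(allowed_principals)

    def wrap(self, path, dek):
        nonce = secrets.token_bytes(16)
        return nonce + toy_cipher(self._keks[path], nonce, dek)

    def unwrap(self, path, wrapped_dek, principal):
        if principal not in self._allowed[path]:
            self.audit_log.append((principal, path, "denied"))
            raise PermissionError(f"{principal} may not use {path}")
        self.audit_log.append((principal, path, "unwrapped"))
        nonce, body = wrapped_dek[:16], wrapped_dek[16:]
        return toy_cipher(self._keks[path], nonce, body)


def run_query(kms, kek_path, wrapped_dek, encrypted_rows, principal):
    # The plaintext DEK exists only inside this call.
    dek = kms.unwrap(kek_path, wrapped_dek, principal)
    try:
        return [toy_cipher(dek, nonce, body).decode() for nonce, body in encrypted_rows]
    finally:
        del dek  # drop the reference once the query finishes


kms = ToyKms()
KEK_PATH = "projects/example/keys/demo-kek"  # hypothetical resource path
kms.create_kek(KEK_PATH, allowed_principals={"analyst@example.com"})

# Encrypt a column with a fresh DEK, then keep only the wrapped DEK.
data_key = secrets.token_bytes(32)
wrapped_dek = kms.wrap(KEK_PATH, data_key)
rows = []
for zip_code in ["94105", "10001"]:
    nonce = secrets.token_bytes(16)
    rows.append((nonce, toy_cipher(data_key, nonce, zip_code.encode())))

result = run_query(kms, KEK_PATH, wrapped_dek, rows, "analyst@example.com")
```

A caller without permission on the KEK gets a `PermissionError` from `run_query`, and the attempt still appears in `kms.audit_log`.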

The BigQuery documentation explains:

At query execution time, you provide the Cloud KMS resource path of the KEK and the ciphertext from the wrapped DEK. BigQuery calls Cloud KMS to unwrap the DEK, and then uses that key to decrypt the data in your query. The unwrapped version of the DEK is only stored in memory for the duration of the query, and then destroyed.

In an example use case, the zip code is the data to be encrypted. A non-deterministic function, called in the query that runs against the table, decrypts the data when it is accessed.

[Figure omitted. Source: BigQuery documentation]
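The original query is not reproduced here, so the following is only a sketch of what such a query might look like, written as a Python helper that builds the SQL text. The table name, the column names (`customer_id`, `encrypted_zip`), the key path, and the use of `customer_id` as additional data are assumptions for illustration. `AEAD.DECRYPT_STRING`, `KEYS.KEYSET_CHAIN`, and `FROM_BASE64` are BigQuery functions, but whether this exact script form suits a given setup should be checked against the BigQuery documentation.

```python
import base64


def build_decrypt_query(table: str, kms_resource: str, wrapped_dek: bytes) -> str:
    """Return a BigQuery script that decrypts one column with a KMS-wrapped keyset.

    Illustration only: inputs are interpolated into the SQL text, so they
    must come from a trusted source.
    """
    keyset_b64 = base64.b64encode(wrapped_dek).decode("ascii")
    return f"""
DECLARE kms_resource_name STRING DEFAULT '{kms_resource}';
DECLARE first_level_keyset BYTES DEFAULT FROM_BASE64('{keyset_b64}');

SELECT
  customer_id,
  AEAD.DECRYPT_STRING(
    KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
    encrypted_zip,  -- ciphertext column (assumed name)
    customer_id     -- additional data bound at encryption time (assumed)
  ) AS zip_code
FROM `{table}`;
""".strip()


query = build_decrypt_query(
    table="my-project.my_dataset.customers",  # hypothetical table
    kms_resource=(
        "gcp-kms://projects/my-project/locations/us/"
        "keyRings/my-ring/cryptoKeys/my-kek"  # hypothetical KEK path
    ),
    wrapped_dek=b"\x01\x02\x03\x04",  # placeholder bytes, not a real keyset
)
```

The query carries the two inputs named in the documentation quote above: the Cloud KMS resource path of the KEK and the ciphertext of the wrapped DEK.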

In a second example, a deterministic AEAD function, called in the query that runs against the table, decrypts the data when it is accessed and supports aggregation and joins on the encrypted data.

[Figure omitted. Source: BigQuery documentation]

In this way, even a user who is not allowed to decrypt the data can perform a join on the encrypted column.

Before the release of column-level encryption, administrators needed to make copies of obfuscated datasets to manage proper access for different groups. This created an inconsistent approach to protecting data, which can be expensive to manage. Column-level encryption increases the level of security because each column can have its own encryption key instead of a single key for the entire database. It can also provide faster data access because less data is encrypted.

Dynamic data masking, released in preview, extends column-level security and gives administrators more control. In conjunction with column-level access controls, they can choose to grant full access, no access, or access to masked data. This capability selectively masks column-level data at query time based on the defined masking rules, user roles, and privileges. It allows administrators to obfuscate sensitive data and control user access while reducing the risk of data leakage.

This new feature makes data sharing easier, because administrators can selectively hide information and the tables can then be shared with large groups of users. At the application level, developers do not need to modify queries to hide sensitive data. Once data masking is configured at the BigQuery level, existing queries automatically hide the data based on the roles assigned to the user. Last but not least, applying security is easier because the administrator can write the security rule once and then apply it to any number of columns with tags.
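The behavior described above can be modeled in a few lines of Python. This is a conceptual sketch, not BigQuery's implementation or API: the rule names, tags, roles, and the `select` function are hypothetical. It shows one rule per tag being reused across columns, and the same query returning full, masked, or no data depending on the caller's role.

```python
import hashlib

# Masking rules (hypothetical names) that can be attached to a tag.
MASKERS = {
    "hash": lambda v: hashlib.sha256(str(v).encode()).hexdigest(),
    "nullify": lambda v: None,
}

# One policy per tag: role -> "full" or the name of a masking rule.
# Roles that are not listed get no access at all.
TAG_POLICIES = {
    "pii": {"fraud_team": "full", "analyst": "hash"},
    "payment": {"fraud_team": "full", "analyst": "nullify"},
}

# Tags attached to columns: one rule covers any number of columns.
COLUMN_TAGS = {"email": "pii", "zip_code": "pii", "card_number": "payment"}


def select(rows, columns, role):
    """Run the same 'query' for any role; masking is applied by the engine."""
    out = []
    for row in rows:
        visible = {}
        for col in columns:
            tag = COLUMN_TAGS.get(col)
            if tag is None:
                visible[col] = row[col]  # untagged column: no restriction
                continue
            rule = TAG_POLICIES[tag].get(role)
            if rule is None:
                raise PermissionError(f"role {role!r} cannot read column {col!r}")
            visible[col] = row[col] if rule == "full" else MASKERS[rule](row[col])
        out.append(visible)
    return out


rows = [
    {
        "order_id": 1,
        "email": "a@example.com",
        "zip_code": "94105",
        "card_number": "4111111111111111",
    }
]
columns = ["order_id", "email", "card_number"]

full_view = select(rows, columns, "fraud_team")  # sees the original values
masked_view = select(rows, columns, "analyst")   # same query, masked values
```

Hashing rather than nullifying the `pii` columns keeps equal values equal, so an analyst can still count or join on a masked column without seeing the underlying data.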

Masking policies or encryption applied to the base tables carry over to authorized views and materialized views, and masking and encryption are compatible with other security features, such as row-level security.

Both new features can be used to increase security, manage access control, comply with privacy laws, and create secure test environments. They also provide a more consistent way to manage tables of sensitive data: administrators no longer need to create multiple datasets with encrypted or unencrypted data and share these copies with the appropriate users.
