Shares

Teams can browse the data.all catalog and request access to data assets. data.all shares data between teams securely within an environment and across environments without any data movement.

Concepts

- Share request or Share Object: one is created for each dataset and requester team.
- Share Item: an individual Redshift table, Glue table, folder or S3 Bucket that is added to the Share request.

Shareable items

In data.all there are 2 types of datasets: S3 Datasets and Redshift Datasets. Here is an overview of the items that can be shared using data.all by type of dataset. A detailed explanation of the technical details for each type can be found in the AWS data sharing technical details section.

From S3 Datasets we can share:
- S3 Bucket of the Dataset - using IAM permissions and S3/KMS policies
- one or multiple Glue Tables (Tables) - using Lake Formation to create access permissions to tables, meaning that no data is copied between AWS accounts.
- one or multiple S3 Prefixes (Folders) - using S3 access points to manage granular S3 policies.

From Redshift Datasets we can share:
- one or multiple Redshift Tables - using Redshift datashares

Sharing workflow

Requesters create a share request and add items to it. While the request is in DRAFT, both requesters and approvers can work on it, adding and deleting items. Items that are added go to the PENDINGAPPROVAL status.

Once the draft is ready, requesters submit the request, which moves to the SUBMITTED status. Then, approvers approve or reject the request, which goes to APPROVED or REJECTED status, and its items go to SHARE_APPROVED or SHARE_REJECTED respectively.

When the sharing task starts in the backend, both the items and the share object move to SHARE_IN_PROGRESS. Once all items have been processed, the Share object is PROCESSED and each of the items is in either SHARE_SUCCEEDED or SHARE_FAILED. If new items are added to the share request, the request goes back to DRAFT to be re-processed.

Both approvers and requesters can revoke access to shared items. They open the revoke items window and select which items should be revoked from the share request. The items move to REVOKE_APPROVED while the share is in REVOKED status.

While the revoking task is executing, the items and the request remain in REVOKE_IN_PROGRESS. When the revoke is complete, the items go to REVOKE_FAILED or REVOKE_SUCCEEDED. If there are share items in PENDINGAPPROVAL in the share request, the request goes back to DRAFT. Otherwise, it goes to PROCESSED.

Requesters can delete the share request with the delete button. However, the request cannot contain any shared items. Users must revoke all shared items before deletion.
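
The workflow above can be read as a small state machine. The following is a minimal sketch of it, not the data.all implementation: the status names come from the text, while the action names (`submit`, `approve`, and so on), the helper functions, and the assumption that a revoke starts from a PROCESSED request are illustrative only.

```python
from enum import Enum


class ShareStatus(str, Enum):
    DRAFT = "DRAFT"
    SUBMITTED = "SUBMITTED"
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"
    SHARE_IN_PROGRESS = "SHARE_IN_PROGRESS"
    PROCESSED = "PROCESSED"
    REVOKED = "REVOKED"
    REVOKE_IN_PROGRESS = "REVOKE_IN_PROGRESS"


# Hypothetical action names; only the statuses are taken from the documentation.
SHARE_TRANSITIONS = {
    (ShareStatus.DRAFT, "submit"): ShareStatus.SUBMITTED,
    (ShareStatus.SUBMITTED, "approve"): ShareStatus.APPROVED,
    (ShareStatus.SUBMITTED, "reject"): ShareStatus.REJECTED,
    (ShareStatus.APPROVED, "start_share"): ShareStatus.SHARE_IN_PROGRESS,
    (ShareStatus.SHARE_IN_PROGRESS, "finish_share"): ShareStatus.PROCESSED,
    (ShareStatus.PROCESSED, "add_item"): ShareStatus.DRAFT,
    # Assumption: revoking is started from a request that was already processed.
    (ShareStatus.PROCESSED, "revoke_items"): ShareStatus.REVOKED,
    (ShareStatus.REVOKED, "start_revoke"): ShareStatus.REVOKE_IN_PROGRESS,
}

# Item statuses that count as "shared" (the item still grants access).
SHARED_ITEM_STATES = {"SHARE_SUCCEEDED", "REVOKE_FAILED"}


def next_share_status(current: ShareStatus, action: str) -> ShareStatus:
    """Return the share request status after `action`, or raise if not allowed."""
    try:
        return SHARE_TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(f"'{action}' is not allowed from {current.value}") from None


def status_after_revoke(item_statuses) -> ShareStatus:
    """After a revoke task: DRAFT if any item is still pending, else PROCESSED."""
    if "PENDINGAPPROVAL" in item_statuses:
        return ShareStatus.DRAFT
    return ShareStatus.PROCESSED


def can_delete_share(item_statuses) -> bool:
    """A request can only be deleted when none of its items is still shared."""
    return not any(status in SHARED_ITEM_STATES for status in item_statuses)
```

For example, `next_share_status(ShareStatus.DRAFT, "submit")` returns `ShareStatus.SUBMITTED`, while trying to approve a request that is still in DRAFT raises a `ValueError`.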

Create a share request (requester)

S3/Glue share request

On the left pane, choose Catalog, then search for the table you want to access. Click on the lock icon of the selected data asset.

[Screenshot: catalog_search]

The following window will open. Choose your target environment and team.

[Screenshot: share_request_form]

If you want to request access for a Consumption role instead of a team, add it to the request as in the picture below.

NOTE: If the selected consumption role is not managed by data.all, you will have the option to allow data.all to attach the share policies to the consumption role for this particular share object. If this option is not enabled here, you will have to attach the share policies manually to be given access to the data.

[Screenshot: share_request_form]

Finally, click on Create Draft Request. This will create a share request (or share object) for the corresponding dataset and, if you have requested a table or folder, it will add those items to the request. After that, the modal window will switch to the share edit form.

[Screenshot: share_request_form]

Here you can edit the list of items you want to request access to. Note that the request is in DRAFT status and that the items that we add are in PENDINGAPPROVAL. They are not shared until the request is submitted and processed. The share cannot be submitted if the list of items is empty.

Request purpose is an optional field; the recommended length is up to 200 characters.

When you are happy with the share request form, click Submit Request or click Draft Request if you want to return to this form later.

The share needs to be submitted for the request to be sent to the approvers.

Redshift share request

Navigate to the Catalog. In addition to the other filters, you can use the Redshift dataset and table filters to list only Redshift data items. Once you have found the item you want, click on Request access to open a share request.

[Screenshot: catalog_search]

Pre-requisites

To be able to open a share request to a Redshift Dataset, a data.all Redshift Connection of type ADMIN in the namespace of the Redshift Dataset is required.

Similarly, the namespace that we want to use as target MUST have a data.all ADMIN connection that allows data.all to manage datashares in it. In addition, the group that we use as requester MUST have permissions to use that connection in a share request.

Take the request in the picture as an example: rs_Dataset is stored in cluster-1 and the requester team Scientists wants to access the data from cluster-2.

- Source connection: the admin team of cluster-1 has created a connection connection1 of type ADMIN for this cluster.
- Target connection: the admin team of cluster-2 has created a connection connection2 of type ADMIN for this cluster. The Administrators2 team has granted permissions to Use Connection in share request to the Scientists team.

Check out the Redshift Datasets documentation for more information about ADMIN connections and how admins can update Connection permissions.
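
The pre-requisites can be summarised as a simple check. The sketch below is only an illustration of the rule, not data.all code: the `Connection` class, its field names and the `can_open_redshift_share` helper are hypothetical, and the namespaces are identified by the cluster names used in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical model of a data.all Redshift connection."""
    name: str
    namespace: str
    connection_type: str  # for example "ADMIN"
    # Teams allowed to use this connection in a share request.
    share_request_teams: set = field(default_factory=set)


def can_open_redshift_share(connections, source_namespace, target_namespace, requester_team):
    """Check both pre-requisites for opening a Redshift share request."""
    def admin_connections(namespace):
        return [
            c for c in connections
            if c.namespace == namespace and c.connection_type == "ADMIN"
        ]

    # 1. The source namespace needs an ADMIN connection.
    has_source_admin = bool(admin_connections(source_namespace))
    # 2. The target namespace needs an ADMIN connection the requester may use.
    has_usable_target = any(
        requester_team in c.share_request_teams
        for c in admin_connections(target_namespace)
    )
    return has_source_admin and has_usable_target


connections = [
    Connection("connection1", "cluster-1", "ADMIN"),
    Connection("connection2", "cluster-2", "ADMIN", {"Scientists"}),
]
print(can_open_redshift_share(connections, "cluster-1", "cluster-2", "Scientists"))
```

With the connections from the example, the check succeeds for the Scientists team and fails for any team that has not been granted permission on connection2.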

Once the pre-requisites are fulfilled, you will be able to open a share request specifying the target namespace and the Redshift role that will get access to the data.

[Screenshot: catalog_search]

Check your sent/received share requests

Anyone can go to the Shares menu on the left side pane and look up the share requests that they have received and that they have sent. Click on Learn More in the request that you are interested in to start working on it.

[Screenshot: add_share]

Add/delete items

If the request is not being processed, it can be edited by clicking the Edit button at the top of the page.

[Screenshot: edit_share]

The Edit button opens the modal window with the Share Edit Form, the same form shown when creating the share. Here you can edit the list of shared items and the request purpose. To remove an item from the request, click on the Delete button with the trash icon next to it. We can only delete items that have not been shared. Items that are shared must be revoked, which is explained below.

Submit a share request (requester)

Once the draft is ready, the requesters need to click on the submit button. The request should now be in the SUBMITTED state. Approvers can see the request in their received share requests, alongside the current shared items, revoked items, failed items and pending items.

[Screenshot: submit_share_2]

(Optional Pre-Approval Work) Adding Filters to Glue Table Share Items (approver)

As an approver, you will also see the option to Edit Filters for Glue Table share items:

[Screenshot: share_table_filter]

Here an approver can attach to the table one or more filters that were previously created on it:

[Screenshot: share_table_filter_edit]

Once assigned, the filter will appear in the share object view and can be clicked on to view the underlying data filters assigned to it.

[Screenshot: share_table_filter_attached]

[Screenshot: share_table_filter_view]

Before the table is shared, approvers can also edit the assigned filter, removing underlying data filters or attaching new ones as needed. Once the share is approved, the filters can no longer be edited; the table item must be revoked and re-shared to assign new filters.

NOTE: If more than 1 filter is assigned to a table share item, the resulting data access is evaluated as the union (logical ‘OR’) of the filters assigned.

NOTE: If you assign filter(s) to a table share item, the Item Filter Name specified will be used in naming the table resource link for the consumer, meaning the consumer will read from a table named tablename_filtername.
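
The two notes above can be illustrated with a short sketch. It is not how data.all or Lake Formation evaluates filters internally: filters are modelled here as plain row predicates, and both function names are hypothetical. It only shows the union (logical OR) semantics and the resulting consumer table name.

```python
def row_is_visible(row, filters):
    """A row is visible if at least one assigned filter accepts it (logical OR)."""
    if not filters:
        # No filter assigned: the whole table is shared.
        return True
    return any(accepts(row) for accepts in filters)


def consumer_table_name(table_name, item_filter_name=None):
    """Name the consumer reads from: tablename_filtername when filters are assigned."""
    if item_filter_name:
        return f"{table_name}_{item_filter_name}"
    return table_name


rows = [
    {"region": "EU", "year": 2023},
    {"region": "US", "year": 2024},
    {"region": "US", "year": 2020},
]
# Two illustrative filters assigned to the same table share item.
filters = [
    lambda row: row["region"] == "EU",
    lambda row: row["year"] >= 2024,
]

visible = [row for row in rows if row_is_visible(row, filters)]
print(visible)
print(consumer_table_name("orders", "eu_or_recent"))
```

Here the first two rows are visible because each of them satisfies at least one filter, and the third row is hidden because it satisfies neither.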

Approve/Reject a share request (approver)

As an approver, click on Learn more in the SUBMITTED request. In the share view you can check the tables and folders added to the request. This is the view that approvers see; it now contains buttons to approve or reject the request.

[Screenshot: submit_share_2]

If the approvers approve the request, it moves to the APPROVED status. Share items in PENDINGAPPROVAL will go to SHARE_APPROVED.

[Screenshot: accept_share]

The data.all backend starts a sharing task, during which the items and the request are in the SHARE_IN_PROGRESS state.

[Screenshot: accept_share]

When the task is completed, the items go to SHARE_SUCCEEDED or SHARE_FAILED and the request is PROCESSED. To understand what happens under-the-hood when each share item is processed, check out the AWS data sharing technical details section.

[Screenshot: accept_share]

If a dataset is shared, requesters should see the dataset on their screens. Their role with regard to the dataset is SHARED.

[Screenshot: accept_share]

Verify (and Re-apply) Items

As of data.all V2.3, share requesters or approvers can verify the health status of the share items within their share request from the data.all UI. Any set of share items that are in a shared state (i.e. SHARE_SUCCEEDED or REVOKE_FAILED) can be selected to start a verify share process.

[Screenshot: share_verify]

Upon completion of the verify share process, each share item's healthStatus is updated (i.e. to Healthy or Unhealthy), together with a timestamp representing the last verification time. If the share item is in an Unhealthy health status, a health message is also included, detailing which part of the share is in an unhealthy state.

In addition to running a verify share process on particular items, dataset owners can run the verify share process on multiple share objects associated with a particular dataset. By navigating to the Dataset -> Shares tab, dataset owners can start a verify process on multiple share objects. For each share object selected, the share items that are in a shared state will be verified and updated with a new health status, timestamp and, where applicable, health message.

[Screenshot: share_verify]
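
As a rough sketch of the verify process described above, the snippet below updates the health fields of the items that are in a shared state. Only `healthStatus` and the status names come from the text; the other field names, the `ShareItem` class and the `check` callable are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional

# Only items in a shared state can be verified.
SHARED_STATES = {"SHARE_SUCCEEDED", "REVOKE_FAILED"}


@dataclass
class ShareItem:
    name: str
    status: str
    healthStatus: Optional[str] = None
    healthMessage: Optional[str] = None               # hypothetical field name
    lastVerificationTime: Optional[datetime] = None   # hypothetical field name


def verify_items(items: List[ShareItem],
                 check: Callable[[ShareItem], Optional[str]]) -> List[ShareItem]:
    """Run `check` on shared items; it returns an error message or None if healthy."""
    for item in items:
        if item.status not in SHARED_STATES:
            continue  # not shared, nothing to verify
        problem = check(item)
        item.healthStatus = "Unhealthy" if problem else "Healthy"
        item.healthMessage = problem
        item.lastVerificationTime = datetime.now(timezone.utc)
    return items


items = [
    ShareItem("orders", "SHARE_SUCCEEDED"),
    ShareItem("raw/", "SHARE_SUCCEEDED"),
    ShareItem("customers", "PENDINGAPPROVAL"),
]
# Illustrative check: pretend the folder share lost its access point policy.
verify_items(items, lambda item: "missing access point policy" if item.name == "raw/" else None)
for item in items:
    print(item.name, item.healthStatus, item.healthMessage)
```

Items that are not in a shared state, such as the pending one here, are skipped and keep an empty health status.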

✅ Scheduled Share Verify Task

By default, a scheduled task that runs in the background of data.all executes the share verifier process every 7 days against all share items that are in a shared state.

If any share items do end up in an Unhealthy status, the data.all approver will have the option to re-apply the share for the selected items that are in an unhealthy state.

[Screenshot: share_reapply]

Upon a successful re-apply process, the share items' health status will revert to Healthy with an updated timestamp.

Revoke Items

Both approvers and requesters can click on the button Revoke items to remove the share grant from chosen items.

It will open a window where multiple items can be selected for revoke. Once the button "revoke selected items" is pressed, the corresponding revoke task is triggered.

[Screenshot: accept_share]

✅ Proactive clean-up

In every revoke task, data.all checks whether any shared folders or tables remain in the share request. If none remain, data.all automatically cleans up any unnecessary S3 access point or Lake Formation permission.
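
A minimal sketch of this clean-up decision is shown below. It assumes the check is done per item type (folders for the S3 access point, tables for the Lake Formation permissions), which the text does not state explicitly; the function name, the item type labels and the action descriptions are hypothetical.

```python
# Item statuses in which an item still grants access.
SHARED_STATES = {"SHARE_SUCCEEDED", "REVOKE_FAILED"}


def cleanup_actions(items):
    """Return the clean-up actions to run after a revoke task.

    `items` is a list of (item_type, status) tuples, where item_type is
    "Table", "Folder" or "Bucket" (hypothetical labels).
    """
    still_shared_types = {item_type for item_type, status in items if status in SHARED_STATES}
    actions = []
    if "Folder" not in still_shared_types:
        actions.append("delete S3 access point")
    if "Table" not in still_shared_types:
        actions.append("remove Lake Formation permissions")
    return actions


# One table is still shared, but the only folder has been revoked.
print(cleanup_actions([("Table", "SHARE_SUCCEEDED"), ("Folder", "REVOKE_SUCCEEDED")]))
```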

View Share Logs

For share approvers, the logs of the share processor are available in the data.all UI. To view the logs of the latest share processor run, click the Logs button in the upper right corner of the Share View page.

[Screenshot: accept_logs]

Delete share request

To delete a share request, it must not contain any shared items. For example, if a request has some items in the SHARE_SUCCEEDED state, we receive an error when trying to delete it. Once we have revoked access to all items, we can delete the request.

AWS data sharing technical details

Here is a brief explanation of how each type of sharing mechanism is implemented in data.all. It is important to understand what really happens in AWS when dealing with downstream integrations that will consume shared data.

S3 Bucket sharing

In this type of share, the permissions are granted to the IAM role specified in the request as principal. It can be either a data.all team IAM role or an external role defined as a consumption role.

When processing a sharing task for an S3 Bucket, data.all will:

- Update the S3 Bucket policy to add permissions to the principal IAM role.
- Create/Update the IAM policy "Share policy" that grants IAM permissions to the requested S3 bucket and KMS key. Attach this policy to the principal IAM role.
- (If the Bucket is encrypted using a KMS key) Update the KMS Key policy to add permissions to the principal IAM role.
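
To make the first step more concrete, here is a sketch of what a bucket policy statement for the principal IAM role could look like and how it could be merged into an existing policy. The statement ID, the read-only set of actions and both helper names are assumptions; the permissions that data.all actually grants may differ.

```python
import json


def bucket_policy_statement(bucket_name, principal_role_arn):
    """Build an illustrative bucket policy statement for the share principal."""
    return {
        "Sid": "ShareBucketAccess",  # hypothetical statement ID
        "Effect": "Allow",
        "Principal": {"AWS": principal_role_arn},
        "Action": ["s3:GetObject", "s3:ListBucket"],  # assumed read-only access
        "Resource": [
            f"arn:aws:s3:::{bucket_name}",
            f"arn:aws:s3:::{bucket_name}/*",
        ],
    }


def add_statement(policy, statement):
    """Return a copy of `policy` with `statement` added, replacing any with the same Sid."""
    kept = [s for s in policy.get("Statement", []) if s.get("Sid") != statement["Sid"]]
    return {
        "Version": policy.get("Version", "2012-10-17"),
        "Statement": kept + [statement],
    }


existing_policy = {"Version": "2012-10-17", "Statement": []}
statement = bucket_policy_statement("example-dataset-bucket", "arn:aws:iam::111122223333:role/team-role")
print(json.dumps(add_statement(existing_policy, statement), indent=2))
```

Replacing a statement with the same Sid keeps the update idempotent, so re-processing the same share does not add duplicate statements.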

Glue Table sharing

In this type of share, the permissions are granted to the IAM role specified in the request as principal. It can be either a data.all team IAM role or an external role defined as a consumption role.

When processing a sharing task for a Glue Table, data.all will:

- Create a Glue database in the target account named after the original database plus the suffix _shared. This database will be re-used if other share requests for the same source database are processed for other principals in the same environment.
- (If the share is cross-account) Revoke IAMAllowedPrincipals permissions from the table to ensure Lake Formation is used to manage access to the table, and update Lake Formation to use Version 3 if it is not already on version 3 or higher.
- Grant Lake Formation permissions on the original database and table to the IAM principals in the target. If the share is cross-account, this step will create a RAM invitation that data.all will identify and accept.
- Create a resource link table from the original database table to the _shared database in the target account.
- Grant Lake Formation permissions to the resource link table for the IAM principals.
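
The resource link and grant steps can be sketched as request payloads for the AWS SDK. The functions below only build keyword arguments in the shape accepted by boto3's `glue.create_table` and `lakeformation.grant_permissions`; they do not call AWS. The helper names and the choice of permissions are assumptions, not the exact requests data.all sends.

```python
def shared_database_name(source_database):
    """Name of the database created in the target account."""
    return f"{source_database}_shared"


def resource_link_request(source_account_id, source_database, table_name):
    """Keyword arguments for glue.create_table, run in the target account."""
    return {
        "DatabaseName": shared_database_name(source_database),
        "TableInput": {
            "Name": table_name,
            # A table with a TargetTable and no schema of its own is a resource link.
            "TargetTable": {
                "CatalogId": source_account_id,
                "DatabaseName": source_database,
                "Name": table_name,
            },
        },
    }


def lf_grant_request(principal_arn, catalog_id, database, table_name, permissions):
    """Keyword arguments for lakeformation.grant_permissions on a table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table_name,
            }
        },
        "Permissions": list(permissions),
    }


link = resource_link_request("111122223333", "sales", "orders")
grant = lf_grant_request(
    "arn:aws:iam::444455556666:role/team-role", "111122223333", "sales", "orders",
    ["SELECT", "DESCRIBE"],  # assumed permission set
)
print(link["DatabaseName"], grant["Permissions"])
```

With credentials for the right account, these dictionaries could be passed as `glue_client.create_table(**link)` and `lakeformation_client.grant_permissions(**grant)`.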

S3 Prefix sharing (Folders)

In this type of share, the permissions are granted to the IAM role specified in the request as principal. It can be either a data.all team IAM role or an external role defined as a consumption role.

When processing a sharing task for a Folder, data.all will:

- Update the Dataset Bucket policy to allow access point sharing. This is a one-time operation.
- Create/Update an S3 Access Point and its policy granting permissions to the requested S3 prefix (folder) in the bucket for the principal IAM role.
- Create/Update the IAM policy "Share policy" that grants IAM permissions to the S3 Access Point and KMS key. Attach this policy to the principal IAM role.
- (If the Bucket is encrypted using a KMS key) Update the KMS Key policy to add permissions to the principal IAM role.
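
As an illustration of the access point step, the sketch below builds an access point policy that limits a principal to one prefix. The statement IDs, the chosen actions and the function name are assumptions; the policy generated by data.all may be structured differently.

```python
import json


def access_point_policy(region, account_id, access_point_name, prefix, principal_role_arn):
    """Build an illustrative access point policy scoped to a single prefix (folder)."""
    access_point_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{access_point_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListFolder",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": principal_role_arn},
                "Action": "s3:ListBucket",
                "Resource": access_point_arn,
                # Listing is only allowed under the shared prefix.
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadFolder",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"AWS": principal_role_arn},
                "Action": "s3:GetObject",
                # Objects behind an access point are addressed as <arn>/object/<key>.
                "Resource": f"{access_point_arn}/object/{prefix}/*",
            },
        ],
    }


policy = access_point_policy(
    "eu-west-1", "111122223333", "example-access-point", "raw",
    "arn:aws:iam::444455556666:role/team-role",
)
print(json.dumps(policy, indent=2))
```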

Redshift Table sharing

In this type of share, the permissions are granted to the Redshift role in the Redshift namespace specified in the request.

When processing a sharing task for a Redshift table, data.all will:

- In the source namespace, create a Redshift datashare. Add the requested schema and tables to the datashare.
- Grant access to the datashare for the consumer namespace (same account) or for the consumer AWS account (cross-account).
- (If cross-account share) Authorize and associate the datashare with the target namespace.
- In the target namespace, create a local database for the datashare and grant permissions to the principal Redshift role.
- In the target namespace, create an external schema in the local database and grant usage permissions to the principal Redshift role.
- For the local database and for the external schema, grant select access to the requested table to the principal Redshift role.
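
The steps above roughly correspond to the Redshift SQL generated by the sketch below. It is an approximation written from the list, not the statements data.all executes: all object names are placeholders, identifiers are not quoted or validated, the cross-account authorize and associate step is done through the Redshift API rather than SQL and is therefore omitted, and only the grant through the external schema is shown.

```python
def producer_statements(datashare, schema, tables, consumer_namespace=None, consumer_account=None):
    """SQL to run in the source namespace."""
    statements = [
        f"CREATE DATASHARE {datashare};",
        f"ALTER DATASHARE {datashare} ADD SCHEMA {schema};",
    ]
    statements += [f"ALTER DATASHARE {datashare} ADD TABLE {schema}.{table};" for table in tables]
    if consumer_account:
        # Cross-account share: grant to the consumer AWS account.
        statements.append(f"GRANT USAGE ON DATASHARE {datashare} TO ACCOUNT '{consumer_account}';")
    else:
        # Same-account share: grant to the consumer namespace.
        statements.append(f"GRANT USAGE ON DATASHARE {datashare} TO NAMESPACE '{consumer_namespace}';")
    return statements


def consumer_statements(datashare, producer_namespace, local_db, external_schema,
                        schema, table, role, producer_account=None):
    """SQL to run in the target namespace."""
    if producer_account:
        source = f"OF ACCOUNT '{producer_account}' NAMESPACE '{producer_namespace}'"
    else:
        source = f"OF NAMESPACE '{producer_namespace}'"
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {datashare} {source};",
        f"GRANT USAGE ON DATABASE {local_db} TO ROLE {role};",
        f"CREATE EXTERNAL SCHEMA {external_schema} FROM REDSHIFT DATABASE '{local_db}' SCHEMA '{schema}';",
        f"GRANT USAGE ON SCHEMA {external_schema} TO ROLE {role};",
        f"GRANT SELECT ON TABLE {external_schema}.{table} TO ROLE {role};",
    ]


for sql in producer_statements("example_share", "public", ["orders"], consumer_namespace="<TARGET_NAMESPACE_ID>"):
    print(sql)
```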

Consume shared data

Based on the previous section, we can now describe some ways of consuming the shared data for each type of shareable item.

S3 Bucket sharing

For S3 bucket sharing, IAM policies, S3 bucket policies, and KMS Key policies (if applicable) are updated to enable sharing of the S3 Bucket resource. Therefore, we can use S3 API calls to access the data, referring to the Bucket directly. We need to assume or use the credentials of the principal IAM role used in the share request (team IAM role or consumption IAM role).

Here is an example using the AWS CLI:

 aws s3 ls s3://<BUCKET_NAME>

Glue Table sharing

Glue tables are shared using AWS Lake Formation; therefore, any service that reads Glue tables and integrates with Lake Formation is able to consume the data.

We need to assume or use the credentials of the principal IAM role used in the share request (team IAM role or consumption IAM role).
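
As one possible example, a shared table could be queried with Amazon Athena. The sketch below only builds the keyword arguments for boto3's `athena.start_query_execution`; it assumes the table is exposed in the `<database>_shared` database described in the technical details, and the function name, output location and workgroup are placeholders.

```python
def athena_query_request(source_database, table_name, output_location, workgroup="primary", limit=10):
    """Keyword arguments for athena.start_query_execution on a shared table."""
    shared_database = f"{source_database}_shared"
    return {
        "QueryString": f'SELECT * FROM "{shared_database}"."{table_name}" LIMIT {int(limit)}',
        "QueryExecutionContext": {"Database": shared_database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


request = athena_query_request("sales", "orders", "s3://<ATHENA_RESULTS_BUCKET>/results/")
print(request["QueryString"])
```

With the credentials of the principal IAM role, the request could be submitted as `boto3.client("athena").start_query_execution(**request)`.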

S3 Prefix sharing (Folders)

For folders, the underlying sharing mechanism is S3 Access Points. You can read data inside a prefix by executing API calls against the S3 access point.

We need to assume or use the credentials of the principal IAM role used in the share request (team IAM role or consumption IAM role).

For example, we could use the AWS CLI with the following access point:

 aws s3 ls arn:aws:s3:<SOURCE_REGION>:<SOURCE_AWSACCOUNTID>:accesspoint/<DATASETURI>-<REQUESTER-TEAM>/<FOLDER_NAME>/

Redshift Table sharing

Redshift tables are shared through Redshift datashares and the principal of the share request is a Redshift role. Thus, we can consume data by accessing the Redshift Query editor, or other applications that consume from Redshift, with a user that has access to the Redshift role.

Email Notification on share requests

In data.all, you can enable email notifications to send emails to requesters and approvers of a share request. Email notifications are triggered during all share workflows - Share Submitted, Approved, Rejected, Revoked.

The content sent in an email notification is similar to the UI-based notification.

For example, the email body will look like:

User <USERNAME> <SHARE_ACTION> share request for dataset <DATASET_NAME>

where <SHARE_ACTION> corresponds to "submitted", "approved", "revoked" or "rejected".
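
The template can be reproduced with a small formatting helper. The function name and the validation of the action are illustrative only; the wording of the message follows the template shown above.

```python
SHARE_ACTIONS = ("submitted", "approved", "revoked", "rejected")


def share_notification_body(username, share_action, dataset_name):
    """Format the notification text for a share request event."""
    if share_action not in SHARE_ACTIONS:
        raise ValueError(f"Unknown share action: {share_action}")
    return f"User {username} {share_action} share request for dataset {dataset_name}"


print(share_notification_body("alice", "approved", "sales_dataset"))
```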

Note - In order to enable email notifications, you need to configure them in config.json and set up the required AWS services during the deployment phase. Please review the steps for setting up email notifications on the data.all webpage in the Deploy to AWS section.
