This is not an officially supported Google product.
The Datashare Toolkit is a solution that lets data publishers easily manage datasets residing in BigQuery. The toolkit includes functionality to ingest and entitle data, relieving consumers of much of the toil involved in onboarding datasets from a variety of providers. Publishers upload data files to a storage bucket and allocate permissioned datasets that their consumers access through BigQuery authorized views.
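As a rough illustration of the view half of that workflow, the Python sketch below builds a BigQuery Standard SQL `CREATE OR REPLACE VIEW` statement that exposes selected columns of a publisher table. The `TableRef` and `view_ddl` names, and every project, dataset, table, and column name, are hypothetical and not taken from Datashare. Creating the view does not by itself make it an authorized view: the view must also be added to the access list of the dataset that holds the source table.

```python
"""Sketch: build BigQuery DDL for a view over a publisher table.
All names here are hypothetical placeholders, not Datashare's own."""

from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    project: str
    dataset: str
    table: str

    def fq(self) -> str:
        # Fully qualified, backtick-quoted identifier for BigQuery Standard SQL.
        # Assumes trusted identifiers (no embedded backticks).
        return f"`{self.project}.{self.dataset}.{self.table}`"


def view_ddl(source: TableRef, view: TableRef, columns: list[str]) -> str:
    """Return a CREATE OR REPLACE VIEW statement selecting `columns` from `source`."""
    if not columns:
        raise ValueError("at least one column is required")
    cols = ", ".join(f"`{c}`" for c in columns)
    return (
        f"CREATE OR REPLACE VIEW {view.fq()} AS\n"
        f"SELECT {cols}\nFROM {source.fq()}"
    )


if __name__ == "__main__":
    src = TableRef("publisher-project", "raw_data", "prices")
    shared = TableRef("publisher-project", "consumer_acme", "prices_view")
    print(view_ddl(src, shared, ["symbol", "trade_date", "close"]))
```

Restricting the `SELECT` list is one way a view can expose only the licensed portion of a table; row filters in a `WHERE` clause are another.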
While these tools handle data management and entitlement, they follow a bring-your-own-license (BYOL) model for entitling publisher data. Publishers should therefore already have licensing arrangements with the consumers who wish to access their data within GCP, and those consumers must provide the GCP account IDs corresponding to their entitled user principals. These account IDs are required to create the authorized views.
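The sketch below assumes entitlements are expressed through the `access` list of a BigQuery dataset resource, in the shape used by the BigQuery REST API. It maps consumer principals, written here in a hypothetical `user:<email>` / `group:<email>` form, to `READER` entries using the `userByEmail` and `groupByEmail` fields, and builds the `view` entry that authorizes a view on its source dataset (such entries do not require a role). The helper names are illustrative only; Datashare's own implementation may differ.

```python
"""Sketch: BigQuery dataset `access` entries for consumers and authorized views.
Helper names and the principal string format are hypothetical."""

import json

_FIELD_FOR_KIND = {"user": "userByEmail", "group": "groupByEmail"}


def consumer_entries(principals: list[str]) -> list[dict]:
    """Map 'user:<email>' / 'group:<email>' strings to READER access entries."""
    entries = []
    for p in principals:
        kind, sep, email = p.partition(":")
        if not sep or kind not in _FIELD_FOR_KIND or "@" not in email:
            raise ValueError(f"unsupported principal: {p!r}")
        entries.append({"role": "READER", _FIELD_FOR_KIND[kind]: email})
    return entries


def authorized_view_entry(project: str, dataset: str, view: str) -> dict:
    """Entry that lets a view read the dataset it is granted on (no role needed)."""
    return {"view": {"projectId": project, "datasetId": dataset, "tableId": view}}


def add_entries(existing: list[dict], new: list[dict]) -> list[dict]:
    """Return the existing entries plus any new ones not already present."""
    merged = list(existing)
    for entry in new:
        if entry not in merged:
            merged.append(entry)
    return merged


if __name__ == "__main__":
    readers = consumer_entries(["user:analyst@example.com", "group:quants@example.com"])
    view = authorized_view_entry("publisher-project", "consumer_acme", "prices_view")
    print(json.dumps({"consumer_dataset": readers, "source_dataset": [view]}, indent=2))
```

In this arrangement, the consumer `READER` entries go on the dataset that contains the views, while the authorized-view entry goes on the dataset that contains the source tables, so consumers can query the view without direct access to the underlying data.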
The toolkit is open source. As a prerequisite, publishers must maintain a limited amount of supporting infrastructure within GCP, such as storage buckets, serverless functions, and BigQuery datasets. As a consumer, once your GCP accounts are added to the publisher's entitlements, you can query the published data directly within BigQuery, ready to integrate into your analytics workflow, machine learning model, or runtime application. Consumers are billed for BigQuery compute and networking, while publishers incur costs only for storing their data in BigQuery and Cloud Storage.
- Publisher UI for creating data sharing policies, managing user accounts, and creating views
- Ingestion performed by a Google Cloud Function
- GCP Marketplace integration for selling your data
- Multicast client
If you plan to use the GCP Marketplace integration, the production project from which you install and manage Datashare must follow the required naming convention (punctuation and spaces are not allowed): [yourcompanyname]-public.
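A quick pre-flight check of a candidate project ID might look like the following Python sketch. It assumes the company-name part contains only lowercase letters and digits, and it also applies the general GCP project ID rules of 6 to 30 characters starting with a lowercase letter. The function name `is_marketplace_project_id` is hypothetical.

```python
import re

# Assumption: the company-name part uses only lowercase letters and digits,
# because punctuation and spaces are not allowed. The "-public" suffix also
# guarantees the ID does not end with a hyphen, as GCP requires.
_MARKETPLACE_PROJECT = re.compile(r"[a-z][a-z0-9]*-public")


def is_marketplace_project_id(project_id: str) -> bool:
    """True if project_id matches [yourcompanyname]-public and GCP length limits."""
    return (
        6 <= len(project_id) <= 30
        and _MARKETPLACE_PROJECT.fullmatch(project_id) is not None
    )


if __name__ == "__main__":
    for candidate in ["examplecorp-public", "example-corp-public", "ExampleCorp-public"]:
        print(candidate, is_marketplace_project_id(candidate))
```

Because the suffix takes 7 characters, the 30-character limit leaves at most 23 characters for the company name.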
- Set up the Datashare API Manager Service Account
- Set up your domain
- Set up OAuth credentials
- Deploy Datashare
- Initialize Schema
Then, to get started, see the User Guide for usage information.
- Perform a Datashare version upgrade - Update the API and UI software versions.
- Update Data Producers - Modify the administrators for the Datashare UI.
- Update API Gateway configuration - Modify the API Gateway configuration to apply the most recently defined security policies.
- A GCP account with billing enabled
- A Google Cloud Storage bucket to store staged data
- A valid Google Account or Google Group email address (including G Suite and Gmail addresses).
Note: Consumers can create a Google account with an existing email address here.
- Entitlements granted by the publisher to your specific licensed datasets
Datashare is under active development. Interfaces and functionality may change at any time.
This repository is licensed under the Apache 2 license (see LICENSE).
Contributions are welcome. See CONTRIBUTING for more information.