# Databricks System Tables Tutorial using Databricks Demo

Source: [Databricks Demos Website](https://www.databricks.com/resources/demos/tutorials/governance/system-tables?itm_data=demo_center)

In [0]:
%pip install dbdemos

In [0]:
import dbdemos
dbdemos.install('uc-04-system-tables')

Note: running dbdemos deploys a series of resources on the platform, from a Databricks SQL cluster to DLT pipelines, depending on the demo being installed.

# System Tables

### Audit Log System Table:

###### Description: Includes records for all audit events across your Databricks account. For a list of available audit events, see the Audit log reference.
###### Location: `system.access.audit`

| Column Name | Data Type | Description | Example |
|-------------|-----------|-------------|---------|
| `version`   | string    | Audit log schema version  | 2.0 |
| `event_time` | timestamp | Timestamp of the event | 2023-01-01T01:01:01.123 |
| `event_date` | date | Calendar date the action took place | 2023-01-01 |
| `workspace_id` | long | ID of the workspace | 1234567890123456 |
| `source_ip_address` | string | IP address where the request originated | 10.30.0.242 |
| `user_agent` | string | Origination of request | Apache-HttpClient/4.5.13 (Java/1.8.0_345) |
| `session_id` | string | ID of the session where the request came from | 123456789 |
| `user_identity` | string | Identity of user initiating request | {"email": "user@domain.com", "subjectName": null} |
| `service_name` | string | Service name initiating request | unityCatalog |
| `action_name` | string | Category of the event captured in audit log | getTable |
| `request_id` | string | ID of request | ServiceMain-4529754264 |
| `request_params` | map | Map of key values containing all the request parameters. Depends on request type | [["full_name_arg", "user.chat.messages"], ["workspace_id", "123456789"], ["metastore_id", "123456789"]] |
| `response` | struct | Struct of response return values | {"statusCode": 200, "errorMessage": null, "result": null} |
| `audit_level` | string | Workspace or account level event | ACCOUNT_LEVEL |
| `account_id` | string | ID of the account | 23e22ba4-87b9-4cc2-9770-d10b894bxx |
| `event_id` | string | ID of the event | 34ac703c772f3549dcc8671f654950f0 |
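
The columns above are usually filtered by service, action, and date. The following is a minimal sketch of how such a query could be assembled in Python; the helper name `build_audit_query` is hypothetical, and the values are validated rather than escaped because they are interpolated into the SQL text. Running the query assumes a Databricks notebook, where `spark` is predefined.

```python
import re
from datetime import date

# Only allow simple names so the values can be interpolated safely.
_SAFE_NAME = re.compile(r"^[A-Za-z0-9_]+$")


def build_audit_query(service_name: str, action_name: str, since: date) -> str:
    """Return a SQL string listing matching audit events, newest first."""
    for value in (service_name, action_name):
        if not _SAFE_NAME.match(value):
            raise ValueError(f"unexpected characters in {value!r}")
    return (
        "SELECT event_time, user_identity, source_ip_address, request_params\n"
        "FROM system.access.audit\n"
        f"WHERE service_name = '{service_name}'\n"
        f"  AND action_name = '{action_name}'\n"
        f"  AND event_date >= DATE '{since.isoformat()}'\n"
        "ORDER BY event_time DESC"
    )


query = build_audit_query("unityCatalog", "getTable", date(2023, 1, 1))
print(query)
# In a Databricks notebook this could then be run with:
# display(spark.sql(query))
```

Filtering on `event_date` as well as the service and action keeps the scan limited to the days of interest.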

### Lineage System Table:

###### Description: Includes a record for each read or write event on a Unity Catalog table or path, and a record for each read or write event on a Unity Catalog column (excluding events that do not have a source).
###### Location: `system.access.table_lineage` & `system.access.column_lineage`

| Column Name | Data Type | Description | Example |
|-------------|-----------|-------------|---------|
| `account_id` | string | The ID of the Databricks account. | 7af234db-66d7-4db3-bbf0-956098224879 |
| `metastore_id` | string | The ID of the Unity Catalog metastore. | 5a31ba44-bbf4-4174-bf33-e1fa078e6765 |
| `workspace_id` | string | The ID of the workspace. | 123456789012345 |
| `entity_type` | string | The type of entity the lineage transaction was captured from. The value is NOTEBOOK, JOB, PIPELINE, DBSQL_DASHBOARD, DBSQL_QUERY, or NULL. | NOTEBOOK |
| `entity_id` | string | The ID of the entity the lineage transaction was captured from. If entity_type is NULL, entity_id is NULL. | Notebook: 23098402394234 <br /> Job: 23098402394234  <br /> Databricks SQL query: e9cd8a31-de2f-4206-adfa-4f6605d68d88  <br /> Databricks SQL dashboard: e9cd8a31-de2f-4206-adfa-4f6605d68d88  <br /> Pipeline: e9cd8a31-de2f-4206-adfa-4f6605d68d88 |
| `entity_run_id` | string | ID to describe the unique run of the entity, or NULL. This differs for each entity type: <br /> Notebook: command_run_id <br /> Job: job_run_id <br /> Databricks SQL query: query_run_id <br /> Databricks SQL dashboard: query_run_id <br /> Pipeline: pipeline_update_id <br /> If entity_type is NULL, entity_run_id is NULL. | Notebook: 23098402394234 <br /> Job: 23098402394234 <br /> Databricks SQL query: e9cd8a31-de2f-4206-adfa-4f6605d68d88 <br /> Databricks SQL dashboard: e9cd8a31-de2f-4206-adfa-4f6605d68d88 <br /> Pipeline: e9cd8a31-de2f-4206-adfa-4f6605d68d88 |
| `source_table_full_name` | string | Three-part name to identify the source table. | catalog.schema.table |
| `source_table_catalog` | string | The catalog of the source table. | catalog |
| `source_table_schema` | string | The schema of the source table. | catalog.schema |
| `source_table_name` | string | The name of the source table. | table |
| `source_path` | string | Location in cloud storage of the source table, or the path if it’s reading from cloud storage directly. | s3://mybucket/table1 |
| `source_type` | string | The type of the source. The value is TABLE, PATH, VIEW, or STREAMING_TABLE. | TABLE |
| `source_column_name` | string | The name of the source column. | date |
| `target_table_full_name` | string | Three-part name to identify the target table. | catalog.schema.table |
| `target_table_catalog` | string | The catalog of the target table. | catalog |
| `target_table_schema` | string | The schema of the target table. | catalog.schema |
| `target_table_name` | string | The name of the target table. | table |
| `target_path` | string | Location in cloud storage of the target table. | s3://mybucket/table1 |
| `target_type` | string | The type of the target. The value is TABLE, PATH, VIEW, or STREAMING_TABLE. | TABLE |
| `target_column_name` | string | The name of the target column. | date |
| `created_by` | string | The user who generated this lineage. This can be a Databricks username, a Databricks service principal ID, “System-User”, or NULL if the user information cannot be captured. | crampton.rods@email.com |
| `event_time` | timestamp | The timestamp when the lineage was generated. | 2023-06-20T19:47:21.194+0000 |
| `event_date` | date | The date when the lineage was generated. This is a partitioned column. | 2023-06-20 |
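
Because each lineage row links one source to one target, the rows can be read as edges of a graph. The sketch below, in plain Python, follows `source_table_full_name` to `target_table_full_name` to collect every table downstream of a starting table. The function name `downstream_tables` and the sample rows are hypothetical; in practice the rows would come from `system.access.table_lineage`.

```python
from collections import defaultdict, deque


def downstream_tables(rows, start):
    """Return every table reachable from `start` via source -> target edges."""
    edges = defaultdict(set)
    for row in rows:
        source = row.get("source_table_full_name")
        target = row.get("target_table_full_name")
        # Skip rows without both ends (for example, path-only reads or writes).
        if source and target:
            edges[source].add(target)

    seen = set()
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for target in edges[table]:
            if target not in seen and target != start:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical sample rows, reduced to the two columns the traversal needs.
rows = [
    {"source_table_full_name": "main.raw.orders", "target_table_full_name": "main.silver.orders"},
    {"source_table_full_name": "main.silver.orders", "target_table_full_name": "main.gold.daily_sales"},
    {"source_table_full_name": "main.raw.customers", "target_table_full_name": "main.silver.customers"},
    {"source_table_full_name": None, "target_table_full_name": "main.silver.orders"},
]

print(sorted(downstream_tables(rows, "main.raw.orders")))
```

A breadth-first traversal with a `seen` set is used so that cyclic lineage does not cause an endless loop.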

### Billable Usage System Table:

###### Description: Includes records for all billable usage across your account. Each usage record is an hourly aggregate of a resource’s billable usage.
###### Location: `system.billing.usage`

| Column Name | Data Type | Description | Example |
|-------------|-----------|-------------|---------|
| `record_id` | string | Unique ID for this record | 11e22ba4-87b9-4cc2-9770-d10b894b7118 |
| `account_id` | string | ID of the account this report was generated for | 23e22ba4-87b9-4cc2-9770-d10b894b7118 |
| `workspace_id` | string | ID of the Workspace this usage was associated with | 1234567890123456 |
| `sku_name` | string | Name of the SKU | STANDARD_ALL_PURPOSE_COMPUTE |
| `cloud` | string | Cloud this usage is relevant for. Possible values are AWS, AZURE, and GCP. | AWS <br /> AZURE <br />GCP |
| `usage_start_time` | timestamp | The start time relevant to this usage record | 2023-01-09 10:00:00.000 |
| `usage_end_time` | timestamp | The end time relevant to this usage record | 2023-01-09 11:00:00.000 |
| `usage_date` | date | Date of the usage record. This field can be used for faster aggregation by date. | 2023-01-01 |
| `custom_tags` | map | Tags applied by the users to this usage | {"env": "production"} |
| `usage_unit` | string | Unit this usage is measured in. Possible values include DBUs. | DBU |
| `usage_quantity` | decimal | Number of units consumed for this record. | 259.2958 |
| `usage_metadata` | struct | System-provided metadata about the usage, including IDs for compute resources and jobs (if applicable). | {cluster_id: 12345; instance_pool_id: null; warehouse_id: null; job_id: null; node_type: null} |

### Pricing System Table:

###### Description: A historical log of SKU pricing. A record gets added each time there is a change to a SKU price.
###### Location: `system.billing.list_prices`

| Column Name | Data Type | Description | Example |
|-------------|-----------|-------------|---------|
| `price_start_time` | timestamp | The time this price became effective | 2023-01-01T09:59:59.999Z |
| `price_end_time` | timestamp | The time this price stopped being effective | 2023-01-01T09:59:59.999Z |
| `account_id` | string | ID of the account this report was generated for | 1234567890123456 |
| `sku_name` | string | Name of the SKU | STANDARD_ALL_PURPOSE_COMPUTE |
| `cloud` | string | Cloud this usage is relevant for. Possible values are AWS, AZURE, and GCP. | AWS <br /> AZURE <br />GCP |
| `currency_code` | string | The currency this price is expressed in | USD |
| `usage_unit` | string | The unit of measurement that is monetized. | DBU |
| `pricing` | struct | A structured data field that includes pricing info at the published list price rate. The key default will always return a single price that can be used for simple estimates. Some pricing models might also include additional keys that provide more detail. | {default: $0.10} |
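
The usage and pricing tables can be combined to estimate cost at list price: each usage record is matched to the price record for the same SKU, cloud, and unit that was effective at `usage_start_time`, and `usage_quantity` is multiplied by `pricing.default`. The following is a minimal sketch of that arithmetic in plain Python. The function names and sample rows are hypothetical, and treating a missing `price_end_time` as "still in effect" is an assumption.

```python
from datetime import datetime
from decimal import Decimal


def list_price_at(prices, sku_name, cloud, usage_unit, when):
    """Return the default list price effective at `when`, or None if none matches."""
    for price in prices:
        end = price["price_end_time"]  # None is assumed to mean "still in effect"
        if (
            price["sku_name"] == sku_name
            and price["cloud"] == cloud
            and price["usage_unit"] == usage_unit
            and price["price_start_time"] <= when
            and (end is None or when < end)
        ):
            return price["pricing"]["default"]
    return None


def estimate_cost(usage_rows, prices):
    """Sum usage_quantity * list price over usage rows that have a matching price."""
    total = Decimal("0")
    for row in usage_rows:
        price = list_price_at(
            prices, row["sku_name"], row["cloud"], row["usage_unit"], row["usage_start_time"]
        )
        if price is not None:
            total += row["usage_quantity"] * price
    return total


# Hypothetical sample rows built from the example values in the tables above.
prices = [{
    "sku_name": "STANDARD_ALL_PURPOSE_COMPUTE", "cloud": "AWS", "usage_unit": "DBU",
    "price_start_time": datetime(2023, 1, 1), "price_end_time": None,
    "pricing": {"default": Decimal("0.10")},
}]
usage_rows = [{
    "sku_name": "STANDARD_ALL_PURPOSE_COMPUTE", "cloud": "AWS", "usage_unit": "DBU",
    "usage_start_time": datetime(2023, 1, 9, 10), "usage_quantity": Decimal("259.2958"),
}]

print(estimate_cost(usage_rows, prices))
```

`Decimal` is used instead of `float` to avoid rounding drift when many small amounts are summed. The result is an estimate at published list prices, not an invoice amount.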

### Clusters System Table:

###### Description: A slow-changing dimension table that contains the full history of cluster configurations over time for any cluster.
###### Location: `system.compute.clusters`

| Column Name | Data Type | Description | Example |
|-------------|-----------|-------------|---------|
| `account_id` | string | ID of the account where this cluster was created. | 23e22ba4-87b9-4cc2-9770-d10b894b7118 |
| `workspace_id` | string | ID of the workspace where this cluster was created. | 1234567890123456 |
| `cluster_id` | string | ID of the cluster with which this record is associated. | 0000-123456-xxxxxxxx |
| `cluster_name` | string | User defined name for the cluster. | My cluster |
| `owned_by` | string | Username of the cluster owner. Defaults to the cluster creator, but can be changed through the Clusters API. | sample_user@email.com |
| `create_time` | timestamp | Timestamp of the change to this compute definition. | 2023-01-09 11:00:00.000 |
| `delete_time` | timestamp | Timestamp of when the cluster was deleted. The value is null if the cluster is not deleted. | 2023-01-09 11:00:00.000 |
| `driver_node_type` | string | Driver node type name. This matches the instance type name from the cloud provider. | i3.xlarge |
| `worker_node_type` | string | Worker node type name. This matches the instance type name from the cloud provider. | i3.xlarge |
| `worker_count` | bigint | Number of workers. Defined for fixed-size clusters only. | 4 |
| `min_autoscale_workers` | bigint | The set minimum number of workers. This field is valid only for autoscaling clusters. | 1 |
| `max_autoscale_workers` | bigint | The set maximum number of workers. This field is valid only for autoscaling clusters. | 1 |
| `auto_termination_minutes` | bigint | The configured autotermination duration. | 120 |
| `enable_elastic_disk` | boolean | Autoscaling disk enablement status. | true |
| `tags` | map | Default and user defined tags for the cluster. | {"ResourceClass":"SingleNode"} |
| `cluster_source` | string | Indicates the creator for the cluster: UI, API, DLT, JOB, etc. | UI |
| `init_scripts` | array | Set of paths for init scripts. | "/Users/example@email.com/files/scripts/install-pyt |
| `aws_attributes` | struct | AWS specific settings. | {<br>"ebs_volume_count": null<br>"availability": "SPOT_WITH_FALLBACK",<br>"first_on_demand": "0",<br>"spot_bid_price_percent": "100"<br>} |
| `azure_attributes` | struct | Azure specific settings. | empty |
| `gcp_attributes` | struct | GCP specific settings. This field will be empty. | empty |
| `driver_instance_pool_id` | string | Instance pool ID if the driver is configured on top of an instance pool. | 1107-555555-crhod16-pool-DIdnjazB |
| `worker_instance_pool_id` | string | Instance Pool ID if the worker is configured on top of an instance pool. | 1107-555555-crhod16-pool-DIdnjazB |
| `dbr_version` | string | The Databricks Runtime of the cluster. | 14.x-snapshot-scala2.12 |
| `change_time` | timestamp | Timestamp of change to the compute definition. | 2023-01-09 11:00:00.000 |
| `change_date` | date | Change date. Used for retention. | 2023-01-09 |
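
Since this table keeps one row per configuration change, a common first step is to reduce it to the most recent row per cluster. The sketch below shows that reduction in plain Python by keeping the row with the greatest `change_time` for each `cluster_id`. The function name `latest_cluster_configs` and the sample rows are hypothetical.

```python
from datetime import datetime


def latest_cluster_configs(rows):
    """Return {cluster_id: row}, keeping the row with the greatest change_time."""
    latest = {}
    for row in rows:
        current = latest.get(row["cluster_id"])
        if current is None or row["change_time"] > current["change_time"]:
            latest[row["cluster_id"]] = row
    return latest


# Hypothetical history: the first cluster was resized from 2 to 4 workers.
cluster_rows = [
    {"cluster_id": "0000-123456-aaaaaaaa", "worker_count": 2,
     "change_time": datetime(2023, 1, 9, 11, 0)},
    {"cluster_id": "0000-123456-aaaaaaaa", "worker_count": 4,
     "change_time": datetime(2023, 1, 10, 9, 30)},
    {"cluster_id": "0000-123456-bbbbbbbb", "worker_count": 1,
     "change_time": datetime(2023, 1, 8, 8, 0)},
]

latest = latest_cluster_configs(cluster_rows)
for cluster_id, row in sorted(latest.items()):
    print(cluster_id, row["worker_count"])
```

The same result can be obtained in SQL with a window function ordered by `change_time` descending; the Python version is only meant to make the logic explicit.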

### Node Types System Table:

###### Description: Captures the currently available node types with their basic hardware information.
###### Location: `system.compute.node_types`

| Column Name | Data Type | Description | Example |
|-------------|-----------|-------------|---------|
| `account_id` | string | ID of the account where this cluster was created. | 23e22ba4-87b9-4cc2-9770-d10b894b7118 |
| `node_type_name` | string | Unique identifier for node type. | i3.xlarge |
| `core_count` | double | Number of vCPUs for the instance. | 48.0 |
| `memory_mb` | long | Total memory for the instance. | 393216 |
| `gpu_count` | long | Number of GPUs for the instance. | 0 |

### Marketplace Listing Access System Table:

###### Description: Includes consumer info for completed request data or get data events on your listings.
###### Location: `system.marketplace.listing_access_events`

| Column Name | Data Type | Description |
|-------------|-----------|-------------|
| `account_id` | string | The account ID that hosts the listing. |
| `metastore_id` | string | The metastore ID that hosts the listing. |
| `metastore_cloud` | string | The cloud provider of the metastore that hosts the listing. |
| `metastore_region` | string | The region of the metastore that hosts the listing. |
| `provider_id` | string | The provider profile ID. |
| `provider_name` | string | The provider profile name. |
| `listing_id` | string | The listing ID. |
| `listing_name` | string | The listing name. |
| `consumer_delta_sharing_recipient_name` | string | The underlying Delta Sharing recipient name for the consumer. The value is null when the event_type is REQUEST_DATA. |
| `consumer_delta_sharing_recipient_type` | string | Whether the consumer is on a Databricks account or not. Values will be either OPEN or DATABRICKS. |
| `consumer_cloud` | string | The consumer’s cloud. Nullable if consumer_delta_sharing_recipient_type is OPEN. |
| `consumer_region` | string | The consumer’s region. Nullable if consumer_delta_sharing_recipient_type is OPEN. |
| `consumer_metastore_id` | string | The consumer’s metastore ID. Nullable if consumer_delta_sharing_recipient_type is OPEN. |
| `consumer_email` | string | The consumer’s email address. PII. |
| `consumer_name` | string | The consumer’s name. PII. |
| `consumer_company` | string | The consumer’s company. |
| `consumer_intended_use` | string | The consumer’s intended use of the listing. |
| `consumer_comments` | string | Any additional comment the consumer left. |
| `event_type` | string | The type of access. The value can be either REQUEST_DATA or GET_DATA. |
| `event_date` | date | The UTC date the event happened. |
| `event_time` | timestamp | The exact UTC timestamp when the event happened. |

### Predictive Optimization History System Table:

###### Description: Tracks the operation history of the predictive optimization feature.
###### Location: `system.storage.predictive_optimization_operations_history`

| Column Name | Data Type | Description | Example |
|-------------|-----------|-------------|---------|
| `account_id` | string | ID of the account. | 11e22ba4-87b9-4cc2-9770-d10b894b7118 |
| `workspace_id` | string | The ID of the workspace in which predictive optimization ran the operation. | 1234567890123456 |
| `start_time` | timestamp | The time at which the operation started. | 2023-01-09 10:00:00.000 |
| `end_time` | timestamp | The time at which the operation ended. | 2023-01-09 11:00:00.000 |
| `metastore_name` | string | The name of the metastore to which the optimized table belongs. | metastore |
| `catalog_name` | string | The name of the catalog to which the optimized table belongs. | catalog |
| `schema_name` | string | The name of the schema to which the optimized table belongs. | schema |
| `table_id` | string | The ID of the optimized table. | 138ebb4b-3757-41bb-9e18-52b38d3d2836 |
| `table_name` | string | The name of the optimized table. | table1 |
| `operation_type` | string | The optimization operation which was performed. The value will be COMPACTION or VACUUM. | COMPACTION |
| `operation_id` | string | The ID for the optimization operation. | 4dad1136-6a8f-418f-8234-6855cfaff18f |
| `operation_status` | string | The status of the optimization operation. The value will be SUCCESSFUL or FAILED: INTERNAL_ERROR. | SUCCESSFUL |
| `operation_metrics` | map[string, string] | Additional details about the specific optimization that was performed. For COMPACTION operations: (number_of_compacted_files, amount_of_data_compacted_bytes, number_of_output_files, amount_of_output_data_bytes). For VACUUM operations: (number_of_deleted_files, amount_of_data_deleted_bytes). | {"number_of_output_files":"100", <br /> "number_of_compacted_files":"1000", <br /> "amount_of_output_data_bytes":"4000", <br /> "amount_of_data_compacted_bytes":"10000"} |
| `usage_unit` | string | The unit of usage that this operation incurred. Can only be one value: ESTIMATED_DBU. | ESTIMATED_DBU |
| `usage_quantity` | decimal | The amount of the usage unit that was used by this operation. | 2.12 |
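
Because `operation_metrics` is a map of strings to strings, its values need to be converted before they can be summed. The following sketch, in plain Python, totals the estimated DBUs and the numeric metrics per `operation_type` for successful operations. The function name `summarize_operations` and the sample rows are hypothetical, and the sketch assumes every metric value parses as an integer.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_operations(rows):
    """Aggregate successful operations by operation_type."""
    summary = defaultdict(
        lambda: {"operations": 0, "estimated_dbu": Decimal("0"), "metrics": defaultdict(int)}
    )
    for row in rows:
        if row["operation_status"] != "SUCCESSFUL":
            continue  # failed operations are left out of the totals
        entry = summary[row["operation_type"]]
        entry["operations"] += 1
        entry["estimated_dbu"] += row["usage_quantity"]
        for name, value in row["operation_metrics"].items():
            entry["metrics"][name] += int(value)  # metric values arrive as strings
    return summary


# Hypothetical sample rows; the first reuses the example values from the table.
operation_rows = [
    {"operation_type": "COMPACTION", "operation_status": "SUCCESSFUL",
     "usage_quantity": Decimal("2.12"),
     "operation_metrics": {"number_of_output_files": "100",
                           "number_of_compacted_files": "1000",
                           "amount_of_output_data_bytes": "4000",
                           "amount_of_data_compacted_bytes": "10000"}},
    {"operation_type": "VACUUM", "operation_status": "SUCCESSFUL",
     "usage_quantity": Decimal("0.50"),
     "operation_metrics": {"number_of_deleted_files": "50",
                           "amount_of_data_deleted_bytes": "2000"}},
    {"operation_type": "COMPACTION", "operation_status": "FAILED: INTERNAL_ERROR",
     "usage_quantity": Decimal("0"), "operation_metrics": {}},
]

summary = summarize_operations(operation_rows)
for operation_type, entry in sorted(summary.items()):
    print(operation_type, entry["operations"], entry["estimated_dbu"], dict(entry["metrics"]))
```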