- City
- Customer
- POS_order
- Product
- Store
Before moving on to the next steps, create a compute resource.
- Query the "customers" table to get all the customer data.
- Query the "stores" table to get the data for the latest version of stores.
- Query the "products" table to get the data for the latest version of products.
- Explode the "transaction_line_item" column of the "transactions" table and join the resulting line items with the "discount" column to get the data for all discounted transactions. Store the result in a temporary table called "txns_discount_exploded".
- Extract the data for all returned transactions from the "transactions" table and store it in a temporary table called "txn_return".
- Union the "txns_discount_exploded" and "txn_return" tables to get all transaction data.
- Join the unioned transaction data with the customer, store, and product data to create an "orders_enriched_data" table containing all the relevant information about each order.
- Pick all columns from the latest version of the "input_customers" table and rename some columns. Save the result as "customers".
- Join the "input_pos_transactions" table with the "customers" table on the "customer_index" column using a left join. Exclude some columns from both tables. Save the result as "pos_transaction_with_customer".
- Group the "input_pos_transactions" table by "customer_index", calculate the total sales amount, order count, and average order value, and add the current timestamp. Save the result as "pos_transaction_total_amount".
- Join the "pos_transaction_total_amount" table with the "customers" table on the "customer_index" column using a left join. Exclude some columns from both tables. Write the result to a Kafka topic named "customer_order_events".
- Defining Schema of Data Products
- Defining Quality of Data Products
- Defining Semantics of Data Products
- Defining Policies of Data Products
- Defining the Storage Location of Data Products and Its Properties
- Creating Compute for Queries
- Creating a Cluster on Top of Compute to Query Data Products
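The aggregation and join steps that produce "pos_transaction_total_amount" and feed the "customer_order_events" topic can be sketched in plain Python. This is a minimal, non-authoritative illustration with several assumptions:

- The in-memory rows and field names ("order_id", "amount", "customer_name", "computed_at") are hypothetical.
- The column exclusions are omitted because the source does not name the columns.
- No Kafka client is called. The sketch only builds the key/value byte pairs a producer would send.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical stand-ins for "input_pos_transactions" and "customers";
# the field names are illustrative assumptions.
pos_transactions = [
    {"customer_index": 1, "order_id": "o1", "amount": 40.0},
    {"customer_index": 1, "order_id": "o2", "amount": 60.0},
    {"customer_index": 2, "order_id": "o3", "amount": 25.0},
]
customers = [
    {"customer_index": 1, "customer_name": "Ana"},
    {"customer_index": 3, "customer_name": "Bo"},
]


def total_amount_per_customer(transactions):
    """GROUP BY customer_index: total sales, order count, average order value,
    plus the timestamp at which the aggregate was computed."""
    totals = defaultdict(lambda: {"total_sales": 0.0, "order_count": 0})
    for txn in transactions:
        agg = totals[txn["customer_index"]]
        agg["total_sales"] += txn["amount"]
        agg["order_count"] += 1
    computed_at = datetime.now(timezone.utc).isoformat()
    return [
        {
            "customer_index": idx,
            "total_sales": agg["total_sales"],
            "order_count": agg["order_count"],
            "avg_order_value": agg["total_sales"] / agg["order_count"],
            "computed_at": computed_at,
        }
        for idx, agg in totals.items()
    ]


def left_join(left, right, key):
    """Keep every left row; right-side columns are None when there is no match.
    Assumes the key is unique on the right side."""
    right_by_key = {row[key]: row for row in right}
    right_cols = {col for row in right for col in row if col != key}
    return [
        {**row, **{col: right_by_key.get(row[key], {}).get(col) for col in right_cols}}
        for row in left
    ]


def to_kafka_messages(records):
    """Serialize each record into a (key, value) bytes pair for a Kafka producer."""
    return [
        (str(r["customer_index"]).encode("utf-8"), json.dumps(r).encode("utf-8"))
        for r in records
    ]


summary = left_join(total_amount_per_customer(pos_transactions), customers, "customer_index")
messages = to_kafka_messages(summary)
for key, value in messages:
    print(key, value)
```

Keying each message by "customer_index" means Kafka's default partitioner sends all events for one customer to the same partition, which preserves their relative order.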
The "orders_enriched_data" table could be used to store and manage order information for analysis and reporting purposes. It could be used to track customer behavior, product sales, and revenue, among other things.
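The flow that builds "orders_enriched_data" runs in this order: latest store and product versions, exploded discounted transactions, returns, a union, and then the enrichment joins. It can be sketched in plain Python as follows. This is a hedged illustration rather than the actual job, and it rests on these assumptions:

- All column names ("version", "txn_type", "qty", and so on) are hypothetical.
- "Join it with the discount column" is read as carrying each transaction's discount value onto its exploded line items.
- In practice these steps would more likely run as SQL or Spark jobs. The sketch only shows the data flow.

```python
from itertools import chain

# Hypothetical rows; every column name here is an illustrative assumption.
customers = [{"customer_id": "c1", "customer_name": "Ana"}]
stores = [
    {"store_id": "s1", "version": 1, "store_name": "Main St (old)"},
    {"store_id": "s1", "version": 2, "store_name": "Main St"},
]
products = [{"product_id": "p1", "version": 1, "product_name": "Coffee"}]
transactions = [
    {"txn_id": "t1", "customer_id": "c1", "store_id": "s1", "txn_type": "sale",
     "discount": 0.1, "transaction_line_item": [{"product_id": "p1", "qty": 2}]},
    {"txn_id": "t2", "customer_id": "c1", "store_id": "s1", "txn_type": "return",
     "discount": None, "transaction_line_item": [{"product_id": "p1", "qty": 1}]},
]


def latest_version(rows, key):
    """Map each key to its row with the highest 'version', dropping the version column."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row["version"] > current["version"]:
            latest[row[key]] = row
    return {k: {c: v for c, v in r.items() if c != "version"} for k, r in latest.items()}


def explode(txn):
    """Emit one row per line item, carrying the transaction-level columns along."""
    base = {c: v for c, v in txn.items() if c != "transaction_line_item"}
    return [{**base, **item} for item in txn["transaction_line_item"]]


# "txns_discount_exploded": line items of transactions that carry a discount.
txns_discount_exploded = list(chain.from_iterable(
    explode(t) for t in transactions if t["discount"] is not None))
# "txn_return": line items of returned transactions.
txn_return = list(chain.from_iterable(
    explode(t) for t in transactions if t["txn_type"] == "return"))
# List concatenation behaves like UNION ALL; a SQL UNION would also drop duplicates.
all_txns = txns_discount_exploded + txn_return

latest_stores = latest_version(stores, "store_id")
latest_products = latest_version(products, "product_id")
customers_by_id = {c["customer_id"]: c for c in customers}

# Left-join each transaction row with its customer, latest store, and latest product.
orders_enriched_data = [
    {**row,
     **customers_by_id.get(row["customer_id"], {}),
     **latest_stores.get(row["store_id"], {}),
     **latest_products.get(row["product_id"], {})}
    for row in all_txns
]
print(orders_enriched_data)
```

Whether the union should be UNION or UNION ALL depends on whether a discounted transaction can also be a return. The source does not say, so the sketch keeps both rows.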
Storage location: icebase
Query cluster: minerva
- Freshness: Should be updated at least daily to ensure that the data is current and relevant.
- Latency: Should have a low latency, with data available to users within 15 minutes of being updated.
- Accuracy: Should have a high degree of accuracy, with data that is consistent, complete, and free from errors.
- Security: The data product should be secure, with all personally identifiable information (PII) masked or otherwise protected.
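The freshness, latency, and accuracy targets above can be expressed as simple automated checks. This is a minimal sketch with some assumptions:

- The function names are hypothetical.
- Completeness (no nulls in required columns) is only one possible proxy for "accuracy".
- The timestamps are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Thresholds taken from the quality targets above.
FRESHNESS_LIMIT = timedelta(days=1)     # updated at least daily
LATENCY_LIMIT = timedelta(minutes=15)   # available within 15 minutes of an update


def is_fresh(last_updated, now):
    """Freshness: the table was refreshed no more than one day ago."""
    return now - last_updated <= FRESHNESS_LIMIT


def within_latency(updated_at, available_at):
    """Latency: the update became queryable within 15 minutes."""
    return available_at - updated_at <= LATENCY_LIMIT


def is_complete(rows, required_columns):
    """Accuracy proxy: every required column is present and non-null in every row."""
    return all(row.get(col) is not None for row in rows for col in required_columns)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(is_fresh(datetime(2024, 1, 2, 1, 0, tzinfo=timezone.utc), now))   # True (11 hours old)
print(within_latency(datetime(2024, 1, 2, 11, 30, tzinfo=timezone.utc),
                     datetime(2024, 1, 2, 11, 50, tzinfo=timezone.utc)))  # False (20 minutes)
print(is_complete([{"txn_id": "t1", "store_id": None}], ["txn_id", "store_id"]))  # False
```

Timezone-aware datetimes are used throughout so that subtracting timestamps from different sources does not silently mix local and UTC times.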
Storage location: fastbase
Query cluster: minerva
- Volume: Should retain all events from the past week.
- Latency: Should have a low latency, with data available to users in near real time.
- Security: The data product should be secure, with all personally identifiable information (PII) masked or otherwise protected.
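The volume and security targets for this event stream can be sketched as a seven-day window plus field masking. This is a hedged illustration with these assumptions:

- The event shape, an "event_time" field, and an "email" PII field are hypothetical.
- The fixed "****" mask is one simple policy. Hashing or tokenization are common alternatives.
- In practice, retention would more likely be enforced by the storage layer's own settings than in application code.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # "all the events for the past week"


def events_in_past_week(events, now):
    """Keep events whose event_time falls within the last 7 days."""
    cutoff = now - RETENTION
    return [e for e in events if cutoff <= e["event_time"] <= now]


def mask_fields(event, pii_fields, mask="****"):
    """Return a copy of the event with non-null PII fields replaced by a fixed mask."""
    return {k: (mask if k in pii_fields and v is not None else v) for k, v in event.items()}


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
events = [
    {"customer_index": 1, "email": "ana@example.com",
     "event_time": datetime(2024, 1, 7, tzinfo=timezone.utc)},
    {"customer_index": 2, "email": "bo@example.com",
     "event_time": datetime(2023, 12, 30, tzinfo=timezone.utc)},
]
recent = [mask_fields(e, {"email"}) for e in events_in_past_week(events, now)]
print(recent)
```

`mask_fields` returns a new dict rather than mutating the input, so the unmasked source event is never exposed through the shared reference.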