# Snowflake (Cloud Data Platform)

## Goals
- Understand what **Snowflake** is (and what it is not).
- Know what Snowflake is practically used for in real systems.
- See a minimal **Snowflake SDK** pseudo-code workflow (no execution).


## Prerequisites
- Basic SQL and analytics/data warehouse concepts.
- Basic Python knowledge (environment variables, scripts).

> This notebook does **not** execute any Snowflake SDK calls; the connector example is a sketch that needs a real Snowflake account and credentials to run.


## What Snowflake is
**Snowflake** is a cloud-native **data platform** primarily used as a managed **analytical data warehouse**.

At a high level, Snowflake provides:
- **SQL** for querying and transforming data (plus native support for semi-structured data such as JSON via the `VARIANT` type).
- Separation of **storage** and **compute** (you scale compute independently via *virtual warehouses*).
- Elastic concurrency: multiple warehouses can query the same stored data at once, and multi-cluster warehouses (Enterprise Edition and above) can add clusters as load rises.
- Managed operations: encryption, scaling, caching, metadata services, and more.
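Because compute is billed separately from storage (in credits, while a warehouse is running), you can reason about query cost from warehouse size and run time. The sketch below estimates credits for a standard warehouse, assuming the published pattern of 1 credit/hour for X-Small doubling with each size step, per-second billing, and a 60-second minimum each time the warehouse resumes. The rate table and the `estimate_credits` helper are illustrative; check current Snowflake pricing documentation before relying on the numbers.

```python
# Sketch: estimate credits consumed by a *standard* virtual warehouse.
# ASSUMPTION: credits/hour follow Snowflake's published table for standard
# warehouses (X-Small = 1, doubling per size step); verify against current docs.
CREDITS_PER_HOUR = {
    "XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8,
    "XLARGE": 16, "XXLARGE": 32, "XXXLARGE": 64, "X4LARGE": 128,
}


def estimate_credits(size: str, run_seconds: list[float]) -> float:
    """Credits for separate runs, each one resume -> suspend cycle.

    Each run is billed per second, with a 60-second minimum per resume.
    """
    rate = CREDITS_PER_HOUR[size.upper()]
    billed_seconds = sum(max(60.0, s) for s in run_seconds)
    return rate * billed_seconds / 3600


# Ten 5-second runs that each resume the warehouse vs. one 50-second run.
print(estimate_credits("XSMALL", [5] * 10))  # ten 60-second minimums billed
print(estimate_credits("XSMALL", [50]))      # one 60-second minimum billed
```

Under these assumptions, doubling the size doubles the hourly rate, so a query that parallelizes well and finishes in half the time costs about the same while returning sooner. The per-resume minimum is also why many tiny, separately resumed runs can cost more than batching them.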

Key Snowflake concepts:
- **Database / schema / table**: familiar organizational units (SQL).
- **Virtual warehouse**: the compute cluster that runs queries.
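- **Stages**: named locations for loading/unloading files, either internal (managed by Snowflake) or external (cloud object storage such as Amazon S3, Google Cloud Storage, or Azure Blob Storage).
- **Time Travel**: query or restore data as it existed at an earlier point, within a configurable retention period.
- **Zero-copy cloning**: copy databases, schemas, or tables without duplicating storage up front; only later changes consume extra storage (handy for dev/test).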
- **Secure data sharing**: share data across accounts without copying.
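Time Travel and zero-copy cloning are exposed through plain SQL. As a sketch, the helpers below build (but do not execute) such statements. `AT(OFFSET => -N)` and `CLONE` are Snowflake SQL syntax; the helper names, the identifier check, and the example table names are hypothetical. In a real session you would pass the strings to `cur.execute(...)`.

```python
# Sketch: build (not execute) Time Travel and zero-copy clone statements.
# Helper names and example table names are hypothetical.
import re
from typing import Optional

# Unquoted Snowflake identifiers: a letter or underscore, then letters,
# digits, underscores, or dollar signs.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def qualified(name: str) -> str:
    """Accept TABLE, SCHEMA.TABLE, or DB.SCHEMA.TABLE; reject anything else."""
    parts = name.split(".")
    if not 1 <= len(parts) <= 3 or not all(_IDENT.fullmatch(p) for p in parts):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def time_travel_select(table: str, seconds_ago: int) -> str:
    # AT(OFFSET => -N) reads the table as it was N seconds ago,
    # provided that point is inside the table's Time Travel retention period.
    if seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive")
    return f"SELECT * FROM {qualified(table)} AT(OFFSET => -{int(seconds_ago)})"


def zero_copy_clone(source: str, target: str, seconds_ago: Optional[int] = None) -> str:
    # CLONE shares the source's storage; optionally combine it with Time Travel
    # to clone the table as it was seconds_ago seconds earlier.
    sql = f"CREATE TABLE {qualified(target)} CLONE {qualified(source)}"
    if seconds_ago is not None:
        if seconds_ago <= 0:
            raise ValueError("seconds_ago must be positive")
        sql += f" AT(OFFSET => -{int(seconds_ago)})"
    return sql


print(time_travel_select("ANALYTICS.PUBLIC.DAILY_REVENUE", 3600))
print(zero_copy_clone("ANALYTICS.PUBLIC.DAILY_REVENUE", "ANALYTICS.DEV.DAILY_REVENUE"))
```

Validating identifiers before interpolating them matters because object names cannot be passed as bind parameters the way values can.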

### What it is not
- Not primarily an **OLTP** database (high-frequency transactions, ultra-low latency key/value access).
- Not an application cache or message queue.


## What Snowflake is practically used for
Snowflake is commonly used as the central place to **store, transform, and analyze** data.

Typical use-cases:
- **Analytics + BI**: dashboards, reporting, KPIs, ad-hoc analysis.
- **ELT pipelines**: load raw data, then transform it in-warehouse (often with tools like dbt).
- **Data sharing**: share curated datasets between teams/partners without moving data.
- **Feature/analytics datasets for ML**: create training datasets and run large SQL feature engineering jobs.
- **Governance + access control**: centralized permissions, masking policies, auditing.
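The ELT pattern is a load step followed by in-warehouse transforms. A minimal sketch, assuming a stage named `RAW_STAGE`, a raw table `RAW.ORDERS` with `order_date` and `amount` columns, and CSV files with one header row (all hypothetical), builds the two statements in order. `COPY INTO ... FROM @stage` and `CREATE OR REPLACE TABLE ... AS SELECT` are standard Snowflake SQL; the helper itself is illustrative.

```python
# Sketch: one ELT run as an ordered list of SQL statements (built, not executed).
# ASSUMPTIONS: the stage, tables, columns, and CSV layout are hypothetical, and the
# identifiers are trusted constants (never pass user input into these f-strings).


def elt_statements(stage: str, raw_table: str, target_table: str) -> list[str]:
    # Extract + Load: copy staged files into a raw table as-is.
    load = (
        f"COPY INTO {raw_table} FROM @{stage} "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)"
    )
    # Transform: reshape inside Snowflake with SQL (the step tools like dbt manage).
    transform = (
        f"CREATE OR REPLACE TABLE {target_table} AS "
        "SELECT order_date AS day, SUM(amount) AS total_revenue "
        f"FROM {raw_table} GROUP BY order_date"
    )
    return [load, transform]


for statement in elt_statements("RAW_STAGE", "RAW.ORDERS", "ANALYTICS.PUBLIC.DAILY_REVENUE"):
    print(statement)  # in a real session: cur.execute(statement)
```

Keeping the raw table untouched means a transform can be rerun or changed later without reloading the source files.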


## Using Snowflake with the SDK (pseudo-code)
A common way to use Snowflake from Python is via the **Snowflake Python Connector** (`snowflake-connector-python`).

Notes:
- Use a secrets manager / key-pair auth / OAuth in real systems (avoid hardcoding passwords).
- Keep warehouses scoped to workloads (e.g., one for ELT, one for BI) to manage cost and concurrency.
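One way to follow both notes is to keep credentials in the environment (populated by a secrets manager) and map each workload to its own warehouse and role. The workload names, warehouses, and roles below are hypothetical; the returned dict uses the same `connect()` parameter names as the connector example.

```python
# Sketch: pick a warehouse/role per workload and read credentials from the environment.
# ASSUMPTION: workload, warehouse, and role names are hypothetical examples.
import os
from typing import Mapping

WORKLOADS = {
    "elt": {"warehouse": "ELT_WH", "role": "TRANSFORMER"},
    "bi": {"warehouse": "BI_WH", "role": "ANALYST"},
}
REQUIRED_ENV = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")


def connect_kwargs(workload: str, env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build keyword arguments for snowflake.connector.connect(**kwargs)."""
    if workload not in WORKLOADS:
        raise ValueError(f"unknown workload: {workload!r}")
    missing = [name for name in REQUIRED_ENV if name not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {missing}")
    return {
        "account": env["SNOWFLAKE_ACCOUNT"],
        "user": env["SNOWFLAKE_USER"],
        "password": env["SNOWFLAKE_PASSWORD"],  # prefer key-pair auth/OAuth in production
        **WORKLOADS[workload],  # separate compute per workload
        "database": "ANALYTICS",
        "schema": "PUBLIC",
    }
```

Separate warehouses let a heavy ELT job run without queuing behind (or slowing) dashboard queries, and each warehouse's credit usage shows up separately.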

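```python
# Sketch (not executed in this notebook): needs a Snowflake account and
# `pip install snowflake-connector-python`. Role, warehouse, database, and
# table names below are examples; substitute your own.

import os
import snowflake.connector

# The connection is a context manager, so it is closed on exit even if a
# query raises (no separate conn.close() needed).
with snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],  # prefer key-pair auth/OAuth or a secrets manager
    role="ANALYST",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
) as conn:
    with conn.cursor() as cur:
        # 1) Sanity check: query a built-in function.
        cur.execute("SELECT CURRENT_VERSION()")
        snowflake_version = cur.fetchone()[0]

        # 2) Run an analytics query. Bind the window size instead of formatting
        #    it into the SQL string (the connector's default paramstyle is pyformat).
        cur.execute(
            """
            SELECT day, total_revenue
            FROM ANALYTICS.PUBLIC.DAILY_REVENUE
            WHERE day >= DATEADD('day', -%(days)s, CURRENT_DATE())
            ORDER BY day
            """,
            {"days": 7},
        )
        rows = cur.fetchall()

print(snowflake_version)
print(rows[:3])
```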
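Once `cur.fetchall()` returns, `rows` is a plain list of tuples, so further processing is ordinary Python. A small sketch, assuming each row is `(day, total_revenue)` and that `total_revenue` arrives as `decimal.Decimal` (the connector's usual mapping for `NUMBER` columns with a non-zero scale); the `summarize` helper and the sample rows are made up for illustration.

```python
# Sketch: summarize rows shaped like the query result above, without re-querying.
# ASSUMPTION: rows are (datetime.date, decimal.Decimal) tuples; sample data is made up.
import datetime as dt
from decimal import Decimal


def summarize(rows):
    # Start the sum from Decimal("0") so money values never pass through float.
    total = sum((row[1] for row in rows), Decimal("0"))
    best_day, best_value = max(rows, key=lambda row: row[1]) if rows else (None, None)
    return {"days": len(rows), "total": total, "best_day": best_day, "best_value": best_value}


sample = [
    (dt.date(2024, 1, 1), Decimal("100.50")),
    (dt.date(2024, 1, 2), Decimal("250.00")),
    (dt.date(2024, 1, 3), Decimal("75.25")),
]
print(summarize(sample))
```

For larger results, aggregating in SQL (or fetching into pandas) is usually preferable to looping in Python; this is only for small result sets like a 7-day window.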

If you want a higher-level, DataFrame-style API (and to push more Python logic into Snowflake), look at **Snowpark** (`snowflake-snowpark-python`).
