In [1]:
# Colab cell
from google.colab import drive

drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [2]:
# Adjust these two for YOUR repo
REPO_OWNER = "ywanglab"
REPO_NAME  = "STAT4160"   # e.g., unified-stocks-team1

BASE_DIR   = "/content/drive/MyDrive/dspt25"
CLONE_DIR  = f"{BASE_DIR}/{REPO_NAME}"
REPO_URL   = f"https://github.com/{REPO_OWNER}/{REPO_NAME}.git"

import os, pathlib
pathlib.Path(BASE_DIR).mkdir(parents=True, exist_ok=True)


In [3]:
import os, subprocess, shutil, pathlib

if not pathlib.Path(CLONE_DIR).exists():
    !git clone {REPO_URL} {CLONE_DIR}
else:
    # If the folder exists, just ensure it's a git repo and pull latest
    os.chdir(CLONE_DIR)
    # !git status
    # !git pull --rebase # !git pull --ff-only
os.chdir(CLONE_DIR)
print("Working dir:", os.getcwd())

Working dir: /content/drive/MyDrive/dspt25/STAT4160


In [8]:
# Ensure pandas and sqlite3 are available (sqlite3 is in stdlib)
import pandas as pd, sqlite3, numpy as np, os
from pathlib import Path

Path("data/raw").mkdir(parents=True, exist_ok=True)
if not Path("data/raw/prices.csv").exists():
    print("No prices.csv found; generating a small synthetic one.")
    tickers = ["AAPL","MSFT","NVDA","AMZN","GOOGL"]
    dates = pd.bdate_range("2022-01-03", periods=120)
    rng = np.random.default_rng(7)
    frames=[]
    for t in tickers:
        r = rng.normal(0, 0.01, len(dates))
        price = 100*np.exp(np.cumsum(r))
        vol = rng.integers(1e5, 5e6, len(dates))
        df = pd.DataFrame({"ticker": t, "date": dates, "adj_close": price, "volume": vol})
        df["log_return"] = np.log(df["adj_close"]).diff().fillna(0)
        frames.append(df)
    pd.concat(frames, ignore_index=True).to_csv("data/raw/prices.csv", index=False)

# Show a peek
pd.read_csv("data/raw/prices.csv").head()

Unnamed: 0,ticker,date,adj_close,volume,log_return
0,AAPL,2020-01-01,100.00123,4457901,0.0
1,AAPL,2020-01-02,100.300426,2664190,0.002987
2,AAPL,2020-01-03,100.025841,4100245,-0.002741
3,AAPL,2020-01-06,99.138974,4586613,-0.008906
4,AAPL,2020-01-07,98.689241,1556062,-0.004547


`db_path.unlink()`

Deletes the file at that path (similar to `rm` in Unix or `del` in Windows).

1. **`unlink()`**

   * Low-level **system call** in Unix/POSIX.
   * Removes a *directory entry* (a name → inode mapping).
   * If no more directory entries or processes reference that inode, the filesystem reclaims the storage.
   * In Python’s `pathlib`, `Path.unlink()` is a wrapper that calls this system-level removal.

   
2. **`rm`**

   * A **Unix command-line utility** (higher-level tool).
   * Under the hood, `rm` calls `unlink()` (or `unlinkat()`) to actually remove the file.
   * Supports **flags** like:

     * `rm -r dir/` → remove directories recursively.
     * `rm -f file` → force: never prompt, and do not complain about missing files.
     * `rm -i file` → interactive, ask before deleting.
   * Works with globbing (`rm *.txt`).

### **So the difference**

* `unlink()` = **primitive** system call (or Python API) → removes one name from the filesystem.
* `rm` = **user-facing command** that wraps `unlink()` (and adds options, recursion, safety prompts, etc.).

### **What is an inode?**

* **inode = index node** (used in Unix/Linux filesystems like ext4, XFS, etc.).
* It’s a **data structure** on disk that stores **metadata about a file**, *not* the file’s name.

**An inode contains things like:**

* File type (regular file, directory, symlink, etc.)
* File size
* Permissions (read/write/execute bits, owner, group)
* Timestamps (last modified, last accessed, last inode change; some filesystems such as ext4 also record creation time)
* Pointers (addresses) to the actual data blocks on disk

* Filenames live in a **directory entry** (a mapping from name → inode number).

### **Inode mapping**

* A directory is just a table mapping:

  ```
  filename → inode number
  ```
* Example:

  ```
  "notes.txt" → inode #12345
  "report.pdf" → inode #54321
  ```
* Multiple names (hard links) can map to the same inode.

---

### **Why “unlink”?**

* When you “delete” a file in Unix, you’re really just **removing the link** (the directory entry) that points to the inode.
* If other links (hard links) or processes still reference that inode, the data stays around.
* Only when the **link count** drops to zero *and* no process has the file open will the filesystem free the inode and reclaim the storage.

---


* **Inode** = the actual file cabinet drawer (where the data + info lives).
* **Filename** = the sticky note on the cabinet saying “Report.pdf → Drawer #12.”
* **unlink()** = removing the sticky note.
* If nobody else has a sticky note to Drawer #12, the cabinet can be emptied and reused.

---

* An **inode** is a filesystem structure with metadata + pointers to file data.
* A **directory entry** maps a name to an inode.
* **unlink()** removes that mapping; the inode and file data go away only when no names or processes reference it.
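
The link-count behaviour described above can be observed directly from Python. The sketch below is illustrative only: it assumes a POSIX filesystem that supports hard links (a local temp directory, which may behave differently from a mounted Google Drive folder), and the file names are made up for the demo.

```python
import os
import tempfile
from pathlib import Path


def link_count_demo():
    """Create a file, add a hard link, unlink the first name, and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "notes.txt"       # hypothetical file name
        alias = Path(tmp) / "notes_alias.txt"    # hypothetical second name
        original.write_text("hello")

        os.link(original, alias)                 # second directory entry -> same inode
        same_inode = original.stat().st_ino == alias.stat().st_ino
        links_before = original.stat().st_nlink  # number of names pointing at the inode

        original.unlink()                        # removes one name only
        links_after = alias.stat().st_nlink
        data_survives = alias.read_text() == "hello"
        return same_inode, links_before, links_after, data_survives


print(link_count_demo())
```

The data stays readable through the second name because the inode's link count is still above zero after the first name is removed.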



### 1. **Connect to SQLite**

```python
con = sqlite3.connect(db_path)
cur = con.cursor()
```

* Opens (or creates) the SQLite database at `db_path`.
* Creates a cursor object for executing SQL commands.

---

### 2. **Enable foreign keys**

```python
cur.execute("PRAGMA foreign_keys = ON;")
```

* SQLite doesn’t enforce foreign keys unless this is turned **ON** per connection.
* Ensures that values in the `prices.ticker` column must exist in the `meta.ticker` column.
* The `;` marks the end of an SQL statement.

  * Required when running multiple statements or in interactive database shells.
  * Optional for a single statement in many programming APIs (like Python’s `execute`).

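
To see the effect of the pragma, here is a minimal sketch on a throwaway in-memory database. The two-table schema is a cut-down stand-in for `meta` and `prices`, and the helper name `try_orphan_insert` and the ticker `'ZZZZ'` are made up for this illustration.

```python
import sqlite3


def try_orphan_insert(enforce: bool) -> str:
    """Insert a price row whose ticker has no parent row in meta."""
    con = sqlite3.connect(":memory:")
    # The pragma is per connection, so it is set right after connecting.
    con.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    con.executescript("""
        CREATE TABLE meta (ticker TEXT PRIMARY KEY);
        CREATE TABLE prices (
            ticker TEXT NOT NULL REFERENCES meta(ticker),
            date   TEXT NOT NULL
        );
    """)
    try:
        con.execute("INSERT INTO prices VALUES ('ZZZZ', '2020-01-02')")
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"
    finally:
        con.close()


print("foreign_keys OFF ->", try_orphan_insert(False))
print("foreign_keys ON  ->", try_orphan_insert(True))
```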
---

### 3. **Enable WAL mode**

```python
cur.execute("PRAGMA journal_mode = WAL;")
```

* WAL = *Write-Ahead Logging*.
* Improves concurrency: one process can read while another writes.
* Not strictly necessary here, but helpful if you expect multiple readers/writers.
* It’s a journaling mode where **writes go to a separate log file first**, not directly to the main database file.
* In SQLite:

  * In the default rollback-journal mode (`DELETE`), a writer saves the original pages to a journal file and takes an exclusive lock on the **whole DB** while it updates the main file, so readers have to wait.
  * In **WAL mode**, changes are appended to a `*.db-wal` log file, so readers can keep reading the main file. Later, the changes are **checkpointed** (merged) back into the main DB file.
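
The pragma returns the journal mode that is actually in effect, so it is worth reading the result rather than assuming the switch succeeded. A small sketch, assuming a database file on a local disk (the temp directory and the table `t` are placeholders for this demo; WAL may not be available on every filesystem):

```python
import os
import sqlite3
import tempfile


def wal_demo():
    """Switch a fresh database file to WAL and report what SQLite says."""
    with tempfile.TemporaryDirectory() as tmp:
        db_file = os.path.join(tmp, "demo.db")
        con = sqlite3.connect(db_file)
        try:
            before = con.execute("PRAGMA journal_mode;").fetchone()[0]
            after = con.execute("PRAGMA journal_mode = WAL;").fetchone()[0]
            # Write something so the write-ahead log gets used.
            con.execute("CREATE TABLE t (x INTEGER)")
            con.execute("INSERT INTO t VALUES (1)")
            con.commit()
            wal_file_present = os.path.exists(db_file + "-wal")
        finally:
            con.close()
        return before, after, wal_file_present


before, after, wal_file_present = wal_demo()
print("mode before:", before)
print("mode after: ", after)
print("-wal file present while connected:", wal_file_present)
```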

---

### 4. **Define schema (DDL block)**

* **`prices` table** stores daily prices.
* `(ticker, date)` pair is the **primary key**, so each ticker has at most one entry per day.
* `adj_close` and `volume` have `CHECK` constraints to forbid negative values.
* `FOREIGN KEY (ticker)` ensures every price row corresponds to a `meta` row.
* A **foreign key** is a column (or set of columns) in one table that references a primary key in another table.

```sql
CREATE INDEX IF NOT EXISTS idx_prices_date ON prices(date);
```

* Adds an index on `date` (across all tickers).
* Speeds up queries like *“give me all prices between 2020-01-01 and 2020-12-31”*.

---

### 5. **Execute + commit**

```python
cur.executescript(ddl)
con.commit()
```

* `executescript()` runs the multi-statement DDL block in one go.
* `commit()` saves changes to the database file.

---

**Summary**:
This code builds a small relational schema for stock data in SQLite.

* `meta` table = company metadata.
* `prices` table = daily stock prices, linked by foreign key to `meta`.
* Foreign keys and constraints protect data integrity.
* WAL mode and the date index improve performance.

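`-- ISO 'YYYY-MM-DD'` is an SQL comment: everything from `--` to the end of the line is ignored. Here it documents that `date` is stored as ISO-formatted text.

The `CREATE INDEX` line creates a **database index** on the `date` column of your `prices` table. Let’s unpack it carefully: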

```sql
CREATE INDEX IF NOT EXISTS idx_prices_date ON prices(date);
```
In SQLite, the index is stored inside the same database file (`.db`).

* **`CREATE INDEX`** → defines an index to speed up lookups.
* **`IF NOT EXISTS`** → create it if it does not exist; otherwise, do nothing.
* **`idx_prices_date`** → the chosen name of the index.
* **`ON prices(date)`** → builds the index on the `date` column of the `prices` table.

Without the index, a query that filters on `date` makes SQLite scan the entire table row by row (**full table scan**).
With the index, SQLite can **jump directly** to the rows with matching dates → much faster when `prices` has many rows.
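
One way to check whether a query uses the index is `EXPLAIN QUERY PLAN`. The sketch below uses a simplified in-memory `prices` table (no primary key, two made-up rows) purely for illustration; the exact wording of the plan text differs between SQLite versions, so the code only looks for the index name.

```python
import sqlite3

QUERY = "SELECT * FROM prices WHERE date BETWEEN '2020-01-01' AND '2020-12-31'"


def query_plan(con):
    """Return the human-readable plan; the detail text is the last column of each row."""
    rows = con.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE prices (ticker TEXT, date TEXT, adj_close REAL)")
con.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [("AAPL", "2020-01-02", 100.0), ("AAPL", "2021-01-04", 110.0)],
)
con.commit()

plan_before = query_plan(con)   # no index yet: expect a table scan
con.execute("CREATE INDEX IF NOT EXISTS idx_prices_date ON prices(date)")
plan_after = query_plan(con)    # should now mention idx_prices_date

print("before:", plan_before)
print("after: ", plan_after)
```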



In [9]:
import sqlite3, textwrap, os
from pathlib import Path

db_path = Path("data/prices.db")
if db_path.exists(): db_path.unlink()  # start fresh for class; remove this in real life
con = sqlite3.connect(db_path) # Opens (or creates) the SQLite database at db_path.
cur = con.cursor() # Creates a cursor object for executing SQL commands.

# Turn on foreign keys: values in prices.ticker must exist in meta.ticker.
cur.execute("PRAGMA foreign_keys = ON;")
# (Optional) WAL can help concurrency; not critical here.
cur.execute("PRAGMA journal_mode = WAL;")
# WAL = Write-Ahead Logging.
# Improves concurrency: one process can read while another writes.
# Not strictly necessary here, but helpful if you expect multiple readers/writers.

ddl = textwrap.dedent("""
CREATE TABLE meta (
  ticker TEXT PRIMARY KEY,
  name   TEXT,
  sector TEXT NOT NULL
);

CREATE TABLE prices (
  ticker     TEXT NOT NULL,
  date       TEXT NOT NULL,               -- ISO 'YYYY-MM-DD'
  adj_close  REAL NOT NULL CHECK (adj_close >= 0),
  volume     INTEGER NOT NULL CHECK (volume >= 0),
  log_return REAL NOT NULL,
  PRIMARY KEY (ticker, date),
  FOREIGN KEY (ticker) REFERENCES meta(ticker)
);

-- Index to speed up date-range scans across all tickers
CREATE INDEX IF NOT EXISTS idx_prices_date ON prices(date);
""")
cur.executescript(ddl)
con.commit()
print("Created:", db_path)

Created: data/prices.db


`warnings.filterwarnings("ignore")` tells Python:
“Hide all warnings — don’t print them at all.”
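
A blanket `"ignore"` can also hide warnings you would want to see. As a hedged alternative, the sketch below limits suppression to one category inside a `with` block; the function `noisy` is a made-up stand-in for a library call that emits a `DeprecationWarning`.

```python
import warnings


def noisy():
    """Hypothetical function that emits a deprecation warning."""
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


def count_warnings(suppress: bool) -> int:
    """Run noisy() and count the warnings that get through the filters."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from "show everything"
        if suppress:
            # Newer filters take precedence, so this overrides "always"
            # for DeprecationWarning only.
            warnings.filterwarnings("ignore", category=DeprecationWarning)
        noisy()
    return len(caught)  # filters are restored when the with-block exits


print("without filter:", count_warnings(False))
print("with filter:   ", count_warnings(True))
```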

In [10]:
import pandas as pd, numpy as np
import warnings
warnings.filterwarnings("ignore")

# Read tickers (from existing CSV or fallback)
if Path("tickers_25.csv").exists():
    tickers = pd.read_csv("tickers_25.csv")["ticker"].dropna().unique().tolist()
else:
    tickers = pd.read_csv("data/raw/prices.csv")["ticker"].dropna().unique().tolist()

def fetch_sector_map(tickers):
    try:
        import yfinance as yf
        out=[]
        for t in tickers:
            info = yf.Ticker(t).info or {}
            name  = info.get("shortName") or info.get("longName") or t
            sector= info.get("sector") or "Unknown"
            out.append({"ticker": t, "name": name, "sector": sector})
        return pd.DataFrame(out)
    except Exception:
        pass
    # Fallback: deterministic synthetic sectors (assigned round-robin, no randomness needed)
    sectors = ["Technology","Financials","Healthcare","Energy","Consumer"]
    return pd.DataFrame({
        "ticker": tickers,
        "name": tickers,
        "sector": [sectors[i % len(sectors)] for i in range(len(tickers))]
    })

meta_df = fetch_sector_map(tickers)
meta_df.head()

Unnamed: 0,ticker,name,sector
0,AAPL,Apple Inc.,Technology
1,MSFT,Microsoft Corporation,Technology
2,AMZN,"Amazon.com, Inc.",Consumer Cyclical
3,GOOGL,Alphabet Inc.,Communication Services
4,META,"Meta Platforms, Inc.",Communication Services


### 1. **Context manager for the connection**
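**Note**: This did not work in the notebook, so the code cells below call `executemany` without the `with` block. Without it, you have to call `con.commit()` yourself.

```python
with con:
```

* Wraps the block in a transaction: `sqlite3` opens one implicitly before the first `INSERT`/`UPDATE`/`DELETE`.
* If everything succeeds, it commits at the end.
* If an error happens, it rolls back.
* It does **not** close the connection.
* Cleaner than calling `con.commit()` / `con.rollback()` manually.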
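
A small experiment on a throwaway in-memory database shows what `with con:` does on success and on failure. The table is a cut-down `meta` and the rows are made up for the demo.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE meta (ticker TEXT PRIMARY KEY, sector TEXT NOT NULL)")

# Success path: the insert is committed when the block exits.
with con:
    con.execute("INSERT INTO meta VALUES ('AAPL', 'Technology')")

# Failure path: the second insert violates the primary key, so the whole
# block (including the MSFT row) is rolled back and the error propagates.
try:
    with con:
        con.execute("INSERT INTO meta VALUES ('MSFT', 'Technology')")
        con.execute("INSERT INTO meta VALUES ('AAPL', 'Duplicate')")
except sqlite3.IntegrityError as exc:
    print("rolled back:", exc)

tickers = [row[0] for row in con.execute("SELECT ticker FROM meta ORDER BY ticker")]
print(tickers)
print("in transaction?", con.in_transaction)
```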

---

### 2. **Bulk insert with `executemany`**

```python
con.executemany(
    "INSERT INTO meta(ticker, name, sector) VALUES(?, ?, ?)",
    meta_df[["ticker","name","sector"]].itertuples(index=False, name=None)
)
```

* `executemany` runs the SQL statement once **for each row** in the provided sequence.
* The placeholders `?, ?, ?` are **parameter markers** → safe against SQL injection, and efficient.
* `meta_df[["ticker","name","sector"]].itertuples(index=False, name=None)`:

  * Takes only the `ticker`, `name`, `sector` columns from the DataFrame.
  * `itertuples(..., name=None)` yields each row as a plain tuple, e.g.:

    ```python
    ("AAPL", "Apple Inc.", "Technology")
    ("MSFT", "Microsoft Corp.", "Technology")
    ```


So: all rows in your pandas `DataFrame` get inserted into the `meta` table in **one batch**.
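
Here is a self-contained sketch of the same pattern on a tiny, made-up DataFrame and an in-memory database, so you can inspect the tuples that `itertuples` produces before they reach `executemany`. The `extra` column only shows that unselected columns are left out.

```python
import sqlite3

import pandas as pd

meta_demo = pd.DataFrame({
    "ticker": ["AAPL", "MSFT"],
    "name": ["Apple Inc.", "Microsoft Corporation"],
    "sector": ["Technology", "Technology"],
    "extra": [1, 2],  # not part of the table, so it is not selected below
})

# Materialize the iterator only so we can print it; executemany accepts
# the iterator directly.
rows = list(
    meta_demo[["ticker", "name", "sector"]].itertuples(index=False, name=None)
)
print(rows)

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE meta (ticker TEXT PRIMARY KEY, name TEXT, sector TEXT NOT NULL)"
)
with con:  # commit on success, roll back on error
    con.executemany(
        "INSERT INTO meta(ticker, name, sector) VALUES(?, ?, ?)", rows
    )

n_rows = con.execute("SELECT COUNT(*) FROM meta").fetchone()[0]
print("rows in meta:", n_rows)
```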


### 3. **Check the results**

```python
print(pd.read_sql_query("SELECT * FROM meta LIMIT 5;", con))
```

* Runs a quick SELECT to show the first 5 rows you just inserted.
* Uses `pandas.read_sql_query`, so you get the results as a DataFrame.


If you’re inserting **one** row into `meta`, the SQL is simply:

```sql
INSERT INTO meta (ticker, name, sector)
VALUES ('AAPL', 'Apple Inc.', 'Technology');
```

In Python with `sqlite3`, use a **parameterized** single-row insert (safer for quotes, etc.):

```python
con.execute(
    "INSERT INTO meta(ticker, name, sector) VALUES(?, ?, ?)",
    ("AAPL", "Apple Inc.", "Technology")
)
con.commit()  # or use `with con:` to auto-commit
```

With Jupyter SQL magic:
```python
t, n, s = "AAPL", "Apple Inc.", "Technology"
```

```python
%sql INSERT INTO meta(ticker, name, sector) VALUES (:t, :n, :s)
```

### Notes

* `executemany(...)` is just the **bulk** version of the same statement; it runs the single-row `INSERT` repeatedly for each tuple.
* Since `ticker` is the **PRIMARY KEY**, inserting a duplicate raises `sqlite3.IntegrityError`. To tolerate duplicates, use one of these:

  * **Ignore duplicates:**

    ```sql
    INSERT OR IGNORE INTO meta(ticker, name, sector)
    VALUES ('AAPL', 'Apple Inc.', 'Technology');
    ```
  * **Update on conflict (preferred upsert):**

    ```sql
    INSERT INTO meta(ticker, name, sector)
    VALUES ('AAPL', 'Apple Inc.', 'Technology')
    ON CONFLICT(ticker) DO UPDATE
      SET name = excluded.name,
          sector = excluded.sector;
    ```

  *(Avoid `INSERT OR REPLACE` with FKs; it performs a delete+insert under the hood and can clash with foreign keys.)*


* **`excluded`** is a special row alias that holds the values you *tried to insert* but that **conflicted** (here, on `ticker`).
* The statement says: “when there’s a PK conflict on `ticker`, **update the existing row** so its `name` and `sector` become the attempted values.”

### Useful variations

**Only update if something actually changed (null-safe):**

```sql
ON CONFLICT(ticker) DO UPDATE
SET name   = excluded.name,
    sector = excluded.sector
WHERE name   IS NOT excluded.name
   OR sector IS NOT excluded.sector;
```

**Preserve existing non-NULLs (only overwrite when new value is non-NULL):**

```sql
ON CONFLICT(ticker) DO UPDATE
SET name   = COALESCE(excluded.name,   meta.name),
    sector = COALESCE(excluded.sector, meta.sector);
```
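
The upsert can be tried end to end from Python. This is a sketch under assumptions: an in-memory database, SQLite 3.24 or newer (required for `ON CONFLICT ... DO UPDATE`), and a statement that mixes the two variations above (keep the old `name` when the new one is NULL, always overwrite `sector`).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE meta (ticker TEXT PRIMARY KEY, name TEXT, sector TEXT NOT NULL)"
)

upsert = """
INSERT INTO meta(ticker, name, sector) VALUES(?, ?, ?)
ON CONFLICT(ticker) DO UPDATE
  SET name   = COALESCE(excluded.name, meta.name),
      sector = excluded.sector
"""

with con:
    con.execute(upsert, ("AAPL", "Apple Inc.", "Tech"))  # plain insert
    con.execute(upsert, ("AAPL", None, "Technology"))    # conflict -> update

result = con.execute("SELECT ticker, name, sector FROM meta").fetchall()
print(result)
```

The second call hits the primary-key conflict, so the existing row is updated in place: the NULL name does not overwrite the stored one, while the sector is replaced.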


In [11]:
# Insert meta with parameterized query
# with con:
#     con.executemany(
#         "INSERT INTO meta(ticker, name, sector) VALUES(?, ?, ?)",
#         meta_df[["ticker","name","sector"]].itertuples(index=False, name=None)
#     )
con.executemany(
        "INSERT INTO meta(ticker, name, sector) VALUES(?, ?, ?)",
        meta_df[["ticker","name","sector"]].itertuples(index=False, name=None)
    )
print(pd.read_sql_query("SELECT * FROM meta LIMIT 5;", con))

  ticker                   name                  sector
0   AAPL             Apple Inc.              Technology
1   MSFT  Microsoft Corporation              Technology
2   AMZN       Amazon.com, Inc.       Consumer Cyclical
3  GOOGL          Alphabet Inc.  Communication Services
4   META   Meta Platforms, Inc.  Communication Services


In [12]:
# Sanity check
print("DBs attached:", list(con.execute("PRAGMA database_list;")))
print("Tables:", list(con.execute("SELECT name FROM sqlite_master WHERE type='table'")))
print("In transaction?", con.in_transaction)


DBs attached: [(0, 'main', '/content/drive/MyDrive/dspt25/STAT4160/data/prices.db')]
Tables: [('meta',), ('prices',)]
In transaction? True


1. **`drop_duplicates(subset=["ticker","date"])`**

* Removes rows that have the same `(ticker, date)` pair.
* By default it **keeps the first** occurrence and drops later ones (`keep="first"`).
* You can change behavior: `keep="last"` or `keep=False` (drop *all* duplicates).

2. **`reset_index(drop=True)`**

* `reset_index()` replaces the index with a default **RangeIndex(0…N−1)**.
* By default it also moves the old **index** into a regular **column**: if the index has a **name**, that name becomes the column name; if unnamed, the column is called `"index"`.
* `drop=True` discards the old index instead of adding it as a column, which is usually what you want after dropping rows.

### Common patterns

```python
# 1) Typical cleanup after filtering/dropping rows
df = df.reset_index(drop=True)       # discard old index, get 0..N-1

# 2) After groupby (turn group labels from index to columns)
out = df.groupby("sector")["adj_close"].mean().reset_index()

# 3) Only reset some levels of a MultiIndex
df = df.reset_index(level=["ticker"])  # bring just 'ticker' out as a column
```
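
To make the index behaviour concrete, the following sketch runs both steps on a four-row, made-up DataFrame with one duplicated `(ticker, date)` pair.

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["AAPL", "AAPL", "MSFT", "AAPL"],
    "date": ["2020-01-02", "2020-01-02", "2020-01-02", "2020-01-03"],
    "adj_close": [100.0, 101.0, 200.0, 102.0],
})

# Row 1 repeats the (ticker, date) pair of row 0, so it is dropped.
deduped = df.drop_duplicates(subset=["ticker", "date"])
print(list(deduped.index))        # the gap left by the dropped row is visible

clean = deduped.reset_index(drop=True)   # old index discarded
with_old = deduped.reset_index()         # old index kept as an "index" column

print(list(clean.index))
print(list(with_old.columns))
```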


In [14]:
prices = pd.read_csv("data/raw/prices.csv", parse_dates=["date"])
# Normalize date to ISO text
prices["date"] = prices["date"].dt.strftime("%Y-%m-%d")
# Keep only needed columns and ensure order matches table
prices = prices[["ticker","date","adj_close","volume","log_return"]]

# Optional: drop duplicates to respect PK before insert
prices = prices.drop_duplicates(subset=["ticker","date"]).reset_index(drop=True)
len(prices)

4500

**`IGNORE`** (in SQLite) is a **conflict resolution** policy that tells the engine to **skip the row that violates a constraint and continue**—no error is raised and nothing is written for that row.


* **Old-style clause on the statement:**

  ```sql
  INSERT OR IGNORE INTO meta(ticker, name, sector)
  VALUES ('AAPL', 'Apple Inc.', 'Technology');
  ```
* **UPSERT form (SQLite ≥ 3.24):**

  ```sql
  INSERT INTO meta(ticker, name, sector)
  VALUES ('AAPL', 'Apple Inc.', 'Technology')
  ON CONFLICT(ticker) DO NOTHING;   -- same effect as OR IGNORE for PRIMARY KEY / UNIQUE conflicts
  ```

### What it applies to

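`OR IGNORE` suppresses errors for **constraint conflicts** on:

* `PRIMARY KEY` / `UNIQUE`
* `NOT NULL`
* `CHECK`

The row with the conflict is **discarded**; other rows in the same statement continue. Two caveats: `ON CONFLICT ... DO NOTHING` only handles uniqueness conflicts (`PRIMARY KEY` / `UNIQUE`), and **foreign key** violations are never ignored; they still raise an error.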


### Examples

**Skip duplicate primary key**

```sql
-- If 'AAPL' already exists, this inserts nothing and raises no error
INSERT OR IGNORE INTO meta(ticker, name, sector)
VALUES ('AAPL', 'Apple Inc.', 'Technology');
```

**Bulk insert: keep the non-duplicates**

```python
rows = [
    ("AAPL", "Apple Inc.", "Technology"),
    ("MSFT", "Microsoft", "Technology"),
    ("AAPL", "Apple Inc.", "Tech")  # duplicate PK -> ignored
]
con.executemany(
    "INSERT OR IGNORE INTO meta(ticker, name, sector) VALUES(?, ?, ?)", rows
)
```
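
The sketch below checks which violations `OR IGNORE` actually skips, using a cut-down in-memory version of the `meta`/`prices` schema with made-up rows. Note that a foreign-key violation is not skipped; it still raises.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE meta (ticker TEXT PRIMARY KEY);
    CREATE TABLE prices (
        ticker    TEXT NOT NULL REFERENCES meta(ticker),
        date      TEXT NOT NULL,
        adj_close REAL NOT NULL CHECK (adj_close >= 0),
        PRIMARY KEY (ticker, date)
    );
    INSERT INTO meta VALUES ('AAPL');
""")

sql = "INSERT OR IGNORE INTO prices(ticker, date, adj_close) VALUES(?, ?, ?)"
rows = [
    ("AAPL", "2020-01-02", 100.0),
    ("AAPL", "2020-01-02", 101.0),  # duplicate primary key -> skipped
    ("AAPL", "2020-01-03", -5.0),   # CHECK violation       -> skipped
    ("AAPL", "2020-01-06", 102.0),
]
with con:
    con.executemany(sql, rows)
n_kept = con.execute("SELECT COUNT(*) FROM prices").fetchone()[0]

# 'ZZZZ' has no row in meta: OR IGNORE does not cover foreign keys.
try:
    with con:
        con.execute(sql, ("ZZZZ", "2020-01-02", 1.0))
    fk_outcome = "inserted"
except sqlite3.IntegrityError:
    fk_outcome = "raised"

print("rows kept:", n_kept)
print("foreign-key violation:", fk_outcome)
```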



In [15]:
# Bulk insert inside one transaction; skip rows violating the PK (e.g., duplicates). OR IGNORE does not skip FK violations.
# with con:
#     con.executemany(
#         "INSERT OR IGNORE INTO prices(ticker,date,adj_close,volume,log_return) VALUES(?,?,?,?,?)",
#         prices.itertuples(index=False, name=None)
#     )
con.executemany(
        "INSERT OR IGNORE INTO prices(ticker,date,adj_close,volume,log_return) VALUES(?,?,?,?,?)",
        prices.itertuples(index=False, name=None)
    )
con.commit()  # make the inserted rows visible to other connections (e.g., %sql)
# Quick counts
print(pd.read_sql_query("SELECT COUNT(*) AS nrows FROM prices;", con))
print(pd.read_sql_query("""
SELECT ticker, COUNT(*) AS n
FROM prices
GROUP BY ticker
ORDER BY n DESC
LIMIT 5;
""",
con))

   nrows
0   4500
  ticker    n
0   AAPL  180
1   AMZN  180
2    BAC  180
3   CSCO  180
4    CVX  180


In Jupyter/Colab you can use the **SQL magic** from the `ipython-sql` (or `jupysql`) extension.

### Quick setup (one-time)

```python
%pip install -q ipython-sql sqlalchemy
%load_ext sql
```

### Connect to your SQLite DB

Use an SQLAlchemy URL.

* **Relative path:** `sqlite:///data/prices.db`
* **Absolute path:** `sqlite:////content/drive/MyDrive/dspt25/STAT4160/data/prices.db`

```python
%sql sqlite:///data/prices.db
```

### Run queries

* **Line magic** (`%sql`) for one-liners:

```python
%sql SELECT COUNT(*) AS n FROM meta;
```

* **Cell magic** (`%%sql`) for multi-line SQL:

```sql
%%sql
SELECT ticker, COUNT(*) AS days
FROM prices
GROUP BY ticker
ORDER BY days DESC
LIMIT 5;
```

### Get results into pandas

```python
# Option A: assign the result, then convert
res = %sql SELECT * FROM meta LIMIT 5;
df = res.DataFrame()

# Option B: store directly into a DataFrame named df
%sql -o df SELECT ticker, sector FROM meta LIMIT 5;
```

### Use Python variables in your SQL

```python
sym = "AAPL"
%sql SELECT date, adj_close FROM prices WHERE ticker = :sym ORDER BY date DESC LIMIT 5;
```

### Persist a pandas DataFrame to SQLite (create/append a table)

```python
# Suppose you have a DataFrame named prices_df
%sql --persist prices_df     # creates a table named prices_df
# or explicitly:
# prices_df.to_sql("prices", sqlite3.connect("data/prices.db"), if_exists="append", index=False)
```

### Tips / gotchas

* The magic opens **its own DB connection**, separate from your `sqlite3` `con`. Commit your writes first to avoid “database is locked”.
* For **absolute paths**, use **four slashes** after `sqlite:` (e.g., `sqlite:////abs/path/to.db`).
* For an **in-memory DB**, use `sqlite:///:memory:` (note: it disappears when that connection closes).
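
The first tip can be reproduced with plain `sqlite3` and two connections to the same file. In this sketch `writer` plays the role of the notebook's `con` and `reader` stands in for the connection that `%sql` opens; the temp file and the one-column table are made up for the demo.

```python
import os
import sqlite3
import tempfile


def visibility_demo():
    """Count rows seen by a second connection before and after the writer commits."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE meta (ticker TEXT PRIMARY KEY)")
            writer.execute("INSERT INTO meta VALUES ('AAPL')")  # not committed yet
            before = reader.execute("SELECT COUNT(*) FROM meta").fetchone()[0]
            writer.commit()
            after = reader.execute("SELECT COUNT(*) FROM meta").fetchone()[0]
        finally:
            reader.close()
            writer.close()
        return before, after


before, after = visibility_demo()
print("rows visible before commit:", before)
print("rows visible after commit: ", after)
```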



* **You need `ipython-sql`** to get the `%sql` / `%%sql` magics in Jupyter.
* **You need `SQLAlchemy`** because `ipython-sql` uses it under the hood to connect to databases via URLs like `sqlite:///path/to.db`. (Installing `ipython-sql` usually pulls `sqlalchemy`, but explicitly installing both avoids version/dependency hiccups.)

`%load_ext sql` loads the **IPython extension** provided by `ipython-sql`. Loading it:

* **Registers** the `%sql` (line) and `%%sql` (cell) magics.
* After that, you can connect and run SQL right in cells.

Example:

```python
%load_ext sql
%sql sqlite:///data/prices.db          # open a connection
%sql SELECT COUNT(*) AS n FROM meta;   # run a one-line query
```

* The connection string you pass (`sqlite:///...`, `postgresql://...`, `mysql+pymysql://...`) is a **SQLAlchemy URL**.
* `ipython-sql` uses SQLAlchemy’s engines/dialects to handle connections and execute your SQL.



* **`%pip`** → IPython **magic**. **Recommended.** Runs `pip` with the **same interpreter as the kernel**, so installs land in the environment the notebook actually imports from.
* **`!pip`** → **shell command**. May call a **different** `pip` from your system `PATH`, so packages can end up in the wrong environment and not be importable in the notebook.

---

### Why `%pip` is safer

* Uses the **kernel’s Python** (same `sys.executable`) → installs land in the notebook’s environment.
* After install, newly added packages can typically be imported right away; if you upgraded a package that was already imported, restart the kernel.
* Same idea applies to **`%conda`** vs `!conda`.

### What `!pip` really does

* The leading `!` runs a **shell** command. It picks whichever `pip` is first on your `PATH` (could be system Python, not your kernel’s).
* That’s why you sometimes install a package and still get `ModuleNotFoundError` in the next cell.


### Quick sanity check snippets

```python
import sys, subprocess

print("Kernel Python:", sys.executable)
print("python -m pip ->", subprocess.check_output([sys.executable, "-m", "pip", "--version"]).decode().strip())
# Compare with the shell's pip:
# (May be different!)
# !pip --version
```



In [66]:
%pip install -q ipython-sql sqlalchemy



Note that you may hit a known incompatibility between the **%sql** magic and **PrettyTable ≥ 3.12**: PrettyTable moved its style constants, so `%sql`’s default style lookup for `"DEFAULT"` crashes with `KeyError: 'DEFAULT'`.

### Quick fixes (pick one)

**A) Set the old fallback style once per notebook**

```python
%config SqlMagic.style = '_DEPRECATED_DEFAULT'
```

Then re-run your `%%sql` cell. This is the simplest workaround.

**B) Pin PrettyTable to a pre-change version**

```python
%pip install "prettytable<3.12"
```

Restart the kernel, `%load_ext sql`, reconnect, and run your query. (The break came with PrettyTable 3.12.)



In [16]:
%load_ext sql
# %reload_ext sql
%sql sqlite:///data/prices.db

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [17]:
%config SqlMagic.style = '_DEPRECATED_DEFAULT'


In [18]:
%%sql
SELECT ticker, COUNT(*) AS days
FROM prices
GROUP BY ticker
ORDER BY days DESC
LIMIT 5;


 * sqlite:///data/prices.db
Done.


ticker,days
AAPL,180
AMZN,180
BAC,180
CSCO,180
CVX,180


Why close the cursor before committing:

* SQLite raises “**cannot commit transaction – SQL statements in progress**” when any cursor on the same connection still has an active statement/result set.
* Closing the cursor guarantees there’s no active statement before you commit.

Safe patterns:

```python
# WRITE-ONLY path (either order is fine, but this is safest)
cur = con.cursor()
cur.executemany("INSERT INTO meta(ticker,name,sector) VALUES(?, ?, ?)", rows)
cur.close()          # ensure no statements in progress
con.commit()
```
