# Setup notebook

This notebook copies sample data from GitHub and other data sources into the current lakehouse. You only need to run the cells below once.

The next cell checks whether a lakehouse is attached to the notebook. If one is, the cell creates two subfolders in the `Files` section; otherwise it raises an error.

In [None]:
import gzip
import io  # needed for io.BytesIO in the download cell below
import os

import requests

if not os.path.exists("/lakehouse/default"):
    raise FileNotFoundError(
        "Default lakehouse not found, please add a lakehouse and restart the session."
    )

os.makedirs("/lakehouse/default/Files/sampledata", exist_ok=True)
os.makedirs("/lakehouse/default/Files/taxidata", exist_ok=True)

The next cell downloads a gzip-compressed TSV (tab-separated values) file from GitHub, decompresses it, and writes the result into the lakehouse.

In [None]:
remote_url = "https://github.com/weslbo/DP-601/raw/main/data/pageviews-by-second-tsv.gz"

# Download the compressed file into memory.
response = requests.get(remote_url)
response.raise_for_status()  # stop here on an HTTP error status
compressed_data = io.BytesIO(response.content)

# Decompress the gzip stream.
with gzip.GzipFile(fileobj=compressed_data, mode="rb") as decompressed_data:
    decompressed_content = decompressed_data.read()

# Write the decompressed TSV into the lakehouse.
with open("/lakehouse/default/Files/sampledata/pageviews-by-second.tsv", "wb") as f:
    f.write(decompressed_content)
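
The cell above holds both the compressed and the decompressed bytes in memory at once. For larger archives, a chunked variant may be preferable. The following is a minimal sketch of that idea; `decompress_gzip_stream` and its `chunk_size` parameter are hypothetical names that are not part of the original notebook.

```python
import gzip
import shutil


def decompress_gzip_stream(source, target_path, chunk_size=1024 * 1024):
    """Decompress gzip data from `source` into the file at `target_path`.

    `source` is any binary file-like object positioned at the start of
    gzip data. Hypothetical helper, not part of the original notebook.
    """
    with gzip.GzipFile(fileobj=source, mode="rb") as decompressed:
        with open(target_path, "wb") as target:
            # copyfileobj moves the data chunk_size bytes at a time, so the
            # full decompressed content is never held in memory at once.
            shutil.copyfileobj(decompressed, target, chunk_size)
```

Assuming the `response` object from the cell above, it could be called as `decompress_gzip_stream(io.BytesIO(response.content), "/lakehouse/default/Files/sampledata/pageviews-by-second.tsv")`.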

The last cell defines a small `download` helper and uses it to copy the remaining sample files into the lakehouse.

In [None]:
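def download(remote_url, filename, download_path="/lakehouse/default/Files/sampledata"):
    """Download remote_url and save it as filename inside download_path."""
    response = requests.get(remote_url)
    response.raise_for_status()  # stop here on an HTTP error status
    with open(f"{download_path}/{filename}", "wb") as f:
        f.write(response.content)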

download("https://github.com/weslbo/DP-601/raw/main/data/zipcodes_singlelines.json", "zipcodes_singlelines.json")
download("https://github.com/weslbo/DP-601/raw/main/data/zipcodes_multilines.json", "zipcodes_multilines.json")
download("https://github.com/weslbo/DP-601/raw/main/data/population.parquet", "population.parquet")
download("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-06.parquet", "yellow_tripdata_2023-06.parquet", download_path="/lakehouse/default/Files/taxidata")
download("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-05.parquet", "yellow_tripdata_2023-05.parquet", download_path="/lakehouse/default/Files/taxidata")
download("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-04.parquet", "yellow_tripdata_2023-04.parquet", download_path="/lakehouse/default/Files/taxidata")
download("https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-03.parquet", "yellow_tripdata_2023-03.parquet", download_path="/lakehouse/default/Files/taxidata")
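
To confirm that the downloads landed where expected, it can help to list the files in both target folders with their sizes. This is only a sketch: `list_files` is a hypothetical helper that is not part of the original notebook, and the loop is skipped when the folders do not exist (for example, when no lakehouse is mounted).

```python
import os


def list_files(folder):
    """Return (name, size_in_bytes) for each regular file in folder, sorted by name."""
    entries = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries


for folder in ("/lakehouse/default/Files/sampledata", "/lakehouse/default/Files/taxidata"):
    if os.path.isdir(folder):  # skip when the folder is not present
        for name, size in list_files(folder):
            print(f"{folder}/{name}: {size:,} bytes")
```

A file that is reported with a size of 0 bytes is a likely sign that its download did not complete and should be repeated.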