## Querying the Warehouse

<div style="background-color: #cce5ff; padding: 10px; border-radius: 5px;">
    <strong>Notes:</strong>
    Because the ETL phases already wrote each analysis result to S3 as a Parquet file, every query below is a short DuckDB <code>SELECT</code> that reads one file directly from the bucket, with no further transformation needed.
</div>

```python
import os

import duckdb
```

#### Querying - 2.5 Volume Spikes

```python
# Set S3 credentials (skip if the bucket is public). The settings persist on
# DuckDB's default connection, so later cells in this notebook reuse them.
duckdb.sql(f"""
SET s3_region='{os.environ["AWS_REGION"]}';
SET s3_access_key_id='{os.environ["AWS_ACCESS_KEY_ID"]}';
SET s3_secret_access_key='{os.environ["AWS_SECRET_ACCESS_KEY"]}';
""")

# Query the Parquet file directly from S3 and load the result into pandas
df = duckdb.sql("""
SELECT *
FROM 's3://ucl-de/data/crypto_top_volume_spikes.parquet'
""").df()

df.head()
```

|   | asset | date | volume_spike | type |
|---|---|---|---|---|
| 0 | TRX | 2024-12-03 | 15.97 | Crypto |
| 1 | TRX | 2024-08-20 | 8.53 | Crypto |
| 2 | ETH | 2024-08-05 | 7.38 | Crypto |
| 3 | LINK | 2024-08-05 | 6.71 | Crypto |
| 4 | AVAX | 2024-06-22 | 6.28 | Crypto |


#### Querying - 3.1 Daily Return

```python
# Reuses the S3 settings applied in the 2.5 cell (re-run it after a kernel restart)
df = duckdb.sql("""
SELECT *
FROM 's3://ucl-de/data/crypto_daily_return.parquet'
""").df()

df.head()
```

|   | coin_name | date | daily_return |
|---|---|---|---|
| 0 | ADA | 2024-01-02 | -0.0293 |
| 1 | ADA | 2024-01-03 | -0.0795 |
| 2 | ADA | 2024-01-04 | 0.0237 |
| 3 | ADA | 2024-01-05 | -0.0493 |
| 4 | ADA | 2024-01-06 | -0.0354 |


#### Querying - 4.1 Rolling 7-Day Volatility

```python
# Reuses the S3 settings applied in the 2.5 cell (re-run it after a kernel restart)
df = duckdb.sql("""
SELECT *
FROM 's3://ucl-de/data/crypto_volatility.parquet'
""").df()

df.head()
```

|   | coin_name | date | rolling_volatility |
|---|---|---|---|
| 0 | ADA | 2024-01-03 | 0.0354 |
| 1 | ADA | 2024-01-04 | 0.0516 |
| 2 | ADA | 2024-01-05 | 0.0434 |
| 3 | ADA | 2024-01-06 | 0.0376 |
| 4 | ADA | 2024-01-07 | 0.0346 |


#### Querying - 8.2 Sentiment vs Daily Returns

```python
# Reuses the S3 settings applied in the 2.5 cell (re-run it after a kernel restart)
df = duckdb.sql("""
SELECT *
FROM 's3://ucl-de/data/sentiment_vs_daily_return.parquet'
""").df()

df.head()
```

|   | DATE | stock_pct_change | crypto_pct_change | avg_sentiment |
|---|---|---|---|---|
| 0 | 2024-01-02 | -0.32 | 1.69 | 0.17 |
| 1 | 2024-01-03 | -0.03 | -4.78 | 0.179 |
| 2 | 2024-01-24 | 0.0 | 0.44 | 0.125 |
| 3 | 2024-01-25 | -0.14 | -0.37 | 0.204 |
| 4 | 2024-01-26 | 0.03 | 4.57 | 0.064 |


#### Querying - 8.3 Days When Sentiment Was High but Prices Dropped

```python
# Reuses the S3 settings applied in the 2.5 cell (re-run it after a kernel restart)
df = duckdb.sql("""
SELECT *
FROM 's3://ucl-de/data/sentiment_high_prices_low.parquet'
""").df()

df.head()
```

|   | DATE | sentiment | stock_pct_change | crypto_pct_change |
|---|---|---|---|---|
| 0 | 2024-03-13 | 0.323 | -0.24 | 2.27 |


#### Querying - 8.4 Days When Sentiment Was Low but Prices Rose

```python
# Reuses the S3 settings applied in the 2.5 cell (re-run it after a kernel restart)
df = duckdb.sql("""
SELECT *
FROM 's3://ucl-de/data/sentiment_low_prices_high.parquet'
""").df()

df.head()
```

|   | DATE | sentiment | stock_pct_change | crypto_pct_change |
|---|---|---|---|---|
| 0 | 2024-02-01 | -0.028 | 0.63 | 1.22 |
| 1 | 2024-02-26 | -0.121 | -0.9 | 5.22 |
| 2 | 2024-12-16 | 0.025 | 1.22 | 1.62 |
