# Loading Preprocessed Data

Before starting the feature engineering process, we first load the preprocessed dataset that was saved after the EDA step. This ensures that all data cleaning, formatting, and exploratory work are carried forward consistently into this next phase.

In [39]:
# Importing necessary libraries
import pandas as pd
import numpy as np

# Defining the path to the preprocessed data file
file_path = r'C:\Users\ACER\OneDrive\Documents\my codess\Data-Analytics-Assignment\Crypto-Liquidity-Prediction-ML-Project\data\processed\processed_data.csv'

# Load the dataset
df = pd.read_csv(file_path)

# Previewing the loaded DataFrame
print("Loaded preprocessed dataset:")
print(df.head())


Loaded preprocessed dataset:
       coin symbol         price     1h    24h     7d    24h_volume  \
0   Bitcoin    BTC  40859.460000  0.022  0.030  0.055  3.539076e+10   
1  Ethereum    ETH   2744.410000  0.024  0.034  0.065  1.974870e+10   
2    Tether   USDT      1.000000 -0.001 -0.001  0.000  5.793497e+10   
3       BNB    BNB    383.430000  0.018  0.028  0.004  1.395854e+09   
4  USD Coin   USDC      0.999874 -0.001  0.000 -0.000  3.872274e+09   

        mkt_cap        date  
0  7.709915e+11  2022-03-16  
1  3.271044e+11  2022-03-16  
2  7.996516e+10  2022-03-16  
3  6.404382e+10  2022-03-16  
4  5.222214e+10  2022-03-16  
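
As a quick sanity check after loading, the sketch below confirms that the expected columns are present and converts `date` from strings to real timestamps. It is a minimal illustration on a two-row in-memory sample copied from the preview above, not part of the original notebook; the `sample` frame and the `expected_columns` list are assumptions made for the example.

```python
import pandas as pd

# Two rows copied from the preview above; the real notebook reads them from CSV.
sample = pd.DataFrame({
    "coin": ["Bitcoin", "Ethereum"],
    "symbol": ["BTC", "ETH"],
    "price": [40859.46, 2744.41],
    "1h": [0.022, 0.024],
    "24h": [0.030, 0.034],
    "7d": [0.055, 0.065],
    "24h_volume": [3.539076e10, 1.974870e10],
    "mkt_cap": [7.709915e11, 3.271044e11],
    "date": ["2022-03-16", "2022-03-16"],
})

# Assumed schema, taken from the columns shown in the preview.
expected_columns = ["coin", "symbol", "price", "1h", "24h", "7d",
                    "24h_volume", "mkt_cap", "date"]
missing = [c for c in expected_columns if c not in sample.columns]

# Convert the date strings to real timestamps.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

print("Missing columns:", missing)
print("date dtype:", sample["date"].dtype)
```

Running the same two checks on the full `df` right after `pd.read_csv` would surface a renamed or missing column before any feature is computed.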


# 2. Feature Engineering

**Feature Engineering** involves creating new variables or transforming existing ones to better capture important patterns in the data. These new features can help improve the performance of models or provide deeper insights during analysis. In this section, we create moving averages to smooth out short-term fluctuations and highlight underlying trends in cryptocurrency price and market capitalization.

### 2.1 Feature Engineering: Calculating Moving Averages

In this step, we enrich the dataset with moving-average features. We first sort the data by date so that rows appear in chronological order, then compute a rolling mean with a window of 2 for both price and market capitalization. Moving averages smooth short-term volatility and make underlying trends easier to see, which can improve the analysis and modeling that follow. Note that `rolling(window=2)` here runs over consecutive rows of the whole sorted DataFrame rather than within each coin, so a window can span two different coins whenever neighbouring rows belong to different assets.

In [42]:
# Ensuring that the dataframe is sorted by the date for accurate rolling calculations
df = df.sort_values(by='date')

# Calculating 2-day moving average of 'price' column
df['price_MA_2d'] = df['price'].rolling(window=2).mean()

# Calculating 2-day moving average of 'market capitalization'
df['market_cap_MA_2d'] = df['mkt_cap'].rolling(window=2).mean()

# Displaying the first 5 rows of the original and new moving average columns
print(df[['price', 'price_MA_2d', 'mkt_cap', 'market_cap_MA_2d']].head())

            price  price_MA_2d       mkt_cap  market_cap_MA_2d
0    4.085946e+04          NaN  7.709915e+11               NaN
340  1.080000e+00  20430.27000  1.300442e+08      3.855608e+11
339  7.960000e+00      4.52000  1.302007e+08      1.301224e+08
338  2.949200e-01      4.12746  1.327759e+08      1.314883e+08
337  3.051000e-09      0.14746  1.329136e+08      1.328448e+08


**Description of Moving Average Output**

- The moving average columns (`price_MA_2d` and `market_cap_MA_2d`) hold the mean of the current row and the row immediately before it in date-sorted order.
- The first row shows `NaN` for both moving averages because it has no preceding row to average with.
- Because the rolling window is applied to the whole DataFrame rather than per coin, adjacent rows can belong to different coins. In the output above, the second row's `price_MA_2d` of 20430.27 is the mean of 40859.46 (row 0, Bitcoin in the earlier preview) and 1.08 (row 340, a different coin).
- A cross-coin average of this kind does not describe the trend of any single asset, so these two columns should be interpreted with care.
- When a window covers consecutive days of the same coin, the moving average reduces noise in the daily price and market cap data and makes spikes or drops in the raw series easier to put in context.
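
If a per-coin moving average is what is intended, one possible approach is to group by `coin` before applying the rolling window. The sketch below is not the notebook's method; it is a minimal illustration on a small in-memory sample in which the 2022-03-17 prices are made-up values chosen only to show the mechanics.

```python
import pandas as pd

# Hypothetical sample: the 2022-03-16 prices come from the preview,
# the 2022-03-17 prices are invented for illustration.
sample = pd.DataFrame({
    "coin": ["Bitcoin", "Ethereum", "Bitcoin", "Ethereum"],
    "date": ["2022-03-16", "2022-03-16", "2022-03-17", "2022-03-17"],
    "price": [40859.46, 2744.41, 41000.00, 2800.00],
})
sample["date"] = pd.to_datetime(sample["date"])

# Sort by coin first, then by date, so each coin's rows are contiguous
# and in chronological order.
sample = sample.sort_values(["coin", "date"]).reset_index(drop=True)

# Apply the 2-row rolling mean inside each coin's group only.
sample["price_MA_2d"] = (
    sample.groupby("coin")["price"]
    .transform(lambda s: s.rolling(window=2).mean())
)

print(sample)
```

With this grouping, the first row of each coin is `NaN` and every later value averages two prices of the same asset, so a window never spans two different coins.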

### 2.2 Calculating Simple Volatility Based on Price Changes

In this step, we calculate a simple measure of volatility by taking the absolute difference between the **24-hour** and **1-hour** return rates. This metric captures how far the return over the last day diverges from the return over the last hour, giving a view of short-term price fluctuations. It is a rough proxy rather than a statistical volatility measure such as the standard deviation of returns. Higher values indicate a larger gap between the two horizons, which is useful for understanding risk and market dynamics.

In [43]:
# Computing simple volatility as the absolute difference between 24-hour and 1-hour returns
df['volatility'] = (df['24h'] - df['1h']).abs()

print(df[['1h', '24h', 'volatility']].head())

        1h    24h  volatility
0    0.022  0.030       0.008
340  0.000 -0.004       0.004
339  0.017  0.008       0.009
338  0.023  0.010       0.013
337  0.012 -0.005       0.017


**Description of Volatility Score Output**

The table shows the **1-hour** and **24-hour** return percentages alongside the calculated **volatility score**, which measures the absolute difference between these two returns. This score helps capture short-term price fluctuations:

- For example, the first row has a 1-hour return of 0.022 (2.2%) and a 24-hour return of 0.030 (3.0%), giving a volatility score of 0.008, or 0.8 percentage points.
- The volatility score quantifies how much the returns differ within these two timeframes, highlighting periods of higher or lower price stability.
- Smaller volatility scores suggest relatively stable price movement between 1 hour and 24 hours, whereas larger scores indicate more pronounced changes in price behavior during that period.

This simple metric offers a quick way to identify coins experiencing significant short-term price swings.
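
To illustrate how the score can be used to pick out the most volatile rows, the sketch below recomputes it for the five rows shown in the output and ranks them with `nlargest`. The `rows` frame is a hand-copied sample and the ranking step is an assumption about how one might use the feature, not something the notebook does.

```python
import pandas as pd

# The five rows from the output above, keyed by their original index labels.
rows = pd.DataFrame(
    {
        "1h": [0.022, 0.000, 0.017, 0.023, 0.012],
        "24h": [0.030, -0.004, 0.008, 0.010, -0.005],
    },
    index=[0, 340, 339, 338, 337],
)

# Same formula as in the notebook cell.
rows["volatility"] = (rows["24h"] - rows["1h"]).abs()

# Keep the two rows with the largest gap between the 24h and 1h returns.
top = rows.nlargest(2, "volatility")
print(top)
```

In this sample the two largest scores belong to index labels 337 and 338, matching the 0.017 and 0.013 values in the table.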

### 2.3 Liquidity Ratio Calculation

In this step, we calculate the liquidity ratio by dividing the **24-hour** trading volume by the market capitalization for each cryptocurrency. This metric shows how actively a coin is traded relative to its overall market value, which helps assess how easily the asset can be bought or sold without significantly moving its price.

In [44]:
# Calculate liquidity ratio as the ratio of 24-hour trading volume to market capitalization
df['liquidity_ratio'] = df['24h_volume'] / df['mkt_cap']

print(df[['24h_volume', 'mkt_cap', 'liquidity_ratio']].head())

       24h_volume       mkt_cap  liquidity_ratio
0    3.539076e+10  7.709915e+11         0.045903
340  9.525810e+04  1.300442e+08         0.000733
339  1.069360e+06  1.302007e+08         0.008213
338  3.041720e+03  1.327759e+08         0.000023
337  1.894020e+05  1.329136e+08         0.001425


**Liquidity Ratio Output Description**

- **24h_volume**: The 24-hour volume of trading of the cryptocurrency.
- **mkt_cap**: The overall market capitalization of the cryptocurrency.
- **liquidity_ratio**: The 24-hour trading volume to market capitalization ratio, showing how actively the coin is traded compared to its size.

From the output, we see:
- Liquidity ratios differ significantly across coins: in the five rows shown, they range from about `0.046` down to `0.000023`.
- A higher liquidity ratio indicates the coin is more actively traded relative to its market cap and thus implies greater liquidity.
- Lower values indicate less trading activity in relation to the size of the coin and can imply lesser liquidity.
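
One edge case worth keeping in mind is a market capitalization of zero, where plain division yields `inf` instead of a usable ratio. The sketch below shows one way to guard against it by treating a zero market cap as missing. The third row is a made-up zero-cap case; whether the real dataset contains such rows is not known from the output above.

```python
import numpy as np
import pandas as pd

# First two rows are from the output above; the third is a hypothetical
# coin with a reported market cap of zero.
sample = pd.DataFrame({
    "24h_volume": [3.539076e10, 9.525810e4, 1.0e6],
    "mkt_cap": [7.709915e11, 1.300442e8, 0.0],
})

# Replace zero market caps with NaN so the division produces NaN, not inf.
safe_cap = sample["mkt_cap"].replace(0, np.nan)
sample["liquidity_ratio"] = sample["24h_volume"] / safe_cap

print(sample)
```

The resulting `NaN` can then be dropped or imputed explicitly before model training, rather than letting an infinite value pass through unnoticed.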

# Saving the Feature-Engineered Data

After completing the feature engineering steps (moving averages, volatility, and liquidity ratio), it is important to save the updated dataframe. Saving to a new CSV file leaves the original processed data untouched and clearly marks the new file as ready for the next step, model training.

This practice keeps the workflow clean, makes data transformations easy to track, and means the original processed dataset remains available if a feature needs to be recomputed.
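
As a portable alternative to a hard-coded absolute path, the output location can be built with `pathlib` and the file read back to confirm the write succeeded. The sketch below is an assumption-laden illustration: it writes a one-row sample into a temporary directory, and the `data/processed` layout under that directory is hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# One-row stand-in for the feature-engineered dataframe.
sample = pd.DataFrame({
    "coin": ["Bitcoin"],
    "price": [40859.46],
    "liquidity_ratio": [0.045903],
})

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical project layout rooted in a temporary directory.
    out_dir = Path(tmp) / "data" / "processed"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "crypto_data_feature_engineered.csv"

    # Write without the index, as the notebook does, then read it back.
    sample.to_csv(out_file, index=False)
    reloaded = pd.read_csv(out_file)

same_columns = list(reloaded.columns) == list(sample.columns)
same_rows = len(reloaded) == len(sample)
print("Columns preserved:", same_columns, "| Row count preserved:", same_rows)
```

In a real project the temporary directory would be replaced by the repository root, so the same code runs on any machine.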


In [45]:
# Save the feature-engineered dataframe to a new CSV file for model training
output_path = r"C:\Users\ACER\OneDrive\Documents\my codess\Data-Analytics-Assignment\Crypto-Liquidity-Prediction-ML-Project\data\processed\crypto_data_feature_engineered.csv"
df.to_csv(output_path, index=False)

print(f"Feature-engineered data saved successfully to:\n{output_path}")

Feature-engineered data saved successfully to:
C:\Users\ACER\OneDrive\Documents\my codess\Data-Analytics-Assignment\Crypto-Liquidity-Prediction-ML-Project\data\processed\crypto_data_feature_engineered.csv


# Summary of Feature Engineering `02_feature_engineering.ipynb`

In this step, we enhanced our dataset by creating new features to improve model performance:

- **Loaded Preprocessed Data**: Imported the cleaned and preprocessed dataset saved from the previous step.
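- **Calculated Moving Averages**: Computed moving averages with a window of 2 for `price` and `mkt_cap` (stored as `price_MA_2d` and `market_cap_MA_2d`) to smooth short-term fluctuations and capture trends. The window runs over the date-sorted DataFrame as a whole, not per coin.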
- **Calculated Volatility Score**: Derived a simple volatility measure as the absolute difference between 24-hour and 1-hour returns to quantify price fluctuations.
- **Calculated Liquidity Ratio**: Created a liquidity metric by dividing the 24-hour trading volume by the market capitalization, indicating asset liquidity.
- **Saved Feature-Engineered Data**: Exported the updated dataset with new features to a new CSV file for use in model training.

These new features help capture important temporal and market dynamics for more accurate liquidity prediction.
