This project focuses on predicting the AC power generation of a solar power plant using machine learning models. The primary goal is to forecast power generation for the upcoming days, assisting plant operators in efficient resource planning and management. This project was conducted under the EE7209: Machine Learning module in the Department of Electrical and Information Engineering, University of Ruhuna.
- Plant_2_Generation_Data.csv: Contains power generation data for different inverters over time.
- Plant_2_Weather_Sensor_Data.csv: Holds weather information relevant to power generation (temperature, irradiation, etc.).
- SolarPowerGenerationPrediction.ipynb: The Jupyter Notebook containing the data analysis, preprocessing, model building, and evaluation steps.
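
The generation and weather files are typically joined on their timestamp before modelling. Below is a minimal, dependency-free sketch of that loading and cleaning step. It is not the notebook's code. It assumes column names (`DATE_TIME`, `SOURCE_KEY`, `AC_POWER`, `AMBIENT_TEMPERATURE`, `MODULE_TEMPERATURE`, `IRRADIATION`) and a `%Y-%m-%d %H:%M:%S` timestamp format, neither of which this README states. It also assumes a single weather sensor, so the timestamp alone identifies a weather reading. The helper `load_and_merge` is hypothetical.

```python
import csv
from datetime import datetime

# Assumed DATE_TIME format, e.g. "2020-05-15 00:15:00" (not stated in this README).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Assumed weather columns to keep; every other weather column is dropped.
WEATHER_COLUMNS = ("AMBIENT_TEMPERATURE", "MODULE_TEMPERATURE", "IRRADIATION")


def load_and_merge(generation_lines, weather_lines):
    """Hypothetical helper: join inverter generation rows with weather readings.

    Both arguments are iterables of CSV text lines (an open file works).
    Only DATE_TIME, SOURCE_KEY, AC_POWER and WEATHER_COLUMNS are kept.
    """
    # Index weather readings by timestamp. This assumes one weather sensor,
    # so a later row with the same timestamp would overwrite an earlier one.
    weather_by_time = {}
    for row in csv.DictReader(weather_lines):
        ts = datetime.strptime(row["DATE_TIME"], TIME_FORMAT)
        weather_by_time[ts] = {col: float(row[col]) for col in WEATHER_COLUMNS}

    merged = []
    for row in csv.DictReader(generation_lines):
        ts = datetime.strptime(row["DATE_TIME"], TIME_FORMAT)
        weather = weather_by_time.get(ts)
        if weather is None:
            # Inner join: drop generation rows that have no weather reading.
            continue
        merged.append({
            "DATE_TIME": ts,
            "SOURCE_KEY": row["SOURCE_KEY"],     # inverter ID
            "AC_POWER": float(row["AC_POWER"]),  # regression target
            **weather,
        })
    return merged
```

With the real files, the helper would be called as `load_and_merge(open("Plant_2_Generation_Data.csv", newline=""), open("Plant_2_Weather_Sensor_Data.csv", newline=""))`, ideally inside `with` blocks. An inner join is a design choice of this sketch: generation rows without a matching weather reading are simply discarded.
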
- Data Loading and Cleaning: The dataset is loaded from CSV files and initial cleaning is performed (dropping irrelevant columns).
- Feature Engineering:
  - A new feature, `TOTAL_MINUTES_PASS`, is created to capture the time within a day in a format suitable for machine learning models.
  - The `SOURCE_KEY` (inverter ID) is one-hot encoded to handle its categorical nature.
- Exploratory Data Analysis (EDA): The data is thoroughly explored using visualizations and summary statistics to understand patterns and relationships.
- Feature Scaling: Numerical features are standardized for better model performance.
- Model Building: Two regression models are implemented and compared:
  - Linear Regression: a simple baseline model.
  - Random Forest Regression: a more flexible, non-linear model.
- Hyperparameter Tuning: Grid Search is used to find optimal hyperparameters for the Random Forest model.
- Model Evaluation: Models are evaluated on both training and testing sets using metrics like R-squared, Mean Absolute Error (MAE), Median Absolute Error (MedAE), and Root Mean Squared Error (RMSE).
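
The feature-engineering and scaling steps above can be sketched without dependencies as follows. This is an illustration under stated assumptions, not the notebook's implementation. It assumes `TOTAL_MINUTES_PASS` means minutes elapsed since midnight, treats the weather columns in `NUMERIC_COLUMNS` (an assumed feature set) as the numeric inputs, and fits the scaler and the list of inverter categories on the training split only. All helper names (`make_design_matrices`, `fit_standardizer`, and so on) are hypothetical.

```python
from datetime import datetime
from statistics import mean, pstdev

# Assumed numeric weather inputs; the README does not list the exact feature set.
NUMERIC_COLUMNS = ("AMBIENT_TEMPERATURE", "MODULE_TEMPERATURE", "IRRADIATION")


def total_minutes_pass(ts):
    """Assumed definition of TOTAL_MINUTES_PASS: minutes elapsed since midnight."""
    return ts.hour * 60 + ts.minute


def one_hot(key, categories):
    """One-hot encode an inverter ID; an ID unseen in training maps to all zeros."""
    return [1 if key == c else 0 for c in categories]


def fit_standardizer(rows):
    """Per-column mean and population standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return means, stds


def standardize(rows, means, stds):
    """Apply z = (x - mean) / std column by column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def numeric_features(record):
    """Time-of-day feature followed by the assumed weather columns."""
    return [total_minutes_pass(record["DATE_TIME"])] + [record[c] for c in NUMERIC_COLUMNS]


def make_design_matrices(train, test):
    """Hypothetical helper: scale numeric features, then append one-hot SOURCE_KEY.

    Scaling statistics and inverter categories come from the training set only,
    so no information from the test set leaks into preprocessing.
    """
    categories = sorted({r["SOURCE_KEY"] for r in train})
    means, stds = fit_standardizer([numeric_features(r) for r in train])

    def transform(records):
        scaled = standardize([numeric_features(r) for r in records], means, stds)
        return [s + one_hot(r["SOURCE_KEY"], categories) for s, r in zip(scaled, records)]

    return transform(train), transform(test), categories


if __name__ == "__main__":
    # 06:30 is 6 * 60 + 30 = 390 minutes after midnight.
    print(total_minutes_pass(datetime(2020, 5, 15, 6, 30)))  # 390
```

Only the numeric columns are standardized here, while the one-hot inverter columns stay as 0/1 indicators. In a pandas/scikit-learn workflow, `pandas.get_dummies` or `sklearn.preprocessing.OneHotEncoder`, together with `sklearn.preprocessing.StandardScaler` (which also uses the population standard deviation), cover the same steps.
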
- Random Forest outperforms Linear Regression: The Random Forest model, especially after hyperparameter tuning, significantly outperforms Linear Regression in predicting AC power generation.
- Overfitting Mitigation: Techniques to reduce overfitting in the Random Forest model (limiting tree depth, adjusting minimum samples per split/leaf) were explored.
- Data Insights: EDA reveals important relationships between weather conditions, time, and power generation.
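
The comparison above rests on the metrics listed under Model Evaluation. The sketch below shows their standard definitions in dependency-free form, using a hypothetical helper `regression_metrics`, to make explicit what each number measures. It does not reproduce any of the project's actual results.

```python
import math
from statistics import mean, median


def regression_metrics(y_true, y_pred):
    """R-squared, MAE, MedAE and RMSE for paired actual/predicted AC power values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    abs_errors = [abs(e) for e in errors]
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    return {
        # R^2 = 1 - SS_res / SS_tot; left undefined (NaN) when y_true is constant.
        "R2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        "MAE": mean(abs_errors),            # average absolute error
        "MedAE": median(abs_errors),        # robust to a few large errors
        "RMSE": math.sqrt(ss_res / len(errors)),  # penalizes large errors more
    }
```

In scikit-learn the same quantities are available in `sklearn.metrics` as `r2_score`, `mean_absolute_error`, `median_absolute_error`, and the square root of `mean_squared_error`. Computing them on both the training and test sets is what reveals a gap such as the Random Forest overfitting mentioned above.
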