This repository contains a machine learning model that predicts cricket match outcomes based on various in-game statistics.
cricket/
├── data/
│   ├── train/
│   │   └── cricket_dataset.csv
│   ├── test/
│   │   └── cricket_dataset_test.csv
│   └── results/
│       └── prediction_results_{timestamp}.csv
├── src/
│   ├── app.py
│   └── eval_api.py
├── model/
│   └── cricket_prediction_model.pkl
├── notebooks/
│   └── cricket_prediction.ipynb
└── README.md
The model uses the following columns. The first four are input features; `won` is the target variable the model predicts:
- total_runs: Total runs scored
- wickets: Number of wickets fallen
- target: Target score
- balls_left: Number of balls remaining
- won: Target variable (1 for win, 0 for loss)
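As a minimal sketch of how these columns map to model inputs, the code below reads a CSV into a feature matrix `X` and a label vector `y` using only the standard library. The function name `load_xy`, the assumption that every column holds an integer, and the two sample rows are illustrative and not taken from the repository.

```python
import csv
import io

# Column names as listed above: four input features plus the `won` label.
FEATURES = ["total_runs", "wickets", "target", "balls_left"]
LABEL = "won"


def load_xy(lines):
    """Split CSV rows into a feature matrix X and a label vector y.

    `lines` is any iterable of CSV text lines, such as an open file.
    Extra columns are ignored; every used column is assumed to be an integer.
    """
    X, y = [], []
    for row in csv.DictReader(lines):
        X.append([int(row[name]) for name in FEATURES])
        y.append(int(row[LABEL]))
    return X, y


# Two made-up rows, for illustration only.
sample = io.StringIO(
    "total_runs,wickets,target,balls_left,won\n"
    "95,3,150,40,1\n"
    "80,7,160,30,0\n"
)
X, y = load_xy(sample)
print(X)
print(y)
```

Using `csv.DictReader` keys the lookup on column names, so the sketch does not depend on the column order in the file.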
A Random Forest Classifier was chosen for this task because it:
- Handles non-linear relationships effectively
- Reduces overfitting by averaging the votes of many decision trees, each trained on a bootstrap sample of the data
- Can capture complex interactions between features
- Works well with both numerical and categorical data
- Clone the repository:
git clone https://github.com/zee-rox/cricket.git
cd cricket
- Install the required dependencies:
pip install -r requirements.txt
- Start the FastAPI server:
cd src
uvicorn app:app --reload
The server will start at http://localhost:8000
The prediction endpoint reads match data from a CSV file and predicts the outcome of each match.
Request body:
{
  "csv_path": "path/to/input.csv"
}
Response:
{
  "result_path": "path/to/results/prediction_results_{timestamp}.csv"
}
- Run the evaluation script:
python src/eval_api.py
This will:
- Send a test CSV to the API
- Print the API response
- Display prediction results
- Clean up test files
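The request and response shapes above can be exercised from a small client. This is a sketch, not the contents of `eval_api.py`: the route `/predict` is a hypothetical placeholder because the README does not name the endpoint, and the helper names are illustrative. It covers sending a test CSV, printing the response, and cleaning up; the CSV path is absolute because the server resolves it on its own filesystem, from its own working directory (`src`).

```python
import csv
import json
import os
import tempfile
import urllib.request

# Hypothetical route: the README does not state the endpoint path.
API_URL = "http://localhost:8000/predict"


def build_request_body(csv_path):
    """Serialize the documented {"csv_path": ...} request body."""
    return json.dumps({"csv_path": csv_path}).encode("utf-8")


def parse_response(raw):
    """Extract result_path from the documented response body."""
    return json.loads(raw)["result_path"]


def write_test_csv(directory):
    """Write a one-row test CSV (made-up values) and return its absolute path."""
    path = os.path.abspath(os.path.join(directory, "test_input.csv"))
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["total_runs", "wickets", "target", "balls_left"])
        writer.writerow([95, 3, 150, 40])
    return path


def post_csv(csv_path, url=API_URL):
    """POST the CSV path to the running server and return result_path."""
    req = urllib.request.Request(
        url,
        data=build_request_body(csv_path),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return parse_response(resp.read())


def run_eval(url=API_URL):
    """Send a test CSV, print the response; the temp directory is removed on exit."""
    with tempfile.TemporaryDirectory() as tmp:
        result_path = post_csv(write_test_csv(tmp), url)
        print("API response:", {"result_path": result_path})
```

With the server running, calling `run_eval()` performs the round trip. `tempfile.TemporaryDirectory` deletes the test file automatically, even if the request raises an error.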
The Random Forest model achieves:
- High accuracy in predicting match outcomes
- Good balance between precision and recall
- Robust performance across different match scenarios
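To put numbers on accuracy, precision, and recall, compare predictions against the known `won` labels of the test set. The sketch below computes the three metrics in plain Python, treating a win (`1`) as the positive class. `binary_metrics` is a hypothetical helper. If scikit-learn is available, `sklearn.metrics.accuracy_score`, `precision_score`, and `recall_score` compute the same quantities.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels, with 1 = win."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted win, actual win
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted win, actual loss
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted loss, actual win
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Guard against division by zero when no wins are predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```

Precision measures how many predicted wins were real wins. Recall measures how many real wins the model caught. Reporting both exposes a model that gets high accuracy simply by favouring the majority class.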
The API includes comprehensive error handling for:
- Missing CSV files
- Invalid data formats
- Empty datasets
- Server errors
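The checks listed above can be sketched as a standalone validator that runs before any prediction. This is an illustration, not the code in `app.py`: the function name and messages are hypothetical. In a FastAPI app, these exceptions would typically be caught and re-raised as `fastapi.HTTPException` with a suitable `status_code`, while unexpected exceptions become server errors.

```python
import csv
import os

REQUIRED_COLUMNS = {"total_runs", "wickets", "target", "balls_left"}


def validate_csv(csv_path):
    """Return the CSV rows, or raise for a missing, empty, or malformed file."""
    if not os.path.isfile(csv_path):
        raise FileNotFoundError(f"CSV file not found: {csv_path}")
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None:  # completely empty file, no header
            raise ValueError("CSV file is empty")
        missing = REQUIRED_COLUMNS - set(reader.fieldnames)
        if missing:
            raise ValueError(f"Missing columns: {sorted(missing)}")
        rows = list(reader)
    if not rows:  # header present but no data rows
        raise ValueError("CSV contains no data rows")
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for col in sorted(REQUIRED_COLUMNS):
            try:
                float(row[col])  # short rows yield None, which raises TypeError
            except (TypeError, ValueError):
                raise ValueError(f"Non-numeric {col!r} on line {line_no}") from None
    return rows
```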
Predictions are saved in the data/results directory with timestamps for easy tracking and analysis.
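A minimal sketch of building such a timestamped path follows. The `YYYYMMDD_HHMMSS` format and the `results_path` helper are assumptions, since the README only shows the `{timestamp}` placeholder.

```python
import os
from datetime import datetime


def results_path(results_dir="data/results", now=None):
    """Build data/results/prediction_results_<YYYYMMDD_HHMMSS>.csv (format assumed)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return os.path.join(results_dir, f"prediction_results_{stamp}.csv")


print(results_path(now=datetime(2024, 1, 1, 15, 30, 0)))
```

A zero-padded, most-significant-first timestamp makes the filenames sort chronologically in a directory listing. Two runs within the same second would still collide, so a finer-grained format may be needed under heavy use.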
- The model filters matches where:
  - Fewer than 60 balls are remaining
  - The target score is greater than 120
- Predictions are binary (win/loss)
- Results include all original features plus the predictions
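The notes above can be sketched as two steps: filter the rows, then append a binary prediction to a copy of each row. Two assumptions apply. First, "filters" is read here as *keeps only* the matching rows. Second, the column name `prediction` is hypothetical. `predict` stands in for the trained model: if the pickled model is a scikit-learn estimator (not confirmed by this README), it could wrap `model.predict([features])[0]`. The toy rule and rows below are made up for illustration.

```python
def filter_matches(rows):
    """Keep rows with fewer than 60 balls left and a target above 120 (assumed reading)."""
    return [r for r in rows if r["balls_left"] < 60 and r["target"] > 120]


def add_predictions(rows, predict):
    """Return copies of the rows with a 0/1 `prediction` column (1 = win, 0 = loss)."""
    out = []
    for r in rows:
        features = [r["total_runs"], r["wickets"], r["target"], r["balls_left"]]
        out.append({**r, "prediction": int(predict(features))})
    return out


def toy_predict(features):
    """Toy stand-in for the model: predict a win if fewer than 5 wickets have fallen."""
    return features[1] < 5


rows = [
    {"total_runs": 95, "wickets": 3, "target": 150, "balls_left": 40},
    {"total_runs": 60, "wickets": 1, "target": 110, "balls_left": 50},  # target too low
    {"total_runs": 40, "wickets": 2, "target": 140, "balls_left": 90},  # too many balls left
]
result = add_predictions(filter_matches(rows), toy_predict)
print(result)
```

Copying each row with `{**r, ...}` keeps all original features alongside the prediction without mutating the input rows.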