A comprehensive exploratory data analysis (EDA) project showcasing data cleaning, visualization, and statistical insights using Python and Jupyter Notebook.

zeroandoneme/Exploratory-Data-Analysis

Exploratory Data Analysis (EDA)

Overview

This project applies exploratory data analysis (EDA) techniques to extract insights from a dataset. The analysis is performed in a Jupyter Notebook and combines data manipulation, visualization, and statistical exploration. The dataset is pre-processed and cleaned so that it is ready for model training. The model to be trained on this dataset is a multi-class classifier for four breast cancer types.

Features

  • Data Loading and Cleaning: Initial exploration of the dataset, handling missing values, and fixing data inconsistencies.
  • Statistical Summary: Calculation of descriptive statistics (mean, median, standard deviation, etc.).
  • Data Visualization: Generation of plots (e.g., histograms, scatter plots, box plots) for understanding data distributions and relationships.
  • Correlation Analysis: Investigation of relationships between variables.
  • Key Findings: Highlights and insights derived from the analysis.
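The cleaning, summary, and correlation steps listed above can be sketched roughly as follows. This is a minimal illustration on a toy DataFrame, not the notebook's actual code: the column names (`age`, `tumor_size_mm`, `cancer_type`) and the median-imputation strategy are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative and not taken from data.csv.
df = pd.DataFrame({
    "age": [52, 61, np.nan, 47, 61, 58],
    "tumor_size_mm": [12.0, 25.5, 18.0, np.nan, 25.5, 30.0],
    "cancer_type": ["A", "B", "A", "C", "B", "D"],
})

# Cleaning: drop exact duplicate rows, then fill numeric gaps with the column median.
# Median imputation is an assumption here; the notebook may handle missing values differently.
clean = df.drop_duplicates().copy()
numeric_cols = clean.select_dtypes(include="number").columns
clean[numeric_cols] = clean[numeric_cols].fillna(clean[numeric_cols].median())

# Statistical summary: mean, median, and standard deviation per numeric column.
summary = clean[numeric_cols].agg(["mean", "median", "std"])

# Correlation analysis: pairwise Pearson correlation between numeric columns.
corr = clean[numeric_cols].corr()

print(summary)
print(corr)
```

Whether dropping duplicates or imputing with the median is appropriate depends on the dataset; for clinical data, it is worth checking that repeated rows are true duplicates before removing them.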

Prerequisites

To run this notebook, ensure you have the following installed:

  • Python 3.8+
  • Jupyter Notebook or Jupyter Lab
  • The following Python libraries:
    • pandas
    • numpy
    • matplotlib
    • seaborn
    • scipy (optional, if advanced statistical tests are included)

You can install the dependencies using:

pip install pandas numpy matplotlib seaborn scipy
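To confirm that the environment has the libraries listed above, a short check like the one below may help. It is a sketch that uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_versions` is made up for this example.

```python
from importlib import metadata

# Package names as listed in the prerequisites.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "scipy"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```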

How to Use

  • Clone the repository or download the notebook file.
  • Open the project folder in VS Code or any preferred IDE.
  • Open the EDA.ipynb file.
  • Run the cells in order, from top to bottom, to perform the analysis.

Dataset

  • Name: data.csv
  • Description: This dataset contains clinical and genomic data for patients with breast cancer. The dataset includes four types of breast cancer.
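Because the dataset covers four cancer types, a first look at the class balance is often useful before training a classifier. The sketch below assumes a label column named `cancer_type`, which is a placeholder; the real column name in data.csv is not stated here. An in-memory CSV stands in for the file so the example runs on its own.

```python
import io

import pandas as pd


def class_distribution(csv_source, label_column):
    """Return the share of rows per class in `label_column`, largest first."""
    df = pd.read_csv(csv_source)
    return df[label_column].value_counts(normalize=True)


# In-memory stand-in for data.csv; the columns and values are hypothetical.
sample_csv = io.StringIO(
    "patient_id,cancer_type\n1,A\n2,B\n3,A\n4,C\n5,D\n6,A\n"
)
shares = class_distribution(sample_csv, "cancer_type")
print(shares)
```

With the real file, the call would look like `class_distribution("data.csv", "<label column>")`, using whatever the label column is actually called.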

Results

The results are documented in the notebook as inline comments and visualizations. For further analysis or reporting, you can export the figures and data summaries.
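One possible way to export a figure and a summary table is sketched below. The data, file names, and output directory are all hypothetical; the `Agg` backend is selected only so the script runs without a display, and can be dropped inside a notebook where inline plots are wanted.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for one numeric column of the dataset.
df = pd.DataFrame({"age": [52, 61, 47, 58, 66]})

# Temporary output directory for the example; replace with a project folder as needed.
out_dir = Path(tempfile.mkdtemp())

# Export a figure: build the histogram, then save it as a PNG file.
fig, ax = plt.subplots()
ax.hist(df["age"], bins=5)
ax.set_xlabel("age")
ax.set_ylabel("count")
fig_path = out_dir / "age_histogram.png"
fig.savefig(fig_path, dpi=150, bbox_inches="tight")
plt.close(fig)

# Export a data summary: descriptive statistics written to CSV.
summary_path = out_dir / "summary.csv"
df.describe().to_csv(summary_path)

print(f"Saved {fig_path} and {summary_path}")
```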

Contact

For questions or feedback, please reach out to:
