This repository was archived by the owner on Feb 15, 2026. It is now read-only.

toxicbishop/StatsPrograms


StatsPrograms

A collection of 12 Python programs demonstrating statistical analysis, data manipulation, and machine learning techniques, from descriptive statistics and hypothesis testing to PCA, LDA, and regression.

📋 Programs Overview

01.py - Data Manipulation with Pandas

  • DataFrame merging and joining operations
  • Pivot tables and data reshaping
  • Data frame comparison techniques
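As a rough illustration of the operations listed above (not the code in 01.py), the sketch below merges, pivots, and compares small tables. All table names and values are invented for the example.

```python
import pandas as pd

# Hypothetical sample tables, invented for illustration only.
employees = pd.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "name": ["Ana", "Ben", "Cy", "Di"],
    "dept_id": [10, 20, 10, 30],
})
departments = pd.DataFrame({"dept_id": [10, 20], "dept": ["Sales", "Ops"]})

# Left merge keeps every employee; dept is NaN where no department matches.
merged = employees.merge(departments, on="dept_id", how="left")

sales = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 90, 110],
})

# Pivot table: one row per region, one column per quarter.
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum")

# DataFrame.compare reports only the cells that differ between two frames.
revised = sales.copy()
revised.loc[2, "revenue"] = 95
diff = sales.compare(revised)

print(merged, pivot, diff, sep="\n\n")
```

`compare` requires both frames to share the same shape and labels, which is why the second frame here is a modified copy of the first.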

02.py - String Manipulation and Regular Expressions

  • String methods (upper, title, strip, etc.)
  • String splitting and joining
  • Regular expression pattern matching
  • Email and phone number extraction and substitution
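A minimal sketch of these string and regex operations follows. It is not taken from 02.py; the sample text and the deliberately simple patterns are assumptions, and real-world email or phone formats would need stricter rules.

```python
import re

# Hypothetical input text, invented for illustration.
text = "  contact: Alice.Smith@example.com, phone 555-123-4567; bob@example.org, 555-987-6543  "

cleaned = text.strip()               # remove leading/trailing whitespace
words = cleaned.split()              # split on runs of whitespace
rejoined = " ".join(words)           # join the pieces back together
heading = "hello world".title()      # capitalize each word

# Simplified patterns: assumptions for this sketch, not full validators.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

emails = EMAIL.findall(cleaned)      # extraction
phones = PHONE.findall(cleaned)

# Substitution: mask emails first, then phone numbers.
masked = PHONE.sub("XXX-XXX-XXXX", EMAIL.sub("<email>", cleaned))

print(emails)
print(phones)
print(masked)
```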

03.py - Time Series Analysis

  • Time series data generation
  • Daily mean calculations
  • ARIMA model implementation
  • Forecasting and visualization

04.py - Descriptive Statistics

  • Mean, median, and mode calculations
  • Frequency distribution handling
  • Statistical analysis with NumPy and SciPy
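For reference, here is a small sketch of these measures on raw data and on a frequency distribution. It uses NumPy only and invented numbers; 04.py may compute them differently (for example with SciPy).

```python
import numpy as np

# Raw observations (invented for illustration).
raw = np.array([2, 3, 3, 5, 7, 3, 5, 8])

mean = raw.mean()
median = np.median(raw)

# Mode: the value with the highest count.
values, counts = np.unique(raw, return_counts=True)
mode = values[np.argmax(counts)]

# Frequency distribution: each score occurs `freqs[i]` times.
scores = np.array([10, 20, 30, 40])
freqs = np.array([2, 5, 2, 1])

# Weighted mean is sum(score * freq) / sum(freq).
freq_mean = np.average(scores, weights=freqs)

# Expanding the table back into raw observations gives the median.
expanded = np.repeat(scores, freqs)
freq_median = np.median(expanded)
freq_mode = scores[np.argmax(freqs)]

print(mean, median, mode)                   # 4.5 4.0 3
print(freq_mean, freq_median, freq_mode)    # 22.0 20.0 20
```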

05.py - Cross-Validation Techniques

  • K-Fold cross-validation
  • Leave-One-Out cross-validation
  • Model evaluation with R-squared metrics
  • Regression model validation
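The sketch below shows one way to apply both schemes to a regression model with scikit-learn. The synthetic data and model choice are assumptions for illustration and are not taken from 05.py.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import (KFold, LeaveOneOut,
                                     cross_val_predict, cross_val_score)

rng = np.random.default_rng(42)

# Synthetic data (assumption): y = 3x + 2 plus Gaussian noise.
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=50)

model = LinearRegression()

# K-Fold: R-squared is computed separately on each of the 5 held-out folds.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
kfold_r2 = cross_val_score(model, X, y, cv=kfold, scoring="r2")

# Leave-One-Out: each fold holds a single sample, so a per-fold R-squared
# is undefined. Pool the held-out predictions and score them once instead.
loo_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
loo_r2 = r2_score(y, loo_pred)

print("K-Fold R^2 per fold:", kfold_r2.round(3))
print("K-Fold mean R^2:", kfold_r2.mean().round(3))
print("LOO pooled R^2:", round(loo_r2, 3))
```

Pooling predictions for Leave-One-Out is a design choice of this sketch; using a per-sample error such as mean squared error is a common alternative.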

06.py - Probability Distributions

  • Normal distribution
  • Binomial distribution
  • Poisson distribution
  • Bernoulli distribution
  • Distribution visualization with Matplotlib
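As a quick reference for the four distributions, the following sketch evaluates each one with `scipy.stats`. The parameter values are arbitrary choices for the example, and plotting is left out.

```python
from scipy import stats

# Normal: density of the standard normal at its mean.
normal_density = stats.norm.pdf(0.0, loc=0.0, scale=1.0)

# Normal: probability of falling within one standard deviation of the mean.
normal_within_1sd = stats.norm.cdf(1.0) - stats.norm.cdf(-1.0)

# Binomial: probability of exactly 5 successes in 10 fair trials.
binom_prob = stats.binom.pmf(5, n=10, p=0.5)

# Poisson: probability of zero events when the mean rate is 2.
poisson_prob = stats.poisson.pmf(0, mu=2.0)

# Bernoulli: probability of success in a single trial with p = 0.3.
bernoulli_prob = stats.bernoulli.pmf(1, p=0.3)

print(normal_density, normal_within_1sd)
print(binom_prob, poisson_prob, bernoulli_prob)
```

Each distribution object also exposes `rvs` for drawing random samples, which is a convenient source of data for histograms.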

07.py - Hypothesis Testing

  • One-sample t-test
  • Two-sample t-test
  • Statistical significance testing
  • Sample mean and standard deviation calculations
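To make the test statistics concrete, this sketch computes a one-sample t statistic by hand, checks it against SciPy, and then runs a two-sample test. The samples, the hypothesized mean of 5.0, and the 0.05 significance level are assumptions for the example, not values from 07.py.

```python
import numpy as np
from scipy import stats

# Invented samples for illustration.
sample_a = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7])
sample_b = np.array([4.8, 4.7, 5.0, 5.2, 4.9, 5.1, 4.6, 5.0])

# Sample mean and sample standard deviation (ddof=1 divides by n - 1).
mean_a = sample_a.mean()
std_a = sample_a.std(ddof=1)

# One-sample t statistic: (mean - mu0) / (s / sqrt(n)).
mu0 = 5.0
t_manual = (mean_a - mu0) / (std_a / np.sqrt(len(sample_a)))
t_one, p_one = stats.ttest_1samp(sample_a, popmean=mu0)

# Two-sample test; equal_var=False selects Welch's t-test, which does
# not assume the two groups share a variance.
t_two, p_two = stats.ttest_ind(sample_a, sample_b, equal_var=False)

alpha = 0.05
print(f"one-sample: t={t_one:.3f}, p={p_one:.4f}, reject={p_one < alpha}")
print(f"two-sample: t={t_two:.3f}, p={p_two:.4f}, reject={p_two < alpha}")
```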

08.py - Analysis of Variance (ANOVA)

  • One-way ANOVA
  • Two-way ANOVA
  • Sum of squares calculations
  • Between-group and within-group variance analysis
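The sum-of-squares decomposition behind one-way ANOVA can be sketched as follows, with the result checked against `scipy.stats.f_oneway`. The three groups are invented data, and the sketch covers the one-way case only; a two-way analysis would add a second factor and an interaction term.

```python
import numpy as np
from scipy import stats

# Three invented treatment groups.
groups = [
    np.array([23.0, 25.0, 21.0, 22.0]),
    np.array([30.0, 28.0, 31.0, 29.0]),
    np.array([26.0, 24.0, 27.0, 25.0]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Total sum of squares equals ss_between + ss_within.
ss_total = ((all_values - grand_mean) ** 2).sum()

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares.
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"F={f_manual:.3f} (scipy {f_scipy:.3f}), p={p_manual:.5f}")
```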

09.py - Correlation and Regression

  • Pearson correlation coefficient
  • Linear regression implementation
  • Scatter plots and regression line visualization
  • Correlation analysis
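A minimal sketch of the correlation and regression steps is shown below, using `scipy.stats`. The hours/scores data are invented, and the plotting step is omitted.

```python
import numpy as np
from scipy import stats

# Invented paired observations.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
scores = np.array([52.0, 55.0, 61.0, 64.0, 70.0, 72.0, 79.0, 83.0])

# Pearson correlation coefficient and its two-sided p-value.
r, p_value = stats.pearsonr(hours, scores)

# Least-squares line: scores ~ intercept + slope * hours.
fit = stats.linregress(hours, scores)
predicted = fit.intercept + fit.slope * hours

# In simple linear regression, R-squared equals the squared Pearson r.
r_squared = fit.rvalue ** 2

print(f"r={r:.4f}, p={p_value:.2e}")
print(f"slope={fit.slope:.3f}, intercept={fit.intercept:.3f}, R^2={r_squared:.4f}")
```

The `predicted` array holds the fitted values that a regression line would pass through when drawn over a scatter plot of the data.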

10.py - Principal Component Analysis (PCA)

  • PCA implementation on Wisconsin Breast Cancer dataset
  • Dimensionality reduction
  • Variance analysis
  • PCA visualization
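One plausible shape for this analysis is sketched below, using the copy of the Wisconsin Breast Cancer dataset bundled with scikit-learn. Standardizing the features first and keeping two components are assumptions of the sketch, not confirmed details of 10.py.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()

# PCA is scale-sensitive, so standardize each feature to mean 0, variance 1.
X_scaled = StandardScaler().fit_transform(data.data)

# Reduce the 30 original features to 2 principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

# Fraction of total variance captured by each component.
explained = pca.explained_variance_ratio_

print("reduced shape:", components.shape)
print("explained variance ratio:", explained.round(4))
print("cumulative:", explained.sum().round(4))
```

The two columns of `components` can be used directly as x and y coordinates in a scatter plot, colored by `data.target`.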

11.py - Linear Discriminant Analysis (LDA)

  • LDA implementation on Iris dataset
  • Classification boundary visualization
  • Feature transformation
  • Discriminant analysis
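For illustration, the following sketch fits LDA to the Iris dataset bundled with scikit-learn and projects it onto two discriminant axes. It is an assumption about the general approach rather than a copy of 11.py, and it scores the model on its own training data for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()

# With 3 classes, LDA yields at most 3 - 1 = 2 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2)

# Unlike PCA, LDA is supervised: it uses the class labels to find axes
# that separate the classes.
projected = lda.fit_transform(iris.data, iris.target)

# Accuracy on the training data (optimistic; no held-out split here).
train_accuracy = lda.score(iris.data, iris.target)

print("projected shape:", projected.shape)
print("training accuracy:", round(train_accuracy, 3))
```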

12.py - Linear Regression with Sklearn

  • Model training and testing
  • Prediction accuracy evaluation
  • Mean Squared Error (MSE) calculation
  • R-squared score analysis
  • Actual vs. Predicted visualization
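The train/test workflow above can be sketched as follows. The synthetic data, the 80/20 split, and the random seeds are assumptions for the example and do not come from 12.py.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)

# Synthetic data (assumption): y = 4x - 1.5 plus Gaussian noise.
X = rng.uniform(0, 10, size=(200, 1))
y = 4.0 * X[:, 0] - 1.5 + rng.normal(0, 2.0, size=200)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the held-out data only.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"slope={model.coef_[0]:.3f}, intercept={model.intercept_:.3f}")
print(f"MSE={mse:.3f}, R^2={r2:.3f}")
```

Plotting `y_test` against `y_pred` gives the actual vs. predicted view; points close to the diagonal indicate accurate predictions.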

📁 Project Structure

StatsPrograms/
│
├── .github/
│   └── workflows/          # GitHub Actions workflows
│
├── outputs/                # Generated visualizations and results
│
├── 01.py                   # Data Manipulation with Pandas
├── 02.py                   # String Manipulation and Regular Expressions
├── 03.py                   # Time Series Analysis
├── 04.py                   # Descriptive Statistics
├── 05.py                   # Cross-Validation Techniques
├── 06.py                   # Probability Distributions
├── 07.py                   # Hypothesis Testing
├── 08.py                   # Analysis of Variance (ANOVA)
├── 09.py                   # Correlation and Regression
├── 10.py                   # Principal Component Analysis (PCA)
├── 11.py                   # Linear Discriminant Analysis (LDA)
├── 12.py                   # Linear Regression with Sklearn
│
├── requirements.txt        # Python dependencies
└── README.md              # Project documentation

🛠️ Technologies Used

  • Python 3.x
  • NumPy - Numerical computing
  • Pandas - Data manipulation and analysis
  • Matplotlib - Data visualization
  • Seaborn - Statistical data visualization
  • SciPy - Scientific computing
  • Scikit-learn - Machine learning algorithms
  • Statsmodels - Statistical modeling

📦 Installation

  1. Clone the repository:
git clone https://github.com/toxicbishop/StatsPrograms.git
cd StatsPrograms
  2. Install required dependencies:
pip install -r requirements.txt

🚀 Usage

Run any program individually:

python 01.py
python 02.py
# ... and so on

📊 Outputs

Program outputs (including visualizations) are saved in the outputs/ directory.

📝 License

This project is free to use; no specific license is attached.

👤 Author

ToxicBishop


This repository serves as a learning resource for statistical analysis and data science with Python.
