A comprehensive collection of Python programs demonstrating statistical analysis, data manipulation, and machine learning techniques. This repository contains 12 programs covering various statistical concepts and implementations.
- DataFrame merging and joining operations
- Pivot tables and data reshaping
- DataFrame comparison techniques
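
As a rough illustration of the three operations listed above, the sketch below merges, pivots, and compares small DataFrames. All table and column names (`employees`, `departments`, `sales`, and so on) are invented for this example and are not taken from `01.py`.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
employees = pd.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "name": ["Asha", "Ben", "Chen", "Dina"],
    "dept_id": [10, 20, 10, 30],
})
departments = pd.DataFrame({
    "dept_id": [10, 20],
    "dept_name": ["Sales", "Research"],
})

# Left join keeps every employee; a dept_id with no match gets NaN.
merged = employees.merge(departments, on="dept_id", how="left")

sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 90, 110],
})

# Reshape long data into a region x quarter grid of summed revenue.
pivot = sales.pivot_table(index="region", columns="quarter",
                          values="revenue", aggfunc="sum")

# Cell-by-cell comparison of two frames that share the same labels.
before = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
after = pd.DataFrame({"a": [1, 9, 3], "b": [4, 5, 6]})
diff = before.compare(after)  # only rows and columns that differ

print(merged, pivot, diff, sep="\n\n")
```

`DataFrame.compare` requires identically labeled frames; for frames with different shapes, a merge with `indicator=True` is a common alternative.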
- String methods (upper, title, strip, etc.)
- String splitting and joining
- Regular expression pattern matching
- Email and phone number extraction and substitution
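
The string and regex techniques above might look like the following minimal sketch. The sample text is made up, and the two patterns are deliberately simple assumptions: they match common shapes of addresses and numbers, not every valid one.

```python
import re

# Basic string methods, splitting, and joining.
cleaned = "  data science  ".strip().title()
slug = "-".join("stats with python".split())

# Hypothetical text; the addresses and numbers are invented.
text = ("Contact alice@example.com or bob.smith@mail.example.org, "
        "phone 555-123-4567 or 555-987-6543.")

# Simplified patterns, assumed for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Extraction: collect every match in order of appearance.
emails = EMAIL_RE.findall(text)
phones = PHONE_RE.findall(text)

# Substitution: replace each match with a placeholder.
redacted = PHONE_RE.sub("[PHONE]", EMAIL_RE.sub("[EMAIL]", text))

print(cleaned, slug, emails, phones, redacted, sep="\n")
```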
- Time series data generation
- Daily mean calculations
- ARIMA model implementation
- Forecasting and visualization
- Mean, median, and mode calculations
- Frequency distribution handling
- Statistical analysis with NumPy and SciPy
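
One way to compute the three measures from a frequency distribution is sketched below. The values and frequencies are invented, and the example uses NumPy alone rather than `scipy.stats.mode`, whose return shape has changed between SciPy releases.

```python
import numpy as np

# Hypothetical frequency table: each value occurs freqs[i] times.
values = np.array([1, 2, 3, 4, 5])
freqs = np.array([2, 5, 8, 3, 2])

# Weighted mean: sum(value * frequency) / sum(frequency).
mean = np.average(values, weights=freqs)

# Expand the table into raw observations to take the median.
expanded = np.repeat(values, freqs)
median = np.median(expanded)

# The mode is the value with the highest frequency
# (the first one, if several values tie).
mode = values[np.argmax(freqs)]

print(f"mean={mean:.2f}, median={median}, mode={mode}")
```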
- K-Fold cross-validation
- Leave-One-Out cross-validation
- Model evaluation with R-squared metrics
- Regression model validation
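
A minimal sketch of both cross-validation schemes on a regression model follows. The data is synthetic and the choice of `LinearRegression` is an assumption, not necessarily what `05.py` uses. Note that R-squared is undefined on a single held-out sample, so the leave-one-out run scores with squared error instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

# Synthetic data (assumed): y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=40)

model = LinearRegression()

# K-Fold: 5 shuffled folds, one R-squared score per fold.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
r2_scores = cross_val_score(model, X, y, cv=kfold, scoring="r2")

# Leave-One-Out: one fit per sample, scored by (negated) squared error.
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut(),
                             scoring="neg_mean_squared_error")
loo_mse = -loo_scores.mean()

print(f"K-Fold mean R^2: {r2_scores.mean():.3f}")
print(f"LOO mean squared error: {loo_mse:.3f}")
```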
- Normal distribution
- Binomial distribution
- Poisson distribution
- Bernoulli distribution
- Distribution visualization with Matplotlib
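
The four distributions can be evaluated with `scipy.stats` roughly as shown below. The parameter values are arbitrary choices for illustration, and plotting is left out to keep the sketch short.

```python
from scipy import stats

# Normal: density of the standard normal at its mean, 1 / sqrt(2 * pi).
normal_pdf_at_mean = stats.norm.pdf(0.0, loc=0.0, scale=1.0)

# Binomial: probability of exactly 5 successes in 10 fair trials.
binom_p5 = stats.binom.pmf(5, n=10, p=0.5)

# Poisson: probability of zero events when the mean rate is 3.
poisson_p0 = stats.poisson.pmf(0, mu=3.0)

# Bernoulli: probability of success in a single trial with p = 0.3.
bernoulli_p1 = stats.bernoulli.pmf(1, p=0.3)

# Random samples, e.g. as input for a histogram.
samples = stats.norm.rvs(loc=0.0, scale=1.0, size=1000, random_state=0)

print(normal_pdf_at_mean, binom_p5, poisson_p0, bernoulli_p1)
```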
- One-sample t-test
- Two-sample t-test
- Statistical significance testing
- Sample mean and standard deviation calculations
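
The sketch below runs both tests with `scipy.stats` and checks the one-sample statistic against the formula t = (mean - mu0) / (s / sqrt(n)). The measurements and the hypothesized mean `mu0` are invented, and the use of Welch's variant for the two-sample test is an assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for two independent groups.
group_a = np.array([5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.2, 5.7])
group_b = np.array([4.4, 4.8, 4.6, 5.0, 4.3, 4.7, 4.9, 4.5])

# Sample mean and sample standard deviation (ddof=1 divides by n - 1).
mean_a = group_a.mean()
std_a = group_a.std(ddof=1)

# One-sample t-test against a hypothesized mean, by hand and via SciPy.
mu0 = 5.0
t_manual = (mean_a - mu0) / (std_a / np.sqrt(len(group_a)))
t_one, p_one = stats.ttest_1samp(group_a, popmean=mu0)

# Two-sample t-test; equal_var=False selects Welch's test,
# which does not assume equal group variances.
t_two, p_two = stats.ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05
print(f"one-sample: t={t_one:.3f}, p={p_one:.4f}")
print(f"two-sample: t={t_two:.3f}, p={p_two:.4f}, "
      f"significant={p_two < alpha}")
```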
- One-way ANOVA
- Two-way ANOVA
- Sum of squares calculations
- Between-group and within-group variance analysis
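
For the one-way case, the sums of squares can be computed by hand and checked against `scipy.stats.f_oneway`, as in this sketch. The three groups are invented data, and two-way ANOVA is not covered here.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups.
groups = [
    np.array([85.0, 86.0, 88.0, 75.0, 78.0]),
    np.array([91.0, 92.0, 93.0, 85.0, 87.0]),
    np.array([79.0, 78.0, 88.0, 94.0, 92.0]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group: spread of the group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_values - grand_mean) ** 2).sum()

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F is the ratio of the two mean squares.
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"F={f_manual:.4f}, p={p_manual:.4f}")
```

The identity `ss_total = ss_between + ss_within` is a useful sanity check on a hand calculation.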
- Pearson correlation coefficient
- Linear regression implementation
- Scatter plots and regression line visualization
- Correlation analysis
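
The link between correlation and simple linear regression can be sketched as follows. The data points are made up, and plotting is omitted; the fitted `slope` and `intercept` are what a regression-line plot would use.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations, roughly y = 2x.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Pearson r by hand: covariance term over the product of spreads.
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_scipy, p_value = stats.pearsonr(x, y)

# Least-squares line; fit.rvalue is the same Pearson r.
fit = stats.linregress(x, y)
predicted = fit.intercept + fit.slope * x

print(f"r={r_scipy:.4f}, slope={fit.slope:.4f}, "
      f"intercept={fit.intercept:.4f}")
```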
- PCA implementation on Wisconsin Breast Cancer dataset
- Dimensionality reduction
- Variance analysis
- PCA visualization
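
A minimal PCA sketch follows, assuming the dataset is loaded from scikit-learn's bundled copy via `load_breast_cancer` and that features are standardized first. Neither detail is confirmed by this README, and the two-component choice is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()

# Standardize so that features on large scales do not dominate.
X_scaled = StandardScaler().fit_transform(data.data)

# Project the 30 original features onto 2 principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Share of total variance captured by each component.
explained = pca.explained_variance_ratio_
print(f"shape: {X_pca.shape}")
print(f"explained variance ratio: {explained}, total {explained.sum():.3f}")
```

`X_pca[:, 0]` and `X_pca[:, 1]`, colored by `data.target`, are the usual inputs for a 2D PCA scatter plot.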
- LDA implementation on Iris dataset
- Classification boundary visualization
- Feature transformation
- Discriminant analysis
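
LDA on Iris can be sketched in a few lines, assuming scikit-learn's `load_iris` and `LinearDiscriminantAnalysis`. With three classes, at most two discriminant components exist. The accuracy here is measured on the training data, so it is optimistic.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = load_iris()

# Supervised projection: uses the class labels, unlike PCA.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(iris.data, iris.target)

# Accuracy on the same data the model was fitted on.
train_accuracy = lda.score(iris.data, iris.target)

print(f"transformed shape: {X_lda.shape}")
print(f"training accuracy: {train_accuracy:.3f}")
```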
- Model training and testing
- Prediction accuracy evaluation
- Mean Squared Error (MSE) calculation
- R-squared score analysis
- Actual vs. Predicted visualization
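
The train, predict, and evaluate workflow above might look like this sketch. The data is synthetic (the dataset used by `12.py` is not stated here), and the 80/20 split is an assumed choice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumed): y = 2.5x + 1 plus Gaussian noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 2.5 * X[:, 0] + 1.0 + rng.normal(0, 1.0, size=100)

# Hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate on the held-out samples only.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"slope={model.coef_[0]:.3f}, intercept={model.intercept_:.3f}")
print(f"MSE={mse:.3f}, R^2={r2:.3f}")
```

Plotting `y_test` against `y_pred` gives the actual vs. predicted view; points near the diagonal indicate a good fit.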
StatsPrograms/
│
├── .github/
│ └── workflows/ # GitHub Actions workflows
│
├── outputs/ # Generated visualizations and results
│
├── 01.py # Data Manipulation with Pandas
├── 02.py # String Manipulation and Regular Expressions
├── 03.py # Time Series Analysis
├── 04.py # Descriptive Statistics
├── 05.py # Cross-Validation Techniques
├── 06.py # Probability Distributions
├── 07.py # Hypothesis Testing
├── 08.py # Analysis of Variance (ANOVA)
├── 09.py # Correlation and Regression
├── 10.py # Principal Component Analysis (PCA)
├── 11.py # Linear Discriminant Analysis (LDA)
├── 12.py # Linear Regression with Sklearn
│
├── requirements.txt # Python dependencies
└── README.md # Project documentation
- Python 3.x
- NumPy - Numerical computing
- Pandas - Data manipulation and analysis
- Matplotlib - Data visualization
- Seaborn - Statistical data visualization
- SciPy - Scientific computing
- Scikit-learn - Machine learning algorithms
- Statsmodels - Statistical modeling
- Clone the repository:
git clone https://github.com/toxicbishop/StatsPrograms.git
cd StatsPrograms
- Install required dependencies:
pip install -r requirements.txt

Run any program individually:
python 01.py
python 02.py
# ... and so on

Program outputs (including visualizations) are saved in the outputs/ directory.
This project is free to use and does not specify a license.
This repository serves as a learning resource for statistical analysis and data science with Python.