Useful data science resources, mostly articles.
Data Science for Startups: Introduction (links to all articles in the series)
Data science project flow for startups
Everything You REALLY Need to Know to Become a Data Scientist
One Analyst’s Guide for going from Good to Great
The Art of Effective Visualization of Multi-dimensional Data
Effective Visualization of Multi-Dimensional Data — A Hands-on Approach
Seeing Theory: visualization of statistical concepts
- basic probability
- compound probability
- probability distributions
- frequentist inference
- Bayesian inference
- regression analysis
Statistics playlist by Brandon Foltz: covers almost everything
Visualizing Pandas' Pivoting and Reshaping Functions
Methods for handling missing values: a good roundup of the methods in one place
The Mistakes I Made As a Beginner Programmer
Data scientists, the only useful code is production code
Data Engineer vs Data Scientist
109 Commonly Asked Data Science Interview Questions
40 Questions on Probability for data science
41 Essential Machine Learning Interview Questions
PostgreSQL vs. pandas — how to balance tasks between server and client side
Amazon Redshift - Fundamentals
exp-platform: probably the best resource on online controlled experiments. Also read every Quora answer by Ronny Kohavi.
Online Controlled Experiments by Ronny Kohavi
What is a one tailed vs. a two tailed test?
A Few Useful Things to Know about Machine Learning
How to handle Imbalanced Classification Problems in machine learning?
Kaggle Machine Learning Tutorial
Boosting vs Bagging: Boosting & Bagging
Clustering Algorithms with Animation
Understanding Feature Engineering (Part 1) — Continuous Numeric Data
Understanding Feature Engineering (Part 2) — Categorical Data
Understanding Feature Engineering (Part 3) — Traditional Methods for Text Data
Feature Engineering: Data scientist's Secret Sauce !
Dimensionality Reduction Animation
Explainable Artificial Intelligence Part-1
Explainable Artificial Intelligence Part-2
How to choose a machine learning model: five-part post by Brandon Rohrer. part 2, part 3, part 4, part 5
ppscore: Predictive Power Score (PPS) in Python
auto-sklearn: auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator
Prophet: Time Series Forecasting
Featuretools: a framework to perform automated feature engineering. It excels at transforming transactional and relational datasets into feature matrices for machine learning.
ctparse: Parse natural language time expressions
ExpAn: statistical analysis of A/B tests by Zalando
physt: P(i/y)thon h(i/y)stograms
firefly: function as a service; a minimal library to deploy ML as a RESTful API
Skater: Python Library for Model Interpretation/Explanations
textgenrnn: Train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
LIME: Explaining the predictions of any machine learning classifier
skopt (scikit-optimize): library for optimization, for example hyperparameter optimization
pyod: A Python Toolkit for Scalable Outlier Detection (Anomaly Detection)
Resources to share with Springboard students
Quandl: Financial, Economic and Alternative data.
Early Corn Yields Prediction Using Satellite Images
Predicting Building Permit Issuance Times
Predicting Player Rating by Player Position for European Football
Predicting Opioid Overdose Mortality Rate