toppare/data_science_resources

Data Science Resources

Useful data science resources, mostly articles.

What Data Science is About

Data Science for Startups: Introduction (links to all articles in the series)

Data science project flow for startups

Required Skills

Everything You REALLY Need to Know to Become a Data Scientist

One Analyst’s Guide for going from Good to Great

Data Visualization

The Art of Effective Visualization of Multi-dimensional Data

Effective Visualization of Multi-Dimensional Data — A Hands-on Approach

Probability & Statistics

Seeing Theory: visualizations of statistical concepts

Playlist by Brandon Foltz: covers almost everything

Data Wrangling

Pandas DataFrame indexing

Visualizing Pandas' Pivoting and Reshaping Functions

Methods for handling missing values: a good roundup of the methods in one place

How to Handle Missing Data

Handling Missing Values

Best Practices with Pandas

Data Discovery

Spotify - Lexikon

LinkedIn - DataHub

Lyft - Amundsen

Netflix - Metacat

Programming

The Mistakes I Made As a Beginner Programmer

Data scientists, the only useful code is production code

Git and GitHub Tutorial

Organization & Roles

Data Engineer vs Data Scientist

Data scientist archetypes

The death of data scientist

Analytics Organization

Interview Preparation

109 Commonly Asked Data Science Interview Questions

Interview questions - Quora

Real-life questions

40 Questions on Probability for data science

Data Science Question Answer

41 Essential Machine Learning Interview Questions

SQL / Databases

PostgreSQL vs. pandas — how to balance tasks between server and client side

Amazon Redshift - Fundamentals

A/B Testing

exp-platform: probably the best resource on the topic. Also read every Quora answer by Ronny Kohavi.

Online Controlled Experiments by Ronny Kohavi

What is a one tailed vs. a two tailed test?

Machine Learning

Rules of ML

ML Glossary

A Few Useful Things to Know about Machine Learning

How to handle Imbalanced Classification Problems in machine learning?

Kaggle Machine Learning Tutorial

Algorithms

How to Use t-SNE Effectively

Boosting vs Bagging: Boosting & Bagging

Clustering Algorithms with Animation

Feature Engineering

Understanding Feature Engineering (Part 1) — Continuous Numeric Data

Understanding Feature Engineering (Part 2) — Categorical Data

Understanding Feature Engineering (Part 3) — Traditional Methods for Text Data

Feature Engineering: Data scientist's Secret Sauce!

Dimensionality Reduction Animation

Interpretability

Explainable Artificial Intelligence Part-1

Explainable Artificial Intelligence Part-2

Metrics

ROC Curve

Model Selection

How to choose a machine learning model: a five-part post by Brandon Rohrer. Part 2, Part 3, Part 4, Part 5

Other Resources

Data Elixir Newsletter

Data Machina Newsletter

Interesting Libraries & Repositories

ppscore: Predictive Power Score (PPS) in Python

auto-sklearn: an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator

Prophet: Time Series Forecasting

Python Data Science Handbook

Featuretools: a framework for automated feature engineering. It excels at transforming transactional and relational datasets into feature matrices for machine learning.

ctparse: Parse natural language time expressions

ExpAn: statistical analysis of A/B tests by Zalando

physt: P(i/y)thon h(i/y)stograms

firefly: function as a service, minimal library to deploy ML as a RESTful API

Skater: Python Library for Model Interpretation/Explanations

textgenrnn: Train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

LIME: Explaining the predictions of any machine learning classifier

skopt (scikit-optimize): library for optimization, for example hyperparameter optimization

pyod: A Python Toolkit for Scalable Outlier Detection (Anomaly Detection)

Springboard Related

Resources to share with Springboard students

Finding Datasets

Google Dataset Search

Quandl: Financial, Economic and Alternative data.

data.world

/r/datasets

Kaggle

OpenML

Figure Eight

Another GitHub repo

A list gathered by gengo.ai

Past Projects (good ones)

Cancer classification

Movie recommender

Kaggle Airbnb competition

Early Corn Yields Prediction Using Satellite Images

Predicting Building Permit Issuance Times

Predicting Player Rating by Player Position for European Football

Predicting Opioid Overdose Mortality Rate

Topic-Modeling with /r/PersonalFinance

Toxic Comment Classification
