- Programming : Algorithms, Data Structures, CS basics (Python/Java/Shell/R)
- DS basics : Statistics, ML, DL, DB
- DE basics : system design, cloud services, ETL, Scalability
- DS/DE env : Dockerfile
- Case study
Reference | Comments |
---|---|
Programming | Python / R / Shell & Linux / JavaScript / Java |
Machine Learning | ML / DL |
Statistics | Statistical Learning / Probability |
Database | RDBMS / NoSQL / data warehouse |
ETL | Airflow |
AWS | Intro AWS EB/RDS/EC2/Lambda/Kinesis |
GCP | Intro to GCP |
Docker | Docker intro / Docker playground (Play with Docker) |
System Design | system design note |
Spark | Spark basics/pipeline/ML intro |
Scalable | Scalable note |
Streaming | Streaming intro |
Case-Study | various DS/DE case sharing |
Competition | DS Competitions |
Other Resources | some other information |
Dockerfile | DS/DE Dockerfile |
- CS 50 - CS 50 course
- Intro to CS - OSSU CS course
- Introduction to Computer Networking (Stanford) - Intro to Networking in CS
- Algorithm note - NTNU's Algorithm note
- CS basics - My repo for Algorithm/Data structure learning (and leetcode)
- Leetcode - My Leetcode solutions
- Python official intro - official python tutorial
- Python central tutorial - pythoncentral tutorial
- Python udacity - Udacity python 101 course
- Python codecademy - codecademy python course
- 101 NumPy Exercises - a good set of exercises to test your NumPy skills
- Getting Started With Testing in Python - learn how to write tests in Python
- Learn functional Python in 10 minutes - learn functional programming via Python
- Python exceptions - learn how to handle exceptions in python
- Software Testing -- Python
- Software Debugging -- Python
- Intro to Javascript - w3schools JS tutorial
- Codecademy Javascript - Codecademy JS tutorial
- Intro to Node.js - Intro to Node.js (server-side JavaScript runtime)
- jQuery vs JavaScript - comparison between using jQuery and plain JavaScript
- Codecademy Java - Codecademy Java tutorial
- Intro to Java - tutorialspoint Java tutorial
- Scala tutorialspoint - tutorialspoint Scala tutorial
- Scala school (Twitter) - Twitter Scala tutorial
- Intro to Scala - Online Scala course
- Bash scripting cheatsheet - bash/shell scripting cheatsheet
- Intro to Linux/Unix - tutorialspoint Linux/Unix tutorial
- Command line Challenge - test your CLI (bash/shell) skills
- R intro - Datacamp R course
- https://www.youtube.com/watch?v=eiDyK_ofPPM&list=PLC0nd42SBTaNuP4iB4L6SJlMaHE71FG6N
- https://github.com/yennanliu/java-design-patterns
- https://github.com/yennanliu/design-patterns-java
- CS 229 Machine Learning - Andrew Ng's ML course
- CS 229 Machine Learning nb - Andrew Ng's ML exercises in ipython notebook
- Scikit-learn intro - Popular ML API video
- Google ML intro - Google's fast-paced, practical introduction to machine learning with TensorFlow APIs
- ML FAQ - 40 Interview Questions asked at Startups in Machine Learning / Data Science
- Almost All ML Problem Approach - Kaggle post about practical ML problem approaches
- Deep Learning Specialization - Andrew Ng's DL course series
- Deep Learning nb - reference notebooks for the assignments in Andrew Ng's DL courses
- Udemy lazy-programmer - Deep learning, AI course sets
- MIT 6.S094: Deep Learning - MIT DL course for self-driving cars
- Deep Learning Online Book - An MIT Press book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- Fast.ai - Awesome DL Tutorial
- neuralnetworksanddeeplearning.com - DL online book in math/physics style
- Siraj Raval sets - My favorite DL/ML youtuber
- Morvan-Python - Another awesome DL/ML tutorial in Mandarin
- Kaggle-Learn - Hands-On Data Science Education on Kaggle
- CS4705 NLP - Michael Collins's NLP course
- Deep Dive into Math Behind Deep Networks - mathematical explanation of DL
- Dive into Deep Learning - An interactive deep learning book for students, engineers, and researchers.
- Dive into Deep Learning (Zh version) - Dive into Deep Learning in ZH.
- seedbank (Google Research) - collection of interactive machine learning examples that can run on Google Colab.
- Full Stack Deep Learning - Spring 2019 Full Stack Deep Learning Bootcamp.
- CS231n - Convolutional Neural Networks for Visual Recognition
- Intro to Statistics - tutorialspoint statistics tutorial
- intro2stats - Introduction to Statistics using Python
- Stats in python - statistical analysis python tutorial
- Statistical Rethinking - Statistical Rethinking with brms, ggplot2
- Statistical Rethinking Repo - GitHub repo for Statistical Rethinking
- Intermediate Statistics (Lecture 13 : Bootstrap) - CMU 36-705 Intermediate Statistics
- A/B Testing: The Definitive Guide to improve your prod - Dataquest's A/B test intro
- A/B Testing Udacity - Udacity A/B test course
- Customer Analytics & A/B Testing in Python - customer analytics with Python
- Awesome AB Test Tool - Intuitive statistical calculators, ideal for planning and analyzing A/B tests
- Launch School Postgres course - database design and performance
- Database Structure and Design Tutorial - database-design
- PostgreSQL Intro - PostgreSQL Tutorial
- Dataquest DE - data-engineer Tutorial
- pgmodeler - UI for postgresql data modeling (generate DDL)
- HBase - NoSQL DB for Hadoop
- Hive - data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
- Hadoop - framework that allows for the distributed processing of large data sets across clusters of computers
- Redshift table design - AWS Tutorial: Tuning Table Design (Star Schema Benchmark (SSB) schema without sort keys, distribution styles, and compression encodings)
- Redshift best practices - AWS Tutorial: summarizes the most important design decisions and presents best practices for optimizing query performance. Designing Tables provides more detailed explanations and examples of table design options.
- Postgres window functions - go through operations like min/max group value, rank/denseRank, and lag/lead with Postgres window functions
- Database Intro (Stanford) - Introduction to Databases - Self Paced Series
- Airflow doc. - Airflow Documentation
- Astronomer doc. - Astronomer airflow Documentation
- Astronomer CLI - Astronomer command line repo
- awesome-apache-airflow - Curated list of resources about Apache Airflow
- How to deploy a Flask app via AWS EB/RDS/EC2 - blog post on deploying a Flask web app to AWS
- AWS Lambda tutorial - tutorialspoint AWS Lambda tutorial
- AWS Kinesis tutorial - create a real-time processing system, via AWS Kinesis
- AWS Kinesis tutorial (zh) - another AWS Kinesis tutorial in ZH
- GCP tutorial - google official GCP tutorial
- PLAY WITH DOCKER - a free online environment to build and test Docker
- How Docker Help You Be Effective Data Scientist - intro to how Docker can help DS projects
- docker-workshop - Set up DB Docker
- katacoda - Learn new technologies (docker, k8s...) using real environments right in your browser
- System design primer - Awesome repo for System design learning
- CS 245 - Principles of Data-Intensive Systems - Stanford CS 245 course
- A Beginner’s Guide to Data Engineering — Part I - DE intro part I from Airbnb's data scientist
- A Beginner’s Guide to Data Engineering — Part II - DE intro part II from Airbnb's data scientist
- A Beginner’s Guide to Data Engineering — Part III - DE intro part III from Airbnb's data scientist
- CS 246 : Spark Tutorial
- Databricks Getting Started Guide - Databricks documentation (Spark)
- Hadoop ecosystem intro - Hadoop, Hbase, Spark, Spark-submit, Hive intro 30 days series
- Spark tutorial - Spark Tutorial – Learn Spark Programming
- System design questions - Set of interview system design questions
- highscalability.com - Website about scalable system design practices
- Mining of Massive Datasets - CS246: Mining Massive Datasets (and CS345A: Data Mining)
- CS 246 video - CS246: Mining Massive Datasets
- CS 345 A - CS345A Winter 2009: Data Mining
- CS 246H - Labs of CS 246
- Real-time syslog Processing - Real-time syslog Processing with Apache Kafka and KSQL
- Real-time Stock Data Pipeline - real-time pipeline with python, kafka, Zookeeper, Cassandra, Spark..
- Real-time streaming Pipeline with twitter data and GCP - Realtime Streaming Pipeline using GCP and Bokeh
- Kafka to transform a batch pipeline into a real-time one - using Apache Kafka with Java
- Analytics Pipeline @ Lyst - Lyst data science infra
- Build a Real-time Stream Processing Pipeline with Apache Flink on AWS - AWS pipeline
- Netflix real time pipeline - Migrating Batch ETL to Stream Processing: A Netflix Case Study with Kafka and Flink
- Evolution of Netflix data pipeline - evolution-of-the-netflix-data-pipeline
- Airstream at airbnb - spark-streaming-at-airbnb
- Architecture Of Giants Data Stacks - Architecture (data) @ Netflix/Airbnb/Twitter..
- The Waiting Time Paradox, or, Why Is My Bus Always Late?
- Netflix billing migration to AWS part - 1
- Netflix billing migration to AWS part - 2
- Netflix billing migration to AWS part - 3
- Kaggle - world's biggest data science competition platform
- Quantopian - Quant competition platform for developers
- numer.ai - another data science competition platform
- Towards Data Science - DS blog for sharing concepts, ideas, and code
- Thepudding - The Pudding is a digital publication that explains ideas debated in culture with visual essays. (DS/data visualization/insights)