Resources for software/backend/data learning | #SE | #DE | #DS

yennanliu/knowledge_base_repo

Knowledge_base_repo

Repo for software engineering learning

Topics

  • Programming: Algorithms, Data Structures, CS basics (Python/Java/Shell/R)
  • DS basics: Statistics, ML, DL, DB
  • DE basics: System design, cloud services, ETL, scalability
  • DS/DE env: Dockerfile
  • Case study

Contents

| Reference | Comments |
| --- | --- |
| Programming | Python / R / Shell & Linux / Javascript / Java |
| Machine Learning | ML / DL |
| Statistics | Statistics Learning / Probability |
| Database | RDBMS / NoSQL / data warehouse |
| ETL | Airflow |
| AWS | Intro AWS EB/RDS/EC2/Lambda/Kinesis |
| GCP | Intro to GCP |
| Docker | Docker intro / Docker playground (Play with Docker) |
| System Design | system design note |
| Spark | Spark basics/pipeline/ML intro |
| Scalable | Scalable note |
| Streaming | Streaming intro |
| Case-Study | various DS/DE case sharing |
| Competition | DS Competitions |
| Other Resources | some other information |
| Dockerfile | DS/DE Dockerfile |

Programming

Computer Science

Python

Javascript

Java

Scala

Shell & Linux

R

Design pattern


ML


DL


Statistics


Database

  • Launch School Postgres course - Database Design and Performance
  • Database Structure and Design Tutorial - database-design
  • PostgreSQL Intro - PostgreSQL Tutorial
  • Dataquest DE - data-engineer Tutorial
  • pgModeler - UI for PostgreSQL data modeling (generates DDL)
  • HBase - NoSQL DB for Hadoop
  • Hive - data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
  • Hadoop - framework that allows for the distributed processing of large data sets across clusters of computers
  • Redshift table design - AWS Tutorial: Tuning Table Design (starts from a Star Schema Benchmark (SSB) schema without sort keys, distribution styles, or compression encodings)
  • Redshift best practices - AWS Tutorial: summarizes the most important design decisions and presents best practices for optimizing query performance. Designing Tables provides more detailed explanations and examples of table design options.
  • Postgres window functions - walks through operations such as per-group min/max, rank/dense_rank, and lag/lead using Postgres window functions
  • Database Intro (Stanford) - Introduction to Databases - Self Paced Series
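
The window-function operations mentioned in the list above (per-group max, rank/dense_rank, lag/lead) can be sketched as follows. This is a minimal illustration, not taken from the linked tutorial: it assumes Python's built-in `sqlite3` module as a stand-in for PostgreSQL (window functions need SQLite 3.25 or newer), and the `salaries` table, its columns, and the sample rows are hypothetical. The `OVER (PARTITION BY ... ORDER BY ...)` syntax shown here is also accepted by PostgreSQL.

```python
import sqlite3

# Hypothetical sample data: (department, employee, salary).
ROWS = [
    ("eng", "ann", 120),
    ("eng", "bob", 100),
    ("eng", "cat", 100),
    ("eng", "dov", 90),
    ("ops", "dan", 90),
    ("ops", "eve", 80),
]

# Each window is scoped to one department (PARTITION BY dept).
# "name" is added to the LAG/LEAD ordering only to break salary ties
# deterministically.
QUERY = """
SELECT
    dept,
    name,
    salary,
    MAX(salary)  OVER (PARTITION BY dept) AS dept_max,
    RANK()       OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk,
    DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dense_rnk,
    LAG(salary)  OVER (PARTITION BY dept ORDER BY salary DESC, name) AS prev_salary,
    LEAD(salary) OVER (PARTITION BY dept ORDER BY salary DESC, name) AS next_salary
FROM salaries
ORDER BY dept, salary DESC, name
"""


def run_window_query():
    """Load the sample rows into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE salaries (dept TEXT, name TEXT, salary INTEGER)")
        conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_window_query():
        print(row)
```

In this sample, the tie between `bob` and `cat` is what separates the two ranking functions: `RANK()` leaves a gap after a tie, while `DENSE_RANK()` does not.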

ETL


AWS


GCP


Docker


System-Design


Spark


Scalable


Streaming


Case-Study


Competition

  • Kaggle - the world's biggest data science competition platform
  • Quantopian - Quant competition platform for developers
  • numer.ai - another data science competition platform

Other-Resources

  • Towards Data Science - DS blog for sharing concepts, ideas, and code
  • Thepudding - The Pudding is a digital publication that explains ideas debated in culture with visual essays. (DS/data visualization/insights)

Dockerfile (deprecated)