yz599/ML-algorithms

This repository collects brief summaries of interesting books and articles, along with my own notes, for developing the skills required to pursue a career in data science.

So, what is a data scientist?

Here are some answers from Kaggle, Wikipedia, and datascience@berkeley (Data Science Life Cycle).

“The ability to take data — to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it — that’s going to be a hugely important skill in the next decades.”
--- Hal Varian, chief economist at Google and UC Berkeley professor of information sciences, business, and economics³

Fig 1. Data science life cycle¹:

The image represents the five stages of the data science life cycle:

  1. Capture
  2. Maintain
  3. Process
  4. Analyze
  5. Communicate

To summarize,

  • you need to be skilled with the tools and techniques used to manage and manipulate data (stages 1 and 2),
  • proficient in a programming language, so that you can efficiently implement machine learning algorithms for tasks such as clustering, classification, and regression (stages 3 and 4),
  • and equipped with domain knowledge to make practical and physical sense of the data product.

Machine learning comes into play at stages 3 and 4, where data is processed and analyzed. If you want to dig into machine learning a bit, the article "Understanding a Machine Learning workflow through food" explains the machine learning workflow through cooking.


And what does it take to become a data scientist?

This blog on datasciencecentral.com gives a great practical introduction to data science, especially the critical and basic skills and knowledge needed.

I also like this pragmatic, visual representation of a curriculum by Swami Chandrasekaran: a learning plan you can follow on the journey to becoming a data scientist.

Fig 2. Becoming a Data Scientist – Curriculum via Metromap

The skill sets required to be a data scientist:

1. Manipulate BIG data

  • Data acquisition
    • web scraping...
    • API...
  • Data ingestion
    • dataset management (SQL ...) ...
  • Data integration
    • serialization
  • Data wrangling and munging
    • working with data on the CPU (NumPy, pandas ...)
    • data preprocessing (cleaning, missing data)
    • visualization
    • feature extraction ...
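
To make a few of the items above concrete, here is a minimal sketch of ingestion, missing-data handling, and serialization. It uses only the Python standard library so that it runs anywhere. The sample data, the column name `height_cm`, and the helper names `load_rows` and `impute_mean` are all hypothetical, chosen purely for illustration. Mean imputation is only one of several possible strategies for missing values.

```python
import csv
import io
import json
import statistics

# Hypothetical raw data: one "height_cm" value is missing on purpose.
RAW_CSV = """name,height_cm
Ann,160
Bob,
Cy,180
"""


def load_rows(text):
    """Ingest CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def impute_mean(rows, column):
    """Convert a column to float and fill empty entries with the column mean."""
    present = [float(r[column]) for r in rows if r[column].strip()]
    mean = statistics.mean(present)
    cleaned = []
    for r in rows:
        value = float(r[column]) if r[column].strip() else mean
        cleaned.append({**r, column: value})
    return cleaned


rows = load_rows(RAW_CSV)
cleaned = impute_mean(rows, "height_cm")

# Serialization: turn the cleaned records into JSON text and read them back.
serialized = json.dumps(cleaned)
restored = json.loads(serialized)
```

With real datasets you would more likely do the same steps with pandas, for example `DataFrame.fillna` for the imputation step. The plain-Python version just makes each step visible.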

🔖Recommended Courses:
Get data ready

Fig 3. Big Data sources and methods for social and economic analyses

Fig 4. Big Data Architecture

2. Machine learning

  • Theory:
    • Linear algebra
    • Statistics
    • Calculus
  • Deep Learning (DL)
    • ANN ...
  • Programming:
    • Packages

3. Deployment

🔖Recommended Courses:

  1. Deep learning
    Interesting intro to DL
    Stanford CS class CS231n:
    Oxford MLSS Chicago
    Columbia

🔖Books

  1. Deep learning
    Dive into deep learning
    Chinese version
    Other Stanford-cs221
    CMU1
    CMU2
    CMU TOM Stanford

  2. Unsupervised learning
    Information Theory, Inference, and Learning Algorithms
    Unsupervised learning

  3. Programming
    Clean code

  4. Mathematics for ML
    STAT 157 UC Berkeley Introduction to Probability and Statistics


You'll find more useful references in the importantURLs note. I'll put my ML study notes here, named as mynote+number+topic.

About

Implementation of ML methods with Python
