- dependencies:
- jupyter==1.0.0
- matplotlib==2.0.2
- numexpr==2.6.3
- numpy==1.13.1
- pandas==0.20.3
- Pillow==4.2.1
- protobuf==3.4.0
- psutil==5.3.1
- scikit-learn==0.19.0
- scipy==0.19.1
- sympy==1.1.1
- tensorflow==1.3.0
- official book:
Hands-On Machine Learning with Scikit-Learn & TensorFlow
- official guidance in Jupyter notebooks:
Linear and Polynomial Regression, Logistic Regression, k-Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forests, and ensemble methods;
feedforward neural nets, convolutional nets, recurrent nets, long short-term memory (LSTM) nets, and autoencoders.
- Joel Grus, Data Science from Scratch (O’Reilly). This book presents the fundamentals of Machine Learning, and implements some of the main algorithms in pure Python (from scratch, as the name suggests).
- Stephen Marsland, Machine Learning: An Algorithmic Perspective (Chapman and Hall). This book is a great introduction to Machine Learning, covering a wide range of topics in depth, with code examples in Python (also from scratch, but using NumPy).
- Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd Edition (Pearson). This is a great (and huge) book covering an incredible amount of topics, including Machine Learning. It helps put ML into perspective.
- Supervised learning
- k-nearest neighbors
- linear regression
- logistic regression
- support vector machines
- decision trees and random forests
- neural networks
- Unsupervised learning
- Clustering
- k-means
- Hierarchical cluster analysis
- expectation maximization
- Visualization and dimensionality reduction
- principal component analysis
- kernel PCA
- locally-linear embedding
- t-distributed stochastic neighbor embedding (t-SNE)
- Association rule learning
- Apriori
- Eclat
- Semisupervised learning
- deep belief networks (DBNs)
- Reinforcement learning
- insufficient quantity of training data
- nonrepresentative training data
- poor-quality data → pre-processing
- irrelevant features → feature engineering
- overfitting the training data → regularization, tuned via hyperparameters
- underfitting the training data
- split the data: 80% training set, 20% test set (the test set estimates the generalization error)
- cross-validation (a sketch follows this list)
- no free lunch theorem: if you make absolutely no assumptions about the data, then there is no reason to prefer one model over any other.
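A minimal sketch of k-fold cross-validation with model_selection:cross_val_score (the iris dataset and logistic regression model are stand-ins, not from these notes):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)                  # stand-in dataset
model = LogisticRegression()                       # stand-in model
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(scores.mean(), scores.std())                 # average accuracy across 5 folds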
- consistency: all objects share a consistent and simple interface (see the sketch after this list):
  - estimators: .fit() estimates some parameters based on a dataset
  - transformers: .transform() transforms a dataset; .fit_transform() fits and then transforms in one call
  - predictors: .predict() makes predictions given a dataset; .score() measures the quality of the predictions given a test set
- inspection: the hyperparameters of an estimator are accessible via public attributes
- composition: existing building blocks are reused as much as possible
- sensible defaults: scikit-learn provides reasonable default values for most parameters, making it easy to create a baseline working system quickly
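A minimal sketch of this shared interface (toy data; the transformer and predictor choices are illustrative):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

scaler = StandardScaler()              # transformer
X_scaled = scaler.fit_transform(X)     # fit() then transform() in one call
print(scaler.mean_)                    # inspection: learned parameters end in "_"

clf = LogisticRegression()             # predictor (every predictor is an estimator)
clf.fit(X_scaled, y)                   # estimators: fit() learns from the dataset
print(clf.predict(X_scaled))           # predictors: predict() on a dataset
print(clf.score(X_scaled, y))          # score(): mean accuracy here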
- to write a custom transformer: create a class and implement three methods: fit(), transform(), and fit_transform()
  - add base:BaseEstimator as a base class (provides get_params() and set_params())
  - get fit_transform() for free by adding base:TransformerMixin as a base class; see the sketch below
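A minimal sketch of such a custom transformer (the class name and the ratio feature are made up for illustration):

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class RatioAdder(BaseEstimator, TransformerMixin):
    """Appends the ratio of the first two columns as a new feature."""
    def __init__(self, add_ratio=True):   # plain keyword args so BaseEstimator
        self.add_ratio = add_ratio        # can expose them via get_params()

    def fit(self, X, y=None):
        return self                       # nothing to learn

    def transform(self, X, y=None):
        if self.add_ratio:
            return np.c_[X, X[:, 0] / X[:, 1]]
        return X

X = np.array([[10.0, 2.0], [6.0, 3.0]])
X_extra = RatioAdder().fit_transform(X)   # fit_transform() comes from TransformerMixin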
- to help with sequences of transformations: pipeline:Pipeline (a runnable example follows the template below)
from sklearn.pipeline import Pipeline

# each step is a (name, transformer) pair; all but the last step must be transformers
num_pipeline = Pipeline([
    ('trans1', <transformer1>),
    ('trans2', <transformer2>),
    ...
])
new_data = num_pipeline.fit_transform(old_data)
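A concrete, runnable version of the template above (the imputer/scaler choice is illustrative; Imputer is the sklearn 0.19-era class, renamed SimpleImputer in sklearn.impute from 0.20 on):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer, StandardScaler

num_pipeline = Pipeline([
    ('imputer', Imputer(strategy="median")),   # fill missing values
    ('std_scaler', StandardScaler()),          # zero mean, unit variance
])

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
X_prepared = num_pipeline.fit_transform(X)     # runs both steps in order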
- to run several transformations in parallel and concatenate their outputs: pipeline:FeatureUnion (a runnable example follows)
from sklearn.pipeline import FeatureUnion

pipe1 = ...
pipe2 = ...
full_pipeline = FeatureUnion(transformer_list=[
    ('pipe1', pipe1),
    ('pipe2', pipe2),
])
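A runnable example under the same sklearn 0.19-era API (the pipeline contents are illustrative):

import numpy as np
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import Imputer, StandardScaler, MinMaxScaler

std_pipeline = Pipeline([
    ('imputer', Imputer(strategy="median")),
    ('std_scaler', StandardScaler()),
])
minmax_pipeline = Pipeline([
    ('imputer', Imputer(strategy="median")),
    ('minmax', MinMaxScaler()),
])

full_pipeline = FeatureUnion(transformer_list=[
    ('std', std_pipeline),
    ('minmax', minmax_pipeline),
])

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, 6.0]])
X_prepared = full_pipeline.fit_transform(X)   # shape (3, 4): the two outputs side by side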
- to save a trained model for later use: externals:joblib
from sklearn.externals import joblib

joblib.dump(my_model, 'my_model.pkl')           # serialize the fitted model to disk
my_model_loaded = joblib.load('my_model.pkl')   # restore it later
- for more control over the cross-validation process: model_selection:StratifiedKFold, base:clone
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

skfolds = StratifiedKFold(n_splits=3, random_state=42)
for train_idx, test_idx in skfolds.split(X_train, y_train):
    clone_model = clone(ML_model)              # fresh, unfitted copy for each fold
    X_train_folds = X_train[train_idx]
    y_train_folds = y_train[train_idx]
    X_test_fold = X_train[test_idx]
    y_test_fold = y_train[test_idx]
    clone_model.fit(X_train_folds, y_train_folds)
    y_pred = clone_model.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))             # accuracy on this fold