UBC Scientific Software Seminar: Machine Learning in Python with scikit-learn


UBC Scientific Software Seminar

Inspired by Software Carpentry, the UBC Scientific Software Seminar aims to help students, graduates, fellows and faculty at UBC develop software skills for science.

Fall 2016: Machine Learning in Python with scikit-learn


  • What are the learning goals?
    • To learn how to use scikit-learn to solve machine learning problems
    • To master Python programming for scientific computing
    • To learn mathematics and statistics applied to data science and machine learning
    • To meet and collaborate with other students and faculty interested in scientific computing
  • What software tools are we going to use?
  • What scientific topics will we study?
  • Where do we start? What are the prerequisites?
    • UBCS3 Fall 2016 is a continuation of UBCS3 Summer 2016, which covered:
      • Bash shell
      • Git/GitHub
      • Python programming
      • SciPy stack: NumPy, SciPy, matplotlib and pandas
      • Basic examples using scikit-learn
    • Calculus, linear algebra, probability and statistics
  • Who is the target audience?
    • Everyone is invited!
    • If the outline above is at your level, perfect! Get ready to write a lot of code!
    • If the outline above seems too intimidating, come anyway! You'll learn things just by being exposed to new tools and ideas, and meeting new people!
    • If you have experience with all the topics outlined above, come anyway! You'll become more of an expert by participating as a helper/instructor!


Fall 2016 will consist of weekly 1-hour meetings held from October until mid-December. The regularly scheduled time is Friday 1-2pm, with an additional session 3-4pm for those who cannot attend at 1-2pm.

  • Week 1 - Friday October 7 - 1-2pm - LSK 121 [Notes]
    • Overview of machine learning problems
    • Exploring the scikit-learn documentation
    • Getting to know the scikit-learn API
    • First examples with built-in example datasets
  • Week 2 - Friday October 14 - 1-2pm - LSK 121 [Notes]
    • Regression Example: Diabetes dataset
      • A closer look at least squares linear regression calculations
      • Can we improve R²? Let's create more features
      • Splitting the dataset: Training data and testing data
    • Classification Example: Hand-written digits dataset
      • K-nearest neighbors classifier
      • Evaluating the model
  • Week 3 - Friday October 21 - 1-2pm - LSK 121 [Notes]
    • Dimensionality reduction
    • Principal component analysis
    • Visualizing the digits dataset
    • Linear algebra behind principal component analysis
  • Week 4 - Friday October 28 - 1-2pm - LSK 121 [Notes]
    • PCA revisited
      • Visualizing principal components
    • Unsupervised learning
      • Clustering with K-means
      • Digits dataset: How many different kinds of 1s are there?
      • Combining KMeans with PCA
  • Week 5 - Friday November 4 - 1-2pm - LSK 121 [Notes]
    • Kernel density estimation and Gaussian processes - Presented by @sempwn
  • Remembrance Day - No meeting November 11
  • Week 6 - Friday November 18 - 1-2pm - UCLL 109
    • Natural Language Processing with nltk: Movie Review Classification - Presented by @dbhaskar92
  • Week 7 - Friday November 25 - 1-2pm - UCLL 109 [Notes]
    • Natural Language Processing with nltk: Movie Review Classification (Continued)
      • Working with nltk movie review dataset
      • Using regular expressions to remove punctuation and stopwords
      • Creating feature vectors from movie reviews
      • Applying a Naive Bayes classifier
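
The Week 7 steps (cleaning text with regular expressions, building word-count features, and applying a Naive Bayes classifier) can be illustrated with a minimal sketch. This is not the code from the seminar notes: it is a from-scratch multinomial Naive Bayes written with only the Python standard library rather than nltk's classifier, and the toy reviews, the stopword list, and the function names `tokenize`, `train` and `predict` are all hypothetical stand-ins chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the nltk movie review dataset.
REVIEWS = [
    ("A wonderful, moving film with great acting!", "pos"),
    ("Great story and a wonderful cast.", "pos"),
    ("I loved the acting; great fun.", "pos"),
    ("A boring, terrible film with awful acting.", "neg"),
    ("Terrible story and a boring cast.", "neg"),
    ("I hated the awful, boring plot.", "neg"),
]

# Small illustrative stopword list (an assumption, not nltk's list).
STOPWORDS = {"a", "an", "and", "the", "with", "i", "of", "is", "was"}


def tokenize(text):
    """Lowercase, replace punctuation with spaces, and drop stopwords."""
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    return [word for word in cleaned.split() if word not in STOPWORDS]


def train(labeled_reviews):
    """Collect per-class word counts and document counts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labeled_reviews:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log posterior score."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: fraction of training reviews with this label.
        log_prob = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word]
            log_prob += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = log_prob
    return max(scores, key=scores.get)


model = train(REVIEWS)
prediction_pos = predict("What a wonderful film, great cast!", model)
prediction_neg = predict("Such a boring and terrible plot.", model)
print(prediction_pos, prediction_neg)
```

Working in log space keeps the product of many small word probabilities from underflowing. With a real corpus, the same steps would typically be delegated to a library classifier and evaluated on held-out reviews rather than on hand-picked sentences.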