GitHub - ucb-stat154/stat154-spring-2018: Course materials for Stat 154, spring 2018, at UC Berkeley

Stat 154 Spring 2018

This repository holds the course materials for the Spring 2018 edition of Statistics 154: Modern Statistical Prediction and Machine Learning at UC Berkeley.

Instructor: Gaston Sanchez, gasigiri [at] berkeley [dot] edu
Class Time: MWF 11-12pm in 180 Tan
Session Dates: 01/17/18 - 05/04/18
Code #: 30887
Units: 4 (more info here)
Office Hours: MW 2:15-3:15pm in 309 Evans (or by appointment)
Piazza: piazza.com/berkeley/spring2018/stat154
Final: Tue May 8, 7-10pm (room TBD)
GSI: Omid Solari (Mon. 5-6pm, Wed. 8-10am in 444 Evans)

Lab	Date	Room	GSI
101	M 12-2pm	334 Evans	Omid Solari
102	M 3-5pm	334 Evans	Omid Solari

Description

This is an introductory-level course in statistical learning, with an emphasis on regression and classification methods, and a pinch of unsupervised methods. Time permitting, the course covers the following topics (not necessarily in the order shown; see the syllabus for more info):

Process of predictive model building
Data Preprocessing
Regression Models
- Linear models
- Non-linear models (time permitting)
- Tree-based methods
Classification Models
- Linear models
- Non-linear models
- Tree-based methods
- Support Vector Machines (time permitting)
Unsupervised methods like PCA and Clustering
Data spending: splitting and resampling methods
Bias-Variance Trade-off
Model Assessment
Model Selection

Throughout the semester we will explore the predictive modeling lifecycle, including question formulation, data preprocessing, exploratory data analysis and visualization, model building, model assessment/validation, model selection, and decision-making.

Prerequisites / Review

Multivariate calculus or the equivalent, esp. partial derivatives; e.g. Math 53
Linear algebra or the equivalent (matrices, vector spaces); e.g. Math 54
Statistical inference or the equivalent; e.g. Stat 135
Scripting experience in R required; e.g. Stat 133

This course will build on a lot of material from matrix algebra. In particular, you should be comfortable with notions such as vector spaces, inner products, norms, matrix products/transpose/rank/determinants/inverses, as well as matrix decompositions.

You should also have some scripting experience, preferably in R, at the level of writing functions, conditionals (if-then-else structures), for loops, and while loops, as well as sampling, reading in data sets, and exporting results.
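As a rough self-check, the short R script below touches each of the skills listed above. It is only an illustrative sketch, not course material: the helper `bootstrap_mean()`, the seed, and the use of the built-in `mtcars` data set are assumptions chosen for the example.

```r
# Hypothetical helper: bootstrap the mean of a numeric vector.
# Uses a conditional (input check), a for loop, and sampling.
bootstrap_mean <- function(x, n_boot = 200, seed = 154) {
  if (!is.numeric(x) || length(x) == 0) {
    stop("x must be a non-empty numeric vector")
  } else {
    set.seed(seed)
    means <- numeric(n_boot)
    for (b in seq_len(n_boot)) {
      resample <- sample(x, size = length(x), replace = TRUE)
      means[b] <- mean(resample)
    }
    means
  }
}

# Export a built-in data set to a temporary CSV file, then read it back in.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = TRUE)
cars <- read.csv(csv_path, row.names = 1)

# Call the function on one column of the data.
boot_means <- bootstrap_mean(cars$mpg, n_boot = 200)

# while loop: keep drawing single values until their running total reaches 100.
set.seed(154)
total <- 0
n_draws <- 0
while (total < 100) {
  total <- total + sample(cars$mpg, size = 1)
  n_draws <- n_draws + 1
}

# Export a small table of results.
results <- data.frame(
  statistic = c("mean_mpg", "bootstrap_se"),
  value = c(mean(cars$mpg), sd(boot_means))
)
out_path <- tempfile(fileext = ".csv")
write.csv(results, out_path, row.names = FALSE)
```

If every line of this script reads naturally to you, your scripting background is probably at the level described here.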

Last but not least, it is nice to know the basics of Rmd files, as well as some knowledge of LaTeX, especially some experience writing math symbols and equations.

Textbooks

There is no official textbook for this course, although we will use the following texts as supporting material:

An Introduction to Statistical Learning (ISL) by James, Witten, Hastie, and Tibshirani. Springer, 2013. It is freely available online in pdf format (courtesy of the authors) at http://www-bcf.usc.edu/~gareth/ISL/.
The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. Springer, 2009 (2nd Ed). This book is more mathematically and conceptually advanced than ISL. It is freely available online in pdf format (courtesy of the authors) at https://statweb.stanford.edu/~tibs/ElemStatLearn/.
Applied Predictive Modeling by Max Kuhn and Kjell Johnson. Springer, 2013.
Data Mining and Statistics for Decision Making by Stephane Tuffery. Wiley, 2011.

Expectations

We expect that by the end of the course you will:

Have a basic, yet solid, understanding of the prediction modeling process/lifecycle.
Be able to read a well-described algorithm, and write code to implement it computationally (in R).
Know the pros and cons of each predictive technique.
Be able to describe (to non-professionals) what a predictive technique is doing.

Methods of Instruction

We will be using a combination of materials such as slides, tutorials, reading assignments, and chalk-and-talk.
The main computational tool will be the computing and programming environment R.
The main workbench will be the IDE RStudio. You will also use a terminal emulator to work with the command line.

Other

Please read the course logistics and policies for more details about the structure of the course, dos and don'ts, etc.

License

Unless otherwise noted, this work, by Gaston Sanchez, is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Author: Gaston Sanchez
