# BIOSTAT 257: Statistical Computing

* Tue/Thu 1pm-2:50pm @ Zoom <https://ucla.zoom.us/j/507925583>  
* Instructor: Dr. Hua Zhou, <huazhou@ucla.edu>

## What is statistics?

* Statistics, the science of *data analysis*, is the applied mathematics of the 21st century. 

* People (scientists, government, health professionals, companies) collect data in order to answer certain questions. A statistician's job is to help them extract knowledge and insights from data. 

* Must-read for (bio)statistics students:  
  - [_50 Years of Data Science_](https://ucla-biostat-257-2020spring.github.io/readings/Donoho15FiftyYearsDataScience.pdf), by David Donoho.

* If existing software tools readily solve the problem, all the better. 

* Often statisticians need to implement their own methods, test new algorithms, or tailor classical methods to new types of data (big, streaming). 

* This entails at least two essential skills: **programming** and fundamental knowledge of **algorithms**. 

## What is this course about?

* **Not** a course on statistical packages. It does not answer questions such as _How to fit a linear mixed model in R, Julia, SAS, SPSS, or Stata?_

* **Not** a pure programming course, although programming is important and we do homework in Julia.  
BIOSTAT 203A (Data Management) in fall quarter focuses on programming in R and SAS.

* **Not** a course on data science. The new course [BIOSTAT 203B (Introduction to Data Science)](https://ucla-biostat203b-2020winter.github.io/schedule.html) in winter quarter focuses on some software tools for data scientists.

* This course focuses on **algorithms**, mostly those in **numerical linear algebra** and **numerical optimization**. 

## Learning objectives

1. Be highly appreciative of this quote by [James Gentle](https://books.google.com/books?id=Pbz3D7Tg5eoC&pg=PR9&lpg=PR9&dq=The+form+of+a+mathematical+expression+and+the+way+the+expression+should+be+evaluated+in+actual+practice+may+be+quite+different.&source=bl&ots=MYABVAwDtC&sig=MGuPY_171sZFZLMCuewlOjV-Cl4&hl=en&sa=X&ved=0ahUKEwjkv_u34v7SAhUJrlQKHfT6DjAQ6AEIITAB#v=onepage&q=The%20form%20of%20a%20mathematical%20expression%20and%20the%20way%20the%20expression%20should%20be%20evaluated%20in%20actual%20practice%20may%20be%20quite%20different.&f=false)
> The form of a mathematical expression and the way the expression should be evaluated in actual practice may be quite different.

    Examples: $\boldsymbol{X}^T \boldsymbol{W} \boldsymbol{X}$, $\operatorname{tr} (\boldsymbol{A} \boldsymbol{B})$, $\operatorname{diag}(\boldsymbol{A} \boldsymbol{B})$, multivariate normal density, ...  

2. Become **memory-conscious**. You care about looping order. You fanatically benchmark hot functions to make sure they are not allocating.    
<img src="./memory.jpg" align="center" width="150"/>   
Image source: <https://www.independent.co.uk/news/health/memory-loss-alzheimers-disease-age-of-8-university-college-london-a9178631.html>

3. **No inversion mentality**. Whenever you see a matrix inverse in a mathematical expression, your brain reacts with _matrix decomposition_, _iterative solvers_, etc. For R users, that means you almost never call `solve(A)` just to form an explicit inverse.   

    Examples: $(\boldsymbol{X}^T \boldsymbol{X})^{-1} \boldsymbol{X}^T \mathbf{y}$, $\mathbf{y}^T \boldsymbol{\Sigma}^{-1} \mathbf{y}$, Newton-Raphson algorithm, ...   
<img src="./yoga_inversion.jpg" align="center" width="250"/>   
Image source: <https://www.yogajournal.com/practice/inversion-inquiry>

4. Know some basic strategies to solve **big data** problems. 

    Examples: how Google solves the PageRank problem with $10^{9}$ webpages, linear regression with $10^7$ observations, etc.  

5. Do not be afraid of **optimization**; treat it as a technology. Be able to recognize some major optimization classes and choose the best solver(s) accordingly.

6. Be immune to the language fight. 
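
Objectives 1 and 3 above can be made concrete with a small sketch. The dependency-free Python below is used only for illustration (course homework is in Julia), and the names `trace_of_product`, `cholesky`, and `least_squares` are hypothetical helpers, not part of any library. It computes $\operatorname{tr}(\boldsymbol{A} \boldsymbol{B})$ without forming $\boldsymbol{A} \boldsymbol{B}$, and solves the normal equations $\boldsymbol{X}^T \boldsymbol{X} \boldsymbol{\beta} = \boldsymbol{X}^T \mathbf{y}$ with a Cholesky decomposition and two triangular solves instead of an explicit inverse. It assumes $\boldsymbol{X}$ has full column rank.

```python
import math


def trace_of_product(A, B):
    """tr(AB) for A (m x n) and B (n x m) without forming AB.

    Uses tr(AB) = sum_{i,j} A[i][j] * B[j][i], which costs O(mn)
    instead of the O(m^2 n) needed to build the full product first.
    """
    return sum(A[i][j] * B[j][i]
               for i in range(len(A)) for j in range(len(B)))


def cholesky(S):
    """Lower-triangular L with S = L L^T (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry of column j.
        d = S[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def least_squares(X, y):
    """Solve (X^T X) b = X^T y without ever forming (X^T X)^{-1}."""
    n, p = len(X), len(X[0])
    # Gram matrix G = X^T X and right-hand side c = X^T y.
    G = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         for a in range(p)]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    L = cholesky(G)
    # Forward substitution: L z = c.
    z = [0.0] * p
    for i in range(p):
        z[i] = (c[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    # Back substitution: L^T b = z.
    b = [0.0] * p
    for i in reversed(range(p)):
        s = sum(L[k][i] * b[k] for k in range(i + 1, p))
        b[i] = (z[i] - s) / L[i][i]
    return b


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [2, 2]]
trace_AB = trace_of_product(A, B)

# Data generated exactly by y = 1 + 2x, so the fit should recover (1, 2).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta = least_squares(X, y)
```

In practice one would call tuned BLAS/LAPACK routines rather than hand-written loops, and a QR decomposition is generally the safer choice when $\boldsymbol{X}$ is ill-conditioned; the point here is only that neither computation needs the full product or the inverse.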

## Course logistics

* Course webpage: <https://ucla-biostat-257-2020spring.github.io> or <http://ucla-biostat-257.com>.

* [Syllabus](https://ucla-biostat-257-2020spring.github.io/syllabus/syllabus.html).

* Check the [Schedule](https://ucla-biostat-257-2020spring.github.io/schedule/schedule.html) and [Announcements](https://ucla-biostat-257-2020spring.github.io/announcement.html) pages frequently. 

* Jupyter notebooks will be posted before each lecture.