ucdavis-sta141c-2021-winter/sta141c-lectures

STA 141C Big Data & High Performance Statistical Computing

People

  • Instructor: Randy Lai
    Use Campuswire or Canvas to contact me. DO NOT email me; I tend to ignore email (too much spam).
  • Meeting time: 12:10 - 1:30 PM, TR
  • TAs: Hoseung Song, Cong Xu
  • Office hours:
    • Hoseung: 2:00 - 4:00 PM Thursday
    • Cong: 1:00 - 3:00 PM Friday
    • Randy: 10:00 AM - 12:00 PM Friday

Material

Date    Note                    HTML
01-05   Introduction            introduction
01-07   functional programming  purrr
01-19   debug                   debug
        profile                 profile
01-25   Rcpp                    Rcpp
02-02   R-pkgs
02-04   parallel                parallel
        multidplyr              multidplyr
02-10   bootstrap               bootstrap
        blb                     blb
02-18   random forest           rf
02-25   python                  python
03-02   keras                   keras

Links

  • Canvas for grades
  • GitHub for lecture notes and assignments. Please always refer to the documents found on GitHub.
  • Campuswire for discussions. Please register with your UCD email! If you have used Piazza before, it is similar but has a better interface. Please use Campuswire to ask any questions related to assignments and course materials. The TAs and I will not answer emails about the course materials.
    Use join code: 0149
    Learn how to ask a question. Asking a question is an art; stackoverflow.com has some good tips. You can also use the reprex package to make reproducible examples.

Tentative Schedule

Topic
Introduction
Functional programming
Object Oriented programming
Debugging and profiling
Writing C and C++ extensions
Writing an R package
Parallel & Distributed Computing
Async programming
Bootstrap and BLB
Google Sheet and Big Query
Interoperate with Python and Julia
Deep learning in R

Grading

Category        Grade Percentage
Assignments     65%
Final Project   25%
Participation   10%
  • There will be about 6 assignments, assigned via GitHub Classroom.
  • Assignments must be turned in by the due date. No late assignments are accepted.
  • Participation will be based on your reputation points in Campuswire.
    • 1% each week if your reputation points for the week are above 20.
    • The top scorers for the quarter will earn extra bonuses.

Resources

Prerequisites

  • Strong R programming skills
  • R 4.0.3 (check your R version)
  • RStudio 1.3.1093 (check your RStudio Version)
  • R Markdown (read this https://rmarkdown.rstudio.com/lesson-1.html)
  • Knowledge of git and GitHub: read ‘Happy Git and GitHub for the useR’ (it is absolutely important to read the ebook if you have no experience with git/GitHub)
  • Some knowledge of dplyr will be helpful
  • Minimal amount of Python
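
The command-line tools in the list above can also be checked from a terminal. The following is a minimal sketch, assuming `R`, `git`, and `python3` are the command names on your PATH (`python3` may be called `python` on some systems). `check_tool` is a hypothetical helper name used only for illustration. RStudio is not covered here; its version is shown in its About dialog.

```shell
#!/bin/sh
# check_tool (hypothetical helper): print the first line of "<tool> --version",
# or report that the tool is not on PATH and return non-zero.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    "$1" --version 2>&1 | head -n 1
  else
    echo "$1: not found on PATH"
    return 1
  fi
}

# Assumed command names; adjust them to match your installation.
for tool in R git python3; do
  check_tool "$tool" || true
done
```

Compare the reported R version against the one listed above.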

How to “clone” the notes repo

Assuming that you have git installed,

  • Open RStudio -> New Project -> Version Control -> Git -> paste the URL: https://github.com/ucdavis-sta141c-2021-winter/sta141c-lectures.git
  • Choose a directory to create the project
  • You can make any changes to the repo you wish.
  • To fetch updates
    • go to the git pane in RStudio
    • click the “Commit” button
    • check the files you changed
    • type a short message about the changes and hit “Commit”
    • After committing, hit the “Pull” button (PS: there is a sub button “Pull with rebase”, only use it if you truly understand what it does)
    • Done if you see no errors
    • If you and I have both changed the same lines, you will see a merge conflict.
    • To resolve the conflict, locate the files with conflicts (U flag in the git pane).
    • Open the files and edit the conflicts. A conflict usually looks like
    <<<<<<< HEAD
    - RStudio 1.2.5011 (check your RStudio Version)
    =======
    - RStudio 1.2.5033 (check your RStudio Version)
    >>>>>>> 85858c9a6ebba9057ca8db7c269bd0a2f7a3910a
    
    • Check all the files with conflicts, then commit them again with a new message.
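
For those who prefer a terminal, the RStudio steps above can be sketched with plain git commands. This is a minimal sketch under stated assumptions, not a required workflow: `update_notes` and `list_conflicts` are hypothetical helper names, a reasonably recent git is assumed (for `git -C`), and your git identity (`user.name`, `user.email`) is assumed to be configured.

```shell
#!/bin/sh
# One-time setup (needs network access):
#   git clone https://github.com/ucdavis-sta141c-2021-winter/sta141c-lectures.git

# update_notes (hypothetical helper): commit local edits, then pull with a merge.
# Usage: update_notes <repo_dir> <commit message>
update_notes() {
  repo_dir=$1
  message=$2
  # Stage everything you changed (the "check the files" step).
  git -C "$repo_dir" add -A || return 1
  # Commit only when something is staged; committing with nothing staged fails.
  if ! git -C "$repo_dir" diff --cached --quiet; then
    git -C "$repo_dir" commit -q -m "$message" || return 1
  fi
  # Plain "Pull": merge upstream changes rather than "Pull with rebase".
  git -C "$repo_dir" pull --no-rebase --no-edit
}

# list_conflicts (hypothetical helper): print files with unresolved conflicts.
# Usage: list_conflicts <repo_dir>
list_conflicts() {
  git -C "$1" diff --name-only --diff-filter=U
}
```

For example, `update_notes sta141c-lectures "my notes"` commits and pulls. If the pull stops on a merge conflict, `list_conflicts sta141c-lectures` prints the files to edit; after removing the conflict markers, running `update_notes` again with a new message commits the resolution.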

Assignments

Link your GitHub account at https://signin-apd27wnqlq-uw.a.run.app/sta141c/

Regularly check the course GitHub organization https://github.com/ucdavis-sta141c-2021-winter for any newly posted assignments.

Feedback

Feedback will be given in the form of GitHub issues or pull requests.

Regrade Requests

Regrade requests must be made within one week of the return of the assignment. One of the most common reasons for a request is that the knitted HTML files were not uploaded; when that happens, 30% of the grade for that assignment is deducted. To make a request, send me a Canvas message with the following information:

  • Which assignment
  • URL to the repo of your assignment
  • The reason for the request

Assignment Rubric

(Adapted from Nick Ulle and Clark Fitzgerald)

Point values and weights may differ among assignments. This is to indicate what the most important aspects are, so that you spend your time on those that matter most. Check the homework submission page on Canvas to see what the point values are for each assignment.

The grading criteria are correctness, code quality, communication, and inquisitiveness. The following describes what an excellent homework solution should look like:

Correctness

The report does the following:

  • solves all the questions contained in the prompt
  • makes conclusions that are supported by evidence in the data
  • discusses efficiency and limitations of the computation
  • cites any sources used

The attached code runs without modification.

Code Quality

The code is idiomatic and efficient. Different steps of the data processing are logically organized into scripts and small, reusable functions. Variable names are descriptive. The style is consistent and easy to read.

Communication

Plots include titles, axis labels, and legends or special annotations where appropriate. Tables include only columns of interest, are clearly explained in the body of the report, and are not too large. The writing is in clear, correct English.

Inquisitiveness

The report points out anomalies or notable aspects of the data discovered over the course of the analysis. It discusses assumptions in the overall approach and examines how credible they are. It mentions ideas for extending or improving the analysis or the computation.
