QMSS GR5069 - APPLIED DATA SCIENCE FOR SOCIAL SCIENTISTS

Instructor: Marco Morales, Columbia University
Co-Instructor: Nana Yaw Essuman, Columbia University

TA: Ummugul Bezirhan, Columbia University

This repository is a companion to the course Applied Data Science for Social Scientists, taught in the Quantitative Methods in the Social Sciences program during the Spring of 2020.

It contains curated reference materials, slides and sample code. You can find the most up-to-date version of the course syllabus here. Make sure to check regularly for updates.

Overview

In his now classic Venn diagram, Drew Conway described Data Science as sitting at the intersection between good hacking skills, math and statistics knowledge, and substantive expertise. As a result of normal instruction, social scientists possess a fluid combination of all three but also bring an additional layer to the mix. We have acquired slightly different training, skills and expertise tailored to understand human behavior, and to explain why things happen the way they do. Social scientists are, thus, a particular kind of data scientist.

This course is a collection of topics that fill very specific gaps, identified over the years, in what a social scientist should know at minimum when entering data science, and in what a data scientist should know to hit the ground running and add immediate value to their teams.

To do that, this course aims to:

teach processes and practices at the intersection of Data Science and Data Engineering that are central to the data product cycle. Data Scientists typically start being exposed to Data Engineering on the job. There's much to be gained from early exposure to concepts and practices in this field;
sharpen your technical skills not only in fitting models, but particularly in building knowledge and generating insights from the data. While this may seem obvious for a Data Scientist, it is not always the focus of training;
train you to work effectively in teams to build projects and products. Data Science is collaborative in nature, and its best practices for efficient collaboration are constantly evolving. The highly structured collaboration that happens in DS teams is vastly different from collaboration on school projects and assignments, yet it is not always the focus of training; and
sharpen and enhance soft skills that are key to a successful interaction with business stakeholders. The most important - and often neglected - activity of a data scientist is to obtain expert knowledge from and communicate with non-technical audiences. The greatest insight/project/product is moot if no one outside the Data Science team understands it or its value.

All of these are highly valued skills in the data science job market, but not always considered explicitly as part of an integral Data Science curriculum.

Prerequisites:

It is assumed that students have basic to intermediate knowledge of an object-oriented programming language (e.g., R or Python), including experience using it for data manipulation, visualizations, and model estimation. Some mathematics, statistics and algebra are also assumed.

Course Resources

There are no required textbooks for this course. Curated readings for each week's topic will be available in Canvas. Sample code and other materials will be available in this repo. Starter code for in-class and take-home exercises will be available in the course's GitHub classroom repo. A Slack workspace for this course is also available with multiple channels to facilitate team collaboration within the class.

To actively participate in this course

Make sure to have the latest versions of R/RStudio, and/or Anaconda, as well as Git installed on your computer. Sign up for a GitHub account if you don't have one already. Atom is recommended to simplify your interaction with Git and GitHub.

Registered students will receive instructions to get access to GitHub classroom, Google Cloud Platform, Databricks, and Slack resources to be used for this course.

Accessing course materials in this repo

install Git on your local machine
from the command line, go to the directory where you want to clone this repo
```
$ cd <dir>
```

clone this repository to get a local copy on your machine
```
$ git clone https://github.com/marco-morales/QMSS-GR5069_Spring2020.git
```

pull every week before class to sync your local copy with the changes pushed to the remote repo
```
$ git pull origin master
```
subscribe to the repository to get notifications each time new changes are pushed to the repository
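
The clone and weekly pull steps above can be combined into one small helper. This is only a minimal sketch, not part of the course materials: the function name `sync_course_repo` and its target-directory argument are hypothetical, while the repository URL and the `master` branch are taken from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: clone the course repo on first use, pull on later runs.
# REPO_URL and the branch name come from the steps above; the function
# name and the target-directory argument are illustrative assumptions.

REPO_URL="https://github.com/marco-morales/QMSS-GR5069_Spring2020.git"

sync_course_repo() {
  local target_dir="$1"

  # Stop early if git is not installed.
  if ! command -v git >/dev/null 2>&1; then
    echo "git not found; install it first" >&2
    return 1
  fi

  if [ -d "$target_dir/.git" ]; then
    # Existing clone: sync it with the changes pushed to the remote.
    git -C "$target_dir" pull origin master
  else
    # No local copy yet: clone the repository into target_dir.
    git clone "$REPO_URL" "$target_dir"
  fi
}
```

After sourcing the file, running `sync_course_repo ~/QMSS-GR5069_Spring2020` before each class would either create the local copy or bring it up to date.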
