Commit

initial migration to github
wadefagen committed Mar 25, 2018
0 parents commit 9b2a1b9
Showing 34 changed files with 130,158 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
**/.vscode/*
63 changes: 63 additions & 0 deletions README.md
@@ -0,0 +1,63 @@
# wadefagen's Useful Datasets

This repository contains a collection of datasets I've found useful. Many of them are cleaned-up versions of public datasets, provided in a consistent format for use in data science projects.

## General Format

Unless otherwise noted, all datasets are CSV files where the first row contains column headers.

Common column names across multiple datasets include:

- `Year`, a four-digit year (e.g., `2018`, `2017`)
- `Term`, one of `Spring`, `Summer`, `Fall`, or `Winter`
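- `YearTerm`, a four-digit year followed by `-sp`, `-su`, `-fa`, or `-wi` (for example, `2018-sp`). Because the year comes first, comparing `YearTerm` values as strings orders them by year. Within a single year, however, the suffixes sort alphabetically (`fa` < `sp` < `su` < `wi`) rather than chronologically, so a string filter such as `YearTerm >= "2016-fa"` also matches `2016-sp` and `2016-su`.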
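
Plain string comparison of `YearTerm` values sorts the term suffixes alphabetically (`fa` < `sp` < `su` < `wi`), which does not match calendar order within a year. One option is to map each value to a `(year, term)` tuple. The minimal Python sketch below assumes terms run Spring, Summer, Fall, Winter within a year (the order these READMEs list them in); `TERM_ORDER`, `yearterm_key`, and `on_or_after` are hypothetical names, not part of the dataset.

```python
from typing import Tuple

# ASSUMPTION: chronological order of terms within one year label
# (Spring, Summer, Fall, Winter, the order used throughout these READMEs).
TERM_ORDER = {"sp": 0, "su": 1, "fa": 2, "wi": 3}


def yearterm_key(yearterm: str) -> Tuple[int, int]:
    """Map e.g. "2016-fa" to (2016, 2) so values sort chronologically."""
    year, term = yearterm.split("-")
    return int(year), TERM_ORDER[term]


def on_or_after(yearterm: str, start: str) -> bool:
    """True if `yearterm` is the same term as `start` or a later one."""
    return yearterm_key(yearterm) >= yearterm_key(start)


terms = ["2016-fa", "2016-sp", "2017-sp", "2016-su", "2015-wi"]
print(sorted(terms))                    # ['2015-wi', '2016-fa', '2016-sp', '2016-su', '2017-sp']
print(sorted(terms, key=yearterm_key))  # ['2015-wi', '2016-sp', '2016-su', '2016-fa', '2017-sp']
print([t for t in terms if on_or_after(t, "2016-fa")])  # ['2016-fa', '2017-sp']
```

With pandas, the same key can be applied through `df["YearTerm"].map(yearterm_key)` before sorting or filtering.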

## Available Datasets

- [GPAs of Courses at The University of Illinois](gpa/), `gpa/uiuc-gpa-dataset.csv`
- [Teachers Ranked as Excellent by their Students at UIUC](teachers-ranked-as-excellent/), `teachers-ranked-as-excellent/uiuc-tre-dataset.csv`
- [UIUC Courses by their General Education category](geneds/), `geneds/uiuc-geneds-dataset.csv`
- [Students at The University of Illinois by their home state](students-by-state/), `students-by-state/uiuc-students-by-state.csv`

## Useful Scripts

If you're working with these datasets, the following snippets may help you load the data. Each example assumes you have cloned this repository into your project's working directory under its default name, `datasets`.

### Python (pandas)
```python
import pandas as pd
df = pd.read_csv('datasets/gpa/uiuc-gpa-dataset.csv')
# `df` is a DataFrame of the CSV file
```

### Python (dictionary)
```python
import csv

with open("datasets/gpa/uiuc-gpa-dataset.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Each `row` is one row of the CSV as a Python dict keyed by the column headers.
        # Example usage:
        term = row["Term"]
        # csv.DictReader returns every value as a string, so convert numeric columns such as `Year`.
        year = int(row["Year"])
```

### JavaScript (node.js)
With the [csv-parse package](https://www.npmjs.com/package/csv-parse) (`npm install --save csv-parse`):

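```javascript
const fs = require('fs');
// Sync API path in csv-parse 4.x and earlier; 5.x exports `parse` from 'csv-parse/sync' instead.
const parse = require('csv-parse/lib/sync');

const rows = parse(fs.readFileSync("datasets/gpa/uiuc-gpa-dataset.csv"), { columns: true });
rows.forEach(function (row) {
  // Each `row` is one row of the CSV as an object keyed by the column headers.
  // Example usage:
  const term = row["Term"];
  const year = parseInt(row["Year"], 10); // values are parsed as strings
});
```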

30 changes: 30 additions & 0 deletions geneds/README.md
@@ -0,0 +1,30 @@

# University of Illinois' GenEds

A collection of the General Education ("Gen Ed") categories assigned to University of Illinois courses, from https://courses.illinois.edu/gened/DEFAULT/DEFAULT.

## Data Format

The first row of the CSV file contains column headers. Every row after the first contains data. Sample:

| Year | Term | YearTerm | Course | Course Title | ACP | CS | HUM | NAT | QR | SBS |
| ---- | ---- | -------- | ------ | ------------ | --- | -- | --- | --- | -- | --- |
| 2018 | Fall | 2018-fa | AAS 100 | Intro Asian American Studies | | US | | | | SS |
| ... |
| 2018 | Fall | 2018-fa | CS 225 | Data Structures | | | | | QR1 | |
| ... |

Every course listed has at least one Gen Ed category, and many courses have several.

The column labels and values have the following meaning:

- `ACP` for "Advanced Composition"; values are `ACP` or blank
- `CS` for "Cultural Studies"; values are `NW` for "Non-Western Cultures", `WCC` for "Western/Comparative Cultures", `US` for "US Minority Cultures", or blank
- `HUM` for "Humanities & the Arts"; values are `HP` for "Historical & Philosophical Perspectives", `LA` for "Literature & the Arts", or blank
- `NAT` for "Natural Sciences & Technology"; values are `LS` for "Life Sciences", `PS` for "Physical Sciences", or blank
- `QR` for "Quantitative Reasoning"; values are `QR1` for "Quantitative Reasoning 1", `QR2` for "Quantitative Reasoning 2", or blank
- `SBS` for "Social & Behavioral Sciences"; values are `BS` for "Behavioral Sciences", `SS` for "Social Sciences", or blank
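
To see which Gen Ed categories a course carries, you can keep only the non-blank category columns of each row. The minimal Python sketch below assumes the CSV uses exactly the column headers shown in the sample table; `GENED_COLUMNS` and `gened_categories` are hypothetical names, and two rows from the sample are inlined so the example runs without the dataset. To use the real file, pass an open file object such as `open("datasets/geneds/uiuc-geneds-dataset.csv", newline="")` to `csv.DictReader` instead.

```python
import csv
import io

# Gen Ed columns described above (ASSUMPTION: header names match the sample table).
GENED_COLUMNS = ["ACP", "CS", "HUM", "NAT", "QR", "SBS"]


def gened_categories(row):
    """Return {column: value} for the Gen Ed columns that are filled in on one CSV row."""
    found = {}
    for col in GENED_COLUMNS:
        value = (row.get(col) or "").strip()  # blank (or missing) cell -> no category
        if value:
            found[col] = value
    return found


# Two rows copied from the sample table above, in CSV form.
sample = """Year,Term,YearTerm,Course,Course Title,ACP,CS,HUM,NAT,QR,SBS
2018,Fall,2018-fa,AAS 100,Intro Asian American Studies,,US,,,,SS
2018,Fall,2018-fa,CS 225,Data Structures,,,,,QR1,
"""

for row in csv.DictReader(io.StringIO(sample)):
    cats = gened_categories(row)
    print(row["Course"], cats, len(cats))
# AAS 100 {'CS': 'US', 'SBS': 'SS'} 2
# CS 225 {'QR': 'QR1'} 1
```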

## Data Source

Scraped from https://courses.illinois.edu/gened/DEFAULT/DEFAULT on March 25, 2018
1,009 changes: 1,009 additions & 0 deletions geneds/dataset.csv

47 changes: 47 additions & 0 deletions gpa/README.md
@@ -0,0 +1,47 @@

# University of Illinois' GPA Dataset

In July 2016, the University of Illinois responded to a Freedom of Information Act (FOIA) request (FOIA #16-456) for *"the grade distributions by percent and/or letter grade, for every class [...] at the University of Illinois at Urbana-Champaign"*; later requests (#17-042, #17-213, and #18-150) extended the data to additional terms. This repository contains all of the data from these FOIA requests in a clean, documented CSV format.

Download the full dataset as a single CSV file

## Data Format

The first row of the CSV file contains column headers. Every row after the first contains data. Sample:

| Year | Term | YearTerm | Subject | Number | Course Title | A+ | A | A- | B+ | B | B- | C+ | C | C- | D+ | D | D- | F | W | Primary Instructor |
| ---- | ---- | -------- | ------- | ------ | ------------ | -- | - | -- | -- | - | -- | -- | - | -- | -- | - | -- | - | - | ------------------ |
| 2017 | Fall | 2017-fa | AAS | 100 | Intro Asian American Studies | 2 | 17 | 0 | 6 | 5 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Espiritu, Augusto F |
| ... |
| 2017 | Fall | 2017-fa | CS | 225 | Data Structures | 114 | 47 | 27 | 6 | 28 | 17 | 14 | 18 | 13 | 12 | 9 | 12 | 16 | 2 | Fagen-Ulmschnei, Wade A |
| 2017 | Fall | 2017-fa | CS | 225 | Data Structures | 121 | 40 | 27 | 20 | 29 | 16 | 14 | 24 | 4 | 12 | 14 | 16 | 14 | 4 | Fagen-Ulmschnei, Wade A |
| ... |

*Note that long names for "Primary Instructor" are truncated in this dataset.*
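
Because each row stores a count per letter grade, a section's average GPA can be computed as a weighted mean: the sum of (grade points × count) divided by the number of students who received a letter grade. This is a sketch, not part of the dataset: it assumes the standard 4.0 grade-point scale shown in the code (with `A+` counted as 4.0) and excludes `W` from the average; `GRADE_POINTS` and `section_gpa` are hypothetical names.

```python
# ASSUMPTION: standard 4.0-scale grade points (A+ counted as 4.0).
# The dataset itself contains only grade counts; "W" (withdrawal) is excluded here.
GRADE_POINTS = {
    "A+": 4.0, "A": 4.0, "A-": 3.67,
    "B+": 3.33, "B": 3.0, "B-": 2.67,
    "C+": 2.33, "C": 2.0, "C-": 1.67,
    "D+": 1.33, "D": 1.0, "D-": 0.67,
    "F": 0.0,
}


def section_gpa(row):
    """Weighted mean grade points for one row; None if nobody received a letter grade."""
    counts = {grade: int(row[grade]) for grade in GRADE_POINTS}  # CSV values are strings
    graded = sum(counts.values())
    if graded == 0:
        return None
    points = sum(GRADE_POINTS[g] * n for g, n in counts.items())
    return points / graded


# Grade columns of the AAS 100 row from the sample table, as csv.DictReader returns them.
aas100 = dict(zip(
    ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F", "W"],
    ["2", "17", "0", "6", "5", "0", "1", "0", "0", "0", "1", "0", "0", "0"],
))
print(round(section_gpa(aas100), 2))  # 3.57
```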


## Data Source

All data in this repository comes from public documents released in response to FOIA requests. The University of Illinois excluded some data to comply with privacy laws. A table detailing the FOIA request behind each term of data is provided below.

### Exclusion of Data

From FOIA #2018-150:

> Please be advised that certain information has been withheld under section 140/7(1)(a) that exempts from disclosure “[i]nformation specifically prohibited from disclosure by federal or State law or rules and regulations implementing adopted under federal or State law.” Specifically, the Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. §1232g) protects the privacy of student education records and prohibits the release of any information from a student’s education record without the consent of the eligible student. In this case, grade distributions are not displayed when a section has low enrollment or when all students in the class have the same grade. Because of the low enrollment in those classes or because all the students in a class received the same grade, the grade data could identify a student. Thus, such information was not provided to you as it would not only violate FERPA, but it would also be invasion of personal privacy under Section 7(1)(c) of FOIA which exempts from disclosure “personal information.”

Based on an analysis of the released data, sections with 20 or fewer students appear to have been excluded; the smallest section in the dataset has 21 students.
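
The 21-student minimum can be checked by summing the count columns of each row. The Python sketch below assumes a section's size is the sum of all letter-grade counts plus `W`; the README does not say whether withdrawals count toward the threshold. `COUNT_COLUMNS`, `section_size`, and `smallest_section` are hypothetical names, and three rows from the sample table are inlined so the example is self-contained.

```python
import csv
import io

# ASSUMPTION: every letter-grade column plus "W" counts one student each.
COUNT_COLUMNS = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-",
                 "D+", "D", "D-", "F", "W"]


def section_size(row):
    """Total number of students recorded on one row (one section)."""
    return sum(int(row[col]) for col in COUNT_COLUMNS)


def smallest_section(rows):
    """Return (size, row) for the row with the fewest recorded students."""
    return min(((section_size(r), r) for r in rows), key=lambda pair: pair[0])


# Subject, number, and count columns of three rows from the sample table.
sample = """Subject,Number,A+,A,A-,B+,B,B-,C+,C,C-,D+,D,D-,F,W
AAS,100,2,17,0,6,5,0,1,0,0,0,1,0,0,0
CS,225,114,47,27,6,28,17,14,18,13,12,9,12,16,2
CS,225,121,40,27,20,29,16,14,24,4,12,14,16,14,4
"""

rows = list(csv.DictReader(io.StringIO(sample)))
size, row = smallest_section(rows)
print(f"{row['Subject']} {row['Number']}: {size} students")  # AAS 100: 32 students
```

Running `smallest_section` over `csv.DictReader` on the full `gpa/uiuc-gpa-dataset.csv` file should reproduce the minimum reported above, provided the size assumption holds.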

### Table of FOIA Responses

| Year | Spring | Summer | Fall | Winter |
| ---- | ------------ | ------------- | ------------ | ------------- |
| 2017 | ✔ (2018-150) |  | ✔ (2018-150) |  |
| 2016 | ✔ (2016-456) | ✔ (2017-042) | ✔ (2017-213) |  |
| 2015 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) |
| 2014 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) |
| 2013 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | --- |
| 2012 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | --- |
| 2011 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | --- |
| 2010 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | --- |
