Commit
0 parents
commit 9b2a1b9
Showing 34 changed files with 130,158 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
**/.vscode/* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# wadefagen's Useful Datasets | ||
|
||
This repository contains a collection of datasets I've found useful. Many of these datasets are cleaned versions of public datasets, provided in a consistent format for use in data science projects. | ||
|
||
## General Format | ||
|
||
Unless otherwise noted, all datasets are CSV files where the first row contains column headers. | ||
|
||
Common column names across multiple datasets include: | ||
|
||
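- `Year`, a four-digit year (ex: `2018`, `2017`, etc.) | ||
- `Term`, one of `Spring`, `Summer`, `Fall`, or `Winter` | ||
- `YearTerm`, a four-digit year followed by `-sp`, `-su`, `-fa`, or `-wi`. For example: `2018-sp`. Because the year comes first, a string comparison such as `YearTerm >= "2016-fa"` selects all data available from Fall 2016 to the present. Note that the suffixes sort alphabetically within a year (`-fa`, `-sp`, `-su`, `-wi`), so this comparison also matches `2016-sp` and `2016-su`. | ||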
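
Plain string comparison on `YearTerm` orders the term suffixes alphabetically (`fa`, `sp`, `su`, `wi`), which is not calendar order. The sketch below shows one possible calendar-aware comparison. The helper name `year_term_key` and the assumed order of Spring, Summer, Fall, Winter within a labelled year are illustrative assumptions, not something defined by the datasets.

```python
# Hypothetical ranking of term suffixes in assumed calendar order.
TERM_ORDER = {"sp": 0, "su": 1, "fa": 2, "wi": 3}

def year_term_key(year_term):
    """Turn a value such as "2016-fa" into a sortable (year, term rank) tuple."""
    year, term = year_term.split("-")
    return (int(year), TERM_ORDER[term])

year_terms = ["2016-sp", "2016-su", "2016-fa", "2016-wi", "2017-sp", "2015-fa"]

# Plain string comparison: also keeps Spring and Summer 2016.
plain = sorted(yt for yt in year_terms if yt >= "2016-fa")

# Key-based comparison: keeps only Fall 2016 and later terms.
cutoff = year_term_key("2016-fa")
chronological = sorted(
    (yt for yt in year_terms if year_term_key(yt) >= cutoff),
    key=year_term_key,
)

print(plain)
print(chronological)
```

If only whole years matter, filtering on the `Year` column avoids the issue entirely.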
|
||
## Available Datasets | ||
|
||
- [GPAs of Courses at The University of Illinois](gpa/), `gpa/uiuc-gpa-dataset.csv` | ||
- [Teachers Ranked as Excellent by their Students at UIUC](teachers-ranked-as-excellent/), `teachers-ranked-as-excellent/uiuc-tre-dataset.csv` | ||
- [UIUC Courses by their General Education category](geneds/), `geneds/uiuc-geneds-dataset.csv` | ||
- [Students at The University of Illinois by their home state](students-by-state/), `students-by-state/uiuc-students-by-state.csv` | ||
|
||
## Useful Scripts | ||
|
||
If you're working with these datasets, the following snippets may be helpful to load the data. Each example assumes you have cloned this repo inside of your project's working directory (as `datasets`, the default name). | ||
|
||
### Python (pandas) | ||
``` | ||
import pandas as pd | ||
df = pd.read_csv('datasets/gpa/uiuc-gpa-dataset.csv') | ||
# `df` is a DataFrame of the CSV file | ||
``` | ||
|
||
### Python (dictionary) | ||
```
import csv

# newline="" is what the csv module recommends when opening CSV files
with open("datasets/gpa/uiuc-gpa-dataset.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Each `row` is a row from the CSV as a Python dict indexed with column headers.
        # Example usage:
        term = row["Term"]
        # The csv module returns every value as a string; convert the year to an `int`
        year = int(row["Year"])
```
|
||
### JavaScript (node.js) | ||
With the [csv-parse package](https://www.npmjs.com/package/csv-parse) (`npm install --save csv-parse`): | ||
|
||
```
const fs = require('fs');
// This path is for csv-parse 4.x; in csv-parse 5 and later use:
//   const { parse } = require('csv-parse/sync');
const parse = require('csv-parse/lib/sync');

const rows = parse(fs.readFileSync("datasets/gpa/uiuc-gpa-dataset.csv"), {columns: true});
rows.forEach(function (row) {
  // Each `row` is a row from the CSV as an object indexed with column headers.
  // Example usage (all values are strings):
  const term = row["Term"];
  const year = row["Year"];
});
```
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
|
||
# University of Illinois' GenEds | ||
|
||
A collection of General Education ("Gen Ed") categories from https://courses.illinois.edu/gened/DEFAULT/DEFAULT | ||
|
||
## Data Format | ||
|
||
The first row of the CSV file contains column headers. Every row after the first contains data. Sample: | ||
|
||
| Year | Term | YearTerm | Course | Course Title | ACP | CS | HUM | NAT | QR | SBS | | ||
| ---- | ---- | -------- | ------ | ------------ | --- | -- | --- | --- | -- | --- | | ||
| 2018 | Fall | 2018-fa | AAS 100 | Intro Asian American Studies | | US | | | | SS | | ||
| ... | | ||
| 2018 | Fall | 2018-fa | CS 225 | Data Structures | | | | | QR1 | | | ||
| ... | | ||
|
||
Every course listed has at least one Gen Ed category. Many courses have multiple Gen Ed categories. | ||
|
||
The column labels and values have the following meaning: | ||
|
||
- `ACP` for "Advanced Composition"; values are `ACP` or blank | ||
- `CS` for "Cultural Studies"; values are `NW` for "Non-Western Cultures", `WCC` for "Western/Comparative Cultures", `US` for "US Minority Cultures", or blank | ||
- `HUM` for "Humanities & the Arts"; values are `HP` for "Historical & Philosophical Perspectives", `LA` for "Literature & the Arts", or blank | ||
- `NAT` for "Natural Sciences & Technology"; values are `LS` for "Life Sciences", `PS` for "Physical Sciences", or blank | ||
- `QR` for "Quantitative Reasoning"; values are `QR1` for "Quantitative Reasoning 1", `QR2` for "Quantitative Reasoning 2", or blank | ||
- `SBS` for "Social & Behavioral Sciences"; values are `BS` for "Behavioral Sciences", `SS` for "Social Sciences", or blank | ||
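
As a rough illustration of the column meanings listed above, the following sketch expands the short codes in a row into full category names. The function name `gened_categories` is hypothetical, and the two rows are the sample rows from this README inlined as CSV text so the example runs without the data file. It assumes the CSV headers match the sample table.

```python
import csv
import io

# Code-to-name mapping copied from the column descriptions above.
GENED_NAMES = {
    "ACP": {"ACP": "Advanced Composition"},
    "CS": {
        "NW": "Non-Western Cultures",
        "WCC": "Western/Comparative Cultures",
        "US": "US Minority Cultures",
    },
    "HUM": {
        "HP": "Historical & Philosophical Perspectives",
        "LA": "Literature & the Arts",
    },
    "NAT": {"LS": "Life Sciences", "PS": "Physical Sciences"},
    "QR": {"QR1": "Quantitative Reasoning 1", "QR2": "Quantitative Reasoning 2"},
    "SBS": {"BS": "Behavioral Sciences", "SS": "Social Sciences"},
}

def gened_categories(row):
    """Return the full names of every Gen Ed category set in a course row."""
    names = []
    for column, codes in GENED_NAMES.items():
        value = (row.get(column) or "").strip()
        if value:
            # Fall back to the raw code if it is not in the mapping.
            names.append(codes.get(value, value))
    return names

# Sample rows from the table above (assumed header names).
SAMPLE = """Year,Term,YearTerm,Course,Course Title,ACP,CS,HUM,NAT,QR,SBS
2018,Fall,2018-fa,AAS 100,Intro Asian American Studies,,US,,,,SS
2018,Fall,2018-fa,CS 225,Data Structures,,,,,QR1,
"""

courses = {
    row["Course"]: gened_categories(row)
    for row in csv.DictReader(io.StringIO(SAMPLE))
}
print(courses)
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an open file handle for `geneds/uiuc-geneds-dataset.csv`.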
|
||
## Data Source | ||
|
||
Scraped from https://courses.illinois.edu/gened/DEFAULT/DEFAULT on March 25, 2018 |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
|
||
# University of Illinois' GPA Dataset | ||
|
||
In July 2016, the University of Illinois responded to a Freedom of Information Act request (FOIA #16-456, and later #17-042, #17-213, and #18-150) for *"the grade distributions by percent and/or letter grade, for every class [...] at the University of Illinois at Urbana-Champaign"*. This repository contains a record of all of the data from the above FOIA requests in a clean, documented CSV format. | ||
|
||
Download the full dataset as a single CSV file | ||
|
||
## Data Format | ||
|
||
The first row of the CSV file contains column headers. Every row after the first contains data. Sample: | ||
|
||
| Year | Term | YearTerm | Subject | Number | Course Title | A+ | A | A- | B+ | B | B- | C+ | C | C- | D+ | D | D- | F | W | Primary Instructor | | ||
| ---- | ---- | -------- | ------- | ------ | ------------ | -- | - | -- | -- | - | -- | -- | - | -- | -- | - | -- | - | - | ------------------ | | ||
| 2017 | Fall | 2017-fa | AAS | 100 | Intro Asian American Studies | 2 | 17 | 0 | 6 | 5 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | Espiritu, Augusto F | | ||
| ... | | ||
| 2017 | Fall | 2017-fa | CS | 225 | Data Structures | 114 | 47 | 27 | 6 | 28 | 17 | 14 | 18 | 13 | 12 | 9 | 12 | 16 | 2 | Fagen-Ulmschnei, Wade A | | ||
| 2017 | Fall | 2017-fa | CS | 225 | Data Structures | 121 | 40 | 27 | 20 | 29 | 16 | 14 | 24 | 4 | 12 | 14 | 16 | 14 | 4 | Fagen-Ulmschnei, Wade A | | ||
| ... | | ||
|
||
*Note that long names for "Primary Instructor" are truncated in this dataset.* | ||
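
A common use of these grade counts is computing an average GPA per row. The sketch below is one way to do that. The grade-point scale in `GRADE_POINTS` is an assumption (A+ and A both worth 4.00, with plus and minus grades at .33 and .67) and is not part of the dataset, and the function name `average_gpa` is hypothetical. Withdrawals (`W`) are ignored.

```python
# Assumed grade-point scale; NOT part of the dataset. Adjust if yours differs.
GRADE_POINTS = {
    "A+": 4.00, "A": 4.00, "A-": 3.67,
    "B+": 3.33, "B": 3.00, "B-": 2.67,
    "C+": 2.33, "C": 2.00, "C-": 1.67,
    "D+": 1.33, "D": 1.00, "D-": 0.67,
    "F": 0.00,
}

def average_gpa(row):
    """Average GPA for one row of grade counts; the `W` column is ignored."""
    counts = {grade: int(row[grade]) for grade in GRADE_POINTS}
    graded = sum(counts.values())
    if graded == 0:
        return None  # no letter grades recorded for this row
    points = sum(GRADE_POINTS[grade] * n for grade, n in counts.items())
    return points / graded

# The AAS 100 sample row from the table above, with values as strings
# because that is how csv.DictReader returns them.
aas_100 = {
    "A+": "2", "A": "17", "A-": "0", "B+": "6", "B": "5", "B-": "0",
    "C+": "1", "C": "0", "C-": "0", "D+": "0", "D": "1", "D-": "0",
    "F": "0", "W": "0",
}

gpa = average_gpa(aas_100)
print(round(gpa, 2))
```

Because a course can appear in several rows (as `CS 225` does in the sample), summing the counts across rows before dividing gives a course-level average rather than a per-row one.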
|
||
|
||
## Data Source | ||
|
||
All data contained in this repository comes from public documents released in response to FOIA requests. Some data was excluded by The University of Illinois to adhere to privacy laws. A table detailing the FOIA request for each term of data is provided below. | ||
|
||
### Exclusion of Data | ||
|
||
From FOIA #2018-150: | ||
|
||
> Please be advised that certain information has been withheld under section 140/7(1)(a) that exempts from disclosure “[i]nformation specifically prohibited from disclosure by federal or State law or rules and regulations implementing adopted under federal or State law.” Specifically, the Family Educational Rights and Privacy Act (FERPA) (20 U.S.C. §1232g) protects the privacy of student education records and prohibits the release of any information from a student’s education record without the consent of the eligible student. In this case, grade distributions are not displayed when a section has low enrollment or when all students in the class have the same grade. Because of the low enrollment in those classes or because all the students in a class received the same grade, the grade data could identify a student. Thus, such information was not provided to you as it would not only violate FERPA, but it would also be invasion of personal privacy under Section 7(1)(c) of FOIA which exempts from disclosure “personal information.” | ||
|
||
Based on analysis, courses with 20 or fewer students were excluded (the smallest course in the dataset has 21 students). | ||
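
The size check described above can be reproduced along these lines. This is only a sketch: it assumes "students" means the sum of the letter-grade columns (whether `W` should count is not stated here), the helper names are hypothetical, and the rows are the three sample rows from the Data Format table rather than the full dataset.

```python
GRADE_COLUMNS = ["A+", "A", "A-", "B+", "B", "B-",
                 "C+", "C", "C-", "D+", "D", "D-", "F"]

def graded_students(row):
    """Number of letter grades recorded in a row (assumption: `W` is excluded)."""
    return sum(int(row[column]) for column in GRADE_COLUMNS)

def smallest_row(rows):
    """Return the row with the fewest graded students."""
    return min(rows, key=graded_students)

def make_row(subject, number, counts):
    """Build a row dict from a subject, a number, and counts in GRADE_COLUMNS order."""
    row = {"Subject": subject, "Number": number}
    row.update(zip(GRADE_COLUMNS, counts))
    return row

# Sample rows from the Data Format table, values kept as strings.
rows = [
    make_row("AAS", "100", "2 17 0 6 5 0 1 0 0 0 1 0 0".split()),
    make_row("CS", "225", "114 47 27 6 28 17 14 18 13 12 9 12 16".split()),
    make_row("CS", "225", "121 40 27 20 29 16 14 24 4 12 14 16 14".split()),
]

smallest = smallest_row(rows)
print(smallest["Subject"], smallest["Number"], graded_students(smallest))
```

Running the same function over every row of the CSV, loaded for example with `csv.DictReader`, would give the minimum for the whole dataset.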
|
||
### Table of FOIA Responses | ||
|
||
| Year | Spring | Summer | Fall | Winter | | ||
| ---- | ------------ | ------------- | ------------ | ------------- | | ||
| 2017 | ✔ (2018-150) | ✘ | ✔ (2018-150) | ✘ | | ||
| 2016 | ✔ (2016-456) | ✔ (2017-042) | ✔ (2017-213) | ✘ | | ||
| 2015 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | | ||
| 2014 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | | ||
| 2013 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | --- | | ||
| 2012 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | --- | | ||
| 2011 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | --- | | ||
| 2010 | ✔ (2016-456) | ✔ (2016-456) | ✔ (2016-456) | --- | | ||
|