## courses + new changes

- run the entire notebook first. the last function defined is `do_everything()`; to get, say, ECON course changes, run `do_everything('ECON')`

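the following are hard-coded (and will need to be updated in future years):
- the base URL that every search starts from (`BASE_PAGE_URL`)
- the academic year strings 20182019 and 20192020 (e.g. next year, change them to 20192020 and 20202021)
- the matching labels `18-19` and `19-20` passed to `find_differences`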

june 26, 2019
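
one way to avoid retyping the year strings by hand is to derive them from a starting year. this is only a sketch; `year_code` and `next_year_code` are hypothetical helpers that are not part of the notebook, and they assume the catalog keeps using the 8-digit `YYYYYYYY` form seen in `academicYear=20182019`.

```python
def year_code(start_year):
    """Build the 8-digit academic year string, e.g. 2018 -> '20182019'."""
    return f'{start_year}{start_year + 1}'


def next_year_code(code):
    """Shift an academic year string forward by one year."""
    start = int(code[:4])  # the first four digits are the starting year
    return year_code(start + 1)


current = year_code(2018)
upcoming = next_year_code(current)
print(current, upcoming)
```

with these, only the starting year would need to change from one year to the next.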


In [1]:
# autumn + winter + spring courses for 2018-2019
BASE_PAGE_URL = 'https://explorecourses.stanford.edu/search?q=LINGUIST&view=catalog&filter-term-Winter=on&filter-departmentcode-LINGUIST=on&filter-catalognumber-LINGUIST=on&academicYear=20182019&filter-term-Autumn=on&filter-term-Spring=on&page=0&filter-coursestatus-Active=on&collapse='

In [2]:
from bs4 import BeautifulSoup
from subprocess import check_output

In [3]:
def get_n_pages(pgs):
    lst = []
    for x in pgs:
        try:
            lst.append(int(x))
        except ValueError:  # skip tokens that are not page numbers
            pass
    return max(lst)

def get_search_results_on_page(url, year, ith):
    new_url = url.replace('page=0', 'page=' + str(ith))
    new_src = check_output([
        'wget',
        '-qO-',
        new_url
    ])
    # name the parser explicitly so results don't depend on which parsers are installed
    parse = BeautifulSoup(new_src, 'html.parser')
    return parse.body.find(
        'div',
        attrs={'id': 'searchResults'}
    )
    
def get_results_list(url, year='20182019'):
    src = check_output([
        'wget',
        '-qO-',
        url
    ])
    # name the parser explicitly so results don't depend on which parsers are installed
    parse = BeautifulSoup(src, 'html.parser')
    n_pages = get_n_pages(parse.body.find(
        'div',
        attrs={'id': 'pagination'}
    ).text.split())
    return [get_search_results_on_page(url, year, ith) for ith in range(n_pages)]
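
the paging logic above can be tried without touching the network. the sketch below mirrors `get_n_pages` and the `page=0` substitution; `count_pages` and `page_urls` are hypothetical names, the pagination text `'1 2 3 next'` is a made-up sample (the real page text may look different), and falling back to one page when no number is found is an assumption of this sketch, not something the notebook does.

```python
def count_pages(tokens):
    """Return the largest integer among the tokens, or 1 if none is numeric."""
    numbers = []
    for token in tokens:
        try:
            numbers.append(int(token))
        except ValueError:
            continue  # e.g. 'next' or other non-numeric link text
    return max(numbers, default=1)


def page_urls(url, n_pages):
    """Build one URL per result page by rewriting the 'page=0' parameter."""
    return [url.replace('page=0', 'page=' + str(i)) for i in range(n_pages)]


sample_tokens = '1 2 3 next'.split()  # made-up pagination text
n = count_pages(sample_tokens)
for u in page_urls('https://example.invalid/search?q=X&page=0&collapse=', n):
    print(u)
```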

In [4]:
results_list = get_results_list(BASE_PAGE_URL)

In [5]:
legal_fields = {
    'Terms',
    'Units',
    'UG Reqs',
    'Grading',
}

def extract_data(fields):
    relevant_name_to_val = {}
    for item in fields:
        idx = item.find(':')
        if idx == -1:
            continue  # no "name: value" separator in this chunk
        field_name = item[:idx]
        if field_name in legal_fields:
            field_val = item[idx + 1:].strip()
            relevant_name_to_val[field_name] = field_val
    return relevant_name_to_val

def fetch_info(course, mp):
    code = course.find(
        'span',
        attrs={'class': 'courseNumber'}
    ).text.strip(':')
    mp[code] = {}
    name = course.find(
        'span',
        attrs={'class': 'courseTitle'}
    ).text
    mp[code]['Title'] = name
    
    attribs = course.find(
        'div',
        attrs={'class': 'courseAttributes'}
    ).text
    fields = [x.strip() for x in ' '.join(attribs.split()).split('|')]
    extracted = extract_data(fields)    
    for field in legal_fields:
        if field in extracted:
            mp[code][field] = extracted[field]
        else:
            mp[code][field] = None

def parse_results_list(results_list):
    course_to_field_to_info_map_map = {}
    for results in results_list:
        all_courses = results.find_all(
            'div',
            attrs={'class': 'courseInfo'}
        )
        for course in all_courses:
            fetch_info(course, course_to_field_to_info_map_map)
    return course_to_field_to_info_map_map
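
to see what `extract_data` and `fetch_info` do with the attribute text, here is a standalone sketch that runs on a plain string instead of a parsed page. `parse_attributes` is a hypothetical helper, and the sample string is made up to match the `name: value | name: value` shape the code above expects.

```python
LEGAL_FIELDS = {'Terms', 'Units', 'UG Reqs', 'Grading'}


def parse_attributes(text):
    """Split 'name: value | name: value' text into a dict of the wanted fields."""
    # collapse runs of whitespace (including newlines), then split on '|'
    chunks = [part.strip() for part in ' '.join(text.split()).split('|')]
    found = {}
    for chunk in chunks:
        name, sep, value = chunk.partition(':')  # split at the first colon only
        if sep and name in LEGAL_FIELDS:
            found[name] = value.strip()
    # fields missing from the text are reported as None, as in fetch_info
    return {name: found.get(name) for name in LEGAL_FIELDS}


sample = 'Terms: Aut, Win | Units: 3-4 |\n   Grading: Letter or Credit/No Credit'
print(parse_attributes(sample))
```

`str.partition` is used here in place of `find(':')` so that a chunk without a colon is skipped cleanly.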

In [6]:
ling_20182019 = parse_results_list(results_list)

In [7]:
url_20192020 = BASE_PAGE_URL.replace('20182019', '20192020')
ling_20192020 = parse_results_list(get_results_list(url_20192020))

In [8]:
import re
from termcolor import cprint

fields = [
    'Title',
    'Terms',
    'Units',
    'UG Reqs',
    'Grading'
]

# natural sort (so 'CS 9' comes before 'CS 41'), from a Stack Overflow answer by Mark Byers
def natural_sort(l): 
    convert = lambda text: int(text) if text.isdigit() else text.lower() 
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(l, key = alphanum_key)

def find_differences(A, B, a, b):
    print('Courses in ' + a + ' but not in ' + b + ':')
    for course in A:
        if course not in B:
            cprint(' - ' + course + '\n   - ' + A[course]['Title'], attrs=['bold'])
    print('\n')
    print('Courses in ' + b + ' but not in ' + a + ':')
    for course in B:
        if course not in A:
            # str() keeps this from raising a TypeError when 'Terms' is None
            cprint(' - ' + course + '\n   - ' + B[course]['Title'] + ' (' + str(B[course]['Terms']) + ')', attrs=['bold'])
    print('\nCourses Retained:')
    overlap = A.keys() & B.keys()
    for course in natural_sort(overlap):
        cprint(' - ' + course, attrs=['bold'])
        for f in fields:
            print('    ' + f + ': ', end=' ')
            if f not in A[course] or f not in B[course]:
                # print the field's value, not the whole course dict
                if f not in A[course]:
                    cprint('None', 'red', end=' ', attrs=['bold'])
                else:
                    cprint(A[course][f], 'red', end=' ', attrs=['bold'])
                if f not in B[course]:
                    cprint('None', 'green', attrs=['bold'])
                else:
                    cprint(B[course][f], 'green', attrs=['bold'])
            elif A[course][f] != B[course][f]:
                cprint(A[course][f], 'red', end=' ', attrs=['bold'])
                cprint(B[course][f], 'green', attrs=['bold'])
            else:
                print(A[course][f])
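
`find_differences` prints its findings directly. if the differences are needed as data (for saving or further filtering), the same comparison can be sketched as a function that returns them. `diff_catalogs` and `natural_key` are hypothetical names, and the two small dicts are toy data (the `CS 999X` entry is a placeholder, not a real course).

```python
import re


def natural_key(code):
    """Sort key that compares digit runs numerically, so 'CS 9' < 'CS 41'."""
    return [int(p) if p.isdecimal() else p.lower()
            for p in re.split(r'([0-9]+)', code)]


def diff_catalogs(old, new):
    """Return (removed codes, added codes, {code: {field: (old, new)}})."""
    removed = sorted(old.keys() - new.keys(), key=natural_key)
    added = sorted(new.keys() - old.keys(), key=natural_key)
    changed = {}
    for code in sorted(old.keys() & new.keys(), key=natural_key):
        names = old[code].keys() | new[code].keys()
        delta = {f: (old[code].get(f), new[code].get(f))
                 for f in sorted(names)
                 if old[code].get(f) != new[code].get(f)}
        if delta:
            changed[code] = delta
    return removed, added, changed


# toy data in the shape produced by parse_results_list
old = {'CS 41': {'Title': 'Hap.py Code', 'Terms': 'Win'},
       'CS 348K': {'Title': 'Visual Computing Systems', 'Terms': 'Aut'}}
new = {'CS 348K': {'Title': 'Visual Computing Systems', 'Terms': 'Spr'},
       'CS 999X': {'Title': 'Placeholder Course', 'Terms': 'Win'}}
removed, added, changed = diff_catalogs(old, new)
print(removed, added, changed)
```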

In [9]:
find_differences(ling_20182019, ling_20192020, 'Ling 18-19', 'Ling 19-20')

Courses in Ling 18-19 but not in Ling 19-20:
 - LINGUIST 55N
   - Language in the City
 - LINGUIST 65
   - African American Vernacular English (AFRICAAM 21, CSRE 21, LINGUIST 265)
 - LINGUIST 83Q
   - Translation
 - LINGUIST 121A
   - The Syntax of English
 - LINGUIST 152
   - Sociolinguistics and Pidgin Creole Studies (LINGUIST 252)
 - LINGUIST 157
   - Sociophonetics (LINGUIST 257)
 - LINGUIST 160
   - Introduction to Language Change
 - LINGUIST 200
   - Foundations of Linguistic Theory
 - LINGUIST 207A
   - Advanced Phonetics
 - LINGUIST 211
   - Metrics
 - LINGUIST 225
   - Seminar in Syntax: Distributed Morphology
 - LINGUIST 236
   - Seminar in Semantics: Causation
 - LINGUIST 247
   - Seminar in Psycholinguistics: Advanced Topics (PSYCH 227)
 - LINGUIST 250
   - Sociolinguistic Theory and Analysis
 - LINGUIST 252
   - Sociolinguistics and Pidgin Creole Studies (LIN

In [10]:
def do_everything(dept_code):
    url = BASE_PAGE_URL.replace('LINGUIST', dept_code)
    url_2 = url.replace('20182019', '20192020')
    a = parse_results_list(get_results_list(url))
    b = parse_results_list(get_results_list(url_2))
    find_differences(a, b, dept_code + ' 18-19', dept_code + ' 19-20')
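
`do_everything` still has the two years and their labels baked in. a sketch of how those inputs could be computed from parameters is below; it only builds the URLs and labels and does no fetching. `short_label` and `comparison_inputs` are hypothetical helpers, and the base URL is a shortened stand-in for `BASE_PAGE_URL`, used here only for illustration.

```python
def short_label(year_code):
    """Turn an 8-digit year string into a short label: '20182019' -> '18-19'."""
    return year_code[2:4] + '-' + year_code[6:8]


def comparison_inputs(base_url, dept_code, old_year, new_year,
                      base_dept='LINGUIST', base_year='20182019'):
    """Return the two search URLs and display labels for one department."""
    # swap the department first, then the year, as do_everything does
    dept_url = base_url.replace(base_dept, dept_code)
    return {
        'old_url': dept_url.replace(base_year, old_year),
        'new_url': dept_url.replace(base_year, new_year),
        'old_label': dept_code + ' ' + short_label(old_year),
        'new_label': dept_code + ' ' + short_label(new_year),
    }


# shortened stand-in for BASE_PAGE_URL (illustration only)
base = ('https://explorecourses.stanford.edu/search?q=LINGUIST'
        '&filter-departmentcode-LINGUIST=on&academicYear=20182019&page=0')
plan = comparison_inputs(base, 'CS', '20182019', '20192020')
print(plan['old_label'], plan['new_label'])
```

the two URLs could then be passed to `get_results_list`, and the two labels to `find_differences`.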

In [11]:
do_everything('CS')

Courses in CS 18-19 but not in CS 19-20:
 - CS 11SI
   - How to Make VR: Introduction to Virtual Reality Design and Development
 - CS 17SI
   - Frontiers in Reproductive Technology
 - CS 18SI
   - Geopolitical Ramifications of Technological Advances
 - CS 19SI
   - Evaluating Education Technology: Developing Frameworks to Make Sense of EdTech
 - CS 21SI
   - AI for Social Good
 - CS 41
   - Hap.py Code: The Python Programming Language
 - CS 43
   - Functional Programming Abstractions
 - CS 47
   - Cross-Platform Mobile Development
 - CS 51
   - CS + Social Good Studio: Designing Social Impact Projects
 - CS 52
   - CS + Social Good Studio
 - CS 53
   - DISCUSSIONS IN TECH FOR GOOD
 - CS 101
   - Introduction to Computing Principles
 - CS 106AJ
   - Programming Methodology in JavaScript
 - CS 106AP
   - Programming Methodology in Python
 - CS 106S
   - Coding for Social Go

    Grading:  Letter (ABCD/NP)
 - CS 196
    Title:  Computer Consulting (VPTL 196)
    Terms:  Win, Spr
    Units:  2
    UG Reqs:  None
    Grading:  Satisfactory/No Credit
 - CS 198
    Title:  Teaching Computer Science
    Terms:  Aut, Win, Spr
    Units:  3-4
    UG Reqs:  None
    Grading:  Satisfactory/No Credit
 - CS 198B
    Title:  Additional Topics in Teaching Computer Science
    Terms:  Aut, Win, Spr
    Units:  1
    UG Reqs:  None
    Grading:  Satisfactory/No Credit
 - CS 199
    Title:  Independent Work
    Terms:  Aut, Win, Spr, Sum
    Units:  1-6
    UG Reqs:  None
    Grading:  Letter (ABCD/NP)
 - CS 199P
    Title:  Independent Work
    Terms:  Aut, Win, Spr, Sum
    Units:  1-6
    UG Reqs:  None
    Grading:  Satisfactory/No Credit
 - CS 202
    Title:  Law for Computer Science Professionals
    Terms:  Aut
    Units:  1
    UG Reqs:  None
    Grading:  Satisfactory/No Credit
 - CS 204
    Title:  Legal Inf

 - CS 348K
    Title:  Visual Computing Systems
    Terms:  Aut (red) Spr (green)
    Units:  3-4
    UG Reqs:  None
    Grading:  Letter or Credit/No Credit
 - CS 349F
    Title:  Technology for Financial Systems
    Terms:  Spr
    Units:  1
    UG Reqs:  None
    Grading:  Satisfactory/No Credit
 - CS 350
    Title:  Secure Compilation
    Terms:  Win (red) Spr (green)
    Units:  3
    UG Reqs:  None
    Grading:  Letter or Credit/No Credit
 - CS 354
    Title:  Topics in Intractability: Unfulfilled Algorithmic Fantasies
    Terms:  Spr
    Units:  3
    UG Reqs:  None
    Grading:  Letter or Credit/No Credit
 - CS 355
    Title:  Advanced Topics in Cryptography
    Terms:  Spr
    Units:  3
    UG Reqs:  None
    Grading:  Letter or Credit/No Credit
 - CS 356
    Title:  Topics in Computer and Network Security
    Terms:  Aut
    Units:  3
    UG Reqs:  None
    Grading:  Letter or Credit/No Credit
 - CS 3

In [12]:
do_everything('MATH')

Courses in MATH 18-19 but not in MATH 19-20:
 - MATH 137
   - Mathematical Methods of Classical Mechanics
 - MATH 145
   - Algebraic Geometry
 - MATH 148
   - Algebraic Topology
 - MATH 154
   - Algebraic Number Theory
 - MATH 161
   - Set Theory
 - MATH 234
   - Large Deviations Theory (STATS 374)
 - MATH 235A
   - Topics in combinatorics
 - MATH 237A
   - Topics in Financial Math: Market microstructure and trading algorithms
 - MATH 256B
   - Partial Differential Equations
 - MATH 257C
   - Symplectic Geometry and Topology
 - MATH 263C
   - Topics in Representation Theory
 - MATH 273
   - Topics in Mathematical Physics (STATS 359)
 - MATH 275
   - Topics in Applied Mathematics: A World of Flows
 - MATH 305
   - Applied mathematics through toys and magic


Courses in MATH 19-20 but not in MATH 18-19:
 - MATH 142
   - Hyperbolic Geometry (Spr)
 - MATH 147
   - Dif