# Analysis of public high school names in California

I obtained a list of names of public high schools in California, excluding charter and magnet schools, from the [California Department of Education website](https://www.cde.ca.gov/schooldirectory/).

In [1]:
import pandas as pd
import numpy as np

In [2]:
schools = pd.read_csv("CDESchoolDirectory-classified.csv")

In [3]:
schools.head()

Unnamed: 0,Record Type,Federal School ID,County,District,School,Latitude,Longitude,Street Address,Street City,Street State,Street Zip,Name Classification,Spanish?,Person Occupation,notes
0,School,41,Alameda,Alameda Unified,Alameda High,37.764958,-122.245935,2200 Central Ave.,Alameda,CA,94501-4406,place,n,,
1,School,59,Alameda,Albany City Unified,Albany High,37.896661,-122.292572,603 Key Route Blvd.,Albany,CA,94706-1422,place,n,,
2,School,432,Alameda,Berkeley Unified,Berkeley High,37.868913,-122.2712,1980 Allston Way,Berkeley,CA,94704-1463,place,n,,
3,School,742,Alameda,Castro Valley Unified,Castro Valley High,37.705184,-122.078478,19400 Santa Maria Ave.,Castro Valley,CA,94546-3400,place,n,,
4,School,9273,Alameda,Dublin Unified,Dublin High,37.720996,-121.926391,8151 Village Pkwy.,Dublin,CA,94568-1656,place,n,,


In [4]:
# there are 881 public high schools (excluding charter and magnet schools)
schools.shape

(881, 15)

In [5]:
schools.dtypes

Record Type             object
Federal School ID       object
County                  object
District                object
School                  object
Latitude               float64
Longitude              float64
Street Address          object
Street City             object
Street State            object
Street Zip              object
Name Classification     object
Spanish?                object
Person Occupation       object
notes                   object
dtype: object

## How many schools are named after a place, a person or other?

In [6]:
schools['Name Classification'].value_counts()

Name Classification
place     519
other     208
person    154
Name: count, dtype: int64

In [7]:
place_per = 519/881
place_per = place_per * 100
place_per

58.910329171396135

In [8]:
person_per = 154/881
person_per = person_per * 100
person_per 

17.480136208853576

In [9]:
other_per = 208/881
other_per = other_per * 100
other_per

23.609534619750285

In [10]:
print(f'{round(place_per, 2)} percent of schools are named after a place, {round(person_per, 2)} percent are named after a person and {round(other_per, 2)} percent are named after something else.')

58.91 percent of schools are named after a place, 17.48 percent are named after a person and 23.61 percent are named after something else.
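The three percentages above are typed in by hand from the counts, so they would need to be retyped if the CSV changed. A minimal sketch of an alternative, assuming the same `Name Classification` column, is to let `value_counts(normalize=True)` compute the shares directly. The `toy` DataFrame below is a hypothetical stand-in for `schools`, included only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the real `schools` DataFrame.
toy = pd.DataFrame({
    "Name Classification": ["place", "place", "person", "other", "place"],
})

# normalize=True returns each category's share of the total instead of raw counts;
# multiplying by 100 and rounding turns the shares into percentages.
shares = toy["Name Classification"].value_counts(normalize=True).mul(100).round(2)
print(shares)
```

Run against the real data, `schools['Name Classification'].value_counts(normalize=True)` should reproduce the same figures as the manual arithmetic.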


## What were the occupations of the people schools are named after?

In [11]:
schools['Person Occupation'].value_counts()

Person Occupation
politician           40
school leadership    35
activist             16
landowner            15
explorer              9
founder               7
writer                5
service member        5
civil servant         4
artist                3
naturalist            2
inventor              2
farmer                2
community leader      1
actor                 1
astronaut             1
newspaper owner       1
industrialist         1
civil cervant         1
scientist             1
Name: count, dtype: int64
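The output above lists "civil cervant" separately from "civil servant", which looks like a typo in the classification column that splits one category in two. A minimal sketch of one way to merge them before counting, using `Series.replace` on a hypothetical `toy` DataFrame that stands in for `schools`:

```python
import pandas as pd

# Hypothetical stand-in for the real `schools` DataFrame.
toy = pd.DataFrame({
    "Person Occupation": ["politician", "civil servant", "civil cervant", None, "civil servant"],
})

# Map the misspelled label onto the correct one, then recount.
# Missing values (schools not named after a person) are dropped by value_counts.
cleaned = toy["Person Occupation"].replace({"civil cervant": "civil servant"})
occupation_counts = cleaned.value_counts()
print(occupation_counts)
```

Applied to the real data, this would fold the single "civil cervant" row into "civil servant", giving that category five schools instead of four.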

## How many schools have Spanish names? 

In [12]:
schools['Spanish?'].value_counts()

Spanish?
n    678
y    203
Name: count, dtype: int64

In [13]:
spanish_per = 203/881
spanish_per = spanish_per * 100
spanish_per

23.04199772985244

In [14]:
print(f'{round(spanish_per, 2)} percent of schools have Spanish names.')

23.04 percent of schools have Spanish names.


## Is there a most common school name?

In [15]:
schools['School'].value_counts()

School
John F. Kennedy High    4
Foothill High           4
Liberty High            4
Centennial High         3
Golden Valley High      3
                       ..
Winters High            1
Woodland Senior High    1
Lindhurst High          1
Marysville High         1
Castro Valley High      1
Name: count, Length: 837, dtype: int64
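Three names are tied at the top of the list above, so there is no single most common name. A minimal sketch of how to pull out every name that shares the highest count, rather than reading the ties off the truncated output. The `toy` DataFrame is a hypothetical stand-in for `schools`.

```python
import pandas as pd

# Hypothetical stand-in for the real `schools` DataFrame.
toy = pd.DataFrame({
    "School": ["Liberty High", "Foothill High", "Liberty High", "Foothill High", "Alameda High"],
})

name_counts = toy["School"].value_counts()

# Keep every name whose count equals the maximum, so ties are not hidden.
most_common = name_counts[name_counts == name_counts.max()]
print(most_common)
```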

## What counties have the most schools with Spanish names?

In [16]:
schools.groupby('County')['Spanish?']

<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001D44CD0AF90>

In [17]:
total_schools = schools.groupby('County').size()
total_schools.sort_values(ascending=False)

County
Los Angeles        165
Riverside           52
Orange              52
San Diego           51
San Bernardino      51
Sacramento          36
Santa Clara         34
Kern                33
Fresno              29
Alameda             26
Contra Costa        26
Stanislaus          18
San Mateo           17
Ventura             17
Sonoma              14
Placer              14
Tulare              13
Merced              13
San Joaquin         13
Santa Barbara       11
Monterey            11
San Luis Obispo     10
Solano              10
San Francisco       10
Mendocino            9
Butte                8
Siskiyou             8
Humboldt             8
Madera               8
Marin                8
Tuolumne             7
Santa Cruz           7
Imperial             7
Shasta               7
Kings                6
El Dorado            6
Yolo                 6
Glenn                5
Inyo                 5
Napa                 5
Tehama               4
Sutter               4
Lassen               4
Lake

In [18]:
spanish_schools = schools[schools['Spanish?'] == 'y'].groupby('County').size()
spanish_schools.sort_values(ascending=False)

County
Los Angeles        41
Orange             22
Riverside          16
San Diego          15
Sacramento         13
Santa Clara        10
Contra Costa        9
San Bernardino      8
San Mateo           6
San Luis Obispo     5
Ventura             5
Monterey            5
Santa Barbara       5
Merced              4
Solano              4
Alameda             3
Sonoma              3
Kern                3
Madera              3
Butte               2
El Dorado           2
San Joaquin         2
Tulare              2
Santa Cruz          2
Fresno              2
Calaveras           1
Del Norte           1
Marin               1
Placer              1
Mariposa            1
Imperial            1
San Francisco       1
Tehama              1
Stanislaus          1
Tuolumne            1
Yolo                1
dtype: int64

Los Angeles, Orange, Riverside, San Diego and Sacramento counties have the most Spanish-named schools.

## What county has the highest percentage of schools with Spanish names? 

In [19]:
spanish_per = (spanish_schools / total_schools) * 100
spanish_per.sort_values(ascending=False)

County
Del Norte          100.000000
Calaveras           50.000000
San Luis Obispo     50.000000
Santa Barbara       45.454545
Monterey            45.454545
Orange              42.307692
Solano              40.000000
Madera              37.500000
Sacramento          36.111111
San Mateo           35.294118
Contra Costa        34.615385
Mariposa            33.333333
El Dorado           33.333333
Merced              30.769231
Riverside           30.769231
Ventura             29.411765
San Diego           29.411765
Santa Clara         29.411765
Santa Cruz          28.571429
Butte               25.000000
Tehama              25.000000
Los Angeles         24.848485
Sonoma              21.428571
Yolo                16.666667
San Bernardino      15.686275
Tulare              15.384615
San Joaquin         15.384615
Imperial            14.285714
Tuolumne            14.285714
Marin               12.500000
Alameda             11.538462
San Francisco       10.000000
Kern                 9.090909
Pla

Del Norte County has only one school listed and Calaveras has only two. I'm going to ignore those when ranking counties by percentage of Spanish-named schools so the results aren't skewed.

Among counties that have more than five schools, San Luis Obispo, Santa Barbara, Monterey, Orange and Solano have the highest percentages of schools with Spanish names.
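The "more than five schools" cutoff is applied by eye here. A minimal sketch of doing the same filter in code, assuming the `County` and `Spanish?` columns shown earlier. The `toy` DataFrame and the `min_schools` name are hypothetical. Summing a boolean mask per county also gives 0 for counties with no Spanish-named schools; dividing two separately grouped Series instead leaves those counties as `NaN`, because they are missing from the Spanish-only counts.

```python
import pandas as pd

# Hypothetical stand-in for the real `schools` DataFrame:
# county A has 6 schools (3 Spanish), B has 6 (0 Spanish), C has 1 (1 Spanish).
toy = pd.DataFrame({
    "County": ["A"] * 6 + ["B"] * 6 + ["C"],
    "Spanish?": ["y", "y", "y", "n", "n", "n"] + ["n"] * 6 + ["y"],
})

min_schools = 5  # hypothetical cutoff: keep counties with more than this many schools

# Total schools per county.
total = toy.groupby("County").size()

# Spanish-named schools per county; counties with none get 0 rather than NaN.
spanish = toy["Spanish?"].eq("y").groupby(toy["County"]).sum()

pct = spanish / total * 100

# Drop small counties, then rank by percentage.
ranked = pct[total > min_schools].sort_values(ascending=False)
print(ranked)
```

In this toy data, county C would top the list at 100 percent on the strength of a single school, which is the same distortion Del Norte causes in the real data; the filter removes it.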

## What county has the most schools named after a person? 

In [20]:
schools.groupby('County')['Name Classification']

<pandas.core.groupby.generic.SeriesGroupBy object at 0x000001D44E58BEC0>

In [21]:
person_schools = schools[schools['Name Classification'] == 'person'].groupby('County').size()
person_schools.sort_values(ascending=False)

County
Los Angeles       49
Santa Clara       12
San Francisco      9
San Bernardino     7
Sacramento         7
Stanislaus         7
San Diego          7
Orange             5
Riverside          5
Alameda            5
Fresno             5
Solano             5
Sonoma             4
Contra Costa       4
Kern               3
Colusa             2
San Mateo          2
San Joaquin        2
Ventura            2
Santa Barbara      2
Calaveras          1
Glenn              1
Monterey           1
Merced             1
Marin              1
Madera             1
Lassen             1
Inyo               1
Placer             1
Tuolumne           1
dtype: int64

## What county has the highest percentage of schools named after a person?