# 1. Open Data

Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other "open" movements such as open source, open hardware, open content, open government and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov and Data.gov.uk.

## Municipal schools 

source: http://ckan.imd.ufrn.br/

The dataset provides information about public elementary schools and kindergartens of Natal-RN in 2014.

source: http://ckan.imd.ufrn.br/dataset/quadro-das-escolas-e-cmeis-do-municipio


## Data from education units

In [74]:
from pathlib import Path
from tqdm import tqdm
import pandas as pd
import folium
from folium.plugins import HeatMap
import geocoder


In [2]:


# 'CMEI': Centro Municipal de Educação Infantil (Municipal Early Childhood Education Center)
url_school = 'http://ckan.imd.ufrn.br/dataset/9b362c15-832b-4aa9-9dfe-a5e015b3ce54/resource/99e0eef6-e16c-4ed8-bf62-d6bca9626eeb/download/escolas-por-regioes-administrativas.csv'
url_cmei = 'http://ckan.imd.ufrn.br/dataset/9b362c15-832b-4aa9-9dfe-a5e015b3ce54/resource/6d8e8580-bb48-4d75-a55c-4fa04da23919/download/cmeis-por-regioes-administrativas.csv'

df_school = pd.read_csv(url_school, encoding = 'utf-8', sep = ';')
df_cmei = pd.read_csv(url_cmei, encoding = 'utf-8', sep = ';')


In [3]:
# Exploratory data analysis: elementary schools
print(df_school.columns)
print(df_school.shape)
print(df_school.info())
df_school.head()

Index(['Região Administrativa', 'CÓDIGO', 'ESTABELECIMENTO', 'ENDEREÇO', 'Nº',
       'BAIRRO', 'CEP', 'FONE'],
      dtype='object')
(72, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 8 columns):
Região Administrativa    72 non-null object
CÓDIGO                   72 non-null int64
ESTABELECIMENTO          72 non-null object
ENDEREÇO                 72 non-null object
Nº                       68 non-null object
BAIRRO                   72 non-null object
CEP                      72 non-null int64
FONE                     72 non-null int64
dtypes: int64(3), object(5)
memory usage: 4.6+ KB
None


Unnamed: 0,Região Administrativa,CÓDIGO,ESTABELECIMENTO,ENDEREÇO,Nº,BAIRRO,CEP,FONE
0,SUL,24058890,ESC MUL PROF ANTÔNIO SEVERIANO,AV OURO PRETO,2754,NEÓPOLIS,59088690,32324762
1,SUL,24058912,ESC MUL PROF ARNALDO MONTEIRO BEZERRA,ARACITABA,2993,NEÓPOLIS,59084080,32324763
2,SUL,24060690,ESC MUL PROF ASCENDINO DE ALMEIDA,RUA JOAQUIM CARDOSO,,PITIMBU,59069010,32324767
3,SUL,24058793,ESC MUL PROF CARLOS BELLO MORENO,RUA ARAPIRACA,SN,NEÓPOLIS,59086210,32324761
4,SUL,24075710,ESC MUL PROF OTTO DE BRITO GUERRA,RUA SERRA DA JUREMA,SN,PITIMBU,59068150,32328373


In [4]:
# Exploratory data analysis: CMEIs
print(df_cmei.columns)
print(df_cmei.shape)
print(df_cmei.info())
df_cmei.head()

Index(['Região Administrativa', 'CÓDIGO', 'ESTABELECIMENTO', 'ENDEREÇO', 'Nº',
       'BAIRRO', 'CEP', 'FONE'],
      dtype='object')
(73, 8)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73 entries, 0 to 72
Data columns (total 8 columns):
Região Administrativa    73 non-null object
CÓDIGO                   73 non-null int64
ESTABELECIMENTO          73 non-null object
ENDEREÇO                 72 non-null object
Nº                       72 non-null object
BAIRRO                   73 non-null object
CEP                      73 non-null int64
FONE                     67 non-null float64
dtypes: float64(1), int64(2), object(5)
memory usage: 4.6+ KB
None


Unnamed: 0,Região Administrativa,CÓDIGO,ESTABELECIMENTO,ENDEREÇO,Nº,BAIRRO,CEP,FONE
0,SUL,24077720,CMEI CLAUDETE COSTA MACIEL,RUA SERRA DOS CARAJAS,3160,PITIMBU,59068200,32328403.0
1,SUL,24077739,CMEI HAYDEE MONTEIRO BEZERRA DE MELO,RUA JOSÉ SELEDON,70,PONTA NEGRA,59090215,32328413.0
2,SUL,24056936,CMEI KÁTIA FAGUNDES GARCIA,RUA PROFESSORA ANA DJANIRA,1960,CANDELÁRIA,59064480,87291989.0
3,SUL,24077267,CMEI MARIA CELONI CAMPOS,RUA BAIA FORMOSÁ,1517,LAGOA NOVA II,59063060,32329443.0
4,SUL,24077500,CMEI MOEMA TINOCO DA CUNHA LIMA,RUA JACUI,217,NEÓPOLIS,59080270,32328376.0


In [5]:
# Replace NaN values in the 'Nº' column with empty strings
df_school['Nº'] = df_school['Nº'].fillna('')

# Replace NaN values in the 'ENDEREÇO' and 'FONE' columns with empty strings
df_cmei['ENDEREÇO'] = df_cmei['ENDEREÇO'].fillna('')
df_cmei['FONE'] = df_cmei['FONE'].fillna('')

In [6]:
df_school.info()
df_cmei.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 72 entries, 0 to 71
Data columns (total 8 columns):
Região Administrativa    72 non-null object
CÓDIGO                   72 non-null int64
ESTABELECIMENTO          72 non-null object
ENDEREÇO                 72 non-null object
Nº                       72 non-null object
BAIRRO                   72 non-null object
CEP                      72 non-null int64
FONE                     72 non-null int64
dtypes: int64(3), object(5)
memory usage: 4.6+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 73 entries, 0 to 72
Data columns (total 8 columns):
Região Administrativa    73 non-null object
CÓDIGO                   73 non-null int64
ESTABELECIMENTO          73 non-null object
ENDEREÇO                 73 non-null object
Nº                       72 non-null object
BAIRRO                   73 non-null object
CEP                      73 non-null int64
FONE                     73 non-null object
dtypes: int64(2), object(6)
memory usage: 4.6+ KB

In [7]:
# Group the data by region

schools_by_region = pd.DataFrame(df_school.groupby(['Região Administrativa'])['CÓDIGO'].count()).reset_index()
schools_by_region.rename(columns = {
    'Região Administrativa': 'region',
    'CÓDIGO': 'amount'
}, inplace = True)

cmeis_by_region = pd.DataFrame(df_cmei.groupby(['Região Administrativa'])['CÓDIGO'].count()).reset_index()
cmeis_by_region.rename(columns = {
    'Região Administrativa': 'region',
    'CÓDIGO': 'amount'
}, inplace = True)

In [8]:
schools_by_region.head()

Unnamed: 0,region,amount
0,LESTE,9
1,NORTE,34
2,OESTE,21
3,SUL,8


In [9]:
cmeis_by_region.head()

Unnamed: 0,region,amount
0,LESTE,10
1,NORTE,27
2,OESTE,24
3,SUL,12


## Geocoding for education units

In [64]:
# Prepare elementary schools geolocation data

# Cast the CEP column to str
df_school['CEP'] = df_school['CEP'].astype(str)

df_school['GEOCODE_INPUT'] = df_school['ENDEREÇO'] + ', ' + df_school['Nº'] + ' ,NATAL-RN'
df_school['LAT'], df_school['LNG'] = [0, 0]
df_school['Type'] = 'Elem'

schools_geolocation = df_school[['ESTABELECIMENTO', 'Type', 'GEOCODE_INPUT', 'LAT', 'LNG']]

schools_geolocation = schools_geolocation.rename(columns = {'ESTABELECIMENTO':'NAME'})

schools_geolocation.head()

Unnamed: 0,NAME,Type,GEOCODE_INPUT,LAT,LNG
0,ESC MUL PROF ANTÔNIO SEVERIANO,Elem,"AV OURO PRETO, 2754 ,NATAL-RN",0,0
1,ESC MUL PROF ARNALDO MONTEIRO BEZERRA,Elem,"ARACITABA, 2993 ,NATAL-RN",0,0
2,ESC MUL PROF ASCENDINO DE ALMEIDA,Elem,"RUA JOAQUIM CARDOSO, ,NATAL-RN",0,0
3,ESC MUL PROF CARLOS BELLO MORENO,Elem,"RUA ARAPIRACA, SN ,NATAL-RN",0,0
4,ESC MUL PROF OTTO DE BRITO GUERRA,Elem,"RUA SERRA DA JUREMA, SN ,NATAL-RN",0,0


In [65]:
# Prepare CMEI geolocation data

# Cast the CEP column to str
df_cmei['CEP'] = df_cmei['CEP'].astype(str)


df_cmei['GEOCODE_INPUT'] = df_cmei['ENDEREÇO'] + ', ' + df_cmei['Nº'] + ' ,NATAL-RN' 
df_cmei['Type'] = 'Cmei'


df_cmei['LAT'], df_cmei['LNG'] = [0, 0]
cmeis_geolocation = df_cmei[['ESTABELECIMENTO', 'Type', 'GEOCODE_INPUT', 'LAT', 'LNG']]

cmeis_geolocation = cmeis_geolocation.rename(columns = {'ESTABELECIMENTO':'NAME'})

cmeis_geolocation.head()

Unnamed: 0,NAME,Type,GEOCODE_INPUT,LAT,LNG
0,CMEI CLAUDETE COSTA MACIEL,Cmei,"RUA SERRA DOS CARAJAS, 3160 ,NATAL-RN",0,0
1,CMEI HAYDEE MONTEIRO BEZERRA DE MELO,Cmei,"RUA JOSÉ SELEDON, 70 ,NATAL-RN",0,0
2,CMEI KÁTIA FAGUNDES GARCIA,Cmei,"RUA PROFESSORA ANA DJANIRA, 1960 ,NATAL-RN",0,0
3,CMEI MARIA CELONI CAMPOS,Cmei,"RUA BAIA FORMOSÁ, 1517 ,NATAL-RN",0,0
4,CMEI MOEMA TINOCO DA CUNHA LIMA,Cmei,"RUA JACUI, 217 ,NATAL-RN",0,0


In [66]:
# Group the dataframes
frames = [schools_geolocation, cmeis_geolocation]
df_geo = pd.concat(frames, ignore_index = True) 
df_geo.head()

Unnamed: 0,NAME,Type,GEOCODE_INPUT,LAT,LNG
0,ESC MUL PROF ANTÔNIO SEVERIANO,Elem,"AV OURO PRETO, 2754 ,NATAL-RN",0,0
1,ESC MUL PROF ARNALDO MONTEIRO BEZERRA,Elem,"ARACITABA, 2993 ,NATAL-RN",0,0
2,ESC MUL PROF ASCENDINO DE ALMEIDA,Elem,"RUA JOAQUIM CARDOSO, ,NATAL-RN",0,0
3,ESC MUL PROF CARLOS BELLO MORENO,Elem,"RUA ARAPIRACA, SN ,NATAL-RN",0,0
4,ESC MUL PROF OTTO DE BRITO GUERRA,Elem,"RUA SERRA DA JUREMA, SN ,NATAL-RN",0,0
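
The concatenation used for `GEOCODE_INPUT` leaves a dangling comma when `Nº` is empty (for example `RUA JOAQUIM CARDOSO, ,NATAL-RN`) and produces NaN when one of the parts is still missing. A minimal sketch of a more defensive way to build the query string follows. The helper name `build_geocode_query` is hypothetical, and treating `SN` as "no number" is an assumption about the dataset, not something the source states.

```python
import math


def build_geocode_query(street, number, city="NATAL-RN"):
    """Join the address parts that are present into one query string."""

    def is_missing(value):
        # Treat None, NaN and blank strings as missing.
        if value is None:
            return True
        if isinstance(value, float) and math.isnan(value):
            return True
        return str(value).strip() == ""

    parts = []
    if not is_missing(street):
        parts.append(str(street).strip())
    # Assumption: 'SN' marks an address without a number, so it is left out.
    if not is_missing(number) and str(number).strip().upper() != "SN":
        parts.append(str(number).strip())
    parts.append(city)
    return ", ".join(parts)


print(build_geocode_query("AV OURO PRETO", "2754"))               # AV OURO PRETO, 2754, NATAL-RN
print(build_geocode_query("RUA ARAPIRACA", "SN"))                 # RUA ARAPIRACA, NATAL-RN
print(build_geocode_query("RUA JOAQUIM CARDOSO", float("nan")))   # RUA JOAQUIM CARDOSO, NATAL-RN
```

With the DataFrames above, this could be applied row by row, for instance `df_school.apply(lambda r: build_geocode_query(r['ENDEREÇO'], r['Nº']), axis=1)`. Changing the query strings may change what the geocoder returns, so the results shown in this notebook would not necessarily be reproduced.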


In [67]:
# Geocode each address and store its latitude and longitude
for i in tqdm(range(len(df_geo))):
    g = geocoder.google(df_geo.loc[i,'GEOCODE_INPUT'])
    df_geo.loc[i,'LAT'] = g.lat
    df_geo.loc[i,'LNG'] = g.lng

100%|██████████| 145/145 [02:38<00:00,  1.03s/it]
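
Some lookups return no coordinates, and a network error in the middle of the loop would stop the whole run. The sketch below shows one possible way to isolate failures per address. The function `geocode_all` and the stand-in `fake_geocoder` are hypothetical names introduced for illustration; the only thing assumed about the real geocoder is what the loop above already relies on, namely that the result exposes `lat` and `lng` attributes.

```python
import time
from types import SimpleNamespace


def geocode_all(queries, geocode_fn, pause=0.0):
    """Return one (lat, lng) tuple per query; (None, None) when a lookup fails."""
    results = []
    for query in queries:
        try:
            g = geocode_fn(query)
            lat, lng = getattr(g, "lat", None), getattr(g, "lng", None)
        except Exception:
            # A failed request should not abort the remaining lookups.
            lat, lng = None, None
        results.append((lat, lng))
        if pause:
            # Optional delay between requests to stay under provider rate limits.
            time.sleep(pause)
    return results


# Stand-in geocoder with made-up coordinates, used only to exercise the helper.
def fake_geocoder(query):
    if "UNKNOWN" in query:
        return SimpleNamespace(lat=None, lng=None)
    return SimpleNamespace(lat=-5.79, lng=-35.21)


coords = geocode_all(["AV OURO PRETO, 2754 ,NATAL-RN", "UNKNOWN STREET"], fake_geocoder)
print(coords)  # [(-5.79, -35.21), (None, None)]
```

In the notebook, `geocode_fn` could be `geocoder.google` and the result could be written back with `df_geo['LAT'] = [lat for lat, _ in coords]` and `df_geo['LNG'] = [lng for _, lng in coords]`. Depending on the provider, an API key or a pause between requests may be needed.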


In [68]:
df_geo.head()

Unnamed: 0,NAME,Type,GEOCODE_INPUT,LAT,LNG
0,ESC MUL PROF ANTÔNIO SEVERIANO,Elem,"AV OURO PRETO, 2754 ,NATAL-RN",-5.87058,-35.214868
1,ESC MUL PROF ARNALDO MONTEIRO BEZERRA,Elem,"ARACITABA, 2993 ,NATAL-RN",-5.868884,-35.197566
2,ESC MUL PROF ASCENDINO DE ALMEIDA,Elem,"RUA JOAQUIM CARDOSO, ,NATAL-RN",-5.851115,-35.240676
3,ESC MUL PROF CARLOS BELLO MORENO,Elem,"RUA ARAPIRACA, SN ,NATAL-RN",-5.863587,-35.206993
4,ESC MUL PROF OTTO DE BRITO GUERRA,Elem,"RUA SERRA DA JUREMA, SN ,NATAL-RN",-5.855924,-35.246218


In [69]:
len(df_geo[~df_geo['LNG'].isnull()])

135

In [70]:
len(df_geo[df_geo['LNG'].isnull()])

10

In [71]:
# Keep only the units that were geocoded successfully and renumber the rows
aux_df_geo = df_geo[~df_geo['LNG'].isnull()].reset_index(drop=True)


In [72]:
aux_df_geo.head()

Unnamed: 0,NAME,Type,GEOCODE_INPUT,LAT,LNG
0,ESC MUL PROF ANTÔNIO SEVERIANO,Elem,"AV OURO PRETO, 2754 ,NATAL-RN",-5.87058,-35.214868
1,ESC MUL PROF ARNALDO MONTEIRO BEZERRA,Elem,"ARACITABA, 2993 ,NATAL-RN",-5.868884,-35.197566
2,ESC MUL PROF ASCENDINO DE ALMEIDA,Elem,"RUA JOAQUIM CARDOSO, ,NATAL-RN",-5.851115,-35.240676
3,ESC MUL PROF CARLOS BELLO MORENO,Elem,"RUA ARAPIRACA, SN ,NATAL-RN",-5.863587,-35.206993
4,ESC MUL PROF OTTO DE BRITO GUERRA,Elem,"RUA SERRA DA JUREMA, SN ,NATAL-RN",-5.855924,-35.246218


In [73]:
# Create map object
map_osm = folium.Map(
    location = [-5.791659, -35.228385],
    zoom_start= 12
)

unit_type_colors = {
    'Elem': 'green',
    'Cmei': 'red',
}


# Add one marker per education unit, colored by unit type
for i in tqdm(range(len(aux_df_geo))):
    folium.Marker([aux_df_geo.loc[i,'LAT'], aux_df_geo.loc[i,'LNG']],
            icon = folium.Icon(color = unit_type_colors[aux_df_geo.loc[i, 'Type']],
                               icon = 'info-sign'),
            popup = aux_df_geo.loc[i, 'NAME']
        ).add_to(map_osm)
map_osm

100%|██████████| 135/135 [00:03<00:00, 36.50it/s]


In [78]:
aux_df_geo.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135 entries, 0 to 134
Data columns (total 5 columns):
NAME             135 non-null object
Type             135 non-null object
GEOCODE_INPUT    134 non-null object
LAT              135 non-null float64
LNG              135 non-null float64
dtypes: float64(2), object(3)
memory usage: 5.4+ KB


In [87]:

coordinates_elem = []
coordinates_cmei = []

# Collect [lat, lng, weight] triples per unit type, with a constant weight of 1
for i in tqdm(range(len(aux_df_geo))):
    if (aux_df_geo.loc[i,'Type'] == 'Elem'):
        coordinates_elem.append([aux_df_geo.loc[i,'LAT'], aux_df_geo.loc[i,'LNG'], 1])
    else:
        coordinates_cmei.append([aux_df_geo.loc[i,'LAT'], aux_df_geo.loc[i,'LNG'], 1])
        
# Create map object
map_elem = folium.Map(
    location = [-5.791659, -35.228385],
    zoom_start= 12
)

# Create map object
map_cmei = folium.Map(
    location = [-5.791659, -35.228385],
    zoom_start= 12
)

HeatMap(coordinates_elem).add_to(map_elem)
HeatMap(coordinates_cmei).add_to(map_cmei)


100%|██████████| 135/135 [00:00<00:00, 32562.60it/s]


<folium.plugins.heat_map.HeatMap at 0x11316b278>

In [88]:
map_elem

In [89]:
map_cmei

# Assessment

- Generate heatmap figures for the elementary schools and CMEIs of Natal-RN, using the number of teachers in each education unit as the heatmap weight.

In [90]:

# 'CMEI': Centro Municipal de Educação Infantil (Municipal Early Childhood Education Center)
url_prof_school = 'http://ckan.imd.ufrn.br/dataset/0cc7f31d-1fe7-4232-82fd-0ef356d62342/resource/c06090bb-506c-4193-a115-02840d6635ea/download/funcao-docente-do-ens.-fundamental-por-estabelecimento.csv'
url_prof_cmei = 'http://ckan.imd.ufrn.br/dataset/0cc7f31d-1fe7-4232-82fd-0ef356d62342/resource/06356b96-82c3-4969-a072-63029dd76a97/download/funcao-docente-do-ens.-infantil-por-estabelecimento.csv'

df_prof_school = pd.read_csv(url_prof_school, encoding = 'utf-8', sep = ';')
df_prof_cmei = pd.read_csv(url_prof_cmei, encoding = 'utf-8', sep = ';')

In [96]:
df_prof_school.head()

Unnamed: 0,Estabelecimento,Ens. Fundamental,Ens. Médio Magistério,Ens. Médio,Ens. Superio,Especialização,Mestrado,Doutorado,Nenhum,Total
0,ESC MUL 4º CENTENÁRIO,0,0,0,28,12,2,0,14,56
1,ESC MUL CELESTINO PIMENTEL,0,0,0,34,24,1,0,10,69
2,ESC MUL CHICO SANTEIRO,0,0,0,16,6,0,0,10,32
3,ESC MUL DJALMA MARANHÃO,0,0,0,24,12,2,0,12,50
4,ESC MUL ESTUDANTE EMMANUEL BEZERRA,0,0,0,45,32,2,0,13,92


In [92]:
df_prof_cmei.head()

Unnamed: 0,Estabelecimento,Ens. Fundamental,Ens. Médio Magistério,Ens. Médio,Ens. Superio,Especialização,Mestrado,Doutorado,Nenhum
0,CMEI AMOR DE MÃE,0,0,0,27,1,0,0,26
1,CMEI BELCHIOR JORGE DE SÁ,0,0,0,13,1,0,0,12
2,CMEI BOM SAMARITANO,0,0,0,6,1,0,0,5
3,CMEI CARMEM FERNANDES PEDROZA,0,0,0,14,5,0,0,9
4,CMEI CLARA CAMARÃO,0,0,0,11,1,0,0,10


In [95]:
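# Total teachers per school: sum the numeric columns only (run once, or 'Total' itself gets added in)
df_prof_school['Total'] = df_prof_school.sum(axis=1, numeric_only=True)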
df_prof_school.head()

In [97]:
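# Total teachers per CMEI: sum the numeric columns only (run once, or 'Total' itself gets added in)
df_prof_cmei['Total'] = df_prof_cmei.sum(axis=1, numeric_only=True)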
df_prof_cmei.head()

Unnamed: 0,Estabelecimento,Ens. Fundamental,Ens. Médio Magistério,Ens. Médio,Ens. Superio,Especialização,Mestrado,Doutorado,Nenhum,Total
0,CMEI AMOR DE MÃE,0,0,0,27,1,0,0,26,54
1,CMEI BELCHIOR JORGE DE SÁ,0,0,0,13,1,0,0,12,26
2,CMEI BOM SAMARITANO,0,0,0,6,1,0,0,5,12
3,CMEI CARMEM FERNANDES PEDROZA,0,0,0,14,5,0,0,9,28
4,CMEI CLARA CAMARÃO,0,0,0,11,1,0,0,10,22
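
One possible way to approach the assessment is to join the teacher totals to the geocoded units by name and pass `[lat, lng, weight]` triples to `HeatMap`, in the same format used for the unweighted maps. The sketch below is not the reference solution: `weighted_points` is a hypothetical helper, the toy rows use made-up coordinates, and it assumes the names in `NAME` and `Estabelecimento` match once whitespace and case are normalized, which the source does not confirm.

```python
import pandas as pd


def weighted_points(geo, teachers):
    """Return [lat, lng, total_teachers] rows for units present in both frames."""
    # Assumption: unit names match across datasets after trimming and upper-casing.
    left = geo.assign(_key=geo['NAME'].str.strip().str.upper())
    right = teachers.assign(_key=teachers['Estabelecimento'].str.strip().str.upper())
    # The inner join drops units that lack either coordinates or teacher counts.
    merged = left.merge(right[['_key', 'Total']], on='_key', how='inner')
    return merged[['LAT', 'LNG', 'Total']].astype(float).values.tolist()


# Toy data with made-up coordinates, only to show the expected shapes.
geo = pd.DataFrame({
    'NAME': ['ESC A', 'CMEI B', 'ESC C'],
    'Type': ['Elem', 'Cmei', 'Elem'],
    'LAT': [-5.87, -5.80, -5.85],
    'LNG': [-35.21, -35.20, -35.24],
})
teachers = pd.DataFrame({
    'Estabelecimento': ['ESC A ', 'ESC Z'],
    'Total': [56, 10],
})

points = weighted_points(geo, teachers)
print(points)  # [[-5.87, -35.21, 56.0]]
```

With the notebook's data, `weighted_points(aux_df_geo[aux_df_geo['Type'] == 'Elem'], df_prof_school)` would give the list for the elementary-school map and the `'Cmei'` rows with `df_prof_cmei` the list for the CMEI map; each list can then be added to a map with `HeatMap(points).add_to(...)` as in the earlier cell. Comparing `len(points)` with the number of geocoded units shows how many names failed to match.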
