# Lesson 5.3: Getting to Know the BACEN APIs


## The Portal
The [Banco Central's Brazilian Open Data Portal](https://dadosabertos.bcb.gov.br/dataset) is the channel the BC (Banco Central do Brasil) uses to publish public data and information. It was created to help users locate the datasets they are interested in, understand how those data are structured, and find out how to access them. Once a dataset has been located and its structure understood, the user can access the data itself with an analysis tool or a programming language. The Portal provides data in raw form, without visual formatting, to make machine processing easier.


## Available Data
These are data stored in databases maintained by the Banco Central do Brasil, provided they are not subject to any access restriction.

Also available are the National Financial System (Sistema Financeiro Nacional) data: public-interest information not subject to secrecy, published in open format by financial institutions and other institutions authorized to operate by the BC, with the aim of promoting competitiveness, transparency, and innovation in the financial sector.

In [1]:
import requests                    # HTTP client used to call the APIs
import json
import pandas as pd

# Fetch the data
url = 'https://olinda.bcb.gov.br/olinda/servico/Informes_Agencias/versao/v1/odata/Agencias?$format=json&$select=Segmento,MunicipioIbge,Municipio,UF'
response = requests.get(url)
#bcbJson = response.json()
print(response)

<Response [200]>


It is always good practice to check that the request returned status 200 (OK).
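As a minimal sketch of that check, `raise_for_status()` from `requests` can replace visual inspection of the printed response. The helper name `ensure_ok` is hypothetical, and the `Response` objects are built by hand so the example runs without a network call.

```python
import requests

def ensure_ok(response):
    """Raise requests.HTTPError for 4xx/5xx responses; return the response otherwise."""
    response.raise_for_status()
    return response

# Offline demonstration with hand-built Response objects (no network call).
ok = requests.Response()
ok.status_code = 200

not_found = requests.Response()
not_found.status_code = 404

ensure_ok(ok)  # passes silently
try:
    ensure_ok(not_found)
    error_raised = False
except requests.HTTPError:
    error_raised = True
print(error_raised)
```

In a notebook, calling `ensure_ok(requests.get(url))` would stop the cell at the failing request instead of letting a later `.json()` call fail with a less obvious error.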

In [2]:
agencias = response.json()
print(agencias)

{'@odata.context': 'https://was-p.bcnet.bcb.gov.br/olinda/servico/Informes_Agencias/versao/v1/odata$metadata#Agencias(Segmento,MunicipioIbge,Municipio,UF)', 'value': [{'Segmento': 'Caixa Econômica Federal', 'MunicipioIbge': '2107506', 'Municipio': 'PACO DO LUMIAR', 'UF': 'MA'}, {'Segmento': 'Caixa Econômica Federal', 'MunicipioIbge': '3523305', 'Municipio': 'ITARIRI', 'UF': 'SP'}, {'Segmento': 'Banco Múltiplo', 'MunicipioIbge': '3501608', 'Municipio': 'AMERICANA', 'UF': 'SP'}, {'Segmento': 'Banco Múltiplo', 'MunicipioIbge': '3548708', 'Municipio': 'SAO BERNARDO DO CAMPO', 'UF': 'SP'}, {'Segmento': 'Banco Múltiplo', 'MunicipioIbge': '3548708', 'Municipio': 'SAO BERNARDO DO CAMPO', 'UF': 'SP'}, {'Segmento': 'Banco Múltiplo', 'MunicipioIbge': '4322509', 'Municipio': 'VACARIA', 'UF': 'RS'}, {'Segmento': 'Banco Múltiplo', 'MunicipioIbge': '3550308', 'Municipio': 'SAO PAULO', 'UF': 'SP'}, {'Segmento': 'Banco Múltiplo', 'MunicipioIbge': '3300704', 'Municipio': 'CABO FRIO', 'UF': 'RJ'}, {'Segm
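The `@odata.context` field in this output mirrors the `$select` list of the request URL. As a minimal sketch, that URL can be assembled from a dictionary of query options instead of a hand-written string. The helper name `build_olinda_url` is hypothetical, and the `$top` option (assumed here to limit the number of rows, as in standard OData) does not appear in the original notebook.

```python
from urllib.parse import urlencode

BASE_URL = ('https://olinda.bcb.gov.br/olinda/servico/Informes_Agencias/'
            'versao/v1/odata/Agencias')

def build_olinda_url(base_url, select, top=None):
    """Assemble an OData query URL (hypothetical helper, not part of any library)."""
    params = {'$format': 'json', '$select': ','.join(select)}
    if top is not None:
        # Assumption: the service honours the standard OData $top option.
        params['$top'] = top
    # Keep '$' and ',' unescaped so the URL looks like the one in the notebook.
    return base_url + '?' + urlencode(params, safe='$,')

url = build_olinda_url(BASE_URL, ['Segmento', 'MunicipioIbge', 'Municipio', 'UF'])
print(url)
```

Requesting only a handful of rows while exploring keeps the printed output short; the full request can be issued once the structure is understood.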

The response contains information about banking institutions in Brazil. Let's load it into a DataFrame:

In [3]:
dfBancos = pd.json_normalize(agencias['value'])
dfBancos.head()

,Segmento,MunicipioIbge,Municipio,UF
0,Caixa Econômica Federal,2107506,PACO DO LUMIAR,MA
1,Caixa Econômica Federal,3523305,ITARIRI,SP
2,Banco Múltiplo,3501608,AMERICANA,SP
3,Banco Múltiplo,3548708,SAO BERNARDO DO CAMPO,SP
4,Banco Múltiplo,3548708,SAO BERNARDO DO CAMPO,SP


In [4]:
dfBancos.describe()

,Segmento,MunicipioIbge,Municipio,UF
count,18146,18146,18146,18146
unique,21,3149,3070,27
top,Banco Múltiplo,3550308,SAO PAULO,SP
freq,13945,2100,2100,5446


Note that there are 21 categories of segment. Let's compute how many institutions of each type there are in each city. First, let's build a list of the institution types:

In [5]:
fullNameList = pd.unique(dfBancos['Segmento'])
initialList = []
for names in fullNameList:
  initialList.append(''.join([x[0] for x in names.split(' ')]))

instSigla = dict(zip(initialList, fullNameList))

Now let's retrieve data from the IBGE API:

In [6]:
# Fetch the per capita data: Ceará and Pernambuco
url = 'https://servicodados.ibge.gov.br/api/v3/agregados/3974/periodos/2010/variaveis/3948?localidades=N6[N3[23,26]]&classificacao=12085[100543]|58[95253]'
response = requests.get(url)
pib = response.json()
for item in pib:
  for key in item['resultados']:
    pibJson = key
pibJson.pop('classificacoes')
dfPIB = pd.json_normalize(pibJson['series'])
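# Split "City - UF" into two columns; n is passed by keyword because newer pandas versions no longer accept it positionally
dfPIB[['CIDADE','UF']] = dfPIB[dfPIB.columns[3]].str.split(' - ', n=1).tolist()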
dfPIB.rename(columns = {dfPIB.columns[0]:'ID', 
                        dfPIB.columns[-3]: 'PIB'},
             inplace = True)
dfPIB.set_index('ID', inplace = True)

new_columns = (dfPIB.columns.drop('PIB').tolist()) + ['PIB']
dfPIB = dfPIB[new_columns]
dfPIB.drop(dfPIB.columns[0:3], axis=1,inplace=True)

# Fetch the population density: Ceará and Pernambuco
url = "https://servicodados.ibge.gov.br/api/v3/agregados/1301/periodos/2010/variaveis/616?localidades=N6[N3[23,26]]"

response = requests.get(url)
dens = response.json()
for item in dens:
  for key in item['resultados']:
    densJson = key
densJson.pop('classificacoes')
dfDens = pd.json_normalize(densJson['series'])
dfDens.rename(columns = {dfDens.columns[0]:'ID', 
                         dfDens.columns[-1]: 'DENS'},
             inplace = True)
dfDens.set_index('ID', inplace = True)
dfDens.drop(dfDens.columns[0:3], axis=1,inplace=True)


# Fetch the schooling data by group: Ceará and Pernambuco
url = 'https://servicodados.ibge.gov.br/api/v3/agregados/3955/periodos/2010/variaveis/3930?localidades=N6[N3[23,26]]&classificacao=12085[100543]|58[95253]'
response = requests.get(url)
esc = response.json()
for item in esc:
  for key in item['resultados']:
    escJson = key
escJson.pop('classificacoes')
dfEsc = pd.json_normalize(escJson['series'])

dfEsc.rename(columns = {dfEsc.columns[0]:'ID',
                       dfEsc.columns[-1]: 'ESC'},
             inplace = True)
dfEsc.set_index('ID', inplace = True)
dfEsc.drop(dfEsc.columns[0:3], axis=1,inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  errors=errors,
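The three blocks in the cell above repeat the same steps: walk the nested JSON, flatten the series, and index by municipality ID. A minimal sketch of a shared helper follows. The name `series_to_frame` is hypothetical, and the sample payload is hand-written to mimic the layout that the notebook's column indices suggest (`localidade` and `serie` keys inside each series); it is not real API output. Unlike the original loop, which keeps only the last entry of `resultados`, this sketch collects every entry.

```python
import pandas as pd

def series_to_frame(payload, value_name):
    """Flatten an IBGE 'agregados' response into a DataFrame indexed by municipality ID.

    Hypothetical helper; assumes each series has 'localidade' and 'serie' keys.
    """
    rows = []
    for variable in payload:
        for result in variable['resultados']:
            for series in result['series']:
                # Assumes a single period per request, as in the notebook (2010).
                (value,) = series['serie'].values()
                rows.append({'ID': series['localidade']['id'],
                             'NOME': series['localidade']['nome'],
                             value_name: value})
    return pd.DataFrame(rows).set_index('ID')

# Hand-written sample mimicking the assumed response layout (not real API output).
sample = [{'resultados': [{'classificacoes': [],
                           'series': [{'localidade': {'id': '2300101', 'nome': 'Abaiara - CE'},
                                       'serie': {'2010': '240.9'}},
                                      {'localidade': {'id': '2300150', 'nome': 'Acarape - CE'},
                                       'serie': {'2010': '274.9'}}]}]}]

dfSample = series_to_frame(sample, 'PIB')
print(dfSample)
```

Selecting fields by key rather than by column position makes the code less sensitive to changes in the order of the flattened columns.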


In [7]:
dfPIB.head()

ID,CIDADE,UF,PIB
2300101,Abaiara,CE,240.9
2300150,Acarape,CE,274.9
2300200,Acaraú,CE,276.3
2300309,Acopiara,CE,292.6
2300408,Aiuaba,CE,222.9


In [8]:
dfDens.head()

ID,DENS
2300101,58.69
2300150,95.69
2300200,68.31
2300309,22.7
2300408,6.66


In [9]:
dfEsc.head()

ID,ESC
2300101,73.8
2300150,50.1
2300200,62.7
2300309,81.5
2300408,71.6


Let's build a single DataFrame:

In [10]:
df = dfPIB.merge(dfDens.merge(dfEsc,left_index=True, right_index=True),left_index=True, right_index=True)
df.head()

ID,CIDADE,UF,PIB,DENS,ESC
2300101,Abaiara,CE,240.9,58.69,73.8
2300150,Acarape,CE,274.9,95.69,50.1
2300200,Acaraú,CE,276.3,68.31,62.7
2300309,Acopiara,CE,292.6,22.7,81.5
2300408,Aiuaba,CE,222.9,6.66,71.6


Great. Now let's go back to the banking DataFrame and keep only the records for the states of CE and PE:

In [11]:
dfBancos = pd.concat([dfBancos[dfBancos['UF']=='CE'],dfBancos[dfBancos['UF']=='PE']])
dfBancos.head()

,Segmento,MunicipioIbge,Municipio,UF
38,Banco Múltiplo,2304400,FORTALEZA,CE
127,Banco Múltiplo,2304400,FORTALEZA,CE
164,Banco Múltiplo,2304400,FORTALEZA,CE
211,Banco Múltiplo,2304400,FORTALEZA,CE
227,Banco Múltiplo,2304400,FORTALEZA,CE
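The same selection can also be written with a single boolean mask. The sketch below uses `Series.isin` on a tiny hand-made frame standing in for `dfBancos`; it is an alternative to the `pd.concat` call above, not the notebook's own method.

```python
import pandas as pd

# Tiny hand-made frame standing in for dfBancos.
sample = pd.DataFrame({'Municipio': ['FORTALEZA', 'RECIFE', 'SAO PAULO'],
                       'UF': ['CE', 'PE', 'SP']})

# Keep only rows whose UF is in the list; original row order is preserved.
filtered = sample[sample['UF'].isin(['CE', 'PE'])]
print(filtered)
```

One difference worth noting: `pd.concat` places all CE rows before all PE rows, while the mask keeps the rows in their original order.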


Let's use the `value_counts` method to check the number of records per segment:

In [12]:
dfBancos.value_counts(subset = 'Segmento')

Segmento
Banco Múltiplo                                        741
Caixa Econômica Federal                               177
Sociedade Corretora de Câmbio                          18
Sociedade de Crédito Direto                             3
Sociedade de Crédito ao Microempreendedor               3
Associação de Poupança e Empréstimo                     2
Sociedade Corretora de TVM                              2
BNDES                                                   1
Sociedade de Crédito, Financiamento e Investimento      1
dtype: int64

In [13]:
dfBancos.describe()

,Segmento,MunicipioIbge,Municipio,UF
count,948,948,948,948
unique,9,188,188,2
top,Banco Múltiplo,2611606,RECIFE,PE
freq,741,180,180,505


Right, there are only 9 segment classes for CE and PE. Let's create a list of abbreviations to add to the DataFrame that already contains the PIB (GDP), population density, and schooling data:

In [14]:
fullNameList = pd.unique(dfBancos['Segmento'])
initialList = []
for names in fullNameList:
  initialList.append(''.join([x[0] for x in names.split(' ')]))

`fullNameList`: the segment classes

`initialList`: the initials of each segment
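To make the abbreviation rule concrete, here is a minimal sketch of the same logic as a standalone function applied to a few of the segment names. The function name `initials` and the collision check are additions for illustration.

```python
def initials(name):
    """Join the first character of each space-separated word."""
    return ''.join(word[0] for word in name.split(' '))

names = ['Banco Múltiplo',
         'Caixa Econômica Federal',
         'Sociedade de Crédito, Financiamento e Investimento',
         'BNDES']
abbreviations = [initials(n) for n in names]
print(abbreviations)  # ['BM', 'CEF', 'SdCFeI', 'B']

# Two different names could map to the same initials, so check for collisions
# before using the abbreviations as column names.
has_collision = len(set(abbreviations)) != len(abbreviations)
print(has_collision)  # False
```

Because the abbreviations later become column names, a collision would silently overwrite one segment's counts with another's.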

Now we need to count how many times each class occurs in each city. For example, how many Bancos Múltiplos are there in Juazeiro do Norte-CE? For this, we will use the `groupby` method:

In [15]:
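dfTest = dfBancos.copy()  # work on a copy so dfBancos is left unchanged
for sigla, nome in zip(initialList, fullNameList):
  # count, per municipality, the rows that belong to this segment
  dfTest[sigla] = dfTest[dfTest['Segmento'] == nome].groupby(['MunicipioIbge'])['Segmento'].transform('count')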

What did we do here? For each segment class, we selected its rows in the DataFrame and grouped them by municipality. We then used `transform('count')` to count the rows in each group and stored the counts in a new column named after the initials of the category.
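The behaviour of `transform('count')` is easier to see on a small example. The sketch below uses a hand-made frame with four records; the municipality codes are taken from the notebook's output, but the counts are illustrative only.

```python
import pandas as pd

sample = pd.DataFrame({
    'MunicipioIbge': ['2304400', '2304400', '2304400', '2301109'],
    'Segmento': ['Banco Múltiplo', 'Banco Múltiplo',
                 'Caixa Econômica Federal', 'Banco Múltiplo'],
})

# Select one segment, then count its rows per municipality.
mask = sample['Segmento'] == 'Banco Múltiplo'
counts = sample[mask].groupby('MunicipioIbge')['Segmento'].transform('count')

# transform keeps the index of the selected rows, so assignment aligns by index
# and the rows of other segments are left as NaN.
sample['BM'] = counts
print(sample)
```

This explains both side effects of the loop: every row of a group carries the same count, and rows from other segments receive NaN in that column.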

In [16]:
dfTest.head()

,Segmento,MunicipioIbge,Municipio,UF,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B
38,Banco Múltiplo,2304400,FORTALEZA,CE,124.0,,,,,,,,
127,Banco Múltiplo,2304400,FORTALEZA,CE,124.0,,,,,,,,
164,Banco Múltiplo,2304400,FORTALEZA,CE,124.0,,,,,,,,
211,Banco Múltiplo,2304400,FORTALEZA,CE,124.0,,,,,,,,
227,Banco Múltiplo,2304400,FORTALEZA,CE,124.0,,,,,,,,


The process left us with duplicated rows. Let's drop the duplicated segments within each municipality:

In [17]:
dfTest = dfTest.drop_duplicates(subset=['Segmento','MunicipioIbge'])
dfTest.head()

,Segmento,MunicipioIbge,Municipio,UF,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B
38,Banco Múltiplo,2304400,FORTALEZA,CE,124.0,,,,,,,,
244,Banco Múltiplo,2301109,ARACATI,CE,4.0,,,,,,,,
291,Caixa Econômica Federal,2304400,FORTALEZA,CE,,34.0,,,,,,,
439,Banco Múltiplo,2312304,SAO BENEDITO,CE,3.0,,,,,,,,
479,Banco Múltiplo,2304285,EUSEBIO,CE,4.0,,,,,,,,


Great. However, we now have NaN in the columns that could not be filled. Let's use `fillna(0)` to handle this.

In [18]:
dfTest=dfTest.fillna(0)
dfTest.head()

,Segmento,MunicipioIbge,Municipio,UF,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B
38,Banco Múltiplo,2304400,FORTALEZA,CE,124.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
244,Banco Múltiplo,2301109,ARACATI,CE,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
291,Caixa Econômica Federal,2304400,FORTALEZA,CE,0.0,34.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
439,Banco Múltiplo,2312304,SAO BENEDITO,CE,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
479,Banco Múltiplo,2304285,EUSEBIO,CE,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's rename the columns `MunicipioIbge` and `Municipio` to `ID` and `CIDADE`, and drop `Segmento`, to match the IBGE DataFrame:

In [19]:
dfTest.rename(columns = {dfTest.columns[1]:'ID',
                       dfTest.columns[2]: 'CIDADE'},
             inplace = True)
dfTest.drop(['Segmento'], axis=1,inplace=True)
dfTest.set_index('ID', inplace = True)
dfTest.head()

ID,CIDADE,UF,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B
2304400,FORTALEZA,CE,124.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2301109,ARACATI,CE,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2304400,FORTALEZA,CE,0.0,34.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2312304,SAO BENEDITO,CE,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2304285,EUSEBIO,CE,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Now we need to use `groupby` again to combine the rows that belong to the same city:

In [20]:
dfTest = dfTest.groupby(['ID'])[initialList].sum()
dfTest.head()

ID,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B
2300200,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2300309,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2300705,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2300754,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2301000,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
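A table of counts per municipality and segment can also be produced in one step with `pd.crosstab`. The sketch below is an alternative to the sequence of `transform`, `drop_duplicates`, `fillna`, and `groupby` used in the notebook, shown on a hand-made frame rather than on the real data.

```python
import pandas as pd

sample = pd.DataFrame({
    'MunicipioIbge': ['2304400', '2304400', '2304400', '2301109'],
    'Segmento': ['Banco Múltiplo', 'Banco Múltiplo',
                 'Caixa Econômica Federal', 'Banco Múltiplo'],
})

# Rows: municipalities; columns: segments; cells: number of matching records.
table = pd.crosstab(sample['MunicipioIbge'], sample['Segmento'])
print(table)
```

The columns here carry the full segment names; they could be renamed to the initials with `table.rename(columns=...)` and a name-to-initials dictionary if the shorter labels are preferred.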


We have built a table relating the segment classes to their counts in each city. Let's check the institutions in the city of Fortaleza:

In [21]:
fullNameList

array(['Banco Múltiplo', 'Caixa Econômica Federal',
       'Sociedade Corretora de Câmbio',
       'Sociedade de Crédito ao Microempreendedor',
       'Sociedade de Crédito Direto',
       'Associação de Poupança e Empréstimo',
       'Sociedade Corretora de TVM',
       'Sociedade de Crédito, Financiamento e Investimento', 'BNDES'],
      dtype=object)

In [22]:
dfTest[dfTest.index == '2304400']

ID,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B
2304400,124.0,34.0,4.0,2.0,2.0,1.0,1.0,0.0,0.0


After this preprocessing, let's merge the IBGE and Banco Central datasets:

In [23]:
df = df.merge(dfTest,left_index=True, right_index=True, how='outer').fillna(0)
df.head()

ID,CIDADE,UF,PIB,DENS,ESC,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B
2300101,Abaiara,CE,240.9,58.69,73.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2300150,Acarape,CE,274.9,95.69,50.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2300200,Acaraú,CE,276.3,68.31,62.7,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2300309,Acopiara,CE,292.6,22.7,81.5,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2300408,Aiuaba,CE,222.9,6.66,71.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
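The merge above uses `how='outer'` followed by `fillna(0)` on the whole frame, which would also write 0 into IBGE fields such as `CIDADE` for any ID present only in the bank data. A more conservative variant, sketched below on hand-made frames, keeps every IBGE municipality with `how='left'` and fills only the bank columns. This is an alternative for illustration, not the notebook's method.

```python
import pandas as pd

ibge = pd.DataFrame({'CIDADE': ['Abaiara', 'Acaraú'], 'PIB': ['240.9', '276.3']},
                    index=pd.Index(['2300101', '2300200'], name='ID'))
bank = pd.DataFrame({'BM': [3.0], 'CEF': [1.0]},
                    index=pd.Index(['2300200'], name='ID'))

# how='left' keeps every IBGE municipality; only the bank columns are filled with 0.
merged = ibge.merge(bank, left_index=True, right_index=True, how='left')
merged[bank.columns] = merged[bank.columns].fillna(0)
print(merged)
```

Here a zero means "no branch recorded for this municipality", while any gap in the IBGE columns would remain visible as NaN.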


In [24]:
df.describe()

,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B
count,369.0,369.0,369.0,369.0,369.0,369.0,369.0,369.0,369.0
mean,2.00813,0.479675,0.04878,0.00813,0.00813,0.00542,0.00542,0.00271,0.00271
std,9.643225,2.319634,0.661688,0.116278,0.116278,0.073521,0.073521,0.052058,0.052058
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,2.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,135.0,34.0,12.0,2.0,2.0,1.0,1.0,1.0,1.0


For clustering, we can start with the following columns:

In [25]:
print(df.columns[2:].tolist())

['PIB', 'DENS', 'ESC', 'BM', 'CEF', 'SCdC', 'SdCaM', 'SdCD', 'AdPeE', 'SCdT', 'SdCFeI', 'B']


Let's choose the first six of these columns for clustering:

In [26]:
cols = df.columns[2:8].tolist()
X = df[cols].to_numpy()
print(X)

[['240.9' '58.69' '73.8' 0.0 0.0 0.0]
 ['274.9' '95.69' '50.1' 0.0 0.0 0.0]
 ['276.3' '68.31' '62.7' 3.0 1.0 0.0]
 ...
 ['264.1' '134.78' '77.7' 1.0 0.0 0.0]
 ['425.5' '349.58' '72.1' 5.0 2.0 0.0]
 ['264.2' '127.18' '91.4' 0.0 0.0 0.0]]


In [27]:
df[cols]=df[cols].astype(float)
X = df[cols].to_numpy()
print(X)

[[240.9   58.69  73.8    0.     0.     0.  ]
 [274.9   95.69  50.1    0.     0.     0.  ]
 [276.3   68.31  62.7    3.     1.     0.  ]
 ...
 [264.1  134.78  77.7    1.     0.     0.  ]
 [425.5  349.58  72.1    5.     2.     0.  ]
 [264.2  127.18  91.4    0.     0.     0.  ]]
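The first printout shows that the IBGE values arrive as strings, which is why the cast to `float` is needed. As a hedged alternative, `pd.to_numeric` with `errors='coerce'` converts what it can and marks the rest as NaN. The `'...'` placeholder in the sample below is a hypothetical example of an unparseable value, not something observed in the notebook's data.

```python
import pandas as pd

raw = pd.DataFrame({'PIB': ['240.9', '274.9', '...'],
                    'DENS': ['58.69', '95.69', '68.31']})

# errors='coerce' turns unparseable strings into NaN instead of raising.
numeric = raw.apply(pd.to_numeric, errors='coerce')
print(numeric.dtypes)
print(numeric)
```

With `astype(float)`, a single non-numeric string raises an error for the whole column; with coercion, the affected rows can be inspected or dropped before clustering.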


Running KMeans with 8 clusters:

In [28]:
from sklearn.cluster import KMeans
cl = KMeans(n_clusters=8, random_state=0).fit(X)
clusters = pd.DataFrame(cl.labels_, columns=['cluster'])
detail = pd.merge(df.reset_index(), clusters, right_index=True, left_index=True)
detail['cluster'] = detail['cluster'].astype(str)
detail.head()

,ID,CIDADE,UF,PIB,DENS,ESC,BM,CEF,SCdC,SdCaM,SdCD,AdPeE,SCdT,SdCFeI,B,cluster
0,2300101,Abaiara,CE,240.9,58.69,73.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7
1,2300150,Acarape,CE,274.9,95.69,50.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7
2,2300200,Acaraú,CE,276.3,68.31,62.7,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7
3,2300309,Acopiara,CE,292.6,22.7,81.5,3.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7
4,2300408,Aiuaba,CE,222.9,6.66,71.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,7
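KMeans relies on Euclidean distance, so columns with large magnitudes (such as PIB and DENS) can dominate columns with small counts (such as BM and CEF). A common preprocessing step, not part of the original notebook, is to standardize the features first. The sketch below illustrates this on a small hand-made matrix; `X_demo` and its values are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hand-made matrix: column 0 has a much larger scale than column 1.
X_demo = np.array([[240.9, 0.0],
                   [274.9, 0.0],
                   [425.5, 5.0],
                   [430.0, 6.0]])

# Standardize each column to zero mean and unit variance before clustering.
X_scaled = StandardScaler().fit_transform(X_demo)

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(model.labels_)
```

Whether scaling improves the result for this dataset is an empirical question; comparing the maps produced with and without it is a reasonable next experiment.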


Let's fetch the shapes again:

In [29]:
#Shapes CE
url = 'https://servicodados.ibge.gov.br/api/v3/malhas/estados/23?formato=application/vnd.geo+json&qualidade=minima&intrarregiao=municipio'
response = requests.get(url)
shapesJsonCE = response.json()


#Shapes PE
url = 'https://servicodados.ibge.gov.br/api/v3/malhas/estados/26?formato=application/vnd.geo+json&qualidade=minima&intrarregiao=municipio'
response = requests.get(url)
shapesJsonPE = response.json()
shapesJson = shapesJsonCE
shapesJson['features'] += shapesJsonPE['features']
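In the cell above, `shapesJson` is another name for `shapesJsonCE`, so `+=` also extends `shapesJsonCE['features']` in place. A minimal sketch of a non-mutating merge follows. The helper name `merge_feature_collections` is hypothetical, and the sample collections are hand-written with only the `codarea` property that the map uses as its feature key.

```python
import copy

def merge_feature_collections(*collections):
    """Return a new GeoJSON FeatureCollection with the features of all inputs."""
    merged = {'type': 'FeatureCollection', 'features': []}
    for collection in collections:
        # Deep-copy so later edits to the result never touch the inputs.
        merged['features'].extend(copy.deepcopy(collection['features']))
    return merged

# Minimal hand-written collections; 'codarea' mirrors the property used for the map.
ce = {'type': 'FeatureCollection',
      'features': [{'type': 'Feature', 'properties': {'codarea': '2304400'}, 'geometry': None}]}
pe = {'type': 'FeatureCollection',
      'features': [{'type': 'Feature', 'properties': {'codarea': '2611606'}, 'geometry': None}]}

shapes = merge_feature_collections(ce, pe)
print(len(shapes['features']), len(ce['features']))  # 2 1
```

Keeping the inputs untouched means that re-running only the merging lines in a notebook does not append the PE features a second time.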

In [30]:
import plotly.express as px
from plotly.offline import init_notebook_mode, plot, iplot, download_plotlyjs
br_lat = -6
br_lon = -38.0
brazilMap = px.choropleth_mapbox(detail,
                geojson=shapesJson, 
                locations='ID',
                hover_name = 'CIDADE',
                color="cluster", 
                featureidkey="properties.codarea",
                )
brazilMap.update_layout(mapbox_style="open-street-map", 
                            mapbox_zoom=5, 
                            mapbox_center = {"lat": br_lat, "lon": br_lon},
                            title="Clusters - Kmeans")
brazilMap.show()