# Extracting Data

This section builds a table with the following data for each student: the student identifier (`discente`), the year the student entered the institution, the year and period of the student's last enrollment, the student's current status, and the number of times the student took each course.

Importing pandas and csv, along with matplotlib and seaborn.

In [1913]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import csv
sns.set()

Reading the CSV file and loading its data into a DataFrame, using the semicolon as the separator.

In [1914]:
df_dados = pd.read_csv('dataframe-bsi-2009-2022.csv', sep=';')
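
As a minimal sketch of what `sep=';'` does, the snippet below parses a small semicolon-separated sample from memory. The sample rows and the names `amostra` and `df_amostra` are made up for illustration; they are not taken from `dataframe-bsi-2009-2022.csv`.

```python
import io

import pandas as pd

# Hypothetical two-row sample that mimics three columns of the real file.
amostra = io.StringIO(
    "discente;nome;ano\n"
    "a1;LÓGICA;20091\n"
    "b2;ÉTICA;20092\n"
)

# sep=';' tells pandas to split fields on semicolons instead of commas.
df_amostra = pd.read_csv(amostra, sep=';')
print(df_amostra)
```

Without `sep=';'`, each line would be read as a single field, because the default separator is a comma.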

Listing the columns of the DataFrame.

In [1915]:
df_dados.columns

Index(['discente', 'unidade', 'media_final', 'descricao', 'ano',
       'id_componente', 'nome', 'ch_total', 'sexo', 'ano_nascimento',
       'ano_ingresso', 'status'],
      dtype='object')

# Filters

To narrow the analysis, we start with the mandatory courses of the Bachelor's degree in Information Systems (Bacharelado em Sistemas da Informação, BSI):

In [1916]:
lista_obrigatórias = [
                'ALGORITMOS E LÓGICA DE PROGRAMAÇÃO',
                'INTRODUÇÃO À INFORMÁTICA',
                'FUNDAMENTOS DE MATEMÁTICA',
                'LÓGICA',
                'TEORIA GERAL DA ADMINISTRAÇÃO',
                'PROGRAMAÇÃO',
                'CÁLCULO DIFERENCIAL E INTEGRAL',
                'TEORIA GERAL DOS SISTEMAS',
                'PROGRAMAÇÃO ORIENTADA A OBJETOS I',
                'ESTRUTURA DE DADOS',
                'ÁLGEBRA LINEAR',
                'ORGANIZAÇÃO, SISTEMAS E MÉTODOS',
                'FUNDAMENTOS DE SISTEMAS DE INFORMAÇÃO',
                'PROGRAMAÇÃO WEB',
                'ARQUITETURA DE COMPUTADORES',
                'PROBABILIDADE E ESTATÍSTICA',
                'BANCO DE DADOS',
                'ENGENHARIA DE SOFTWARE I',
                'PROGRAMAÇÃO ORIENTADA A OBJETOS II',
                'SISTEMAS OPERACIONAIS',
                'PROJETO E ADMINISTRAÇÃO DE BANCO DE DADOS',
                'ENGENHARIA DE SOFTWARE II',
                'REDES DE COMPUTADORES',
                'EMPREENDEDORISMO EM INFORMÁTICA',
                'GESTÃO DE PROJETO DE SOFTWARE',
                'PROGRAMAÇÃO VISUAL',
                'MATEMÁTICA FINANCEIRA',
                'SISTEMAS DE APOIO À DECISÃO',
                'ÉTICA',
                ]
condição_nome = f"nome in {lista_obrigatórias}"
df_dados_filtrado = df_dados.query(condição_nome)
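
The same kind of filter can also be written with a boolean mask. The sketch below is an alternative, not the notebook's method; the toy DataFrame, the elective name `TÓPICOS ESPECIAIS`, and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical enrollment records; 'TÓPICOS ESPECIAIS' stands in for an elective.
df_exemplo = pd.DataFrame({
    'discente': ['a1', 'a1', 'b2'],
    'nome': ['LÓGICA', 'TÓPICOS ESPECIAIS', 'ÉTICA'],
})
obrigatorias_exemplo = ['LÓGICA', 'ÉTICA']

# isin() builds a boolean mask that is True where the course name is in the list.
# .copy() makes the result an independent DataFrame, safe to modify later.
filtrado = df_exemplo[df_exemplo['nome'].isin(obrigatorias_exemplo)].copy()
print(filtrado)
```

A mask avoids embedding the whole list inside a query string, which can be easier to read when the list is long.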

Counting the NaN values in each column.

In [1917]:
df_dados_filtrado.isnull().sum()

discente             0
unidade           2034
media_final       4449
descricao            0
ano                  0
id_componente        0
nome                 0
ch_total             0
sexo                 0
ano_nascimento       0
ano_ingresso         0
status               0
dtype: int64

Filling the NaN values in the **unidade** column with 1.

In [1918]:
df_dados_filtrado.loc[:, 'unidade'] = df_dados_filtrado['unidade'].fillna(1)
df_dados_filtrado 

Unnamed: 0,discente,unidade,media_final,descricao,ano,id_componente,nome,ch_total,sexo,ano_nascimento,ano_ingresso,status
0,afba64c0118bfcc8d5b3987e725ed545,1.0,15,REPROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1987,2009,CANCELADO
1,afba64c0118bfcc8d5b3987e725ed545,2.0,15,REPROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1987,2009,CANCELADO
2,afba64c0118bfcc8d5b3987e725ed545,3.0,15,REPROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1987,2009,CANCELADO
3,9526e01da587b20211a39b4e66673aea,1.0,92,APROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1990,2009,CONCLUÍDO
4,9526e01da587b20211a39b4e66673aea,2.0,92,APROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1990,2009,CONCLUÍDO
...,...,...,...,...,...,...,...,...,...,...,...,...
53194,22f4aed4a073c5e9515a8669e9c102f3,2.0,52,APROVADO POR NOTA,20222,62764,PROGRAMAÇÃO VISUAL,60,M,1984,2019,ATIVO - FORMANDO
53195,22f4aed4a073c5e9515a8669e9c102f3,3.0,52,APROVADO POR NOTA,20222,62764,PROGRAMAÇÃO VISUAL,60,M,1984,2019,ATIVO - FORMANDO
53199,943463d2de8c6a60ec5f5d959ba7f1ac,1.0,100,APROVADO,20222,2054400,MATEMÁTICA FINANCEIRA,60,M,1999,2018,CONCLUÍDO
53200,943463d2de8c6a60ec5f5d959ba7f1ac,2.0,100,APROVADO,20222,2054400,MATEMÁTICA FINANCEIRA,60,M,1999,2018,CONCLUÍDO


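Keeping only the rows where **unidade** equals 1. In the output above, each enrollment appears once per unit, and rows with a missing unit were filled with 1, so this should leave one row per enrollment.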

In [1919]:
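# .copy() makes the filtered result independent, so later column assignments do not trigger SettingWithCopyWarning.
df_dados_filtrado = df_dados_filtrado[df_dados_filtrado['unidade'] == 1].copy()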
df_dados_filtrado

Unnamed: 0,discente,unidade,media_final,descricao,ano,id_componente,nome,ch_total,sexo,ano_nascimento,ano_ingresso,status
0,afba64c0118bfcc8d5b3987e725ed545,1.0,15,REPROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1987,2009,CANCELADO
3,9526e01da587b20211a39b4e66673aea,1.0,92,APROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1990,2009,CONCLUÍDO
6,1ed6777bd6ff4fd393e0b334d519c642,1.0,80,APROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1991,2009,CONCLUÍDO
9,cd66757ed4a317a3537ae3e246648778,1.0,73,APROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1975,2009,CANCELADO
12,fa7b20f8ac2312976cd7338487ad527d,1.0,98,APROVADO,20091,2037000,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,90,M,1978,2009,CONCLUÍDO
...,...,...,...,...,...,...,...,...,...,...,...,...
52942,b3c338925bb16486f1bf2a2f1771e464,1.0,70,EXCLUIDA,20222,70879,EMPREENDEDORISMO EM INFORMÁTICA,60,M,2001,2020,ATIVO
53189,7d2dd0d35ebb8319b0c0e612660d2c3a,1.0,93,APROVADO,20222,62766,SISTEMAS DE APOIO À DECISÃO,60,M,2000,2019,CONCLUÍDO
53190,7d2dd0d35ebb8319b0c0e612660d2c3a,1.0,98,APROVADO,20222,62764,PROGRAMAÇÃO VISUAL,60,M,2000,2019,CONCLUÍDO
53193,22f4aed4a073c5e9515a8669e9c102f3,1.0,52,APROVADO POR NOTA,20222,62764,PROGRAMAÇÃO VISUAL,60,M,1984,2019,ATIVO - FORMANDO
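
A small sketch of the fill-then-filter step, under the assumption that a graded enrollment appears once per unit while an ungraded one has a missing unit. The data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data: 'a1' has three unit rows for one enrollment,
# 'b2' has a single row with no unit recorded.
df_unidades = pd.DataFrame({
    'discente': ['a1', 'a1', 'a1', 'b2'],
    'nome': ['LÓGICA', 'LÓGICA', 'LÓGICA', 'ÉTICA'],
    'unidade': [1.0, 2.0, 3.0, None],
})

# Missing units become 1 so that those enrollments survive the filter below.
df_unidades['unidade'] = df_unidades['unidade'].fillna(1)

# Keeping unidade == 1 leaves exactly one row per enrollment in this sample.
uma_linha_por_matricula = df_unidades[df_unidades['unidade'] == 1].copy()
print(uma_linha_por_matricula)
```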


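Collapsing the detailed approval and failure outcomes in **descricao** into APROVADO and REPROVADO.

In [1920]: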
df_dados_filtrado = df_dados_filtrado.replace({'descricao':
               {'APROVADO POR NOTA':'APROVADO',
                'REPROVADO POR FALTAS':'REPROVADO',
                'REPROVADO POR MÉDIA E POR FALTAS':'REPROVADO',
                'REPROVADO POR NOTA E FALTA':'REPROVADO',
                'REPROVADO POR NOTA':'REPROVADO',
                }}, regex=True)
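
With `regex=True`, the dictionary keys are treated as regular-expression patterns rather than exact values. As an alternative sketch (not the notebook's method), the same grouping can be expressed with prefix checks, which do not depend on pattern order. The sample values and variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical outcome labels, including detailed variants.
descricao = pd.Series([
    'APROVADO POR NOTA',
    'REPROVADO POR FALTAS',
    'REPROVADO POR NOTA E FALTA',
    'TRANCADO',
    'APROVADO',
])

# mask() replaces values where the condition is True and keeps the rest.
# 'REPROVADO...' does not start with 'APROVADO', so the two rules never collide.
normalizado = (
    descricao
    .mask(descricao.str.startswith('REPROVADO'), 'REPROVADO')
    .mask(descricao.str.startswith('APROVADO'), 'APROVADO')
)
print(normalizado.tolist())
```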

In [1921]:
df_dados_filtrado.descricao.value_counts()

descricao
APROVADO       8390
REPROVADO      2771
EXCLUIDA       1118
CANCELADO       898
TRANCADO        591
INDEFERIDO      397
DESISTENCIA     178
Name: count, dtype: int64

Counting how many times each student took each course.

In [1922]:
quantidade_disciplinas = df_dados_filtrado.groupby(['discente', 'nome']).size().reset_index(name='quantidade')
quantidade_disciplinas 


Unnamed: 0,discente,nome,quantidade
0,001cea3c82e2010681f2cdeab21e5ecf,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,1
1,001cea3c82e2010681f2cdeab21e5ecf,FUNDAMENTOS DE MATEMÁTICA,1
2,001cea3c82e2010681f2cdeab21e5ecf,INTRODUÇÃO À INFORMÁTICA,1
3,001cea3c82e2010681f2cdeab21e5ecf,LÓGICA,1
4,001cea3c82e2010681f2cdeab21e5ecf,TEORIA GERAL DA ADMINISTRAÇÃO,1
...,...,...,...
10584,ff56f2c5048dae0797fd3e851572b80c,TEORIA GERAL DOS SISTEMAS,2
10585,ff56f2c5048dae0797fd3e851572b80c,ÁLGEBRA LINEAR,1
10586,ffe447b2fe2058d45d5f756349a26f45,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,1
10587,ffe447b2fe2058d45d5f756349a26f45,INTRODUÇÃO À INFORMÁTICA,1
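
The counting step can be illustrated on a toy table. The records below are hypothetical; only the `groupby(...).size().reset_index(name=...)` pattern comes from the notebook.

```python
import pandas as pd

# Hypothetical enrollments: 'a1' took LÓGICA twice and ÉTICA once.
df_matriculas = pd.DataFrame({
    'discente': ['a1', 'a1', 'a1', 'b2'],
    'nome': ['LÓGICA', 'LÓGICA', 'ÉTICA', 'LÓGICA'],
})

# size() counts the rows in each (discente, nome) group;
# reset_index(name=...) turns the result into a regular DataFrame.
contagem = (
    df_matriculas
    .groupby(['discente', 'nome'])
    .size()
    .reset_index(name='quantidade')
)
print(contagem)
```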


Pivoting the courses into columns, one row per student.

In [1923]:
tabela_final = quantidade_disciplinas.pivot(index='discente', columns='nome', values='quantidade').reset_index()
tabela_final

nome,discente,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,ARQUITETURA DE COMPUTADORES,BANCO DE DADOS,CÁLCULO DIFERENCIAL E INTEGRAL,EMPREENDEDORISMO EM INFORMÁTICA,ENGENHARIA DE SOFTWARE I,ENGENHARIA DE SOFTWARE II,ESTRUTURA DE DADOS,FUNDAMENTOS DE MATEMÁTICA,...,PROGRAMAÇÃO VISUAL,PROGRAMAÇÃO WEB,PROJETO E ADMINISTRAÇÃO DE BANCO DE DADOS,REDES DE COMPUTADORES,SISTEMAS DE APOIO À DECISÃO,SISTEMAS OPERACIONAIS,TEORIA GERAL DA ADMINISTRAÇÃO,TEORIA GERAL DOS SISTEMAS,ÁLGEBRA LINEAR,ÉTICA
0,001cea3c82e2010681f2cdeab21e5ecf,1.0,,,,,,,,1.0,...,,,,,,,1.0,,,
1,005c14d7c07bf7980b60c703f99c5ee7,1.0,2.0,1.0,1.0,,1.0,,3.0,3.0,...,,1.0,2.0,1.0,,1.0,1.0,1.0,2.0,1.0
2,0107fd69d8cd7e3d30dede96fb68bfe5,2.0,,,,,,,,1.0,...,,,,,,,1.0,1.0,,1.0
3,014789363f7940922e71e710ee9d22bc,2.0,3.0,1.0,,1.0,1.0,1.0,2.0,3.0,...,1.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0
4,014f0dec46fe7a9c5836527662e1df10,1.0,,1.0,,,,,,1.0,...,1.0,2.0,,,,,1.0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
677,fe802d8d85de6f842749468401d1146c,2.0,,,,,,,,2.0,...,,,,,,,1.0,1.0,,
678,fe87dfa176a74fc10a5cb701b9fb5dd4,,,,,,,2.0,,,...,,,,,,1.0,,,,
679,fec9ed6026d55ecdf514c640312c3d08,1.0,1.0,1.0,1.0,,1.0,,1.0,1.0,...,,2.0,1.0,,,,1.0,1.0,1.0,
680,ff56f2c5048dae0797fd3e851572b80c,4.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,6.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,


Replacing NaN with 0 in the course columns.

In [1924]:
tabela_final = tabela_final.fillna(0)
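
A minimal sketch of the long-to-wide reshaping followed by the zero fill, using hypothetical counts. Casting to `int` after the fill is an optional extra that the notebook does not perform; it simply keeps the counts as whole numbers.

```python
import pandas as pd

# Hypothetical long-format counts: one row per (student, course) pair.
contagem = pd.DataFrame({
    'discente': ['a1', 'a1', 'b2'],
    'nome': ['LÓGICA', 'ÉTICA', 'LÓGICA'],
    'quantidade': [2, 1, 1],
})

# pivot() spreads the course names into columns; pairs that never
# occurred (b2, ÉTICA) show up as NaN.
larga = contagem.pivot(index='discente', columns='nome', values='quantidade')

# A missing pair means the student never took the course, hence 0.
larga = larga.fillna(0).astype(int).reset_index()
larga.columns.name = None  # drop the leftover 'nome' label on the column axis
print(larga)
```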

Adding *ano_ingresso* (year of entry) for each student.

In [1925]:
ano_ingresso_discente = df_dados_filtrado.drop_duplicates(subset=['discente'])[['discente', 'ano_ingresso']]
tabela_final = tabela_final.merge(ano_ingresso_discente, on='discente', how='left')
tabela_final

Unnamed: 0,discente,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,ARQUITETURA DE COMPUTADORES,BANCO DE DADOS,CÁLCULO DIFERENCIAL E INTEGRAL,EMPREENDEDORISMO EM INFORMÁTICA,ENGENHARIA DE SOFTWARE I,ENGENHARIA DE SOFTWARE II,ESTRUTURA DE DADOS,FUNDAMENTOS DE MATEMÁTICA,...,PROGRAMAÇÃO WEB,PROJETO E ADMINISTRAÇÃO DE BANCO DE DADOS,REDES DE COMPUTADORES,SISTEMAS DE APOIO À DECISÃO,SISTEMAS OPERACIONAIS,TEORIA GERAL DA ADMINISTRAÇÃO,TEORIA GERAL DOS SISTEMAS,ÁLGEBRA LINEAR,ÉTICA,ano_ingresso
0,001cea3c82e2010681f2cdeab21e5ecf,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2018
1,005c14d7c07bf7980b60c703f99c5ee7,1.0,2.0,1.0,1.0,0.0,1.0,0.0,3.0,3.0,...,1.0,2.0,1.0,0.0,1.0,1.0,1.0,2.0,1.0,2018
2,0107fd69d8cd7e3d30dede96fb68bfe5,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,2011
3,014789363f7940922e71e710ee9d22bc,2.0,3.0,1.0,0.0,1.0,1.0,1.0,2.0,3.0,...,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2016
4,014f0dec46fe7a9c5836527662e1df10,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,...,2.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2020
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
677,fe802d8d85de6f842749468401d1146c,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,...,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2022
678,fe87dfa176a74fc10a5cb701b9fb5dd4,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2016
679,fec9ed6026d55ecdf514c640312c3d08,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,...,2.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,2020
680,ff56f2c5048dae0797fd3e851572b80c,4.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,6.0,...,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,0.0,2014
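
The deduplicate-then-merge pattern can be sketched as follows. The data is hypothetical, and `validate='one_to_one'` is an optional safeguard added here, not something the notebook uses; it makes pandas raise an error if a student appears more than once on either side.

```python
import pandas as pd

# Hypothetical enrollment records: the entry year repeats on every row.
registros = pd.DataFrame({
    'discente': ['a1', 'a1', 'b2'],
    'ano_ingresso': [2018, 2018, 2020],
})
# Hypothetical wide table with one row per student.
tabela = pd.DataFrame({'discente': ['a1', 'b2'], 'LÓGICA': [2, 1]})

# Keep the first row of each student, then only the two columns needed.
ingresso = registros.drop_duplicates(subset=['discente'])[['discente', 'ano_ingresso']]

# A left merge keeps every row of 'tabela' and attaches the entry year.
tabela = tabela.merge(ingresso, on='discente', how='left', validate='one_to_one')
print(tabela)
```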


Recoding the status column: CONCLUÍDO becomes 1, while CANCELADO, ATIVO and ATIVO - FORMANDO become -1.

In [1926]:
df_dados_filtrado['status'] = df_dados_filtrado['status'].replace({'CANCELADO': -1, 'ATIVO': -1, 'ATIVO - FORMANDO': -1,'CONCLUÍDO': 1})
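
As a hedged alternative sketch, `Series.map` applies the same dictionary but yields NaN for any label missing from it, which makes unexpected status values easy to spot (`replace` leaves them unchanged instead). The sample labels and variable names are assumptions.

```python
import pandas as pd

# Hypothetical status labels, one per student.
status = pd.Series(['CONCLUÍDO', 'ATIVO', 'CANCELADO', 'ATIVO - FORMANDO'])

# Same coding as in the notebook: 1 for completed, -1 otherwise.
codigos = {'CANCELADO': -1, 'ATIVO': -1, 'ATIVO - FORMANDO': -1, 'CONCLUÍDO': 1}

# map() looks each label up in the dictionary; unknown labels become NaN.
status_codificado = status.map(codigos)

# Labels that were not covered by the dictionary (empty in this sample).
nao_mapeados = status[status_codificado.isna()].unique()
print(status_codificado.tolist(), nao_mapeados)
```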

Adding the *status* for each student.

In [1927]:
status_discente = df_dados_filtrado.drop_duplicates(subset=['discente'])[['discente', 'status']]
tabela_final = tabela_final.merge(status_discente, on='discente', how='left')
tabela_final

Unnamed: 0,discente,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,ARQUITETURA DE COMPUTADORES,BANCO DE DADOS,CÁLCULO DIFERENCIAL E INTEGRAL,EMPREENDEDORISMO EM INFORMÁTICA,ENGENHARIA DE SOFTWARE I,ENGENHARIA DE SOFTWARE II,ESTRUTURA DE DADOS,FUNDAMENTOS DE MATEMÁTICA,...,PROJETO E ADMINISTRAÇÃO DE BANCO DE DADOS,REDES DE COMPUTADORES,SISTEMAS DE APOIO À DECISÃO,SISTEMAS OPERACIONAIS,TEORIA GERAL DA ADMINISTRAÇÃO,TEORIA GERAL DOS SISTEMAS,ÁLGEBRA LINEAR,ÉTICA,ano_ingresso,status
0,001cea3c82e2010681f2cdeab21e5ecf,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2018,-1
1,005c14d7c07bf7980b60c703f99c5ee7,1.0,2.0,1.0,1.0,0.0,1.0,0.0,3.0,3.0,...,2.0,1.0,0.0,1.0,1.0,1.0,2.0,1.0,2018,-1
2,0107fd69d8cd7e3d30dede96fb68bfe5,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,2011,-1
3,014789363f7940922e71e710ee9d22bc,2.0,3.0,1.0,0.0,1.0,1.0,1.0,2.0,3.0,...,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,2016,1
4,014f0dec46fe7a9c5836527662e1df10,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2020,-1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
677,fe802d8d85de6f842749468401d1146c,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,...,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,2022,-1
678,fe87dfa176a74fc10a5cb701b9fb5dd4,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2016,1
679,fec9ed6026d55ecdf514c640312c3d08,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,...,1.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,2020,-1
680,ff56f2c5048dae0797fd3e851572b80c,4.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,6.0,...,1.0,1.0,1.0,1.0,1.0,2.0,1.0,0.0,2014,1


Grouping by student and summing the course workload (`ch_total`).

In [1928]:
df_carga_horaria_cumprida = df_dados_filtrado.groupby('discente')['ch_total'].sum().reset_index()
df_carga_horaria_cumprida.rename(columns={'ch_total': 'ch_cumprida'}, inplace=True)

Adding each student's accumulated workload to the table.

In [1929]:
tabela_final = tabela_final.merge(df_carga_horaria_cumprida, on='discente', how='left')

Dividing each student's accumulated workload by the total workload of the mandatory courses (1830 in the code below).

In [1930]:
tabela_final['ch_cumprida_dividida'] = tabela_final['ch_cumprida'] / 1830
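
A small sketch of the workload ratio on hypothetical data. The constant 1830 comes from the notebook; the name `CH_OBRIGATORIAS` and the sample rows are assumptions. Because the sum runs over every enrollment row, repeated courses count more than once, so the ratio can exceed 1.

```python
import pandas as pd

CH_OBRIGATORIAS = 1830  # total workload of the mandatory courses, as used in the notebook

# Hypothetical enrollments with the workload of each course.
df_ch = pd.DataFrame({
    'discente': ['a1', 'a1', 'b2'],
    'ch_total': [90, 60, 60],
})

# Sum the workload per student and rename the column in one chain.
ch = (
    df_ch.groupby('discente')['ch_total']
    .sum()
    .reset_index()
    .rename(columns={'ch_total': 'ch_cumprida'})
)
ch['ch_cumprida_dividida'] = ch['ch_cumprida'] / CH_OBRIGATORIAS
print(ch)
```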

Counting the distinct semesters attended by each student.

In [1931]:
semestres_unicos_por_discente = df_dados_filtrado.groupby('discente')['ano'].nunique().reset_index()
semestres_unicos_por_discente.rename(columns={'ano' : 'semestre'}, inplace=True)

Adding the number of semesters attended by each student.

In [1932]:
tabela_final = tabela_final.merge(semestres_unicos_por_discente, on='discente', how='left')

Dividing the number of semesters attended by the expected number of semesters in the program (8).

In [1933]:
tabela_final['semestre_dividido'] = tabela_final['semestre'] / 8
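
The semester count relies on `nunique()`, which counts distinct values within each group. A sketch with hypothetical period codes:

```python
import pandas as pd

# Hypothetical enrollments: 'a1' has two courses in 20091 and one in 20092.
df_sem = pd.DataFrame({
    'discente': ['a1', 'a1', 'a1', 'b2'],
    'ano': [20091, 20091, 20092, 20101],
})

# nunique() counts each period code once per student, however many
# courses were taken in it.
semestres = (
    df_sem.groupby('discente')['ano']
    .nunique()
    .reset_index()
    .rename(columns={'ano': 'semestre'})
)

# 8 is the expected program length in semesters, as used in the notebook.
semestres['semestre_dividido'] = semestres['semestre'] / 8
print(semestres)
```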

Grouping by 'discente' and 'descricao' and counting the enrollments.

In [1934]:
matriculas_por_disciplina = df_dados_filtrado.groupby(['discente', 'descricao']).size().reset_index(name='matriculas')
matriculas_por_disciplina

Unnamed: 0,discente,descricao,matriculas
0,001cea3c82e2010681f2cdeab21e5ecf,DESISTENCIA,5
1,005c14d7c07bf7980b60c703f99c5ee7,APROVADO,18
2,005c14d7c07bf7980b60c703f99c5ee7,INDEFERIDO,3
3,005c14d7c07bf7980b60c703f99c5ee7,REPROVADO,14
4,005c14d7c07bf7980b60c703f99c5ee7,TRANCADO,1
...,...,...,...
2083,ff56f2c5048dae0797fd3e851572b80c,CANCELADO,4
2084,ff56f2c5048dae0797fd3e851572b80c,EXCLUIDA,5
2085,ff56f2c5048dae0797fd3e851572b80c,INDEFERIDO,6
2086,ff56f2c5048dae0797fd3e851572b80c,REPROVADO,9


Pivoting the DataFrame so that each outcome becomes a column.

In [1935]:
tabela_disciplina = matriculas_por_disciplina.pivot(index='discente', columns='descricao', values='matriculas').reset_index()
tabela_disciplina

descricao,discente,APROVADO,CANCELADO,DESISTENCIA,EXCLUIDA,INDEFERIDO,REPROVADO,TRANCADO
0,001cea3c82e2010681f2cdeab21e5ecf,,,5.0,,,,
1,005c14d7c07bf7980b60c703f99c5ee7,18.0,,,,3.0,14.0,1.0
2,0107fd69d8cd7e3d30dede96fb68bfe5,3.0,6.0,,,,4.0,
3,014789363f7940922e71e710ee9d22bc,28.0,,,2.0,,8.0,
4,014f0dec46fe7a9c5836527662e1df10,2.0,2.0,,6.0,,,
...,...,...,...,...,...,...,...,...
677,fe802d8d85de6f842749468401d1146c,4.0,,,,,4.0,
678,fe87dfa176a74fc10a5cb701b9fb5dd4,4.0,,,2.0,1.0,,
679,fec9ed6026d55ecdf514c640312c3d08,20.0,,,2.0,,,1.0
680,ff56f2c5048dae0797fd3e851572b80c,28.0,4.0,,5.0,6.0,9.0,


Filling the NaN values with 0.

In [1936]:
tabela_disciplina = tabela_disciplina.fillna(0)
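
As an alternative sketch (not the notebook's method), `pd.crosstab` produces the same kind of student-by-outcome count table in one call, with zeros already in place of missing combinations. The sample data and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical outcomes, one row per enrollment.
df_res = pd.DataFrame({
    'discente': ['a1', 'a1', 'a1', 'b2'],
    'descricao': ['APROVADO', 'APROVADO', 'REPROVADO', 'TRANCADO'],
})

# crosstab() counts how often each (discente, descricao) pair occurs
# and fills absent pairs with 0.
tabela_res = pd.crosstab(df_res['discente'], df_res['descricao']).reset_index()
tabela_res.columns.name = None  # drop the leftover 'descricao' axis label
print(tabela_res)
```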

Finding the *last period* in which each student was enrolled in the program.

In [1937]:
ultimo_periodo = df_dados_filtrado.groupby('discente')['ano'].max().reset_index()
ultimo_periodo.rename(columns={'ano': 'ultimo_periodo'}, inplace=True)
ultimo_periodo

Unnamed: 0,discente,ultimo_periodo
0,001cea3c82e2010681f2cdeab21e5ecf,20181
1,005c14d7c07bf7980b60c703f99c5ee7,20221
2,0107fd69d8cd7e3d30dede96fb68bfe5,20121
3,014789363f7940922e71e710ee9d22bc,20206
4,014f0dec46fe7a9c5836527662e1df10,20206
...,...,...
677,fe802d8d85de6f842749468401d1146c,20222
678,fe87dfa176a74fc10a5cb701b9fb5dd4,20206
679,fec9ed6026d55ecdf514c640312c3d08,20222
680,ff56f2c5048dae0797fd3e851572b80c,20192
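
If the year and the period of the last enrollment are needed separately, the code can be split with integer arithmetic. This sketch assumes that `ano` always has the form year followed by a single period digit (for example, 20206 read as year 2020, period 6); the new column names are hypothetical.

```python
import pandas as pd

# Hypothetical last-period codes in the assumed <year><period digit> form.
ultimo = pd.DataFrame({
    'discente': ['a1', 'b2'],
    'ultimo_periodo': [20181, 20206],
})

# Integer division by 10 drops the last digit, leaving the year;
# the remainder is the period digit.
ultimo['ultimo_ano'] = ultimo['ultimo_periodo'] // 10
ultimo['ultimo_periodo_num'] = ultimo['ultimo_periodo'] % 10
print(ultimo)
```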


Merging the last-period information into `tabela_final`.

In [1938]:
tabela_final = tabela_final.merge(ultimo_periodo, on='discente', how='left')
tabela_final 

Unnamed: 0,discente,ALGORITMOS E LÓGICA DE PROGRAMAÇÃO,ARQUITETURA DE COMPUTADORES,BANCO DE DADOS,CÁLCULO DIFERENCIAL E INTEGRAL,EMPREENDEDORISMO EM INFORMÁTICA,ENGENHARIA DE SOFTWARE I,ENGENHARIA DE SOFTWARE II,ESTRUTURA DE DADOS,FUNDAMENTOS DE MATEMÁTICA,...,TEORIA GERAL DOS SISTEMAS,ÁLGEBRA LINEAR,ÉTICA,ano_ingresso,status,ch_cumprida,ch_cumprida_dividida,semestre,semestre_dividido,ultimo_periodo
0,001cea3c82e2010681f2cdeab21e5ecf,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,2018,-1,330,0.180328,1,0.125,20181
1,005c14d7c07bf7980b60c703f99c5ee7,1.0,2.0,1.0,1.0,0.0,1.0,0.0,3.0,3.0,...,1.0,2.0,1.0,2018,-1,2340,1.278689,10,1.250,20221
2,0107fd69d8cd7e3d30dede96fb68bfe5,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,1.0,0.0,1.0,2011,-1,870,0.475410,3,0.375,20121
3,014789363f7940922e71e710ee9d22bc,2.0,3.0,1.0,0.0,1.0,1.0,1.0,2.0,3.0,...,1.0,1.0,1.0,2016,1,2430,1.327869,11,1.375,20206
4,014f0dec46fe7a9c5836527662e1df10,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,2020,-1,630,0.344262,3,0.375,20206
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
677,fe802d8d85de6f842749468401d1146c,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,...,1.0,0.0,0.0,2022,-1,540,0.295082,2,0.250,20222
678,fe87dfa176a74fc10a5cb701b9fb5dd4,0.0,0.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,...,0.0,0.0,0.0,2016,1,420,0.229508,3,0.375,20206
679,fec9ed6026d55ecdf514c640312c3d08,1.0,1.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,...,1.0,1.0,0.0,2020,-1,1470,0.803279,7,0.875,20222
680,ff56f2c5048dae0797fd3e851572b80c,4.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,6.0,...,2.0,1.0,0.0,2014,1,3390,1.852459,12,1.500,20192


Summing the per-course columns to get the total number of enrollments per student.

In [1939]:
# Row-wise sum over the 29 mandatory-course columns named in lista_obrigatórias.
matricula = tabela_final[lista_obrigatórias].sum(axis=1)

Adding the `matricula` column to the table.

In [1940]:
tabela_final['matricula'] = matricula

Converting the 'ch_cumprida_dividida' column to a numeric type.

In [1941]:
tabela_final['ch_cumprida_dividida'] = pd.to_numeric(tabela_final['ch_cumprida_dividida'], errors='coerce')

Converting the 'status' column to a numeric type.

In [1942]:
tabela_final['status'] = pd.to_numeric(tabela_final['status'], errors='coerce')

Multiplying 'ch_cumprida_dividida' by 'status', so the result is positive for students who completed the program and negative for all others.

In [1943]:
tabela_final['tempo_relativo'] = tabela_final['ch_cumprida_dividida'] * tabela_final['status']
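
A sketch of the conversion and the signed product on hypothetical values. The string `'n/a'` is an invented example of an unparseable entry, included to show what `errors='coerce'` does with it.

```python
import math

import pandas as pd

# Hypothetical table where the ratio arrived as text, with one bad entry.
t = pd.DataFrame({
    'ch_cumprida_dividida': ['1.25', '0.5', 'n/a'],
    'status': [1, -1, -1],
})

# errors='coerce' turns values that cannot be parsed into NaN instead of raising.
t['ch_cumprida_dividida'] = pd.to_numeric(t['ch_cumprida_dividida'], errors='coerce')

# The sign of status carries over to the product; NaN stays NaN.
t['tempo_relativo'] = t['ch_cumprida_dividida'] * t['status']
print(t)
```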

Defining the new column order.

In [1946]:
colunas_ordenadas = ['discente', 'ano_ingresso', 'ultimo_periodo', 'semestre',
                    'semestre_dividido', 'ch_cumprida', 'ch_cumprida_dividida', 
                    'tempo_relativo', 'status', 'matricula',
                    'ALGORITMOS E LÓGICA DE PROGRAMAÇÃO',
                    'INTRODUÇÃO À INFORMÁTICA',
                    'FUNDAMENTOS DE MATEMÁTICA',
                    'LÓGICA',
                    'TEORIA GERAL DA ADMINISTRAÇÃO',
                    'PROGRAMAÇÃO',
                    'CÁLCULO DIFERENCIAL E INTEGRAL',
                    'TEORIA GERAL DOS SISTEMAS',
                    'PROGRAMAÇÃO ORIENTADA A OBJETOS I',
                    'ESTRUTURA DE DADOS',
                    'ÁLGEBRA LINEAR',
                    'ORGANIZAÇÃO, SISTEMAS E MÉTODOS',
                    'FUNDAMENTOS DE SISTEMAS DE INFORMAÇÃO',
                    'PROGRAMAÇÃO WEB',
                    'ARQUITETURA DE COMPUTADORES',
                    'PROBABILIDADE E ESTATÍSTICA',
                    'BANCO DE DADOS',
                    'ENGENHARIA DE SOFTWARE I',
                    'PROGRAMAÇÃO ORIENTADA A OBJETOS II',
                    'SISTEMAS OPERACIONAIS',
                    'PROJETO E ADMINISTRAÇÃO DE BANCO DE DADOS',
                    'ENGENHARIA DE SOFTWARE II',
                    'REDES DE COMPUTADORES',
                    'EMPREENDEDORISMO EM INFORMÁTICA',
                    'GESTÃO DE PROJETO DE SOFTWARE',
                    'PROGRAMAÇÃO VISUAL',
                    'MATEMÁTICA FINANCEIRA',
                    'SISTEMAS DE APOIO À DECISÃO',
                    'ÉTICA']


Reordering the columns.

In [1947]:
tabela_final = tabela_final[colunas_ordenadas]
tabela_final

Unnamed: 0,discente,ano_ingresso,ultimo_periodo,semestre,semestre_dividido,ch_cumprida,ch_cumprida_dividida,tempo_relativo,status,matricula,...,SISTEMAS OPERACIONAIS,PROJETO E ADMINISTRAÇÃO DE BANCO DE DADOS,ENGENHARIA DE SOFTWARE II,REDES DE COMPUTADORES,EMPREENDEDORISMO EM INFORMÁTICA,GESTÃO DE PROJETO DE SOFTWARE,PROGRAMAÇÃO VISUAL,MATEMÁTICA FINANCEIRA,SISTEMAS DE APOIO À DECISÃO,ÉTICA
0,001cea3c82e2010681f2cdeab21e5ecf,2018,20181,1,0.125,330,0.180328,-0.180328,-1.0,5.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,005c14d7c07bf7980b60c703f99c5ee7,2018,20221,10,1.250,2340,1.278689,-1.278689,-1.0,36.0,...,1.0,2.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,1.0
2,0107fd69d8cd7e3d30dede96fb68bfe5,2011,20121,3,0.375,870,0.475410,-0.475410,-1.0,13.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
3,014789363f7940922e71e710ee9d22bc,2016,20206,11,1.375,2430,1.327869,1.327869,1.0,38.0,...,1.0,1.0,1.0,2.0,1.0,1.0,1.0,2.0,1.0,1.0
4,014f0dec46fe7a9c5836527662e1df10,2020,20206,3,0.375,630,0.344262,-0.344262,-1.0,10.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
677,fe802d8d85de6f842749468401d1146c,2022,20222,2,0.250,540,0.295082,-0.295082,-1.0,8.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
678,fe87dfa176a74fc10a5cb701b9fb5dd4,2016,20206,3,0.375,420,0.229508,0.229508,1.0,7.0,...,1.0,0.0,2.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
679,fec9ed6026d55ecdf514c640312c3d08,2020,20222,7,0.875,1470,0.803279,-0.803279,-1.0,23.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
680,ff56f2c5048dae0797fd3e851572b80c,2014,20192,12,1.500,3390,1.852459,1.852459,1.0,52.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


Saving the DataFrame as CSV with ';' as the separator. `quoting=csv.QUOTE_NONNUMERIC` quotes only the non-numeric fields, so numeric values are written unquoted.

In [1948]:
tabela_final.to_csv('tabela_final.csv', index=False, sep=';', quoting=csv.QUOTE_NONNUMERIC)
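
To see the effect of the quoting option without touching the disk, the sketch below writes a hypothetical three-column table to an in-memory buffer and reads it back. The table contents and variable names are assumptions for illustration.

```python
import csv
import io

import pandas as pd

# Hypothetical miniature of the final table.
tabela_exemplo = pd.DataFrame({
    'discente': ['a1', 'b2'],
    'matricula': [5, 36],
    'tempo_relativo': [-0.18, 1.28],
})

# Write to an in-memory buffer with the same options as the notebook.
buffer = io.StringIO()
tabela_exemplo.to_csv(buffer, index=False, sep=';', quoting=csv.QUOTE_NONNUMERIC)
texto = buffer.getvalue()
print(texto)  # text fields appear in double quotes, numbers do not

# Reading the text back with the same separator recovers the values.
relida = pd.read_csv(io.StringIO(texto), sep=';')
```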