# Data Cleaning and Manipulation

The steps in this notebook ensure the integrity of the data needed to calculate the revenue from the different types of transactions processed by PicPay, and export the results as CSV files.

The work is divided into:
1. Data import and inspection
2. Calculations for the "transactions" table
3. Calculations for the "transactions_installments" table
4. Creation and export of the final file

## 1. Data import and inspection

### Library imports

In [48]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

### Data import

In [49]:
transactions = pd.read_csv('transactions.csv', sep=';')

In [50]:
transactions

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate
0,1,03/01/2021,P2P,400,600,Credit card,12,1.99,2.99,3.49
1,2,14/09/2021,BILLS,650,300,Credit card,5,1.99,2.99,3.49
2,3,20/07/2021,BILLS,1200,0,Credit card,8,1.99,2.99,3.49
3,4,06/08/2021,P2P,350,800,Credit card,9,1.99,2.99,3.49
4,5,13/04/2021,P2P,3500,0,Credit card,10,1.99,2.99,3.49
5,6,24/05/2021,P2P,3420,0,Credit card,7,1.99,2.99,3.49
6,7,31/03/2021,BILLS,5000,0,Credit card,12,1.99,2.99,3.49
7,8,30/08/2021,P2P,2800,0,Credit card,12,1.99,2.99,3.49
8,9,28/08/2021,BILLS,6000,0,Credit card,4,1.99,2.99,3.49
9,10,27/09/2021,P2P,8000,0,Credit card,6,1.99,2.99,3.49


In [51]:
transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   transaction_id          10 non-null     int64  
 1   transaction_date        10 non-null     object 
 2   transaction_type        10 non-null     object 
 3   transaction_value       10 non-null     int64  
 4   receiver_used_cc_limit  10 non-null     int64  
 5   payment_method          10 non-null     object 
 6   installments            10 non-null     int64  
 7   p2p_surcharge_rate      10 non-null     float64
 8   bills_surcharge_rate    10 non-null     float64
 9   installment_rate        10 non-null     float64
dtypes: float64(3), int64(4), object(3)
memory usage: 932.0+ bytes


In [52]:
transactions['transaction_date'] = pd.to_datetime(transactions['transaction_date'], dayfirst=True)

In [53]:
transactions

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate
0,1,2021-01-03,P2P,400,600,Credit card,12,1.99,2.99,3.49
1,2,2021-09-14,BILLS,650,300,Credit card,5,1.99,2.99,3.49
2,3,2021-07-20,BILLS,1200,0,Credit card,8,1.99,2.99,3.49
3,4,2021-08-06,P2P,350,800,Credit card,9,1.99,2.99,3.49
4,5,2021-04-13,P2P,3500,0,Credit card,10,1.99,2.99,3.49
5,6,2021-05-24,P2P,3420,0,Credit card,7,1.99,2.99,3.49
6,7,2021-03-31,BILLS,5000,0,Credit card,12,1.99,2.99,3.49
7,8,2021-08-30,P2P,2800,0,Credit card,12,1.99,2.99,3.49
8,9,2021-08-28,BILLS,6000,0,Credit card,4,1.99,2.99,3.49
9,10,2021-09-27,P2P,8000,0,Credit card,6,1.99,2.99,3.49


In [54]:
transactions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   transaction_id          10 non-null     int64         
 1   transaction_date        10 non-null     datetime64[ns]
 2   transaction_type        10 non-null     object        
 3   transaction_value       10 non-null     int64         
 4   receiver_used_cc_limit  10 non-null     int64         
 5   payment_method          10 non-null     object        
 6   installments            10 non-null     int64         
 7   p2p_surcharge_rate      10 non-null     float64       
 8   bills_surcharge_rate    10 non-null     float64       
 9   installment_rate        10 non-null     float64       
dtypes: datetime64[ns](1), float64(3), int64(4), object(2)
memory usage: 932.0+ bytes
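
The `dayfirst=True` argument matters because a date such as 03/01/2021 is ambiguous. The sketch below uses a single illustrative value rather than the notebook's file. It contrasts the default month-first reading in pandas with the day-first reading, and shows an explicit `format` string as a stricter alternative. The explicit format assumes that every date in the source file is written as DD/MM/YYYY, which the notebook does not state.

```python
import pandas as pd

ambiguous = "03/01/2021"  # 3 January in DD/MM/YYYY, 1 March in MM/DD/YYYY

# Default behaviour: ambiguous dates are read month first
month_first = pd.to_datetime(ambiguous)
# Reading used in this notebook: day first
day_first = pd.to_datetime(ambiguous, dayfirst=True)
# Stricter alternative (assumption): raises an error on any other layout
explicit = pd.to_datetime(ambiguous, format="%d/%m/%Y")

print(month_first.date(), day_first.date(), explicit.date())
```

With `dayfirst=True` alone, pandas still accepts values it cannot read day first, so an explicit format may be preferable when the layout is known.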


## 2. Calculations for the "transactions" table

The following fields are calculated:

| Field                        | Description                                                                               |
|------------------------------|-------------------------------------------------------------------------------------------|
| p2p_surcharge                | Surcharge amount for P2P transactions                                                     |
| bills_surcharge              | Surcharge amount for BILLS transactions                                                   |
| surcharged_transaction_value | Total transaction value including the P2P and BILLS surcharges                            |
| installment_fee              | Total interest charged for paying in installments (total amount, not per installment)     |
| individual_installment       | Amount of each installment paid by a user who made an installment payment by credit card  |


In [55]:
# P2P surcharge: applies to credit card P2P transactions where transaction_value plus
# receiver_used_cc_limit exceeds 800; the rate is charged only on the excess over 800.
transactions['p2p_surcharge'] = np.where(
    (
        (transactions['transaction_type'] == 'P2P') &
        (transactions['transaction_value'] + transactions['receiver_used_cc_limit'] > 800) &
        (transactions['payment_method'] == 'Credit card')
    ),
    (transactions['transaction_value'] + transactions['receiver_used_cc_limit'] - 800) * transactions['p2p_surcharge_rate']/100,
    0
)

transactions

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate,p2p_surcharge
0,1,2021-01-03,P2P,400,600,Credit card,12,1.99,2.99,3.49,3.98
1,2,2021-09-14,BILLS,650,300,Credit card,5,1.99,2.99,3.49,0.0
2,3,2021-07-20,BILLS,1200,0,Credit card,8,1.99,2.99,3.49,0.0
3,4,2021-08-06,P2P,350,800,Credit card,9,1.99,2.99,3.49,6.965
4,5,2021-04-13,P2P,3500,0,Credit card,10,1.99,2.99,3.49,53.73
5,6,2021-05-24,P2P,3420,0,Credit card,7,1.99,2.99,3.49,52.138
6,7,2021-03-31,BILLS,5000,0,Credit card,12,1.99,2.99,3.49,0.0
7,8,2021-08-30,P2P,2800,0,Credit card,12,1.99,2.99,3.49,39.8
8,9,2021-08-28,BILLS,6000,0,Credit card,4,1.99,2.99,3.49,0.0
9,10,2021-09-27,P2P,8000,0,Credit card,6,1.99,2.99,3.49,143.28
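
As a cross-check of the rule in the cell above, the following sketch recomputes the P2P surcharge for single transactions in plain Python. The function name `p2p_surcharge` and the constant `FREE_LIMIT` are hypothetical names introduced here. The value 800 is taken from the cell above, which does not say what the threshold represents.

```python
FREE_LIMIT = 800  # threshold hard-coded in the notebook cell; the name is hypothetical


def p2p_surcharge(transaction_type, payment_method, value, used_cc_limit, rate_pct):
    """Surcharge on the part of (value + used_cc_limit) that exceeds FREE_LIMIT."""
    if transaction_type != "P2P" or payment_method != "Credit card":
        return 0.0
    excess = value + used_cc_limit - FREE_LIMIT
    # Nothing is charged when the sum does not exceed the threshold
    return max(excess, 0) * rate_pct / 100


# Transaction 1 from the table: 400 + 600 - 800 = 200 is subject to the 1.99% rate
fee_1 = p2p_surcharge("P2P", "Credit card", 400, 600, 1.99)
# Transaction 2 is a BILLS payment, so no P2P surcharge applies
fee_2 = p2p_surcharge("BILLS", "Credit card", 650, 300, 1.99)
print(round(fee_1, 2), fee_2)
```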


In [56]:
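# BILLS surcharge: bills_surcharge_rate percent of the full value of credit card BILLS transactions.
transactions['bills_surcharge'] = np.where(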
    (
        (transactions['transaction_type'] == 'BILLS') &
        (transactions['payment_method'] == 'Credit card')
    ),
    (transactions['transaction_value'] * transactions['bills_surcharge_rate'])/100,
    0
)

transactions

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate,p2p_surcharge,bills_surcharge
0,1,2021-01-03,P2P,400,600,Credit card,12,1.99,2.99,3.49,3.98,0.0
1,2,2021-09-14,BILLS,650,300,Credit card,5,1.99,2.99,3.49,0.0,19.435
2,3,2021-07-20,BILLS,1200,0,Credit card,8,1.99,2.99,3.49,0.0,35.88
3,4,2021-08-06,P2P,350,800,Credit card,9,1.99,2.99,3.49,6.965,0.0
4,5,2021-04-13,P2P,3500,0,Credit card,10,1.99,2.99,3.49,53.73,0.0
5,6,2021-05-24,P2P,3420,0,Credit card,7,1.99,2.99,3.49,52.138,0.0
6,7,2021-03-31,BILLS,5000,0,Credit card,12,1.99,2.99,3.49,0.0,149.5
7,8,2021-08-30,P2P,2800,0,Credit card,12,1.99,2.99,3.49,39.8,0.0
8,9,2021-08-28,BILLS,6000,0,Credit card,4,1.99,2.99,3.49,0.0,179.4
9,10,2021-09-27,P2P,8000,0,Credit card,6,1.99,2.99,3.49,143.28,0.0


In [57]:
# Transaction value plus both surcharges (for any given row, at most one of them is non-zero).
transactions['surcharged_transaction_value'] = transactions['transaction_value'] + transactions['p2p_surcharge'] + transactions['bills_surcharge']

transactions

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate,p2p_surcharge,bills_surcharge,surcharged_transaction_value
0,1,2021-01-03,P2P,400,600,Credit card,12,1.99,2.99,3.49,3.98,0.0,403.98
1,2,2021-09-14,BILLS,650,300,Credit card,5,1.99,2.99,3.49,0.0,19.435,669.435
2,3,2021-07-20,BILLS,1200,0,Credit card,8,1.99,2.99,3.49,0.0,35.88,1235.88
3,4,2021-08-06,P2P,350,800,Credit card,9,1.99,2.99,3.49,6.965,0.0,356.965
4,5,2021-04-13,P2P,3500,0,Credit card,10,1.99,2.99,3.49,53.73,0.0,3553.73
5,6,2021-05-24,P2P,3420,0,Credit card,7,1.99,2.99,3.49,52.138,0.0,3472.138
6,7,2021-03-31,BILLS,5000,0,Credit card,12,1.99,2.99,3.49,0.0,149.5,5149.5
7,8,2021-08-30,P2P,2800,0,Credit card,12,1.99,2.99,3.49,39.8,0.0,2839.8
8,9,2021-08-28,BILLS,6000,0,Credit card,4,1.99,2.99,3.49,0.0,179.4,6179.4
9,10,2021-09-27,P2P,8000,0,Credit card,6,1.99,2.99,3.49,143.28,0.0,8143.28
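
The same kind of check can be made for a BILLS transaction. This is a minimal sketch with hypothetical function names: it applies the BILLS surcharge to the full transaction value and adds it to obtain the surcharged total, using transaction 2 from the table.

```python
def bills_surcharge(transaction_type, payment_method, value, rate_pct):
    """BILLS surcharge: rate_pct percent of the full transaction value."""
    if transaction_type == "BILLS" and payment_method == "Credit card":
        return value * rate_pct / 100
    return 0.0


def surcharged_value(value, p2p_fee, bills_fee):
    """Transaction value plus both surcharges (at most one is non-zero)."""
    return value + p2p_fee + bills_fee


fee = bills_surcharge("BILLS", "Credit card", 650, 2.99)  # 650 * 2.99 / 100
total = surcharged_value(650, 0.0, fee)
print(round(fee, 3), round(total, 3))
```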


In [58]:
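s0 = transactions['surcharged_transaction_value']  # amount financed (value plus surcharges)
j = transactions['installment_rate']/100           # interest rate per installment, as a fraction
n = transactions['installments']                   # number of installments

# Fixed installment s0 * j / (1 - (1 + j)^-n), computed only for credit card BILLS
# transactions paid in more than one installment; all other rows get 0.
transactions['individual_installment'] = np.where(
    (
        (transactions['transaction_type'] == 'BILLS') &
        (transactions['payment_method'] == 'Credit card') &
        (transactions['installments'] > 1)
    ),
    (s0 * (j/(1 - (1 + j)**(-n)))),
    0
)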

transactions

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate,p2p_surcharge,bills_surcharge,surcharged_transaction_value,individual_installment
0,1,2021-01-03,P2P,400,600,Credit card,12,1.99,2.99,3.49,3.98,0.0,403.98,0.0
1,2,2021-09-14,BILLS,650,300,Credit card,5,1.99,2.99,3.49,0.0,19.435,669.435,148.225395
2,3,2021-07-20,BILLS,1200,0,Credit card,8,1.99,2.99,3.49,0.0,35.88,1235.88,179.716649
3,4,2021-08-06,P2P,350,800,Credit card,9,1.99,2.99,3.49,6.965,0.0,356.965,0.0
4,5,2021-04-13,P2P,3500,0,Credit card,10,1.99,2.99,3.49,53.73,0.0,3553.73,0.0
5,6,2021-05-24,P2P,3420,0,Credit card,7,1.99,2.99,3.49,52.138,0.0,3472.138,0.0
6,7,2021-03-31,BILLS,5000,0,Credit card,12,1.99,2.99,3.49,0.0,149.5,5149.5,532.577026
7,8,2021-08-30,P2P,2800,0,Credit card,12,1.99,2.99,3.49,39.8,0.0,2839.8,0.0
8,9,2021-08-28,BILLS,6000,0,Credit card,4,1.99,2.99,3.49,0.0,179.4,6179.4,1681.949333
9,10,2021-09-27,P2P,8000,0,Credit card,6,1.99,2.99,3.49,143.28,0.0,8143.28,0.0
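
The expression `s0 * (j / (1 - (1 + j)**(-n)))` in the cell above has the form of the standard fixed-payment annuity formula (often called the Price table in Brazil). The sketch below applies it to a single transaction. The function name is hypothetical, and reading `installment_rate` as a percentage per installment period is an assumption based on how the cell divides it by 100.

```python
def installment_amount(principal, rate_pct, n):
    """Fixed installment that repays `principal` in `n` payments at `rate_pct` per period."""
    j = rate_pct / 100
    if j == 0:
        # No interest: split evenly (edge case not covered by the notebook)
        return principal / n
    return principal * j / (1 - (1 + j) ** (-n))


# Transaction 2 from the table: 669.435 financed in 5 installments at 3.49%
payment = installment_amount(669.435, 3.49, 5)
print(round(payment, 2))
```

The result can be compared with the `individual_installment` value shown for transaction 2 in the output above.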


In [59]:
# Total interest: the sum of all installments minus the surcharged amount that was financed.
transactions['installment_fee'] = np.where(
    (
        (transactions['transaction_type'] == 'BILLS') & 
        (transactions['payment_method'] == 'Credit card') &
        (transactions['installments'] > 1) 
    ),
    transactions['individual_installment'] * transactions['installments'] - transactions['surcharged_transaction_value'],
    0
)
transactions

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate,p2p_surcharge,bills_surcharge,surcharged_transaction_value,individual_installment,installment_fee
0,1,2021-01-03,P2P,400,600,Credit card,12,1.99,2.99,3.49,3.98,0.0,403.98,0.0,0.0
1,2,2021-09-14,BILLS,650,300,Credit card,5,1.99,2.99,3.49,0.0,19.435,669.435,148.225395,71.691973
2,3,2021-07-20,BILLS,1200,0,Credit card,8,1.99,2.99,3.49,0.0,35.88,1235.88,179.716649,201.853189
3,4,2021-08-06,P2P,350,800,Credit card,9,1.99,2.99,3.49,6.965,0.0,356.965,0.0,0.0
4,5,2021-04-13,P2P,3500,0,Credit card,10,1.99,2.99,3.49,53.73,0.0,3553.73,0.0,0.0
5,6,2021-05-24,P2P,3420,0,Credit card,7,1.99,2.99,3.49,52.138,0.0,3472.138,0.0,0.0
6,7,2021-03-31,BILLS,5000,0,Credit card,12,1.99,2.99,3.49,0.0,149.5,5149.5,532.577026,1241.424311
7,8,2021-08-30,P2P,2800,0,Credit card,12,1.99,2.99,3.49,39.8,0.0,2839.8,0.0,0.0
8,9,2021-08-28,BILLS,6000,0,Credit card,4,1.99,2.99,3.49,0.0,179.4,6179.4,1681.949333,548.397331
9,10,2021-09-27,P2P,8000,0,Credit card,6,1.99,2.99,3.49,143.28,0.0,8143.28,0.0,0.0
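
One way to sanity-check `installment_fee` is to build the amortization schedule and add up the interest charged in each period. The total should equal `n * installment - principal`, which is what the cell above computes. This is a sketch under the same annuity assumption, with hypothetical names, and is not a step of the original notebook.

```python
def amortization_totals(principal, rate_pct, n):
    """Return (payment, total_interest, final_balance) for a fixed-payment schedule."""
    j = rate_pct / 100
    payment = principal * j / (1 - (1 + j) ** (-n))
    balance = principal
    total_interest = 0.0
    for _ in range(n):
        interest = balance * j           # interest accrued on the outstanding balance
        balance -= payment - interest    # the rest of the payment reduces the balance
        total_interest += interest
    return payment, total_interest, balance


# Transaction 2 from the table: 669.435 financed in 5 installments at 3.49%
payment, total_interest, final_balance = amortization_totals(669.435, 3.49, 5)
shortcut = payment * 5 - 669.435  # the formula used in the notebook cell
print(round(total_interest, 2), round(shortcut, 2))
```

Both figures should agree up to floating-point error, and the final balance should be approximately zero.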


In [62]:
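# Build the per-installment table from a copy, keeping only the columns it needs.
transactions_installments = transactions.copy()
transactions_installments = transactions_installments.drop(
    [
        'p2p_surcharge_rate',
        'bills_surcharge_rate',
        'installment_rate',
        'p2p_surcharge',
        'bills_surcharge',
        'surcharged_transaction_value',
        'individual_installment'
    ],
    axis='columns'
)
# Note: rename() returns a new DataFrame. Because the result is not assigned, the renamed
# column appears only in the output below; transactions_installments itself is unchanged.
transactions_installments.rename(columns={'receiver_used_cc_limit': 'used_cc_limit'})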

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,used_cc_limit,payment_method,installments,installment_fee
0,1,2021-01-03,P2P,400,600,Credit card,12,0.0
1,2,2021-09-14,BILLS,650,300,Credit card,5,71.691973
2,3,2021-07-20,BILLS,1200,0,Credit card,8,201.853189
3,4,2021-08-06,P2P,350,800,Credit card,9,0.0
4,5,2021-04-13,P2P,3500,0,Credit card,10,0.0
5,6,2021-05-24,P2P,3420,0,Credit card,7,0.0
6,7,2021-03-31,BILLS,5000,0,Credit card,12,1241.424311
7,8,2021-08-30,P2P,2800,0,Credit card,12,0.0
8,9,2021-08-28,BILLS,6000,0,Credit card,4,548.397331
9,10,2021-09-27,P2P,8000,0,Credit card,6,0.0


In [63]:
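# Expand to one row per installment: repeat each transaction's row `installments` times.
transactions_installments = transactions_installments.loc[transactions_installments.index.repeat(transactions_installments['installments'])].copy()
# Number the rows within each transaction, counting down from the last installment to 1.
transactions_installments['installment_number'] = transactions_installments.groupby('transaction_id').cumcount(ascending=False) + 1
# Overwrite `installments` with the installment number of each row.
transactions_installments['installments'] = transactions_installments['installment_number']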

transactions_installments

Unnamed: 0,transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,installment_fee,installment_number
0,1,2021-01-03,P2P,400,600,Credit card,12,0.0,12
0,1,2021-01-03,P2P,400,600,Credit card,11,0.0,11
0,1,2021-01-03,P2P,400,600,Credit card,10,0.0,10
0,1,2021-01-03,P2P,400,600,Credit card,9,0.0,9
0,1,2021-01-03,P2P,400,600,Credit card,8,0.0,8
...,...,...,...,...,...,...,...,...,...
9,10,2021-09-27,P2P,8000,0,Credit card,5,0.0,5
9,10,2021-09-27,P2P,8000,0,Credit card,4,0.0,4
9,10,2021-09-27,P2P,8000,0,Credit card,3,0.0,3
9,10,2021-09-27,P2P,8000,0,Credit card,2,0.0,2
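
The expansion in the cell above relies on two pandas operations: `Index.repeat`, to duplicate each row once per installment, and `groupby(...).cumcount`, to number the copies within each transaction. The sketch below reproduces the pattern on a two-row toy frame whose values are made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"transaction_id": [1, 2], "installments": [3, 2]})

# Repeat each index label `installments` times, then select those labels
expanded = toy.loc[toy.index.repeat(toy["installments"])].copy()

# cumcount(ascending=False) counts down from group size - 1 to 0, so adding 1
# numbers the rows of each transaction from n down to 1, as in the notebook
expanded["installment_number"] = (
    expanded.groupby("transaction_id").cumcount(ascending=False) + 1
)

print(expanded["installment_number"].tolist())
```

Dropping `ascending=False` would number the installments from 1 up to n instead.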
