# Create sample data

``In order to create visualizations, you first need to have data to work with.``

So in this video I'll show you how to create sample data in the form of a car loan table. What we have here is a car loan of $34,690 with a 7.02% interest rate over 60 months. There are multiple ways to structure your data in a form that a pandas DataFrame accepts.
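As a side note, the $687.23 monthly repayment that appears in the table can be reproduced from these loan terms. The sketch below assumes the standard fixed-payment amortization formula with monthly compounding (the annual rate divided by 12). The function name `monthly_payment` is made up for illustration and is not part of the original notebook.

```python
def monthly_payment(principal, annual_rate, months):
    """Fixed monthly payment for a fully amortizing loan.

    Assumption: interest compounds monthly at annual_rate / 12.
    """
    r = annual_rate / 12  # periodic (monthly) interest rate
    return principal * r / (1 - (1 + r) ** -months)


# Loan terms from the video: $34,689.96 at 7.02% over 60 months
payment = monthly_payment(34689.96, 0.0702, 60)
print(round(payment, 2))
```

The printed value should land very close to the repayment used throughout the table.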


**Nested list Approach**

One approach is to use ``a nested list``, where the first row holds the first month of our car payments. So we have 1 for the month, we have our starting balance, and we have a repayment, which is $687.23. We have how much of that repayment is going toward interest, and how much of it is going toward the principal of the loan. Then, after the payment, we have our new balance. We have the term of the loan, which is how many months we're going to be paying this loan for in total. We have the interest rate for the loan, and we also have the car type, which in this case is a Toyota Sienna. We also have the column names, which are the fields I just described, in the form of a Python list: the month, the starting balance, the repayment, et cetera. Once we have the data in a nice clean format, we're going to press Shift+Enter to run the cell.


In [1]:
# Import libraries
import pandas as pd
import numpy as np

In [2]:
# Approach 1 List
carLoans = [[1, 34689.96, 687.23, 202.93, 484.3, 34205.66, 60, 0.0702,'Toyota Sienna'],
           [2, 34205.66, 687.23, 200.1, 487.13, 33718.53, 60, 0.0702,'Toyota Sienna'],
           [3, 33718.53, 687.23, 197.25, 489.98, 33228.55, 60, 0.0702,'Toyota Sienna'],
           [4, 33228.55, 687.23, 194.38, 492.85, 32735.7, 60, 0.0702,'Toyota Sienna'],
           [5, 32735.7, 687.23, 191.5, 495.73, 32239.97, 60, 0.0702,'Toyota Sienna']]

colNames = ['Month',
            'Starting Balance',
            'Repayment',
            'Interest Paid',
            'Principal Paid',
            'New Balance',
            'term',
            'interest_rate',
            'car_type']

From here we're going to use the pd.DataFrame constructor, and we're going to pass our carLoans variable to the data parameter and our colNames variable to the columns parameter. And then we're going to press Shift+Enter to initialize our DataFrame.

In [4]:
df = pd.DataFrame(data = carLoans, columns=colNames)

And from here we're going to press Shift+Enter again, and we have our data in a nice clean format. Obviously this is more readable than a nested list, but there are other ways to do this as well.

In [5]:
df

Unnamed: 0,Month,Starting Balance,Repayment,Interest Paid,Principal Paid,New Balance,term,interest_rate,car_type
0,1,34689.96,687.23,202.93,484.3,34205.66,60,0.0702,Toyota Sienna
1,2,34205.66,687.23,200.1,487.13,33718.53,60,0.0702,Toyota Sienna
2,3,33718.53,687.23,197.25,489.98,33228.55,60,0.0702,Toyota Sienna
3,4,33228.55,687.23,194.38,492.85,32735.7,60,0.0702,Toyota Sienna
4,5,32735.7,687.23,191.5,495.73,32239.97,60,0.0702,Toyota Sienna
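After building a DataFrame from a nested list, it can be worth checking that pandas inferred the shape and column types you expect. The snippet below is a small sketch using a trimmed-down version of the table; the names `rows`, `cols`, and `df_check` are illustrative only.

```python
import pandas as pd

# Two rows and three columns taken from the car loan table
rows = [[1, 34689.96, 'Toyota Sienna'],
        [2, 34205.66, 'Toyota Sienna']]
cols = ['Month', 'Starting Balance', 'car_type']

df_check = pd.DataFrame(data=rows, columns=cols)

print(df_check.shape)   # (number of rows, number of columns)
print(df_check.dtypes)  # one inferred dtype per column
```

With a nested list, pandas infers a type for each column separately, so the month should come out as an integer column and the balance as a float column.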


**NumPy array Approach**

We could have also structured our data in the form of ``a NumPy array``. This looks very similar to a nested list, except that a NumPy array is generally more efficient than a Python list when all of the values share one numeric type. Here each row mixes numbers with text, so NumPy will store every value as a string unless you ask for dtype=object. We have the same colNames as before. So now that we have the data in a format that we like, we're going to press Shift+Enter. And just as before, we're going to have our carLoans variable assigned to the data parameter and our colNames variable assigned to the columns parameter.


In [6]:
# Approach 2 NumPy Array
# dtype=object keeps the numbers as numbers; without it, NumPy would
# convert every value to a string because each row also holds text
carLoans = np.array([
                  [1, 34689.96, 687.23, 202.93, 484.3, 34205.66, 60, 0.0702,'Toyota Sienna'],
                  [2, 34205.66, 687.23, 200.1, 487.13, 33718.53, 60, 0.0702,'Toyota Sienna'],
                  [3, 33718.53, 687.23, 197.25, 489.98, 33228.55, 60, 0.0702,'Toyota Sienna'],
                  [4, 33228.55, 687.23, 194.38, 492.85, 32735.7, 60, 0.0702,'Toyota Sienna'],
                  [5, 32735.7, 687.23, 191.5, 495.73, 32239.97, 60, 0.0702,'Toyota Sienna']
                 ], dtype=object)
   
colNames = ['Month',
            'Starting Balance',
            'Repayment',
            'Interest Paid',
            'Principal Paid',
            'New Balance',
            'term',
            'interest_rate',
            'car_type']

We're then going to press Shift+Enter to initialize our DataFrame. And as before, we have our data in a nice clean format, which is obviously easier to look at than a NumPy array.

In [7]:
df = pd.DataFrame(data = carLoans, columns=colNames)
df 

Unnamed: 0,Month,Starting Balance,Repayment,Interest Paid,Principal Paid,New Balance,term,interest_rate,car_type
0,1,34689.96,687.23,202.93,484.3,34205.66,60,0.0702,Toyota Sienna
1,2,34205.66,687.23,200.1,487.13,33718.53,60,0.0702,Toyota Sienna
2,3,33718.53,687.23,197.25,489.98,33228.55,60,0.0702,Toyota Sienna
3,4,33228.55,687.23,194.38,492.85,32735.7,60,0.0702,Toyota Sienna
4,5,32735.7,687.23,191.5,495.73,32239.97,60,0.0702,Toyota Sienna
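One thing to watch with the NumPy approach is how mixed numbers and text are stored. The sketch below uses a trimmed-down version of the table to compare a plain `np.array` call with one that passes `dtype=object`, and then uses `DataFrame.infer_objects()` to recover numeric column types. The variable names are illustrative and not from the original notebook.

```python
import numpy as np
import pandas as pd

rows = [[1, 34689.96, 'Toyota Sienna'],
        [2, 34205.66, 'Toyota Sienna']]
cols = ['Month', 'Starting Balance', 'car_type']

# Without a dtype, NumPy picks one type for the whole array: strings
as_strings = np.array(rows)
print(as_strings.dtype.kind)  # 'U' means a Unicode string array

# dtype=object keeps the original Python ints, floats, and strs
as_objects = np.array(rows, dtype=object)

# infer_objects() asks pandas to pick a better dtype per column
df_objects = pd.DataFrame(as_objects, columns=cols).infer_objects()
print(df_objects.dtypes)
```

The table looks the same on screen either way, but numeric operations such as sums only behave as expected when the columns hold numbers and not strings.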


**Python dictionary Approach**

The third approach is to use a ``Python dictionary``. And while this may not seem like as clean a format for structuring your data, there are times when you're already going to have your data in a Python dictionary, and it'll just be easier to use that dictionary to initialize the pandas DataFrame. So we have our carLoans data, and like before, we're going to press Shift+Enter to initialize our carLoans variable.

In [8]:
# Approach 3 Python Dictionary
carLoans = {'Month': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5},
             'Starting Balance': {0: 34689.96,1: 34205.66,2: 33718.53,3: 33228.55,4: 32735.7},
             'Repayment': {0: 687.23, 1: 687.23, 2: 687.23, 3: 687.23, 4: 687.23},
             'Interest Paid': {0: 202.93, 1: 200.1, 2: 197.25, 3: 194.38, 4: 191.5},
             'Principal Paid': {0: 484.3, 1: 487.13, 2: 489.98, 3: 492.85, 4: 495.73},
             'New Balance': {0: 34205.66,1: 33718.53,2: 33228.55,3: 32735.7,4: 32239.97},
             'term': {0: 60, 1: 60, 2: 60, 3: 60, 4: 60},
             'interest_rate': {0: 0.0702, 1: 0.0702, 2: 0.0702, 3: 0.0702, 4: 0.0702},
             'car_type': {0: 'Toyota Sienna',1: 'Toyota Sienna',2: 'Toyota Sienna',3: 'Toyota Sienna',4: 'Toyota Sienna'}}

And from here we initialize our pandas DataFrame by assigning our carLoans variable to the data parameter and our colNames variable to the columns parameter. With a dictionary, the column names already come from the keys, so the columns parameter only controls which keys are kept and in what order. And then we're going to press Shift+Enter to initialize our DataFrame. And like before, you can see that we have our payment table, which is clearly easier to look at than our Python dictionary.

In [9]:
df = pd.DataFrame(data = carLoans, columns=colNames)
df

Unnamed: 0,Month,Starting Balance,Repayment,Interest Paid,Principal Paid,New Balance,term,interest_rate,car_type
0,1,34689.96,687.23,202.93,484.3,34205.66,60,0.0702,Toyota Sienna
1,2,34205.66,687.23,200.1,487.13,33718.53,60,0.0702,Toyota Sienna
2,3,33718.53,687.23,197.25,489.98,33228.55,60,0.0702,Toyota Sienna
3,4,33228.55,687.23,194.38,492.85,32735.7,60,0.0702,Toyota Sienna
4,5,32735.7,687.23,191.5,495.73,32239.97,60,0.0702,Toyota Sienna
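The dictionary above maps each column name to another dictionary keyed by row index. A dictionary of plain lists is a shorter way to write the same kind of data and is also accepted by `pd.DataFrame`. The sketch below uses a trimmed-down table; the names `loans`, `df_all`, and `df_subset` are illustrative only.

```python
import pandas as pd

# Each key is a column name, each value is that column's data
loans = {'Month': [1, 2, 3],
         'Starting Balance': [34689.96, 34205.66, 33718.53],
         'car_type': ['Toyota Sienna'] * 3}

# Column names and order come from the dictionary keys
df_all = pd.DataFrame(data=loans)

# Passing columns= picks a subset of keys and sets their order
df_subset = pd.DataFrame(data=loans, columns=['car_type', 'Month'])
print(df_subset)
```

The row index defaults to 0, 1, 2, which matches the explicit keys used in the nested-dictionary version.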


**Limitations of these Approaches**

So as you can see, building tables like these isn't all that difficult. However, there are some limitations of these approaches that I'd like you to be aware of. For example, if you have a larger data set, like the entire payment table for a particular car loan, like what I'm showing here, it would be painfully slow to type this out and get it right, for one, and two, it'd be very memory intensive. And now that I've painfully typed out this payment table, we're going to run this cell by pressing Shift+Enter. And as before, we're going to initialize our DataFrame and voila, we have our entire payment table. This is a very tedious approach and I do not recommend it. I should note that if you don't know what a command does, for example pd.DataFrame, you can always use the built-in Python function help to find out what its parameters accept, and that's it. In order to create data visualizations, you need to have data, and now you know how to create it.

In [10]:
# this is painfully slow to type
carLoans = [
            [1, 34689.96, 687.23, 202.93, 484.3, 34205.66, 60, 0.0702, 'Toyota Sienna'],
            [2, 34205.66, 687.23, 200.1, 487.13, 33718.53, 60, 0.0702, 'Toyota Sienna'],
            [3, 33718.53, 687.23, 197.25, 489.98, 33228.55, 60, 0.0702, 'Toyota Sienna'],
            [4, 33228.55, 687.23, 194.38, 492.85, 32735.7, 60, 0.0702, 'Toyota Sienna'],
            [5, 32735.7, 687.23, 191.5, 495.73, 32239.97, 60, 0.0702, 'Toyota Sienna'],
            [6, 32239.97, 687.23, 188.6, 498.63, 31741.34, 60, 0.0702, 'Toyota Sienna'],
            [7, 31741.34, 687.23, 185.68, 501.55, 31239.79, 60, 0.0702, 'Toyota Sienna'],
            [8, 31239.79, 687.23, 182.75, 504.48, 30735.31, 60, 0.0702, 'Toyota Sienna'],
            [9, 30735.31, 687.23, 179.8, 507.43, 30227.88, 60, 0.0702, 'Toyota Sienna'],
            [10, 30227.88, 687.23, 176.83, 510.4, 29717.48, 60, 0.0702, 'Toyota Sienna'],
            [11, 29717.48, 687.23, 173.84, 513.39, 29204.09, 60, 0.0702, 'Toyota Sienna'],
            [12, 29204.09, 687.23, 170.84, 516.39, 28687.7, 60, 0.0702, 'Toyota Sienna'],
            [13, 28687.7, 687.23, 167.82, 519.41, 28168.29, 60, 0.0702, 'Toyota Sienna'],
            [14, 28168.29, 687.23, 164.78, 522.45, 27645.84, 60, 0.0702, 'Toyota Sienna'],
            [15, 27645.84, 687.23, 161.72, 525.51, 27120.33, 60, 0.0702, 'Toyota Sienna'],
            [16, 27120.33, 687.23, 158.65, 528.58, 26591.75, 60, 0.0702, 'Toyota Sienna'],
            [17, 26591.75, 687.23, 155.56, 531.67, 26060.08, 60, 0.0702, 'Toyota Sienna'],
            [18, 26060.08, 687.23, 152.45, 534.78, 25525.3, 60, 0.0702, 'Toyota Sienna'],
            [19, 25525.3, 687.23, 149.32, 537.91, 24987.39, 60, 0.0702, 'Toyota Sienna'],
            [20, 24987.39, 687.23, 146.17, 541.06, 24446.33, 60, 0.0702, 'Toyota Sienna'],
            [21, 24446.33, 687.23, 143.01, 544.22, 23902.11, 60, 0.0702, 'Toyota Sienna'],
            [22, 23902.11, 687.23, 139.82, 547.41, 23354.7, 60, 0.0702, 'Toyota Sienna'],
            [23, 23354.7, 687.23, 136.62, 550.61, 22804.09, 60, 0.0702, 'Toyota Sienna'],
            [24, 22804.09, 687.23, 133.4, 553.83, 22250.26, 60, 0.0702, 'Toyota Sienna'],
            [25, 22250.26, 687.23, 130.16, 557.07, 21693.19, 60, 0.0702, 'Toyota Sienna'],
            [26, 21693.19, 687.23, 126.9, 560.33, 21132.86, 60, 0.0702, 'Toyota Sienna'],
            [27, 21132.86, 687.23, 123.62, 563.61, 20569.25, 60, 0.0702, 'Toyota Sienna'],
            [28, 20569.25, 687.23, 120.33, 566.9, 20002.35, 60, 0.0702, 'Toyota Sienna'],
            [29, 20002.35, 687.23, 117.01, 570.22, 19432.13, 60, 0.0702, 'Toyota Sienna'],
            [30, 19432.13, 687.23, 113.67, 573.56, 18858.57, 60, 0.0702, 'Toyota Sienna'],
            [31, 18858.57, 687.23, 110.32, 576.91, 18281.66, 60, 0.0702, 'Toyota Sienna'],
            [32, 18281.66, 687.23, 106.94, 580.29, 17701.37, 60, 0.0702, 'Toyota Sienna'],
            [33, 17701.37, 687.23, 103.55, 583.68, 17117.69, 60, 0.0702, 'Toyota Sienna'],
            [34, 17117.69, 687.23, 100.13, 587.1, 16530.59, 60, 0.0702, 'Toyota Sienna'],
            [35, 16530.59, 687.23, 96.7, 590.53, 15940.06, 60, 0.0702, 'Toyota Sienna'],
            [36, 15940.06, 687.23, 93.24, 593.99, 15346.07, 60, 0.0702, 'Toyota Sienna'],
            [37, 15346.07, 687.23, 89.77, 597.46, 14748.61, 60, 0.0702, 'Toyota Sienna'],
            [38, 14748.61, 687.23, 86.27, 600.96, 14147.65, 60, 0.0702, 'Toyota Sienna'],
            [39, 14147.65, 687.23, 82.76, 604.47, 13543.18, 60, 0.0702, 'Toyota Sienna'],
            [40, 13543.18, 687.23, 79.22, 608.01, 12935.17, 60, 0.0702, 'Toyota Sienna'],
            [41, 12935.17, 687.23, 75.67, 611.56, 12323.61, 60, 0.0702, 'Toyota Sienna'],
            [42, 12323.61, 687.23, 72.09, 615.14, 11708.47, 60, 0.0702, 'Toyota Sienna'],
            [43, 11708.47, 687.23, 68.49, 618.74, 11089.73, 60, 0.0702, 'Toyota Sienna'],
            [44, 11089.73, 687.23, 64.87, 622.36, 10467.37, 60, 0.0702, 'Toyota Sienna'],
            [45, 10467.37, 687.23, 61.23, 626.0, 9841.37, 60, 0.0702, 'Toyota Sienna'],
            [46, 9841.37, 687.23, 57.57, 629.66, 9211.71, 60, 0.0702, 'Toyota Sienna'],
            [47, 9211.71, 687.23, 53.88, 633.35, 8578.36, 60, 0.0702, 'Toyota Sienna'],
            [48, 8578.36, 687.23, 50.18, 637.05, 7941.31, 60, 0.0702, 'Toyota Sienna'],
            [49, 7941.31, 687.23, 46.45, 640.78, 7300.53, 60, 0.0702, 'Toyota Sienna'],
            [50, 7300.53, 687.23, 42.7, 644.53, 6656.0, 60, 0.0702, 'Toyota Sienna'],
            [51, 6656.0, 687.23, 38.93, 648.3, 6007.7, 60, 0.0702, 'Toyota Sienna'],
            [52, 6007.7, 687.23, 35.14, 652.09, 5355.61, 60, 0.0702, 'Toyota Sienna'],
            [53, 5355.61, 687.23, 31.33, 655.9, 4699.71, 60, 0.0702, 'Toyota Sienna'],
            [54, 4699.71, 687.23, 27.49, 659.74, 4039.97, 60, 0.0702, 'Toyota Sienna'],
            [55, 4039.97, 687.23, 23.63, 663.6, 3376.37, 60, 0.0702, 'Toyota Sienna'],
            [56, 3376.37, 687.23, 19.75, 667.48, 2708.89, 60, 0.0702, 'Toyota Sienna'],
            [57, 2708.89, 687.23, 15.84, 671.39, 2037.5, 60, 0.0702, 'Toyota Sienna'],
            [58, 2037.5, 687.23, 11.91, 675.32, 1362.18, 60, 0.0702, 'Toyota Sienna'],
            [59, 1362.18, 687.23, 7.96, 679.27, 682.91, 60, 0.0702, 'Toyota Sienna'],
            [60, 682.91, 687.23, 3.99, 683.24, -0.33, 60, 0.0702, 'Toyota Sienna']
            ]

colNames = ['Month',
            'Starting Balance',
            'Repayment',
            'Interest Paid',
            'Principal Paid',
            'New Balance',
            'term',
            'interest_rate',
            'car_type']

df = pd.DataFrame(data = carLoans, columns=colNames)
df

Unnamed: 0,Month,Starting Balance,Repayment,Interest Paid,Principal Paid,New Balance,term,interest_rate,car_type
0,1,34689.96,687.23,202.93,484.3,34205.66,60,0.0702,Toyota Sienna
1,2,34205.66,687.23,200.1,487.13,33718.53,60,0.0702,Toyota Sienna
2,3,33718.53,687.23,197.25,489.98,33228.55,60,0.0702,Toyota Sienna
3,4,33228.55,687.23,194.38,492.85,32735.7,60,0.0702,Toyota Sienna
4,5,32735.7,687.23,191.5,495.73,32239.97,60,0.0702,Toyota Sienna
5,6,32239.97,687.23,188.6,498.63,31741.34,60,0.0702,Toyota Sienna
6,7,31741.34,687.23,185.68,501.55,31239.79,60,0.0702,Toyota Sienna
7,8,31239.79,687.23,182.75,504.48,30735.31,60,0.0702,Toyota Sienna
8,9,30735.31,687.23,179.8,507.43,30227.88,60,0.0702,Toyota Sienna
9,10,30227.88,687.23,176.83,510.4,29717.48,60,0.0702,Toyota Sienna
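Since typing out all 60 rows by hand is tedious, one alternative is to generate the payment table in a loop. The sketch below is not from the original notebook: `build_payment_table` is a hypothetical helper, and it assumes monthly compounding with every amount rounded to the nearest cent. Because of that rounding assumption, individual values may differ from the hand-typed table by a cent here and there.

```python
import pandas as pd


def build_payment_table(principal, annual_rate, months, car_type):
    """Generate a month-by-month loan payment table (illustrative sketch)."""
    r = annual_rate / 12  # monthly interest rate (assumption)
    # Fixed monthly payment from the standard amortization formula
    payment = round(principal * r / (1 - (1 + r) ** -months), 2)

    rows = []
    balance = principal
    for month in range(1, months + 1):
        interest = round(balance * r, 2)               # interest this month
        principal_paid = round(payment - interest, 2)  # rest goes to principal
        new_balance = round(balance - principal_paid, 2)
        rows.append([month, balance, payment, interest, principal_paid,
                     new_balance, months, annual_rate, car_type])
        balance = new_balance

    col_names = ['Month', 'Starting Balance', 'Repayment', 'Interest Paid',
                 'Principal Paid', 'New Balance', 'term', 'interest_rate',
                 'car_type']
    return pd.DataFrame(data=rows, columns=col_names)


payment_table = build_payment_table(34689.96, 0.0702, 60, 'Toyota Sienna')
print(payment_table.head())
```

The result has the same columns as the hand-typed table, so it can be used anywhere the earlier DataFrame is used.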
