## Convert Pandas DataFrames to NumPy Arrays or Dictionaries

When working with Pandas DataFrames, you'll often want to convert them to NumPy arrays or Python dictionaries, because some libraries expect NumPy arrays or dictionaries as inputs to their methods rather than Pandas DataFrames.

```python
# Import libraries
import pandas as pd
import numpy as np
```

```python
# Load Excel file
# (reading .xlsx files requires an Excel engine such as openpyxl)
filename = 'car_financing_missing.xlsx'
df = pd.read_excel(filename)
```

We'll work with the car loan dataset again; its first five rows are shown below. There are two common ways to convert a Pandas DataFrame to a NumPy array.


```python
df.head()
```

| | month | starting_balance | interest_paid | principal_paid | new_balance | interest_rate | car_type |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 34689.96 | 202.93 | 484.3 | 34205.66 | 0.0702 | Toyota Sienna |
| 1 | 2 | 34205.66 | 200.1 | 487.13 | 33718.53 | 0.0702 | Toyota Sienna |
| 2 | 3 | 33718.53 | 197.25 | 489.98 | 33228.55 | 0.0702 | Toyota Sienna |
| 3 | 4 | 33228.55 | 194.38 | 492.85 | 32735.7 | 0.0702 | Toyota Sienna |
| 4 | 5 | 32735.7 | 191.5 | 495.73 | 32239.97 | 0.0702 | Toyota Sienna |
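If you don't have `car_financing_missing.xlsx` at hand, one way to follow along is to rebuild the five rows shown above by hand. This is a minimal stand-in for illustration only, not the full dataset, so later outputs will be shorter than the ones in this notebook.

```python
import pandas as pd

# Hypothetical stand-in: only the five rows from the df.head() output above,
# typed in by hand (the real file has many more rows)
df = pd.DataFrame({
    'month': [1, 2, 3, 4, 5],
    'starting_balance': [34689.96, 34205.66, 33718.53, 33228.55, 32735.7],
    'interest_paid': [202.93, 200.1, 197.25, 194.38, 191.5],
    'principal_paid': [484.3, 487.13, 489.98, 492.85, 495.73],
    'new_balance': [34205.66, 33718.53, 33228.55, 32735.7, 32239.97],
    'interest_rate': [0.0702] * 5,
    'car_type': ['Toyota Sienna'] * 5,
})

print(df.shape)  # (5, 7)
```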


### Convert Pandas DataFrames to NumPy Arrays

The first approach is to use the `to_numpy()` method, which returns a NumPy array. Because this DataFrame mixes numeric and text columns, the resulting array has the generic `object` dtype.


```python
# Approach 1
df.to_numpy()
```

```
array([[1, 34689.96, 202.93, 484.3, 34205.66, 0.0702, 'Toyota Sienna'],
       [2, 34205.66, 200.1, 487.13, 33718.53, 0.0702, 'Toyota Sienna'],
       [3, 33718.53, 197.25, 489.98, 33228.55, 0.0702, 'Toyota Sienna'],
       [4, 33228.55, 194.38, 492.85, 32735.7, 0.0702, 'Toyota Sienna'],
       [5, 32735.7, 191.5, 495.73, 32239.97, 0.0702, 'Toyota Sienna'],
       [6, 32239.97, 188.6, 498.63, 31741.34, 0.0702, 'Toyota Sienna'],
       [7, 31741.34, 185.68, 501.55, 31239.79, 0.0702, 'Toyota Sienna'],
       [8, 31239.79, 182.75, 504.48, 30735.31, 0.0702, 'Toyota Sienna'],
       [9, 30735.31, 179.8, 507.43, 30227.88, 0.0702, 'Toyota Sienna'],
       [10, 30227.88, 176.83, 510.4, 29717.48, 0.0702, 'Toyota Sienna'],
       [11, 29717.48, 173.84, 513.39, 29204.09, 0.0702, 'Toyota Sienna'],
       [12, 29204.09, 170.84, 516.39, 28687.7, 0.0702, 'Toyota Sienna'],
       [13, 28687.7, 167.82, 519.41, 28168.29, 0.0702, 'Toyota Sienna'],
       [14, 28168.29, 164.78, 522.45, 27645.84, 0.0702, ...
```
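Mixing numbers and strings forces NumPy to pick a common dtype, which is why the array above holds ints, floats, and strings side by side as `object` values. The sketch below uses a hypothetical two-row stand-in for the car loan data to show how the result changes when you select only the numeric columns first, and how the `dtype` argument of `to_numpy()` requests a specific type.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for the car loan DataFrame
df = pd.DataFrame({
    'month': [1, 2],
    'starting_balance': [34689.96, 34205.66],
    'car_type': ['Toyota Sienna', 'Toyota Sienna'],
})

# Mixed column types: the only common dtype is object
mixed = df.to_numpy()
print(mixed.dtype)  # object

# Numeric columns only: integers and floats are upcast to float64
numeric = df[['month', 'starting_balance']].to_numpy()
print(numeric.dtype)  # float64

# Ask for a specific dtype during conversion
numeric32 = df[['month', 'starting_balance']].to_numpy(dtype=np.float32)
print(numeric32.dtype)  # float32
```

Selecting numeric columns before converting is usually what numerical libraries want, since an `object` array can't take advantage of NumPy's fast typed operations.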

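The second approach is to use the `values` attribute, which also returns a NumPy array. For this DataFrame both approaches give the same result, but the pandas documentation recommends `to_numpy()`, which additionally accepts `dtype`, `copy`, and `na_value` arguments.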

```python
# Approach 2
df.values
```

```
array([[1, 34689.96, 202.93, 484.3, 34205.66, 0.0702, 'Toyota Sienna'],
       [2, 34205.66, 200.1, 487.13, 33718.53, 0.0702, 'Toyota Sienna'],
       [3, 33718.53, 197.25, 489.98, 33228.55, 0.0702, 'Toyota Sienna'],
       [4, 33228.55, 194.38, 492.85, 32735.7, 0.0702, 'Toyota Sienna'],
       [5, 32735.7, 191.5, 495.73, 32239.97, 0.0702, 'Toyota Sienna'],
       [6, 32239.97, 188.6, 498.63, 31741.34, 0.0702, 'Toyota Sienna'],
       [7, 31741.34, 185.68, 501.55, 31239.79, 0.0702, 'Toyota Sienna'],
       [8, 31239.79, 182.75, 504.48, 30735.31, 0.0702, 'Toyota Sienna'],
       [9, 30735.31, 179.8, 507.43, 30227.88, 0.0702, 'Toyota Sienna'],
       [10, 30227.88, 176.83, 510.4, 29717.48, 0.0702, 'Toyota Sienna'],
       [11, 29717.48, 173.84, 513.39, 29204.09, 0.0702, 'Toyota Sienna'],
       [12, 29204.09, 170.84, 516.39, 28687.7, 0.0702, 'Toyota Sienna'],
       [13, 28687.7, 167.82, 519.41, 28168.29, 0.0702, 'Toyota Sienna'],
       [14, 28168.29, 164.78, 522.45, 27645.84, 0.0702, ...
```
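A quick way to confirm that the two approaches agree, and to see one thing only `to_numpy()` can do, is the sketch below. It uses a hypothetical stand-in DataFrame with one missing `interest_paid` value; `na_value` replaces missing entries during the conversion.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in with one missing interest_paid value
df = pd.DataFrame({
    'month': [1, 2, 3],
    'interest_paid': [202.93, np.nan, 197.25],
})

# Both approaches return the same values (NaN compared as equal here)
a = df.to_numpy()
b = df.values
print(np.array_equal(a, b, equal_nan=True))  # True

# to_numpy() can fill missing values while converting
filled = df.to_numpy(na_value=0.0)
print(filled[1])  # [2. 0.]
```

With `.values` you would need a separate step, such as `df.fillna(0.0).values`, to get the same array.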

### Convert Pandas DataFrames to Dictionaries

You can also convert a Pandas DataFrame to a Python dictionary with the `to_dict()` method. You'd choose this over a NumPy array when you want to keep the DataFrame's labels: by default, the result maps each column name to a dictionary of `{index label: value}` pairs, whereas a NumPy array keeps only the values.


```python
df.to_dict()
```
```
{'month': {0: 1,
  1: 2,
  2: 3,
  3: 4,
  4: 5,
  5: 6,
  6: 7,
  7: 8,
  8: 9,
  9: 10,
  10: 11,
  11: 12,
  12: 13,
  13: 14,
  14: 15,
  15: 16,
  16: 17,
  17: 18,
  18: 19,
  19: 20,
  20: 21,
  21: 22,
  22: 23,
  23: 24,
  24: 25,
  25: 26,
  26: 27,
  27: 28,
  28: 29,
  29: 30,
  30: 31,
  31: 32,
  32: 33,
  33: 34,
  34: 35,
  35: 36,
  36: 37,
  37: 38,
  38: 39,
  39: 40,
  40: 41,
  41: 42,
  42: 43,
  43: 44,
  44: 45,
  45: 46,
  46: 47,
  47: 48,
  48: 49,
  49: 50,
  50: 51,
  51: 52,
  52: 53,
  53: 54,
  54: 55,
  55: 56,
  56: 57,
  57: 58,
  58: 59,
  59: 60},
 'starting_balance': {0: 34689.96,
  1: 34205.66,
  2: 33718.53,
  3: 33228.55,
  4: 32735.7,
  5: 32239.97,
  6: 31741.34,
  7: 31239.79,
  8: 30735.31,
  9: 30227.88,
  10: 29717.48,
  11: 29204.09,
  12: 28687.7,
  13: 28168.29,
  14: 27645.84,
  15: 27120.33,
  16: 26591.75,
  17: 26060.08,
  18: 25525.3,
  19: 24987.39,
  20: 24446.33,
  21: 23902.11,
  22: 23354.7,
  23: 22804.09,
  24: 22250.26,
  ...
```

This is useful in practice because some libraries don't accept Pandas DataFrames as inputs but do accept plain Python dictionaries.
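The nested `{column: {index: value}}` shape above is only the default (`orient='dict'`). The `orient` argument of `to_dict()` reshapes the result; for instance, `'records'` gives one dictionary per row and `'list'` gives one list per column, and both drop the index labels. The sketch below uses a hypothetical two-row stand-in with non-default index labels so you can see which shapes keep them, plus a round trip back to a DataFrame.

```python
import pandas as pd

# Hypothetical two-row stand-in for the car loan DataFrame,
# with non-default index labels to show whether they survive
df = pd.DataFrame(
    {'month': [1, 2], 'new_balance': [34205.66, 33718.53]},
    index=[10, 11],
)

# Default orient='dict': {column: {index label: value}}
by_column = df.to_dict()
print(by_column)
# {'month': {10: 1, 11: 2}, 'new_balance': {10: 34205.66, 11: 33718.53}}

# orient='records': one dict per row (index labels are dropped)
records = df.to_dict(orient='records')
print(records)
# [{'month': 1, 'new_balance': 34205.66}, {'month': 2, 'new_balance': 33718.53}]

# orient='list': {column: [values]} (index labels are dropped)
lists = df.to_dict(orient='list')
print(lists)
# {'month': [1, 2], 'new_balance': [34205.66, 33718.53]}

# Round trip: the default nested dict rebuilds the original index
restored = pd.DataFrame(by_column)
print(restored.equals(df))  # True
```

If the receiving library only needs rows of values, `'records'` is often the most convenient shape; if the index matters, stick with the default.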
