## <b>NUMERICAL COMPUTING WITH NUMPY</b>

#### INSTALL SOFTWARE AND IMPORT LIBRARIES

In [1]:
import sys
!{sys.executable} -m pip install --user numpy



In [2]:
pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


In [3]:
import numpy as np

#### CREATING VARIABLES

Weights we will use to perform statistical analysis.

In [109]:
w1, w2, w3 = 0.3, 0.2, 0.5
weights_ls = [w1, w2, w3]

Several regions with their respective temperature, rainfall, and humidity data as lists. 

In [110]:
kanto = [73, 67, 43]
johto = [91, 88, 64]
hoenn = [87, 134, 58]
sinnoh = [102, 43, 37]
unova = [69, 96, 70]

The function crop_yield calculates the yield of some crops given the climate data and respective weights.

In [111]:
def crop_yield(region, weights):
    result = 0
    for x, w in zip(region, weights):
        result += x * w
    return result

Explore our data using the function we created.

In [112]:
crop_yield(kanto, weights_ls)

56.8

In [113]:
crop_yield(johto, weights_ls)

76.9

In [114]:
crop_yield(unova, weights_ls)

74.9

##### CONVERTING PYTHON LISTS TO NUMPY ARRAYS
crop_yield essentially performs a "dot product" calculation on two vectors.

NumPy has a built-in function, np.dot, that calculates dot products. It accepts plain lists as well, but converting our data to NumPy arrays lets us use the rest of NumPy's array operations.

In [115]:
kanto = np.array([73, 67, 43])
kanto

array([73, 67, 43])

In [116]:
weights = np.array([w1, w2, w3])
weights

array([0.3, 0.2, 0.5])

Checking the type shows that NumPy arrays are instances of their own class, numpy.ndarray.

In [117]:
type(kanto)

numpy.ndarray

In [118]:
type(weights)

numpy.ndarray

In [119]:
np.dot(kanto, weights)

56.8

In [120]:
help(np.dot)

Help on _ArrayFunctionDispatcher in module numpy:

dot(...)
    dot(a, b, out=None)

    Dot product of two arrays. Specifically,

    - If both `a` and `b` are 1-D arrays, it is inner product of vectors
      (without complex conjugation).

    - If both `a` and `b` are 2-D arrays, it is matrix multiplication,
      but using :func:`matmul` or ``a @ b`` is preferred.

    - If either `a` or `b` is 0-D (scalar), it is equivalent to
      :func:`multiply` and using ``numpy.multiply(a, b)`` or ``a * b`` is
      preferred.

    - If `a` is an N-D array and `b` is a 1-D array, it is a sum product over
      the last axis of `a` and `b`.

    - If `a` is an N-D array and `b` is an M-D array (where ``M>=2``), it is a
      sum product over the last axis of `a` and the second-to-last axis of
      `b`::

        dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])

    It uses an optimized BLAS library when possible (see `numpy.linalg`).

    Parameters
    ----------
    a : array_like
        Fir

NumPy arrays have advantages over Python lists:
1. Ease of use: mathematical expressions become more concise
2. Performance: NumPy operations run in compiled code (mostly C), which is much faster than looping in Python

See this example:

In [121]:
# Python lists
arr1 = list(range(1000000))
arr2 = list(range(1000000, 2000000))

# Numpy arrays
arr1_np = np.array(arr1, dtype=np.int64)
arr2_np = np.array(arr2, dtype=np.int64)

In [122]:
%%time
result = 0
for x1, x2 in zip(arr1, arr2):
    result += x1 * x2
result

CPU times: total: 141 ms
Wall time: 138 ms


833332333333500000

In [123]:
%%time
np.dot(arr1_np, arr2_np)

CPU times: total: 0 ns
Wall time: 1 ms


833332333333500000

NOTE: On some platforms (for example, Windows with NumPy versions before 2.0), NumPy's default integer dtype is np.int32. The dot product here is far larger than an int32 can hold, so we request np.int64 explicitly with the dtype= argument.

If we leave the arrays as int32, the calculation overflows silently and np.dot() returns an incorrect (here negative) value.
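
A minimal sketch of the overflow described in the note. It builds the same values with np.arange instead of Python lists, and the names a32, b32, wrapped, and correct are illustrative. The exact dot product does not fit in 32 bits, so the int32 result cannot equal it, while int64 gives the exact answer.

```python
import numpy as np

# Same values as arr1/arr2, built directly as int32 arrays (illustrative names).
a32 = np.arange(1_000_000, dtype=np.int32)
b32 = np.arange(1_000_000, 2_000_000, dtype=np.int32)

# Exact answer using Python's arbitrary-precision integers.
exact = sum(x * y for x, y in zip(range(1_000_000), range(1_000_000, 2_000_000)))

wrapped = np.dot(a32, b32)                                     # int32 arithmetic overflows silently
correct = np.dot(a32.astype(np.int64), b32.astype(np.int64))   # int64 is wide enough

print(exact)                    # 833332333333500000
print(int(correct) == exact)    # True
print(int(wrapped) == exact)    # False: the int32 result has wrapped around
```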

#### MULTIDIMENSIONAL NUMPY ARRAYS
We can represent all of our climate data together in a 2D array.

In [133]:
climate_data = np.array([
    [73, 67, 43],
    [91, 88, 64], 
    [87, 134, 58], 
    [102, 43, 37], 
    [69, 96, 70]
])

We can check the shape of a multidimensional array (the size of each dimension) using .shape

In [134]:
climate_data.shape

(5, 3)

For demonstration purposes, we can also create a 3D array using:

In [135]:
climate_data_3d = np.array([
    [[91, 88, 64], 
    [87, 134, 58]], 

    [[91, 88, 64], 
    [87, 134, 58]],  

    [[91, 88, 64], 
    [87, 134, 58]],
])

In [136]:
climate_data_3d

array([[[ 91,  88,  64],
        [ 87, 134,  58]],

       [[ 91,  88,  64],
        [ 87, 134,  58]],

       [[ 91,  88,  64],
        [ 87, 134,  58]]])

Investigating the shape of this 3D array will show us a tuple which can be understood as "3 tables of 2 rows and 3 columns".

In [137]:
climate_data_3d.shape

(3, 2, 3)

For the sake of efficiency and performance, all elements in a numpy array will have the same data type. We can investigate this using .dtype

In [138]:
climate_data_3d.dtype

dtype('int32')

However, if we create a NumPy array in which even one element has a different data type, NumPy converts every element to a common type that can represent all of them.

In the example below, every value is an integer except the element at index [2, 0, 0], which is a float. When we inspect the data type of this array, it is float64.

In [139]:
climate_data_3d_demo = np.array([
    [[91, 88, 64], 
    [87, 134, 58]], 

    [[91, 88, 64], 
    [87, 134, 58]],  

    [[91.0, 88, 64], 
    [87, 134, 58]],
])

In [140]:
climate_data_3d_demo.dtype

dtype('float64')
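
The same promotion can be seen on a much smaller array. This is a sketch with illustrative names (ints, mixed); np.result_type is used only to ask NumPy which common type it would pick for two dtypes.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2, 3.0])      # a single float promotes the whole array

print(ints.dtype.kind)    # 'i' (a signed integer type; the width depends on the platform)
print(mixed.dtype)        # float64
print(mixed)              # [1. 2. 3.]

# np.result_type reports the common type NumPy would choose.
print(np.result_type(np.int32, np.float64))   # float64
```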

Using multidimensional arrays will allow us to calculate crop yields in all the regions at once. 

We can multiply our 2D array by our weights vector, which takes the dot product of each row with the weights, using np.matmul()

In [141]:
np.matmul(climate_data, weights)

array([56.8, 76.9, 81.9, 57.7, 74.9])

Another way to perform this calculation is using the @ operator.

In [142]:
climate_data @ weights

array([56.8, 76.9, 81.9, 57.7, 74.9])
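
As a quick check, the matrix product should match calling crop_yield on each row. The sketch below repeats the data and a compact version of crop_yield so that it runs on its own; row_by_row and all_at_once are illustrative names.

```python
import numpy as np

def crop_yield(region, weights):
    # Same weighted sum as the function defined earlier.
    return sum(x * w for x, w in zip(region, weights))

climate_data = np.array([
    [73, 67, 43],
    [91, 88, 64],
    [87, 134, 58],
    [102, 43, 37],
    [69, 96, 70],
])
weights = np.array([0.3, 0.2, 0.5])

row_by_row = np.array([crop_yield(row, weights) for row in climate_data])
all_at_once = climate_data @ weights

print(np.allclose(row_by_row, all_at_once))   # True
```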

#### WORKING WITH CSV DATA FILES


We can read CSV files into numpy arrays using the "genfromtxt" function.

But first, we retrieve our CSV file from a URL.

In [156]:
import urllib.request

urllib.request.urlretrieve(
    'https://gist.github.com/BirajCoder/a4ffcb76fd6fb221d76ac2ee2b8584e9/raw/4054f90adfd361b7aa4255e99c2e874664094cea/climate.csv', 
    'climate.txt'
)

('climate.txt', <http.client.HTTPMessage at 0x1e2d1e8d2b0>)

In [157]:
extended_climate_data = np.genfromtxt('climate.txt', delimiter= ',', skip_header=1)

extended_climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [158]:
extended_climate_data.shape

(10000, 3)
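
To try genfromtxt without downloading anything, it can also read from a file-like object. The sketch below uses a hypothetical two-row sample (the header names are an assumption, not taken from climate.txt) wrapped in io.StringIO.

```python
from io import StringIO
import numpy as np

# Hypothetical two-row sample laid out like the climate data.
sample = StringIO("temperature,rainfall,humidity\n25,76,99\n39,65,70\n")

# skip_header=1 drops the column names; values are parsed as floats by default.
data = np.genfromtxt(sample, delimiter=',', skip_header=1)
print(data.shape)   # (2, 3)
print(data.dtype)   # float64
```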

Now we can find the dot product for our entire dataset.

In [159]:
extended_weighted_yield = extended_climate_data @ weights

extended_weighted_yield

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [154]:
extended_weighted_yield.shape

(10000,)

We can add our results to our original matrix using np.concatenate()

In [167]:
final_results = np.concatenate((extended_climate_data, extended_weighted_yield.reshape(10000, 1)), axis=1)

Notice we had to use np.reshape() to format the dimensions of our yields array and to then be able to use np.concatenate() properly.
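
A small sketch of that reshape step on a three-element array (yields and column are illustrative names). Passing -1 lets NumPy infer the row count, which avoids hard-coding 10000.

```python
import numpy as np

yields = np.array([72.2, 59.7, 65.2])     # 1-D, shape (3,)

column = yields.reshape(-1, 1)            # -1 lets NumPy infer the row count
print(column.shape)                       # (3, 1)

# Indexing with np.newaxis produces the same column shape.
print(yields[:, np.newaxis].shape)        # (3, 1)
```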

In [171]:
help(np.reshape)

Help on _ArrayFunctionDispatcher in module numpy:

reshape(a, newshape, order='C')
    Gives a new shape to an array without changing its data.

    Parameters
    ----------
    a : array_like
        Array to be reshaped.
    newshape : int or tuple of ints
        The new shape should be compatible with the original shape. If
        an integer, then the result will be a 1-D array of that length.
        One shape dimension can be -1. In this case, the value is
        inferred from the length of the array and remaining dimensions.
    order : {'C', 'F', 'A'}, optional
        Read the elements of `a` using this index order, and place the
        elements into the reshaped array using this index order.  'C'
        means to read / write the elements using C-like index order,
        with the last axis index changing fastest, back to the first
        axis index changing slowest. 'F' means to read / write the
        elements using Fortran-like index order, with the first index
     

Also notice that axis is set to 1. This argument tells np.concatenate() to join the arrays along axis 1, so the yields are appended as a new column. With axis=0 (the default), the arrays would instead be stacked along axis 0, adding new rows.
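
The effect of the axis argument is easiest to see on tiny arrays. In this sketch the arrays a, b, and c are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])     # shape (2, 2)
b = np.array([[5, 6]])             # shape (1, 2)
c = np.array([[7], [8]])           # shape (2, 1)

rows_added = np.concatenate((a, b), axis=0)
cols_added = np.concatenate((a, c), axis=1)

print(rows_added.shape)   # (3, 2): b is appended as a new row
print(cols_added.shape)   # (2, 3): c is appended as a new column
```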

In [172]:
help(np.concatenate)

Help on _ArrayFunctionDispatcher in module numpy:

concatenate(...)
    concatenate((a1, a2, ...), axis=0, out=None, dtype=None, casting="same_kind")

    Join a sequence of arrays along an existing axis.

    Parameters
    ----------
    a1, a2, ... : sequence of array_like
        The arrays must have the same shape, except in the dimension
        corresponding to `axis` (the first, by default).
    axis : int, optional
        The axis along which the arrays will be joined.  If axis is None,
        arrays are flattened before use.  Default is 0.
    out : ndarray, optional
        If provided, the destination to place the result. The shape must be
        correct, matching that of what concatenate would have returned if no
        out argument were specified.
    dtype : str or dtype
        If provided, the destination array will have this dtype. Cannot be
        provided together with `out`.

        .. versionadded:: 1.20.0

    casting : {'no', 'equiv', 'safe', 'same_kind', 

In [168]:
final_results.shape

(10000, 4)

Now we can write the results to a new file.

In [169]:
np.savetxt(
    'climate_results.txt',
    final_results,
    fmt='%.2f',
    delimiter=',',  # match the comma-separated header (the default delimiter is a space)
    header='temperature, rainfall, humidity, yields',
    comments=''
)

In [170]:
help(np.savetxt)

Help on _ArrayFunctionDispatcher in module numpy:

savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)
    Save an array to a text file.

    Parameters
    ----------
    fname : filename or file handle
        If the filename ends in ``.gz``, the file is automatically saved in
        compressed gzip format.  `loadtxt` understands gzipped files
        transparently.
    X : 1D or 2D array_like
        Data to be saved to a text file.
    fmt : str or sequence of strs, optional
        A single format (%10.5f), a sequence of formats, or a
        multi-format string, e.g. 'Iteration %d -- %10.5f', in which
        case `delimiter` is ignored. For complex `X`, the legal options
        for `fmt` are:

        * a single specifier, `fmt='%.4e'`, resulting in numbers formatted
          like `' (%s+%sj)' % (fmt, fmt)`
        * a full string specifying every real and imaginary part, e.g.
          `' %.4e %+.4ej %.4e %+.4ej %.4

#### ARITHMETIC/COMPARISON OPERATIONS AND BROADCASTING
As long as two arrays have the same shape, or we are combining an array with a scalar, we can perform arithmetic and comparison operations on them element by element.

In [176]:
arr3 = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10 , 11, 12]
])

arr4 = np.array([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3]
])

In [177]:
arr3 + arr4

array([[ 2,  3,  4,  5],
       [ 7,  8,  9, 10],
       [12, 13, 14, 15]])

In [179]:
arr3 * arr4

array([[ 1,  2,  3,  4],
       [10, 12, 14, 16],
       [27, 30, 33, 36]])

In [178]:
arr3 + 2

array([[ 3,  4,  5,  6],
       [ 7,  8,  9, 10],
       [11, 12, 13, 14]])

In [202]:
arr4 >= arr3

array([[ True, False, False, False],
       [False, False, False, False],
       [False, False, False, False]])

In [207]:
# Return the total number of elements that share the same value
(arr4 == arr3).sum()

1

NumPy arrays support "broadcasting", which allows us to work with arrays of different shapes as long as those shapes are compatible. NumPy compares the shapes dimension by dimension, starting from the last one; two dimensions are compatible if they are equal or if one of them is 1.

In [200]:
arr5 = np.array([
    [1, 1, 1, 1]
])

arr6 = np.array([
    [2],
    [2],
    [2]
])

In [193]:
arr4 + arr5

array([[2, 2, 2, 2],
       [3, 3, 3, 3],
       [4, 4, 4, 4]])

In [194]:
arr4 + arr6

array([[3, 3, 3, 3],
       [4, 4, 4, 4],
       [5, 5, 5, 5]])

In [205]:
arr4 == arr5

array([[ True,  True,  True,  True],
       [False, False, False, False],
       [False, False, False, False]])

In [208]:
# Return the total number of elements that share the same value
(arr4 == arr5).sum()

4
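
Broadcasting compatibility can be checked without building the arrays. NumPy compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available.

```python
import numpy as np

print(np.broadcast_shapes((3, 4), (1, 4)))   # (3, 4), like arr4 + arr5
print(np.broadcast_shapes((3, 4), (3, 1)))   # (3, 4), like arr4 + arr6
print(np.broadcast_shapes((3, 4), (4,)))     # (3, 4)

try:
    np.ones((3, 4)) + np.ones(3)             # trailing dimensions 4 and 3 do not match
except ValueError as err:
    print("incompatible:", err)
```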

#### ARRAY INDEXING AND SLICING

In [210]:
# Retrieve a single element (remember that indexing starts at 0)
climate_data_3d_demo[1, 1, 2]

58.0

In [216]:
# Subarray using ranges (remember that the end index of a range is exclusive)
climate_data_3d_demo[1: , 0:1, :2]

array([[[91., 88.]],

       [[91., 88.]]])

In [217]:
climate_data_3d_demo[1: , 0:1, :2].shape

(2, 1, 2)

In [218]:
# Mixing indices and ranges
climate_data_3d_demo[1: , 1, 2]

array([58., 58.])

In [223]:
# Using fewer than 3 indexing values
climate_data_3d_demo[0: , 1]

array([[ 87., 134.,  58.],
       [ 87., 134.,  58.],
       [ 87., 134.,  58.]])

#### OTHER NUMPY ARRAYS

In [224]:
# A vector/array of random floats between 0 and 1
np.random.rand(5)

array([0.36628836, 0.94173094, 0.65658286, 0.25311551, 0.32613337])

In [228]:
# A matrix of 0s
np.zeros([3, 2])

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [229]:
# A matrix of 1s
np.ones((2, 2, 3))

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

NOTE: You can pass the shape to these functions as either a list (brackets) or a tuple (parentheses) and get the same result.

In [232]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [233]:
np.eye(2)

array([[1., 0.],
       [0., 1.]])

In [235]:
# Return a matrix of samples from the standard normal distribution
np.random.randn(2, 3)

array([[ 0.43126197,  0.3348622 ,  0.52706178],
       [-1.10843974,  0.3010278 ,  1.76016534]])

In [236]:
# Create a matrix of a fixed value
np.full([2, 3], 42)

array([[42, 42, 42],
       [42, 42, 42]])

In [238]:
# A range with a start, end (exclusive), and step, reshaped to the given dimensions
np.arange(10, 90, 3).reshape(3, 3, 3)

array([[[10, 13, 16],
        [19, 22, 25],
        [28, 31, 34]],

       [[37, 40, 43],
        [46, 49, 52],
        [55, 58, 61]],

       [[64, 67, 70],
        [73, 76, 79],
        [82, 85, 88]]])

In [239]:
# Equally spaced numbers in a range
np.linspace(3, 27, 9)

array([ 3.,  6.,  9., 12., 15., 18., 21., 24., 27.])

#### INTERACTING WITH OS AND FILESYSTEM

In [1]:
import os

Check current working directory

In [2]:
os.getcwd()

'c:\\Users\\Veronica\\Documents\\python\\data_analysis'

Get list of files in directory

In [3]:
# For relative path
os.listdir('.')

['.git',
 '100_numpy_exercises.ipynb',
 'climate.txt',
 'climate_results.txt',
 'lesson3_numerical_computing.ipynb',
 'lesson4_analyzing_data.ipynb']

In [8]:
# For absolute path
os.listdir('/')

['$Recycle.Bin',
 '$WinREAgent',
 'Documents and Settings',
 'DumpStack.log.tmp',
 'hiberfil.sys',
 'pagefile.sys',
 'PerfLogs',
 'Program Files',
 'Program Files (x86)',
 'ProgramData',
 'Recovery',
 'swapfile.sys',
 'System Volume Information',
 'Users',
 'Windows']

In [10]:
os.listdir('/Windows')
os.listdir('/Windows/addins')

['FXSEXT.ecf']

In [11]:
os.listdir('/Users/Veronica/Documents/python/data_analysis')

['.git',
 '100_numpy_exercises.ipynb',
 'climate.txt',
 'climate_results.txt',
 'lesson3_numerical_computing.ipynb',
 'lesson4_analyzing_data.ipynb']

In [12]:
os.listdir('/Users/Veronica/Documents/python')

['beginner_python',
 'data_analysis',
 'data_structures_algorithms',
 'intermediate_python',
 'object_oriented_programming',
 'quiz']

Make a new directory

In [13]:
os.makedirs('./data', exist_ok=True)

Confirm directory was made and is empty

In [14]:
'data' in os.listdir('.')

True

In [15]:
os.listdir('./data')

[]

Download files into data directory using urllib module

In [16]:
url1 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'
url2 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans2.txt'
url3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

In [17]:
import urllib.request

In [18]:
urllib.request.urlretrieve(url1, './data/loans1.txt')
urllib.request.urlretrieve(url2, './data/loans2.txt')
urllib.request.urlretrieve(url3, './data/loans3.txt')

('./data/loans3.txt', <http.client.HTTPMessage at 0x1856d4dd6d0>)

In [19]:
os.listdir('./data')

['loans1.txt', 'loans2.txt', 'loans3.txt']

Reading from a file

In [20]:
file1 = open('./data/loans1.txt', mode='r')

In [23]:
file1_contents = file1.read()
file1_contents

''

In [22]:
print(file1_contents)

amount,duration,rate,down_payment
100000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200
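
The empty string shown for file1_contents above is what read() returns once the file has already been read to the end, which happens when the cell is run a second time on the same open file. A minimal sketch of that behaviour, using a throwaway temporary file so it does not depend on ./data (the file name and contents are made up):

```python
import os
import tempfile

# Write a small throwaway file.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, mode='w') as f:
    f.write('amount,duration\n100000,36\n')

with open(path, mode='r') as f:
    first = f.read()    # reads to the end of the file
    second = f.read()   # nothing left to read
    f.seek(0)           # move back to the start
    third = f.read()

print(repr(second))     # ''
print(first == third)   # True
```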


Don't forget to close files

In [24]:
file1.close()

Alternatively, to ensure we never forget to close our files, we could do:

In [26]:
with open('./data/loans2.txt', mode='r') as file2:
    file2_contents = file2.read()
    print(file2_contents)

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300


Reading a file line by line

In [29]:
with open('./data/loans3.txt', mode='r') as file3:
    # Contents as a list with newline characters
    file3_lines = file3.readlines()

In [30]:
file3_lines

['amount,duration,rate,down_payment\n',
 '45230,48,0.07,4300\n',
 '883000,16,0.14,\n',
 '100000,12,0.1,\n',
 '728400,120,0.12,100000\n',
 '3637400,240,0.06,\n',
 '82900,90,0.07,8900\n',
 '316000,16,0.13,\n',
 '15230,48,0.08,4300\n',
 '991360,99,0.08,\n',
 '323000,27,0.09,4720010000,36,0.08,20000\n',
 '528400,120,0.11,100000\n',
 '8633400,240,0.06,\n',
 '12900,90,0.08,8900']

In [32]:
# To remove newline characters
file3_lines[0].strip()

'amount,duration,rate,down_payment'

Processing data from files

We have to convert the string of file contents into data types to be able to perform operations on it. 

1. Read file line by line
2. Parse the first line to get column names (header)
3. Split each remaining line in the file and convert the values to floats
4. Create a dictionary for each loan (rows) using the column names as keys
5. Create a list of dictionaries to keep track of all loans

We can create functions for this process since it is useful for all CSV files.

In [38]:
def parse_header(header_line):
    return header_line.strip().split(',')

In [51]:
column_names_3 = parse_header(file3_lines[0])          

In [47]:
def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        values.append(float(item))
    return values

The above function will raise a ValueError wherever a value is missing, because an empty string can't be converted to a float. We can extend the function to handle this edge case like so:
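
The failure is easy to reproduce in isolation. This sketch converts three sample strings, the last one empty, and records what happens to each (outcomes is an illustrative name):

```python
outcomes = {}
for item in ['883000', '0.14', '']:
    try:
        outcomes[item] = float(item)
    except ValueError as err:
        # float('') cannot be parsed, so Python raises ValueError
        outcomes[item] = f'ValueError: {err}'

for item, outcome in outcomes.items():
    print(repr(item), '->', outcome)
```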

In [57]:
def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        if item == '':
            values.append(0.0)
        else:
            values.append(float(item))
    return values

Create a function that takes a list of values and a list of headers, then returns a dictionary containing the values categorized under their respective headers.

In [49]:
def create_item_dict(values, headers):
    result = {}
    for value, header in zip(values, headers):
        result[header] = value
    return result

If we investigate zip(), we can see that it pairs elements from separate iterables as tuples.

In [50]:
help(zip())

Help on zip object:

class zip(object)
 |  zip(*iterables, strict=False) --> Yield tuples until an input is exhausted.
 |
 |     >>> list(zip('abcdefg', range(3), range(4)))
 |     [('a', 0, 0), ('b', 1, 1), ('c', 2, 2)]
 |
 |  The zip object yields n-length tuples, where n is the number of iterables
 |  passed as positional arguments to zip().  The i-th element in every tuple
 |  comes from the i-th iterable argument to zip().  This continues until the
 |  shortest argument is exhausted.
 |
 |  If strict is true and one of the arguments is exhausted before the others,
 |  raise a ValueError.
 |
 |  Methods defined here:
 |
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |
 |  __iter__(self, /)
 |      Implement iter(self).
 |
 |  __next__(self, /)
 |      Implement next(self).
 |
 |  __reduce__(...)
 |      Return state information for pickling.
 |
 |  __setstate__(...)
 |      Set state information for unpickling.
 |
 |  --------------------------------------

Test create_item_dict()

In [58]:
values2 = parse_values(file3_lines[2])
create_item_dict(values2, column_names_3)

{'amount': 883000.0, 'duration': 16.0, 'rate': 0.14, 'down_payment': 0.0}
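
Because zip() yields pairs, the same dictionary can also be built by passing the pairs straight to dict(). A short sketch using the values from the test above (pairs and item are illustrative names):

```python
headers = ['amount', 'duration', 'rate', 'down_payment']
values = [883000.0, 16.0, 0.14, 0.0]

pairs = list(zip(values, headers))
print(pairs[0])                      # (883000.0, 'amount')

# dict() accepts (key, value) pairs, so headers go first here.
item = dict(zip(headers, values))
print(item)
```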

Combine all functions together to create read_csv()

In [59]:
def read_csv(path):
    result = []
    # Open file in read mode
    with open(path, mode='r') as file:
        # Get list of lines
        lines = file.readlines()
        # Parse header
        headers = parse_header(lines[0])
        # Loop over remaining lines
        for data_line in lines[1:]:
            # Parse values
            values = parse_values(data_line)
            # Create dictionary using values and headers
            item_dict = create_item_dict(values, headers)
            # Add dictionary to result
            result.append(item_dict)
    return result

Test read_csv()

In [60]:
loans1_data = read_csv('./data/loans1.txt')
loans1_data

[{'amount': 100000.0, 'duration': 36.0, 'rate': 0.08, 'down_payment': 20000.0},
 {'amount': 200000.0, 'duration': 12.0, 'rate': 0.1, 'down_payment': 0.0},
 {'amount': 628400.0,
  'duration': 120.0,
  'rate': 0.12,
  'down_payment': 100000.0},
 {'amount': 4637400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 42900.0, 'duration': 90.0, 'rate': 0.07, 'down_payment': 8900.0},
 {'amount': 916000.0, 'duration': 16.0, 'rate': 0.13, 'down_payment': 0.0},
 {'amount': 45230.0, 'duration': 48.0, 'rate': 0.08, 'down_payment': 4300.0},
 {'amount': 991360.0, 'duration': 99.0, 'rate': 0.08, 'down_payment': 0.0},
 {'amount': 423000.0, 'duration': 27.0, 'rate': 0.09, 'down_payment': 47200.0}]
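
For comparison, Python's standard csv module can do the header and line splitting for us. This is a sketch, not a replacement for read_csv: it inlines the first two rows of loans1.txt through io.StringIO so it runs without the download, and it maps missing values to 0.0 in the same way parse_values does.

```python
import csv
from io import StringIO

# First rows of loans1.txt, inlined so the sketch runs on its own.
text = "amount,duration,rate,down_payment\n100000,36,0.08,20000\n200000,12,0.1,\n"

loans = []
for row in csv.DictReader(StringIO(text)):
    # Missing values arrive as empty strings; map them to 0.0.
    loans.append({key: float(value) if value != '' else 0.0
                  for key, value in row.items()})

print(loans[1])
```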

Now we can use this function from Lesson 2

In [63]:
import math
def loan_emi(amount, duration, rate, down_payment=0):
    loan_amount = amount - down_payment
    try:
        emi = loan_amount * rate * ((1 + rate)**duration) / (((1 + rate)**duration) - 1)
    except ZeroDivisionError:
        emi = loan_amount / duration
    emi = math.ceil(emi)
    return emi
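
The formula inside loan_emi is EMI = P * r * (1 + r)^n / ((1 + r)^n - 1), where P is the amount minus the down payment, r is the rate per period, and n is the number of periods. The sketch below restates it with an explicit zero-rate branch in place of the try/except. The name emi_sketch is hypothetical, and the sample figures come from the first row of loans2.txt.

```python
import math

def emi_sketch(principal, periods, rate_per_period):
    """EMI = P * r * (1 + r)**n / ((1 + r)**n - 1), rounded up."""
    if rate_per_period == 0:
        # No interest: spread the principal evenly over the periods.
        return math.ceil(principal / periods)
    growth = (1 + rate_per_period) ** periods
    return math.ceil(principal * rate_per_period * growth / (growth - 1))

print(emi_sketch(1200, 12, 0))                          # 100
print(emi_sketch(828400 - 100000, 120, 0.11 / 12))      # 10034
```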

Calculate EMIs for all loans in a file

In [61]:
loans2_data = read_csv('./data/loans2.txt')

In [64]:
for loan in loans2_data:
    loan['emi'] = loan_emi(loan['amount'],
                           loan['duration'],
                           loan['rate']/12,
                           loan['down_payment'])

We can transform this process into a function so that it can be performed on all files and loans

In [65]:
def compute_emis(loans):
    for loan in loans:
        loan['emi'] = loan_emi(
            loan['amount'],
            loan['duration'],
            loan['rate']/12,
            loan['down_payment']
        )

Writing to files

In [66]:
compute_emis(loans2_data)

In [72]:
with open('./data/emis2.txt', mode='w') as file:
    for loan in loans2_data:
        # Double quotes outside so the f-string also works on Python versions before 3.12
        file.write(f"{loan['amount']}, {loan['duration']}, {loan['rate']}, {loan['down_payment']}, {loan['emi']}\n")

Confirm that file was created

In [73]:
os.listdir('./data')

['emis2.txt', 'loans1.txt', 'loans2.txt', 'loans3.txt']

In [74]:
with open('./data/emis2.txt') as file:
    print(file.read())

828400.0, 120.0, 0.11, 100000.0, 10034
4633400.0, 240.0, 0.06, 0.0, 33196
42900.0, 90.0, 0.08, 8900.0, 504
983000.0, 16.0, 0.14, 0.0, 67707
15230.0, 48.0, 0.07, 4300.0, 262



Define a generic function that takes a list of dictionaries and writes it to a file in CSV format.

In [75]:
def write_csv(items, path):
    # Open file in write mode
    with open(path, mode='w') as file:
        # Return if there's nothing to write
        if len(items) == 0:
            return
        
        # Write headers in first line
        headers = list(items[0].keys())
        file.write(','.join(headers) + '\n')

        # Write one item per line
        for item in items:
            values = []
            for header in headers:
                values.append(str(item.get(header, '')))
            file.write(','.join(values) + '\n')
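
A usage sketch for write_csv. The function is repeated in compact form so the block runs on its own, the output goes to a temporary directory, and the file name emis_demo.txt is made up. The single loan is the last row of emis2.txt.

```python
import os
import tempfile

def write_csv(items, path):
    # Same logic as the function above, repeated so this sketch is self-contained.
    with open(path, mode='w') as file:
        if len(items) == 0:
            return
        headers = list(items[0].keys())
        file.write(','.join(headers) + '\n')
        for item in items:
            file.write(','.join(str(item.get(h, '')) for h in headers) + '\n')

loans = [
    {'amount': 15230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0, 'emi': 262},
]
path = os.path.join(tempfile.mkdtemp(), 'emis_demo.txt')   # hypothetical file name
write_csv(loans, path)

with open(path) as file:
    contents = file.read()
print(contents)
```

Unlike the earlier loop that wrote emis2.txt, this version includes a header line, so the file can be read back with read_csv.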