# Recap Last Week

We covered:
* How to import Python modules
* Useful modules in the Python Standard Library
* Creating our own Python Classes

# This Week

What we will cover:

* Reading and writing with the built-in csv module
* Installing 3rd party packages with pip
* Reading and writing Excel files with the openpyxl module

## Reading from CSV files

CSV (comma-separated values) files are a standard format for storing tabular data. Data within the file is stored as plain text and, as the name implies, each value in the csv file is delimited by a symbol (generally a comma). Reading from csv files is a fairly straightforward task in Python: with the help of the csv module we can quickly parse and extract the formatted data.

The Standard Library documentation does a great job explaining how to use the [csv module](https://docs.python.org/3/library/csv.html).


To demonstrate how to work with the csv module we'll analyze stock data for Apple. We'll want to figure out a few things:

* What was the return on Apple stock in the current period?
* What was the highest price the stock reached?
* What was the lowest price the stock reached?
* What was the average stock price over this period?
* What trading day had the most volume, and did the stock rise or fall?
* How many days did the stock price end lower than where it started?

In [22]:
import csv

data = []

with open('AAPL.csv') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        data.append(row)

# only print out the first 5 rows
for row in data[:5]:
    print(row)

['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
['2018-06-11', '191.350006', '191.970001', '190.210007', '191.229996', '188.362747', '18308500']
['2018-06-12', '191.389999', '192.610001', '191.149994', '192.279999', '189.397003', '16911100']
['2018-06-13', '192.419998', '192.880005', '190.440002', '190.699997', '187.840683', '21638400']
['2018-06-14', '191.550003', '191.570007', '190.220001', '190.800003', '187.939194', '21610100']


As you can see, with just a few lines of Python code we were able to read in all the data from our csv file. There are a couple of things to point out here:

* 1) **The header row will also be read in**, so you'll need to account for that if your data includes column headers
* 2) **All the data read from the csv file will be strings**, so we'll have to cast values to different data types ourselves

In [23]:
# just to show you that each value is a string

# the first row of data
for header, value in zip(data[0], data[1]):
    print(f'{header} data is of type {type(value)} -- {value}')

Date data is of type <class 'str'> -- 2018-06-11
Open data is of type <class 'str'> -- 191.350006
High data is of type <class 'str'> -- 191.970001
Low data is of type <class 'str'> -- 190.210007
Close data is of type <class 'str'> -- 191.229996
Adj Close data is of type <class 'str'> -- 188.362747
Volume data is of type <class 'str'> -- 18308500
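
Both points can be handled while reading. One common approach is to pull the header off the reader with `next()` before looping; another is to let `csv.DictReader` use the header row as the keys for each row. The sketch below is only illustrative: it reads two rows copied from the output above out of an in-memory string (via `io.StringIO`) rather than from `AAPL.csv`, and the name `sample_csv` is made up for this example.

```python
import csv
import io

# Header plus two data rows copied from the output above, held in memory
sample_csv = """Date,Open,High,Low,Close,Adj Close,Volume
2018-06-11,191.350006,191.970001,190.210007,191.229996,188.362747,18308500
2018-06-12,191.389999,192.610001,191.149994,192.279999,189.397003,16911100
"""

# Option 1: take the header off with next(), then collect only the data rows
reader = csv.reader(io.StringIO(sample_csv))
header = next(reader)
rows = list(reader)
print(header)
print(len(rows), 'data rows')

# Option 2: DictReader uses the header row as the keys of each row
dict_rows = list(csv.DictReader(io.StringIO(sample_csv)))
close_text = dict_rows[0]['Close']   # still a string at this point
close_value = float(close_text)      # cast before doing any math
print(type(close_text), type(close_value))
```

Either way the values still arrive as strings, so the casting step is needed regardless of which reader is used.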


### Casting values

Before we can analyze the data we'll need to convert all the values into proper types. Taking a look at the csv file, we know that the columns are in the following order: Date, Open, High, Low, Close, Adj Close, and Volume. We'll convert each string into an appropriate data type so that we can continue our analysis.

In [26]:
import datetime as dt

def cast_stock_data(stock_data):
    """
    receives a list of strings, where each string represents data in the following order:
    
    Date, Open, High, Low, Close, Adj Close, and Volume
    
    returns a new list where data has been properly converted
    """
    data_copy = stock_data.copy()
    for index, value in enumerate(data_copy):
        # first value is the date
        if index == 0:
            data_copy[0] = dt.datetime.strptime(value, '%Y-%m-%d').date()
        
        elif '.' in value:
            data_copy[index] = round(float(value), 2)
        else:
            data_copy[index] = int(value)

    return data_copy

# skip the header row
stock_data = [cast_stock_data(row) for row in data[1:]]

# show the types of the second data row (index 1)
for header, value in zip(data[0], stock_data[1]):
    print(f'{header} data is of type {type(value)} -- {value}')
    

Date data is of type <class 'datetime.date'> -- 2018-06-12
Open data is of type <class 'float'> -- 191.39
High data is of type <class 'float'> -- 192.61
Low data is of type <class 'float'> -- 191.15
Close data is of type <class 'float'> -- 192.28
Adj Close data is of type <class 'float'> -- 189.4
Volume data is of type <class 'int'> -- 16911100


Now that the values have the correct types we can begin answering our questions.

### What was the return on Apple stock in the current period

**Note: you can calculate the return by taking the % difference between the starting and ending price**

    stock_return = (P1 / P0) - 1
    
    where: 
        P0 = initial stock price
        P1 = ending stock price
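
As a quick worked example of the formula, the sketch below computes a one-day return from the Adj Close values of the first two trading days in `AAPL.csv`, rounded to two decimal places. The function name `simple_return` is made up for this illustration.

```python
def simple_return(p0, p1):
    """Return the fractional change from starting price p0 to ending price p1."""
    return (p1 / p0) - 1

# Adj Close on 2018-06-11 and 2018-06-12, rounded to 2 decimal places
p0 = 188.36
p1 = 189.4

one_day_return = simple_return(p0, p1)

# multiply by 100 to express the fraction as a percentage
print(f'{one_day_return * 100:.2f}%')
```

A positive result means the price rose between the two dates, and a negative result means it fell.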

In [30]:
# let's see if our data is in ascending or descending order
print(data[0])

# Note stock_data only contains the data without the headers
for value in stock_data[:5]:
    print(value)

['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume']
[datetime.date(2018, 6, 11), 191.35, 191.97, 190.21, 191.23, 188.36, 18308500]
[datetime.date(2018, 6, 12), 191.39, 192.61, 191.15, 192.28, 189.4, 16911100]
[datetime.date(2018, 6, 13), 192.42, 192.88, 190.44, 190.7, 187.84, 21638400]
[datetime.date(2018, 6, 14), 191.55, 191.57, 190.22, 190.8, 187.94, 21610100]
[datetime.date(2018, 6, 15), 190.03, 190.16, 188.26, 188.84, 186.01, 61719200]


Our data is clearly in ascending date order, which means the period starts on the first row and ends on the last row.

In [43]:
def stock_return(row0, row1):
    """
    each row should contain stock data with the following format:
    
    Date, Open, High, Low, Close, Adj Close, and Volume
    
    returns the calculated stock return, based on the Adj Close (index 5)
    """
    return (row1[5] / row0[5]) - 1

appl_return = stock_return(stock_data[0], stock_data[-1])

date0 = stock_data[0][0]
date1 = stock_data[-1][0]

q1 = 'What was the return on Apple stock in the current period?'
a1 = f'The return on Apple stock between {date0} and {date1} was {appl_return * 100:.2f}%'

print(q1)
print(a1)

What was the return on Apple stock in the current period?
The return on Apple stock between 2018-06-11 and 2019-06-07 was 0.95%


### What was the highest price the stock reached?

In [44]:

def max_price(stock_data):
    """
    each row should contain stock data with the following format:
    
    Date, Open, High, Low, Close, Adj Close, and Volume
    
    returns (date, price)
    """
    # initialize the starting values
    date = stock_data[0][0]
    price = stock_data[0][2]

    # remember that the high is index 2
    for data in stock_data:
        if data[2] > price:
            date = data[0]
            price = data[2]
    
    return date, price


# use new names so the max_price function is not overwritten by its own result
high_date, high_price = max_price(stock_data)

q2 = 'What was the highest price the stock reached?'
a2 = f'Apple was at its highest on {high_date} at a price of ${high_price}'

print(q2)
print(a2)

What was the highest price the stock reached?
Apple was at its highest on 2018-10-03 at a price of $233.47
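
The explicit loop can also be written with Python's built-in `max()` and `min()`, passing a `key` that picks the column to compare. The sketch below is a minimal illustration that runs on the first five cast rows shown earlier rather than the full dataset; `sample_rows`, `HIGH`, and `LOW` are names made up for this example.

```python
import datetime as dt
from operator import itemgetter

# First five cast rows: Date, Open, High, Low, Close, Adj Close, Volume
sample_rows = [
    [dt.date(2018, 6, 11), 191.35, 191.97, 190.21, 191.23, 188.36, 18308500],
    [dt.date(2018, 6, 12), 191.39, 192.61, 191.15, 192.28, 189.4, 16911100],
    [dt.date(2018, 6, 13), 192.42, 192.88, 190.44, 190.7, 187.84, 21638400],
    [dt.date(2018, 6, 14), 191.55, 191.57, 190.22, 190.8, 187.94, 21610100],
    [dt.date(2018, 6, 15), 190.03, 190.16, 188.26, 188.84, 186.01, 61719200],
]

HIGH = 2  # column index of the daily high
LOW = 3   # column index of the daily low

# max()/min() return the whole row whose key column is largest/smallest
high_row = max(sample_rows, key=itemgetter(HIGH))
low_row = min(sample_rows, key=itemgetter(LOW))

print(f'Highest: {high_row[0]} at ${high_row[HIGH]}')
print(f'Lowest: {low_row[0]} at ${low_row[LOW]}')
```

Because the whole row is returned, the date and the price come from the same row without tracking them in separate variables.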


### What was the lowest price the stock reached?

In [46]:
def min_price(stock_data):
    """
    each row should contain stock data with the following format:
    
    Date, Open, High, Low, Close, Adj Close, and Volume
    
    returns (date, price)
    """
    # initialize the starting values
    date = stock_data[0][0]
    price = stock_data[0][3]  # the low is index 3
    
    for data in stock_data:
        if data[3] < price:
            date = data[0]
            price = data[3]
    
    return date, price

# use new names so the min_price function is not overwritten by its own result
low_date, low_price = min_price(stock_data)

q3 = 'What was the lowest price the stock reached?'
a3 = f'Apple was at its lowest on {low_date} at a price of ${low_price}'

print(q3)
print(a3)

What was the lowest price the stock reached?
Apple was at its lowest on 2019-01-03 at a price of $142.0


### What was the average stock price over this period?

In [48]:
def avg_stock_price(stock_data):
    """
    each row should contain stock data with the following format:
    
    Date, Open, High, Low, Close, Adj Close, and Volume
    
    returns the average stock price based on the Adj Close
    """
    only_prices = [data[5] for data in stock_data]
    price_sum = sum(only_prices)
    avg_price = price_sum / len(only_prices)
    return round(avg_price, 2)

avg_price = avg_stock_price(stock_data)

q4 = 'What was the average stock price over this period?'
a4 = f'The average price of Apple for the period between {date0} and {date1} was ${avg_price}'

print(q4)
print(a4)

What was the average stock price over this period?
The average price of Apple for the period between 2018-06-11 and 2019-06-07 was $189.66
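
The Standard Library's `statistics.mean` gives the same kind of average without summing and dividing by hand. The sketch below is only an illustration on the Adj Close values of the first five cast rows shown earlier, so its result is not the full-period average; `adj_closes` and `sample_avg` are names made up for this example.

```python
from statistics import mean

# Adj Close values from the first five cast rows
adj_closes = [188.36, 189.4, 187.84, 187.94, 186.01]

# mean() adds up the values and divides by how many there are
sample_avg = round(mean(adj_closes), 2)
print(f'Average Adj Close over the sample rows: ${sample_avg}')
```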


### What trading day had the most volume, and did the stock rise or fall?
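
One possible approach, sketched here on the first five cast rows shown earlier rather than the full dataset, is to pick the row with the largest Volume and then compare that day's Close with its Open. Judging "rise or fall" by Close versus Open on the same day is an assumption; comparing against the previous day's Close would be another reasonable reading. The names `sample_rows`, `busiest`, and `direction` are made up for this example.

```python
import datetime as dt

# First five cast rows: Date, Open, High, Low, Close, Adj Close, Volume
sample_rows = [
    [dt.date(2018, 6, 11), 191.35, 191.97, 190.21, 191.23, 188.36, 18308500],
    [dt.date(2018, 6, 12), 191.39, 192.61, 191.15, 192.28, 189.4, 16911100],
    [dt.date(2018, 6, 13), 192.42, 192.88, 190.44, 190.7, 187.84, 21638400],
    [dt.date(2018, 6, 14), 191.55, 191.57, 190.22, 190.8, 187.94, 21610100],
    [dt.date(2018, 6, 15), 190.03, 190.16, 188.26, 188.84, 186.01, 61719200],
]

OPEN, CLOSE, VOLUME = 1, 4, 6  # column indexes

# the row with the largest volume
busiest = max(sample_rows, key=lambda row: row[VOLUME])

# assumption: the day "rose" if it closed above its open, "fell" if below
if busiest[CLOSE] > busiest[OPEN]:
    direction = 'rose'
elif busiest[CLOSE] < busiest[OPEN]:
    direction = 'fell'
else:
    direction = 'was flat'

print(f'{busiest[0]} had the most volume ({busiest[VOLUME]}) and the stock {direction}')
```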

### How many days did the stock price end lower than where it started?
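
A minimal sketch of one way to count these days, assuming "end lower than where it started" means the Close is below the Open on the same day. It runs on the first five cast rows shown earlier, so the count applies only to that sample; `sample_rows` and `down_days` are names made up for this example.

```python
import datetime as dt

# First five cast rows: Date, Open, High, Low, Close, Adj Close, Volume
sample_rows = [
    [dt.date(2018, 6, 11), 191.35, 191.97, 190.21, 191.23, 188.36, 18308500],
    [dt.date(2018, 6, 12), 191.39, 192.61, 191.15, 192.28, 189.4, 16911100],
    [dt.date(2018, 6, 13), 192.42, 192.88, 190.44, 190.7, 187.84, 21638400],
    [dt.date(2018, 6, 14), 191.55, 191.57, 190.22, 190.8, 187.94, 21610100],
    [dt.date(2018, 6, 15), 190.03, 190.16, 188.26, 188.84, 186.01, 61719200],
]

OPEN, CLOSE = 1, 4  # column indexes

# keep only the rows where the close is below the open, then count them
down_days = [row for row in sample_rows if row[CLOSE] < row[OPEN]]

print(f'{len(down_days)} of {len(sample_rows)} sample days closed lower than they opened')
```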

## Writing to CSV files

Now that we've done all of our analysis, we'll store the results in a csv file named apple_analysis.csv.
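
A minimal sketch of how this could look with `csv.writer`. The figures are the ones printed earlier in this section, but the two-column layout and the row labels are assumptions made for this example, not a required format.

```python
import csv

# One row per result; the first row acts as the header
results = [
    ['Question', 'Answer'],
    ['Return over the period (%)', 0.950308],
    ['Highest price', 233.47],
    ['Date of highest price', '2018-10-03'],
    ['Lowest price', 142.0],
    ['Date of lowest price', '2019-01-03'],
    ['Average Adj Close', 189.66],
]

# newline='' lets the csv module control the line endings itself
with open('apple_analysis.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerows(results)

# read the file back to check what was written
with open('apple_analysis.csv', newline='') as f:
    written = list(csv.reader(f))

for row in written:
    print(row)
```

Note that reading the file back returns every value as a string again, just as it did for `AAPL.csv`.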