## Problem description

The Python library yfinance is a popular open-source library developed by Ran Aroussi for accessing the financial data available on Yahoo Finance. Yahoo Finance offers a wide range of market data on stocks, bonds, currencies and cryptocurrencies. It also offers market news, reports and analysis, as well as options and fundamentals data, which sets it apart from some of its competitors. The Ticker() class lets you retrieve market data and metadata for a security in a Pythonic way. yfinance is not affiliated with, endorsed by, or vetted by Yahoo, Inc. It is an open-source tool that uses Yahoo's publicly available APIs and is intended for research and educational purposes.

## Scope

The scope of this project is to study the yfinance library and explore Ticker() by developing and automating the tasks below (each task is explained in detail later):

A) Create a function to get data by simply passing stock symbol, start date and end date.

B) Create a function to export the data pulled using yfinance to a directory of choice. Here the symbol, start date, end date and directory are passed.

C) Automate the task of creating the file above for all the NIFTY50 stocks. Save these files to a new folder NIFTY50 in the directory chosen previously.

D) Create a function to calculate the daily return of a stock on a particular day.

E) Using the above daily return function and the files created in step C, calculate the daily return for every date and update all the files with it.

F) Automate the creation of all the daily return graphs for the NIFTY50 stocks. All graphs will be saved in a new Graphs folder with the stock name. (Keep the graph creation process optional)

## A) get_stock_data() function

Design a function named    get_stock_data(symbol, start_date, end_date)

that would take a stock symbol, a start date and an end date and return a data frame like the one shown below.

For example, 

get_stock_data('TATAMOTORS.NS', start_date='2021-10-19', end_date='2022-10-19').head() 

Would return...

                      Open        High         Low       Close    Volume  Dividends  Stock Splits
    Date
    2021-10-19  512.450012  517.400024  476.049988  481.899994  57428637          0             0
    2021-10-20  481.799988  497.000000  471.250000  486.899994  55444814          0             0
    2021-10-21  491.750000  510.399994  485.750000  508.000000  52608672          0             0
    2021-10-22  509.899994  510.700012  487.399994  490.899994  42742785          0             0
    2021-10-25  493.899994  496.000000  473.250000  479.899994  33107841          0             0

In [4]:
#importing yfinance library
import yfinance as yf

#defining the get_stock_data function
def get_stock_data(symbol, start_date, end_date):

    #reading the ticker data using the Ticker class by passing the symbol from parameters
    tickerData = yf.Ticker(symbol)

    #get the daily data using the history function
    #interval='1d' requests one row per trading day; the date range comes from start and end
    tickerDf = tickerData.history(interval='1d', start=start_date, end=end_date)

    #convert the date format to YYYY-MM-DD
    tickerDf.index = tickerDf.index.strftime('%Y-%m-%d')

    #return the data frame
    return tickerDf

In [5]:
get_stock_data('TATAMOTORS.NS', start_date='2021-10-19', end_date='2022-10-19').head()

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2021-10-19,512.450012,517.400024,476.049988,481.899994,57428637,0,0
2021-10-20,481.799988,497.0,471.25,486.899994,55444814,0,0
2021-10-21,491.75,510.399994,485.75,508.0,52608672,0,0
2021-10-22,509.899994,510.700012,487.399994,490.899994,42742785,0,0
2021-10-25,493.899994,496.0,473.25,479.899994,33107841,0,0


## B) export_stock_data() function

Create a function named       export_stock_data(symbol, start_date, end_date, directory)
        
that would take a stock symbol, a start date, an end date and a path (directory) to generate a data frame of the above kind and export it as a CSV file in the given directory with the name of the file as the symbol of the stock.

In [6]:
#importing yfinance library
import yfinance as yf
import os

#defining the export_stock_data function
#On Windows, pass the directory as a raw string (r'...') so that backslashes are not treated as escapes.
def export_stock_data(symbol, start_date, end_date, directory):

    #reading the ticker data using the Ticker class by passing the symbol from parameters
    tickerData = yf.Ticker(symbol)

    #get the daily data using the history function
    tickerDf = tickerData.history(interval='1d', start=start_date, end=end_date)

    #convert the date format to YYYY-MM-DD
    tickerDf.index = tickerDf.index.strftime('%Y-%m-%d')

    #save the data to a CSV file named after the symbol inside the given directory
    #building the full path avoids changing the working directory
    tickerDf.to_csv(os.path.join(directory, symbol + ".csv"))

    #return the data frame
    return tickerDf

In [8]:
export_stock_data('TATAMOTORS.NS', start_date='2021-10-19', end_date='2022-10-19',directory= r'C:\Users\asus\Downloads\py exam')

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2021-10-19,512.450012,517.400024,476.049988,481.899994,57428637,0,0
2021-10-20,481.799988,497.000000,471.250000,486.899994,55444814,0,0
2021-10-21,491.750000,510.399994,485.750000,508.000000,52608672,0,0
2021-10-22,509.899994,510.700012,487.399994,490.899994,42742785,0,0
2021-10-25,493.899994,496.000000,473.250000,479.899994,33107841,0,0
...,...,...,...,...,...,...,...
2022-10-12,394.549988,398.200012,391.100006,396.549988,14287344,0,0
2022-10-13,396.549988,403.500000,394.549988,399.000000,12584114,0,0
2022-10-14,407.000000,409.200012,395.299988,396.250000,12875470,0,0
2022-10-17,394.000000,398.899994,392.500000,396.100006,9543556,0,0


## C) Automate the file creation for all the NIFTY50 stocks

The file nifty50.csv contains the names and symbols of all 50 stocks in the NIFTY 50 index.

Use the function

        export_stock_data(symbol, start_date, end_date, directory)

taking one stock symbol at a time from the nifty50.csv file, to generate a data frame of the above kind and export it as a CSV file inside a new folder in the given directory. The name of each file must exactly match the symbol of the stock given in the CSV file.

**Note:**

1. The symbols given in the dataset do not have the suffix .NS, which needs to be added to each symbol before it can be used in the function.

2. The new folder must be created with the name NIFTY50.

3. All the data frames must be exported as CSV files and saved in the NIFTY50 folder.
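
The notes above amount to two small preparation steps before the export loop. The following is a minimal sketch of them, assuming the symbol column is named `Symbol` (as in the code in this section) and using two made-up rows in place of the real nifty50.csv. The helper name `prepare_symbols` is hypothetical, and a scratch directory stands in for the real working directory.

```python
import io
import os
import tempfile

import pandas as pd

# Stand-in for nifty50.csv; the real file is assumed to have a "Symbol" column.
sample_csv = io.StringIO("Company Name,Symbol\nTata Motors,TATAMOTORS\nInfosys,INFY\n")


def prepare_symbols(frame, suffix=".NS"):
    """Return the symbols with the exchange suffix appended (hypothetical helper)."""
    return [symbol + suffix for symbol in frame["Symbol"]]


symbols = prepare_symbols(pd.read_csv(sample_csv))

# Create the NIFTY50 folder inside a scratch directory; exist_ok avoids an
# error when the folder is already there from an earlier run.
base_dir = tempfile.mkdtemp()
nifty_dir = os.path.join(base_dir, "NIFTY50")
os.makedirs(nifty_dir, exist_ok=True)

# One CSV path per stock, named after the suffixed symbol.
paths = [os.path.join(nifty_dir, symbol + ".csv") for symbol in symbols]
print(symbols)
```

Building full paths with `os.path.join` keeps the sketch independent of the operating system's path separator.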


In [9]:
import os
import pandas as pd

nifty50 = pd.read_csv('nifty50.csv')

#keep the parent directory for future use
parentdict = os.getcwd()

#create the new folder called NIFTY50 inside the parent directory
#exist_ok=True avoids an error when the folder already exists
directory = os.path.join(parentdict, 'NIFTY50')
os.makedirs(directory, exist_ok=True)

#make the NIFTY50 folder the working directory
os.chdir(directory)

#initialize the dates
start_date = '2021-10-19'
end_date = '2022-10-19'

#Run the loop for each company in the NIFTY 50 file
for symbol in nifty50.Symbol:

    #add .NS to the symbol name
    symbol = symbol + ".NS"

    #call the export function to save each file
    export_stock_data(symbol, start_date, end_date, directory)

## D) Function to calculate daily returns daily_return()

Create a function called,

    daily_return(df)
    
This function should take a dataframe like the one shown in the example above, calculate the daily return of the close price, and output a dataframe with the same columns plus an additional column named "**Return_Close**". 

The daily return for a day is calculated as follows:

    daily return = (today's price - previous day's price)/(previous day's price)*100

**Note:** You can calculate the daily return starting from the day 2. The return for the day 1 should be set to 0.
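
As a quick check of the formula, here is a minimal sketch on three made-up closing prices (not real market data). The function name `percent_returns` is hypothetical and only illustrates the arithmetic on a plain list; it is not the `daily_return(df)` function asked for in this section.

```python
def percent_returns(closes):
    """Daily return in percent; the first day is set to 0 as the note requires."""
    if not closes:
        return []
    returns = [0.0]
    # Pair each price with the next one: (previous day, today).
    for previous, today in zip(closes, closes[1:]):
        returns.append((today - previous) / previous * 100)
    return returns


# 100 -> 110 is a gain of 10 percent; 110 -> 99 is a loss of 10 percent.
print(percent_returns([100.0, 110.0, 99.0]))
```

Note that the two moves are the same size in percent but not in price, because each return is measured against the previous day's close.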

In [10]:
#get the dataframe using the function get_stock_data
df = get_stock_data('TATAMOTORS.NS', start_date='2021-10-19', end_date='2022-10-19')

#defining the daily_return function
def daily_return(df):

    #previous day's close price, aligned with the current day
    prev_close = df['Close'].shift(1)

    #daily return = (today's price - previous day's price)/(previous day's price)*100
    #computed for the whole column at once, so it does not depend on the index type
    returns = (df['Close'] - prev_close) * 100 / prev_close

    #the first day has no previous price, so its return is set to 0
    if len(returns) > 0:
        returns.iloc[0] = 0

    #Create the new column Return_Close from the daily return values
    df['Return_Close'] = returns

    #return the updated dataframe
    return df

## E) Read each file in the NIFTY50 folder and update with daily return 

Consider the NIFTY50 folder created in section C. This folder should contain 50 CSV files, one for each stock in the NIFTY 50 list. Create a function called,

    get_return(directory)
    

This function would take the path to the NIFTY50 folder, read the CSV files inside it one at a time, calculate the return of the close price, save the return value in a new column called "Return" and update the CSV file. 


In [17]:
## Reading code (using glob) taken partially from https://www.geeksforgeeks.org/how-to-read-all-csv-files-in-a-folder-in-pandas/
# import necessary libraries
import pandas as pd
import os
import glob

#defining the get_return function
def get_return(directory):

    #use glob to get all the csv files in the given directory
    csv_files = glob.glob(os.path.join(directory, "*.csv"))

    # loop through all the files
    for file in csv_files:

        # read the csv file
        df = pd.read_csv(file)

        #use the daily_return function to calculate and create the column for return of the close price
        df = daily_return(df)

        #store the values under the name Return, replacing any Return column left from an earlier run
        df['Return'] = df.pop('Return_Close')

        #export the updated csv file to the same folder. It replaces the old file without the return.
        df.to_csv(file, index=False)

#pass the NIFTY50 folder explicitly instead of relying on the current working directory
get_return(os.path.join(parentdict, 'NIFTY50'))

## F) Create and save graph for daily return for each stock automatically

Modify the get_return() function such that it has another argument,

    get_return(directory, plot = False)
    
If the argument plot is set to False, the output of this function is the same as that of the function in section E. However, if it is set to True, then for every stock a line plot in JPEG format is exported to a separate folder named **Graphs** inside the parent directory.

In [13]:
## Reading all csvs together code(glob) taken partially from https://www.geeksforgeeks.org/how-to-read-all-csv-files-in-a-folder-in-pandas/
# import necessary libraries
import pandas as pd
import os
import glob
import matplotlib.pyplot as plt
import seaborn as sns

#define the function; plot defaults to False so that graph creation stays optional
def get_return(directory, plot=False):

    #use glob to get all the csv files in the given directory
    csv_files = glob.glob(os.path.join(directory, "*.csv"))

    #create the Graphs folder inside the parent directory only when plots are requested
    if plot:
        graphs_dir = os.path.join(os.path.dirname(os.path.abspath(directory)), 'Graphs')
        os.makedirs(graphs_dir, exist_ok=True)

    # loop through all the files
    for file in csv_files:

        # read the csv file
        df = pd.read_csv(file)

        #use the daily_return function to calculate and create the column for return of the close price
        df = daily_return(df)

        #store the values under the name Return, replacing any Return column left from an earlier run
        df['Return'] = df.pop('Return_Close')

        #export the updated csv file to the same folder
        df.to_csv(file, index=False)

        #if plot is True, save a line graph in the Graphs folder
        if plot:

            #stock name = file name without the folder and the .csv extension
            stock = os.path.splitext(os.path.basename(file))[0]

            #plot the line graph using seaborn lineplot function
            #the x-axis is the row number, one row per trading day
            sns.lineplot(data=df['Return'])

            #labelling graph
            plt.xlabel('Dates')
            plt.ylabel('Return')
            plt.title("Return for " + stock)

            #saving the figure with nomenclature:  company_name.jpeg
            plt.savefig(os.path.join(graphs_dir, stock + ".jpeg"))

            #to avoid the cumulative graphs we clear after each iteration
            plt.clf()

get_return(os.path.join(parentdict, 'NIFTY50'), plot=True)
