<img style="float: right;" width="120" src="../Images/supplier-logo.png">
<img style="float: left; margin-top: 0" width="80" src="../Images/client-logo.png">
<br><br><br>


# Overview

As well as importing data into DataFrames from files such as CSV files or Excel spreadsheets, users can also import data from other sources.

This notebook shows a few of them:

- Web scraping

- Quandl - an online data provider

- Relational databases - SQLite

- Exporting data to Excel

- Sending emails


# Web Scraping

## The Federal Reserve Bank of New York

Import the overnight fed funds rates into a DataFrame

### Load in the libraries

In [None]:
import pandas as pd
import numpy as np
from datetime import datetime

# the python package to follow a URL and return its results
import requests

# beautiful soup - for web scraping
import bs4 as bs


### Import the data 

Get the response from the web page

Convert it to `soup`

Find the table we are looking for and store it in a local variable (`table`)

In [None]:
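resp = requests.get('https://apps.newyorkfed.org/markets/autorates/fed%20funds')
# Stop with an error if the request failed (e.g. 404 or 500) instead of parsing an error page
resp.raise_for_status()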

soup = bs.BeautifulSoup(resp.text, 'lxml')

table = soup.find('table', {'id':'TBLDetails'})
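The three steps above (get the response, convert it to soup, find the table) can be tried offline on a small HTML string. This is a minimal sketch: the HTML below is made up for illustration and is not the New York Fed page, and it uses the built-in `'html.parser'` so that `lxml` is not required. It also shows that `find()` returns `None` when nothing matches, which is what happens if the page layout changes.

```python
import bs4 as bs

# A tiny, hypothetical page standing in for resp.text
html = """
<html><body>
<table id="TBLDetails">
  <tr><th>Date</th><th>Rate</th></tr>
  <tr><td>01/23</td><td>2.40</td></tr>
  <tr><td>01/24</td><td>2.39</td></tr>
</table>
</body></html>
"""

soup = bs.BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag, or None if there is no match
table = soup.find('table', {'id': 'TBLDetails'})
if table is None:
    raise ValueError("table not found: the page layout may have changed")

rows = []
for row in table.find_all('tr')[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in row.find_all('td')]
    rows.append(cells)

print(rows)  # [['01/23', '2.40'], ['01/24', '2.39']]
```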

### Iterate over the table from soup

In [None]:
dates = []
rates = []
p1s = []
p25s = []
p75s = []
p99s = []
vols = []
targets = []

now = datetime.now()

for row in table.findAll('tr')[2:]:
    # Parse the cells of this row once instead of calling findAll() for every column
    cells = row.findAll('td')

    # The first cell holds only the month and day, so prepend the current year
    # (this gives the wrong year for rows from a previous year)
    dt = str(now.year) + "/" + cells[0].text.split()[0]
    dates.append(dt)

    if len(cells) == 9:
        rates.append(cells[1].text.split()[0])
        # Slice the percentile value out of the cell text and strip any quotes
        p1s.append(cells[2].text.split()[0][20:25].replace("'", ""))
        p25s.append(cells[3].text.split()[0][20:25].replace("'", ""))
        p75s.append(cells[4].text.split()[0][20:25].replace("'", ""))
        p99s.append(cells[5].text.split()[0][20:25].replace("'", ""))
        vols.append(cells[6].text.split()[0])

        # The target range is split across three tokens; join them back together
        ts = cells[7].text.split()
        targets.append(ts[0] + ts[1] + ts[2])
    else:
        # Incomplete row: fill every column with a placeholder of zero
        rates.append("0.0")
        p1s.append("0.0")
        p25s.append("0.0")
        p75s.append("0.0")
        p99s.append("0.0")
        vols.append("0.0")
        targets.append("0.0")



### Use the lists of data to populate a DataFrame

The datatypes need to be converted from strings:
- to numeric
- to dates
- note also that the volume column needs the `str.replace` function to remove the thousands separators (commas) before it can be converted

In [None]:
df = pd.DataFrame()

df['Date'] = dates
df['Rate'] = pd.to_numeric(rates)
df['1st PCile'] = pd.to_numeric(p1s)
df['25th PCile'] = pd.to_numeric(p25s)
df['75th PCile'] = pd.to_numeric(p75s)
df['99th PCile'] = pd.to_numeric(p99s)
df['Target Range'] = targets
df['Vol'] = vols

df['Vol'] = pd.to_numeric(df['Vol'].str.replace(',', ''))
df['Date'] = pd.to_datetime(df['Date'])

# Use the dates as the index (set_index also removes the Date column)
df = df.set_index('Date')

In [None]:
df.head()
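The conversions listed above can be checked on a small frame before running them on scraped data. A minimal sketch: the values below are invented for illustration. It also shows `errors='coerce'`, which turns text that cannot be parsed into `NaN`; that is one alternative to filling incomplete rows with `"0.0"` placeholders.

```python
import pandas as pd

# Hypothetical scraped values: everything arrives as text
raw = pd.DataFrame({
    'Date': ['2019/01/23', '2019/01/24'],
    'Rate': ['2.40', '2.39'],
    'Vol': ['71,000', '70,000'],
})

clean = pd.DataFrame(index=raw.index)
# An explicit format avoids guessing between day-first and month-first dates
clean['Date'] = pd.to_datetime(raw['Date'], format='%Y/%m/%d')
clean['Rate'] = pd.to_numeric(raw['Rate'])
# Remove thousands separators first, otherwise to_numeric raises a ValueError
clean['Vol'] = pd.to_numeric(raw['Vol'].str.replace(',', '', regex=False))
clean = clean.set_index('Date')

# errors='coerce' turns unparseable text such as 'n/a' into NaN instead of raising
maybe = pd.to_numeric(pd.Series(['2.40', 'n/a']), errors='coerce')

print(clean.dtypes)
print(maybe)
```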

### Export to excel and save

In [None]:
# Build a date-stamped file name, e.g. fedfunds20190124.xlsx
fname = 'fedfunds' + now.strftime('%Y%m%d') + '.xlsx'

# Create a Pandas Excel writer using XlsxWriter as the engine.
# The with block closes the writer and writes the file when it ends
# (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter(path=fname, engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name='Fed Funds')

## Wikipedia

Extract the stock tickers of the S&P 500 companies from Wikipedia into a list

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# the python package to follow a URL and return its results
import requests

# beautiful soup - for web scraping
import bs4 as bs

In [None]:
resp = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = bs.BeautifulSoup(resp.text, 'lxml')
table = soup.find('table', {'class':'wikitable sortable'})

tickers = []

for row in table.findAll('tr')[1:]:
    # .text can include the cell's trailing newline, so strip surrounding whitespace
    ticker = row.findAll('td')[0].text.strip()
    tickers.append(ticker)

tickers

## Wikipedia (second method)

Second method: read all the tables directly into pandas DataFrames.

Use the pandas function `read_html(...)`: pass it a URL and it returns a list of DataFrames, one for each table found on the page. It needs a parser such as `lxml` (or `bs4` plus `html5lib`) to be installed.

This is very basic but reasonably satisfactory, though users will need to spend some effort renaming columns, fixing the datatypes of individual cells, etc.

For the wiki page above, `read_html(...)` returns 2 DataFrames, one for each table it found.

In [None]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')

print("Number of tables on page: ", len(dfs))

for df in dfs:
    display(df.head())


# Financial Data from the internet

There are many websites and web services that allow users to download all sorts of data in all manner of formats,

e.g. manual or automated downloads in Excel format, PDF, Word, JSON, CSV and plain text, and even directly into DataFrames.

A very good website is https://www.quandl.com/

It offers good quality data, both free and paid.

Register, log on and download data in a variety of formats.

To use their APIs you need an API key.

### Import the libraries

And set the Quandl API key

In [None]:
import quandl

quandl.ApiConfig.api_key = "YOUR API KEY HERE"
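Rather than typing the key into the notebook, where it is easy to share by accident, it can be read from an environment variable. A minimal sketch: the helper `get_api_key()` and the variable name `QUANDL_API_KEY` are our own hypothetical choices for this notebook.

```python
import os

def get_api_key(env_var='QUANDL_API_KEY'):
    """Return the API key stored in an environment variable (hypothetical helper)."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

# Usage, assuming the variable was set in the shell before starting Jupyter:
# quandl.ApiConfig.api_key = get_api_key()
```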

In [None]:
quandl.get('NSE/RELIANCE', start_date = '2017-JAN-01', end_date='2019-JAN-24')


In [None]:
# Only the last expression in a cell is displayed, so keep each result in a variable
reliance = quandl.get('NSE/RELIANCE', start_date='2017-01-01', end_date='2019-01-24')
opec = quandl.get('OPEC/ORB', start_date='2009-01-23', end_date='2019-01-24')
gold = quandl.get('LBMA/GOLD', start_date='2018-01-01', end_date='2019-01-23')
ibm = quandl.get('WIKI/IBM', start_date='2018-01-01', end_date='2019-01-23')
gold.head()

### Make a simple function



In [None]:
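def getData(start, end, symbol):
    # Build the Quandl dataset code for a stock in the WIKI database, e.g. 'WIKI/C'
    data = 'WIKI/' + symbol

    return quandl.get(dataset=data, start_date=start, end_date=end)

# Daily prices for Citigroup (ticker C)
df = getData("2000-01-01", "2018-12-31", "C")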

df.head()

df[['Close']].plot()

# SQLite

Use the flights database located in `../Data/DataBase/flights.db`

### Import the libraries

And connect to the database

In [None]:
import sqlite3
conn = sqlite3.connect("../Data/flights.db")
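`sqlite3.connect()` creates a new, empty database file if the path does not exist, so a mistyped relative path only shows up later as a `no such table` error. A sketch that fails immediately instead, using SQLite's URI syntax with `mode=rw` (open read-write, never create). The helper name `connect_existing()` is hypothetical.

```python
import sqlite3
from pathlib import Path

def connect_existing(path):
    """Open an existing SQLite file read-write; raise if it does not exist."""
    # as_uri() needs an absolute path; mode=rw refuses to create a new file
    uri = Path(path).resolve().as_uri() + '?mode=rw'
    return sqlite3.connect(uri, uri=True)

try:
    connect_existing('no_such_folder/missing.db')
except sqlite3.OperationalError as exc:
    print('Could not open database:', exc)
```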

### Get a cursor from the database

Use the connection object to get a cursor.

Cursors allow users to execute SQL against the database.

In [None]:
cur = conn.cursor()
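To experiment with cursors without touching `flights.db`, SQLite can also run an in-memory database that disappears when the connection closes. A minimal sketch; the cut-down `airlines` table below is hypothetical, and new names (`mem`, `mem_cur`) are used so the notebook's `conn` and `cur` are left alone.

```python
import sqlite3

# ':memory:' creates a throwaway database held in RAM
mem = sqlite3.connect(':memory:')
mem_cur = mem.cursor()

# A cut-down, hypothetical version of the airlines table
mem_cur.execute("create table airlines (id integer, name text, country text)")
mem_cur.execute("insert into airlines values (1, 'Test Air', 'USA')")
mem.commit()

mem_cur.execute("select * from airlines")
rows = mem_cur.fetchall()
print(rows)  # [(1, 'Test Air', 'USA')]

mem_cur.close()
mem.close()
```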

### Execute some SQL

In [None]:
cur.execute("select * from airlines limit 5;")

Use `fetchall()` to view the results

In [None]:
res = cur.fetchall()

print(res)

The results are a list of tuples.
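If tuples are awkward to work with, setting the connection's `row_factory` to `sqlite3.Row` returns rows that can be indexed by position or by column name. A sketch on a hypothetical in-memory table:

```python
import sqlite3

mem = sqlite3.connect(':memory:')
# sqlite3.Row rows support row[0] as well as row['column_name']
mem.row_factory = sqlite3.Row

mem.execute("create table airlines (id integer, name text, country text)")
mem.execute("insert into airlines values (1, 'Test Air', 'USA')")

row = mem.execute("select * from airlines").fetchone()
print(row['name'], row[2])   # Test Air USA
print(row.keys())            # ['id', 'name', 'country']
as_dict = dict(row)          # {'id': 1, 'name': 'Test Air', 'country': 'USA'}
mem.close()
```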


Users can chain the `execute()` and `fetchall()` methods together if they wish

In [None]:
cur.execute("select * from airlines limit 5;").fetchall()

Once finished with the connections and cursors, remember to **close** them.

In [None]:
cur.close()
conn.close()
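One way to make sure a connection is closed even when a query raises is `contextlib.closing`. Note that using a `sqlite3` connection on its own in a `with` block only commits (or rolls back on an exception); it does not close the connection. A sketch on an in-memory database:

```python
import sqlite3
from contextlib import closing

# closing() calls .close() on exit; the inner "with conn_tmp" commits on success
# or rolls back if the block raises
with closing(sqlite3.connect(':memory:')) as conn_tmp:
    with conn_tmp:
        conn_tmp.execute("create table t (x integer)")
        conn_tmp.execute("insert into t values (1)")
    count = conn_tmp.execute("select count(*) from t").fetchone()[0]

print(count)  # 1
```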

### Reading results into a pandas DataFrame

In [None]:
import pandas as pd
import sqlite3
conn = sqlite3.connect("../Data/flights.db")
df = pd.read_sql_query("select * from airlines limit 5;", conn)
df


### Inserting rows with Python

In [None]:
cur = conn.cursor()
cur.execute("insert into airlines values (6048, 19846, 'Test flight', '', '', null, null, null, 'Y')")
# Commit so the new row is saved to the database file
conn.commit()
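Inserts run inside a transaction: `rollback()` discards them, and `close()` does not commit, so uncommitted rows are lost when the connection closes. A sketch using a throwaway database file in a temporary folder (the table is a hypothetical cut-down version of `airlines`):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.db')

demo = sqlite3.connect(path)
demo.execute("create table airlines (id integer, name text)")
demo.commit()

# 1) Insert, then roll back: the row is discarded
demo.execute("insert into airlines values (1, 'Test flight')")
demo.rollback()

# 2) Insert, then close without committing: the row is lost
demo.execute("insert into airlines values (2, 'Test flight')")
demo.close()

# 3) Insert and commit: the row is saved to the file
demo = sqlite3.connect(path)
demo.execute("insert into airlines values (3, 'Test flight')")
demo.commit()
demo.close()

demo = sqlite3.connect(path)
saved = demo.execute("select id from airlines").fetchall()
demo.close()
print(saved)  # [(3,)]
```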


### Passing parameters into a query

Use `?` placeholders in the query and pass a sequence of values (e.g. a tuple).<br>
Each `?` in the query is replaced by an item from `values`. <br>
The first `?` is replaced by the first item in `values`, the second by the second, and so on.

In [None]:
cur = conn.cursor()
values = ('Test Flight', 'Y')
cur.execute("insert into airlines values (6049, 19847, ?, '', '', null, null, null, ?)", values)
conn.commit()
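`sqlite3` also supports named placeholders (`:name`) filled from a dict, and `executemany()` to run one statement for many rows. Because values are passed separately and never pasted into the SQL text, placeholders also protect against SQL injection. A sketch on a hypothetical in-memory table:

```python
import sqlite3

mem = sqlite3.connect(':memory:')
mem.execute("create table airlines (id integer, name text, active text)")

# Named placeholders are matched by name, so the dict order does not matter
mem.execute(
    "insert into airlines values (:id, :name, :active)",
    {'name': 'Test Flight', 'active': 'Y', 'id': 1},
)

# executemany() runs the same statement once per tuple in the list
more_rows = [(2, 'Second Flight', 'Y'), (3, 'Third Flight', 'N')]
mem.executemany("insert into airlines values (?, ?, ?)", more_rows)
mem.commit()

result = mem.execute("select id, name from airlines order by id").fetchall()
print(result)  # [(1, 'Test Flight'), (2, 'Second Flight'), (3, 'Third Flight')]
mem.close()
```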

### Updating rows

In [None]:
cur = conn.cursor()
values = ('USA', 19847)
cur.execute("update airlines set country=? where id=?", values)
conn.commit()
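Besides re-reading the table, the cursor's `rowcount` attribute reports how many rows an `UPDATE` (or `DELETE`) changed, which makes a missed `where` clause easy to spot. A sketch on a hypothetical in-memory table:

```python
import sqlite3

mem = sqlite3.connect(':memory:')
mem.execute("create table airlines (id integer, country text)")
mem.execute("insert into airlines values (19847, null)")

cur_demo = mem.cursor()
cur_demo.execute("update airlines set country=? where id=?", ('USA', 19847))
updated = cur_demo.rowcount   # one matching row was changed

cur_demo.execute("update airlines set country=? where id=?", ('USA', 99999))
missed = cur_demo.rowcount    # no row has this id, so nothing changed

mem.commit()
mem.close()
print(updated, missed)  # 1 0
```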

Simple test to verify the update happened

In [None]:
pd.read_sql_query("select * from airlines where id=19847;", conn)

### Deleting rows

In [None]:
cur = conn.cursor()
values = [19847]
cur.execute("delete from airlines where id=?", values)
conn.commit()

Simple test to verify the deletion happened (the query should return no rows)

In [None]:
pd.read_sql_query("select * from airlines where id=19847;", conn)

# Exporting Data to Excel


We use the pandas `ExcelWriter` class to output a DataFrame or Series to an Excel spreadsheet.

We can choose what data goes into each sheet in the Excel spreadsheet.


### Load Data

In [None]:
# Load in the famous FANG stocks
filename = '../Data/market_data.xls'
df_FB = pd.read_excel(io=filename, parse_dates=True, index_col='Date', sheet_name='FB')
df_AAPL = pd.read_excel(io=filename, parse_dates=True, index_col='Date', sheet_name='AAPL')
df_AMZN = pd.read_excel(io=filename, parse_dates=True, index_col='Date', sheet_name='AMZN')
df_NFLX = pd.read_excel(io=filename, parse_dates=True, index_col='Date', sheet_name='NFLX')
df_GOOGL = pd.read_excel(io=filename, parse_dates=True, index_col='Date', sheet_name='GOOGL')


In [None]:
df_FB

### Perform some analysis

In [None]:
# E.g. the daily returns
def log_returns(close):
    # Sort oldest-first so that shift(1) gives the previous day's close
    close = close.sort_index()
    return np.log(close / close.shift(1))

df_Returns = pd.DataFrame()

# Calculate the daily log returns, ln(P_t / P_t-1), for each stock
df_Returns['FB'] = log_returns(df_FB['Close'])
df_Returns['AAPL'] = log_returns(df_AAPL['Close'])
df_Returns['AMZN'] = log_returns(df_AMZN['Close'])
df_Returns['NFLX'] = log_returns(df_NFLX['Close'])
df_Returns['GOOGL'] = log_returns(df_GOOGL['Close'])

# Aggregate: min, max, mean and standard deviation for each business quarter
funcs = ['min', 'max', 'mean', 'std']
grpr = pd.Grouper(freq='BQ')

df = df_Returns.groupby(grpr).agg(funcs)
df.head()
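For reference, the standard definition of the daily log return, with $P_t$ the closing price on day $t$, and why log returns are convenient to aggregate: the sum over a period telescopes to the log of the overall price change.

$$
r_t = \ln\left(\frac{P_t}{P_{t-1}}\right), \qquad
\sum_{t=1}^{T} r_t = \ln\left(\frac{P_T}{P_0}\right)
$$

So the mean of the daily log returns in a quarter, multiplied by the number of trading days in it, gives that quarter's total log return.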

### Save the Data to an excel Spreadsheet

In [None]:
# Create a Pandas Excel writer using XlsxWriter as the engine.
# The with block closes the writer and writes the file when it ends
# (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('../Output/FAANG.xlsx', engine='xlsxwriter') as writer:
    # Put each stock's quarterly statistics in a separate sheet
    for ticker in ['FB', 'AAPL', 'AMZN', 'NFLX', 'GOOGL']:
        df[ticker].to_excel(writer, sheet_name=ticker)

Check that the data has been correctly saved to a file called `FAANG.xlsx` in the `../Output` directory.

# Emailing

Treat these as templates

They use SMTP to connect to an email server rather than Outlook.

Using Outlook is somewhat restrictive: it requires Outlook to be installed on the host machine, etc.

This way the code is more portable.


The domain name of the SMTP server is usually your email provider's domain name with **smtp.** in front of it.

The port is almost always 587 (SMTP upgraded to an encrypted connection with STARTTLS).

| Provider |  SMTP server domain name   |
|------|------|
|   Gmail  | smtp.gmail.com|
|   Outlook.com/Hotmail.com   | smtp-mail.outlook.com|
|   Yahoo Mail  | smtp.mail.yahoo.com|
|   iCloud   | smtp.mail.me.com|
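The rule of thumb above can be written as a small lookup. A sketch with a hypothetical helper `smtp_host_for()`: it uses the hosts from the table for the matching address domains (assumed here to be gmail.com, outlook.com, hotmail.com, yahoo.com and icloud.com) and otherwise falls back to **smtp.** plus the domain, which is only a guess to be checked against the provider's documentation.

```python
# SMTP hosts from the table above, keyed by the domain of the email address
KNOWN_SMTP_HOSTS = {
    'gmail.com': 'smtp.gmail.com',
    'outlook.com': 'smtp-mail.outlook.com',
    'hotmail.com': 'smtp-mail.outlook.com',
    'yahoo.com': 'smtp.mail.yahoo.com',
    'icloud.com': 'smtp.mail.me.com',
}

def smtp_host_for(address):
    """Guess the SMTP host for an email address (hypothetical helper)."""
    domain = address.rsplit('@', 1)[-1].lower()
    # Fall back to the "smtp." + domain rule of thumb
    return KNOWN_SMTP_HOSTS.get(domain, 'smtp.' + domain)

print(smtp_host_for('someone@Gmail.com'))      # smtp.gmail.com
print(smtp_host_for('someone@somewhere.com'))  # smtp.somewhere.com
```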


## Emailing simple messages

In [None]:
import smtplib

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

FROM = "someone@somewhere.com"
TO = "someone_else@someplace_else.com"
SUBJECT = "Results of Technical Analysis v2"
PASSWD = "PASSWORD"

msg = MIMEMultipart()

msg['From'] = FROM
msg['To'] = TO
msg['Subject'] = SUBJECT

# The body of the email is attached as a MIME part, just like any other attachment
BODY = """Results from last night's run

Regards

Pat
"""
msg.attach(MIMEText(BODY, 'plain'))

# Send the message
HOST = "smtp-mail.outlook.com"
PORT = 587

# The with block calls quit() on the server connection when it ends
with smtplib.SMTP(HOST, PORT) as server:
    server.starttls()              # encrypt the connection before logging in
    server.login(FROM, PASSWD)
    server.sendmail(FROM, TO, msg.as_string())
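The standard library also has the newer `email.message.EmailMessage` API, which sets the text body directly instead of attaching a `MIMEText` part. A sketch that builds the same kind of message and inspects it without sending anything; the addresses are the placeholders used above, and `msg2` is a new name so the cell above is not affected.

```python
from email.message import EmailMessage

msg2 = EmailMessage()
msg2['From'] = "someone@somewhere.com"
msg2['To'] = "someone_else@someplace_else.com"
msg2['Subject'] = "Results of Technical Analysis v2"
msg2.set_content("Results from last night's run\n\nRegards\n\nPat\n")

print(msg2.get_content_type())   # text/plain
print(msg2.as_string())          # headers followed by the body

# To send it (not run here), reusing HOST, PORT, FROM and PASSWD from the cell above:
# with smtplib.SMTP(HOST, PORT) as server:
#     server.starttls()
#     server.login(FROM, PASSWD)
#     server.send_message(msg2)
```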

## Emailing attachments

Use `MIMEMultipart.attach()` to attach a file or text to an email message



In [None]:
import smtplib

from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

FROM = "someone@somewhere.com"
TO = "someone_else@someplace_else.com"
SUBJECT = "Results of Technical Analysis v2"
PASSWD = "PASSWORD"

msg = MIMEMultipart()

msg['From'] = FROM
msg['To'] = TO
msg['Subject'] = SUBJECT

# The body of the email is attached as a MIME part, just like any other attachment
BODY = """Results from last night's run

Regards

Pat
"""
msg.attach(MIMEText(BODY, 'plain'))

# PDF attachment (Content-Type: application/pdf)
with open('../Data/pdf_1.pdf', 'rb') as fp:
    att = MIMEApplication(fp.read(), _subtype="pdf")
att.add_header('Content-Disposition', 'attachment', filename="PDF Version.pdf")
msg.attach(att)

# Word attachment: the registered MIME subtype for .docx files
with open('../Data/doc_1.docx', 'rb') as fp:
    att = MIMEApplication(
        fp.read(),
        _subtype="vnd.openxmlformats-officedocument.wordprocessingml.document",
    )
att.add_header('Content-Disposition', 'attachment', filename="Word Version.docx")
msg.attach(att)

# CSV attachment: CSV is text, so use MIMEText with the "csv" subtype (text/csv)
with open('../Data/FB.csv', 'r') as fp:
    att = MIMEText(fp.read(), 'csv')
att.add_header('Content-Disposition', 'attachment', filename="Facebook.csv")
msg.attach(att)

# Send the message
HOST = "smtp-mail.outlook.com"
PORT = 587

# The with block calls quit() on the server connection when it ends
with smtplib.SMTP(HOST, PORT) as server:
    server.starttls()              # encrypt the connection before logging in
    server.login(FROM, PASSWD)
    server.sendmail(FROM, TO, msg.as_string())
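Choosing the right MIME subtype by hand for every file is error-prone. A sketch of an alternative with `EmailMessage.add_attachment()`, guessing the type from the file name with `mimetypes`, in the same way as the email examples in the Python documentation. The helper `attach_bytes()` and the in-memory file contents are hypothetical; in practice the bytes would come from `open(path, 'rb')`.

```python
import mimetypes
from email.message import EmailMessage

def attach_bytes(msg, data, filename):
    """Attach raw bytes, guessing the MIME type from the file name (hypothetical helper)."""
    ctype, encoding = mimetypes.guess_type(filename)
    if ctype is None or encoding is not None:
        # Unknown or compressed file: fall back to a generic binary type
        ctype = 'application/octet-stream'
    maintype, subtype = ctype.split('/', 1)
    msg.add_attachment(data, maintype=maintype, subtype=subtype, filename=filename)

msg3 = EmailMessage()
msg3['Subject'] = "Attachment demo"
msg3.set_content("See attached.")

# Hypothetical file contents held in memory
attach_bytes(msg3, b"Date,Close\n2019-01-24,1.0\n", "Facebook.csv")
attach_bytes(msg3, b"%PDF-1.4 placeholder", "PDF Version.pdf")

for part in msg3.iter_attachments():
    print(part.get_filename(), part.get_content_type())
```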