- Notebook Author: [Trenton McKinney](https://trenton3983.github.io/)
- Course: **[DataCamp: Introduction to Python for Finance](https://learn.datacamp.com/courses/introduction-to-python-for-finance)**
    - This [notebook](https://github.com/trenton3983/DataCamp/blob/master/2019-02-05_intro_to_python_for_finance.ipynb) was created as a reproducible reference.
    - The material is from the course
    - I completed the exercises
    - If you find the content beneficial, consider a [DataCamp Subscription](https://www.datacamp.com/pricing?period=yearly).

# My Comments about this course

* Most of the course material is very basic, so it has not been included in the notebook
* Great course for those with little or no Python experience

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from numpy import nan  # the `NaN` alias was removed in NumPy 2.0
from glob import glob
import re

In [None]:
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 300)
pd.set_option('display.expand_frame_repr', True)

### Data Files Location

* Most data files for the exercises can be found on the [course site](https://www.datacamp.com/courses/intro-to-python-for-finance)
    * [Stocks data (I)](https://assets.datacamp.com/production/repositories/1715/datasets/2623c8037df0505d619c87a09131af9105e5883d/stock_data.csv)
    * [Stocks data (II)](https://assets.datacamp.com/production/repositories/1715/datasets/d96bf818f1f6f52af429edcaaf9dd96d37ab7b0a/stock_data2.csv)
    * [S&P 100 data](https://assets.datacamp.com/production/repositories/1715/datasets/0ef2a37a04b12d12368f060efd02b93cd110bd29/sector.txt)    
* Other data files may be found in my [DataCamp repository](https://github.com/trenton3983/DataCamp/tree/master/data)

### Data File Objects

In [None]:
stocks1 = 'data/intro_to_python_for_finance/stock_data.csv'
stocks2 = 'data/intro_to_python_for_finance/stock_data2.csv'
sector_100 = 'data/intro_to_python_for_finance/sector.txt'
exercises = 'data/intro_to_python_for_finance/exercise_data.csv'

# Intro to Python for Finance

***Course Description***

The financial industry is increasingly adopting Python for general-purpose programming and quantitative analysis, ranging from understanding trading dynamics to risk management systems. This course focuses specifically on introducing Python for financial analysis. Using practical examples, you will learn the fundamentals of Python data structures such as lists and arrays and learn powerful ways to store and manipulate financial data to identify trends.

## Welcome to Python

This chapter is an introduction to basics in Python, including how to name variables and various data types in Python.

### Welcome to Python for Finance!

### Comments and Variables

Using variables to evaluate stock trends

$\text{Price to earnings ratio} = \frac{\text{Market Price}}{\text{Earnings per share}}$

In [None]:
price = 200
earnings = 5

pe_ratio = price/earnings
pe_ratio

### Data Types

* integer
* float
* string
* boolean

***Check type with:***
```python
type(variable)
```
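
As a small illustration of the four data types listed above, the sketch below calls `type()` on one value of each kind. The variable names and values are hypothetical and chosen only for this example.

```python
# Hypothetical values, chosen only to illustrate each data type
shares = 100                 # integer
price = 198.75               # float
ticker = 'XYZ'               # string
is_expensive = price > 150   # boolean produced by a comparison

for value in (shares, price, ticker, is_expensive):
    print(value, type(value).__name__)
```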

#### Booleans in Python

Booleans are used to represent True or False statements in Python. Boolean comparisons include:

| Operators |      Descriptions     |
|:---------:|:---------------------:|
|     >     | greater than          |
|     >=    | greater than or equal |
|     <     | less than             |
|     <=    | less than or equal    |
|     ==    | equal (compare)       |
|     !=    | does not equal        |
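
The operators in the table can be tried on the P/E ratio computed earlier in this notebook. This is a minimal sketch; the threshold of 25 is an arbitrary value picked for illustration.

```python
price = 200
earnings = 5
pe_ratio = price / earnings   # 40.0

# Each comparison evaluates to a boolean
print(pe_ratio > 25)    # True
print(pe_ratio <= 40)   # True
print(pe_ratio == 40)   # True, because 40.0 compares equal to 40
print(pe_ratio != 40)   # False
```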

## Lists

This chapter introduces lists in Python and how they can be used to work with data.

### Lists

### Nested Lists

### List methods and functions

| Methods                                                     | Functions                        |
|-------------------------------------------------------------|----------------------------------|
| All methods are functions                                   | Not all functions are methods    |
| List methods are functions bound to list objects            |                                  |
| Used on an object                                           | Requires an input of an object   |
| prices.sort()                                               | type(prices)                     |

* Functions take objects as inputs or are "passed" an object
* Methods act on objects
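
The difference in call style can be seen side by side in the sketch below. The `prices` list holds hypothetical values used only for illustration.

```python
prices = [159.54, 37.13, 71.17]  # hypothetical prices

# Functions are passed the object as an argument
print(type(prices))   # <class 'list'>
print(len(prices))    # 3
print(max(prices))    # 159.54

# Methods are called on the object with dot notation
prices.sort()         # sorts the list in place and returns None
print(prices)         # [37.13, 71.17, 159.54]
```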

## Arrays in Python

This chapter introduces packages in Python, specifically the NumPy package and how it can be efficiently used to manipulate arrays.

### Arrays

***Why use an array for financial analysis?***

* Arrays can handle very large datasets efficiently
    * Computationally and memory efficient
    * Faster calculations and analysis than lists
    * Diverse functionality (many functions in Python packages)
* All elements in a NumPy array share the same dtype
* Each element of a Python list keeps its own type
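
The last two points can be checked with a short sketch. The values are hypothetical; the example only shows that a list keeps mixed element types while `np.array` converts them to a single dtype.

```python
import numpy as np

mixed_list = [1, 2.5, 3]            # each element keeps its own type
mixed_array = np.array(mixed_list)  # NumPy upcasts everything to one dtype

print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']
print(mixed_array.dtype)                       # float64
```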

#### Array operations

In [None]:
# Arrays - element-wise sum

array_A = np.array([1, 2, 3])
array_B = np.array([4, 5, 6])

array_A + array_B

In [None]:
# Lists - list concatenation

list_A = [1, 2, 3]
list_B = [4, 5, 6]

list_A + list_B

### 2D arrays and functions

### Using arrays for analysis

## Visualization in Python

In this chapter, you will be introduced to the Matplotlib package for creating line plots, scatter plots, and histograms.

### Visualization in Python

#### Single Plot

In [None]:
df = pd.read_csv(stocks1)
df.head()

In [None]:
plt.plot(df.Day, df.Price, color='red', linestyle='--')

# Add x and y labels
plt.xlabel('Days')
plt.ylabel('Prices, $')

# Add plot title
plt.title('Company Stock Prices Over Time')

#### Multiple plots I

In [None]:
df = pd.read_csv(stocks2)
df.head()

In [None]:
# Plot two lines of varying colors 
plt.plot(df.day, df.company1, color='red')
plt.plot(df.day, df.company2, color='green')

# Add labels
plt.xlabel('Days')
plt.ylabel('Prices, $')
plt.title('Stock Prices Over Time')

#### Multiple plots II

In [None]:
df[['company1', 'company2']].plot()

#### Scatterplots

In [None]:
plt.scatter(df.day, df.company1, color='green', s=0.1)

### Histograms

* Show the distribution of the data
* Uses in Finance
    * Economic Indicators
    * Stock Returns
    * Commodity Prices

#### Why histograms for financial analysis?
    
![alt text](https://raw.githubusercontent.com/trenton3983/DataCamp/master/Images/2019-02-07_intro_to_python_for_finance/intro_to_python_for_finance_histogram.JPG?raw=true "Histogram")

* Is your data skewed?
* Is your data centered around the average?
* Do you have any abnormal data points (outliers) in your data?

#### Histograms and matplotlib.pyplot

```python
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=3)
plt.show()
```

#### Normalizing histogram data

```python
import matplotlib.pyplot as plt
plt.hist(x=prices, bins=6, density=True)
plt.show()
```

* At times it's more useful to see relative frequencies than raw frequency counts. With `density=True`, the bar heights are rescaled so that the total bar area equals 1.
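
To see what the normalization does numerically, the sketch below uses `np.histogram`, which applies the same `density` rescaling without drawing a plot. The `prices` values are hypothetical. It also shows one way to get the plain fraction of observations per bin, which differs from the density whenever the bin width is not 1.

```python
import numpy as np

# Hypothetical prices, used only for illustration
prices = np.array([10, 11, 11, 12, 12, 12, 13, 13, 14, 20])

# density=True rescales the bar heights so the total bar area equals 1
density, edges = np.histogram(prices, bins=5, density=True)
print((density * np.diff(edges)).sum())

# For the fraction of observations in each bin, divide counts by the total
counts, _ = np.histogram(prices, bins=5)
rel_freq = counts / counts.sum()
print(rel_freq)
```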

#### Layering histograms on a plot

```python
plt.hist(x=prices, bins=6, density=True)
plt.hist(x=prices2, bins=6, density=True)
plt.show()
```

#### Alpha: Changing transparency of histograms

```python
plt.hist(x=prices, bins=6, density=True, alpha=0.5)
plt.hist(x=prices2, bins=6, density=True, alpha=0.5)
plt.show()
```

#### Adding a legend

```python
plt.hist(x=prices, bins=6, density=True, alpha=0.5, label='Prices 1')
plt.hist(x=prices2, bins=6, density=True, alpha=0.5, label='Prices New')
plt.legend()
plt.show()
```

### Exercises

#### Is data normally distributed?

In [None]:
plt.hist(df.company2, bins=100, ec='black')
plt.show()

#### Comparing two histograms

In [None]:
df_exercises = pd.read_csv(exercises)
df_exercises.head()

##### pd.DataFrame.hist

In [None]:
df_exercises.hist(bins=100, alpha=0.4, ec='black')
plt.show()

##### matplotlib.pyplot as plt

In [None]:
# Plot histogram of stocks_A
plt.hist(df_exercises.stock_A, bins=100, alpha=0.4, label='Stock A')

# Plot histogram of stocks_B 
plt.hist(df_exercises.stock_B, bins=100, alpha=0.4, label='Stock B')

# Add the legend
plt.legend()

# Display plot
plt.show()

## S&P 100 Case Study

In this chapter, you will get a chance to apply all the techniques you learned in the course on the S&P 100 data.

### Introducing the dataset

#### Overall Review

* Python shell and scripts
* Variables and data types
* Lists
* Arrays
* Methods and functions
* Indexing and subsetting
* Matplotlib

#### S&P 100 Companies

***Standard and Poor's S&P 100:***

* made up of major companies that span multiple industry groups
* used to measure stock performance of large companies

#### S&P 100 Sectors

![alt text](https://raw.githubusercontent.com/trenton3983/DataCamp/master/Images/2019-02-07_intro_to_python_for_finance/intro_to_python_for_finance_s_and_p_sectors_pie.JPG?raw=true?raw=true "Sectors")

#### The Data

* EPS: earnings per share

In [None]:
df = pd.read_csv(sector_100)
df.head()

In [None]:
df.tail()

#### Price to Earnings Ratio

$\text{Price to earnings ratio} = \frac{\text{Market Price}}{\text{Earnings per share}}$

* The dollar amount one can expect to invest in a company in order to receive one dollar of the company's earnings
* The ratio for valuing a company that measures its current share price relative to the per-share earnings
* In general, a higher P/E ratio indicates higher growth expectations

#### Case Study Objective I:

***Given***

* List of data describing the S&P 100: names, prices, earnings, sectors

***Objective Part I***

* Explore and analyze the S&P 100 data, specifically the P/E ratios of S&P 100 companies

#### Methods

* Step 1: examine the lists
* Step 2: Convert lists to arrays
* Step 3: Elementwise array operations
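
The three steps can be sketched on a few made-up numbers. The lists below are hypothetical stand-ins for the S&P 100 data, not values from the dataset.

```python
import numpy as np

# Hypothetical values standing in for the S&P 100 lists
prices = [170.12, 93.29, 55.28]
earnings = [9.2, 5.31, 2.41]

# Step 1: examine the lists
print(prices[:2], earnings[-1])

# Step 2: convert lists to arrays
prices_array = np.array(prices)
earnings_array = np.array(earnings)

# Step 3: element-wise division gives one P/E ratio per company
pe = prices_array / earnings_array
print(pe.round(2))
```

Dividing the plain lists directly (`prices / earnings`) would raise a `TypeError`, which is why the conversion in step 2 comes first.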

### Project Explorations

#### Data

In [None]:
names = df.Name.values
prices = df.Price.values
earnings = df.EPS.values
sectors = df.Sector.values

In [None]:
type(names)

#### Lists

Stocks in the S&P 100 are selected to represent sector balance and market capitalization. To begin, let's take a look at what data we have associated with S&P companies.

Four ***lists***, ***names***, ***prices***, ***earnings***, and ***sectors***, are available in your workspace.

***Instructions***

* Print the first four items in ***names***.
* Print the name, price, earning, and sector associated with the last company in the lists.

In [None]:
# First four items of names
print(names[:4])

# Print information on last company
print(names[-1])
print(prices[-1])
print(earnings[-1])
print(sectors[-1])

#### Arrays and NumPy

NumPy is a scientific computing package in Python that helps you to work with arrays. Let's use array operations to calculate the price to earnings ratios of the S&P 100 stocks.

The S&P 100 data is available as the lists: ***prices*** (stock prices per share) and ***earnings*** (earnings per share).

***Instructions***

* Import ***numpy*** as ***np***.
* Convert the ***prices*** and ***earnings*** lists to arrays, ***prices_array*** and ***earnings_array***, respectively.
* Calculate the price to earnings ratio as ***pe***.

```python
# Convert lists to arrays
prices_array = np.array(prices)
earnings_array = np.array(earnings)
```

In [None]:
# Calculate P/E ratio 
pe = prices/earnings
pe[:10]

### A closer look at sectors

#### Case Study Objective II:

***Given***

* Numpy arrays of data describing the S&P 100: names, prices, earnings, sectors

***Objective Part II***

* Explore and analyze sector-specific P/E ratios within companies of the S&P 100

#### Methods

* Step 1: Create a boolean filtering array
* Step 2: Apply filtering array to subset another array
* Step 3: Summarize P/E ratios
    * Calculate the average and standard deviation of these sector-specific P/E ratios
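
A minimal sketch of these steps on toy data may help show what the boolean array looks like. The sector labels and P/E values here are hypothetical.

```python
import numpy as np

# Hypothetical data, for illustration only
sectors = np.array(['Energy', 'Health Care', 'Energy', 'Financials'])
pe = np.array([12.0, 30.0, 18.0, 9.0])

# Step 1: element-wise comparison yields one boolean per company
mask = (sectors == 'Energy')
print(mask)          # [ True False  True False]

# Step 2: indexing with the mask keeps only the True positions
energy_pe = pe[mask]
print(energy_pe)     # [12. 18.]

# Step 3: summarize the subset
print(np.mean(energy_pe), np.std(energy_pe))   # 15.0 3.0
```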

### Project Explorations

#### Filtering arrays

In this lesson, you will focus on two sectors:

* Information Technology
* Consumer Staples

***numpy*** is imported as ***np*** and S&P 100 data is stored as arrays: ***names***, ***sectors***, and ***pe*** (price to earnings ratio).

***Instructions 1/2***

* Create a boolean array to determine which elements in ***sectors*** are ***'Information Technology'***.
* Use the boolean array to subset ***names*** and ***pe*** in the Information Technology sector.

In [None]:
# Create boolean array 
boolean_array = (sectors == 'Information Technology')

# Subset sector-specific data
it_names = names[boolean_array]
it_pe = pe[boolean_array]

# Display sector names
print(it_names)
print(it_pe)

***Instructions 2/2***

* Create a boolean array to determine which elements in ***sectors*** are ***'Consumer Staples'***.
* Use the boolean array to subset ***names*** and ***pe*** in the Consumer Staples sector.

In [None]:
# Create boolean array 
boolean_array = (sectors == 'Consumer Staples')

# Subset sector-specific data
cs_names = names[boolean_array]
cs_pe = pe[boolean_array]

# Display sector names
print(cs_names)
print(cs_pe)

#### Summarizing sector data

In this exercise, you will calculate the mean and standard deviation of P/E ratios for Information Technology and Consumer Staples sectors. ***numpy*** is imported as ***np*** and the ***it_pe*** and ***cs_pe*** arrays from the previous exercise are available in your workspace.

***Instructions 1/2***

* Calculate the mean and standard deviation of the P/E ratios (***it_pe***) for the Information Technology sector.

In [None]:
# Calculate mean and standard deviation
it_pe_mean = np.mean(it_pe)
it_pe_std = np.std(it_pe)

print(it_pe_mean)
print(it_pe_std)

***Instructions 2/2***

* Calculate the mean and standard deviation of the P/E ratios (***cs_pe***) for the Consumer Staples sector.

In [None]:
# Calculate mean and standard deviation
cs_pe_mean = np.mean(cs_pe)
cs_pe_std = np.std(cs_pe)

print(cs_pe_mean)
print(cs_pe_std)
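
One detail worth knowing about the calculations above: `np.std` defaults to `ddof=0`, which is the population standard deviation. If a sector's companies are treated as a sample, passing `ddof=1` gives the sample standard deviation instead. The sketch below compares the two on hypothetical values.

```python
import numpy as np

# Hypothetical P/E ratios, for illustration only
sample_pe = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_std = np.std(sample_pe)             # ddof=0: divide by n
samp_std = np.std(sample_pe, ddof=1)    # ddof=1: divide by n - 1

print(np.mean(sample_pe))   # 5.0
print(pop_std)              # 2.0
print(samp_std)             # slightly larger than the population value
```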

#### Plot P/E ratios

Let's take a closer look at the P/E ratios using a scatter plot for each company in these two sectors.

The arrays ***it_pe*** and ***cs_pe*** from the previous exercise are available in your workspace. Also, each company name has been assigned a numeric ID contained in the arrays ***it_id*** and ***cs_id***.

***Instructions***

* Draw a scatter plot of ***it_pe*** ratios with red markers and ***'IT'*** label.
* On the same plot, add the ***cs_pe*** ratios with green markers and ***'CS'*** label.
* Add a legend to this plot.
* Display the plot.

In [None]:
# Numeric IDs sized to match each sector's array instead of hard-coded lengths
it_id = np.arange(len(it_pe))
cs_id = np.arange(len(cs_pe))

# Make a scatterplot
plt.scatter(it_id, it_pe, color='red', label='IT')
plt.scatter(cs_id, cs_pe, color='green', label='CS')

# Add legend
plt.legend()

# Add labels
plt.xlabel('Company ID')
plt.ylabel('P/E Ratio')
plt.show()

***Notice that there is one company in the IT sector with an unusually high P/E ratio***

### Visualizing trends

#### Case Study Objective III:

***Objective Part III***

* Investigate the outlier from the scatter plot

#### Methods

* Step 1: Make a histogram of the P/E ratios
* Step 2:
    * Identify the outlier P/E ratio
    * Create a boolean array filter to subset this company
    * Filter out this company information from the provided datasets
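
The filtering part of step 2 can be sketched with a boolean mask and its inverse (`~`). The company names and P/E values below are hypothetical; the cut-off of 50 matches the threshold used in this case study.

```python
import numpy as np

# Hypothetical names and P/E ratios, for illustration only
names = np.array(['Company A', 'Company B', 'Company C', 'Company D'])
pe = np.array([18.5, 22.1, 285.3, 25.7])

# Boolean filter that flags the outlier
is_outlier = pe > 50

# Subset the outlier company
print(names[is_outlier], pe[is_outlier])

# Invert the mask with ~ to filter the outlier out of every array
names_clean = names[~is_outlier]
pe_clean = pe[~is_outlier]
print(names_clean, pe_clean)
```

Applying the same mask to every array keeps the names and ratios aligned after the outlier is removed.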

### Project Explorations

#### Histogram of P/E ratios

To visualize and understand the distribution of the P/E ratios in the IT sector, you can use a histogram.

The array ***it_pe*** from the previous exercise is available in your workspace.

***Instructions***

* Selectively import the ***pyplot*** module of ***matplotlib*** as ***plt***.
* Plot a histogram of ***it_pe*** with 8 bins.
* Add the x-label as ***'P/E ratio'*** and y-label as ***'Frequency'***.
* Display the plot.

In [None]:
# Plot histogram 
plt.hist(it_pe, bins=8, ec='black')

# Add x-label
plt.xlabel('P/E ratio')

# Add y-label
plt.ylabel('Frequency')

# Show plot
plt.show()

#### Identify the outlier

* Histograms can help you to identify outliers or abnormal data points. Which P/E ratio in this histogram is an example of an outlier?

***A stock with P/E ratio > 50.***

#### Name the outlier

You've identified that a company in the Information Technology sector has a P/E ratio greater than 50. Let's identify this company.

***numpy*** is imported as ***np***, and arrays ***it_pe*** (P/E ratios of Information Technology companies) and ***it_names*** (names of Information Technology companies) are available in your workspace.

***Instructions***

* Identify the P/E ratio greater than 50 and assign it to ***outlier_price***.
* Identify the company with P/E ratio greater than 50 and assign it to ***outlier_name***.

In [None]:
# Identify P/E ratio within it_pe that is > 50
outlier_price = it_pe[it_pe > 50]

# Identify the company with PE ratio > 50
outlier_name = it_names[it_pe == outlier_price]

# Display results
print(f'In 2017 {outlier_name[0]} had an abnormally high P/E ratio of {round(outlier_price[0], 2)}.')

# Certificate

![](https://raw.githubusercontent.com/trenton3983/DataCamp/master/Images/2019-02-07_intro_to_python_for_finance/2019-02-07_intro_to_python_for_finance_certificate.JPG)