### Calculating Financial Statistics

The rate of return measures how much money an investment gained or lost, expressed relative to the amount invested. A positive return signifies a profit and a negative return indicates a loss. The risk of an investment is defined as the likelihood of suffering a financial loss.


In [1]:
rate_of_return = 0.075

def display_as_percentage(val):
    return str(round(val*100, 1))+'%'
print(display_as_percentage(rate_of_return))

7.5%


#### Simple return rate
The most basic type of return is the simple rate of return. It is defined as the difference between the starting and ending price of an investment over a time period, divided by the starting price. If an investment returns dividends, those dividends should be added to the numerator.

R=(E−S+D)/S

    R: simple rate of return
    S: starting price of investment
    E: ending price of investment
    D: dividend


In [2]:
def calculate_simple_return(start_price, end_price, dividend=0):
    return (end_price - start_price + dividend)/start_price
simple_return = calculate_simple_return(200,250, 20)
print(simple_return)

0.35


#### Logarithmic return rate
Another type of return is the logarithmic rate of return, also known as the continuously compounded return. This is the expected return for an investment where the earnings are assumed to be continually reinvested over the time period. It is calculated by taking the difference between the log of the ending price and the log of the starting price.

r=log(E)−log(S)=log(E/S)

    r: logarithmic rate of return
    S: starting price of investment
    E: ending price of investment


In [3]:
from math import log
def calculate_log_return(start_price, end_price):
    return log(end_price)-log(start_price)
log_return = calculate_log_return(200, 250)
print(log_return)

0.2231435513142097
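
For an investment without dividends, the two measures are linked by r = log(1 + R), or equivalently R = exp(r) − 1. The following is a minimal sketch of that relationship using the same prices as above; the helper names are illustrative and are redefined here so the block runs on its own.

```python
from math import log, exp, isclose

def simple_return(start_price, end_price):
    # Simple return, assuming no dividends
    return (end_price - start_price) / start_price

def log_return(start_price, end_price):
    # Log return: log(E) - log(S) = log(E / S)
    return log(end_price / start_price)

R = simple_return(200, 250)
r = log_return(200, 250)

# Converting in either direction recovers the other measure
print(isclose(r, log(1 + R)))
print(isclose(R, exp(r) - 1))
```

For small price changes the two values are nearly equal; they drift apart as the change grows.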


#### Annualizing
An investment with a 2% rate of return over one day is surely not the same as an investment with a 2% rate of return over one month. Thus, it is common to convert returns to a standard time period. Often, this means converting to the annual rate of return in a process called annualizing.

r=r0∗t

    r: converted log rate of return
    r0: original log rate of return
    t: the number of original time periods in the new time period


In [4]:
def annualize_return(log_return, t):
    return log_return*t

daily_return_a = 0.001
monthly_return_b = 0.022

annual_return_a = annualize_return(daily_return_a, 252)
annual_return_b = annualize_return(monthly_return_b, 12)
print(display_as_percentage(annual_return_a))
print(display_as_percentage(annual_return_b))

25.2%
26.4%
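
One way to see why multiplying by t works for log returns: compounding the daily growth factor exp(r0) over t days gives exp(r0 ∗ t). The sketch below checks this numerically, assuming 252 trading days as in the example above; the variable names are illustrative.

```python
from math import exp, isclose

daily_log_return = 0.001   # same value as daily_return_a above
trading_days = 252         # trading days per year, as used above

# Compound the daily growth factor exp(r0) once per trading day
growth = 1.0
for _ in range(trading_days):
    growth *= exp(daily_log_return)

# The annualized log return r = r0 * t implies the same growth factor
annual_log_return = daily_log_return * trading_days
print(isclose(growth, exp(annual_log_return)))

# Equivalent simple annual return, slightly higher than the log return
annual_simple_return = exp(annual_log_return) - 1
print(annual_simple_return > annual_log_return)
```

Note that this shortcut applies to log returns only; simple returns compound multiplicatively and cannot be annualized by plain multiplication.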


Now, let’s look at an extension of the previous conversion formula. Suppose we know the log rate of return for 5 days of a given year. Which daily log return would we use to calculate the annual return?

In this case, we can first take the average of the 5 daily log returns, then multiply by 252, the number of trading days in a year.

In [5]:
import numpy as np

def convert_returns(log_returns, t):
    return np.mean(log_returns)*t

daily_returns = [0.002, -0.002, 0.003, 0.002, -0.001]

annual_return = convert_returns(daily_returns, 252)
print(display_as_percentage(annual_return))

# If the log return for every day in the period is known, simply sum them
weekly_return = sum(daily_returns)
print(display_as_percentage(weekly_return))

20.2%
0.4%
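
Summing works because log returns are additive over time: the sum of the daily log returns equals the log return computed from the first and last price alone. The sketch below illustrates this with a hypothetical starting price of 100 (an assumption, not part of the original data).

```python
from math import exp, log, isclose

daily_returns = [0.002, -0.002, 0.003, 0.002, -0.001]

# Hypothetical starting price; each day the price grows by exp(daily log return)
prices = [100.0]
for r in daily_returns:
    prices.append(prices[-1] * exp(r))

# Log return over the whole week, using only the first and last price
weekly_from_prices = log(prices[-1] / prices[0])

print(isclose(weekly_from_prices, sum(daily_returns)))
```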


#### Variance

Now that we have a good understanding of rate of return, let’s shift our focus to assessing the risk involved in an investment. One of the key statistics for understanding risk is variance. Variance is a measure of the spread of a dataset, or how far apart each value is from the mean. The greater the variance, the more spread out or variable the data is.


In [6]:
# historical annual stock returns for the Walt Disney Company (DIS) and CBS Corporation (CBS)
returns_disney = [0.22, 0.12, 0.01, 0.05, 0.04]
returns_cbs = [-0.13, -0.15, 0.31, -0.06, -0.29]

# variance_disney = np.var(returns_disney)
# variance_cbs = np.var(returns_cbs)

def calculate_variance(dataset):
    mean = np.mean(dataset)
    numerator = sum([(i-mean)**2 for i in dataset])
    variance = numerator/len(dataset)
    return variance

variance_disney = calculate_variance(returns_disney)
variance_cbs = calculate_variance(returns_cbs)
print('The variance of Disney stock returns is', round(variance_disney, 3))
print('The variance of CBS stock returns is', round(variance_cbs, 3))

The variance of Disney stock returns is 0.006
The variance of CBS stock returns is 0.041
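
The function above divides by n, which is the population variance and matches NumPy's default. When the data is treated as a sample of a longer history, dividing by n − 1 is a common alternative. A short sketch of both, using `np.var` and its `ddof` parameter:

```python
import numpy as np

returns_disney = [0.22, 0.12, 0.01, 0.05, 0.04]

# Population variance: divide by n (NumPy's default, ddof=0)
population_variance = np.var(returns_disney)

# Sample variance: divide by n - 1 (ddof=1)
sample_variance = np.var(returns_disney, ddof=1)

print(round(population_variance, 6))
print(round(sample_variance, 6))
```

With only five observations the two differ noticeably; the gap shrinks as the dataset grows.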


#### Standard Deviation

Although the variance is useful in determining the relative risk of an investment, it is sometimes not the easiest statistic to interpret since it does not have the same unit as the original data. As an alternative, it is common to use the standard deviation to describe the spread of the dataset. The standard deviation is the square root of the variance, so it is expressed in the same unit as the returns themselves.

In [7]:
from math import sqrt

def calculate_stddev(dataset):
    variance = calculate_variance(dataset)
    stddev = sqrt(variance)
    return stddev

stddev_disney = calculate_stddev(returns_disney)
stddev_cbs = calculate_stddev(returns_cbs)
print('The standard deviation of Disney stock returns is', (stddev_disney))
print('The standard deviation of CBS stock returns is', (stddev_cbs))

The standard deviation of Disney stock returns is 0.07520638270785267
The standard deviation of CBS stock returns is 0.2013554071784515
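
Because the standard deviation shares its unit with the returns, it can be shown next to the mean as a percentage. The sketch below does this with `np.std`, which computes the population standard deviation by default and should agree with `calculate_stddev` above.

```python
import numpy as np

returns_disney = [0.22, 0.12, 0.01, 0.05, 0.04]
returns_cbs = [-0.13, -0.15, 0.31, -0.06, -0.29]

# Population standard deviation (ddof=0), same definition as calculate_stddev
stddev_disney = np.std(returns_disney)
stddev_cbs = np.std(returns_cbs)

# Mean and spread are both in "return" units, so they can be compared directly
print(f'Disney: mean {np.mean(returns_disney):.1%}, standard deviation {stddev_disney:.1%}')
print(f'CBS: mean {np.mean(returns_cbs):.1%}, standard deviation {stddev_cbs:.1%}')
```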


#### Correlation 

Another important statistic for assessing risk is the correlation between the returns of two assets. Correlation is a measure of how closely two datasets are associated with each other. It is often represented by the correlation coefficient, which is a value that ranges between -1 and 1. Its sign and magnitude indicate whether there is a positive correlation, a negative correlation, or no correlation:

    Positive correlation – when the rate of return of one asset deviates upward from its mean, the other usually deviates upward as well.

    Negative correlation – when the rate of return of one asset deviates upward from its mean, the other usually deviates downward.

    No correlation – when a change in one asset’s rate of return does not dictate a change in another. The correlation coefficient will be close to 0.


In [8]:
def calculate_correlation(set_x, set_y):
    # Sum of all values in each dataset
    sum_x = sum(set_x)
    sum_y = sum(set_y)

    # Sum of all squared values in each dataset
    sum_x2 = sum([x ** 2 for x in set_x])
    sum_y2 = sum([y ** 2 for y in set_y])

    # Sum of the product of each respective element in each dataset 
    sum_xy = sum([x*y for x,y in zip(set_x, set_y)])

    # Length of dataset
    n = len(set_x)

    # Calculate correlation coefficient
    numerator = (n * sum_xy) - (sum_x * sum_y)
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

    return numerator / denominator


In [9]:
returns_general_motors = [0.018, -0.005, -0.047, -0.009, -0.012, 0.003, -0.027, -0.014, 0.029, -0.062, 0.009]
returns_ford = [0.002, -0.004, -0.027, -0.022, -0.001, 0.002, -0.006, -0.017, 0.035, -0.029, 0.002]
returns_exxon_mobil = [0.008, 0.015, 0.009, 0.012, 0.003, -0.007, 0.006, 0.005, -0.048, 0.025, -0.012]
returns_apple = [-0.002, 0.007, -0.004, -0.004, 0.002, 0.013, -0.011, 0.017, -0.001, 0.012, 0.006]

corr_gm_ford = calculate_correlation(returns_general_motors, returns_ford)
print('The correlation coefficient between General Motors and Ford is', corr_gm_ford)

# Compare General Motors with ExxonMobil and with Apple
print('The correlation coefficient between General Motors and ExxonMobil is ', 
      calculate_correlation(returns_general_motors, returns_exxon_mobil))
print('The correlation coefficient between General Motors and Apple is ', 
      calculate_correlation(returns_general_motors, returns_apple))

# Full correlation matrix; rows and columns are ordered GM, Ford, ExxonMobil, Apple
corrcoef_matrix = np.corrcoef([returns_general_motors, returns_ford, returns_exxon_mobil, returns_apple])
print(corrcoef_matrix)

The correlation coefficient between General Motors and Ford is 0.8414599743167742
The correlation coefficient between General Motors and ExxonMobil is  -0.7032246241393197
The correlation coefficient between General Motors and Apple is  -0.05181389942186936
[[ 1.          0.84145997 -0.70322462 -0.0518139 ]
 [ 0.84145997  1.         -0.87407739 -0.1286648 ]
 [-0.70322462 -0.87407739  1.          0.09955855]
 [-0.0518139  -0.1286648   0.09955855  1.        ]]
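
The sums used in `calculate_correlation` are an algebraic rearrangement of a more readable definition: the covariance of the two datasets divided by the product of their standard deviations. The sketch below computes the coefficient that way for General Motors and Ford and compares it against `np.corrcoef`; the variable names are illustrative.

```python
import numpy as np

returns_general_motors = [0.018, -0.005, -0.047, -0.009, -0.012, 0.003, -0.027, -0.014, 0.029, -0.062, 0.009]
returns_ford = [0.002, -0.004, -0.027, -0.022, -0.001, 0.002, -0.006, -0.017, 0.035, -0.029, 0.002]

x = np.array(returns_general_motors)
y = np.array(returns_ford)

# Covariance: average product of each value's deviation from its mean
covariance = np.mean((x - x.mean()) * (y - y.mean()))

# Dividing by both population standard deviations scales the result into [-1, 1]
correlation = covariance / (np.std(x) * np.std(y))

print(np.isclose(correlation, np.corrcoef(x, y)[0, 1]))
```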


#### Review

Congratulations on reaching the end!

We covered how to calculate and understand the rate of return of an investment:

    Simple Rate of Return – advantageous for aggregating over assets
    Logarithmic Rate of Return – advantageous for aggregating over time
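
To illustrate the first point, the simple return of a portfolio equals the weighted average of its assets' simple returns, with weights given by each asset's share of the starting value. The same is not true of log returns. This is a minimal sketch with a hypothetical two-asset portfolio; all values are made up for illustration.

```python
from math import log, isclose

# Hypothetical two-asset portfolio: (start value, end value)
asset_a = (600.0, 660.0)   # simple return of 10%
asset_b = (400.0, 380.0)   # simple return of -5%

def simple_return(start, end):
    return (end - start) / start

total_start = asset_a[0] + asset_b[0]
total_end = asset_a[1] + asset_b[1]

# Weights are each asset's share of the starting portfolio value
weight_a = asset_a[0] / total_start
weight_b = asset_b[0] / total_start

portfolio_return = simple_return(total_start, total_end)
weighted_simple = weight_a * simple_return(*asset_a) + weight_b * simple_return(*asset_b)
weighted_log = weight_a * log(asset_a[1] / asset_a[0]) + weight_b * log(asset_b[1] / asset_b[0])

# Simple returns aggregate across assets; log returns do not
print(isclose(portfolio_return, weighted_simple))
print(isclose(log(total_end / total_start), weighted_log))
```
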

We also looked at key financial statistics and what they signify in terms of the risk of an investment:

    Variance – measure of the spread of a dataset; an asset with low variance is less risky
    Standard Deviation – square root of the variance; easier to interpret than variance because it has the same unit as the original dataset
    Correlation – measure of the association between datasets; assets with no correlation have returns that show no linear relationship to each other
