# __Portfolio optimization using Genetic Algorithm__

### Background:

**Portfolio optimization** is one of the most widely studied problems in financial mathematics. Since Harry Markowitz introduced Modern Portfolio Theory (MPT), researchers have studied many analytical and numerical methods for building the best investment portfolio from a defined set of assets. A genetic algorithm is one such numerical method: it searches for a near-optimal portfolio rather than solving for it in closed form.
To address this optimization problem, Markowitz (1959) developed a quantitative model, also called the **mean-variance model**. It is usually posed either as minimizing the portfolio variance (risk) for a given level of return, or as maximizing the portfolio return for a given level of risk.


### Problem Statement:
Suppose we have selected N financial assets to invest in, such as stocks, funds, bonds or ETFs. Each asset has a history of returns, where a return is the relative price difference from one period to the next. Periods can be days, weeks, months and so on. We build an investment portfolio by allocating a fraction x of the total capital to each asset. Each fraction is called a weight. The goal of portfolio optimization is to find the weights that maximize the return and minimize the risk of the portfolio simultaneously, subject to some constraints.

#### Given Data:
Monthly Closing Stock values of HDFC, ITC, L&T, M&M, Sun Pharma and TCS from June 2015 to June 2018.

### Approach and Tasks:

1. Read the data and combine them into one dataframe.
2. Calculate the historical returns for 3 months, 6 months, 12 months, 24 months and 36 months for each of the stocks.
3. Define **Gene** (Scalar): A fraction of the total capital assigned to a stock.
4. Define **Chromosome** (1D Array): Set of genes i.e. fractions of total capital assigned to each stock.
        Check: the genes of each chromosome should sum to 1.
5. Generate **Initial Population** (2D Array): A set of randomly generated chromosomes.
6. **Fitness function** (Define a Function): 
The **Sharpe ratio**, S, quantifies the performance (fitness) of the portfolio. It rewards a high mean return and penalizes high risk (variance) at the same time, and is computed as follows:
                
                S = (µ − r)/σ
    
    Here µ is the mean portfolio return over a specified period,
         r is the risk-free rate over the same period and 
         σ is the standard deviation of the portfolio return over the specified period.

      
    Mean portfolio return = sum of (Mean Return * Fractions of Total Capital (Chromosome)).
    Risk-free rate = 0.0697 (as per Google)
    Variance of portfolio return = sum of (chromosome * Standard deviation)**2 + sum over all pairs i ≠ j of Covariance[i][j] * weight[i] * weight[j].
    Standard deviation of portfolio return = square root of the variance of portfolio return.
    
7. Select **Elite Population** (Define a Function): It filters the elite chromosomes, i.e. those with the highest fitness (Sharpe ratio) as calculated by the fitness function.
    
8. **Mutation**: A function that performs mutation on a chromosome. We randomly choose 2 distinct positions between 0 and 5 and swap the elements at those positions.

9. Crossover: **Heuristic crossover** or **Blend Crossover** uses the fitness values of two parent chromosomes to ascertain the direction of the search. It moves from worst parent to best parent. 
The offspring are created according to the equation:
            Off_spring A = Best Parent  + β ∗ ( Best Parent − Worst Parent)
            Off_spring B = Worst Parent - β ∗ ( Best Parent − Worst Parent)
                Where β is a random number between 0 and 1.
This crossover type is good for real-valued genomes.

10. **Next Generation** (Define a Function): A function which applies mutation or crossover (mating) based on a probability and builds a new generation of chromosomes.
    
11. **Iterate the process**: Iterate the whole process until there is no change in maximum returns or for a fixed number of iterations. 

#### References:
1. https://www.researchgate.net/publication/286952225_A_heuristic_crossover_for_portfolio_selection
2. https://pdfs.semanticscholar.org/9888/061ea3326ff9b41c807ed21f0c10463b7879.pdf
3. https://www.math.kth.se/matstat/seminarier/reports/M-exjobb12/121008.pdf


#### Pre-requisite tasks:

In [1]:
import numpy as np
import pandas as pd
from functools import reduce

### Task #1:
#### Read the data and combine them into one dataframe.

In [2]:
files=['hdfc.csv','itc.csv','l&t.csv','m&m.csv','sunpha.csv','tcs.csv']
dfs=[]

for file in files:
    temp=pd.read_csv(file)
    temp.columns=['Date',file.replace('.csv','')]
    dfs.append(temp)

stocks = reduce(lambda left,right: pd.merge(left,right,on='Date'), dfs)
print(stocks.shape)
stocks.head()

(37, 7)


Unnamed: 0,Date,hdfc,itc,l&t,m&m,sunpha,tcs
0,June 2018,2108.05,266.05,1271.3,896.8,560.55,1847.2
1,May 2018,2136.15,271.6,1367.6,923.5,480.15,1744.8
2,Apr 2018,1944.6,281.45,1400.6,872.65,528.15,1765.7
3,Mar 2018,1891.45,255.9,1311.9,740.2,495.4,1424.65
4,Feb 2018,1883.8,265.1,1319.1,728.75,535.35,1519.13


### Task #2:
#### Calculate the historical returns for 3 months, 6 months, 12 months, 24 months and 36 months for each of the stocks.

**Stock Return**:
The total stock return is the appreciation in the price plus any dividends paid, divided by the original price of the stock. The data here contains only closing prices, so the code below computes the price-only return (dividends are ignored).
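
As a quick check of the price-only return, the 3-month return for HDFC can be worked out by hand from the closing prices in the `stocks` table above (June 2018 and March 2018). This is only an illustrative sketch; the variable names are not part of the notebook.

```python
# Closing prices taken from the stocks table shown above.
price_jun_2018 = 2108.05   # most recent close (row 0)
price_mar_2018 = 1891.45   # close three months earlier (row 3)

# Price-only return: relative change in price, dividends ignored.
three_month_return = (price_jun_2018 - price_mar_2018) / price_mar_2018
print(round(three_month_return, 6))
```

The result should agree with the `hdfc` entry of the `3_mon_return` row in the returns table computed in this task.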


In [3]:
def hist_return(months):
    ''' Calculates stock returns over several look-back periods.
        Input: Look-back periods in months, as a list.
        Output: Historical returns in the form of a DataFrame (one row per period). '''
    idx=[]
    df=pd.DataFrame()
    for mon in months:
        # Row 0 holds the latest prices (June 2018); row `mon` holds the prices `mon` months earlier.
        temp=(stocks.iloc[0,1:] - stocks.iloc[mon,1:])/(stocks.iloc[mon,1:])
        idx.append(str(mon)+'_mon_return')
        df=pd.concat([df, temp.to_frame().T], ignore_index=True)
    df.index=idx
    return df    

In [4]:
hist_stock_returns=hist_return([3,6,12,24,36])
hist_stock_returns

Unnamed: 0,hdfc,itc,l&t,m&m,sunpha,tcs
3_mon_return,0.114515,0.0396639,-0.0309475,0.211564,0.13151,0.296599
6_mon_return,0.125163,0.0112125,0.0114165,0.194062,-0.0179573,0.368094
12_mon_return,0.275866,-0.178478,0.129783,0.330899,0.0109107,0.562537
24_mon_return,0.792712,0.0842367,0.274461,0.255179,-0.265911,0.44833
36_mon_return,0.974847,0.266844,0.0696137,0.399938,-0.358785,0.447535


### Task #3:
Define **Gene** (Scalar): A fraction of the total capital assigned to a stock. Let's refer to them as weights.

    A gene can be a fractional value between 0 and 1, such as 0.32 of HDFC or 0.21 of ITC or 0.56 of TCS.

In [5]:
gene = np.random.rand()
gene

0.47865763414205487


### Task #4:
Define **Chromosome** (1D Array): Set of genes i.e. fractions of total capital assigned to each stock. Set of weights.

It is a 1D array of the fractional values of all the stocks such that the sum of the array equals 1. 
As we have 6 company stocks, we generate 6 fractional values (genes), which constitute 1 chromosome.
    
**Why should the sum be equal to 1?** As these are fractions of the total capital, we take the total capital to be 1 unit.
    
**How do we make sure the sum is 1?** Generate 6 random numbers, compute the factor 1 / [sum of random numbers], and multiply each random number by that factor. The resulting values sum to 1.

In [6]:
def chromosome(n):
    ''' Generates set of random numbers whose sum is equal to 1
        Input: Number of stocks.
        Output: Array of random numbers'''
    ch = np.random.rand(n)
    return ch/sum(ch)

In [7]:
child=chromosome(6)
print(child,sum(child))

[0.18847662 0.13161772 0.1005894  0.23775264 0.24674973 0.09481388] 0.9999999999999999
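
The printed sum above is 0.9999999999999999 rather than exactly 1 because of floating-point rounding. A minimal sketch of a tolerance-based check follows; the seeded generator and variable names are assumptions made for repeatability, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded only to make the sketch repeatable
weights = rng.random(6)
weights = weights / weights.sum()   # same normalisation idea as chromosome(n)

# Compare with a tolerance instead of testing weights.sum() == 1 exactly.
sum_is_one = np.isclose(weights.sum(), 1.0)
print(weights.sum(), sum_is_one)
```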


### Task #5:

Generate **Initial Population** (2D Array): A set of randomly generated chromosomes

In [9]:
n=6 # Number of stocks = 6
pop_size=100 # initial population = 100

population = np.array([chromosome(n) for _ in range(pop_size)])
print(population.shape)
print(population)

(100, 6)
[[0.14554317 0.05323605 0.04853585 0.31264573 0.41718254 0.02285666]
 [0.2875455  0.15549612 0.18128256 0.35049057 0.01685638 0.00832886]
 [0.18358573 0.0311423  0.24115086 0.25331826 0.14778547 0.14301738]
 [0.19170941 0.3109625  0.23224438 0.16223576 0.02116156 0.08168639]
 [0.09132046 0.17587005 0.17840903 0.22931073 0.19778628 0.12730344]
 [0.27210556 0.02034836 0.22920557 0.26601167 0.16859344 0.04373541]
 [0.19158213 0.19363595 0.0913293  0.21157745 0.20955971 0.10231545]
 [0.09823584 0.18393095 0.1716758  0.19055137 0.09588278 0.25972326]
 [0.05482614 0.30728054 0.11694934 0.03161965 0.3810014  0.10832293]
 [0.07712078 0.08899992 0.27130371 0.09590809 0.1524314  0.3142361 ]
 [0.26877779 0.2073808  0.00558867 0.08837378 0.28741677 0.14246218]
 [0.23455029 0.21513175 0.23756531 0.02242049 0.06430355 0.2260286 ]
 [0.00477431 0.08938275 0.02992419 0.34187952 0.25474779 0.27929145]
 [0.13604273 0.1303769  0.08181577 0.15101997 0.36454976 0.13619487]
 [0.13738522 0.09126552 0

### Task #6:

**Fitness function** (Define a Function): 
The Sharpe ratio, S, quantifies the performance (fitness) of the portfolio and is computed as follows:
                
                S = (µ − r)/σ
    
    Here µ is the mean portfolio return over a specified period,
         r is the risk-free rate over the same period and 
         σ is the standard deviation of the portfolio return over the specified period.

      
Mean portfolio return = sum of (Mean Return * Fractions of Total Capital (Chromosome)).

Risk-free rate = 0.0697 (as per Google)

Variance of portfolio return = sum of (chromosome * Standard deviation)**2 + sum over all pairs i ≠ j of Covariance[i][j] * weight[i] * weight[j].

Standard deviation of portfolio return = square root of the variance of portfolio return.
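
To make the Sharpe ratio concrete, here is a small worked sketch on a hypothetical two-asset portfolio. The weights, mean returns and covariance matrix are invented for illustration; only the risk-free rate of 0.0697 comes from this notebook.

```python
import numpy as np

# Hypothetical two-asset example; these numbers do not come from the notebook's data.
weights = np.array([0.6, 0.4])
mean_returns = np.array([0.20, 0.10])
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
risk_free = 0.0697                        # same rate as used in this notebook

portfolio_mean = weights @ mean_returns   # mean portfolio return (µ)
portfolio_var = weights @ cov @ weights   # portfolio variance (σ squared)
sharpe = (portfolio_mean - risk_free) / np.sqrt(portfolio_var)
print(portfolio_mean, portfolio_var, sharpe)
```

Note that the Sharpe ratio divides by the standard deviation, so the variance has to be square-rooted before use.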

#### Fitness function Sub Task 1:
Calculate the mean, standard deviation and covariance of the historical stock returns.

In [10]:
# Convert to numeric columns from Object datatypes.
print(hist_stock_returns.info())
cols=hist_stock_returns.columns
hist_stock_returns[cols] = hist_stock_returns[cols].apply(pd.to_numeric, errors='coerce')
print(hist_stock_returns.info())

<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 3_mon_return to 36_mon_return
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   hdfc    5 non-null      object
 1   itc     5 non-null      object
 2   l&t     5 non-null      object
 3   m&m     5 non-null      object
 4   sunpha  5 non-null      object
 5   tcs     5 non-null      object
dtypes: object(6)
memory usage: 280.0+ bytes
None
<class 'pandas.core.frame.DataFrame'>
Index: 5 entries, 3_mon_return to 36_mon_return
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   hdfc    5 non-null      float64
 1   itc     5 non-null      float64
 2   l&t     5 non-null      float64
 3   m&m     5 non-null      float64
 4   sunpha  5 non-null      float64
 5   tcs     5 non-null      float64
dtypes: float64(6)
memory usage: 280.0+ bytes
None


#### Calculate covariance of historical returns

In [11]:
cov_hist_return=hist_stock_returns.cov()

print(cov_hist_return)

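# Zero the diagonal (each stock's own variance): var_portfolio_return adds the variance
# terms separately, using the standard deviations.
for i in range(6):
    cov_hist_return.iloc[i, i]=0  # single indexer avoids chained assignment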
    
cov_hist_return

            hdfc       itc       l&t       m&m    sunpha       tcs
hdfc    0.160272  0.045393  0.027916  0.024127 -0.079078  0.014362
itc     0.045393  0.025467 -0.000718  0.004381 -0.023178 -0.005554
l&t     0.027916 -0.000718  0.014206  0.002510 -0.013841  0.007330
m&m     0.024127  0.004381  0.002510  0.007412 -0.011042  0.005700
sunpha -0.079078 -0.023178 -0.013841 -0.011042  0.041781 -0.007211
tcs     0.014362 -0.005554  0.007330  0.005700 -0.007211  0.009923


Unnamed: 0,hdfc,itc,l&t,m&m,sunpha,tcs
hdfc,0.0,0.045393,0.027916,0.024127,-0.079078,0.014362
itc,0.045393,0.0,-0.000718,0.004381,-0.023178,-0.005554
l&t,0.027916,-0.000718,0.0,0.00251,-0.013841,0.00733
m&m,0.024127,0.004381,0.00251,0.0,-0.011042,0.0057
sunpha,-0.079078,-0.023178,-0.013841,-0.011042,0.0,-0.007211
tcs,0.014362,-0.005554,0.00733,0.0057,-0.007211,0.0


#### Calculate the mean of historical returns

In [12]:
mean_hist_return=hist_stock_returns.mean()
mean_hist_return

hdfc      0.456621
itc       0.044696
l&t       0.090865
m&m       0.278328
sunpha   -0.100047
tcs       0.424619
dtype: float64

#### Calculate Standard deviation of historical returns:

In [13]:
sd_hist_return=hist_stock_returns.std()
sd_hist_return

hdfc      0.400340
itc       0.159583
l&t       0.119189
m&m       0.086091
sunpha    0.204405
tcs       0.099615
dtype: float64

#### Fitness function Sub Task 2:
 Calculate Expected portfolio return and portfolio variance.

#### Calculate Expected returns of portfolio.

In [16]:
def mean_portfolio_return(child):
    return np.sum(np.multiply(child,mean_hist_return))

In [17]:
mean_portfolio_return(population[0])

0.1282335133420478

#### Calculate portfolio variance.

In [18]:
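def var_portfolio_return(child):
    ''' Portfolio variance: weighted variances plus weighted pairwise covariances.
        Relies on cov_hist_return having a zeroed diagonal. '''
    part_1 = np.sum(np.multiply(child,sd_hist_return)**2)
    temp_lst=[]
    for i in range(6):
        for j in range(6):
            temp=cov_hist_return.iloc[i, j] * child[i] * child[j]
            temp_lst.append(temp)
    part_2=np.sum(temp_lst)
    return part_1+part_2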

In [19]:
var_portfolio_return(population[0])

0.000982608701567269
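
The two-part variance formula used above (squared weighted standard deviations plus off-diagonal covariance terms) is the same quantity as the quadratic form of the weights with the full covariance matrix. The sketch below checks this on a hypothetical 3-asset example; `var_loop` and `var_matrix` are illustrative names, not functions from the notebook.

```python
import numpy as np

def var_loop(w, cov):
    """Notebook-style formula: squared weighted SDs plus off-diagonal covariance terms."""
    sd = np.sqrt(np.diag(cov))
    part_1 = np.sum((w * sd) ** 2)
    off_diag = cov - np.diag(np.diag(cov))   # covariance with the diagonal zeroed
    part_2 = sum(off_diag[i, j] * w[i] * w[j]
                 for i in range(len(w)) for j in range(len(w)))
    return part_1 + part_2

def var_matrix(w, cov):
    """Equivalent quadratic form using the full covariance matrix."""
    return w @ cov @ w

# Hypothetical covariance matrix and weights, for illustration only.
cov = np.array([[0.040, 0.006, -0.002],
                [0.006, 0.010, 0.001],
                [-0.002, 0.001, 0.020]])
w = np.array([0.5, 0.3, 0.2])
print(var_loop(w, cov), var_matrix(w, cov))
```

The matrix form avoids the nested Python loop, which may matter when the fitness function is evaluated for every chromosome in every generation.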

#### Risk-free rate.

In [20]:
rf= 0.0697

#### Fitness Function of a portfolio.

In [21]:
def fitness_fuction(child):
    ''' This will return the Sharpe ratio for a particular portfolio.
        Input: A child/chromosome (1D Array)
        Output: Sharpe Ratio value (Scalar)'''
    return (mean_portfolio_return(child)-rf)/np.sqrt(var_portfolio_return(child))

In [22]:
fitness_fuction(population[7])

1.9787880234723128

### Task #7:
Select **Elite Population** (Define a Function): It filters the elite chromosomes, i.e. those with the highest fitness (Sharpe ratio) as calculated by the fitness function.

In [23]:
def Select_elite_population(population, frac=0.3):
    ''' Select elite population from the total population based on fitness function values.
        Input: Population and fraction of population to be considered as elite.
        Output: Elite population.'''
    population = sorted(population,key = lambda x: fitness_fuction(x),reverse=True)
    percentage_elite_idx = int(np.floor(len(population)* frac))
    return population[:percentage_elite_idx]

In [26]:
print(len(Select_elite_population(population, frac=0.3)))
Select_elite_population(population, frac=0.3)

30


[array([0.0300546 , 0.19109883, 0.09594844, 0.07925928, 0.27000814,
        0.33363071]),
 array([0.11913629, 0.13621485, 0.02920441, 0.06233012, 0.31866454,
        0.33444978]),
 array([0.06122813, 0.10357033, 0.05950721, 0.26943875, 0.25735895,
        0.24889662]),
 array([0.0800426 , 0.09409237, 0.09689896, 0.10340996, 0.30375507,
        0.32180104]),
 array([0.00477431, 0.08938275, 0.02992419, 0.34187952, 0.25474779,
        0.27929145]),
 array([0.02969185, 0.33336563, 0.01835348, 0.0914634 , 0.21536053,
        0.3117651 ]),
 array([0.05663205, 0.23783276, 0.04213542, 0.06909813, 0.10807197,
        0.48622967]),
 array([0.09697377, 0.04016734, 0.01207257, 0.40841179, 0.26139648,
        0.18097805]),
 array([0.01303018, 0.25337432, 0.20574804, 0.06360371, 0.25561332,
        0.20863042]),
 array([0.10712039, 0.01972178, 0.13329135, 0.2460833 , 0.27250082,
        0.22128235]),
 array([0.05521395, 0.13231446, 0.21162305, 0.12171155, 0.23815877,
        0.24097822]),
 array([0.

In [27]:
[fitness_fuction(x) for x in population][:3]

[1.867300827082432, 1.1254795377974085, 1.6745622339147634]
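
For comparison, elite selection can also be written with `np.argsort`, which keeps the result as a 2D array instead of a list of arrays. This is a hypothetical variant, not the notebook's function; `select_elite_argsort` and the toy fitness are assumptions for illustration.

```python
import numpy as np

def select_elite_argsort(population, fitness, frac=0.3):
    """Hypothetical variant: score each chromosome, then keep the top fraction."""
    population = np.asarray(population)
    scores = np.array([fitness(ch) for ch in population])
    n_elite = int(np.floor(len(population) * frac))
    best_first = np.argsort(scores)[::-1]    # indices sorted by descending fitness
    return population[best_first[:n_elite]]

# Toy fitness (the weight on the first asset), used only to exercise the function.
toy_population = np.array([[0.2, 0.8], [0.7, 0.3], [0.5, 0.5], [0.9, 0.1]])
elite_demo = select_elite_argsort(toy_population, fitness=lambda ch: ch[0], frac=0.5)
print(elite_demo)
```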

### Task #8:
**Mutation**: A function that performs mutation on a chromosome. 
            
    Randomly choose 2 distinct positions in [0, 5] and swap the elements at those positions.


In [28]:
def mutation(parent):
    ''' Two randomly chosen elements of a chromosome are swapped.
        Input: Parent
        Output: Offspring (1D Array)'''
    child=parent.copy()
    # Draw two distinct positions, so no retry loop is needed.
    n=np.random.choice(range(6),2,replace=False)
    child[n[0]],child[n[1]]=child[n[1]],child[n[0]]
    return child

In [29]:
mutation(population[1]),population[1]

(array([0.2875455 , 0.15549612, 0.18128256, 0.35049057, 0.00832886,
        0.01685638]),
 array([0.2875455 , 0.15549612, 0.18128256, 0.35049057, 0.01685638,
        0.00832886]))
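
Because a swap only reorders the genes, the mutated chromosome still sums to 1 and needs no renormalisation. A minimal sketch of that property follows, using a hypothetical parent and a seeded generator (both assumptions, chosen for repeatability).

```python
import numpy as np

rng = np.random.default_rng(1)          # seeded only for repeatability
parent = np.array([0.30, 0.10, 0.20, 0.15, 0.05, 0.20])

# Pick two distinct positions and swap them (the swap mutation described above).
i, j = rng.choice(len(parent), size=2, replace=False)
child = parent.copy()
child[i], child[j] = child[j], child[i]

print(parent.sum(), child.sum())
```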

### Task #9:
Crossover: **Heuristic crossover** or **Blend Crossover** uses the fitness values of two parent chromosomes to ascertain the direction of the search. It moves from worst parent to best parent. 

The offspring are created according to the equation:
            
            Off_spring A = Best Parent  + β ∗ ( Best Parent − Worst Parent)
            Off_spring B = Worst Parent - β ∗ ( Best Parent − Worst Parent)
                Where β is a random number between 0 and 1.
This crossover type is good for real-valued genomes.
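
A small worked example of the two equations, with hypothetical three-gene parents and a fixed β (all values are assumptions for illustration). Both offspring still sum to 1, but because the step moves beyond the parents, individual genes can fall below 0.

```python
import numpy as np

# Hypothetical parents (each sums to 1) and a fixed beta, for illustration only.
best_parent = np.array([0.5, 0.3, 0.2])
worst_parent = np.array([0.1, 0.3, 0.6])
beta = 0.75

direction = best_parent - worst_parent
offspring_a = best_parent + beta * direction    # steps beyond the best parent
offspring_b = worst_parent - beta * direction   # steps beyond the worst parent

print(offspring_a, offspring_a.sum())
print(offspring_b, offspring_b.sum())
```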

In [37]:
def Heuristic_crossover(parent1,parent2):
    ''' The offspring are created according to the equations:
            Off_spring A = Best Parent  + β ∗ ( Best Parent − Worst Parent)
            Off_spring B = Worst Parent - β ∗ ( Best Parent − Worst Parent)
                Where β is a random number between 0 and 1.
        Input: 2 Parents
        Output: 2 Children (1d Array); genes may fall outside [0, 1].'''
    ff1=fitness_fuction(parent1)
    ff2=fitness_fuction(parent2)
    diff=parent1 - parent2
    beta=np.random.rand()
    if ff1>ff2:
        # parent1 is the best parent, so (Best − Worst) = diff.
        child1=parent1 + beta * diff
        child2=parent2 - beta * diff
    else:
        # parent2 is the best parent, so (Best − Worst) = -diff.
        child2=parent1 + beta * diff
        child1=parent2 - beta * diff
    return child1,child2

In [39]:
# Arithmetic_crossover is defined in the next cell (In [36]), which was run before this one.
for i in population[:30]:
    for j in population[:30]:
        print(Arithmetic_crossover(i,j))

(array([0.14554317, 0.05323605, 0.04853585, 0.31264573, 0.41718254,
       0.02285666]), array([0.14554317, 0.05323605, 0.04853585, 0.31264573, 0.41718254,
       0.02285666]))
(array([0.2732508 , 0.14520209, 0.16791958, 0.34668091, 0.05715532,
       0.0097913 ]), array([0.15983787, 0.06353009, 0.06189883, 0.31645539, 0.3768836 ,
       0.02139421]))
(array([0.16332233, 0.04291056, 0.1385543 , 0.28491909, 0.29128007,
       0.07901366]), array([0.16580657, 0.0414678 , 0.15113241, 0.2810449 , 0.27368794,
       0.08686038]))
(array([0.1608517 , 0.13869701, 0.10945279, 0.26277045, 0.28586373,
       0.04236433]), array([0.17640089, 0.22550155, 0.17132744, 0.21211104, 0.15248037,
       0.06217871]))
(array([0.10450475, 0.14605152, 0.14683029, 0.24957369, 0.25113262,
       0.10190714]), array([0.13235889, 0.08305459, 0.08011459, 0.29238277, 0.3638362 ,
       0.04825295]))
(array([0.18583215, 0.04276682, 0.10604898, 0.29780057, 0.33804843,
       0.02950305]), array([0.23181658, 0.03081

       0.12681969]))
(array([0.09148333, 0.28468458, 0.1824891 , 0.11408835, 0.1362178 ,
       0.19103684]), array([0.14488928, 0.28101811, 0.14976204, 0.09858455, 0.09543741,
       0.23030862]))
(array([0.03877933, 0.27333584, 0.22358071, 0.137806  , 0.19521598,
       0.13128215]), array([0.18499974, 0.13981247, 0.20653741, 0.16480739, 0.23829055,
       0.06555244]))
(array([0.08481571, 0.28002579, 0.08261057, 0.21656473, 0.1157254 ,
       0.2202578 ]), array([0.05482478, 0.28449147, 0.14989165, 0.17797886, 0.15066032,
       0.18215293]))
(array([0.13826581, 0.26469663, 0.08155545, 0.15111358, 0.14269726,
       0.22167127]), array([0.05706279, 0.28189407, 0.18138231, 0.13962668, 0.17546542,
       0.16456872]))
(array([0.02394257, 0.28791124, 0.22565264, 0.13485813, 0.1879567 ,
       0.13967872]), array([0.28251825, 0.14380606, 0.22620746, 0.16396358, 0.00531784,
       0.17818681]))
(array([0.14066515, 0.04698936, 0.1925467 , 0.35374558, 0.25947263,
       0.00658057]), array

In [36]:
def Arithmetic_crossover(parent1,parent2):
    ''' The offspring are created according to the equations:
            Off spring A = α ∗ Parent1 + (1 −α) ∗ Parent2
            Off spring B = (1 −α) ∗ Parent1 + α ∗ Parent2
            
                Where α is a random number between 0 and 1.
        Input: 2 Parents
        Output: 2 Children (1d Array)'''
    alpha = np.random.rand()
    child1 = alpha * parent1 + (1-alpha) * parent2
    child2 = (1-alpha) * parent1 + alpha * parent2
    return child1,child2

In [32]:
Arithmetic_crossover(population[2],population[3])

(array([0.13139653, 0.11284077, 0.17259212, 0.24228214, 0.2591194 ,
        0.08176905]),
 array([0.13211776, 0.11807491, 0.17223741, 0.21086049, 0.25753075,
        0.10917868]))
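
Arithmetic crossover forms a convex combination of the parents, so if both parents have non-negative genes that sum to 1, so do both children. The sketch below checks this over many random values of α; the parents and the seeded generator are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)          # seeded only for repeatability
parent1 = np.array([0.5, 0.3, 0.2])
parent2 = np.array([0.1, 0.3, 0.6])

all_valid = True
for _ in range(1000):
    alpha = rng.random()
    child1 = alpha * parent1 + (1 - alpha) * parent2
    child2 = (1 - alpha) * parent1 + alpha * parent2
    for child in (child1, child2):
        # A valid chromosome has no negative gene and sums to 1.
        if child.min() < 0 or not np.isclose(child.sum(), 1.0):
            all_valid = False

print(all_valid)
```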

### Task#10:
**Next Generation**: A function which applies mutation or crossover (mating) based on a probability and builds a new generation of chromosomes.

In [27]:
def next_generation(pop_size,elite,crossover=Heuristic_crossover):
    ''' Generates a new population from the elite population. Each step applies mutation with
        probability 0.6 or crossover with probability 0.4; once the new population is more than
        two-thirds full, the mutation probability is decreased to 0.1.
        Input: Population Size and elite population.
        Output: Next generation population (list of 1D Arrays).'''
    new_population=[]
    elite_range=range(len(elite))
    while len(new_population) < pop_size:
        if len(new_population) > 2*pop_size/3: # In the final stages mutation frequency is decreased.
            mutate_or_crossover = np.random.choice([0, 1], p=[0.9, 0.1])
        else:
            mutate_or_crossover = np.random.choice([0, 1], p=[0.4, 0.6])
        if mutate_or_crossover: # 1 = mutation, 0 = crossover
            indx=np.random.choice(elite_range)
            new_population.append(mutation(elite[indx]))
        else:
            p1_idx,p2_idx=np.random.choice(elite_range,2)
            c1,c2=crossover(elite[p1_idx],elite[p2_idx])
            # Count negative genes in the first child (chk is an integer, so compare it directly).
            chk=0
            for gene in range(6):
                if c1[gene]<0:
                    chk+=1
            if chk>0:
                # Retry the crossover once. The retry is not re-checked, so negative genes can remain.
                p1_idx,p2_idx=np.random.choice(elite_range,2)
                c1,c2=crossover(elite[p1_idx],elite[p2_idx])
            new_population.extend([c1,c2])
    return new_population

In [37]:
elite=Select_elite_population(population)
next_generation(100,elite)[:3]

[array([ 0.09437892, -0.01370747,  0.22206274,  0.25272078,  0.07117298,
         0.37337204]),
 array([0.073464  , 0.29611467, 0.18044526, 0.04717225, 0.15122987,
        0.25157395]),
 array([0.02673085, 0.10270371, 0.18426066, 0.26773308, 0.14581252,
        0.27275918])]

In [38]:
elite=Select_elite_population(population)
next_generation(100,elite,Arithmetic_crossover)[:3]

[array([0.08867454, 0.04535518, 0.23082464, 0.20133852, 0.0940633 ,
        0.33974382]),
 array([0.08696143, 0.03944916, 0.2521492 , 0.19025681, 0.10191857,
        0.32926483]),
 array([0.07197337, 0.26818012, 0.15645521, 0.11225894, 0.16966041,
        0.22147195])]

### Task #11:
**Iterate the process**: Iterate the whole process until there is no change in maximum returns/min risk or for a fixed number of iterations. 
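
One way to express a "no change" stopping rule is to track the best fitness per generation and stop once it has not improved for a number of generations. The helper below is a hypothetical sketch of that idea; `has_converged`, `patience` and `tol` are assumed names and are not used by the loops in this notebook.

```python
def has_converged(best_history, patience=5, tol=1e-6):
    """Hypothetical stopping rule: True when the best fitness has not improved
    by more than `tol` over the last `patience` generations."""
    if len(best_history) <= patience:
        return False                      # not enough history to judge yet
    recent_best = max(best_history[-patience:])
    earlier_best = max(best_history[:-patience])
    return recent_best - earlier_best <= tol

flat_run = has_converged([1.0, 1.5, 1.8, 1.8, 1.8, 1.8, 1.8, 1.8], patience=5)
improving_run = has_converged([1.0, 1.5, 1.8, 1.9, 2.0, 2.1], patience=5)
print(flat_run, improving_run)
```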

#### With Heuristic_crossover:

In [528]:
n=6 # Number of stocks = 6
pop_size=100 # initial population = 100

# Initial population
population = np.array([chromosome(n) for _ in range(pop_size)])

# Get initial elite population
elite = Select_elite_population(population)

iteration=0 
Expected_returns=0
Expected_risk=1

while (Expected_returns < 0.30 and Expected_risk > 0.0005) or iteration <= 40:
    print('Iteration:',iteration)
    population = next_generation(100,elite)
    elite = Select_elite_population(population)
    Expected_returns=mean_portfolio_return(elite[0])
    Expected_risk=var_portfolio_return(elite[0])
    print('Expected returns of {} with risk of {}\n'.format(Expected_returns,Expected_risk))
    iteration+=1


print('Portfolio of stocks after all the iterations:\n')
[print(hist_stock_returns.columns[i],':',elite[0][i]) for i in list(range(6))]

Iteration: 0
Expected returns of 0.19791260054505444 with risk of 0.0005599724194694641

Iteration: 1
Expected returns of 0.14775559319668513 with risk of 0.00011502210065133991

Iteration: 2
Expected returns of 0.18044974977915618 with risk of 0.0001792561337124951

Iteration: 3
Expected returns of 0.2469869391434918 with risk of 0.0004702512459861894

Iteration: 4
Expected returns of 0.2484463536957734 with risk of 0.0004769430337559274

Iteration: 5
Expected returns of 0.22970948049369214 with risk of 0.000449272216911158

Iteration: 6
Expected returns of 0.21407127171197168 with risk of 0.000335974853370535

Iteration: 7
Expected returns of 0.1586974101345943 with risk of 5.695056486200965e-05

Iteration: 8
Expected returns of 0.23851853089246866 with risk of 0.000497856800164051

Iteration: 9
Expected returns of 0.22633872214618578 with risk of 0.0003618425140521006

Iteration: 10
Expected returns of 0.2260552603649895 with risk of 0.00034635825024487144

Iteration: 11
Expected re

[None, None, None, None, None, None]

### Weights and their respective returns:

In [531]:
print('Portfolio of stocks after all the iterations:\n')
[print(hist_stock_returns.columns[i],':',elite[0][i]) for i in list(range(6))]

print('\nExpected returns of {} with risk of {}\n'.format(Expected_returns,Expected_risk))

Portfolio of stocks after all the iterations:

hdfc : -0.12831416862824624
itc : 0.5004863932832381
l&t : 0.005806138626803911
m&m : -0.2799114848675405
sunpha : 0.07911835804614265
tcs : 0.822814763539606

Expected returns of 0.22960077055291656 with risk of 0.00035861319099093555



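Although the heuristic method reaches a high fitness value, it produces negative weights (hdfc and m&m above), which is not expected because each weight is meant to be a fraction of the total capital between 0 and 1.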
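
One possible way to handle such chromosomes is a repair step that clips negative weights to zero and rescales the rest to sum to 1. This is a hypothetical sketch, not something the notebook does; `repair_weights` is an assumed name, and a repaired chromosome is a different portfolio whose fitness would need to be re-evaluated.

```python
import numpy as np

def repair_weights(weights):
    """Hypothetical repair step: clip negative weights to zero, then rescale to sum to 1."""
    clipped = np.clip(weights, 0.0, None)
    total = clipped.sum()
    if total == 0:
        raise ValueError("all weights are non-positive; cannot rescale")
    return clipped / total

# Final heuristic-crossover weights printed above (hdfc, itc, l&t, m&m, sunpha, tcs).
heuristic_weights = np.array([-0.12831416862824624, 0.5004863932832381,
                              0.005806138626803911, -0.2799114848675405,
                              0.07911835804614265, 0.822814763539606])
repaired = repair_weights(heuristic_weights)
print(repaired, repaired.sum())
```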

In [537]:
fitness_fuction(elite[5])

8.383262199916318

## BEST Approach using Arithmetic_crossover:

In [40]:
n=6 # Number of stocks = 6
pop_size=100 # initial population = 100

# Initial population
population = np.array([chromosome(n) for _ in range(pop_size)])

# Get initial elite population
elite = Select_elite_population(population)

iteration=0 
Expected_returns=0
Expected_risk=1

while (Expected_returns < 0.30 and Expected_risk > 0.0005) or iteration <= 40:
    print('Iteration:',iteration)
    population = next_generation(100,elite,Arithmetic_crossover)
    elite = Select_elite_population(population)
    Expected_returns=mean_portfolio_return(elite[0])
    Expected_risk=var_portfolio_return(elite[0])
    print('Expected returns of {} with risk of {}\n'.format(Expected_returns,Expected_risk))
    iteration+=1


print('Portfolio of stocks after all the iterations:\n')
[print(hist_stock_returns.columns[i],':',elite[0][i]) for i in list(range(6))]

Iteration: 0
Expected returns of 0.1347100807182474 with risk of 0.0003289719237192859

Iteration: 1
Expected returns of 0.1385422319534627 with risk of 0.0002846355089227608

Iteration: 2
Expected returns of 0.14961316235397137 with risk of 0.00032095904516944804

Iteration: 3
Expected returns of 0.14910269061417047 with risk of 0.00030940879721631526

Iteration: 4
Expected returns of 0.14832159577473447 with risk of 0.0003042478278929327

Iteration: 5
Expected returns of 0.14956658063244427 with risk of 0.0003139437692639777

Iteration: 6
Expected returns of 0.14575560446725683 with risk of 0.0002726046486622828

Iteration: 7
Expected returns of 0.1476922703328571 with risk of 0.0002848789044769184

Iteration: 8
Expected returns of 0.14656215432044567 with risk of 0.000274220685405638

Iteration: 9
Expected returns of 0.14424628731711825 with risk of 0.00025692507378476115

Iteration: 10
Expected returns of 0.14388902685297023 with risk of 0.00025453437522572446

Iteration: 11
Expect

[None, None, None, None, None, None]

In [41]:
print('Portfolio of stocks after all the iterations:\n')
[print(hist_stock_returns.columns[i],':',elite[0][i]) for i in list(range(6))]

print('\nExpected returns of {} with risk of {}\n'.format(Expected_returns,Expected_risk))

Portfolio of stocks after all the iterations:

hdfc : 0.03338912374928946
itc : 0.25892793802773834
l&t : 0.08571133727686643
m&m : 0.06183100580231876
sunpha : 0.27661979411361814
tcs : 0.28352080103016897

Expected returns of 0.14453015567745017 with risk of 0.0002592181124772639

