# Bayes

In this file I estimate the probability that a stock's next trading day will be positive, given the sequence of positive or negative days it is currently in.
The computation uses Bayes' formula in its conditional-probability form: $P(A|B)=\frac{P(A\cap B)}{P(B)}$.
Here $A$ is the event that the next day is positive and $B$ is the event that the current sequence has been observed, so $P(A\cap B)$ is the probability that this sequence occurs and is followed by a positive day.
To estimate these probabilities, I used the stock's historical data and counted how often each case occurred.
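
Reading the formula against the `cal_prob` function further down, the two probabilities appear to be estimated from counts of past sequences. As a sketch, with $n_k$ (notation introduced here, not used in the notebook) denoting the number of completed historical sequences of length exactly $k$ and $c$ the length of the current sequence:

$$P(\text{sequence continues one more day} \mid \text{sequence reached length } c) \approx \frac{\sum_{k>c} n_k}{\sum_{k\ge c} n_k}$$

For a positive sequence this ratio is the estimate of a positive next day; for a negative sequence the estimate is one minus the ratio, since a negative sequence that continues means another negative day.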


In [1]:
import pandas as pd
import yfinance as yf
import datetime

In [2]:

def adjust(stock,t,start='2021-01-01'):
    """
    A function that retrieves the closing prices of a stock over a specified period
    and returns their day-over-day percentage changes.

    Parameters:
        stock (str): The stock name or ticker symbol.
        t (str or int): Whether to keep the most recent row. 'N' drops it (ignores the current price);
                  any other value, such as 'Y' or the 0 used by the functions below, keeps it.
        start (str or datetime.date): The start date of the period. Format: 'YYYY-MM-DD'. Default is '2021-01-01'.

    Returns:
        pandas.Series: The daily percentage changes of the closing price, rounded to two decimals.
    """
    prices=pd.DataFrame(yf.download(stock,start=start)).reset_index()
    if t=='N': # Ignore the current price
        prices=prices[:-1]
    changes=prices['Close'].pct_change()[1:]*100 # change to percent; the first row has no previous day, so it is dropped
    return changes.round(2)
adjust('ba',t='N').head()

[*********************100%***********************]  1 of 1 completed


1    4.40
2   -0.28
3    0.80
4   -1.32
5   -1.48
Name: Close, dtype: float64
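
To make the percentage-change step concrete without downloading anything, here is a minimal plain-Python sketch of the same arithmetic. The helper name `percent_changes` and the prices are hypothetical and exist only for this illustration; they are not part of the notebook or of real market data.

```python
def percent_changes(closes):
    """Return the day-over-day percentage changes of a list of closing prices."""
    # Pair each price with the one after it; the first price has no
    # predecessor, so the result is one element shorter than the input.
    return [round((today / yesterday - 1) * 100, 2)
            for yesterday, today in zip(closes, closes[1:])]


# Hypothetical closing prices, for illustration only.
closes = [100.0, 104.0, 102.0, 102.0]
changes = percent_changes(closes)
print(changes)
```

A day with an unchanged close produces a change of exactly 0, which the functions below count as a positive day.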

In [3]:
def dict_of_days(neg,name_of_stock,t=0):
    """
    A function that counts how often sequences of consecutive positive (or negative) days of each length
    occurred in a stock's history. The function takes a stock name as input and returns a DataFrame
    along with the length of the last sequence. The DataFrame has one row per sequence length and
    the number of completed sequences of that length.

    Parameters:
        neg (bool): If True, computes negative sequences. If False, computes positive sequences.
        name_of_stock (str): The stock name or ticker symbol.
        t (int or str): Passed through to adjust(); 'N' drops the most recent row, any other value keeps it. Default is 0.

    Returns:
        pandas.DataFrame: Columns 'index of day' (sequence length) and 'numbers of occurs' (how many completed sequences had that length).
        int: The length of the last sequence, which is still open at the end of the data.
    """
    stock=adjust(name_of_stock,t=t)

    d_dict={}
    count=0
    for i in stock:  # save each appearance in the dict
        in_seq = i<0 if neg else i>=0  # check if the current day extends the sequence we are counting
        if in_seq:
            count+=1
        else:
            d_dict[count]=d_dict.get(count,0)+1  # a sequence of length `count` just ended
            count=0  # restart the count

    data_frame_d=pd.DataFrame({'index of day':list(d_dict.keys()),'numbers of occurs':list(d_dict.values())})
    data_frame_d=data_frame_d.sort_values(by='index of day').reset_index(drop=True)
    # drop the zero-length entries explicitly instead of assuming they are in the first row
    data_frame_d=data_frame_d[data_frame_d['index of day']>0].reset_index(drop=True)

    return (data_frame_d,count)
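

The counting logic above can be tried on a short hard-coded list, which makes it easier to check by hand. This is a minimal sketch, not the notebook's function: `count_streaks` and the sample changes are hypothetical, it returns a plain dict rather than a DataFrame, and it skips zero-length sequences while counting instead of removing them afterwards.

```python
def count_streaks(changes, neg=False):
    """Count completed streaks by length; also return the length of the open streak."""
    counts = {}
    current = 0
    for change in changes:
        # A day belongs to the streak if its sign matches the kind we are counting.
        in_streak = change < 0 if neg else change >= 0
        if in_streak:
            current += 1
        else:
            if current > 0:  # ignore "streaks" of length zero
                counts[current] = counts.get(current, 0) + 1
            current = 0
    # `current` is the streak still in progress at the end of the data.
    return counts, current


# Hypothetical daily percentage changes, for illustration only.
changes = [1.2, 0.5, -0.3, 2.0, -1.1, -0.4, 0.8, 0.1, 0.9]
pos_counts, pos_open = count_streaks(changes, neg=False)
neg_counts, neg_open = count_streaks(changes, neg=True)
print(pos_counts, pos_open)
print(neg_counts, neg_open)
```

The sample ends with three positive days in a row, so the open positive streak has length 3 and is not yet included in the counts, which mirrors how the second return value of `dict_of_days` is used later.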


In [4]:

def cal_prob(name_of_stock,t,neg=False):
    """
    A function that computes the probability that the next day will be positive, given the sequence of days
    the stock is currently in. The function takes a stock name as input and, based on the length of the last
    sequence, uses Bayes' rule on the historical sequence counts to estimate the probability of a positive day.

    Parameters:
        name_of_stock (str): The stock name or ticker symbol.
        t (int or str): Passed through to adjust(); 'N' drops the most recent row, any other value keeps it.
        neg (bool): If True, computes negative sequences. If False, computes positive sequences. Default is False.

    Returns:
        float: The probability of the next day being positive (0.5 if there is no matching history).
        int: The length of the last sequence.
        int: The number of past sequences at least as long as the last sequence (the denominator of the estimate).
    """
    sum_d,c=dict_of_days(neg,name_of_stock,t) # get the sequence counts and the length of the current sequence
    longer=sum_d[sum_d['index of day']>c]['numbers of occurs'].sum()  # past sequences that lasted longer than the current one
    same=sum_d[sum_d['index of day']==c]['numbers of occurs'].sum()  # past sequences that ended at exactly the current length
    N=longer+same  # past sequences that reached at least the current length
    if N==0:  # if we don't have data we take 0.5
        return (0.5,c,N)
    continue_prob=longer/N  # probability that the sequence continues for one more day
    prob=1-continue_prob if neg else continue_prob  # a negative sequence that continues means a negative day
    return (round(float(prob),3),c,N)
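

As a worked example of the ratio computed above, the following sketch applies it to a small table of made-up counts. The function `next_day_positive_prob` and the numbers are hypothetical; the sketch takes a plain dict of `{sequence length: occurrences}` in place of the DataFrame.

```python
def next_day_positive_prob(counts, current_len, neg=False):
    """Estimate P(next day positive) from streak counts; fall back to 0.5 without data."""
    # Streaks that went on past the current length.
    longer = sum(n for length, n in counts.items() if length > current_len)
    # Streaks that reached at least the current length.
    at_least = sum(n for length, n in counts.items() if length >= current_len)
    if at_least == 0:
        return 0.5, at_least
    p_continue = longer / at_least
    # A continuing negative streak means another negative day.
    prob = 1 - p_continue if neg else p_continue
    return round(prob, 3), at_least


# Hypothetical counts: 10 streaks of length 1, 6 of length 2, and so on.
counts = {1: 10, 2: 6, 3: 3, 4: 1}
prob_pos, n_pos = next_day_positive_prob(counts, current_len=2, neg=False)
prob_neg, n_neg = next_day_positive_prob(counts, current_len=2, neg=True)
print(prob_pos, n_pos)
print(prob_neg, n_neg)
```

With a current length of 2, ten past streaks reached at least two days and four of them went on to a third, so the chance of continuing is 4/10. Read as positive streaks this gives 0.4 for a positive next day; read as negative streaks it gives 0.6.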


In [5]:

def calculate_Bayes(name_of_stock,t=0):
    """
    A function that takes the name of a stock as input and returns the following information:
    the last sequence of the stock's performance (whether it was positive or negative),
    the length of that sequence, and the probability that the next day will be positive.

    Parameters:
        name_of_stock (str): The stock name or ticker symbol.
        t (int or str): Passed through to adjust(); 'N' drops the most recent row, any other value keeps it. Default is 0.

    Returns:
        list: A list containing the stock name, the last sequence (P for positive or N for negative),
              the length of the sequence, the probability of the next day being positive,
              and the number of past sequences at least that long.
    """
    today=datetime.date.today() # get the current date
    date_of_start=today-datetime.timedelta(days=30)
    stock=adjust(name_of_stock,start=date_of_start,t=t)
    # Check whether the current sequence is positive or negative, then compute the probability of a positive day for that sequence.
    last_change=list(stock)[-1]
    neg=not (last_change>=0)  # a change of exactly 0 counts as a positive day, as in dict_of_days
    seq='N' if neg else 'P'  # keep the label consistent with the sequence type that is actually used
    r=cal_prob(name_of_stock,t,neg=neg)

    return [name_of_stock.upper(),seq,r[1],r[0],r[2]]


In [8]:

def matrix_of_Bayes(t):
    """
    A function that loops over a list of stocks and returns a DataFrame containing the following information for each stock:
    the last sequence of the stock's performance (whether it was positive or negative), the length of that sequence,
    and the probability that the next day will be positive.

    Parameters:
        t (int or str): Passed through to adjust(); 'N' drops the most recent row, any other value keeps it.

    Returns:
        pandas.DataFrame: A DataFrame containing the information for each stock.
    """
    # note: 'UAL' is listed twice, so it is downloaded twice and appears twice in the result
    names_of_stocks = ['MSFT', 'AAPL', 'NVDA', 'AMZN', 'SPOT', 'DELL', 'ARKQ', 'BUG', 'WCLD', 'HACK', 'SBUX', 'RL', 'ANF',
       'M', 'UBER', 'LYFT', 'BA', 'UAL', 'IMAX', 'MCS', 'CNK','UAL','AMD']
    #names_of_stocks=['MSFT', 'AAPL', 'NVDA', 'AMZN', 'TSLA', 'SPOT','DELL', 'ARKQ', 'BUG', 'WCLD']

    columns = ['Name', 'Seq', 'Length of seq', 'Prob', 'N']
    rows = [calculate_Bayes(name,t=t) for name in names_of_stocks] # run over the different stocks, one row per stock
    dicy_bayes = pd.DataFrame(rows,columns=columns) # create the data frame from the collected rows
    dicy_bayes=dicy_bayes.sort_values(by='Prob',ascending=False).reset_index(drop=True) # sort by the prob for a positive day
    return dicy_bayes
matrix_of_Bayes(t=0)


[*********************100%***********************]  1 of 1 completed

   Name Seq  Length of seq   Prob    N
0    RL   P              4   0.75   20
1    BA   N              2  0.556   81
2  SBUX   N              1  0.554  166
3  DELL   N              1  0.539  165
4  HACK   N              1  0.528  161
5  UBER   N              1  0.506  162
6  AMZN   N              2  0.506   83
7   BUG   N              1  0.497  163
8   UAL   N              2  0.493   75
9   UAL   N              2  0.493   75
