# An empirical analysis of the CAPM on the SSE 180 Index

## Introduction to the SSE 180 Index

The SSE 180 Index (also known as the SSE Component Index) is the Shanghai Stock Exchange's adjusted and renamed successor to the original SSE 30 Index. Its sample stocks are the most representative of all A-share stocks, and the index has been officially published since July 1, 2002. As the core of the SSE index series, the SSE 180 Index is designed to be a benchmark that reflects the general condition and operation of the Shanghai securities market, that is operable and investable, and that can serve both as an investment evaluation criterion and as a basis for financial derivatives.

## Overview of the CAPM theoretical framework

The CAPM was built on earlier work on diversification and modern portfolio theory, which was later improved by Sharpe, Markowitz and Merton Miller. Developed in the 1960s, the CAPM was a true revolution in financial theory, and William F. Sharpe received the Nobel Prize in Economic Sciences in 1990 for his contribution. The model prices capital assets with a linear equilibrium model of investment returns that explains returns above the risk-free rate by their covariance with the overall market (Džaja & Aljinović, 2013). In market equilibrium, the CAPM expresses the relationship between an asset's expected return and its systematic risk, beta, as a linear function (Sharpe, Lintner, Mossin, 1966). It predicts that the expected return on an asset equals the risk-free rate plus a risk premium that is linearly related to the systematic risk measured by the asset's beta (Basu and Chawla, 2010). Because of its simplicity and rationality, the CAPM remains very popular in many situations.

In the development of the asset pricing model, the following assumptions are made (Džaja & Aljinović, 2013; Basu & Chawla, 2008; Black, Jensen & Scholes, 1972):

1) Investors are risk averse and seek to maximize their expected rate of return over a period.

2) Investors prefer portfolios with higher returns.

3) Investors can lend and borrow freely at the risk-free interest rate.

4) Assets are infinitely divisible.

5) Trading involves no transaction costs or taxes.

## Methodology 

In empirical analysis, the model is expressed as:

$$R_p = R_f + \beta_p (R_m - R_f) + \varepsilon_p \tag{1}$$

where $R_p$ is the expected return of an individual stock or portfolio in the real estate industry, $R_f$ is the risk-free interest rate, $R_m$ is the expected return of the market portfolio, $\beta_p$ measures the systematic risk of the individual stock or portfolio, and $\varepsilon_p$ is the stochastic error term. In this paper, the CAPM is applied to the Shanghai property stock market to test whether the model holds.
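
As a small numeric illustration of equation (1), the sketch below evaluates the systematic part of the CAPM (the error term is ignored) for a few values of beta. The function name `capm_expected_return` and the input rates are assumptions chosen for illustration only; they are not figures from this study.

```python
def capm_expected_return(risk_free, beta, market_return):
    """Expected return implied by equation (1), ignoring the error term."""
    return risk_free + beta * (market_return - risk_free)

# Hypothetical inputs: 3% risk-free rate and 8% expected market return.
for beta in (0.5, 1.0, 1.5):
    expected = capm_expected_return(0.03, beta, 0.08)
    print(f"beta = {beta:.1f} -> expected return = {expected:.4f}")
```

A beta of one reproduces the market return, and a beta of zero reproduces the risk-free rate, which is a quick sanity check on any implementation of the formula.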

This paper uses time-series regressions based on the CAPM to estimate individual and portfolio betas, which helps in understanding the risk of the Shanghai real estate market; the cross-sectional regression method developed by Black, Jensen and Scholes (1972) is then used to verify the relationship between risk and expected return.

The detailed steps are as follows. First, split the whole time range, from May 1st, 2010 to May 1st, 2019, into three periods. Second, use the first-period data to calculate the beta of each individual stock. Third, rank the stocks by beta from lowest to highest and group them into portfolios. Fourth, use the second-period data to estimate the portfolio betas. Fifth, run a cross-sectional regression on the third-period data together with these portfolio betas to determine the relationship between expected return and risk (Tao Zhao, 2001; Wang and Geng, 2012; Zhu, 2010).
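
The five steps can be sketched end to end on synthetic data. This is only an illustration under stated assumptions: the returns are simulated rather than taken from the SSE 180 sample, the three periods are of equal length, portfolios are equal-weighted, and the names `beta_of`, `groups`, `port_beta`, `gamma_0` and `gamma_1` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks, n_portfolios = 900, 30, 5

# Synthetic excess returns: each stock has a "true" beta plus idiosyncratic noise.
true_beta = rng.uniform(0.5, 1.5, n_stocks)
market = rng.normal(0.0005, 0.01, n_days)
stocks = market[:, None] * true_beta + rng.normal(0, 0.01, (n_days, n_stocks))

def beta_of(y, x):
    """Slope of an OLS regression of y on x (with intercept)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

# Step 1: split the sample into three equal periods (row positions).
p1, p2, p3 = np.array_split(np.arange(n_days), 3)

# Step 2: individual betas from the first period.
betas_1 = np.array([beta_of(stocks[p1, i], market[p1]) for i in range(n_stocks)])

# Step 3: rank stocks by beta and group them into equally sized portfolios.
groups = np.array_split(np.argsort(betas_1), n_portfolios)

# Step 4: portfolio betas from the second period (equal-weighted returns).
port_ret = np.column_stack([stocks[:, g].mean(axis=1) for g in groups])
port_beta = np.array([beta_of(port_ret[p2, k], market[p2]) for k in range(n_portfolios)])

# Step 5: cross-sectional regression of third-period mean returns on portfolio betas.
mean_ret_3 = port_ret[p3].mean(axis=0)
gamma_1, gamma_0 = np.polyfit(port_beta, mean_ret_3, 1)  # slope first, then intercept

print("portfolio betas:", np.round(port_beta, 3))
print("gamma_0 (intercept):", gamma_0)
print("gamma_1 (slope):", gamma_1)
```

Estimating betas in one period and testing them in a later one keeps the sorting variable separate from the data used in the cross-sectional test, which is the motivation for using three periods.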

## Data 
The data are collected from BaoStock, a free, open-source securities data platform that provides a large volume of accurate and complete historical and financial data on listed companies. The data are retrieved through Python to meet the requirements of the econometric analysis.
This paper selects 30 stocks from the SSE 180 Index stock pool and uses their daily returns on trading days from December 2013 to December 30, 2019, a span of almost six years. The 30 sample stocks broadly cover the national economy, including steel, real estate, finance, culture, medicine, energy, communications, transportation, software and agriculture. In addition, the stocks generally traded on more than 1,400 days during the observation period. Overall, the sample reflects the operation of China's securities market and is in line with the assumptions of the CAPM.

Here, the SSE 180 is taken as the market portfolio index and its return as the return of the market portfolio. Chinese researchers normally use either split-adjusted share prices (Wang & Geng, 2012) or reverse split-adjusted share prices (Liao & Wang, 2003; Zhao, 2013). In this paper, all data are reverse split-adjusted share prices, which accounts for the influence of dividends, share allotments and stock splits on the data (Zhao, 2013).
There are two ways to calculate the return on each stock. One is the logarithmic method (Sakouvogui and Nganje, 2019), under which the processed data are approximately normally distributed, which helps to improve the stability of the time series. The other is to subtract the initial closing price from the closing price at the end of the period and divide the difference by the initial closing price (Zhao, 2013; Wang & Geng, 2012).

Here, to ensure the stability of the time series, the logarithmic method is used to calculate the daily stock returns:

$$R_{it} = \ln\left(\frac{P_{it}}{P_{i(t-1)}}\right) \tag{2}$$

where $P_{it}$ is the closing price of stock $i$ at time $t$ and $P_{i(t-1)}$ is the closing price at time $t-1$.
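
The two return definitions can be compared directly in pandas. The sketch below uses the first three closing prices of `sh.600009` that appear in the data preview of this notebook; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Closing prices of sh.600009 on the first three trading days of the sample.
prices = pd.Series(
    [12.37, 13.09, 13.17],
    index=pd.to_datetime(["2011-01-04", "2011-01-05", "2011-01-06"]),
)

log_ret = np.log(prices / prices.shift(1))  # equation (2)
simple_ret = prices.pct_change()            # (P_t - P_{t-1}) / P_{t-1}

# The two definitions are linked by log_ret = log(1 + simple_ret).
print(pd.DataFrame({"log": log_ret, "simple": simple_ret}))
```

For small daily moves the two series are close, but they are not identical, so it matters which one is fed into the regressions.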

## Estimation for individual stocks
Rewriting equation (1) as equation (3) gives the single-index model used for estimation:

$$R_{it} - R_f = \alpha_i + \beta_i (R_{mt} - R_f) + \varepsilon_{it} \tag{3}$$

where $R_{it}$ is the daily return of stock $i$ at time $t$, $R_{mt}$ is the return of the market portfolio, and $R_f$ is the return of the risk-free asset. Here $R_f$ equals 0.009027% (the Bank of China one-year deposit rate divided by 360), since Chinese researchers generally treat the deposit rate as the risk-free rate by default (Liao and Shen, 2004).
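
A minimal sketch of equation (3) follows, assuming an annual deposit rate of 3.25%. That figure is inferred here from 0.009027% × 360 and is not stated in the text. The stock and market returns are simulated, and plain NumPy least squares is used instead of a statistics package to keep the example self-contained.

```python
import numpy as np

# Daily risk-free rate: an assumed 3.25% annual deposit rate spread over 360 days.
annual_rate = 0.0325  # inferred from 0.009027% * 360, not stated in the text
r_f = annual_rate / 360
print(f"daily r_f = {r_f:.6%}")

# Synthetic daily returns for one hypothetical stock with beta = 1.2 and alpha = 0.
rng = np.random.default_rng(1)
r_m = rng.normal(0.0005, 0.012, 500)
r_i = r_f + 1.2 * (r_m - r_f) + rng.normal(0, 0.01, 500)

# Equation (3): regress the stock's excess return on the market's excess return.
# The column of ones is what allows alpha_i to be estimated.
X = np.column_stack([np.ones_like(r_m), r_m - r_f])
alpha_i, beta_i = np.linalg.lstsq(X, r_i - r_f, rcond=None)[0]
print(f"alpha_i = {alpha_i:.5f}, beta_i = {beta_i:.3f}")
```

Under the CAPM the estimated intercept should be statistically indistinguishable from zero, so keeping the constant in the regression is what makes that check possible.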


In [1]:
import pandas as pd
df = pd.read_csv('final .csv')
print(df.shape)     # (number of rows, number of columns)
print(df.columns)   # column names
print(df.head(3))   # first three rows
print(df.tail(3))   # last three rows, to check for missing values

(2189, 33)
Index(['date', 'sse180', 'rate', 'sh.600009', 'sh.600010', 'sh.600295',
       'sh.600019', 'sh.600028', 'sh.600031', 'sh.600036', 'sh.600048',
       'sh.600050', 'sh.600066', 'sh.600068', 'sh.600085', 'sh.600111',
       'sh.601628', 'sh.600150', 'sh.600196', 'sh.600295.1', 'sh.600406',
       'sh.601933', 'sh.600536', 'sh.600547', 'sh.600598', 'sh.600600',
       'sh.600690', 'sh.600875', 'sh.600887', 'sh.600895', 'sh.601006',
       'sh.601877', 'sh.601899'],
      dtype='object')
         date    sse180    rate  sh.600009  sh.600010  sh.600295  sh.600019  \
0  2011-01-04  6651.797  0.0001      12.37       3.91      19.79       6.55   
1  2011-01-05  6625.325  0.0001      13.09       3.84      20.06       6.45   
2  2011-01-06  6589.212  0.0001      13.17       3.84      20.78       6.46   

   sh.600028  sh.600031  sh.600036  ...  sh.600547  sh.600598  sh.600600  \
0       8.20      21.63      13.02  ...      53.00      13.45      35.03   
1       8.11      22.00      1

#### Split the data into three periods and save as csv
#### First Period from 2013-10-08 to 2015-10-08

In [2]:
# Keep rows dated after 2013-10-06 and up to 2015-10-08
# (ISO-formatted date strings compare correctly as text)
mask = (df['date'] > '2013-10-06') & (df['date'] <= '2015-10-08')
df = df.loc[mask]
df.to_csv('final_first period')
df

Unnamed: 0,date,sse180,rate,sh.600009,sh.600010,sh.600295,sh.600019,sh.600028,sh.600031,sh.600036,...,sh.600547,sh.600598,sh.600600,sh.600690,sh.600875,sh.600887,sh.600895,sh.601006,sh.601877,sh.601899
664,2013-10-08,5275.141,0.0001,14.98,4.08,6.99,4.24,4.47,7.64,10.91,...,22.25,12.23,42.65,13.81,11.96,44.90,8.55,7.14,22.62,2.60
665,2013-10-09,5298.426,0.0001,15.15,4.05,6.98,4.23,4.49,7.61,10.96,...,22.14,13.45,42.32,14.31,11.89,46.31,8.55,7.06,22.75,2.59
666,2013-10-10,5228.953,0.0001,14.67,4.03,6.80,4.20,4.44,7.44,10.78,...,21.79,12.94,41.90,15.74,11.73,46.76,8.48,7.00,22.88,2.57
667,2013-10-11,5327.928,0.0001,15.25,4.08,6.99,4.25,4.47,7.58,10.94,...,21.93,12.75,43.46,15.82,12.07,46.36,9.00,7.09,23.15,2.59
668,2013-10-14,5326.885,0.0001,15.87,4.10,7.18,4.26,4.45,7.74,10.80,...,21.72,13.34,46.87,15.95,12.29,45.43,8.82,7.30,25.47,2.59
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1149,2015-09-25,7125.868,0.0001,28.17,3.64,7.11,5.56,4.87,6.44,17.72,...,17.32,13.53,31.46,9.43,12.64,15.40,20.42,9.09,30.62,3.51
1150,2015-09-28,7119.292,0.0001,28.23,3.66,7.21,5.60,4.83,6.47,17.51,...,17.32,13.71,31.97,9.37,12.74,15.42,20.28,9.07,30.62,3.47
1151,2015-09-29,6976.287,0.0001,27.68,3.53,7.04,5.47,4.72,6.32,17.44,...,17.32,13.74,31.94,9.03,12.33,15.27,19.29,8.82,30.62,3.37
1152,2015-09-30,7030.858,0.0001,27.72,3.55,7.25,5.59,4.74,6.36,17.77,...,17.32,14.32,32.14,9.06,12.50,15.38,19.02,8.83,30.62,3.38


#### Second Period from 2015-10-08 to 2017-10-08

In [3]:
import pandas as pd
df = pd.read_csv('final .csv')
# Keep rows dated after 2015-10-06 and up to 2017-10-09
mask = (df['date'] > '2015-10-06') & (df['date'] <= '2017-10-09')
df = df.loc[mask]
df.to_csv('final_second period')
df

Unnamed: 0,date,sse180,rate,sh.600009,sh.600010,sh.600295,sh.600019,sh.600028,sh.600031,sh.600036,...,sh.600547,sh.600598,sh.600600,sh.600690,sh.600875,sh.600887,sh.600895,sh.601006,sh.601877,sh.601899
1153,2015-10-08,7216.037,0.0001,28.65,3.64,7.50,5.77,4.89,6.58,17.73,...,18.03,15.75,32.42,9.34,13.27,15.86,19.85,9.02,30.62,3.48
1154,2015-10-09,7314.600,0.0001,28.57,3.70,7.55,5.80,5.03,6.70,17.96,...,19.39,16.19,32.44,9.37,13.54,15.86,19.97,9.08,30.62,3.55
1155,2015-10-12,7531.195,0.0001,28.57,3.86,7.81,5.98,5.15,7.00,18.19,...,20.02,16.81,33.56,9.72,14.23,16.16,20.80,9.35,30.62,3.74
1156,2015-10-13,7522.679,0.0001,29.38,3.87,7.91,5.94,5.09,6.93,18.06,...,19.45,17.58,33.60,9.72,14.80,16.29,21.15,9.28,30.62,3.70
1157,2015-10-14,7449.195,0.0001,28.68,4.20,7.74,5.88,5.07,6.76,17.85,...,19.48,16.79,33.51,9.56,14.50,15.99,20.91,9.15,30.02,3.67
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1638,2017-09-26,8293.150,0.0001,38.43,2.70,14.74,7.27,5.96,7.35,25.71,...,31.74,11.75,30.51,14.52,10.40,25.80,16.82,8.91,21.56,3.93
1639,2017-09-27,8271.729,0.0001,38.15,2.72,15.15,7.46,5.92,7.55,25.40,...,31.38,11.75,30.56,14.80,10.42,25.44,16.75,8.84,21.92,3.91
1640,2017-09-28,8269.078,0.0001,38.33,2.70,14.73,7.34,5.90,7.48,25.20,...,31.19,11.96,30.59,14.91,10.42,26.40,16.76,8.85,21.50,3.87
1641,2017-09-29,8297.697,0.0001,37.98,2.70,14.70,7.39,5.90,7.65,25.55,...,31.43,11.93,30.81,15.09,10.43,27.50,16.77,8.75,21.39,3.87


#### Third Period from 2017-10-08 to 2019-10-08

In [4]:
import pandas as pd
df = pd.read_csv('final .csv')
# Keep rows dated after 2017-10-06 and up to 2019-10-08
mask = (df['date'] > '2017-10-06') & (df['date'] <= '2019-10-08')
df = df.loc[mask]
df.to_csv('final_third period')
df

Unnamed: 0,date,sse180,rate,sh.600009,sh.600010,sh.600295,sh.600019,sh.600028,sh.600031,sh.600036,...,sh.600547,sh.600598,sh.600600,sh.600690,sh.600875,sh.600887,sh.600895,sh.601006,sh.601877,sh.601899
1642,2017-10-09,8366.398,0.0001,38.56,2.72,14.81,7.46,5.87,7.77,26.08,...,31.56,12.34,31.10,15.35,10.51,27.79,16.81,8.87,22.18,3.90
1643,2017-10-10,8369.725,0.0001,37.82,2.69,14.63,7.34,5.85,7.79,26.12,...,31.49,12.67,31.19,14.76,10.56,27.59,16.85,9.02,22.63,3.90
1644,2017-10-11,8391.506,0.0001,38.63,2.67,14.40,7.28,5.86,7.76,26.04,...,31.20,12.55,31.44,14.80,10.49,28.57,16.95,9.02,22.21,3.87
1645,2017-10-12,8415.264,0.0001,39.99,2.67,14.14,7.37,5.82,7.81,26.37,...,31.30,12.53,31.29,15.62,10.35,28.49,17.09,9.06,23.24,3.87
1646,2017-10-13,8414.460,0.0001,40.04,2.66,14.14,7.53,5.81,7.90,26.20,...,31.18,12.89,32.85,15.93,10.45,27.60,17.08,8.97,23.29,3.90
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2124,2019-09-25,8527.464,0.0001,80.68,1.48,7.89,6.01,5.01,14.40,34.35,...,37.32,9.69,49.13,15.15,9.30,27.84,16.03,7.67,22.14,3.41
2125,2019-09-26,8491.745,0.0001,80.35,1.47,7.78,6.03,5.00,14.39,35.22,...,34.18,9.70,48.86,15.25,9.09,28.05,15.51,7.66,22.06,3.31
2126,2019-09-27,8495.943,0.0001,79.93,1.46,7.77,5.92,4.99,14.41,35.20,...,34.05,9.74,48.75,15.37,9.16,28.29,15.53,7.61,22.01,3.29
2127,2019-09-30,8409.486,0.0001,79.78,1.46,7.92,5.91,5.02,14.28,34.75,...,33.89,9.69,48.50,15.30,9.21,28.52,15.50,7.59,21.83,3.27
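
The three cells above repeat the same read-and-mask pattern. One alternative is sketched below, assuming half-open date windows so that a boundary day falls in exactly one period. The helper name `split_periods` and the small stand-in frame are hypothetical and not part of the original notebook; the index values in the frame are copied from the rows displayed above.

```python
import pandas as pd

def split_periods(df, boundaries):
    """Return one DataFrame per [start, end) window between consecutive boundaries."""
    dates = pd.to_datetime(df["date"])
    bounds = pd.to_datetime(boundaries)
    return [df[(dates >= lo) & (dates < hi)] for lo, hi in zip(bounds[:-1], bounds[1:])]

# Tiny stand-in frame; in the notebook this would be the contents of 'final .csv'.
demo = pd.DataFrame({
    "date": ["2013-10-08", "2015-09-30", "2015-10-08",
             "2017-09-29", "2017-10-09", "2019-09-30"],
    "sse180": [5275.141, 7030.858, 7216.037, 8297.697, 8366.398, 8409.486],
})

first, second, third = split_periods(
    demo, ["2013-10-08", "2015-10-08", "2017-10-08", "2019-10-09"]
)
print(len(first), len(second), len(third))
```

Parsing the dates once and slicing from a single frame also avoids re-reading the CSV for every period.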


#### Calculate the percentage change and take its logarithm to prepare the OLS for individual stocks

In [5]:
import pandas as pd
# Use the date column as a DatetimeIndex
df = pd.read_csv('final .csv', index_col="date", parse_dates=True)

In [6]:
df.pct_change()

date,sse180,rate,sh.600009,sh.600010,sh.600295,sh.600019,sh.600028,sh.600031,sh.600036,sh.600048,...,sh.600547,sh.600598,sh.600600,sh.600690,sh.600875,sh.600887,sh.600895,sh.601006,sh.601877,sh.601899
2011-01-04,,,,,,,,,,,...,,,,,,,,,,
2011-01-05,-0.003980,0.0,0.058205,-0.017903,0.013643,-0.015267,-0.010976,0.017106,-0.003840,0.003582,...,-0.044151,0.002974,-0.018270,-0.005000,0.025707,-0.006130,-0.004449,-0.010101,0.016543,-0.028743
2011-01-06,-0.005451,0.0,0.006112,0.000000,0.035892,0.001550,-0.003699,-0.021818,-0.005397,0.026410,...,-0.012041,0.005930,0.002326,-0.005743,-0.030911,-0.052172,0.003352,0.001276,-0.014646,-0.014797
2011-01-07,0.004825,0.0,-0.013667,0.005208,-0.039461,0.015480,0.012376,-0.012546,0.027132,0.000695,...,-0.021179,-0.014738,0.017116,-0.000722,-0.030747,-0.004338,0.012249,0.006369,-0.007845,-0.012516
2011-01-10,-0.017165,0.0,-0.030023,-0.025907,-0.017535,-0.006098,0.002445,-0.012235,-0.017358,-0.029882,...,0.004287,0.004488,-0.030519,-0.028540,-0.002965,-0.025327,-0.023102,0.005063,-0.006242,-0.013942
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-12-25,-0.001614,0.0,-0.002956,-0.007634,0.001159,-0.010638,-0.001992,0.038920,-0.005788,-0.007138,...,0.007253,-0.007269,-0.009145,-0.014941,-0.008686,-0.008598,-0.002717,0.003708,-0.001543,0.004854
2019-12-26,0.009237,0.0,0.003481,0.000000,0.015046,0.008961,0.001996,0.001813,0.008203,0.039869,...,0.000000,0.002092,0.010634,0.040795,0.016429,0.004670,0.023161,0.001232,0.013138,0.012077
2019-12-27,0.000658,0.0,-0.007451,0.015385,0.002281,0.008881,0.005976,0.004222,-0.006299,-0.002514,...,0.011584,-0.001044,0.001588,-0.022613,-0.015086,0.017264,-0.000666,-0.003690,-0.010297,0.064439
2019-12-30,0.015634,0.0,0.007766,0.000000,0.015927,0.012324,0.005941,0.007808,-0.000792,0.025205,...,-0.005881,0.012539,0.012686,-0.007198,0.003282,0.006201,0.020653,0.014815,0.036994,0.006726


In [7]:
# data = df.pct_change()
# data.to_csv('sse.csv') 

In [8]:
import pandas as pd
df = pd.read_csv('sse1.csv')
# Convert returns to excess returns by subtracting the daily risk-free rate.
# Note: sh.600019 is not in this list, so it stays a raw return
# (this looks like an oversight in the original cell).
excess_cols = ['sse180', 'sh.600009', 'sh.600010', 'sh.600295', 'sh.600028',
               'sh.600031', 'sh.600036', 'sh.600048', 'sh.600050', 'sh.600066',
               'sh.600068', 'sh.600085', 'sh.600111', 'sh.601628', 'sh.600150',
               'sh.600196', 'sh.600295.1', 'sh.600406', 'sh.601933', 'sh.600536',
               'sh.600547', 'sh.600598', 'sh.600600', 'sh.600690', 'sh.600875',
               'sh.600887', 'sh.600895', 'sh.601006', 'sh.601877', 'sh.601899']
df[excess_cols] = df[excess_cols].sub(df['rate'], axis=0)
df

Unnamed: 0.1,Unnamed: 0,date,sse180,rate,sh.600009,sh.600010,sh.600295,sh.600019,sh.600028,sh.600031,...,sh.600547,sh.600598,sh.600600,sh.600690,sh.600875,sh.600887,sh.600895,sh.601006,sh.601877,sh.601899
0,1,2011-01-04,,,,,,,,,...,,,,,,,,,,
1,2,2011-01-05,-0.004070,0.00009,0.058115,-0.017993,0.013553,-0.015267,-0.011066,0.017016,...,-0.044241,0.002884,-0.018360,-0.005090,0.025617,-0.006221,-0.004540,-0.010191,0.016452,-0.028833
2,3,2011-01-06,-0.005541,0.00009,0.006021,-0.000090,0.035802,0.001550,-0.003789,-0.021908,...,-0.012131,0.005840,0.002236,-0.005833,-0.031001,-0.052262,0.003262,0.001185,-0.014736,-0.014887
3,4,2011-01-07,0.004735,0.00009,-0.013758,0.005118,-0.039551,0.015480,0.012286,-0.012637,...,-0.021269,-0.014829,0.017026,-0.000812,-0.030837,-0.004429,0.012159,0.006279,-0.007935,-0.012606
4,5,2011-01-10,-0.017255,0.00009,-0.030113,-0.025997,-0.017625,-0.006098,0.002355,-0.012326,...,0.004196,0.004397,-0.030609,-0.028631,-0.003055,-0.025417,-0.023193,0.004973,-0.006332,-0.014032
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2184,2185,2019-12-25,-0.001704,0.00009,-0.003047,-0.007724,0.001068,-0.010638,-0.002082,0.038830,...,0.007163,-0.007359,-0.009235,-0.015031,-0.008776,-0.008688,-0.002808,0.003618,-0.001633,0.004764
2185,2186,2019-12-26,0.009147,0.00009,0.003390,-0.000090,0.014956,0.008961,0.001906,0.001722,...,-0.000090,0.002002,0.010544,0.040705,0.016339,0.004580,0.023070,0.001141,0.013047,0.011987
2186,2187,2019-12-27,0.000567,0.00009,-0.007541,0.015294,0.002190,0.008881,0.005886,0.004132,...,0.011494,-0.001134,0.001498,-0.022703,-0.015176,0.017174,-0.000756,-0.003780,-0.010388,0.064349
2187,2188,2019-12-30,0.015544,0.00009,0.007676,-0.000090,0.015837,0.012324,0.005850,0.007718,...,-0.005971,0.012449,0.012596,-0.007288,0.003192,0.006111,0.020563,0.014725,0.036904,0.006636
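
In the output above, the `sh.600019` column still holds the unadjusted return (for example -0.015267 on 2011-01-05), because that ticker is not among the columns from which `rate` is subtracted. A minimal sketch of deriving the column list programmatically, so that no ticker can be skipped, is given below. The stand-in frame holds a few values copied from the tables above, and the names `return_cols` and `excess` are illustrative.

```python
import pandas as pd

# Small stand-in for the return table, using values shown in the outputs above.
returns = pd.DataFrame({
    "date": ["2011-01-05", "2011-01-06"],
    "sse180": [-0.003980, -0.005451],
    "rate": [0.00009, 0.00009],
    "sh.600009": [0.058205, 0.006112],
    "sh.600019": [-0.015267, 0.001550],
})

# Every column except the date and the risk-free rate is treated as a return series.
return_cols = returns.columns.difference(["date", "rate"], sort=False)

excess = returns.copy()
excess[return_cols] = returns[return_cols].sub(returns["rate"], axis=0)
print(excess)
```

Working on a copy keeps the raw returns available for comparison with the adjusted ones.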


In [9]:
data1 = df
data1.to_csv('sse4.csv') 

In [10]:
from regression import Regression
reg = Regression()
print(reg)

You created an instance of Stats
<regression.Regression object at 0x0000025AA4D114C0>


In [11]:
import pandas as pd  
import copy
from stats import *    
  
class Regression:  
    def __init__(self):  
        self.stats = stats()  
          
    def OLS(self, reg_name, data, y_name, beta_names,
            min_val=0, max_val=None, constant=True):
        self.min_val = min_val
        # Default to the full sample when no upper bound is given
        self.max_val = max_val if max_val is not None else len(data)
        self.reg_name = reg_name  
        self.y_name = y_name  
        self.beta_names = copy.copy(beta_names)
        self.data = data.copy()  
        if constant:  
            self.add_constant()  
  
    def add_constant(self):  
        self.data["Constant"] = 1  
        self.beta_names.append("Constant")  

In [12]:
import pandas as pd
from regression import Regression
reg = Regression()
print(reg)
reg

You created an instance of Stats
<regression.Regression object at 0x0000025AABBC9BB0>


<regression.Regression at 0x25aabbc9bb0>

In [26]:
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("sse3.csv", index_col="DATE")
X = df['sse180']   # market series used as the single regressor
ylabels = ['sh.600009', 'sh.600010', 'sh.600295', 'sh.600019', 'sh.600028',
           'sh.600031', 'sh.600036', 'sh.600048', 'sh.600050', 'sh.600066', 'sh.600068',
           'sh.600085', 'sh.600111', 'sh.601628', 'sh.600150', 'sh.600196', 'sh.600295.1', 'sh.600406',
           'sh.601933', 'sh.600536', 'sh.600547', 'sh.600598', 'sh.600600', 'sh.600690', 'sh.600875',
           'sh.600887', 'sh.600895', 'sh.601006', 'sh.601877', 'sh.601899']
# sm.OLS does not add an intercept automatically, so each fit below is a
# regression through the origin (alpha_i is fixed at zero); this is why the
# summaries report uncentered R-squared.
for y in ylabels:
    Y = df[y]
    est = sm.OLS(Y, X).fit()
    print(est.summary())
    print(est.params)

                                 OLS Regression Results                                
Dep. Variable:              sh.600009   R-squared (uncentered):                   0.474
Model:                            OLS   Adj. R-squared (uncentered):              0.473
Method:                 Least Squares   F-statistic:                              440.0
Date:                Sat, 21 Nov 2020   Prob (F-statistic):                    3.79e-70
Time:                        17:22:07   Log-Likelihood:                          1205.5
No. Observations:                 490   AIC:                                     -2409.
Df Residuals:                     489   BIC:                                     -2405.
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
sse180    1.132
dtype: float64
                                 OLS Regression Results                                
Dep. Variable:              sh.601933   R-squared (uncentered):                   0.211
Model:                            OLS   Adj. R-squared (uncentered):              0.209
Method:                 Least Squares   F-statistic:                              130.8
Date:                Sat, 21 Nov 2020   Prob (F-statistic):                    5.21e-27
Time:                        17:22:07   Log-Likelihood:                          957.94
No. Observations:                 490   AIC:                                     -1914.
Df Residuals:                     489   BIC:                                     -1910.
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                