# IES FSS CUNI
# JEM092 Asset Pricing - Homework 1
### Summer Semester 2023/2024

### General instructions:
* Complete the homework in R.
* Homework should be uploaded into **"SIS->Study group roster->Asset Pricing lecture"** as a zip file containing:
    + Jupyter notebook and the html version of it 
    
    or
    
    + .pdf file and commented R-script
    + For **Task 2** and **Task 3**, make sure that you comment on all important findings. Be concise (no lengthy essays, please), but include everything important, as we cannot second-guess your work. 
     
    
    
* Name of the uploaded file should be:
    + "HW_1_Student1surname_Student2surname.zip"
* **Only one member of each group should submit the solution.**
    
    
* Inter-group discussion in solving the assignment is encouraged. However, each group is supposed to write their own answers in their own words. 
* Note that the evaluation covers not only numerical results (40% of points) but also verbal comments (40% of points) and appearance (20% of points).
* Submitting no code, or only code, will result in a **50% penalty**. 

Homework is subject to late penalties:
* 20% for the first day
* 50% for the second day and later.

#### Deadline: 16.4.2024 23:59

## Homework 1 
One of the biggest problems you will face while working on your master's thesis (and later in your job) is data collection and preparation. Here you will get used to using *for-loops* to download a fairly large amount of data. Some of the data will be easy to get (e.g. prices from finance.yahoo.com); to get the rest (e.g. market capitalization), you will need to do a bit of web scraping.

In the zip file with the homework assignment, you will find a .csv file containing the tickers of the S&P 500 index constituents and a zip file containing the selected stocks for each student. Since you should form groups of 2 students, you can use the student number of either member of the group. The stocks were randomly assigned using the following code, where "group_number" is the student number and serves as the seed for the random number generator. 
* Please note that if you run the code on your PC, you might get slightly different random numbers due to the different hardware-software configurations of your PC.

In [3]:
# 250 random stocks
tickers<- as.character(read.csv("symbols_sp500_long_history.csv")[,2]) # load S&P 500 firms
group_number<-as.numeric(read.csv("students_2023_2024.csv")[,1]) # load groups numbers 

firms_groups<-matrix(ncol=250,nrow=length(group_number)) # initialize empty matrix for storing firms
for(i in 1:length(group_number)){
    set.seed(group_number[i]) # seed for random number generation
    firms_groups[i,]<-sample(tickers,250, replace = FALSE) # generate 250 random firms
    }
row.names(firms_groups)<-group_number # rename rows to names of student groups

for(i in 1:length(group_number)){
    write.csv(firms_groups[i,],paste0(group_number[i],"_data_download.csv")) # save data as csv
              }

# 20 random stocks from downloaded data
rand_stock_download<-matrix(ncol=20,nrow=length(group_number))
for(i in 1:length(group_number)){
    set.seed(2*group_number[i])
    rand_stock_download[i,]<-sample(firms_groups[i,],20, replace = FALSE)
    }
row.names(rand_stock_download)<-group_number

for(i in 1:length(group_number)){
    write.csv(rand_stock_download[i,],paste0(group_number[i],"_rand_download.csv")) # save data as csv
              }

# 20 random stocks from seminar data
library(readr)
tickers_seminar <- read_csv("sap100_tickers.csv",col_names = FALSE) # load S&P 100 firms
rand_stock_seminar<-matrix(ncol=20,nrow=length(group_number)) 
for(i in 1:length(group_number)){
  set.seed(3*group_number[i])
  rand_stock_seminar[i,]<-sample(tickers_seminar$X1,20, replace = FALSE)
}
row.names(rand_stock_seminar)<-group_number

for(i in 1:length(group_number)){
    write.csv(rand_stock_seminar[i,],paste0(group_number[i],"_rand_seminar.csv")) # save data as csv
              }

# zip it
# install.packages("zip")
library(zip)

Zip_Files <- list.files(pattern = "download.csv|seminar.csv", full.names=TRUE)  # list all generated csv files 
zip::zipr(zipfile = "data_HW1.zip", files = Zip_Files) # Zip the files and place the zipped file in working directory

Rows: 96 Columns: 1
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): X1

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
"package 'zip' was built under R version 4.3.3"

Attaching package: 'zip'


The following objects are masked from 'package:utils':

    unzip, zip




In [4]:
file.remove(Zip_Files) #remove csv files

### Task 1 - Data download (8 pts)
In the previous step, you were assigned **250** firms for which you will get the data, i.e. firms in "**data_HW1.zip->student_number_data_download.csv**". You will download 
* daily adjusted close price and volume data from www.finance.yahoo.com
* Market Capitalization and Book Value per Share data from www.macrotrends.net
* Sample period: 01.01.2010 - 28.02.2024 

Getting the closing price and volume data is trivial (review the seminar). Downloading MarketCap and Book Value per Share will require writing a couple more lines of code, but no worries, it is nothing complicated. The easy way is to use the _httr_ and _rvest_ libraries. A very nice example of how to use _httr_ and _rvest_ can be found, for instance, here: https://github.com/keithmcnulty/scraping

#### Book Value per Share:
* A great source of financial data is www.macrotrends.net 
    + macrotrends does not like web scraping, so you need to be a bit creative 
       - https://stackoverflow.com/questions/77142471/how-to-download-data-from-macrotrends-web-site-with-r
    + Hint: 
        - when downloading data, change the "user_agent" in the GET function
        - in a for-loop, it is wise to pause between individual downloads (30-60 seconds should do the trick)
* For example, Apple price ratios can be found here: https://www.macrotrends.net/stocks/charts/AAPL/apple/price-book
    + to get the data for other companies, change the ticker (AAPL) and the name of the company (apple)
    + since you have a list of tickers, the only catch is getting the names of the companies $\rightarrow$ try opening https://www.macrotrends.net/stocks/charts/AAPL in your browser and observe what happens to the URL you entered.
    + to automate the search for company names, use *tickers*, a *for-loop*, and the function *GET* to obtain the URL that contains the company name. Once you have the URLs, replace "AAPL/apple" with the other "ticker/name_of_company"
        - Hint: observe what happens to the URL if you put the ticker twice instead of the company name, e.g. https://www.macrotrends.net/stocks/charts/AAPL/AAPL/price-book
    + use the libraries _httr_ and _rvest_ and the functions *read_html* and *html_table* to retrieve the table. 

Hint:

In [None]:
library(httr)
library(rvest)
temp<-GET("https://www.macrotrends.net/stocks/charts/AAPL",user_agent(" "))
str(temp)

List of 10
 $ url        : chr "https://www.macrotrends.net/stocks/charts/AAPL/apple/"
 $ status_code: int 404
 $ headers    :List of 11
  ..$ date             : chr "Wed, 20 Mar 2024 13:43:48 GMT"
  ..$ content-type     : chr "text/html; charset=UTF-8"
  ..$ transfer-encoding: chr "chunked"
  ..$ connection       : chr "keep-alive"
  ..$ vary             : chr "Accept-Encoding"
  ..$ cache-control    : chr "max-age=14400"
  ..$ cf-cache-status  : chr "HIT"
  ..$ age              : chr "179"
  ..$ server           : chr "cloudflare"
  ..$ cf-ray           : chr "8676221fb848b348-PRG"
  ..$ content-encoding : chr "gzip"
  ..- attr(*, "class")= chr [1:2] "insensitive" "list"
 $ all_headers:List of 2
  ..$ :List of 3
  .. ..$ status : int 301
  .. ..$ version: chr "HTTP/1.1"
  .. ..$ headers:List of 11
  .. .. ..$ date             : chr "Wed, 20 Mar 2024 13:43:48 GMT"
  .. .. ..$ content-type     : chr "text/html; charset=UTF-8"
  .. .. ..$ transfer-encoding: chr "chunked"
  .. .. ..$ con
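The redirected address in `temp$url` already contains the company name, so building the price-to-book link is a matter of string handling. The following is only a minimal sketch of that step: the helper `build_page_url` and the variable names are hypothetical, and the URL is typed in by hand rather than fetched, so the snippet runs without network access.

```r
# Redirected URL, copied from the `url` field of the output above.
redirected_url <- "https://www.macrotrends.net/stocks/charts/AAPL/apple/"

# Hypothetical helper: drop trailing slashes and append the page name.
build_page_url <- function(base_url, page = "price-book") {
  paste0(sub("/+$", "", base_url), "/", page)
}

# The last two path segments are the ticker and the company name.
parts  <- strsplit(sub("/+$", "", redirected_url), "/", fixed = TRUE)[[1]]
ticker <- parts[length(parts) - 1]
slug   <- parts[length(parts)]

price_book_url <- build_page_url(redirected_url)
print(c(ticker = ticker, slug = slug, url = price_book_url))
```

In a real loop, `redirected_url` would come from the `url` field of each `GET` response, with a pause between requests as suggested in the hints.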

#### Market Cap:
* Getting the market capitalization is quite similar; the only difference is that on www.macrotrends.net the data are stored in a graph, not in a table
* Apple's market cap values (for other companies, just change the ticker) can be found here: https://www.macrotrends.net/assets/php/market_cap.php?t=AAPL 
    + you can download the page using a combination of the *GET* and *read_html* functions 
    + using *html_node* and *html_children* you will get the body of the web page
    + use *html_text* to extract the data from the second body node $\rightarrow$ your data are stored between square brackets, i.e. "[" and "]" $\rightarrow$ split the text and store the dates and values of market capitalization
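The bracket-splitting step can be tried offline before touching the website. The sketch below uses base R only and runs on a made-up string. The variable name `chartData`, the field names `date` and `v1`, and the numbers are all assumptions for illustration, so check them against the text you actually extract.

```r
# Hypothetical page text; the real layout and field names may differ.
page_text <- 'var chartData = [{"date":"2010-01-04","v1":193.75},{"date":"2010-01-05","v1":194.09}];'

# Keep only what sits between the first "[" and the last "]".
start <- as.integer(regexpr("[", page_text, fixed = TRUE))
end   <- max(gregexpr("]", page_text, fixed = TRUE)[[1]])
inner <- substr(page_text, start + 1, end - 1)

# One "{...}" record per observation, then pull out date and value.
records <- regmatches(inner, gregexpr("\\{[^}]*\\}", inner))[[1]]
dates   <- as.Date(sub('.*"date":"([0-9-]+)".*', "\\1", records))
values  <- as.numeric(sub('.*"v1":([0-9.]+).*', "\\1", records))

market_cap <- data.frame(date = dates, market_cap = values)
print(market_cap)
```

If the extracted text turns out to be valid JSON, a JSON parser would be a sturdier choice than regular expressions.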

### Task 2 - Markowitz portfolio (4 pts)
Using data for 20 stocks, illustrate portfolio performance by constructing an efficient frontier. For the analysis, use either
* 20 random stocks from the data you have downloaded, i.e. stocks from the "student_number_rand_download.csv" file
* 20 random stocks from the seminar, i.e. stocks from the "student_number_rand_seminar.csv" file
  + choose this option if you were not able to download the data in Task 1   

In this task, you will form 2 portfolios
* portfolio A will consist of stocks 1 to 10
* portfolio B will consist of stocks 11 to 20

Using the adjusted close prices, construct (and plot in a single figure) the global minimum variance portfolio and the efficient frontier of both portfolio A and portfolio B. Since we will be working with monthly returns later in the course, use monthly returns from the period 2015-2024 as the input for the task. **Comment on the important features of the figure, e.g. which portfolio would you choose? Why? Which stock(s) drive the shape of the frontier? etc.**
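To see what the daily-to-monthly conversion does, it can help to reproduce it on a tiny example. This is a base-R sketch on made-up data (calendar days and a synthetic price series, both assumptions) that keeps the last observation of each month. It is not a substitute for the xts-based function named in the hints.

```r
# Made-up daily series spanning three calendar months.
dates  <- seq(as.Date("2024-01-29"), as.Date("2024-03-04"), by = "day")
prices <- 100 + seq_along(dates)   # synthetic prices: 101, 102, ...

# Label every day with its year-month and find the last row of each month.
month    <- format(dates, "%Y-%m")
last_idx <- as.integer(tapply(seq_along(dates), month, max))

# Month-end observations (the last month is only partially observed).
monthly <- data.frame(date = dates[last_idx], price = prices[last_idx])
print(monthly)
```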

> Hint:
> * convert daily data to monthly, e.g. use "to.period" function
> * from the monthly data calculate monthly returns as $$r_{i,t}=\frac{P_{i,t}-P_{i,t-1}}{P_{i,t-1}}$$
>   + alternatively, you can calculate it as "Close - Open" returns, i.e. $$r_{i,t}=\frac{P_{i,t,Close}-P_{i,t,Open}}{P_{i,t,Open}}$$
> * obtain the Global Minimum Variance Portfolio, e.g. you can use the "optimize.portfolio" function with a proper specification
> * form the efficient frontier for both portfolios; the constraints on the weights should be as follows 
>   + minimum weight should be minimum from the GMVP portfolio weights $\rightarrow$ GMVP weights can be negative!
>   + maximum weight should be 1
>   + you can use "create.EfficientFrontier" function with proper specification
> * your final figure should look like this
> 
![](./sample_eff_front.png)
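As a sanity check on the return formula and on the optimizer output, the unconstrained global minimum variance weights can also be computed in closed form, $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}'\Sigma^{-1}\mathbf{1})$. The sketch below uses that textbook formula in place of the "optimize.portfolio" function named in the hints, on made-up prices for three hypothetical stocks. The weights may be negative, as the hints point out.

```r
# Made-up month-end prices for three hypothetical stocks.
prices <- cbind(
  A = c(100, 104, 101, 108, 110, 107, 112, 115),
  B = c(50, 49, 52, 53, 51, 54, 55, 53),
  C = c(20, 21, 21.5, 20.5, 22, 22.5, 21.8, 23)
)

# Simple returns: r_t = (P_t - P_{t-1}) / P_{t-1}
returns <- diff(prices) / prices[-nrow(prices), ]

# Closed-form GMV weights: solve Sigma %*% x = 1, then rescale to sum to one.
sigma <- cov(returns)
ones  <- rep(1, ncol(returns))
w_raw <- solve(sigma, ones)
w_gmv <- w_raw / sum(w_raw)

# Standard deviation of the GMV portfolio.
gmv_sd <- sqrt(drop(t(w_gmv) %*% sigma %*% w_gmv))
print(round(w_gmv, 4))
print(gmv_sd)
```

Comparing these weights with the optimizer's result, using the same inputs and no extra constraints, is a quick way to catch a mis-specified portfolio object.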

### Task 3 - Stock index (3 pts)
Using the monthly prices from *Task 2* create three price indexes
* simple price-weighted index, i.e. $$Index_t=\frac{\sum_{i=1}^N P_{i,t}}{\text{divisor}}$$
  + as a first step when forming the price-weighted index, you will need to adjust the prices for stock splits! 
  + in the first period, the divisor will be equal to the number of stocks
* market-capitalization-weighted index, $$Index_t=\sum_{i=1}^N w_{i,t} \cdot P_{i,t},$$
    + where $w_{i,t}=\frac{\text{MKT-CAP}_{i,t}}{\text{TOTAL-MKT-CAP}_t}$
        + i.e. a weighted average of prices where the weights are the market capitalizations of the individual companies 
* equally weighted index, i.e. a weighted average of prices where the weights are $w_i=\frac{1}{N}$

**Create a plot containing all three indexes and comment on their differences.**
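The three index definitions can be checked on toy numbers before applying them to real data. The following is a minimal sketch with made-up prices and market capitalizations for two hypothetical stocks over three periods. It assumes no stock splits, so the divisor stays equal to the number of stocks throughout.

```r
# Made-up prices and market caps (rows = periods, columns = stocks).
prices  <- cbind(A = c(100, 110, 120), B = c(50, 45, 55))
mkt_cap <- cbind(A = c(300, 330, 360), B = c(100, 90, 110))

# Price-weighted index: sum of prices over the divisor (N, since no splits).
divisor        <- ncol(prices)
price_weighted <- rowSums(prices) / divisor

# Cap-weighted index: weights are each stock's share of total market cap.
w            <- mkt_cap / rowSums(mkt_cap)   # every row of w sums to one
cap_weighted <- rowSums(w * prices)

# Equally weighted index: plain average of prices.
equal_weighted <- rowMeans(prices)

print(cbind(price_weighted, cap_weighted, equal_weighted))
```

With the divisor fixed at N, the price-weighted and equally weighted series coincide in this toy example, and they start to differ only once the divisor changes. The cap-weighted series sits closer to the price of the larger company.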