Merge pull request #38 from tidy-finance/assign-portfolio-change

Assign portfolio change
tidy-finance · May 17, 2023 · 2209bb1
2 parents beeed3c + 2576c75
commit 2209bb1

Showing 5 changed files with 65 additions and 52 deletions.
diff --git a/changelog.qmd b/changelog.qmd
@@ -5,4 +5,4 @@ title: Changelog
 - [Mar. 30, 2023, Issue 29:](https://github.com/tidy-finance/website/issues/29) We upgraded to `tidyverse` 2.0.0 and R 4.2.3 and removed all explicit loads of `lubridate`
 - [Feb. 15, 2023, Commit bfda6af: ](https://github.com/tidy-finance/website/commit/bfda6af6169a42f433568e32b7a9cce06cb948ac) We corrected an error in the calculation of the annualized average return volatility in the Chapter [Introduction to Tidy Finance](https://tidy-finance.quarto.pub/website/introduction-to-tidy-finance.html#the-efficient-frontier)
 - [Mar. 06, 2023, Commit 857f0f5: ](https://github.com/tidy-finance/website/commit/857f0f5893a8e7e4c2b4475e1461ebf3d0abe2d6) We corrected an error in the label of [Figure 6](https://tidy-finance.quarto.pub/website/introduction-to-tidy-finance.html#fig-106) which wrongly claimed to show the efficient tangency portfolio.
-- [Mar. 09, 2023, Commit fae4ac3: ](https://github.com/tidy-finance/website/commit/fae4ac3fd12797d66a48f43af3d8e84ded694f13) We corrected a typo in the definition of the power utility function in Chapter [Portfolio Performance](https://tidy-finance.quarto.pub/website/parametric-portfolio-policies.html#portfolio-performance). The utility function implemented in the code is now consistent with the text. 
+- [Mar. 09, 2023, Commit fae4ac3: ](https://github.com/tidy-finance/website/commit/fae4ac3fd12797d66a48f43af3d8e84ded694f13) We corrected a typo in the definition of the power utility function in Chapter [Portfolio Performance](https://tidy-finance.quarto.pub/website/parametric-portfolio-policies.html#portfolio-performance). The utility function implemented in the code is now consistent with the text. 
diff --git a/replicating-fama-and-french-factors.qmd b/replicating-fama-and-french-factors.qmd
@@ -78,24 +78,27 @@ variables_ff <- me_ff |>
 Next, we construct our portfolios with an adjusted `assign_portfolio()` function.\index{Portfolio sorts} Fama and French rely on NYSE-specific breakpoints: they form two portfolios in the size dimension at the median and three portfolios in the book-to-market dimension at the 30th and 70th percentiles, and they use independent sorts. The sorts for book-to-market require an adjustment to the function in Chapter 9 because the `seq()` call used there does not produce the right breakpoints. Instead of `n_portfolios`, we now specify `percentiles`, which takes the breakpoint sequence as an object specified in the function's call. Specifically, we give `percentiles = c(0, 0.3, 0.7, 1)` to the function. Additionally, as a first step, we perform an `inner_join()` with our return data to ensure that we only use traded stocks when computing the breakpoints.\index{Breakpoints}
 
 ```{r}
-assign_portfolio <- function(data, var, percentiles) {
+assign_portfolio <- function(data, 
+                             sorting_variable, 
+                             percentiles) {
   breakpoints <- data |>
     filter(exchange == "NYSE") |>
-    reframe(breakpoint = quantile(
-      {{ var }},
-      probs = {{ percentiles }},
-      na.rm = TRUE
-    )) |>
-    pull(breakpoint) |>
-    as.numeric()
+    pull({{ sorting_variable }}) |>
+    quantile(
+      probs = percentiles,
+      na.rm = TRUE,
+      names = FALSE
+    )
 
   assigned_portfolios <- data |>
-    mutate(portfolio = findInterval({{ var }},
+    mutate(portfolio = findInterval(
+      pick(everything()) |>
+        pull({{ sorting_variable }}),
       breakpoints,
       all.inside = TRUE
     )) |>
     pull(portfolio)
-
+  
   return(assigned_portfolios)
 }
 
@@ -105,12 +108,12 @@ portfolios_ff <- variables_ff |>
   mutate(
     portfolio_me = assign_portfolio(
       data = pick(everything()),
-      var = me_ff,
+      sorting_variable = me_ff,
       percentiles = c(0, 0.5, 1)
     ),
     portfolio_bm = assign_portfolio(
       data = pick(everything()),
-      var = bm_ff,
+      sorting_variable = bm_ff,
       percentiles = c(0, 0.3, 0.7, 1)
     )
   ) |>

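As a small illustration of why the Fama-French sort needs explicit `percentiles`, the sketch below compares the equally spaced grid that a `seq()`-based call produces with `c(0, 0.3, 0.7, 1)`. It uses only base R, and the vector `bm_rank` is made-up toy data (an assumption for illustration), not data from the book.

```r
# Toy data: 20 hypothetical stocks ranked 1..20 (not real book-to-market data)
bm_rank <- 1:20

# Equally spaced probabilities, as an n_portfolios-based seq() would produce
equal_spaced <- seq(0, 1, length.out = 3 + 1) # 0, 1/3, 2/3, 1

# Explicit 30%/70% cut-offs, as passed via the percentiles argument
percentiles <- c(0, 0.3, 0.7, 1)

breakpoints <- quantile(
  bm_rank,
  probs = percentiles,
  na.rm = TRUE,
  names = FALSE
)

# all.inside = TRUE keeps the largest value in the top portfolio
portfolio <- findInterval(bm_rank, breakpoints, all.inside = TRUE)

# Number of stocks per portfolio: 6, 8, 6 (a 30/40/30 split)
counts <- as.integer(table(portfolio))
print(counts)
```

With the equally spaced grid, the middle breakpoints would sit at one third and two thirds instead, which is why the function takes the probabilities directly.
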
diff --git a/size-sorts-and-p-hacking.qmd b/size-sorts-and-p-hacking.qmd
@@ -152,22 +152,25 @@ To replicate the NYSE-centered sorting procedure, we introduce `exchanges` as an
 assign_portfolio <- function(n_portfolios,
                              exchanges,
                              data) {
+  # Compute breakpoints
   breakpoints <- data |>
     filter(exchange %in% exchanges) |>
-    reframe(breakpoint = quantile(
-      mktcap_lag,
+    pull(mktcap_lag) |>
+    quantile(
       probs = seq(0, 1, length.out = n_portfolios + 1),
-      na.rm = TRUE
-    )) |>
-    pull(breakpoint) |>
-    as.numeric()
+      na.rm = TRUE,
+      names = FALSE
+    )
 
+  # Assign portfolios
   assigned_portfolios <- data |>
     mutate(portfolio = findInterval(mktcap_lag,
       breakpoints,
       all.inside = TRUE
     )) |>
     pull(portfolio)
+  
+  # Output
   return(assigned_portfolios)
 }
 ```

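The breakpoint refactor in these hunks swaps `reframe()` + `pull()` + `as.numeric()` for `pull()` + `quantile(..., names = FALSE)`. A minimal base R sketch of why the two should yield the same plain numeric vector, assuming a toy input `x` that is not part of the original code:

```r
# Toy input (hypothetical, for illustration only)
x <- c(1, 2, 3, 4, 5)
probs <- c(0, 0.5, 1)

# Default: quantile() returns a named numeric vector ("0%", "50%", "100%")
with_names <- quantile(x, probs = probs, na.rm = TRUE)

# names = FALSE returns the bare numeric vector directly
without_names <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)

# Stripping the names afterwards gives the same result
same_values <- identical(as.numeric(with_names), without_names)
print(same_values)
```

Dropping the names at the source removes the need for the trailing `as.numeric()` call.
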
diff --git a/univariate-portfolio-sorts.qmd b/univariate-portfolio-sorts.qmd
@@ -102,27 +102,34 @@ The results indicate that we cannot reject the null hypothesis of average return
 
 ## Functional Programming for Portfolio Sorts
 
-Now we take portfolio sorts to the next level. We want to be able to sort stocks into an arbitrary number of portfolios. For this case, functional programming is very handy: we employ the [curly-curly](https://www.tidyverse.org/blog/2019/06/rlang-0-4-0/#a-simpler-interpolation-pattern-with-)-operator to give us flexibility concerning which variable to use for the sorting, denoted by `var`.\index{Curly-curly} We use `quantile()` to compute breakpoints for `n_portfolios`. Then, we assign portfolios to stocks using the `findInterval()` function. The output of the following function is a new column that contains the number of the portfolio to which a stock belongs.\index{Functional programming}
+Now we take portfolio sorts to the next level. We want to be able to sort stocks into an arbitrary number of portfolios. For this case, functional programming is very handy: we employ the [curly-curly](https://www.tidyverse.org/blog/2019/06/rlang-0-4-0/#a-simpler-interpolation-pattern-with-)-operator to give us flexibility concerning which variable to use for the sorting, denoted by `sorting_variable`.\index{Curly-curly} We use `quantile()` to compute breakpoints for `n_portfolios`. Then, we assign portfolios to stocks using the `findInterval()` function. The output of the following function is a new column that contains the number of the portfolio to which a stock belongs.\index{Functional programming} 
+
+In some applications, the variable used for the sorting might be clustered (e.g., at a lower bound of 0). Then, multiple breakpoints may be identical, leading to empty portfolios. Similarly, some portfolios might have a very small number of stocks at the beginning of the sample. Cases where the number of portfolio constituents differs substantially due to the distribution of the characteristics require careful consideration and, depending on the application, might call for customized sorting approaches.
 
 ```{r}
-assign_portfolio <- function(data, var, n_portfolios) {
+assign_portfolio <- function(data, 
+                             sorting_variable, 
+                             n_portfolios) {
+  # Compute breakpoints
   breakpoints <- data |>
-    reframe(
-      breakpoint = quantile(
-        {{ var }},
-        probs = seq(0, 1, length.out = n_portfolios + 1),
-        na.rm = TRUE)
-      ) |>
-    pull(breakpoint) |>
-    as.numeric()
+    pull({{ sorting_variable }}) |>
+    quantile(
+      probs = seq(0, 1, length.out = n_portfolios + 1),
+      na.rm = TRUE,
+      names = FALSE
+    )
 
+  # Assign portfolios
   assigned_portfolios <- data |>
-    mutate(portfolio = findInterval({{ var }},
+    mutate(portfolio = findInterval(
+      pick(everything()) |>
+        pull({{ sorting_variable }}),
       breakpoints,
       all.inside = TRUE
     )) |>
     pull(portfolio)
-
+  
+  # Output
   return(assigned_portfolios)
 }
 ```
@@ -135,7 +142,7 @@ beta_portfolios <- data_for_sorts |>
   mutate(
     portfolio = assign_portfolio(
       data = pick(everything()),
-      var = beta_lag,
+      sorting_variable = beta_lag,
       n_portfolios = 10
     ),
     portfolio = as.factor(portfolio)

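The caveat about clustered sorting variables added in this file can be reproduced with a few lines of base R. The sketch below is only an illustration under assumed toy data: the vector `x` is hypothetical, with six of ten observations sitting at the lower bound of 0.

```r
# Hypothetical sorting variable clustered at the lower bound 0
x <- c(rep(0, 6), 1, 2, 3, 4)
n_portfolios <- 5

# Same breakpoint logic as in assign_portfolio()
breakpoints <- quantile(
  x,
  probs = seq(0, 1, length.out = n_portfolios + 1),
  na.rm = TRUE,
  names = FALSE
)

# The first three breakpoints coincide at 0
n_duplicated <- sum(duplicated(breakpoints))

portfolio <- findInterval(x, breakpoints, all.inside = TRUE)

# Count stocks per portfolio, keeping empty portfolios visible
counts <- table(factor(portfolio, levels = seq_len(n_portfolios)))
print(counts)
```

In this toy case all clustered observations land in portfolio 3, so portfolios 1 and 2 stay empty. Checking for duplicated breakpoints before sorting is one simple way to detect the problem.
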
diff --git a/value-and-bivariate-sorts.qmd b/value-and-bivariate-sorts.qmd
@@ -10,9 +10,6 @@ The current chapter relies on this set of packages.
 ```{r, eval = TRUE, message = FALSE}
 library(tidyverse)
 library(RSQLite)
-library(scales)
-library(lmtest)
-library(sandwich)
 ```
 
 ## Data Preparation
@@ -92,28 +89,31 @@ data_for_sorts <- data_for_sorts |>
   drop_na()
 ```
 
-The last step of preparation for the portfolio sorts is the computation of breakpoints. We continue to use the same function allowing for the specification of exchanges to use for the breakpoints. Additionally, we reintroduce the argument `var` into the function for defining different sorting variables via `curly-curly`.\index{Curly-curly}
+The last step of preparation for the portfolio sorts is the computation of breakpoints. We continue to use the same function allowing for the specification of exchanges to use for the breakpoints. Additionally, we reintroduce the argument `sorting_variable` into the function for defining different sorting variables.
 
 ```{r}
-assign_portfolio <- function(data, var, n_portfolios, exchanges) {
+assign_portfolio <- function(data, 
+                             sorting_variable, 
+                             n_portfolios, 
+                             exchanges) {
   breakpoints <- data |>
     filter(exchange %in% exchanges) |>
-    reframe(
-      breakpoint = quantile(
-        {{ var }},
-        probs = seq(0, 1, length.out = n_portfolios + 1),
-        na.rm = TRUE)
-      ) |>
-    pull(breakpoint) |>
-    as.numeric()
+    pull({{ sorting_variable }}) |>
+    quantile(
+      probs = seq(0, 1, length.out = n_portfolios + 1),
+      na.rm = TRUE,
+      names = FALSE
+    )
 
   assigned_portfolios <- data |>
-    mutate(portfolio = findInterval({{ var }},
+    mutate(portfolio = findInterval(
+      pick(everything()) |>
+        pull({{ sorting_variable }}),
       breakpoints,
       all.inside = TRUE
     )) |>
     pull(portfolio)
-
+  
   return(assigned_portfolios)
 }
 ```
@@ -132,13 +132,13 @@ value_portfolios <- data_for_sorts |>
   mutate(
     portfolio_bm = assign_portfolio(
       data = pick(everything()),
-      var = bm,
+      sorting_variable = "bm",
       n_portfolios = 5,
       exchanges = c("NYSE")
     ),
     portfolio_me = assign_portfolio(
       data = pick(everything()),
-      var = me,
+      sorting_variable = "me",
       n_portfolios = 5,
       exchanges = c("NYSE")
     ),
@@ -170,22 +170,22 @@ The resulting annualized value premium is 4.608 percent.
 
 In the previous exercise, we assigned the portfolios without considering the second variable in the assignment. This protocol is called independent portfolio sorts. The alternative, i.e., dependent sorts, creates portfolios for the second sorting variable within each bucket of the first sorting variable.\index{Portfolio sorts!Dependent bivariate} In our example below, we sort firms into five size buckets, and within each of those buckets, we assign firms to five book-to-market portfolios. Hence, we have monthly breakpoints that are specific to each size group. The decision between independent and dependent portfolio sorts is another choice for the researcher. Notice that dependent sorts ensure an equal number of stocks within each portfolio.
 
-To implement the dependent sorts, we first create the size portfolios by calling `assign_portfolio()` with `var = me`. Then, we group our data again by month and by the size portfolio before assigning the book-to-market portfolio. The rest of the implementation is the same as before. Finally, we compute the value premium.
+To implement the dependent sorts, we first create the size portfolios by calling `assign_portfolio()` with `sorting_variable = "me"`. Then, we group our data again by month and by the size portfolio before assigning the book-to-market portfolio. The rest of the implementation is the same as before. Finally, we compute the value premium.
 
 ```{r}
 value_portfolios <- data_for_sorts |>
   group_by(month) |>
   mutate(portfolio_me = assign_portfolio(
     data = pick(everything()),
-    var = me,
+    sorting_variable = "me",
     n_portfolios = 5,
     exchanges = c("NYSE")
   )) |>
   group_by(month, portfolio_me) |>
   mutate(
     portfolio_bm = assign_portfolio(
       data = pick(everything()),
-      var = bm,
+      sorting_variable = "bm",
       n_portfolios = 5,
       exchanges = c("NYSE")
     ),