
Commit

Merge pull request #87 from tidy-finance/r-update-joins
Update *_join() to use join_by() instead of by
christophscheuch committed Jan 6, 2024
2 parents de74ea5 + 67a60d3 commit e9ab1a3
Showing 67 changed files with 565 additions and 559 deletions.
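The R source files where the joins were rewritten are not rendered in this excerpt (only the frozen Quarto outputs appear below), so here is a minimal sketch of the syntax change the commit title describes. `join_by()` was introduced in dplyr 1.1.0 as the recommended alternative to passing join keys as a (named) character vector; the tables `x` and `y` below are hypothetical and serve only to show the two equivalent spellings.

```r
library(dplyr)

# Hypothetical example tables, purely to illustrate the syntax change.
x <- tibble(gvkey = c("001", "002"), month = c("2020-01", "2020-02"), be = c(1, 2))
y <- tibble(gvkey = c("001", "002"), month = c("2020-01", "2020-02"), mktcap = c(10, 20))

# Old style: join keys as a character vector passed to `by`.
left_join(x, y, by = c("gvkey", "month"))

# New style (dplyr >= 1.1.0): join keys expressed with join_by().
left_join(x, y, join_by(gvkey, month))

# For differently named keys, by = c("a" = "b") becomes join_by(a == b).
```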


4 changes: 2 additions & 2 deletions _freeze/r/beta-estimation/execute-results/html.json

Large diffs are not rendered by default.


4 changes: 1 addition & 3 deletions _freeze/r/cover-and-logo-design/execute-results/html.json
@@ -3,9 +3,7 @@
"result": {
"engine": "knitr",
"markdown": "---\ntitle: Cover and Logo Design\naliases:\n - ../cover-and-logo-design.html\npre-render:\n - pre_render_script.R\n---\n\n\nThe cover of the book is inspired by the fast growing generative art community in R.\\index{Generative art}\nGenerative art refers to art that in whole or in part has been created with the use of an autonomous system. \nInstead of creating random dynamics we rely on what is core to the book: The evolution of financial markets. \nEach circle in the cover figure corresponds to daily market return within one year of our sample. Deviations from the circle line indicate positive or negative returns. \nThe colors are determined by the standard deviation of market returns during the particular year. \nThe few lines of code below replicate the entire figure. \nWe use the Wes Andersen color palette (also throughout the entire book), provided by the package `wesanderson` [@wesanderson]\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(RSQLite)\nlibrary(wesanderson)\n\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\nfactors_ff3_daily <- tbl(\n tidy_finance,\n \"factors_ff3_daily\"\n) |>\n collect()\n\ndata_plot <- factors_ff3_daily |>\n select(date, mkt_excess) |>\n group_by(year = floor_date(date, \"year\")) |>\n mutate(group_id = cur_group_id())\n\ndata_plot <- data_plot |>\n group_by(group_id) |>\n mutate(\n day = 2 * pi * (1:n()) / 252,\n ymin = pmin(1 + mkt_excess, 1),\n ymax = pmax(1 + mkt_excess, 1),\n vola = sd(mkt_excess)\n ) |>\n filter(year >= \"1962-01-01\" & year <= \"2021-12-31\")\n\nlevels <- data_plot |>\n distinct(group_id, vola) |>\n arrange(vola) |>\n pull(vola)\n\ncp <- coord_polar(\n direction = -1,\n clip = \"on\"\n)\n\ncp$is_free <- function() TRUE\ncolors <- wes_palette(\"Zissou1\",\n n_groups(data_plot),\n type = \"continuous\"\n)\n\ncover <- data_plot |>\n mutate(vola = factor(vola, levels = levels)) |>\n ggplot(aes(\n x = day,\n y = mkt_excess,\n group = group_id,\n fill = vola\n )) +\n cp +\n geom_ribbon(aes(\n ymin = ymin,\n ymax = ymax,\n fill = vola\n ), alpha = 0.90) +\n theme_void() +\n facet_wrap(~group_id,\n ncol = 10,\n scales = \"free\"\n ) +\n theme(\n strip.text.x = element_blank(),\n legend.position = \"None\",\n panel.spacing = unit(-5, \"lines\")\n ) +\n scale_fill_manual(values = colors)\n\nggsave(\n plot = cover,\n width = 10,\n height = 6,\n filename = \"images/cover.png\",\n bg = \"white\"\n)\n```\n:::\n\n\nTo generate our logo, we focus on year 2021 - the end of the sample period at the time we published tidy-finance.org for the first time. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlogo <- data_plot |>\n ungroup() |> \n filter(year == \"2021-01-01\") |> \n mutate(vola = factor(vola, levels = levels)) |>\n ggplot(aes(\n x = day,\n y = mkt_excess,\n fill = vola\n )) +\n cp +\n geom_ribbon(aes(\n ymin = ymin,\n ymax = ymax,\n fill = vola\n ), alpha = 0.90) +\n theme_void() +\n theme(\n strip.text.x = element_blank(),\n legend.position = \"None\",\n plot.margin = unit(c(-0.15,-0.15,-0.15,-0.15), \"null\")\n ) +\n scale_fill_manual(values = \"white\") \n\nggsave(\n plot = logo,\n width = 840,\n height = 840,\n units = \"px\",\n filename = \"images/logo-website-white.png\",\n)\n\nggsave(\n plot = logo +\n scale_fill_manual(values = wes_palette(\"Zissou1\")[1]), \n width = 840,\n height = 840,\n units = \"px\",\n filename = \"images/logo-website.png\",\n)\n```\n:::\n\n\nHere is the code to generate the vector graphics for our buttons.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nbutton_r <- data_plot |>\n ungroup() |> \n filter(year == \"2000-01-01\") |> \n mutate(vola = factor(vola, levels = levels)) |>\n ggplot(aes(\n x = day,\n y = mkt_excess,\n fill = vola\n )) +\n cp +\n geom_ribbon(aes(\n ymin = ymin,\n ymax = ymax,\n fill = vola\n ), alpha = 0.90) +\n theme_void() +\n theme(\n strip.text.x = element_blank(),\n legend.position = \"None\",\n plot.margin = unit(c(-0.15,-0.15,-0.15,-0.15), \"null\")\n ) \n\nggsave(\n plot = button_r +\n scale_fill_manual(values = wes_palette(\"Zissou1\")[1]), \n width = 100,\n height = 100,\n units = \"px\",\n filename = \"images/button-r-blue.svg\",\n)\n\nggsave(\n plot = button_r +\n scale_fill_manual(values = wes_palette(\"Zissou1\")[4]), \n width = 100,\n height = 100,\n units = \"px\",\n filename = \"images/button-r-orange.svg\",\n)\n\nbutton_python <- data_plot |>\n ungroup() |> \n filter(year == \"1991-01-01\") |> \n mutate(vola = factor(vola, levels = levels)) |>\n ggplot(aes(\n x = day,\n y = mkt_excess,\n fill = vola\n )) +\n cp +\n geom_ribbon(aes(\n ymin = ymin,\n ymax = ymax,\n fill = vola\n ), alpha = 0.90) +\n theme_void() +\n theme(\n strip.text.x = element_blank(),\n legend.position = \"None\",\n plot.margin = unit(c(-0.15,-0.15,-0.15,-0.15), \"null\")\n ) \n\nggsave(\n plot = button_python +\n scale_fill_manual(values = wes_palette(\"Zissou1\")[1]), \n width = 100,\n height = 100,\n units = \"px\",\n filename = \"images/button-python-blue.svg\",\n)\n\nggsave(\n plot = button_python +\n scale_fill_manual(values = wes_palette(\"Zissou1\")[4]), \n width = 100,\n height = 100,\n units = \"px\",\n filename = \"images/button-python-orange.svg\",\n)\n```\n:::\n",
"supporting": [
"cover-and-logo-design_files"
],
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
4 changes: 2 additions & 2 deletions _freeze/r/difference-in-differences/execute-results/html.json

Large diffs are not rendered by default.


4 changes: 1 addition & 3 deletions _freeze/r/fama-macbeth-regressions/execute-results/html.json
@@ -3,9 +3,7 @@
"result": {
"engine": "knitr",
"markdown": "---\ntitle: Fama-MacBeth Regressions\naliases:\n - ../fama-macbeth-regressions.html\npre-render:\n - pre_render_script.R\nmetadata:\n pagetitle: Fama-MacBeth Regressions with R\n description-meta: Estimate risk premiums via Fama-MacBeth regressions using the programming language R.\n---\n\n\nIn this chapter, we present a simple implementation of @Fama1973, a regression approach commonly called Fama-MacBeth regressions. Fama-MacBeth regressions are widely used in empirical asset pricing studies. We use individual stocks as test assets to estimate the risk premium associated with the three factors included in @Fama1993.\n\nResearchers use the two-stage regression approach to estimate risk premiums in various markets, but predominately in the stock market. \nEssentially, the two-step Fama-MacBeth regressions exploit a linear relationship between expected returns and exposure to (priced) risk factors. \nThe basic idea of the regression approach is to project asset returns on factor exposures or characteristics that resemble exposure to a risk factor in the cross-section in each time period. \nThen, in the second step, the estimates are aggregated across time to test if a risk factor is priced. \nIn principle, Fama-MacBeth regressions can be used in the same way as portfolio sorts introduced in previous chapters.\n\n\\index{Regression!Fama-MacBeth}\\index{Fama-MacBeth} The Fama-MacBeth procedure is a simple two-step approach: \nThe first step uses the exposures (characteristics) as explanatory variables in $T$ cross-sectional regressions. For example, if $r_{i,t+1}$ denote the excess returns of asset $i$ in month $t+1$, then the famous Fama-French three factor model implies the following return generating process [see also @Campbell1998]:\n$$\\begin{aligned}r_{i,t+1} = \\alpha_i + \\lambda^{M}_t \\beta^M_{i,t} + \\lambda^{SMB}_t \\beta^{SMB}_{i,t} + \\lambda^{HML}_t \\beta^{HML}_{i,t} + \\epsilon_{i,t}.\\end{aligned}$$ \nHere, we are interested in the compensation $\\lambda^{f}_t$ for the exposure to each risk factor $\\beta^{f}_{i,t}$ at each time point, i.e., the risk premium. Note the terminology: $\\beta^{f}_{i,t}$ is a asset-specific characteristic, e.g., a factor exposure or an accounting variable. *If* there is a linear relationship between expected returns and the characteristic in a given month, we expect the regression coefficient to reflect the relationship, i.e., $\\lambda_t^{f}\\neq0$. \n\nIn the second step, the time-series average $\\frac{1}{T}\\sum_{t=1}^T \\hat\\lambda^{f}_t$ of the estimates $\\hat\\lambda^{f}_t$ can then be interpreted as the risk premium for the specific risk factor $f$. We follow @Zaffaroni2022 and consider the standard cross-sectional regression to predict future returns. If the characteristics are replaced with time $t+1$ variables, then the regression approach captures risk attributes rather than risk premiums. \n\nBefore we move to the implementation, we want to highlight that the characteristics, e.g., $\\hat\\beta^{f}_{i}$, are often estimated in a separate step before applying the actual Fama-MacBeth methodology. You can think of this as a *step 0*. You might thus worry that the errors of $\\hat\\beta^{f}_{i}$ impact the risk premiums' standard errors. Measurement error in $\\hat\\beta^{f}_{i}$ indeed affects the risk premium estimates, i.e., they lead to biased estimates. 
The literature provides adjustments for this bias [see, e.g., @Shanken1992; @Kim1995; @Chen2015, among others] but also shows that the bias goes to zero as $T \\to \\infty$. We refer to @Gagliardini2016 for an in-depth discussion also covering the case of time-varying betas. Moreover, if you plan to use Fama-MacBeth regressions with individual stocks: @Hou2020 advocates using weighed-least squares to estimate the coefficients such that they are not biased toward small firms. Without this adjustment, the high number of small firms would drive the coefficient estimates.\n\nThe current chapter relies on this set of R packages. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(tidyverse)\nlibrary(RSQLite)\nlibrary(sandwich)\nlibrary(broom)\n```\n:::\n\n\n## Data Preparation\n\nWe illustrate @Fama1973 with the monthly CRSP sample and use three characteristics to explain the cross-section of returns: market capitalization, the book-to-market ratio, and the CAPM beta (i.e., the covariance of the excess stock returns with the market excess returns). We collect the data from our `SQLite`-database introduced in [Accessing and Managing Financial Data](accessing-and-managing-financial-data.qmd) and [WRDS, CRSP, and Compustat](wrds-crsp-and-compustat.qmd).\\index{Data!CRSP}\\index{Data!Compustat}\\index{Beta}\n\n\n::: {.cell}\n\n```{.r .cell-code}\ntidy_finance <- dbConnect(\n SQLite(),\n \"data/tidy_finance_r.sqlite\",\n extended_types = TRUE\n)\n\ncrsp_monthly <- tbl(tidy_finance, \"crsp_monthly\") |>\n select(permno, gvkey, month, ret_excess, mktcap) |>\n collect()\n\ncompustat <- tbl(tidy_finance, \"compustat\") |>\n select(datadate, gvkey, be) |>\n collect()\n\nbeta <- tbl(tidy_finance, \"beta\") |>\n select(month, permno, beta_monthly) |>\n collect()\n```\n:::\n\n\nWe use the Compustat and CRSP data to compute the book-to-market ratio and the (log) market capitalization.\\index{Book-to-market ratio}\\index{Market capitalization} \nFurthermore, we also use the CAPM betas based on monthly returns we computed in the previous chapters.\\index{Beta}\\index{CAPM}\n\n\n::: {.cell}\n\n```{.r .cell-code}\ncharacteristics <- compustat |>\n mutate(month = floor_date(ymd(datadate), \"month\")) |>\n left_join(crsp_monthly, by = c(\"gvkey\", \"month\")) |>\n left_join(beta, by = c(\"permno\", \"month\")) |>\n transmute(gvkey,\n bm = be / mktcap,\n log_mktcap = log(mktcap),\n beta = beta_monthly,\n sorting_date = month %m+% months(6)\n )\n\ndata_fama_macbeth <- crsp_monthly |>\n left_join(characteristics, by = c(\"gvkey\", \"month\" = \"sorting_date\")) |>\n group_by(permno) |>\n arrange(month) |>\n fill(c(beta, bm, log_mktcap), .direction = \"down\") |>\n ungroup() |>\n left_join(crsp_monthly |>\n select(permno, month, ret_excess_lead = ret_excess) |>\n mutate(month = month %m-% months(1)),\n by = c(\"permno\", \"month\")\n ) |>\n select(permno, month, ret_excess_lead, beta, log_mktcap, bm) |>\n drop_na()\n```\n:::\n\n\n## Cross-sectional Regression\n\nNext, we run the cross-sectional regressions with the characteristics as explanatory variables for each month. We regress the returns of the test assets at a particular time point on the characteristics of each asset. By doing so, we get an estimate of the risk premiums $\\hat\\lambda^{f}_t$ for each point in time. 
\\index{Regression!Cross-section}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nrisk_premiums <- data_fama_macbeth |>\n nest(data = c(ret_excess_lead, beta, log_mktcap, bm, permno)) |>\n mutate(estimates = map(\n data,\n ~ tidy(lm(ret_excess_lead ~ beta + log_mktcap + bm, data = .x))\n )) |>\n unnest(estimates)\n```\n:::\n\n\n## Time-Series Aggregation\n\nNow that we have the risk premiums' estimates for each period, we can average across the time-series dimension to get the expected risk premium for each characteristic. Similarly, we manually create the $t$-test statistics for each regressor, which we can then compare to usual critical values of 1.96 or 2.576 for two-tailed significance tests. \n\n\n::: {.cell}\n\n```{.r .cell-code}\nprice_of_risk <- risk_premiums |>\n group_by(factor = term) |>\n summarize(\n risk_premium = mean(estimate) * 100,\n t_statistic = mean(estimate) / sd(estimate) * sqrt(n())\n )\n```\n:::\n\n\nIt is common to adjust for autocorrelation when reporting standard errors of risk premiums. As in [Univariate Portfolio Sorts](univariate-portfolio-sorts.qmd), the typical procedure for this is computing @Newey1987 standard errors. We again recommend the data-driven approach of @Newey1994 using the `NeweyWest()` function, but note that you can enforce the typical 6 lag settings via `NeweyWest(., lag = 6, prewhite = FALSE)`.\\index{Standard errors!Newey-West}\n\n\n::: {.cell}\n\n```{.r .cell-code}\nregressions_for_newey_west <- risk_premiums |>\n select(month, factor = term, estimate) |>\n nest(data = c(month, estimate)) |>\n mutate(\n model = map(data, ~ lm(estimate ~ 1, .)),\n mean = map(model, tidy)\n )\n\nprice_of_risk_newey_west <- regressions_for_newey_west |>\n mutate(newey_west_se = map_dbl(model, ~ sqrt(NeweyWest(.)))) |>\n unnest(mean) |>\n mutate(t_statistic_newey_west = estimate / newey_west_se) |>\n select(factor,\n risk_premium = estimate,\n t_statistic_newey_west\n )\n\nleft_join(price_of_risk,\n price_of_risk_newey_west |>\n select(factor, t_statistic_newey_west),\n by = \"factor\"\n)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 4 × 4\n factor risk_premium t_statistic t_statistic_newey_west\n <chr> <dbl> <dbl> <dbl>\n1 (Intercept) 1.22 4.77 3.98 \n2 beta 0.00515 0.0499 0.0446\n3 bm 0.151 3.22 2.75 \n4 log_mktcap -0.104 -2.94 -2.60 \n```\n\n\n:::\n:::\n\n\nFinally, let us interpret the results. Stocks with higher book-to-market ratios earn higher expected future returns, which is in line with the value premium. The negative value for log market capitalization reflects the size premium for smaller stocks. Consistent with results from earlier chapters, we detect no relation between beta and future stock returns.\n\n## Exercises\n\n1. Download a sample of test assets from Kenneth French's homepage and reevaluate the risk premiums for industry portfolios instead of individual stocks.\n1. Use individual stocks with weighted-least squares based on a firm's size as suggested by @Hou2020. Then, repeat the Fama-MacBeth regressions without the weighting scheme adjustment but drop the smallest 20 percent of firms each month. Compare the results of the three approaches. ",
"supporting": [
"fama-macbeth-regressions_files"
],
"supporting": [],
"filters": [
"rmarkdown/pagebreak.lua"
],
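Note that the frozen markdown above still shows the chapter's joins in the older `by = c(...)` form; the hunk rendered here only records the change to the `supporting` field. As an illustration of what the commit title implies for this chapter, not a quote of the actual updated source, the data-preparation joins would read as follows with `join_by()`, assuming the `compustat`, `crsp_monthly`, and `beta` tables loaded earlier in the quoted chapter.

```r
library(tidyverse)

# Sketch only: the chapter's characteristic construction rewritten with
# join_by() in place of the named-vector `by =` syntax shown above.
# Assumes compustat, crsp_monthly, and beta as defined in the chapter.
characteristics <- compustat |>
  mutate(month = floor_date(ymd(datadate), "month")) |>
  left_join(crsp_monthly, join_by(gvkey, month)) |>
  left_join(beta, join_by(permno, month)) |>
  transmute(
    gvkey,
    bm = be / mktcap,
    log_mktcap = log(mktcap),
    beta = beta_monthly,
    sorting_date = month %m+% months(6)
  )
```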


Binary file modified _freeze/r/introduction-to-tidy-finance/figure-html/fig-100-1.png
Binary file modified _freeze/r/introduction-to-tidy-finance/figure-html/fig-101-1.png
Binary file modified _freeze/r/introduction-to-tidy-finance/figure-html/fig-103-1.png
Binary file modified _freeze/r/introduction-to-tidy-finance/figure-html/fig-104-1.png
Binary file modified _freeze/r/introduction-to-tidy-finance/figure-html/fig-105-1.png
Binary file modified _freeze/r/introduction-to-tidy-finance/figure-html/fig-106-1.png


0 comments on commit e9ab1a3
