Group_by calculate metric #421

Open
SHo-JANG opened this issue Apr 16, 2023 · 1 comment

@SHo-JANG

SHo-JANG commented Apr 16, 2023

Here's what I want to do specifically.
For example, let's say I have monthly trading data for every ticker in the stock market.
I want to rank the predicted returns of all stocks within each year-month.
Then, for each month, I want to calculate a statistic using only the stocks in the top 10% by predicted return.
The specific metric is open; for example, it could be the RMSE of the top 10%.
The goal is to tune the hyperparameters so that the actual returns of the stocks predicted to be in the top 10% are higher.

In other words, I want to find hyperparameters that tend to get the top 10% right, even if they get the bottom 90% wrong, rather than hyperparameters that fit the whole universe well.
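
To make the target concrete, here is a minimal sketch of the per-month top-decile calculation on toy data. The data frame `toy`, the `ticker` column, and the helper name `top_decile_return` are made up for illustration; `yearmonth`, `return_pct`, and `estimate` follow the names used in the code further down. In the first month the predictions rank the stocks perfectly, in the second they rank them backwards, so the top-decile portfolio return differs sharply even though the overall error could look similar.

```r
library(dplyr)

# Hypothetical toy data: 2 months x 20 tickers.
# estimate = predicted return, return_pct = realized monthly return in percent.
toy <- tibble(
  yearmonth  = rep(c("2020-01", "2020-02"), each = 20),
  ticker     = rep(paste0("T", 1:20), times = 2),
  estimate   = rep(1:20, times = 2),
  return_pct = c(1:20, 20:1)
)

# Hypothetical helper: mean realized return of the chosen tile, per month.
top_decile_return <- function(data, n_tiles = 10, purpose_tile = 10) {
  data |>
    group_by(yearmonth) |>
    mutate(tile = ntile(estimate, n_tiles)) |>   # rank within each month
    filter(tile == purpose_tile) |>              # keep only the chosen tile
    summarise(mean_y = mean(return_pct), n_stocks = n(), .groups = "drop")
}

res <- top_decile_return(toy)
res
# 2020-01: mean_y = 19.5 (the two best stocks were picked)
# 2020-02: mean_y = 1.5  (the two worst stocks were picked)
```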

I tried to define a custom metric, but that approach was limited: I wanted to use group_by(yearmonth) inside the metric, but I didn't know how to do it.
So I wrote some makeshift code.

# Custom metric -------------------------------------------------------------
# irr = cumulative return of a portfolio that holds, each month, the stocks in the
#       top 10% of predicted return.
# return_pct = cumulative return over the month, in percent (e.g. May 1 to May 31).
irr_vec <- function(truth,
                    estimate,
                    n_tiles = 10,
                    purpose_tile = 10, # 10 = top 10% by predicted return, 1 = bottom 10%
                    na_rm = TRUE,
                    case_weights = NULL,
                    ...) {

  irr_impl <- function(truth, estimate, ..., case_weights = NULL) {

    # Find the fold being evaluated by matching its row count against the
    # global `valid_years_splited`.
    # Limitation: this breaks if two folds have the same number of rows.
    fold_index <- NULL
    for (i in seq_len(nrow(valid_years_splited))) {
      if (length(truth) == nrow(valid_years_splited$data[[i]])) {
        fold_index <- i
      }
    }

    irr <- valid_years_splited$data[[fold_index]] |>
      mutate(estimate = estimate) |>
      group_by(yearmonth) |>
      mutate(top_n_pct = ntile(estimate, n_tiles)) |>
      filter(top_n_pct == purpose_tile) |>
      # mean_y: monthly return of the portfolio built from the selected tile
      summarise(mean_y = mean(return_pct)) |>
      # irr: cumulative return ratio of that portfolio over the fold (1 year)
      mutate(irr = cumprod(mean_y / 100 + 1)) |>
      slice_tail(n = 1) |>
      pull(irr)

    # Cross-validation averages the metric across folds by default, so returning
    # log(irr) makes that average the log of the geometric mean.
    return(log(irr))
  }
  
  metric_vec_template(
    metric_impl = irr_impl,
    truth = truth,
    estimate = estimate,
    na_rm = na_rm,
    case_weights = case_weights,
    cls = "numeric"
  )
}


irr <- function(data, ...) {
  UseMethod("irr")
}

irr <- new_numeric_metric(
  irr,
  direction = "maximize"
)

irr.data.frame <- function(data,
                           truth,
                           estimate,
                           na_rm = TRUE,
                           case_weights = NULL,
                           ...) {
  
  
  metric_summarizer(
    metric_nm = "irr",
    metric_fn = irr_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}

In the custom metric code, valid_years_splited holds the time-series cross-validation splits, organized into 3 folds of 1 year each. It is defined as a global variable via <<-.
It has three rows, each containing one year's worth of monthly stock trading data for all sectors. I set it up this way so that the metric can be calculated per fold.
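
The reason the metric returns log(irr) can be checked with a few lines of base R. This is only an illustration with made-up numbers: `mean_y` and the extra fold values are hypothetical. It shows that the last element of the cumulative product is the product of the monthly growth factors, and that averaging the logs across folds and exponentiating gives the geometric mean of the per-fold cumulative returns.

```r
# Hypothetical monthly portfolio returns for one fold, in percent.
mean_y <- c(2, -1, 3)

# Last element of the cumulative product = product of all growth factors.
irr_a <- tail(cumprod(mean_y / 100 + 1), 1)

# Hypothetical cumulative return ratios for two more folds.
irr_by_fold <- c(irr_a, 0.95, 1.20)

# Arithmetic mean of the logs (what averaging log(irr) over folds does),
# exponentiated, is the geometric mean of the per-fold ratios.
geo_mean <- exp(mean(log(irr_by_fold)))
geo_mean_direct <- prod(irr_by_fold)^(1 / length(irr_by_fold))
```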

However, I realize that this is not a perfect solution.
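
One possible way around the row-count lookup, sketched here under assumptions rather than as a tested solution: compute the metric after resampling from saved predictions instead of inside the metric function. The sketch assumes the predictions carry a row index (named `.row` here, as in the output of tune's `collect_predictions()` when predictions are saved), a fold identifier `id`, and a candidate identifier `.config`. The `stocks` and `preds` tables below are hypothetical stand-ins built by hand, so only dplyr is needed to run it.

```r
library(dplyr)

# Hypothetical original data; the row position plays the role of `.row`.
stocks <- tibble(
  yearmonth  = rep(c("2021-01", "2021-02"), each = 10),
  return_pct = c(1:10, 10:1)
)

# Hypothetical stand-in for saved predictions: one row per held-out
# observation, with the fold id, the candidate config, and the row index.
preds <- tibble(
  id      = "Slice1",
  .config = "Model1",
  .row    = 1:20,
  .pred   = rep(1:10, times = 2)
)

irr_by_fold <- preds |>
  # Attach yearmonth and return_pct by row index instead of guessing the fold.
  left_join(mutate(stocks, .row = row_number()), by = ".row") |>
  group_by(.config, id, yearmonth) |>
  mutate(tile = ntile(.pred, 10)) |>
  filter(tile == 10) |>
  summarise(mean_y = mean(return_pct), .groups = "drop_last") |>
  # Sum of log growth factors = log of the cumulative return ratio per fold.
  summarise(log_irr = sum(log1p(mean_y / 100)), .groups = "drop")
```

Because the join is on the row index, two folds with the same number of rows no longer cause a problem. The trade-off is that the value is computed outside the tuning metric set, so candidates would have to be ranked on `log_irr` by hand.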

@EmilHvitfeldt
Member

Hello @SHo-JANG 👋

I like the idea of what you are trying to do. Would you be able to show some example input data and what you would want the output to look like? I wanna make sure I completely understand what you are trying to accomplish before giving feedback 😄
