textrecipes tuning parameters #16

EmilHvitfeldt · 2018-11-15T02:56:16Z

Following up on tidymodels/textrecipes#14, here are my thoughts on what should be tunable.

- step_texthash: num_terms (integer)
- step_tf: weight (numeric)
- step_tokenfilter: max (numeric), min (numeric), max_tokens (integer)
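A minimal base-R sketch of what the step_tokenfilter parameters control. The helper filter_tokens() is hypothetical and is not the textrecipes implementation. It uses the min_times/max_times names the thread adopts below, and treating both bounds as inclusive is an assumption.

```r
# Hypothetical helper (not textrecipes code): keep tokens whose corpus-wide
# count lies in [min_times, max_times], then keep at most max_tokens of the
# most frequent survivors.
filter_tokens <- function(tokens, min_times = 0L, max_times = Inf,
                          max_tokens = Inf) {
  counts <- table(unlist(tokens))                  # corpus-wide token counts
  keep <- counts >= min_times & counts <= max_times
  counts <- sort(counts[keep], decreasing = TRUE)  # most frequent first
  names(counts)[seq_len(min(length(counts), max_tokens))]
}

docs <- list(c("a", "b", "a"), c("a", "c", "b"), c("d"))
filter_tokens(docs, min_times = 2L)                  # "a" "b"
filter_tokens(docs, min_times = 2L, max_tokens = 1)  # "a"
```

Each of the three arguments narrows the vocabulary independently, which is why they can be tuned as separate parameters.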

Question: would something like weight_scheme in step_tf be tunable, given that it takes one of several character values (the method names)?


topepo · 2018-11-19T22:31:00Z

Could you change min and max to something more specific, like min_occurrence or min_times?

> Would something like weight_scheme in step_tf be tunable, given that it takes one of several character values (the method names)?

Sure. We have qualitative parameters in other models too.

EmilHvitfeldt · 2018-11-25T05:56:59Z

> Could you change min and max to something more specific, like min_occurrence or min_times?

Done. They are now min_times and max_times.

> Sure. We have qualitative parameters in other models too.

Then I have these additions:

- step_tf: weight_scheme takes one of "binary", "raw count", "term frequency", "log normalization", "double normalization".
- step_tokenize: token takes one of "characters", "character_shingle", "lines", "ngrams", "paragraphs", "ptb", "regex", "sentences", "skip_ngrams", "tweets", "words", "word_stems".
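To show why weight_scheme is a natural qualitative parameter, here is a minimal sketch of the five schemes using their common textbook definitions. The helper tf_weight() is hypothetical, and the assumption that step_tf's numeric weight acts as the constant K in double normalization (K + (1 - K) * count / max(count)) is mine, not something stated in this thread.

```r
# Hypothetical helper: textbook term-frequency weightings, one per
# weight_scheme value. `weight` is assumed to be the double-normalization
# constant K.
tf_weight <- function(counts, weight_scheme = "raw count", weight = 0.5) {
  switch(weight_scheme,
    "binary"               = as.numeric(counts > 0),
    "raw count"            = counts,
    "term frequency"       = counts / sum(counts),
    "log normalization"    = log(1 + counts),
    "double normalization" = weight + (1 - weight) * counts / max(counts),
    stop("unknown weight_scheme: ", weight_scheme)
  )
}

counts <- c(the = 4, cat = 2, sat = 0)    # token counts for one document
tf_weight(counts, "term frequency")       # the = 2/3, cat = 1/3, sat = 0
tf_weight(counts, "double normalization") # the = 1, cat = 0.75, sat = 0.5
```

Under this assumption, weight only matters when weight_scheme is "double normalization", so the two parameters interact during tuning.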

EmilHvitfeldt · 2018-12-01T00:45:10Z

Could a whole step be dialable, i.e., could whether to include it be tuned?

topepo · 2018-12-08T20:02:20Z

I've thought about the question of whether to include a step at all. We could add a logical eval_step option and expose it as a tuning parameter.
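A minimal base-R sketch of that idea, assuming a hypothetical eval_step argument: the flag becomes a two-level qualitative parameter crossed with the step's other parameters, and the step returns its input unchanged when the flag is FALSE. The helper maybe_apply() is illustrative only.

```r
# Cross a hypothetical logical eval_step flag with a quantitative parameter.
grid <- expand.grid(
  eval_step = c(TRUE, FALSE),
  max_times = 10^(1:4),
  KEEP.OUT.ATTRS = FALSE
)
nrow(grid)  # 8 candidate settings

# Inside the (hypothetical) step, the flag short-circuits the work.
maybe_apply <- function(x, f, eval_step = TRUE) {
  if (isTRUE(eval_step)) f(x) else x
}
maybe_apply(1:3, rev)                     # 3 2 1
maybe_apply(1:3, rev, eval_step = FALSE)  # 1 2 3, unchanged
```

One consequence of this design: when eval_step is FALSE the step's other parameters have no effect, so rows that differ only in max_times are redundant and could be collapsed before tuning.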

EmilHvitfeldt · 2018-12-08T20:15:01Z

That would be great!

topepo · 2018-12-08T20:36:14Z

Should weight be bounded in [0, 1]?

EmilHvitfeldt · 2018-12-08T20:38:35Z

Mainly weight should be positive. But I think it is reasonable to bound it in [0, 1].

topepo · 2018-12-08T20:49:28Z

Mind if weight defaults to the log scale?

EmilHvitfeldt · 2018-12-08T20:51:53Z

That should be fine.
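Sampling weight on the log scale while keeping it in (0, 1] can be sketched in base R as below. Because log10(0) is -Inf, the lower bound has to be a small positive number; the 1e-10 used here (a log10 range of [-10, 0]) is an assumed value, not necessarily the default chosen in the commit.

```r
set.seed(1)
# Sample uniformly on the log10 scale, then back-transform.
# Assumed log10 range: [-10, 0], i.e. weight in [1e-10, 1].
log_range <- c(-10, 0)
weight <- 10^runif(5, min = log_range[1], max = log_range[2])
all(weight > 0 & weight <= 1)  # TRUE
```

On this scale each order of magnitude gets equal attention, whereas uniform sampling on [0, 1] would almost never produce values below 0.01.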

topepo · 2018-12-08T21:04:09Z

Take a look at this commit and let me know if the default ranges (or anything else) should be changed.

EmilHvitfeldt · 2018-12-08T21:10:15Z

Looks good.

topepo · 2018-12-08T21:14:58Z

Gak. I think that we need large numbers instead of Inf:

```
> max_times
Maximum Token Frequency  (quantitative)
Range: [1, Inf]
> grid_random(max_times, size = 5)
Error in min(unlist(object$range)):max(unlist(object$range)) : 
  result would be too long a vector
```

What should we put in? We could do:

```
> .Machine$integer.max
[1] 2147483647
> library(dials)
> max_times
Maximum Token Frequency  (quantitative)
Range: [1, 2147483647]
> grid_random(max_times, size = 5)
# A tibble: 5 x 1
   max_times
       <int>
1 1024987753
2 2080355927
3 1342632065
4   48813909
5   85432412
```

Maybe something smaller like as.integer(10^5)?

EmilHvitfeldt · 2018-12-08T21:24:28Z

In essence, Inf meant "do not remove a token no matter how many times it appears", but as.integer(10^5) works too. I haven't been able to research this, but I suspect this parameter would work well on the log scale: a grid of 10, 100, 1000, 10000 seems better than 2500, 5000, 7500, 10000.
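A base-R sketch of both points above: why an Inf upper bound fails when the integer range is materialized with `:`, and how a log10-spaced grid for max_times compares with a linear one, assuming the as.integer(10^5) upper bound. The six-level grid size is an arbitrary choice for illustration.

```r
# Why Inf fails: min:max materializes the whole integer range,
# which is impossible when the upper bound is Inf.
too_long <- inherits(try(1:Inf, silent = TRUE), "try-error")
too_long  # TRUE

upper <- as.integer(10^5)

# Regular grids for max_times: linear spacing vs. log10 spacing.
linear_grid <- seq(2500, 10000, by = 2500)  # 2500 5000 7500 10000
log_grid    <- as.integer(10^(1:4))         # 10 100 1000 10000

# Six log10-spaced levels spanning the full range [1, upper].
full_log_grid <- as.integer(round(10^seq(0, log10(upper), length.out = 6)))
full_log_grid  # 1 10 100 1000 10000 100000
```

The log grid spends its levels across orders of magnitude, which matters for a frequency cutoff where the difference between 10 and 100 is usually larger in effect than the difference between 7500 and 10000.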

topepo · 2018-12-08T23:51:18Z

merged PR

github-actions · 2021-03-07T00:56:48Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

topepo added a commit that referenced this issue Dec 8, 2018

new parameters for #16

ea877ef

topepo closed this as completed Dec 8, 2018

github-actions bot locked and limited conversation to collaborators Mar 7, 2021
