Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unconsistent behavior in word function #245

Closed
pgrandinetti opened this issue Jun 25, 2018 · 6 comments
Closed

Unconsistent behavior in word function #245

pgrandinetti opened this issue Jun 25, 2018 · 6 comments
Labels

Comments

@pgrandinetti
Copy link

@pgrandinetti pgrandinetti commented Jun 25, 2018

I don't get if the following behavior of the word function is desired, or is a bug

> words = "this is a very long sentence to test"
> word(words, start=-4, end=-1)  # last 4 words, OK
[1] "long sentence to test"
> word(words, start=-8, end=-1)  # OK
[1] "this is a very long sentence to test"
> word(words, start=-9, end=-1)  # OK, imho
[1] NA
> word(words, start=-10, end=-1)  # what is this?!
[1] "is a very long sentence to test" "a very long sentence to test"    "very long sentence to test"     
[4] "long sentence to test"           "sentence to test"                "to test"                        
[7] "test"    
@hadley
Copy link
Member

@hadley hadley commented Aug 28, 2018

Could you please rework your reproducible example to use the reprex package ? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run in a local session.

@pgrandinetti
Copy link
Author

@pgrandinetti pgrandinetti commented Aug 28, 2018

I guess we just need to clarify what is the behavior of the word function when the input argument start is negative and is greater than the sentence length.

From your doc:

start: integer vector giving position of first word to extract. Defaults to first word. If negative, counts backwards from last character.

But the behavior is not always clear. Sometimes I get a list as result.

If you copy this and run reprex() should give what you asked for (assuming that I understood what you meant!)

library(stringr)
library(reprex)
sentence = "this is a very long sentence to test"
word(sentence, start=-8, end=-1)  # OK
word(sentence, start=-9, end=-1)  # OK, imho because 9 > length
word(sentence, start=-10, end=-1)  # what is this, a vector?!

@hadley
Copy link
Member

@hadley hadley commented Aug 28, 2018

Please include the output of using reprex in the issue so I can see both the input and the output.

@pgrandinetti
Copy link
Author

@pgrandinetti pgrandinetti commented Aug 28, 2018

library(stringr)
library(reprex)
sentence = "this is a very long sentence to test"
word(sentence, start=-8, end=-1)  # OK
#> [1] "this is a very long sentence to test"
word(sentence, start=-9, end=-1)  # OK, imho because 9 > length
#> [1] NA
word(sentence, start=-10, end=-1)  # what is this, a vector?!
#> [1] "is a very long sentence to test" "a very long sentence to test"   
#> [3] "very long sentence to test"      "long sentence to test"          
#> [5] "sentence to test"                "to test"                        
#> [7] "test"

@hadley hadley added the bug label Jan 4, 2019
@pdelboca
Copy link
Contributor

@pdelboca pdelboca commented Feb 8, 2019

I've been testing this bug and the problem is that the formula it uses to convert negative values into actual positions doesn't work when the negative number used in since parameter is bigger (in absolute value) than the len of words in the sentence. (It still returns a negative start)

start[neg_start] <- start[neg_start] + len[neg_start] + 1L

As start is then used for indexing a matrix this causes the behavior reported by @pgrandinetti.

One way to solve the problem is, after converting the negative values, replace all the start values lower than 1 to 1. (Currently replaces only indexes past end with NA)

This will fix the issue and give consistent api with str_sub that returns all the string when abs(-since) > len(string).

If this solution is ok I could implement it and make a PR.

@hadley
Copy link
Member

@hadley hadley commented Feb 9, 2019

Sounds good to me!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants