subset and dplyr::filter give different results when some columns are un-named #483

Closed
jimoeppen opened this issue Jul 4, 2014 · 3 comments

@jimoeppen jimoeppen commented Jul 4, 2014

Hi,

If you read in data and name only some of the columns, then subset() and filter() give different results. For example, take the following data (read inline here; it could equally come from a file such as test.csv):

t <- read.csv( textConnection("
1,11,16,21
2,12,17,22
3,13,18,23
4,14,19,24
5,15,20,25
"), header = FALSE)
colnames(t) <- c("ID", "X")
t
#   ID  X NA NA
#1  1 11 16 21
#2  2 12 17 22
#3  3 13 18 23
#4  4 14 19 24
#5  5 15 20 25

subset(t, ID < 3)
#  ID  X NA NA.1
#1  1 11 16   21
#2  2 12 17   22

t %>% dplyr::filter(ID < 3)
#  ID  X NA NA
#1  1 11 16 16
#2  2 12 17 17

filter() seems to make multiple copies of the first NA column it encounters. Obviously it is unwise to name some columns and not others, but it would be safer to make filter() more idiot-proof like subset().

Best wishes,
Jim
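
A possible workaround on the user side, sketched in base R only: give every column a distinct, non-missing name before filtering. The `V<position>` placeholder scheme is an assumption made for illustration; it is not something subset() or dplyr does for you.

```r
# Same data as in the report: four columns, only two of them named.
t <- read.csv(text = "1,11,16,21
2,12,17,22
3,13,18,23
4,14,19,24
5,15,20,25", header = FALSE)
colnames(t) <- c("ID", "X")   # names become "ID", "X", NA, NA

# Flag names that are missing, empty, or repeat an earlier name.
nms <- names(t)
bad <- is.na(nms) | nms == "" | duplicated(nms)

# Replace flagged names with positional placeholders (assumed scheme),
# then let make.unique() resolve any clash with an existing name.
nms[bad] <- paste0("V", which(bad))
names(t) <- make.unique(nms)

names(t)
```

Once every column has its own name, subset(t, ID < 3) and dplyr::filter(t, ID < 3) would be expected to agree, since neither has to guess which NA column is meant.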

@hadley hadley added the bug label Jul 28, 2014
@hadley hadley added this to the 0.3 milestone Jul 28, 2014

@romainfrancois romainfrancois commented Sep 10, 2014

@hadley do we really want that? I mean, #wat:

> subset(t, ID < 3) %>% attr( "names" )
[1] "ID"   "X"    NA     "NA.1"

Can I suggest having a test that shouts if the colnames are not of the right size?

@hadley hadley commented Sep 10, 2014

Good point. Can we add a generic test to ensure that the colnames are unique?

Also, this is not a priority, so it could wait until 0.3.1.

@hadley hadley added this to the 0.3.1 milestone Sep 12, 2014
@hadley hadley removed this from the 0.3 milestone Sep 12, 2014

@romainfrancois romainfrancois commented Sep 22, 2014

I've added the check_valid_colnames function, which is now used in many verbs. It asserts that colnames are unique.
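
For readers who want to see the idea, a uniqueness check of this kind could look roughly like the sketch below. This is not dplyr's check_valid_colnames; the helper name `assert_unique_colnames` and its error message are hypothetical, and the real implementation may differ.

```r
# Hypothetical helper: stop with an error when any column name is repeated.
# Repeated NA names (as in this issue) count as duplicates too.
assert_unique_colnames <- function(df) {
  nms <- names(df)
  dup <- unique(nms[duplicated(nms)])
  if (length(dup) > 0) {
    stop("found duplicated column name(s): ",
         paste(dup, collapse = ", "), call. = FALSE)
  }
  invisible(df)
}

# A data frame with two unnamed columns, as in the original report.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
names(df) <- c("ID", "X")   # names become "ID", "X", NA, NA

# Capture the error message instead of aborting the script.
res <- tryCatch(assert_unique_colnames(df),
                error = function(e) conditionMessage(e))
res
```

Failing early like this trades the silent column duplication reported above for an explicit error, which is the behaviour the discussion settles on.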

@hadley hadley closed this Sep 22, 2014
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018