Table parsing could be better #111

Closed
hadley opened this issue Oct 8, 2015 · 1 comment

Comments

@hadley
Member

hadley commented Oct 8, 2015

library(rvest)

url <- "http://www.basketball-reference.com/boxscores/201506140GSW.html"
page <- read_html(url)

page %>% 
  html_nodes(".stats_table") %>% 
  .[[3]] %>% 
  html_table(header = FALSE)
@johncollins
Contributor

It seems like some synergy could be captured here between this issue (#111) and #116. The way to solve both the colspan and the rowspan problems seems to be copying the cell's value into every position it spans, i.e. if cell[2, 3] has colspan=2, then cell[2, 4] <- cell[2, 3]. I just opened a PR (#119) that solves the more straightforward rowspan part in this way. With respect to colspan, currently the first column covered by the spanning cell gets its value and the other columns it spans are filled with NAs. The only issue with the repetition approach would then be the header, since columns cannot have equal names. That could be treated as a special case, solved by #116 as proposed by @r2evans.
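The copy-the-value idea above can be sketched in plain R. This is a minimal illustration under stated assumptions, not the code from PR #119: `expand_spans()` is a hypothetical helper, and it assumes the cells have already been pulled out of the HTML into a list of rows, where each cell is a list with `text`, `rowspan` and `colspan` fields. The toy table is made up.

```r
# Hypothetical helper (not part of rvest): fill a character matrix so that a
# cell with rowspan/colspan has its value copied into every position it covers.
# Assumes each cell is list(text = , rowspan = , colspan = ) and that no cell
# text is NA, because NA marks "slot not filled yet".
expand_spans <- function(rows) {
  n_rows <- length(rows)
  # Upper bound on the width: no row can be wider than all colspans combined
  max_cols <- sum(vapply(rows, function(r) {
    sum(vapply(r, function(cell) cell$colspan, numeric(1)))
  }, numeric(1)))
  out <- matrix(NA_character_, nrow = n_rows, ncol = max_cols)
  for (i in seq_len(n_rows)) {
    j <- 1
    for (cell in rows[[i]]) {
      # Skip slots already occupied by a rowspan from a row above
      while (!is.na(out[i, j])) j <- j + 1
      row_idx <- i:min(n_rows, i + cell$rowspan - 1)
      col_idx <- j:(j + cell$colspan - 1)
      out[row_idx, col_idx] <- cell$text  # repeat the value over the span
      j <- j + cell$colspan
    }
  }
  # Trim the unused columns on the right
  used <- colSums(!is.na(out)) > 0
  out[, seq_len(max(which(used))), drop = FALSE]
}

# Made-up example: "Team" spans two rows, "Stats" spans two columns
rows <- list(
  list(list(text = "Team", rowspan = 2, colspan = 1),
       list(text = "Stats", rowspan = 1, colspan = 2)),
  list(list(text = "FG", rowspan = 1, colspan = 1),
       list(text = "PTS", rowspan = 1, colspan = 1)),
  list(list(text = "T1", rowspan = 1, colspan = 1),
       list(text = "1", rowspan = 1, colspan = 1),
       list(text = "2", rowspan = 1, colspan = 1))
)
res <- expand_spans(rows)
print(res)

# Repetition leaves duplicated header names ("Stats", "Stats"); base R's
# make.unique() is one possible workaround, shown only for illustration
header <- make.unique(res[1, ])
print(header)
```

The duplicated `"Stats"` in the first row is exactly the header problem mentioned in the comment. `make.unique()` is swapped in here only as an example; it is not necessarily what #116 proposes.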
