Table parsing could be better #111

hadley · 2015-10-08T19:43:36Z

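library(rvest)

# Box score page whose stats tables are not parsed well
url <- "http://www.basketball-reference.com/boxscores/201506140GSW.html"
page <- read_html(url)

# Take the third node matching .stats_table and parse it,
# without treating the first row as a header
page %>%
  html_nodes(".stats_table") %>%
  .[[3]] %>%
  html_table(header = FALSE)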

johncollins · 2015-11-15T21:58:47Z

There seems to be some overlap between this issue (#111) and #116. One way to solve the colspan and rowspan problem is to copy the element: if cell[2, 3] has colspan=2, then cell[2, 4] <- cell[2, 3]. I just opened a PR (#119) that solves the more straightforward rowspan part in this manner. For colspan, the current behavior is to put the spanning cell's value in the first column it covers and fill the remaining spanned columns with NAs. The only problem with repeating the value instead would be in the header, where columns cannot share the same name. That could be treated as a special case, solved by #116 as proposed by @r2evans.
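To make the copying idea concrete, here is a minimal base-R sketch. It is not the code from PR #119 or rvest's implementation. The helpers `make_cell()` and `expand_spans()` and the sample data are hypothetical, and the table is assumed to be already reduced to a list of rows whose cells carry `text`, `colspan`, and `rowspan`.

```r
# Hypothetical constructor for one table cell (not part of rvest).
make_cell <- function(text, colspan = 1, rowspan = 1) {
  list(text = text, colspan = colspan, rowspan = rowspan)
}

# Hypothetical helper: expand rows of cells into a character matrix,
# repeating a spanning cell's text in every position it covers.
expand_spans <- function(rows) {
  n_row <- length(rows)
  # Assumption: the first row defines the table width via its colspans.
  n_col <- sum(vapply(rows[[1]], function(x) x$colspan, numeric(1)))
  out <- matrix(NA_character_, nrow = n_row, ncol = n_col)
  filled <- matrix(FALSE, nrow = n_row, ncol = n_col)

  for (i in seq_len(n_row)) {
    j <- 1
    for (x in rows[[i]]) {
      # Skip positions already claimed by a rowspan from an earlier row.
      while (j <= n_col && filled[i, j]) j <- j + 1
      if (j > n_col) break
      r_idx <- i:min(i + x$rowspan - 1, n_row)
      c_idx <- j:min(j + x$colspan - 1, n_col)
      out[r_idx, c_idx] <- x$text   # copy the value into every spanned slot
      filled[r_idx, c_idx] <- TRUE
      j <- max(c_idx) + 1
    }
  }
  out
}

# Made-up sample: a header cell spanning two columns and a body cell
# spanning two rows.
rows <- list(
  list(make_cell("Player"), make_cell("Shooting", colspan = 2)),
  list(make_cell("Curry", rowspan = 2), make_cell("FG"), make_cell("FGA")),
  list(make_cell("8"), make_cell("19"))
)
grid <- expand_spans(rows)
```

In this sketch the first row of `grid` contains "Shooting" twice, which is exactly the duplicate-header problem mentioned above, so the header row would still need separate handling.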

johncollins mentioned this issue Dec 2, 2015

Implementation of rowspan parsing plus tests #119

Merged

hadley closed this as completed in 0158a77 Dec 12, 2015
