I currently use the XML package only because of the XML::readHTMLTable function. The xml2 package does not have a function to read tables from HTML files, correct? I have already tried rvest::html_table, but readHTMLTable is roughly 10x faster and produces a cleaner data table.
In the example below, rvest::html_table creates an extra column, includes the table footer, and produces rows containing NA, whereas XML::readHTMLTable captures only the header and the body of the table.
However, the biggest "problem" for me is execution speed.
tab_example.html.gz
library(magrittr)
library(XML)
library(xml2)
library(rvest)

# HTML - Table example
html <- "tab_example.html"

# With XML: parse the file and read the second table in the document
get_tab_XML <- html %>%
  XML::htmlParse(encoding = "UTF-8") %>%
  XML::readHTMLTable(stringsAsFactors = FALSE, which = 2)

# With xml2 + rvest: parse the file, select the table by id, and convert it
get_tab_xml2 <- html %>%
  xml2::read_html() %>%
  rvest::html_node("#tabelaResultado") %>%
  rvest::html_table(fill = TRUE)

Unit: seconds
 expr       min        lq      mean    median        uq       max neval
  XML  3.816312  3.955153  4.173987  4.093994  4.352824  4.611654     3
 xml2 33.720705 34.495118 35.144829 35.269531 35.856891 36.444251     3
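
The timings above are shown without the call that produced them. The output format looks like that of the microbenchmark package, but that is an assumption on my part. A minimal sketch of how such a comparison could be run is below. Since the attachment is not reproduced here, it uses a small synthetic table as a stand-in, so its timings will not match the numbers above. The helper names read_with_XML and read_with_xml2 are hypothetical and exist only for this illustration.

```r
library(XML)
library(xml2)
library(rvest)
library(microbenchmark)

# Hypothetical stand-in for tab_example.html: a small table with a header,
# a body and a footer. The real file is much larger.
rows <- paste0(
  "<tr><td>", 1:200, "</td><td>", letters[(1:200 %% 26) + 1], "</td></tr>",
  collapse = ""
)
html_txt <- paste0(
  "<html><body><table id='tabelaResultado'>",
  "<thead><tr><th>id</th><th>label</th></tr></thead>",
  "<tbody>", rows, "</tbody>",
  "<tfoot><tr><td colspan='2'>total</td></tr></tfoot>",
  "</table></body></html>"
)

# XML approach: parse the string, then read the first table in the document.
read_with_XML <- function(txt) {
  doc <- XML::htmlParse(txt, asText = TRUE, encoding = "UTF-8")
  XML::readHTMLTable(doc, stringsAsFactors = FALSE, which = 1)
}

# xml2 + rvest approach: parse the string, select the table by id, convert it.
read_with_xml2 <- function(txt) {
  doc <- xml2::read_html(txt)
  node <- rvest::html_node(doc, "#tabelaResultado")
  rvest::html_table(node, fill = TRUE)
}

# Run each reader three times, mirroring neval = 3 in the output above.
timings <- microbenchmark::microbenchmark(
  XML  = read_with_XML(html_txt),
  xml2 = read_with_xml2(html_txt),
  times = 3
)
print(timings)
```

Wrapping each approach in a function keeps parsing and table extraction inside the timed expression, so both packages are measured end to end.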