New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace BH by tidylibs #3674
replace BH by tidylibs #3674
Conversation
I have mixed feelings about this and I might still prefer #3672, i.e. embark the files in dplyr directly so that we are in better control. |
How many files are there in tidylib? |
it does not have all the files needed by readr yet (romainfrancois/tidylibs#1), but at the moment that's a fraction of bh: library(fs)
library(dplyr, warn.conflicts = FALSE)
f <- function(path){
dir_ls(path, recursive = TRUE, type = "file") %>%
file_info() %>%
summarise(n=n(), size = sum(size))
}
f("/Users/romain/git/tidyverse/tidylibs/inst/include")
#> # A tibble: 1 x 2
#> n size
#> <int> <fs::bytes>
#> 1 1114 6.33M
f("/Users/romain/git/tidyverse/tidylibs/bh/inst/include")
#> # A tibble: 1 x 2
#> n size
#> <int> <fs::bytes>
#> 1 11007 116M Created on 2018-06-14 by the reprex package (v0.2.0). |
With dependencies needed by readr: library(fs)
library(dplyr, warn.conflicts = FALSE)
f <- function(path){
dir_ls(path, recursive = TRUE, type = "file") %>%
file_info() %>%
summarise(n=n(), size = sum(size))
}
f("/Users/romain/git/tidyverse/tidylibs/inst/include")
#> # A tibble: 1 x 2
#> n size
#> <int> <fs::bytes>
#> 1 2958 28.3M
f("/Users/romain/git/tidyverse/tidylibs/bh/inst/include")
#> # A tibble: 1 x 2
#> n size
#> <int> <fs::bytes>
#> 1 11007 116M |
Let’s tackle this systematically — can you review the boost needs of all tidyverse and r-lib packages (I don’t think that many actually use bh) and then see how big that is? Cc @jimhester |
Looks like there are only 4 packages using BH in r-lib, tidyverse https://github.com/search?q=filename%3ADESCRIPTION+BH+user%3Ar-lib+user%3Atidyverse. In terms of tidylib we need to have very clear providence in the package about how it is assembled and what parts of boost are included to avoid it becoming a maintenance nightmare down the line. |
There is currently a script in data-raw/ that essentially looks for boost/ in the .cpp and .h files of dplyr and tidyr. bcp does the rest based on the boost that is in BH. The script i guess coule be generalized to mechanically do it. Also perhaps the headers that we currently use are being used because BH ships them anyway. Maybe we would not use the same if there was some sort of measure for its cost... |
This is what I get with
It's much better than BH, but is there any chance we could get rid of the top contributors |
We definitely use spirit in readr (although we might be able to switch to something that gives us similarly performance). What are mpl and fusion used for? They don't ring a bell, so maybe they're pulled in by something else? |
bcp --boost=/usr/local/Cellar/boost/1.67.0_1/include --list boost/spirit/include/qi.hpp
Given the extremely large number of dependencies for this file I think we should probably look into an alternative for readr in the future and also drop it from tidylibs for the time being. |
But |
library(tidyverse)
bcp <- function(libs){
map_df(libs, ~{
cmd <- str_glue("bcp --boost=/usr/local/include --list {.x}")
tibble::tibble(module = .x, file = system(cmd, intern = TRUE))
})
}
deps <- bcp(c("boost/spirit/include/qi.hpp", "shared_ptr", "scoped_ptr", "weak_ptr", "unordered_map", "unordered_set"))
options(width = 120)
deps <- deps %>%
# only directories
mutate(file = fs::path_dir(file)) %>%
unique() %>%
mutate(dep = TRUE) %>%
spread(module, dep, fill = FALSE)
deps
#> # A tibble: 291 x 7
#> file `boost/spirit/include/qi.hpp` scoped_ptr shared_ptr unordered_map unordered_set weak_ptr
#> <fs::path> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 boost TRUE TRUE TRUE TRUE TRUE TRUE
#> 2 boost/bind TRUE FALSE FALSE FALSE FALSE FALSE
#> 3 boost/concept TRUE FALSE FALSE FALSE FALSE FALSE
#> 4 boost/concept/detail TRUE FALSE FALSE FALSE FALSE FALSE
#> 5 boost/config TRUE TRUE TRUE TRUE TRUE TRUE
#> 6 boost/config/abi TRUE TRUE TRUE TRUE TRUE TRUE
#> 7 boost/config/compiler TRUE TRUE TRUE TRUE TRUE TRUE
#> 8 boost/config/detail TRUE TRUE TRUE TRUE TRUE TRUE
#> 9 boost/config/no_tr1 TRUE TRUE TRUE TRUE TRUE TRUE
#> 10 boost/config/platform TRUE TRUE TRUE TRUE TRUE TRUE
#> # ... with 281 more rows
deps %>%
filter(file %in% c("boost/mpl", "boost/fusion"))
#> # A tibble: 2 x 7
#> file `boost/spirit/include/qi.hpp` scoped_ptr shared_ptr unordered_map unordered_set weak_ptr
#> <fs::path> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 boost/fusion TRUE FALSE FALSE FALSE FALSE FALSE
#> 2 boost/mpl TRUE FALSE FALSE FALSE FALSE FALSE Created on 2018-06-18 by the reprex package (v0.2.0). |
Let's drop spirit from tidylibs (which I'd suggest renaming to tinybh), and when we update readr next we'll switch from BH to tinybh, and figure out how to replace |
The rationale of the naming was to allow non boost libraries too, in case we decided to switch to some other impl of hash maps or whatever. |
Hmmmm, I think if we use other libraries, we'd be picking them because they're small and embeddable, and so we'd embed in individual packages. |
I see: $ bcp --boost=/home/kirill/R/x86_64-pc-linux-gnu-library/3.5/BH/include/ shared_ptr scoped_ptr weak_ptr functional/hash unordered_map unordered_set .
$ du --inodes boost | sort -nr | head
1222 boost
595 boost/mpl
577 boost/mpl/aux_
529 boost/mpl/aux_/preprocessed
151 boost/predef
149 boost/preprocessor
89 boost/config
66 boost/type_traits
49 boost/smart_ptr
48 boost/mpl/aux_/preprocessed/plain Maybe the provenance detection is imperfect because |
This must be different versions of boost? What version are you using (I am using 1.67.0) and there is no mpl used for those modules. > bcp --boost=/usr/local/Cellar/boost/1.67.0_1/include shared_ptr scoped_ptr weak_ptr functional/hash unordered_map unordered_set .
> du --inodes boost | sort -nr | head | pbcopy
559 boost
151 boost/predef
87 boost/config
83 boost/preprocessor
69 boost/type_traits
49 boost/smart_ptr
44 boost/smart_ptr/detail
30 boost/predef/compiler
27 boost/config/compiler
25 boost/predef/os |
boost 1.63. Interesting. I remember seeing the same with BH -- maybe my |
Closing that for now, because I'm relatively confident that we'll use hashing from |
What about all the smart pointer stuff? |
I'd say that uses much less files, and potentially we don't need boost:
And I'm not so sure why |
I still think an approach like tidyboost might be useful in the short-term, even if we end up eliminating it in the long-term. |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
Alternative to #3672 where the files we need from boost are in the
tidylibs
📦