one plot/panel to show seconds vs N #52

Closed
DorisAmoakohene opened this issue Jun 12, 2024 · 14 comments

@DorisAmoakohene
Collaborator

DorisAmoakohene commented Jun 12, 2024

@tdhock, @Anirban166 and I have been trying to figure out how to create a plot for the GitHub Action that shows only seconds vs N. He ran into a series of errors that he has been unable to resolve so far.
@Anirban166, kindly share some of the error messages so @tdhock can assist.

I will also share what I did.

@Anirban166
Collaborator

I'm not aware of what you tried. Please share what you did, because it's not "we" for what I tried.

I was simply trying to hack the plot method to filter out the time data (for which I sent you a snippet on Monday after our meeting, which you can work on?), and I think modifying this is the easiest way to go about it:

atime/R/atime.R

Line 201 in 27bc253

plot.atime <- function(x, ...){

I can continue debugging those errors now (will post in another issue), or Toby can ship something out faster (which is why I have been telling you since the start of this week to contact him), but I would suggest you try something yourself in the meantime instead of just waiting.

@tdhock
Owner

tdhock commented Jun 13, 2024

just use atime.result$measurements as the data in a ggplot

@DorisAmoakohene
Collaborator Author

Yes, I did that in the code below, but my problem is how to get base, CRAN and HEAD on the graph as well.
Can those also be found in atime.result$measurements?

library(atime)
library(ggplot2)
library(data.table)

tdir <- tempfile()
dir.create(tdir)
git2r::clone("https://github.com/Rdatatable/data.table", tdir)

atime.result <- atime::atime_versions(
  pkg.path=tdir,
  pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
    pkg_find_replace <- function(glob, FIND, REPLACE){
      atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
    }
    Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
    Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
    new.Package_ <- paste0(Package_, "_", sha)
    pkg_find_replace(
      "DESCRIPTION", 
      paste0("Package:\\s+", old.Package),
      paste("Package:", new.Package))
    pkg_find_replace(
      file.path("src","Makevars.*in"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      sprintf('packageVersion\\("%s"\\)', old.Package),
      sprintf('packageVersion\\("%s"\\)', new.Package))
    pkg_find_replace(
      file.path("src", "init.c"),
      paste0("R_init_", Package_regex),
      paste0("R_init_", gsub("[.]", "_", new.Package_)))
    pkg_find_replace(
      "NAMESPACE",
      sprintf('useDynLib\\("?%s"?', Package_regex),
      paste0('useDynLib(', new.Package_))
  },
  N=10^seq(1,7),
  setup={ 
    DT = replicate(N, 1, simplify = FALSE)
  },
  expr=data.table:::setDT(DT),
  "slow"="c4a2085e35689a108d67dacb2f8261e4964d7e12", # Parent of the first commit in the PR that fixes the issue (https://github.com/Rdatatable/data.table/commit/7cc4da4c1c8e568f655ab5167922dcdb75953801)
  "Fast"="1872f473b20fdcddc5c1b35d79fe9229cd9a1d15") # Last commit in the PR that fixes the issue (https://github.com/Rdatatable/data.table/pull/5427/commits)

png("new.gg.png", res = 200, width = 5, height = 3, units = "in")
new.gg = ggplot()+
  geom_line(aes(x = N, y = median, group = expr.name, colour = expr.name), data =  atime.result$measurements)+
  geom_ribbon(aes(x = N, ymin = min, ymax = max, fill = expr.name), data = atime.result$measurements, alpha = 0.5 )+
  scale_x_log10("N = data size",limits = c(NA, 1e5))+
  scale_y_log10("Computational Time (Seconds)")
directlabels::direct.label(new.gg, list(cex = 0.8, "right.polygons"))
dev.off()

[Image: new gg]

@tdhock
Owner

tdhock commented Jun 13, 2024

you can add HEAD, etc. as other versions, i.e. as additional arguments to atime_versions (along with Fast and slow)
for the CRAN version, use CRAN=""; the empty string means use the version from CRAN (not GitHub)

@DorisAmoakohene
Collaborator Author

yes, sure thanks, got it

@tdhock
Owner

tdhock commented Jun 13, 2024

make sure to do install.packages("data.table") to get most recent CRAN version, here are the docs from ?atime_versions

    For convenience, versions can be specified either as code (...),
     data (‘sha.vec’), or both.  Each version should be either ‘""’ (to
     use currently installed version of package, or if missing, install
     most recent version from CRAN) or a SHA1 hash, which is passed as
     branch arg to ‘git2r::checkout’; version names used to
     identify/interpret the output/plots.
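
As a rough illustration of the "data (sha.vec)" form mentioned in those docs, the versions could be collected in a named character vector. This is only a sketch: `version.vec` is a hypothetical name, the SHAs are the slow/Fast ones posted earlier in this thread, and the commented-out call assumes `tdir` holds a data.table clone and that `pkg.edit.fun` is supplied as in the code above.

```r
# Names become the labels in the output/plots; values are SHA1 hashes,
# or "" for the installed (or most recent CRAN) version.
version.vec <- c(
  Slow = "c4a2085e35689a108d67dacb2f8261e4964d7e12",
  Fast = "1872f473b20fdcddc5c1b35d79fe9229cd9a1d15",
  CRAN = "")

# Which entries ask for the installed/CRAN version rather than a git commit?
is.cran <- version.vec == ""
print(names(version.vec)[is.cran])

# Hypothetical usage, not run here (needs the clone in tdir):
# atime.result <- atime::atime_versions(
#   pkg.path = tdir,
#   N = 10^seq(1, 7),
#   setup = { DT <- replicate(N, 1, simplify = FALSE) },
#   expr = data.table:::setDT(DT),
#   sha.vec = version.vec)
```

Keeping the labels in one vector may also make it easier to keep their capitalization consistent.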

@DorisAmoakohene
Collaborator Author

[Image: new gg1]
This is the new plot:

library(atime)
library(ggplot2)
library(data.table)

tdir <- tempfile()
dir.create(tdir)
git2r::clone("https://github.com/Rdatatable/data.table", tdir)

atime.result <- atime::atime_versions(
  pkg.path=tdir,
  pkg.edit.fun=function(old.Package, new.Package, sha, new.pkg.path){
    pkg_find_replace <- function(glob, FIND, REPLACE){
      atime::glob_find_replace(file.path(new.pkg.path, glob), FIND, REPLACE)
    }
    Package_regex <- gsub(".", "_?", old.Package, fixed=TRUE)
    Package_ <- gsub(".", "_", old.Package, fixed=TRUE)
    new.Package_ <- paste0(Package_, "_", sha)
    pkg_find_replace(
      "DESCRIPTION", 
      paste0("Package:\\s+", old.Package),
      paste("Package:", new.Package))
    pkg_find_replace(
      file.path("src","Makevars.*in"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      Package_regex,
      new.Package_)
    pkg_find_replace(
      file.path("R", "onLoad.R"),
      sprintf('packageVersion\\("%s"\\)', old.Package),
      sprintf('packageVersion\\("%s"\\)', new.Package))
    pkg_find_replace(
      file.path("src", "init.c"),
      paste0("R_init_", Package_regex),
      paste0("R_init_", gsub("[.]", "_", new.Package_)))
    pkg_find_replace(
      "NAMESPACE",
      sprintf('useDynLib\\("?%s"?', Package_regex),
      paste0('useDynLib(', new.Package_))
  },
  N=10^seq(1,7),
  setup={ 
    DT = replicate(N, 1, simplify = FALSE)
  },
  expr=data.table:::setDT(DT),
  # Parent of the first commit in the PR that fixes the issue (https://github.com/Rdatatable/data.table/commit/7cc4da4c1c8e568f655ab5167922dcdb75953801)
  "slow"="c4a2085e35689a108d67dacb2f8261e4964d7e12",
  # Last commit in the PR that fixes the issue (https://github.com/Rdatatable/data.table/pull/5427/commits)
  "Fast"="1872f473b20fdcddc5c1b35d79fe9229cd9a1d15", # tag 1.15.4, https://github.com/Rdatatable/data.table/commit/aa75d79376478b3e8f80fd6f31dcf53be8bf3404
  "Head"="ff900d1e6a8bcfaa0385bd1304af7b90657d3c4d", # Latest commit on the data.table page, https://github.com/Rdatatable/data.table/commit/ff900d1e6a8bcfaa0385bd1304af7b90657d3c4d
  "CRAN"="",
  "BASE"="3e5d038ee4a800104b665ac39392ceed46b1189d" # 1.14.6, https://github.com/Rdatatable/data.table/commit/3e5d038ee4a800104b665ac39392ceed46b1189d
)
  
png("new.gg1.png", res = 600, width = 15, height = 10, units = "in")
new.gg1 = ggplot() +
  geom_line(aes(x = N, y = median, group = expr.name, colour = expr.name), data = atime.result$measurements) +
  geom_ribbon(aes(x = N, ymin = min, ymax = max, fill = expr.name), data = atime.result$measurements, alpha = 0.7) +
  labs(title = "setDT extremely slow for very wide input #5426") +
  theme(
    plot.title = element_text(size = 30),
    text = element_text(size = 30)
  ) +
  scale_x_log10("N = data size", limits = c(NA, 1e8)) +
  scale_y_log10("Computational Time (Seconds)")
directlabels::direct.label(new.gg1, list(cex = 3, "right.polygons")) 
dev.off()

@Anirban166
Collaborator

> [Image: new gg1] This is the new plot

Looks good to me 👍🏻

@Anirban166
Collaborator

For that test case (or this one in your repository), you might want to update it to the code being used in data.table, as currently only the CRAN version is aligned with the Slow label (PR #5427 was merged to fix this, so BASE at present should be aligned with the Fast label).

Seems like you're running this with an older version of data.table in your system.

@tdhock
Owner

tdhock commented Jun 13, 2024

looks good overall
I would change "BASE" to "base=master"
good to keep CRAN all caps though
Please keep slow/fast consistently capitalized: either Slow/Fast or slow/fast, but not slow/Fast

@DorisAmoakohene
Collaborator Author

[Image: new gg1]

@tdhock
Owner

tdhock commented Jun 14, 2024

HEAD should be all caps https://stackoverflow.com/questions/2304087/what-is-head-in-git
why can't we see any geom_line for median?

@DorisAmoakohene
Collaborator Author

DorisAmoakohene commented Jun 14, 2024

I used alpha = 0.7, for geom_line and geom_ribbon

png("new.gg1.png", res = 600, width = 15, height = 10, units = "in")
new.gg1 = ggplot() +
  geom_line(aes(x = N, y = median, group = expr.name, colour = expr.name), data = atime.result$measurements, alpha = 0.7) +
  geom_ribbon(aes(x = N, ymin = min, ymax = max, fill = expr.name), data = atime.result$measurements, alpha = 0.7) +
  labs(title = "setDT extremely slow for very wide input #5426") +
  theme(
    plot.title = element_text(size = 30),
    text = element_text(size = 30)
  ) +
  scale_x_log10("N = data size", limits = c(NA, 1e8)) +
  scale_y_log10("Computational Time (Seconds)")
directlabels::direct.label(new.gg1, list(cex = 3, "right.polygons")) 
dev.off()

[Image: new gg1]

@tdhock
Owner

tdhock commented Jun 14, 2024

i usually use alpha=0.5 for ribbon and default alpha=1 for line
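
A minimal sketch of that suggestion, assuming ggplot2 is installed. The `measurements` data frame here is a made-up stand-in for atime.result$measurements (same column names as used in the plots above, invented values), so the example can run without atime. The ribbon is added first so that the opaque median line is drawn on top of it.

```r
library(ggplot2)

# Hypothetical stand-in for atime.result$measurements (values are invented).
measurements <- data.frame(
  expr.name = rep(c("Slow", "Fast"), each = 4),
  N = rep(10^(1:4), times = 2),
  median = c(1e-4, 1e-3, 1e-2, 1e-1, 1e-5, 2e-5, 4e-5, 8e-5))
measurements$min <- measurements$median * 0.8
measurements$max <- measurements$median * 1.25

gg <- ggplot(measurements, aes(x = N, group = expr.name)) +
  # Semi-transparent band from min to max, drawn first (underneath).
  geom_ribbon(aes(ymin = min, ymax = max, fill = expr.name), alpha = 0.5) +
  # Median line with the default alpha (fully opaque), drawn on top.
  geom_line(aes(y = median, colour = expr.name)) +
  scale_x_log10("N = data size") +
  scale_y_log10("Computational Time (Seconds)")
```

Printing `gg`, or passing it to directlabels::direct.label as in the code above, should then show the median lines over the ribbons.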

tdhock closed this as completed Sep 20, 2024