Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irt_gen function #40

Closed
Sinan-Yavuz opened this issue Feb 4, 2021 · 11 comments
Closed

irt_gen function #40

Sinan-Yavuz opened this issue Feb 4, 2021 · 11 comments
Assignees
Labels

Comments

@Sinan-Yavuz
Copy link
Contributor

Sinan-Yavuz commented Feb 4, 2021

Hello,

Thank you very much for creating "lsasim".

I have found a bug and want to share it with you.

irt_gen function doesn't produce a 0-1 matrix based on the parameters. I have changed the following to produce a proper dataset.

irt_gen2 <- function (theta, a_par = 1, b_par, c_par = 0, D = 1)
{
  response_pr <- c_par + (1 - c_par) * (exp(a_par*(theta - b_par)) / (1 + exp(a_par*(theta - b_par))))
  y <- rbinom(1, size = 1, prob = response_pr)
  return(y)
}

Let me know if you need me to provide you some examples, why irt_gen doesn't work.

@wleoncio wleoncio added the bug label Feb 5, 2021
@wleoncio
Copy link
Collaborator

wleoncio commented Feb 5, 2021

Hi @Sinan-Yavuz, thank you for your contribution! We're gonna take a look at the issue and implement the necessary changes ASAP.

@wleoncio
Copy link
Collaborator

wleoncio commented Feb 5, 2021

In the meantime, it'd be great if you could provide us with some examples why the current implementation of the function doesn't work, so we can add them to our unit test battery.

@Sinan-Yavuz
Copy link
Contributor Author

Sinan-Yavuz commented Feb 5, 2021

Hi @wleoncio, thank you very much for your quick reply.

Here is the example;


response_gen2 <- function (subject, item, theta, a_par = NULL, b_par, c_par = NULL, 
                           d_par = NULL, item_no = NULL, ogive = "Logistic") 
{
  if (length(subject) != length(item)) 
    stop("subject and item vectors must be equal length.", 
         call. = FALSE)
  if (is.null(a_par)) 
    a_par <- rep(1, length(unique(item)))
  if (is.null(c_par)) 
    c_par <- rep(0, length(unique(item)))
  if (ogive == "Logistic") 
    DD <- 1
  if (ogive == "Normal") 
    DD <- 1.7
  if (is.null(item_no)) 
    item_no <- seq(length(unique(item)))
  if (is.null(d_par)) 
    b_pars <- split(b_par, seq(length(b_par)))
  if (!is.null(d_par)) {
    d_mat <- do.call("cbind", d_par)
    b_pars <- list()
    for (i in 1:length(b_par)) {
      if (sum(abs(d_mat[i, ])) != 0) {
        b_list <- list()
        for (j in 1:length(d_mat[i, ])) b_list[[j]] <- b_par[i] + 
            d_mat[i, j]
        b_pars[[i]] <- unlist(b_list)
      }
      if (sum(abs(d_mat[i, ])) == 0) 
        b_pars[[i]] <- b_par[i]
    }
  }
  names(b_pars) <- item_no
  
  y <- numeric(length(subject))
  for (n in 1:length(subject)) {
    y[n] <- irt_gen2(theta = theta[subject[n]], a_par = a_par[which(item_no == 
                                                                      item[n])], b_par = b_pars[[which(item_no == item[n])]], 
                     c_par = c_par[which(item_no == item[n])], D = DD)
  }
  df_l <- data.frame(item = item, subject = subject, response = y)
  df_w <- reshape(df_l, timevar = "item", idvar = "subject", 
                  direction = "wide")
  df_item_old <- colnames(df_w)[2:length(df_w)]
  df_item_num <- gsub("[^[:digit:]]", "", df_item_old)
  df_item_new <- ifelse(nchar(df_item_num) == 1, paste0("i00", 
                                                        df_item_num), ifelse(nchar(df_item_num) == 2, paste0("i0", 
                                                                                                             df_item_num), paste0("i", df_item_num)))
  colnames(df_w)[2:length(df_w)] <- df_item_new
  df_w <- df_w[, order(names(df_w))]
  rownames(df_w) <- NULL
  return(y = df_w)
}



irt_gen2 <- function (theta, a_par = 1, b_par, c_par = 0, D = 1)
{
  response_pr <- c_par + (1 - c_par) * (exp(a_par*(theta - b_par)) / (1 + exp(a_par*(theta - b_par))))
  y <- rbinom(1, size = 1, prob = response_pr)
  return(y)
}


set.seed(345)

#this is a normal example
subject <- 1:100
theta <- rnorm(100,0,1)
student_dt <- data.frame(subject, theta)


item <- as.integer(c(1:n_items))
a <- runif(n_items, 0.5, 1.5)
b <- runif(n_items, -3, 3)
c <- runif(n_items, 0, .15)
item_par <- data.frame(item, a, b, c)

resp_matrix <- response_gen(subject = sort(rep(subject, 30)), item = rep(item,100), theta = theta, a_par = a, b_par = b, c_par = c)

#item means make sense and everything looks fine
colMeans(resp_matrix[item])

lm(colMeans(resp_matrix[item]) ~ b)


#Let's make it very extreme, student mean theta is -20, I don't expect them to solve anything. But guessing parameters are super high. So, regardles if their theta 80%-90% of them should solve these items.
subject <- 1:100
theta <- rnorm(100, -20 ,1)
student_dt <- data.frame(subject, theta)


item <- as.integer(c(1:n_items))
a <- runif(n_items, 0.5, 1.5)
b <- runif(n_items, -3, 3)

#chance is super high
c <- runif(n_items, 0.8, .9)
item_par <- data.frame(item, a, b, c)

resp_matrix <- response_gen(subject = sort(rep(subject, 30)), item = rep(item,100), theta = theta, a_par = a, b_par = b, c_par = c)

colMeans(resp_matrix[item]) #look at the results, they are around 50%, what happened to .8-.9 guessing parameter
lm(colMeans(resp_matrix[item]) ~ b)


#let's check after the fix - but it only works for dichotomous variables
resp_matrix2 <- response_gen2(subject = sort(rep(subject, 30)), item = rep(item,100), theta = theta, a_par = a, b_par = b, c_par = c)
colMeans(resp_matrix2[item]) #look at the results, they are around 50%, what happened to .8-.9 guessing parameter
lm(colMeans(resp_matrix2[item]) ~ b)

@pdbailey0
Copy link
Contributor

Hi, I worked to identify this bug with @Sinan-Yavuz and the way you can tell this is a bug is that the intercept of the lm should move up as the guessing parameter increases.

Your formula works so long as the c parameter is 0. When it is not zero you make the wrong probability of a zero

The problem is that the binary is being treated like the polytomous case. In the binary case there is a guessing parameter and you do not want to divide by the denominator, the second element of the numerator is already the probability of a 1. c_par decreases, not increases the probability of a zero, but your formula increases it.

@mollyolaf
Copy link
Contributor

Thanks, Paul and Sinan - that's super helpful. We'll sort this out. Glad you are using lsasim!

@wleoncio wleoncio self-assigned this Feb 8, 2021
@wleoncio
Copy link
Collaborator

wleoncio commented Feb 8, 2021

Checking the MRE

OK, so I've created a markdown document based on the code above. Please check it here and let me know if the output conforms to your expectations.

Fixing the bug

As for implementing the changes, I am not sure how to proceed. The irt_gen2() function proposed in the OP doesn't break any of the current unit tests, but I understand it only works for dichotomous variables, so I am not sure if it can simply replace irt_gen() and call it a day. I'd also like to understand a bit better the differences between lsasim::response_gen() and the response_gen2() function from your MRE.

@pdbailey0
Copy link
Contributor

I think you may want to break up dichotomous from polytomous cases. I've never seen a polytomous case with a c parameter.

I think you could add this line right before you sample

  response_pr[1] <- ifelse(c_par > 0, 1 - response_pr[2], response_pr[1])

so irt_get would be:

irt_gen <- function(theta, a_par = 1, b_par, c_par = 0, D = 1) {
  unsummed <- c(0, D * a_par * (theta - b_par))
  num <- exp(cumsum(unsummed))
  den <- sum(num)
  response_pr <- c_par + (1 - c_par) * (num / den)
  response_pr[1] <- ifelse(c_par > 0, 1 - response_pr[2], response_pr[1])
  y <- sample(1:length(response_pr) - 1, size = 1, prob = response_pr)
  return(y)
}

but I haven't tested this and I'm still not entirely sure what is vectorized here.

Another workaround would be to use Sinan's code when c_par > 0 and your existing code otherwise.

@wleoncio
Copy link
Collaborator

wleoncio commented Feb 9, 2021

Good ideas, @pdbailey0, we'll work on that ASAP. 👍🏽

@wleoncio
Copy link
Collaborator

Changes implemented and will be included on the next CRAN release of the package. @Sinan-Yavuz and @pdbailey0, thank you once again for your contribution; we have added you as package contributors, please let us know if the information below needs any modification (e.g. typo or addition of a field such as e-mail):

lsasim/DESCRIPTION

Lines 17 to 18 in 97934b4

person("Sinan", "Yavuz", role = "ctb"),
person("Paul", "Bailey", role = "ctb")

@Sinan-Yavuz
Copy link
Contributor Author

@wleoncio, I am honored to be mentioned as a contributor though it is not necessary. I will make a modification to my name and email. At some point, if you change your mind and think my contribution is no longer valid or important, feel free to remove me.

@wleoncio
Copy link
Collaborator

Looking forward to the Pull Request containing the modifications, @Sinan-Yavuz!

Your and Paul's mentions were based on the documentation for the person() function (see extract below) and your authorship of the code on commits 5f3396f and dfb1450.

‘"ctb"’ (Contributor) Use for authors who have made smaller
contributions (such as code patches etc.) but should not show
up in the package citation.

By "package citation", they mean the contents of citation("lsasim").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants