iematch - issue with long numeric strings #252

Closed
kbjarkefur opened this issue Mar 18, 2021 · 1 comment
Assignees: kbjarkefur
Labels: minor bug (Bug unlikely to lead to incorrect analysis), resolved but not yet published (Issue is fixed, but not yet published on SSC)

Comments

@kbjarkefur
Contributor

Hi DIME Analytics,

I’ve tried using iematch, but when I run the command it continuously runs and never completes. I’ve let it run for several minutes. I don’t have an error code to provide because the command never completes or breaks. I’m testing it on a subset of data for which I have 40 observations, 10 in the base group and 30 in the target group. I’m trying to execute a 1-1 match. I would not think the command would take several minutes to run on 40 observations.

set seed 1956
iematch if pair==1 & grade==2, grpdummy(srm_treatment) matchvar(orf) idvar(student_id) seedok replace

I assume this is a user-issue, but I wanted to verify that there was not some other issue with the command.

Sean


Hi Sean,

Thanks for letting us know. When I developed this I had to account for many infinite loop issues, but since the release I have not had anyone report another case. Are you able to share a deidentified version of the data that I can test myself on? If there is an error I’d like to fix it as others might have had the same issue without reporting it.

I do not see any error in the information you have provided so far.

Best,
Kristoffer


Hi Kristoffer,

Thanks for the reply. Attached is a de-identified dataset using a subset of the data. I’ve included the first two pairs of matched schools. The full dataset has 25 matched school pairs. For treatment schools, we assessed 10 students, but for control schools we assessed 30 students. Students from Grade 2 and Grade 4 were assessed. The goal is to match at the student level within the matched schools, finding a matched control student for each treatment student. This has to be done for students in Grade 2 and Grade 4.

A few notes on the dataset that I’ve attached. The student ID variables are randomly generated by our data collection app, so the numbers have no meaning outside of the dataset. I’ve recoded the school codes with numbers from a random number generator. The original school codes are tied to EMIS codes in the country where the study is happening. The variable for the match is orf, which is oral reading fluency. Let me know if you have any questions about the attached dataset.

Sean

@kbjarkefur kbjarkefur added the minor bug Bug unlikely to lead to incorrect analysis label Mar 18, 2021
@kbjarkefur kbjarkefur self-assigned this Mar 18, 2021
@kbjarkefur
Contributor Author

The issue is that the data collection app created IDs that were so long that they were stored as doubles. The tempvars that iematch creates to keep track of the matching were created without specifying a storage type, for example like this: gen `prefID' = . When no type is specified, the default type is float.

When the command copied the ID var to these tempvars it lost precision. I thought that Stata would have converted the tempvar to a double to not lose data, but it didn't. When the command then checked the tempvars against the original ID variable, it found no matches, as the values were no longer identical. This is why the command was stuck in an infinite loop.
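
A minimal sketch of the precision loss, for anyone who wants to reproduce it. The ID values and the variable name id_copy are made up for illustration (they are not from the attached dataset or from the iematch source), and the sketch assumes the default storage type has not been changed with set type:

```stata
* Hypothetical example: IDs too long to be stored exactly in a float
clear
set obs 3
gen double student_id = 1234567890120 + _n

* No type specified, so id_copy gets the default storage type (float).
* id_copy stands in for the tempvars that iematch creates.
gen id_copy = student_id

* Count how many copies still equal the original ID.
* With these values none do, because a float cannot hold them exactly.
count if id_copy == student_id
```

Any lookup that expects to find each copied ID in the original ID variable will therefore never succeed.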

The solution was to create all ID vars with the same data type as the original ID var, like this: gen `:type `idvar'' `prefID' = . This is implemented in 94db45f.
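
A rough sketch of the same idea outside the command, using made-up ID values and a hypothetical variable name id_copy in place of the tempvars (this is an illustration, not the code in the commit):

```stata
* Hypothetical example: copy a long numeric ID without losing precision
clear
set obs 3
gen double student_id = 1234567890120 + _n

* `:type student_id' expands to the storage type of student_id
* (double here), so the copy is created with that same type
gen `:type student_id' id_copy = student_id

* Every copy now equals the original ID
count if id_copy == student_id
```

Taking the type from the ID variable, rather than hard-coding double, means the copy always uses whatever numeric storage type the user's ID variable has.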

@kbjarkefur kbjarkefur added the resolved but not yet published Issue is fixed, but not yet published on SSC label Mar 18, 2021