iematch - issue with long numeric strings #252

kbjarkefur · 2021-03-18T19:21:44Z

Hi DIME Analytics,

I’ve tried using iematch, but when I run the command it continuously runs and never completes. I’ve let it run for several minutes. I don’t have an error code to provide because the command never completes or breaks. I’m testing it on a subset of data for which I have 40 observations, 10 in the base group and 30 in the target group. I’m trying to execute a 1-1 match. I would not think the command would take several minutes to run on 40 observations.

set seed 1956
iematch if pair==1 & grade==2, grpdummy(srm_treatment) matchvar(orf) idvar(student_id) seedok replace

I assume this is a user-issue, but I wanted to verify that there was not some other issue with the command.

Sean

Hi Sean,

Thanks for letting us know. When I developed this I had to account for many infinite loop issues, but since the release I have not had anyone report another case. Are you able to share a deidentified version of the data that I can test on myself? If there is an error I’d like to fix it, as others might have had the same issue without reporting it.

I do not see any error in the information you have provided so far.

Best,
Kristoffer

Hi Kristoffer,

Thanks for the reply. Attached is a de-identified dataset using a subset of the data. I’ve included the first two pairs of matched schools. The full dataset has 25 matched school pairs. For treatment schools we assessed 10 students, but for control schools we assessed 30 students. Students from Grade 2 and Grade 4 were assessed. The goal is to match at the student level within the matched schools, finding a matched control student for each treatment student. This has to be done for students in Grade 2 and Grade 4.

A few notes on the dataset that I’ve attached. The student ID variables are randomly generated by our data collection app, so the numbers have no meaning outside of the dataset. I’ve recoded the school codes with numbers from a random number generator. The original school codes are tied to EMIS codes in the country where the study is happening. The variable for the match is orf, which is oral reading fluency. Let me know if you have any questions about the attached dataset.

Sean

kbjarkefur · 2021-03-18T19:36:53Z

The issue is that the data collection app created IDs that were so long that they were stored as doubles. The tempvars iematch creates for tracking were created without specifying a type, for example like this: gen `prefID' = . When no type is specified, Stata uses the default type, which is float.

When the command copied the ID var to these tempvars, it lost precision. I thought that Stata would have converted the tempvar to a double to avoid losing data, but it didn't. When the command then checked against the original ID variable, it found no matches because the values were no longer identical. This is why the command was stuck in an infinite loop.
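The precision loss described above can be reproduced outside of iematch. The following is a minimal sketch, not the command's actual code: the dataset and the variable name id_copy are made up for illustration, and it assumes the default storage type has not been changed with set type.

```stata
* Hypothetical toy data: two 10-digit IDs stored as double, as in the issue
clear
input double student_id
1234567891
1234567892
end

* No type specified, so the new variable gets the default type (float)
gen id_copy = .

* replace does not promote a float to a double, so the IDs are rounded
replace id_copy = student_id

* Show the values side by side without scientific notation
format student_id id_copy %12.0f
list student_id id_copy

* No copied ID equals its original any longer
count if id_copy == student_id
```

A float carries only about seven significant digits, so any comparison between the copy and the original ID fails for IDs of this length.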

The solution was to create all ID vars with the same data type as the original ID var, like this: gen `:type `idvar'' `prefID' = . This is implemented in 94db45f
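As a rough illustration of that approach, the sketch below copies a double ID into a tempvar created with the ID variable's own storage type. The toy data and the local macro idtype are assumptions for this example; only the `:type' lookup and the tempvar name prefID come from the comment above.

```stata
* Hypothetical toy data: two 10-digit IDs stored as double
clear
input double student_id
1234567891
1234567892
end

* Temporary variable, similar in spirit to the one used inside the command
tempvar prefID

* Look up the storage type of the ID variable and reuse it for the tempvar
local idtype : type student_id
gen `idtype' `prefID' = .
replace `prefID' = student_id

* The copy now matches the original exactly, so this assert passes
assert `prefID' == student_id
```

Because the tempvar inherits the type of the ID variable, the same line works whether the IDs are stored as long, float, or double.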

kbjarkefur added the "minor bug" label (Bug unlikely to lead to incorrect analysis) Mar 18, 2021

kbjarkefur self-assigned this Mar 18, 2021

kbjarkefur added a commit that referenced this issue Mar 18, 2021

[iematch] - set temp ID var to same type as orig ID var #252

94db45f

kbjarkefur added the "resolved but not yet published" label (Issue is fixed, but not yet published on SSC) Mar 18, 2021

kbjarkefur mentioned this issue Mar 18, 2021

[iematch] - issues with long IDs #253

Merged

kbjarkefur closed this as completed Jan 19, 2023
