new command - ieddtable - neat tables for diff-in-diff regressions #135

Closed
kbjarkefur opened this issue Apr 26, 2018 · 18 comments
Labels: new command; resolved but not yet published (Issue is fixed, but not yet published on SSC)

Comments

@kbjarkefur
Contributor

kbjarkefur commented Apr 26, 2018

This new command was suggested by Esteban J. Quiñones (@estebanjq).

There are many commands (estout, outreg, etc.) that can create neat tables from regressions. This command will target the diff-in-diff (or double difference) regression model only and create tables tailored for exactly this model specification.

We want the table to be in the format shown in the mock-up below:
[Image: mock-up of the proposed table]

The command is expected to be specified in the following format:

ieddtable varlist, dummies(D T DT)

Here varlist is a list of outcome variables, D is the treatment dummy, T is the time dummy, and DT is the interaction of the two. The command will test that the regression is valid, in the sense that there are at least some observations in each group, that D * T actually equals DT, and so forth.
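The validity checks described above could look roughly like the following. This is a minimal Python sketch of the logic only, not the Stata implementation; the function name check_dd_dummies and the (D, T, DT) tuple input format are hypothetical.

```python
from itertools import product


def check_dd_dummies(rows):
    """Validate (D, T, DT) dummies for a diff-in-diff regression.

    rows: iterable of (D, T, DT) tuples, one per observation
    (hypothetical input format). Returns a list of error messages;
    an empty list means all checks passed.
    """
    errors = []
    # One counter per (D, T) group: (0,0), (0,1), (1,0), (1,1)
    cell_counts = {cell: 0 for cell in product((0, 1), repeat=2)}
    for i, (d, t, dt) in enumerate(rows):
        if d not in (0, 1) or t not in (0, 1) or dt not in (0, 1):
            errors.append(f"row {i}: dummies must be 0 or 1")
            continue
        # The interaction must equal the product of the two dummies
        if dt != d * t:
            errors.append(f"row {i}: DT ({dt}) != D * T ({d * t})")
        cell_counts[(d, t)] += 1
    # Every one of the four groups needs at least one observation
    for (d, t), n in cell_counts.items():
        if n == 0:
            errors.append(f"no observations with D={d}, T={t}")
    return errors


ok = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]
print(check_dd_dummies(ok))   # []
bad = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
print(check_dd_dummies(bad))  # wrong DT in row 2, and no D=0, T=1 group
```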

Only the last column, showing the double difference, will be taken from the diff-in-diff regression. The other four means will be calculated separately as regular means. The reason is that we want to allow the user to include fixed effects, control variables, etc., and when those are used, the betas on the intercept and the D, T and D*T dummies can no longer be used by themselves to calculate the means of the four groups. We do not want the means in the four leftmost columns of the mock-up to be affected by fixed effects and control variables, as that can create odd values such as negative harvest. When control variables or fixed effects are used, a note at the bottom of the table will explain why the double difference in the fifth column cannot be calculated from the means in the first four.
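The reasoning above follows from standard diff-in-diff algebra (textbook notation, not taken from the ieddtable code). Without controls the model has four coefficients for four groups, so it is saturated and the fitted coefficients reproduce the raw group means, where \bar{Y}_{dt} is the mean for D = d and T = t:

```latex
% Diff-in-diff regression without controls
Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 T_t + \beta_3 (D_i \times T_t) + \varepsilon_{it}

% Saturated model: coefficients map exactly onto the four group means
\bar{Y}_{00} = \hat\beta_0, \quad
\bar{Y}_{10} = \hat\beta_0 + \hat\beta_1, \quad
\bar{Y}_{01} = \hat\beta_0 + \hat\beta_2, \quad
\bar{Y}_{11} = \hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3

% Hence the double difference of the means equals the interaction coefficient
\hat\beta_3 = (\bar{Y}_{11} - \bar{Y}_{10}) - (\bar{Y}_{01} - \bar{Y}_{00})

% With controls X_{it} (or fixed effects) added
Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 T_t + \beta_3 (D_i \times T_t) + X_{it}'\gamma + \varepsilon_{it}
```

Once X'γ is in the model, the coefficients are estimated conditional on the controls, so in general they no longer equal the raw group means, and \hat\beta_3 need not equal the double difference of the means shown in the first four columns.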

The command should also be able to display the number of observations and a measure of variation for each group and outcome variable, in addition to the mean shown in the mock-up. We have not yet decided what the default should be. It should also be possible to display the number of observations for each group at the bottom of the table; in that case the command will test that the number is the same for that group across all outcome variables.

  • It will be possible to export the table to CSV or TeX.
  • The default labels in the table will be those in the mock-up, but it should be possible to specify all of them manually.
  • It should be possible to set the labels for the outcome variables to the variable name, the variable label, or a manually specified label.
  • We have not yet decided whether we want stars in a separate column.
  • It should be possible to set the star intervals manually.

This is just a first draft of the specifications for this command. Please comment below if there are any additional options you want us to include.

@bbdaniels
Contributor

For first differences I have previously written a command with a similar reporting layout – you can see it at https://github.com/worldbank/stata/tree/development/dev/Statistics/RandomTrialRegression and the corresponding formatted Table 1 of http://science.sciencemag.org/content/354/6308/aaf7384/tab-figures-data

@kbjarkefur
Contributor Author

That's a really cool command. Can you post a picture of Table 1 from the Science link here in the thread? It requires a login to view (it might log in automatically when browsing from a WB IP).

In many senses it is similar to what we want to do, but I think we should write our own for the following reasons (this is not a list against your command; these are just my reflections from comparing your implementation to the one I had envisioned for ieddtable, which I wanted to document somewhere):

  • We want something that outputs to LaTeX and Excel as well as to Stata's Results window. Your command would need some work to write to anything other than Excel.
  • The way you write to Excel requires putexcel. That would require us to raise the minimum Stata version required by ietoolkit, which we do not intend to do yet. (Everyone in well-funded institutions has newer versions of Stata, but that is not the only audience we are targeting.)
  • We want to test something in this command that we intend to use for a rewrite of iebaltab. That rewrite would make the section where stats are generated output-type agnostic: that section only creates a matrix with all output values, and separate sections for the different output types (Excel, LaTeX etc.) then read that matrix. The code for iebaltab is becoming very difficult to follow because we write the output in between the code that generates the stats.
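The output-type agnostic design in the last point can be sketched as follows: one function computes every statistic into a plain matrix, and each writer only formats that matrix. This is a hedged Python illustration of the pattern, not ietoolkit code; compute_stats, write_csv, write_tex, the column labels, and the input format are all hypothetical, and the DD column is computed from raw means (no controls) only to keep the example short.

```python
import csv
import io


def compute_stats(outcomes):
    """Compute all table statistics into one plain matrix.

    outcomes: dict mapping outcome name -> {(D, T): list of values}.
    Returns (row_names, col_names, matrix). DD is the double difference
    of raw means, used here purely for illustration.
    """
    cells = [(0, 0), (1, 0), (0, 1), (1, 1)]  # (D, T) groups
    col_names = ["C base", "T base", "C end", "T end", "DD"]
    row_names, matrix = [], []
    for name, groups in outcomes.items():
        means = [sum(groups[c]) / len(groups[c]) for c in cells]
        c_base, t_base, c_end, t_end = means
        dd = (t_end - t_base) - (c_end - c_base)
        row_names.append(name)
        matrix.append(means + [dd])
    return row_names, col_names, matrix


def write_csv(row_names, col_names, matrix):
    """CSV writer: only reads the matrix, computes nothing."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([""] + col_names)
    for name, values in zip(row_names, matrix):
        writer.writerow([name] + [f"{v:.2f}" for v in values])
    return buf.getvalue()


def write_tex(row_names, col_names, matrix):
    """LaTeX writer: reads the same matrix (no escaping of special chars)."""
    lines = [r"\begin{tabular}{l" + "c" * len(col_names) + "}",
             " & ".join([""] + col_names) + r" \\ \hline"]
    for name, values in zip(row_names, matrix):
        lines.append(" & ".join([name] + [f"{v:.2f}" for v in values]) + r" \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


data = {"harvest": {(0, 0): [10, 12], (1, 0): [11, 13],
                    (0, 1): [14, 16], (1, 1): [18, 20]}}
stats = compute_stats(data)
print(write_csv(*stats))
print(write_tex(*stats))
```

Adding another output type (for example the Results window) would then mean adding one more writer, without touching compute_stats.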

@luizaandrade, let me know what you think!

We will let you know if we intend to borrow something from your code.

@kbjarkefur
Contributor Author

In commit f26dd5c I have made a quick but documented draft of what I meant by the stats section being agnostic to the output format: it creates a matrix of all stats that can then be passed to sub-commands that create the outputs.

@bbdaniels
Contributor

Totally agree with all of the above! The reason I did this one using putexcel is that I wanted to write confidence intervals and CIs with ( ), so it couldn't go in a matrix. I have since decided that it is a terrible idea, especially since putexcel has major backwards compatibility issues even between Stata 13 and 14.

You may also be interested in the regression output handling commands I wrote recently for working with CSV tables in TeX, if you haven't seen them already (mat2csv and reg2csv here). These currently leave all the line styling out, but they have the useful convention of building two underlying matrices, results and results_STARS, which can be looped over to add non-numeric characters to a table like this before exporting to CSV.
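The two-matrix convention described above can be illustrated with a short Python sketch: a numeric matrix and a parallel matrix of star counts are combined cell by cell into display strings just before export. This is not the mat2csv/reg2csv code; star_count, combine, the example numbers, and the default cutoffs (0.1, 0.05, 0.01) are assumptions, with the cutoffs exposed as a parameter in line with the earlier point about user-settable star intervals.

```python
def star_count(p, cutoffs=(0.1, 0.05, 0.01)):
    """Number of stars for p-value p: one per cutoff that p falls below."""
    return sum(p < c for c in cutoffs)


def combine(results, results_stars, fmt="{:.3f}"):
    """Merge the numeric matrix and the star matrix into display strings."""
    return [[fmt.format(v) + "*" * s for v, s in zip(row, srow)]
            for row, srow in zip(results, results_stars)]


# Illustrative numbers only
results = [[0.512, 1.234], [0.020, -0.300]]
pvalues = [[0.004, 0.20], [0.03, 0.07]]
results_stars = [[star_count(p) for p in row] for row in pvalues]
print(combine(results, results_stars))
# [['0.512***', '1.234'], ['0.020**', '-0.300*']]
```

Keeping the stars in their own matrix means the numeric matrix stays exportable as-is, and the stars can be dropped or moved to a separate column without recomputing anything.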

[Screenshot, 2018-05-03]

@estebanjq
Copy link

Wow @bbdaniels, rctreg looks like a great command, thanks for sharing it. Hopefully ietoolkit can generalize it further across input and output formats, as well as provide additional flexibility.

The option to present SEs or CIs in an appropriate format would certainly be appreciated.

One thing I mentioned in a previous (off-thread) conversation with @luizaandrade and @kbjarkefur is that it would be great if a single command could handle and present the relevant information for single differences, single differences controlling for group means at baseline (i.e., ANCOVA), and difference-in-differences (aka double difference).
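For reference, a common textbook way to write the three specifications mentioned above is sketched below. This is an assumption about how a command like this might estimate them, not a description of ieddtable; ANCOVA is written here with the unit's own baseline outcome Y_{i0} as the control.

```latex
% Single difference: endline data only, D_i is the treatment dummy
Y_{i1} = \alpha + \tau D_i + \varepsilon_i

% ANCOVA: endline outcome, controlling for the baseline outcome
Y_{i1} = \alpha + \tau D_i + \rho \, Y_{i0} + \varepsilon_i

% Difference-in-differences: pooled baseline (T_t = 0) and endline (T_t = 1)
Y_{it} = \beta_0 + \beta_1 D_i + \beta_2 T_t + \beta_3 (D_i \times T_t) + \varepsilon_{it}
```

In each case a single coefficient (\tau or \beta_3) would fill the last column of the table, which is why one command could plausibly cover all three.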

Looking forward to seeing the fruits of this labor!

@kbjarkefur
Contributor Author

Showing the first differences instead of simple means for all groups was also the main feedback when we showed this to some of the economists in our unit, so that will definitely be included, either as the default or as an option. We have not yet decided which.

kbjarkefur added a commit that referenced this issue May 4, 2018
- The first differences are now calculated properly
- The N are included
- SE instead of SD is also shown for the means
- All stats are restricted to the same sample
@luizaandrade
Member

luizaandrade commented May 7, 2018

I presented the idea for this command at our lightning seminar; here is some of the feedback:

  • When there's attrition, we should only include complete observations, i.e., those in the double difference regression, in the table. I think we can also add an option to include all observations, as long as the complete observations are the default.

  • It was suggested that it would be more intuitive to display the baseline levels, the baseline to endline change and then the double difference. That would look something like the figure below. The argument for this is that it may be confusing for a less technical audience if the dd coefficient is not the difference of the means displayed. This could either be an option or the default, and we would probably need to give some thought as to whether we want the two main columns to be the rounds or the treatment arms (i.e. Control and Treatment with subcolumns Baseline and Endline, or Baseline and Endline with subcolumns for Control and Treatment).
    [Image: mock-up of a table with baseline levels, baseline-to-endline change, and the double difference]

  • People like both the single difference and the ANCOVA options. Single difference would be something like the image below, and ANCOVA would be similar to diff-in-diff, but with a different title for the regression coefficient in the last column.
    [Image: mock-up of a single difference table]
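One reading of the attrition feedback above (consistent with the earlier commit note that all stats are restricted to the same sample) is that every statistic in the table should use the regression's estimation sample, i.e. only rows with no missing value in any variable the regression uses. A minimal Python sketch under that assumption; the row format and the estimation_sample helper are hypothetical, and missing values are represented as None.

```python
def estimation_sample(rows, used_vars):
    """Keep only rows with a non-missing value in every variable used."""
    return [r for r in rows if all(r.get(v) is not None for v in used_vars)]


rows = [
    {"y": 10.0, "D": 0, "T": 0, "x": 1.0},
    {"y": None, "D": 1, "T": 1, "x": 2.0},   # attrited: outcome missing
    {"y": 12.0, "D": 1, "T": 0, "x": None},  # control variable missing
    {"y": 15.0, "D": 0, "T": 1, "x": 0.5},
]

# Default: same sample as a regression of y on D, T, DT and control x
complete = estimation_sample(rows, ["y", "D", "T", "x"])
print(len(complete))  # 2

# Looser sample that ignores the control variable
looser = estimation_sample(rows, ["y", "D", "T"])
print(len(looser))  # 3
```

Group means, Ns and the regression would then all be computed on the same filtered rows, so the columns of the table describe one consistent sample.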

@estebanjq

Sounds good @luizaandrade. I find the means more informative and intuitive than the differences, but I can imagine others feeling differently. It is fair to say that the option to show either (or both, i.e. means followed by the differences) may be quite useful regardless of which default is chosen.

kbjarkefur added this to Issues in progress in Version 6.0 on Sep 27, 2018
kbjarkefur moved this from Issues in progress to Issues waiting to be tested in Version 6.0 on Oct 17, 2018
kbjarkefur moved this from Issues waiting to be tested to Update documentation in Version 6.0 on Oct 17, 2018
kbjarkefur added the resolved but not yet published label (Issue is fixed, but not yet published on SSC) on Oct 19, 2018
@kbjarkefur
Contributor Author

This command was merged into the development branch in merge #159. We will finalize this version of ietoolkit and submit it to SSC.

@kbjarkefur
Contributor Author

@estebanjq, thanks again for suggesting this command!

Please let us know if you do NOT want to be mentioned in the help file where we currently give you credit for suggesting this command.

We are looking forward to any feedback you might have once this is published, unless you want to sync the files from this repository and try out the command before it is online on SSC. Let us know if you want any advice on how to do that.

We have not yet implemented any more advanced estimation models, such as ANCOVA as you suggested. We might do that later, but we will first collect feedback on the first version before deciding what to do next with this command.

Thanks again!

@estebanjq

@kbjarkefur

It is great to hear that this idea has come to fruition.

  1. Please feel free to mention me as you see fit. FYI, my affiliation is the University of Wisconsin-Maryland.

  2. I can wait for it to be available via SSC, unless you think that will take a long time. If so, let me know the best way to sync the files.

  3. It makes sense to start with the most straightforward approach. Additional capabilities can be added later on.

Thanks again for creating this public good!!!

kbjarkefur added a commit that referenced this issue Oct 20, 2018
@kbjarkefur
Contributor Author

Great! Thanks!

I am spelling your name Esteban J. Quinnones as the ñ does not display properly in earlier versions of Stata. I hope that is OK. When I went to your GitHub profile page to check the spelling of your name, I saw that your affiliation is listed there as University of Wisconsin-Madison, so unless you are in some joint program with the University of Maryland, I think that is what you meant to write.

We intend to submit the new version next week, and then it usually takes a day or two, or at least no more than a week.

@estebanjq

estebanjq commented Oct 20, 2018 via email

@luizaandrade
Member

@estebanjq, publishing the command on SSC may take a few more days, but you can already use the version in the develop branch. You can find instructions on how to use it here.

@estebanjq

estebanjq commented Oct 21, 2018 via email

kbjarkefur moved this from Update documentation to Issues ready to be published in Version 6.0 on Oct 22, 2018
kbjarkefur added a commit that referenced this issue Oct 22, 2018
Version 6.0 - merge from Develop

Addressing issues #135, #137, #139, #141, #142, #145, #146, #153 and #158, and partially addressing #152.
@kbjarkefur
Copy link
Contributor Author

ietoolkit is now updated and ieddtab is now released. Type adoupdate, update to install all available updates to all SSC commands you have previously installed, or type ssc install ietoolkit, replace to update only ietoolkit.

I will now close this issue.

@bajwaih

bajwaih commented Apr 22, 2019

Thank you all, it is a great help.

@kbjarkefur
Copy link
Contributor Author

We are happy you found it helpful!

Projects: Version 6.0 (Issues ready to be published)

5 participants