# Required Packages Installation

The following Stata code installs the user-written Stata packages, such as `reghdfe` and `specurve`. For most of them, any previous version is uninstalled first or overwritten with the `replace` option.

In [1]:
%%capture
import stata_setup, os
if os.name == 'nt':
    stata_setup.config('C:/Program Files/Stata17/','mp')
else:
    stata_setup.config('/usr/local/stata17','mp')

### reghdfe, ivreghdfe, ppmlhdfe | Models with Many Levels of Fixed Effects
Source Code: [https://scorreia.com/software/](https://scorreia.com/software/)

In [2]:
%%stata -qui
* Install ftools (remove program if it existed previously)
cap ado uninstall ftools
net install ftools, from("https://raw.githubusercontent.com/sergiocorreia/ftools/master/src/")

* Install reghdfe 6.x (remove program if it existed previously)
cap ado uninstall reghdfe
net install reghdfe, from("https://raw.githubusercontent.com/sergiocorreia/reghdfe/master/src/")

* Install parallel, if using the parallel() option; don't install from SSC
cap ado uninstall parallel
net install parallel, from(https://raw.github.com/gvegayon/parallel/stable/) replace
mata mata mlib index

* Install ivreghdfe (remove program if it existed previously)
cap ado uninstall ivreghdfe
cap ssc install ivreg2 // Install ivreg2, the core package
net install ivreghdfe, from(https://raw.githubusercontent.com/sergiocorreia/ivreghdfe/master/src/)

* Install ppmlhdfe
cap ado uninstall ppmlhdfe
net install ppmlhdfe, from("https://raw.githubusercontent.com/sergiocorreia/ppmlhdfe/master/src/")




### specurve | Specification Curve Analysis
Source Code: [https://github.com/mgao6767/specurve](https://github.com/mgao6767/specurve)

In [3]:
%%stata -qui
* Install specurve
net install specurve, from("https://raw.githubusercontent.com/mgao6767/specurve/master") replace




### loocv | Leave-One-Out Regression I
Source Code: [http://fmwww.bc.edu/repec/bocode/l/loocv.ado](http://fmwww.bc.edu/repec/bocode/l/loocv.ado)

In [4]:
%%stata -qui
ssc install loocv

### cv_regress | Leave-One-Out Regression II
Source Code: [https://github.com/friosavila/stpackages/tree/main/cv_regress](https://github.com/friosavila/stpackages/tree/main/cv_regress)

In [5]:
%%stata -qui
ssc install cv_regress

### crossfold | _k_-Fold Cross-Validation
Source Code: [https://ideas.repec.org/c/boc/bocode/s457426.html](https://ideas.repec.org/c/boc/bocode/s457426.html)

In [6]:
%%stata -qui
ssc install crossfold

### Stata Schemepack
Source Code: [https://github.com/asjadnaqvi/stata-schemepack](https://github.com/asjadnaqvi/stata-schemepack)

In [7]:
%%stata -qui
net install schemepack, from("https://raw.githubusercontent.com/asjadnaqvi/stata-schemepack/main/installation/") replace

### estout | Making Regression Tables in Stata
Source Code: [https://github.com/benjann/estout](https://github.com/benjann/estout)

In [8]:
%%stata -qui
net install estout, from(https://raw.githubusercontent.com/benjann/estout/master/) replace

# Data

||   |
|-|-|
|<img src="../figures/cover.png" width="140">| Rodriguez, Belicia, Kim P. Huynh,<br> David Tomás Jacho-Chávez, and<br> Leonardo Sánchez-Aragón, (2024),<br> "<em>Abstract Readability: Evidence from Top-5 Economics Journals</em>,"<br> <strong>Economics Letters</strong>, 111541.<br> - [Manuscript](../paper/Rodriguez_et_al_econ_letters_2024.pdf)<br> - [Supplemental Materials](../paper/Rodriguez_et_al_2024_supp_material.pdf)<br>- [GitHub Repository](https://github.com/lfsanche/econ_letters)|







In [9]:
import pandas as pd
import pyreadstat

pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', 150)

# Load the Stata dataset
datos, meta = pyreadstat.read_dta("../data/data.dta")

# Create a DataFrame from the variable names and labels
var_labels = pd.DataFrame({
    "Variable Name": meta.column_names,
    "Variable Label": meta.column_labels
})

# Display the DataFrame of variable names and labels
var_labels

| | Variable Name | Variable Label |
|-|-|-|
| 0 | title | Title of the paper |
| 1 | idpaper | Unique identifier for each paper |
| 2 | journal | Journal where the paper was published |
| 3 | journal_num | Journal code. [1] AER, [2] ECM, [3] JPE, [4] QJE, [5] RES |
| 4 | year | Year of the article publication |
| 5 | month | Month of the article publication |
| 6 | volume | Volume of the journal in which the article was published |
| 7 | issue | Issue of the journal in which the article was published |
| 8 | jelcodes | JEL classification codes assigned by the authors |
| 9 | keywords | Keywords specified by the authors for the article |


## Flesch–Kincaid Grade Level

The Flesch–Kincaid readability tests are used extensively in education. The "Flesch–Kincaid Grade Level Formula" expresses a score as a U.S. grade level, making it easier for teachers, parents, librarians, and others to judge the readability of books and other texts. The score can also be read as the number of years of education generally required to understand the text, an interpretation that is most relevant when the score exceeds 10. The grade level is calculated with the following formula:

$$
\text{grade level} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
$$

The result is a number that corresponds to a U.S. grade level.

* The lowest grade level score in theory is −3.40, reached when every sentence consists of a single one-syllable word.
* Due to the formula's construction, the score does not have an upper bound.
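
The formula and its lower bound can be checked with a short Python sketch. The function name `flesch_kincaid_grade` and the word, sentence, and syllable counts in the second call are hypothetical, chosen only for illustration. The sketch takes the counts as given and does not attempt to count syllables in raw text.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch-Kincaid grade level computed from raw counts (formula as stated above)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Theoretical minimum: every sentence is a single one-syllable word.
lowest = flesch_kincaid_grade(total_words=1, total_sentences=1, total_syllables=1)
print(round(lowest, 2))  # -3.4

# Hypothetical abstract (illustrative counts): 120 words, 5 sentences, 210 syllables.
example = flesch_kincaid_grade(120, 5, 210)
print(round(example, 2))  # 14.42
```

Because both ratios can grow without limit (longer sentences, more syllables per word), the same function also illustrates why the score has no upper bound.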



Source: [https://en.wikipedia.org/wiki/Flesch-Kincaid_readability_tests](https://en.wikipedia.org/wiki/Flesch-Kincaid_readability_tests)

In [10]:
# Define the list of variables (columns) to include
columns_of_interest = [
    "log_flesch_kincaid_grade_level", "log_num_authors", "log_num_pages",
    "both_genders", "prop_women", "journal", "jelcodes", "year", "cluster", "jel_flag"
]

# Check if all the columns exist in the DataFrame
missing_columns = [col for col in columns_of_interest if col not in datos.columns]
if missing_columns:
    print(f"The following columns are missing from the DataFrame: {missing_columns}")
else:
    # Select the specified columns and sample 5 random rows
    random_rows = datos[columns_of_interest].sample(n=5, random_state=542)

    # Print the result
    print(random_rows)

      log_flesch_kincaid_grade_level  log_num_authors  log_num_pages  \
4707                        2.525838         0.000000       3.433987   
3807                        2.908348         1.098612       3.784190   
4579                        2.359175         1.098612       3.401197   
4551                        2.884002         0.693147       2.484907   
101                         2.850737         0.693147       2.944439   

      both_genders  prop_women                             journal  \
4707             0    0.000000      The Review of Economic Studies   
3807             0    0.000000  The Quarterly Journal of Economics   
4579             1    0.666667      The Review of Economic Studies   
4551             0    0.000000      The Review of Economic Studies   
101              0    0.000000            American Economic Review   

                 jelcodes  year  cluster  jel_flag  
4707                   J1  2015        1         1  
3807  D61;D62;D84;G12;G14  2014        1