[documentation] fix 2 broken links in README #1

Merged
merged 2 commits into from
Oct 18, 2019
2 changes: 1 addition & 1 deletion 00_documentation/Contribution_and_Replication.md
@@ -3,7 +3,7 @@

### Table of Contents
1. [Replicating this Repository](#replicating-this-repository)
-1. [Contributing to this Repository](#contributing-to-this-repository)
+1. [Contributing to this Repository](#contributing-to-this-repository)
2.1. [Bug reports and feature requests](#bug-reports-and-feature-requests)
2.2. [Contribution conventions](#contribution-conventions)
1. [Workflow in this Repository](#workflow-in-this-repository)
50 changes: 50 additions & 0 deletions 00_documentation/Harmonized_Variables_in_GLAD.md
@@ -0,0 +1,50 @@
# Harmonized variable names
<sup>back to the [README](https://github.com/worldbank/GLAD/blob/master/README.md) :leftwards_arrow_with_hook:</sup>

This page lists the harmonized variable names that will be used in all GLAD data sets, regardless of the assessment, year or country the data come from. The first table lists the variables included in all data sets. The second table lists harmonized variable names that exist only in some data sets.

varname | varclass | varlabel | vartype | note
-- | -- | -- | -- | --
surveyid | key | SurveyID (Region_Year_Assessment) | String |
countrycode | key | WB country code (3 letters) | String | (a)
national_level | key | idcntry_raw is a national-level sample | Indicator (1=National) | (a)
idcntry_raw | id | Country ID, as coded in raw data | Numerical or String | (b)
idschool | id | School ID | Numerical |
idgrade | id | Grade ID | Numerical |
idclass | id | Class ID | Numerical | (c)
idlearner | id | Learner ID | Numerical |
score_[*assessment*]\_[*subject*]\_[*pv*] | value | [Plausible value *pv*:] *assessment* score for *subject* | Numerical |
level_[*assessment*]\_[*subject*]\_[*pv*] | value | [Plausible value *pv*:] *assessment* level for *subject* | Categorical |
age | trait | Learner age at time of assessment | Numerical |
urban | trait | School is located in urban/rural area | Indicator (1=Urban) |
male | trait | Learner gender is male/female | Indicator (1=Male) |
escs | trait | Learner socio-economic status _(Purposefully not labeled yet)_ | Numerical |
learner_weight | sample | Total learner weight | Numerical |
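
The bracketed parts of the *score* and *level* names follow a fixed order: assessment, then subject, then an optional plausible value. A minimal Python sketch of building and parsing such names, assuming lowercase alphanumeric tokens joined by underscores; the helper names and the example tokens (`pirls`, `read`, `01`) are hypothetical and not taken from the GLAD programs:

```python
import re

# Assumed shape of the harmonized names: <prefix>_<assessment>_<subject>[_<pv>],
# with prefix "score" or "level" and lowercase alphanumeric tokens (an assumption).
VARNAME_RE = re.compile(
    r"^(?P<prefix>score|level)_(?P<assessment>[a-z0-9]+)"
    r"_(?P<subject>[a-z0-9]+)(?:_(?P<pv>[a-z0-9]+))?$"
)


def build_varname(prefix, assessment, subject, pv=None):
    """Assemble a harmonized name; the plausible-value part is omitted when pv is None."""
    parts = [prefix, assessment, subject]
    if pv is not None:
        parts.append(pv)
    return "_".join(parts)


def parse_varname(name):
    """Split a harmonized name into its parts, or return None if it does not match."""
    match = VARNAME_RE.match(name)
    return match.groupdict() if match else None
```

For example, `parse_varname("level_llece_math")` returns the parts with `pv` set to `None`, while a non-score variable such as `learner_weight` does not match.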

**Notes:**

For all assessment-years, the id variables (*idcntry_raw, idschool, idgrade, idclass, idlearner*) jointly form a unique id for each learner.
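
A minimal Python sketch of checking this uniqueness on a list of learner records (one dict per learner). The helper `duplicate_ids` is hypothetical and not part of the GLAD programs; an id variable absent from a record, such as *idclass* in some assessment-years (see note (c)), is treated as an empty value:

```python
# The id variables that, per the note above, jointly identify each learner.
ID_VARS = ("idcntry_raw", "idschool", "idgrade", "idclass", "idlearner")


def duplicate_ids(rows, id_vars=ID_VARS):
    """Return the set of composite ids that appear more than once in `rows`."""
    seen = set()
    dupes = set()
    for row in rows:
        # Missing id variables (e.g. idclass) become None in the composite key.
        key = tuple(row.get(var) for var in id_vars)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes
```

An empty result means the composite id is unique in the data checked.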

(a) The full correspondence of *countrycode*, *national_level* and *idcntry_raw* is found in the [master countrycode list](https://github.com/worldbank/GLAD/blob/master/01_harmonization/011_rawdata/master_countrycode_list.csv). Some examples:
* in LLECE 1997, the *countrycode* MEX is linked both to the sample from the country of Mexico (idcntry_raw = 21) and to the sample from the subnational unit of Nuevo León (idcntry_raw = 11). However, the first has *national_level* equal to 1, while the latter has *national_level* equal to 0. Both samples are therefore found in the GLAD module ALL, but the module CLO for Mexico is calculated using only the first sample, discarding the latter.
* in PIRLS 2001, the *countrycode* GBR is linked to both the samples from England (idcntry_raw = 926) and Scotland (idcntry_raw = 927), and both have *national_level* equal to 1. Both samples are therefore found in the GLAD module ALL, and the module CLO for the United Kingdom is calculated by pooling both samples without distinction.
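
The sample-selection rule in these examples can be sketched in Python as follows. This is an illustration under assumptions, not the actual CLO computation: the functions are hypothetical, `score` stands in for any harmonized outcome variable, and the *learner_weight*-weighted mean is only an example aggregate:

```python
from collections import defaultdict


def clo_samples(rows):
    """Keep only learners from samples flagged national_level == 1."""
    return [row for row in rows if row["national_level"] == 1]


def weighted_mean_by_country(rows, value_var):
    """Hypothetical country aggregate: learner_weight-weighted mean of `value_var`,
    pooling every national-level sample that shares a countrycode."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for row in clo_samples(rows):
        totals[row["countrycode"]] += row["learner_weight"] * row[value_var]
        weights[row["countrycode"]] += row["learner_weight"]
    # Skip countries whose retained weights sum to zero to avoid dividing by zero.
    return {code: totals[code] / weights[code] for code in totals if weights[code] > 0}
```

With this rule, a subnational sample such as Nuevo León stays in the learner-level data but never enters the country aggregate, while England and Scotland both feed the GBR aggregate.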

(b) The variable *idcntry_raw* is preserved as found in the raw data. Most assessment-years store it as a numerical variable; the only exception so far is PASEC 1996, for which it is a string.

(c) Some assessment-years may not have the variable _idclass_.

---

## Variables specific to a single assessment or year

Although the variable _learner_weight_ exists in all assessments, other sample-related variables vary across assessments.

varname | varclass | varlabel | vartype | note
-- | -- | -- | -- | --
year | key | Year of assessment | Date | PASEC, EGRA only (for multi-year bundles)
urban_o* | trait | Original variable of urban | Categorical | PIRLS, TIMSS, SACMEQ only (whenever available)
learner_weight_subject* | sample | Total learner weight for specific subject | Numerical | LLECE only
strata* | sample | Strata | Numerical | LLECE, PASEC only
jkzone | sample | Jackknife zone | Numerical | PIRLS, TIMSS, PASEC 2014 only
jkrep | sample | Jackknife replicate code | Numerical | PIRLS, TIMSS, PASEC 2014 only
weight_replicate* | sample | Replicate weight # | Numerical | PASEC 2014 only
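
For PIRLS and TIMSS, *jkzone* and *jkrep* are the inputs for building replicate weights. A minimal Python sketch, assuming the classic jackknife repeated replication (JRR) convention with zones numbered from 1 and *jkrep* coded 0/1: in replicate *h*, learners in zone *h* get double weight if *jkrep* = 1 and zero weight if *jkrep* = 0, everyone else keeps *learner_weight*, and the sampling variance is the sum of squared deviations of the replicate estimates from the full-sample estimate. Some assessment cycles use a variant of this scheme, and the sketch ignores the *weight_replicate* variables listed for PASEC 2014; the helper names and the `score` variable are hypothetical:

```python
# Assumption: classic JRR convention (zones 1..n_zones, jkrep coded 0/1).
# Helper names and the outcome variable passed as value_var are hypothetical.


def _zone_weight(row, zone):
    if row["jkzone"] == zone:
        # Inside the perturbed zone: double the weight (jkrep == 1) or drop it (jkrep == 0).
        return row["learner_weight"] * 2 * row["jkrep"]
    # Outside the perturbed zone: keep the full-sample weight.
    return row["learner_weight"]


def replicate_weights(rows, n_zones):
    """Return one list of replicate weights per jackknife zone."""
    return [[_zone_weight(row, zone) for row in rows] for zone in range(1, n_zones + 1)]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jrr_mean_and_variance(rows, value_var, n_zones):
    """Full-sample weighted mean and its JRR sampling variance."""
    values = [row[value_var] for row in rows]
    full = weighted_mean(values, [row["learner_weight"] for row in rows])
    variance = sum(
        (weighted_mean(values, rep) - full) ** 2
        for rep in replicate_weights(rows, n_zones)
    )
    return full, variance
```

The standard error is the square root of the returned variance.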
4 changes: 2 additions & 2 deletions README.md
@@ -19,7 +19,7 @@ For an example of analysis enabled by this collection, please check the [**Learn
Starts from the original datasets of each assessment (pulled from _eduraw_ collection in _datalibweb_ or from a local copy, directly downloaded from the data publishers)
and ends with the creation of the dataset GLAD_ALL and GLAD_ALL-BASE. Files receive a master vintage that reflects any possible updates of a surveyid (_region_year_assessment_).

-Those two modules of GLAD (ALL and ALL-BASE) are at the learner level, that is, one observation corresponds to one learner or student or pupil. Both modules contain the [harmonized variables](https://github.com/worldbank/GLAD/wiki/Agreed-variables-to-include-in-GLAD-datasets), but the module ALL-BASE additionally includes all the original variables from the raw data. Since the ALL-BASE file may be very large, we recommend using the module ALL whenever possible.
+Those two modules of GLAD (ALL and ALL-BASE) are at the learner level, that is, one observation corresponds to one learner or student or pupil. Both modules contain the [harmonized variables](https://github.com/worldbank/GLAD/blob/master/00_documentation/Harmonized_Variables_in_GLAD.md), but the module ALL-BASE additionally includes all the original variables from the raw data. Since the ALL-BASE file may be very large, we recommend using the module ALL whenever possible.

The output files are saved in the clone with adaptation vintage _wrk_A_, and corresponding markdown documents are generated with the same name. The assessments currently in the loop are (click on the links for each file's markdown documentation):

@@ -43,7 +43,7 @@ The output files are saved in the clone with adaptation vintage _wrk_A_, and cor

### Technical notes

-The GLAD programs by default use data from _datalibweb_. Please see [guidelines to retrieve data from datalibweb](#--Guidelines-to-Retrieve-Data-from-datalibweb). Note that _datalibweb_ requires access and authentication to the WorldBank network.
+The GLAD programs by default use data from _datalibweb_. Please see [guidelines to retrieve data from datalibweb](https://github.com/worldbank/GLAD/blob/master/00_documentation/Datalibweb_Guidelines.md). Note that _datalibweb_ requires access and authentication to the WorldBank network.

The GLAD programs also make use of the _edukit_ package. The latest version of _edukit_ and installation instructions can be found in the [EduAnalyticsToolkit repo](https://github.com/worldbank/EduAnalyticsToolkit).
