[Website hyperlink](https://zair87.github.io)

## <center>**Background and Introduction to Dataset**</center>

Over the past year, government agencies varied in their responses to COVID-19, but why such variability exists is not yet fully understood. For example, California was the first state in the United States to enact a shelter-in-place mandate in early 2020$^{1}$, whereas states in the Midwest or under Republican leadership had less restrictive responses$^{2}$. One factor that may partly shape state- and individual-level responses to COVID-19 (e.g., where to receive vaccines, adherence to mask mandates) is people’s general inclination to believe in conspiracy theories. In the recent past (and present), Americans have confronted conspiracy theories on topics ranging from election results to the origins of COVID-19 to the “real” reason behind government public health directives such as mask mandates.

One relevant open dataset, “Measuring Belief In Conspiracy Theories”$^{3}$, contains measures of general conspiracist beliefs from the 15-item Generic Conspiracist Beliefs Scale (GCBS)$^{4}$, Big 5 personality constructs from the Ten-Item Personality Inventory (TIPI)$^{5}$, and demographic variables such as gender, age, education, and religious affiliation. These data were collected from approximately 2,495 respondents on Open Psychometrics, an open-source project that features several personality surveys$^{6}$.

## <center>**Current Research Question**</center>

As part of an exploratory study, I intend to examine this dataset to determine which Big 5 personality variables (i.e., openness to new experience, neuroticism, extraversion, conscientiousness, agreeableness) are related to overall conspiracist beliefs and to each of the five factors produced by the GCBS (i.e., government malfeasance, extraterrestrial cover-up, malevolent global conspiracies, personal well-being, control of information), controlling for age, education, gender, and religious affiliation. I would also like to see whether and how religious affiliation is associated with general conspiracist beliefs. The religious affiliation item is limiting, however, because it does not capture related but distinct constructs, such as religiosity or level of religious commitment, that I would predict to be especially related to conspiracist beliefs. Lastly, to loop back to my introduction, the data do not record the states or cities in which respondents live, so whether state-level conspiracist beliefs do indeed affect public health policies cannot be examined with this dataset.

## <center>**Collaboration Plan**</center>

As a graduate student in this course, I plan to work independently on this project.


### <center>**References**</center>

1.	https://calmatters.org/explainers/coronavirus-california-explained-newsom/
2.	https://www.bsg.ox.ac.uk/research/publications/variation-us-states-responses-covid-19
3.	https://www.kaggle.com/yamqwe/measuring-belief-in-conspiracy-theories/version/10
4.	Brotherton, R., French, C. C., & Pickering, A. D. (2013). Measuring belief in conspiracy theories: The generic conspiracist beliefs scale. *Frontiers in Psychology, 4*, 279.
5.	Gosling, S. D., Rentfrow, P. J., & Swann Jr, W. B. (2003). A very brief measure of the Big-Five personality domains. *Journal of Research in Personality, 37*, 504-528.
6.	https://openpsychometrics.org/

### <center>**Websites used for help with code**</center>
1. Creating superscripts in Jupyter Markdown: https://stackoverflow.com/questions/46011785/how-to-do-superscripts-and-subscripts-in-jupyter-notebook
2. Centering in Jupyter Markdown: https://newbedev.com/centering-text-in-ipython-notebook-markdown-heading-cells
3. Indenting in Jupyter Markdown: https://www.ibm.com/docs/en/db2-event-store/2.0.0?topic=notebooks-markdown-jupyter-cheatsheet


In [67]:
#Load dataset and parse into shape using principles of tidy data.

import pandas as pd #Load pandas
import numpy as np #Load Numpy

# Widens the notebook and lets us display data easily.
from IPython.display import display, HTML  # public import path; IPython.core.display is deprecated for this
display(HTML("<style>.container { width:95% !important; }</style>"))

# Show a very large number of rows and columns
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [68]:
# Request the Kaggle page that hosts the CSV file to confirm it is reachable
import requests  # needed for requests.get below

url = "https://www.kaggle.com/yamqwe/measuring-belief-in-conspiracy-theories"
r = requests.get(url)

r.status_code

200

In [69]:
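# The CSV was downloaded manually and copied into the Docker container,
# because the file only downloads after clicking the link on the website.

df = pd.read_csv("data.csv")

# display(df) shows the first and last rows of the full frame
# (a bare df.head() in the middle of a cell produces no output)
display(df)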


Unnamed: 0,Q1,Q2,Q3,Q4,Q5,Q6,Q7,Q8,Q9,Q10,Q11,Q12,Q13,Q14,Q15,E1,E2,E3,E4,E5,E6,E7,E8,E9,E10,E11,E12,E13,E14,E15,introelapse,testelapse,surveyelapse,TIPI1,TIPI2,TIPI3,TIPI4,TIPI5,TIPI6,TIPI7,TIPI8,TIPI9,TIPI10,VCL1,VCL2,VCL3,VCL4,VCL5,VCL6,VCL7,VCL8,VCL9,VCL10,VCL11,VCL12,VCL13,VCL14,VCL15,VCL16,education,urban,gender,engnat,age,hand,religion,orientation,race,voted,married,familysize,major
0,5,5,3,5,5,5,5,3,4,5,5,5,3,5,5,7070,7469,7383,6540,9098,4998,6971,4713,6032,5878,4031,4386,9077,5113,4204,11,95,142,5,3,6,2,6,6,7,2,7,1,1,1,1,1,1,0,0,0,0,1,1,0,1,1,1,1,3,0,1,2,28,1,2,1,5,2,1,1,ACTING
1,5,5,5,5,5,3,5,5,1,4,4,5,4,4,5,4086,13107,2807,5030,7405,7864,16234,2603,14174,9423,11683,12718,4816,6806,4823,6,125,144,6,7,6,7,6,3,7,5,1,1,1,1,0,1,1,0,0,0,0,1,1,0,0,1,1,1,1,2,2,1,14,1,1,2,4,2,1,1,
2,2,4,1,2,2,2,4,2,2,4,2,4,0,2,4,27535,7814,7762,10290,8558,10538,4740,4162,6492,11512,6874,11440,0,11418,9872,7,141,90,6,6,6,1,7,5,6,5,7,7,1,1,1,1,1,1,1,1,0,1,1,0,1,1,1,1,4,2,2,2,26,1,1,1,4,1,1,2,philosophy
3,5,4,1,2,4,5,4,1,4,5,5,5,1,4,5,4561,5589,3506,3784,5093,3555,3158,1887,7678,2304,3604,2724,2689,2657,3824,5,58,135,6,7,7,5,7,6,5,1,5,1,1,1,1,1,1,0,0,1,0,1,1,0,1,1,0,1,3,1,1,1,25,1,12,1,4,1,1,3,history
4,5,4,1,4,4,5,4,3,1,5,5,5,3,5,5,8841,7575,3832,7775,4160,5216,7559,5792,10296,5455,3864,11799,7872,10543,4224,4,105,210,1,3,7,2,6,4,5,5,5,3,1,1,0,1,1,0,0,0,0,1,0,0,1,1,1,1,2,2,1,1,37,1,2,2,4,2,2,2,Psychology
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2490,5,2,2,2,4,4,2,2,1,4,5,4,2,2,5,50238,6560,6208,4037,4273,4290,3680,2830,2100,3554,2610,7033,3800,4660,3936,1,116,63,3,5,5,3,7,5,5,5,3,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,1,1,3,3,1,1,32,1,2,1,4,2,1,3,English
2491,1,1,1,1,1,3,1,1,1,1,2,1,1,1,3,5715,6350,2032,5553,1911,9704,5007,2329,1544,2216,3928,424,4248,10208,3048,13,68,111,5,5,6,7,6,3,5,2,3,2,1,1,0,1,1,0,0,1,0,1,0,0,1,1,1,1,4,2,2,2,25,2,2,1,4,1,1,2,psychology
2492,5,5,1,5,5,5,5,1,3,1,1,5,5,5,5,6653,9447,6337,4856,4408,4391,6471,3625,7999,11593,5768,8207,6089,5864,5952,4,100,225,4,2,6,4,7,4,7,4,6,1,1,1,0,1,1,0,0,1,0,1,0,0,1,1,1,1,2,3,2,2,34,1,12,2,5,0,2,2,
2493,2,1,4,1,1,1,1,3,1,2,1,1,1,1,4,11728,19822,15605,1507,7669,8903,7964,6918,3832,9024,3874,6608,22332,4574,7581,190,139,133,1,6,7,1,7,7,3,3,7,2,1,1,0,1,1,0,0,0,0,1,0,1,1,0,1,1,2,2,1,2,19,1,1,1,4,1,1,2,


## <center>**Note about data and challenges with data**</center>

The data shown above are from the "Measuring Belief In Conspiracy Theories" dataset. The first 15 item columns (Q1-Q15) are items from the Generic Conspiracist Beliefs Scale. These are assessed using a Likert-like scale from 1 'disagree' to 5 'agree'. I will need to rename these items, particularly because, aside from an overall composite score, I will need to compute composite scores for each of the scale's 5 factors. Composite scores will be averages of the items. Internal reliability tests will need to be conducted to determine whether items proposed to measure the same construct produce similar scores.

The next 15 columns (E1-E15) indicate how long participants took to answer each question. These variables may be useful for determining whether a respondent's time is an extreme outlier (either too fast or too slow) relative to other respondents' times. I may consider removing extreme outliers when cleaning the data. Internal consistency tests will also be conducted with this measure to determine whether similar items produce similar scores.
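A minimal sketch of how the composite scores and a reliability check could be computed. The item-to-factor mapping in `GCBS_FACTORS` is an assumption (every fifth item belonging to the same factor) and should be verified against Brotherton et al. (2013)$^{4}$. The column names `gcbs_*` and both helper functions are hypothetical, and Cronbach's alpha is used here as one common internal reliability statistic. The toy rows are respondents 0, 1, 3, and 4 from the output above.

```python
import pandas as pd

ITEMS = [f"Q{i}" for i in range(1, 16)]

# ASSUMPTION: item-to-factor mapping; verify against Brotherton et al. (2013).
GCBS_FACTORS = {
    "gov_malfeasance": ["Q1", "Q6", "Q11"],
    "malevolent_global": ["Q2", "Q7", "Q12"],
    "extraterrestrial": ["Q3", "Q8", "Q13"],
    "personal_wellbeing": ["Q4", "Q9", "Q14"],
    "info_control": ["Q5", "Q10", "Q15"],
}


def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a respondents-by-items frame (listwise deletion)."""
    items = items.dropna()
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def add_gcbs_composites(df: pd.DataFrame) -> pd.DataFrame:
    """Add an overall mean and one mean per factor (hypothetical column names)."""
    out = df.copy()
    out["gcbs_total"] = out[ITEMS].mean(axis=1)
    for name, cols in GCBS_FACTORS.items():
        out[f"gcbs_{name}"] = out[cols].mean(axis=1)
    return out


# Q1-Q15 for respondents 0, 1, 3, and 4 as displayed above
toy = pd.DataFrame(
    [
        [5, 5, 3, 5, 5, 5, 5, 3, 4, 5, 5, 5, 3, 5, 5],
        [5, 5, 5, 5, 5, 3, 5, 5, 1, 4, 4, 5, 4, 4, 5],
        [5, 4, 1, 2, 4, 5, 4, 1, 4, 5, 5, 5, 1, 4, 5],
        [5, 4, 1, 4, 4, 5, 4, 3, 1, 5, 5, 5, 3, 5, 5],
    ],
    columns=ITEMS,
)

scored = add_gcbs_composites(toy)
alpha_et = cronbach_alpha(toy[GCBS_FACTORS["extraterrestrial"]])
# Sanity check: two identical items are perfectly consistent
alpha_identical = cronbach_alpha(pd.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]}))
```

On the real data, the same calls would be applied to `df` after missing responses have been recoded, since `mean(axis=1)` skips NaN values but would treat an uncoded placeholder as a real score.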
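One way to flag extreme response times is sketched below. The rule is an assumption, not a decision the project has already made: times are log-transformed, and a value is flagged when it lies more than `k` scaled median absolute deviations from the median. The function name `flag_time_outliers` and the cutoff `k=3.0` are hypothetical. The first nine values are the `testelapse` entries visible in the output above. The last value (5000) is made up purely to show a flagged case.

```python
import numpy as np
import pandas as pd


def flag_time_outliers(elapsed: pd.Series, k: float = 3.0) -> pd.Series:
    """True where the log time is more than k scaled MADs from the median."""
    log_t = np.log(elapsed.where(elapsed > 0))  # non-positive times become NaN
    med = log_t.median()
    mad = (log_t - med).abs().median()
    robust_z = (log_t - med) / (1.4826 * mad)  # 1.4826 scales MAD to ~1 SD
    return robust_z.abs() > k


# Nine testelapse values from the displayed rows, plus one HYPOTHETICAL extreme value
elapsed = pd.Series([95, 125, 141, 58, 105, 116, 68, 100, 139, 5000])
flags = flag_time_outliers(elapsed)
```

A median-based rule is used here because a handful of very long times would inflate a mean and standard deviation and could mask themselves. The same function could be applied to each of E1-E15 or to their row sums.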

The TIPI columns assess Big 5 personality measures. According to the original article, some items will need to be reverse coded, and composite scores for each of the 5 facets will need to be computed. I will also rename these variables for clarity.
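The reverse coding and composites could look like the sketch below. It assumes that `TIPI1`-`TIPI10` follow the item order and 1-7 response scale of Gosling et al. (2003)$^{5}$, with items 2, 4, 6, 8, and 10 reverse-keyed. `TIPI_KEY` and `score_tipi` are hypothetical names. Note that the TIPI scores emotional stability, which is the inverse of neuroticism. The toy row is respondent 0 from the output above.

```python
import pandas as pd

# ASSUMPTION: scoring key per Gosling et al. (2003): (forward item, reverse-keyed item)
TIPI_KEY = {
    "extraversion": ("TIPI1", "TIPI6"),
    "agreeableness": ("TIPI7", "TIPI2"),
    "conscientiousness": ("TIPI3", "TIPI8"),
    "emotional_stability": ("TIPI9", "TIPI4"),
    "openness": ("TIPI5", "TIPI10"),
}


def score_tipi(df: pd.DataFrame, scale_max: int = 7) -> pd.DataFrame:
    """Average each forward item with its reverse-coded partner."""
    out = pd.DataFrame(index=df.index)
    for trait, (fwd, rev) in TIPI_KEY.items():
        reversed_item = (scale_max + 1) - df[rev]  # 1<->7, 2<->6, 3<->5
        out[trait] = (df[fwd] + reversed_item) / 2
    return out


# TIPI1-TIPI10 for respondent 0 as displayed above
toy = pd.DataFrame(
    [[5, 3, 6, 2, 6, 6, 7, 2, 7, 1]],
    columns=[f"TIPI{i}" for i in range(1, 11)],
)
scores = score_tipi(toy)
```

Missing responses should be recoded to NaN before this step. Otherwise a placeholder value would be reverse-coded as if it were a real answer.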

The VCL columns serve as validity checks with items 6, 9, and 12 being the fake words. Participants who indicated they knew what the words meant in English for those specific items may need to be removed before analyses. Their responses indicate either lack of comprehension or attention.
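This validity check can be sketched as a boolean flag. The sketch assumes that a value of 1 in a VCL column means the respondent claimed to know the word. `flag_fake_word_claims` is a hypothetical helper name. The toy rows are the VCL6, VCL9, and VCL12 values of respondents 0, 2, and 2493 from the output above.

```python
import pandas as pd

FAKE_WORD_COLS = ["VCL6", "VCL9", "VCL12"]  # fake-word items named in the note above


def flag_fake_word_claims(df: pd.DataFrame) -> pd.Series:
    """True for respondents who claimed to know at least one fake word."""
    # ASSUMPTION: 1 = "I know this word", 0 = not checked
    return df[FAKE_WORD_COLS].eq(1).any(axis=1)


# VCL6, VCL9, VCL12 for respondents 0, 2, and 2493 as displayed above
toy = pd.DataFrame(
    {"VCL6": [0, 1, 0], "VCL9": [0, 0, 0], "VCL12": [0, 0, 1]},
    index=[0, 2, 2493],
)
flags = flag_fake_word_claims(toy)
clean = toy[~flags]  # keep only respondents who passed the check
```

Keeping the flag as a column, rather than dropping rows immediately, would make it possible to compare results with and without the flagged respondents.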

I will also need to replace all missing data with NaN before continuing with analyses.
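A minimal sketch of this recoding is shown below. It assumes that a value of 0 marks a skipped response in the item columns. This is an inference from the output above, where respondent 2 has `Q13` equal to 0 even though the scale runs from 1 to 5, and it should be confirmed against the dataset's codebook. `zeros_to_nan` is a hypothetical helper. It must not be applied to the VCL columns, where 0 is a valid answer.

```python
import numpy as np
import pandas as pd


def zeros_to_nan(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy with 0 replaced by NaN in the given columns only."""
    out = df.copy()
    # ASSUMPTION: 0 codes a missing response in these columns
    out[columns] = out[columns].replace(0, np.nan)
    return out


# Q12 and Q13 for respondents 0 and 2 as displayed above
toy = pd.DataFrame({"Q12": [5, 4], "Q13": [3, 0]}, index=[0, 2])
cleaned = zeros_to_nan(toy, ["Q12", "Q13"])
```

Blank text fields, such as an empty `major`, are already read as NaN by `pd.read_csv`, so only numeric placeholder codes need this step.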

I was unable to write code to download the .csv file directly from the website as the file is downloaded directly onto the local computer by clicking on a link.