## QTM 350: Data Science Computing

### Assignment 05 - Literate Programming with Quarto 

### Due Date: 11:59 PM on Wednesday, October 09, 2024

### Instructions

In this assignment, you will demonstrate your proficiency with Quarto by creating data science reports and presentations. You will analyse a sample of the [World Development Indicators dataset](https://databank.worldbank.org/source/world-development-indicators), focusing on one year (2022) and 14 variables. Your task involves performing data analysis, generating visualisations, and producing reproducible documents in multiple formats.

Please write a `README.md` file that includes the URL of the repository you create, along with the URLs of the HTML report and slides published on GitHub Pages or GitHack (not the raw files in your repository). The resulting PDF should be stored in the repository, as should all the `.qmd` files.

### Data

The sample dataset is provided in the file `wdi.csv`, which is available in [our GitHub repository](https://github.com/danilofreire/qtm350/tree/main/assignments/wdi.csv). The Python code below loads the file and prints its variable names.

In [2]:
# Install the necessary libraries
# pip install pandas
# pip install wbgapi

# Import the libraries
import pandas as pd

# Load the sample dataset; later cells refer to it as `wb`
wb = pd.read_csv("wdi.csv")

# List the variable names
print(wb.columns.tolist())

['country', 'inflation_rate', 'exports_gdp_share', 'gdp_growth_rate', 'gdp_per_capita', 'adult_literacy_rate', 'primary_school_enrolment_rate', 'education_expenditure_gdp_share', 'measles_immunisation_rate', 'health_expenditure_gdp_share', 'income_inequality', 'unemployment_rate', 'life_expectancy', 'total_population']


### Tasks

1. Please initialise a new `.qmd` file with an appropriate `YAML` header. Include metadata such as `title`, `author`, and `date`, and specify both `HTML` and `PDF` as output formats.
   
2. Load the dataset using your preferred programming language (R or Python). 
   
3. Conduct exploratory data analysis on at least three indicators of your choice. Summarise your findings in markdown sections. Show your code and results.
   
4. Create at least two different types of plots (e.g., bar chart, scatter plot) to represent your analysis. Use Quarto code chunks to embed these visualisations. Add a title and axis labels to each plot. Use Quarto to include a caption and a reference to the source of the data. Hide your code in the final document.
   
5. Construct a table that highlights some key statistics from your analysis. Ensure the table is well-formatted and included in the report.
   
6. Include cross-references to your figures and tables within the text. Demonstrate proper labelling and referencing techniques.
   
7. Add a bibliography using BibTeX (`.bib`). Cite at least two sources related to your analysis.
   
8. Create a new `.qmd` file configured for `revealjs` output. Include a title slide, a few content slides, and a concluding slide.
   
9. Incorporate your analysis and visualisations from the report into the presentation.
    
10. Customise the presentation theme and incorporate at least one transition effect between slides.
    
11. Render your report to HTML and PDF, and your presentation to Revealjs (HTML).
    
12. Use Git to manage your project and create a repository on GitHub. Submit the link to your repository on Canvas.
    
13. Set up GitHub Pages (preferably) or use GitHack to host your HTML report and presentation. Add the links to the published pages to your `README.md` file. Do not forget to include the PDF report and the `.qmd` files in your repository.
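
As a starting point for Tasks 1, 4, 6, 7, 8, and 10, the sketch below uses Python to write a skeleton report, a skeleton slide deck, and a one-entry bibliography. It is only one possible layout, not a required solution. The file names (`report.qmd`, `slides.qmd`, `references.bib`), the title and author, the chunk label `fig-gdp-life`, the citation key `worldbank_wdi`, and the `moon` theme are all placeholder assumptions. The chunk inside the report assumes `pandas` and `matplotlib` are installed and that `wdi.csv` sits next to the `.qmd` file.

```python
from pathlib import Path

# Built at runtime so this example can itself sit inside a code fence.
FENCE = "`" * 3

# Skeleton report: YAML header with HTML and PDF formats, one hidden-code
# figure chunk with a label and caption, a cross-reference, and a citation.
# Title, author, label, and citation key are placeholders (assumptions).
REPORT_QMD = f"""---
title: "World Development Indicators, 2022"
author: "Your Name"
date: today
format:
  html: default
  pdf: default
bibliography: references.bib
---

## Life expectancy and income

@fig-gdp-life plots life expectancy against GDP per capita [@worldbank_wdi].

{FENCE}{{python}}
#| label: fig-gdp-life
#| fig-cap: "GDP per capita and life expectancy, 2022. Source: World Development Indicators."
#| echo: false
import pandas as pd
import matplotlib.pyplot as plt

wb = pd.read_csv("wdi.csv")
fig, ax = plt.subplots()
ax.scatter(wb["gdp_per_capita"], wb["life_expectancy"])
ax.set_title("Life expectancy vs GDP per capita")
ax.set_xlabel("GDP per capita")
ax.set_ylabel("Life expectancy (years)")
plt.show()
{FENCE}
"""

# Skeleton slides: revealjs format with a theme and a slide transition.
SLIDES_QMD = """---
title: "World Development Indicators, 2022"
author: "Your Name"
format:
  revealjs:
    theme: moon
    transition: slide
---

## Key findings

- First finding goes here

## Conclusion

- Closing remark goes here
"""

# One bibliography entry for the dataset; add your second source here.
REFERENCES_BIB = """@misc{worldbank_wdi,
  author = {{World Bank}},
  title  = {World Development Indicators},
  url    = {https://databank.worldbank.org/source/world-development-indicators}
}
"""


def write_skeleton(target_dir="."):
    """Write the three starter files into target_dir and return their paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    files = {
        "report.qmd": REPORT_QMD,
        "slides.qmd": SLIDES_QMD,
        "references.bib": REFERENCES_BIB,
    }
    paths = []
    for name, text in files.items():
        path = target / name
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_skeleton("wdi-report")` would create the three files in a folder named `wdi-report` (again a placeholder). The `#| echo: false` option hides the code for that one chunk only, so the exploratory chunks in Task 3 can still show their code.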

### Bonus Questions

14. Develop an interactive dashboard within your report using Quarto's dashboard features. Incorporate dynamic filters or widgets.
    
15. Configure automated rendering of your report using Quarto's command-line interface, possibly integrating with GitHub Actions for continuous integration.
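
For Task 11 and the automation idea in Task 15, the following is a minimal sketch of a Python wrapper around the `quarto render <file> --to <format>` command. The file names `report.qmd` and `slides.qmd` are assumptions, and the script expects the `quarto` executable to be on your `PATH`. A continuous integration job could call a script like this, although that setup is not shown here.

```python
import shutil
import subprocess


def render_command(qmd_file, output_format):
    """Build the argument list for `quarto render <file> --to <format>`."""
    return ["quarto", "render", qmd_file, "--to", output_format]


# Hypothetical file names; one entry per required output.
RENDER_JOBS = [
    ("report.qmd", "html"),
    ("report.qmd", "pdf"),
    ("slides.qmd", "revealjs"),
]


def render_all(jobs=RENDER_JOBS, dry_run=False):
    """Run every render job in order and return the commands.

    With dry_run=True the commands are built but not executed.
    A failing render raises subprocess.CalledProcessError.
    """
    commands = [render_command(qmd, fmt) for qmd, fmt in jobs]
    if dry_run:
        return commands
    if shutil.which("quarto") is None:
        raise RuntimeError("quarto executable not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return commands
```

Running `render_all(dry_run=True)` lists the commands without executing them, which is a quick way to check the job list before rendering for real.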

In [3]:
# Summary statistics for GDP per Capita
gdp_per_capita_stats = wb['gdp_per_capita'].describe()
print("GDP per Capita Statistics:")
print(gdp_per_capita_stats)

# Summary statistics for Life Expectancy
life_expectancy_stats = wb['life_expectancy'].describe()
print("\nLife Expectancy Statistics:")
print(life_expectancy_stats)

# Summary statistics for Education Expenditure (% of GDP)
education_expenditure_stats = wb['education_expenditure_gdp_share'].describe()
print("\nEducation Expenditure (% of GDP) Statistics:")
print(education_expenditure_stats)


GDP per Capita Statistics:
count       203.000000
mean      20345.707649
std       31308.942225
min         259.025031
25%        2570.563284
50%        7587.588173
75%       25982.630050
max      240862.182448
Name: gdp_per_capita, dtype: float64

Life Expectancy Statistics:
count    209.000000
mean      72.416519
std        7.713322
min       52.997000
25%       66.782000
50%       73.514634
75%       78.475000
max       85.377000
Name: life_expectancy, dtype: float64

Education Expenditure (% of GDP) Statistics:
count    105.000000
mean       4.226215
std        2.069486
min        1.027000
25%        2.898000
50%        3.887000
75%        5.156000
max       16.582462
Name: education_expenditure_gdp_share, dtype: float64
