# **Homelessness - Project 1**
### Analyzing qualitative and quantitative variables.

# **Importing Necessary Python Modules**

Python supports a variety of open-source add-ins called **modules** that add extra features to the basic setup. In the cell below, each module's name appears in the `import` statement, and its purpose is given in a comment after the hash symbol (#).



In [1]:
import pandas as pd                 #Data analysis
import numpy as np                  #Calculations
import plotly.express as px         #Graphing
import matplotlib.pyplot as plt     #Graphing
from IPython.display import Image   #Display images
import warnings                     #Ignore version warnings
warnings.simplefilter('ignore', FutureWarning)


In [2]:
# Replace 'image_url' with the URL of the image you want to display
image_url = 'https://endhomelessness.org/wp-content/uploads/2019/03/homelessness-statistics.jpg'

# Display the image
Image(url=image_url)

# **Context**

The Continuum of Care (CoC) Program is designed to promote communitywide commitment to the goal of ending homelessness; provide funding for efforts by nonprofit providers, and State and local governments to quickly rehouse homeless individuals and families while minimizing the trauma and dislocation caused to homeless individuals, families, and communities by homelessness; promote access to and effect utilization of mainstream programs by homeless individuals and families; and optimize self-sufficiency among individuals and families experiencing homelessness. For more information on the Program, please visit https://www.hudexchange.info/programs/coc/

The U.S. Department of Housing and Urban Development (HUD) provides Point-in-Time (PIT) count reports of sheltered and unsheltered persons experiencing homelessness, by household type and subpopulation. This data is available at the national and state level, and for each CoC. HUD also provides Housing Inventory Count (HIC) reports, which provide a snapshot of a CoC’s inventory of beds and units available on the night designated for the count by program type, and include beds dedicated to serve persons who are homeless as well as persons in Permanent Supportive Housing. This raw data set contains PIT estimates of homelessness and the accompanying HIC data from 2021.

Attribution: Adapted from U.S. Department of Housing and Urban Development


# **About the Dataset**

This dataset contains 100 rows corresponding to a random sample of localities (typically counties or similar large regions) that received Continuum of Care funding from HUD. A total of 21 variables are provided as listed in the table below.

| Variable Name(s)         | Description |
|:-------------------------|:--- |
| CoC Number<br>CoC Name | CoC locality identifier |
| CoC Category             | Community setting designation: Rural, Urban, etc |
| Type of Count            | Description of whether fully or partially sheltered individuals (or both) are included |
| Overall Homeless, 2021   | Total homeless count in all facilities indicated by the “Type of Count” variable |
| HMIS Participation Rate for Year-Round Beds (ES, TH, SH) | Proportion of facilities in the locality that participate in the Homeless Management Information System (HMIS) |
| Total Year-Round Beds (ES, TH, SH)<br>Total Year-Round Beds (ES)<br>Total Year-Round Beds (TH)<br>Total Year-Round Beds (SH) | (all types combined)<br>ES = Emergency Shelter<br>TH = Transitional Housing<br>SH = Safe Haven |
| Total Units for Households with Children (ES, TH, SH)<br>Total Beds for Households with Children (ES, TH, SH) | Number of units and beds designated for households with children |
| Sheltered ES Homeless, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility at the time of the study |
| Sheltered ES Homeless - Age 18 to 24, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by age |
| Sheltered ES Homeless - Female, 2021<br>Sheltered ES Homeless - Male, 2021<br>Sheltered ES Homeless - Trans+, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by gender |
| Sheltered ES Homeless - Hispanic/Latino, 2021<br>Sheltered ES Homeless - White, 2021<br>Sheltered ES Homeless - Black or African American, 2021<br>Sheltered ES Homeless - Asian or Pacific Islander<br>Sheltered ES Homeless - American Indian or Alaska Native, 2021<br>Sheltered ES Homeless - Multiple Races, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by race/ethnicity |



Let's take a look at the data. To do this, we first import it directly from the URL below.



# **A Snippet of the Data**

In [3]:
url='https://raw.githubusercontent.com/thamilton562/STAT108_Projects_Students/main/DataSets/Homelessness.csv'
df=pd.read_csv(url)

Next, we can display the data by *typing the name* of the DataFrame. To ensure we can see all columns, we'll use the *pd.set_option* function.

In [4]:
# Set display options to show all columns
pd.set_option('display.max_columns', None)
df

Unnamed: 0,CoC Number,CoC Name,CoC Category,Type of Count,"Overall Homeless, 2021","HMIS Participation Rate for Year-Round Beds (ES, TH, SH)","Total Year-Round Beds (ES, TH, SH)",Total Year-Round Beds (ES),Total Year-Round Beds (TH),Total Year-Round Beds (SH),"Total Units for Households with Children (ES, TH, SH)","Total Beds for Households with Children (ES, TH, SH)","Sheltered ES Homeless, 2021","Sheltered ES Homeless - Age 18 to 24, 2021","Sheltered ES Homeless - Female, 2021","Sheltered ES Homeless - Male, 2021","Sheltered ES Homeless - Trans+, 2021","Sheltered ES Homeless - Hispanic/Latino, 2021","Sheltered ES Homeless - White, 2021","Sheltered ES Homeless - Black or African American, 2021",Sheltered ES Homeless - Asian or Pacific Islander,"Sheltered ES Homeless - American Indian or Alaska Native, 2021","Sheltered ES Homeless - Multiple Races, 2021"
0,LA-505,Monroe/Northeast Louisiana CoC,Largely Rural CoC,Sheltered-Only Count,78,0.233,133,69,64,0,18,52,40,4,21,19,0,0,17,19,0,4,0
1,MN-502,Rochester/Southeast Minnesota CoC,Largely Rural CoC,Sheltered-Only Count,419,0.604,599,378,221,0,114,372,261,18,110,150,1,42,186,51,9,3,12
2,PA-508,Scranton/Lackawanna County CoC,Largely Suburban CoC,Sheltered and full unsheltered,165,0.839,155,64,80,11,25,71,81,5,34,46,1,7,61,17,2,0,1
3,OK-503,Oklahoma Balance of State CoC,Largely Rural CoC,Sheltered-Only Count,125,0.295,149,135,14,0,5,34,125,24,49,75,1,52,88,25,1,8,3
4,OH-504,Youngstown/Mahoning County CoC,Other Largely Urban CoC,Sheltered-Only Count,62,0.192,198,187,11,0,39,115,56,6,41,13,2,21,25,19,1,0,11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,KS-507,Kansas Balance of State CoC,Largely Rural CoC,Sheltered-Only Count,783,0.162,1170,703,452,15,170,572,430,41,153,276,1,48,307,60,7,23,33
96,CA-507,Marin County CoC,Largely Suburban CoC,Sheltered-Only Count,225,0.727,293,163,130,0,46,142,105,6,41,63,1,28,83,16,1,1,4
97,TN-512,"Morristown/Blount, Sevier, Campbell, Cocke Cou...",Largely Rural CoC,Sheltered and full unsheltered,472,0.316,133,117,16,0,28,65,104,5,61,43,0,5,99,1,1,1,2
98,MA-515,Fall River CoC,Largely Suburban CoC,Sheltered and full unsheltered,324,0.877,236,212,24,0,67,215,223,15,122,101,0,72,135,80,1,2,5


# **INSTRUCTIONS**

* Use Python to analyze the data set and complete each of the following.
* Replace each ellipsis (...) with the relevant names or code.
* For problems that require a written response, double click the text box to start typing.
* Refer to the 3 activity tutorials for assistance.
* Attend office hours if you still need help.

## **QUESTION 1**
Determine whether the four variables below are qualitative or quantitative. If they are quantitative, specify whether they are continuous or discrete.

| Variable                   | Classification            |
|:---------------------------|:--------------------------|
| CoC Category               | ...  |
| Type of Count              | ...  |
| HMIS Participation Rate    | ...  |
| Sheltered ES Homeless, 2021 | ...  |

## **QUESTION 2**

For question 2 you will analyze a qualitative variable. Find your variable based on your last name and use that variable when answering all parts of question 2.

Once you find your variable description, scroll up to "About the Dataset" to find the variable name. Then look at the "Snippet of Data" to get the exact variable name, especially since variable names are case sensitive. You can scroll to the right, if needed.
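As a quick check on exact spelling, a DataFrame's column names can be listed directly. The sketch below uses a small made-up table (`toy`, with hypothetical column names), not the homelessness data, to show that a name only matches when capitalization and punctuation are identical.

```python
import pandas as pd

# Hypothetical toy table; these column names are made up for illustration.
toy = pd.DataFrame({
    "Region Type": ["Urban", "Rural", "Urban"],
    "Beds, 2021": [120, 45, 300],
})

# List every column name exactly as pandas stores it.
column_names = toy.columns.tolist()
print(column_names)

# Names must match exactly, including capitalization and punctuation.
print("Region Type" in toy.columns)   # exact match
print("region type" in toy.columns)   # different case, so it is not found
```

Copying a name from this list and pasting it between the quotes avoids most `KeyError` messages.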

Construct a frequency table, relative frequency table, and relative frequency bar chart to describe the distribution of your variable. State any fact that jumps out to you.

| **Last Name** | **Variable Description** |
|:--------------|:-------------------------|
| A-L           | CoC Category             |
| M-Z           | Type of Count            |



**2a)** Construct a table that contains the frequency and relative frequency distribution for your variable. Round relative frequency to 3 decimal places.

In [15]:
# Define the name of the variable to be analyzed
variable = df['...']

# Create the frequency table, which counts how often each category occurs.
# value_counts sorts the categories from most to least frequent.
# Rename the counts to "Frequency".
freq_table = variable.value_counts()
freq_table = freq_table.rename('Frequency')

# Create the relative frequency table, and rename the counts column to
#   Relative Frequency.
relative_freq_table = freq_table/...           #HINT: look back at Project 0 or Tutorial 1.
relative_freq_table = relative_freq_table.rename('...').round(3)

# Combine both tables
# axis=1 says to put the tables together as columns
combined_table=pd.concat([..., ...], axis=1)

# Print the combined table.
...


KeyError: '...'
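For reference, here is a minimal sketch of a frequency and relative frequency table built on made-up data. The `fruit` Series and the `toy_` names are hypothetical and are not part of the homelessness data set. It assumes the usual definition of relative frequency: each count divided by the total number of observations.

```python
import pandas as pd

# Hypothetical toy data: favorite fruit of 8 people (not from the project data).
fruit = pd.Series(["apple", "banana", "apple", "cherry",
                   "banana", "apple", "apple", "cherry"])

# Frequency: how many times each category appears.
toy_freq = fruit.value_counts().rename("Frequency")

# Relative frequency: each count divided by the total number of observations.
toy_rel_freq = (toy_freq / len(fruit)).rename("Relative Frequency").round(3)

# Place the two tables side by side as columns.
toy_table = pd.concat([toy_freq, toy_rel_freq], axis=1)
print(toy_table)
```

The relative frequencies should add up to 1, apart from small rounding differences, which is a useful sanity check.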

**2b)** Construct a relative frequency bar chart to describe the distribution of your variable.

In [16]:
# The argument in (...) tells the system which table to use to make the
#      bar graph. Since we want a relative frequency bar chart, which is the
#      correct label to replace the ellipsis (...) in this code?
dfrf = pd.DataFrame(...)

# Create the bar graph
fig = px.bar(x=dfrf.index,y=dfrf['Relative Frequency'],
             title='...')

# Update axis labels
fig.update_layout(xaxis_title='...')
fig.update_layout(yaxis_title='...')

# Show the bar graph
fig.show()

ValueError: DataFrame constructor not properly called!

**2c)** Describe the distribution of your variable.

...

## **QUESTION 3**

For question 3 you will analyze a quantitative variable. Find your variable based on your last name and use that variable when answering all parts of question 3.  

Once you find your variable description, scroll up to "About the Dataset" to find the variable name. Then look at the "Snippet of Data" to get the exact variable name, especially since variable names are case sensitive.

| **Last Name** | **Variable Description**               |
|---------------|----------------------------------------|
| A-L           | HMIS Participation Rate for Year-Round Beds (ES, TH, SH) |
| M-Z           | Sheltered ES Homeless, 2021   |



**3a)** Construct a histogram for your variable. Use number of bins = 18.

In [17]:
# Create the histogram, with the x-axis being the variable specified in the
#   table based on your last name.
fig = px.histogram(x=df['...'],nbins = ...,
             title='...',
             labels={'x':'...'})

# Update the vertical axis title.
fig.update_layout(yaxis_title='...')

# Print the histogram
fig.show()

KeyError: '...'
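To see what binning does behind a histogram, the sketch below counts made-up values in equal-width bins using NumPy. The `values` array is hypothetical, and 3 bins are used only to keep the output short. Plotly may pick slightly different bin edges, or fewer bins than requested, to get round numbers, so treat this as an illustration of the idea and not a reproduction of `px.histogram`.

```python
import numpy as np

# Hypothetical toy data: 12 made-up measurements.
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 10])

# Split the range from the minimum to the maximum into 3 equal-width bins
# and count how many values land in each bin.
counts, edges = np.histogram(values, bins=3)

print(edges)    # bin boundaries
print(counts)   # number of values in each bin
```

The bar heights in a frequency histogram are these counts, so they always add up to the number of observations.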

**3b)** Construct a boxplot for your variable.  

In [18]:
# Create the boxplot, with a title, and specify horizontal axis label.
px.box(x=df['...'],
       title=...,
       labels={'x':...})

KeyError: '...'

**3c)** Calculate the following summary statistics for your variable: 5 number summary, mean, and standard deviation. Round to three decimal places.

In [19]:
# Calculate the numerical summaries
# Indicate your variable.
descriptive_stats = df[[...]].describe().round(...)

# Print the results.
...


KeyError: "None of [Index([Ellipsis], dtype='object')] are in the [columns]"
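As a guide to reading the output of `describe()`, here is a small sketch on made-up numbers. The `Score` column is hypothetical. The rows labeled `min`, `25%`, `50%`, `75%`, and `max` together form the five-number summary, and `std` is the sample standard deviation.

```python
import pandas as pd

# Hypothetical toy column of made-up values.
toy = pd.DataFrame({"Score": [2, 4, 4, 5, 7, 9, 11]})

# describe() reports count, mean, std (sample standard deviation),
# min, 25%, 50% (median), 75%, and max.
summary = toy[["Score"]].describe().round(3)
print(summary)

# The five-number summary is the min, 25%, 50%, 75%, and max rows.
five_number = summary.loc[["min", "25%", "50%", "75%", "max"], "Score"]
print(five_number)
```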

**3d)** Use information from (3a), (3b), and (3c) to describe your variable in terms of shape, center, spread, and outliers.
* Use the correct center and the correct spread based on the shape of the distribution.
* Specify which center and which spread you are using. For example, say "The mean is ..." or "The median is ...", rather than "The center is ..."
* When addressing outliers, if any, list the values of **all** outliers.
* Include units, if any, for all numbers.
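One common way to flag outliers is the 1.5 × IQR rule, sketched below on made-up numbers. The `scores` Series is hypothetical. Quartile conventions differ between tools, so the fences computed here with pandas may differ slightly from the ones a Plotly boxplot draws; hovering over the boxplot remains the reference for this project.

```python
import pandas as pd

# Hypothetical toy data with one unusually large value.
scores = pd.Series([2, 4, 4, 5, 7, 9, 30])

# Quartiles and interquartile range (IQR).
q1 = scores.quantile(0.25)
q3 = scores.quantile(0.75)
iqr = q3 - q1

# Values beyond these fences are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = scores[(scores < lower_fence) | (scores > upper_fence)]

print(iqr, lower_fence, upper_fence)
print(outliers.tolist())

# The outlier pulls the mean upward but barely affects the median.
print(round(scores.mean(), 3), scores.median())
```

In this toy example the single large value raises the mean well above the median, which is why the median and IQR are usually preferred for skewed data or data with outliers.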

...

**3e)** Interpret the standard deviation in context.

...

**3f)** Interpret the IQR in context.

...

## **QUESTION 4**

How do major city CoCs and largely rural CoCs compare with respect to White and Black/African American individuals in emergency shelters?

Calculate the mean number of sheltered ES White individuals and sheltered ES Black/African American individuals for major city CoCs and largely rural CoCs. Round to 2 decimal places. Compare the results. Then answer a question about the code.

**4a)** Calculate and state the mean number of sheltered ES White individuals and sheltered ES Black/African American individuals for major city CoCs and largely rural CoCs. Round to two decimal places.
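The filter-then-average pattern this question relies on can be sketched on a small made-up table. Every name below (`toy`, `Group`, `Cats`, `Dogs`) is hypothetical and unrelated to the homelessness data; the sketch only illustrates the general pandas idiom.

```python
import pandas as pd

# Hypothetical toy table; names and numbers are made up.
toy = pd.DataFrame({
    "Group": ["A", "B", "A", "B", "A"],
    "Cats": [1, 4, 3, 6, 5],
    "Dogs": [2, 2, 4, 8, 0],
})

# Step 1: build a True/False Series, one entry per row.
is_a = toy["Group"] == "A"

# Step 2: use that Series to keep only the matching rows.
group_a = toy[is_a]

# Step 3: select the numeric columns and average each one.
group_a_means = group_a[["Cats", "Dogs"]].mean().round(2)
print(group_a_means)
```

Printing `is_a` and `group_a` separately is a good way to see what each step produces.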

In [20]:
# Template: df[df['CoC Category'] == '...'][['quant1', 'quant2']].mean().round(...)
major_city_means = df[df['CoC Category'] == '...'][['...', '...']].mean().round(...)
largely_rural_means = df[df['...'] == 'Largely Rural CoC'][['...', '...']].mean().round(...)

# Combine the two tables and specify labels.
combined_means = pd.DataFrame({'White': [major_city_means['Sheltered ES Homeless - White, 2021'], largely_rural_means['Sheltered ES Homeless - White, 2021']],
                               'Black': [major_city_means['Sheltered ES Homeless - Black or African American, 2021'], largely_rural_means['Sheltered ES Homeless - Black or African American, 2021']]},
                        index=['Major City CoC', 'Largely Rural CoC'])

# Print the table
...

**4b)** Compare the results for major city CoCs and largely rural CoCs.

...

**4c)** In the table below are some snippets of code.

| **Last Name Initial** | **Code**                    |
|:-----------------------|:----------------------------------|
| A-L                    | df['CoC Category'] == 'Major City CoC'  |
| M-Z                    | df[df['CoC Category'] == 'Major City CoC']|


Based on your last name, interpret the snippet of code.

...


## **QUESTION 5**

Generate a paragraph of at least 100 words to address one of the following questions. That is, answer only 5a or 5b, but not both.

**5a)** Discuss how analyzing your chosen data set using statistical methods could help you become better prepared for future courses in your major.

...

--OR--

**5b)** Discuss how analyzing your chosen data set using statistical methods could be instrumental in becoming better prepared for your future career.

...


<br><br>
### Once you are done and ready to submit, follow the instructions below to save as a PDF and submit to GradeScope.

### Save as PDF
1. Run all code one last time
2. File -> Print Preview opens in a new browser window
3. Verify you can see all graphs. If not, go back to step 1.
4. File -> Print (or Ctrl-P/Cmd-P)
5. Change destination to PDF (don't save, yet)
6. Scroll through preview to make sure you can see your graphs entirely. If not, click Cancel. Make the browser window narrower. Go back to step 4.
7. Repeat steps 4-6 until you can see your graphs completely. But do not make them too narrow.
8. Save the PDF, taking note of where it is saved.

### Submit to GradeScope
1. Login to the Canvas course
2. Click on GradeScope in the course navigation.
3. If you see multiple courses in GradeScope, click on the STAT 108 course
4. Click on the "Tutorial 2 Practice Upload" assignment
5. Click on "Submit Work", select PDF
6. Select the PDF you just created
7. You need to tell GradeScope which page each problem answer/output is on. You should see a list of problems on the left and a display of pages (thumbnails) on the right. Assign pages to questions by clicking on the question number on the left, then clicking on all pages that question is on.
8. After ALL questions have been assigned to their respective page(s), click "Submit"
