# **Homelessness - Project 3**
## Inference for the Population Proportion

# **Importing Necessary Python Modules**

Python supports a variety of open-source add-ins called modules that extend the basic setup. Each module's name follows the `import` or `from` keyword, and its purpose is given in a comment after the hash symbol (#).




In [13]:
import pandas as pd                 #Data analysis
import numpy as np                  #Calculations
from IPython.display import Image   #Display images
from scipy.stats import norm        #Confidence Interval

In [14]:
# Assigns the URL of the image to display to the name 'image_url'.
image_url = 'https://endhomelessness.org/wp-content/uploads/2019/03/homelessness-statistics.jpg'

# Display the image
Image(url=image_url, width = 500)

# **Context**

The Continuum of Care (CoC) Program is designed to promote communitywide commitment to the goal of ending homelessness; provide funding for efforts by nonprofit providers, and State and local governments to quickly rehouse homeless individuals and families while minimizing the trauma and dislocation caused to homeless individuals, families, and communities by homelessness; promote access to and effect utilization of mainstream programs by homeless individuals and families; and optimize self-sufficiency among individuals and families experiencing homelessness. For more information on the Program, please visit https://www.hudexchange.info/programs/coc/

The U.S. Department of Housing and Urban Development (HUD) provides Point-in-Time (PIT) count reports of sheltered and unsheltered persons experiencing homelessness, by household type and subpopulation. This data is available at the national and state level, and for each CoC. HUD also provides Housing Inventory Count (HIC) reports, which provide a snapshot of a CoC’s inventory of beds and units available on the night designated for the count by program type, and include beds dedicated to serve persons who are homeless as well as persons in Permanent Supportive Housing. This raw data set contains PIT estimates of homelessness and the accompanying HIC data from 2021.

Attribution: Adapted from U.S. Department of Housing and Urban Development


# **About the Dataset**

This dataset contains 100 rows corresponding to a random sample of localities (typically counties or similar large regions) that received Continuum of Care funding from HUD. A total of 23 variables are provided, as listed in the table below.

| Variable Name(s)         | Description |
|:-------------------------|:--- |
| CoC Number<br>CoC Name | CoC locality identifier |
| CoC Category             | Community setting designation: Rural, Urban, etc. |
| Type of Count            | Description of whether fully or partially sheltered individuals (or both) are included |
| Overall Homeless, 2021   | Total homeless count in all facilities indicated by the “Type of Count” variable |
| HMIS Participation Rate for Year-Round Beds (ES, TH, SH) | Proportion of facilities in the locality that participate in the Homeless Management Information System (HMIS) |
| Total Year-Round Beds (ES, TH, SH)<br>Total Year-Round Beds (ES)<br>Total Year-Round Beds (TH)<br>Total Year-Round Beds (SH) | (all types combined)<br>ES = Emergency Shelter<br>TH = Transitional Housing<br>SH = Safe Haven |
| Total Units for Households with Children (ES, TH, SH)<br>Total Beds for Households with Children (ES, TH, SH) | Number of units and beds designated for households with children |
| Sheltered ES Homeless, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility at the time of the study |
| Sheltered ES Homeless - Age 18 to 24, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by age |
| Sheltered ES Homeless - Female, 2021<br>Sheltered ES Homeless - Male, 2021<br>Sheltered ES Homeless - Trans+, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by gender |
| Sheltered ES Homeless - Hispanic/Latino, 2021<br>Sheltered ES Homeless - White, 2021<br>Sheltered ES Homeless - Black or African American, 2021<br>Sheltered ES Homeless - Asian or Pacific Islander<br>Sheltered ES Homeless - American Indian or Alaska Native, 2021<br>Sheltered ES Homeless - Multiple Races, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by race/ethnicity |



Let's take a look at the data. First, we import it directly from the URL below.



# **A Snippet of the Data**

In [15]:
url='https://raw.githubusercontent.com/thamilton562/STAT108_Projects_Students/main/DataSets/Homelessness.csv'

# Reads in the CSV data file and assigns it to the DataFrame 'df'.
df=pd.read_csv(url)

Next, we can display the data by *typing the name* of the DataFrame. To ensure all columns are visible, we'll use the *pd.set_option* function.

In [16]:
# Set display options to show all columns
pd.set_option('display.max_columns', None)

# When you type the object name, the object gets printed.
df

Unnamed: 0,CoC Number,CoC Name,CoC Category,Type of Count,"Overall Homeless, 2021","HMIS Participation Rate for Year-Round Beds (ES, TH, SH)","Total Year-Round Beds (ES, TH, SH)",Total Year-Round Beds (ES),Total Year-Round Beds (TH),Total Year-Round Beds (SH),"Total Units for Households with Children (ES, TH, SH)","Total Beds for Households with Children (ES, TH, SH)","Sheltered ES Homeless, 2021","Sheltered ES Homeless - Age 18 to 24, 2021","Sheltered ES Homeless - Female, 2021","Sheltered ES Homeless - Male, 2021","Sheltered ES Homeless - Trans+, 2021","Sheltered ES Homeless - Hispanic/Latino, 2021","Sheltered ES Homeless - White, 2021","Sheltered ES Homeless - Black or African American, 2021",Sheltered ES Homeless - Asian or Pacific Islander,"Sheltered ES Homeless - American Indian or Alaska Native, 2021","Sheltered ES Homeless - Multiple Races, 2021"
0,LA-505,Monroe/Northeast Louisiana CoC,Largely Rural CoC,Sheltered-Only Count,78,0.233,133,69,64,0,18,52,40,4,21,19,0,0,17,19,0,4,0
1,MN-502,Rochester/Southeast Minnesota CoC,Largely Rural CoC,Sheltered-Only Count,419,0.604,599,378,221,0,114,372,261,18,110,150,1,42,186,51,9,3,12
2,PA-508,Scranton/Lackawanna County CoC,Largely Suburban CoC,Sheltered and full unsheltered,165,0.839,155,64,80,11,25,71,81,5,34,46,1,7,61,17,2,0,1
3,OK-503,Oklahoma Balance of State CoC,Largely Rural CoC,Sheltered-Only Count,125,0.295,149,135,14,0,5,34,125,24,49,75,1,52,88,25,1,8,3
4,OH-504,Youngstown/Mahoning County CoC,Other Largely Urban CoC,Sheltered-Only Count,62,0.192,198,187,11,0,39,115,56,6,41,13,2,21,25,19,1,0,11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,KS-507,Kansas Balance of State CoC,Largely Rural CoC,Sheltered-Only Count,783,0.162,1170,703,452,15,170,572,430,41,153,276,1,48,307,60,7,23,33
96,CA-507,Marin County CoC,Largely Suburban CoC,Sheltered-Only Count,225,0.727,293,163,130,0,46,142,105,6,41,63,1,28,83,16,1,1,4
97,TN-512,"Morristown/Blount, Sevier, Campbell, Cocke Cou...",Largely Rural CoC,Sheltered and full unsheltered,472,0.316,133,117,16,0,28,65,104,5,61,43,0,5,99,1,1,1,2
98,MA-515,Fall River CoC,Largely Suburban CoC,Sheltered and full unsheltered,324,0.877,236,212,24,0,67,215,223,15,122,101,0,72,135,80,1,2,5
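
A quick structural check can complement the display above. The sketch below uses a small made-up DataFrame (`toy_df`, which is not the project data) to show how `shape` and `columns` report the number of rows, the number of columns, and the exact column names. The values in `toy_df` are invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the project DataFrame (all values are invented).
toy_df = pd.DataFrame({
    "CoC Number": ["XX-001", "XX-002", "XX-003"],
    "Type of Count": ["Sheltered-Only Count",
                      "Sheltered and full unsheltered",
                      "Sheltered-Only Count"],
    "Overall Homeless, 2021": [10, 20, 30],
})

# shape is a (rows, columns) tuple.
n_rows, n_cols = toy_df.shape

# columns holds the exact spelling of every column name.
column_names = list(toy_df.columns)

print(f"{n_rows} rows and {n_cols} columns")
print(column_names)
```

Running the same two attributes on `df` is one way to confirm the column spellings before using them in later cells.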


# **INSTRUCTIONS**

* Use Python to analyze the data set and complete each of the following.
* Replace each ellipsis (...) with the relevant names or code.
* For problems that require a written response, replace the ellipsis (...) by double-clicking the text box and typing your answer.
* Refer to the tutorial from the activity for assistance.
* If you still need help:
  * Watch the video.
  * Attend office hours.

# **The variable to analyze**
You will analyze a category of a qualitative variable. Based on the first initial of your LAST name, analyze the category of the variable listed in the table. Use this category for the entire project.

| Last Name | Variable = Category |
|-----------|-------------------------------|
| A-L       | Type of Count = Sheltered-Only Count  |
| M-Z       | Type of Count = Sheltered and full unsheltered  |

In [24]:
# Print all the category names.
# Use this list to ensure correct spelling of your category.

print("... category names")                #Replace ... with the variable name written out
print("--------------------------------")
freq_table = pd.Series(df['...']).value_counts()        #Replace ... with the variable name
print(freq_table)



... category names
--------------------------------


KeyError: '...'
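
To see what `value_counts()` returns before filling in the cell above, here is a minimal sketch on a made-up column. The DataFrame `toy` and its `Color` column are hypothetical and have nothing to do with the homelessness data.

```python
import pandas as pd

# Hypothetical data: six observations of a qualitative variable.
toy = pd.DataFrame({"Color": ["red", "blue", "red", "green", "red", "blue"]})

# value_counts() tallies each category and sorts from most to least frequent.
toy_freq = toy["Color"].value_counts()

print("Color category names")
print("--------------------------------")
print(toy_freq)
```

The category labels in the printed index are the exact strings to use when looking up a single category later.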

# **QUESTION 1**
## Confidence Interval

**Last Names A-L:** Construct and interpret the 94% confidence interval for the population proportion of counties for which the type of count is sheltered-only count.

**Last Names M-Z:** Construct and interpret the 94% confidence interval for the population proportion of counties for which the type of count is sheltered and full unsheltered.



**1.1) Parameter: Define the parameter, using correct notation.**

...

**1.2) Method: Name the method you will use.**

...

**1.3) Assumptions:**

Complete the code below to find out how many counties fall under the category assigned to you.

In [23]:
# Count total observations
n = len(df)

# Count total successes
# Replace the 1st ... with the variable name
# Replace the 2nd ... with the name of the category to be analyzed
obs_count = df['...'].value_counts().get('...')

print(f"{obs_count} out of {n} counties have ....") #Replace ... with the name of your variable's category.


KeyError: '...'

**Show that both assumptions are met.**

...
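
The large-sample condition is commonly checked by counting successes and failures in the sample. The sketch below does this for made-up counts; the names `toy_n` and `toy_successes` are hypothetical, and the threshold of 10 is an assumption, so use whichever threshold your course materials specify.

```python
# Hypothetical sample: 45 successes out of 120 observations (invented numbers).
toy_n = 120
toy_successes = 45
toy_failures = toy_n - toy_successes

# Assumed threshold; some courses use a different value.
threshold = 10

# Both counts must reach the threshold for the normal approximation.
large_enough = toy_successes >= threshold and toy_failures >= threshold

print(f"successes = {toy_successes}, failures = {toy_failures}")
print(f"Large-sample condition met: {large_enough}")
```

The random-sampling assumption cannot be checked in code; it comes from how the data were collected.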

**1.4) Calculate: Complete the code below to calculate the sample proportion of counties that fall under your assigned category, and the confidence interval.**

In [22]:
# Define the confidence level
# Replace the ... with the stated confidence level, as a decimal (ex: 0.83, not 83%)
CL = ...

#Calculate the values needed; p-hat, critical value (CV), and standard error (se).
p_hat = obs_count / n
cv = norm.ppf((1+CL)/2)
se = np.sqrt(p_hat * (1-p_hat) / n)

#Calculate the bounds of the interval
ci_lower = (p_hat - cv * se)
ci_upper = (p_hat + cv * se)

print(f"p-hat = {obs_count}/{n} = {p_hat.round(5)}")
print(f"The {CL*100}% CI is ({ci_lower.round(5)}, {ci_upper.round(5)})")


TypeError: unsupported operand type(s) for +: 'int' and 'ellipsis'
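
As a point of comparison, here is the same calculation on made-up numbers: 40 successes out of 100 at a 90% confidence level. Every `toy_` name and value is hypothetical and chosen only to illustrate the steps; none of them are the project's answers.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical inputs (invented for illustration).
toy_count, toy_n = 40, 100
toy_CL = 0.90

# Sample proportion.
toy_p_hat = toy_count / toy_n

# Critical value: (1 + CL) / 2 leaves (1 - CL) / 2 in each tail.
toy_cv = norm.ppf((1 + toy_CL) / 2)

# Standard error of the sample proportion.
toy_se = np.sqrt(toy_p_hat * (1 - toy_p_hat) / toy_n)

# Interval bounds: point estimate plus or minus critical value times standard error.
toy_lower = toy_p_hat - toy_cv * toy_se
toy_upper = toy_p_hat + toy_cv * toy_se

print(f"critical value = {toy_cv:.4f}")
print(f"CI = ({toy_lower:.4f}, {toy_upper:.4f})")
```

The `toy_` prefix keeps these names from overwriting `p_hat`, `cv`, or `se` if the sketch is run in the same notebook.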

**1.5) Communicate Results: Interpret the confidence interval calculated in 1.4 above. Round to three (3) decimal places.**

...

**1.6) Show work to calculate the margin of error. Then interpret the margin of error.**

...
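
The margin of error is half the width of the interval, and the point estimate sits at its midpoint. A minimal sketch with a made-up interval (the bounds 0.25 and 0.35 are hypothetical):

```python
# Hypothetical confidence interval bounds (invented for illustration).
toy_lower, toy_upper = 0.25, 0.35

# The point estimate is the midpoint of the interval.
toy_point_estimate = (toy_lower + toy_upper) / 2

# The margin of error is half the interval width.
toy_margin = (toy_upper - toy_lower) / 2

print(f"point estimate = {toy_point_estimate:.3f}")
print(f"margin of error = {toy_margin:.3f}")
```

Equivalently, the margin of error is the critical value multiplied by the standard error.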



# **QUESTION 2**

## **Hypothesis Test**

**A-L:** There are 3 categories of the type of count variable. If each type of count were equally likely, then 1/3 of counties would be classified as sheltered-only count. Based on this data set, is there convincing evidence that the population proportion of counties for which the type of count is sheltered-only count is different from 1/3? Use α=0.04. Write up the solution using the PMACC procedure. NOTE: Use 1/3, not the decimal approximation.

**M-Z:** There are 3 categories of the type of count variable. If each type of count were equally likely, then 1/3 of counties would be classified as sheltered and full unsheltered count. Based on this data set, is there convincing evidence that the population proportion of counties for which the type of count is sheltered and full unsheltered count is different from 1/3? Use α=0.04. Write up the solution using the PMACC procedure. NOTE: Use 1/3, not the decimal approximation.

**2.1) Parameter: Define the parameter, using correct notation.**

...

**2.2) Method: Name the method you will use, and write the hypotheses.**

**Method name:**

...

**Hypotheses:**

...

**2.3) Assumptions: Show that both assumptions are met. Round to 1 decimal place.**

...



**2.4) Calculate: Complete the code below to calculate the values required.**

In [21]:
#Define p0, the value in H0.
p_0 = ...     #Replace ... with p0.

#Calculate the values needed; p-hat, and standard error (se).
p_hat = obs_count / n
se = np.sqrt(p_0 * (1-p_0) / n)

#Calculate the z-score of our p-hat, under the assumption H0 is true.
z_score = (p_hat - p_0) / se

#Calculate the p-value for 1- and 2-sided tests
p_value1 = (1 - norm.cdf(abs(z_score)))
p_value2 = 2 * p_value1

print(f"p-hat = {obs_count}/{n} = {p_hat.round(7)}")
print(f"z-score = {z_score.round(7)}")
print(f"1 sided p-value = {p_value1:.11f}")
print(f"2 sided p-value = {p_value2:.11f}")


TypeError: unsupported operand type(s) for -: 'int' and 'ellipsis'
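
To illustrate the mechanics with numbers unrelated to the project, the sketch below tests a made-up sample of 60 successes out of 100 against a hypothetical null value of 0.5 at a hypothetical significance level of 0.05. All `toy_` names and values are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical inputs (invented for illustration).
toy_count, toy_n = 60, 100
toy_p0 = 0.5
toy_alpha = 0.05

toy_p_hat = toy_count / toy_n

# Under H0 the standard error uses p0, not p-hat.
toy_se0 = np.sqrt(toy_p0 * (1 - toy_p0) / toy_n)

# Number of standard errors between p-hat and p0.
toy_z = (toy_p_hat - toy_p0) / toy_se0

# Two-sided p-value: area in both tails beyond |z|.
toy_p_two = 2 * (1 - norm.cdf(abs(toy_z)))

# Reject H0 when the p-value is below the significance level.
toy_reject = toy_p_two < toy_alpha

print(f"z = {toy_z:.4f}, two-sided p-value = {toy_p_two:.4f}")
print(f"Reject H0 at alpha = {toy_alpha}: {toy_reject}")
```

A two-sided p-value matches a "different from" alternative; a one-sided p-value matches a "greater than" or "less than" alternative.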

**2.5) Communicate Results: What conclusion is made about the null hypothesis? And what does that mean about the alternative hypothesis?**

...

# **QUESTION 3**

## **Do you make the same conclusion if you use the confidence interval?**

**In Question 2 you concluded that we either do or do not have convincing evidence for the alternative hypothesis. Using your confidence interval from Question 1, do you reach the same conclusion?**

...
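
One way to make this comparison is to check whether the hypothesized value falls inside the interval: a value outside the interval is not a plausible value for the parameter at that confidence level. The sketch below uses a hypothetical helper, `interval_contains`, and made-up numbers. The comparison is most direct when the confidence level equals 1 − α, and it is approximate because the interval's standard error uses p-hat while the test's uses p0.

```python
def interval_contains(value, lower, upper):
    """Return True when value lies inside the closed interval [lower, upper]."""
    return lower <= value <= upper

# Hypothetical interval and null value (invented for illustration).
toy_lower, toy_upper = 0.319, 0.481
toy_p0 = 0.5

inside = interval_contains(toy_p0, toy_lower, toy_upper)

if inside:
    print("p0 is inside the interval: it remains a plausible value.")
else:
    print("p0 is outside the interval: the data suggest the proportion differs from p0.")
```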

# **QUESTION 4**

Write a paragraph of at least 100 words to address one of the following questions. That is, answer only 4a or 4b, but not both.

**4a)** Discuss how analyzing your chosen data set using statistical methods could help you become better prepared for future courses in your major.

...

--OR--

**4b)** Discuss how analyzing your chosen data set using statistical methods could be instrumental in becoming better prepared for your future career.

...


<br><br>
### Once you are done and ready to submit, follow the instructions below to save as a PDF and submit to GradeScope.

### Save as PDF
Note 1: You do not have to select Print Preview. You can print directly from the notebook.

Note 2: Image and graph sizes have been set so you should be able to see them correctly without changing the browser width or the layout (portrait vs. landscape).

1. Run all code one last time and make sure your graphs can be seen.
2. Select File -> Print (or Ctrl-P/Cmd-P).
3. Change the "Destination" to PDF.
4. Save the PDF, taking note of where it is saved.

### Submit to GradeScope
**Watch the "GradeScope Submission" video for help.**
1. Log in to the Canvas course.
2. Click on GradeScope in the course navigation.
3. If you see multiple courses in GradeScope, click on the STAT 108 course.
4. Click on the name of the assignment that matches your data set.
5. Click on "Submit Work" and select PDF.
6. Select the PDF you just created.
7. You need to tell GradeScope which page each problem answer/output is on. You should see a list of problems on the left, and a display of pages (thumbnails) on the right. Assign pages to questions by clicking on the question number on the left, then clicking on all pages that question is on.
8. After ALL questions have been assigned to their respective page(s), click "Submit".

#### **Still need help? Your STAT 108 team is here to help. Take your laptop to office hours.**
