
# **National Estimates Of Homelessness**


---



## **Context**

The Continuum of Care (CoC) Program is designed to promote communitywide commitment to the goal of ending homelessness; provide funding for efforts by nonprofit providers, and State and local governments to quickly rehouse homeless individuals and families while minimizing the trauma and dislocation caused to homeless individuals, families, and communities by homelessness; promote access to and effect utilization of mainstream programs by homeless individuals and families; and optimize self-sufficiency among individuals and families experiencing homelessness. For more information on the Program, please visit https://www.hudexchange.info/programs/coc/

The U.S. Department of Housing and Urban Development (HUD) provides Point-in-Time (PIT) count reports of sheltered and unsheltered persons experiencing homelessness, by household type and subpopulation. This data is available at the national and state level, and for each CoC. HUD also provides Housing Inventory Count (HIC) reports, which provide a snapshot of a CoC’s inventory of beds and units available on the night designated for the count by program type, and include beds dedicated to serve persons who are homeless as well as persons in Permanent Supportive Housing.

This raw data set contains PIT estimates of homelessness and the accompanying HIC data from 2021.

Attribution: Adapted from U.S. Department of Housing and Urban Development


---



## **About The Dataset**

This dataset contains 100 rows corresponding to a random sample of localities (typically counties or similar large regions) that received Continuum of Care funding from HUD. A total of 21 variables are provided, as listed in the table below.

| Variable Name(s) | Description |
|------------------|-------------|
| CoC Number <br> CoC Name | CoC locality identifier |
| CoC Category | Community setting designation: Rural, Urban, etc. |
| Type of Count | Description of whether only sheltered individuals or both sheltered and unsheltered individuals are included |
| Overall Homeless, 2021 | Total homeless count in all facilities indicated by the "Type of Count" variable |
| HMIS Participation Rate for Year-Round Beds (ES, TH, SH) | Percentage of facilities in the locality that participate in the Homeless Management Information System (HMIS) |
| Total Year-Round Beds (ES, TH, SH) <br> Total Year-Round Beds (ES) <br> Total Year-Round Beds (TH) <br> Total Year-Round Beds (SH) | (all types combined) <br> ES = Emergency Shelter <br> TH = Transitional Housing <br> SH = Safe Haven |
| Total Units For Households With Children (ES, TH, SH) <br> Total Beds For Households With Children (ES, TH, SH) | Number of **units** and **beds** designated for households with children |
| Sheltered ES Homeless 2021 | Estimate of the number of individuals sheltered in an Emergency Facility at the time of the study |
| Sheltered ES Homeless - Age 18 to 24, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by age |
| Sheltered ES Homeless - Female, 2021 <br> Sheltered ES Homeless - Male, 2021 <br> Sheltered ES Homeless - Trans+, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by gender |
| Sheltered ES Homeless - Hispanic/Latino, 2021 <br> Sheltered ES Homeless - White, 2021 <br> Sheltered ES Homeless - Black or African American, 2021 <br> Sheltered ES Homeless - Asian or Pacific Islander <br> Sheltered ES Homeless - American Indian or Alaska Native, 2021 <br> Sheltered ES Homeless - Multiple Races, 2021 | Estimate of the number of individuals sheltered in an Emergency Facility by race/ethnicity |




In [37]:
import numpy as np
import pandas as pd
import plotly.express as px

# Create a Dataframe using the necessary data for this assignment
df = pd.read_csv("https://raw.githubusercontent.com/talamo13/Intro-To-Data-Science-Assignments/Homelessness-%232/Homelessness_Data.csv")

df.iloc[0:5]

Unnamed: 0,CoC Number,CoC Name,CoC Category,Type of Count,"Overall Homeless, 2021","Total Year-Round Beds (ES, TH, SH)","HMIS Participation Rate for Year-Round Beds (ES, TH, SH)",Total Year-Round Beds (ES),Total Year-Round Beds (TH),Total Year-Round Beds (SH),...,"Sheltered ES Homeless - Age 18 to 24, 2021","Sheltered ES Homeless - Female, 2021","Sheltered ES Homeless - Male, 2021","Sheltered ES Homeless - Trans+, 2021","Sheltered ES Homeless - Hispanic/Latino, 2021","Sheltered ES Homeless - White, 2021","Sheltered ES Homeless - Black or African American, 2021",Sheltered ES Homeless - Asian or Pacific Islander,"Sheltered ES Homeless - American Indian or Alaska Native, 2021","Sheltered ES Homeless - Multiple Races, 2021"
0,LA-505,Monroe/Northeast Louisiana CoC,Largely Rural CoC,Sheltered-Only Count,78,133,0.233,69,64,0,...,4,21,19,0,0,17,19,0,4,0
1,MN-502,Rochester/Southeast Minnesota CoC,Largely Rural CoC,Sheltered-Only Count,419,599,0.604,378,221,0,...,18,110,150,1,42,186,51,9,3,12
2,PA-508,Scranton/Lackawanna County CoC,Largely Suburban CoC,Sheltered and full unsheltered,165,155,0.839,64,80,11,...,5,34,46,1,7,61,17,2,0,1
3,OK-503,Oklahoma Balance of State CoC,Largely Rural CoC,Sheltered-Only Count,125,149,0.295,135,14,0,...,24,49,75,1,52,88,25,1,8,3
4,OH-504,Youngstown/Mahoning County CoC,Other Largely Urban CoC,Sheltered-Only Count,62,198,0.192,187,11,0,...,6,41,13,2,21,25,19,1,0,11


# **Assignment 2**

## **Correlation and Regression Analysis**

This assignment is intended to explore the simple linear relationship between two quantitative variables in the data. Choose the dependent and independent variables based on your last name. Use SPSS to analyze the relationship between the two variables and complete each of the following questions. As appropriate, copy the SPSS output and paste it into the correct part below. For problems that require a written response, type the answer below.

| Last Name | Dependent Variable (y)                 | Independent Variable (x)          |
|-----------|-----------------------------------------|-----------------------------------|
| A-L       | Sheltered ES Homeless - Male, 2021     | Sheltered ES Homeless - Female, 2021 |
| M-Z       | Overall Homeless, 2021                 | Total Year-Round Beds (ES, TH, SH)   |


## **Question 1**

Construct a scatterplot of the two variables without the line of regression. How would you describe the relationship between the two variables? Is this what you expected?

### **A-L**

In [38]:
fig1AM = px.scatter(df, x='Sheltered ES Homeless - Female, 2021', y='Sheltered ES Homeless - Male, 2021',
                    title='Scatter Plot For Sheltered ES Homeless - Male, 2021 VS Sheltered ES Homeless - Female, 2021')
fig1AM

· There is a strong positive linear relationship between the estimate of the number of females sheltered in an emergency facility and the estimate of the number of males sheltered in an emergency facility.

· (Explanation of whether it was what the student expected.)

### **M-Z**

In [39]:
fig1MZ = px.scatter(df, x='Total Year-Round Beds (ES, TH, SH)', y='Overall Homeless, 2021',
                    title='Scatter Plot For Overall Homeless, 2021 VS Total Year-Round Beds (ES, TH, SH)')
fig1MZ

· There is a strong positive linear relationship between the total number of year-round beds and overall homelessness.

· (Explanation of whether it was what the student expected.)

## **Question 2**

Compute and interpret the value of the correlation coefficient between the two variables.

### **A-L**

In [40]:
# Use pandas and NumPy to compute the model-summary statistics and collect
# them in a one-row DataFrame with labeled columns

male = df['Sheltered ES Homeless - Male, 2021'].values      # dependent variable (y)
female = df['Sheltered ES Homeless - Female, 2021'].values  # independent variable (x)

R_AM = round(np.corrcoef(male, female)[0, 1], 3) # R
R2_AM = round(R_AM**2, 3) # R^2

n_AM = len(df) # number of observations (one row per CoC)
A_R2_AM = round(1 - (1 - R2_AM) * (n_AM - 1) / (n_AM - 1 - 1), 3) # Adjusted R^2 with 1 predictor

# Compute residuals from the fitted least squares line (y = male, x = female)
slope_AM, intercept_AM = np.polyfit(female, male, 1)
residuals = male - (intercept_AM + slope_AM * female)

# Compute the standard error of the estimate: sqrt(SSE / (n - 2))
std_error = round(np.sqrt(np.sum(residuals ** 2) / (n_AM - 2)), 3)
# Expected to match the SPSS value of 184.432

model_summary_AM = pd.DataFrame([[R_AM, R2_AM, A_R2_AM, std_error]],columns=['R','R Square','Adj. R Square','Std. Error Of Estimate'])


model_summary_AM

Unnamed: 0,R,R Square,Adj. R Square,Std. Error Of Estimate
0,0.974,0.949,0.948,184.432


· There is a strong positive linear relationship (r = 0.974) between the estimate of the number of females sheltered in an emergency facility and the estimate of the number of males sheltered in an emergency facility.
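As a rough illustration of what `np.corrcoef` is computing, the sketch below evaluates the Pearson correlation directly from its definition and compares the result with NumPy. The helper `pearson_r` and the toy arrays `x_demo` and `y_demo` are hypothetical names and values introduced only for this example; they are not taken from the homelessness data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: sum of cross-deviations divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical toy values, NOT from the assignment data
x_demo = [10, 20, 30, 40, 50]
y_demo = [12, 25, 31, 48, 52]

r_manual = pearson_r(x_demo, y_demo)
r_numpy = float(np.corrcoef(x_demo, y_demo)[0, 1])
print(round(r_manual, 3), round(r_numpy, 3))  # the two values should agree
```

A value close to +1, as here, indicates a strong positive linear relationship; a value close to -1 would indicate a strong negative one.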

### **M-Z**

In [41]:
# Use pandas and NumPy to compute the model-summary statistics and collect
# them in a one-row DataFrame with labeled columns

homeless = df['Overall Homeless, 2021'].values           # dependent variable (y)
beds = df['Total Year-Round Beds (ES, TH, SH)'].values   # independent variable (x)

R_MZ = round(np.corrcoef(homeless, beds)[0, 1], 3) # R
R2_MZ = round(R_MZ**2, 3) # R^2

n_MZ = len(df) # number of observations (one row per CoC)
A_R2_MZ = round(1 - (1 - R2_MZ) * (n_MZ - 1) / (n_MZ - 1 - 1), 3) # Adjusted R^2 with 1 predictor

# Compute residuals from the fitted least squares line (y = homeless, x = beds)
slope_MZ, intercept_MZ = np.polyfit(beds, homeless, 1)
residuals = homeless - (intercept_MZ + slope_MZ * beds)

# Compute the standard error of the estimate: sqrt(SSE / (n - 2))
std_error = round(np.sqrt(np.sum(residuals ** 2) / (n_MZ - 2)), 3)
# Expected to match the SPSS value of 359.756

model_summary_MZ = pd.DataFrame([[R_MZ, R2_MZ, A_R2_MZ, std_error]],columns=['R','R Square','Adj. R Square','Std. Error Of Estimate'])


model_summary_MZ

Unnamed: 0,R,R Square,Adj. R Square,Std. Error Of Estimate
0,0.983,0.966,0.966,359.756


There is a strong positive linear relationship (r = 0.983) between the total number of year-round beds and overall homelessness.
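The remaining columns of the model summary (R Square, Adj. R Square, and Std. Error Of Estimate) can be reproduced along the following lines. This is a minimal sketch for a regression with a single predictor; the function name `model_summary` and the toy arrays are assumptions made for illustration and do not come from the assignment data.

```python
import numpy as np

def model_summary(x, y):
    """Summary statistics for a one-predictor least squares fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = len(x), 1                               # p = number of predictors
    slope, intercept = np.polyfit(x, y, 1)         # least squares line
    residuals = y - (intercept + slope * x)        # observed minus predicted
    sse = float(np.sum(residuals ** 2))            # unexplained variation
    sst = float(np.sum((y - y.mean()) ** 2))       # total variation
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes extra predictors
    see = float(np.sqrt(sse / (n - p - 1)))        # standard error of estimate
    return {"R": float(np.sqrt(r2)) * float(np.sign(slope)),
            "R Square": r2,
            "Adj. R Square": adj_r2,
            "Std. Error Of Estimate": see}

# Hypothetical toy values, NOT from the assignment data
x_demo = [10, 20, 30, 40, 50]
y_demo = [12, 25, 31, 48, 52]

summary = model_summary(x_demo, y_demo)
print({name: round(value, 3) for name, value in summary.items()})
```

Note that `n` is the number of rows (observations), not the combined length of the two columns, and that the residuals come from the fitted line rather than from the correlation coefficient.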

## **Question 3**

Compute the least squares regression line describing the relationships between the dependent and independent variable. Add the regression line to the scatterplot. Then type out the prediction equation.

### **A-L**

In [42]:
# Computing the least squares regression line
slope_AM, intercept_AM = np.polyfit(female, male, 1)

regression_line_AM = slope_AM * df['Sheltered ES Homeless - Female, 2021'] + intercept_AM

fig2_AM = fig1AM.add_scatter(x=df['Sheltered ES Homeless - Female, 2021'], y=regression_line_AM, mode='lines', name='Regression Line')

fig2_AM

Predicted y = 52.647 + 1.101x

### **M-Z**

In [43]:
# Computing the least squares regression line
slope_MZ, intercept_MZ = np.polyfit(beds, homeless, 1)

regression_line_MZ = slope_MZ * df['Total Year-Round Beds (ES, TH, SH)'] + intercept_MZ

fig2_MZ = fig1MZ.add_scatter(x=df['Total Year-Round Beds (ES, TH, SH)'], y=regression_line_MZ, mode='lines', name='Regression Line')

fig2_MZ

Predicted y = 126.718 + 0.825x
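For reference, the slope and intercept returned by `np.polyfit(x, y, 1)` can also be obtained from the closed-form least squares formulas. The sketch below uses a hypothetical helper, `least_squares_line`, and toy data that lie exactly on the line y = 1 + 2x; neither is part of the assignment data.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (slope, intercept) of the least squares line for y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)  # Sxy / Sxx
    intercept = y.mean() - slope * x.mean()                # line passes through the means
    return float(slope), float(intercept)

# Hypothetical toy values lying exactly on y = 1 + 2x
x_demo = [1, 2, 3, 4, 5]
y_demo = [3, 5, 7, 9, 11]

b1, b0 = least_squares_line(x_demo, y_demo)
print(f"Predicted y = {b0:.3f} + {b1:.3f}x")  # prints "Predicted y = 1.000 + 2.000x"
```

The prediction equation is always written as intercept plus slope times x, which is the form used in the answers above.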

## **Question 4**

Interpret the slope of the least squares regression line in the context of this study.

### **A-L**


We expect the estimate of the number of males sheltered in an emergency facility to increase by 1.101 for each increase of 1 in the estimate of the number of females sheltered in an emergency facility. (no units required)

### **M-Z**


We expect the overall homeless count to increase by 0.825 when the total number of year-round beds increases by 1. (no units required)

## **Question 5**

Interpret the y-intercept of the least squares regression line in the context of this study. State whether the interpretation is reasonable.

### **A-L**

· We expect the estimate of the number of males sheltered in an emergency facility to be 52.647 when the estimate of the number of females sheltered in an emergency facility is 0. (no units required)

· This makes sense.

### **M-Z**

· We expect the overall homeless count to be 126.718 when the total number of year-round beds is 0. (no units required)

· This makes sense.

## **Question 6**

Predict the value of your dependent variable from above for Los Angeles City & County CoC (CoC Number = CA-600) using the actual value of the independent variable shown in the table below. Type your work below.

| Last Name | Independent Variable (x) |
|-----------|--------------------------|
| A-L       | Sheltered ES Homeless - Female, 2021 = 6848 |
| M-Z       | Total Year-Round Beds (ES, TH, SH) = 21107 |


### **A-L**

Predicted y = 52.647 + 1.101(6848) = 7592.295 males

### **M-Z**

Predicted y = 126.718 + 0.825(21107) = 17,539.993 people
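The two predictions above can be checked with a few lines of Python. This is a minimal sketch: `predict` is a hypothetical helper name, while the intercepts, slopes, and x values are the ones given in Questions 3 and 6.

```python
def predict(intercept, slope, x):
    """Evaluate the prediction equation: predicted y = intercept + slope * x."""
    return intercept + slope * x

# A-L: predicted Sheltered ES Homeless - Male, 2021 for CA-600
pred_male = predict(52.647, 1.101, 6848)

# M-Z: predicted Overall Homeless, 2021 for CA-600
pred_overall = predict(126.718, 0.825, 21107)

print(round(pred_male, 3), round(pred_overall, 3))  # 7592.295 17539.993
```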

## **Question 7**

Look up, in the Excel or SPSS file, the actual value of your dependent variable for Los Angeles City & County CoC. Compare your answer in question 6 above (the predicted dependent variable) to the actual value of your dependent variable.

### **A-L**

(Actual Sheltered ES Homeless - Male, 2021 = 7200)

The actual number of sheltered homeless males is (slightly) less than predicted.

### **M-Z**

(Actual Overall Homeless, 2021 = 17225 people)

The actual number of overall homeless people is (slightly) lower than predicted.
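One way to put a number on "slightly" is to compute the residual (actual minus predicted) and express it as a percentage of the actual value. The sketch below uses the actual and predicted values quoted in Questions 6 and 7; the helper name `compare` is an assumption made for illustration.

```python
def compare(actual, predicted):
    """Return the residual (actual - predicted) and its size as a percent of actual."""
    residual = actual - predicted
    percent = residual / actual * 100
    return residual, percent

# A-L: Sheltered ES Homeless - Male, 2021 for CA-600
resid_male, pct_male = compare(7200, 7592.295)

# M-Z: Overall Homeless, 2021 for CA-600
resid_overall, pct_overall = compare(17225, 17539.993)

print(round(resid_male, 3), round(pct_male, 2))        # -392.295 -5.45
print(round(resid_overall, 3), round(pct_overall, 2))  # -314.993 -1.83
```

A negative residual means the regression line overpredicted: the actual value falls below the prediction, by roughly 5% for A-L and roughly 2% for M-Z.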

## **Question 8**

Generate a paragraph of at least 100 words to address one of the following questions:

a. Discuss how analyzing your chosen data set using statistical methods could help you become better prepared for future courses in your major.

b. Discuss how analyzing your chosen data set using statistical methods could be instrumental in becoming better prepared for your future career.

c. Discuss how analyzing your chosen data set using statistical methods could help you be aware of social issues, contribute to society, and advocate for marginalized communities.