# Import Libraries

In [1]:
import pandas as pd
import scipy.stats as stats
from scipy.stats import pearsonr

# Load the DataFrame

In [2]:
healthcare_df = pd.read_csv('barcelona_healthcare_df.csv')
healthcare_df

Unnamed: 0,neighborhood_name,district_name,median_income,medical_center,type,admin,population,residential_area_ha,net_density_hab_ha,none,private,public,medical_center_count,med_centers_per_pop,med_centers_per_density,income_quartile,private_proportion
0,Baró de Viver,Sant Andreu,29438,,,,2645,3.9,673,1,0,0,0,0.000000,0.000000,Q1,0.000000
1,Can Baró,Horta-Guinardó,45922,,,,9233,14.3,645,1,0,0,0,0.000000,0.000000,Q2,0.000000
2,Can Peguera,Nou Barris,28117,,,,2204,6.1,359,1,0,0,0,0.000000,0.000000,Q1,0.000000
3,Canyelles,Nou Barris,43158,,,,6797,11.0,620,1,0,0,0,0.000000,0.000000,Q2,0.000000
4,Ciutat Meridiana,Nou Barris,29393,,,,11026,15.1,729,1,0,0,0,0.000000,0.000000,Q1,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,la Vila Olímpica del Poblenou,Sant Martí,83317,Centre d'Atenció Primària Vila Olímpica,CAP,Public,9240,24.3,380,0,0,1,1,0.000108,0.002632,Q4,0.000000
69,la Vila de Gràcia,Gràcia,50733,"Clínica Sanza, Centre d'Atenció Primària Vila...","Other, CAP, Other","Private, Public, Private",49492,84.0,589,0,2,1,3,0.000061,0.005093,Q3,0.666667
70,les Corts,Les Corts,65816,"Clínica Institut Marquès, Hospital de Barcelon...","Other, Hospital, Hospital, CAP","Private, Private, Private, Public",45422,64.7,702,0,3,1,4,0.000088,0.005698,Q4,0.750000
71,les Roquetes,Nou Barris,31584,Centre d'Atenció Primària Roquetes,CAP,Public,16373,18.2,897,0,0,1,1,0.000061,0.001115,Q1,0.000000


# Hypothesis testing 1: Pearson

### Explore whether higher income neighborhoods have a higher number of medical centers (medical_center_count).

In the context of urban studies and public health, it is crucial to understand the dynamics between socioeconomic factors and healthcare facility distribution. This project focuses on Barcelona's diverse neighborhoods, analyzing a potential relationship between the neighborhoods' median income and the accessibility to healthcare services, represented by the quantity of medical centers and hospitals. 

The central question driving this analysis is: "What is the relationship between the median income of Barcelona's neighborhoods and the availability of healthcare facilities within these areas?"

To address this question, I hypothesize a positive correlation between the median income and the number of healthcare facilities. In other words, it's presumed that neighborhoods with higher median incomes are likely to have a greater number of medical centers and hospitals.

To test this hypothesis, I use a Pearson correlation analysis, which measures the strength and direction of the linear relationship between two numeric variables. Note that medical_center_count is a discrete count rather than a truly continuous variable, so the coefficient should be read as a summary of linear association. The test will show whether the data support the hypothesis or leave it unsubstantiated.
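As a minimal sketch of what the Pearson coefficient measures, the snippet below computes it from its definition (the covariance of the two variables divided by the product of their standard deviations) and compares the result with `scipy.stats.pearsonr`. The helper name `pearson_r` and the toy values are illustrative assumptions, not part of the Barcelona dataset.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical toy values, not taken from the Barcelona dataset
income = np.array([29000.0, 45000.0, 28000.0, 83000.0, 50000.0, 65000.0])
centers = np.array([0.0, 0.0, 0.0, 1.0, 3.0, 4.0])

def pearson_r(x, y):
    """Pearson r from its definition: co-deviation over the product of spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_manual = pearson_r(income, centers)
r_scipy, _ = pearsonr(income, centers)

print(f'manual: {r_manual:.6f}, scipy: {r_scipy:.6f}')
```

Both values should agree up to floating-point error, and the coefficient always lies between -1 and 1.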

## Null Hypothesis (H0) and Alternative Hypothesis (H1)

**Null Hypothesis (H0):** There is no correlation between the median income of neighborhoods in Barcelona and the number of healthcare facilities within these areas. This means that median income is not linearly associated with the number of medical centers and hospitals.

**Alternative Hypothesis (H1):** There is a positive correlation between the median income of neighborhoods in Barcelona and the number of healthcare facilities within these areas. This suggests that neighborhoods with higher median incomes tend to have more medical centers and hospitals.

### Step-by-step process and the corresponding code

In [3]:
# Calculate the Pearson correlation coefficient between median income and number of medical centers
correlation_coef, p_value = pearsonr(healthcare_df['median_income'], healthcare_df['medical_center_count'])
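The alternative hypothesis above is directional (a positive correlation), while `pearsonr` performs a two-sided test by default. Below is a minimal sketch of the one-sided variant, assuming SciPy 1.9 or later (where `pearsonr` accepts an `alternative` argument) and using made-up toy values rather than the notebook's data.

```python
from scipy.stats import pearsonr

# Hypothetical toy values, chosen only so that the correlation is positive
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

# Default: two-sided test (H1: correlation differs from zero)
r_two_sided, p_two_sided = pearsonr(x, y)

# One-sided test (H1: correlation is greater than zero)
r_one_sided, p_one_sided = pearsonr(x, y, alternative='greater')

print(f'r = {r_two_sided:.3f}')
print(f'two-sided p = {p_two_sided:.5f}, one-sided p = {p_one_sided:.5f}')
```

When the observed coefficient is positive, the one-sided p-value is half the two-sided one, so a result that is significant in the two-sided test stays significant in the one-sided test.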

### Pearson correlation coefficient and the p-value

In [4]:
# Print out the Pearson correlation coefficient and the p-value
print(f'Pearson Correlation Coefficient: {correlation_coef}')
print(f'P-value: {p_value}')

Pearson Correlation Coefficient: 0.46269441929168725
P-value: 3.7639366355539245e-05


**Pearson Correlation Coefficient: 0.46**

This value is the Pearson correlation coefficient, which measures the strength and direction of the linear relationship between two continuous variables. In this case, the value is approximately 0.46, suggesting a **moderate positive correlation between the two variables**.
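One common way to make "moderate" more concrete is to square the coefficient, which gives the share of variance in one variable that a straight-line fit on the other would account for. A small sketch using the coefficient printed above (the variable name `r_squared` is introduced here for illustration):

```python
# Coefficient copied from the printed output above
correlation_coef = 0.46269441929168725

# Squared correlation: share of variance captured by a linear fit
r_squared = correlation_coef ** 2

print(f'r squared: {r_squared:.3f}')
```

A value of about 0.21 means that a linear relationship with median income accounts for roughly a fifth of the variation in the number of medical centers, leaving most of it to other factors.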

**P-value: 3.76e-05**

This value is the p-value: the probability of observing a correlation at least as strong as this one if the null hypothesis were true, that is, if the two variables were uncorrelated in the population. The lower the p-value, the stronger the evidence against the null hypothesis. In this case, the **p-value is very small**, far below the 0.05 threshold, so a correlation this large would be very unlikely under the null hypothesis. Therefore, we can conclude that **there is a statistically significant correlation between the two variables**.
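For a Pearson test, the p-value can be recovered from the coefficient and the sample size through a t statistic with n - 2 degrees of freedom. The sketch below assumes n = 73 (Barcelona has 73 neighborhoods, but the notebook never prints the row count and the table display is truncated), and the helper name `pearson_p_value` is hypothetical.

```python
import math
from scipy import stats

def pearson_p_value(r, n):
    """Two-sided p-value for a Pearson r computed from n paired observations."""
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t_stat), df=n - 2)

# r as printed above; n = 73 is an assumption, not a value shown in the notebook
p_from_t = pearson_p_value(0.46269441929168725, 73)

print(f'p-value from t statistic: {p_from_t:.3e}')
```

If the assumed sample size is right, the result should land close to the p-value that `pearsonr` printed. A noticeably different value would suggest the DataFrame has a different number of rows.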

### Results

In [5]:
# Interpret the results
# If the p-value is less than 0.05, the correlation is statistically significant
if p_value < 0.05:
    print("The null hypothesis is rejected, indicating a statistically significant correlation between median income and the number of medical centers.")
else:
    print("The null hypothesis cannot be rejected, indicating no statistically significant correlation between median income and the number of medical centers.")

The null hypothesis is rejected, indicating a statistically significant correlation between median income and the number of medical centers.


**The rejection of the null hypothesis (H0) implies that the data provides evidence in favor of the alternative hypothesis (H1).**

In simpler terms, it means that **there is a statistically significant relationship between higher median incomes and a greater number of medical centers in neighborhoods**.

The total count of medical centers directly reflects the absolute availability of healthcare facilities in a neighborhood, which aligns well with the initial hypothesis regarding the relationship between income and healthcare facility availability.

On the other hand, med_centers_per_density relates the count to net population density (net_density_hab_ha, inhabitants per hectare), providing a measure of healthcare availability relative to how densely the area is populated. This metric could be more indicative of the accessibility of medical services for the residents.

I chose medical_center_count first for its simplicity and directness, because the main question concerns the absolute availability of healthcare resources rather than a per-capita or density-based measure. However, to check the accessibility of healthcare relative to population needs, I will also test med_centers_per_density, which might be a more appropriate variable.
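The notebook does not show how the two relative metrics were built, but the displayed rows are consistent with simple ratios: the count divided by population, and the count divided by net density. The sketch below reproduces them for four rows copied from the table at the top. The formulas are an inference from those rows, not the confirmed derivation, and the output column names `per_pop` and `per_density` are illustrative.

```python
import pandas as pd

# Four rows copied from the DataFrame display at the top of the notebook
sample = pd.DataFrame({
    'neighborhood_name': ['la Vila Olímpica del Poblenou', 'la Vila de Gràcia',
                          'les Corts', 'les Roquetes'],
    'population': [9240, 49492, 45422, 16373],
    'net_density_hab_ha': [380, 589, 702, 897],
    'medical_center_count': [1, 3, 4, 1],
})

# Assumed formulas: count per inhabitant and count per unit of net density
sample['per_pop'] = sample['medical_center_count'] / sample['population']
sample['per_density'] = sample['medical_center_count'] / sample['net_density_hab_ha']

print(sample[['neighborhood_name', 'per_pop', 'per_density']].round(6))
```

Note that les Roquetes and la Vila de Gràcia have almost the same per-capita value but very different density-based values, which is why the two metrics can rank neighborhoods differently.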

# Hypothesis testing 2: Pearson

### Examine if the number of medical centers relative to population density (med_centers_per_density) is associated with higher median income.

## Null Hypothesis (H0) and Alternative Hypothesis (H1)

**Null Hypothesis (H0):** There is no correlation between the median income of neighborhoods in Barcelona and the number of medical centers per unit of population density. This means that the median income does not significantly predict the medical centers' density.

**Alternative Hypothesis (H1):** There is a positive correlation between the median income of neighborhoods in Barcelona and the number of medical centers per unit of population density. This implies that neighborhoods with higher median incomes have more medical centers relative to their population density.

### Step-by-step process and the corresponding code

In [6]:
# scipy.stats was already imported in In [1]; repeated here so this section can run on its own
import scipy.stats as stats

In [7]:
# Calculate the Pearson correlation coefficient and p-value
correlation_coef, p_value = stats.pearsonr(healthcare_df['median_income'], healthcare_df['med_centers_per_density'])

# Output the Pearson correlation coefficient and the p-value
print(f"Pearson Correlation Coefficient: {correlation_coef}")
print(f"P-value: {p_value}")

Pearson Correlation Coefficient: 0.47169436787919383
P-value: 2.5250981467307166e-05


#### Pearson Correlation Coefficient: 0.47

The coefficient is approximately 0.47, indicating a **moderate positive linear correlation between median income and the number of medical centers per population density**.

#### P-value: 2.53e-05

The extremely small p-value means the observed correlation is highly statistically significant: a coefficient this large would be very unlikely if median income and the density-adjusted count were uncorrelated. This is **strong evidence of a non-random linear association between median income and the number of medical centers per population density**, although it does not by itself establish a causal link.

### Results

In [8]:
# Interpret the results
if p_value < 0.05:
    print("The null hypothesis is rejected, indicating a statistically significant correlation between median income and medical centers per population density.")
else:
    print("The null hypothesis cannot be rejected, indicating no statistically significant correlation between median income and medical centers per population density.")

The null hypothesis is rejected, indicating a statistically significant correlation between median income and medical centers per population density.


**The rejection of the null hypothesis (H0) implies that the data provides evidence in favor of the alternative hypothesis (H1).**

In simpler terms, it means that **there is a statistically significant relationship between higher median incomes and a greater number of medical centers relative to population density**.

# Additional Hypothesis Tests

Given the focus on understanding the socioeconomic factors influencing healthcare distribution, two further hypothesis tests seem most relevant: the correlation between income and the type of medical centers (public or private), and the influence of population density on healthcare access.

Both tests directly address socioeconomic disparities in healthcare access and provision.
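Every test in this notebook follows the same three steps: run `pearsonr` on two columns, report the coefficient and p-value, and compare the p-value with 0.05. A minimal sketch of a reusable helper is shown below. The function name `report_pearson`, its return format, and the toy DataFrame are all hypothetical and not part of the original analysis.

```python
import pandas as pd
from scipy import stats

def report_pearson(df, x_col, y_col, alpha=0.05):
    """Run a two-sided Pearson test on two columns and summarise the outcome."""
    r, p = stats.pearsonr(df[x_col], df[y_col])
    return {
        'x': x_col,
        'y': y_col,
        'r': float(r),
        'p_value': float(p),
        'reject_h0': bool(p < alpha),  # True when the result is significant
    }

# Hypothetical toy frame that reuses the notebook's column names
toy = pd.DataFrame({
    'median_income': [28000, 29000, 31000, 43000, 46000, 51000, 66000, 83000],
    'medical_center_count': [0, 0, 1, 0, 0, 3, 4, 1],
})

result = report_pearson(toy, 'median_income', 'medical_center_count')
print(result)
```

With the real data, the same call could be repeated for each pair of columns tested in this notebook, which keeps the decision rule identical across tests.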

# Hypothesis testing 3: Pearson

### Correlation between Median Income and Type (public or private) of Medical Centers.

## Null Hypothesis (H0) and Alternative Hypothesis (H1)

**Null Hypothesis (H0):** There is no correlation between the median income of neighborhoods and the proportion of their medical centers that are private.

**Alternative Hypothesis (H1):** There is a correlation between median income and the proportion of private medical centers, so neighborhoods with higher median incomes have a different mix of private and public centers than neighborhoods with lower median incomes.

### Step-by-step process and the corresponding code

In [9]:
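# Calculate the proportion of private medical centers.
# In the rows shown, 'none' is 1 only for neighborhoods without any center, so keeping it
# in the denominator avoids a 0/0 division and gives those neighborhoods a proportion of 0.
healthcare_df['private_proportion'] = healthcare_df['private'] / (healthcare_df['private'] + healthcare_df['public'] + healthcare_df['none'])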

In [10]:
healthcare_df[['private_proportion']]

Unnamed: 0,private_proportion
0,0.000000
1,0.000000
2,0.000000
3,0.000000
4,0.000000
...,...
68,0.000000
69,0.666667
70,0.750000
71,0.000000
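The denominator used above includes the `none` column, which matters for neighborhoods without any medical center. The sketch below contrasts that choice with a stricter definition that divides only by private plus public centers. It uses four rows copied from the table at the top of the notebook, and the names `with_none` and `strict` are illustrative.

```python
import numpy as np
import pandas as pd

# Four rows copied from the DataFrame display at the top of the notebook
rows = pd.DataFrame({
    'neighborhood_name': ['Baró de Viver', 'la Vila de Gràcia', 'les Corts', 'les Roquetes'],
    'none': [1, 0, 0, 0],
    'private': [0, 2, 3, 0],
    'public': [0, 1, 1, 1],
})

# Notebook's definition: neighborhoods without centers get a proportion of 0
with_none = rows['private'] / (rows['private'] + rows['public'] + rows['none'])

# Stricter alternative: 0/0 becomes NaN, i.e. the proportion is undefined
strict = rows['private'] / (rows['private'] + rows['public'])

print(pd.DataFrame({'with_none': with_none, 'strict': strict}))
```

The two definitions agree wherever at least one center exists. They differ only in whether neighborhoods with no centers count as "0% private" or as missing, and that choice changes which rows enter the correlation.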


In [11]:
# Perform a Pearson correlation test
correlation_coef, p_value = stats.pearsonr(healthcare_df['median_income'], healthcare_df['private_proportion'])

# Print out the Pearson correlation coefficient and the p-value
print(f'Pearson Correlation Coefficient: {correlation_coef}')
print(f'p-value: {p_value}')

Pearson Correlation Coefficient: 0.41550925852958875
p-value: 0.0002565906044986082


#### Pearson Correlation Coefficient: 0.42

The coefficient suggests a **moderate positive relationship between the median income of neighborhoods and the proportion of private medical centers**. This means that **as the median income in a neighborhood increases, the proportion of private medical centers also tends to increase**.

#### P-value: 0.00026

The p-value is less than the typical alpha level of 0.05, indicating that **the result is statistically significant**. Therefore, we can reject the null hypothesis, which stated that there was no significant relationship between median income and the proportion of private versus public medical centers.

### Results

In [12]:
# Interpret the results
if p_value < 0.05:
    print("The null hypothesis is rejected, indicating a statistically significant difference in the proportion of private vs. public medical centers across neighborhoods with varying median incomes.")
else:
    print("The null hypothesis cannot be rejected, indicating no statistically significant difference in the proportion of private vs. public medical centers across neighborhoods with varying median incomes.")

The null hypothesis is rejected, indicating a statistically significant difference in the proportion of private vs. public medical centers across neighborhoods with varying median incomes.


#### Insight: 

This significant correlation suggests that **wealthier neighborhoods in Barcelona are likely to have a higher proportion of private medical centers relative to public ones, which could reflect economic disparities in access to different types of healthcare facilities**. 

It implies that **in neighborhoods with higher incomes, residents might have greater access to private healthcare services.**

# Hypothesis testing 4: Pearson

### Influence of Population Density on Healthcare Access

## Null Hypothesis (H0) and Alternative Hypothesis (H1)

**Null Hypothesis (H0):** There is no correlation between population density and the total number of healthcare facilities in neighborhoods.

**Alternative Hypothesis (H1):** There is a significant correlation between population density and the total number of healthcare facilities, with more densely populated neighborhoods having more healthcare facilities.

### Step-by-step process and the corresponding code

In [13]:
# Perform a Pearson correlation test
correlation_coef, p_value = stats.pearsonr(healthcare_df['net_density_hab_ha'], healthcare_df['medical_center_count'])

# Print out the Pearson correlation coefficient and the p-value
print(f'Pearson Correlation Coefficient: {correlation_coef}')
print(f'p-value: {p_value}')

Pearson Correlation Coefficient: -0.09551118358468128
p-value: 0.42151071945081625


#### Pearson Correlation Coefficient: -0.096

This coefficient value is close to zero and negative, indicating a **very weak inverse relationship between population density and the total number of healthcare facilities**. This means that **as population density increases, the total number of healthcare facilities does not necessarily increase** and may decrease slightly.

#### P-value: 0.42 

This value is much greater than the common alpha level of 0.05, which means that **a correlation this weak would be quite likely to appear by random chance even if density and facility count were unrelated in the population**.

### Results

In [14]:
# Interpret the results
if p_value < 0.05:
    print("The null hypothesis (H0) is rejected, indicating a statistically significant correlation between population density and the total number of healthcare facilities.")
else:
    print("The null hypothesis (H0) cannot be rejected, indicating no statistically significant correlation between population density and the total number of healthcare facilities.")

The null hypothesis (H0) cannot be rejected, indicating no statistically significant correlation between population density and the total number of healthcare facilities.


#### Interpretation of Results: 

Because the p-value is not less than 0.05, we fail to reject the null hypothesis. There is **no statistically significant evidence to suggest that population density is a predictor of the total number of healthcare facilities**. 

In practical terms, this means that **the analysis did not find a meaningful or systematic relationship between how densely populated a neighborhood is and how many healthcare facilities it has**.
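A confidence interval makes the "no meaningful relationship" reading more tangible than the p-value alone. The sketch below uses the Fisher z-transform, a standard approximation that is not part of the original analysis. It assumes n = 73 neighborhoods (the notebook never prints the row count), and the helper name `fisher_ci` is hypothetical.

```python
import math
from scipy import stats

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z-transform."""
    z = math.atanh(r)                       # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)             # standard error on the z scale
    z_crit = stats.norm.ppf(0.5 + confidence / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# r as printed above; n = 73 is an assumption, not a value shown in the notebook
lower, upper = fisher_ci(-0.09551118358468128, 73)

print(f'95% CI for r: ({lower:.2f}, {upper:.2f})')
```

Because the interval spans zero, the data are compatible with a weak negative relationship, no relationship, or a weak positive one, which matches the decision not to reject the null hypothesis.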

#### In terms of insights:

It implies that **other factors may play a more significant role in determining the number of healthcare facilities in a neighborhood** than simply how many people live there per unit area. It could be useful to look at other variables such as median income (as we did), the availability of land, zoning laws, or historical investment patterns to understand the distribution of healthcare facilities better.

### Check healthcare_df

In [15]:
healthcare_df

Unnamed: 0,neighborhood_name,district_name,median_income,medical_center,type,admin,population,residential_area_ha,net_density_hab_ha,none,private,public,medical_center_count,med_centers_per_pop,med_centers_per_density,income_quartile,private_proportion
0,Baró de Viver,Sant Andreu,29438,,,,2645,3.9,673,1,0,0,0,0.000000,0.000000,Q1,0.000000
1,Can Baró,Horta-Guinardó,45922,,,,9233,14.3,645,1,0,0,0,0.000000,0.000000,Q2,0.000000
2,Can Peguera,Nou Barris,28117,,,,2204,6.1,359,1,0,0,0,0.000000,0.000000,Q1,0.000000
3,Canyelles,Nou Barris,43158,,,,6797,11.0,620,1,0,0,0,0.000000,0.000000,Q2,0.000000
4,Ciutat Meridiana,Nou Barris,29393,,,,11026,15.1,729,1,0,0,0,0.000000,0.000000,Q1,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,la Vila Olímpica del Poblenou,Sant Martí,83317,Centre d'Atenció Primària Vila Olímpica,CAP,Public,9240,24.3,380,0,0,1,1,0.000108,0.002632,Q4,0.000000
69,la Vila de Gràcia,Gràcia,50733,"Clínica Sanza, Centre d'Atenció Primària Vila...","Other, CAP, Other","Private, Public, Private",49492,84.0,589,0,2,1,3,0.000061,0.005093,Q3,0.666667
70,les Corts,Les Corts,65816,"Clínica Institut Marquès, Hospital de Barcelon...","Other, Hospital, Hospital, CAP","Private, Private, Private, Public",45422,64.7,702,0,3,1,4,0.000088,0.005698,Q4,0.750000
71,les Roquetes,Nou Barris,31584,Centre d'Atenció Primària Roquetes,CAP,Public,16373,18.2,897,0,0,1,1,0.000061,0.001115,Q1,0.000000


# Saving

In [16]:
# Save the DataFrame to a CSV file for later use (this overwrites the input file loaded in In [2])
healthcare_df.to_csv('barcelona_healthcare_df.csv', index=False)