# Dropping rows
The cleaning step removed rows with a negative value in any of three key columns: `median_gross_rent`, `median_home_value`, and `median_household_income`. None of these quantities can legitimately be negative, so a negative entry indicates incorrect or missing data (for example, a placeholder code) rather than a real measurement. Removing these rows dropped 192 entries, reducing the dataset from 793 to 601 rows. Every remaining row has a non-negative value in all three columns, so later analysis of rent, home value, and income is not distorted by placeholder values. The filter checks only these three columns and does not validate the rest of the dataset.

In [1]:
import pandas as pd


df = pd.read_csv("/content/Colorado_Utah_Demographics_Migration_Real_Estate.csv")


# Drop rows with negative values in key columns
cols_to_check = ['median_gross_rent', 'median_home_value', 'median_household_income']

# Keep only rows where every checked column is non-negative.
# Rows with NaN in a checked column are dropped too, because NaN >= 0 is False.
cleaned_data = df[(df[cols_to_check] >= 0).all(axis=1)]

# Check how many rows were dropped
rows_dropped = df.shape[0] - cleaned_data.shape[0]

cleaned_data.info(), rows_dropped

<class 'pandas.core.frame.DataFrame'>
Index: 601 entries, 0 to 777
Data columns (total 25 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   state                                     601 non-null    object 
 1   county                                    601 non-null    object 
 2   city                                      601 non-null    object 
 3   zip_code                                  601 non-null    int64  
 4   total_population                          601 non-null    int64  
 5   total_population_2024                     601 non-null    int64  
 6   median_age                                601 non-null    float64
 7   housing_units                             601 non-null    int64  
 8   median_gross_rent                         601 non-null    int64  
 9   median_home_value                         601 non-null    int64  
 10  median_household_income                   6

(None, 192)
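
To make the behaviour of the filter easier to inspect, here is a minimal sketch on a small, made-up DataFrame (the values are hypothetical and are not taken from the dataset). It counts the negative entries per column, applies the same `>= 0` mask as the cell above, and shows that a row with a missing value in a checked column is dropped as well.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "median_gross_rent": [1200, -1, 950, np.nan],
    "median_home_value": [350000, 410000, -1, 280000],
    "median_household_income": [72000, 65000, 58000, 61000],
})
cols_to_check = ["median_gross_rent", "median_home_value", "median_household_income"]

# Count negative entries per column to see which columns drive the drops.
negatives_per_column = (toy[cols_to_check] < 0).sum()

# Same filter as above: keep rows where every checked column is >= 0.
keep_mask = (toy[cols_to_check] >= 0).all(axis=1)
toy_cleaned = toy[keep_mask]

# NaN >= 0 evaluates to False, so the row with a missing rent is dropped too.
print(negatives_per_column)
print(toy_cleaned)
```

Running the per-column count on the real `df` before filtering would show whether the 192 dropped rows come mostly from one column or are spread across all three.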

In [2]:
cleaned_data.to_csv("cleaned_CO_UT_DMR.csv")
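
By default, `to_csv` writes the row index as an extra leading column. The filtered frame keeps its original, non-contiguous index (0 to 777), so it is worth deciding explicitly whether that index belongs in the file. The sketch below uses a hypothetical two-row frame and in-memory buffers instead of files to compare the two options.

```python
import io

import pandas as pd

# Hypothetical frame with a non-contiguous index, like a filtered DataFrame.
filtered = pd.DataFrame({"zip_code": [80014, 84101]}, index=[0, 777])

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
filtered.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False: only the data columns are written.
without_index = io.StringIO()
filtered.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_without_index.columns))
```

If the original row positions are needed later, keeping the index is reasonable; otherwise passing `index=False` avoids an `Unnamed: 0` column when the file is read back.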