<div align="center">
<h1>25 Pandas Coding Mistakes You Should Avoid</h1>

</div>
<div align="center">
Link for the original video: https://youtu.be/_gaAoJBMJ_Q
</div>

### 1. Writing to CSV with an unnecessary index column

```python
import pandas as pd
df = pd.read_csv('housing.csv')

# Passing `index=False` stops pandas from writing the row index
# as an extra unnamed column in the output CSV file
df.to_csv('output.csv', index=False)
```
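To see what goes wrong without `index=False`, here is a small round trip through an in-memory CSV. It is a sketch on a hypothetical toy DataFrame (`toy`), not the housing data; `io.StringIO` simply stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical toy data standing in for the housing DataFrame
toy = pd.DataFrame({'rooms': [3, 5], 'price': [100, 200]})

# With no path, to_csv returns the CSV text as a string
with_index = toy.to_csv()                # index written as an unnamed first column
without_index = toy.to_csv(index=False)

# Reading the first version back produces a spurious 'Unnamed: 0' column
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())         # ['Unnamed: 0', 'rooms', 'price']

reloaded_clean = pd.read_csv(io.StringIO(without_index))
print(reloaded_clean.columns.tolist())   # ['rooms', 'price']
```

If a file was already written with its index, `pd.read_csv(path, index_col=0)` reads that column back as the index instead of as data.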

### 2. Using column names that include spaces

```python
# Avoid spaces in column names; use underscores instead
df['bedroom_percentage'] = df['total_bedrooms'] / df['total_rooms']

# The column is then accessible via dot syntax
df.bedroom_percentage
```

```
0        0.228617
1        0.248497
2        0.241667
3        0.224517
4        0.224209
           ...   
16995    0.177718
16996    0.224777
16997    0.198356
16998    0.206587
16999    0.164835
Name: bedroom_percentage, Length: 17000, dtype: float64
```

```python
# It can also be used directly in query strings, without quoting
df.query('bedroom_percentage >= 0.7')
```

|  | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | bedroom_percentage |
|---|---|---|---|---|---|---|---|---|---|---|
| 17 | -114.65 | 32.79 | 21.0 | 44.0 | 33.0 | 64.0 | 27.0 | 0.8571 | 25000.0 | 0.75 |
| 2990 | -117.79 | 35.21 | 4.0 | 2.0 | 2.0 | 6.0 | 2.0 | 2.375 | 137500.0 | 1.0 |
| 6119 | -118.23 | 34.05 | 52.0 | 346.0 | 270.0 | 346.0 | 251.0 | 2.5313 | 225000.0 | 0.780347 |
| 6193 | -118.24 | 34.04 | 52.0 | 116.0 | 107.0 | 171.0 | 92.0 | 1.0769 | 112500.0 | 0.922414 |
| 6380 | -118.26 | 34.05 | 52.0 | 58.0 | 52.0 | 41.0 | 27.0 | 4.0972 | 500001.0 | 0.896552 |
| 6478 | -118.27 | 34.05 | 37.0 | 350.0 | 245.0 | 1122.0 | 248.0 | 2.7634 | 137500.0 | 0.7 |
| 8188 | -118.44 | 34.28 | 46.0 | 11.0 | 11.0 | 24.0 | 13.0 | 2.875 | 162500.0 | 1.0 |
| 11653 | -121.29 | 37.95 | 52.0 | 107.0 | 79.0 | 167.0 | 53.0 | 0.7917 | 22500.0 | 0.738318 |
| 12282 | -121.49 | 38.58 | 52.0 | 569.0 | 405.0 | 509.0 | 367.0 | 0.9196 | 137500.0 | 0.711775 |
| 13206 | -121.9 | 37.37 | 20.0 | 78.0 | 72.0 | 120.0 | 69.0 | 1.0938 | 187500.0 | 0.923077 |
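For comparison, here is what the same operations look like while a column name still contains a space. This is a sketch on a hypothetical two-row DataFrame; backtick quoting inside `query` assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical column name containing a space
toy = pd.DataFrame({'total rooms': [4, 10], 'total_bedrooms': [1, 3]})

# Dot access is impossible, and query needs backticks around the name
big = toy.query('`total rooms` > 5')

# Replacing spaces with underscores restores both conveniences
toy = toy.rename(columns=lambda name: name.replace(' ', '_'))
big_again = toy.query('total_rooms > 5')
print(toy.total_rooms.tolist())  # [4, 10]
```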


### 3. Not leveraging the `query` method

```python
df.query('housing_median_age > 50 and median_income < 1')
```

|  | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | bedroom_percentage |
|---|---|---|---|---|---|---|---|---|---|---|
| 1466 | -117.19 | 32.75 | 52.0 | 25.0 | 5.0 | 13.0 | 5.0 | 0.536 | 162500.0 | 0.2 |
| 1826 | -117.27 | 34.12 | 52.0 | 954.0 | 246.0 | 943.0 | 256.0 | 0.8658 | 87500.0 | 0.257862 |
| 2888 | -117.75 | 34.06 | 52.0 | 62.0 | 9.0 | 44.0 | 16.0 | 0.4999 | 112500.0 | 0.145161 |
| 6274 | -118.25 | 34.05 | 52.0 | 2806.0 | 1944.0 | 2232.0 | 1605.0 | 0.6775 | 350000.0 | 0.692801 |
| 6476 | -118.27 | 34.05 | 52.0 | 1292.0 | 864.0 | 2081.0 | 724.0 | 0.9563 | 275000.0 | 0.668731 |
| 6631 | -118.28 | 33.93 | 52.0 | 117.0 | 33.0 | 74.0 | 45.0 | 0.4999 | 90600.0 | 0.282051 |
| 11130 | -121.01 | 37.65 | 52.0 | 178.0 | 53.0 | 152.0 | 62.0 | 0.4999 | 82500.0 | 0.297753 |
| 11649 | -121.29 | 37.96 | 52.0 | 287.0 | 119.0 | 154.0 | 85.0 | 0.8738 | 75000.0 | 0.414634 |
| 11652 | -121.29 | 37.95 | 52.0 | 288.0 | 86.0 | 272.0 | 54.0 | 0.696 | 42500.0 | 0.298611 |
| 11653 | -121.29 | 37.95 | 52.0 | 107.0 | 79.0 | 167.0 | 53.0 | 0.7917 | 22500.0 | 0.738318 |
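The same filter written as a boolean mask shows what `query` saves: the DataFrame name is repeated for every column, and each condition needs its own parentheses because `&` binds more tightly than `>` and `<` in Python. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical rows; only the first satisfies both conditions
toy = pd.DataFrame({
    'housing_median_age': [52, 10, 52],
    'median_income': [0.5, 0.9, 3.0],
})

# Boolean-mask version
masked = toy[(toy['housing_median_age'] > 50) & (toy['median_income'] < 1)]

# Equivalent query version
queried = toy.query('housing_median_age > 50 and median_income < 1')
```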


### 4. Formulating query strings using string methods

<div align="center">
`query` strings can reference variables from the surrounding scope by prefixing their names with `@`, so there is no need to build the query with string formatting (f-strings, `str.format`, or `%`).
</div>

```python
min_pop = 1000
min_income = 0.7

df.query('population > @min_pop and median_income > @min_income')
```

|  | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | bedroom_percentage |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 | 0.228617 |
| 1 | -114.47 | 34.40 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8200 | 80100.0 | 0.248497 |
| 6 | -114.58 | 33.61 | 25.0 | 2907.0 | 680.0 | 1841.0 | 633.0 | 2.6768 | 82400.0 | 0.233918 |
| 8 | -114.59 | 33.61 | 34.0 | 4789.0 | 1175.0 | 3134.0 | 1056.0 | 2.1782 | 58400.0 | 0.245354 |
| 10 | -114.60 | 33.62 | 16.0 | 3741.0 | 801.0 | 2434.0 | 824.0 | 2.6797 | 86500.0 | 0.214114 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16991 | -124.23 | 41.75 | 11.0 | 3159.0 | 616.0 | 1343.0 | 479.0 | 2.4805 | 73200.0 | 0.194998 |
| 16993 | -124.23 | 40.54 | 52.0 | 2694.0 | 453.0 | 1152.0 | 435.0 | 3.0806 | 106700.0 | 0.168151 |
| 16996 | -124.27 | 40.69 | 36.0 | 2349.0 | 528.0 | 1194.0 | 465.0 | 2.5179 | 79000.0 | 0.224777 |
| 16997 | -124.30 | 41.84 | 17.0 | 2677.0 | 531.0 | 1244.0 | 456.0 | 3.0313 | 103600.0 | 0.198356 |
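For contrast, a sketch of the string-formatting approach this section warns against, next to the `@` version, on hypothetical toy data. Both return the same rows here, but the f-string version pastes the values into the query as text, which gets awkward for string values (they need extra quoting) and for objects such as lists or dates.

```python
import pandas as pd

toy = pd.DataFrame({
    'population': [500, 1500, 2500],
    'median_income': [2.0, 0.5, 1.2],
})
min_pop = 1000
min_income = 0.7

# Discouraged: values are formatted into the query string as text
via_fstring = toy.query(f'population > {min_pop} and median_income > {min_income}')

# Preferred: query looks up @-prefixed names in the calling scope
via_at = toy.query('population > @min_pop and median_income > @min_income')
```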


### 5. Using the `inplace=True` parameter

```python
# Prefer reassigning the result explicitly: `inplace=True` is generally
# discouraged, prevents method chaining, and could be removed in future versions
df = df.fillna(0)
```
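One concrete reason to prefer reassignment: methods called with `inplace=True` return `None`, so mixing the two styles silently discards the data. A sketch on a hypothetical column with one missing value:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'total_bedrooms': [1.0, np.nan, 3.0]})

# Common bug: inplace methods return None, so 'result' holds no data
result = toy.copy().fillna(0, inplace=True)
print(result)  # None

# Reassignment returns a new DataFrame and can be chained with other methods
toy = toy.fillna(0).astype(int)
print(toy['total_bedrooms'].tolist())  # [1, 0, 3]
```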

### 6. Iterating over the rows when vectorization is an option

```python
# Bad practice: a Python-level loop over every row
for i, row in df.iterrows():
    if row['housing_median_age'] > 50:
        df.loc[i, 'is_old'] = True
    else:
        df.loc[i, 'is_old'] = False
```

```python
# Good practice: one vectorized comparison builds the whole column
df['is_old'] = df['housing_median_age'] > 50
df
```

|  | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | bedroom_percentage | is_old |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -114.31 | 34.19 | 15.0 | 5612.0 | 1283.0 | 1015.0 | 472.0 | 1.4936 | 66900.0 | 0.228617 | False |
| 1 | -114.47 | 34.40 | 19.0 | 7650.0 | 1901.0 | 1129.0 | 463.0 | 1.8200 | 80100.0 | 0.248497 | False |
| 2 | -114.56 | 33.69 | 17.0 | 720.0 | 174.0 | 333.0 | 117.0 | 1.6509 | 85700.0 | 0.241667 | False |
| 3 | -114.57 | 33.64 | 14.0 | 1501.0 | 337.0 | 515.0 | 226.0 | 3.1917 | 73400.0 | 0.224517 | False |
| 4 | -114.57 | 33.57 | 20.0 | 1454.0 | 326.0 | 624.0 | 262.0 | 1.9250 | 65500.0 | 0.224209 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16995 | -124.26 | 40.58 | 52.0 | 2217.0 | 394.0 | 907.0 | 369.0 | 2.3571 | 111400.0 | 0.177718 | True |
| 16996 | -124.27 | 40.69 | 36.0 | 2349.0 | 528.0 | 1194.0 | 465.0 | 2.5179 | 79000.0 | 0.224777 | False |
| 16997 | -124.30 | 41.84 | 17.0 | 2677.0 | 531.0 | 1244.0 | 456.0 | 3.0313 | 103600.0 | 0.198356 | False |
| 16998 | -124.30 | 41.80 | 19.0 | 2672.0 | 552.0 | 1298.0 | 478.0 | 1.9797 | 85800.0 | 0.206587 | False |


### 7. Using the `apply` method when vectorization is an option

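```python
# Bad practice: apply with axis=1 calls the lambda once per row
df['population_squared'] = df.apply(lambda row: row['population'] ** 2, axis=1)
```

```python
# Good practice: vectorized arithmetic on the whole column
df['population_squared'] = df['population'] ** 2
```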

### 8. Treating a slice of a DataFrame as if it were a new DataFrame
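A minimal sketch of the problem, on hypothetical toy data. pandas cannot always tell whether a filtered subset such as `toy[mask]` shares data with the original, so assigning into it raises `SettingWithCopyWarning` in pandas versions without copy-on-write; with a boolean mask like this one, the assignment only changes the subset. Calling `.copy()` states the intent explicitly, and `.loc` is the way to modify the original.

```python
import pandas as pd

toy = pd.DataFrame({
    'housing_median_age': [52, 10, 45],
    'median_income': [0.5, 0.9, 3.0],
})
mask = toy['housing_median_age'] > 40

# Risky (left commented out): assigning into an unmarked subset may warn
# old = toy[mask]
# old['median_income'] = 0

# Explicit copy: 'old' is an independent DataFrame
old = toy[mask].copy()
old['median_income'] = 0
print(old['median_income'].tolist())  # [0, 0]
print(toy['median_income'].tolist())  # [0.5, 0.9, 3.0]

# To change the original instead, select and assign in one .loc call
toy.loc[mask, 'median_income'] = 0.0
print(toy['median_income'].tolist())  # [0.0, 0.9, 0.0]
```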

### 9. Chaining methods is better than creating many intermediate DataFrames

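A sketch of the two styles on hypothetical toy data. Both produce the same result; the chained version avoids the throwaway names `df1`, `df2`, `df3`. The `lambda d:` passed to `assign` receives the DataFrame as it exists at that point in the chain.

```python
import pandas as pd

toy = pd.DataFrame({
    'total_rooms': [100, 200, 300],
    'total_bedrooms': [20.0, None, 90.0],
    'median_house_value': [500.0, 400.0, 300.0],
})

# Bad practice: every step creates another intermediate DataFrame
df1 = toy.dropna()
df2 = df1.assign(bedroom_percentage=df1['total_bedrooms'] / df1['total_rooms'])
df3 = df2.sort_values('median_house_value')

# Good practice: one chained expression, read top to bottom
result = (
    toy
    .dropna()
    .assign(bedroom_percentage=lambda d: d['total_bedrooms'] / d['total_rooms'])
    .sort_values('median_house_value')
)
```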

### 10. Setting column dtypes properly
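Columns read from text files often arrive as strings. A sketch of converting them explicitly, on hypothetical raw data; `errors='coerce'` turns unparseable entries such as `'n/a'` into `NaN` instead of raising. The same can often be done at load time with the `dtype` and `parse_dates` arguments of `pd.read_csv`.

```python
import pandas as pd

# Hypothetical raw data where every value is a string
raw = pd.DataFrame({
    'price': ['100', '250', 'n/a'],
    'sold_on': ['2023-01-05', '2023-02-10', '2023-03-15'],
})

typed = raw.assign(
    price=pd.to_numeric(raw['price'], errors='coerce'),  # 'n/a' becomes NaN
    sold_on=pd.to_datetime(raw['sold_on']),
)
print(typed.dtypes)
print(typed['price'].mean())  # 175.0, the NaN is skipped
```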

### 11. Using booleans instead of strings
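A sketch comparing string flags with a boolean column, on hypothetical toy data. A boolean column can be used directly as a filter, and because `True` counts as 1, `sum` and `mean` give a count and a share without any extra comparison.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'housing_median_age': [52, 10, 45]})

# String flags: every later use needs a comparison, and typos ('Yes', 'yes ') slip in
toy['is_old_str'] = np.where(toy['housing_median_age'] > 40, 'yes', 'no')
old_count_str = (toy['is_old_str'] == 'yes').sum()

# Boolean flags: usable directly as a mask, a count, or a share
toy['is_old'] = toy['housing_median_age'] > 40
old_count = toy['is_old'].sum()    # 2
share_old = toy['is_old'].mean()   # two thirds
old_rows = toy[toy['is_old']]
```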

### 12. Using the pandas `plot` method instead of importing Matplotlib

### 13. Using pandas string methods such as `str.upper()` instead of `apply`
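A sketch on a hypothetical Series with a missing value. The `.str` accessor is shorter than `apply` with a lambda and passes missing values through, whereas the lambda fails on the missing entry because it has no `upper` method.

```python
import pandas as pd

cities = pd.Series(['san diego', 'los angeles', None])

# apply with a lambda fails here: the missing value has no .upper() method
# cities.apply(lambda s: s.upper())

# Vectorized string method: concise, and missing values stay missing
upper = cities.str.upper()
print(upper.tolist())
```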

### 14. Defining a data pipeline once instead of repeating the same steps many times
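One way to do this is to put the cleaning steps in a single function and apply it with `DataFrame.pipe`. The function `clean_housing` and the toy `train_raw`/`test_raw` frames below are hypothetical; the point is that every dataset goes through exactly the same steps.

```python
import pandas as pd

def clean_housing(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning steps, defined once and reused for every dataset."""
    return (
        frame
        .rename(columns=str.lower)
        .dropna()
        .assign(bedroom_percentage=lambda d: d['total_bedrooms'] / d['total_rooms'])
    )

# Hypothetical raw inputs with inconsistent column capitalisation
train_raw = pd.DataFrame({'Total_Rooms': [10, 20], 'Total_Bedrooms': [2.0, None]})
test_raw = pd.DataFrame({'Total_Rooms': [30], 'Total_Bedrooms': [6.0]})

# .pipe passes the DataFrame as the first argument of the function
train_clean = train_raw.pipe(clean_housing)
test_clean = test_raw.pipe(clean_housing)
```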

### 15. Renaming columns the proper way
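A sketch of two common approaches with `DataFrame.rename`, on hypothetical column names. Assigning a full list to `df.columns` also works but silently depends on the column order, whereas `rename` only touches the names it is given and returns a new DataFrame.

```python
import pandas as pd

toy = pd.DataFrame({'Total Rooms': [10], 'Median Income': [2.5]})

# Fragile alternative: toy.columns = [...] depends on column order and count

# Explicit mapping: only the listed columns change, order does not matter
renamed = toy.rename(columns={'Total Rooms': 'total_rooms', 'Median Income': 'median_income'})

# Rule-based: a function applied to every column name
snake = toy.rename(columns=lambda name: name.strip().lower().replace(' ', '_'))
```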

### 16. Grouping values the proper way
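A sketch contrasting a hand-written loop over distinct values with `groupby`, on hypothetical data; the `region` column is an assumption, since the housing data shown above has no such column.

```python
import pandas as pd

# Hypothetical 'region' column
toy = pd.DataFrame({
    'region': ['north', 'south', 'north', 'south'],
    'median_house_value': [100.0, 300.0, 200.0, 500.0],
})

# Manual: filter once per distinct value and collect the results by hand
manual = {
    key: toy.loc[toy['region'] == key, 'median_house_value'].mean()
    for key in toy['region'].unique()
}

# groupby: split, apply and combine in a single expression
grouped = toy.groupby('region')['median_house_value'].mean()
print(grouped)
```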

### 17. Handling complex groupings the proper way
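For groupings with several keys and several statistics, named aggregation (available since pandas 0.25) keeps the output columns readable. A sketch on hypothetical data; the `region` and `is_old` columns and the output names are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    'region': ['north', 'south', 'north', 'south'],
    'is_old': [True, False, False, False],
    'median_house_value': [100.0, 300.0, 200.0, 500.0],
    'population': [10, 20, 30, 40],
})

# Group by two keys and compute several named aggregations in one call;
# each keyword is output_column=(input_column, aggregation)
summary = (
    toy
    .groupby(['region', 'is_old'], as_index=False)
    .agg(
        mean_value=('median_house_value', 'mean'),
        max_value=('median_house_value', 'max'),
        total_population=('population', 'sum'),
    )
)
print(summary)
```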

### 18. Computing percentage change or differences with built-in methods (`pct_change`, `diff`)
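A sketch on a hypothetical price series, comparing the manual `shift`-based formulas with `diff` and `pct_change`. The first element is `NaN` in every case because it has no predecessor.

```python
import pandas as pd

# Hypothetical yearly prices
prices = pd.Series([100.0, 110.0, 99.0], index=[2021, 2022, 2023])

# Manual versions built from shift()
manual_diff = prices - prices.shift(1)
manual_change = (prices - prices.shift(1)) / prices.shift(1)

# Built-in equivalents
diff = prices.diff()        # absolute change: NaN, 10.0, -11.0
pct = prices.pct_change()   # relative change: NaN, about 0.1, about -0.1
```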

### 19. Saving time and space on large datasets with the pickle, Parquet, and Feather formats
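A sketch on hypothetical toy data. Speed and file-size gains depend on the data, so instead of a benchmark it checks one easily verified benefit of a binary format: pickle restores dtypes exactly, whereas CSV stores only text. Parquet (`to_parquet`) and Feather (`to_feather`) need an optional engine such as pyarrow, so they appear only in a comment. Pickle files are Python-specific and should only be loaded from trusted sources.

```python
import os
import tempfile
import pandas as pd

# Hypothetical data with dtypes that a CSV file cannot store
toy = pd.DataFrame({
    'region': pd.Categorical(['north', 'south']),
    'sold_on': pd.to_datetime(['2023-01-05', '2023-02-10']),
})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, 'toy.csv')
    pkl_path = os.path.join(tmp, 'toy.pkl')

    toy.to_csv(csv_path, index=False)
    toy.to_pickle(pkl_path)

    from_csv = pd.read_csv(csv_path)        # dtypes re-inferred: dates come back as text
    from_pickle = pd.read_pickle(pkl_path)  # dtypes restored exactly

    # With pyarrow installed, the columnar formats work the same way:
    # toy.to_parquet(os.path.join(tmp, 'toy.parquet'))
    # toy.to_feather(os.path.join(tmp, 'toy.feather'))

print(from_csv.dtypes)
print(from_pickle.dtypes)
```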

### 20. Conditional formatting in pandas (like in Microsoft Excel)

### 21. Using suffixes when merging two DataFrames
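When both DataFrames share a non-key column name, `merge` disambiguates it with the default suffixes `_x` and `_y`. A sketch on hypothetical sales tables showing descriptive suffixes instead:

```python
import pandas as pd

# Hypothetical tables that share a 'value' column name
sales_2022 = pd.DataFrame({'region': ['north', 'south'], 'value': [100, 200]})
sales_2023 = pd.DataFrame({'region': ['north', 'south'], 'value': [110, 190]})

# Default suffixes produce ambiguous names: value_x, value_y
default = sales_2022.merge(sales_2023, on='region')

# Descriptive suffixes record where each column came from
merged = sales_2022.merge(sales_2023, on='region', suffixes=('_2022', '_2023'))
print(merged.columns.tolist())  # ['region', 'value_2022', 'value_2023']
```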

### 22. Checking that a merge succeeded with validation
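`DataFrame.merge` accepts a `validate` argument that raises `pandas.errors.MergeError` when the key relationship is not what you expect, and `indicator=True` adds a `_merge` column showing whether each row matched. A sketch on hypothetical lookup tables in which one region ID is duplicated by mistake:

```python
import pandas as pd
from pandas.errors import MergeError

houses = pd.DataFrame({'region_id': [1, 1, 2], 'price': [100, 150, 200]})
# Hypothetical lookup table with region_id 2 duplicated by mistake
regions = pd.DataFrame({'region_id': [1, 2, 2], 'name': ['north', 'south', 'south (dup)']})

# many_to_one: each house should match at most one region
try:
    houses.merge(regions, on='region_id', validate='many_to_one')
    merge_failed = False
except MergeError as err:
    print('Merge rejected:', err)
    merge_failed = True

# After fixing the lookup table, the same check passes
checked = houses.merge(
    regions.drop_duplicates('region_id'),
    on='region_id',
    how='left',
    validate='many_to_one',
    indicator=True,  # adds a _merge column: 'both', 'left_only' or 'right_only'
)
print(checked['_merge'].value_counts())
```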

### 23. Wrapping expressions so they stay readable
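One common way to do this in Python is to wrap a long method chain in parentheses, which allows a line break before each step without backslashes. A sketch on hypothetical toy data; the column name `income_x10` is illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    'population': [500, 1500, 2500],
    'median_income': [2.0, 0.5, 1.2],
})

# One long line: hard to read, review and comment
one_line = toy.query('population > 1000').assign(income_x10=lambda d: d['median_income'] * 10).sort_values('income_x10', ascending=False).reset_index(drop=True)

# Wrapped in parentheses: one step per line, same result
wrapped = (
    toy
    .query('population > 1000')
    .assign(income_x10=lambda d: d['median_income'] * 10)
    .sort_values('income_x10', ascending=False)
    .reset_index(drop=True)
)
```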

### 24. Categorical dtypes use less memory
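A sketch on a hypothetical low-cardinality column. The saving comes from storing each distinct label once plus a small integer code per row, so it is large for columns with few distinct values and can disappear for mostly unique ones.

```python
import pandas as pd

# Hypothetical low-cardinality column: 3 distinct labels repeated 10,000 times
labels = pd.Series(['INLAND', 'NEAR BAY', 'NEAR OCEAN'] * 10_000)
as_category = labels.astype('category')

# deep=True counts the memory used by the string values themselves
string_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(string_bytes, category_bytes)
```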

### 25. Duplicated columns after concatenating
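`pd.concat(..., axis=1)` aligns on the index and keeps every column, so a key column present in both inputs appears twice. A sketch on hypothetical frames; it keeps the first occurrence of each label and assumes the duplicates hold the same values (if they might differ, compare them first or use `merge` on the key instead).

```python
import pandas as pd

left = pd.DataFrame({'region': ['north', 'south'], 'price': [100, 200]})
right = pd.DataFrame({'region': ['north', 'south'], 'rooms': [3, 4]})

# Side-by-side concatenation keeps both 'region' columns
combined = pd.concat([left, right], axis=1)
print(combined.columns.tolist())  # ['region', 'price', 'region', 'rooms']

# Drop repeated column labels, keeping the first occurrence of each
deduped = combined.loc[:, ~combined.columns.duplicated()]
print(deduped.columns.tolist())   # ['region', 'price', 'rooms']
```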