# Survey Analysis

Use the provided CSV file to perform the following operations:

1. **Reading and Renaming CSV Columns.**
    - Read the CSV file using ```pd.read_csv()``` and store it in a variable named "df".
    - Rename the columns using the ```.rename()``` method, passing a dictionary to the "columns" parameter with old names as keys and new names as values. Use the following names:
        - time
        - country
        - capital
        - home_city
        - salary
        - age_class
        - height
        - gender
        - animals
        - colour
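
A minimal sketch of the read-and-rename step. The two original headers below are hypothetical stand-ins (the real file has ten columns), and the CSV is read from an in-memory string so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical two-column extract; the real survey file has ten columns
# and its original headers will differ.
raw_csv = io.StringIO(
    "Timestamp,Which country do you live in?\n"
    "2023-01-05 10:00:00,France\n"
    "2023-01-05 10:02:00,Spain\n"
)

df = pd.read_csv(raw_csv)

# Old names are the keys, new names are the values.
df = df.rename(columns={
    "Timestamp": "time",
    "Which country do you live in?": "country",
})
print(df.columns.tolist())
```

With the real file, pass its path to `pd.read_csv()` and list all ten old names in the dictionary.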
        
1. **Exploration**
    - Display the shape of the dataframe.
    - Show the number of missing values for each column.
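
One possible way to do both checks, shown on a small made-up dataframe (the rows are hypothetical, only the column names come from the list above):

```python
import pandas as pd

# Hypothetical rows standing in for the survey data.
df = pd.DataFrame({
    "country": ["France", None, "Spain"],
    "salary": [30000.0, 42000.0, None],
    "gender": ["Man", "Woman", "Woman"],
})

print(df.shape)            # (number of rows, number of columns)

missing = df.isna().sum()  # count of missing values per column
print(missing)
```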
    
1. **New Import of the File**
    - Examine the values of each column and choose the most suitable 'dtype' (data type) for each one. Leave the date as 'string'.
    - Use the ```pd.read_csv()``` function again, but pass a mapping dictionary to the "dtype" parameter so that each column is read with the correct type. Remember to rename the columns again. You can do this in several steps, or do everything in the ```pd.read_csv()``` call by also using the "names" and "header" parameters (with "header=0", the original header row is replaced by the names you supply).
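
A sketch of the single-call variant, assuming a hypothetical three-column file. The dtype choices here are illustrative, not the expected answer for the real data.

```python
import io
import pandas as pd

# Hypothetical extract; the second salary is deliberately left empty.
raw_csv = io.StringIO(
    "Timestamp,Country,Yearly salary\n"
    "2023-01-05 10:00:00,France,30000\n"
    "2023-01-05 10:02:00,Spain,\n"
)

new_names = ["time", "country", "salary"]
dtypes = {"time": "string", "country": "string", "salary": "float64"}

# header=0 discards the original header row and names= supplies the new
# labels, so the dtype mapping can be keyed by the new names directly.
df = pd.read_csv(raw_csv, header=0, names=new_names, dtype=dtypes)
print(df.dtypes)
```

A float dtype is used for "salary" in this sketch because the column contains a missing value.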

1. **"Salary" Column**
    - Display one or more charts to visualize the distribution of values in the column (e.g., 'kde' or 'hist'), along with basic statistical indicators for this column.
    - Use ```.loc[]``` to filter out outliers and generate the chart and basic statistical indicators again.
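
The exercise does not say how to define an outlier. The sketch below assumes the common 1.5 × IQR rule and uses made-up salaries; a fixed threshold would work the same way with `.loc[]`.

```python
import pandas as pd

# Hypothetical salaries with one obvious outlier.
df = pd.DataFrame({"salary": [28000, 31000, 35000, 40000, 42000, 5000000]})

print(df["salary"].describe())  # indicators before filtering

# Assumption: keep values within 1.5 * IQR of the quartiles.
q1 = df["salary"].quantile(0.25)
q3 = df["salary"].quantile(0.75)
iqr = q3 - q1
in_range = df["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

filtered = df.loc[in_range]
print(filtered["salary"].describe())  # indicators after filtering
```

Calling `filtered["salary"].plot(kind="hist")` would then draw the chart (matplotlib is required, and `kind="kde"` additionally needs SciPy).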

1. **"Country" and "Home_city" Columns**
    - Clean the "country" values where necessary with functions like ```.str.upper()```, ```.str.lower()```, ```.str.strip()```, or ```.str.split()```, then display a pie chart of the countries present. Since this Series is not numeric, call ```.value_counts()``` first and plot the resulting counts.
    - Do the same for the "home_city" Series.
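
As an illustration, here is how inconsistent spellings might be merged before counting. The values are invented; the real column may need different fixes.

```python
import pandas as pd

# Hypothetical raw answers with stray spaces and mixed case.
country = pd.Series([" France", "france", "FRANCE ", "Spain", "spain"])

# Strip surrounding whitespace, then normalise the case.
cleaned = country.str.strip().str.lower()

counts = cleaned.value_counts()
print(counts)
```

`counts.plot.pie()` would then draw the pie chart (matplotlib is required).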

1. **"Salary", "Age_class", "Gender" Columns**
    - Group the data by "age_class" and calculate the mean, median, minimum, and maximum of the "salary" column. Remove outliers if necessary.
    - Do the same with "gender".
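
A short sketch of the grouped statistics, assuming hypothetical age-class labels and salaries:

```python
import pandas as pd

# Hypothetical data; the real age-class labels may look different.
df = pd.DataFrame({
    "age_class": ["18-25", "18-25", "26-35", "26-35", "26-35"],
    "salary": [20000, 24000, 30000, 40000, 50000],
})

# One row per age class, one column per statistic.
stats = df.groupby("age_class")["salary"].agg(["mean", "median", "min", "max"])
print(stats)
```

Replacing "age_class" with "gender" in the `groupby()` call gives the second table.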

1. **"Height" Column**
    - The "height" column is currently of type string. To perform calculations on it, write a custom function and apply it with ```.map()``` so that each height is expressed in cm and contains only digits.
    - Convert it to int.
    - Examine the average height by gender.
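
The formats found in the real column are unknown here, so the sketch below assumes a few plausible ones ("180cm", "1.75 m", "1m80", bare numbers). The helper name `height_to_cm` and the "below 3 means metres" rule are illustrative assumptions. Unparseable values become missing, which is why the nullable "Int64" dtype is used instead of plain int.

```python
import re
import pandas as pd

def height_to_cm(value):
    """Return the height in cm as an int, or pd.NA when it cannot be parsed."""
    if pd.isna(value):
        return pd.NA
    text = str(value).lower().replace(",", ".").strip()
    # "1m80" style: metres followed by two digits of centimetres.
    match = re.fullmatch(r"(\d)\s*m\s*(\d{2})", text)
    if match:
        return int(match.group(1)) * 100 + int(match.group(2))
    # Otherwise take the first number found in the text.
    match = re.search(r"\d+(?:\.\d+)?", text)
    if not match:
        return pd.NA
    number = float(match.group())
    # Assumption: values below 3 are metres, anything else is already cm.
    return round(number * 100) if number < 3 else round(number)

# Hypothetical answers covering the assumed formats.
df = pd.DataFrame({
    "gender": ["Man", "Woman", "Man", "Woman", "Woman"],
    "height": ["180cm", "1.75 m", "1m80", "165", "tall"],
})

df["height"] = df["height"].map(height_to_cm).astype("Int64")
mean_by_gender = df.groupby("gender")["height"].mean()
print(mean_by_gender)
```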

1. **Join to Retrieve Population Information**
    - Download the [following file](https://raw.githubusercontent.com/samayo/country-json/master/src/country-by-population.json), or read it directly from its URL with pandas. Store the result in a dataframe called "pop_df".
    - Perform a left join on the country name to bring the population into our main dataframe.
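
A sketch of the join on stand-in data. It assumes the JSON file yields columns named "country" and "population"; the population figures below are placeholders, not real values.

```python
import pandas as pd

# Hypothetical main dataframe; "Atlantis" has no match on purpose.
df = pd.DataFrame({"country": ["France", "Spain", "Atlantis"]})

# Stand-in for the dataframe read from the JSON file.
# Column names are assumed and the numbers are placeholders.
pop_df = pd.DataFrame({
    "country": ["France", "Spain", "Italy"],
    "population": [100, 200, 300],
})

# A left join keeps every row of df; unmatched countries get NaN.
merged = df.merge(pop_df, how="left", on="country")
print(merged)
```

If the spelling or case of country names differs between the two sources, the keys need the same cleaning on both sides before merging.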

1. **Unique List of Animals**
    - The "animals" column contains animal names that are, in principle, separated by commas. Create a new Series, which will not be added to our df, containing all the animal names.
        - You will likely need to use functions like ```str.split()```, ```str.strip()```, ```str.replace()```.
        - Also note that the ```.explode()``` method turns a Series of list-like values into a longer Series with one row per element.
        - If you want to remove empty strings, you can replace them with null values using the ```.replace('', pd.NA)``` method and then perform a ```.dropna()```.
    - Perform a ```.value_counts()``` and generate a chart to see which animals are the most popular.
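
The chain below sketches one way to combine these hints. The sample answers and the stray ";" separator are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical answers: mixed case, a ";" separator, empty and missing values.
animals_col = pd.Series(["Cat, dog", "dog;  Horse", "", None, "cat,"])

animals = (
    animals_col
    .str.replace(";", ",", regex=False)  # unify the separators
    .str.split(",")                      # one list of names per row
    .explode()                           # one row per name
    .str.strip()
    .str.lower()
    .replace("", pd.NA)                  # empty strings become missing
    .dropna()
)

counts = animals.value_counts()
print(counts)
```

`counts.plot(kind="bar")` would then show the most popular animals (matplotlib is required).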

1. **The "Gender" Column**
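    - Replace the strings "Man" and "Woman" with 0 and 1, respectively, and store the result in a new column named "gender_int" (the name used in the correlation step). All other values should be NaN. You can use ```.replace()``` with a mapping dictionary, but ```.map()``` with a mapping dictionary is better here: values missing from the dictionary become NaN automatically.
    - Don't forget to convert the result to an integer dtype rather than float. Because a plain int column cannot hold NaN, use the nullable ```"Int64"``` dtype.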
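
A minimal sketch with `.map()`, on invented values. The column name "gender_int" follows the correlation step further down; the nullable "Int64" dtype is used because a plain int column cannot hold missing values.

```python
import pandas as pd

# Hypothetical answers, including one value outside the mapping.
df = pd.DataFrame({"gender": ["Man", "Woman", "Other", None]})

# Values absent from the dictionary become NaN, then <NA> after the cast.
df["gender_int"] = df["gender"].map({"Man": 0, "Woman": 1}).astype("Int64")
print(df)
```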

1. **The "Age_class" Column**
    - Generate a new column named "age_class_mean" that contains a float representing the average age of each age class (for example, the midpoint of its range). There are several ways to do this: with a mapping dictionary, with ```.replace()```, with custom functions, etc.
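
The mapping-dictionary approach might look like this. The class labels are hypothetical, and the midpoint of each range is used as its average, which is an assumption.

```python
import pandas as pd

# Hypothetical labels; the real ones may be formatted differently.
df = pd.DataFrame({"age_class": ["18-25", "26-35", "18-25", "36-45"]})

# Assumption: the midpoint of each range stands for the class average.
midpoints = {"18-25": 21.5, "26-35": 30.5, "36-45": 40.5}

df["age_class_mean"] = df["age_class"].map(midpoints)
print(df)
```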

1. **Correlation**
    - Calculate correlations between the following columns: 'salary', 'gender_int', 'age_class_mean'.
        - The easiest way is to create a new df named "corr_df" with only these 3 columns. Since the dataframe is small, you can use ```.copy(deep=True)``` to completely separate your two dfs. This way, in case of an error, you won't modify your original df.
    - Display relevant charts with the Seaborn library to represent these correlations and the corresponding linear regressions.
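
A sketch of the numeric part on invented values. The extra "colour" column is only there to show that the selection keeps three columns.

```python
import pandas as pd

# Hypothetical values for the three columns of interest, plus one extra.
df = pd.DataFrame({
    "salary": [20000, 25000, 30000, 40000, 50000],
    "gender_int": [0, 1, 0, 1, 0],
    "age_class_mean": [21.5, 21.5, 30.5, 40.5, 40.5],
    "colour": ["red", "blue", "green", "blue", "red"],
})

# A deep copy keeps later changes to corr_df away from df.
corr_df = df[["salary", "gender_int", "age_class_mean"]].copy(deep=True)

corr = corr_df.corr()  # Pearson correlation by default
print(corr)
```

For the charts, Seaborn's `sns.heatmap(corr, annot=True)` can display the matrix, and `sns.regplot(data=corr_df, x="age_class_mean", y="salary")` or `sns.pairplot(corr_df, kind="reg")` can draw the regression lines.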

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Code here!
