# Manipulating DataFrames


In this exercise, you will learn to manipulate the structure of a Pandas DataFrame, specifically its columns. You will often need to rename, reorder, add, or remove columns so that a DataFrame fits your needs.

Instructions:

1. Import the `pandas`, `pathlib` and `numpy` libraries.

2. Create a variable `csvpath` that represents the path to the [people.csv](../Resources/people.csv) file using the `Path` class from the `pathlib` library.

3. Read the CSV into a Pandas DataFrame using the Pandas `read_csv` function and the `csvpath` variable and view the first five rows of the DataFrame.

4. View the column names of the Pandas DataFrame.

5. View the column data types of the Pandas DataFrame.

6. Rename the columns of the Pandas DataFrame to "Person_ID", "First_Name", "Last_Name", "Email", "Gender", "University", "Occupation", "Salary".

7. Alternatively, rename the columns of the Pandas DataFrame by passing a dictionary to the `rename` function.

8. Re-order the columns of the Pandas DataFrame to "Person_ID", "Last_Name", "First_Name", "Gender", "University", "Occupation", "Salary", "Email".


**Bonus:** Tackle the bonus activity if you finish early.

9. Create two additional columns: `Age` and `Age_Copy`. Use the `randint` function from the `numpy` library with the `low`, `high`, and `size` parameters set to `22`, `65`, and `1000`, respectively, to generate a random integer for each of the 1000 rows. Because `high` is exclusive, the values range from 22 to 64.

    The NumPy code to generate the random values is as follows:

    ```python
    np.random.randint(low=22, high=65, size=1000)
    ```

10. Delete the newly created `Age_Copy` column.

11. Using the Pandas `to_csv` function, write the modified DataFrame to a new CSV, and put the file in the `Resources` folder.

## Import the `pandas`, `pathlib` and `numpy` libraries.

```python
# Import pandas, pathlib, and numpy libraries
import pandas as pd
from pathlib import Path
import numpy as np
```

## Create a variable `csvpath` that represents the path to the people.csv file using the `Path` class from the pathlib library.

```python
# Use the pathlib library to set the path to the CSV
csvpath = Path("../Resources/people.csv")
```
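A `Path` can also be assembled piece by piece with the `/` operator, and because a relative path is resolved against the notebook's current working directory, checking `exists()` is a quick way to catch a wrong folder before `read_csv` fails. This is a minimal sketch that assumes the notebook runs from a folder sitting next to `Resources`; it does not need the file to be present to run.

```python
from pathlib import Path

# Build the same relative path with the `/` operator instead of a single string
csvpath = Path("..") / "Resources" / "people.csv"

print(csvpath)            # ../Resources/people.csv (backslashes on Windows)
print(csvpath.exists())   # True only if the notebook runs next to `Resources`
print(csvpath.resolve())  # the absolute path that pandas would actually open
```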

## Read the CSV into a Pandas DataFrame using the Pandas `read_csv` function and the `csvpath` variable and view the first five rows of the DataFrame.

```python
# Use the file path to read the CSV into a DataFrame
people_df = pd.read_csv(csvpath)

# View the first five rows of the DataFrame
people_df.head()
```

| | id | first_name | last_name | email | gender | uni_grad | job_title | Income |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Keriann | Lenormand | klenormand0@businessinsider.com | Female | Aurora University | Nurse Practicioner | 58135.0 |
| 1 | 2.0 | Huntley | Rupke | hrupke1@reuters.com | Male | Osaka University of Economics | Project Manager | 96053.0 |
| 2 | 3.0 | Gorden | Dalgarnowch | gdalgarnowch2@microsoft.com | Male | Ludong University | Environmental Tech | 59196.0 |
| 3 | 4.0 | Cullie | | cputten3@nymag.com | Male | Université des Sciences et de la Technologie d... | Legal Assistant | 88493.0 |
| 4 | 5.0 | Ariel | Strangman | astrangman4@bravesites.com | Female | Boise State University | Project Manager | 89073.0 |
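`read_csv` accepts a `Path`, a string path, or any file-like object, and `head` takes an optional row count (five by default). The sketch below swaps in a small in-memory CSV built with `io.StringIO` for `people.csv`, so it runs without the real file; the three rows are copied from the output above.

```python
import io
import pandas as pd

# A tiny stand-in for people.csv, held in memory instead of on disk
sample_csv = io.StringIO(
    "id,first_name,last_name,Income\n"
    "1,Keriann,Lenormand,58135\n"
    "2,Huntley,Rupke,96053\n"
    "3,Gorden,Dalgarnowch,59196\n"
)

sample_df = pd.read_csv(sample_csv)

# `head(n)` returns the first n rows; `shape` gives (rows, columns)
first_two = sample_df.head(2)
print(first_two)
print(sample_df.shape)  # (3, 4)
```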


## View the column names of the Pandas DataFrame.

```python
# Use the `columns` attribute to output the column names
people_df.columns
```

```
Index(['id', 'first_name', 'last_name', 'email', 'gender', 'uni_grad',
       'job_title', 'Income'],
      dtype='object')
```
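`columns` returns a pandas `Index` rather than a plain list. Calling `tolist()` converts it, which is handy when building a new column order, and the `in` operator checks whether a column exists. This sketch uses an empty stand-in DataFrame with the same column names as `people.csv`.

```python
import pandas as pd

# Empty stand-in with the same column names as people.csv
df = pd.DataFrame(columns=["id", "first_name", "last_name", "email",
                           "gender", "uni_grad", "job_title", "Income"])

column_list = df.columns.tolist()  # plain Python list of names
print(column_list)
print("Income" in df.columns)  # True: membership test on the Index
print(len(df.columns))         # 8
```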

## View the column data types of the Pandas DataFrame.

```python
# Use the `dtypes` attribute to output the column names and data types
people_df.dtypes
```

```
id            float64
first_name     object
last_name      object
email          object
gender         object
uni_grad       object
job_title      object
Income        float64
dtype: object
```
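The `id` column is reported as `float64` even though the IDs look like whole numbers. A common cause, assumed here rather than confirmed for `people.csv`, is missing values: NumPy integer columns cannot hold `NaN`, so pandas falls back to floats. The sketch reproduces this on made-up data and converts the column to pandas' nullable `Int64` dtype.

```python
import io
import pandas as pd

# Made-up data: the third row has no id
sample_csv = io.StringIO("id,name\n1,a\n2,b\n,c\n")
df = pd.read_csv(sample_csv)

print(df.dtypes["id"])  # float64, because the blank id becomes NaN

# The nullable "Int64" dtype keeps whole numbers and marks the gap as <NA>
df["id"] = df["id"].astype("Int64")
print(df.dtypes["id"])        # Int64
print(df["id"].isna().sum())  # 1
```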

## Rename the columns of the Pandas DataFrame to "Person_ID", "First_Name", "Last_Name", "Email", "Gender", "University", "Occupation", "Salary".

```python
# Set the `columns` attribute to a new list of column names
columns = ["Person_ID", "First_Name", "Last_Name", "Email", "Gender", "University", "Occupation", "Salary"]
people_df.columns = columns

# View the first five rows of the DataFrame
people_df.head()
```

| | Person_ID | First_Name | Last_Name | Email | Gender | University | Occupation | Salary |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Keriann | Lenormand | klenormand0@businessinsider.com | Female | Aurora University | Nurse Practicioner | 58135.0 |
| 1 | 2.0 | Huntley | Rupke | hrupke1@reuters.com | Male | Osaka University of Economics | Project Manager | 96053.0 |
| 2 | 3.0 | Gorden | Dalgarnowch | gdalgarnowch2@microsoft.com | Male | Ludong University | Environmental Tech | 59196.0 |
| 3 | 4.0 | Cullie | | cputten3@nymag.com | Male | Université des Sciences et de la Technologie d... | Legal Assistant | 88493.0 |
| 4 | 5.0 | Ariel | Strangman | astrangman4@bravesites.com | Female | Boise State University | Project Manager | 89073.0 |
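Assigning to `columns` replaces the names by position, so the new list must contain exactly one name per existing column, in the same order. A list of the wrong length raises a `ValueError` and leaves the old names in place, as this small made-up example shows.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "first_name": ["Keriann", "Huntley"]})

# Correct: one new name per column, matched by position
df.columns = ["Person_ID", "First_Name"]
print(df.columns.tolist())  # ['Person_ID', 'First_Name']

# Wrong: three names for two columns
raised = False
try:
    df.columns = ["Person_ID", "First_Name", "Extra"]
except ValueError as err:
    raised = True
    print("ValueError:", err)
```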


## Alternatively, rename the columns of the Pandas DataFrame using a dictionary.

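```python
# Re-read the original CSV so the columns have their original names again;
# after the previous cell, none of the keys below would match, and `rename`
# silently leaves unmatched columns unchanged
people_df = pd.read_csv(csvpath)

# Use the `rename` function and set the `columns` parameter to a dictionary
# that maps each old column name to its new name
people_df = people_df.rename(columns={
    "id": "Person_ID",
    "first_name": "First_Name",
    "last_name": "Last_Name",
    "email": "Email",
    "gender": "Gender",
    "uni_grad": "University",
    "job_title": "Occupation",
    "Income": "Salary"
})

# View the first five rows of the DataFrame
people_df.head()
```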

| | Person_ID | First_Name | Last_Name | Email | Gender | University | Occupation | Salary |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Keriann | Lenormand | klenormand0@businessinsider.com | Female | Aurora University | Nurse Practicioner | 58135.0 |
| 1 | 2.0 | Huntley | Rupke | hrupke1@reuters.com | Male | Osaka University of Economics | Project Manager | 96053.0 |
| 2 | 3.0 | Gorden | Dalgarnowch | gdalgarnowch2@microsoft.com | Male | Ludong University | Environmental Tech | 59196.0 |
| 3 | 4.0 | Cullie | | cputten3@nymag.com | Male | Université des Sciences et de la Technologie d... | Legal Assistant | 88493.0 |
| 4 | 5.0 | Ariel | Strangman | astrangman4@bravesites.com | Female | Boise State University | Project Manager | 89073.0 |
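By default, `rename` silently skips dictionary keys that do not match any column, so a typo in a key produces no error and no change. Passing `errors="raise"` turns an unmatched key into a `KeyError`. The sketch below shows both behaviors on a made-up two-column frame with a deliberately miscased key.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "Income": [58135.0, 96053.0]})
mapping = {"id": "Person_ID", "income": "Salary"}  # "income" has the wrong case

# Default errors="ignore": the unmatched key is skipped without complaint
renamed = df.rename(columns=mapping)
print(renamed.columns.tolist())  # ['Person_ID', 'Income']

# errors="raise": the same typo raises a KeyError instead
raised = False
try:
    df.rename(columns=mapping, errors="raise")
except KeyError as err:
    raised = True
    print("KeyError:", err)
```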


## Re-order the columns of the Pandas DataFrame to "Person_ID", "Last_Name", "First_Name", "Gender", "University", "Occupation", "Salary", "Email".

```python
# Use a list of re-ordered column names to alter the column order of the original DataFrame
people_df = people_df[["Person_ID", "Last_Name", "First_Name", "Gender", "University", "Occupation", "Salary", "Email"]]

# View the first five rows of the DataFrame
people_df.head()
```

| | Person_ID | Last_Name | First_Name | Gender | University | Occupation | Salary | Email |
|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Lenormand | Keriann | Female | Aurora University | Nurse Practicioner | 58135.0 | klenormand0@businessinsider.com |
| 1 | 2.0 | Rupke | Huntley | Male | Osaka University of Economics | Project Manager | 96053.0 | hrupke1@reuters.com |
| 2 | 3.0 | Dalgarnowch | Gorden | Male | Ludong University | Environmental Tech | 59196.0 | gdalgarnowch2@microsoft.com |
| 3 | 4.0 | | Cullie | Male | Université des Sciences et de la Technologie d... | Legal Assistant | 88493.0 | cputten3@nymag.com |
| 4 | 5.0 | Strangman | Ariel | Female | Boise State University | Project Manager | 89073.0 | astrangman4@bravesites.com |
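Selecting with a list of names returns a new DataFrame with the columns in that order, and every name must exist or pandas raises a `KeyError`. When only one column needs to move, the new order can be derived from `columns` instead of typed out in full. This sketch, on an empty made-up frame, moves `Email` to the end while keeping the other columns in place.

```python
import pandas as pd

df = pd.DataFrame(columns=["Person_ID", "Email", "Last_Name", "Salary"])

# Keep every other column in its current order, then append "Email"
new_order = [col for col in df.columns if col != "Email"] + ["Email"]
df = df[new_order]
print(df.columns.tolist())  # ['Person_ID', 'Last_Name', 'Salary', 'Email']
```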


## Bonus - If you complete the first part of this activity early, attempt this bonus section.

### Create two additional columns: `Age` and `Age_Copy`. Use the `randint` function from the `numpy` library with the `low`, `high`, and `size` parameters set to `22`, `65`, and `1000`, respectively, to randomly generate an integer from 22 to 64 (`high` is exclusive) for each of the 1000 rows.

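```python
# Use the `randint` function to randomly generate an `Age` from 22 to 64 for each row
# (`high` is exclusive, and `size` must equal the number of rows: 1000 in people.csv)
people_df["Age"] = np.random.randint(low=22, high=65, size=1000)
# `Age_Copy` is drawn independently, so its values differ from `Age`
people_df["Age_Copy"] = np.random.randint(low=22, high=65, size=1000)

# View the first five rows of the DataFrame
people_df.head()
```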

| | Person_ID | Last_Name | First_Name | Gender | University | Occupation | Salary | Email | Age | Age_Copy |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Lenormand | Keriann | Female | Aurora University | Nurse Practicioner | 58135.0 | klenormand0@businessinsider.com | 36 | 26 |
| 1 | 2.0 | Rupke | Huntley | Male | Osaka University of Economics | Project Manager | 96053.0 | hrupke1@reuters.com | 30 | 47 |
| 2 | 3.0 | Dalgarnowch | Gorden | Male | Ludong University | Environmental Tech | 59196.0 | gdalgarnowch2@microsoft.com | 44 | 60 |
| 3 | 4.0 | | Cullie | Male | Université des Sciences et de la Technologie d... | Legal Assistant | 88493.0 | cputten3@nymag.com | 24 | 38 |
| 4 | 5.0 | Strangman | Ariel | Female | Boise State University | Project Manager | 89073.0 | astrangman4@bravesites.com | 63 | 39 |
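Because `np.random.randint` excludes `high`, the call above never produces 65. If 65 should be possible, or the ages need to be reproducible between runs, NumPy's newer `Generator` API offers `integers(..., endpoint=True)` with a seed. In this sketch the seed value 42 is arbitrary, the 1000-row frame is a stand-in for `people_df`, and the row count comes from `len(df)` rather than being hard-coded.

```python
import numpy as np
import pandas as pd

# Stand-in frame with 1000 rows, the size used in this exercise
df = pd.DataFrame({"Person_ID": range(1, 1001)})

# Legacy API: `high` is exclusive, so no value ever exceeds 64
legacy_ages = np.random.randint(low=22, high=65, size=len(df))
print(legacy_ages.min() >= 22, legacy_ages.max() <= 64)  # True True

# Generator API: a fixed seed makes runs repeatable, and endpoint=True
# makes `high` inclusive, so ages run from 22 through 65
rng = np.random.default_rng(seed=42)
df["Age"] = rng.integers(low=22, high=65, size=len(df), endpoint=True)
print(df["Age"].between(22, 65).all())  # True
```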


### Delete the newly created `Age_Copy` column.

```python
# Use the `drop` function to delete the newly created `Age_Copy` column
people_df = people_df.drop(columns=["Age_Copy"])

# View the first five rows of the DataFrame
people_df.head()
```

| | Person_ID | Last_Name | First_Name | Gender | University | Occupation | Salary | Email | Age |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | Lenormand | Keriann | Female | Aurora University | Nurse Practicioner | 58135.0 | klenormand0@businessinsider.com | 36 |
| 1 | 2.0 | Rupke | Huntley | Male | Osaka University of Economics | Project Manager | 96053.0 | hrupke1@reuters.com | 30 |
| 2 | 3.0 | Dalgarnowch | Gorden | Male | Ludong University | Environmental Tech | 59196.0 | gdalgarnowch2@microsoft.com | 44 |
| 3 | 4.0 | | Cullie | Male | Université des Sciences et de la Technologie d... | Legal Assistant | 88493.0 | cputten3@nymag.com | 24 |
| 4 | 5.0 | Strangman | Ariel | Female | Boise State University | Project Manager | 89073.0 | astrangman4@bravesites.com | 63 |
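`drop` returns a new DataFrame (unless `inplace=True` is passed), which is why the result is assigned back to `people_df`. Re-running the same cell after the column is gone raises a `KeyError`; passing `errors="ignore"` makes the drop safe to repeat. Shown here on a made-up frame using two of the age values from the output above.

```python
import pandas as pd

df = pd.DataFrame({"Age": [36, 30], "Age_Copy": [26, 47]})

df = df.drop(columns=["Age_Copy"])
print(df.columns.tolist())  # ['Age']

# A second plain drop would raise KeyError; errors="ignore" skips missing labels
df = df.drop(columns=["Age_Copy"], errors="ignore")
print(df.columns.tolist())  # ['Age']
```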


### Using the Pandas `to_csv` function, write the modified DataFrame to a new CSV, and put the file in the `Resources` folder.

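```python
# Save the DataFrame to the `Resources` folder
# (`index=False` keeps the row index from being written as an extra unnamed column)
people_df.to_csv("../Resources/people_reordered.csv", index=False)
```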
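By default, `to_csv` also writes the row index as an unlabeled first column, and reading that file back turns it into a column named `Unnamed: 0`. Passing `index=False` when writing (or `index_col=0` when reading) avoids the extra column. This round trip uses a made-up frame and an in-memory string instead of a file in `Resources`.

```python
import io
import pandas as pd

df = pd.DataFrame({"Person_ID": [1, 2], "Age": [36, 30]})

# Default: the index is written, so it comes back as an extra column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(with_index.columns.tolist())  # ['Unnamed: 0', 'Person_ID', 'Age']

# index=False writes only the data columns
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(without_index.columns.tolist())  # ['Person_ID', 'Age']
```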