# Data Manipulation with pandas

👋 Welcome to your new **workspace**! Here, you can experiment with the data you used in [Data Manipulation with pandas](https://app.datacamp.com/learn/courses/data-manipulation-with-pandas) and practice your newly learned skills with some challenges. You can find out more about DataCamp Workspace [here](https://workspace-docs.datacamp.com/).

We expect this workspace to take about **30 minutes** to complete. However, you are free to experiment and practice in it for as long as you like!

## 1. Get Started
Below is a code cell, which executes Python code. The cell imports three packages you used in Data Manipulation with pandas (pandas, NumPy, and Matplotlib) and loads the course data into DataFrames using the pandas [`read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) function.

🏃**To execute the code, click inside the cell to select it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell.**

In [5]:
# Import the course packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the four datasets
avocado = pd.read_csv("../../Datasets/avocado.csv")
homelessness = pd.read_csv("../../Datasets/homelessness.csv")
temperatures = pd.read_csv("../../Datasets/temperatures.csv")
walmart = pd.read_csv("../../Datasets/walmart.csv")

# Print the first DataFrame
avocado

Unnamed: 0.1,Unnamed: 0,Date,AveragePrice,Total Volume,4046,4225,4770,Total Bags,Small Bags,Large Bags,XLarge Bags,type,year,region
0,0,2015-12-27,1.33,64236.62,1036.74,54454.85,48.16,8696.87,8603.62,93.25,0.0,conventional,2015,Albany
1,1,2015-12-20,1.35,54876.98,674.28,44638.81,58.33,9505.56,9408.07,97.49,0.0,conventional,2015,Albany
2,2,2015-12-13,0.93,118220.22,794.70,109149.67,130.50,8145.35,8042.21,103.14,0.0,conventional,2015,Albany
3,3,2015-12-06,1.08,78992.15,1132.00,71976.41,72.58,5811.16,5677.40,133.76,0.0,conventional,2015,Albany
4,4,2015-11-29,1.28,51039.60,941.48,43838.39,75.78,6183.95,5986.26,197.69,0.0,conventional,2015,Albany
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18244,7,2018-02-04,1.63,17074.83,2046.96,1529.20,0.00,13498.67,13066.82,431.85,0.0,organic,2018,WestTexNewMexico
18245,8,2018-01-28,1.71,13888.04,1191.70,3431.50,0.00,9264.84,8940.04,324.80,0.0,organic,2018,WestTexNewMexico
18246,9,2018-01-21,1.87,13766.76,1191.92,2452.79,727.94,9394.11,9351.80,42.31,0.0,organic,2018,WestTexNewMexico
18247,10,2018-01-14,1.93,16205.22,1527.63,2981.04,727.01,10969.54,10919.54,50.00,0.0,organic,2018,WestTexNewMexico
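
The `Unnamed: 0.1` and `Unnamed: 0` columns in the output above are what `read_csv()` typically produces when a file was saved together with its row index, leaving the first header cell empty. Here is a minimal sketch of that behavior, using a small in-memory CSV rather than the course files (the `csv_text` sample and the variable names are illustrative assumptions):

```python
import io

import pandas as pd

# Hypothetical sample: the first header cell is empty, as happens when a
# DataFrame is written to CSV together with its index.
csv_text = ",Date,AveragePrice\n0,2015-12-27,1.33\n1,2015-12-20,1.35\n"

# By default, pandas names the header-less column "Unnamed: 0"
with_unnamed = pd.read_csv(io.StringIO(csv_text))
print(with_unnamed.columns.tolist())

# Passing index_col=0 uses that column as the row index instead
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index.columns.tolist())
```

If your copy of `avocado.csv` was written this way, passing `index_col=0` to `read_csv()` may remove one of the extra columns. Inspect the result before relying on it.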


## 2. Write Code
After running the cell above, you have created four pandas DataFrames: `avocado`, `homelessness`, `temperatures`, and `walmart`. 

**Add code** to the code cells below to try one (or more) of the following challenges:

1. Print the highest weekly sales for each `department` in the `walmart` DataFrame. Limit your results to the top five departments, in descending order. If you're stuck, try reviewing this [video](https://campus.datacamp.com/courses/data-manipulation-with-pandas/aggregating-dataframes?ex=1).
2. What was the total `nb_sold` of organic avocados in 2017 in the `avocado` DataFrame? If you're stuck, try reviewing this [video](https://campus.datacamp.com/courses/data-manipulation-with-pandas/slicing-and-indexing-dataframes?ex=6).
3. Create a bar plot of the total number of homeless people by region in the `homelessness` DataFrame. Order the bars in descending order. Bonus: create a horizontal bar chart. If you're stuck, try reviewing this [video](https://campus.datacamp.com/courses/data-manipulation-with-pandas/creating-and-visualizing-dataframes?ex=1).
4. Create a line plot with two lines representing the temperatures in Toronto and Rome. Make sure to properly label your plot. Bonus: add a legend for the two lines. If you're stuck, try reviewing this [video](https://campus.datacamp.com/courses/data-manipulation-with-pandas/creating-and-visualizing-dataframes?ex=1).

Be sure to check out the **Answer Key** at the end to see one way to solve each problem. Did you try something similar?

**Reminder: To execute the code you add to a cell, click inside the cell to select it and click "Run" or the ► icon. You can also use Shift-Enter to run a selected cell.**

In [7]:
# 1. Print the highest weekly sales for each department
department_sales = walmart.groupby("department")[["weekly_sales"]].max()
best_departments = department_sales.sort_values(by="weekly_sales", ascending=False)
best_departments.head()

KeyError: 'department'
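
The `KeyError: 'department'` above indicates that the loaded `walmart` DataFrame has no column with that exact name. One way to diagnose this is to list the columns and, if needed, rename them before grouping. The sketch below uses a made-up DataFrame; the names `Dept` and `Weekly_Sales` are assumptions for illustration, not the actual columns of your file.

```python
import pandas as pd

# Hypothetical stand-in for a walmart file whose column names differ
# from the ones used in the course.
walmart_demo = pd.DataFrame(
    {
        "Dept": [1, 1, 2, 2, 3],
        "Weekly_Sales": [10.0, 25.0, 40.0, 5.0, 30.0],
    }
)

# Step 1: check which columns actually exist
print(walmart_demo.columns.tolist())

# Step 2: rename them to the names the challenge expects
walmart_demo = walmart_demo.rename(
    columns={"Dept": "department", "Weekly_Sales": "weekly_sales"}
)

# Step 3: highest weekly sales per department, top five, descending
top_departments = (
    walmart_demo.groupby("department")[["weekly_sales"]]
    .max()
    .sort_values(by="weekly_sales", ascending=False)
    .head()
)
print(top_departments)
```

Running `walmart.columns` on your own DataFrame shows which names to pass to `rename()`, if any.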

In [None]:
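# 2. What was the total `nb_sold` of organic avocados in 2017?
# Assumes the course columns `date`, `type`, and `nb_sold`; the `avocado` output
# above shows `Date` and no `nb_sold`, so check `avocado.columns` first.
# With string dates, "2017":"2018" keeps only 2017 rows ("2018-..." sorts after "2018").
avocado_2017 = avocado.set_index("date").sort_index().loc["2017":"2018"]
avocado_organic_2017 = avocado_2017.loc[avocado_2017["type"] == "organic"]
avocado_organic_2017["nb_sold"].sum()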

In [None]:
# 3. Create a bar plot of the number of homeless people by region
homelessness_by_region = (
    homelessness.groupby("region")["individuals"].sum().sort_values()
)
homelessness_by_region.plot(kind="barh")
plt.title("Total Number of Homeless People by Region")
plt.xlabel("Number")
plt.ylabel("Region")
plt.show()

In [None]:
# 4. Create a line plot of temperatures in Toronto and Rome
toronto = temperatures[temperatures.city == "Toronto"]
rome = temperatures[temperatures.city == "Rome"]
toronto.groupby("date")["avg_temp_c"].mean().plot(kind="line", color="blue")
rome.groupby("date")["avg_temp_c"].mean().plot(kind="line", color="red")
plt.title("Toronto and Rome Average Temperature (C)")
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.legend(labels=["Toronto", "Rome"])
plt.show()

## 3. Next Steps
Feeling confident about your skills? Continue on to [Joining Data with pandas](https://app.datacamp.com/learn/courses/joining-data-with-pandas)! This course will teach you how to combine multiple datasets, an essential skill on the road to becoming a data scientist!

## 4. Answer Key
Below are potential solutions to the challenges shown above. Try them out and see how they compare to how you approached the problem!

In [None]:
# 1. Print the highest weekly sales for each department
department_sales = walmart.groupby("department")[["weekly_sales"]].max()
best_departments = department_sales.sort_values(by="weekly_sales", ascending=False)
best_departments.head()

In [None]:
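# 2. What was the total `nb_sold` of organic avocados in 2017?
# Index by date and sort so that label slicing works. The dates are strings here,
# so "2017":"2018" stops before the first "2018-..." label.
avocado_2017 = avocado.set_index("date").sort_index().loc["2017":"2018"]
# Keep only the organic rows, then add up the units sold
avocado_organic_2017 = avocado_2017.loc[avocado_2017["type"] == "organic"]
avocado_organic_2017["nb_sold"].sum()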
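
The slice `"2017":"2018"` behaves differently once the dates are parsed as datetimes: on a `DatetimeIndex`, both endpoints are treated as whole years, so 2018 rows are included. The sketch below illustrates this with a small made-up DataFrame (the values and the `demo` name are assumptions; only the column names follow the course data):

```python
import pandas as pd

# Hypothetical miniature of the course avocado data
demo = pd.DataFrame(
    {
        "date": ["2016-12-25", "2017-01-01", "2017-06-04", "2018-01-07"],
        "type": ["organic", "organic", "conventional", "organic"],
        "nb_sold": [10.0, 20.0, 30.0, 40.0],
    }
)
demo["date"] = pd.to_datetime(demo["date"])
demo = demo.set_index("date").sort_index()

# Partial string indexing: "2017" selects every row in that year
only_2017 = demo.loc["2017"]

# On a DatetimeIndex this slice also includes the 2018 rows
through_2018 = demo.loc["2017":"2018"]

# Total sold for organic avocados in 2017 only
organic_2017_total = only_2017.loc[only_2017["type"] == "organic", "nb_sold"].sum()
print(len(only_2017), len(through_2018), organic_2017_total)
```

Selecting with `.loc["2017"]` states the intent directly and gives the same 2017-only result whether you reached the datetime index via `pd.to_datetime()` or `parse_dates`.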

In [None]:
# 3. Create a bar plot of the number of homeless people by region
# Sort ascending: a horizontal bar chart draws the first row at the bottom,
# so the region with the largest total ends up at the top.
homelessness_by_region = (
    homelessness.groupby("region")["individuals"].sum().sort_values()
)
homelessness_by_region.plot(kind="barh")
plt.title("Total Number of Homeless People by Region")
plt.xlabel("Number of Individuals")
plt.ylabel("Region")
plt.show()
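
The solution above goes straight to the bonus (horizontal) chart. As a rough sketch of how the sort direction differs between the vertical and horizontal versions, the example below draws both side by side from made-up totals (the region names are real, but the numbers and the `homeless_demo` name are illustrative assumptions):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the homelessness data
homeless_demo = pd.DataFrame(
    {
        "region": ["Pacific", "Pacific", "Mountain", "New England"],
        "individuals": [100.0, 50.0, 30.0, 60.0],
    }
)
totals = homeless_demo.groupby("region")["individuals"].sum()

# Vertical bars are drawn left to right, so sort descending
vertical_order = totals.sort_values(ascending=False)
# Horizontal bars are drawn bottom to top, so sort ascending
horizontal_order = totals.sort_values()

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(10, 4))
vertical_order.plot(kind="bar", ax=ax_v, title="Vertical (kind='bar')")
horizontal_order.plot(kind="barh", ax=ax_h, title="Horizontal (kind='barh')")
ax_v.set_ylabel("Number of Individuals")
ax_h.set_xlabel("Number of Individuals")
fig.tight_layout()
plt.show()
```

In both panels the largest region appears first when read in the natural direction: left to right for the vertical chart, top to bottom for the horizontal one.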

In [None]:
# 4. Create a line plot of temperatures in Toronto and Rome
toronto = temperatures[temperatures["city"] == "Toronto"]
rome = temperatures[temperatures["city"] == "Rome"]
# Average per date, then draw both cities on the same axes
toronto.groupby("date")["avg_temp_c"].mean().plot(kind="line", color="blue")
rome.groupby("date")["avg_temp_c"].mean().plot(kind="line", color="red")
plt.title("Toronto and Rome Average Temperature (°C)")
plt.xlabel("Date")
plt.ylabel("Average Temperature (°C)")
# Legend labels follow the order in which the lines were drawn
plt.legend(labels=["Toronto", "Rome"])
plt.show()
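
An alternative to filtering each city separately is to reshape the data so that each city becomes a column and then plot once, letting pandas build the legend from the column names. This is a sketch on made-up values, assuming the same `date`, `city`, and `avg_temp_c` columns as the course data (the `temps_demo` name and the numbers are illustrative):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the temperatures data
temps_demo = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2000-01-01", "2000-01-01", "2000-02-01", "2000-02-01"]
        ),
        "city": ["Toronto", "Rome", "Toronto", "Rome"],
        "avg_temp_c": [-5.0, 8.0, -4.0, 9.0],
    }
)

# One row per date, one column per city (mean is the default aggregation)
wide = temps_demo.pivot_table(index="date", columns="city", values="avg_temp_c")

# Plot both columns at once; the legend entries come from the column names
ax = wide[["Toronto", "Rome"]].plot(kind="line")
ax.set_title("Toronto and Rome Average Temperature (°C)")
ax.set_xlabel("Date")
ax.set_ylabel("Average Temperature (°C)")
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
plt.show()
```

Because the labels are taken from the columns, they cannot fall out of step with the order in which the lines are drawn.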