# Lesson 5: Basic Data Analysis

## Topic Overview
Hey there! Curious about data's hidden secrets? Today, we dive into **Basic Data Analysis**, an essential step in understanding data. Using the **pandas** Python library, we will uncover patterns that guide decision-making in business, science, and daily life. Let's embark on this journey!

## Meaning of Basic Data Analysis
**Basic Data Analysis** is the groundwork for solving any data mystery. It means summarizing data so that it can be understood and acted on, whether you are a business owner studying customer behavior, a scientist analyzing research data, or a student making sense of study material. **pandas** provides concise methods for each of these steps.

---

## Using `value_counts()` for Frequency Analysis
First, we use `value_counts()`, a method that counts how often each unique value appears in a Series (for example, a single DataFrame column). Consider an imaginary dataset of pets.

```python
import pandas as pd

# Creating DataFrame
data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
        'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird']}
pets_df = pd.DataFrame(data)

# Count how many times each pet type appears
print(pets_df['Type'].value_counts())
# Output (pandas < 2.0; pandas 2.x prints a "Type" header line and "Name: count"):
# Dog     3
# Cat     2
# Bird    1
# Name: Type, dtype: int64
```

With `value_counts()`, getting the frequency distribution of a Series takes a single call.
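
If you want shares rather than raw counts, `value_counts()` also accepts a `normalize` parameter. The snippet below is a small sketch using the same pet types as above; the variable name `type_shares` is just an illustrative choice.

```python
import pandas as pd

# Same pet data as in the example above
pets_df = pd.DataFrame({
    'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
    'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
})

# normalize=True returns each value's share of the total instead of a count
type_shares = pets_df['Type'].value_counts(normalize=True)
print(type_shares)
# Dogs make up 3 of 6 pets (0.5), cats 2 of 6, and birds 1 of 6
```

The shares always add up to 1, which makes them handy for comparing datasets of different sizes.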

---

## Grouping and Aggregating with `groupby()` and `agg()` Methods
To summarize data, `groupby()` and `agg()` are useful. Let's add a `Weight` column to our pets DataFrame to illustrate these methods:

```python
import pandas as pd

# Creating DataFrame
data = {'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
        'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
        'Weight': [12, 15, 8, 9, 14, 1]}
pets_df = pd.DataFrame(data)

# Grouping and aggregating data
print(pets_df.groupby('Type').agg({'Weight': 'mean'}))
```

### Explanation:
1. `groupby('Type')`: Splits the data into groups based on the `'Type'`.
2. `.agg({'Weight': 'mean'})`: Applies the `'mean'` function to the `'Weight'` column for each group.

The resulting DataFrame shows the average weight for each pet type:
- Bird: 1.0  
- Cat: 8.5  
- Dog: 13.67 (rounded; pandas prints 13.666667)  

Other aggregation functions, such as `min`, `max`, and `median`, can be used in the same way.
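
As a rough illustration of using several aggregation functions at once, the sketch below selects the `Weight` column after grouping and passes a list of function names to `agg()`. This is a slightly different style from the dictionary form used above, chosen here because it yields simple column names; `weight_stats` is an illustrative name.

```python
import pandas as pd

# Same pet data as in the example above
pets_df = pd.DataFrame({
    'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
    'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Weight': [12, 15, 8, 9, 14, 1],
})

# A list of function names produces one result column per function
weight_stats = pets_df.groupby('Type')['Weight'].agg(['min', 'max', 'median'])
print(weight_stats)
# For dogs (12, 15, 14): min is 12, max is 15, median is 14
```

Each row of the result is one pet type, and each column is one statistic.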

---

## Sorting DataFrame with `sort_values()`
Lastly, let's sort our data. The `sort_values()` method sorts a DataFrame by one or more columns. Let's arrange our pet DataFrame by weight:

```python
sorted_pets_df = pets_df.sort_values('Weight')
print(sorted_pets_df)
#       Name  Type  Weight
# 5   Cooper  Bird       1
# 2    Bella   Cat       8
# 3  Charlie   Cat       9
# 0    Tommy   Dog      12
# 4     Lucy   Dog      14
# 1      Rex   Dog      15
```

We obtained sorted data efficiently with just one simple command!
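
Sorting by more than one column works by passing a list of column names. The following sketch, using the same pet data, sorts by `Type` first and then by `Weight` within each type; `sorted_by_type_weight` is an illustrative name.

```python
import pandas as pd

# Same pet data as in the example above
pets_df = pd.DataFrame({
    'Name': ['Tommy', 'Rex', 'Bella', 'Charlie', 'Lucy', 'Cooper'],
    'Type': ['Dog', 'Dog', 'Cat', 'Cat', 'Dog', 'Bird'],
    'Weight': [12, 15, 8, 9, 14, 1],
})

# Rows are ordered by 'Type' first; ties are broken by 'Weight'
sorted_by_type_weight = pets_df.sort_values(['Type', 'Weight'])
print(sorted_by_type_weight)
# Order of names: Cooper, Bella, Charlie, Tommy, Lucy, Rex
```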

---

## Lesson Summary and Upcoming Practice
Great job! You've learned how to execute **Basic Data Analysis** using pandas functions. Here's what we explored:
- ✅ `value_counts()` for frequency analysis  
- ✅ `groupby()` and `agg()` for grouping and aggregating data  
- ✅ `sort_values()` for sorting data  

Is this a lot to digest? Don't worry! The upcoming exercises will reinforce these concepts. Let's dive into practice. Remember, each completed task boosts your data analysis skills!


## Median Pet Clinic Visits by Species

In the pet care dataset we've been exploring, imagine you want to find the median number of clinic visits for each species. The given code already solves this puzzle by grouping the data by `'Specie'` and calculating the median of `'Visits'` for each group. Run it to reveal the mysteries of pet care data!


Here’s the breakdown of the code and what it reveals about the pet care data:

```python
import pandas as pd

# Dictionary of pet care records
pet_care_data = {
    'PetID': [101, 102, 103, 104, 105],
    'Specie': ['Dog', 'Cat', 'Bird', 'Dog', 'Cat'],
    'Visits': [2, 3, 1, 4, 2]
}

# Create DataFrame from dictionary
pet_care_df = pd.DataFrame(pet_care_data)

# Calculate the median number of visits by species
median_visits_by_specie = pet_care_df.groupby('Specie').agg({'Visits': 'median'})
print(median_visits_by_specie)
```

### Explanation:
1. **Dataset Description**:
   - `PetID`: Unique identifier for each pet.
   - `Specie`: Species of the pet (e.g., Dog, Cat, Bird).
   - `Visits`: Number of visits made by the pet to the clinic.

2. **Grouping and Aggregating**:
   - The `groupby('Specie')` method groups the data based on the `Specie` column.
   - The `agg({'Visits': 'median'})` method calculates the **median** of the `Visits` column for each species.

3. **Output**:
   When the code is run, the output will display the median number of visits for each species:
   ```
          Visits
   Specie        
   Bird       1.0
   Cat        2.5
   Dog        3.0
   ```

### Interpretation:
- **Birds**: Median number of visits is 1.0.
- **Cats**: Median number of visits is 2.5.
- **Dogs**: Median number of visits is 3.0.

This insight could be valuable for planning clinic schedules, resources, or services tailored to different species based on their typical visit patterns.
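
For a single statistic on a single column, a common shorthand is to select the column after grouping and call the statistic directly. The sketch below shows this as one possible alternative to the `agg()` call above; note that it returns a Series rather than a DataFrame, and `median_visits` is an illustrative name.

```python
import pandas as pd

# Same pet care records as in the exercise
pet_care_df = pd.DataFrame({
    'PetID': [101, 102, 103, 104, 105],
    'Specie': ['Dog', 'Cat', 'Bird', 'Dog', 'Cat'],
    'Visits': [2, 3, 1, 4, 2],
})

# Select the 'Visits' column of each group, then take its median
median_visits = pet_care_df.groupby('Specie')['Visits'].median()
print(median_visits)
# Bird: 1.0, Cat: 2.5, Dog: 3.0
```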

## Sorting Pandas Data by Type in Reverse Order

In this challenge, you will fine-tune how a DataFrame is sorted. Adapt the current code to sort `pets_df` by the `'Type'` column in descending order. Check out the boolean parameter `ascending`: when it is `True` (the default), values are sorted in ascending order; when it is `False`, they are sorted in descending order.

```python
import pandas as pd

# Creating the DataFrame
pets_df = pd.DataFrame({
    'Name': ['Buddy', 'Max', 'Luna', 'Bella', 'Leo'],
    'Type': ['Cat', 'Dog', 'Dog', 'Bird', 'Cat'],
    'Height': [25, 40, 35, 15, 20]
})

# TODO: sort pets_df by the 'Type' column in descending order.
sorted_pets_df = pets_df.sort_values('Type', ascending=True)
print(sorted_pets_df)
```

To sort the `pets_df` DataFrame by the 'Type' column in **descending** order, you need to set the `ascending` parameter to `False`. Here's the updated code:

```python
import pandas as pd

# Creating the DataFrame
pets_df = pd.DataFrame({
    'Name': ['Buddy', 'Max', 'Luna', 'Bella', 'Leo'],
    'Type': ['Cat', 'Dog', 'Dog', 'Bird', 'Cat'],
    'Height': [25, 40, 35, 15, 20]
})

# Sort pets_df by the 'Type' column in descending order
sorted_pets_df = pets_df.sort_values('Type', ascending=False)
print(sorted_pets_df)
```

### Explanation:
- **`ascending=False`**: This ensures the sorting is done in **descending** order based on the 'Type' column.
  
### Output:
The sorted DataFrame will look like this:
```
    Name  Type  Height
1    Max   Dog      40
2   Luna   Dog      35
0  Buddy   Cat      25
4    Leo   Cat      20
3  Bella  Bird      15
```

### Key Points:
- **Sorting by 'Type' in Descending Order**: The pets are sorted in reverse alphabetical order of their `Type` values (`Dog > Cat > Bird`).
- You can adjust this behavior by changing the column or toggling the `ascending` parameter.
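
When sorting by several columns, `ascending` can also be a list with one boolean per column. The sketch below, using the same data, sorts `Type` in descending order and `Height` in ascending order within each type; `mixed_sort_df` is an illustrative name.

```python
import pandas as pd

# Same DataFrame as in the exercise
pets_df = pd.DataFrame({
    'Name': ['Buddy', 'Max', 'Luna', 'Bella', 'Leo'],
    'Type': ['Cat', 'Dog', 'Dog', 'Bird', 'Cat'],
    'Height': [25, 40, 35, 15, 20]
})

# One direction per column: 'Type' descending, 'Height' ascending
mixed_sort_df = pets_df.sort_values(['Type', 'Height'], ascending=[False, True])
print(mixed_sort_df)
# Order of names: Luna, Max, Leo, Buddy, Bella
```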

## Aggregating and Sorting Pet Care Data

Astounding progress, Space Voyager! A ripe mind indeed craves a challenge. In the realm of pet care, wield the mighty pandas to aggregate the data. Your quest? Code the lines that summarize the visits by pet type! Use the `sum` aggregation function.

```python
import pandas as pd

# Dataset for pet care management
pet_care = {'PetID': [101, 102, 103, 104, 105],
            'Type': ['Dog', 'Cat', 'Bird', 'Dog', 'Cat'],
            'Visits': [2, 3, 1, 5, 2]}

pets_df = pd.DataFrame(pet_care)
# TODO: group the data by Type
# TODO: aggregate by the sum of visits
```

Here’s the code to summarize the visits by pet type using the `sum` aggregation function:

```python
import pandas as pd

# Dataset for pet care management
pet_care = {'PetID': [101, 102, 103, 104, 105],
            'Type': ['Dog', 'Cat', 'Bird', 'Dog', 'Cat'],
            'Visits': [2, 3, 1, 5, 2]}

# Creating the DataFrame
pets_df = pd.DataFrame(pet_care)

# Group the data by 'Type' and aggregate by the sum of 'Visits'
visits_by_type = pets_df.groupby('Type').agg({'Visits': 'sum'})

# Display the result
print(visits_by_type)
```

### Explanation:
1. **Group by 'Type'**: The `groupby('Type')` method groups the data by the pet's type.
2. **Aggregate with `sum`**: The `agg({'Visits': 'sum'})` computes the total number of visits for each type of pet.
3. **Resulting DataFrame**:
   ```
          Visits
   Type          
   Bird        1
   Cat         5
   Dog         7
   ```

### Interpretation:
- **Birds** had a total of 1 visit.
- **Cats** had a total of 5 visits.
- **Dogs** had a total of 7 visits.
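
Since this challenge is about aggregating and sorting, one natural follow-up is to chain `sort_values()` onto the aggregated result. The sketch below is one way to rank pet types by total visits; `ranked_visits` is an illustrative name.

```python
import pandas as pd

# Same dataset as in the exercise
pet_care = {'PetID': [101, 102, 103, 104, 105],
            'Type': ['Dog', 'Cat', 'Bird', 'Dog', 'Cat'],
            'Visits': [2, 3, 1, 5, 2]}
pets_df = pd.DataFrame(pet_care)

# Sum the visits per type, then put the busiest type first
ranked_visits = (
    pets_df.groupby('Type')
    .agg({'Visits': 'sum'})
    .sort_values('Visits', ascending=False)
)
print(ranked_visits)
# Dog (7 visits) comes first, then Cat (5), then Bird (1)
```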

Your mission is complete, Space Voyager! Pandas has revealed the mysteries of pet care visits! 🐾

## Counting Pets by Type

Great journey so far, Space Voyager! Let's see if you can discover how many of each type of pet we're taking care of. Complete the code to count these furry friends using the `value_counts()` method.

```python
import pandas as pd

# Pet Care DataFrame
pets_df = pd.DataFrame({
    'PetType': ['Dog', 'Cat', 'Dog', 'Bird', 'Dog', 'Cat'],
    'PetName': ['Buddy', 'Whiskers', 'Duke', 'Tweety', 'Max', 'Ginger']
})

# TODO: Find out how many of each type of pet we have
print(____)
```

Here’s how to complete the code using the `value_counts` function to count how many of each type of pet are in the `PetType` column:

```python
import pandas as pd

# Pet Care DataFrame
pets_df = pd.DataFrame({
    'PetType': ['Dog', 'Cat', 'Dog', 'Bird', 'Dog', 'Cat'],
    'PetName': ['Buddy', 'Whiskers', 'Duke', 'Tweety', 'Max', 'Ginger']
})

# Find out how many of each type of pet we have
print(pets_df['PetType'].value_counts())
```

### Explanation:
- **`value_counts()`**: This method counts how many times each unique value appears in the `PetType` column.

### Output:
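When the code is executed on pandas versions before 2.0, the output will be as follows (pandas 2.x instead prints `PetType` as a header line and ends with `Name: count, dtype: int64`):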
```
Dog     3
Cat     2
Bird    1
Name: PetType, dtype: int64
```

### Interpretation:
- **Dogs**: There are 3 dogs.
- **Cats**: There are 2 cats.
- **Birds**: There is 1 bird.
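
The same tally can also be reached with the grouping tools from this lesson. The sketch below compares `value_counts()` with `groupby().size()`; the counts match, though the grouped result is ordered by type name rather than by frequency. The variable names are illustrative.

```python
import pandas as pd

# Same Pet Care DataFrame as in the exercise
pets_df = pd.DataFrame({
    'PetType': ['Dog', 'Cat', 'Dog', 'Bird', 'Dog', 'Cat'],
    'PetName': ['Buddy', 'Whiskers', 'Duke', 'Tweety', 'Max', 'Ginger']
})

# Counts ordered from most to least frequent
counts_by_value = pets_df['PetType'].value_counts()

# Number of rows in each group, ordered alphabetically by pet type
counts_by_group = pets_df.groupby('PetType').size()

print(counts_by_value.to_dict())
print(counts_by_group.to_dict())
```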

Mission accomplished, Space Voyager! You've successfully tallied the furry (and feathered) friends. 🚀🐾