# Lesson 1: Loading and Viewing Data in Pandas



Hello and welcome to our journey into data analysis with Python and pandas. Today, we'll explore **pandas DataFrames** and learn about loading and viewing data.

**Pandas**, a fantastic Python library, simplifies data manipulation and analysis. Our focus today is **DataFrames**—the go-to structure in pandas for data handling.

We'll load data from Python lists and dictionaries into DataFrames, then explore it with a few core viewing methods. Let's begin!

---

## Installing and Importing pandas

Installing and importing pandas is like getting our recipe book ready before we start cooking. In our **CodeSignal kitchen**, pandas comes pre-installed. In your own environment, install it first with `pip install pandas`. Then import it into your script:

```python
import pandas as pd  # Pandas successfully imported
```

The alias `pd` makes it easier to use pandas without typing the full name every time.

---

## Introduction to DataFrames

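A **DataFrame** in pandas is a two-dimensional table: each column holds one kind of data, and each row is identified by an index label (0, 1, 2 and so on by default). Think of it as the table our dishes (the data) are laid out on. Here's how to create one: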

### From a List
```python
import pandas as pd

# From a list
data_list = ['apple', 'banana', 'cherry']
df_list = pd.DataFrame(data_list, columns=['Fruit'])
print(df_list)
```
**Output**:
```
    Fruit
0   apple
1  banana
2  cherry
```

### From a Dictionary
```python
# From a dictionary
data_dict = {'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20, 15]}
df_dict = pd.DataFrame(data_dict)
print(df_dict)
```
**Output**:
```
    Fruit  Count
0   apple     10
1  banana     20
2  cherry     15
```

---

## Viewing Data in a DataFrame

### Head and Tail
```python
# First 5 rows (df_dict from above has only 3, so all of them appear)
print(df_dict.head())

# Last 5 rows
print(df_dict.tail())
```
By default, `head()` and `tail()` return 5 rows. For DataFrames with 5 or fewer rows, such as `df_dict`, both show the entire dataset.
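Both methods also accept a row count `n`. A minimal sketch, using a hypothetical six-row DataFrame `df_demo` so that the default of 5 rows actually cuts something off:

```python
import pandas as pd

# Hypothetical six-row DataFrame, one row more than the default of 5
df_demo = pd.DataFrame({
    'Fruit': ['apple', 'banana', 'cherry', 'date', 'elderberry', 'fig'],
    'Count': [10, 20, 15, 25, 5, 30]
})

first_two = df_demo.head(2)    # rows labeled 0 and 1
last_three = df_demo.tail(3)   # rows labeled 3, 4 and 5
default_head = df_demo.head()  # 5 rows, because the DataFrame has 6

print(first_two)
print(last_three)
print(len(default_head))  # 5
```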

### Overview
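```python
# Overview: info() prints its summary directly (it returns None), so no print() is needed
df_dict.info()
```
**Output**:
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Fruit   3 non-null      object
 1   Count   3 non-null      int64 
dtypes: int64(1), object(1)
memory usage: 176.0+ bytes
```
This lists the index range, column names, non-null counts, data types, and approximate memory usage. The trailing `+` signals that the contents of object (text) columns are not fully counted.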
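Since `info()` returns `None`, assigning its result does not store the summary. A minimal sketch confirming this, and capturing the text instead by passing an `io.StringIO` buffer through `info()`'s `buf` parameter (variable names are illustrative):

```python
import io

import pandas as pd

df_dict = pd.DataFrame({'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20, 15]})

# info() writes to standard output and returns None
result = df_dict.info()
print(result)  # None

# Redirect the summary into a text buffer to keep it as a string
buffer = io.StringIO()
df_dict.info(buf=buffer)
summary = buffer.getvalue()
print('non-null' in summary)  # True
```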

---

## Concatenating DataFrames: `pd.concat`

Combine multiple DataFrames with `pd.concat`:

```python
import pandas as pd

# Creating first DataFrame
data1 = {'Fruit': ['apple', 'banana'], 'Count': [10, 20]}
df1 = pd.DataFrame(data1)

# Creating second DataFrame
data2 = {'Fruit': ['cherry', 'date'], 'Count': [15, 25]}
df2 = pd.DataFrame(data2)

# Concatenating DataFrames
df_combined = pd.concat([df1, df2])
print(df_combined)
```
**Output**:
```
    Fruit  Count
0   apple     10
1  banana     20
0  cherry     15
1    date     25
```

For a new continuous index:
```python
df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined)
```
**Output**:
```
    Fruit  Count
0   apple     10
1  banana     20
2  cherry     15
3    date     25
```
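Why renumber? Without `ignore_index=True`, the combined DataFrame has duplicate index labels, so a label-based lookup can return more than one row. A minimal sketch (the names `kept`, `rows_labeled_0`, and `renumbered` are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'Fruit': ['apple', 'banana'], 'Count': [10, 20]})
df2 = pd.DataFrame({'Fruit': ['cherry', 'date'], 'Count': [15, 25]})

# Original indexes are kept, so label 0 appears twice
kept = pd.concat([df1, df2])
rows_labeled_0 = kept.loc[0]  # a 2-row DataFrame: apple and cherry
print(rows_labeled_0)

# Labels now run 0..3, so each label selects exactly one row
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.loc[2, 'Fruit'])  # cherry
```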

---

## Series in pandas

A **Series** in pandas is a one-dimensional labeled array, often representing a single DataFrame column:

```python
import pandas as pd

# Create a Series from a list
fruits = ['apple', 'banana', 'cherry']
series = pd.Series(fruits)

print(series)
```
**Output**:
```
0     apple
1    banana
2    cherry
dtype: object
```
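Selecting a single column of a DataFrame with square brackets gives you exactly such a Series. A short sketch, reusing the `df_dict` data from earlier (`counts` is an illustrative name):

```python
import pandas as pd

df_dict = pd.DataFrame({'Fruit': ['apple', 'banana', 'cherry'], 'Count': [10, 20, 15]})

# One column, selected with single brackets, is a Series
counts = df_dict['Count']
print(type(counts))  # <class 'pandas.core.series.Series'>
print(counts.name)   # Count

# The Series keeps the DataFrame's index labels
print(counts.loc[1])  # 20
```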

---

## Lesson Summary and Practice

### Key Takeaways
- **Pandas** is a powerful Python library for data manipulation.
- **DataFrame**: A 2D table-like structure in pandas.
- Methods like `head()`, `tail()`, and `info()` are essential for exploring data.
- **pd.concat**: Combine multiple DataFrames easily.
- **Series**: A one-dimensional array for handling single columns.

Practice these techniques to strengthen your data analysis skills. Happy coding! 🎉


## Exploring Grocery Store Data

Have you ever wondered how a grocery store keeps track of its inventory and sales? The code below uses pandas to create a DataFrame from a dictionary that represents a simple inventory and sales dataset for a grocery store. Click Run to view the first few rows of this data!

```python
import pandas as pd

# Create the dataframe
inventory = pd.DataFrame({
    'Product': ['Apples', 'Oranges', 'Grapes'],
    'InStock': [30, 50, 20],
    'Sold': [10, 20, 15]
})

# Print first 5 rows of the dataframe
print(inventory.head())
```

Here's an explanation of the code and how it uses pandas to represent a grocery store's inventory:

---

## 🛒 **Tracking Grocery Inventory and Sales with pandas**

### **The Concept**
Ever wondered how a grocery store manages its inventory and sales efficiently? This example shows how pandas, a Python library, can be used to create and view a small dataset representing a grocery store's inventory and sales.

### **Code Walkthrough**

```python
import pandas as pd

# Create the dataframe
inventory = pd.DataFrame({
    'Product': ['Apples', 'Oranges', 'Grapes'],
    'InStock': [30, 50, 20],
    'Sold': [10, 20, 15]
})

# Print first 5 rows of the dataframe
print(inventory.head())
```

### **What Does This Do?**
1. **Library Import**  
   - `import pandas as pd`: Brings pandas into your script, making its powerful tools available.

2. **Creating the DataFrame**  
   - A dictionary is passed to `pd.DataFrame()`, where:
     - **Keys** (`'Product'`, `'InStock'`, `'Sold'`): Column names.
     - **Values**: Data for each column, represented as lists.
   - The DataFrame includes:
     - **Product**: List of grocery items.
     - **InStock**: Number of items currently in stock.
     - **Sold**: Number of items sold.

3. **Viewing the Data**  
   - `inventory.head()`: Displays the first 5 rows of the DataFrame (or fewer if the DataFrame is smaller).

### **Expected Output**

When you run the code, you'll see:
```
   Product  InStock  Sold
0   Apples       30    10
1  Oranges       50    20
2   Grapes       20    15
```

---

### **Why This is Useful**
- **Inventory Management**: Tracks stock levels and sales for different products.
- **Data Insights**: Enables calculations like remaining stock or identifying top sellers.
- **Scalability**: Can handle larger datasets with more products and sales data.

---

### **Next Steps**
You can expand this code to:
- Add more products or columns (e.g., prices, profit margins).
- Calculate new metrics like "Remaining Stock" (`InStock - Sold`).
- Sort products by sales performance.
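A minimal sketch of the last two ideas, assuming `InStock - Sold` is the intended definition of remaining stock; the column name `Remaining` is hypothetical:

```python
import pandas as pd

inventory = pd.DataFrame({
    'Product': ['Apples', 'Oranges', 'Grapes'],
    'InStock': [30, 50, 20],
    'Sold': [10, 20, 15]
})

# New column computed element-wise from two existing columns
inventory['Remaining'] = inventory['InStock'] - inventory['Sold']

# Best sellers first
by_sales = inventory.sort_values('Sold', ascending=False)
print(by_sales)
```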

---

**Pandas** simplifies complex data management tasks, making it an invaluable tool for businesses like grocery stores! 🍎🍊🍇


## Peek at the Inventory's End

Now that you've got the hang of viewing data with `head()`, let's try the opposite: use `tail()` to get a glimpse of the last few rows of our grocery store inventory DataFrame. Modify the code accordingly.

```python
import pandas as pd

# Define a dictionary representing a small section of a grocery store inventory
inventory = {
    'Product': ['Apples', 'Bananas', 'Carrots', 'Dates', 'Eggplants', 'Figs'],
    'InStock': [50, 40, 70, 20, 35, 60],
    'Sold': [20, 30, 25, 10, 15, 40]
}

# Load the inventory into a DataFrame
df_inventory = pd.DataFrame(inventory)

# Display the first few rows of the DataFrame
print(df_inventory.head())
```

To view the **last few rows** of the DataFrame, replace the call to `head()` with `tail()`. Here's the updated code:

```python
import pandas as pd

# Define a dictionary representing a small section of a grocery store inventory
inventory = {
    'Product': ['Apples', 'Bananas', 'Carrots', 'Dates', 'Eggplants', 'Figs'],
    'InStock': [50, 40, 70, 20, 35, 60],
    'Sold': [20, 30, 25, 10, 15, 40]
}

# Load the inventory into a DataFrame
df_inventory = pd.DataFrame(inventory)

# Display the last few rows of the DataFrame
print(df_inventory.tail())
```

### **What Does `tail()` Do?**
The `tail()` method returns the **last 5 rows** of the DataFrame (or all rows if there are fewer than 5), and `print()` displays them. This is handy for checking the final entries in a dataset, such as the most recent records when rows are stored in chronological order.

---

### **Expected Output**

When you run the modified code, you'll see:
```
     Product  InStock  Sold
1    Bananas       40    30
2    Carrots       70    25
3      Dates       20    10
4  Eggplants       35    15
5       Figs       60    40
```

### **Next Steps**
- Experiment with **custom row counts** by passing an argument to `tail(n)` (e.g., `tail(3)` for the last 3 rows).
- Combine `head()` and `tail()` to quickly understand the beginning and end of your data.
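A sketch of both suggestions on the six-row inventory above; stacking the two ends with `pd.concat` is one possible approach, not the only one:

```python
import pandas as pd

df_inventory = pd.DataFrame({
    'Product': ['Apples', 'Bananas', 'Carrots', 'Dates', 'Eggplants', 'Figs'],
    'InStock': [50, 40, 70, 20, 35, 60],
    'Sold': [20, 30, 25, 10, 15, 40]
})

# Custom row count: only the last 3 rows
last_three = df_inventory.tail(3)

# First two and last two rows in one view; original labels 0, 1, 4, 5 are kept
ends = pd.concat([df_inventory.head(2), df_inventory.tail(2)])

print(last_three)
print(ends)
```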

## Grocery Store Inventory DataFrame

Fantastic work, Space Explorer! Let's dive deeper now. Fill in the blanks to display the grocery store inventory properly. Use the name "Fruit" for the column with the fruit's name, and the name "Stock" for the column with the fruit's amount left in stock.

```python
import pandas as pd

# Creating grocery store inventory from a list
inventory_list = [['apple', 30], ['banana', 45], ['orange', 25]]
df_inventory = pd.DataFrame(inventory_list, columns=[___, ___])  # TODO: Fill in the column names
print(df_inventory.head())
```

Here's the completed code to display the grocery store inventory with proper column names:

```python
import pandas as pd

# Creating grocery store inventory from a list
inventory_list = [['apple', 30], ['banana', 45], ['orange', 25]]
df_inventory = pd.DataFrame(inventory_list, columns=['Fruit', 'Stock'])  # Column names added
print(df_inventory.head())
```

### **Explanation**
- The column names `'Fruit'` and `'Stock'` are provided in the `columns` parameter of `pd.DataFrame()`:
  - `'Fruit'`: Represents the name of the fruit.
  - `'Stock'`: Represents the amount of each fruit left in stock.

---

### **Expected Output**
When you run this code, the DataFrame will display as:

```
    Fruit  Stock
0   apple     30
1  banana     45
2  orange     25
```

---

### **Next Steps**
- Add additional data to the inventory list (e.g., `['grape', 50]`).
- Experiment with other pandas methods like `tail()` or `describe()` to explore this DataFrame further. 🚀
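A minimal sketch of the first suggestion. The `['grape', 50]` row comes from the text above; the `['kiwi', 12]` row is made-up sample data used to show a second way of adding a row, with `pd.concat`:

```python
import pandas as pd

inventory_list = [['apple', 30], ['banana', 45], ['orange', 25]]

# Extend the source list, then rebuild the DataFrame from it
inventory_list.append(['grape', 50])
df_inventory = pd.DataFrame(inventory_list, columns=['Fruit', 'Stock'])

# Or append to an existing DataFrame: build a one-row frame and concatenate
more = pd.DataFrame([['kiwi', 12]], columns=['Fruit', 'Stock'])
df_inventory = pd.concat([df_inventory, more], ignore_index=True)

print(df_inventory.tail(2))
```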

## Summon the Data: Creating a DataFrame in Pandas

Great work so far, Stellar Navigator! Now, let's create a space for our grocery data to reside. Your task is to fill in the missing pieces of code to complete the creation of our DataFrame. You can include any data you want; the only requirement is that your DataFrame has three columns: `Item` for the item name, `In_Stock` for the amount in stock, and `Sold` for the amount sold.

Navigate wisely!

```python
import pandas as pd

# TODO: Create the dataframe. Include item name, amount in stock and amount sold

print(inventory.head())
```

Here's the completed code to create the DataFrame with the required columns (`Item`, `In_Stock`, and `Sold`) and some sample data:

```python
import pandas as pd

# Creating the DataFrame with required columns
inventory = pd.DataFrame({
    'Item': ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Pineapples'],
    'In_Stock': [50, 40, 30, 20, 10],
    'Sold': [20, 25, 15, 10, 5]
})

# Display the first few rows of the DataFrame
print(inventory.head())
```

---

### **Explanation**
1. **`pd.DataFrame()`**: Creates a DataFrame using a dictionary.
2. **Keys of the dictionary**: Specify column names:
   - `'Item'`: Names of the items.
   - `'In_Stock'`: Amount currently in stock.
   - `'Sold'`: Amount sold.
3. **Values of the dictionary**: Lists that populate the respective columns.

---

### **Expected Output**
When you run this code, you'll see:

```
         Item  In_Stock  Sold
0      Apples        50    20
1     Bananas        40    25
2     Oranges        30    15
3      Grapes        20    10
4  Pineapples        10     5
```

---

### **Next Steps**
- Modify the data to reflect your specific grocery items and their stock details.
- Use additional pandas methods like `info()` or `describe()` to analyze this data further. 🌟
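For example, `describe()` summarizes the numeric columns. A minimal sketch using the inventory data above (`stats` is an illustrative name):

```python
import pandas as pd

inventory = pd.DataFrame({
    'Item': ['Apples', 'Bananas', 'Oranges', 'Grapes', 'Pineapples'],
    'In_Stock': [50, 40, 30, 20, 10],
    'Sold': [20, 25, 15, 10, 5]
})

# count, mean, std, min, quartiles and max for each numeric column;
# the text column 'Item' is skipped by default
stats = inventory.describe()
print(stats)

print(stats.loc['mean', 'In_Stock'])  # 30.0
print(stats.loc['max', 'Sold'])       # 25.0
```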

## Complete the Grocery Store DataFrame

Great job so far, Space Voyager!

Now it's time to add some missing pieces to complete the code. Can you load the grocery store data into a DataFrame and print an overview of the DataFrame?

```python
import pandas as pd

# Grocery store data
data = {'Item': ['Milk', 'Bread', 'Eggs'], 'Price': [1.50, 1.00, 2.50]}
# TODO: Create a DataFrame using the data dictionary
df = ___

# TODO: Print the overview of the DataFrame
df_info = ___
```

Here's the completed code to load the grocery store data into a DataFrame and print an overview using the `info()` method:

```python
import pandas as pd

# Grocery store data
data = {'Item': ['Milk', 'Bread', 'Eggs'], 'Price': [1.50, 1.00, 2.50]}

# Create a DataFrame using the data dictionary
df = pd.DataFrame(data)

# Print the overview of the DataFrame.
# info() prints the summary itself and returns None, so df_info ends up as None.
df_info = df.info()
```

---

### **Explanation**
1. **`pd.DataFrame(data)`**: Converts the `data` dictionary into a pandas DataFrame.
2. **`df.info()`**:
   - Prints a concise summary of the DataFrame and returns `None`.
   - Includes the index range, column names, non-null counts, data types of each column, and memory usage.

---

### **Expected Output**
When you run this code, the `info()` method will produce the following output:

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Item    3 non-null      object 
 1   Price   3 non-null      float64
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
```

---

### **Next Steps**
- Experiment by adding more items to the `data` dictionary.
- Use other pandas methods like `head()` or `describe()` to explore the DataFrame further. 🚀

## Grocery Store Inventory Management

Great work, Space Explorer! Let's step it up! Complete the missing pieces to view the grocery store inventory and sales.

Did you know that you can concatenate not only along the rows but also along the columns? In the following example, the DataFrames don't have the same columns, but they share the same row index (0 and 1), so they can be placed side by side. All you need to do is pass the additional parameter `axis=1` to `pd.concat`.

May the stars guide your code!

```python
import pandas as pd

# Creating inventory DataFrame
inventory = {'Item': ['apple', 'banana'], 'Quantity': [30, 50]}
df_inventory = pd.DataFrame(inventory)

# Creating sales DataFrame
sales = {'Item': ['apple', 'banana'], 'Sold': [5, 10]}
df_sales = pd.DataFrame(sales)

# Concatenating DataFrames
# TODO: Concatenate df_inventory and df_sales along the columns (axis=1)

# View the first few rows of the combined DataFrame
# TODO: Use the head() method to view the first few rows
```


Here’s the completed code to concatenate the inventory and sales DataFrames along the columns:

```python
import pandas as pd

# Creating inventory DataFrame
inventory = {'Item': ['apple', 'banana'], 'Quantity': [30, 50]}
df_inventory = pd.DataFrame(inventory)

# Creating sales DataFrame
sales = {'Item': ['apple', 'banana'], 'Sold': [5, 10]}
df_sales = pd.DataFrame(sales)

# Concatenating DataFrames along the columns (axis=1)
df_combined = pd.concat([df_inventory, df_sales], axis=1)

# View the first few rows of the combined DataFrame
print(df_combined.head())
```

---

### **Explanation**
1. **`pd.concat([df_inventory, df_sales], axis=1)`**:
   - Concatenates `df_inventory` and `df_sales` along the columns.
   - The `axis=1` parameter places the DataFrames side by side. Rows are matched by index label (here 0 and 1), not by the values in `Item`.
2. **`head()`**:
   - Displays the first few rows of the concatenated DataFrame to verify the result.
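Because `axis=1` matches rows by index label, frames whose indexes differ are not simply glued together row for row. A sketch with a hypothetical `df_sales_shifted` whose index is `[1, 2]` instead of `[0, 1]`:

```python
import pandas as pd

df_inventory = pd.DataFrame({'Item': ['apple', 'banana'], 'Quantity': [30, 50]})

# Hypothetical sales frame whose index labels are 1 and 2
df_sales_shifted = pd.DataFrame({'Sold': [5, 10]}, index=[1, 2])

# The result covers the union of labels (0, 1, 2); unmatched cells are NaN
side_by_side = pd.concat([df_inventory, df_sales_shifted], axis=1)
print(side_by_side)
```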

---

### **Expected Output**
When you run the code, the combined DataFrame will look like this:

```
     Item  Quantity    Item  Sold
0   apple        30   apple     5
1  banana        50  banana    10
```

---

### **Next Steps**
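- Handle the duplicate `Item` column if needed. Note that `df_combined.drop(columns=['Item'])` removes *both* `Item` columns, because they share the same label. Instead, drop `Item` from `df_sales` before concatenating, or keep only the first copy with `df_combined.loc[:, ~df_combined.columns.duplicated()]`.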
- Try concatenating along the rows (`axis=0`) to see how the data structure changes! 🚀
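Because both DataFrames have an `Item` column, dropping `Item` by label after concatenating removes both copies. A minimal sketch of two ways to keep a single `Item` column, plus the `axis=0` variant for comparison (variable names are illustrative):

```python
import pandas as pd

df_inventory = pd.DataFrame({'Item': ['apple', 'banana'], 'Quantity': [30, 50]})
df_sales = pd.DataFrame({'Item': ['apple', 'banana'], 'Sold': [5, 10]})

# Option 1: drop the repeated column from the second frame before concatenating
combined = pd.concat([df_inventory, df_sales.drop(columns='Item')], axis=1)

# Option 2: concatenate, then keep only the first occurrence of each column label
both = pd.concat([df_inventory, df_sales], axis=1)
deduped = both.loc[:, ~both.columns.duplicated()]

# axis=0 stacks rows instead: columns are unioned and missing cells become NaN
stacked = pd.concat([df_inventory, df_sales], axis=0, ignore_index=True)

print(combined)
print(stacked)
```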