# Pandas Objects
Pandas is one of the most widely used libraries in Python for data analysis. It provides powerful data structures like Series and DataFrames that make it easier to manipulate and analyze data. 

To see how these objects work together, imagine we are collecting information about several people: their height, age, and location.

In this tutorial, we’ll use three key objects from Pandas:

- Pandas Series: A one-dimensional array-like object with labeled indices.
- Pandas DataFrame: A two-dimensional table (like a spreadsheet or SQL table) where data is aligned into rows and columns, and both rows and columns have labels (indices).
- Pandas Index: A specialized object used for managing row and column labels efficiently, with support for operations like unions and intersections.
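
As a rough illustration of how the three objects relate, the sketch below builds a tiny DataFrame and inspects its parts. The `people` name and its values are made up for this example and are not part of the tutorial's data.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
people = pd.DataFrame(
    {"Height": [5.5, 6.0], "Age": [25, 30]},
    index=["Alice", "Bob"],
)

# Each column of a DataFrame is a Series
height_type = type(people["Height"]).__name__
print(height_type)

# Row labels and column labels are both stored as Index objects
rows_are_index = isinstance(people.index, pd.Index)
cols_are_index = isinstance(people.columns, pd.Index)
print(rows_are_index, cols_are_index)

# A column Series shares the DataFrame's row Index
shares_index = people["Height"].index.equals(people.index)
print(shares_index)
```

In other words, a DataFrame can be thought of as a collection of Series that share one row Index.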

**Why learn about these pandas objects?** Knowing their properties and behaviors helps you work more effectively, much like knowing how to operate a drill or a saw before starting a project.

## Step 1: Creating a Pandas Series
We begin with Series, which is a simple list-like object. For example, let’s say we are collecting the heights of four people. Each person's height will be stored along with an index (label) for easy access. 

In [1]:
# Import Pandas
import pandas as pd

# Create a Series to store heights of people.
# Signature: pd.Series(data, index), so the data comes first, then the index labels.
heights = pd.Series([5.5, 6.0, 5.8, 6.2], index=['Alice', 'Bob', 'Charlie', 'David'])

# Display the Series
heights

Alice      5.5
Bob        6.0
Charlie    5.8
David      6.2
dtype: float64

In [2]:
# Accessing specific data by index
heights['Alice']

np.float64(5.5)
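
Beyond a single lookup, a Series also supports positional access, label slices, and boolean filtering. The following is a minimal sketch using the same heights; the 5.9 threshold is an arbitrary value chosen for illustration.

```python
import pandas as pd

heights = pd.Series([5.5, 6.0, 5.8, 6.2], index=["Alice", "Bob", "Charlie", "David"])

# Label-based and position-based access to the same element
by_label = heights.loc["Bob"]
by_position = heights.iloc[1]

# Label slices include BOTH endpoints, unlike ordinary Python slices
middle = heights.loc["Bob":"Charlie"]

# Boolean mask: keep only people taller than 5.9 (arbitrary threshold)
tall = heights[heights > 5.9]

print(by_label, by_position)   # 6.0 6.0
print(list(middle.index))      # ['Bob', 'Charlie']
print(list(tall.index))        # ['Bob', 'David']
```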

## Step 2: Creating and Working with a DataFrame
Now, let’s take multiple Series (heights and ages) that share the same index and combine them into a DataFrame, where each Series becomes a column.

In [3]:
# Create another Series (ages)
ages = pd.Series([25, 30, 28, 35], index=['Alice', 'Bob', 'Charlie', 'David'])

# Combine Series into a DataFrame
data = pd.DataFrame({'Height': heights, 'Age': ages})

# Display the DataFrame
data

         Height  Age
Alice       5.5   25
Bob         6.0   30
Charlie     5.8   28
David       6.2   35
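
When Series are combined this way, pandas lines up values by index label rather than by position. The sketch below assumes two deliberately mismatched Series (the `Eve` entry and the reordered ages are hypothetical) to show what that alignment looks like.

```python
import pandas as pd

heights = pd.Series([5.5, 6.0, 5.8], index=["Alice", "Bob", "Charlie"])

# Hypothetical ages: different order, no entry for Charlie, extra entry for Eve
ages = pd.Series([30, 25, 41], index=["Bob", "Alice", "Eve"])

# Values are matched by label; labels missing from one Series become NaN
combined = pd.DataFrame({"Height": heights, "Age": ages})
print(combined)

# Count the missing values in each column
missing = combined.isna().sum()
print(missing)
```

Because NaN is a floating-point value, the `Age` column is stored as `float64` here even though the input ages were integers.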


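## Step 3: Accessing Data Using the Index
Use the Index to refer to rows in your DataFrame and Series.
- Rows are identified by their index labels (here, the people's names).
- Columns are identified by their names, which are also labels (here, `Height` and `Age`).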

Tips for Efficient Work:
- Use .loc[] for label-based indexing: It's great for when you want to work with specific row/column labels.
- Use .iloc[] for integer-based indexing: When you're accessing data by its position.
- Set operations: Experiment with Index operations like union, intersection, and difference for comparing data labels.

In [15]:
data.loc["Bob"]

Height     6.0
Age       30.0
Name: Bob, dtype: float64

In [16]:
data.iloc[1]

Height     6.0
Age       30.0
Name: Bob, dtype: float64

In [8]:
# Accessing a row by label (index)
print(f"About Alice: {data.loc['Alice']}")  # Access by row label

# Accessing a column by name (its label)
print(f"All heights: {data['Height']}")


About Alice: Height     5.5
Age       25.0
Name: Alice, dtype: float64
All heights: Alice      5.5
Bob        6.0
Charlie    5.8
David      6.2
Name: Height, dtype: float64
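
Building on the lookups above, `.loc[]` and `.iloc[]` can also take a row and a column together, and they treat slices differently. This is a minimal sketch that rebuilds the same `data` table so it can run on its own.

```python
import pandas as pd

data = pd.DataFrame(
    {"Height": [5.5, 6.0, 5.8, 6.2], "Age": [25, 30, 28, 35]},
    index=["Alice", "Bob", "Charlie", "David"],
)

# Single cell: row label first, then column label
bob_age = data.loc["Bob", "Age"]

# Same cell by position: row 1, column 1
bob_age_pos = data.iloc[1, 1]

# .loc slices include the end label; .iloc slices exclude the end position
by_label = data.loc["Alice":"Charlie", "Height"]   # 3 rows
by_position = data.iloc[0:2, 0]                    # 2 rows

# A single row mixes both columns into one Series, so Age is upcast to float
row = data.loc["Bob"]

print(bob_age, bob_age_pos)             # 30 30
print(len(by_label), len(by_position))  # 3 2
print(row.dtype)                        # float64
```

The upcast in the last step explains why `Age` appears as `30.0` in the row outputs shown earlier, even though the column itself holds integers.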


In [14]:
import pandas as pd

# Sales data index (customer IDs)
sales_index = pd.Index([1, 2, 3, 4, 5])

# Feedback data index (customer IDs)
feedback_index = pd.Index([4, 5, 6, 7, 8])

# Find customers who both bought and gave feedback (intersection)
customers_with_feedback = sales_index.intersection(feedback_index)  # Common customers

# Find all customers who either bought something or left feedback (union)
all_customers = sales_index.union(feedback_index)  # All unique customers

# Find customers who appear in only one index: bought without leaving feedback, or vice versa (symmetric difference)
customers_difference = sales_index.symmetric_difference(feedback_index)  # Customers in exactly one index

# Print results
print("Customers with both sales and feedback:", customers_with_feedback)
print("All unique customers:", all_customers)
print(f"Customers in only one group (symmetric difference): {customers_difference}")


Customers with both sales and feedback: Index([4, 5], dtype='int64')
All unique customers: Index([1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')
Customers in only one group (symmetric difference): Index([1, 2, 3, 6, 7, 8], dtype='int64')
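
The cell above uses `symmetric_difference`, which keeps labels found in exactly one of the two indexes. A one-sided `difference` is also available, and the result of any set operation can be passed to `.loc[]` to select rows. The sketch below assumes a hypothetical `sales` table with an `amount` column to show both ideas.

```python
import pandas as pd

sales_index = pd.Index([1, 2, 3, 4, 5])
feedback_index = pd.Index([4, 5, 6, 7, 8])

# One-sided difference: the order of the operands matters
bought_no_feedback = sales_index.difference(feedback_index)
feedback_no_purchase = feedback_index.difference(sales_index)

# Hypothetical sales table keyed by customer ID (amounts are made up)
sales = pd.DataFrame({"amount": [10, 20, 30, 40, 50]}, index=sales_index)

# Use the intersection to select only customers who also left feedback
with_feedback = sales.loc[sales_index.intersection(feedback_index)]

print(list(bought_no_feedback))    # [1, 2, 3]
print(list(feedback_no_purchase))  # [6, 7, 8]
print(with_feedback)
```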
