# Introduction to Pandas and Series

**Pandas** is an open-source Python library built directly on top of NumPy. While NumPy provides high-performance multidimensional arrays, Pandas provides high-performance, easy-to-use data structures specifically designed for **tabular, heterogeneous data** (like SQL tables or Excel spreadsheets).

* **Primary Use Case:** Data cleaning, preparation, and rapid analysis.
* **Core Data Structures:**
    1. **Series:** A 1-dimensional labeled array.
    2. **DataFrame:** A 2-dimensional labeled data structure (think of it as a collection of Series sharing the same index).
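To make the "collection of Series sharing the same index" picture concrete, here is a minimal sketch. The names `col1`, `col2`, and `df` and the values are illustrative assumptions. It builds a DataFrame from a dict of two Series that share the same labels, then pulls one column back out as a Series.

```python
import pandas as pd

# Two Series sharing the same index labels (illustrative values)
col1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
col2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# A dict of Series becomes a DataFrame: each key is a column name,
# and the shared index becomes the row labels
df = pd.DataFrame({'col1': col1, 'col2': col2})
print(df)

# Selecting a single column hands back a Series with that same index
first = df['col1']
print(type(first).__name__)  # Series
print(first.index.equals(col1.index))  # True
```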

---

## 1. The Pandas Series

A **Series** is very similar to a NumPy array, with one crucial distinction: **it has an explicit, customizable index (axis labels).**

While a NumPy array is indexed only by implicit integer positions (`0` through `n - 1`), a Series can be labeled with strings, dates, or any other hashable type.
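As a rough illustration of that difference, the sketch below contrasts positional access on a NumPy array with label access on a Series whose index is a range of dates. The dates and values are arbitrary assumptions; `pd.date_range` is used here simply to build a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd

# A NumPy array: elements are reached only by integer position
arr = np.array([10, 20, 30])
print(arr[1])  # 20

# A Series labeled with dates (illustrative dates, daily frequency)
dates = pd.date_range('2024-01-01', periods=3, freq='D')
ser = pd.Series([10, 20, 30], index=dates)

# Look the value up by its date label instead of by position
print(ser[pd.Timestamp('2024-01-02')])  # 20

# Both the labels and the underlying array remain accessible
print(ser.index)
print(ser.to_numpy())  # [10 20 30]
```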

### Creating a Series

You can construct a Series from various Python objects using `pd.Series(data, index)`.

```python
import numpy as np
import pandas as pd

# Raw Data
my_data = [10, 20, 30]
labels = ['a', 'b', 'c']

# 1. From a Python List (Default Integer Index)
s1 = pd.Series(data=my_data)
# Output:
# 0    10
# 1    20
# 2    30

# 2. From a Python List (Custom Index)
s2 = pd.Series(data=my_data, index=labels)
# Output:
# a    10
# b    20
# c    30

# 3. From a NumPy Array
arr = np.array([10, 20, 30])
s3 = pd.Series(arr, labels)

# 4. From a Dictionary
# Keys automatically become the index!
d = {'a': 10, 'b': 20, 'c': 30}
s4 = pd.Series(d)

```
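A quick way to check that these constructors agree, and to see what happens when a dictionary is combined with an explicit `index`, is sketched below. The names `s5` and the label `'z'` are hypothetical additions for illustration. When both are given, Pandas picks dictionary values by key in the order of the supplied index, and any label with no matching key gets `NaN` (which also upcasts the Series to `float64`).

```python
import pandas as pd

my_data = [10, 20, 30]
labels = ['a', 'b', 'c']
d = {'a': 10, 'b': 20, 'c': 30}

# The list-with-labels and dict constructors produce the same Series
s2 = pd.Series(data=my_data, index=labels)
s4 = pd.Series(d)
print(s2.equals(s4))  # True

# With a dict plus an explicit index, values are looked up by key;
# 'z' has no key in d, so it becomes NaN
s5 = pd.Series(d, index=['c', 'a', 'z'])
print(s5)
# c    30.0
# a    10.0
# z     NaN
# dtype: float64
```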

### Data Type Flexibility

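NumPy arrays are normally homogeneous (one dtype for every element). A Pandas Series can be homogeneous too, but it can also hold mixed data types, or even references to built-in Python functions. Pandas stores such a Series with the generic `object` dtype, which gives up fast vectorized arithmetic, so this is rare in practical engineering.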
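The sketch below shows what that flexibility looks like in practice, assuming an arbitrary mix of an integer, a string, and the built-in `len` function. The mixed Series falls back to `object` dtype, while a purely numeric one gets a fixed-width integer dtype.

```python
import pandas as pd

# Mixed values: an int, a string, and a built-in function
mixed = pd.Series([1, 'two', len])
print(mixed.dtype)  # object

# Each element keeps its own Python type; the stored function is still callable
print(type(mixed[2]))     # <class 'builtin_function_or_method'>
print(mixed[2]('abc'))    # 3

# A purely numeric Series gets a fixed-width numeric dtype instead
nums = pd.Series([1, 2, 3])
print(nums.dtype)  # int64
```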

---

## 2. Accessing Data (Indexing)

Because a Series behaves largely like a Python dictionary combined with a NumPy array, you retrieve data using the **Index Label**.

```python
ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

# Retrieve a value using its string label
print(ser1['USA'])
# Output: 1

# With the default integer index (0, 1, 2, ...), the labels are integers,
# so you look values up with integers.
ser3 = pd.Series(['a', 'b', 'c'])
print(ser3[0])
# Output: a

```
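Plain `[]` lookups decide between label and position based on the index, which can be ambiguous. The explicit `.loc` (by label) and `.iloc` (by position) accessors avoid that, and a Series also supports some dictionary-style helpers. This is a minimal sketch reusing `ser1`; the missing label `'Canada'` and the default `0` are arbitrary choices for illustration.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

# .loc selects by index label
print(ser1.loc['Japan'])  # 4

# .iloc selects by integer position, whatever the labels are
print(ser1.iloc[0])  # 1

# Dictionary-style helpers: `in` checks the index labels, not the values
print('USA' in ser1)           # True
print(ser1.get('Canada', 0))   # 0 (label missing, so the default is returned)

# A list of labels returns a smaller Series
print(ser1.loc[['USA', 'Japan']].tolist())  # [1, 4]
```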

---

## 3. Operations & Data Alignment (Crucial Concept)

When you perform mathematical operations between two Series, Pandas automatically **aligns the data based on the Index Label**.

This is a major advantage over plain NumPy arrays, which combine elements purely by position. The result's index is the union of both indices: where a label exists in both Series, the operation is performed; where a label exists in only one, the result for that label is `NaN` (Not a Number), marking missing data.

### Engineering Example

```python
ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Add the two series together
result = ser1 + ser2

print(result)
# Output:
# Germany    4.0
# Italy      NaN  <-- Missing in ser1
# Japan      8.0
# USA        2.0
# USSR       NaN  <-- Missing in ser2
# dtype: float64

```
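When a missing label should count as zero instead of producing `NaN`, the method form of the operator accepts a `fill_value`. The sketch below assumes that treating a missing country as `0` is the desired behavior, which depends on the data. `Series.add` aligns exactly like `+`, but substitutes `fill_value` for a label missing on one side before adding (a label missing on both sides would still be `NaN`).

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Align on labels, treating a label missing from one side as 0
result = ser1.add(ser2, fill_value=0)
print(result)
# Germany    4.0
# Italy      5.0
# Japan      8.0
# USA        2.0
# USSR       3.0
# dtype: float64
```

Similar methods exist for the other arithmetic operators (`sub`, `mul`, `div`).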

### Important Notes on Operations:

1. **Upcasting to Float:** Notice that the integer inputs (`1`, `2`, etc.) were converted to floats (`4.0`, `2.0`). Alignment introduced `NaN` values, and `NaN` is a floating-point value that integer dtypes cannot represent, so Pandas converts the whole result to `float64`. If every label had matched, the result would have kept its integer dtype.
2. **Order Independence:** The order of the indices in `ser1` vs `ser2` does not matter; Pandas matches elements by label, not by position.

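Both notes can be checked directly. In this sketch the Series `a`, `b`, and `c` and their labels are hypothetical. `b` has the same labels as `a` in a different order, so every label matches and the integer dtype survives; `c` has one label that `a` lacks, which introduces `NaN` and forces `float64`.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
# Same labels as `a`, listed in a different order
b = pd.Series([30, 10, 20], index=['z', 'x', 'y'])

# Every label matches, so no NaN appears and the integer dtype is kept
aligned = a + b
print(aligned.dtype)  # int64
print(aligned['x'])   # 11 (1 + 10: matched by label, not by position)

# One unmatched label on each side introduces NaN, which forces float64
c = pd.Series([1, 2, 3], index=['x', 'y', 'w'])
mismatched = a + c
print(mismatched.dtype)  # float64
```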
---

## Next Steps

While Series are foundational, they are rarely used in isolation. In the next phase, we will combine multiple Series to create a **DataFrame**, the true workhorse of the Pandas library.