# Introduction to Data Structures


In [1]:
import numpy as np
import pandas as pd

In [2]:
import pandas as pd
pd.DataFrame({'A': [1, 2, 3]})
pd

<module 'pandas' from '/home/ultron/Desktop/Python/myenv/lib/python3.12/site-packages/pandas/__init__.py'>

# Series

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is to call:

s = pd.Series(data, index=index)
Here, data can be many different things:

- a Python dict

- an ndarray

- a scalar value (like 5)

The passed index is a list of axis labels. Thus, this separates into a few cases depending on what data is:

From ndarray

If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
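A minimal sketch of the ndarray case may help here. The data values and labels below are illustrative assumptions, not taken from the original notes:

```python
import numpy as np
import pandas as pd

values = np.array([0.25, 0.5, 0.75])  # illustrative data

# Explicit index: one label per element, so the lengths must match.
labeled = pd.Series(values, index=["a", "b", "c"])

# No index: pandas creates the default labels 0, 1, ..., len(data) - 1.
default = pd.Series(values)

print(labeled.index.tolist())  # ['a', 'b', 'c']
print(default.index.tolist())  # [0, 1, 2]

# A length mismatch between data and index raises ValueError.
try:
    pd.Series(values, index=["a", "b"])
except ValueError as exc:
    print("ValueError:", exc)
```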


# pandas.Series

class pandas.Series(data=None, index=None, dtype=None, name=None, copy=None, fastpath=<no_default>)

One-dimensional ndarray with axis labels (including time series).

Labels need not be unique but must be a hashable type. The object supports both integer- and label-based indexing and provides a host of methods for performing operations involving the index. Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).

Operations between Series (+, -, /, \*, \*\*) align values based on their associated index values; the Series need not be the same length. The result index will be the sorted union of the two indexes.
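The alignment rule can be sketched as follows. The two Series and their labels are made up for illustration; labels present in only one operand produce `NaN` unless a fill value is supplied through `Series.add`:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["b", "a", "c"])
s2 = pd.Series([10, 20], index=["a", "d"])

# Values are matched by label, not by position.
total = s1 + s2
print(total.index.tolist())  # ['a', 'b', 'c', 'd']
print(total["a"])            # 12.0 (2 from s1 plus 10 from s2)

# Labels found in only one Series give NaN ...
print(total.isna().sum())    # 3

# ... unless the missing side is treated as 0 with fill_value.
filled = s1.add(s2, fill_value=0)
print(filled.tolist())       # [12.0, 1.0, 3.0, 20.0]
```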

Parameters:

data:

array-like, Iterable, dict, or scalar value
Contains data stored in Series. If data is a dict, argument order is maintained.

index:

array-like or Index (1d)

Values must be hashable and have the same length as data. Non-unique index values are allowed. Will default to RangeIndex (0, 1, 2, …, n - 1) if not provided. If data is dict-like and index is None, then the keys in the data are used as the index. If the index is not None, the resulting Series is reindexed with the index values.
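As a small illustration of the dict case (the dictionary and labels are assumptions chosen for the example), passing an `index` alongside a dict selects and reorders by key, and labels with no matching key become `NaN`:

```python
import pandas as pd

d = {"a": 1, "b": 2, "c": 3}

# Without an index, the dict keys become the labels.
from_keys = pd.Series(d)
print(from_keys.index.tolist())  # ['a', 'b', 'c']

# With an index, the result is reindexed: "c" is dropped, "x" has no value.
reindexed = pd.Series(d, index=["b", "a", "x"])
print(reindexed.index.tolist())  # ['b', 'a', 'x']
print(reindexed.isna().tolist()) # [False, False, True]
```

Because `NaN` is a float, the reindexed Series is upcast to `float64` here.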

dtype:

str, numpy.dtype, or ExtensionDtype, optional
Data type for the output Series. If not specified, this will be inferred from data. See the user guide for more usages.

name:

Hashable, default None
The name to give to the Series.

copy:

bool, default False
Copy input data. Only affects Series or 1d ndarray input. See examples.


In [17]:
d = {'a': 1, 'b': 2, 'c': 3}
ser = pd.Series(data=d, index=['a', 'b', 'c'])
ser

a    1
b    2
c    3
dtype: int64

In [18]:
r = [1, 2]
ser = pd.Series(r, copy=False)
ser.iloc[0] = 999
ser

0    999
1      2
dtype: int64

In [19]:
r = np.array([1, 2])
ser = pd.Series(r, copy=False)
ser.iloc[0] = 999
r

array([999,   2])

Due to input data type the Series has a view on the original data, so the data is changed as well.


The **`pandas.Series`** is a one-dimensional labeled array that can hold data of any type (integers, floats, strings, Python objects, etc.). It is one of the core data structures in the Pandas library and is widely used for data manipulation and analysis in Python. Below is a detailed explanation of the **`pandas.Series`** class, including its parameters, attributes, methods, and examples.

---

### **1. Overview of `pandas.Series`**

A `Series` is similar to a column in a spreadsheet or a single column in a SQL table. It has:

- **Data**: The actual values stored in the Series.
- **Index**: Labels associated with each value in the Series.
- **dtype**: The data type of the values in the Series.

---

### **2. Parameters of `pandas.Series`**

The `pandas.Series` constructor has the following parameters:

```python
pandas.Series(data=None, index=None, dtype=None, name=None, copy=None, fastpath=<no_default>)
```

#### **Parameters:**

1. **`data`**:

   - Input data for the Series. Can be:
     - A list, tuple, or NumPy array.
     - A dictionary.
     - A scalar value (e.g., `5`).
   - If `data` is a dictionary, the keys are used as the index unless an `index` is explicitly provided.

2. **`index`**:

   - Index labels for the Series. Must be the same length as `data`.
   - If not provided, it defaults to `RangeIndex(0, 1, 2, ..., n-1)`.

3. **`dtype`**:

   - Data type for the Series (e.g., `int`, `float`, `str`).
   - If not specified, Pandas infers the data type from the input data.

4. **`name`**:

   - Name of the Series. Useful when combining multiple Series into a DataFrame.

5. **`copy`**:
   - If `True`, copies the input data. Default is `False`.

---

### **3. Creating a Series**

#### **3.1 From a List**

```python
import pandas as pd

# Create a Series from a list
data = [10, 20, 30, 40, 50]
s = pd.Series(data)
print(s)
```

**Output:**

```
0    10
1    20
2    30
3    40
4    50
dtype: int64
```

#### **3.2 From a Dictionary**

```python
# Create a Series from a dictionary
data = {'a': 10, 'b': 20, 'c': 30}
s = pd.Series(data)
print(s)
```

**Output:**

```
a    10
b    20
c    30
dtype: int64
```

#### **3.3 From a NumPy Array**

```python
import numpy as np

# Create a Series from a NumPy array
arr = np.array([1.5, 2.5, 3.5, 4.5])
s = pd.Series(arr)
print(s)
```

**Output:**

```
0    1.5
1    2.5
2    3.5
3    4.5
dtype: float64
```

#### **3.4 From a Scalar Value**

```python
# Create a Series from a scalar value
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
```

**Output:**

```
0    5
1    5
2    5
3    5
dtype: int64
```

---

### **4. Attributes of a Series**

#### **4.1 `index`**

- Returns the index of the Series.

```python
print(s.index)  # Output: Index([0, 1, 2, 3], dtype='int64')
```

#### **4.2 `values`**

- Returns the data in the Series as a NumPy array.

```python
print(s.values)  # Output: [5 5 5 5]
```

#### **4.3 `dtype`**

- Returns the data type of the Series.

```python
print(s.dtype)  # Output: int64
```

#### **4.4 `shape`**

- Returns the shape of the Series.

```python
print(s.shape)  # Output: (4,)
```

#### **4.5 `name`**

- Returns or sets the name of the Series.

```python
s.name = "My Series"
print(s.name)  # Output: My Series
```

---

### **5. Methods of a Series**

#### **5.1 `head()` and `tail()`**

- `head(n)` returns the first `n` elements.
- `tail(n)` returns the last `n` elements.

```python
print(s.head(2))  # Output: First 2 elements
print(s.tail(2))  # Output: Last 2 elements
```

#### **5.2 `describe()`**

- Provides summary statistics.

```python
print(s.describe())
```

**Output:**

```
count    4.0
mean     5.0
std      0.0
min      5.0
25%      5.0
50%      5.0
75%      5.0
max      5.0
dtype: float64
```

#### **5.3 `isnull()` and `notnull()`**

- Check for missing values.

```python
s = pd.Series([1, 2, None, 4])
print(s.isnull())  # Output: Boolean Series indicating missing values
print(s.notnull())  # Output: Boolean Series indicating non-missing values
```

#### **5.4 `dropna()`**

- Drops missing values.

```python
s_cleaned = s.dropna()
print(s_cleaned)
```

#### **5.5 `fillna()`**

- Fills missing values with a specified value.

```python
s_filled = s.fillna(0)
print(s_filled)
```

---

### **6. Operations on Series**

#### **6.1 Arithmetic Operations**

```python
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([10, 20, 30, 40])

# Addition
print(s1 + s2)

# Multiplication
print(s1 * s2)
```

#### **6.2 Logical Operations**

```python
# Filter elements greater than 2
print(s1[s1 > 2])
```
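Boolean masks can also be combined. The sketch below reuses the same `s1` values and relies on the element-wise operators `&`, `|`, and `~`; each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])

mask = s1 > 2                      # boolean Series: False, False, True, True

both = s1[(s1 > 1) & (s1 < 4)]     # element-wise AND
either = s1[(s1 < 2) | (s1 > 3)]   # element-wise OR
negated = s1[~mask]                # element-wise NOT

print(both.tolist())     # [2, 3]
print(either.tolist())   # [1, 4]
print(negated.tolist())  # [1, 2]
```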

---

### **7. Advanced Topics**

#### **7.1 Reindexing**

Reindexing allows you to change the index of a Series.

```python
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
s_reindexed = s.reindex(['c', 'b', 'a', 'd'])
print(s_reindexed)
```

#### **7.2 Sorting**

- Sort by index or values.

```python
# Sort by index
print(s.sort_index())

# Sort by values
print(s.sort_values())
```

#### **7.3 Applying Functions**

You can apply a function to each element in the Series.

```python
s = pd.Series([1, 2, 3, 4])
print(s.apply(lambda x: x**2))
```

#### **7.4 Vectorized Operations**

Pandas Series supports vectorized operations with NumPy functions.

```python
import numpy as np

s = pd.Series([1, 2, 3, 4])
print(np.sqrt(s))  # Square root of each element
```

---

### **8. Practical Applications**

- **Data Cleaning**: Handle missing values, filter data, etc.
- **Data Analysis**: Perform statistical operations.
- **Feature Engineering**: Create new features from existing data.

---

### **9. Summary**

- A `pandas.Series` is a one-dimensional labeled array.
- It can be created from lists, NumPy arrays, dictionaries, or scalar values.
- You can access elements using labels or positions.
- Series supports various operations, including arithmetic, logical, and vectorized operations.
- Advanced features include reindexing, sorting, and applying functions.
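Label-based and position-based access, mentioned in the summary, can be sketched with `.loc` and `.iloc`. The Series below is an illustrative assumption:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# .loc selects by label, .iloc selects by integer position.
print(s.loc["b"])   # 20
print(s.iloc[0])    # 10

# Label slices include the end label; positional slices exclude the end.
by_label = s.loc["a":"b"]
by_position = s.iloc[0:1]
print(by_label.tolist())     # [10, 20]
print(by_position.tolist())  # [10]
```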

By mastering `pandas.Series`, you'll have a strong foundation for working with tabular data in Python.


The **transpose** of a matrix is an operation that flips the matrix over its diagonal, swapping its rows and columns. If you have a matrix \( A \), its **transpose**, denoted as \( A^T \), is obtained by switching the element at position \( (i, j) \) with the element at \( (j, i) \).

## **Definition**

If a matrix \( A \) is given by:

\[
A =
\begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
\]

Then its transpose \( A^T \) is:

\[
A^T =
\begin{bmatrix}
a_{11} & a_{21} & \dots & a_{m1} \\
a_{12} & a_{22} & \dots & a_{m2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \dots & a_{mn}
\end{bmatrix}
\]

This means the rows of \( A \) become the columns of \( A^T \) and vice versa.

## **Properties of the Transpose**

1. **Double Transpose Property**:

   (A^T)^T = A

   Taking the transpose twice returns the original matrix.

2. **Addition Property**:

   (A + B)^T = A^T + B^T

   The transpose of a sum is the sum of the transposes.

3. **Scalar Multiplication**:

   (cA)^T = cA^T

   The transpose of a scalar multiple is the scalar times the transpose.

4. **Multiplication Property**:

   (AB)^T = B^T A^T

   The transpose of a product reverses the order of multiplication.

5. **Inverse Property (for invertible matrices)**:

   (A^{-1})^T = (A^T)^{-1}

   The transpose of the inverse is the inverse of the transpose.

6. **Symmetric Matrices**:  
   A matrix is **symmetric** if:

   A^T = A

   That means it is equal to its transpose.

7. **Orthogonal Matrices**:  
   A matrix \( Q \) is **orthogonal** if:

   Q^T Q = Q Q^T = I

   Its transpose is also its inverse.

## **Special Cases**

- **Square Matrices**: For a square matrix (same number of rows and columns), the transpose maintains its square shape.
- **Column and Row Vectors**:
  - A **column vector** (single column) becomes a **row vector** when transposed.
  - A **row vector** (single row) becomes a **column vector** when transposed.

## **Example**

Consider the matrix:

\[
A =
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix}
\]

Its transpose is:

\[
A^T =
\begin{bmatrix}
1 & 4 \\
2 & 5 \\
3 & 6
\end{bmatrix}
\]
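The example can be reproduced with NumPy, which exposes the transpose as the `.T` attribute. This is a sketch, not part of the original notes; the second matrix `B` is a made-up value used only to check the product rule:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(A.T.shape)     # (3, 2)
print(A.T.tolist())  # [[1, 4], [2, 5], [3, 6]]

# Double transpose returns the original matrix.
print(np.array_equal(A.T.T, A))  # True

# Product rule (AB)^T = B^T A^T, checked with an illustrative 3x2 matrix B.
B = np.array([[1, 0],
              [2, 1],
              [0, 3]])
print(np.array_equal((A @ B).T, B.T @ A.T))  # True
```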


---

# pandas.Series.index

Series.index

The index (axis labels) of the Series.

The index of a Series is used to label and identify each element of the underlying data. The index can be thought of as an immutable ordered set (technically a multi-set, as it may contain duplicate labels), and is used to index and align data in pandas.

Returns:
Index
The index labels of the Series.
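The "immutable multi-set" description can be made concrete with a short sketch. The Series below, with a deliberately duplicated label, is an illustrative assumption:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["x", "y", "x"])  # label "x" appears twice

print(s.index.is_unique)  # False

# A duplicated label selects every matching row; a unique label gives a scalar.
print(s.loc["x"].tolist())  # [1, 3]
print(s.loc["y"])           # 2

# Individual labels cannot be modified in place.
try:
    s.index[0] = "z"
except TypeError as exc:
    print("TypeError:", exc)
```

To change labels, assign a whole new index (as in the cells that follow) rather than editing one element.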


In [20]:
cities = ['Kolkata', 'Chicago', 'Toronto', 'Lisbon']
populations = [14.85, 2.71, 2.93, 0.51]
city_series = pd.Series(populations, index=cities)
city_series.index

Index(['Kolkata', 'Chicago', 'Toronto', 'Lisbon'], dtype='object')

In [21]:
city_series.index = ['KOL', 'CHI', 'TOR', 'LIS']
city_series.index

Index(['KOL', 'CHI', 'TOR', 'LIS'], dtype='object')

### pandas.Series.index

The `Series.index` attribute in **pandas** is used to access and manipulate the **index labels** of a Series. It is an **immutable ordered set** (technically a multi-set, as duplicate labels are allowed). The index helps **identify and align data** in pandas.

---

## **1. Basic Usage: Accessing Index**

```python
import pandas as pd

cities = ['Kolkata', 'Chicago', 'Toronto', 'Lisbon']
populations = [14.85, 2.71, 2.93, 0.51]

city_series = pd.Series(populations, index=cities)
print(city_series.index)
```

**Output:**

```
Index(['Kolkata', 'Chicago', 'Toronto', 'Lisbon'], dtype='object')
```

- This returns the **index labels** of the Series.

---

## **2. Changing the Index**

You can change the index labels of an existing Series.

```python
city_series.index = ['KOL', 'CHI', 'TOR', 'LIS']
print(city_series.index)
```

**Output:**

```
Index(['KOL', 'CHI', 'TOR', 'LIS'], dtype='object')
```

- Directly replaces the existing index.

---

## **3. Getting Index Values**

You can get the index values as a **NumPy array**.

```python
print(city_series.index.values)
```

**Output:**

```
['KOL' 'CHI' 'TOR' 'LIS']
```

---

## **4. Checking Index Data Type**

```python
print(city_series.index.dtype)
```

**Output:**

```
object
```

- Shows the **data type** of the index elements.

---

## **5. Checking if an Index Contains a Value**

```python
print('KOL' in city_series.index)  # True
print('NYC' in city_series.index)  # False
```

---

## **6. Resetting the Index (Convert Index to Column)**

```python
city_reset = city_series.reset_index()
print(city_reset)
```

**Output:**

```
  index      0
0   KOL  14.85
1   CHI   2.71
2   TOR   2.93
3   LIS   0.51
```

- Converts the index into a column.

---

## **7. Setting a Column as Index**

```python
city_new = city_reset.set_index("index")
print(city_new)
```

**Output:**

```
           0
index
KOL    14.85
CHI     2.71
TOR     2.93
LIS     0.51
```

- Uses the `"index"` column as the new index.

---

## **8. Renaming the Index**

```python
city_series.index.name = "City Code"
print(city_series)
```

**Output:**

```
City Code
KOL    14.85
CHI     2.71
TOR     2.93
LIS     0.51
dtype: float64
```

---

## **9. Sorting the Index**

```python
sorted_series = city_series.sort_index()
print(sorted_series)
```

**Output:**

```
City Code
CHI     2.71
KOL    14.85
LIS     0.51
TOR     2.93
dtype: float64
```

- Sorts the Series **by index labels**.

---

## **10. Reindexing (Changing Index & Adding Missing Values)**

```python
new_index = ["KOL", "CHI", "TOR", "LIS", "NYC"]
reindexed_series = city_series.reindex(new_index, fill_value=0)
print(reindexed_series)
```

**Output:**

```
City Code
KOL    14.85
CHI     2.71
TOR     2.93
LIS     0.51
NYC     0.00
dtype: float64
```

- Adds missing index labels with a **default value**.

---

## **11. Getting Index Position**

```python
print(city_series.index.get_loc("TOR"))
```

**Output:**

```
2
```

- Returns the **position** of the label `"TOR"` in the index.

---

## **12. Converting Index to List**

```python
print(city_series.index.tolist())
```

**Output:**

```
['KOL', 'CHI', 'TOR', 'LIS']
```

- Converts index to a **Python list**.

---

## **13. Checking If Index Is Unique**

```python
print(city_series.index.is_unique)
```

**Output:**

```
True
```

- Returns `True` if **all index labels are unique**.

---

## **14. Checking Index Length**

```python
print(len(city_series.index))
```

**Output:**

```
4
```

- Returns the **number of index elements**.

---

## **15. Getting Index Item by Position**

```python
print(city_series.index[1])  # 'CHI'
```

- Access **specific index labels** by position.

---

## **16. Getting a Slice of Index**

```python
print(city_series.index[:2])
```

**Output:**

```
Index(['KOL', 'CHI'], dtype='object')
```

- Retrieves a **subset** of the index.

---

## **17. Finding Index of a Specific Label**

```python
print(city_series.index.get_indexer(["TOR", "NYC"]))
```

**Output:**

```
[ 2 -1]
```

- Returns the **position** of labels (`-1` means not found).

---

## **18. Changing Index Type (e.g., Numeric)**

```python
city_series.index = [101, 102, 103, 104]
print(city_series)
```

**Output:**

```
101    14.85
102     2.71
103     2.93
104     0.51
dtype: float64
```

- Replaces **string index** with numbers.

---

## **19. Checking Index Memory Usage**

```python
print(city_series.index.memory_usage())
```

- Shows the **memory usage** of the index.

---

### **Summary Table**

| Syntax                     | Description                          |
| -------------------------- | ------------------------------------ |
| `s.index`                  | Get the index of a Series            |
| `s.index = [...]`          | Set a new index                      |
| `s.index.values`           | Get index values as NumPy array      |
| `s.index.dtype`            | Get index data type                  |
| `"x" in s.index`           | Check if a value exists in the index |
| `s.reset_index()`          | Reset index to default numeric       |
| `df.set_index("col")`      | Set a DataFrame column as the index  |
| `s.index.name = "NewName"` | Name the index                       |
| `s.sort_index()`           | Sort the Series by index             |
| `s.reindex([...])`         | Change index, add missing values     |
| `s.index.get_loc("x")`     | Get position of an index label       |
| `s.index.tolist()`         | Convert index to list                |
| `s.index.is_unique`        | Check if index is unique             |
| `s.index.memory_usage()`   | Get memory usage of index            |


# pandas.Series.array

The **`.array`** property in **pandas.Series** returns the **ExtensionArray** that holds the underlying data. It is different from `.values`, which may convert the data into a **NumPy array**.

---

## **1. Basic Usage: Accessing `.array`**

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.array)
```

**Output:**

```
<NumpyExtensionArray>
[1, 2, 3]
Length: 3, dtype: int64
```

- This returns a **pandas NumpyExtensionArray**, which wraps around a NumPy array.

---

## **2. Difference Between `.array` and `.values`**

```python
print(s.array)   # Returns an ExtensionArray
print(s.values)  # Returns a NumPy array
```

- `.array` returns an **ExtensionArray**.
- `.values` returns a **NumPy array** (or an object array if mixed types).

---

## **3. Working with Extension Types**

For pandas **ExtensionArrays**, `.array` returns the **actual array**.

### **Categorical Example**

```python
ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
print(ser.array)
```

**Output:**

```
['a', 'b', 'a']
Categories (2, object): ['a', 'b']
```

- Here, `.array` returns a **Categorical** (`pandas.Categorical`) instead of a NumPy array.

---

### **4. Boolean ExtensionArray**

```python
ser = pd.Series([True, False, None], dtype="boolean")
print(ser.array)
```

**Output:**

```
<BooleanArray>
[True, False, <NA>]
Length: 3, dtype: boolean
```

- It returns a **BooleanArray** instead of a NumPy array.

---

### **5. String ExtensionArray**

```python
ser = pd.Series(["apple", "banana", None], dtype="string")
print(ser.array)
```

**Output:**

```
<StringArray>
['apple', 'banana', <NA>]
Length: 3, dtype: string
```

- If the dtype is `"string"`, it returns a **StringArray**.

---

### **6. Integer ExtensionArray**

```python
ser = pd.Series([1, 2, None], dtype="Int64")
print(ser.array)
```

**Output:**

```
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64
```

- If the dtype is `"Int64"`, it returns an **IntegerArray**, which represents missing values as `pd.NA` (shown as `<NA>`).

---

### **7. PeriodArray for Datetime-Like Data**

```python
ser = pd.Series(pd.period_range('2023-01', periods=3, freq='M'))
print(ser.array)
```

**Output:**

```
<PeriodArray>
['2023-01', '2023-02', '2023-03']
Length: 3, dtype: period[M]
```

- `.array` returns a **PeriodArray** when working with period data.

---

### **8. IntervalArray for Ranges**

```python
ser = pd.Series(pd.interval_range(start=0, periods=3))
print(ser.array)
```

**Output:**

```
<IntervalArray>
[(0, 1], (1, 2], (2, 3]]
Length: 3, dtype: interval[int64, right]
```

- Returns an **IntervalArray**.

---

### **9. DatetimeArray with Timezones**

```python
ser = pd.Series(pd.date_range("2024-01-01", periods=3, tz="UTC"))
print(ser.array)
```

**Output:**

```
<DatetimeArray>
['2024-01-01 00:00:00+00:00', '2024-01-02 00:00:00+00:00', '2024-01-03 00:00:00+00:00']
Length: 3, dtype: datetime64[ns, UTC]
```

- `.array` returns a **DatetimeArray** for datetime data, whether or not a timezone is attached.

---

### **10. Getting the Length of `.array`**

```python
print(len(s.array))
```

- Returns the **length** of the underlying array.

---

### **11. Checking Type of `.array`**

```python
print(type(s.array))
```

- Returns the **type** of the underlying array (usually `NumpyExtensionArray` or a pandas ExtensionArray).

---

## **12. Converting `.array` to a List**

```python
print(s.array.tolist())
```

- Converts the `.array` to a **Python list**.

---

## **13. Converting `.array` to a NumPy Array**

```python
print(s.array.to_numpy())
```

- This explicitly converts it to a **NumPy array**.

---

## **Comparison Table: `.array` vs `.values` vs `.to_numpy()`**

| Syntax         | Returns                         | Notes                                        |
| -------------- | ------------------------------- | -------------------------------------------- |
| `s.array`      | **ExtensionArray**              | Wraps actual data (may not be a NumPy array) |
| `s.values`     | **NumPy array** or object array | Converts data if necessary                   |
| `s.to_numpy()` | **NumPy array**                 | Explicit NumPy conversion                    |

---

### **Summary Table of Extension Arrays**

| Data Type                    | `.array` Type         |
| ---------------------------- | --------------------- |
| `category`                   | `Categorical`         |
| `boolean`                    | `BooleanArray`        |
| `string`                     | `StringArray`         |
| `Int64`, `UInt32` (nullable) | `IntegerArray`        |
| `datetime64[ns, tz]`         | `DatetimeArray`       |
| `period[M]`, `period[D]`     | `PeriodArray`         |
| `interval[int64]`            | `IntervalArray`       |
| Other dtypes                 | `NumpyExtensionArray` |

---

## **Conclusion**

- `.array` **does not always return a NumPy array**. It returns an **ExtensionArray** when using pandas-specific data types.
- If you want a **NumPy array**, use `.to_numpy()`.
- If you work with pandas **categoricals, nullable integers, or periods**, `.array` helps maintain the original dtype.
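To make the conclusion concrete, here is a small sketch using a nullable integer Series. The variable names are illustrative; `dtype` and `na_value` are documented parameters of `to_numpy()`:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, None], dtype="Int64")

# .array keeps the pandas extension type and its missing-value marker.
arr = ser.array
print(type(arr).__name__)  # IntegerArray

# Plain to_numpy() falls back to object dtype because of pd.NA.
as_object = ser.to_numpy()
print(as_object.dtype)     # object

# Asking for floats and mapping missing values to NaN gives a numeric array.
as_float = ser.to_numpy(dtype="float64", na_value=np.nan)
print(as_float.dtype)      # float64
```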


In [22]:
pd.Series([1, 2, 3]).array

<NumpyExtensionArray>
[np.int64(1), np.int64(2), np.int64(3)]
Length: 3, dtype: int64

In [23]:
ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
ser.array

['a', 'b', 'a']
Categories (2, object): ['a', 'b']

# **pandas.Series.values – All Syntaxes & Details**

The **`.values`** attribute in **pandas.Series** returns the underlying data as a **NumPy array or an ndarray-like object**, depending on the dtype.

> **⚠️ Warning:**
>
> - **`Series.values` is NOT recommended**.
> - Use **`.array`** if you need a reference to the underlying data (ExtensionArray).
> - Use **`.to_numpy()`** if you need a NumPy array.

---

## **1. Basic Usage: Accessing `.values`**

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(repr(s.values))
```

**Output:**

```
array([1, 2, 3])
```

- Returns a **NumPy array** (`numpy.ndarray`).

---

## **2. Difference Between `.values`, `.array`, and `.to_numpy()`**

```python
print(s.values)       # NumPy array
print(s.array)        # ExtensionArray
print(s.to_numpy())   # NumPy array
```

- `.values` **returns a NumPy array or an ndarray-like object**.
- `.array` **returns an ExtensionArray**.
- `.to_numpy()` is **explicitly meant to return a NumPy array**.

---

## **3. Using `.values` with String Data**

```python
s = pd.Series(list('aabc'))
print(repr(s.values))
```

**Output:**

```
array(['a', 'a', 'b', 'c'], dtype=object)
```

- Returns an **object dtype NumPy array**.

---

## **4. Using `.values` with Categorical Data**

```python
s = pd.Series(list('aabc')).astype('category')
print(s.values)
```

**Output:**

```
['a', 'a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']
```

- Returns a **Categorical array** instead of a NumPy array.

---

## **5. Using `.values` with Datetime and Timezone Data**

```python
s = pd.Series(pd.date_range('20130101', periods=3, tz='US/Eastern'))
print(repr(s.values))
```

**Output:**

```
array(['2013-01-01T05:00:00.000000000', '2013-01-02T05:00:00.000000000',
       '2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]')
```

- Converts **timezone-aware** datetime to **UTC**.
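The contrast with `.to_numpy()` can be sketched as follows. This is an illustration under the assumption of a pandas 2.x installation with timezone data available, not a definitive reference:

```python
import numpy as np
import pandas as pd

s = pd.Series(pd.date_range("20130101", periods=3, tz="US/Eastern"))

# .values drops the timezone and returns UTC instants as datetime64.
utc_values = s.values
print(np.issubdtype(utc_values.dtype, np.datetime64))  # True
print(pd.Timestamp(utc_values[0]).hour)                # 5 (midnight Eastern is 05:00 UTC)

# .to_numpy() keeps the timezone by returning an object array of Timestamps.
stamps = s.to_numpy()
print(stamps.dtype)   # object
print(stamps[0].tz)   # US/Eastern
```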

---

## **6. Using `.values` with Boolean Data**

```python
s = pd.Series([True, False, True])
print(repr(s.values))
```

**Output:**

```
array([ True, False,  True])
```

- Returns a **boolean NumPy array**.

---

## **7. Using `.values` with Mixed Data Types**

```python
s = pd.Series([1, "apple", 3.5])
print(repr(s.values))
```

**Output:**

```
array([1, 'apple', 3.5], dtype=object)
```

- Returns a **NumPy object array** due to mixed types.

---

## **8. Using `.values` with a Nullable Integer Dtype**

```python
s = pd.Series([1, 2, None], dtype="Int64")
print(s.values)
```

**Output:**

```
<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64
```

- Returns the pandas **IntegerArray** itself rather than a NumPy array; call `.to_numpy()` to get an object-dtype NumPy array.

---

## **9. Using `.values` with Period Data**

```python
s = pd.Series(pd.period_range("2023-01", periods=3, freq="M"))
print(repr(s.values))
```

**Output:**

```
array([Period('2023-01', 'M'), Period('2023-02', 'M'), Period('2023-03', 'M')],
      dtype=object)
```

- Returns an **object array of Period elements**.

---

## **10. Using `.values` with Interval Data**

```python
s = pd.Series(pd.interval_range(start=0, periods=3))
print(repr(s.values))
```

**Output:**

```
array([Interval(0, 1, closed='right'),
       Interval(1, 2, closed='right'),
       Interval(2, 3, closed='right')], dtype=object)
```

- Returns an **object array of Interval elements**.

---

## **11. Using `.values` with Float Data**

```python
s = pd.Series([1.1, 2.2, 3.3])
print(repr(s.values))
```

**Output:**

```
array([1.1, 2.2, 3.3])
```

- Returns a **NumPy float64 array**.

---

## **Comparison Table: `.values` vs `.array` vs `.to_numpy()`**

| Syntax         | Returns                     | Notes                                        |
| -------------- | --------------------------- | -------------------------------------------- |
| `s.values`     | NumPy array or ndarray-like | May return **object dtype** for complex data |
| `s.array`      | ExtensionArray              | Preserves pandas-specific types              |
| `s.to_numpy()` | NumPy array                 | Explicit conversion to NumPy                 |

---

## **Best Practices**

✅ **Use `.array`** if you need a reference to the underlying data (ExtensionArray).  
✅ **Use `.to_numpy()`** if you need a NumPy array.  
⚠️ **Avoid `.values`** since it may behave unexpectedly with extension types.


In [24]:
pd.Series([1, 2, 3]).values

array([1, 2, 3])

In [25]:
pd.Series(list('aabc')).values

array(['a', 'a', 'b', 'c'], dtype=object)

In [26]:
pd.Series(list('aabc')).astype('category').values

['a', 'a', 'b', 'c']
Categories (3, object): ['a', 'b', 'c']

In [27]:
# Timezone aware datetime data is converted to UTC:
pd.Series(pd.date_range('20130101', periods=3,
                        tz='US/Eastern')).values

array(['2013-01-01T05:00:00.000000000', '2013-01-02T05:00:00.000000000',
       '2013-01-03T05:00:00.000000000'], dtype='datetime64[ns]')

# **pandas.Series.dtype – All Syntaxes & Details**

The **`.dtype`** property in **pandas.Series** returns the **data type (dtype)** of the underlying data.

---

## **1. Basic Usage: Checking `.dtype`**

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.dtype)
```

**Output:**

```
int64
```

- Returns the **dtype** (`int64`) of the Series.

---

## **2. `.dtype` with Different Data Types**

### **2.1 Integer Series**

```python
s = pd.Series([1, 2, 3])
print(s.dtype)
```

**Output:**

```
int64
```

---

### **2.2 Floating-Point Series**

```python
s = pd.Series([1.1, 2.2, 3.3])
print(s.dtype)
```

**Output:**

```
float64
```

---

### **2.3 String Series**

```python
s = pd.Series(["apple", "banana", "cherry"])
print(s.dtype)
```

**Output:**

```
object
```

- Default string dtype is **"object"**, unless explicitly set to **"string"**.

---

### **2.4 Boolean Series**

```python
s = pd.Series([True, False, True])
print(s.dtype)
```

**Output:**

```
bool
```

---

### **2.5 Mixed Data Types**

```python
s = pd.Series([1, "apple", 3.5])
print(s.dtype)
```

**Output:**

```
object
```

- When elements have different types, pandas assigns **"object" dtype**.

---

### **2.6 Categorical Series**

```python
s = pd.Series(["a", "b", "a"], dtype="category")
print(s.dtype)
```

**Output:**

```
category
```

---

### **2.7 DateTime Series**

```python
s = pd.Series(pd.date_range("2024-01-01", periods=3))
print(s.dtype)
```

**Output:**

```
datetime64[ns]
```

---

### **2.8 Timezone-Aware DateTime**

```python
s = pd.Series(pd.date_range("2024-01-01", periods=3, tz="UTC"))
print(s.dtype)
```

**Output:**

```
datetime64[ns, UTC]
```

---

### **2.9 Timedelta Series**

```python
s = pd.Series(pd.to_timedelta(["1 days", "2 days", "3 days"]))
print(s.dtype)
```

**Output:**

```
timedelta64[ns]
```

---

### **2.10 Integer Nullable (IntegerNA)**

```python
s = pd.Series([1, 2, None], dtype="Int64")
print(s.dtype)
```

**Output:**

```
Int64
```

- Allows missing values (stored as `pd.NA`) in an integer Series.

---

### **2.11 Boolean Nullable**

```python
s = pd.Series([True, False, None], dtype="boolean")
print(s.dtype)
```

**Output:**

```
boolean
```

- **`None` is stored as the missing value `pd.NA`**.

---

### **2.12 String (Pandas StringArray)**

```python
s = pd.Series(["apple", "banana", None], dtype="string")
print(s.dtype)
```

**Output:**

```
string
```

- **"string" dtype is different from "object" dtype**.

---

### **2.13 Period Series**

```python
s = pd.Series(pd.period_range("2023-01", periods=3, freq="M"))
print(s.dtype)
```

**Output:**

```
period[M]
```

---

### **2.14 Interval Series**

```python
s = pd.Series(pd.interval_range(start=0, periods=3))
print(s.dtype)
```

**Output:**

```
interval[int64, right]
```

---

## **3. Checking Data Type (`==` Comparison)**

```python
s = pd.Series([1, 2, 3])
if s.dtype == "int64":
    print("Series contains integers")
```
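String comparison only matches one exact dtype. As a hedged alternative, the helpers in `pandas.api.types` test for a whole family of dtypes; the sketch below uses illustrative Series and the alias `ptypes`, which is an assumption of this example:

```python
import pandas as pd
from pandas.api import types as ptypes

ints = pd.Series([1, 2, 3])
nullable = pd.Series([1, 2, None], dtype="Int64")

# Exact comparison distinguishes NumPy int64 from nullable Int64.
print(ints.dtype == "int64")      # True
print(nullable.dtype == "int64")  # False

# The helper functions accept both.
print(ptypes.is_integer_dtype(ints))      # True
print(ptypes.is_integer_dtype(nullable))  # True
print(ptypes.is_numeric_dtype(pd.Series([1.5])))  # True
```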

---

## **4. Convert `.dtype` to String**

```python
print(str(s.dtype))
```

- Useful when comparing dtype as a string.

---

## **5. Changing Data Type (`astype`)**

```python
s = pd.Series([1, 2, 3])
s = s.astype("float64")
print(s.dtype)
```

**Output:**

```
float64
```

- Converts **int64** to **float64**.
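`astype` raises when a value cannot be converted. One possible way to handle messy input, sketched here with made-up data, is to coerce with `pd.to_numeric` first and then cast to a nullable dtype:

```python
import pandas as pd

raw = pd.Series(["1", "2", "x"])  # "x" is not a valid number

# Direct conversion fails on the bad value.
try:
    raw.astype("int64")
except ValueError as exc:
    print("astype failed:", exc)

# errors="coerce" turns unparseable values into NaN (float64 result).
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.dtype)  # float64

# A nullable integer dtype keeps whole numbers and marks the gap as <NA>.
nullable = coerced.astype("Int64")
print(nullable.isna().tolist())  # [False, False, True]
```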

---

## **6. Getting Data Type Name**

```python
print(s.dtype.name)
```

- Returns dtype **as a string**.

---

## **7. Summary Table of `.dtype` Outputs**

| Data Type               | `.dtype` Output       |
| ----------------------- | --------------------- |
| Integer                 | `int64`               |
| Float                   | `float64`             |
| String (default)        | `object`              |
| String (Pandas)         | `string`              |
| Boolean                 | `bool`                |
| Boolean (nullable)      | `boolean`             |
| Categorical             | `category`            |
| DateTime                | `datetime64[ns]`      |
| Timezone-Aware DateTime | `datetime64[ns, UTC]` |
| Timedelta               | `timedelta64[ns]`     |
| Period                  | `period[M]`           |
| Interval                | `interval[int64, right]` |
| Integer (nullable)      | `Int64`               |

---

### **Best Practices**

✅ Use `.dtype` to check the type of data in a Series.  
✅ Use `.astype()` to convert types when needed.  
✅ Use `s.dtype.name` if you need the dtype as a string.


In [28]:
# Return a tuple of the shape of the underlying data.
s = pd.Series([1, 2, 3])
s.shape

(3,)

# **pandas.Series.nbytes – All Syntaxes & Details**

The **`.nbytes`** property in **pandas.Series** returns the **total number of bytes consumed by the data in memory**. This includes the size of the data stored in the **underlying NumPy array or ExtensionArray**.

---

## **1. Basic Usage: Checking `.nbytes`**

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.nbytes)
```

**Output:**

```
24
```

- Each **int64** element takes **8 bytes**.
- `3 elements × 8 bytes = 24 bytes`.

---

## **2. Using `.nbytes` with Different Data Types**

### **2.1 Integer Series**

```python
s = pd.Series([1, 2, 3], dtype="int64")
print(s.nbytes)
```

**Output:**

```
24
```

- **Each int64 element takes 8 bytes** (`3 × 8 = 24 bytes`).

---

### **2.2 Float Series**

```python
s = pd.Series([1.1, 2.2, 3.3], dtype="float64")
print(s.nbytes)
```

**Output:**

```
24
```

- **Each float64 element takes 8 bytes** (`3 × 8 = 24 bytes`).

---

### **2.3 Boolean Series**

```python
s = pd.Series([True, False, True])
print(s.nbytes)
```

**Output:**

```
3
```

- **Each boolean element takes 1 byte** (`3 × 1 = 3 bytes`).

---

### **2.4 String (Object) Series**

```python
s = pd.Series(["Ant", "Bear", "Cow"])
print(s.nbytes)
```

**Output:**

```
24
```

- **String data is stored as Python objects**, so `.nbytes` reflects only the pointers (not actual string sizes).
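To illustrate the pointer-only behaviour, the following sketch compares two object Series whose strings differ greatly in length. The names `short` and `long_` are illustrative.

```python
import pandas as pd

short = pd.Series(["a", "b", "c"], dtype=object)
long_ = pd.Series(["a" * 1000, "b" * 1000, "c" * 1000], dtype=object)

# Both arrays hold three pointers, so .nbytes is identical
# even though the strings themselves differ greatly in size.
print(short.nbytes, long_.nbytes)
print(short.nbytes == long_.nbytes)
```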

---

### **2.5 String (Pandas StringArray)**

```python
s = pd.Series(["Ant", "Bear", "Cow"], dtype="string")
print(s.nbytes)
```

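**Output:**

```
Varies with the storage backend
```

- With Python-backed storage (`string[python]`), the array holds one pointer per element, so the result matches the object case (`24`). With PyArrow-backed storage (`string[pyarrow]`), the character data is counted as well, so the result depends on the string lengths.
- The **Pandas "string" dtype** is not automatically smaller than "object"; memory savings come mainly from the PyArrow backend.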

---

### **2.6 Categorical Series**

```python
s = pd.Series(["a", "b", "a"], dtype="category")
print(s.nbytes)
```

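**Output:**

```
19
```

- Each value is stored as a small **integer code** (here 3 one-byte `int8` codes), and the unique categories are stored once (here 2 object pointers of 8 bytes each), so `3 + 16 = 19 bytes`. The savings grow when many values share a few categories.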
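As a rough sketch of where the bytes of a categorical Series go, the two parts can be inspected separately through the `.cat` accessor. The variable names are illustrative, and the size of the categories depends on how pandas stores the labels in your version.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")

codes_bytes = s.cat.codes.nbytes                    # one small integer code per value
categories_bytes = s.cat.categories.values.nbytes   # unique labels, stored once

print(s.cat.codes.dtype, codes_bytes, categories_bytes, s.nbytes)
```

With only three values the categories dominate; on long Series with few distinct labels the one-byte codes dominate instead.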

---

### **2.7 DateTime Series**

```python
s = pd.Series(pd.date_range("2024-01-01", periods=3))
print(s.nbytes)
```

**Output:**

```
24
```

- Each **datetime64[ns]** element takes **8 bytes**.

---

### **2.8 Timezone-Aware DateTime**

```python
s = pd.Series(pd.date_range("2024-01-01", periods=3, tz="UTC"))
print(s.nbytes)
```

**Output:**

```
24
```

- **Same as datetime64[ns]** since timezone-aware timestamps are stored as UTC.

---

### **2.9 Timedelta Series**

```python
s = pd.Series(pd.to_timedelta(["1 days", "2 days", "3 days"]))
print(s.nbytes)
```

**Output:**

```
24
```

- Each **timedelta64[ns]** element takes **8 bytes**.

---

### **2.10 Integer Nullable (IntegerArray)**

```python
s = pd.Series([1, 2, None], dtype="Int64")
print(s.nbytes)
```

**Output:**

```
27
```

- The **"Int64" nullable type** stores the values (`3 × 8 = 24 bytes`) plus a boolean mask marking missing entries (`3 × 1 = 3 bytes`).

---

### **2.11 Boolean Nullable**

```python
s = pd.Series([True, False, None], dtype="boolean")
print(s.nbytes)
```

**Output:**

```
6
```

- Similar to `"Int64"`, the `"boolean"` dtype stores the values (`3 × 1 = 3 bytes`) plus a mask of the same length (`3 × 1 = 3 bytes`).
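The nullable dtypes keep a values array and a one-byte-per-element mask. A small sketch, assuming that layout, checks the arithmetic for both dtypes; the variable names are illustrative.

```python
import pandas as pd

n = 3
int_nullable = pd.Series([1, 2, None], dtype="Int64")
bool_nullable = pd.Series([True, False, None], dtype="boolean")

# values array + one-byte-per-element mask
print(int_nullable.nbytes, n * 8 + n * 1)
print(bool_nullable.nbytes, n * 1 + n * 1)
```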

---

## **3. Using `.nbytes` with Index**

```python
idx = pd.Index([1, 2, 3])
print(idx.nbytes)
```

**Output:**

```
24
```

- Works the same way as **Series.nbytes**.

---

## **4. Using `.nbytes` with MultiIndex**

```python
arrays = [
    ["A", "A", "B", "B"],
    [1, 2, 1, 2]
]
idx = pd.MultiIndex.from_arrays(arrays)
print(idx.nbytes)
```

**Output:**

```
Varies with the levels and codes
```

- A **MultiIndex** stores each level's unique values once plus one integer code array per level, so `.nbytes` reflects those pieces rather than a simple `n × itemsize`.

---

## **5. `.nbytes` vs `.memory_usage()`**

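- `.nbytes` **only includes the data storage**.
- `.memory_usage()` **includes index memory too** (by default) and can optionally measure object elements with `deep=True`.

```python
s = pd.Series([1, 2, 3])
print(s.nbytes)           # 24 bytes (data only)
print(s.memory_usage())   # data + index; larger than 24 (exact value depends on the pandas version)
```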
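The relationship between the two can be checked directly. This is a minimal sketch with illustrative variable names; the absolute numbers printed will differ between pandas versions, but the relationship should hold.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

data_bytes = s.nbytes
index_bytes = s.index.memory_usage()
total_bytes = s.memory_usage()            # index=True by default

print(data_bytes, index_bytes, total_bytes)
print(s.memory_usage(index=False) == data_bytes)
```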

---

## **6. Summary Table: `.nbytes` for Different Dtypes**

| Data Type                | `.nbytes` (per element)           |
| ------------------------ | --------------------------------- |
| `int64`                  | 8 bytes                           |
| `float64`                | 8 bytes                           |
| `bool`                   | 1 byte                            |
| `object` (string)        | 8 bytes (pointer only)            |
| `"string"` (StringArray) | Varies with storage backend       |
| `category`               | 1-8 bytes per code + categories   |
| `datetime64[ns]`         | 8 bytes                           |
| `timedelta64[ns]`        | 8 bytes                           |
| `Int64` (nullable int)   | 9 bytes (8 data + 1 mask)         |
| `boolean` (nullable)     | 2 bytes (1 data + 1 mask)         |

---

## **7. Best Practices**

✅ **Use `.nbytes`** to check how much memory **data alone** takes.  
✅ **Use `.memory_usage()`** if you want the **total memory usage (including index)**.  
✅ **Convert low-cardinality `object` data to `category`**, or string data to a PyArrow-backed `string` dtype, to save memory.


In [29]:
s = pd.Series(['Ant', 'Bear', 'Cow'])
s

0     Ant
1    Bear
2     Cow
dtype: object

In [30]:
idx = pd.Index([1, 2, 3])
idx

Index([1, 2, 3], dtype='int64')

In [31]:
s = pd.Series(['Ant', 'Bear', 'Cow'])
s

0     Ant
1    Bear
2     Cow
dtype: object

In [32]:
idx = pd.Index([1, 2, 3])
idx

Index([1, 2, 3], dtype='int64')

# **pandas.Series.ndim – All Syntaxes & Details**

The **`.ndim`** property in **pandas.Series** and **pandas.Index** returns the **number of dimensions** of the underlying data. It is used to understand the dimensionality of the object. For a **Series** or an **Index**, which are one-dimensional structures, this will always return `1`.

---

## **1. Syntax**

```python
Series.ndim
Index.ndim
```

---

## **2. Explanation**

- **Series**: A **Series** is always **one-dimensional**, so `.ndim` will return `1`.
- **Index**: Similarly, an **Index** is also **one-dimensional**, so `.ndim` will also return `1`.

---

## **3. Examples**

### **3.1 Series**

```python
import pandas as pd

s = pd.Series(['Ant', 'Bear', 'Cow'])
print(s)
print(s.ndim)  # The number of dimensions of a Series
```

**Output:**

```
0     Ant
1    Bear
2     Cow
dtype: object

1
```

- **`.ndim`** returns `1` because a Series is a one-dimensional object.

### **3.2 Index**

```python
idx = pd.Index([1, 2, 3])
print(idx)
print(idx.ndim)  # The number of dimensions of an Index
```

**Output:**

```
Index([1, 2, 3], dtype='int64')

1
```

- **`.ndim`** returns `1` because an Index is also a one-dimensional object.

---

## **4. `.ndim` in Two-Dimensional Objects (for Contrast)**

For objects that have more than one dimension, such as **pandas.DataFrame**, `.ndim` returns the total number of dimensions.

### **4.1 DataFrame Example**

```python
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
print(df)
print(df.ndim)  # The number of dimensions of a DataFrame
```

**Output:**

```
   A  B  C
0  1  2  3
1  4  5  6

2
```

- **DataFrame** has two dimensions (rows and columns), so `.ndim` returns `2`.
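One practical use of `.ndim` is to branch on whether an input is a Series or a DataFrame. The sketch below uses a hypothetical helper, `to_frame_if_needed`, which is not part of pandas.

```python
import pandas as pd

def to_frame_if_needed(obj):
    """Hypothetical helper: return a DataFrame for Series or DataFrame input."""
    if obj.ndim == 1:          # Series: promote to a one-column DataFrame
        return obj.to_frame()
    return obj                 # DataFrame: already two-dimensional

s = pd.Series([1, 2, 3], name="A")
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

print(to_frame_if_needed(s).ndim)
print(to_frame_if_needed(df).ndim)
```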

---

## **5. Summary Table**

| Object Type   | `.ndim` Output |
| ------------- | -------------- |
| **Series**    | `1`            |
| **Index**     | `1`            |
| **DataFrame** | `2`            |

---

## **6. Best Practices**

- Use **`.ndim`** to quickly determine if the object is **one-dimensional** (like Series or Index) or **two-dimensional** (like DataFrame).
- For multi-dimensional structures, `.ndim` helps identify the number of axes (dimensions).

---


In [33]:
s = pd.Series(['Ant', 'Bear', 'Cow'])
s

0     Ant
1    Bear
2     Cow
dtype: object

In [34]:
idx = pd.Index([1, 2, 3])
idx

Index([1, 2, 3], dtype='int64')

# **pandas.Series.size – All Syntaxes & Details**

The **`.size`** property in **pandas.Series** returns the **total number of elements** (i.e., the length of the Series).

It **always returns the number of elements, including NaN values**.

---

## **1. Basic Usage: Checking `.size`**

```python
import pandas as pd

s = pd.Series(["Ant", "Bear", "Cow"])
print(s.size)
```

**Output:**

```
3
```

- The Series has **3 elements**, so `.size` returns `3`.

---

## **2. `.size` vs `.shape` vs `.count()`**

| Attribute  | Description                     | Includes NaN? |
| ---------- | ------------------------------- | ------------- |
| `.size`    | Total number of elements        | ✅ Yes        |
| `.shape`   | Tuple of dimensions (`(rows,)`) | ✅ Yes        |
| `.count()` | Number of **non-null** elements | ❌ No         |

```python
s = pd.Series(["Ant", "Bear", "Cow", None])

print(s.size)    # 4 (total elements)
print(s.shape)   # (4,) (tuple representation)
print(s.count()) # 3 (non-null elements)
```

**Output:**

```
4
(4,)
3
```

- **`.size` counts all elements**, even if they are `None` or `NaN`.
- **`.count()` excludes NaN values**.
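Because `.size` includes missing values and `.count()` does not, their difference gives the number of missing entries. A short sketch, with `n_missing` as an illustrative name:

```python
import pandas as pd

s = pd.Series(["Ant", "Bear", "Cow", None])

n_missing = s.size - s.count()      # total elements minus non-null elements
print(n_missing)
print(s.isna().sum())               # same number, computed directly
```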

---

## **3. `.size` with Different Data Types**

### **3.1 Integer Series**

```python
s = pd.Series([1, 2, 3, 4, 5])
print(s.size)
```

**Output:**

```
5
```

- 5 elements, so `.size = 5`.

---

### **3.2 Float Series**

```python
s = pd.Series([1.1, 2.2, 3.3])
print(s.size)
```

**Output:**

```
3
```

---

### **3.3 Boolean Series**

```python
s = pd.Series([True, False, True, False])
print(s.size)
```

**Output:**

```
4
```

---

### **3.4 String Series**

```python
s = pd.Series(["apple", "banana", "cherry"])
print(s.size)
```

**Output:**

```
3
```

---

### **3.5 Series with NaN Values**

```python
import numpy as np

s = pd.Series([1, 2, np.nan, 4, None])
print(s.size)
```

**Output:**

```
5
```

- NaN and None **are included in `.size`**.

---

### **3.6 Categorical Series**

```python
s = pd.Series(["A", "B", "A"], dtype="category")
print(s.size)
```

**Output:**

```
3
```

- **Categorical dtype does not affect `.size`**.

---

### **3.7 DateTime Series**

```python
s = pd.Series(pd.date_range("2024-01-01", periods=10))
print(s.size)
```

**Output:**

```
10
```

- **Datetime series counts each timestamp as one element**.

---

## **4. `.size` with Index**

```python
idx = pd.Index([1, 2, 3])
print(idx.size)
```

**Output:**

```
3
```

- Works the same way as **Series.size**.

---

## **5. `.size` with MultiIndex**

```python
arrays = [
    ["A", "A", "B", "B"],
    [1, 2, 1, 2]
]
idx = pd.MultiIndex.from_arrays(arrays)
print(idx.size)
```

**Output:**

```
4
```

- **Counts the entries (rows) of the MultiIndex**, not its levels: 4 label pairs give `.size = 4`, even though there are 2 levels.
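To make the distinction between entries and levels concrete, this small sketch prints `.size` next to `len()` and `.nlevels` for the same MultiIndex.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([["A", "A", "B", "B"], [1, 2, 1, 2]])

print(idx.size)      # number of entries (label tuples)
print(len(idx))      # same as .size
print(idx.nlevels)   # number of levels
```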

---

## **6. `.size` for an Empty Series**

```python
s = pd.Series([], dtype="float64")  # explicit dtype avoids a warning on older pandas
print(s.size)
```

**Output:**

```
0
```

- **Empty Series has `.size = 0`**.

---

## **7. Summary Table**

| Series Data                      | `.size` Output     |
| -------------------------------- | ------------------ |
| `[1, 2, 3]`                      | `3`                |
| `[1.1, 2.2, 3.3]`                | `3`                |
| `[True, False, True]`            | `3`                |
| `["a", "b", "c"]`                | `3`                |
| `[1, 2, np.nan]`                 | `3` (includes NaN) |
| `pd.Series([], dtype="float64")` | `0` (empty Series) |

---

## **8. Best Practices**

✅ Use `.size` to get the **total number of elements**.  
✅ Use `.count()` if you only want **non-null values**.  
✅ Use `.shape` if you need the dimensions as a **tuple**; for a Series, `.shape[0]` equals `.size`.


In [35]:
s = pd.Series(['Ant', 'Bear', 'Cow'])
s


0     Ant
1    Bear
2     Cow
dtype: object

In [36]:
s.size

3

In [37]:
idx = pd.Index([1, 2, 3])
idx

Index([1, 2, 3], dtype='int64')

In [38]:
idx.size

3

# **pandas.Series.T – All Syntaxes & Details**

The **`.T`** property is used to **transpose** a **pandas.Series** or **pandas.Index**. However, since both **Series** and **Index** are **one-dimensional** objects, transposing them doesn't change their structure or content. It effectively results in the same object.

---

## **1. Syntax**

```python
Series.T
Index.T
```

---

## **2. Explanation**

- **Series**: A **Series** is inherently **one-dimensional**, so transposing it does **not** modify the object. It will remain the same after the transpose.
- **Index**: Similarly, for an **Index**, which is also one-dimensional, transposing it will result in the same Index without any changes.

---

## **3. Examples**

### **3.1 Transposing a Series**

```python
import pandas as pd

s = pd.Series(['Ant', 'Bear', 'Cow'])
print(s)
print(s.T)  # Transpose does not alter the Series
```

**Output:**

```
0     Ant
1    Bear
2     Cow
dtype: object

0     Ant
1    Bear
2     Cow
dtype: object
```

- **`s.T`** will return the **same Series** as **`s`** because a Series is a one-dimensional object, and transposing it doesn't change its structure.

### **3.2 Transposing an Index**

```python
idx = pd.Index([1, 2, 3])
print(idx)
print(idx.T)  # Transpose does not alter the Index
```

**Output:**

```
Index([1, 2, 3], dtype='int64')

Index([1, 2, 3], dtype='int64')
```

- **`idx.T`** will return the **same Index** as **`idx`**, as the Index is also one-dimensional, so transposing doesn't change it.

---

## **4. Transposing DataFrame (for Contrast)**

For a **DataFrame**, which is a two-dimensional object (rows and columns), the `.T` property swaps the rows and columns.

### **4.1 Example with DataFrame Transpose**

```python
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['A', 'B', 'C'])
print(df)
print(df.T)  # Transpose swaps rows and columns
```

**Output:**

```
   A  B  C
0  1  2  3
1  4  5  6

   0  1
A  1  4
B  2  5
C  3  6
```

- **In a DataFrame**, transposing swaps the rows and columns.
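The contrast can also be seen through the shapes. This is a minimal sketch reusing the same data as above.

```python
import pandas as pd

s = pd.Series(["Ant", "Bear", "Cow"])
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])

print(s.shape, s.T.shape)       # unchanged for a Series
print(df.shape, df.T.shape)     # rows and columns swapped for a DataFrame
print(s.T.equals(s))
```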

---

## **5. Summary**

| Object Type   | `.T` Effect                         |
| ------------- | ----------------------------------- |
| **Series**    | No effect (returns the same Series) |
| **Index**     | No effect (returns the same Index)  |
| **DataFrame** | Swaps rows and columns              |

---

## **6. Best Practices**

- **For Series and Index**, the `.T` property is essentially a **no-op** and does not modify the object.
- **For DataFrame**, use `.T` when you want to swap the rows and columns.

---


In [39]:
s = pd.Series(['Ant', 'Bear', 'Cow'])
s

0     Ant
1    Bear
2     Cow
dtype: object

In [40]:
s.T

0     Ant
1    Bear
2     Cow
dtype: object

In [41]:
idx = pd.Index([1, 2, 3])
idx.T

Index([1, 2, 3], dtype='int64')

# **pandas.Series.memory_usage – All Syntaxes & Details**

The **`.memory_usage()`** function in **pandas.Series** returns the amount of **memory** consumed by the **Series**. This includes options to account for the **index** and, for objects, the **deep introspection** of object dtypes for more precise memory calculations.

---

## **1. Syntax**

```python
Series.memory_usage(index=True, deep=False)
```

### **Parameters**

- **`index`** (`bool`, default `True`):  
  Specifies whether to include the memory usage of the **Series index**.

  - If `True`, the index memory usage is included.
  - If `False`, the memory usage of the index is excluded, focusing only on the data itself.

- **`deep`** (`bool`, default `False`):  
  If set to `True`, **deep introspection** is done on the data. Specifically, for **object dtypes**, the system-level memory consumption of each object element is measured and included in the result.
  - If `True`, the memory usage of each element of **object** dtype will be deeply calculated, including the object-level memory consumption (such as strings).
  - If `False`, only the pointers to the objects are counted, not the objects themselves (less accurate but faster).

---

## **2. Returns**

- **`int`**: The number of **bytes** consumed by the Series, which can be influenced by the inclusion of index memory and the deep introspection of object elements.

---

## **3. Examples**

### **3.1 Basic Memory Usage**

```python
import pandas as pd

s = pd.Series(range(3))
print(s.memory_usage())
```

**Output:**

```
152
```

- The total memory usage includes both the **data** and the **index**. The exact figure depends on the pandas version and platform; the notebook cell for this example reports `156`.

### **3.2 Memory Usage Without Index**

```python
s = pd.Series(range(3))
print(s.memory_usage(index=False))
```

**Output:**

```
24
```

- When **`index=False`**, only the memory consumed by the **data** (not the index) is returned.

### **3.3 Memory Usage with Object Data Type (default)**

```python
s = pd.Series(["a", "b"])
print(s.memory_usage())
```

**Output:**

```
144
```

- For **object dtypes**, the default result counts only the array of pointers plus the index, not the memory of each string object.

### **3.4 Memory Usage with Deep Introspection**

```python
s = pd.Series(["a", "b"])
print(s.memory_usage(deep=True))
```

**Output:**

```
244
```

- **`deep=True`** also measures the system-level memory of each string object, so the result is larger. Exact figures vary with the Python and pandas versions; the notebook cell for this example reports `232`.
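To see how much of the total comes from the objects themselves, the shallow and deep figures can be subtracted. This sketch excludes the index and forces `dtype=object` so the result does not depend on the default string dtype; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(["a", "b"], dtype=object)

shallow = s.memory_usage(index=False)               # pointers only
deep = s.memory_usage(index=False, deep=True)       # pointers + string objects

print(shallow, deep)
print("bytes held by the string objects:", deep - shallow)
```

The difference should roughly correspond to the combined size of the Python string objects, which is why the deep figure varies between Python versions.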

---

## **4. Comparison with `numpy.ndarray.nbytes`**

If you are working with **numpy arrays**, the `.nbytes` property gives the total number of bytes consumed by the elements of the array.

### **Example with `numpy.ndarray.nbytes`**

```python
import numpy as np
arr = np.array([1, 2, 3])
print(arr.nbytes)
```

**Output:**

```
24
```

- This is the total **bytes** consumed by the elements of the **numpy.ndarray**.

---

## **5. Best Practices**

- Use **`.memory_usage()`** to get a sense of the memory consumption of the **Series**, especially when dealing with large datasets.
- When working with object dtypes (e.g., strings), use **`deep=True`** to get a more accurate representation of memory consumption.
- **Excluding the index** (`index=False`) can be useful when you're only interested in the memory used by the data itself.

---

## **6. Summary Table**

| Parameter  | Default Value | Description                                                                         |
| ---------- | ------------- | ----------------------------------------------------------------------------------- |
| **index**  | `True`        | Whether to include memory usage of the index.                                       |
| **deep**   | `False`       | Whether to introspect object dtypes deeply to include memory of individual objects. |
| **Return** | `int`         | Memory usage in bytes.                                                              |

---


---

pandas.Series.memory_usage

Series.memory_usage(index=True, deep=False)

Return the memory usage of the Series.

The memory usage can optionally include the contribution of the index and of elements of object dtype.

Parameters:

index:
bool, default True
Specifies whether to include the memory usage of the Series index.

deep:
bool, default False
If True, introspect the data deeply by interrogating object dtypes for system-level memory consumption, and include it in the returned value.

Returns:

int

Bytes of memory consumed.


In [42]:
s = pd.Series(range(3))
s.memory_usage()

156

In [43]:
s.memory_usage(index=False)

24

In [44]:
s = pd.Series(["a", "b"])
s.values
s.memory_usage()
s.memory_usage(deep=True)

232

# **pandas.Series.hasnans – All Syntaxes & Details**

The **`.hasnans`** property in **pandas.Series** checks whether the **Series** contains any **NaN** (Not a Number) values. This property is useful for quickly determining the presence of missing or undefined values in the Series.

---

## **1. Syntax**

```python
Series.hasnans
```

---

## **2. Explanation**

- **`.hasnans`** returns a **boolean value (`True` or `False`)**:
  - **`True`**: Indicates that there is at least one **NaN** value in the **Series**.
  - **`False`**: Indicates that there are no **NaN** values in the **Series**.
- This property enables certain performance optimizations in pandas when working with missing data.

---

## **3. Examples**

### **3.1 Series with NaN Value**

```python
import pandas as pd

s = pd.Series([1, 2, 3, None])
print(s)
print(s.hasnans)
```

**Output:**

```
0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64

True
```

- **`s.hasnans`** returns **`True`** because the Series contains a **NaN** value.

### **3.2 Series Without NaN Values**

```python
s = pd.Series([1, 2, 3])
print(s)
print(s.hasnans)
```

**Output:**

```
0    1
1    2
2    3
dtype: int64

False
```

- **`s.hasnans`** returns **`False`** because there are no **NaN** values in the Series.

---

## **4. Best Practices**

- Use **`.hasnans`** when you want to **quickly check** for the presence of missing data in a Series.
- This can be helpful before performing operations like **filling** (`fillna`) or **dropping** missing values (`dropna`), as it gives you an efficient check to decide if those operations are necessary.
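A minimal sketch of that pattern: fill missing values only when `.hasnans` reports that there are any. The fill value `0` is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, None])

if s.hasnans:                 # cheap check before doing any work
    s = s.fillna(0)

print(s.hasnans)
print(s.tolist())
```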

---

## **5. Summary**

| Property    | Return Type | Description                                                |
| ----------- | ----------- | ---------------------------------------------------------- |
| **hasnans** | `bool`      | Returns `True` if there are NaN values, `False` otherwise. |

---


pandas.Series.hasnans

property Series.hasnans:
Return True if there are any NaNs.

Enables various performance speedups.

Returns:
bool


In [45]:
s = pd.Series([1, 2, 3, None])
s

0    1.0
1    2.0
2    3.0
3    NaN
dtype: float64

In [46]:
s.hasnans

True

# **pandas.Series.empty – All Syntaxes & Details**

The **`.empty`** property in **pandas.Series** checks whether the **Series** is **empty**, meaning it contains no elements. It is useful for determining whether a Series has any data or not, regardless of its index or any missing (NaN) values.

---

## **1. Syntax**

```python
Series.empty
```

---

## **2. Explanation**

- **`.empty`** returns a **boolean value (`True` or `False`)**:
  - **`True`**: If the **Series** is entirely empty, meaning it has no elements, it returns **`True`**.
  - **`False`**: If the **Series** has any elements, it returns **`False`**.
- An important note: A **Series** containing only **NaN** values is **not considered empty** by `.empty`. It still contains elements, even though they are missing values.

---

## **3. Examples**

### **3.1 Series with No Elements (Empty Series)**

```python
import pandas as pd

ser_empty = pd.Series()
print(ser_empty.empty)
```

**Output:**

```
True
```

- The Series is completely empty, so **`ser_empty.empty`** returns **`True`**.

### **3.2 Series with NaN Values (Not Empty)**

```python
import pandas as pd
import numpy as np

ser = pd.Series([np.nan])
print(ser.empty)
```

**Output:**

```
False
```

- The Series has an element, even though it's NaN, so **`ser.empty`** returns **`False`**.

### **3.3 Series with Actual Data (Not Empty)**

```python
ser = pd.Series([1, 2, 3])
print(ser.empty)
```

**Output:**

```
False
```

- The Series contains actual data, so **`ser.empty`** returns **`False`**.

### **3.4 DataFrame Example (Empty DataFrame)**

```python
df_empty = pd.DataFrame({'A': []})
print(df_empty.empty)
```

**Output:**

```
True
```

- This DataFrame has one column but **no rows**. A DataFrame is considered **empty** when any of its axes has length 0, so **`df_empty.empty`** returns **`True`**.

### **3.5 DataFrame with NaN Values (Not Empty)**

```python
df = pd.DataFrame({'A': [np.nan]})
print(df.empty)
```

**Output:**

```
False
```

- A **DataFrame** with NaN values is **not considered empty**, so **`df.empty`** returns **`False`**.

### **3.6 Dropping NaN to Make DataFrame Empty**

```python
df.dropna().empty
```

**Output:**

```
True
```

- If we drop the **NaN** values from the DataFrame using **`dropna()`**, the DataFrame is considered empty, and **`df.dropna().empty`** returns **`True`**.

---

## **4. Notes**

- **Series with NaN values**: Even if a Series contains only **NaN** values, it is **not empty**. It still has data, but that data is missing (NaN).
- Use `.dropna()` if you want to remove **NaN** values and check whether a Series or DataFrame becomes empty after removing them.
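The same distinction applies to a Series. As a short sketch, an all-NaN Series is not empty until the missing values are dropped:

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, np.nan])

print(ser.empty)             # the Series still has two elements
print(ser.dropna().empty)    # nothing is left after dropping NaN
print(ser.isna().all())      # alternative check: every element is missing
```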

---

## **5. Summary**

| Property  | Return Type | Description                                                                       |
| --------- | ----------- | --------------------------------------------------------------------------------- |
| **empty** | `bool`      | Returns `True` if the Series/DataFrame is empty (no elements), otherwise `False`. |

---


In [47]:
df_empty = pd.DataFrame({'A': []})
df_empty

Empty DataFrame
Columns: [A]
Index: []


In [48]:
df_empty.empty

True

In [49]:
df = pd.DataFrame({'A' : [np.nan]})
df

    A
0 NaN


In [50]:
df.empty

False

In [51]:
df.dropna().empty


True

In [52]:
ser_empty = pd.Series({'A' : []})
ser_empty

A    []
dtype: object

In [53]:
ser_empty.empty

False

In [54]:
ser_empty = pd.Series()
ser_empty.empty

True

# **pandas.Series.dtypes – All Syntaxes & Details**

The **`.dtypes`** property of a **pandas.Series** returns the **data type** of the elements in the Series. For a Series it is equivalent to `.dtype`. It is useful for confirming what kind of data (numbers, strings, dates) you are working with. A Series has a single dtype, so values of mixed types are stored as `object`.

---

## **1. Syntax**

```python
Series.dtypes
```

---

## **2. Explanation**

- **`.dtypes`** returns the **data type** (`dtype`) of the elements in the **Series**.
- The **`dtype`** will indicate whether the Series contains:
  - **Integers**: `dtype('int64')`
  - **Floating point numbers**: `dtype('float64')`
  - **Strings**: `dtype('object')`
  - **Categoricals**, **Datetime**, or other specialized types.

---

## **3. Examples**

### **3.1 Series with Integer Values**

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.dtypes)
```

**Output:**

```
int64
```

- The **Series** contains integers, so the `dtype` is **`int64`**.

### **3.2 Series with Float Values**

```python
s = pd.Series([1.1, 2.2, 3.3])
print(s.dtypes)
```

**Output:**

```
float64
```

- The **Series** contains floats, so the `dtype` is **`float64`**.

### **3.3 Series with Object Values (Strings)**

```python
s = pd.Series(["apple", "banana", "cherry"])
print(s.dtypes)
```

**Output:**

```
object
```

- The **Series** contains strings (objects), so the `dtype` is **`object`**.

### **3.4 Series with Datetime Values**

```python
s = pd.Series(pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]))
print(s.dtypes)
```

**Output:**

```
datetime64[ns]
```

- The **Series** contains datetime values, so the `dtype` is **`datetime64[ns]`**.

### **3.5 Series with Categorical Values**

```python
s = pd.Series(pd.Categorical(["apple", "banana", "apple"]))
print(s.dtypes)
```

**Output:**

```
category
```

- The **Series** contains categorical data, so the `dtype` is **`category`**.

---

## **4. Notes**

- The **`.dtypes`** property is especially useful when working with data that could be of mixed types. Understanding the **dtype** helps you decide which operations can be safely performed on the data.
- If the Series contains objects, like strings or lists, the `dtype` will often be **`object`**.
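The following sketch shows what "mixed types" means in practice: the Series reports a single `object` dtype, while the individual elements keep their own Python types. Variable names are illustrative.

```python
import pandas as pd

mixed = pd.Series([1, "two", 3.0])
numeric = pd.Series([1, 2, 3])

print(mixed.dtypes)                        # one dtype for the whole Series
print(mixed.map(type).tolist())            # per-element Python types
print(numeric.dtypes == numeric.dtype)     # .dtypes and .dtype agree for a Series
```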

---

## **5. Summary**

| Property   | Return Type | Description                                          |
| ---------- | ----------- | ---------------------------------------------------- |
| **dtypes** | `dtype`     | Returns the data type of the elements in the Series. |

---


In [55]:
s = pd.Series([1, 2, 3])
s.dtypes

dtype('int64')

# **pandas.Series.name – All Syntaxes & Details**

The **`.name`** property in **pandas.Series** is used to retrieve or set the name of a **Series**. This name becomes important when the Series is used as part of a **DataFrame** because it serves as the column name.

---

## **1. Syntax**

```python
Series.name
```

To set the name of the Series:

```python
Series.name = 'NewName'
```

---

## **2. Explanation**

- The **`name`** is a label attached to the Series. It is shown when the Series is displayed and becomes the column name when the Series is used to build a **DataFrame**.
- The name must be **hashable** (for example a string, a number, or a tuple).
- A column selected from a **DataFrame** (e.g. `df["col"]`) is a Series whose **`name`** is the column label.

---

## **3. Examples**

### **3.1 Setting the Series Name at Initialization**

```python
import pandas as pd
import numpy as np

s = pd.Series([1, 2, 3], dtype=np.int64, name='Numbers')
print(s)
print(s.name)
```

**Output:**

```
0    1
1    2
2    3
Name: Numbers, dtype: int64
Numbers
```

- The **`name`** of the Series is set to `'Numbers'` during initialization, and it is printed when accessing **`s.name`**.

### **3.2 Changing the Name of the Series**

```python
s.name = 'Integers'
print(s)
print(s.name)
```

**Output:**

```
0    1
1    2
2    3
Name: Integers, dtype: int64
Integers
```

- The **`name`** is changed from `'Numbers'` to `'Integers'`, and this is reflected both in the Series output and when accessing **`s.name`**.

### **3.3 Series Name as Column Name in DataFrame**

```python
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["Odd Numbers", "Even Numbers"])
print(df)
print(df["Even Numbers"].name)
```

**Output:**

```
   Odd Numbers  Even Numbers
0            1             2
1            3             4
2            5             6
Even Numbers
```

- The **`name`** of the Series **`df["Even Numbers"]`** corresponds to the column name in the **DataFrame**, which is `'Even Numbers'`.

---

## **4. Notes**

- **Setting the name** of a Series is optional, but it helps in organizing and identifying the data, especially when the Series is part of a **DataFrame**.
- The **name** can be any hashable object, including strings or numbers.
- Even when the Series is not part of a **DataFrame**, its **name** appears in the printed output of the Series, which helps identify the data.
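
To make the link between a Series name and DataFrame column labels concrete, here is a minimal sketch. The Series `a` and `b` and their values are invented for illustration; `pd.concat`, `Series.to_frame`, and `Series.rename` are standard pandas calls:

```python
import pandas as pd

# Two named Series (hypothetical data)
a = pd.Series([1, 2, 3], name="a")
b = pd.Series([4, 5, 6], name="b")

# The names become the column labels of the combined DataFrame
df = pd.concat([a, b], axis=1)
print(list(df.columns))  # ['a', 'b']

# to_frame() also uses the name as the single column label
print(list(a.to_frame().columns))  # ['a']

# rename() with a scalar returns a new Series with a different name
renamed = a.rename("alpha")
print(renamed.name, a.name)  # alpha a
```

Using `rename()` instead of assigning to `.name` leaves the original Series untouched, which fits well in method chains.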

---

## **5. Summary**

| Property | Return Type | Description                                                          |
| -------- | ----------- | -------------------------------------------------------------------- |
| **name** | `label`     | Returns or sets the name of the Series (column name in a DataFrame). |

---


pandas.Series.name

property Series.name

Return the name of the Series.

The name of a Series becomes its index or column name if it is used to form a DataFrame. It is also used whenever displaying the Series using the interpreter.

Returns:
label (hashable object)
The name of the Series, also the column name if part of a DataFrame.


In [56]:
s = pd.Series([1, 2, 3], dtype=np.int64, name='Numbers')
s

0    1
1    2
2    3
Name: Numbers, dtype: int64

In [57]:
s.name = "Integers"
s

0    1
1    2
2    3
Name: Integers, dtype: int64

In [58]:
df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  columns=["Odd Numbers", "Even Numbers"])
df

   Odd Numbers  Even Numbers
0            1             2
1            3             4
2            5             6


In [59]:
df["Even Numbers"].name

'Even Numbers'

# **pandas.Series.flags – All Syntaxes & Details**

The **`.flags`** property in **pandas.Series** provides access to the set of properties (flags) associated with the **Series**. These flags reflect attributes of the Series, such as whether it allows duplicate labels or not.

---

## **1. Syntax**

```python
Series.flags
```

To access a specific flag:

```python
Series.flags.<flag_name>
```

To set a flag:

```python
Series.flags.<flag_name> = <new_value>
```

You can also access flags through key-based (dictionary-style) indexing:

```python
Series.flags["<flag_name>"]
Series.flags["<flag_name>"] = <new_value>
```

---

## **2. Explanation**

- **`.flags`** provides access to **Flags** associated with the Series.
- Flags are special properties that describe the behavior of the Series (or DataFrame).
- The primary flag currently available for **Series** is:
  - **`allows_duplicate_labels`**: Indicates whether the Series allows duplicate labels in its index. The default is `True`.

---

## **3. Examples**

### **3.1 Accessing Flags**

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'a'])

print(s.flags)
print(s.flags.allows_duplicate_labels)
```

**Output:**

```
<Flags(allows_duplicate_labels=True)>
True
```

- The **`flags`** of the **Series** indicate that it **allows duplicate labels** in the index.

### **3.2 Setting Flags**

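```python
# A Series with duplicate labels (like the one in 3.1) would raise DuplicateLabelError here
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s.flags.allows_duplicate_labels = False
print(s.flags.allows_duplicate_labels)
```

**Output:**

```
False
```

- The **`allows_duplicate_labels`** flag is now `False`, so this Series no longer accepts duplicate index labels. The Series is recreated with unique labels first because the flag cannot be set to `False` while duplicates are present.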

### **3.3 Accessing Flags with Key-Based Indexing**

```python
print(s.flags["allows_duplicate_labels"])
s.flags["allows_duplicate_labels"] = True
print(s.flags["allows_duplicate_labels"])
```

**Output:**

```
False
True
```

- The **flag** was read and updated using `[]` indexing on the `Flags` object.

---

## **4. Notes**

- Flags provide information about the behavior of the pandas object (e.g., Series, DataFrame). They should not be confused with **metadata**, which describes the data itself.
- The individual flags exposed by **`.flags`** can be read and assigned, so the flags of an existing Series can be changed in place.
- The **`allows_duplicate_labels`** flag controls whether the Series may have duplicate labels in its index. The default is `True`. Setting it to `False` on an object that already has duplicate labels raises `DuplicateLabelError`.
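
The following sketch shows what happens when the flag is switched off on a Series that already contains duplicate labels. The Series `dup` is made up for this example; `DuplicateLabelError` is imported from `pandas.errors`:

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

# Hypothetical Series whose index repeats the label 'a'
dup = pd.Series([1, 2, 3], index=["a", "b", "a"])

try:
    dup.flags.allows_duplicate_labels = False
    raised = False
except DuplicateLabelError:
    # pandas validates the index before accepting the new flag value
    raised = True

print(raised)  # True
```

Removing or renaming the duplicated labels first (for example with `dup[~dup.index.duplicated()]`) is one way to make the assignment succeed.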

---

## **5. Summary**

| Property                    | Return Type | Description                                                  |
| --------------------------- | ----------- | ------------------------------------------------------------ |
| **flags**                   | `Flags`     | Access the flags that describe properties of the Series.     |
| **allows_duplicate_labels** | `bool`      | Indicates whether duplicate labels are allowed in the index. |

---


In [60]:
# pandas.Series.flags
# property Series.flags
# Get the properties associated with this pandas object.

# The available flags are

# Flags.allows_duplicate_labels
df = pd.DataFrame({"A": [1, 2]})
df.flags

<Flags(allows_duplicate_labels=True)>

In [61]:
df.flags.allows_duplicate_labels

True

In [62]:
df.flags.allows_duplicate_labels = False

In [63]:
df.flags["allows_duplicate_labels"]

False

In [64]:
df.flags["allows_duplicate_labels"] = True

In [65]:
# pandas.Series.set_flags
# Series.set_flags(*, copy=False, allows_duplicate_labels=None)
# Return a new object with updated flags.
# Parameters:
# copy : bool, default False
# Specify if a copy of the object should be made.
# allows_duplicate_labels : bool, optional
# Whether the returned object allows duplicate labels.
# Returns:
# Series or DataFrame
# The same type as the caller
df = pd.DataFrame({"A": [1, 2]})
df.flags.allows_duplicate_labels

True

In [66]:
df2 = df.set_flags(allows_duplicate_labels=False)
df2.flags.allows_duplicate_labels

False

# **pandas.Series.set_flags – All Syntaxes & Details**

The **`set_flags()`** method returns a new **Series** (or **DataFrame**) with updated flags, which is particularly useful in method chains. The flag it currently controls is whether duplicate labels are allowed in the Series or DataFrame.

---

## **1. Syntax**

```python
Series.set_flags(*, copy=False, allows_duplicate_labels=None)
```

### **Parameters:**

- **`copy`**: `bool`, default `False`
  - If `True`, a copy of the object is made.
  - **Note**: In future versions of pandas (3.0), **copy-on-write** will be the default behavior, meaning this parameter will be deprecated.
- **`allows_duplicate_labels`**: `bool`, optional
  - **True** or **False** to specify whether the object should allow duplicate labels. If `None`, the flag remains unchanged.

---

## **2. Return Value**

- Returns the same type as the caller: **Series** or **DataFrame** with the updated flags.

---

## **3. Explanation**

- **`set_flags()`** is used to update the flags of a pandas object (Series or DataFrame).
- It’s especially useful when you want to change flags in method chains without directly modifying the original Series or DataFrame.
- `set_flags()` always returns a new object. With the default `copy=False`, the new object may share its underlying data with the original instead of copying it.

---

## **4. Examples**

### **4.1 Updating Flags on a DataFrame**

```python
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({"A": [1, 2]})

# Checking the current flag for duplicate labels
print(df.flags.allows_duplicate_labels)  # Output: True

# Creating a new DataFrame with the updated flag
df2 = df.set_flags(allows_duplicate_labels=False)

# Checking the updated flag in the new DataFrame
print(df2.flags.allows_duplicate_labels)  # Output: False
```

**Explanation:**

- In the above example, the **`set_flags()`** method was used to create a new DataFrame `df2`, which does not allow duplicate labels, while the original DataFrame `df` remains unchanged.

### **4.2 Using `set_flags()` in a Chain of Methods**

```python
df = pd.DataFrame({"A": [1, 2]})

# Chaining methods and updating flags
df3 = df.set_flags(allows_duplicate_labels=False).rename(columns={'A': 'B'})

print(df3.flags.allows_duplicate_labels)  # Output: False
print(df3.columns)  # Output: Index(['B'], dtype='object')
```

**Explanation:**

- Here, `set_flags()` is part of a method chain, and the flag is updated in the same line while renaming the column.

---

## **5. Notes**

- **`set_flags()`** returns a new object with updated flags. The flags of the original Series or DataFrame are left unchanged.
- The **`copy`** parameter's behavior will change in **pandas 3.0**, with **copy-on-write** being enabled by default. This means methods like **`set_flags()`** will perform a lazy copy and not always copy the object unless necessary.
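
The examples above use a DataFrame; the sketch below applies the same idea to a Series. The data and variable names are invented, and the `rename` call that triggers the error follows the behavior described in the pandas guide on duplicate labels:

```python
import pandas as pd
from pandas.errors import DuplicateLabelError

s = pd.Series([10, 20], index=["x", "y"])

# set_flags returns a new Series; the original keeps its flag
strict = s.set_flags(allows_duplicate_labels=False)
print(s.flags.allows_duplicate_labels)       # True
print(strict.flags.allows_duplicate_labels)  # False

# An operation that would introduce a duplicate label is rejected
try:
    strict.rename({"y": "x"})
    rename_raised = False
except DuplicateLabelError:
    rename_raised = True

print(rename_raised)  # True
```

Because the flag travels with the result of most operations, setting it early in a pipeline acts as a guard against accidentally duplicated labels later on.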

---

## **6. Summary of Parameters**

| Parameter                   | Type   | Description                                               |
| --------------------------- | ------ | --------------------------------------------------------- |
| **copy**                    | `bool` | Whether to create a copy of the object. Default: `False`. |
| **allows_duplicate_labels** | `bool` | Whether to allow duplicate labels in the index.           |

---

## **7. Summary of Output**

The method returns a **Series** or **DataFrame** with updated flags. The key flags you can modify include:

- **`allows_duplicate_labels`**: Controls whether the object allows duplicate labels in the index.

---


In [67]:
"""pandas.Series.astype
Series.astype(dtype, copy=None, errors='raise')
Cast a pandas object to a specified dtype `dtype`.

Parameters:

dtype: str, data type, Series or Mapping of column name -> data type


Use a str, numpy.dtype, pandas.ExtensionDtype or Python type to cast entire pandas object to the same type. Alternatively, use a mapping, e.g. {col: dtype, …}, where col is a column label and dtype is a numpy.dtype or Python type to cast one or more of the DataFrame’s columns to column-specific types.

copy: bool, default True


Return a copy when copy=True (be very careful setting copy=False as changes to values then may propagate to other pandas objects). 

errors : {‘raise’, ‘ignore’}, default ‘raise’
Control raising of exceptions on invalid data for provided dtype.

raise : allow exceptions to be raised

ignore : suppress exceptions. On error return original object.

Returns:
        same type as caller
"""
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
df.dtypes

col1    int64
col2    int64
dtype: object

In [68]:
df.astype('int32').dtypes

col1    int32
col2    int32
dtype: object

In [69]:
df.astype({'col1': 'int32'}).dtypes

col1    int32
col2    int64
dtype: object

In [70]:

ser = pd.Series([1, 2], dtype='int32')
ser

0    1
1    2
dtype: int32

In [71]:
ser.astype('category')

0    1
1    2
dtype: category
Categories (2, int32): [1, 2]

In [72]:
# Convert to ordered categorical type with custom ordering:
from pandas.api.types import CategoricalDtype
cat_dtype = CategoricalDtype(
    categories=[2, 1], ordered=True)
ser.astype(cat_dtype)

0    1
1    2
dtype: category
Categories (2, int64): [2 < 1]

In [73]:
# Create a series of dates:
ser_date = pd.Series(pd.date_range('20200101', periods=3))
ser_date


0   2020-01-01
1   2020-01-02
2   2020-01-03
dtype: datetime64[ns]
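
The `errors='raise'` default from the `astype` description can be sketched as follows. The sample strings are made up; the point is that a value that cannot be cast raises a `ValueError` instead of being silently skipped:

```python
import pandas as pd

# Numeric-looking strings can be cast explicitly
raw = pd.Series(["1", "2", "3"])
as_int = raw.astype("int64")
print(as_int.dtype)  # int64

# A value that cannot be parsed makes the whole cast fail
bad = pd.Series(["1", "x"])
try:
    bad.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True

print(cast_failed)  # True
```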

In [74]:
"""

pandas.Series.convert_dtypes
Series.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True, dtype_backend='numpy_nullable')
Convert columns to the best possible dtypes using dtypes supporting pd.NA.

Parameters:
infer_objects : bool, default True
Whether object dtypes should be converted to the best possible types.

convert_string : bool, default True
Whether object dtypes should be converted to StringDtype().

convert_integer : bool, default True
Whether, if possible, conversion can be done to integer extension types.

convert_boolean : bool, default True
Whether object dtypes should be converted to BooleanDtype().

convert_floating : bool, default True
Whether, if possible, conversion can be done to floating extension types. If convert_integer is also True, preference will be given to integer dtypes if the floats can be faithfully cast to integers.

dtype_backend : {'numpy_nullable', 'pyarrow'}, default 'numpy_nullable'
Back-end data type applied to the resultant DataFrame (still experimental). Behaviour is as follows:

'numpy_nullable': returns nullable-dtype-backed DataFrame (default).

'pyarrow': returns pyarrow-backed nullable ArrowDtype DataFrame.

Added in version 2.0.

Returns:
Series or DataFrame
Copy of input object with new dtype.

 """
 
df = pd.DataFrame(
    {
        "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
        "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
        "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
        "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
        "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
        "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float")),
    }
)

In [75]:
# Start with a DataFrame with default dtypes:
df

   a  b      c    d     e      f
0  1  x   True    h  10.0    NaN
1  2  y  False    i   NaN  100.5
2  3  z    NaN  NaN  20.0  200.0


In [76]:
df.dtypes

a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object

In [77]:
# Convert the DataFrame to the best possible dtypes:
dfn = df.convert_dtypes()
dfn

   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0


In [78]:
dfn.dtypes

a             Int32
b    string[python]
c           boolean
d    string[python]
e             Int64
f           Float64
dtype: object

In [79]:
# Start with a Series of strings and missing data represented by np.nan.
s = pd.Series(["a", "b", np.nan])
s

0      a
1      b
2    NaN
dtype: object

In [80]:
# Obtain a series with dtype StringDtype:
s.convert_dtypes()

0       a
1       b
2    <NA>
dtype: string

# **pandas.Series.convert_dtypes – All Syntaxes & Details**

The **`convert_dtypes()`** method converts a **Series**, or each column of a **DataFrame**, to the best possible dtype that supports **`pd.NA`** (pandas' missing value indicator).

---

## **1. Syntax**

```python
Series.convert_dtypes(infer_objects=True, convert_string=True, convert_integer=True, convert_boolean=True, convert_floating=True, dtype_backend='numpy_nullable')
```

### **Parameters:**

- **`infer_objects`** (`bool`, default `True`):
  - Whether object dtypes (typically `dtype('O')`) should be inferred and converted to the best possible types.
- **`convert_string`** (`bool`, default `True`):
  - Whether object dtypes should be converted to `StringDtype()` (nullable string type).
- **`convert_integer`** (`bool`, default `True`):
  - Whether, if possible, conversion can be done to integer extension types (e.g. `Int64`).
- **`convert_boolean`** (`bool`, default `True`):
  - Whether object dtypes should be converted to `BooleanDtype()` (nullable boolean type).
- **`convert_floating`** (`bool`, default `True`):
  - Whether, if possible, conversion can be done to floating extension types (e.g. `Float64`). If `convert_integer` is also `True`, integer dtypes are preferred when the floats can be faithfully cast to integers.
- **`dtype_backend`** (`{'numpy_nullable', 'pyarrow'}`, default `numpy_nullable`):
  - Defines the backend data type used for conversion.
    - **`numpy_nullable`**: Uses pandas' nullable types based on NumPy.
    - **`pyarrow`**: Uses PyArrow nullable types (experimental feature).
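
As a rough sketch of how the `convert_integer` and `convert_floating` switches interact, consider a float Series whose non-missing values are all whole numbers. The sample values are made up for illustration:

```python
import numpy as np
import pandas as pd

# float64 only because of the missing value
values = pd.Series([10, np.nan, 20])
print(values.dtype)  # float64

# Default: whole-number floats are moved to the nullable integer type
default = values.convert_dtypes()
print(default.dtype)  # Int64

# With integer conversion disabled, the nullable float type is used instead
no_int = values.convert_dtypes(convert_integer=False)
print(no_int.dtype)  # Float64
```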

---

## **2. Return Value**

- Returns a copy of the **Series** or **DataFrame** with dtypes converted according to the parameters above.

---

## **3. Explanation**

- **`convert_dtypes()`** analyzes each column or element of a **Series** or **DataFrame** and converts them to their most appropriate dtype.
- It is especially useful for object columns (`dtype('O')`) and for numeric columns with missing values, which it moves to nullable extension types such as `StringDtype`, `Int64`, `Float64`, and `BooleanDtype`.
- The resulting dtypes represent missing values as `pd.NA`, so an integer or boolean column can hold missing values without being upcast to `float64` or `object`.
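
One concrete difference between `pd.NA` and `NaN` shows up in comparisons. The sketch below uses a made-up Series: with `NaN` a comparison yields `False`, while with `pd.NA` the result stays missing:

```python
import numpy as np
import pandas as pd

floats = pd.Series([1, 2, np.nan])   # float64, missing value is NaN
nullable = floats.convert_dtypes()   # Int64, missing value is pd.NA

# NaN compares as False; pd.NA propagates as "unknown"
print((floats > 1).tolist())    # [False, True, False]
print((nullable > 1).tolist())  # [False, True, <NA>]

print(nullable.iloc[2] is pd.NA)  # True
```

The nullable comparison returns a `boolean` dtype Series, so the missing entry can still be detected with `isna()` after filtering logic has run.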

---

## **4. Examples**

### **4.1 Convert a DataFrame to Best Possible Dtypes**

```python
import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({
    "a": pd.Series([1, 2, 3], dtype=np.dtype("int32")),
    "b": pd.Series(["x", "y", "z"], dtype=np.dtype("O")),
    "c": pd.Series([True, False, np.nan], dtype=np.dtype("O")),
    "d": pd.Series(["h", "i", np.nan], dtype=np.dtype("O")),
    "e": pd.Series([10, np.nan, 20], dtype=np.dtype("float")),
    "f": pd.Series([np.nan, 100.5, 200], dtype=np.dtype("float"))
})

# Show the DataFrame before conversion
print(df.dtypes)

# Convert to the best possible dtypes
dfn = df.convert_dtypes()
print(dfn)
print(dfn.dtypes)
```

**Output:**
Before conversion:

```
a      int32
b     object
c     object
d     object
e    float64
f    float64
dtype: object
```

After conversion:

```
   a  b      c     d     e      f
0  1  x   True     h    10   <NA>
1  2  y  False     i  <NA>  100.5
2  3  z   <NA>  <NA>    20  200.0
```

New dtypes:

```
a             Int32
b    string[python]
c           boolean
d    string[python]
e             Int64
f           Float64
dtype: object
```

**Explanation:**

- The `convert_dtypes()` method automatically converts the columns to their best types:
  - `int32` to `Int32` (nullable integer, column `a`),
  - `object` to `string[python]` (nullable string, columns `b` and `d`),
  - `object` to `boolean` (nullable boolean, column `c`),
  - `float64` to `Int64` (column `e`, whose non-missing values are all whole numbers),
  - `float64` to `Float64` (nullable float, column `f`).

### **4.2 Convert a Series with Strings and Missing Data**

```python
s = pd.Series(["a", "b", np.nan])
print(s)

# Convert the Series to StringDtype
s_converted = s.convert_dtypes()
print(s_converted)
print(s_converted.dtypes)
```

**Output:**

```
0      a
1      b
2    NaN
dtype: object

0       a
1       b
2    <NA>
dtype: string
string
```

**Explanation:**

- The Series, initially of `object` dtype, is converted to a nullable string type (`StringDtype`), with missing values represented as `<NA>`.

---

## **5. Notes**

- **`convert_dtypes()`** aims to convert object dtype columns to better types like nullable integers (`Int64`), strings (`StringDtype`), booleans (`BooleanDtype`), and floats (`Float64`), where applicable.
- If **`infer_objects`** is `True`, it will attempt to convert object-dtype columns to the most suitable type based on their values.
- It is particularly useful for consistent handling of missing values across **Series** and **DataFrame** columns, since every converted dtype represents them as `pd.NA`.

---

## **6. Summary of Parameters**

| Parameter            | Type                                                      | Description                                                                     |
| -------------------- | --------------------------------------------------------- | ------------------------------------------------------------------------------- |
| **infer_objects**    | `bool`, default `True`                                    | Whether to convert object dtypes to the best possible types.                    |
| **convert_string**   | `bool`, default `True`                                    | Whether to convert object dtypes to `StringDtype()`.                            |
| **convert_integer**  | `bool`, default `True`                                    | Whether, if possible, to convert to integer extension types (e.g., `Int64`).    |
| **convert_boolean**  | `bool`, default `True`                                    | Whether to convert object dtypes to `BooleanDtype()`.                           |
| **convert_floating** | `bool`, default `True`                                    | Whether, if possible, to convert to floating extension types (e.g., `Float64`). |
| **dtype_backend**    | `{‘numpy_nullable’, ‘pyarrow’}`, default `numpy_nullable` | Defines the backend dtype used for conversion. (experimental).                  |

---


In [81]:
""" pandas.Series.infer_objects

Series.infer_objects(copy=None)

Attempt to infer better dtypes for object columns.

Attempts soft conversion of object-dtyped columns, leaving non-object and unconvertible columns unchanged. The inference rules are the same as during normal Series/DataFrame construction.

Parameters:
copy: bool, default True
Whether to make a copy for non-object or non-inferable columns or Series. 

Returns:
same type as input object



"""
df = pd.DataFrame({"A": ["a", 1, 2, 3]})
df = df.iloc[1:]
df

   A
1  1
2  2
3  3


In [82]:
df.dtypes

A    object
dtype: object

In [83]:
df.infer_objects().dtypes

A    int64
dtype: object

# **pandas.Series.infer_objects – All Syntaxes & Details**

The **`infer_objects()`** method attempts to infer better, more specific dtypes for columns with **`object`** dtype (a "soft conversion"). It is useful when a column ended up as `object` even though all of its values share a more specific type, such as Python integers, floats, or datetimes. It does not parse strings into numbers.

---

## **1. Syntax**

```python
Series.infer_objects(copy=None)
```

### **Parameters:**

- **`copy`** (`bool`, default `True`):
  - Whether to make a copy for non-object or non-inferable columns or Series. The method never modifies the caller in place; with `copy=False` the result may share data with the original for columns that were not converted.

---

## **2. Return Value**

- The **`infer_objects()`** method returns the **Series** or **DataFrame** with inferred types for columns that are of `object` dtype.

---

## **3. Explanation**

- **`infer_objects()`** is typically used to improve the dtype of object columns in **Series** or **DataFrame**.
- For instance, a column of Python integers that is stored with `object` dtype is converted to an integer dtype (`int64`).
- It only attempts to convert columns with `object` dtype and leaves other dtypes (like `int64`, `float64`, etc.) unchanged.
- If the column holds values of different types (e.g., integers and strings), or strings that merely look like numbers, it remains `object` dtype.
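
The distinction between integers stored as objects and numeric-looking strings can be checked with a short sketch. The two Series below are invented for illustration:

```python
import pandas as pd

# Python integers stored with object dtype: inference succeeds
ints_as_object = pd.Series([1, 2, 3], dtype="object")
print(ints_as_object.infer_objects().dtype)  # int64

# Digit strings are still strings: they are not parsed into integers
digit_strings = pd.Series(["1", "2", "3"], dtype="object")
inferred = digit_strings.infer_objects()
print(pd.api.types.is_integer_dtype(inferred))  # False
```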

---

## **4. Examples**

### **4.1 Infer dtypes for a DataFrame Column**

```python
import pandas as pd

# Create a DataFrame with object dtype column
df = pd.DataFrame({"A": ["a", 1, 2, 3]})
df = df.iloc[1:]  # Slice to remove the first row

print(df)
print(df.dtypes)

# Infer better dtypes for object columns
df_inferred = df.infer_objects()
print(df_inferred)
print(df_inferred.dtypes)
```

**Output:**
Before inference:

```
   A
1  1
2  2
3  3
A    object
dtype: object
```

After inference:

```
   A
1  1
2  2
3  3
A    int64
dtype: object
```

**Explanation:**

- Column **A** was created with a string and several integers, so it has **`object`** dtype. Slicing off the string row does not change the dtype.
- After calling **`infer_objects()`**, pandas infers that the values in the column are all integers, and therefore converts the column to **`int64`**.

### **4.2 Behavior with a Column of Strings**

```python
# Create a DataFrame whose column holds only strings (some look numeric)
df2 = pd.DataFrame({"B": ["1", "2", "three", "4"]})
print(df2)
print(df2.dtypes)

# Infer better dtypes for object columns
df2_inferred = df2.infer_objects()
print(df2_inferred)
print(df2_inferred.dtypes)
```

**Output:**
Before inference:

```
       B
0      1
1      2
2  three
3      4
B    object
dtype: object
```

After inference:

```
       B
0      1
1      2
2  three
3      4
B    object
dtype: object
```

**Explanation:**

- Every value in the column is a Python string, including the numeric-looking ones such as `"1"`.
- **`infer_objects()`** does not parse strings, so the dtype remains **`object`**. This would be the case even without `"three"`; converting numeric strings requires an explicit call such as `pd.to_numeric()` or `astype()`.
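
When the goal is to turn numeric-looking strings into numbers, `pd.to_numeric` is one option. This is a minimal sketch with the same values as above; `errors="coerce"` is a choice made for the example and replaces unparseable entries with `NaN`:

```python
import pandas as pd

text = pd.Series(["1", "2", "three", "4"])

# Parse what can be parsed; "three" becomes NaN
numbers = pd.to_numeric(text, errors="coerce")
print(numbers.tolist())  # [1.0, 2.0, nan, 4.0]
print(numbers.dtype)     # float64
```

The result is `float64` rather than `int64` because `NaN` cannot be stored in a NumPy integer array.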

---

## **5. Notes**

- **`infer_objects()`** is useful when a column has `object` dtype only because of how it was built (for example after rows of another type were removed) and you want pandas to re-infer its dtype.
- The method doesn't make any changes to columns that are already of a suitable dtype (e.g., `int64`, `float64`, `datetime`).
- The **`copy`** parameter controls whether columns that are not converted are copied. The method returns a new object either way and does not modify the original in place.

---

## **6. Summary of Parameters**

| Parameter  | Type                   | Description                                                                       |
| ---------- | ---------------------- | --------------------------------------------------------------------------------- |
| **`copy`** | `bool`, default `True` | Whether to copy non-object or non-inferable columns. The original is never modified. |

---


In [84]:
""" pandas.Series.bool


Series.bool()

Return the bool of a single element Series or DataFrame.

Deprecated since version 2.1.0: bool is deprecated and will be removed in future version of pandas. For Series use pandas.Series.item.

This must be a boolean scalar value, either True or False. It will raise a ValueError if the Series or DataFrame does not have exactly 1 element, or that element is not boolean (integer values 0 and 1 will also raise an exception).

Returns:
    bool
        The value in the Series or DataFrame. """


pd.Series([True]).bool()  
pd.Series([False]).bool()  



False

In [85]:
pd.Series([True]).item() 
pd.Series([False]).item() 

False

# **pandas.Series.bool – Deprecated Method for Extracting Boolean Value**

The **`bool()`** method in pandas was used to extract the boolean value from a **single-element** Series or DataFrame. However, it has been deprecated as of **pandas version 2.1.0** and will be removed in future versions. Instead, **`pandas.Series.item()`** should be used for this purpose.

---

## **1. Syntax**

```python
Series.bool()
```

---

## **2. Parameters**

The **`bool()`** method does not take any parameters.

---

## **3. Return Value**

The method returns a **boolean value** (`True` or `False`).

---

## **4. Explanation**

- **`bool()`** was used to extract the boolean value of a **single-element** Series or DataFrame.
- If the Series or DataFrame does not contain exactly one element, or if that element is not a boolean (`True` or `False`; the integers `0` and `1` are rejected as well), it raises a **ValueError**.
- As mentioned, this method is deprecated and will be removed in future versions of pandas, so **`item()`** is the recommended method to use instead.

---

## **5. Examples**

### **5.1 Example with a Single Element Series**

```python
import pandas as pd

# Series with a single boolean element
s = pd.Series([True])
print(s.bool())  # Output: True

# Series with a single boolean element (False)
s = pd.Series([False])
print(s.bool())  # Output: False
```

### **5.2 Example with DataFrame**

```python
# DataFrame with a single boolean element
df = pd.DataFrame({'col': [True]})
print(df.bool())  # Output: True

# DataFrame with a single boolean element (False)
df = pd.DataFrame({'col': [False]})
print(df.bool())  # Output: False
```

### **5.3 Using `item()` Instead of `bool()`**

Since **`bool()`** is deprecated, **`item()`** is the preferred alternative:

```python
# Using item() method on a single-element Series
s = pd.Series([True])
print(s.item())  # Output: True

# Using item() method on a single-element Series (False)
s = pd.Series([False])
print(s.item())  # Output: False
```

---

## **6. Notes**

- The **`bool()`** method is only useful for single-element Series or DataFrames containing boolean values.
- If you try to use **`bool()`** on a multi-element Series/DataFrame or on an element that is not a boolean, you will get a **ValueError**.

For instance:

```python
s = pd.Series([1, 2, 3])
# This will raise an error:
s.bool()  # ValueError: The truth value of a Series is ambiguous.
```
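
One difference worth knowing when switching to `item()`: it returns whatever single value the Series holds and does not check that the value is boolean. The sketch below uses made-up one-element Series to show this, and also that a multi-element Series still raises a `ValueError`:

```python
import pandas as pd

# A boolean element comes back as a Python bool
flag = pd.Series([True]).item()
print(flag)  # True

# A non-boolean element is returned as-is, with no type check
number = pd.Series([1]).item()
print(number)  # 1

# More than one element is still an error
try:
    pd.Series([True, False]).item()
    item_raised = False
except ValueError:
    item_raised = True

print(item_raised)  # True
```

If strict boolean checking matters, an explicit `isinstance(value, bool)` test after `item()` is one way to keep the old guarantee.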

---

## **7. Summary of Deprecated Method**

| Method              | Description                                                                                                                    |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| **`Series.bool()`** | Extracts a boolean value from a single-element Series/DataFrame. Deprecated in version 2.1.0, use **`Series.item()`** instead. |

---


In [86]:
""" pandas.Series.to_numpy
Series.to_numpy(dtype=None, copy=False, na_value=<no_default>, **kwargs)
A NumPy ndarray representing the values in this Series or Index.

Parameters:

dtype: str or numpy.dtype, optional
The dtype to pass to numpy.asarray().

copy: bool, default False
Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensures that a copy is made, even if not strictly necessary.

na_value: Any, optional
The value to use for missing values. The default value depends on dtype and the type of the array.

**kwargs
Additional keywords passed through to the to_numpy method of the underlying array (for extension arrays).

Returns:

numpy.ndarray """

ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
ser.to_numpy()

array(['a', 'b', 'a'], dtype=object)

In [87]:
ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
ser.to_numpy(dtype=object)

array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET')], dtype=object)

In [88]:
ser.to_numpy(dtype="datetime64[ns]")

array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

# **pandas.Series.to_numpy – Convert Series to NumPy Array**

The **`to_numpy()`** method in pandas allows you to convert a **Series** (or **Index**) to a **NumPy ndarray**, representing the underlying values. This method provides an easy way to access the Series data as a NumPy array, while retaining the original values.

---

## **1. Syntax**

```python
Series.to_numpy(dtype=None, copy=False, na_value=<no_default>, **kwargs)
```

---

## **2. Parameters**

- **`dtype`** _(str or numpy.dtype, optional)_: Specifies the data type for the returned NumPy array. If not provided, it will infer the appropriate dtype.
- **`copy`** _(bool, default False)_: If `True`, ensures that a copy of the data is returned (even if not strictly necessary). If `False`, it may return a reference to the data if possible (no copy).
- **`na_value`** _(any, optional)_: Defines the value to use for missing values (e.g., `NaN`). This can be used to customize the representation of missing data.
- `**kwargs`: Additional keyword arguments are passed to the `to_numpy()` method of the underlying array (for extension arrays).
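
The effect of `copy=True` can be sketched with a small integer Series (the values are made up). The array returned with `copy=True` is independent of the Series, so writing to it leaves the Series unchanged; with the default `copy=False`, pandas may or may not hand back a view:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])

# copy=True guarantees a separate buffer
independent = ser.to_numpy(copy=True)
print(np.shares_memory(independent, ser.to_numpy()))  # False

# Changing the copy does not touch the Series
independent[0] = 99
print(ser.iloc[0])  # 1
```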

---

## **3. Return Value**

- **`numpy.ndarray`**: The method returns a NumPy array representing the Series data.

---

## **4. Notes**

- **Extension Arrays**: If the Series contains an extension array (like `category`, `period`, `interval`, or `datetime`), `to_numpy()` may require copying data and coercing the result to a NumPy type, which could be expensive.
- **Category Type**: For a Series of `category` dtype, `to_numpy()` drops the categorical metadata (categories and ordering) and returns a plain array with the dtype of the categories, for example `object` for string categories.

---

## **5. Examples**

### **5.1 Basic Example (Conversion to NumPy Array)**

```python
import pandas as pd

# Series with integer values
ser = pd.Series([1, 2, 3])
arr = ser.to_numpy()
print(arr)  # Output: [1 2 3]
```

### **5.2 Example with Categorical Data**

```python
# Series with categorical data
ser = pd.Series(pd.Categorical(['a', 'b', 'a']))
arr = ser.to_numpy()
print(arr)  # Output: ['a' 'b' 'a']
```

### **5.3 Specifying dtype for Datetime Series**

```python
# Series with datetime data
ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
arr = ser.to_numpy(dtype=object)
print(arr)
# Output: [Timestamp('2000-01-01 00:00:00+0100', tz='CET')
#          Timestamp('2000-01-02 00:00:00+0100', tz='CET')]
```

### **5.4 Specifying dtype as datetime64**

```python
# Converting to native datetime64
arr = ser.to_numpy(dtype="datetime64[ns]")
print(arr)
# Output: ['1999-12-31T23:00:00.000000000' '2000-01-01T23:00:00.000000000']
```

### **5.5 Handling Missing Values**

```python
# Series with missing values
ser = pd.Series([1, 2, None, 4])
arr = ser.to_numpy(na_value=-1)
print(arr)  # Output: [ 1.  2. -1.  4.]
```

---

## **6. Comparison to `Series.array`**

- **`Series.to_numpy()`**: Always returns a NumPy array, which may require converting extension types such as `category` or timezone-aware `datetime` (often to `object` dtype).
- **`Series.array`**: A property, not a method. It returns the array that backs the Series with its extension type preserved (e.g., `DatetimeArray`, `Categorical`).
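
The difference between the two accessors can be seen in a short sketch like the one below. It reuses the categorical Series from the examples above; the names `ext` and `arr` are only illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series(pd.Categorical(['a', 'b', 'a']))

ext = ser.array        # property access, no parentheses
arr = ser.to_numpy()   # method call

print(type(ext).__name__)   # Categorical
print(type(arr).__name__)   # ndarray
print(arr.dtype)            # object
```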

---

## **7. Summary Table**

| Series dtype         | Default `to_numpy()` result      |
| -------------------- | -------------------------------- |
| `category[T]`        | ndarray[T] (same dtype as input) |
| `period`             | ndarray[object] (Periods)        |
| `interval`           | ndarray[object] (Intervals)      |
| `IntegerNA`          | ndarray[object]                  |
| `datetime64[ns]`     | datetime64[ns]                   |
| `datetime64[ns, tz]` | ndarray[object] (Timestamps)     |
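
A few rows of the table can be checked with a sketch like the following. The dictionary keys are just descriptive labels chosen for this illustration, and the exact datetime resolution reported may vary with the pandas version, so no output is hard-coded here.

```python
import pandas as pd

# One small Series per dtype family from the table
examples = {
    "category[int64]": pd.Series(pd.Categorical([1, 2, 1])),
    "period": pd.Series(pd.period_range("2000-01", periods=2, freq="M")),
    "datetime64 (naive)": pd.Series(pd.date_range("2000", periods=2)),
    "datetime64 (tz-aware)": pd.Series(pd.date_range("2000", periods=2, tz="CET")),
}

# Default conversion, with no dtype argument
result_dtypes = {name: s.to_numpy().dtype for name, s in examples.items()}

for name, dt in result_dtypes.items():
    print(f"{name:22} -> {dt}")
```

The integer categorical should report an integer dtype, the period and timezone-aware Series `object`, and the naive datetime Series a `datetime64` dtype.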

---


In [89]:
""" pandas.Series.to_period
Series.to_period(freq=None, copy=None)
Convert Series from DatetimeIndex to PeriodIndex.

Parameters:
freq: str, default None
Frequency associated with the PeriodIndex.

copy: bool, default True
Whether or not to return a copy.



Returns:
      Series
        Series with index converted to PeriodIndex. """
        
idx = pd.DatetimeIndex(['2023', '2024', '2025'])
s = pd.Series([1, 2, 3], index=idx)
s = s.to_period()
s

2023    1
2024    2
2025    3
Freq: Y-DEC, dtype: int64

In [90]:
s.index

PeriodIndex(['2023', '2024', '2025'], dtype='period[Y-DEC]')

# **pandas.Series.to_period – Convert Series from DatetimeIndex to PeriodIndex**

The **`to_period()`** method in pandas is used to convert a Series with a **DatetimeIndex** to a Series with a **PeriodIndex**. This can be useful when working with time-based data that you want to handle in terms of specific time periods (such as years, months, or quarters).

---

## **1. Syntax**

```python
Series.to_period(freq=None, copy=None)
```

---

## **2. Parameters**

- **`freq`** _(str, optional)_: Specifies the frequency for the resulting **PeriodIndex**. If not provided, the frequency will be inferred from the data.
- **`copy`** _(bool, default True)_: Whether or not to return a copy of the data. In pandas 3.0, **Copy-on-Write** is enabled by default, so methods that accept `copy` defer the copy lazily and ignore this keyword.

---

## **3. Return Value**

- **`Series`**: The method returns a new **Series** where the index is now a **PeriodIndex**.

---

## **4. Notes**

- If **`freq`** is not provided, pandas uses the frequency of the **DatetimeIndex**, inferring it from the timestamps when the index has none set.
- Each timestamp is mapped to the period (time span) that contains it at the chosen frequency, so several timestamps can fall into the same period.
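
As a small illustration of how timestamps map to periods, the sketch below converts two mid-month dates to monthly periods and inspects the span of the first one. The dates and variable names are made up for this example.

```python
import pandas as pd

idx = pd.DatetimeIndex(["2023-01-15", "2023-02-20"])
s = pd.Series([10, 20], index=idx)

# Each timestamp becomes the month that contains it
monthly = s.to_period(freq="M")
first = monthly.index[0]

print(monthly.index.astype(str).tolist())  # ['2023-01', '2023-02']
print(first.start_time)                    # 2023-01-01 00:00:00
print(first.end_time)                      # 2023-01-31 23:59:59.999999999
```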

---

## **5. Examples**

### **5.1 Basic Example (Conversion from DatetimeIndex to PeriodIndex)**

```python
import pandas as pd

# Creating a DatetimeIndex
idx = pd.DatetimeIndex(['2023', '2024', '2025'])

# Creating a Series with DatetimeIndex
s = pd.Series([1, 2, 3], index=idx)

# Converting the Series to PeriodIndex
s_period = s.to_period()

print(s_period)
# Output:
# 2023    1
# 2024    2
# 2025    3
# Freq: Y-DEC, dtype: int64
```

### **5.2 Viewing the PeriodIndex**

```python
# Viewing the index
print(s_period.index)
# Output:
# PeriodIndex(['2023', '2024', '2025'], dtype='period[Y-DEC]')
```

### **5.3 Converting with a Specified Frequency**

```python
# Converting with a specified frequency (monthly frequency)
s_monthly = s.to_period(freq='M')

print(s_monthly)
# Output:
# 2023-01    1
# 2024-01    2
# 2025-01    3
# Freq: M, dtype: int64
```

---

## **6. Summary**

- **`to_period()`** is great for transforming time-based data (like **DatetimeIndex**) into period-based data (like **PeriodIndex**).
- It can be helpful when analyzing data that involves discrete time periods (such as yearly, monthly, etc.).

---


In [91]:
""" pandas.Series.to_timestamp
Series.to_timestamp(freq=None, how='start', copy=None)
Cast to DatetimeIndex of Timestamps, at beginning of period.

Parameters:
freq : str, default frequency of PeriodIndex
Desired frequency.

how : {‘s’, ‘e’, ‘start’, ‘end’}
Convention for converting period to timestamp; start of period vs. end.

copy : bool, default True
Whether or not to return a copy. 

Returns:
     Series with DatetimeIndex
     
"""
idx = pd.PeriodIndex(['2023', '2024', '2025'], freq='Y')
s1 = pd.Series([1, 2, 3], index=idx)
s1

2023    1
2024    2
2025    3
Freq: Y-DEC, dtype: int64

In [92]:
s1 = s1.to_timestamp()
s1

2023-01-01    1
2024-01-01    2
2025-01-01    3
Freq: YS-JAN, dtype: int64

In [93]:
s2 = pd.Series([1, 2, 3], index=idx)
s2 = s2.to_timestamp(freq='M')
s2

2023-01-31    1
2024-01-31    2
2025-01-31    3
Freq: YE-JAN, dtype: int64

# **pandas.Series.to_timestamp – Convert PeriodIndex to DatetimeIndex**

The **`to_timestamp()`** method in pandas is used to convert a **PeriodIndex** to a **DatetimeIndex**. The method provides flexibility in defining the exact timestamp to represent each period, such as the start or end of a given period.

---

## **1. Syntax**

```python
Series.to_timestamp(freq=None, how='start', copy=None)
```

---

## **2. Parameters**

- **`freq`** _(str, optional)_: The frequency of the resulting **DatetimeIndex**. By default, it uses the frequency of the **PeriodIndex**.
- **`how`** _({'s', 'e', 'start', 'end'}, default 'start')_: Defines whether the timestamp represents the start or the end of the period.

  - 's' or 'start' refers to the beginning of the period.
  - 'e' or 'end' refers to the end of the period.

- **`copy`** _(bool, default True)_: Whether to return a copy of the data. In pandas 3.0, **Copy-on-Write** is enabled by default, so methods that accept `copy` defer the copy lazily and ignore this keyword.

---

## **3. Return Value**

- **`Series`**: The method returns a **Series** with **DatetimeIndex** as the index.

---

## **4. Notes**

- If you specify **`freq`**, it will override the frequency of the original **PeriodIndex**.
- The **`how`** parameter allows you to control whether the **DatetimeIndex** will represent the start or end of the period.
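
The effect of **`how`** can be sketched as follows, assuming a yearly **PeriodIndex** like the one used in this section. The round trip back through `to_period` is added here only as an illustration.

```python
import pandas as pd

idx = pd.PeriodIndex(["2023", "2024"], freq="Y")
s = pd.Series([1, 2], index=idx)

starts = s.to_timestamp(how="start")
ends = s.to_timestamp(how="end")

print(starts.index[0])  # 2023-01-01 00:00:00
print(ends.index[0])    # 2023-12-31 23:59:59.999999999

# Converting the start timestamps back to yearly periods restores the index
restored = starts.to_period(freq="Y")
print(restored.index.equals(s.index))  # True
```

Note that `how='end'` yields the last nanosecond of each period rather than midnight of its last day.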

---

## **5. Examples**

### **5.1 Basic Example (Convert PeriodIndex to DatetimeIndex at the Start of Period)**

```python
import pandas as pd

# Create a PeriodIndex with yearly frequency
idx = pd.PeriodIndex(['2023', '2024', '2025'], freq='Y')

# Create a Series with PeriodIndex
s1 = pd.Series([1, 2, 3], index=idx)

# Convert the PeriodIndex to a DatetimeIndex (start of each period)
s1_timestamp = s1.to_timestamp()

print(s1_timestamp)
# Output:
# 2023-01-01    1
# 2024-01-01    2
# 2025-01-01    3
# Freq: YS-JAN, dtype: int64
```

### **5.2 Using the `freq` Parameter**

```python
# Convert with a different frequency (monthly frequency)
s2 = s1.to_timestamp(freq='M')

print(s2)
# Output:
# 2023-01-31    1
# 2024-01-31    2
# 2025-01-31    3
# Freq: YE-JAN, dtype: int64
```

### **5.3 Specifying the `how` Parameter (End of Period)**

```python
# Convert to DatetimeIndex, using the end of each period
s3 = s1.to_timestamp(how='end')

print(s3)
# Output (each label is the last nanosecond of its year):
# 2023-12-31 23:59:59.999999999    1
# 2024-12-31 23:59:59.999999999    2
# 2025-12-31 23:59:59.999999999    3
# dtype: int64
```

---

## **6. Summary**

- The **`to_timestamp()`** method is useful when converting a **PeriodIndex** to a **DatetimeIndex**.
- The **`freq`** parameter allows you to control the frequency of the resulting **DatetimeIndex**.
- The **`how`** parameter allows you to specify whether the timestamp should be at the start or the end of each period.


In [94]:
""" pandas.Series.to_list


Series.to_list()
Return a list of the values.

These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period)

Returns:
list 

"""
# for Series
s = pd.Series([1, 2, 3])
s.to_list()

[1, 2, 3]

In [95]:
# for Index
idx = pd.Index([1, 2, 3])
idx

Index([1, 2, 3], dtype='int64')

In [96]:
idx.to_list()

[1, 2, 3]

# **pandas.Series.to_list – Convert Series to a Python List**

The **`to_list()`** method in pandas is used to convert a **Series** (or **Index**) into a plain Python list.

---

## **1. Syntax**

```python
Series.to_list()
```

---

## **2. Return Value**

- **`list`**: A Python list containing the values of the **Series** or **Index**.

---

## **3. Examples**

### **3.1 Convert a Series to a List**

```python
import pandas as pd

# Create a simple Series
s = pd.Series([1, 2, 3])

# Convert the Series to a list
list_values = s.to_list()

print(list_values)
# Output: [1, 2, 3]
```

### **3.2 Convert an Index to a List**

```python
# Create a simple Index
idx = pd.Index([1, 2, 3])

# Convert the Index to a list
list_idx = idx.to_list()

print(list_idx)
# Output: [1, 2, 3]
```

---

## **4. Notes**

- The **`to_list()`** method returns a regular Python list, not a pandas-specific object.
- The list will contain Python scalars, such as **int**, **float**, **str**, or **pandas** scalars like **Timestamp**, **Timedelta**, etc.
- This method is convenient when you need to work with data outside of pandas.
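
The scalar types mentioned above can be checked with a short sketch such as this one (the sample values are arbitrary):

```python
import pandas as pd

ints = pd.Series([1, 2, 3]).to_list()
dates = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"])).to_list()

# Integer data comes back as built-in Python ints
print(type(ints[0]).__name__)   # int

# Datetime data comes back as pandas Timestamp objects
print(type(dates[0]).__name__)  # Timestamp
```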

---


In [97]:
""" pandas.Series.__array__
Series.__array__(dtype=None, copy=None)
Return the values as a NumPy array.

Users should not call this directly. Rather, it is invoked by numpy.array() and numpy.asarray().

Parameters:


dtype : 
        str or numpy.dtype, optional
The dtype to use for the resulting NumPy array. By default, the dtype is inferred from the data.

copy : 
bool or None, optional
Unused.

Returns :
    numpy.ndarray
The values in the series converted to a numpy.ndarray with the specified dtype. 

"""
ser = pd.Series([1, 2, 3])
np.asarray(ser)

array([1, 2, 3])

In [98]:
# For timezone-aware data, the timezones may be retained with dtype='object'
tzser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
np.asarray(tzser, dtype="object")

array([Timestamp('2000-01-01 00:00:00+0100', tz='CET'),
       Timestamp('2000-01-02 00:00:00+0100', tz='CET')], dtype=object)

In [99]:
# Or the values may be localized to UTC and the tzinfo discarded with dtype='datetime64[ns]'
np.asarray(tzser, dtype="datetime64[ns]")  

array(['1999-12-31T23:00:00.000000000', '2000-01-01T23:00:00.000000000'],
      dtype='datetime64[ns]')

# **pandas.Series.\_\_array\_\_ – Convert Series to a NumPy Array**

The **`__array__()`** method is an internal method used by **NumPy** to convert a **pandas Series** into a **NumPy ndarray**. It is invoked when using functions like **`numpy.array()`** or **`numpy.asarray()`**.

---

## **1. Syntax**

```python
Series.__array__(dtype=None, copy=None)
```

---

## **2. Parameters**

- **`dtype`**: `str` or `numpy.dtype`, optional  
  Specifies the desired data type for the resulting NumPy array. If not provided, the dtype is inferred from the Series.
- **`copy`**: `bool` or `None`, optional  
  This parameter is not used in this method.

---

## **3. Return Value**

- **`numpy.ndarray`**: A **NumPy array** containing the values from the Series, optionally with the specified dtype.

---

## **4. Usage Examples**

### **4.1 Converting a Simple Series to a NumPy Array**

```python
import pandas as pd
import numpy as np

# Create a Series
ser = pd.Series([1, 2, 3])

# Convert Series to NumPy array
np_array = np.asarray(ser)

print(np_array)
# Output: [1 2 3]
```

### **4.2 Converting Timezone-Aware Data**

```python
# Create a Series with timezone-aware datetime data
tzser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))

# Convert the timezone-aware Series to a NumPy array (dtype='object')
np_array_tz = np.asarray(tzser, dtype="object")

print(np_array_tz)
# Output: [Timestamp('2000-01-01 00:00:00+0100', tz='CET') Timestamp('2000-01-02 00:00:00+0100', tz='CET')]
```

### **4.3 Converting Timezone-Aware Data to UTC**

```python
# Convert the timezone-aware Series to NumPy array with dtype='datetime64[ns]'
np_array_utc = np.asarray(tzser, dtype="datetime64[ns]")

print(np_array_utc)
# Output: ['1999-12-31T23:00:00.000000000' '2000-01-01T23:00:00.000000000']
```

---

## **5. Notes**

- **`__array__()`** is not meant to be directly called. Instead, it is invoked when you use **`numpy.array()`** or **`numpy.asarray()`** to convert a pandas Series to a NumPy array.
- If you want a zero-copy view of the underlying array data in the Series, you can use **`Series.array`**.
- **`Series.to_numpy()`** provides a similar function but is recommended for user-facing code.
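
A minimal sketch of the relationship between the two routes, using illustrative variable names:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])

via_asarray = np.asarray(ser)             # NumPy calls ser.__array__() internally
via_to_numpy = ser.to_numpy()             # explicit, user-facing equivalent
as_float = np.asarray(ser, dtype=float)   # dtype is forwarded to __array__

print(np.array_equal(via_asarray, via_to_numpy))  # True
print(as_float.dtype)                             # float64
```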

---


In [100]:
""" pandas.Series.get

Series.get(key, default=None)

Get item from object for given key (ex: DataFrame column).

Returns default value if not found.

Parameters:
  key : 
    object
    
Returns
:
same type as items contained in object
"""
df = pd.DataFrame(
    [
        [24.3, 75.7, "high"],
        [31, 87.8, "high"],
        [22, 71.6, "medium"],
        [35, 95, "medium"],
    ],
    columns=["temp_celsius", "temp_fahrenheit", "windspeed"],
    index=pd.date_range(start="2014-02-12", end="2014-02-15", freq="D"),
)

In [101]:
df

            temp_celsius  temp_fahrenheit windspeed
2014-02-12          24.3             75.7      high
2014-02-13          31.0             87.8      high
2014-02-14          22.0             71.6    medium
2014-02-15          35.0             95.0    medium


In [102]:
df.get(["temp_celsius", "windspeed"]  , default="Not found")

            temp_celsius windspeed
2014-02-12          24.3      high
2014-02-13          31.0      high
2014-02-14          22.0    medium
2014-02-15          35.0    medium


In [103]:
ser = df['windspeed']
ser.get('2014-02-13')

'high'

In [104]:
df.get(["temp_celsius", "temp_kelvin"], default="default_value")

'default_value'

In [105]:
ser.get('2014-02-10', '[unknown]')

'[unknown]'

# **pandas.Series.get – Retrieve an Item from a Series**

The **`get()`** method retrieves the item stored under a given key: the value at an index label for a **pandas Series**, or a column (or list of columns) for a **DataFrame**. If the key is not found, it returns a default value instead of raising an error.

---

## **1. Syntax**

```python
Series.get(key, default=None)
```

---

## **2. Parameters**

- **`key`**: `object`  
  The key (index label) you want to retrieve the value for.

- **`default`**: `any`, optional  
  The value to return if the key is not found. The default value is `None`.

---

## **3. Return Value**

- The method returns the value associated with the given key in the Series. If the key is not found, it returns the `default` value.

---

## **4. Usage Examples**

### **4.1 Accessing Multiple Columns in a DataFrame**

You can use the `get()` method to access multiple columns at once in a **DataFrame**.

```python
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame(
    [
        [24.3, 75.7, "high"],
        [31, 87.8, "high"],
        [22, 71.6, "medium"],
        [35, 95, "medium"],
    ],
    columns=["temp_celsius", "temp_fahrenheit", "windspeed"],
    index=pd.date_range(start="2014-02-12", end="2014-02-15", freq="D"),
)

# Get multiple columns
result = df.get(["temp_celsius", "windspeed"])
print(result)
```

**Output:**

```
            temp_celsius windspeed
2014-02-12          24.3      high
2014-02-13          31.0      high
2014-02-14          22.0    medium
2014-02-15          35.0    medium
```

### **4.2 Accessing a Single Element from a Series**

You can use `get()` on a Series to retrieve a single element using an index label.

```python
# Access a single element
ser = df['windspeed']
value = ser.get('2014-02-13')
print(value)
# Output: high
```

### **4.3 Providing a Default Value When Key is Not Found**

If the key is not present, the `get()` method will return the default value you specify.

```python
# 'temp_kelvin' is not a column, so the whole lookup falls back to the default
default_value = df.get(["temp_celsius", "temp_kelvin"], default="default_value")
print(default_value)
# Output: default_value
```

### **4.4 Handling a Missing Key in a Series**

You can pass a fallback as the second argument so that `get()` returns it when the key is not found in the **Series**.

```python
# Access a key that doesn't exist in the Series
missing_value = ser.get('2014-02-10', '[unknown]')
print(missing_value)
# Output: [unknown]
```

---

## **5. Notes**

- If the key exists in the Series or DataFrame, `get()` returns the corresponding value.
- If the key is not found, the specified default value will be returned instead of throwing a `KeyError`.
- This method is useful when you want to avoid errors for missing keys and provide fallback/default values.
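
The contrast with square-bracket indexing can be sketched like this. The Series and its labels are invented for the illustration.

```python
import pandas as pd

ser = pd.Series([10, 20], index=["a", "b"])

print(ser.get("a"))      # 10
print(ser.get("z"))      # None
print(ser.get("z", 0))   # 0

# Square-bracket access raises for the same missing label
try:
    ser["z"]
    bracket_raised = False
except KeyError:
    bracket_raised = True

print(bracket_raised)    # True
```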

---


In [106]:
""" pandas.Series.at


property Series.at
Access a single value for a row/column label pair.

Similar to loc, in that both provide label-based lookups. Use at if you only need to get or set a single value in a DataFrame or Series.

Raises:

KeyError

If getting a value and ‘label’ does not exist in a DataFrame or Series.

ValueError

If row/column label pair is not a tuple or if any label from the pair is not a scalar for DataFrame. If label is list-like (excluding NamedTuple) for Series. """
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])

In [107]:
df

    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30


In [108]:
#Get value at specified row/column pair
df.at[4, 'B']

np.int64(2)

In [109]:
#Set value at specified row/column pair
df.at[4, 'B'] = 10
df.at[4, 'B']

np.int64(10)

In [110]:
#Get value within a Series
df.loc[5].at['B']

np.int64(4)

## **pandas.Series.at – Access a Single Value for a Row/Column Label Pair**

The **`Series.at`** property provides an efficient way to access a single value within a **pandas Series** or **DataFrame** using a specific label (row/column pair).

---

### **1. Syntax**

```python
Series.at[label]
```

- **`label`**: The row label in a Series or DataFrame, and optionally the column label for DataFrames.

---

### **2. Usage**

- **Accessing values**: You can access a specific element from a Series or DataFrame using a row/column label.
- **Efficient for scalars**: `at` is designed for accessing a single value, making it faster than other methods such as `loc` or `iloc` when you are only retrieving a single value.

---

### **3. Parameters**

- **`label`**: The label of the row (and optionally column, for DataFrame) you want to access.

---

### **4. Return Value**

- **`scalar`**: Returns the single value stored at the specified label (or row/column label pair for a DataFrame).
- If the label is not found, a **`KeyError`** is raised.

---

### **5. Common Use Cases**

- **Accessing Single Values in Series**
- **Modifying Single Values in Series**

---

### **6. Examples**

#### **6.1 Accessing a Single Value in a DataFrame**

```python
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])

print(df)

# Access value at specific row/column pair
print(df.at[4, 'B'])  # Output: 2

# Set a new value at the same location
df.at[4, 'B'] = 10
print(df.at[4, 'B'])  # Output: 10
```

**Output:**

```
    A   B   C
4   0   2   3
5   0   4   1
6  10  20  30
```

#### **6.2 Accessing a Single Value in a Series**

You can also use **`at`** within a Series after using `.loc` to select a specific row.

```python
# Access a value from a Series
series = df.loc[5]  # Select row 5 from DataFrame
print(series.at['B'])  # Output: 4
```

#### **6.3 Modifying a Single Value in a Series**

```python
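# Modify a value in a standalone Series (an explicit copy of row 6)
series = df.loc[6].copy()
series.at['A'] = 100
print(series.at['A'])  # Output: 100
# To change the DataFrame itself, assign through df.at[6, 'A'] directly;
# chained assignment such as df.loc[6].at['A'] = 100 is not reliable.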
```

---

### **7. Notes**

- **Efficient Access**: `at` is faster than `loc` for accessing single values because it is optimized for scalar retrieval.
- **Error Handling**: If the label is not present, it raises a **`KeyError`**.
- **Not for Lists**: Unlike `loc`, **`at`** cannot handle list-like labels. It expects a single label, not a list or array.
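
A short sketch of getting, setting, and the `KeyError` case on a plain Series (the data and the `missing_raises` flag are illustrative only):

```python
import pandas as pd

ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

ser.at["b"] = 25        # set a single value by label
print(ser.at["b"])      # 25

try:
    ser.at["z"]         # label not in the index
    missing_raises = False
except KeyError:
    missing_raises = True

print(missing_raises)   # True
```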

---

### **8. See Also**

- **`DataFrame.at`** – Access a single value in a DataFrame.
- **`Series.loc`** – Access a group of rows by labels.
- **`Series.iat`** – Access a single value by integer position.


In [111]:
""" 
pandas.Series.iat

property Series.iat
Access a single value for a row/column pair by integer position.

Similar to iloc, in that both provide integer-based lookups. Use iat if you only need to get or set a single value in a DataFrame or Series.

Raises :
    IndexError
      When integer position is out of bounds. """
      
      
""" DataFrame.at
      
Access a single value for a row/column label pair.

DataFrame.loc
Access a group of rows and columns by label(s).

DataFrame.iloc
Access a group of rows and columns by integer position(s). """



In [112]:
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  columns=['A', 'B', 'C'])
df

    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30


In [113]:
# Get value at specified row/column pair
df.iat[1, 2]

np.int64(1)

In [114]:
# Set value at specified row/column pair
df.iat[1, 2] = 10
df.iat[1, 2]

np.int64(10)

In [115]:
# Get value within a series
df.loc[0].iat[1]

np.int64(2)

## **pandas.Series.iat – Access a Single Value for a Row/Column Pair by Integer Position**

The **`Series.iat`** property provides an efficient way to access or modify a single value in a **pandas Series** or **DataFrame** using integer-based indexing.

---

### **1. Syntax**

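```python
Series.iat[position]
DataFrame.iat[row_index, col_index]
```

- **`position`**: Integer position of the value in a Series.
- **`row_index`**: Integer position of the row (DataFrame).
- **`col_index`**: Integer position of the column (DataFrame).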

---

### **2. Usage**

- **Efficient for single-value access**: Similar to **`iloc`**, but more efficient for getting or setting a single value, as it is optimized for scalar operations.
- **Integer-based**: Uses integers for both row and column positions, rather than labels.

---

### **3. Parameters**

- **`position`**: Integer position of the value (Series).
- **`row_index`**: Integer position of the row (DataFrame).
- **`col_index`**: Integer position of the column (DataFrame).

---

### **4. Return Value**

- **`scalar`**: Returns the single value stored at the specified integer position (or row/column position pair for a DataFrame).
- If the indices are out of bounds, an **`IndexError`** is raised.

---

### **5. Common Use Cases**

- **Accessing single values by integer position**
- **Modifying values by integer position**

---

### **6. Examples**

#### **6.1 Accessing a Single Value in a DataFrame**

```python
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  columns=['A', 'B', 'C'])

print(df)

# Access value at specific row/column position
print(df.iat[1, 2])  # Output: 1

# Set a new value at the same location
df.iat[1, 2] = 10
print(df.iat[1, 2])  # Output: 10
```

**Output:**

```
    A   B   C
0   0   2   3
1   0   4   1
2  10  20  30
```

#### **6.2 Accessing a Single Value in a Series**

You can also use **`iat`** after selecting a specific row from a **DataFrame** using **`loc`**.

```python
# Access a value from a Series within a DataFrame
series = df.loc[0]  # Select row 0 from DataFrame
print(series.iat[1])  # Output: 2
```

#### **6.3 Modifying a Single Value in a Series**

```python
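# Modify a value in a standalone Series (an explicit copy of row 2)
series = df.loc[2].copy()
series.iat[0] = 100
print(series.iat[0])  # Output: 100
# To change the DataFrame itself, assign through df.iat[2, 0] directly;
# chained assignment such as df.loc[2].iat[0] = 100 is not reliable.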
```

---

### **7. Notes**

- **Efficient for Scalar Access**: **`iat`** is faster than **`iloc`** when dealing with a single value, as it is optimized for scalar operations.
- **Indexing with Integers**: You must use integer positions, not labels (which **`loc`** and **`at`** support).
- **Error Handling**: If the indices are out of bounds, an **`IndexError`** is raised, so always ensure valid positions are provided.
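
The positional behavior and the out-of-bounds error can be sketched on a plain Series as follows (sample data and the `out_of_bounds` flag are illustrative):

```python
import pandas as pd

ser = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(ser.iat[0])    # 10  (first position, regardless of label)
print(ser.iat[-1])   # 30  (negative positions count from the end)

try:
    ser.iat[10]      # only positions 0..2 exist
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(out_of_bounds)  # True
```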

---

### **8. See Also**

- **`DataFrame.at`** – Access a single value for a row/column label pair.
- **`DataFrame.loc`** – Access a group of rows and columns by labels.
- **`Series.loc`** – Access a group of rows by label.
- **`Series.iloc`** – Access a group of rows by integer position.


In [116]:
""" pandas.Series.loc
property Series.loc
Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Allowed inputs are:

A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index, and never as an integer position along the index).

A list or array of labels, e.g. ['a', 'b', 'c'].

A slice object with labels, e.g. 'a':'f'.

Warning

Note that contrary to usual python slices, both the start and the stop are included

A boolean array of the same length as the axis being sliced, e.g. [True, False, True].

An alignable boolean Series. The index of the key will be aligned before masking.

An alignable Index. The Index of the returned selection will be the input.

A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above)

See more at Selection by Label.

Raises
:
KeyError
If any items are not found.

IndexingError
If an indexed key is passed and its index is unalignable to the frame index.



DataFrame.at
Access a single value for a row/column label pair.

DataFrame.iloc
Access group of rows and columns by integer position(s).

DataFrame.xs
Returns a cross-section (row(s) or column(s)) from the Series/DataFrame.

Series.loc
Access group of values using labels. """

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])
df

            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8


In [117]:
# Single label. Note this returns the row as a Series.
df.loc['viper']

max_speed    4
shield       5
Name: viper, dtype: int64

In [118]:
# list of labels. Note using [[]] returns a DataFrame.
df.loc[['viper', 'sidewinder']]

            max_speed  shield
viper               4       5
sidewinder          7       8


In [119]:
# Slice with labels for row and single label for column.
df.loc['cobra':'viper', 'max_speed']

cobra    1
viper    4
Name: max_speed, dtype: int64

In [120]:
# Boolean list with the same length as the row axis
df.loc[[False, False, True]]

            max_speed  shield
sidewinder          7       8


In [121]:
# Alignable boolean Series:
df.loc[pd.Series([False, True, False],
                 index=['viper', 'sidewinder', 'cobra'])]

            max_speed  shield
sidewinder          7       8


In [122]:
# Index (same behavior as df.reindex)
df.loc[pd.Index(["cobra", "viper"], name="foo")]
   

       max_speed  shield
foo
cobra          1       2
viper          4       5


In [123]:
# Conditional that returns a boolean Series
df.loc[df['shield'] > 6]

            max_speed  shield
sidewinder          7       8


In [124]:
# Conditional that returns a boolean Series with column labels specified
df.loc[df['shield'] > 6, ['max_speed']]

            max_speed
sidewinder          7


In [125]:
# Multiple conditional using & that returns a boolean Series
df.loc[(df['max_speed'] > 1) & (df['shield'] < 8)]

       max_speed  shield
viper          4       5


In [126]:
# Multiple conditional using | that returns a boolean Series
df.loc[(df['max_speed'] > 4) | (df['shield'] < 5)]
# Please ensure that each condition is wrapped in parentheses ()

            max_speed  shield
cobra               1       2
sidewinder          7       8


In [127]:
# Callable that returns a boolean Series

df.loc[lambda df: df['shield'] == 8]

            max_speed  shield
sidewinder          7       8


In [128]:
# Setting values

# Set value for all items matching the list of labels

df.loc[['viper', 'sidewinder'], ['shield']] = 50
df

            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50


In [129]:
# Set value for an entire row

df.loc['cobra'] = 10
df

            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50


In [130]:
# Set value for an entire column

df.loc[:, 'max_speed'] = 30
df

            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50


In [131]:
# Set value for rows matching a boolean condition

df.loc[df['shield'] > 35] = 0
df

            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0


In [132]:
# Add value matching location

df.loc["viper", "shield"] += 5
df

            max_speed  shield
cobra              30      10
viper               0       5
sidewinder          0       0


In [133]:
# Setting using a Series or a DataFrame sets the values matching the index labels, not the index positions.

shuffled_df = df.loc[["viper", "cobra", "sidewinder"]]
df.loc[:] += shuffled_df
df

            max_speed  shield
cobra              60      20
viper               0      10
sidewinder          0       0


In [134]:
# Getting values on a DataFrame with an index that has integer labels

# Another example using integers for the index

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[7, 8, 9], columns=['max_speed', 'shield'])
df

   max_speed  shield
7          1       2
8          4       5
9          7       8


In [135]:
# Slice with integer labels for rows. As mentioned above, note that both the start and stop of the slice are included.

df.loc[7:9]

   max_speed  shield
7          1       2
8          4       5
9          7       8


In [136]:
# Getting values with a MultiIndex

# A number of examples using a DataFrame with a MultiIndex

tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii')
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20],
          [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
df

                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36


In [137]:
# Single label. Note this returns a DataFrame with a single index.

df.loc['cobra']

         max_speed  shield
mark i          12       2
mark ii          0       4


In [138]:
# Single index tuple. Note this returns a Series.

df.loc[('cobra', 'mark ii')]

max_speed    0
shield       4
Name: (cobra, mark ii), dtype: int64

In [139]:
# Single label for row and column. Similar to passing in a tuple, this returns a Series.

df.loc['cobra', 'mark i']

max_speed    12
shield        2
Name: (cobra, mark i), dtype: int64

In [140]:
# Single tuple. Note using [[]] returns a DataFrame.

df.loc[[('cobra', 'mark ii')]]

               max_speed  shield
cobra mark ii          0       4


In [141]:
# Single tuple for the index with a single label for the column

df.loc[('cobra', 'mark i'), 'shield']

np.int64(2)

In [142]:
# Slice from index tuple to single label

df.loc[('cobra', 'mark i'):'viper']

                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1
           mark iii         16      36


In [143]:
# Slice from index tuple to index tuple

df.loc[('cobra', 'mark i'):('viper', 'mark ii')]

                     max_speed  shield
cobra      mark i           12       2
           mark ii           0       4
sidewinder mark i           10      20
           mark ii           1       4
viper      mark ii           7       1


## **pandas.Series.loc – Access a Group of Rows and Columns by Label(s)**

The **`loc[]`** property selects data by label (or by a boolean array) rather than by integer position, which is what `iloc` does. It is available on both **Series** and **DataFrame**. A Series has one axis, so it takes a single indexer; a DataFrame takes a row indexer and an optional column indexer.

---

### **1. Syntax**

```python
Series.loc[indexer]
DataFrame.loc[row_indexer, column_indexer]
```

- **`row_indexer`** (or the single Series `indexer`): A single label, list/array of labels, label slice, boolean array, or callable.
- **`column_indexer`** (DataFrame only): A single label, list of labels, label slice, boolean array, or callable.

---

### **2. Allowed Inputs for `row_indexer`**

- **Single label**: e.g., `'viper'` or `5` (treated as a label, not an integer position).
- **List/array of labels**: e.g., `['viper', 'sidewinder']`.
- **Slice object with labels**: e.g., `'cobra':'viper'` (inclusive of both start and stop).
- **Boolean array**: Array of `True`/`False` values corresponding to rows.
- **Boolean Series**: A Series with boolean values that align with the DataFrame's index.
- **Callable function**: A function that returns valid output (like the above).
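
The last two input kinds in the list above are the least obvious, so here is a minimal sketch of them. It reuses the `cobra`/`viper`/`sidewinder` frame from this notebook; the names `mask`, `by_mask`, and `by_callable` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Boolean Series: aligned on the index labels, so its row order does not matter
mask = pd.Series([True, False, False], index=['sidewinder', 'cobra', 'viper'])
by_mask = df.loc[mask]

# Callable: receives the DataFrame and returns any valid indexer
by_callable = df.loc[lambda d: d['max_speed'] >= 4]

print(by_mask)
print(by_callable)
```

The boolean Series selects `sidewinder` even though it is listed first in `mask`, because `loc` matches the mask to the rows by label before filtering.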

---

### **3. Allowed Inputs for `column_indexer`**

- **Single label**: e.g., `'shield'`.
- **List/array of labels**: e.g., `['shield', 'max_speed']`.
- **Slice object**: e.g., `'max_speed':'shield'` (inclusive of both endpoints; the labels must follow the column order).

---

### **4. Return Values**

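- **Scalar**: When selecting a single row label together with a single column label.
- **Series**: When selecting a single row or a single column.
- **DataFrame**: When selecting multiple rows or columns, including a one-element list such as `df.loc[['viper']]`.
- **Error**: A `KeyError` is raised if a label does not exist. An `IndexingError` is raised if a boolean Series is passed whose index cannot be aligned with the index of the object.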
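
How the return type follows from the indexer can be checked with a short sketch. It assumes the same sample frame used elsewhere in this section; the label `'python'` is a deliberately missing label chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

row = df.loc['viper']               # single row label -> Series
frame = df.loc[['viper']]           # list of labels -> DataFrame
value = df.loc['viper', 'shield']   # row label + column label -> scalar

print(type(row).__name__)    # Series
print(type(frame).__name__)  # DataFrame
print(value)                 # 5

try:
    df.loc['python']  # label not present in the index
except KeyError as err:
    print('KeyError:', err)
```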

---

### **5. Common Use Cases**

- **Label-based row and column selection**
- **Slicing rows or columns**
- **Boolean indexing**

---

### **6. Examples**

#### **6.1 Accessing a Single Row**

```python
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Access a single row by label
print(df.loc['viper'])
```

**Output**:

```
max_speed    4
shield       5
Name: viper, dtype: int64
```

#### **6.2 Accessing Multiple Rows**

```python
# Access multiple rows by labels
print(df.loc[['viper', 'sidewinder']])
```

**Output**:

```
            max_speed  shield
viper               4       5
sidewinder          7       8
```

#### **6.3 Slicing Rows**

```python
# Slicing rows by labels
print(df.loc['cobra':'viper', 'max_speed'])
```

**Output**:

```
cobra    1
viper    4
Name: max_speed, dtype: int64
```

#### **6.4 Boolean Indexing**

```python
# Boolean indexing for rows
print(df.loc[[False, False, True]])
```

**Output**:

```
            max_speed  shield
sidewinder          7       8
```

#### **6.5 Conditional Filtering**

```python
# Conditional filtering on column values
print(df.loc[df['shield'] > 6])
```

**Output**:

```
            max_speed  shield
sidewinder          7       8
```

#### **6.6 Setting Values**

```python
# Setting values for multiple rows
df.loc[['viper', 'sidewinder'], ['shield']] = 50
print(df)
```

**Output**:

```
            max_speed  shield
cobra               1       2
viper               4      50
sidewinder          7      50
```

#### **6.7 Setting Entire Row**

```python
# Setting an entire row
df.loc['cobra'] = 10
print(df)
```

**Output**:

```
            max_speed  shield
cobra              10      10
viper               4      50
sidewinder          7      50
```

#### **6.8 Conditional Row Modification**

```python
# Conditional modification of values
df.loc[df['shield'] > 35] = 0
print(df)
```

**Output**:

```
            max_speed  shield
cobra              10      10
viper               0       0
sidewinder          0       0
```

---

### **7. Notes**

- **Inclusive Slicing**: Unlike regular Python slicing, both the start and stop are included in **`loc[]`** slices.
- **Boolean Indexing**: When using boolean indexing, ensure the boolean array or Series matches the axis length.
- **Advanced Indexing**: To combine several conditions, build one boolean mask with `&` (and), `|` (or), and `~` (not), wrapping each comparison in parentheses.
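
As a minimal sketch of indexing with more than one condition, the snippet below combines boolean masks on the sample frame. The variable names `both` and `either` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# AND: both conditions must hold; each comparison needs its own parentheses
both = df.loc[(df['max_speed'] > 1) & (df['shield'] < 8)]

# OR, combined with a column selection
either = df.loc[(df['max_speed'] == 1) | (df['shield'] == 8), ['shield']]

print(both)
print(either)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.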

---

### **8. See Also**

- **`DataFrame.at`** – Access a single value by label.
- **`DataFrame.iloc`** – Access rows/columns by integer position.
- **`DataFrame.xs`** – Return cross-sections from the DataFrame.
- **`Series.iat`** – Access a single value by integer position.
- **`Series.loc`** – Access group of values using labels.


In [144]:
""" 

pandas.Series.iloc


property Series.iloc

Purely integer-location based indexing for selection by position.

Deprecated since version 2.2.0: Returning a tuple from a callable is deprecated.

.iloc[] is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.

Allowed inputs are:

An integer, e.g. 5.

A list or array of integers, e.g. [4, 3, 0].

A slice object with ints, e.g. 1:7.

A boolean array.

A callable function with one argument (the calling Series or DataFrame) and that returns valid output for indexing (one of the above). This is useful in method chains, when you don’t have a reference to the calling object, but would like to base your selection on some value.

A tuple of row and column indexes. The tuple elements consist of one of the above inputs, e.g. (0, 1).

.iloc will raise IndexError if a requested indexer is out-of-bounds, except slice indexers which allow out-of-bounds indexing (this conforms with python/numpy slice semantics).



DataFrame.iat
Fast integer location scalar accessor.

DataFrame.loc
Purely label-location based indexer for selection by label.

Series.iloc
Purely integer-location based indexing for selection by position. """

mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
df = pd.DataFrame(mydict)
df

Unnamed: 0,a,b,c,d
0,1,2,3,4
1,100,200,300,400
2,1000,2000,3000,4000


In [145]:
# Indexing just the rows

# With a scalar integer.

type(df.iloc[0])  # <class 'pandas.core.series.Series'>
df.iloc[0]

a    1
b    2
c    3
d    4
Name: 0, dtype: int64

In [None]:
# With a list of integers.

df.iloc[[0]]

In [None]:
type(df.iloc[[0]])

In [None]:
# With a slice object.

df.iloc[:3]

In [None]:
# With a boolean mask the same length as the index.

df.iloc[[True, False, True]]

In [None]:
# With a callable, useful in method chains. The x passed to the lambda is the DataFrame being sliced. This selects the rows whose index label is even.

df.iloc[lambda x: x.index % 2 == 0]

In [None]:
# Indexing both axes

# You can mix the indexer types for the index and columns. Use : to select the entire axis.

# With scalar integers.

df.iloc[0, 1]

In [None]:
# With lists of integers.

df.iloc[[0, 2], [1, 3]]

In [None]:
# With slice objects.

df.iloc[1:3, 0:3]

In [None]:
# With a boolean array whose length matches the columns.

df.iloc[:, [True, False, True, False]]

In [None]:
# With a callable function that expects the Series or DataFrame.

df.iloc[:, lambda df: [0, 2]]

## **pandas.Series.iloc – Integer-Location Based Indexing**

The **`iloc[]`** property provides purely integer-location based indexing: it selects values from a **Series** or **DataFrame** by position (from `0` to `length - 1` of the axis) rather than by label. This is useful when the index labels are non-numeric, or when the position of a row or column matters more than its label. A Series takes a single indexer; a DataFrame takes a row indexer and an optional column indexer.

---

### **1. Syntax**

```python
Series.iloc[indexer]
DataFrame.iloc[row_indexer, column_indexer]
```

- **`row_indexer`** (or the single Series `indexer`): Integer, list/array of integers, slice, boolean array, or callable function.
- **`column_indexer`** (DataFrame only): Integer, list of integers, slice, boolean array, or callable function.

---

### **2. Allowed Inputs for `row_indexer`**

- **Integer**: e.g., `0`, `1`, etc. (for single row).
- **List/array of integers**: e.g., `[0, 1, 2]` (for multiple rows).
- **Slice object**: e.g., `1:5` (rows at positions 1 through 4; the stop position is excluded).
- **Boolean array**: e.g., `[True, False, True]` (matching the index length).
- **Callable function**: A function that returns a valid indexer (like a lambda) that operates on the DataFrame/Series.
- **Tuple of row and column indexers**: e.g., `(0, 1)` for selecting a specific value.

---

### **3. Allowed Inputs for `column_indexer`**

- **Integer**: Single column by its integer position.
- **List/array of integers**: Select multiple columns by their integer positions.
- **Slice object**: Select a range of columns.
- **Boolean array**: Select columns based on a boolean mask.
- **Callable function**: Apply a function to the columns to determine which to select.

---

### **4. Return Values**

- **Series**: When selecting a single row or column.
- **DataFrame**: When selecting multiple rows and/or columns.
- **Scalar value**: When selecting a specific value.
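
The three return types can be seen side by side in a small sketch. It uses a two-column frame made up for illustration; the variable names are not from the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 100, 1000], 'b': [2, 200, 2000]})

row = df.iloc[0]        # single integer -> Series (one row)
last = df.iloc[-1]      # negative positions count from the end
frame = df.iloc[[0]]    # list of integers -> DataFrame
value = df.iloc[0, 1]   # row position + column position -> scalar

print(type(row).__name__, type(frame).__name__, value)
print(last)
```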

---

### **5. Examples**

#### **5.1 Indexing Rows**

```python
import pandas as pd

# Create a sample DataFrame
mydict = [{'a': 1, 'b': 2, 'c': 3, 'd': 4},
          {'a': 100, 'b': 200, 'c': 300, 'd': 400},
          {'a': 1000, 'b': 2000, 'c': 3000, 'd': 4000}]
df = pd.DataFrame(mydict)

# Access a single row by position
print(df.iloc[0])  # First row
```

**Output**:

```
a    1
b    2
c    3
d    4
Name: 0, dtype: int64
```

#### **5.2 Indexing Multiple Rows**

```python
# Access multiple rows
print(df.iloc[[0, 1]])  # First and second rows
```

**Output**:

```
     a    b    c    d
0    1    2    3    4
1  100  200  300  400
```

#### **5.3 Slicing Rows**

```python
# Access a slice of rows (first three rows)
print(df.iloc[:3])
```

**Output**:

```
      a     b     c     d
0     1     2     3     4
1   100   200   300   400
2  1000  2000  3000  4000
```

#### **5.4 Boolean Masking**

```python
# Use boolean array to filter rows
print(df.iloc[[True, False, True]])  # Selects first and third rows
```

**Output**:

```
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000
```

#### **5.5 Using Callable Function**

```python
# Select rows whose index is even using a lambda function
print(df.iloc[lambda x: x.index % 2 == 0])
```

**Output**:

```
      a     b     c     d
0     1     2     3     4
2  1000  2000  3000  4000
```

#### **5.6 Indexing Both Rows and Columns**

```python
# Select specific row and column (row 0, column 1)
print(df.iloc[0, 1])  # First row, second column
```

**Output**:

```
2
```

#### **5.7 Indexing Multiple Rows and Columns**

```python
# Select multiple rows and columns
print(df.iloc[[0, 2], [1, 3]])  # First and third rows, second and fourth columns
```

**Output**:

```
      b     d
0     2     4
2  2000  4000
```

#### **5.8 Slicing Rows and Columns**

```python
# Slice rows and columns
print(df.iloc[1:3, 0:3])  # Rows 1-2 and columns 0-2
```

**Output**:

```
      a     b     c
1   100   200   300
2  1000  2000  3000
```

#### **5.9 Using Boolean Mask for Columns**

```python
# Boolean indexing for columns
print(df.iloc[:, [True, False, True, False]])  # Selects columns 0 and 2
```

**Output**:

```
      a     c
0     1     3
1   100   300
2  1000  3000
```

#### **5.10 Callable Function for Column Selection**

```python
# Use callable function to select columns based on function result
print(df.iloc[:, lambda df: [0, 2]])  # Select first and third columns
```

**Output**:

```
      a     c
0     1     3
1   100   300
2  1000  3000
```

---

### **6. Notes**

- **Out-of-bounds Handling**: Using out-of-bounds indices (like `df.iloc[5]` on a DataFrame with 5 rows) will raise an **IndexError**. However, slicing allows for out-of-bounds indices and will not throw an error.
- **Boolean Indexing**: For boolean indexing, ensure the boolean array or Series matches the length of the axis you're indexing.
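
Both notes can be tried on a tiny frame. This is a sketch under the assumption of a three-row frame invented for illustration; the exact error messages depend on the pandas version, so they are printed rather than quoted.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 100, 1000]})

try:
    df.iloc[5]  # only positions 0, 1, and 2 exist
except IndexError as err:
    print('IndexError:', err)

# A slice is clipped to the valid range instead of raising
clipped = df.iloc[1:10]
print(clipped)

# A boolean mask must match the axis length exactly
try:
    df.iloc[[True, False]]
except IndexError as err:
    print('IndexError:', err)
```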

---

### **7. See Also**

- **`DataFrame.iat`** – Fast scalar integer location access.
- **`DataFrame.loc`** – Label-based indexer for rows and columns.
- **`Series.loc`** – Access group of values using labels.


In [None]:
""" 
pandas.Series.__iter__

Series.__iter__()

Return an iterator of the values.

These are each a scalar type, which is a Python scalar (for str, int, float) or a pandas scalar (for Timestamp/Timedelta/Interval/Period)

Returns :
iterator 
"""
s = pd.Series([1, 2, 3])
for x in s:
    print(x)

## **pandas.Series.\_\_iter\_\_ – Iterator for Values**

The **`Series.__iter__()`** method allows iteration over the values of a **pandas Series**. This function returns an iterator that allows you to loop over the Series in a for loop or other iterative constructs. Each value returned by the iterator is a scalar, either a standard Python scalar (such as `int`, `float`, or `str`) or a **pandas scalar** (e.g., `Timestamp`, `Timedelta`, `Period`, or `Interval`).

### **Syntax**

```python
Series.__iter__()
```

- This method doesn't take any arguments and returns an iterator that can be used to loop through the values of the Series.

---

### **Examples**

#### **1. Simple Iteration over a Series**

You can loop through the values of the Series just like you would with any iterable object in Python.

```python
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3])

# Iterate over the Series
for x in s:
    print(x)
```

**Output**:

```
1
2
3
```

#### **2. Iterating over a Series of Strings**

```python
# Create a Series of strings
s = pd.Series(['apple', 'banana', 'cherry'])

# Iterate over the Series
for x in s:
    print(x)
```

**Output**:

```
apple
banana
cherry
```

#### **3. Iterating over a Series of Timestamps**

```python
# Create a Series of Timestamps
s = pd.Series(pd.date_range('2023-01-01', periods=3))

# Iterate over the Series
for x in s:
    print(x)
```

**Output**:

```
2023-01-01 00:00:00
2023-01-02 00:00:00
2023-01-03 00:00:00
```

#### **4. Iterating over a Series with Mixed Data Types**

```python
# Create a Series with mixed data types
s = pd.Series([1, 'banana', 3.14, pd.Timestamp('2023-01-01')])

# Iterate over the Series
for x in s:
    print(x)
```

**Output**:

```
1
banana
3.14
2023-01-01 00:00:00
```

---

### **5. Notes**

- **Python Scalars**: Scalars like `int`, `float`, and `str` are returned during iteration.
- **Pandas Scalars**: Types like `Timestamp`, `Timedelta`, `Period`, and `Interval` are also returned for datetime or time-related Series.
- The **`Series.__iter__()`** method is automatically invoked when using a `for` loop directly on the Series.
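
The scalar types named in the notes above can be inspected directly. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
dates = pd.Series(pd.date_range('2023-01-01', periods=2))

# A for loop calls iter() behind the scenes; it can also be called directly
it = iter(ints)
first = next(it)

print(type(first))              # Python int, not a NumPy integer
print(type(next(iter(dates))))  # pandas Timestamp

# Any consumer of iterables accepts a Series
total = sum(ints)
as_list = list(ints)
print(total, as_list)
```

For large numeric Series, vectorized methods such as `ints.sum()` are usually much faster than a Python-level loop; iteration is mainly useful when each value needs custom handling.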

---

### **6. See Also**

- **`pandas.Series.items()`**: Returns an iterator yielding `(index, value)` pairs. It replaces `Series.iteritems()`, which was removed in pandas 2.0.
- **`pandas.Series.values`**: Returns an array of the values in the Series, useful when needing to use the values outside of iteration.


In [None]:
! pip install Pandas
! pip install numpy




In [None]:
""" 
pandas.Series.items


Series.items()

Lazily iterate over (index, value) tuples.

This method returns an iterable tuple (index, value). This is convenient if you want to create a lazy iterator.

Returns :
      iterable
      Iterable of tuples containing the (index, value) pairs from a Series.



DataFrame.items
Iterate over (column name, Series) pairs.

DataFrame.iterrows
Iterate over DataFrame rows as (index, Series) pairs. 

"""
! pip install pandas
import pandas as pd
s = pd.Series(['A', 'B', 'C'])
for index, value in s.items():
    print(f"Index : {index}, Value : {value}")

## **pandas.Series.items – Lazily Iterate Over (Index, Value) Pairs**

The **`Series.items()`** method in **pandas** lazily iterates over a **Series**, yielding `(index, value)` tuples. It is useful when you need both the **index** and the **value** of each element, and because the pairs are produced one at a time, no intermediate list of tuples is built.

### **Syntax**

```python
Series.items()
```

- This method returns an **iterable** of tuples where each tuple contains:
  - The **index** of the element.
  - The **value** at that index in the Series.

---

### **Examples**

#### **1. Simple Iteration Over Index and Value**

```python
import pandas as pd

# Create a Series
s = pd.Series(['A', 'B', 'C'])

# Iterate over the Series using items()
for index, value in s.items():
    print(f"Index : {index}, Value : {value}")
```

**Output**:

```
Index : 0, Value : A
Index : 1, Value : B
Index : 2, Value : C
```

#### **2. Iterating Over a Series of Integers**

```python
# Create a Series of integers
s = pd.Series([1, 2, 3])

# Iterate over the Series using items()
for index, value in s.items():
    print(f"Index : {index}, Value : {value}")
```

**Output**:

```
Index : 0, Value : 1
Index : 1, Value : 2
Index : 2, Value : 3
```

#### **3. Iterating Over a Series with Mixed Data Types**

```python
# Create a Series with mixed types
s = pd.Series([100, 'apple', 3.14])

# Iterate over the Series using items()
for index, value in s.items():
    print(f"Index : {index}, Value : {value}")
```

**Output**:

```
Index : 0, Value : 100
Index : 1, Value : apple
Index : 2, Value : 3.14
```

#### **4. Using items() with a Custom Index**

```python
# Create a Series with a custom index
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Iterate over the Series using items()
for index, value in s.items():
    print(f"Index : {index}, Value : {value}")
```

**Output**:

```
Index : a, Value : 10
Index : b, Value : 20
Index : c, Value : 30
```

---

### **5. Notes**

- The **`items()`** method returns an iterator, making it **memory efficient** compared to converting the entire Series into a list or array.
- This is especially useful in cases where you need both **index** and **value** for operations like transformation or conditional processing.
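
As a short sketch of the lazy behavior and of conditional processing with both label and value (the names `pairs`, `large`, and `as_dict` are illustrative):

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

pairs = s.items()          # no pairs have been produced yet
first = next(iter(pairs))  # pull only the first (index, value) pair

# Conditional processing that needs both the label and the value
large = [label for label, value in s.items() if value >= 20]

# Building a plain dict from the pairs
as_dict = dict(s.items())

print(first, large, as_dict)
```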

### **6. See Also**

- **`Series.iteritems()`**: Former name of **`items()`**. It was deprecated in pandas 1.5 and removed in pandas 2.0, so use **`items()`** instead.
- **`DataFrame.items()`**: Allows iteration over a DataFrame's column name and its corresponding Series.
- **`DataFrame.iterrows()`**: Iterates over DataFrame rows, returning a pair of index and Series for each row.


In [None]:
""" 
pandas.Series.keys

Series.keys()

Return alias for index.

Returns :
      Index
      
Index of the Series. """
s = pd.Series([1, 2, 3], index=[0, 1, 2])
s.keys()

Index([0, 1, 2], dtype='int64')

## **pandas.Series.keys – Alias for Index**

The **`Series.keys()`** method is an alias for the **`index`** attribute of a **pandas Series**. It returns the **Index** of the Series, which holds the label of each element.

### **Syntax**

```python
Series.keys()
```

- This method returns the **Index** object of the Series, which contains the labels for the Series elements.

---

### **Examples**

#### **1. Basic Example with Default Integer Index**

```python
import pandas as pd

# Create a Series with default integer index
s = pd.Series([1, 2, 3])

# Get the keys (index) of the Series
print(s.keys())
```

**Output**:

```
RangeIndex(start=0, stop=3, step=1)
```

#### **2. Example with Custom Index**

```python
# Create a Series with a custom index
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Get the keys (index) of the Series
print(s.keys())
```

**Output**:

```
Index(['a', 'b', 'c'], dtype='object')
```

#### **3. Example with Mixed Data Types**

```python
# Create a Series with mixed data types
s = pd.Series([100, 'apple', 3.14], index=[1, 2, 3])

# Get the keys (index) of the Series
print(s.keys())
```

**Output**:

```
Index([1, 2, 3], dtype='int64')
```

---

### **4. Notes**

- The **`keys()`** method is functionally identical to **`index`** in terms of the output. It's a shorthand alias for accessing the index.
- It can be particularly useful in situations where you want to access the index in a more semantic or intuitive way.
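
The dict-like reading of `keys()` can be illustrated with a membership test. A minimal sketch, with variable names chosen for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

same = s.keys().equals(s.index)  # keys() and index hold the same labels

# Dict-style membership tests look at the labels, not the values
has_label = 'a' in s   # True: 'a' is a label
has_value = 10 in s    # False: 10 is a value, not a label

labels = list(s.keys())
print(same, has_label, has_value, labels)
```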

### **5. See Also**

- **`Series.index`**: Direct access to the index of the Series, same as **`keys()`**.
- **`DataFrame.keys()`**: Returns the column labels of a DataFrame, just as **`keys()`** returns the index of a Series.


In [None]:
""" 
pandas.Series.pop


Series.pop(item)


Return item and drops from series. Raise KeyError if not found.

Parameters:

item:   label
Index of the element that needs to be removed.

Returns :
Value that is popped from series. 
"""
ser = pd.Series([1, 2, 3])

In [None]:
ser.pop(0)

np.int64(1)
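
`pop` changes the Series in place, which the single call above does not show. A minimal sketch of the in-place removal and of the `KeyError` mentioned in the docstring:

```python
import pandas as pd

ser = pd.Series([1, 2, 3])

popped = ser.pop(0)   # removes the element labeled 0 and returns its value
print(popped)
print(ser)            # only labels 1 and 2 remain

try:
    ser.pop(0)        # the label is gone now
except KeyError as err:
    print('KeyError:', err)
```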

In [None]:
""" 
pandas.Series.item

Series.item() 

Return the first element of the underlying data as a Python scalar.

Returns :

    scalar
    The first element of Series or Index.

Raises :
        ValueError
If the data is not length = 1.  
"""
s = pd.Series([1])
s.item()


1

In [None]:
s = pd.Series([1], index=['a'])
s.index.item()

'a'
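
The `ValueError` case from the docstring is not exercised by the cells above. A small sketch, using made-up Series for illustration:

```python
import pandas as pd

single = pd.Series([42], index=['answer'])
value = single.item()   # the only element, as a plain Python int

pair = pd.Series([1, 2])
try:
    pair.item()         # two elements, so there is no single scalar to return
except ValueError as err:
    print('ValueError:', err)

print(value, type(value))
```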

In [None]:
""" 

pandas.Series.xs

Series.xs(key, axis=0, level=None, drop_level=True)
Return cross-section from the Series/DataFrame.

This method takes a key argument to select data at a particular level of a MultiIndex.

Parameters :
      key:
      label or tuple of label
      Label contained in the index, or partially in a MultiIndex.

axis :
{0 or ‘index’, 1 or ‘columns’}, default 0
Axis to retrieve cross-section on.

level :
object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.

drop_level: 
bool, default True
If False, returns object with same levels as self.

Returns :

Series or DataFrame: 
      Cross-section from the original Series or DataFrame corresponding to the selected index levels.



DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.

DataFrame.iloc
Purely integer-location based indexing for selection by position. """

d = {'num_legs': [4, 4, 2, 2],
     'num_wings': [0, 0, 2, 2],
     'class': ['mammal', 'mammal', 'mammal', 'bird'],
     'animal': ['cat', 'dog', 'bat', 'penguin'],
     'locomotion': ['walks', 'walks', 'flies', 'walks']}
df = pd.DataFrame(data=d)
df = df.set_index(['class', 'animal', 'locomotion'])
df

## **pandas.Series.xs – Cross-section from a Series**

The **`Series.xs()`** method is used to extract a cross-section from a Series or DataFrame. It is particularly useful when working with **MultiIndex** data, as it allows you to select data from specific levels of the index.

### **Syntax**

```python
Series.xs(key, axis=0, level=None, drop_level=True)
```

### **Parameters**

- **key**: `label` or `tuple of labels`

  - The label or tuple of labels to select the data at the corresponding level(s) of the index. This is the value you want to cross-section at.

- **axis**: `{0 or 'index', 1 or 'columns'}`, default `0`

  - The axis to retrieve the cross-section on:
    - `0` (or `'index'`): Works on the row index.
    - `1` (or `'columns'`): Works on the columns (for DataFrames).

- **level**: `object`, default `None`

  - Specifies which level(s) of the index to use if `key` is only partially contained in a **MultiIndex**. You can refer to levels by name or position.

- **drop_level**: `bool`, default `True`
  - Whether to drop the level(s) of the index that are selected. If `True`, the resulting DataFrame or Series will have one fewer level. If `False`, it retains all levels.

### **Returns**

- **Series** or **DataFrame**: The cross-section of the original object for the selected index levels. On a DataFrame, a key that identifies exactly one row (or one column with `axis=1`) returns a Series; otherwise the result is a DataFrame.
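
The effect of `drop_level` is easiest to see by comparing the index names of both results. This is a sketch on a two-level variant of the animal data, built here only for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {'num_legs': [4, 4, 2, 2], 'num_wings': [0, 0, 2, 2]},
    index=pd.MultiIndex.from_tuples(
        [('mammal', 'cat'), ('mammal', 'dog'), ('mammal', 'bat'), ('bird', 'penguin')],
        names=['class', 'animal'],
    ),
)

dropped = df.xs('mammal', level='class')                 # 'class' level removed
kept = df.xs('mammal', level='class', drop_level=False)  # both levels retained

print(list(dropped.index.names))  # ['animal']
print(list(kept.index.names))     # ['class', 'animal']
```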

---

### **Examples**

#### **1. Basic Example with MultiIndex**

```python
import pandas as pd

# Create a DataFrame with MultiIndex
data = {
    'num_legs': [4, 4, 2, 2],
    'num_wings': [0, 0, 2, 2],
    'class': ['mammal', 'mammal', 'mammal', 'bird'],
    'animal': ['cat', 'dog', 'bat', 'penguin'],
    'locomotion': ['walks', 'walks', 'flies', 'walks']
}
df = pd.DataFrame(data)
df = df.set_index(['class', 'animal', 'locomotion'])

# View the DataFrame
print(df)
```

**Output**:

```
                           num_legs  num_wings
class  animal  locomotion
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2
```

#### **2. Get Values at a Specific Index (Single Level)**

```python
# Get all rows where the class is 'mammal'
mammal_data = df.xs('mammal')
print(mammal_data)
```

**Output**:

```
                   num_legs  num_wings
animal locomotion
cat    walks              4          0
dog    walks              4          0
bat    flies              2          2
```

#### **3. Get Values at Multiple Index Levels (Tuple)**

```python
# Get values for 'mammal', 'dog', and 'walks'
specific_data = df.xs(('mammal', 'dog', 'walks'))
print(specific_data)
```

**Output**:

```
num_legs     4
num_wings    0
Name: (mammal, dog, walks), dtype: int64
```

#### **4. Get Values at a Specific Level**

```python
# Get values for the 'cat' at level 1 (animal)
cat_data = df.xs('cat', level=1)
print(cat_data)
```

**Output**:

```
                   num_legs  num_wings
class  locomotion
mammal walks              4          0
```

#### **5. Get Values at Multiple Levels**

```python
# Get values for 'bird' and 'walks' at levels 0 and 'locomotion'
bird_walks_data = df.xs(('bird', 'walks'), level=[0, 'locomotion'])
print(bird_walks_data)
```

**Output**:

```
         num_legs  num_wings
animal
penguin         2          2
```

#### **6. Get Values at a Specific Column (axis=1)**

```python
# Get values for 'num_wings' column across all rows
num_wings_data = df.xs('num_wings', axis=1)
print(num_wings_data)
```

**Output**:

```
class   animal   locomotion
mammal  cat      walks         0
        dog      walks         0
        bat      flies         2
bird    penguin  walks         2
Name: num_wings, dtype: int64
```

---

### **See Also**

- **`DataFrame.loc`**: Select rows and columns by label(s) or boolean array.
- **`DataFrame.iloc`**: Purely integer-location based indexing for selection by position.

This method is especially powerful when working with **MultiIndex** data, providing a flexible way to select cross-sections of your dataset based on different index levels.
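
The examples in this section all call `xs` on a DataFrame. The following sketch applies it to an actual Series with a two-level index; the index and values are invented for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'b'], [1, 2]], names=['letter', 'number'])
s = pd.Series([10, 20, 30, 40], index=index)

by_letter = s.xs('a')                # Series indexed by 'number'
by_number = s.xs(2, level='number')  # Series indexed by 'letter'
single = s.xs(('b', 1))              # fully specified key -> scalar

print(by_letter)
print(by_number)
print(single)
```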


In [None]:
# Get values at specified index

df.xs('mammal')

In [None]:
# Get values at several indexes

df.xs(('mammal', 'dog', 'walks'))

In [None]:
# Get values at specified index and level

df.xs('cat', level=1)

In [None]:
# Get values at several indexes and levels

df.xs(('bird', 'walks'),
      level=[0, 'locomotion'])

In [None]:
# Get values at specified column and axis

df.xs('num_wings', axis=1)

In [None]:
""" 

pandas.Series.add


Series.add(other, level=None, fill_value=None, axis=0)
Return Addition of series and other, element-wise (binary operator add).

Equivalent to series + other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :

other:

Series or scalar value

level :
int or name

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value :

None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :

Series
The result of the operation.


Series.radd
Reverse of the Addition operator, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.add(b, fill_value=0)

## **pandas.Series.add – Addition of Series and Other**

The **`Series.add()`** method is used to add two Series or a Series and a scalar element-wise. It is essentially the same as using the `+` operator but with additional support for handling missing values (`NaN`) through the **`fill_value`** parameter.

### **Syntax**

```python
Series.add(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The other Series or scalar value to add to the calling Series.

- **level**: `int` or `name`, optional, default `None`

  - Used to broadcast across a level in a MultiIndex. If the Series have MultiIndexes, the operation aligns the data along the specified level.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - This value replaces missing values (`NaN`), and any element that is absent from one of the two indexes, in either the calling Series or the `other` Series before the addition. If both Series are missing a value at the same location, the result at that location is still missing, even when a `fill_value` is provided.

- **axis**: `{0 or 'index'}`, unused for Series but kept for compatibility with DataFrame operations.

### **Returns**

- **Series**: The result of adding the two Series element-wise.

---

### **Examples**

#### **1. Basic Example**

```python
import pandas as pd
import numpy as np

# Creating two Series with some NaN values
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Adding two Series with a fill_value for missing data
result = a.add(b, fill_value=0)
print(result)
```

**Output**:

```
a    2.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64
```

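In this example, a value that is missing on only one side is treated as `0`: index `'a'` gives `1 + 1 = 2`, while `'b'`, `'c'`, and `'d'` each have one real value and one filled `0`, giving `1`. Index `'e'` remains `NaN` because it is absent from `a` and `NaN` in `b`, so there is no value on either side to add.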
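
To see what `fill_value` changes, it helps to put the plain `+` operator next to `add()` on the same inputs. A minimal sketch; the column labels in the printed comparison are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

plain = a + b                    # any missing operand gives NaN
filled = a.add(b, fill_value=0)  # a value missing on one side only is treated as 0

print(pd.DataFrame({'a + b': plain, 'a.add(b, fill_value=0)': filled}))
```

With the plain operator only index `'a'` has a value on both sides, so every other position is `NaN`.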

#### **2. Handling Missing Values with `fill_value`**

```python
# Another example with NaN values in both Series
a = pd.Series([np.nan, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([1, np.nan, 4], index=['a', 'b', 'c'])

# Add them with fill_value for NaNs
result = a.add(b, fill_value=0)
print(result)
```

**Output**:

```
a    1.0
b    2.0
c    7.0
dtype: float64
```

In this case, `NaN` in `a` and `b` are treated as `0`, so the result is the sum of `0 + 1 = 1` for `'a'`, `2 + 0 = 2` for `'b'`, and `3 + 4 = 7` for `'c'`.

#### **3. Addition with Scalar**

```python
# Adding a scalar value to a Series
a = pd.Series([1, 2, 3])
result = a.add(5)  # Adding 5 to each element
print(result)
```

**Output**:

```
0    6
1    7
2    8
dtype: int64
```

Here, each element of the Series is added to the scalar value `5`, producing the result `[6, 7, 8]`.

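#### **4. Addition with MultiIndex**

```python
# Creating two Series whose MultiIndexes share the same level names
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)

index2 = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 4)], names=['letter', 'number'])
b = pd.Series([4, 5, 6], index=index2)

# Align on the full (letter, number) keys and fill unmatched keys with 0.
# level= is left out here: joining two MultiIndexes on a single level
# is ambiguous, and pandas raises a TypeError for it.
result = a.add(b, fill_value=0)
print(result)
```

**Output**:

```
letter  number
a       1         5.0
b       2         7.0
c       3         3.0
        4         6.0
dtype: float64
```

The two Series are aligned on their full `(letter, number)` keys. The keys `('c', 3)` and `('c', 4)` each appear in only one Series, so the missing side is filled with `0`. The `level` parameter is intended for broadcasting a Series with a flat index across one level of a MultiIndex.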

---

### **See Also**

- **`Series.radd()`**: Reverse version of the addition operator; `s.radd(other)` computes `other + s` instead of `s + other`.

This method is a powerful tool when performing element-wise addition between Series, especially with missing values or when using a **MultiIndex**.
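
The difference between `add()` and `radd()` only shows when the operation is not commutative, as with string concatenation. A minimal sketch with made-up strings:

```python
import pandas as pd

s = pd.Series(['x', 'y'])

forward = s.add('_end')     # same as s + '_end'
reverse = s.radd('start_')  # same as 'start_' + s

print(forward.tolist())
print(reverse.tolist())
```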


In [None]:
""" 
pandas.Series.sub


Series.sub(other, level=None, fill_value=None, axis=0)
Return Subtraction of series and other, element-wise (binary operator sub).

Equivalent to series - other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :
other:

Series or scalar value
level:

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value:
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
Series
The result of the operation.



Series.rsub
Reverse of the Subtraction operator, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.subtract(b, fill_value=0)

## **pandas.Series.sub – Subtraction of Series (Element-wise)**

The **`Series.sub()`** method performs element-wise subtraction between a **pandas Series** and another **Series** or a scalar value. It is equivalent to using the **`-`** (minus) operator, but it provides extra functionality, such as filling missing values with a specified value during the operation.

### **Syntax**

```python
Series.sub(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or scalar value
  - The other Series or scalar value to subtract from the current Series.
- **level**: `int` or `str`, default `None`

  - Used when the Series has a MultiIndex. If specified, the operation will broadcast across the level and match the index values at that level.

- **fill_value**: `None` or `float`, default `None`

  - A value to substitute for missing (NaN) data during the subtraction. If missing data is encountered in one or both Series, this value will be used instead of NaN during the computation.
  - If **both** Series have missing data for the corresponding index, the result will be NaN.

- **axis**: `{0 or 'index'}`, default `0`
  - Unused, but kept for compatibility with DataFrame operations.

### **Returns**

- **Series**: A new Series containing the result of the subtraction.

---

### **Examples**

#### **1. Basic Example of Subtraction**

```python
import pandas as pd
import numpy as np

# Create two Series
a = pd.Series([5, 10, 15], index=['x', 'y', 'z'])
b = pd.Series([1, 2, 3], index=['x', 'y', 'z'])

# Subtract Series b from Series a
result = a.sub(b)
print(result)
```

**Output**:

```
x     4
y     8
z    12
dtype: int64
```

#### **2. Using `fill_value` for Missing Data**

```python
# Create two Series with missing values
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Subtract b from a, filling missing values with 0
result = a.sub(b, fill_value=0)
print(result)
```

**Output**:

```
a    0.0
b    1.0
c    1.0
d   -1.0
e    NaN
dtype: float64
```

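In this case:

- The missing value in **`a`** (at index `'d'`) is filled with `0`, so the result is `0 - 1 = -1.0`.
- The missing value in **`b`** (at index `'b'`) is also filled with `0`, so the result is `1 - 0 = 1.0`.
- Index `'c'` exists only in **`a`**, so `b` contributes the fill value and the result is `1 - 0 = 1.0`.
- Index `'e'` is `NaN` in **`b`** and absent from **`a`**, so it is missing on both sides and stays `NaN`.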
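
For comparison, the plain `-` operator applies no fill value, so any label that is missing on either side produces `NaN`. A minimal sketch using the same two Series (the names `plain`, `filled`, and `comparison` are only illustrative):

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

plain = a - b                    # no fill value: NaN wherever either side is missing
filled = a.sub(b, fill_value=0)  # a value missing on one side is treated as 0

# Put both results side by side for comparison
comparison = pd.DataFrame({'a - b': plain, 'a.sub(b, fill_value=0)': filled})
print(comparison)
```

Only index `'a'` has a value on both sides, so it is the only row where the two columns agree on a number.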

#### **3. Using `level` with MultiIndex**

```python
# Create Series with MultiIndex
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)
b = pd.Series([0, 1, 4], index=index)

# Subtract b from a across a specific level (e.g., 'letter')
result = a.sub(b, level='letter')
print(result)
```

**Output**:

```
letter  number
A       1         1
        2         1
B       1        -1
dtype: int64
```

Both Series share the same MultiIndex here, so the values are aligned on the full `(letter, number)` labels and subtracted pairwise. `level` only changes the outcome when `other` is indexed by a single level.
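
`level` comes into play when `other` is indexed by just one level of the caller's MultiIndex. The following is a minimal sketch, assuming a hypothetical `offsets` Series keyed by `letter`; each offset is broadcast to every row that shares that letter:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1)], names=['letter', 'number']
)
a = pd.Series([10, 20, 30], index=index)

# Hypothetical per-letter offsets, indexed only by the 'letter' level
offsets = pd.Series([1, 2], index=pd.Index(['A', 'B'], name='letter'))

# Each offset is matched on 'letter' and subtracted from every matching row
result = a.sub(offsets, level='letter')
print(result)
```

Both `'A'` rows have `1` subtracted and the `'B'` row has `2` subtracted, which is the usual pattern for removing a per-group value such as a group mean.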

---

### **See Also**

- **`Series.rsub()`**: The reverse of subtraction (`other - series`).
- **`DataFrame.sub()`**: Subtraction between DataFrames, handling missing values and broadcasting.


In [None]:
""" pandas.Series.mul


Series.mul(other, level=None, fill_value=None, axis=0)


Return Multiplication of series and other, element-wise (binary operator mul).

Equivalent to series * other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :

other: 

Series or scalar value

level: 

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value: 
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
Series
The result of the operation. """


a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.multiply(b, fill_value=0)

## **pandas.Series.mul – Element-wise Multiplication of Series and Other**

The **`Series.mul()`** method is used for element-wise multiplication of two Series (or a Series and a scalar). It operates like the `*` operator but adds functionality to handle missing values (`NaN`) using the **`fill_value`** parameter.

### **Syntax**

```python
Series.mul(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The other Series or scalar value to multiply with the calling Series.

- **level**: `int` or `name`, optional, default `None`

  - Used to broadcast the operation across a specific level in a MultiIndex, ensuring the multiplication aligns along the given index.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - This value replaces missing values (`NaN`) in either Series before performing the multiplication. If both Series have `NaN` at the same position, the result is `NaN` even when a `fill_value` is provided.

- **axis**: `{0 or 'index'}`, unused for Series but included for DataFrame compatibility.

### **Returns**

- **Series**: The result of multiplying the Series element-wise.

---

### **Examples**

#### **1. Basic Example**

```python
import pandas as pd
import numpy as np

# Creating two Series with NaN values
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Multiply two Series with fill_value for missing data
result = a.mul(b, fill_value=0)
print(result)
```

**Output**:

```
a    1.0
b    0.0
c    0.0
d    0.0
e    NaN
dtype: float64
```

Here, a value that is missing on only one side is replaced with `0` before the multiplication: `1 * 1 = 1` for `'a'`, `1 * 0 = 0` for `'b'` and `'c'`, and `0 * 1 = 0` for `'d'`. Index `'e'` is missing on both sides, so it stays `NaN`.

#### **2. Handling Missing Values with `fill_value`**

```python
# Another example with missing values in both Series
a = pd.Series([np.nan, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([1, np.nan, 4], index=['a', 'b', 'c'])

# Multiply them with a fill_value for NaNs
result = a.mul(b, fill_value=0)
print(result)
```

**Output**:

```
a     0.0
b     0.0
c    12.0
dtype: float64
```

The missing value at index `'a'` in `a` is replaced by `0`, and similarly, `b`'s missing value at `'b'` is replaced by `0`, so the multiplication results in `0 * 1 = 0` and `2 * 0 = 0`. For `'c'`, both values are present, so the result is `3 * 4 = 12`.
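
Filling with `0` turns every one-sided gap into a zero product. If a missing factor should instead leave the other value unchanged, the multiplicative identity `1` can be used as the fill value. A short sketch of that alternative on the same data (an illustration, not a recommendation for every dataset):

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([1, np.nan, 4], index=['a', 'b', 'c'])

# With fill_value=1 a missing factor acts as the identity element
result = a.mul(b, fill_value=1)
print(result)
```

The results for `'a'` and `'b'` are now the values that were present (`1` and `2`), while `'c'` is still `3 * 4 = 12`.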

#### **3. Multiplication with Scalar**

```python
# Multiplying a Series by a scalar value
a = pd.Series([1, 2, 3])
result = a.mul(3)  # Multiplying each element by 3
print(result)
```

**Output**:

```
0    3
1    6
2    9
dtype: int64
```

Each element of the Series is multiplied by `3`, resulting in `[3, 6, 9]`.

#### **4. Multiplying with MultiIndex**

```python
# Creating MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)

index2 = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 4)], names=['letter', 'number'])
b = pd.Series([4, 5, 6], index=index2)

# Multiply two MultiIndex Series, using fill_value for one-sided labels.
# `level` is omitted: it is intended for an `other` indexed by a single level.
result = a.mul(b, fill_value=0)
print(result)
```

**Output**:

```
letter  number
a       1          4.0
b       2         10.0
c       3          0.0
        4          0.0
dtype: float64
```

The two Series are aligned on the full `(letter, number)` labels. `('c', 3)` and `('c', 4)` each exist in only one Series, so the missing side is filled with `0` and both products are `0.0`.

---

### **See Also**

- **`Series.rmul()`**: Reverse of the multiplication operator; computes `other * series`.

This method allows for flexible element-wise multiplication, with powerful handling for missing data. You can perform multiplication with both scalar values and Series, including those with MultiIndex.


In [None]:
""" 
pandas.Series.div

Series.div(other, level=None, fill_value=None, axis=0)

Return Floating division of series and other, element-wise (binary operator truediv).

Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :

other:

Series or scalar value
level: 

int or name

Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value: 
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 

{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
      Series
The result of the operation. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.divide(b, fill_value=0)

In [None]:
""" pandas.Series.truediv
Series.truediv(other, level=None, fill_value=None, axis=0)
Return Floating division of series and other, element-wise (binary operator truediv).

Equivalent to series / other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :

other: 
Series or scalar value

level: 

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value : 
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
Series
The result of the operation.



Series.rtruediv
Reverse of the Floating division operator, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.divide(b, fill_value=0)

## **pandas.Series.truediv – Element-wise Floating Division of Series and Other**

The **`Series.truediv()`** method is used for element-wise division of two Series (or a Series and a scalar). It operates like the `/` operator but adds functionality to handle missing values (`NaN`) using the **`fill_value`** parameter.

### **Syntax**

```python
Series.truediv(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The other Series or scalar value to divide the calling Series by.

- **level**: `int` or `name`, optional, default `None`

  - Used to broadcast the division across a specific level in a MultiIndex.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - This value replaces missing values (`NaN`) in either Series before performing the division. If both Series have `NaN` at the same position, the result is `NaN` even when a `fill_value` is provided.

- **axis**: `{0 or 'index'}`, unused for Series but included for DataFrame compatibility.

### **Returns**

- **Series**: The result of dividing the Series element-wise.

---

### **Examples**

#### **1. Basic Example**

```python
import pandas as pd
import numpy as np

# Creating two Series with NaN values
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Perform floating division of the two Series with fill_value for missing data
result = a.truediv(b, fill_value=0)
print(result)
```

**Output**:

```
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64
```

Here, a value that is missing on only one side is replaced with `0` before the division. So `1 / 1 = 1` for `'a'`, `1 / 0 = inf` for `'b'` and `'c'`, and `0 / 1 = 0` for `'d'`. Index `'e'` is missing on both sides, so it stays `NaN`.
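
Dividing a non-zero value by a filled-in `0` yields `inf` rather than raising an error. If infinite values are unwanted downstream, one option (an assumption about the use case, not part of `truediv` itself) is to detect them with `np.isinf` and replace them with `NaN`:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

result = a.truediv(b, fill_value=0)

# Flag the positions where division by zero produced an infinite value
is_infinite = np.isinf(result)

# Replace +inf / -inf with NaN so later aggregations skip them
cleaned = result.replace([np.inf, -np.inf], np.nan)
print(cleaned)
```

Choosing a non-zero `fill_value` for the divisor is another way to avoid the infinities in the first place.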

#### **2. Division with Scalar**

```python
# Dividing a Series by a scalar value
a = pd.Series([1, 2, 3])
result = a.truediv(3)  # Dividing each element by 3
print(result)
```

**Output**:

```
0    0.333333
1    0.666667
2    1.000000
dtype: float64
```

Each element of the Series is divided by `3`, resulting in `[1/3, 2/3, 3/3]`.

#### **3. Handling Missing Values with `fill_value`**

```python
# Example with missing values in both Series
a = pd.Series([np.nan, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([1, np.nan, 4], index=['a', 'b', 'c'])

# Perform division with a fill_value
result = a.truediv(b, fill_value=0)
print(result)
```

**Output**:

```
a    0.00
b     inf
c    0.75
dtype: float64
```

The missing value at index `'a'` in `a` is replaced by `0`, giving `0 / 1 = 0`. The missing value at `'b'` in `b` is also replaced by `0`, so `2 / 0` yields `inf`. For `'c'`, both values are present and the result is `3 / 4 = 0.75`.

#### **4. Division with MultiIndex**

```python
# Creating MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)

index2 = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 4)], names=['letter', 'number'])
b = pd.Series([4, 5, 6], index=index2)

# Divide two MultiIndex Series, using fill_value for one-sided labels.
# `level` is omitted: it is intended for an `other` indexed by a single level.
result = a.truediv(b, fill_value=0)
print(result)
```

**Output**:

```
letter  number
a       1         0.25
b       2         0.40
c       3          inf
        4         0.00
dtype: float64
```

The two Series are aligned on the full `(letter, number)` labels. `('c', 3)` exists only in `a`, so it is divided by the fill value (`3 / 0 = inf`); `('c', 4)` exists only in `b`, so the fill value becomes the numerator (`0 / 6 = 0`).

---

### **See Also**

- **`Series.rtruediv()`**: Reverse of the floating division operator.

The `truediv()` method is useful for handling element-wise division while accounting for missing values and supporting MultiIndex operations.


In [None]:
""" pandas.Series.floordiv

Series.floordiv(other, level=None, fill_value=None, axis=0)


Return Integer division of series and other, element-wise (binary operator floordiv).

Equivalent to series // other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :
other:

Series or scalar value
level:

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value: 
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation.



Series.rfloordiv
Reverse of the Integer division operator, see Python documentation for more details """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.floordiv(b, fill_value=0)

## **pandas.Series.floordiv – Element-wise Integer Division of Series and Other**

The **`Series.floordiv()`** method is used for element-wise integer division between two Series (or a Series and a scalar). It behaves similarly to the `//` operator in Python, and it supports the **`fill_value`** parameter to handle missing (`NaN`) values in the Series.

### **Syntax**

```python
Series.floordiv(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The other Series or scalar value to divide the calling Series by.

- **level**: `int` or `name`, optional, default `None`

  - Specifies which level of a MultiIndex to use when broadcasting the division.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Used to replace missing values (`NaN`) before performing the division. If both Series contain `NaN` values at the same location, the result is `NaN` even when a `fill_value` is provided.

- **axis**: `{0 or 'index'}`, unused for Series but included for DataFrame compatibility.

### **Returns**

- **Series**: The result of the integer division (floored).

---

### **Examples**

#### **1. Basic Example**

```python
import pandas as pd
import numpy as np

# Creating two Series with NaN values
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Perform integer division with fill_value for missing data
result = a.floordiv(b, fill_value=0)
print(result)
```

**Output**:

```
a    1.0
b    inf
c    inf
d    0.0
e    NaN
dtype: float64
```

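In this case:

- `'a'`: `1 // 1 = 1`.
- `'b'`: the `NaN` in `b` is filled with `0`, so `1 // 0 = inf` (floor division of a positive value by zero returns infinity).
- `'c'`: the label is absent from `b`, so it is also filled with `0` and `1 // 0 = inf`.
- `'d'`: the `NaN` in `a` is filled with `0`, so `0 // 1 = 0`.
- `'e'`: the value is missing on both sides, so the result is `NaN`.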

#### **2. Integer Division with Scalar**

```python
# Dividing a Series by a scalar value using floor division
a = pd.Series([10, 20, 30])
result = a.floordiv(3)  # Floor division by 3
print(result)
```

**Output**:

```
0     3
1     6
2    10
dtype: int64
```

Each element of the Series is divided by `3` and floored (rounded down to the nearest integer).

#### **3. Handling Missing Values with `fill_value`**

```python
# Example with missing values
a = pd.Series([np.nan, 20, 30], index=['a', 'b', 'c'])
b = pd.Series([1, np.nan, 4], index=['a', 'b', 'c'])

# Perform floor division with a fill_value
result = a.floordiv(b, fill_value=0)
print(result)
```

**Output**:

```
a    0.0
b    inf
c    7.0
dtype: float64
```

Here:

- `'a'`: the `NaN` in `a` is filled with `0`, so `0 // 1 = 0`.
- `'b'`: the `NaN` in `b` is filled with `0`, so `20 // 0 = inf`.
- `'c'`: `30 // 4 = 7`.

#### **4. Integer Division with MultiIndex**

```python
# Creating MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([10, 20, 30], index=index)

index2 = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 4)], names=['letter', 'number'])
b = pd.Series([1, 5, 6], index=index2)

# Floor-divide two MultiIndex Series, using fill_value for one-sided labels.
# `level` is omitted: it is intended for an `other` indexed by a single level.
result = a.floordiv(b, fill_value=0)
print(result)
```

**Output**:

```
letter  number
a       1         10.0
b       2          4.0
c       3          inf
        4          0.0
dtype: float64
```

The two Series are aligned on the full `(letter, number)` labels. `('c', 3)` exists only in `a`, so it is divided by the fill value (`30 // 0 = inf`); `('c', 4)` exists only in `b`, so the fill value becomes the numerator (`0 // 6 = 0`). The result is `float64` because alignment introduces missing values.

---

### **See Also**

- **`Series.rfloordiv()`**: Reverse of the floor division operator.

The **`floordiv()`** method is helpful for performing element-wise integer division, with special handling for missing values and MultiIndex support.


In [None]:
""" pandas.Series.mod
Series.mod(other, level=None, fill_value=None, axis=0)
Return Modulo of series and other, element-wise (binary operator mod).

Equivalent to series % other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters
:
other: 
Series or scalar value
level: 
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value: 
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns
:
Series
The result of the operation.

Series.rmod
Reverse of the Modulo operator, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.mod(b, fill_value=0)

## **pandas.Series.mod – Element-wise Modulo of Series and Other**

The **`Series.mod()`** method performs element-wise modulo operation between two Series (or a Series and a scalar). It works similarly to the `%` operator in Python, but also supports the **`fill_value`** parameter for handling missing (`NaN`) values during the calculation.

### **Syntax**

```python
Series.mod(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The other Series or scalar value to calculate the modulo with.

- **level**: `int` or `name`, optional, default `None`

  - Specifies the level of a MultiIndex to be used for broadcasting the modulo operation.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Fills missing (`NaN`) values in the Series before performing the modulo operation.

- **axis**: `{0 or 'index'}`, unused for Series, included for compatibility with DataFrame.

### **Returns**

- **Series**: The result of the modulo operation.

---

### **Examples**

#### **1. Basic Modulo Operation**

```python
import pandas as pd
import numpy as np

# Creating two Series
a = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
b = pd.Series([3, 7, 5], index=['a', 'b', 'c'])

# Perform element-wise modulo
result = a.mod(b)
print(result)
```

**Output**:

```
a    1
b    6
c    0
dtype: int64
```

Here:

- `10 % 3 = 1`
- `20 % 7 = 6`
- `30 % 5 = 0`
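
The modulo result is tied to floor division by the identity `a == (a // b) * b + a % b`. A quick check of that identity with the flexible methods, using the same values as above (the names `quotient`, `remainder`, and `rebuilt` are only illustrative):

```python
import pandas as pd

a = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
b = pd.Series([3, 7, 5], index=['a', 'b', 'c'])

quotient = a.floordiv(b)   # 3, 2, 6
remainder = a.mod(b)       # 1, 6, 0

# Rebuild the original values from quotient and remainder
rebuilt = quotient.mul(b).add(remainder)
print(rebuilt.equals(a))
```

The identity is only meaningful for non-zero divisors; with a zero divisor the quotient and remainder are no longer finite numbers.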

#### **2. Modulo with Missing Values (Using `fill_value`)**

```python
# Series with NaN values
a = pd.Series([10, 20, np.nan], index=['a', 'b', 'c'])
b = pd.Series([3, np.nan, 5], index=['a', 'b', 'c'])

# Perform modulo with fill_value=0 for missing data
result = a.mod(b, fill_value=0)
print(result)
```

**Output**:

```
a    1.0
b    NaN
c    0.0
dtype: float64
```

Here:

- `10 % 3 = 1`
- The missing value in `b` is filled with `0`, and `20 % 0` is `NaN` because modulo by zero is undefined.
- The missing value in `a` is filled with `0`, so `0 % 5 = 0`.

#### **3. Modulo with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([10, 20, 30], index=index)

index2 = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 4)], names=['letter', 'number'])
b = pd.Series([3, 7, 5], index=index2)

# Modulo of two MultiIndex Series, using fill_value for one-sided labels.
# `level` is omitted: it is intended for an `other` indexed by a single level.
result = a.mod(b, fill_value=0)
print(result)
```

**Output**:

```
letter  number
a       1         1.0
b       2         6.0
c       3         NaN
        4         0.0
dtype: float64
```

The two Series are aligned on the full `(letter, number)` labels. `('c', 3)` exists only in `a`, so the divisor is the fill value and `30 % 0` is `NaN`; `('c', 4)` exists only in `b`, so `0 % 5 = 0`.

---

### **See Also**

- **`Series.rmod()`**: Reverse of the modulo operator.

This method is helpful when you want to calculate modulo with extra handling for missing data and support for MultiIndex alignment.


In [None]:
""" pandas.Series.pow
Series.pow(other, level=None, fill_value=None, axis=0)
Return Exponential power of series and other, element-wise (binary operator pow).

Equivalent to series ** other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters
:
other: 
Series or scalar value
level: 
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value: 
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
Series
The result of the operation. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.pow(b, fill_value=0)

## **pandas.Series.pow – Exponential Power of Series and Other**

The **`Series.pow()`** method performs element-wise exponentiation between two Series (or a Series and a scalar). It works like the `**` operator in Python, but includes additional support for handling missing (`NaN`) values through the **`fill_value`** parameter.

### **Syntax**

```python
Series.pow(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The other Series or scalar value to perform the exponentiation operation with.

- **level**: `int` or `name`, optional, default `None`

  - Specifies the level of a MultiIndex to broadcast the power operation across.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Fills missing (`NaN`) values in the Series before performing the power operation.

- **axis**: `{0 or 'index'}`, unused for Series, included for compatibility with DataFrame.

### **Returns**

- **Series**: The result of the exponentiation operation.

---

### **Examples**

#### **1. Basic Exponentiation**

```python
import pandas as pd
import numpy as np

# Creating two Series
a = pd.Series([2, 3, 4], index=['a', 'b', 'c'])
b = pd.Series([2, 2, 2], index=['a', 'b', 'c'])

# Perform element-wise exponentiation
result = a.pow(b)
print(result)
```

**Output**:

```
a     4
b     9
c    16
dtype: int64
```

Here:

- `2 ** 2 = 4`
- `3 ** 2 = 9`
- `4 ** 2 = 16`

#### **2. Exponentiation with Missing Values (Using `fill_value`)**

```python
# Series with NaN values
a = pd.Series([2, 3, 4, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([2, np.nan, 2, np.nan], index=['a', 'b', 'c', 'd'])

# Perform exponentiation with fill_value=1 for missing data
result = a.pow(b, fill_value=1)
print(result)
```

**Output**:

```
a     4.0
b     3.0
c    16.0
d     NaN
dtype: float64
```

Here:

- `2 ** 2 = 4`
- `3 ** 1 = 3` (the `NaN` in `b` is filled with `1`)
- `4 ** 2 = 16`
- `'d'` is `NaN` in both Series, so no filling takes place and the result stays `NaN`
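
pandas fills a position only when exactly one operand is missing there; when both are missing, the result stays missing. The sketch below reproduces that rule by hand for illustration (the names `one_sided` and `manual` are hypothetical, and this is not a claim about how pandas implements it internally):

```python
import numpy as np
import pandas as pd

a = pd.Series([2, 3, 4, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([2, np.nan, 2, np.nan], index=['a', 'b', 'c', 'd'])

# Positions where exactly one operand is missing are the only ones filled
one_sided = a.isna() ^ b.isna()

# Reproduce fill_value=1 by hand: fill only the one-sided gaps, then use **
manual = a.mask(one_sided & a.isna(), 1) ** b.mask(one_sided & b.isna(), 1)

result = a.pow(b, fill_value=1)
print(result.equals(manual))
```

Index `'b'` is filled because only `b` is missing there, while index `'d'` is left as `NaN` because both sides are missing.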

#### **3. Exponentiation with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([2, 3, 4], index=index)

index2 = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
b = pd.Series([2, 2, 2], index=index2)

# Perform exponentiation with MultiIndex
result = a.pow(b, level='number')
print(result)
```

**Output**:

```
letter  number
a       1         4
b       2         9
c       3        16
dtype: int64
```

Both Series share the same MultiIndex, so each value is raised to the power stored under the same `(letter, number)` label; `level` has no additional effect here.

---

### **See Also**

- **`Series.rpow()`**: Reverse of the exponential power operator.

This method is useful when you need to compute exponentiation with proper handling of missing values and support for MultiIndex alignment.


In [None]:
""" pandas.Series.radd
Series.radd(other, level=None, fill_value=None, axis=0)
Return Addition of series and other, element-wise (binary operator radd).

Equivalent to other + series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :

other: 

Series or scalar value

level: 
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value :

None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis:
 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
Series
The result of the operation.

Series.add
Element-wise Addition, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.add(b, fill_value=0)

## **pandas.Series.radd – Reverse Addition of Series and Other**

The **`Series.radd()`** method performs element-wise addition between a Series and another Series (or a scalar). This is the reverse of the **`Series.add()`** operation, meaning it is effectively **`other + series`**. It allows handling missing (`NaN`) values through the **`fill_value`** parameter.

### **Syntax**

```python
Series.radd(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The other Series or scalar value to perform the addition operation with.

- **level**: `int` or `name`, optional, default `None`

  - Used in case of MultiIndex to broadcast across a level.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Fills missing (`NaN`) values before performing the addition.

- **axis**: `{0 or 'index'}`, unused for Series but needed for compatibility with DataFrame.

### **Returns**

- **Series**: The result of the addition operation.

---

### **Examples**

#### **1. Basic Addition**

```python
import pandas as pd
import numpy as np

# Creating two Series
a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([4, 5, 6], index=['a', 'b', 'c'])

# Reverse addition: other + series
result = a.radd(b)
print(result)
```

**Output**:

```
a    5
b    7
c    9
dtype: int64
```

Here, `b + a` gives the same result as `a + b` because addition is commutative.
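
Operand order does matter when `+` is not commutative, for example with string concatenation. A small sketch with made-up labels (the names `names`, `suffixed`, and `prefixed` are only illustrative):

```python
import pandas as pd

names = pd.Series(['alpha', 'beta'])

suffixed = names.add('_x')   # series + other
prefixed = names.radd('x_')  # other + series
print(suffixed.tolist())
print(prefixed.tolist())
```

With `add` the scalar is appended to each string, while `radd` places it in front.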

#### **2. Addition with Missing Values (Using `fill_value`)**

```python
# Series with NaN values
a = pd.Series([1, 2, 3, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, np.nan, 6, 7], index=['a', 'b', 'c', 'd'])

# Perform addition with fill_value=0 for missing data
result = a.radd(b, fill_value=0)
print(result)
```

**Output**:

```
a    5.0
b    2.0
c    9.0
d    7.0
dtype: float64
```

Here:

- `4 + 1 = 5`
- `0 + 2 = 2` (the `NaN` in `b` is filled with `0`)
- `6 + 3 = 9`
- `7 + 0 = 7` (the `NaN` in `a` is filled with `0`)

#### **3. Reverse Addition with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)

# Reverse addition with MultiIndex
b = pd.Series([4, 5, 6], index=index)

result = a.radd(b)
print(result)
```

**Output**:

```
letter  number
a       1         5
b       2         7
c       3         9
dtype: int64
```

Both Series share the same MultiIndex, so the values are aligned on the full `(letter, number)` labels and added pairwise.

---

### **See Also**

- **`Series.add()`**: Regular element-wise addition.

The `radd()` method is helpful when you need the reverse of the addition operation, particularly when working with multi-indexed data or handling missing values.


In [None]:
""" pandas.Series.rsub
Series.rsub(other, level=None, fill_value=None, axis=0)
Return Subtraction of series and other, element-wise (binary operator rsub).

Equivalent to other - series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :
other:

Series or scalar value
level:

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value: 

None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns
:
Series
The result of the operation.


Series.sub
Element-wise Subtraction, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.subtract(b, fill_value=0)

## **pandas.Series.rsub – Reverse Subtraction of Series and Other**

The **`Series.rsub()`** method performs element-wise subtraction between a Series and another Series (or scalar), but it works in reverse. This means it performs the operation as **`other - series`**. It also supports handling missing (`NaN`) values using the **`fill_value`** parameter.

### **Syntax**

```python
Series.rsub(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The Series or scalar value from which the calling Series is subtracted (the left-hand operand).

- **level**: `int` or `name`, optional, default `None`

  - Used when dealing with MultiIndex data, allowing the operation to be broadcast across a specific level.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Fills missing (`NaN`) values before performing the subtraction.

- **axis**: `{0 or 'index'}`, unused for Series but needed for compatibility with DataFrame.

### **Returns**

- **Series**: The result of the subtraction operation.

---

### **Examples**

#### **1. Basic Subtraction**

```python
import pandas as pd

# Creating two Series
a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([4, 5, 6], index=['a', 'b', 'c'])

# Reverse subtraction: other - series
result = a.rsub(b)
print(result)
```

**Output**:

```
a    3
b    3
c    3
dtype: int64
```

Here, `b - a` gives the result:

- `4 - 1 = 3`
- `5 - 2 = 3`
- `6 - 3 = 3`

#### **2. Subtraction with Missing Values (Using `fill_value`)**

```python
import numpy as np

# Series with NaN values
a = pd.Series([1, 2, 3, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, np.nan, 6, 7], index=['a', 'b', 'c', 'd'])

# Perform subtraction with fill_value=0 for missing data
result = a.rsub(b, fill_value=0)
print(result)
```

**Output**:

```
a    3.0
b   -2.0
c    3.0
d    7.0
dtype: float64
```

Here, each `NaN` is replaced with `0` before `b - a` is computed:

- `4 - 1 = 3`
- `0 - 2 = -2` (the `NaN` in `b` is filled with `0`)
- `6 - 3 = 3`
- `7 - 0 = 7` (the `NaN` in `a` is filled with `0`)
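`fill_value` also covers labels that exist in only one of the two Series. The sketch below reuses the two Series from the code cells at the top of this section, whose indexes only partly overlap; a position that is missing on both sides stays `NaN`.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# b - a over the union of both indexes; a value missing on one side becomes 0
result = a.rsub(b, fill_value=0)
print(result)
# a    0.0
# b   -1.0
# c   -1.0
# d    1.0
# e    NaN   <- absent from `a` and NaN in `b`, so nothing can be filled
```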

#### **3. Reverse Subtraction with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)

# Reverse subtraction with MultiIndex
b = pd.Series([4, 5, 6], index=index)

result = a.rsub(b)
print(result)
```

**Output**:

```
letter  number
a       1         3
b       2         3
c       3         3
dtype: int64
```

In this case, both Series share the same MultiIndex, so values are aligned on each full `(letter, number)` pair before `b - a` is computed.

---

### **See Also**

- **`Series.sub()`**: Regular element-wise subtraction.

The **`rsub()`** method is useful when the Series must be the right-hand operand of a subtraction, for example `10 - s`, and you also need `fill_value` or `level`, which the plain `-` operator does not accept.


In [None]:
""" pandas.Series.rmul
Series.rmul(other, level=None, fill_value=None, axis=0)
Return Multiplication of series and other, element-wise (binary operator rmul).

Equivalent to other * series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other:

Series or scalar value
level:

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value:

None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis:

{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation.


Series.mul
Element-wise Multiplication, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.multiply(b, fill_value=0)

## **pandas.Series.rmul – Reverse Multiplication of Series and Other**

The **`Series.rmul()`** method performs element-wise multiplication of a Series and another Series (or scalar) in reverse. This means it performs the operation as **`other * series`**, with support for handling missing (`NaN`) values using the **`fill_value`** parameter.
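Because multiplication of numbers commutes, `rmul()` and `mul()` agree on numeric data. The sketch below checks this; the values and variable names are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
other = pd.Series([4, 5, 6], index=['a', 'b', 'c'])

reverse = s.rmul(other)   # other * s
forward = s.mul(other)    # s * other
doubled = s.rmul(2)       # 2 * s

print(reverse.equals(forward))  # True
print(doubled.tolist())         # [2, 4, 6]
```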

### **Syntax**

```python
Series.rmul(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The Series or scalar value to multiply by the Series.

- **level**: `int` or `name`, optional, default `None`

  - Used for MultiIndex, to specify which level to perform the operation across.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Fills missing (`NaN`) values before performing the multiplication.

- **axis**: `{0 or 'index'}`, unused for Series but necessary for DataFrame compatibility.

### **Returns**

- **Series**: The result of the multiplication operation.

---

### **Examples**

#### **1. Basic Multiplication**

```python
import pandas as pd

# Creating two Series
a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([4, 5, 6], index=['a', 'b', 'c'])

# Reverse multiplication: other * series
result = a.rmul(b)
print(result)
```

**Output**:

```
a     4
b    10
c    18
dtype: int64
```

Here, `b * a` gives:

- `4 * 1 = 4`
- `5 * 2 = 10`
- `6 * 3 = 18`

#### **2. Multiplication with Missing Values (Using `fill_value`)**

```python
import numpy as np

# Series with NaN values
a = pd.Series([1, 2, 3, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, np.nan, 6, 7], index=['a', 'b', 'c', 'd'])

# Perform multiplication with fill_value=0 for missing data
result = a.rmul(b, fill_value=0)
print(result)
```

**Output**:

```
a     4.0
b     0.0
c    18.0
d     0.0
dtype: float64
```

Here:

- `4 * 1 = 4`
- `0 * 2 = 0` (the `NaN` in `b` is filled with `0`)
- `6 * 3 = 18`
- `7 * 0 = 0` (the `NaN` in `a` is filled with `0`)

#### **3. Reverse Multiplication with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)

# Reverse multiplication with MultiIndex
b = pd.Series([4, 5, 6], index=index)

result = a.rmul(b)
print(result)
```

**Output**:

```
letter  number
a       1         4
b       2        10
c       3        18
dtype: int64
```

In this case, both Series share the same MultiIndex, so values are aligned on each full `(letter, number)` pair before they are multiplied.

---

### **See Also**

- **`Series.mul()`**: Regular element-wise multiplication.

For numeric data, **`rmul()`** returns the same values as `mul()`; its main use is supplying `fill_value` or `level` when the Series is the right-hand operand.


In [None]:
""" pandas.Series.rdiv
Series.rdiv(other, level=None, fill_value=None, axis=0)
Return Floating division of series and other, element-wise (binary operator rtruediv).

Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :

other:

Series or scalar value
level:

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value:

None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis:

{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
Series
The result of the operation.

See also

Series.truediv
Element-wise Floating division, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.divide(b, fill_value=0)

## **pandas.Series.rdiv – Reverse Floating Division of Series and Other**

The **`Series.rdiv()`** method performs floating division of a Series and another Series (or scalar) in reverse. This means it performs the operation as **`other / series`**, with support for handling missing (`NaN`) values using the **`fill_value`** parameter.
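As the quoted documentation notes, `rdiv` maps to the binary operator `rtruediv`, so both names should produce the same result. The sketch below assumes the installed pandas version exposes both spellings; the values are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 4], index=['a', 'b', 'c'])

via_rdiv = s.rdiv(8)          # 8 / s
via_rtruediv = s.rtruediv(8)  # same computation under the other name

print(via_rdiv.tolist())              # [8.0, 4.0, 2.0]
print(via_rdiv.equals(via_rtruediv))  # True
```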

### **Syntax**

```python
Series.rdiv(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The Series or scalar value used as the dividend (numerator); the calling Series is the divisor.

- **level**: `int` or `name`, optional, default `None`

  - Used for MultiIndex, to specify which level to perform the operation across.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Fills missing (`NaN`) values before performing the division.

- **axis**: `{0 or 'index'}`, unused for Series but necessary for DataFrame compatibility.

### **Returns**

- **Series**: The result of the division operation.

---

### **Examples**

#### **1. Basic Division**

```python
import pandas as pd

# Creating two Series
a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([4, 5, 6], index=['a', 'b', 'c'])

# Reverse division: other / series
result = a.rdiv(b)
print(result)
```

**Output**:

```
a     4.0
b     2.5
c     2.0
dtype: float64
```

Here, `b / a` gives:

- `4 / 1 = 4.0`
- `5 / 2 = 2.5`
- `6 / 3 = 2.0`

#### **2. Division with Missing Values (Using `fill_value`)**

```python
import numpy as np

# Series with NaN values
a = pd.Series([1, 2, 3, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, np.nan, 6, 7], index=['a', 'b', 'c', 'd'])

# Perform division with fill_value=1 for missing data
result = a.rdiv(b, fill_value=1)
print(result)
```

**Output**:

```
a    4.0
b    0.5
c    2.0
d    7.0
dtype: float64
```

Here, each `NaN` is replaced with `1` before `b / a` is computed:

- `4 / 1 = 4.0`
- `1 / 2 = 0.5` (the `NaN` in `b` is filled with `1`)
- `6 / 3 = 2.0`
- `7 / 1 = 7.0` (the `NaN` in `a` is filled with `1`)

#### **3. Reverse Division with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)

# Reverse division with MultiIndex
b = pd.Series([4, 5, 6], index=index)

result = a.rdiv(b)
print(result)
```

**Output**:

```
letter  number
a       1         4.0
b       2         2.5
c       3         2.0
dtype: float64
```

In this case, both Series share the same MultiIndex, so values are aligned on each full `(letter, number)` pair before `b / a` is computed.

---

### **See Also**

- **`Series.truediv()`**: Regular element-wise floating division.

The **`rdiv()`** method is helpful when the Series must be the divisor, for example `1 / s`, and you also need `fill_value` or `level`, which the plain `/` operator does not accept.


In [None]:
""" pandas.Series.rtruediv

Series.rtruediv(other, level=None, fill_value=None, axis=0)
Return Floating division of series and other, element-wise (binary operator rtruediv).

Equivalent to other / series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :

other: 

Series or scalar value
level:

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value :

None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis: 
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation.



Series.truediv
Element-wise Floating division, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:

a.divide(b, fill_value=0)

## **pandas.Series.rtruediv – Reverse Floating Division of Series and Other**

The **`Series.rtruediv()`** method performs floating division of a Series and another Series (or scalar) in reverse, i.e., it performs the operation as **`other / series`**. It also supports handling missing (`NaN`) values by using the **`fill_value`** parameter.
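A common use of the reversed form is taking reciprocals, where the Series supplies the denominators. The sketch below uses illustrative values, including a zero to show that the result is always floating point and that dividing by zero yields `inf` instead of raising.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 2, 4], index=['a', 'b', 'c'])

# 1 / s: the scalar is the numerator, the Series holds the denominators
reciprocal = s.rtruediv(1)

print(reciprocal.dtype)          # float64, even though `s` holds integers
print(np.isinf(reciprocal['a'])) # True: 1 / 0 becomes inf
print(reciprocal['b'], reciprocal['c'])  # 0.5 0.25
```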

### **Syntax**

```python
Series.rtruediv(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The Series or scalar value used as the dividend (numerator); the calling Series is the divisor.

- **level**: `int` or `name`, optional, default `None`

  - Used for MultiIndex, to specify which level to perform the operation across.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Fills missing (`NaN`) values before performing the division.

- **axis**: `{0 or 'index'}`, unused for Series but necessary for DataFrame compatibility.

### **Returns**

- **Series**: The result of the division operation.

---

### **Examples**

#### **1. Basic Reverse Division**

```python
import pandas as pd

# Creating two Series
a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([4, 5, 6], index=['a', 'b', 'c'])

# Reverse division: other / series
result = a.rtruediv(b)
print(result)
```

**Output**:

```
a     4.0
b     2.5
c     2.0
dtype: float64
```

Here, `b / a` gives:

- `4 / 1 = 4.0`
- `5 / 2 = 2.5`
- `6 / 3 = 2.0`

#### **2. Reverse Division with Missing Values (Using `fill_value`)**

```python
import numpy as np

# Series with NaN values
a = pd.Series([1, 2, 3, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, np.nan, 6, 7], index=['a', 'b', 'c', 'd'])

# Perform reverse division with fill_value=1 for missing data
result = a.rtruediv(b, fill_value=1)
print(result)
```

**Output**:

```
a    4.0
b    0.5
c    2.0
d    7.0
dtype: float64
```

Here, each `NaN` is replaced with `1` before `b / a` is computed:

- `4 / 1 = 4.0`
- `1 / 2 = 0.5` (the `NaN` in `b` is filled with `1`)
- `6 / 3 = 2.0`
- `7 / 1 = 7.0` (the `NaN` in `a` is filled with `1`)

#### **3. Reverse Division with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([1, 2, 3], index=index)

# Reverse division with MultiIndex
b = pd.Series([4, 5, 6], index=index)

result = a.rtruediv(b)
print(result)
```

**Output**:

```
letter  number
a       1         4.0
b       2         2.5
c       3         2.0
dtype: float64
```

In this case, both Series share the same MultiIndex, so values are aligned on each full `(letter, number)` pair before `b / a` is computed.

---

### **See Also**

- **`Series.truediv()`**: Regular element-wise floating division.

The **`rtruediv()`** method is useful when the Series must be the divisor, for example `1 / s`, and you also need `fill_value` or `level`, which the plain `/` operator does not accept.


In [None]:
""" pandas.Series.rfloordiv
Series.rfloordiv(other, level=None, fill_value=None, axis=0)
Return Integer division of series and other, element-wise (binary operator rfloordiv).

Equivalent to other // series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :
other:

Series or scalar value
level:

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value: 

None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis:

{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
Series:
The result of the operation.



Series.floordiv
Element-wise Integer division, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.floordiv(b, fill_value=0)

## **pandas.Series.rfloordiv – Reverse Integer Division of Series and Other**

The **`Series.rfloordiv()`** method performs reverse integer division of a Series and another Series (or scalar). Essentially, it computes the operation as **`other // series`**. It also supports handling missing values (`NaN`) by using the **`fill_value`** parameter.
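The operand order matters more for floor division than for most operators, so it helps to see it with a scalar. The sketch below uses illustrative values; it also shows that floor division rounds toward negative infinity, not toward zero.

```python
import pandas as pd

s = pd.Series([2, 5, 9], index=['a', 'b', 'c'])

# 7 // s: the scalar is the dividend, the Series holds the divisors
quotients = s.rfloordiv(7)

# A negative dividend rounds down, away from zero
negative = s.rfloordiv(-7)

print(quotients.tolist())  # [3, 1, 0]
print(negative.tolist())   # [-4, -2, -1]
```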

### **Syntax**

```python
Series.rfloordiv(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The Series or scalar value used as the dividend; the calling Series is the divisor.

- **level**: `int` or `name`, optional, default `None`

  - For MultiIndex, specifies which level to apply the operation across.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - If the data contains `NaN` values, this parameter fills missing values before performing the division.

- **axis**: `{0 or 'index'}`, unused for Series but necessary for DataFrame compatibility.

### **Returns**

- **Series**: The result of the integer division operation.

---

### **Examples**

#### **1. Basic Reverse Integer Division**

```python
import pandas as pd

# Creating two Series
a = pd.Series([2, 5, 9], index=['a', 'b', 'c'])
b = pd.Series([4, 2, 3], index=['a', 'b', 'c'])

# Reverse integer division: other // series
result = a.rfloordiv(b)
print(result)
```

**Output**:

```
a    2
b    0
c    0
dtype: int64
```

Here, `b // a` gives:

- `4 // 2 = 2`
- `2 // 5 = 0`
- `3 // 9 = 0`

#### **2. Reverse Integer Division with Missing Values (Using `fill_value`)**

```python
import numpy as np

# Series with NaN values
a = pd.Series([2, 5, np.nan], index=['a', 'b', 'c'])
b = pd.Series([4, np.nan, 6], index=['a', 'b', 'c'])

# Perform reverse integer division with fill_value=1 for missing data
result = a.rfloordiv(b, fill_value=1)
print(result)
```

**Output**:

```
a    2.0
b    0.0
c    6.0
dtype: float64
```

In this case, each `NaN` is replaced with `1` before `b // a` is computed:

- `4 // 2 = 2`
- `1 // 5 = 0` (the `NaN` in `b` is filled with `1`)
- `6 // 1 = 6` (the `NaN` in `a` is filled with `1`)

#### **3. Reverse Integer Division with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([3, 4, 6], index=index)

# Reverse integer division with MultiIndex
b = pd.Series([9, 16, 27], index=index)

result = a.rfloordiv(b)
print(result)
```

**Output**:

```
letter  number
a       1         3
b       2         4
c       3         4
dtype: int64
```

Here, `b // a` is computed for each `(letter, number)` pair:

- `9 // 3 = 3`
- `16 // 4 = 4`
- `27 // 6 = 4`

---

### **See Also**

- **`Series.floordiv()`**: Regular element-wise integer division.

The **`rfloordiv()`** method is helpful when the Series must be the divisor in an integer division, for example `7 // s`, and you also need `fill_value` or `level`, which the plain `//` operator does not accept.


In [None]:
""" pandas.Series.rpow
Series.rpow(other, level=None, fill_value=None, axis=0)
Return Exponential power of series and other, element-wise (binary operator rpow).

Equivalent to other ** series, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters :

other:

Series or scalar value
level:

int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value:

None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis:

{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns :
Series
The result of the operation.



Series.pow
Element-wise Exponential power, see Python documentation for more details. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.pow(b, fill_value=0)

## **pandas.Series.rpow – Reverse Exponential Power of Series and Other**

The **`Series.rpow()`** method computes the element-wise exponential power of a Series and another Series (or scalar). It’s equivalent to the operation **`other ** series`**, allowing for the inclusion of a **`fill_value`** to handle missing (`NaN`) values.
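The difference between `rpow()` and `pow()` is which operand is the base. The sketch below contrasts the two with a scalar; the values and variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

as_exponent = s.rpow(2)   # 2 ** s: the Series supplies the exponents
as_base = s.pow(2)        # s ** 2: the Series supplies the bases

print(as_exponent.tolist())  # [2, 4, 8]
print(as_base.tolist())      # [1, 4, 9]
```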

### **Syntax**

```python
Series.rpow(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The Series or scalar value used as the base; the calling Series supplies the exponents.

- **level**: `int` or `name`, optional, default `None`

  - If working with a MultiIndex, this parameter allows broadcasting across a specified level.

- **fill_value**: `None` or `float`, optional, default `None` (NaN)

  - Specifies a value to fill any missing (`NaN`) values before performing the operation.

- **axis**: `{0 or 'index'}`, unused for Series but necessary for DataFrame compatibility.

### **Returns**

- **Series**: The result of the exponential power operation.

---

### **Examples**

#### **1. Basic Reverse Exponential Power**

```python
import pandas as pd

# Creating two Series
a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
b = pd.Series([2, 3, 4], index=['a', 'b', 'c'])

# Reverse exponential power: other ** series
result = a.rpow(b)
print(result)
```

**Output**:

```
a     2
b     9
c    64
dtype: int64
```

Here:

- `2 ** 1 = 2`
- `3 ** 2 = 9`
- `4 ** 3 = 64`

#### **2. Reverse Exponential Power with Missing Values (Using `fill_value`)**

```python
import numpy as np

# Series with NaN values
a = pd.Series([1, 2, np.nan], index=['a', 'b', 'c'])
b = pd.Series([2, np.nan, 4], index=['a', 'b', 'c'])

# Reverse exponential power with fill_value=1 for missing data
result = a.rpow(b, fill_value=1)
print(result)
```

**Output**:

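```
a    2.0
b    1.0
c    4.0
dtype: float64
```

In this case, each `NaN` is replaced with `1` before `b ** a` is computed:

- `2 ** 1 = 2`
- `1 ** 2 = 1` (the `NaN` in `b` is filled with `1`)
- `4 ** 1 = 4` (the `NaN` in `a` is filled with `1`)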

#### **3. Reverse Exponential Power with MultiIndex**

```python
# MultiIndex Series
index = pd.MultiIndex.from_tuples([('a', 1), ('b', 2), ('c', 3)], names=['letter', 'number'])
a = pd.Series([3, 4, 6], index=index)

# Reverse exponential power with MultiIndex
b = pd.Series([9, 16, 27], index=index)

result = a.rpow(b)
print(result)
```

**Output**:

```
letter  number
a       1               729
b       2             65536
c       3         387420489
dtype: int64
```

Here, `b ** a` is computed for each `(letter, number)` pair:

- `9 ** 3 = 729`
- `16 ** 4 = 65536`
- `27 ** 6 = 387420489`

---

### **See Also**

- **`Series.pow()`**: Element-wise exponential power.

The **`rpow()`** method is particularly useful when you need to perform a reverse exponential operation (where `other` is the base and `series` is the exponent) and handle missing data properly using the `fill_value` parameter.


In [None]:
""" pandas.Series.combine
Series.combine(other, func, fill_value=None)
Combine the Series with a Series or scalar according to func.

Combine the Series and other using func to perform elementwise selection for combined Series. fill_value is assumed when value is missing at some index from one of the two objects being combined.

Parameters:
other
Series or scalar
The value(s) to be combined with the Series.

func
function
Function that takes two scalars as inputs and returns an element.

fill_value
scalar, optional
The value to assume when an index is missing from one Series or the other. The default specifies to use the appropriate NaN value for the underlying dtype of the Series.

Returns:
Series
The result of combining the Series with the other object. """
s1 = pd.Series({'falcon': 330.0, 'eagle': 160.0})
s1

In [None]:

s2 = pd.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})
s2

In [None]:
s1.combine(s2, max)

In [None]:
s1.combine(s2, max, fill_value=0)

## **pandas.Series.combine – Combine Two Series Using a Function**

The **`Series.combine()`** method is used to combine two Series (or a Series and a scalar) element-wise, using a specified function. You can specify how missing values should be handled with the `fill_value` parameter.

### **Syntax**

```python
Series.combine(other, func, fill_value=None)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The other Series or scalar value to combine with the calling Series.

- **func**: `function`

  - A function that takes two scalars as inputs and returns a scalar value. This function will be applied element-wise between the two Series (or Series and scalar).

- **fill_value**: `scalar`, optional, default `None`
  - The value used when an index is missing from one of the Series. If not provided, `NaN` is used for the missing values.

### **Returns**

- **Series**: A new Series containing the result of applying `func` to the combined elements.

---

### **Examples**

#### **1. Combine Using `max()` Function**

```python
import pandas as pd

# Define two Series
s1 = pd.Series({'falcon': 330.0, 'eagle': 160.0})
s2 = pd.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})

# Combine using max function
result = s1.combine(s2, max)
print(result)
```

**Output**:

```
duck        NaN
eagle     200.0
falcon    345.0
dtype: float64
```

In this example:

- The `max()` function is applied to the two Series element-wise.
- The value for `'duck'` is missing in `s1`, so it results in `NaN`.

#### **2. Combine Using `max()` with `fill_value`**

```python
# Combine using max function with fill_value=0
result = s1.combine(s2, max, fill_value=0)
print(result)
```

**Output**:

```
duck       30.0
eagle     200.0
falcon    345.0
dtype: float64
```

Here:

- The missing `'duck'` value is filled with `0`, and the maximum value is calculated as `30.0`.
- Now, `'duck'` gets a value instead of `NaN`.

#### **3. Combine Using Custom Function**

```python
# Define a custom function
def custom_func(x, y):
    return x * y if x > y else x + y

# Combine using the custom function
result = s1.combine(s2, custom_func, fill_value=1)
print(result)
```

**Output**:

```
duck       31.0
eagle     360.0
falcon    675.0
dtype: float64
```

In this example:

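- The custom function multiplies `x` and `y` if `x > y`; otherwise it adds them.
- With this data, `x` (the value from `s1`, or the `fill_value` of `1` for `'duck'`) is never greater than `y`, so every element takes the addition branch: `1 + 30`, `160 + 200`, and `330 + 345`.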
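Because `func` receives the calling Series' value as its first argument, swapping the two Series can change the result. The sketch below reuses the same data and `custom_func`; the name `swapped` is only illustrative.

```python
import pandas as pd

s1 = pd.Series({'falcon': 330.0, 'eagle': 160.0})
s2 = pd.Series({'falcon': 345.0, 'eagle': 200.0, 'duck': 30.0})

def custom_func(x, y):
    # Multiply when the first value is larger, otherwise add
    return x * y if x > y else x + y

# Now x comes from s2 and y from s1 (or fill_value=1 where s1 has no label)
swapped = s2.combine(s1, custom_func, fill_value=1)

print(swapped['duck'])    # 30.0      (30 > 1, so 30 * 1)
print(swapped['eagle'])   # 32000.0   (200 > 160, so 200 * 160)
print(swapped['falcon'])  # 113850.0  (345 > 330, so 345 * 330)
```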

---

### **See Also**

- **`Series.combine_first()`**: Combine Series values by choosing the calling Series' values first, then the other Series' values if missing.

The **`combine()`** method provides great flexibility for combining Series element-wise using a custom function, allowing for specific handling of missing data with the `fill_value` parameter.


In [None]:
""" pandas.Series.combine_first
Series.combine_first(other)
Update null elements with value in the same location in ‘other’.

Combine two Series objects by filling null values in one Series with non-null values from the other Series. Result index will be the union of the two indexes.

Parameters:
other
Series
The value(s) to be used for filling null values.

Returns:
Series
The result of combining the provided Series with the other object.

See also

Series.combine
Perform element-wise operation on two Series using a given function. """
s1 = pd.Series([1, np.nan])
s2 = pd.Series([3, 4, 5])
s1.combine_first(s2)


In [None]:
s1 = pd.Series({'falcon': np.nan, 'eagle': 160.0})
s2 = pd.Series({'eagle': 200.0, 'duck': 30.0})
s1.combine_first(s2)

## **pandas.Series.combine_first – Combine Two Series by Filling Nulls**

The **`Series.combine_first()`** method is used to combine two Series by filling missing (null) values in one Series with the corresponding non-null values from the other Series. The result will have the union of the indices of both Series.
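One way to picture this behavior is as a reindex onto the union of both indexes followed by a fill. The sketch below compares `combine_first()` with that hand-rolled version; `union` and `manual` are illustrative names, and the manual steps are an approximation for intuition, not pandas' internal implementation.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, np.nan], index=['a', 'b'])
s2 = pd.Series([3, 4, 5], index=['b', 'c', 'd'])

result = s1.combine_first(s2)

# Align both Series on the union index, then keep s1 wherever it is non-null
union = s1.index.union(s2.index)
manual = s1.reindex(union).fillna(s2.reindex(union))

print(result.tolist())  # [1.0, 3.0, 4.0, 5.0]
print(manual.tolist())  # [1.0, 3.0, 4.0, 5.0]
```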

### **Syntax**

```python
Series.combine_first(other)
```

### **Parameters**

- **other**: `Series`
  - The Series used to fill in missing values in the calling Series.

### **Returns**

- **Series**: A new Series where missing values (nulls) in the calling Series are filled with corresponding values from `other`.

---

### **Examples**

#### **1. Combine Two Series with Null Values**

```python
import pandas as pd
import numpy as np

# Define two Series with some null values
s1 = pd.Series([1, np.nan])
s2 = pd.Series([3, 4, 5])

# Combine the Series, filling nulls in s1 with values from s2
result = s1.combine_first(s2)
print(result)
```

**Output**:

```
0    1.0
1    4.0
2    5.0
dtype: float64
```

In this example:

- The first Series `s1` has a `NaN` at index `1`, which is filled by the corresponding value from `s2`.
- The second Series `s2` has extra data at index `2`, which is added to the result.

#### **2. Combine Two Series with Different Indexes**

```python
# Define two Series with different indexes
s1 = pd.Series({'falcon': np.nan, 'eagle': 160.0})
s2 = pd.Series({'eagle': 200.0, 'duck': 30.0})

# Combine the Series, filling nulls in s1 with values from s2
result = s1.combine_first(s2)
print(result)
```

**Output**:

```
duck       30.0
eagle     160.0
falcon      NaN
dtype: float64
```

In this case:

- The missing value for `'falcon'` in `s1` cannot be filled because `s2` has no `'falcon'` entry, so it stays `NaN`.
- The `'eagle'` value in `s1` (`160.0`) is kept because it is not null; the `200.0` in `s2` is ignored.
- `'duck'` exists only in `s2`, so its value (`30.0`) is added to the result.

#### **3. Result with Union of Indexes**

```python
# Combine Series with partially overlapping indexes
s1 = pd.Series([1, np.nan], index=['a', 'b'])
s2 = pd.Series([3, 4, 5], index=['b', 'c', 'd'])

# Combine the Series, filling nulls in s1 with values from s2
result = s1.combine_first(s2)
print(result)
```

**Output**:

```
a    1.0
b    3.0
c    4.0
d    5.0
dtype: float64
```

Here:

- The result's index is the union of the indexes from both Series (`a`, `b`, `c`, `d`).
- The value at index `'a'` comes from `s1`, while the `NaN` at index `'b'` is filled with `3` from `s2`.
- The indexes `'c'` and `'d'` exist only in `s2`, so their values (`4` and `5`) are carried into the result.

---

### **See Also**

- **`Series.combine()`**: Performs element-wise operation on two Series using a given function (for more complex operations than just filling nulls).

The **`combine_first()`** method is useful when you want to "fill in the blanks" of one Series with data from another, ensuring no data is lost where possible.


In [None]:
""" pandas.Series.round
Series.round(decimals=0, *args, **kwargs)
Round each value in a Series to the given number of decimals.

Parameters:
decimals
int, default 0
Number of decimal places to round to. If decimals is negative, it specifies the number of positions to the left of the decimal point.

*args, **kwargs
Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

Returns :


Series
Rounded values of the Series.



numpy.around
Round values of an np.array.

DataFrame.round
Round values of a DataFrame. """
s = pd.Series([0.1, 1.3, 2.7])
s.round()


The **`Series.round()`** method rounds each value in a Series to a specified number of decimal places.

### **Syntax**

```python
Series.round(decimals=0, *args, **kwargs)
```

### **Parameters**

- **decimals**: `int`, default is `0`
  - The number of decimal places to round to. If negative, it rounds to the left of the decimal point (e.g., `-1` rounds to the nearest 10).
- **\*args, \*\*kwargs**: These additional arguments and keywords are accepted for compatibility with NumPy but do not affect the behavior of this method.

### **Returns**

- **Series**: The resulting Series with the values rounded to the specified number of decimal places.

### **Examples**

#### Example 1: Round to the Nearest Integer (0 Decimal Places)

```python
import pandas as pd

# Create a Series with floating-point numbers
s = pd.Series([0.1, 1.3, 2.7])

# Round to the nearest integer
rounded_s = s.round()

print(rounded_s)
```

**Output**:

```
0    0.0
1    1.0
2    3.0
dtype: float64
```

- In this case, the values are rounded to the nearest whole number (i.e., 0 decimal places).

#### Example 2: Round to a Specific Number of Decimal Places

```python
# Round to 1 decimal place
rounded_s = s.round(decimals=1)

print(rounded_s)
```

**Output**:

```
0    0.1
1    1.3
2    2.7
dtype: float64
```

- Here, the values are rounded to 1 decimal place.

#### Example 3: Round to the Nearest Ten (Negative Decimal Places)

```python
# Round to the nearest 10
rounded_s = s.round(decimals=-1)

print(rounded_s)
```

**Output**:

```
0    0.0
1    0.0
2    0.0
dtype: float64
```

- The negative value `decimals=-1` rounds each number to the nearest 10. All three values are below 5, so each rounds to `0.0`.
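Negative `decimals` is easier to see with larger numbers. The sketch below uses illustrative values; the tie-handling lines rest on the assumption that `Series.round()` follows NumPy's round-half-to-even rule, which is worth confirming on your installed version.

```python
import pandas as pd

s = pd.Series([14.9, 15.1, 123.456])

tens = s.round(-1)        # nearest multiple of 10
two_places = s.round(2)   # two digits after the decimal point

# Exact ties go to the nearest even value rather than always up
ties = pd.Series([0.5, 1.5, 2.5]).round()

print(tens.tolist())  # [10.0, 20.0, 120.0]
print(ties.tolist())  # [0.0, 2.0, 2.0]
```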

### **See Also**

- **`numpy.around`**: Equivalent function in NumPy for rounding values in arrays.
- **`DataFrame.round`**: For rounding values in an entire DataFrame.

This method is useful when you need to control the precision of the data in a pandas Series.


In [None]:
""" pandas.Series.lt
Series.lt(other, level=None, fill_value=None, axis=0)
Return Less than of series and other, element-wise (binary operator lt).

Equivalent to series < other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other
Series or scalar value
level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation. """
a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
a

In [None]:
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
b

In [None]:
a.lt(b, fill_value=0)

The **`Series.lt()`** function in pandas is used to compare the elements of a Series with another Series or a scalar value to check for "less than" relationships, element-wise. This function can handle missing data by using a `fill_value` parameter.

### **Syntax**

```python
Series.lt(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`
  - The value(s) you want to compare the Series with. It can be another Series or a scalar value.
- **level**: `int` or `name`, optional

  - If the Series has a `MultiIndex`, this parameter is used to perform the comparison across a specific level of the index.

- **fill_value**: `scalar`, optional, default `None`

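  - Replaces missing (NaN) values, and any label present in only one of the two Series after alignment, before the comparison. If both Series are missing a value at the same label, nothing is filled there and the comparison returns `False`.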

- **axis**: `{0 or 'index'}`, optional
  - This parameter is unused for Series but is included for compatibility with DataFrame operations.

### **Returns**

- **Series**: A Series of boolean values (`True` or `False`), indicating whether the elements of the original Series are less than the corresponding elements in the `other`.

### **Examples**

#### Example 1: Simple Comparison (Without Missing Data)

```python
import pandas as pd

# Create two Series
a = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, 3, 2, 1], index=['a', 'b', 'c', 'd'])

# Compare if elements in 'a' are less than corresponding elements in 'b'
result = a.lt(b)
print(result)
```

**Output**:

```
a     True
b     True
c    False
d    False
dtype: bool
```

- In this case, the result is a boolean Series where each value is `True` if the corresponding value in `a` is less than the value in `b`.

#### Example 2: Comparison with Missing Values

```python
import numpy as np

# Create Series with missing values
a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])

# Compare with fill_value for NaNs
result = a.lt(b, fill_value=0)
print(result)
```

**Output**:

```
a    False
b    False
c     True
d    False
e    False
f     True
dtype: bool
```

- Here, the two Series are aligned on the union of their labels, and missing values are filled with `0` before the comparison. As a result:
  - For index `a`, `1.0 < 0.0` is `False`.
  - For index `b`, `1.0 < 1.0` is `False`.
  - For index `c`, `1.0 < 2.0` is `True`.
  - For index `d`, both values are `NaN`, so nothing is filled and the result is `False`.
  - For index `e`, `b` has no such label, so it is filled with `0`, and `1.0 < 0.0` is `False`.
  - For index `f`, `a` has no such label, so it is filled with `0`, and `0.0 < 1.0` is `True`.
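
Example 2 depends on `lt()` aligning the two Series by label, which the plain `<` operator does not do. The sketch below reuses the same `a` and `b`; the helper variable `operator_failed` is only illustrative:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])

# The plain operator refuses to compare Series whose labels differ
try:
    a < b
    operator_failed = False
except ValueError:
    operator_failed = True
print(operator_failed)  # True

# lt() aligns on the union of labels first; without fill_value the
# unmatched labels hold NaN and therefore compare as False
aligned = a.lt(b)
print(aligned.tolist())  # [False, False, True, False, False, False]
```

Without `fill_value`, label `f` compares as `False` because `a` has nothing to compare there. With `fill_value=0` it becomes `True`, as in Example 2.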

#### Example 3: Comparison with a Scalar Value

```python
# Compare each element in the Series to a scalar value
result = a.lt(2)
print(result)
```

**Output**:

```
a     True
b     True
c     True
d    False
e     True
dtype: bool
```

- Here, every element in `a` is compared to the scalar value `2`. The result is `True` for values less than `2` and `False` otherwise.

### **See Also**

- **`Series.gt()`**: Greater than comparison (`>`).
- **`Series.le()`**: Less than or equal to comparison (`<=`).
- **`Series.ge()`**: Greater than or equal to comparison (`>=`).
- **`Series.eq()`**: Equal to comparison (`==`).

The **`lt()`** function is useful when you need to compare Series element-wise and handle missing values in a controlled way.


In [None]:
""" pandas.Series.gt
Series.gt(other, level=None, fill_value=None, axis=0)
Return Greater than of series and other, element-wise (binary operator gt).

Equivalent to series > other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other
Series or scalar value
level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation. """
a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
a

In [None]:
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
b

In [None]:
a.gt(b, fill_value=0)

The **`Series.gt()`** function in pandas is used to compare the elements of a Series with another Series or a scalar value, checking if they are "greater than" element-wise. It also allows for handling missing data by using the `fill_value` parameter.

### **Syntax**

```python
Series.gt(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`
  - The value(s) you want to compare the Series with. It can be another Series or a scalar value.
- **level**: `int` or `name`, optional

  - If the Series has a `MultiIndex`, this parameter is used to perform the comparison across a specific level of the index.

- **fill_value**: `scalar`, optional, default `None`

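  - Replaces missing (NaN) values, and any label present in only one of the two Series after alignment, before the comparison. If both Series are missing a value at the same label, nothing is filled there and the comparison returns `False`.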

- **axis**: `{0 or 'index'}`, optional
  - This parameter is unused for Series but is included for compatibility with DataFrame operations.

### **Returns**

- **Series**: A Series of boolean values (`True` or `False`), indicating whether the elements of the original Series are greater than the corresponding elements in `other`.

### **Examples**

#### Example 1: Simple Comparison (Without Missing Data)

```python
import pandas as pd

# Create two Series
a = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, 3, 2, 1], index=['a', 'b', 'c', 'd'])

# Compare if elements in 'a' are greater than corresponding elements in 'b'
result = a.gt(b)
print(result)
```

**Output**:

```
a    False
b    False
c     True
d     True
dtype: bool
```

- The result is a boolean Series where each value is `True` if the corresponding value in `a` is greater than the value in `b`.

#### Example 2: Comparison with Missing Values

```python
import numpy as np

# Create Series with missing values
a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])

# Compare with fill_value for NaNs
result = a.gt(b, fill_value=0)
print(result)
```

**Output**:

```
a     True
b    False
c    False
d    False
e     True
f    False
dtype: bool
```

- Here, the two Series are aligned on the union of their labels, and missing values are filled with `0` before the comparison. As a result:
  - For index `a`, `1.0 > 0.0` is `True`.
  - For index `b`, `1.0 > 1.0` is `False`.
  - For index `c`, `1.0 > 2.0` is `False`.
  - For index `d`, both values are `NaN`, so nothing is filled and the result is `False`.
  - For index `e`, `b` has no such label, so it is filled with `0`, and `1.0 > 0.0` is `True`.
  - For index `f`, `a` has no such label, so it is filled with `0`, and `0.0 > 1.0` is `False`.

#### Example 3: Comparison with a Scalar Value

```python
# Compare each element in the Series to a scalar value
result = a.gt(0)
print(result)
```

**Output**:

```
a     True
b     True
c     True
d    False
e     True
dtype: bool
```

- Here, every element in `a` is compared to the scalar value `0`. The result is `True` for values greater than `0` and `False` otherwise.
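
A common use of the boolean Series returned by `gt()` is as a mask for filtering. The sketch below reuses `a` from Example 2; the names `mask` and `positive` are only illustrative:

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])

# Boolean mask: True where the value is greater than 0
mask = a.gt(0)

# Indexing with the mask keeps only the rows where it is True;
# the NaN at 'd' compares as False and is dropped
positive = a[mask]
print(positive.index.tolist())  # ['a', 'b', 'c', 'e']
```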

### **See Also**

- **`Series.lt()`**: Less than comparison (`<`).
- **`Series.le()`**: Less than or equal to comparison (`<=`).
- **`Series.ge()`**: Greater than or equal to comparison (`>=`).
- **`Series.eq()`**: Equal to comparison (`==`).

The **`gt()`** function is useful when you need to compare Series element-wise and handle missing values in a controlled way.


In [None]:
""" pandas.Series.le
Series.le(other, level=None, fill_value=None, axis=0)
Return Less than or equal to of series and other, element-wise (binary operator le).

Equivalent to series <= other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other
Series or scalar value
level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation. """
a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
a


In [None]:
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
b

In [None]:
a.le(b, fill_value=0)

The **`Series.le()`** function in pandas is used to compare the elements of a Series to another Series or a scalar value to check if they are "less than or equal to" each other element-wise. It also provides support for handling missing (NaN) values by substituting them with a specified `fill_value`.

### **Syntax**

```python
Series.le(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`
  - The value(s) with which to compare the Series. It can be another Series or a scalar value.
- **level**: `int` or `name`, optional

  - If the Series has a `MultiIndex`, this parameter allows comparison across a specific level of the index.

- **fill_value**: `scalar`, optional, default `None`

  - Replaces missing (NaN) values, and any label present in only one of the two Series after alignment, before the comparison. If both Series are missing a value at the same label, nothing is filled there and the comparison returns `False`.

- **axis**: `{0 or 'index'}`, optional
  - This parameter is unused for Series but is included for compatibility with DataFrame operations.

### **Returns**

- **Series**: A boolean Series indicating the result of the element-wise comparison (`True` if the value is less than or equal to the corresponding value in `other`, `False` otherwise).
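
The `level` parameter is easier to follow with a concrete case. The sketch below is a guess at a typical use, not an example from the pandas documentation. The data, the level names `grp` and `n`, and the variable names are all hypothetical. It compares every row of a MultiIndexed Series against a per-group limit:

```python
import pandas as pd

# Hypothetical data: two groups ('x', 'y'), two observations each
idx = pd.MultiIndex.from_product([['x', 'y'], [1, 2]], names=['grp', 'n'])
values = pd.Series([1, 5, 3, 7], index=idx)

# One limit per group, indexed by the group label only
limits = pd.Series([2, 6], index=pd.Index(['x', 'y'], name='grp'))

# Broadcast each group's limit across that group's rows
within_limit = values.le(limits, level='grp')
print(within_limit)
```

Each row is compared with the limit of its own group, so the rows of group `x` are checked against `2` and the rows of group `y` against `6`.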

### **Examples**

#### Example 1: Basic Comparison (Without Missing Values)

```python
import pandas as pd

# Create two Series
a = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, 3, 2, 1], index=['a', 'b', 'c', 'd'])

# Compare if elements in 'a' are less than or equal to corresponding elements in 'b'
result = a.le(b)
print(result)
```

**Output**:

```
a     True
b     True
c    False
d    False
dtype: bool
```

- The result is a boolean Series where each value is `True` if the corresponding value in `a` is less than or equal to the value in `b`.

#### Example 2: Comparison with Missing Values

```python
import numpy as np

# Create Series with missing values
a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])

# Compare with fill_value for NaNs
result = a.le(b, fill_value=0)
print(result)
```

**Output**:

```
a    False
b     True
c     True
d    False
e    False
f     True
dtype: bool
```

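- In this example, the two Series are aligned on the union of their labels, and missing values are filled with `0` before the comparison. As a result:
  - For index `a`, `1.0 <= 0.0` is `False`.
  - For index `b`, `1.0 <= 1.0` is `True`.
  - For index `c`, `1.0 <= 2.0` is `True`.
  - For index `d`, both values are `NaN`, so nothing is filled and the result is `False`.
  - For index `e`, `b` has no such label, so it is filled with `0`, and `1.0 <= 0.0` is `False`.
  - For index `f`, `a` has no such label, so it is filled with `0`, and `0.0 <= 1.0` is `True`.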

#### Example 3: Comparison with a Scalar Value

```python
# Compare each element in the Series to a scalar value
result = a.le(1)
print(result)
```

**Output**:

```
a     True
b     True
c     True
d    False
e     True
dtype: bool
```

- Here, every element in `a` is compared to the scalar value `1`. The result is `True` for values that are less than or equal to `1` and `False` for values that are greater than `1`.

### **See Also**

- **`Series.lt()`**: Less than comparison (`<`).
- **`Series.gt()`**: Greater than comparison (`>`).
- **`Series.ge()`**: Greater than or equal to comparison (`>=`).
- **`Series.eq()`**: Equal to comparison (`==`).

The **`le()`** function is useful when you need to perform element-wise comparisons for "less than or equal to" conditions, especially when dealing with missing values.


In [None]:
""" pandas.Series.ge
Series.ge(other, level=None, fill_value=None, axis=0)
Return Greater than or equal to of series and other, element-wise (binary operator ge).

Equivalent to series >= other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other
Series or scalar value
level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation. """
a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
a

In [None]:
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])
b

In [None]:
a.ge(b, fill_value=0)

The **`Series.ge()`** function in pandas is used to compare the elements of a Series to another Series or scalar to check if they are "greater than or equal to" each other element-wise. It also handles missing (NaN) values by allowing the use of a `fill_value` for those missing entries before performing the comparison.

### **Syntax**

```python
Series.ge(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The value(s) with which to compare the Series. It can be another Series or a scalar value.

- **level**: `int` or `name`, optional

  - If the Series has a `MultiIndex`, this parameter allows comparison across a specific level of the index.

- **fill_value**: `scalar`, optional, default `None`

  - Replaces missing (NaN) values, and any label present in only one of the two Series after alignment, before the comparison. If both Series are missing a value at the same label, nothing is filled there and the comparison returns `False`.

- **axis**: `{0 or 'index'}`, optional
  - This parameter is unused for Series but is included for compatibility with DataFrame operations.

### **Returns**

- **Series**: A boolean Series indicating the result of the element-wise comparison (`True` if the value is greater than or equal to the corresponding value in `other`, `False` otherwise).

### **Examples**

#### Example 1: Basic Comparison (Without Missing Values)

```python
import pandas as pd

# Create two Series
a = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, 3, 2, 1], index=['a', 'b', 'c', 'd'])

# Compare if elements in 'a' are greater than or equal to corresponding elements in 'b'
result = a.ge(b)
print(result)
```

**Output**:

```
a    False
b    False
c     True
d     True
dtype: bool
```

- The result is a boolean Series where each value is `True` if the corresponding value in `a` is greater than or equal to the value in `b`.

#### Example 2: Comparison with Missing Values

```python
import numpy as np

# Create Series with missing values
a = pd.Series([1, 1, 1, np.nan, 1], index=['a', 'b', 'c', 'd', 'e'])
b = pd.Series([0, 1, 2, np.nan, 1], index=['a', 'b', 'c', 'd', 'f'])

# Compare with fill_value for NaNs
result = a.ge(b, fill_value=0)
print(result)
```

**Output**:

```
a     True
b     True
c    False
d    False
e     True
f    False
dtype: bool
```

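- In this example, the two Series are aligned on the union of their labels, and missing values are filled with `0` before the comparison. As a result:
  - For index `a`, `1.0 >= 0.0` is `True`.
  - For index `b`, `1.0 >= 1.0` is `True`.
  - For index `c`, `1.0 >= 2.0` is `False`.
  - For index `d`, both values are `NaN`, so nothing is filled and the result is `False`.
  - For index `e`, `b` has no such label, so it is filled with `0`, and `1.0 >= 0.0` is `True`.
  - For index `f`, `a` has no such label, so it is filled with `0`, and `0.0 >= 1.0` is `False`.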

#### Example 3: Comparison with a Scalar Value

```python
# Compare each element in the Series to a scalar value
result = a.ge(1)
print(result)
```

**Output**:

```
a     True
b     True
c     True
d    False
e     True
dtype: bool
```

- Here, every element in `a` is compared to the scalar value `1`. The result is `True` for values that are greater than or equal to `1` and `False` for values that are less than `1`.
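
Since `ge()` and `le()` both return boolean Series, they can be combined with `&` to test whether values fall in a closed range. A small sketch with made-up data (the name `scores` is hypothetical), checked against `Series.between`:

```python
import pandas as pd

scores = pd.Series([0, 1, 2, 3, 4])

# True where 1 <= value <= 3
in_range = scores.ge(1) & scores.le(3)
print(in_range.tolist())  # [False, True, True, True, False]

# Series.between is inclusive on both ends by default,
# so it gives the same mask
same_as_between = in_range.equals(scores.between(1, 3))
print(same_as_between)  # True
```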

### **See Also**

- **`Series.lt()`**: Less than comparison (`<`).
- **`Series.gt()`**: Greater than comparison (`>`).
- **`Series.le()`**: Less than or equal to comparison (`<=`).
- **`Series.eq()`**: Equal to comparison (`==`).

The **`ge()`** function is useful when you need to perform element-wise comparisons for "greater than or equal to" conditions, especially when dealing with missing values.


In [None]:
""" pandas.Series.ne
Series.ne(other, level=None, fill_value=None, axis=0)
Return Not equal to of series and other, element-wise (binary operator ne).

Equivalent to series != other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other
Series or scalar value
level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.ne(b, fill_value=0)

The **`Series.ne()`** function in pandas is used to compare whether the elements in a Series are "not equal to" another Series or a scalar. This is an element-wise comparison and supports handling missing (NaN) values by allowing the use of a `fill_value` before performing the comparison.

### **Syntax**

```python
Series.ne(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The Series or scalar value with which to compare each element in the original Series.

- **level**: `int` or `name`, optional

  - This parameter is used if the Series has a `MultiIndex`. It allows comparison across a specific level of the index.

- **fill_value**: `scalar`, optional, default `None`

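  - Replaces missing (NaN) values, and any label present in only one of the two Series after alignment, before the comparison. If both are missing a value at the same label, nothing is filled there, and `NaN != NaN` evaluates to `True`.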

- **axis**: `{0 or 'index'}`, optional
  - This parameter is unused for Series but is kept for compatibility with DataFrame operations.

### **Returns**

- **Series**: A boolean Series indicating the result of the element-wise "not equal to" comparison (`True` if the values are not equal, `False` otherwise).

### **Examples**

#### Example 1: Basic Comparison (Without Missing Values)

```python
import pandas as pd

# Create two Series
a = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, 3, 2, 1], index=['a', 'b', 'c', 'd'])

# Compare if elements in 'a' are not equal to corresponding elements in 'b'
result = a.ne(b)
print(result)
```

**Output**:

```
a     True
b     True
c     True
d     True
dtype: bool
```

- The result is a boolean Series where each value is `True` if the corresponding values in `a` and `b` are not equal.

#### Example 2: Comparison with Missing Values

```python
import numpy as np

# Create Series with missing values
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Compare with fill_value for NaNs
result = a.ne(b, fill_value=0)
print(result)
```

**Output**:

```
a    False
b     True
c     True
d     True
e     True
dtype: bool
```

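- In this example, the two Series are aligned on the union of their labels, and missing values are filled with `0` before the comparison. As a result:
  - For index `a`, `1.0 != 1.0` is `False`.
  - For index `b`, the `NaN` in `b` is filled with `0`, so `1.0 != 0.0` is `True`.
  - For index `c`, `b` has no such label, so it is filled with `0`, and `1.0 != 0.0` is `True`.
  - For index `d`, the `NaN` in `a` is filled with `0`, so `0.0 != 1.0` is `True`.
  - For index `e`, both sides are missing, so nothing is filled, and `NaN != NaN` is `True`.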

#### Example 3: Comparison with a Scalar Value

```python
# Compare each element in the Series to a scalar value
result = a.ne(1)
print(result)
```

**Output**:

```
a    False
b    False
c    False
d     True
dtype: bool
```

- Here, each element of `a` is compared to the scalar value `1`. The result is `False` where the value is `1` and `True` where it is not.
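
Because `True` counts as `1` when summed, the result of `ne()` is a convenient way to count and locate differences between two Series. A minimal sketch with made-up data (the names `expected` and `actual` are hypothetical):

```python
import pandas as pd

expected = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
actual = pd.Series([10, 25, 30, 41], index=['a', 'b', 'c', 'd'])

# True wherever the two Series disagree
changed = expected.ne(actual)

# Summing booleans counts the True entries
n_changed = int(changed.sum())
print(n_changed)  # 2

# Labels at which the values differ
changed_labels = changed[changed].index.tolist()
print(changed_labels)  # ['b', 'd']
```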

### **See Also**

- **`Series.eq()`**: Equal to comparison (`==`).
- **`Series.lt()`**: Less than comparison (`<`).
- **`Series.le()`**: Less than or equal to comparison (`<=`).
- **`Series.gt()`**: Greater than comparison (`>`).
- **`Series.ge()`**: Greater than or equal to comparison (`>=`).

The **`ne()`** function is useful for performing element-wise "not equal to" comparisons, especially when working with missing values.


In [None]:
""" pandas.Series.eq
Series.eq(other, level=None, fill_value=None, axis=0)
Return Equal to of series and other, element-wise (binary operator eq).

Equivalent to series == other, but with support to substitute a fill_value for missing data in either one of the inputs.

Parameters:
other
Series or scalar value
level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
None or float value, default None (NaN)
Fill existing missing (NaN) values, and any new element needed for successful Series alignment, with this value before computation. If data in both corresponding Series locations is missing the result of filling (at that location) will be missing.

axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

Returns:
Series
The result of the operation. """
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
a

In [None]:
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])
b

In [None]:
a.eq(b, fill_value=0)

The **`Series.eq()`** function in pandas is used to compare if each element in a Series is equal to the corresponding element in another Series or a scalar. It is an element-wise comparison with support for handling missing values (NaN), using a `fill_value` before performing the comparison.

### **Syntax**

```python
Series.eq(other, level=None, fill_value=None, axis=0)
```

### **Parameters**

- **other**: `Series` or `scalar`

  - The Series or scalar value to compare against each element in the original Series.

- **level**: `int` or `name`, optional

  - Used if the Series has a `MultiIndex`. It specifies the level on which to perform the comparison.

- **fill_value**: `scalar`, optional, default `None`

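  - Replaces missing (NaN) values, and any label present in only one of the two Series after alignment, before the comparison. If both are missing a value at the same label, nothing is filled there and the comparison returns `False`.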

- **axis**: `{0 or 'index'}`, optional
  - This parameter is not used for Series but is kept for compatibility with DataFrame operations.

### **Returns**

- **Series**: A boolean Series indicating the result of the element-wise equality comparison (`True` if the values are equal, `False` otherwise).

### **Examples**

#### Example 1: Basic Comparison (Without Missing Values)

```python
import pandas as pd

# Create two Series
a = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
b = pd.Series([4, 3, 2, 1], index=['a', 'b', 'c', 'd'])

# Compare if elements in 'a' are equal to corresponding elements in 'b'
result = a.eq(b)
print(result)
```

**Output**:

```
a    False
b    False
c    False
d    False
dtype: bool
```

- The result is `False` for all indices because the corresponding values in `a` and `b` are not equal.

#### Example 2: Comparison with Missing Values

```python
import numpy as np

# Create Series with missing values
a = pd.Series([1, 1, 1, np.nan], index=['a', 'b', 'c', 'd'])
b = pd.Series([1, np.nan, 1, np.nan], index=['a', 'b', 'd', 'e'])

# Compare with fill_value for NaNs
result = a.eq(b, fill_value=0)
print(result)
```

**Output**:

```
a     True
b    False
c    False
d    False
e    False
dtype: bool
```

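- In this example, the two Series are aligned on the union of their labels, and missing values are replaced with `0` before the comparison. As a result:
  - For index `a`, `1.0 == 1.0` is `True`.
  - For index `b`, the `NaN` in `b` is filled with `0`, so `1.0 == 0.0` is `False`.
  - For index `c`, `b` has no such label, so it is filled with `0`, and `1.0 == 0.0` is `False`.
  - For index `d`, the `NaN` in `a` is filled with `0`, so `0.0 == 1.0` is `False`.
  - For index `e`, both sides are missing, so nothing is filled and the result is `False`.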

#### Example 3: Comparison with a Scalar Value

```python
# Compare each element in the Series to a scalar value
result = a.eq(1)
print(result)
```

**Output**:

```
a     True
b     True
c     True
d    False
dtype: bool
```

- Here, each element of `a` is compared to the scalar value `1`. The result is `True` for the elements that are equal to `1` and `False` where they are not.
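
`eq()` should not be confused with `Series.equals()`. The former compares element by element and treats `NaN` as unequal to everything, while the latter returns a single boolean and treats NaNs in the same position as equal. A short sketch with made-up data (`x` and `y` are hypothetical names):

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, np.nan, 3.0])
y = pd.Series([1.0, np.nan, 3.0])

# Element-wise: the NaN position compares as False
elementwise = x.eq(y)
print(elementwise.tolist())  # [True, False, True]

# Whole-object check: NaNs in matching positions count as equal
same = x.equals(y)
print(same)  # True
```

As a result, `x.eq(y).all()` can be `False` even when `x.equals(y)` is `True`.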

### **See Also**

- **`Series.ne()`**: Not equal to comparison (`!=`).
- **`Series.lt()`**: Less than comparison (`<`).
- **`Series.le()`**: Less than or equal to comparison (`<=`).
- **`Series.gt()`**: Greater than comparison (`>`).
- **`Series.ge()`**: Greater than or equal to comparison (`>=`).

The **`eq()`** function is useful for performing element-wise equality comparisons and can handle missing values efficiently by filling them before the comparison.


In [None]:
""" pandas.Series.product
Series.product(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
Return the product of the values over the requested axis.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.prod with axis=None is deprecated, in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar or scalar
See also

Series.sum
Return the sum.

Series.min
Return the minimum.

Series.max
Return the maximum.

Series.idxmin
Return the index of the minimum.

Series.idxmax
Return the index of the maximum.

DataFrame.sum
Return the sum over the requested axis.

DataFrame.min
Return the minimum over the requested axis.

DataFrame.max
Return the maximum over the requested axis.

DataFrame.idxmin
Return the index of the minimum over the requested axis.

DataFrame.idxmax
Return the index of the maximum over the requested axis. """

pd.Series([], dtype="float64").prod()


In [None]:
pd.Series([], dtype="float64").prod(min_count=1)


In [None]:
pd.Series([np.nan]).prod()
pd.Series([np.nan]).prod(min_count=1)

The **`Series.product()`** function in pandas (an alias of `Series.prod()`, which the examples below call) multiplies all the values in a Series together and returns the result. Because a Series has only one axis, the `axis` argument has no effect. Missing data is handled through the `skipna` and `min_count` parameters.

### **Syntax**

```python
Series.product(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
```

### **Parameters**

- **axis**: `{index (0)}`, optional, default `None`

  - This parameter is unused for Series (since Series is 1D), and the default behavior is to compute the product along the only axis (axis 0).

- **skipna**: `bool`, default `True`
  - Excludes missing (NA) values when performing the calculation. If set to `False`, the presence of NaN will result in the entire product being NaN.
- **numeric_only**: `bool`, default `False`

  - If set to `True`, it will include only numeric (int, float, or boolean) data. This parameter is not implemented for Series but is available for DataFrame.

- **min_count**: `int`, default `0`

  - Specifies the minimum number of non-null values required to perform the operation. If there are fewer than `min_count` non-null values, the result will be NaN.

- **\*\*kwargs**: Additional keyword arguments to be passed to the function.

### **Returns**

- A scalar value representing the product of the values in the Series.

### **Examples**

#### Example 1: Default Behavior (Product of All Values)

```python
import pandas as pd

# Create a Series
s = pd.Series([2, 3, 4])

# Calculate the product of the values in the Series
result = s.prod()
print(result)
```

**Output**:

```
24
```

- The product of `2 * 3 * 4` is `24`.

#### Example 2: Handling Missing Values (`skipna=True`)

```python
import numpy as np

# Series with a NaN value
s = pd.Series([2, np.nan, 4])

# Calculate the product, skipping NaN values
result = s.prod()
print(result)
```

**Output**:

```
8.0
```

- By default, `skipna=True`, so the NaN value is excluded from the product calculation, and `2 * 4 = 8`.

#### Example 3: Handling Missing Values with `skipna=False`

```python
# Calculate the product, without skipping NaN values
result = s.prod(skipna=False)
print(result)
```

**Output**:

```
nan
```

- Since there is a NaN value and `skipna=False`, the product is `NaN`.

#### Example 4: Empty Series (Default `min_count=0`)

```python
# Create an empty Series
s = pd.Series([], dtype="float64")

# Calculate the product of an empty Series
result = s.prod()
print(result)
```

**Output**:

```
1.0
```

- By default (`min_count=0`), the product of an empty Series is `1.0`, the multiplicative identity.

#### Example 5: Empty Series with `min_count=1`

```python
# Calculate the product, requiring at least 1 valid value
result = s.prod(min_count=1)
print(result)
```

**Output**:

```
nan
```

- When `min_count=1`, the product is `NaN` because the Series is empty, and there are no valid values to multiply.

#### Example 6: Series with NaN and `min_count=1`

```python
# Series with a NaN value
s = pd.Series([np.nan])

# Calculate the product with min_count=1
result = s.prod(min_count=1)
print(result)
```

**Output**:

```
nan
```

- Even though there is a single element (which is NaN), with `min_count=1`, the product is `NaN` because there are no valid numeric values.
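
The examples above use `min_count` only on Series with no valid values. The sketch below, with made-up data, shows the threshold on a Series that has some valid values and one NaN. The variable names are only illustrative:

```python
import numpy as np
import pandas as pd

# Two valid values and one missing value
s = pd.Series([2, np.nan, 4])

# Two valid values are enough when min_count=2
enough = s.prod(min_count=2)
print(enough)  # 8.0

# Requiring three valid values cannot be met, so the result is NaN
too_few = s.prod(min_count=3)
print(too_few)  # nan
```

`min_count` counts only non-missing values, so the NaN entry does not help reach the threshold.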

### **See Also**

- **`Series.sum()`**: To calculate the sum of the values in a Series.
- **`Series.min()`**: To calculate the minimum value of the Series.
- **`Series.max()`**: To calculate the maximum value of the Series.
- **`Series.idxmin()`**: To get the index of the minimum value in the Series.
- **`Series.idxmax()`**: To get the index of the maximum value in the Series.

The **`Series.product()`** function is a simple but powerful method to calculate the product of all values in a Series, with support for handling missing data and specifying minimum valid values required.


In [None]:
""" 

pandas.Series.dot


Series.dot(other)



Compute the dot product between the Series and the columns of other.

This method computes the dot product between the Series and another one, or the Series and each column of a DataFrame, or the Series and each column of an array.

It can also be called using self @ other.

Parameters:
other : Series, DataFrame or array-like
The other object to compute the dot product with its columns.

Returns:
scalar, Series or numpy.ndarray
Return the dot product of the Series and other if other is a Series, a Series of the dot products of the Series and each column of other if other is a DataFrame, or a numpy.ndarray of the dot products of the Series and each column of the NumPy array.

See also

DataFrame.dot
Compute the matrix product with the DataFrame.

Series.mul
Multiplication of series and other, element-wise. """

s = pd.Series([0, 1, 2, 3])
other = pd.Series([-1, 2, -3, 4])
s.dot(other)

In [None]:
s @ other

In [None]:
df = pd.DataFrame([[0, 1], [-2, 3], [4, -5], [6, 7]])
s.dot(df)

In [None]:
arr = np.array([[0, 1], [-2, 3], [4, -5], [6, 7]])
s.dot(arr)

The **`Series.dot()`** method in pandas computes the dot product between the Series and another object (Series, DataFrame, or array-like). It's a vectorized operation for calculating the sum of the element-wise product between two series or matrices.

### **Syntax**

```python
Series.dot(other)
```

### **Parameters**

- **other**: Series, DataFrame, or array-like
  - The object with which you want to compute the dot product. If `other` is a Series, the result will be a scalar (a single value).
  - If `other` is a DataFrame or a two-dimensional array, the method computes the dot product with each of its columns.

### **Returns**

- **scalar**: If `other` is a Series, returns a scalar value representing the dot product.
- **Series**: If `other` is a DataFrame, returns a Series where each element is the dot product between the Series and each column of the DataFrame.
- **numpy.ndarray**: If `other` is a NumPy array, returns an ndarray with the dot product between the Series and each column of the array.

### **Examples**

#### Example 1: Dot Product of Two Series

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3])
other = pd.Series([-1, 2, -3, 4])

result = s.dot(other)
print(result)
```

**Output**:

```
8
```

- The dot product is calculated as `0*(-1) + 1*2 + 2*(-3) + 3*4 = 8`.

You can also use the `@` operator, which is equivalent to `dot()`:

```python
result = s @ other
print(result)
```

**Output**:

```
8
```

#### Example 2: Dot Product with a DataFrame

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [-2, 3], [4, -5], [6, 7]])
s = pd.Series([1, 2, 3, 4])

result = s.dot(df)
print(result)
```

**Output**:

```
0    32
1    20
dtype: int64
```

- The dot product is computed between the Series and each column of the DataFrame:
  - For column 0: `1*0 + 2*(-2) + 3*4 + 4*6 = 32`
  - For column 1: `1*1 + 2*3 + 3*(-5) + 4*7 = 20`

#### Example 3: Dot Product with a NumPy Array

```python
import numpy as np
import pandas as pd

arr = np.array([[0, 1], [-2, 3], [4, -5], [6, 7]])
s = pd.Series([1, 2, 3, 4])

result = s.dot(arr)
print(result)
```

**Output**:

```
[32 20]
```

- The dot product is computed between the Series and each column of the NumPy array.

#### Notes:

- **Index Alignment**: When `other` is a Series or a DataFrame, its index must contain the same labels as the calling Series. Labels that appear in a different order are aligned before the product is computed; if the two sets of labels differ, pandas raises a `ValueError` ("matrices are not aligned") rather than filling with `NaN`.
- **Matrix Multiplication**: With a DataFrame, the result is a vector-matrix product, so the length of the Series must match the number of rows in the DataFrame.
- **NumPy Compatibility**: This method also works with NumPy arrays, providing flexibility in numerical computations.
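
The index-alignment behavior can be sketched as follows. The Series `a`, `b`, and `c` and their labels are made up for illustration: `b` carries the same labels as `a` in a different order, while `c` contains a label that `a` lacks.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([30, 10, 20], index=["z", "x", "y"])

# Same labels, different order: values are paired by label, not by position.
aligned = a.dot(b)
print(aligned)  # 1*10 + 2*20 + 3*30 = 140

# Different label sets: the operands cannot be aligned.
c = pd.Series([1, 2, 3], index=["x", "y", "w"])
try:
    a.dot(c)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If positional multiplication is what you want, passing `b.to_numpy()` instead of `b` bypasses label alignment, since array-like inputs are matched by position.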

### **See Also**

- **`DataFrame.dot()`**: To compute the matrix product of a DataFrame with another DataFrame or array.
- **`Series.mul()`**: Element-wise multiplication between two Series.
- **`numpy.dot()`**: Direct dot product computation in NumPy.


In [None]:
""" 
pandas.Series.apply

Series.apply(func, convert_dtype=<no_default>, args=(), *, by_row='compat', **kwargs)
Invoke function on values of Series.

Can be ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

Parameters:
func : function
Python function or NumPy ufunc to apply.

convert_dtype : bool, default True
Try to find better dtype for elementwise function results. If False, leave as dtype=object. Note that the dtype is always preserved for some extension array dtypes, such as Categorical.

Deprecated since version 2.1.0: convert_dtype has been deprecated. Do ser.astype(object).apply() instead if you want convert_dtype=False.

args : tuple
Positional arguments passed to func after the series value.

by_row : False or “compat”, default “compat”
If "compat" and func is a callable, func will be passed each element of the Series, like Series.map. If func is a list or dict of callables, will first try to translate each func into pandas methods. If that doesn’t work, will try call to apply again with by_row="compat" and if that fails, will call apply again with by_row=False (backward compatible). If False, the func will be passed the whole Series at once.

by_row has no effect when func is a string.

Added in version 2.1.0.

**kwargs
Additional keyword arguments passed to func.

Returns:
Series or DataFrame
If func returns a Series object the result will be a DataFrame. """

s = pd.Series([20, 21, 12],
              index=['London', 'New York', 'Helsinki'])
s

In [None]:
def square(x):
    return x ** 2
s.apply(square)

In [None]:
s.apply(lambda x: x ** 2)

In [None]:
def subtract_custom_value(x, custom_value):
    return x - custom_value

In [None]:
s.apply(subtract_custom_value, args=(5,))

In [None]:
s.apply(np.log)

The **`Series.apply()`** method allows you to apply a function along the values of a pandas Series. It is versatile and can work with NumPy ufuncs (functions that apply to entire arrays) or custom Python functions that apply to individual values in the Series.

### **Syntax**

```python
Series.apply(func, convert_dtype=<no_default>, args=(), *, by_row='compat', **kwargs)
```

### **Parameters**

- **func**: function (Python or NumPy ufunc)
  - A function or callable object that will be applied to the values in the Series.
- **convert_dtype**: bool, default `True`

  - Attempt to find the best dtype for the result of the function. If `False`, the dtype will be left as `object`.
  - **Deprecated** since version 2.1.0. Instead, you should use `astype(object).apply()` if you want `convert_dtype=False`.

- **args**: tuple
  - Positional arguments to pass to the function after the Series values.
- **by_row**: {'compat', False}, default 'compat'

  - Controls how the function is applied:
    - If `'compat'`, the function is applied element-wise (similar to `map()`).
    - If `False`, the function will be applied to the whole Series.
    - `by_row` has no effect when `func` is a string.
    - Added in version 2.1.0.

- **kwargs**: additional keyword arguments passed to `func`.

### **Returns**

- **Series** or **DataFrame**
  - If `func` returns a Series, the result will be a DataFrame.
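
The DataFrame return case can be illustrated with a function that returns a Series for each element. This is a small sketch; `value_and_square` is a hypothetical helper written for this example.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

def value_and_square(x):
    # Hypothetical helper: build a two-entry Series for a single element.
    return pd.Series({'value': x, 'squared': x ** 2})

# Each element yields a Series, so the combined result is a DataFrame
# with one row per original label and one column per returned entry.
result = s.apply(value_and_square)
print(type(result).__name__)  # DataFrame
print(result)
```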

### **Examples**

#### Example 1: Applying a custom function to square values

```python
import pandas as pd

# Create a Series
s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

# Define a function to square the values
def square(x):
    return x ** 2

# Apply the function to each element in the Series
result = s.apply(square)
print(result)
```

**Output**:

```
London      400
New York    441
Helsinki    144
dtype: int64
```

#### Example 2: Applying a lambda function

```python
# Apply a lambda function to square the values
result = s.apply(lambda x: x ** 2)
print(result)
```

**Output**:

```
London      400
New York    441
Helsinki    144
dtype: int64
```

#### Example 3: Passing additional positional arguments to a function

```python
# Define a custom function that subtracts a custom value
def subtract_custom_value(x, custom_value):
    return x - custom_value

# Apply the function with an additional argument (5)
result = s.apply(subtract_custom_value, args=(5,))
print(result)
```

**Output**:

```
London      15
New York    16
Helsinki     7
dtype: int64
```

#### Example 4: Passing keyword arguments to a function

```python
# Define a custom function that adds custom values for each month
def add_custom_values(x, **kwargs):
    for month in kwargs:
        x += kwargs[month]
    return x

# Apply the function with keyword arguments
result = s.apply(add_custom_values, june=30, july=20, august=25)
print(result)
```

**Output**:

```
London      95
New York    96
Helsinki    87
dtype: int64
```

#### Example 5: Applying a NumPy function

```python
import numpy as np

# Apply a NumPy function (logarithm) to the Series
result = s.apply(np.log)
print(result)
```

**Output**:

```
London      2.995732
New York    3.044522
Helsinki    2.484907
dtype: float64
```

### **Notes**

- By default, **`apply()`** calls a Python function once per element (like `map()`). Passing `by_row=False` hands the whole Series to the function instead, which suits operations that need access to all values at once.
- **Performance**: While flexible, `apply()` can be slower than vectorized operations (like NumPy functions or pandas' built-in methods).
- **Function Compatibility**: Functions that mutate the passed object might produce unexpected behavior, so they are not recommended.
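
To make the performance note concrete, the sketch below shows that squaring via `apply()` and squaring with a vectorized operator give the same result. The variable names are illustrative; the vectorized form avoids a Python-level call per element and is generally the better choice when one exists.

```python
import pandas as pd

s = pd.Series([20, 21, 12], index=['London', 'New York', 'Helsinki'])

# Element-wise Python function call for every value.
via_apply = s.apply(lambda x: x ** 2)

# Vectorized equivalent operating on the whole Series at once.
vectorized = s ** 2

print(via_apply.equals(vectorized))  # True
```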

### **See Also**

- **`Series.map()`**: For element-wise mapping operations, similar to `apply()`, but simpler for dictionary or function-based mappings.
- **`Series.agg()`**: To perform aggregations (sum, mean, etc.) over the Series.
- **`Series.transform()`**: To apply functions to a Series while preserving its shape.


In [None]:
""" pandas.Series.agg
Series.agg(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.

Parameters:
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a Series or when passed to Series.apply.

Accepted combinations are:

function

string function name

list of functions and/or function names, e.g. [np.sum, 'mean']

dict of axis labels -> functions, function names or list of such.

axis : {0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:
scalar, Series or DataFrame
The return can be:

scalar : when Series.agg is called with single function

Series : when DataFrame.agg is called with a single function

DataFrame : when DataFrame.agg is called with several functions

See also

Series.apply
Invoke function on a Series.

Series.transform
Transform function producing a Series with like indexes. """
s = pd.Series([1, 2, 3, 4])
s

In [None]:
s.agg('min')

In [None]:
s.agg(['min', 'max'])

The **`Series.agg()`** method is used to apply one or more aggregation functions to a Series. Aggregation functions compute a summary statistic for the data in a Series, such as sum, mean, min, max, etc.

### **Syntax**

```python
Series.agg(func=None, axis=0, *args, **kwargs)
```

### **Parameters**

- **func**: function, str, list, or dict

  - **function**: A single function (e.g., `np.sum`, `mean`, `max`, etc.) to apply to the Series.
  - **str**: A string that represents the name of a function (e.g., 'sum', 'mean').
  - **list**: A list of functions or string names. Multiple functions can be applied at once (e.g., `['sum', 'mean']`).
  - **dict**: A dictionary where the keys are axis labels (e.g., column names for DataFrame) and the values are functions or lists of functions. This is useful when applying different functions to different columns.

- **axis**: {0 or 'index'}

  - This parameter is for compatibility with DataFrame and is unused for Series.

- **args**: tuple

  - Positional arguments to pass to the function(s).

- **kwargs**: dict
  - Additional keyword arguments to pass to the function(s).

### **Returns**

- **scalar**: When `Series.agg` is called with a single function.
- **Series**: When `Series.agg` is called with a list of functions (one entry per function), or when `DataFrame.agg` is called with a single function.
- **DataFrame**: When `DataFrame.agg` is called with several functions.
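
A dict can also be passed to `agg()` on a Series. In this sketch the keys `'lowest'` and `'highest'` are arbitrary labels chosen for illustration; they become the index of the returned Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Each key labels one entry of the result; each value names the aggregation.
summary = s.agg({'lowest': 'min', 'highest': 'max'})
print(summary)
```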

### **Examples**

#### Example 1: Applying a single function (e.g., 'min')

```python
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3, 4])

# Apply the 'min' aggregation function
result = s.agg('min')
print(result)
```

**Output**:

```
1
```

#### Example 2: Applying multiple functions (e.g., 'min' and 'max')

```python
# Apply multiple functions to the Series
result = s.agg(['min', 'max'])
print(result)
```

**Output**:

```
min    1
max    4
dtype: int64
```

#### Example 3: Using a list of functions (e.g., np.sum, np.mean)

```python
import numpy as np

# Apply a list of functions to the Series
result = s.agg([np.sum, np.mean])
print(result)
```

**Output**:

```
sum     10.0
mean     2.5
dtype: float64
```

#### Example 4: Using a dictionary to apply different functions to different columns (for a DataFrame)

```python
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Apply different functions to different columns
result = df.agg({'A': 'sum', 'B': 'mean'})
print(result)
```

**Output**:

```
A    6.0
B    5.0
dtype: float64
```

### **Notes**

- **Aggregation functions** are performed **over an axis**: the index for a Series, and for a DataFrame with the default `axis=0`, each column is reduced over its rows.
- The **`agg()`** method provides a flexible way to apply one or more aggregation functions to a Series or DataFrame. It can handle a wide range of use cases, from applying single functions to applying different functions to different columns.
- **Functions that mutate the passed object** (such as modifying values in-place) are not supported with `agg()`.

### **See Also**

- **`Series.apply()`**: For applying a custom function to each element of the Series.
- **`Series.transform()`**: For transforming data in a Series while keeping its shape.
- **`DataFrame.agg()`**: To apply aggregation functions on DataFrame columns.


In [None]:
""" pandas.Series.aggregate
Series.aggregate(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.

Parameters:
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a Series or when passed to Series.apply.

Accepted combinations are:

function

string function name

list of functions and/or function names, e.g. [np.sum, 'mean']

dict of axis labels -> functions, function names or list of such.

axis : {0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:
scalar, Series or DataFrame
The return can be:

scalar : when Series.agg is called with single function

Series : when DataFrame.agg is called with a single function

DataFrame : when DataFrame.agg is called with several functions

See also

Series.apply
Invoke function on a Series.

Series.transform
Transform function producing a Series with like indexes. """

s = pd.Series([1, 2, 3, 4])
s

In [None]:
s.agg('min')

In [None]:
s.agg(['min', 'max'])

The **`Series.aggregate()`** method in pandas is used to apply one or more aggregation functions to a Series, which helps in obtaining summary statistics or reducing the data into a single result or multiple results based on the functions applied.

The **`aggregate()`** method is an alias for the **`agg()`** method, and they are used interchangeably. Both methods allow the use of aggregation functions like `sum`, `mean`, `min`, `max`, and even user-defined functions.
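
The alias relationship can be verified directly. This is a short check, not part of the pandas documentation:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Both names refer to the same function object on the Series class.
print(pd.Series.agg is pd.Series.aggregate)  # True

# Consequently, both calls produce identical results.
print(s.agg(['min', 'max']).equals(s.aggregate(['min', 'max'])))  # True
```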

### **Syntax**

```python
Series.aggregate(func=None, axis=0, *args, **kwargs)
```

### **Parameters**

- **func**: function, str, list, or dict

  - **function**: A function (e.g., `np.sum`, `mean`, `max`, etc.) to apply to the Series.
  - **str**: A string representing a function name (e.g., 'sum', 'mean', 'min').
  - **list**: A list of functions or string names. You can apply multiple functions at once (e.g., `['sum', 'max']`).
  - **dict**: A dictionary where keys are axis labels (for DataFrames) and values are functions or lists of functions. This is particularly useful when applying different aggregation functions to different columns in a DataFrame.

- **axis**: {0 or 'index'}

  - This parameter is for compatibility with DataFrame (it’s unused for Series).

- **args**: tuple

  - Positional arguments to pass to the aggregation function(s).

- **kwargs**: dict
  - Additional keyword arguments to pass to the function(s).

### **Returns**

- **scalar**: If a single function is applied, it returns a scalar.
- **Series**: If multiple functions are applied, it returns a Series containing the results of the aggregation.
- **DataFrame**: Returned only by `DataFrame.aggregate`, when it is called with several functions.

### **Examples**

#### Example 1: Applying a single function (e.g., 'min')

```python
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3, 4])

# Apply the 'min' aggregation function
result = s.aggregate('min')
print(result)
```

**Output**:

```
1
```

#### Example 2: Applying multiple functions (e.g., 'min' and 'max')

```python
# Apply multiple functions to the Series
result = s.aggregate(['min', 'max'])
print(result)
```

**Output**:

```
min    1
max    4
dtype: int64
```

#### Example 3: Using a list of functions (e.g., np.sum, np.mean)

```python
import numpy as np

# Apply a list of functions to the Series
result = s.aggregate([np.sum, np.mean])
print(result)
```

**Output**:

```
sum     10.0
mean     2.5
dtype: float64
```

#### Example 4: Using a dictionary to apply different functions to different columns (for DataFrame)

```python
# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Apply different functions to different columns
result = df.aggregate({'A': 'sum', 'B': 'mean'})
print(result)
```

**Output**:

```
A    6.0
B    5.0
dtype: float64
```

### **Notes**

- **`aggregate()` and `agg()`** are interchangeable, and both methods allow the application of one or more aggregation functions to the Series.
- The aggregation operations are performed over an axis, which is usually the index (default for Series).
- **Functions that mutate the passed object** (e.g., changing values in-place) are not supported when using `aggregate()`.

### **See Also**

- **`Series.apply()`**: For applying a custom function to each element of the Series.
- **`Series.transform()`**: For transforming data and returning a Series with a similar index.
- **`DataFrame.aggregate()`**: For applying aggregation functions across DataFrame columns.


In [None]:
""" pandas.Series.transform
Series.transform(func, axis=0, *args, **kwargs)
Call func on self producing a Series with the same axis shape as self.

Parameters:
func : function, str, list-like or dict-like
Function to use for transforming the data. If a function, must either work when passed a Series or when passed to Series.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.

Accepted combinations are:

function

string function name

list-like of functions and/or function names, e.g. [np.exp, 'sqrt']

dict-like of axis labels -> functions, function names or list-like of such.

axis : {0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:

Series
A Series that must have the same length as self.

Raises:
ValueError
If the returned Series has a different length than self.

See also

Series.agg
Only perform aggregating type operations.

Series.apply
Invoke function on a Series. """
df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
df

In [None]:
df.transform(lambda x: x + 1)

In [None]:
s = pd.Series(range(3))
s

In [None]:
df = pd.DataFrame({
    "Date": [
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})
df

In [None]:
df.groupby('Date')['Data'].transform('sum')

In [None]:
df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"]
})
df

In [None]:
df['size'] = df.groupby('c')['type'].transform(len)
df

The **`Series.transform()`** method in pandas applies a function (or a list of functions) to a Series and requires the result to have the same length and index as the original. It is typically used for element-wise transformations: with a single function it returns a Series, and with a list of functions it returns a DataFrame with one column per function.

### **Syntax**

```python
Series.transform(func, axis=0, *args, **kwargs)
```

### **Parameters**

- **func**: function, str, list-like, or dict-like

  - **function**: A single function that operates element-wise on the Series. This could be a numpy function or a custom Python function.
  - **str**: A string representing a function name (e.g., 'sqrt', 'log', 'exp').
  - **list-like**: A list of functions or function names that will be applied to the Series.
  - **dict-like**: A dictionary with axis labels as keys and functions or lists of functions as values (typically used with DataFrames).

- **axis**: {0 or ‘index’}

  - This parameter is unused for Series but kept for compatibility with DataFrame.

- **args**: tuple

  - Positional arguments passed to the function.

- **kwargs**: dict
  - Additional keyword arguments passed to the function.

### **Returns**

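- **Series**: A transformed Series, which must have the same length as the original Series. When `func` is a list or dict of functions, the result is a DataFrame with one column per function (see Example 2).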

### **Raises**

- **ValueError**: If the returned Series has a different length than the original.

### **Notes**

- **`transform()`** is designed for element-wise transformations where the output should have the same shape as the input.
- Unlike **`agg()`**, which is used for aggregation and might return a scalar, **`transform()`** is used to return a transformed Series with the same axis shape as the original.
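
The contrast with `agg()` can be sketched as follows, using an illustrative Series. A shape-preserving function works with `transform()`, while a reducing function such as `'sum'` is rejected with a `ValueError`.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4, 9])

# transform keeps the index and length of the input.
rooted = s.transform(np.sqrt)
print(rooted)

# agg is free to reduce the Series to a single value.
total = s.agg('sum')
print(total)  # 14

# A reducing function does not satisfy transform's same-shape requirement.
try:
    s.transform('sum')
    raised = False
except ValueError:
    raised = True
print(raised)
```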

### **Examples**

#### Example 1: Simple transformation with a lambda function

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})

# Apply a lambda function to add 1 to each element
result = df.transform(lambda x: x + 1)
print(result)
```

**Output**:

```
   A  B
0  1  2
1  2  3
2  3  4
```

#### Example 2: Applying multiple functions (e.g., `np.sqrt` and `np.exp`)

```python
import numpy as np

# Create a Series
s = pd.Series([0, 1, 2])

# Apply multiple functions to the Series
result = s.transform([np.sqrt, np.exp])
print(result)
```

**Output**:

```
       sqrt        exp
0  0.000000   1.000000
1  1.000000   2.718282
2  1.414214   7.389056
```

#### Example 3: Using transform with GroupBy

```python
# Create a DataFrame
df = pd.DataFrame({
    "Date": ["2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05", "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})

# Use transform to get the sum of 'Data' for each 'Date'
result = df.groupby('Date')['Data'].transform('sum')
print(result)
```

**Output**:

```
0     55
1    108
2     66
3    121
4     55
5    108
6     66
7    121
Name: Data, dtype: int64
```

#### Example 4: Adding a new column with the size of each group

```python
# Create a DataFrame
df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"]
})

# Use transform to add the size of each group to the 'size' column
df['size'] = df.groupby('c')['type'].transform(len)
print(df)
```

**Output**:

```
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
```

### **See Also**

- **`Series.apply()`**: For applying a function to each element of the Series.
- **`Series.agg()`**: For aggregation operations over a Series.
- **`Series.map()`**: For element-wise transformation using a dictionary or a function.


In [None]:
""" pandas.Series.map
Series.map(arg, na_action=None)
Map values of Series according to an input mapping or function.

Used for substituting each value in a Series with another value, that may be derived from a function, a dict or a Series.

Parameters:
arg : function, collections.abc.Mapping subclass or Series
Mapping correspondence.

na_action : {None, ‘ignore’}, default None
If ‘ignore’, propagate NaN values, without passing them to the mapping correspondence.

Returns:
Series
Same index as caller.

See also

Series.apply
For applying more complex functions on a Series.

Series.replace
Replace values given in to_replace with value.

DataFrame.apply
Apply a function row-/column-wise.

DataFrame.map
Apply a function elementwise on a whole DataFrame. """
s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
s

In [None]:
s.map({'cat': 'kitten', 'dog': 'puppy'})


In [None]:
s.map('I am a {}'.format)

In [None]:
s.map('I am a {}'.format, na_action='ignore')

The **`Series.map()`** method in pandas is used to map the values of a Series based on a function, dictionary, or another Series. It allows you to substitute each value in the Series with another value derived from the input mapping or function.

### **Syntax**

```python
Series.map(arg, na_action=None)
```

### **Parameters**

- **arg**: function, `collections.abc.Mapping` subclass, or Series

  - **function**: A function to apply to each element in the Series.
  - **`Mapping` subclass**: A dictionary-like object where keys correspond to the values in the Series. The dictionary can also be a `defaultdict` to provide default values for missing keys.
  - **Series**: Another Series used as a lookup table: each value of the calling Series is looked up in the index of `arg`, and values with no matching index label become `NaN`.

- **na_action**: {None, 'ignore'}, default None
  - **None**: This is the default behavior, which means that missing values (`NaN`) will be passed to the mapping.
  - **'ignore'**: This will propagate `NaN` values without passing them to the mapping function.
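
The Series and `defaultdict` forms of `arg` can be sketched as below. The names `lookup` and `with_default` and the fallback string `'unknown'` are assumptions made for this illustration.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# A Series acts as a lookup table keyed by its index.
lookup = pd.Series(['kitten', 'puppy'], index=['cat', 'dog'])
by_series = s.map(lookup)
print(by_series)  # 'rabbit' and NaN have no match, so both become NaN

# A defaultdict supplies a fallback for keys it does not contain.
with_default = defaultdict(lambda: 'unknown', {'cat': 'kitten', 'dog': 'puppy'})
by_default = s.map(with_default)
print(by_default)  # 'rabbit' falls back to 'unknown'
```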

### **Returns**

- **Series**: A new Series with the same index, with values transformed according to the `arg` mapping or function.

### **See also**

- **`Series.apply()`**: Use this for more complex functions that involve element-wise operations.
- **`Series.replace()`**: Replaces values in a Series based on a given mapping.
- **`DataFrame.apply()`**: Apply a function row-wise or column-wise on a DataFrame.

### **Examples**

#### Example 1: Mapping with a dictionary

You can map values in a Series using a dictionary where the keys are the current values and the values are the corresponding replacements.

```python
import pandas as pd
import numpy as np

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# Use a dictionary for mapping
result = s.map({'cat': 'kitten', 'dog': 'puppy'})
print(result)
```

**Output**:

```
0    kitten
1     puppy
2       NaN
3       NaN
dtype: object
```

In this case, 'cat' is replaced with 'kitten' and 'dog' with 'puppy'. 'rabbit' has no matching key in the dictionary, so it becomes `NaN`, and the existing `NaN` stays `NaN`.

#### Example 2: Mapping with a function

You can apply a function to each element of the Series. The function can reference the value in a formatted string or any transformation logic.

```python
result = s.map('I am a {}'.format)
print(result)
```

**Output**:

```
0       I am a cat
1       I am a dog
2       I am a nan
3    I am a rabbit
dtype: object
```

#### Example 3: Ignoring missing values with `na_action='ignore'`

To prevent `NaN` values from being transformed, use the `na_action='ignore'` option.

```python
result = s.map('I am a {}'.format, na_action='ignore')
print(result)
```

**Output**:

```
0       I am a cat
1       I am a dog
2              NaN
3    I am a rabbit
dtype: object
```

In this case, `NaN` values are left unchanged because of the `na_action='ignore'` parameter.

### **Summary of Use Cases**

- **Mapping with a dictionary**: Use when you have a set of predefined replacements.
- **Mapping with a function**: Use for more complex transformations, such as formatting or applying mathematical operations.
- **Handling missing values**: Use `na_action='ignore'` to skip applying the function to `NaN` values.
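
`na_action='ignore'` matters most when the mapped function cannot handle `NaN` at all. In this sketch, `len` is used as an example of such a function; without `na_action='ignore'` it would be called on the `NaN` entry, which `len` does not accept.

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# NaN is passed through untouched; len is only called on the strings.
lengths = s.map(len, na_action='ignore')
print(lengths)
```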


In [None]:
""" pandas.Series.groupby
Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=<no_default>, dropna=True)
Group Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
by : mapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.

level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.

as_index : bool, default True
Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

sort : bool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.

group_keys : bool, default True
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.

Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed Series or DataFrame. Specify group_keys explicitly to include the group keys or not.

Changed in version 2.0.0: group_keys now defaults to True.

observed : bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.

dropna : bool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:
pandas.api.typing.SeriesGroupBy
Returns a groupby object that contains information about the groups.

See also

resample
Convenience method for frequency conversion and resampling of time series. """
ser = pd.Series([390., 350., 30., 20.],
                index=['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                name="Max Speed")
ser

In [None]:
ser.groupby(["a", "b", "a", "b"]).mean()
ser.groupby(level=0).mean()
ser.groupby(ser > 100).mean()

In [None]:
# Grouping by Indexes

# We can groupby different levels of a hierarchical index using the level parameter:

arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
          ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
ser = pd.Series([390., 350., 30., 20.], index=index, name="Max Speed")
ser 

In [None]:
ser.groupby(level=0).mean()
ser.groupby(level="Type").mean()

In [None]:
# We can also choose to include NA in group keys or not by defining dropna parameter, the default setting is True.

ser = pd.Series([1, 2, 3, 3], index=["a", 'a', 'b', np.nan])
ser.groupby(level=0).sum()

In [None]:
ser.groupby(level=0, dropna=False).sum()

In [None]:
arrays = ['Falcon', 'Falcon', 'Parrot', 'Parrot']
ser = pd.Series([390., 350., 30., 20.], index=arrays, name="Max Speed")
ser.groupby(["a", "b", "a", np.nan]).mean()

In [None]:
ser.groupby(["a", "b", "a", np.nan], dropna=False).mean()

The **`pandas.Series.groupby()`** method is used to group data in a `Series` object based on certain criteria and apply functions (like aggregation, transformation, etc.) to those groups. This is a powerful way to perform operations like sums, means, or custom functions across groups of data.

### **Syntax**

```python
Series.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=<no_default>, dropna=True)
```

### **Parameters**

- **by**: mapping, function, label, `pd.Grouper`, or list of such

  - Defines how to group the data.
  - This could be a dictionary, Series, or a function applied to the values of the index.
  - You can also group by specific labels or levels in a `MultiIndex`.

- **axis**: {0 or 'index', 1 or 'columns'}, default 0

  - This parameter is unused for Series and defaults to 0.
  - Deprecated since version 2.1.0: the keyword will be removed in a future version, and the method will always behave like `axis=0`.

- **level**: int, level name, or sequence of such, default None

  - Used when grouping by a particular level or levels in a `MultiIndex`.

- **as_index**: bool, default True

  - Determines whether to return the group labels as the index. Only relevant for DataFrame input.
  - If False, the output is "SQL-style" grouped output, with the group labels kept as regular columns instead of the index.

- **sort**: bool, default True

  - Specifies whether to sort the group labels. You can turn this off for performance if sorting isn't necessary.

- **group_keys**: bool, default True

  - When using `apply()`, controls whether group keys are added to the index.

- **observed**: bool, default False

  - Relevant when using `Categorical` types. If True, only observed categories will be shown in the result.

- **dropna**: bool, default True
  - If True, `NA` values are excluded from the grouping keys.
  - If False, `NA` values are treated as part of the groups.
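The effect of `sort` is easiest to see on data whose labels are not already in sorted order. The following is a minimal sketch, assuming a reordered version of the bird-speed Series used in this notebook; the variable names are illustrative only.

```python
import pandas as pd

# Same values as the notebook's example, but with 'Parrot' appearing first.
ser = pd.Series([30., 390., 20., 350.],
                index=['Parrot', 'Falcon', 'Parrot', 'Falcon'],
                name="Max Speed")

# Default sort=True: group keys come back in sorted order.
sorted_means = ser.groupby(level=0).mean()

# sort=False: group keys keep their order of first appearance.
unsorted_means = ser.groupby(level=0, sort=False).mean()

print(list(sorted_means.index))    # ['Falcon', 'Parrot']
print(list(unsorted_means.index))  # ['Parrot', 'Falcon']
```

The group values are the same in both cases; only the order of the result index differs.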

### **Returns**

- **pandas.api.typing.SeriesGroupBy**: A GroupBy object containing information about the groups formed based on the provided criteria.

### **Examples**

#### Example 1: Basic GroupBy on Series

You can group data based on a certain criterion (e.g., a custom list of group labels) and then apply an aggregation function like `mean()`.

```python
import pandas as pd

ser = pd.Series([390., 350., 30., 20.], index=['Falcon', 'Falcon', 'Parrot', 'Parrot'], name="Max Speed")
print(ser.groupby(["a", "b", "a", "b"]).mean())
```

**Output**:

```
a    210.0
b    185.0
Name: Max Speed, dtype: float64
```

#### Example 2: Grouping by Index Level in MultiIndex

You can group data by a hierarchical index (MultiIndex) level.

```python
arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'], ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
ser = pd.Series([390., 350., 30., 20.], index=index, name="Max Speed")
print(ser.groupby(level=0).mean())
```

**Output**:

```
Animal
Falcon    370.0
Parrot     25.0
Name: Max Speed, dtype: float64
```

#### Example 3: GroupBy with NA Values

When the data contains `NaN` values, the `dropna` parameter controls whether the `NaN` values are included in the group.

```python
import numpy as np

ser = pd.Series([1, 2, 3, 3], index=["a", 'a', 'b', np.nan])
print(ser.groupby(level=0).sum())
print(ser.groupby(level=0, dropna=False).sum())
```

**Output**:

```
a    3
b    3
dtype: int64
a      3
b      3
NaN    3
dtype: int64
```

#### Example 4: GroupBy with Conditional Grouping

You can also group by conditions, such as grouping based on whether the values are greater than a certain threshold.

```python
ser = pd.Series([390., 350., 30., 20.], index=['Falcon', 'Falcon', 'Parrot', 'Parrot'], name="Max Speed")
print(ser.groupby(ser > 100).mean())
```

**Output**:

```
Max Speed
False     25.0
True     370.0
Name: Max Speed, dtype: float64
```

### **Grouping with Categorical Data**

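The `observed` parameter only has an effect when a grouper is categorical: with `observed=True`, only categories that actually occur in the data appear in the result. The Series below has a plain (non-categorical) index, so `observed=True` changes nothing here; the `NaN` label is dropped by the default `dropna=True`.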

```python
ser = pd.Series([1, 2, 3, 3], index=["a", 'a', 'b', np.nan])
print(ser.groupby(level=0, observed=True).sum())
```

**Output**:

```
a    3
b    3
dtype: int64
```
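For `observed` to make a difference, the grouper needs a category that never occurs in the data. The following is a minimal sketch, assuming a hypothetical `keys` Series whose categorical dtype declares an unused category `"c"`.

```python
import pandas as pd

# Hypothetical grouper: category "c" is declared but never occurs.
keys = pd.Series(["a", "a", "b"],
                 dtype=pd.CategoricalDtype(categories=["a", "b", "c"]))
ser = pd.Series([1, 2, 3])

# observed=False keeps every declared category; the empty group "c" sums to 0.
all_categories = ser.groupby(keys, observed=False).sum()

# observed=True keeps only the categories that actually appear.
observed_only = ser.groupby(keys, observed=True).sum()

print(all_categories)
print(observed_only)
```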

### **Summary of Use Cases**

- **Basic Grouping**: Group by a list of labels or a function, and then aggregate the groups (e.g., mean, sum).
- **Hierarchical Index**: Use the `level` parameter to group by levels in a `MultiIndex`.
- **Handling Missing Data**: Use `dropna` to control whether to exclude or include `NaN` values in group keys.
- **Conditional Grouping**: You can group based on a condition applied to the data itself (e.g., greater than a certain value).
- **Categorical Grouping**: Control which categories are included in the group result using `observed`.


In [None]:
""" pandas.Series.rolling
Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=<no_default>, closed=None, step=None, method='single')[source]
Provide rolling window calculations.

Parameters:
window : int, timedelta, str, offset, or BaseIndexer subclass
Size of the moving window.

If an integer, the fixed number of observations used for each window.

If a timedelta, str, or offset, the time period of each window. Each window will be variable-sized, based on the observations included in the time period. This is only valid for datetimelike indexes. To learn more about offsets and frequency strings, see the time series section of the pandas user guide.

If a BaseIndexer subclass, the window boundaries based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, closed and step will be passed to get_window_bounds.

min_periods : int, default None
Minimum number of observations in window required to have a value; otherwise, result is np.nan.

For a window that is specified by an offset, min_periods will default to 1.

For a window that is specified by an integer, min_periods will default to the size of the window.

center : bool, default False
If False, set the window labels as the right edge of the window index.

If True, set the window labels as the center of the window index.

win_type : str, default None
If None, all points are evenly weighted.

If a string, it must be a valid scipy.signal window function.

Certain Scipy window types require additional parameters to be passed in the aggregation function. The additional parameters must match the keywords specified in the Scipy window type method signature.

on : str, optional
For a DataFrame, a column label or Index level on which to calculate the rolling window, rather than the DataFrame’s index.

Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.

axis : int or str, default 0
If 0 or 'index', roll across the rows.

If 1 or 'columns', roll across the columns.

For Series this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: The axis keyword is deprecated. For axis=1, transpose the DataFrame first instead.

closed : str, default None
If 'right', the first point in the window is excluded from calculations.

If 'left', the last point in the window is excluded from calculations.

If 'both', no points in the window are excluded from calculations.

If 'neither', the first and last points in the window are excluded from calculations.

Default None ('right').

step : int, default None
Added in version 1.5.0.

Evaluate the window at every step result, equivalent to slicing as [::step]. window must be an integer. Using a step argument other than None or 1 will produce a result with a different shape than the input.

method : str {‘single’, ‘table’}, default ‘single’
Added in version 1.3.0.

Execute the rolling operation per single column or row ('single') or over the entire object ('table').

This argument is only implemented when specifying engine='numba' in the method call.

Returns:
pandas.api.typing.Window or pandas.api.typing.Rolling
An instance of Window is returned if win_type is passed. Otherwise, an instance of Rolling is returned.

See also

expanding
Provides expanding transformations.

ewm
Provides exponential weighted functions. """
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
df

In [None]:
df.rolling(2).sum()

In [None]:
df_time = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                       index=[pd.Timestamp('20130101 09:00:00'),
                              pd.Timestamp('20130101 09:00:02'),
                              pd.Timestamp('20130101 09:00:03'),
                              pd.Timestamp('20130101 09:00:05'),
                              pd.Timestamp('20130101 09:00:06')])

In [None]:
df_time

In [None]:
df_time.rolling('2s').sum()

In [None]:
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
df.rolling(window=indexer, min_periods=1).sum()

In [None]:
""" min_periods

Rolling sum with a window length of 2 observations, but only needs a minimum of 1 observation to calculate a value. """

df.rolling(2, min_periods=1).sum()

In [None]:
""" center

Rolling sum with the result assigned to the center of the window index. """

df.rolling(3, min_periods=1, center=True).sum()
df.rolling(3, min_periods=1, center=False).sum()

In [None]:
# step

#  Rolling sum with a window length of 2 observations, minimum of 1 observation to calculate a value, and a step of 2.

df.rolling(2, min_periods=1, step=2).sum()

In [None]:
""" win_type

Rolling sum with a window length of 2, using the Scipy 'gaussian' window type. std is required in the aggregation function. """

df.rolling(2, win_type='gaussian').sum(std=3) 

In [None]:
 # on
 # Rolling sum with a window length of 2 days.

df = pd.DataFrame({
    'A': [pd.to_datetime('2020-01-01'),
          pd.to_datetime('2020-01-01'),
          pd.to_datetime('2020-01-02'),],
    'B': [1, 2, 3], },
    index=pd.date_range('2020', periods=3))
df

In [None]:
df.rolling('2D', on='A').sum()

The **`pandas.Series.rolling()`** method provides rolling window calculations, which allow you to apply functions (like sum, mean, etc.) over a sliding window of a specified size. This is useful for time-series analysis or smoothing data by calculating moving averages, sums, or other aggregations over a defined window.

### **Syntax**

```python
Series.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=<no_default>, closed=None, step=None, method='single')
```

### **Parameters**

- **window**: int, timedelta, str, offset, or BaseIndexer subclass

  - Defines the size of the moving window.
  - If it's an integer, it will use a fixed number of observations for each window.
  - If it's a timedelta, string, or offset, it will use a time period for the window size (only valid for datetime-like indexes).
  - If it's a BaseIndexer subclass, it defines window boundaries using the `get_window_bounds` method.

- **min_periods**: int, default None

  - Minimum number of observations in the window required for a result. If there are fewer observations than this, the result will be `np.nan`.
  - Defaults to the window size if an integer window is specified.
  - For time-based windows (like timedelta), it defaults to 1.

- **center**: bool, default False

  - If False, the window labels will be positioned at the right edge of the window.
  - If True, the window labels will be positioned at the center of the window.

- **win_type**: str, default None

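  - If None, all points are evenly weighted. If a string, it must name a valid `scipy.signal` window function (e.g., 'hann', 'hamming', 'gaussian'), which assigns weights to the observations in the window. Some window types need extra parameters in the aggregation call, such as `std` for 'gaussian'.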

- **on**: str, optional

  - For a DataFrame, this allows you to specify a column label or index level to perform the rolling calculation on, rather than using the index.

- **axis**: int or str, default 0

  - Defines the axis along which to roll the window. For Series, this is unused and defaults to 0.
  - For DataFrame, 0 means rolling along rows, and 1 means rolling along columns. Deprecated since version 2.1.0: for `axis=1`, transpose the DataFrame first instead.

- **closed**: str, default None

  - Controls which side of the window is included in the calculation:
    - 'right' (default) excludes the first point in the window.
    - 'left' excludes the last point in the window.
    - 'both' includes both the first and last points.
    - 'neither' excludes both the first and last points.

- **step**: int, default None

  - This allows you to evaluate the window at every `step`-th result. It must be an integer. This feature is available since pandas 1.5.0.

- **method**: str, {'single', 'table'}, default 'single'
  - When set to 'single', the rolling operation is executed for each column/row individually.
  - When set to 'table', the rolling operation is performed over the entire object. This option is only implemented when `engine='numba'` is specified in the method call.
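The `closed` options are easiest to compare side by side on a time-based window. The sketch below assumes a small daily Series invented for illustration and sums over a 2-day window for each setting.

```python
import pandas as pd

# Hypothetical daily data: one observation per day.
ser = pd.Series([1, 2, 3, 4],
                index=pd.date_range("2020-01-01", periods=4, freq="D"))

# Sum over a 2-day window for each choice of `closed`.
sums = {closed: ser.rolling("2D", closed=closed).sum()
        for closed in ["right", "left", "both", "neither"]}

# One column per setting, for a side-by-side comparison.
print(pd.DataFrame(sums))
```

With `'left'` and `'neither'`, the current observation is excluded, so the first row has no data in its window and comes out as `NaN`.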

### **Returns**

- **pandas.api.typing.Window** or **pandas.api.typing.Rolling**
  - If `win_type` is specified, a `Window` object is returned, which provides additional methods for weighted rolling operations.
  - If `win_type` is not specified, a `Rolling` object is returned, allowing for standard rolling window operations (e.g., `sum()`, `mean()`, etc.).

### **See Also**

- **expanding**: Provides expanding transformations (aggregates over an expanding window).
- **ewm**: Provides exponential weighted functions, which are similar to rolling but apply exponentially decreasing weights.
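As a rough comparison of the three window types, the sketch below applies each to the same small Series (the data and variable names are illustrative). `adjust=False` is used for `ewm` so that the values follow the simple recursive formula.

```python
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0, 4.0])

# Fixed window: mean of the last 2 observations.
rolling_mean = ser.rolling(window=2).mean()          # NaN, 1.5, 2.5, 3.5

# Expanding window: mean of everything seen so far.
expanding_mean = ser.expanding().mean()              # 1.0, 1.5, 2.0, 2.5

# Exponentially weighted: y_t = 0.5 * y_{t-1} + 0.5 * x_t
ewm_mean = ser.ewm(alpha=0.5, adjust=False).mean()   # 1.0, 1.5, 2.25, 3.125

print(pd.DataFrame({"rolling": rolling_mean,
                    "expanding": expanding_mean,
                    "ewm": ewm_mean}))
```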

### **Examples**

#### Example 1: Basic Rolling Window

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
rolling = ser.rolling(window=3)
print(rolling.mean())  # Moving average with a window size of 3
```

**Output**:

```
0    NaN
1    NaN
2    2.0
3    3.0
4    4.0
5    5.0
6    6.0
7    7.0
8    8.0
dtype: float64
```

#### Example 2: Rolling Window with Minimum Periods

```python
import numpy as np

ser = pd.Series([1, 2, np.nan, 4, 5])
rolling = ser.rolling(window=3, min_periods=2)
print(rolling.mean())  # Requires at least 2 observations for the rolling mean
```

**Output**:

```
0    NaN
1    1.5
2    1.5
3    3.0
4    4.5
dtype: float64
```

#### Example 3: Centering the Rolling Window

```python
ser = pd.Series([1, 2, 3, 4, 5])
rolling = ser.rolling(window=3, center=True)
print(rolling.mean())  # The window is centered around each point
```

**Output**:

```
0    NaN
1    2.0
2    3.0
3    4.0
4    NaN
dtype: float64
```
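One way to read `center=True` for an odd integer window is as the right-aligned result moved back by `(window - 1) // 2` positions. A small sketch of that equivalence, under the assumption of a fixed window of 3:

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5])

# Labels placed at the center of each 3-observation window.
centered = ser.rolling(window=3, center=True).mean()

# Right-aligned result moved one position earlier: (3 - 1) // 2 == 1.
shifted = ser.rolling(window=3).mean().shift(-1)

print(centered.equals(shifted))  # True
```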

#### Example 4: Applying a Rolling Window with a Custom Function

```python
ser = pd.Series([1, 2, 3, 4, 5, 6])
rolling = ser.rolling(window=3)
print(rolling.apply(lambda x: x.max()))  # Maximum value within each window
```

**Output**:

```
0    NaN
1    NaN
2    3.0
3    4.0
4    5.0
5    6.0
dtype: float64
```

#### Example 5: Rolling Window with Scipy Window Type

```python
# win_type requires SciPy to be installed. pandas looks the window up by name,
# so no explicit scipy import is needed here.

ser = pd.Series([1, 2, 3, 4, 5, 6])
rolling = ser.rolling(window=3, win_type='hamming')
print(rolling.mean())
```

This will apply a Hamming window to the rolling calculation, giving more weight to the center of each window.

---

### **Summary**

- The **`rolling()`** method in pandas is used for performing rolling calculations such as mean, sum, max, etc., over a sliding window.
- You can control the window size, minimum observations required, window centering, and apply custom window functions such as those from `scipy.signal`.
- It’s useful for time-series analysis or smoothing noisy data.


In [None]:
""" pandas.Series.ewm
Series.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=<no_default>, times=None, method='single')

"""

""" Provide exponentially weighted (EW) calculations.

Exactly one of com, span, halflife, or alpha must be provided if times is not provided. If times is provided, halflife and one of com, span or alpha may be provided.

Parameters:
com : float, optional
Specify decay in terms of center of mass
alpha =1/(1+com), for com>=0 .

span: float, optional
Specify decay in terms of span
alpha=2/(span+1), for span>=1.


halflife: float, str, timedelta, optional
Specify decay in terms of half-life
alpha=1−exp(log(0.5)/halflife), for halflife>0 .

If times is specified, a timedelta convertible unit over which an observation decays to half its value. Only applicable to mean(), and halflife value will not apply to the other functions.

alpha: float, optional
Specify smoothing factor alpha directly
0<alpha<=1 .

min_periods: int, default 0
Minimum number of observations in window required to have a value; otherwise, result is np.nan.

adjust: bool, default True
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings (viewing EWMA as a moving average).

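When adjust=True (default), the EW function is calculated using weights w_i=(1-alpha)^i. For example, the EW moving average of the series [x_0, x_1, ..., x_t] would be:

y_t=(x_t+(1-alpha)*x_{t-1}+(1-alpha)^2*x_{t-2}+...+(1-alpha)^t*x_0)/(1+(1-alpha)+(1-alpha)^2+...+(1-alpha)^t).

When adjust=False, the exponentially weighted function is calculated recursively:

y_0=x_0,

y_t=(1-alpha)*y_{t-1}+alpha*x_t.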



ignore_na: bool, default False
Ignore missing values when calculating weights.

When ignore_na=False (default), weights are based on absolute positions. 

When ignore_na=True, weights are based on relative positions.

axis : {0, 1}, default 0
If 0 or 'index', calculate across the rows.

If 1 or 'columns', calculate across the columns.

For Series this parameter is unused and defaults to 0.

times : np.ndarray, Series, default None
Only applicable to mean().

Times corresponding to the observations. Must be monotonically increasing and datetime64[ns] dtype.

If 1-D array like, a sequence with the same shape as the observations.

method: str {‘single’, ‘table’}, default ‘single’
Added in version 1.4.0.

Execute the rolling operation per single column or row ('single') or over the entire object ('table').

This argument is only implemented when specifying engine='numba' in the method call.

Only applicable to mean()

Returns:
pandas.api.typing.ExponentialMovingWindow

See also

rolling
Provides rolling window calculations.

expanding

Provides expanding transformations.

   """
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
df


In [None]:
df.ewm(com=0.5).mean()

In [None]:
df.ewm(alpha=2 / 3).mean()

In [None]:
# ADJUST
df.ewm(com=0.5, adjust=True).mean()
df.ewm(com=0.5, adjust=False).mean()

In [None]:
# ignore_na
df.ewm(com=0.5, ignore_na=True).mean()
df.ewm(com=0.5, ignore_na=False).mean()

In [None]:
# times

# Exponentially weighted mean with weights calculated with a timedelta halflife relative to times.

times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()

The **`pandas.Series.ewm()`** method computes exponentially weighted (EW) calculations, which apply exponentially decreasing weights to a series of data. This is particularly useful for smoothing time-series data, calculating exponential moving averages (EMAs), or other exponentially weighted metrics.

### **Syntax**

```python
Series.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=<no_default>, times=None, method='single')
```

### **Parameters**

- **com**: float, optional

  - Specifies the decay in terms of the center of mass: `alpha = 1 / (1 + com)`, for `com >= 0`.
  - A higher `com` gives a smaller `alpha`, so the weights decay more slowly and older observations keep more influence.

- **span**: float, optional

  - Specifies the decay in terms of the span: `alpha = 2 / (span + 1)`, for `span >= 1`.
  - The span is related to the center of mass (com) via `span = 2 * com + 1`.

- **halflife**: float, str, timedelta, optional

  - Specifies the decay in terms of half-life, which is the time it takes for the weight to decay to half of its value.
  - This is especially useful when working with time-based data. It can be a string (e.g., `'4 days'`) or a timedelta.

- **alpha**: float, optional

  - Specifies the smoothing factor directly, with `0 < alpha <= 1`. The alpha value determines the weight given to the most recent observation in the series.
  - The other decay parameters are converted to alpha: `alpha = 1 / (1 + com)`, `alpha = 2 / (span + 1)`, or `alpha = 1 - exp(log(0.5) / halflife)`.

- **min_periods**: int, default 0

  - Minimum number of observations required in the window for a valid result. If fewer than `min_periods` are available, the result will be `np.nan`.

- **adjust**: bool, default True

  - If `True`, the calculation adjusts the weights to account for earlier periods, treating the EW calculation as a moving average.
  - If `False`, the calculation is recursive and the adjustment factor is not applied.

- **ignore_na**: bool, default False

  - If `True`, missing values (NaN) are ignored while calculating the weights.
  - If `False`, NaNs are treated as part of the series, and the weight of missing values will be handled accordingly.

- **axis**: {0, 1}, default 0

  - Specifies the axis along which to calculate the exponentially weighted function. For a Series, this is ignored (defaults to 0).

- **times**: np.ndarray, Series, optional

  - For calculating exponentially weighted means over time, this provides a sequence of timestamps. The times must be monotonically increasing and in `datetime64[ns]` dtype.
  - This parameter is relevant only for `mean()` and allows for time-based decay.

- **method**: str {'single', 'table'}, default 'single'
  - Determines whether the operation is performed per column/row ('single') or over the entire object ('table').
  - This is only implemented when using the `numba` engine with the `mean()` function.
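Since `com`, `span`, `halflife`, and `alpha` are four ways of stating the same decay, converting one into the others should give identical results. The sketch below checks this for `alpha = 2/3` on an illustrative Series, using the conversion formulas from the docstring.

```python
import numpy as np
import pandas as pd

ser = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])
alpha = 2 / 3

# Invert the docstring formulas to express the same decay three other ways.
com = 1 / alpha - 1                                   # alpha = 1 / (1 + com)
span = 2 / alpha - 1                                  # alpha = 2 / (span + 1)
halflife = float(np.log(0.5) / np.log(1 - alpha))     # alpha = 1 - exp(log(0.5) / halflife)

reference = ser.ewm(alpha=alpha).mean()
same_as_alpha = {
    "com": np.allclose(ser.ewm(com=com).mean(), reference),
    "span": np.allclose(ser.ewm(span=span).mean(), reference),
    "halflife": np.allclose(ser.ewm(halflife=halflife).mean(), reference),
}
print(same_as_alpha)
```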

### **Returns**

- **pandas.api.typing.ExponentialMovingWindow**
  - This object represents the exponentially weighted window and provides methods for various EW calculations (e.g., `mean()`, `std()`, `cov()`, etc.).

### **See Also**

- **rolling**: Provides rolling window calculations (for fixed-sized windows).
- **expanding**: Provides expanding transformations (for a cumulative window).

### **Examples**

#### Example 1: Basic Exponentially Weighted Mean (with `com`)

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
print(df.ewm(com=0.5).mean())
```

**Output**:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
```

#### Example 2: Exponentially Weighted Mean (with `alpha`)

```python
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
print(df.ewm(alpha=2 / 3).mean())
```

**Output**:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
```

#### Example 3: Adjusting the Exponentially Weighted Mean

```python
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
# Adjusted calculation
print(df.ewm(com=0.5, adjust=True).mean())
```

**Output**:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
```

#### Example 4: Exponentially Weighted Mean without Adjusting

```python
print(df.ewm(com=0.5, adjust=False).mean())
```

**Output**:

```
          B
0  0.000000
1  0.666667
2  1.555556
3  1.555556
4  3.650794
```
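The difference between the two `adjust` modes can be reproduced by hand from the formulas in the docstring. The following sketch uses NaN-free data on purpose, so that the `ignore_na` handling does not come into play, and compares plain-Python results against pandas.

```python
import numpy as np
import pandas as pd

values = [0.0, 1.0, 2.0, 4.0]   # NaN-free on purpose
alpha = 2 / 3                   # same decay as com=0.5

# adjust=False: y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t
recursive = [values[0]]
for x in values[1:]:
    recursive.append((1 - alpha) * recursive[-1] + alpha * x)

# adjust=True: weighted average with weights (1 - alpha)**i, newest value first
adjusted = []
for t in range(len(values)):
    newest_first = values[t::-1]                       # x_t, x_{t-1}, ..., x_0
    weights = [(1 - alpha) ** i for i in range(t + 1)]
    weighted_sum = sum(w * x for w, x in zip(weights, newest_first))
    adjusted.append(weighted_sum / sum(weights))

ser = pd.Series(values)
matches_recursive = np.allclose(recursive, ser.ewm(alpha=alpha, adjust=False).mean())
matches_adjusted = np.allclose(adjusted, ser.ewm(alpha=alpha, adjust=True).mean())
print(matches_recursive, matches_adjusted)
```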

#### Example 5: Ignoring NaNs during Exponentially Weighted Mean Calculation

```python
print(df.ewm(com=0.5, ignore_na=True).mean())
```

**Output**:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.225000
```

#### Example 6: Using Time-based Exponentially Weighted Mean (with `halflife`)

```python
times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}, index=pd.DatetimeIndex(times))
print(df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean())
```

**Output**:

```
                   B
2020-01-01  0.000000
2020-01-03  0.585786
2020-01-10  1.523889
2020-01-15  1.523889
2020-01-17  3.233686
```

### **Summary**

- The **`ewm()`** method is used for exponentially weighted calculations, such as exponential moving averages.
- The decay of weights can be controlled through parameters like `com`, `span`, `alpha`, and `halflife`.
- You can also adjust how weights are applied with the `adjust` and `ignore_na` parameters.
- This method is particularly useful for time-series data where recent observations are given more weight than older ones.


In [None]:
""" pandas.Series.pipe
Series.pipe(func, *args, **kwargs)[source]
Apply chainable functions that expect Series or DataFrames.

Parameters:
func : function
Function to apply to the Series/DataFrame. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame.

*args : iterable, optional
Positional arguments passed into func.

**kwargs : mapping, optional
A dictionary of keyword arguments passed into func.

Returns:
The return type of func.

See also

DataFrame.apply
Apply a function along input axis of DataFrame.

DataFrame.map
Apply a function elementwise on a whole DataFrame.

Series.map
Apply a mapping correspondence on a Series.

Notes

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects. """

data = [[8000, 1000], [9500, np.nan], [5000, 2000]]
df = pd.DataFrame(data, columns=['Salary', 'Others'])
df

In [None]:
def subtract_federal_tax(df):
    return df * 0.9
def subtract_state_tax(df, rate):
    return df * (1 - rate)
def subtract_national_insurance(df, rate, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)

In [None]:
subtract_national_insurance(
    subtract_state_tax(subtract_federal_tax(df), rate=0.12),
    rate=0.05,
    rate_increase=0.02)  

In [None]:
def subtract_national_insurance(rate, df, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)
(
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(
        (subtract_national_insurance, 'df'),
        rate=0.05,
        rate_increase=0.02
    )
)

The **`pandas.Series.pipe()`** method is used to apply a function to a Series (or DataFrame) within a chain of operations. It allows you to apply functions in a clean, readable, and functional programming style, especially when chaining multiple functions together.

### **Syntax**

```python
Series.pipe(func, *args, **kwargs)
```

### **Parameters**

- **func**: function
  - The function to apply to the Series (or DataFrame). This function will be passed the Series or DataFrame as the first argument.
  - You can also pass a tuple where the first element is the callable function and the second is a keyword (a string) indicating where the data should be passed (e.g., as a keyword argument).
- **args**: iterable, optional

  - Positional arguments to pass into `func`.

- **kwargs**: mapping, optional
  - Keyword arguments to pass into `func`.

### **Returns**

- The return type of the function `func`, which could be another Series, DataFrame, or any object returned by the function.

### **Use Cases**

- **Chaining functions**: It is particularly useful when chaining multiple functions that expect a Series or DataFrame as their first argument. Instead of nesting function calls, `pipe` allows for cleaner, more readable code.
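The same idea applies to a plain Series. The sketch below compares a nested call with the equivalent `pipe` chain; `clip_range` and `center` are hypothetical helpers written only for this illustration.

```python
import pandas as pd

def clip_range(s, lower, upper):   # hypothetical helper
    """Limit values to the interval [lower, upper]."""
    return s.clip(lower=lower, upper=upper)

def center(s):                     # hypothetical helper
    """Subtract the mean so the result averages to zero."""
    return s - s.mean()

ser = pd.Series([-5.0, 1.0, 3.0, 20.0])

# Nested calls read inside-out ...
nested = center(clip_range(ser, 0, 10))

# ... while pipe reads top-to-bottom in the order the steps run.
chained = ser.pipe(clip_range, 0, 10).pipe(center)

print(chained.equals(nested))  # True
```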

### **Examples**

#### Example 1: Basic Usage of `pipe`

Let's say you have a DataFrame with salary data and you want to apply a series of functions to reduce the income.

```python
import pandas as pd
import numpy as np

# Sample data
data = [[8000, 1000], [9500, np.nan], [5000, 2000]]
df = pd.DataFrame(data, columns=['Salary', 'Others'])

# Functions to apply
def subtract_federal_tax(df):
    return df * 0.9

def subtract_state_tax(df, rate):
    return df * (1 - rate)

def subtract_national_insurance(df, rate, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)

# Chaining with pipe
result = (
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
)

print(result)
```

**Output**:

```
    Salary   Others
0  5892.48   736.56
1  6997.32      NaN
2  3682.80  1473.12
```

Here, the three tax deduction functions are applied one after the other. Using `pipe`, the result of each function is passed to the next function in the chain.

#### Example 2: Using `pipe` with Keyword Arguments

If a function expects the Series (or DataFrame) as a keyword argument, you can pass a tuple to `pipe` that specifies which argument to assign the Series to.

```python
def subtract_national_insurance(rate, df, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)

result = (
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(
        (subtract_national_insurance, 'df'),
        rate=0.05,
        rate_increase=0.02
    )
)

print(result)
```

**Output**:

```
    Salary   Others
0  5892.48   736.56
1  6997.32      NaN
2  3682.80  1473.12
```

In this example, `subtract_national_insurance` is modified to accept the DataFrame as a keyword argument, and we specify that `df` should receive the DataFrame inside the `pipe` call using a tuple.
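Put differently, the tuple form `pipe((func, 'name'), ...)` should behave like calling `func(name=data, ...)` directly. A minimal sketch of that equivalence on a Series, using a hypothetical `scale` function whose data argument is not the first parameter:

```python
import pandas as pd

def scale(factor, data):   # hypothetical helper: data is the second parameter
    return data * factor

ser = pd.Series([1, 2, 3])

# The string 'data' tells pipe which keyword should receive the Series.
piped = ser.pipe((scale, 'data'), factor=10)

# Equivalent direct call.
direct = scale(factor=10, data=ser)

print(piped.equals(direct))  # True
```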

#### Example 3: Using `pipe` for More Complex Chains

You can use `pipe` to chain more complex operations, such as adding additional columns or applying custom functions, and combine them into a single readable chain.

```python
def add_bonus(df, bonus_percentage):
    # assign() returns a new DataFrame, so the caller's df is left untouched
    return df.assign(Bonus=df['Salary'] * bonus_percentage)

def apply_discount(df, discount_percentage):
    return df.assign(Salary=df['Salary'] * (1 - discount_percentage))

result = (
    df.pipe(add_bonus, bonus_percentage=0.1)
    .pipe(apply_discount, discount_percentage=0.05)
)

print(result)
```

**Output**:

```
   Salary  Others  Bonus
0  7600.0  1000.0  800.0
1  9025.0     NaN  950.0
2  4750.0  2000.0  500.0
```

Here, two functions (`add_bonus` and `apply_discount`) are applied in a chain to compute the bonus and apply a salary discount.

### **Advantages of Using `pipe`**

- **Readability**: Avoids deeply nested function calls. Each function call is neatly aligned, improving the readability of code.
- **Flexibility**: Allows you to chain custom functions with varying arguments.
- **Maintainability**: Makes it easier to modify or add functions without changing the overall structure of the code.

### **Summary**

- **`pipe()`** is great for applying a series of transformations in a readable, functional style.
- It works by passing the Series or DataFrame to each function in the chain.
- You can pass additional arguments and keyword arguments to functions in the chain.
- This method is ideal when you have multiple operations to perform on a Series/DataFrame and want to avoid deeply nested function calls.


In [None]:
""" pandas.Series.abs

Series.abs()

Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

Returns:
abs
Series/DataFrame containing the absolute value of each element.

See also

numpy.absolute
Calculate the absolute value element-wise.

Notes

For complex inputs, 1.2 + 1j, the absolute value is sqrt(a^2 + b^2).


"""
s = pd.Series([-1.10, 2, -3.33, 4])
s.abs()
s = pd.Series([1.2 + 1j])
s.abs()


In [None]:
s = pd.Series([pd.Timedelta('1 days')])
s.abs()

In [None]:
df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})
df

In [None]:
df.loc[(df.c - 43).abs().argsort()]

The **`pandas.Series.abs()`** function is used to return a Series (or DataFrame) where the numeric values are transformed into their absolute values. It applies to numerical data types and can also handle complex numbers, returning the magnitude (absolute value) for complex numbers.

### **Syntax**

```python
Series.abs()
```

### **Returns**

- The function returns a **Series** or **DataFrame** containing the absolute values of each element.
  - For **numeric types**, it returns the absolute value of each number.
  - For **complex numbers**, it computes the magnitude (Euclidean norm).
  - For **Timedelta objects**, it returns the absolute duration, still as a `timedelta64[ns]` value (a negative duration becomes positive).

### **Parameters**

- **None**: The function works on the current Series/DataFrame and doesn’t take any other arguments.

### **Examples**

#### Example 1: Absolute values of numeric data

```python
import pandas as pd

# Creating a Series with negative and positive values
s = pd.Series([-1.10, 2, -3.33, 4])

# Applying abs() to get the absolute values
result = s.abs()

print(result)
```

**Output**:

```
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64
```

Here, the negative values in the Series are converted to positive.

#### Example 2: Absolute values with complex numbers

```python
s = pd.Series([1.2 + 1j])  # Complex number

# Applying abs() to get the magnitude
result = s.abs()

print(result)
```

**Output**:

```
0    1.56205
dtype: float64
```

For complex numbers, the absolute value is calculated as the magnitude (Euclidean norm), which is `sqrt(real^2 + imag^2)`.
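
As a quick check of that formula, the sketch below compares `abs()` against a manual `sqrt(real^2 + imag^2)` computation. The second value (`3 - 4j`) is an extra illustrative input not used elsewhere in this section.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.2 + 1j, 3 - 4j])

# Manual magnitude from the real and imaginary parts
values = s.to_numpy()
manual = np.sqrt(values.real**2 + values.imag**2)

print(s.abs().tolist())
print(np.allclose(s.abs(), manual))  # True
```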

#### Example 3: Absolute values with Timedelta

```python
s = pd.Series([pd.Timedelta('1 days')])

# Applying abs() to Timedelta
result = s.abs()

print(result)
```

**Output**:

```
0   1 days
dtype: timedelta64[ns]
```

For **Timedelta objects**, `abs()` returns the absolute duration (in days, hours, minutes, etc.).

#### Example 4: Sorting rows based on absolute difference to a value

You can also use `.abs()` to filter or sort based on absolute differences from a target value.

```python
df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})

# Sorting rows based on the absolute difference of column 'c' to 43
sorted_df = df.loc[(df.c - 43).abs().argsort()]

print(sorted_df)
```

**Output**:

```
   a   b    c
1  5  20   50
0  4  10  100
2  6  30  -30
3  7  40  -50
```

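Here, the `.abs()` function is used to compute the absolute difference between each element in column `c` and the target value `43`. The `argsort()` method then returns the integer positions that would sort these differences, and `df.loc[...]` uses them to reorder the rows. This works with `.loc` because the default index labels coincide with the row positions.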
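
Because `argsort()` yields positions rather than labels, a position-based lookup is the safer choice when the index is not the default `0..n-1`. The following is a minimal sketch with an assumed string index (`w`, `x`, `y`, `z`) chosen only for illustration.

```python
import pandas as pd

# Same 'c' values as above, but with a non-default (hypothetical) index
df_labeled = pd.DataFrame({"c": [100, 50, -30, -50]}, index=["w", "x", "y", "z"])

# Positions that sort the rows by distance from 43
order = (df_labeled["c"] - 43).abs().argsort().to_numpy()

# iloc selects by position, so it works regardless of the index labels
nearest_first = df_labeled.iloc[order]
print(nearest_first.index.tolist())  # ['x', 'w', 'y', 'z']
```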

### **See Also**

- **`numpy.absolute`**: This is the underlying function from NumPy that performs element-wise absolute value calculations.

### **Summary**

- `Series.abs()` is a simple and effective method to convert all elements in a Series or DataFrame to their absolute values.
- It works on **numeric**, **complex**, and **Timedelta** data types.
- It's useful for sorting, filtering, or transforming data where negative values need to be made positive or where magnitudes are required (in case of complex numbers).


In [None]:
""" pandas.Series.all
Series.all(axis=0, bool_only=False, skipna=True, **kwargs)
Return whether all elements are True, potentially over an axis.

Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).

Parameters:
axis:

{0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

None : reduce all axes, return a scalar.

bool_only : 

bool, default False
Include only boolean columns. Not implemented for Series.

skipna:

bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

**kwargs:

any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:

scalar or Series
If level is specified, then, Series is returned; otherwise, scalar is returned.

See also

Series.all
Return True if all elements are True.

DataFrame.any
Return True if one (or more) elements are True. """
# Series

pd.Series([True, True]).all()
pd.Series([True, False]).all()
pd.Series([], dtype="float64").all()
pd.Series([np.nan]).all()
pd.Series([np.nan]).all(skipna=False)

In [None]:
# DataFrames

# Create a dataframe from a dictionary.

df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
df

In [None]:
# Default behaviour checks if values in each column all return True.

df.all()

In [None]:
# Specify axis='columns' to check if values in each row all return True.

df.all(axis='columns')

In [None]:
# Or axis=None for whether every value is True.

df.all(axis=None)

The **`pandas.Series.all()`** function is used to check if all elements in a Series or DataFrame (along a specified axis) are `True`. It returns `True` if every element is `True` (or equivalent), and `False` if any element is `False` (or equivalent).

### **Syntax**

```python
Series.all(axis=0, bool_only=False, skipna=True, **kwargs)
```

### **Parameters**

- **`axis`**:

  - {0, ‘index’, 1, ‘columns’, None}, default 0
  - For a **Series**, this parameter is unused and defaults to 0.
  - For a **DataFrame**, `axis=0` reduces along the rows (checks columns), and `axis=1` reduces along the columns (checks rows). `None` checks all elements in the DataFrame.

- **`bool_only`**:

  - bool, default `False`
  - If `True`, only boolean columns will be considered. This is not implemented for Series.

- **`skipna`**:

  - bool, default `True`
  - If `True`, excludes `NA/null` values. If an entire row or column contains `NA` values and `skipna=True`, the result will be `True`. If `skipna=False`, `NA` values are treated as `True`.

- **`kwargs`**:
  - Additional arguments passed for compatibility with NumPy.

### **Returns**

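- **scalar or Series**:
  - For a **Series**, a scalar (`True` or `False`) is returned.
  - For a **DataFrame**, a Series is returned with one result per column or row (depending on `axis`); with `axis=None`, a single scalar is returned.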

### **Examples**

#### Example 1: Series with boolean values

```python
import pandas as pd

# Series with all True values
s = pd.Series([True, True])
result = s.all()
print(result)  # Output: True

# Series with one False value
s = pd.Series([True, False])
result = s.all()
print(result)  # Output: False
```

#### Example 2: Empty Series

```python
s = pd.Series([], dtype="float64")
result = s.all()
print(result)  # Output: True
```

For an empty Series, `.all()` returns `True` because there are no elements to evaluate as `False`.
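
This mirrors the behavior of Python's built-in `all()` and `any()` on empty iterables. A small sketch of the comparison:

```python
import pandas as pd

empty = pd.Series([], dtype="float64")

# No element can be falsy, so all() is True; no element is truthy, so any() is False
print(empty.all(), empty.any())  # True False

# Python's built-ins follow the same convention
print(all([]), any([]))  # True False
```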

#### Example 3: Series with `NaN`

```python
import numpy as np

# Series with NaN
s = pd.Series([np.nan])
result = s.all()
print(result)  # Output: True (default behavior with skipna=True)

# Series with NaN and skipna=False
result = s.all(skipna=False)
print(result)  # Output: True (with skipna=False, NaN counts as True because it is not equal to zero)
```

#### Example 4: DataFrame with boolean values

```python
df = pd.DataFrame({
    'col1': [True, True],
    'col2': [True, False]
})

# Check if all elements in each column are True
result = df.all()
print(result)
```

**Output**:

```
col1     True
col2    False
dtype: bool
```

Here, `all()` checks if every value in each column is `True`. For `col1`, all values are `True`, but for `col2`, there's a `False` value, so it returns `False`.

#### Example 5: DataFrame with axis='columns' to check row-wise

```python
result = df.all(axis='columns')
print(result)
```

**Output**:

```
0     True
1    False
dtype: bool
```

This checks if all values in each row are `True`. In row 1, the second value is `False`, so the result is `False` for that row.

#### Example 6: DataFrame with axis=None to check all elements

```python
result = df.all(axis=None)
print(result)
```

**Output**:

```
False
```

This checks if **all** values in the entire DataFrame are `True`. Since `col2` has a `False`, the result is `False`.

### **See Also**

- **`Series.any()`**: Returns `True` if at least one element is `True` (similar to `.all()` but checks for any `True` value).
- **`DataFrame.all()`**: Performs a similar operation for DataFrames, checking if all values are `True` across rows or columns.

### **Summary**

- **`Series.all()`** checks if **all** values in a Series are `True`.
- For **DataFrames**, you can specify the axis to check along rows or columns, or check across the entire DataFrame.
- **`skipna`** allows you to control how missing values (NA) are handled in the check.
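
One common way to use `all()` is to validate a condition across a column or a whole DataFrame. The sketch below assumes a made-up `orders` table; the column names and values are purely illustrative.

```python
import pandas as pd

# Hypothetical data for illustration
orders = pd.DataFrame({"quantity": [3, 1, 7], "price": [9.5, 0.0, 12.0]})

# Element-wise comparison yields booleans; all() reduces them to one answer
all_quantities_positive = (orders["quantity"] > 0).all()
all_prices_positive = (orders["price"] > 0).all()

# On a DataFrame, all() reports one result per column
per_column = (orders > 0).all()

print(all_quantities_positive)  # True
print(all_prices_positive)      # False
print(per_column)
```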


In [None]:
""" pandas.Series.any
Series.any(*, axis=0, bool_only=False, skipna=True, **kwargs)
Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).

Parameters:
axis
{0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

None : reduce all axes, return a scalar.

bool_only
bool, default False
Include only boolean columns. Not implemented for Series.

skipna
bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

**kwargs
any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
scalar or Series
If level is specified, then, Series is returned; otherwise, scalar is returned.

See also

numpy.any
Numpy version of this method.

Series.any
Return whether any element is True.

Series.all
Return whether all elements are True.

DataFrame.any
Return whether any element is True over requested axis.

DataFrame.all
Return whether all elements are True over requested axis. """

pd.Series([False, False]).any()
pd.Series([True, False]).any()
pd.Series([], dtype="float64").any()
pd.Series([np.nan]).any()
pd.Series([np.nan]).any(skipna=False)

In [None]:
# DataFrame

# Whether each column contains at least one True element (the default).

df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
df

In [None]:
df.any()


In [None]:
# Aggregating over the columns.

df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
df

In [None]:
df.any(axis='columns')

In [None]:
df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
df.any(axis='columns')
# Aggregating over the entire DataFrame with axis=None.

df.any(axis=None)

In [None]:
# any for an empty DataFrame is an empty Series.

pd.DataFrame([]).any()

The **`pandas.Series.any()`** function checks if **any element** in the Series or along a specified axis of a DataFrame is `True`. It returns `True` if at least one element is `True` (or equivalent), and `False` otherwise.

### **Syntax**

```python
Series.any(*, axis=0, bool_only=False, skipna=True, **kwargs)
```

### **Parameters**

- **`axis`**:

  - {0, ‘index’, 1, ‘columns’, None}, default 0
  - For **Series**, this parameter is unused and defaults to 0.
  - For **DataFrames**, `axis=0` reduces along rows (checks columns), and `axis=1` reduces along columns (checks rows). `None` checks across all elements in the DataFrame.

- **`bool_only`**:

  - bool, default `False`
  - If `True`, only boolean columns will be considered (this is not implemented for Series).

- **`skipna`**:

  - bool, default `True`
  - If `True`, excludes `NA/null` values. If an entire row or column contains `NA` values and `skipna=True`, the result will be `False`. If `skipna=False`, `NA` values are treated as `True`.

- **`kwargs`**:
  - Additional arguments, passed for compatibility with NumPy.

### **Returns**

- **scalar or Series**:
  - If **Series**, a scalar is returned (True or False).
  - If **DataFrame**, a Series is returned with the result for each column or row (depending on `axis`).

### **Examples**

#### Example 1: Series with boolean values

```python
import pandas as pd

# Series with no True values
s = pd.Series([False, False])
result = s.any()
print(result)  # Output: False

# Series with one True value
s = pd.Series([True, False])
result = s.any()
print(result)  # Output: True
```

#### Example 2: Empty Series and `NaN` values

```python
# Empty Series
s = pd.Series([], dtype="float64")
result = s.any()
print(result)  # Output: False (since there are no elements)

# Series with NaN values
import numpy as np
s = pd.Series([np.nan])
result = s.any()
print(result)  # Output: False (NaN is skipped by default, so no truthy value remains)

# Series with NaN values and skipna=False
result = s.any(skipna=False)
print(result)  # Output: True (NaN is treated as True when skipna=False)
```

#### Example 3: DataFrame with boolean and integer values

```python
# DataFrame mixing boolean and integer columns (non-zero integers count as True)
df = pd.DataFrame({
    'A': [True, False],
    'B': [0, 2],
    'C': [0, 0]
})

# Check if any value in each column is True
result = df.any()
print(result)
```

**Output**:

```
A     True
B     True
C    False
dtype: bool
```

Here, `any()` checks if at least one value in each column is `True`.

#### Example 4: Aggregating over rows (axis=1)

```python
# DataFrame with numeric and boolean values
df = pd.DataFrame({
    'A': [True, False],
    'B': [1, 2]
})

# Check if any value in each row is True
result = df.any(axis='columns')
print(result)
```

**Output**:

```
0    True
1    True
dtype: bool
```

Here, `axis='columns'` checks if any value in each row is `True`.

#### Example 5: Aggregating over the entire DataFrame (axis=None)

```python
# DataFrame with mixed values
df = pd.DataFrame({
    'A': [True, False],
    'B': [1, 0]
})

# Check if any value in the entire DataFrame is True
result = df.any(axis=None)
print(result)
```

**Output**:

```
True
```

This checks the entire DataFrame and returns `True` because there is at least one `True` value.

#### Example 6: Empty DataFrame

```python
# Empty DataFrame
df_empty = pd.DataFrame([])

# Check if any value in the DataFrame is True
result = df_empty.any()
print(result)
```

**Output**:

```
Series([], dtype: bool)
```

For an empty DataFrame, the result is an empty Series.

### **See Also**

- **`numpy.any()`**: Numpy version of this method.
- **`Series.all()`**: Checks if all elements in a Series are `True`.
- **`DataFrame.any()`**: Checks if any value in a DataFrame is `True`, along the requested axis.

### **Summary**

- **`Series.any()`** returns `True` if at least one element is `True`.
- For **DataFrames**, it checks along the specified axis (rows or columns) and can return a Series.
- **`skipna`** determines how missing values are handled (whether they are treated as `True` or ignored).
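
A frequent practical use of `any()` is locating rows or columns that contain at least one missing value. The sketch below assumes a small made-up DataFrame and combines `isna()` with `any()`.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing entries
df_missing = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, np.nan],
    "c": [7.0, 8.0, 9.0],
})

# Row-wise: True where a row has at least one NaN
has_missing = df_missing.isna().any(axis="columns")
rows_with_missing = df_missing[has_missing]

# Column-wise (the default axis): names of columns containing a NaN
columns_with_missing = df_missing.columns[df_missing.isna().any()].tolist()

print(rows_with_missing.index.tolist())  # [1, 2]
print(columns_with_missing)              # ['a', 'b']
```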


In [None]:
""" pandas.Series.autocorr
Series.autocorr(lag=1)
Compute the lag-N autocorrelation.

This method computes the Pearson correlation between the Series and its shifted self.

Parameters
:
lag
int, default 1
Number of lags to apply before performing autocorrelation.

Returns
:
float
The Pearson correlation between self and self.shift(lag).

See also

Series.corr
Compute the correlation between two Series.

Series.shift
Shift index by desired number of periods.

DataFrame.corr
Compute pairwise correlation of columns.

DataFrame.corrwith
Compute pairwise correlation between rows or columns of two DataFrame objects.

Notes

If the Pearson correlation is not well defined return ‘NaN’.  """

In [None]:
s = pd.Series([0.25, 0.5, 0.2, -0.05])
s.autocorr()  

In [None]:
s.autocorr(lag=2) 

In [None]:
# If the Pearson correlation is not well defined, then ‘NaN’ is returned.

s = pd.Series([1, 0, 0, 0])
s.autocorr()

The **`pandas.Series.autocorr()`** function computes the **lag-N autocorrelation** of a Series, which is a measure of how well a time series is correlated with a lagged version of itself. It uses the **Pearson correlation** between the original Series and a shifted version of it.

### **Syntax**

```python
Series.autocorr(lag=1)
```

### **Parameters**

- **`lag`**:
  - **int**, default `1`
  - The number of periods to shift the Series before computing the autocorrelation.

### **Returns**

- **float**:
  - The Pearson correlation between the Series and its shifted version. If the Pearson correlation is not well defined (e.g., due to constant values), it returns `NaN`.

### **Notes**

- **Pearson correlation** is used for autocorrelation, which measures the linear relationship between a Series and its lagged version.
- If the Pearson correlation cannot be computed (e.g., if there is no variation in the data), the result will be `NaN`.
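
Since the return value is described as the Pearson correlation between the Series and `self.shift(lag)`, the same number can be reproduced by hand with `shift` and `corr`. A minimal sketch of that equivalence:

```python
import numpy as np
import pandas as pd

s = pd.Series([0.25, 0.5, 0.2, -0.05])

for lag in (1, 2):
    # shift() introduces NaN at the start; corr() ignores those pairs
    manual = s.corr(s.shift(lag))
    print(lag, np.isclose(s.autocorr(lag=lag), manual))  # True for both lags
```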

### **See also**

- **`Series.corr`**: Computes the correlation between two Series.
- **`Series.shift`**: Shifts the index by a specified number of periods.
- **`DataFrame.corr`**: Computes pairwise correlation between columns in a DataFrame.
- **`DataFrame.corrwith`**: Computes pairwise correlation between rows or columns of two DataFrame objects.

### **Examples**

#### Example 1: Basic Autocorrelation

```python
import pandas as pd

# Sample Series
s = pd.Series([0.25, 0.5, 0.2, -0.05])

# Autocorrelation with lag=1 (default)
result_lag_1 = s.autocorr()
print(result_lag_1)  # Output: 0.10355...

# Autocorrelation with lag=2
result_lag_2 = s.autocorr(lag=2)
print(result_lag_2)  # Output: -0.99999...
```

#### Example 2: Autocorrelation When the Shifted Values Are Constant

```python
# After dropping the first element, the remaining values [0, 0, 0] are constant
s_constant = pd.Series([1, 0, 0, 0])

# Autocorrelation (result is NaN because one side of the comparison has no variation)
result_constant = s_constant.autocorr()
print(result_constant)  # Output: nan
```

#### Example 3: Series with Positive Correlation

```python
# Positive correlation example
s_pos = pd.Series([1, 2, 3, 4, 5])

# Autocorrelation with lag=1
result_pos = s_pos.autocorr()
print(result_pos)  # Output: 1.0 (perfect positive correlation)
```

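#### Example 4: Series with Negative Correlation

```python
# Negative correlation example: an alternating series flips sign at every step
s_neg = pd.Series([1, -1, 1, -1, 1])

# Autocorrelation with lag=1
result_neg = s_neg.autocorr()
print(result_neg)  # Output: -1.0 (up to floating-point rounding)

# Note: a steadily decreasing series such as [5, 4, 3, 2, 1] still gives 1.0,
# because each value moves in step with its predecessor.
```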

### **Summary**

- **`autocorr(lag=1)`** computes the Pearson correlation between a Series and its shifted self by a specified number of periods (lag).
- It can be used to assess the linear dependence of the data with itself over time.
- If the Series, or the part of it that overlaps with the shifted copy, has no variation, the result is `NaN`, as the correlation cannot be calculated.


In [None]:
""" 
pandas.Series.between

Series.between(left, right, inclusive='both')
Return boolean Series equivalent to left <= series <= right.

This function returns a boolean vector containing True wherever the corresponding Series element is between the boundary values left and right. NA values are treated as False.

Parameters:
left: scalar or list-like
Left boundary.

right: scalar or list-like
Right boundary.

inclusive: {“both”, “neither”, “left”, “right”}
Include boundaries. Whether to set each bound as closed or open.

Changed in version 1.3.0.

Returns:
Series
Series representing whether each element is between left and right (inclusive). 

Series.gt
Greater than of series and other.

Series.lt
Less than of series and other.


Notes

This function is equivalent to (left <= ser) & (ser <= right)
"""
s = pd.Series([2, 0, 4, 8, np.nan])

In [None]:
# Boundary values are included by default:

s.between(1, 4)

In [None]:
# With inclusive set to "neither" boundary values are excluded:

s.between(1, 4, inclusive="neither")

In [None]:
# left and right can be any scalar value:

s = pd.Series(['Alice', 'Bob', 'Carol', 'Eve'])
s.between('Anna', 'Daniel')

The **`pandas.Series.between()`** function is used to check if the elements of a Series lie between two boundary values, `left` and `right`. It returns a boolean Series where each element is `True` if it lies between the boundaries and `False` otherwise.

### **Syntax**

```python
Series.between(left, right, inclusive='both')
```

### **Parameters**

- **`left`**:
  - **scalar** or **list-like**
  - The left boundary (inclusive by default).
- **`right`**:

  - **scalar** or **list-like**
  - The right boundary (inclusive by default).

- **`inclusive`**:
  - **{“both”, “neither”, “left”, “right”}**, default `'both'`
  - Whether the boundary values `left` and `right` are included in the comparison:
    - **`'both'`**: Both boundaries are included.
    - **`'neither'`**: Excludes both boundaries.
    - **`'left'`**: Includes the left boundary but excludes the right.
    - **`'right'`**: Excludes the left boundary but includes the right.

### **Returns**

- **Series**: A boolean Series indicating whether each element is between `left` and `right`, inclusive or exclusive based on the `inclusive` parameter.

### **Notes**

- The function is equivalent to:
  ```python
  (left <= ser) & (ser <= right)
  ```
- `NaN` values are treated as `False`.
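
The equivalence above covers the default `inclusive='both'`. As a sketch, the other options correspond to swapping `<=` for `<` on the excluded side; the dictionary `manual` below is just a helper for this illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 0, 4, 8, np.nan])
left, right = 1, 4

# Hand-written comparisons for each value of `inclusive`
manual = {
    "both": (s >= left) & (s <= right),
    "neither": (s > left) & (s < right),
    "left": (s >= left) & (s < right),
    "right": (s > left) & (s <= right),
}

for option, mask in manual.items():
    same = s.between(left, right, inclusive=option).equals(mask)
    print(option, same)  # True for every option
```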

### **See also**

- **`Series.gt`**: Element-wise greater-than comparison of the Series with another Series or a scalar.
- **`Series.lt`**: Element-wise less-than comparison of the Series with another Series or a scalar.

### **Examples**

#### Example 1: Basic Usage with Default `inclusive='both'`

```python
import pandas as pd
import numpy as np

# Sample Series
s = pd.Series([2, 0, 4, 8, np.nan])

# Check if elements are between 1 and 4 (inclusive)
result = s.between(1, 4)
print(result)
```

Output:

```
0     True
1    False
2     True
3    False
4    False
dtype: bool
```

#### Example 2: Excluding Both Boundaries (`inclusive='neither'`)

```python
# Check if elements are between 1 and 4 (exclusive)
result_neither = s.between(1, 4, inclusive='neither')
print(result_neither)
```

Output:

```
0     True
1    False
2    False
3    False
4    False
dtype: bool
```

#### Example 3: Using String Boundaries

```python
# Sample Series of strings
s = pd.Series(['Alice', 'Bob', 'Carol', 'Eve'])

# Check if elements are between 'Anna' and 'Daniel' (inclusive by default)
result_str = s.between('Anna', 'Daniel')
print(result_str)
```

Output:

```
0    False
1     True
2     True
3    False
dtype: bool
```

#### Example 4: Including Only Left Boundary (`inclusive='left'`)

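```python
# Recreate the numeric Series (s was reassigned to strings in Example 3)
s = pd.Series([2, 0, 4, 8, np.nan])

# Check if elements are between 1 and 4, including left boundary only
result_left = s.between(1, 4, inclusive='left')
print(result_left)
```

Output:

```
0     True
1    False
2    False
3    False
4    False
dtype: bool
```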

#### Example 5: Including Only Right Boundary (`inclusive='right'`)

```python
# Check if elements are between 1 and 4, including right boundary only
result_right = s.between(1, 4, inclusive='right')
print(result_right)
```

Output:

```
0     True
1    False
2     True
3    False
4    False
dtype: bool
```

### **Summary**

- **`Series.between(left, right, inclusive)`** checks whether the elements in the Series are between `left` and `right` with customizable boundary inclusivity.
- The `inclusive` parameter controls whether to include or exclude the boundary values for `left` and `right`.
- By default, both boundaries are included (`inclusive='both'`).
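
The parameters section lists `left` and `right` as scalar or list-like, but the examples above only use scalars. The sketch below shows per-element bounds; the `lower` and `upper` Series are assumed values picked for illustration.

```python
import pandas as pd

values = pd.Series([2, 0, 4, 8])

# Hypothetical per-element bounds, aligned with `values` by index
lower = pd.Series([1, 1, 5, 5])
upper = pd.Series([3, 3, 9, 9])

# Each element is compared against its own pair of bounds
within = values.between(lower, upper)
print(within.tolist())  # [True, False, False, True]
```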


In [None]:
""" pandas.Series.clip
Series.clip(lower=None, upper=None, *, axis=None, inplace=False, **kwargs)
Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis.

Parameters
:
lower
float or array-like, default None
Minimum threshold value. All values below this threshold will be set to it. A missing threshold (e.g NA) will not clip the value.

upper
float or array-like, default None
Maximum threshold value. All values above this threshold will be set to it. A missing threshold (e.g NA) will not clip the value.

axis
{0 or ‘index’, 1 or ‘columns’, None}, default None
Align object with lower and upper along the given axis. For Series this parameter is unused and defaults to None.

inplace
bool, default False
Whether to perform the operation in place on the data.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with numpy.

Returns
:
Series or DataFrame or None
Same type as calling object with the values outside the clip boundaries replaced or None if inplace=True.



Series.clip
Trim values at input threshold in series.

DataFrame.clip
Trim values at input threshold in dataframe.

numpy.clip
Clip (limit) the values in an array. """
data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)
df

In [None]:
 # clips per column using lower and upper thresholds:

df.clip(-4, 6)

In [None]:
# Clips using specific lower and upper thresholds per column:

df.clip([-2, -1], [4, 5])

In [None]:
# Clips using specific lower and upper thresholds per column element:

t = pd.Series([2, -4, -1, 6, 3])
t

In [None]:
df.clip(t, t + 4, axis=0)

In [None]:
# Clips using specific lower threshold per column element, with missing values:

t = pd.Series([2, -4, np.nan, 6, 3])
t

In [None]:
df.clip(t, axis=0)

The **`pandas.Series.clip()`** function is used to trim values at specified threshold(s). It allows you to limit the values in a Series to a defined range by setting all values outside this range to the boundary values. The thresholds can be scalar or array-like, and if array-like, the clipping will be performed element-wise along the specified axis.

### **Syntax**

```python
Series.clip(lower=None, upper=None, *, axis=None, inplace=False, **kwargs)
```

### **Parameters**

- **`lower`**:

  - **float** or **array-like**, default `None`
  - Minimum threshold. Values below this threshold will be clipped to it.

- **`upper`**:

  - **float** or **array-like**, default `None`
  - Maximum threshold. Values above this threshold will be clipped to it.

- **`axis`**:

  - **{0 or ‘index’, 1 or ‘columns’, None}**, default `None`
  - Align the object with `lower` and `upper` along the specified axis. For Series, this parameter is unused.

- **`inplace`**:

  - **bool**, default `False`
  - If `True`, modifies the Series in place (no new object is returned).

- **`*args, **kwargs`**:
  - Additional arguments and keyword arguments for compatibility, but they have no effect.

### **Returns**

- **Series or DataFrame or None**:
  - The same type as the calling object, with values clipped to the specified thresholds.
  - If `inplace=True`, the function returns `None` (modifies the object in place).

### **See also**

- **`Series.clip`**: Clips values for Series.
- **`DataFrame.clip`**: Clips values for DataFrames.
- **`numpy.clip`**: The numpy equivalent for clipping values in an array.

### **Examples**

#### Example 1: Clip using scalar thresholds for both columns

```python
import numpy as np
import pandas as pd

# Sample DataFrame
data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)
print(df)

# Clip values using lower=-4 and upper=6 for both columns
clipped_df = df.clip(-4, 6)
print(clipped_df)
```

Output:

```
   col_0  col_1
0      9     -2
1     -3     -7
2      0      6
3     -1      8
4      5     -5

   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4
```

#### Example 2: Clip using different thresholds for each column

```python
# Clip using different lower and upper thresholds per column
clipped_df_2 = df.clip([-2, -1], [4, 5])
print(clipped_df_2)
```

Output:

```
   col_0  col_1
0      4     -1
1     -2     -1
2      0      5
3     -1      5
4      4     -1
```

#### Example 3: Clip using element-wise thresholds (Series)

```python
# Sample Series with different thresholds
t = pd.Series([2, -4, -1, 6, 3])
print(t)

# Clip DataFrame using the element-wise thresholds from the Series
clipped_df_3 = df.clip(t, t + 4, axis=0)
print(clipped_df_3)
```

Output:

```
0    2
1   -4
2   -1
3    6
4    3
dtype: int64

   col_0  col_1
0      6      2
1     -3     -4
2      0      3
3      6      8
4      5      3
```

#### Example 4: Clip using a Series with missing values

```python
# Sample Series with NaN
t_with_nan = pd.Series([2, -4, np.nan, 6, 3])
print(t_with_nan)

# Clip DataFrame with the lower thresholds from the Series, ignoring NaN
clipped_df_4 = df.clip(t_with_nan, axis=0)
print(clipped_df_4)
```

Output:

```
0    2.0
1   -4.0
2    NaN
3    6.0
4    3.0
dtype: float64

   col_0  col_1
0      9      2
1     -3     -4
2      0      6
3      6      8
4      5      3
```

### **Summary**

- **`Series.clip(lower, upper)`** is used to clip values outside the specified boundaries.
- It can accept scalar values or array-like (e.g., Series) for element-wise clipping.
- The `inplace` parameter controls whether the operation modifies the Series in place or returns a new one.
- Clipping is useful for limiting the range of values in a Series or DataFrame, especially in situations where outliers need to be handled by capping the values.
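
The examples above all operate on a DataFrame. For completeness, here is a minimal sketch of the same idea on a plain Series, including one-sided clipping; the values reuse `col_0` from the examples.

```python
import pandas as pd

s = pd.Series([9, -3, 0, -1, 5])

# Two-sided clipping: values are limited to the range [-2, 4]
print(s.clip(-2, 4).tolist())     # [4, -2, 0, -1, 4]

# One-sided clipping: only a lower or only an upper bound
print(s.clip(lower=0).tolist())   # [9, 0, 0, 0, 5]
print(s.clip(upper=3).tolist())   # [3, -3, 0, -1, 3]

# With the default inplace=False, the original Series is unchanged
print(s.tolist())                 # [9, -3, 0, -1, 5]
```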


In [None]:
""" pandas.Series.corr
Series.corr(other, method='pearson', min_periods=None)
Compute correlation with other Series, excluding missing values.

The two Series objects are not required to be the same length and will be aligned internally before the correlation function is applied.

Parameters:
other: Series
Series with which to compute the correlation.

method: {‘pearson’, ‘kendall’, ‘spearman’} or callable
Method used to compute correlation:

pearson : Standard correlation coefficient

kendall : Kendall Tau correlation coefficient

spearman : Spearman rank correlation

callable: Callable with input two 1d ndarrays and returning a float.

Warning

Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.

min_periods: int, optional
Minimum number of observations needed to have a valid result.

Returns:
float
Correlation with other.

See also

DataFrame.corr
Compute pairwise correlation between columns.

DataFrame.corrwith
Compute pairwise correlation with another DataFrame or Series. """
def histogram_intersection(a, b):
    v = np.minimum(a, b).sum().round(decimals=1)
    return v
s1 = pd.Series([.2, .0, .6, .2])
s2 = pd.Series([.3, .6, .0, .1])
s1.corr(s2, method=histogram_intersection)

In [None]:
 # Pandas auto-aligns the values with matching indices

s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
s2 = pd.Series([1, 2, 3], index=[2, 1, 0])
s1.corr(s2)

The **`pandas.Series.corr()`** function computes the correlation between the calling Series and another Series, excluding any missing values. The correlation can be computed using different methods such as Pearson, Kendall, or Spearman.

### **Syntax**

```python
Series.corr(other, method='pearson', min_periods=None)
```

### **Parameters**

- **`other`**:

  - **Series**
  - The other Series with which to compute the correlation.

- **`method`**:

  - **{‘pearson’, ‘kendall’, ‘spearman’}** or **callable**, default `'pearson'`
  - The method used to compute the correlation:
    - `'pearson'`: Standard Pearson correlation coefficient.
    - `'kendall'`: Kendall Tau correlation coefficient.
    - `'spearman'`: Spearman rank correlation.
    - **callable**: A user-defined function that takes two 1D ndarrays and returns a float. This allows for custom correlation methods.

- **`min_periods`**:
  - **int**, optional
  - The minimum number of observations needed to have a valid result. If fewer than `min_periods` observations are available, the result will be `NaN`.
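
The effect of `min_periods` can be sketched with a small example. The data below is hypothetical and chosen so that only two positions are non-null in both Series; the default Pearson method is used.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical data: only the first two positions are non-null in both Series
s1 = pd.Series([1.0, 2.0, 3.0, np.nan])
s2 = pd.Series([1.0, 2.0, np.nan, 4.0])

# Two valid pairs are enough when min_periods is left at its default
enough = s1.corr(s2)

# Requiring three valid pairs cannot be satisfied, so the result is NaN
too_few = s1.corr(s2, min_periods=3)

print(enough)
print(math.isnan(too_few))
```

Raising `min_periods` is a simple way to avoid reporting a correlation that rests on very few overlapping observations.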

### **Returns**

- **float**
  - The computed correlation value between the two Series.

### **See also**

- **`DataFrame.corr`**: For computing pairwise correlations between columns of a DataFrame.
- **`DataFrame.corrwith`**: For computing pairwise correlations with another DataFrame or Series.

### **Examples**

#### Example 1: Pearson correlation (default)

```python
import pandas as pd

# Sample Series
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([1, 2, 3])

# Compute Pearson correlation (default)
correlation = s1.corr(s2)
print(correlation)
```

Output:

```
1.0
```

Explanation: The correlation is 1 because the Series are identical.

#### Example 2: Negative correlation (Pearson)

```python
# Sample Series (same index, so values are paired position by position)
s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
s2 = pd.Series([3, 2, 1], index=[0, 1, 2])

# Compute Pearson correlation
correlation = s1.corr(s2)
print(correlation)
```

Output:

```
-1.0
```

Explanation: The Series have a perfect negative linear relationship.

#### Example 3: Custom correlation function (Histogram intersection)

```python
import numpy as np

# Custom correlation function (Histogram intersection)
def histogram_intersection(a, b):
    v = np.minimum(a, b).sum().round(decimals=1)
    return v

# Sample Series
s1 = pd.Series([0.2, 0.0, 0.6, 0.2])
s2 = pd.Series([0.3, 0.6, 0.0, 0.1])

# Compute custom correlation using histogram intersection
correlation = s1.corr(s2, method=histogram_intersection)
print(correlation)
```

Output:

```
0.3
```

Explanation: The custom function computes the histogram intersection between `s1` and `s2`.

#### Example 4: Kendall correlation

```python
# Sample Series
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([1, 3, 2, 4])

# Compute Kendall correlation
correlation = s1.corr(s2, method='kendall')
print(correlation)
```

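Output:

```
0.666...
```

Explanation: The Kendall Tau coefficient measures the ordinal association between the two Series. Of the six pairs of observations, five are ordered the same way in both Series and one is not, so tau = (5 - 1) / 6 ≈ 0.667.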

#### Example 5: Spearman correlation

```python
# Sample Series
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([1, 3, 2, 4])

# Compute Spearman correlation
correlation = s1.corr(s2, method='spearman')
print(correlation)
```

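Output:

```
0.8
```

Explanation: The Spearman coefficient is the Pearson correlation of the ranks. Here the rank differences are 0, -1, 1, 0, so rho = 1 - 6 × 2 / (4 × (16 - 1)) = 0.8; the printed value may differ in the last digit because of floating-point rounding.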

### **Notes**

- The correlation is computed after automatically aligning the Series based on their indices.
- The available methods are:
  - **Pearson**: Standard correlation coefficient, based on linear relationships.
  - **Kendall**: Measures ordinal association by comparing how many pairs of observations are ordered the same way (concordant) versus the opposite way (discordant).
  - **Spearman**: The Pearson correlation computed on the ranks of the data rather than on the raw values.

This method is especially useful for measuring the relationship between two sets of data, where correlation analysis helps in determining how strongly related they are.
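
The link between Spearman and Pearson can be checked by hand. The following is a minimal sketch on small hypothetical data that ranks both Series and applies the default Pearson method to the ranks. It also evaluates the textbook closed form, which is valid only when there are no ties.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([1, 3, 2, 4])

# Spearman's rho is the Pearson correlation of the rank-transformed data
rho_from_ranks = s1.rank().corr(s2.rank())

# Closed form without ties: 1 - 6 * sum(d^2) / (n * (n^2 - 1))
d = s1.rank() - s2.rank()
n = len(s1)
rho_formula = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))

print(rho_from_ranks, rho_formula)
```

Both values should agree up to floating-point rounding.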


In [None]:
""" pandas.Series.count
Series.count()
Return number of non-NA/null observations in the Series.

Returns:
int
Number of non-null values in the Series.

See also

DataFrame.count
Count non-NA cells for each column or row. """
s = pd.Series([0.0, 1.0, np.nan])
s.count()

The **`pandas.Series.count()`** function is used to count the number of non-NA (non-null) values in a Series.

### **Syntax**

```python
Series.count()
```

### **Returns**

- **int**:
  - The number of non-null values in the Series.

### **See also**

- **`DataFrame.count()`**: For counting non-NA values across rows or columns in a DataFrame.

### **Examples**

#### Example 1: Basic count

```python
import pandas as pd
import numpy as np

# Sample Series with NaN value
s = pd.Series([0.0, 1.0, np.nan])

# Count non-NA/null values
count = s.count()
print(count)
```

Output:

```
2
```

Explanation: There are 2 non-null values in the Series, and the `NaN` is excluded from the count.

#### Example 2: Series without null values

```python
# Sample Series without any NaN values
s = pd.Series([5, 10, 15])

# Count non-NA/null values
count = s.count()
print(count)
```

Output:

```
3
```

Explanation: All values are non-null, so the count is 3.

#### Example 3: Series with all NaN values

```python
# Sample Series with all NaN values
s = pd.Series([np.nan, np.nan, np.nan])

# Count non-NA/null values
count = s.count()
print(count)
```

Output:

```
0
```

Explanation: Since all values are `NaN`, the count of non-null values is 0.

The **`count()`** function is helpful for quickly determining how many valid data points (non-null) exist in your Series, especially when working with datasets that may contain missing values.
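
To relate `count()` to other common checks, here is a small sketch on hypothetical data that compares it with `len()` and with the `isna()` / `notna()` masks.

```python
import numpy as np
import pandas as pd

# Both np.nan and None are treated as missing in a float Series
s = pd.Series([0.0, 1.0, np.nan, None])

total = len(s)             # every position, missing or not
valid = s.count()          # non-missing values only
missing = s.isna().sum()   # missing values only

print(total, valid, missing)
```

Here `valid + missing` equals `total`, and `s.notna().sum()` gives the same number as `s.count()`.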


In [None]:
""" pandas.Series.cov
Series.cov(other, min_periods=None, ddof=1)
Compute covariance with Series, excluding missing values.

The two Series objects are not required to be the same length and will be aligned internally before the covariance is calculated.

Parameters:
other
Series
Series with which to compute the covariance.

min_periods
int, optional
Minimum number of observations needed to have a valid result.

ddof
int, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

Returns:
float
Covariance between Series and other normalized by N-1 (unbiased estimator).

See also

DataFrame.cov
Compute pairwise covariance of columns. """
s1 = pd.Series([0.90010907, 0.13484424, 0.62036035])
s2 = pd.Series([0.12528585, 0.26962463, 0.51111198])
s1.cov(s2)

The **`pandas.Series.cov()`** function computes the **covariance** between two Series, excluding missing values. Covariance measures the relationship between two datasets—whether they increase or decrease together.
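
Covariance depends on the scale of the data, which is why it is often normalized into a correlation. As a rough sketch on hypothetical data without missing values, dividing the covariance by the product of the two standard deviations should reproduce the Pearson correlation:

```python
import pandas as pd

s1 = pd.Series([2.0, 4.0, 6.0, 8.0])
s2 = pd.Series([1.0, 3.0, 2.0, 6.0])

# cov() and std() both use ddof=1 by default, so the divisors cancel consistently
cov = s1.cov(s2)
corr_from_cov = cov / (s1.std() * s2.std())

print(cov, corr_from_cov, s1.corr(s2))
```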

---

## **Syntax**

```python
Series.cov(other, min_periods=None, ddof=1)
```

---

## **Parameters**

- **`other`** (_Series_):

  - The Series to compute covariance with.

- **`min_periods`** (_int, optional_):

  - Minimum number of valid observations required to perform the calculation.
  - If fewer non-null observations exist, the result is `NaN`.

- **`ddof`** (_int, default `1`_):
  - Delta degrees of freedom.
  - The divisor used in calculations is **N - ddof**, where **N** is the number of observations.

---

## **Returns**

- **float**:
  - The covariance value between the two Series, normalized by **N - ddof** (that is, **N - 1** with the default `ddof=1`).

---

## **See Also**

- **`DataFrame.cov()`** → Computes pairwise covariance between DataFrame columns.

---

## **Examples**

### **Example 1: Compute Covariance**

```python
import pandas as pd

# Create two Series
s1 = pd.Series([0.90010907, 0.13484424, 0.62036035])
s2 = pd.Series([0.12528585, 0.26962463, 0.51111198])

# Compute covariance
covariance = s1.cov(s2)
print(covariance)
```

**Output:**

```
-0.01685...
```

🔹 A **negative covariance** means that when one Series increases, the other tends to decrease.

---

### **Example 2: Series with Different Indexes**

```python
s1 = pd.Series([1, 2, 3], index=[0, 1, 2])
s2 = pd.Series([4, 5, 6], index=[1, 2, 3])

# Compute covariance
covariance = s1.cov(s2)
print(covariance)
```

**Output:**

```
0.5
```

🔹 Only the labels `1` and `2` appear in both indexes, so `cov()` uses the aligned pairs `(2, 4)` and `(3, 5)` and ignores the unmatched labels.

---

### **Example 3: Handling Missing Values**

```python
import numpy as np

# Series with NaN values
s1 = pd.Series([1, np.nan, 3])
s2 = pd.Series([4, 5, np.nan])

# Compute covariance
covariance = s1.cov(s2)
print(covariance)
```

**Output:**

```
nan
```

🔹 Since only one pair of valid observations exists, the result is **NaN**.  
🔹 If you add more data, `cov()` will compute the covariance.

---

### **Example 4: Using `min_periods`**

```python
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([4, 3, 2, 1])

# Compute covariance with min_periods=3
covariance = s1.cov(s2, min_periods=3)
print(covariance)
```

**Output:**

```
-1.666...
```

🔹 All four pairs are valid, which satisfies `min_periods=3`. With fewer than `min_periods` valid pairs, `cov()` returns `NaN`.

---

### **Example 5: Changing `ddof` (Delta Degrees of Freedom)**

```python
s1 = pd.Series([10, 20, 30, 40])
s2 = pd.Series([5, 15, 25, 35])

# Default ddof=1: the sum of products (500) is divided by N - 1 = 3
print(s1.cov(s2))  # Output: 166.666...

# Using ddof=0: the same sum is divided by N = 4
print(s1.cov(s2, ddof=0))  # Output: 125.0
```

🔹 **`ddof=0`** gives the population covariance, dividing by **N** instead of **N-1**.
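
The role of the divisor can be made explicit with a hand-written version. The `manual_cov` helper below is hypothetical (it is not part of pandas) and assumes two equal-length Series with no missing values.

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5])
y = pd.Series([2, 4, 5, 4, 5])


def manual_cov(a, b, ddof=1):
    """Sum of products of deviations from the mean, divided by N - ddof."""
    n = len(a)
    return ((a - a.mean()) * (b - b.mean())).sum() / (n - ddof)


sample_cov = manual_cov(x, y, ddof=1)      # divides by N - 1 = 4
population_cov = manual_cov(x, y, ddof=0)  # divides by N = 5

print(sample_cov, x.cov(y))
print(population_cov, x.cov(y, ddof=0))
```

The sum of products is 6 for this data, so the two results are 6 / 4 and 6 / 5, matching what `cov()` reports for the corresponding `ddof`.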

---

## **Key Takeaways**

✅ `cov()` measures the relationship between two Series.  
✅ Automatically aligns values based on the index.  
✅ Excludes `NaN` values from computation.  
✅ Supports `min_periods` to control the required number of valid observations.  
✅ Changing `ddof` affects how the covariance is normalized.


In [None]:
""" pandas.Series.cummax
Series.cummax(axis=None, skipna=True, *args, **kwargs)
Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

Parameters:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

skipna
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
scalar or Series
Return cumulative maximum of scalar or Series.

See also

core.window.expanding.Expanding.max
Similar functionality but ignores NaN values.

Series.max
Return the maximum over Series axis.

Series.cummax
Return cumulative maximum over Series axis.

Series.cummin
Return cumulative minimum over Series axis.

Series.cumsum
Return cumulative sum over Series axis.

Series.cumprod
Return cumulative product over Series axis. """
s = pd.Series([2, np.nan, 5, -1, 0])
s

In [None]:
# By default, NA values are ignored.

s.cummax()
# To include NA values in the operation, use skipna=False

s.cummax(skipna=False)

In [None]:
# DataFrame

df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))
df

In [None]:
# by default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

df.cummax()

In [None]:
# To iterate over columns and find the maximum in each row, use axis=1

df.cummax(axis=1)

The **`pandas.Series.cummax()`** function returns the **cumulative maximum** of a Series, meaning it keeps track of the maximum value encountered so far at each position.

---

## **Syntax**

```python
Series.cummax(axis=None, skipna=True, *args, **kwargs)
```

---

## **Parameters**

- **`axis`** (_{0 or ‘index’}, default `0`_):

  - This parameter is unused for a Series and always defaults to `0`.

- **`skipna`** (_bool, default `True`_):

  - If `True`, **ignores** `NaN` values.
  - If `False`, `NaN` values **propagate**, making all subsequent values `NaN`.

- **`*args, **kwargs`**:
  - These arguments are **ignored** but exist for compatibility with NumPy.

---

## **Returns**

- **Series** → A new Series of the same size with cumulative maximum values.

---

## **See Also**

- **`Series.cummin()`** → Cumulative minimum.
- **`Series.cumsum()`** → Cumulative sum.
- **`Series.cumprod()`** → Cumulative product.
- **`Series.max()`** → Maximum value in the entire Series.

---

## **Examples**

### **Example 1: Basic Usage**

```python
import pandas as pd
import numpy as np

s = pd.Series([2, np.nan, 5, -1, 0])
print(s.cummax())
```

**Output:**

```
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64
```

🔹 The function **remembers** the highest value encountered so far.  
🔹 The `NaN` value is **ignored** by default.

---

### **Example 2: Using `skipna=False`**

```python
print(s.cummax(skipna=False))
```

**Output:**

```
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
```

🔹 Since `skipna=False`, the presence of `NaN` causes all subsequent values to become `NaN`.

---

### **Example 3: Cumulative Max in a DataFrame**

```python
df = pd.DataFrame({
    'A': [2.0, 3.0, 1.0],
    'B': [1.0, np.nan, 0.0]
})
print(df.cummax())
```

**Output:**

```
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0
```

🔹 **By default (`axis=0`), the cumulative max runs down each column, one row at a time.**

---

### **Example 4: Cumulative Max Along Rows (`axis=1`)**

```python
print(df.cummax(axis=1))
```

**Output:**

```
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0
```

🔹 **Each row** now stores its cumulative max across columns.

---

## **Key Takeaways**

✅ **`cummax()`** tracks the highest value seen so far.  
✅ By default, `NaN` values are ignored (`skipna=True`).  
✅ Setting `skipna=False` makes all subsequent values `NaN`.  
✅ Can be used on both **Series** and **DataFrames**.
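
One common use of a running maximum is measuring how far a value has fallen from its peak so far. This is a minimal sketch with hypothetical prices; the `drawdown` name is only illustrative.

```python
import pandas as pd

prices = pd.Series([100, 120, 90, 130, 110])

# Highest price seen up to and including each position
running_peak = prices.cummax()

# Distance below the running peak (zero whenever a new peak is set)
drawdown = prices - running_peak

print(pd.DataFrame({"price": prices, "peak": running_peak, "drawdown": drawdown}))
```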


In [None]:
""" pandas.Series.cummin
Series.cummin(axis=None, skipna=True, *args, **kwargs)
Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

Parameters:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

skipna
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
scalar or Series
Return cumulative minimum of scalar or Series.


See also

core.window.expanding.Expanding.min
Similar functionality but ignores NaN values.

Series.min
Return the minimum over Series axis.

Series.cummax
Return cumulative maximum over Series axis.

Series.cummin
Return cumulative minimum over Series axis.

Series.cumsum
Return cumulative sum over Series axis.

Series.cumprod
Return cumulative product over Series axis. """
s = pd.Series([2, np.nan, 5, -1, 0])
s


In [None]:
# By default, NA values are ignored.

s.cummin()

In [None]:
# To include NA values in the operation, use skipna=False

s.cummin(skipna=False)

In [None]:
# DataFrame

df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))
df

In [None]:
# By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

df.cummin()

In [None]:
# To iterate over columns and find the minimum in each row, use axis=1

df.cummin(axis=1)

The **`pandas.Series.cummin()`** function returns the **cumulative minimum** of a Series, meaning it keeps track of the smallest value encountered so far at each position.

---

## **Syntax**

```python
Series.cummin(axis=None, skipna=True, *args, **kwargs)
```

---

## **Parameters**

- **`axis`** (_{0 or ‘index’}, default `0`_):

  - This parameter is **unused** for a Series and always defaults to `0`.

- **`skipna`** (_bool, default `True`_):

  - If `True`, **ignores** `NaN` values.
  - If `False`, `NaN` values **propagate**, making all subsequent values `NaN`.

- **`*args, **kwargs`**:
  - These arguments are **ignored** but exist for compatibility with NumPy.

---

## **Returns**

- **Series** → A new Series of the same size with cumulative minimum values.

---

## **See Also**

- **`Series.cummax()`** → Cumulative maximum.
- **`Series.cumsum()`** → Cumulative sum.
- **`Series.cumprod()`** → Cumulative product.
- **`Series.min()`** → Minimum value in the entire Series.

---

## **Examples**

### **Example 1: Basic Usage**

```python
import pandas as pd
import numpy as np

s = pd.Series([2, np.nan, 5, -1, 0])
print(s.cummin())
```

**Output:**

```
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64
```

🔹 The function **remembers** the smallest value encountered so far.  
🔹 The `NaN` value is **ignored** by default.

---

### **Example 2: Using `skipna=False`**

```python
print(s.cummin(skipna=False))
```

**Output:**

```
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
```

🔹 Since `skipna=False`, the presence of `NaN` causes all subsequent values to become `NaN`.

---

### **Example 3: Cumulative Min in a DataFrame**

```python
df = pd.DataFrame({
    'A': [2.0, 3.0, 1.0],
    'B': [1.0, np.nan, 0.0]
})
print(df.cummin())
```

**Output:**

```
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0
```

🔹 **By default (`axis=0`), the cumulative min runs down each column, one row at a time.**

---

### **Example 4: Cumulative Min Along Rows (`axis=1`)**

```python
print(df.cummin(axis=1))
```

**Output:**

```
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
```

🔹 **Each row** now stores its cumulative min across columns.

---

## **Key Takeaways**

✅ **`cummin()`** tracks the smallest value seen so far.  
✅ By default, `NaN` values are ignored (`skipna=True`).  
✅ Setting `skipna=False` makes all subsequent values `NaN`.  
✅ Can be used on both **Series** and **DataFrames**.
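
The pandas reference lists `Expanding.min` as similar functionality that ignores `NaN` values. The sketch below compares the two on the same Series, assuming the default `min_periods` of `expanding()`.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, np.nan, 5, -1, 0])

# cummin() leaves NaN in place at the position where the input is missing
cumulative = s.cummin()

# expanding().min() reports the minimum of the valid values seen so far,
# so the missing position is filled with the running minimum instead
expanding_min = s.expanding().min()

print(pd.DataFrame({"cummin": cumulative, "expanding_min": expanding_min}))
```

The two results differ only at the position of the missing value.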


In [None]:
""" pandas.Series.cumprod
Series.cumprod(axis=None, skipna=True, *args, **kwargs)
Return cumulative product over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative product.

Parameters:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

skipna
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
scalar or Series
Return cumulative product of scalar or Series.


See also

core.window.expanding.Expanding.prod
Similar functionality but ignores NaN values.

Series.prod
Return the product over Series axis.

Series.cummax
Return cumulative maximum over Series axis.

Series.cummin
Return cumulative minimum over Series axis.

Series.cumsum
Return cumulative sum over Series axis.

Series.cumprod
Return cumulative product over Series axis. """
s = pd.Series([2, np.nan, 5, -1, 0])
s

In [None]:
# By default, NA values are ignored.

s.cumprod()

In [None]:
# To include NA values in the operation, use skipna=False

s.cumprod(skipna=False)

In [None]:
# DataFrame

df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))
df

In [None]:
# By default, iterates over rows and finds the product in each column. This is equivalent to axis=None or axis='index'.

df.cumprod()

In [None]:
# To iterate over columns and find the product in each row, use axis=1

df.cumprod(axis=1)

The **`pandas.Series.cumprod()`** function computes the **cumulative product** of elements in a Series, meaning it multiplies each element by the product of all previous elements.

---

## **Syntax**

```python
Series.cumprod(axis=None, skipna=True, *args, **kwargs)
```

---

## **Parameters**

- **`axis`** (_{0 or ‘index’}, default `0`_):

  - This parameter is **unused** for a Series and always defaults to `0`.

- **`skipna`** (_bool, default `True`_):

  - If `True`, **ignores** `NaN` values (continues multiplying non-NaN values).
  - If `False`, `NaN` values **propagate**, making all subsequent values `NaN`.

- **`*args, **kwargs`**:
  - These arguments are **ignored** but exist for compatibility with NumPy.

---

## **Returns**

- **Series** → A new Series of the same size containing the cumulative product.

---

## **See Also**

- **`Series.prod()`** → Returns the product of all elements in a Series.
- **`Series.cumsum()`** → Cumulative sum.
- **`Series.cummin()`** → Cumulative minimum.
- **`Series.cummax()`** → Cumulative maximum.

---

## **Examples**

### **Example 1: Basic Usage**

```python
import pandas as pd
import numpy as np

s = pd.Series([2, np.nan, 5, -1, 0])
print(s.cumprod())
```

**Output:**

```
0     2.0
1     NaN
2    10.0
3   -10.0
4    -0.0
dtype: float64
```

🔹 The function multiplies values progressively.  
🔹 `NaN` is ignored by default, so it continues computing.  
🔹 The final value is `-0.0` (negative zero, which is still `0`).

---

### **Example 2: Using `skipna=False`**

```python
print(s.cumprod(skipna=False))
```

**Output:**

```
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
```

🔹 Since `skipna=False`, `NaN` causes all subsequent values to be `NaN`.

---

### **Example 3: Cumulative Product in a DataFrame**

```python
df = pd.DataFrame({
    'A': [2.0, 3.0, 1.0],
    'B': [1.0, np.nan, 0.0]
})
print(df.cumprod())
```

**Output:**

```
     A    B
0  2.0  1.0
1  6.0  NaN
2  6.0  0.0
```

🔹 **By default (`axis=0`), the cumulative product runs down each column, one row at a time.**

---

### **Example 4: Cumulative Product Along Rows (`axis=1`)**

```python
print(df.cumprod(axis=1))
```

**Output:**

```
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  0.0
```

🔹 Each row now stores its **cumulative product across columns**.

---

## **Key Takeaways**

✅ **`cumprod()`** multiplies elements progressively.  
✅ By default, `NaN` values are ignored (`skipna=True`).  
✅ Setting `skipna=False` makes all subsequent values `NaN`.  
✅ Can be used on both **Series** and **DataFrames**.
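
A typical application of a cumulative product is compounding. The sketch below uses hypothetical period returns and turns them into a growth factor relative to the starting value.

```python
import pandas as pd

# Hypothetical period-over-period returns: +10%, -5%, +2%
returns = pd.Series([0.10, -0.05, 0.02])

# Multiply the per-period factors (1 + r) progressively
growth = (1 + returns).cumprod()

print(growth)
```

The last element of `growth` equals `(1 + returns).prod()`, the total growth factor over all periods.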


In [None]:
""" pandas.Series.cumsum
Series.cumsum(axis=None, skipna=True, *args, **kwargs)
Return cumulative sum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative sum.

Parameters:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

skipna
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
scalar or Series
Return cumulative sum of scalar or Series.


See also

core.window.expanding.Expanding.sum
Similar functionality but ignores NaN values.

Series.sum
Return the sum over Series axis.

Series.cummax
Return cumulative maximum over Series axis.

Series.cummin
Return cumulative minimum over Series axis.

Series.cumsum
Return cumulative sum over Series axis.

Series.cumprod
Return cumulative product over Series axis. """
s = pd.Series([2, np.nan, 5, -1, 0])
s

In [None]:
# By default, NA values are ignored.

s.cumsum()

In [None]:
# To include NA values in the operation, use skipna=False

s.cumsum(skipna=False)

# DataFrame

df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))
df


In [None]:
# By default, iterates over rows and finds the sum in each column. This is equivalent to axis=None or axis='index'.

df.cumsum()

In [None]:
# To iterate over columns and find the sum in each row, use axis=1

df.cumsum(axis=1)

The **`pandas.Series.cumsum()`** function calculates the **cumulative sum** of a Series, meaning each value is replaced by the sum of itself and all previous values.

---

## **Syntax**

```python
Series.cumsum(axis=None, skipna=True, *args, **kwargs)
```

---

## **Parameters**

- **`axis`** (_{0 or ‘index’}, default `0`_):

  - This parameter is **unused** for Series and always defaults to `0`.

- **`skipna`** (_bool, default `True`_):

  - If `True`, **ignores** `NaN` values and continues summing non-NaN values.
  - If `False`, `NaN` values **propagate**, making all subsequent values `NaN`.

- **`*args, **kwargs`**:
  - These arguments are **ignored** but exist for compatibility with NumPy.

---

## **Returns**

- **Series** → A new Series of the same size containing the cumulative sum.

---

## **See Also**

- **`Series.sum()`** → Returns the sum of all elements in a Series.
- **`Series.cumprod()`** → Cumulative product.
- **`Series.cummin()`** → Cumulative minimum.
- **`Series.cummax()`** → Cumulative maximum.

---

## **Examples**

### **Example 1: Basic Usage**

```python
import pandas as pd
import numpy as np

s = pd.Series([2, np.nan, 5, -1, 0])
print(s.cumsum())
```

**Output:**

```
0    2.0
1    NaN
2    7.0
3    6.0
4    6.0
dtype: float64
```

🔹 The function adds values progressively.  
🔹 `NaN` is ignored by default, so it continues computing.

---

### **Example 2: Using `skipna=False`**

```python
print(s.cumsum(skipna=False))
```

**Output:**

```
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
```

🔹 Since `skipna=False`, `NaN` causes all subsequent values to be `NaN`.

---

### **Example 3: Cumulative Sum in a DataFrame**

```python
df = pd.DataFrame({
    'A': [2.0, 3.0, 1.0],
    'B': [1.0, np.nan, 0.0]
})
print(df.cumsum())
```

**Output:**

```
     A    B
0  2.0  1.0
1  5.0  NaN
2  6.0  1.0
```

🔹 **By default (`axis=0`), the cumulative sum runs down each column, one row at a time.**

---

### **Example 4: Cumulative Sum Along Rows (`axis=1`)**

```python
print(df.cumsum(axis=1))
```

**Output:**

```
     A    B
0  2.0  3.0
1  3.0  NaN
2  1.0  1.0
```

🔹 Each row now stores its **cumulative sum across columns**.

---

## **Key Takeaways**

✅ **`cumsum()`** adds elements progressively.  
✅ By default, `NaN` values are ignored (`skipna=True`).  
✅ Setting `skipna=False` makes all subsequent values `NaN`.  
✅ Can be used on both **Series** and **DataFrames**.
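
As a small illustration with hypothetical sales figures, a cumulative sum gives a running total, and dividing by the overall sum gives the share of the total reached at each step.

```python
import pandas as pd

sales = pd.Series([10, 20, 30, 40])

# Running total after each entry
running_total = sales.cumsum()

# Fraction of the overall total reached so far
share = running_total / sales.sum()

print(pd.DataFrame({"sales": sales, "running_total": running_total, "share": share}))
```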


In [None]:
""" pandas.Series.diff
Series.diff(periods=1)
First discrete difference of element.

Calculates the difference of a Series element compared with another element in the Series (default is element in previous row).

Parameters:
periods
int, default 1
Periods to shift for calculating difference, accepts negative values.

Returns:
Series
First differences of the Series.

See also

Series.pct_change
Percent change over given number of periods.

Series.shift
Shift index by desired number of periods with an optional time freq.

DataFrame.diff
First discrete difference of object.

Notes

For boolean dtypes, this uses operator.xor() rather than operator.sub(). The result is calculated according to current dtype in Series, however dtype of the result is always float64. """
s = pd.Series([1, 1, 2, 3, 5, 8])
s.diff()
s.diff(periods=3)

In [None]:
s.diff(periods=-1)
s = pd.Series([1, 0], dtype=np.uint8)
s.diff()

The **`pandas.Series.diff()`** function computes the **difference between consecutive elements** in a Series.

---

## **Syntax**

```python
Series.diff(periods=1)
```

---

## **Parameters**

- **`periods`** (_int, default=1_):
  - Specifies how many places to shift before computing the difference.
  - Positive values → Compute difference with **previous** elements.
  - Negative values → Compute difference with **future** elements.

---

## **Returns**

- **Series** → A new Series containing the computed differences.

📌 For positive `periods`, the **first `periods` elements** are **NaN** because they have no earlier value to subtract. For negative `periods`, the **last** elements are **NaN** instead.
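
For ordinary signed integer or float data, `diff(periods)` can be thought of as subtracting a shifted copy of the Series. A minimal sketch that checks this for a few values of `periods`:

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8])

# Compare diff(p) with "current value minus the value p positions earlier"
matches = {p: s.diff(p).equals(s - s.shift(p)) for p in (1, 3, -1)}

print(matches)
```

`shift()` is what introduces the `NaN` values at the start (or at the end, for negative `periods`).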

---

## **See Also**

- **`Series.pct_change()`** → Computes **percent change** over periods.
- **`Series.shift()`** → Shifts values **without** computing differences.
- **`DataFrame.diff()`** → Computes differences for **DataFrames**.

---

## **Examples**

### **Example 1: Compute the First Difference (Default)**

```python
import pandas as pd

s = pd.Series([1, 1, 2, 3, 5, 8])
print(s.diff())
```

**Output:**

```
0    NaN
1    0.0
2    1.0
3    1.0
4    2.0
5    3.0
dtype: float64
```

🔹 The first value is **NaN** (no previous value).  
🔹 Each result is the current value **minus the previous one**.

---

### **Example 2: Difference with the 3rd Previous Row (`periods=3`)**

```python
print(s.diff(periods=3))
```

**Output:**

```
0    NaN
1    NaN
2    NaN
3    2.0
4    4.0
5    6.0
dtype: float64
```

🔹 The difference is calculated with the **third previous value**.  
🔹 The first **three** values are **NaN**.

---

### **Example 3: Difference with the Next Row (`periods=-1`)**

```python
print(s.diff(periods=-1))
```

**Output:**

```
0    0.0
1   -1.0
2   -1.0
3   -2.0
4   -3.0
5    NaN
dtype: float64
```

🔹 A negative `periods` shifts in the **opposite direction** (future values).  
🔹 The last element is **NaN**.

---

### **Example 4: Handling Unsigned Integers (`uint8` Overflow)**

```python
import numpy as np

s = pd.Series([1, 0], dtype=np.uint8)
print(s.diff())
```

**Output:**

```
0      NaN
1    255.0
dtype: float64
```

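🔹 The second element is `0 - 1`, so we would expect `-1.0`, but the subtraction is done in `uint8`, which wraps around to **255**.  
🔹 The result is reported as **float64**, but the conversion happens after the subtraction, so it does not undo the overflow.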
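
One possible workaround, sketched here as an assumption rather than an official recommendation, is to cast to a wider signed type before calling `diff()` so that the subtraction can go negative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 0], dtype=np.uint8)

# Cast to a signed 64-bit integer first, then take the difference
safe = s.astype("int64").diff()

print(safe)
```

With the cast in place, the second element is the expected `-1.0`.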

---

## **Key Takeaways**

✅ **`diff()` computes the difference** between values based on a given `periods`.  
✅ Default is **previous row (`periods=1`)**, but **can be customized**.  
✅ **The first `periods` values are `NaN`** (the last ones for negative `periods`) because they have no reference.  
✅ **Supports negative values** (`periods=-1` computes difference with future values).  
✅ **Returns float64**, even for integer inputs.


In [None]:
""" pandas.Series.factorize
Series.factorize(sort=False, use_na_sentinel=True)
Encode the object as an enumerated type or categorical variable.

This method is useful for obtaining a numeric representation of an array when all that matters is identifying distinct values. factorize is available as both a top-level function pandas.factorize(), and as a method Series.factorize() and Index.factorize().

Parameters:
sort : bool, default False
Sort uniques and shuffle codes to maintain the relationship.

use_na_sentinel : bool, default True
If True, the sentinel -1 will be used for NaN values. If False, NaN values will be encoded as non-negative integers and will not drop the NaN from the uniques of the values.

Added in version 1.5.0.

Returns:
codes : ndarray
An integer ndarray that’s an indexer into uniques. uniques.take(codes) will have the same values as values.

uniques : ndarray, Index, or Categorical
The unique valid values. When values is Categorical, uniques is a Categorical. When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.

Note

Even if there’s a missing value in values, uniques will not contain an entry for it.


See also

cut
Discretize continuous-valued array.

unique
Find the unique value in an array. """
codes, uniques = pd.factorize(np.array(['b', 'b', 'a', 'c', 'b'], dtype="O"))
codes, uniques

In [None]:
# With sort=True, the uniques will be sorted, and codes will be shuffled so that the relationship is maintained.

codes, uniques = pd.factorize(np.array(['b', 'b', 'a', 'c', 'b'], dtype="O"),
                              sort=True)
codes
uniques

In [None]:
# When use_na_sentinel=True (the default), missing values are indicated in the codes with the sentinel value -1 and missing values are not included in uniques.

codes, uniques = pd.factorize(np.array(['b', None, 'a', 'c', 'b'], dtype="O"))
codes
uniques

In [None]:
cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
codes, uniques = pd.factorize(cat)
codes
uniques

In [None]:
cat = pd.Series(['a', 'a', 'c'])
codes, uniques = pd.factorize(cat)
codes
uniques

In [None]:
# If NaN is in the values, and we want to include NaN in the uniques of the values, it can be achieved by setting use_na_sentinel=False.

values = np.array([1, 2, 1, np.nan])
codes, uniques = pd.factorize(values)  # default: use_na_sentinel=True
codes
uniques

In [None]:
codes, uniques = pd.factorize(values, use_na_sentinel=False)
codes
uniques


The **`pandas.Series.factorize()`** method converts a Series into numeric labels (integer codes) representing unique values.

---

## **Syntax**

```python
Series.factorize(sort=False, use_na_sentinel=True)
```

---

## **Parameters**

- **`sort`** (_bool, default=False_)

  - If `True`, unique values are sorted before encoding.
  - Codes are shuffled to maintain value relationships.

- **`use_na_sentinel`** (_bool, default=True_)
  - If `True`, **missing values (NaN)** are assigned **-1**.
  - If `False`, NaNs are assigned a **numeric code** and included in `uniques`.

---

## **Returns**

- **`codes`** (_ndarray_) → An array of **integer labels** for the Series values.
- **`uniques`** (_ndarray, Index, or Categorical_) → Array of **unique values** in the order of first appearance.
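
A minimal sketch of how the two return values fit together, assuming the top-level `pd.factorize` on a plain object array: indexing `uniques` by `codes` rebuilds the original values.

```python
import numpy as np
import pandas as pd

values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)
codes, uniques = pd.factorize(values)

# Each code is a position in `uniques`, so take() reverses the encoding.
restored = uniques.take(codes)

print(codes)
print(uniques)
print(restored)
```

This round trip only holds when no value is missing, because the `-1` sentinel is not a real position in `uniques`.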

---

## **See Also**

- **`pandas.factorize()`** → Top-level function for factorization.
- **`Series.unique()`** → Finds unique values in a Series.
- **`pandas.cut()`** → Discretizes continuous data.

---

## **Examples**

### **Example 1: Basic Factorization**

```python
import pandas as pd
import numpy as np

s = pd.Series(['b', 'b', 'a', 'c', 'b'])
codes, uniques = s.factorize()
print("Codes:", codes)
print("Uniques:", uniques)
```

**Output:**

```
Codes: [0 0 1 2 0]
Uniques: Index(['b', 'a', 'c'], dtype='object')
```

🔹 Each distinct value gets an **integer code**, assigned in order of first appearance.

---

### **Example 2: Sorting Unique Values (`sort=True`)**

```python
codes, uniques = s.factorize(sort=True)
print("Codes:", codes)
print("Uniques:", uniques)
```

**Output:**

```
Codes: [1 1 0 2 1]
Uniques: Index(['a', 'b', 'c'], dtype='object')
```

🔹 The **unique values are sorted**, and codes are reassigned accordingly.

---

### **Example 3: Handling NaN Values (`use_na_sentinel=True`)**

```python
s = pd.Series(['b', None, 'a', 'c', 'b'])
codes, uniques = s.factorize()
print("Codes:", codes)
print("Uniques:", uniques)
```

**Output:**

```
Codes: [ 0 -1  1  2  0]
Uniques: Index(['b', 'a', 'c'], dtype='object')
```

🔹 The **missing value (`None`) gets `-1`** and is **not** included in `uniques`.
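
One thing to watch when decoding: `-1` is a valid NumPy position (the last element), so indexing `uniques` with the raw codes would silently turn missing values into the last unique. A small sketch of one way to handle this, assuming it is acceptable to put `None` back in the missing slots (the variable names are only illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(['b', None, 'a', 'c', 'b'])
codes, uniques = s.factorize()

missing = codes == -1                       # positions flagged by the sentinel
lookup = np.asarray(uniques, dtype=object)  # plain object array of the uniques

decoded = lookup[codes].copy()              # -1 wrongly picks the last unique here...
decoded[missing] = None                     # ...so overwrite those slots explicitly

print(codes)
print(decoded)
```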

---

### **Example 4: Including NaN in Unique Values (`use_na_sentinel=False`)**

```python
s = pd.Series([1, 2, 1, np.nan])
codes, uniques = s.factorize(use_na_sentinel=False)
print("Codes:", codes)
print("Uniques:", uniques)
```

**Output:**

```
Codes: [0 1 0 2]
Uniques: Index([1.0, 2.0, nan], dtype='float64')
```

🔹 **NaN gets a normal integer code** instead of `-1`.  
🔹 NaN is now **included** in the unique values.

---

### **Example 5: Factorizing a Pandas Categorical**

```python
cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
codes, uniques = pd.factorize(cat)
print("Codes:", codes)
print("Uniques:", uniques)
```

**Output:**

```
Codes: [0 0 1]
Uniques: ['a', 'c']
Categories (3, object): ['a', 'b', 'c']
```

🔹 The **category ‘b’ exists** in the categories but **doesn’t appear** in the data.
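
To make the difference concrete, the sketch below compares the codes from `pd.factorize` with the Categorical's own `.codes` attribute. The factorize codes index into the observed uniques, while `.codes` index into the full category list.

```python
import pandas as pd

cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
codes, uniques = pd.factorize(cat)

print(codes.tolist())             # positions in the observed uniques
print(cat.codes.tolist())         # positions in the declared categories
print(list(uniques))              # only values that actually occur
print(list(uniques.categories))   # the unused category 'b' is still declared
```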

---

### **Example 6: Factorizing a Pandas Series**

```python
s = pd.Series(['a', 'a', 'c'])
codes, uniques = s.factorize()
print("Codes:", codes)
print("Uniques:", uniques)
```

**Output:**

```
Codes: [0 0 1]
Uniques: Index(['a', 'c'], dtype='object')
```

🔹 **Returns a Pandas Index** instead of an ndarray.

---

## **Key Takeaways**

✅ **Converts categorical data into numeric labels**.  
✅ **Handles NaN values** differently based on `use_na_sentinel`.  
✅ **Can sort unique values** using `sort=True`.  
✅ **Works on NumPy arrays, Pandas Series, and Categoricals**.


In [None]:
""" pandas.Series.kurt
Series.kurt(axis=0, skipna=True, numeric_only=False, **kwargs)
Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar """
s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
s

In [None]:
s.kurt()
'''With a DataFrame'''

df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
                  index=['cat', 'dog', 'dog', 'mouse'])
df
df.kurt()

In [None]:
#  With axis=None

df.kurt(axis=None).round(6)

In [None]:
# Using axis=1

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
                  index=['cat', 'dog'])
df.kurt(axis=1)

The **`pandas.Series.kurt()`** method calculates the **kurtosis** of a dataset, which measures the "tailedness" of the distribution.

---

## **Syntax**

```python
Series.kurt(axis=0, skipna=True, numeric_only=False, **kwargs)
```

---

## **Parameters**

- **`axis`** (_{0, ‘index’}, default=0_)

  - This parameter is **unused** for Series and defaults to 0.

- **`skipna`** (_bool, default=True_)

  - If `True`, **ignores NaN values** during computation.
  - If `False`, returns `NaN` if any missing values exist.

- **`numeric_only`** (_bool, default=False_)

  - Not implemented for Series.

- **`**kwargs`**
  - Additional arguments (rarely used).

---

## **Returns**

- A **scalar (float)** representing the kurtosis of the Series.

---

## **Definition**

The **kurtosis** is computed using **Fisher’s definition**:

- **Normal distribution → Kurtosis = 0.0**
- **High kurtosis (>0)** → More outliers (heavy tails).
- **Low kurtosis (<0)** → Fewer outliers (light tails).
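
The reported value can be reproduced by hand. The sketch below assumes the standard bias-corrected sample excess kurtosis formula; `sample_excess_kurtosis` is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd


def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (Fisher definition, normal == 0)."""
    x = pd.Series(values, dtype="float64").dropna()
    n = len(x)
    if n < 4:
        return float("nan")  # the correction terms need at least 4 points
    dev = x - x.mean()
    variance = (dev ** 2).sum() / (n - 1)  # unbiased sample variance
    fourth = (dev ** 4).sum()              # sum of fourth powers of deviations
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth / variance ** 2 - correction


data = [1, 2, 2, 3]
manual = sample_excess_kurtosis(data)
builtin = pd.Series(data).kurt()
print(manual, builtin)  # the two values should agree up to rounding
```

The helper does not guard against constant input (zero variance), which a production version would need to handle.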

---

## **See Also**

- **`Series.skew()`** → Computes skewness of a Series.
- **`Series.var()`** → Computes variance.
- **`Series.std()`** → Computes standard deviation.

---

## **Examples**

### **Example 1: Compute Kurtosis for a Series**

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3])
print(s.kurt())
```

**Output:**

```
1.5
```

🔹 The kurtosis is **positive**, which indicates heavier tails than a normal distribution (four points are too few to read much into it).

---

### **Example 2: Compute Kurtosis with NaN Values**

```python
s = pd.Series([1, 2, 2, 3, None])
print(s.kurt(skipna=True))  # Ignores NaN
print(s.kurt(skipna=False)) # Returns NaN if any missing values exist
```

**Output:**

```
1.5
nan
```

🔹 **With `skipna=False`, the result is NaN.**

---

### **Example 3: Kurtosis for a DataFrame**

```python
df = pd.DataFrame({
    'A': [1, 2, 2, 3],
    'B': [3, 4, 4, 4]
})

print(df.kurt())
```

**Output:**

```
A    1.5
B    4.0
dtype: float64
```

🔹 **Column `B` has more extreme outliers** (higher kurtosis).

---

### **Example 4: Compute Kurtosis Across All Values**

```python
df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [3, 4, 4, 4]})
print(df.kurt(axis=None).round(6))
```

**Output:**

```
-0.988693
```

🔹 **Negative kurtosis → Light tails** (fewer outliers).

---

### **Example 5: Compute Kurtosis Along Rows**

```python
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [3, 4], 'D': [1, 2]},
                  index=['cat', 'dog'])
print(df.kurt(axis=1))
```

**Output:**

```
cat   -6.0
dog   -6.0
dtype: float64
```

🔹 **Negative kurtosis means a flatter distribution**.

---

## **Key Takeaways**

✅ **Measures tail extremity** of data distribution.  
✅ **Fisher’s definition** → Normal distribution has kurtosis **0**.  
✅ **Handles NaN values** with `skipna=True`.  
✅ **Works on both Series & DataFrames**.


In [None]:
""" pandas.Series.max
Series.max(axis=0, skipna=True, numeric_only=False, **kwargs)
Return the maximum of the values over the requested axis.

If you want the index of the maximum, use idxmax. This is the equivalent of the numpy.ndarray method argmax.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar

Series.sum
Return the sum.

Series.min
Return the minimum.

Series.max
Return the maximum.

Series.idxmin
Return the index of the minimum.

Series.idxmax
Return the index of the maximum.

DataFrame.sum
Return the sum over the requested axis.

DataFrame.min
Return the minimum over the requested axis.

DataFrame.max
Return the maximum over the requested axis.

DataFrame.idxmin
Return the index of the minimum over the requested axis.

DataFrame.idxmax
Return the index of the maximum over the requested axis. """
idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
s
s.max()

The **`pandas.Series.max()`** method returns the **maximum value** of a Series, ignoring NaN values by default.

---

## **Syntax**

```python
Series.max(axis=0, skipna=True, numeric_only=False, **kwargs)
```

---

## **Parameters**

- **`axis`** (_{0, ‘index’}, default=0_)

  - **Unused** for Series (default is 0).
  - For DataFrames, `axis=0` finds the max for each column, `axis=1` for each row.

- **`skipna`** (_bool, default=True_)

  - If `True`, **ignores NaN values** while computing the max.
  - If `False`, returns `NaN` if any missing values exist.

- **`numeric_only`** (_bool, default=False_)

  - **Not implemented for Series.**

- **`**kwargs`**
  - Additional arguments (**rarely used**).

---

## **Returns**

- A **scalar** (single value) representing the maximum value of the Series.

---

## **See Also**

- **`Series.min()`** → Finds the minimum value.
- **`Series.idxmax()`** → Finds the **index** of the maximum value.
- **`Series.sum()`** → Computes the sum of elements.

---

## **Examples**

### **Example 1: Find the Maximum Value in a Series**

```python
import pandas as pd

s = pd.Series([4, 2, 9, 1, 7])
print(s.max())
```

**Output:**

```
9
```

🔹 **`9` is the maximum value in the Series.**

---

### **Example 2: Handling NaN Values**

```python
s = pd.Series([4, 2, None, 9, 1])
print(s.max(skipna=True))  # Ignores NaN
print(s.max(skipna=False)) # Returns NaN if NaN is present
```

**Output:**

```
9.0
nan
```

🔹 **With `skipna=False`, the result is `NaN` because of the missing value.**

---

### **Example 3: Maximum Value in a MultiIndex Series**

```python
idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])

s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
print(s.max())
```

**Output:**

```
8
```

🔹 The **maximum value in the Series is `8`** (spider's legs).

---

### **Example 4: Find the Maximum Value in a DataFrame**

```python
df = pd.DataFrame({
    'A': [1, 5, 3],
    'B': [4, 2, 8]
})

print(df.max())        # Column-wise max (default)
print(df.max(axis=1))  # Row-wise max
```

**Output:**

```
A    5
B    8
dtype: int64

0    4
1    5
2    8
dtype: int64
```

🔹 **Column-wise max:** `A: 5, B: 8`  
🔹 **Row-wise max:** `4, 5, 8`
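
The docstring above also mentions `axis=None`, which reduces over both axes at once. A short sketch, assuming pandas 2.0 or later (where that option was added), with a NumPy-based cross-check:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 5, 3],
    'B': [4, 2, 8]
})

overall = df.max(axis=None)    # single scalar over every cell (assumes pandas >= 2.0)
fallback = df.to_numpy().max() # same idea via NumPy, independent of the axis=None option

print(overall, fallback)
```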

---

### **Example 5: Find the Index of the Maximum Value**

```python
s = pd.Series([10, 20, 30, 25])
print(s.idxmax())  # Returns index of max value
```

**Output:**

```
2
```

🔹 The **maximum value (`30`) is at index `2`**.
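
With the default integer index, the label and the position happen to coincide. The sketch below uses string labels to show the difference between `idxmax()` (label) and NumPy's `argmax()` (position); the animal labels are reused from the earlier example.

```python
import pandas as pd

s = pd.Series([4, 2, 0, 8],
              index=['dog', 'falcon', 'fish', 'spider'],
              name='legs')

label = s.idxmax()                      # index label of the maximum
position = int(s.to_numpy().argmax())   # integer position of the maximum

print(label, position)
print(s[label], s.iloc[position])       # both look up the same value
```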

---

## **Key Takeaways**

✅ **Finds the max value** in a Series or DataFrame.  
✅ **Ignores NaN** by default, but can be configured.  
✅ **For DataFrames**, works column-wise (`axis=0`) or row-wise (`axis=1`).  
✅ **Use `idxmax()`** to get the **index** of the max value.


In [None]:
""" pandas.Series.mean
Series.mean(axis=0, skipna=True, numeric_only=False, **kwargs)
Return the mean of the values over the requested axis.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar """
s = pd.Series([1, 2, 3])
s.mean()

#  With a DataFrame

df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
df
df.mean()
#  Using axis=1

df.mean(axis=1)

In [None]:
# In this case, numeric_only should be set to True to avoid getting an error.

df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
                  index=['tiger', 'zebra'])
df.mean(numeric_only=True)

Here is a detailed reference for the **`pandas.Series.mean()`** method, including all syntaxes and examples:

---

## **📌 Syntax**

```python
Series.mean(axis=0, skipna=True, numeric_only=False, **kwargs)
```

---

## **📌 Parameters**

| Parameter      | Type             | Default | Description                                                                             |
| -------------- | ---------------- | ------- | --------------------------------------------------------------------------------------- |
| `axis`         | `{0 or ‘index’}` | `0`     | Unused for Series. For DataFrames: 0 gives one value per column, 1 one value per row.   |
| `skipna`       | `bool`           | `True`  | If `True`, ignores `NaN` values. If `False`, returns `NaN` if any missing values exist. |
| `numeric_only` | `bool`           | `False` | For DataFrames: includes only numeric columns. Not implemented for Series.              |
| `**kwargs`     | -                | -       | Additional arguments that might be passed (rarely needed).                              |

---

## **📌 Return Value**

- Returns a **scalar** representing the **mean** value of the Series.
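
A quick illustration of the `skipna` parameter described above, using a small made-up Series with one missing value:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 3.0])

mean_skip = s.mean()              # NaN is dropped: (1 + 2 + 3) / 3
mean_keep = s.mean(skipna=False)  # NaN propagates into the result

print(mean_skip)
print(mean_keep)
```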

---

## **📌 Examples**

### **Series Example**

```python
s = pd.Series([1, 2, 3])
print(s.mean())  # Output: 2.0
```

### **DataFrame Example (default axis)**

```python
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
print(df.mean())
```

**Output:**

```
a    1.5
b    2.5
dtype: float64
```

### **DataFrame Example (using axis=1)**

```python
print(df.mean(axis=1))
```

**Output:**

```
tiger    1.5
zebra    2.5
dtype: float64
```

### **DataFrame with Mixed Data Types**

```python
df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']}, index=['tiger', 'zebra'])
print(df.mean(numeric_only=True))
```

**Output:**

```
a    1.5
dtype: float64
```

---

## **📌 Key Notes**

- **NaN Handling**: The `skipna=True` argument ensures that missing values (NaN) are excluded from the mean calculation.
- **For DataFrames**: Specify `axis=1` to compute the mean across the columns of each row (one value per row) instead of down each column.
- **Mixed Data Types**: When calculating the mean of a DataFrame with non-numeric columns, set `numeric_only=True` to avoid errors.
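
As a rough mental model, `numeric_only=True` behaves like selecting the numeric columns first. The sketch below compares the two approaches on the mixed-type frame from the example; treating them as equivalent is an assumption that holds for this simple case.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']}, index=['tiger', 'zebra'])

via_flag = df.mean(numeric_only=True)                     # let pandas skip column 'b'
via_select = df.select_dtypes(include='number').mean()   # drop column 'b' explicitly

print(via_flag)
print(via_select)
```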


In [None]:

""" pandas.Series.median
Series.median(axis=0, skipna=True, numeric_only=False, **kwargs)
Return the median of the values over the requested axis.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar """
s = pd.Series([1, 2, 3])
s.median()

In [None]:
# With a DataFrame

df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
df

df.median()
# Using axis=1

df.median(axis=1)
# In this case, numeric_only should be set to True to avoid getting an error.

df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
                  index=['tiger', 'zebra'])
df.median(numeric_only=True)

Here's a detailed reference for **`pandas.Series.median()`** method, including all syntaxes and examples:

---

## **📌 Syntax**

```python
Series.median(axis=0, skipna=True, numeric_only=False, **kwargs)
```

---

## **📌 Parameters**

| Parameter      | Type             | Default | Description                                                                                |
| -------------- | ---------------- | ------- | ------------------------------------------------------------------------------------------ |
| `axis`         | `{0 or ‘index’}` | `0`     | Unused for Series. For DataFrames: 0 gives one value per column, 1 one value per row.      |
| `skipna`       | `bool`           | `True`  | If `True`, it ignores `NaN` values. If `False`, returns `NaN` if any missing values exist. |
| `numeric_only` | `bool`           | `False` | For DataFrames: includes only numeric columns. Not implemented for Series.                 |
| `**kwargs`     | -                | -       | Additional arguments that might be passed (rarely needed).                                 |

---

## **📌 Return Value**

- Returns a **scalar** representing the **median** value of the Series.

---

## **📌 Examples**

### **Series Example**

```python
s = pd.Series([1, 2, 3])
print(s.median())  # Output: 2.0
```

### **DataFrame Example (default axis)**

```python
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
print(df.median())
```

**Output:**

```
a    1.5
b    2.5
dtype: float64
```

### **DataFrame Example (using axis=1)**

```python
print(df.median(axis=1))
```

**Output:**

```
tiger    1.5
zebra    2.5
dtype: float64
```

### **DataFrame with Mixed Data Types**

```python
df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']}, index=['tiger', 'zebra'])
print(df.median(numeric_only=True))
```

**Output:**

```
a    1.5
dtype: float64
```

---

## **📌 Key Notes**

- **NaN Handling**: The `skipna=True` argument ensures that missing values (NaN) are excluded from the median calculation.
- **For DataFrames**: Specify `axis=1` to compute the median across the columns of each row (one value per row) instead of down each column.
- **Mixed Data Types**: When calculating the median of a DataFrame with non-numeric columns, set `numeric_only=True` to avoid errors.
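
One practical reason to reach for the median: it is far less sensitive to outliers than the mean. A small sketch with made-up numbers, where a single large value drags the mean but barely moves the median:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 100])

median_value = s.median()  # average of the two middle values, 2 and 3
mean_value = s.mean()      # pulled upward by the outlier 100

print(median_value, mean_value)
```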


In [None]:
""" pandas.Series.min
Series.min(axis=0, skipna=True, numeric_only=False, **kwargs)
Return the minimum of the values over the requested axis.

If you want the index of the minimum, use idxmin. This is the equivalent of the numpy.ndarray method argmin.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar


Series.sum
Return the sum.

Series.min
Return the minimum.

Series.max
Return the maximum.

Series.idxmin
Return the index of the minimum.

Series.idxmax
Return the index of the maximum.

DataFrame.sum
Return the sum over the requested axis.

DataFrame.min
Return the minimum over the requested axis.

DataFrame.max
Return the maximum over the requested axis.

DataFrame.idxmin
Return the index of the minimum over the requested axis.

DataFrame.idxmax
Return the index of the maximum over the requested axis. """


idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
s
s.min()

Here’s a detailed breakdown of the **`pandas.Series.min()`** method, including syntaxes, parameters, return values, and examples.

---

## **📌 Syntax**

```python
Series.min(axis=0, skipna=True, numeric_only=False, **kwargs)
```

---

## **📌 Parameters**

| Parameter      | Type             | Default | Description                                                                             |
| -------------- | ---------------- | ------- | --------------------------------------------------------------------------------------- |
| `axis`         | `{0 or ‘index’}` | `0`     | Axis for the function to be applied on. (For Series, this is unused and defaults to 0). |
| `skipna`       | `bool`           | `True`  | Exclude `NaN` values if `True`. If `False`, returns `NaN` if any missing values exist.  |
| `numeric_only` | `bool`           | `False` | For DataFrames, includes only numeric columns. Not applicable for Series.               |
| `**kwargs`     | -                | -       | Additional keyword arguments that may be passed (rarely needed).                        |

---

## **📌 Return Value**

- Returns the **minimum** value of the Series.

---

## **📌 Examples**

### **Series Example**

```python
s = pd.Series([4, 2, 3, 1])
print(s.min())  # Output: 1
```

### **DataFrame Example (default axis)**

```python
import pandas as pd
df = pd.DataFrame({'a': [5, 3], 'b': [8, 4]}, index=['cat', 'dog'])
print(df.min())
```

**Output:**

```
a    3
b    4
dtype: int64
```

### **DataFrame Example (using axis=1)**

```python
print(df.min(axis=1))
```

**Output:**

```
cat    5
dog    3
dtype: int64
```

### **MultiIndex Series Example**

```python
idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
print(s.min())  # Output: 0
```

---

## **📌 Related Functions**

- **`Series.idxmin()`**: Returns the index of the minimum value.
- **`Series.max()`**: Returns the maximum value.
- **`DataFrame.min()`**: Computes the minimum across DataFrame columns or rows (depending on `axis`).
- **`Series.sum()`**: Computes the sum of the values.
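
A short sketch of how `min()` and `idxmin()` combine on the MultiIndex Series from the example above. The `groupby(level=...)` step is an addition for illustration, showing one way to get a minimum per index level.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)

label = s.idxmin()                             # a tuple, one entry per index level
per_group = s.groupby(level='blooded').min()   # minimum within each 'blooded' group

print(label)
print(per_group)
```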

---


In [None]:
""" pandas.Series.mode
Series.mode(dropna=True)
Return the mode(s) of the Series.

The mode is the value that appears most often. There can be multiple modes.

Always returns Series even if only one value is returned.

Parameters:
dropna : bool, default True
Don’t consider counts of NaN/NaT.

Returns:
Series
Modes of the Series in sorted order. """
s = pd.Series([2, 4, 2, 2, 4, None])
s.mode()
s = pd.Series([2, 4, 8, 2, 4, None])
s.mode()
s = pd.Series([2, 4, None, None, 4, None])
s.mode(dropna=False)
s = pd.Series([2, 4, None, None, 4, None])
s.mode()

Here's a comprehensive breakdown of the **`pandas.Series.mode()`** method:

---

## **📌 Syntax**

```python
Series.mode(dropna=True)
```

---

## **📌 Parameters**

| Parameter | Type   | Default | Description                                                                                                                                              |
| --------- | ------ | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `dropna`  | `bool` | `True`  | Whether to ignore `NaN`/`NaT`. If `True`, missing values are not counted. If `False`, `NaN` is returned as a mode when it is the most frequent value.  |

---

## **📌 Return Value**

- Returns a **Series** containing the mode(s) of the Series. If there are multiple modes, all of them are returned in sorted order.
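
Because the result is always a Series, getting a single scalar takes one extra step. A minimal sketch, assuming that taking the first (smallest) mode is an acceptable tie-breaking rule for your use case:

```python
import pandas as pd

single = pd.Series([2, 4, 2, 2, 4, None]).mode()
tied = pd.Series([2, 4, 8, 2, 4, None]).mode()

first_mode = single.iloc[0]   # scalar value of the only mode
n_modes = len(tied)           # more than one entry when values tie

print(type(single).__name__, first_mode, n_modes)
```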

---

## **📌 Examples**

### **Single Mode Example**

```python
s = pd.Series([2, 4, 2, 2, 4, None])
print(s.mode())  # Prints a one-element Series holding 2.0 (mode always returns a Series)
```

### **Multiple Modes Example**

```python
s = pd.Series([2, 4, 8, 2, 4, None])
print(s.mode())
```

**Output:**

```
0    2.0
1    4.0
dtype: float64
```

### **Mode Including `NaN` (dropna=False)**

```python
s = pd.Series([2, 4, None, None, 4, None])
print(s.mode(dropna=False))
```

**Output:**

```
0   NaN
dtype: float64
```

### **Mode Excluding `NaN` (dropna=True, default)**

```python
s = pd.Series([2, 4, None, None, 4, None])
print(s.mode())  # Prints a one-element Series holding 4.0
```

---

## **📌 Notes**

- **NaN/NaT handling**: By default, `dropna=True`, so any `NaN` values are excluded from the calculation. If `dropna=False`, `NaN` is counted like any other value and is returned as a mode when it is the most frequent.

---


In [None]:
""" pandas.Series.nlargest
Series.nlargest(n=5, keep='first')
Return the largest n elements.

Parameters:
n : int, default 5
Return this many descending sorted values.

keep : {‘first’, ‘last’, ‘all’}, default ‘first’
When there are duplicate values that cannot all fit in a Series of n elements:

first : return the first n occurrences in order of appearance.

last : return the last n occurrences in reverse order of appearance.

all : keep all occurrences. This can result in a Series of size larger than n.

Returns:
Series
The n largest values in the Series, sorted in decreasing order.

Series.nsmallest
Get the n smallest elements.

Series.sort_values
Sort Series by values.

Series.head
Return the first n rows. """

""" Notes

Faster than .sort_values(ascending=False).head(n) for small n relative to the size of the Series object. """
countries_population = {"Italy": 59000000, "France": 65000000,
                        "Malta": 434000, "Maldives": 434000,
                        "Brunei": 434000, "Iceland": 337000,
                        "Nauru": 11300, "Tuvalu": 11300,
                        "Anguilla": 11300, "Montserrat": 5200}
s = pd.Series(countries_population)
s
s.nlargest()
#  The n largest elements where n=3. Default keep value is ‘first’ so Malta will be kept.

s.nlargest(3)
# The n largest elements where n=3 and keeping the last duplicates. Brunei will be kept since it is the last with value 434000 based on the index order.

s.nlargest(3, keep='last')

In [None]:
# The n largest elements where n=3 with all duplicates kept. Note that the returned Series has five elements due to the three duplicates.

s.nlargest(3, keep='all')

Here's a comprehensive breakdown of the **`pandas.Series.nlargest()`** method:

---

## **📌 Syntax**

```python
Series.nlargest(n=5, keep='first')
```

---

## **📌 Parameters**

| Parameter | Type                       | Default   | Description                                                          |
| --------- | -------------------------- | --------- | -------------------------------------------------------------------- |
| `n`       | `int`                      | `5`       | Number of largest elements to return.                                |
| `keep`    | `{'first', 'last', 'all'}` | `'first'` | Specifies which duplicate values to retain if there are ties.        |
| &nbsp;    | `'first'`                  | &nbsp;    | Keeps the first `n` occurrences in order of appearance.              |
| &nbsp;    | `'last'`                   | &nbsp;    | Keeps the last `n` occurrences in reverse order of appearance.       |
| &nbsp;    | `'all'`                    | &nbsp;    | Keeps all occurrences, which may result in a Series larger than `n`. |

---

## **📌 Return Value**

- Returns a **Series** containing the `n` largest values in **descending order**.

---

## **📌 Examples**

### **Example 1: Default behavior (`n=5`, `keep='first'`)**

```python
import pandas as pd

countries_population = {
    "Italy": 59000000, "France": 65000000, "Malta": 434000,
    "Maldives": 434000, "Brunei": 434000, "Iceland": 337000,
    "Nauru": 11300, "Tuvalu": 11300, "Anguilla": 11300, "Montserrat": 5200
}

s = pd.Series(countries_population)

print(s.nlargest())
```

**Output:**

```
France      65000000
Italy       59000000
Malta         434000
Maldives      434000
Brunei        434000
dtype: int64
```

---

### **Example 2: Get `n=3` largest values**

```python
print(s.nlargest(3))
```

**Output:**

```
France    65000000
Italy     59000000
Malta       434000
dtype: int64
```

- The function keeps only the first occurrence of `434000` (`Malta`).

---

### **Example 3: Keeping the last duplicates (`keep='last'`)**

```python
print(s.nlargest(3, keep='last'))
```

**Output:**

```
France      65000000
Italy       59000000
Brunei        434000
dtype: int64
```

- The function keeps the **last** occurrence of `434000` (`Brunei`).

---

### **Example 4: Keeping all duplicates (`keep='all'`)**

```python
print(s.nlargest(3, keep='all'))
```

**Output:**

```
France      65000000
Italy       59000000
Malta         434000
Maldives      434000
Brunei        434000
dtype: int64
```

- Since there are three duplicate values (`434000`), **all are kept**.

---

## **📌 Notes**

- Faster than **`.sort_values(ascending=False).head(n)`** for small `n` relative to the Series size.
- Works on **numeric and datetime-like** data; a Series holding strings or other arbitrary objects raises a `TypeError`.
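
The first note can be checked directly. The sketch below uses a small made-up Series without ties (tie order can differ between the two approaches) and confirms that both routes give the same result.

```python
import pandas as pd

s = pd.Series({'a': 3, 'b': 10, 'c': 7, 'd': 1})

top_fast = s.nlargest(2)                                # partial selection
top_sorted = s.sort_values(ascending=False).head(2)     # full sort, then slice

print(top_fast)
print(top_fast.equals(top_sorted))
```

The results match here; any speed difference only becomes noticeable on large Series with small `n`.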

---


In [None]:
""" pandas.Series.nsmallest
Series.nsmallest(n=5, keep='first')
Return the smallest n elements.

Parameters:
n : int, default 5
Return this many ascending sorted values.

keep : {‘first’, ‘last’, ‘all’}, default ‘first’
When there are duplicate values that cannot all fit in a Series of n elements:

first : return the first n occurrences in order of appearance.

last : return the last n occurrences in reverse order of appearance.

all : keep all occurrences. This can result in a Series of size larger than n.

Returns:
Series
The n smallest values in the Series, sorted in increasing order.


Series.nlargest
Get the n largest elements.

Series.sort_values
Sort Series by values.

Series.head
Return the first n rows.

Notes

Faster than .sort_values().head(n) for small n relative to the size of the Series object. """

countries_population = {"Italy": 59000000, "France": 65000000,
                        "Brunei": 434000, "Malta": 434000,
                        "Maldives": 434000, "Iceland": 337000,
                        "Nauru": 11300, "Tuvalu": 11300,
                        "Anguilla": 11300, "Montserrat": 5200}
s = pd.Series(countries_population)
s
# The n smallest elements where n=5 by default.

s.nsmallest()
# The n smallest elements where n=3. Default keep value is ‘first’ so Nauru and Tuvalu will be kept.

s.nsmallest(3)

In [None]:

# The n smallest elements where n=3 and keeping the last duplicates. Anguilla and Tuvalu will be kept since they are the last with value 11300 based on the index order.

s.nsmallest(3, keep='last')
# The n smallest elements where n=3 with all duplicates kept. Note that the returned Series has four elements due to the three duplicates.

s.nsmallest(3, keep='all')

The `pandas.Series.nsmallest` method retrieves the smallest elements from a Series, with options to handle duplicate values. Here's a concise breakdown:

### Parameters:

- **n**: Number of smallest elements to return (default: 5).
- **keep**: Determines handling of duplicates:
  - `'first'` (default): Includes the first occurrences in order of appearance.
  - `'last'`: Includes the last occurrences, returned in reverse order.
  - `'all'`: Includes all duplicates, possibly exceeding `n`.

### Returns:

- A Series containing the `n` smallest values, sorted in ascending order.

### Key Points:

- **Efficiency**: Faster than `.sort_values().head(n)` for small `n`.
- **Duplicates Handling**:
  - `'first'` selects the earliest entries when duplicates exist.
  - `'last'` selects the latest entries and reverses their order.
  - `'all'` includes all duplicates, which may result in a larger Series.

### Examples:

1. **Default (`keep='first'`)**:

   ```python
   s.nsmallest(3)
   # Returns: Montserrat (5200), Nauru (11300), Tuvalu (11300)
   ```

2. **Using `keep='last'`**:

   ```python
   s.nsmallest(3, keep='last')
   # Returns: Montserrat (5200), Anguilla (11300), Tuvalu (11300)
   ```

3. **Using `keep='all'`**:
   ```python
   s.nsmallest(3, keep='all')
   # Returns: Montserrat (5200), Nauru, Tuvalu, Anguilla (all 11300)
   ```

This method is ideal for quickly extracting the smallest values with flexible handling of ties.
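
The snippets above rely on `s` already being defined. Here is a self-contained version that rebuilds it from the population data in the notebook cell above and runs all three `keep` options side by side:

```python
import pandas as pd

countries_population = {"Italy": 59000000, "France": 65000000,
                        "Brunei": 434000, "Malta": 434000,
                        "Maldives": 434000, "Iceland": 337000,
                        "Nauru": 11300, "Tuvalu": 11300,
                        "Anguilla": 11300, "Montserrat": 5200}
s = pd.Series(countries_population)

first = s.nsmallest(3)                    # ties resolved by earliest appearance
last = s.nsmallest(3, keep='last')        # ties resolved by latest appearance
all_ties = s.nsmallest(3, keep='all')     # every tied value kept, so more than 3 rows

print(first.index.tolist())
print(last.index.tolist())
print(all_ties.index.tolist())
```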


In [None]:
""" pandas.Series.pct_change
Series.pct_change(periods=1, fill_method=<no_default>, limit=<no_default>, freq=None, **kwargs)
Fractional change between the current and a prior element.

Computes the fractional change from the immediately previous row by default. This is useful in comparing the fraction of change in a time series of elements.

Note

Despite the name of this method, it calculates fractional change (also known as per unit change or relative change) and not percentage change. If you need the percentage change, multiply these values by 100.

Parameters:
periodsint, default 1
Periods to shift for forming percent change.

fill_method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default ‘pad’
How to handle NAs before computing percent changes.

Deprecated since version 2.1: All options of fill_method are deprecated except fill_method=None.

limit : int, default None
The number of consecutive NAs to fill before stopping.

Deprecated since version 2.1.

freq : DateOffset, timedelta, or str, optional
Increment to use from time series API (e.g. ‘ME’ or BDay()).

**kwargs
Additional keyword arguments are passed into DataFrame.shift or Series.shift.

Returns:
Series or DataFrame
The same type as the calling object.


See also:
Series.diff
Compute the difference of two elements in a Series.

DataFrame.diff
Compute the difference of two elements in a DataFrame.

Series.shift
Shift the index by some number of periods.

DataFrame.shift
Shift the index by some number of periods. """
s = pd.Series([90, 91, 85])
s
s.pct_change()
s.pct_change(periods=2)

In [None]:
# Percentage change in a Series after forward-filling NAs with the last valid observation.

s = pd.Series([90, 91, None, 85])
s
s.ffill().pct_change()

In [None]:
# DataFrame

# Percentage change in French franc, Deutsche Mark, and Italian lira from 1980-01-01 to 1980-03-01.

df = pd.DataFrame({
    'FR': [4.0405, 4.0963, 4.3149],
    'GR': [1.7246, 1.7482, 1.8519],
    'IT': [804.74, 810.01, 860.13]},
    index=['1980-01-01', '1980-02-01', '1980-03-01'])
df
df.pct_change()
# Percentage of change in GOOG and APPL stock volume. Shows computing the percentage change between columns.
df = pd.DataFrame({
    '2016': [1769950, 30586265],
    '2015': [1500923, 40912316],
    '2014': [1371819, 41403351]},
    index=['GOOG', 'APPL'])
df
df.pct_change(axis='columns', periods=-1)

The `pandas.Series.pct_change` method calculates the **fractional change** (relative change) between the current and a prior element in a Series or DataFrame. It is commonly used to analyze time series data, such as stock prices, economic indicators, or other sequential data.

---

### **Parameters**:

1. **`periods`** (int, default: 1):

   - Number of periods to shift for calculating the change.
   - Example: `periods=2` computes the change relative to the value two rows prior.

2. **`fill_method`** (str or `None`; the signature shows `<no_default>`, and the documented legacy default is `'pad'`):

   - Specifies how to handle missing values (`NaN`) before computing the change.
   - Options: `'backfill'`, `'bfill'`, `'pad'`, `'ffill'`, or `None`.
   - **Deprecated since version 2.1**: every option except `fill_method=None` is deprecated.

3. **`limit`** (int, default: `None`):

   - Maximum number of consecutive `NaN` values to fill before stopping.
   - **Deprecated since version 2.1**.

4. **`freq`** (DateOffset, timedelta, or str, optional):

   - Increment to use from the time series API (e.g., `'D'` for daily, `'ME'` for month-end).

5. **`**kwargs`**:
   - Additional arguments passed to `Series.shift` or `DataFrame.shift`.
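
Since the filling options are deprecated, one approach is to fill explicitly and pass `fill_method=None`. A minimal sketch, with a made-up Series name `s_gap`:

```python
import pandas as pd

s_gap = pd.Series([90, 91, None, 85])

# No filling: every change that touches the gap is NaN.
raw = s_gap.pct_change(fill_method=None)

# Explicit forward fill first, then compute the change.
filled = s_gap.ffill().pct_change(fill_method=None)

print(raw)
print(filled)
```

Doing the fill as a separate step keeps the choice of fill strategy visible in the code instead of hiding it in a default.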

---

### **Returns**:

- **Series or DataFrame**:
  - The fractional change between the current and prior elements.
  - The first element(s) will be `NaN` because there is no prior value to compare.

---

### **Key Notes**:

- **Fractional Change**:

  - The method calculates the fractional change, not the percentage change. To convert to percentage, multiply the result by 100.
  - Formula:  
    \[
    \text{Fractional Change} = \frac{\text{Current Value} - \text{Previous Value}}{\text{Previous Value}}
    \]

- **Handling Missing Values**:

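  - If `fill_method` is used, missing values are filled before computing the change. Because those options are deprecated, prefer filling explicitly before calling `pct_change`.
  - Example: `s.ffill().pct_change()` propagates the last valid observation forward and then computes the change.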

- **Axis in DataFrames**:
  - For DataFrames, you can specify `axis='columns'` to compute changes row-wise (between columns).
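
The fractional-change formula can be checked by hand with `shift`. This is a small sketch; the names `prices`, `previous`, `manual`, and `percent` are only illustrative.

```python
import numpy as np
import pandas as pd

prices = pd.Series([90, 91, 85])

previous = prices.shift(1)                 # value one row earlier
manual = (prices - previous) / previous    # (current - previous) / previous
builtin = prices.pct_change()

percent = builtin * 100                    # percentage change, if needed
print(pd.DataFrame({"manual": manual, "builtin": builtin, "percent": percent}))
```

The first row is `NaN` in every column because there is no earlier value to compare against.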

---

### **Examples**:

#### **1. Basic Usage with Series**:

```python
import pandas as pd

s = pd.Series([90, 91, 85])
print(s.pct_change())
```

**Output**:

```
0         NaN
1    0.011111
2   -0.065934
dtype: float64
```

#### **2. Using `periods`**:

```python
print(s.pct_change(periods=2))
```

**Output**:

```
0         NaN
1         NaN
2   -0.055556
dtype: float64
```

#### **3. Handling Missing Values**:

```python
s = pd.Series([90, 91, None, 85])
print(s.ffill().pct_change())
```

**Output**:

```
0         NaN
1    0.011111
2    0.000000
3   -0.065934
dtype: float64
```

#### **4. DataFrame Example**:

```python
df = pd.DataFrame({
    'FR': [4.0405, 4.0963, 4.3149],
    'GR': [1.7246, 1.7482, 1.8519],
    'IT': [804.74, 810.01, 860.13]},
    index=['1980-01-01', '1980-02-01', '1980-03-01']
)
print(df.pct_change())
```

**Output**:

```
                  FR        GR        IT
1980-01-01       NaN       NaN       NaN
1980-02-01  0.013810  0.013684  0.006549
1980-03-01  0.053365  0.059318  0.061876
```

#### **5. Row-wise Change in DataFrame**:

```python
df = pd.DataFrame({
    '2016': [1769950, 30586265],
    '2015': [1500923, 40912316],
    '2014': [1371819, 41403351]},
    index=['GOOG', 'APPL']
)
print(df.pct_change(axis='columns', periods=-1))
```

**Output**:

```
          2016      2015  2014
GOOG  0.179241  0.094112   NaN
APPL -0.252395 -0.011860   NaN
```

---

### **See Also**:

- **`Series.diff`**: Computes the difference between elements.
- **`Series.shift`**: Shifts the index by a specified number of periods.
- **`DataFrame.pct_change`**: Similar functionality for DataFrames.

This method is particularly useful for analyzing trends and growth rates in time series data.
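
One common follow-up for growth rates is compounding the per-period changes back into an overall change. The sketch below assumes a short made-up Series and checks the compounded result against the direct first-to-last comparison.

```python
import math
import pandas as pd

values = pd.Series([90, 91, 85])

changes = values.pct_change().dropna()      # period-over-period fractional changes
compounded = (1 + changes).prod() - 1       # chain the growth factors together
direct = values.iloc[-1] / values.iloc[0] - 1

print(compounded, direct)
```

Both numbers agree up to floating-point rounding, which is a handy sanity check when working with returns.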


In [None]:
""" pandas.Series.prod
Series.prod(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
Return the product of the values over the requested axis.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.prod with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar

See also:
Series.sum
Return the sum.

Series.min
Return the minimum.

Series.max
Return the maximum.

Series.idxmin
Return the index of the minimum.

Series.idxmax
Return the index of the maximum.

DataFrame.sum
Return the sum over the requested axis.

DataFrame.min
Return the minimum over the requested axis.

DataFrame.max
Return the maximum over the requested axis.

DataFrame.idxmin
Return the index of the minimum over the requested axis.

DataFrame.idxmax
Return the index of the maximum over the requested axis. """
pd.Series([], dtype="float64").prod()
# This can be controlled with the min_count parameter 

pd.Series([], dtype="float64").prod(min_count=1)

In [None]:
# Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

pd.Series([np.nan]).prod()
pd.Series([np.nan]).prod(min_count=1)

The `pandas.Series.prod` method calculates the **product** of the values in a Series. It is useful for computing the multiplicative result of all elements in the Series. Here's a detailed explanation of its functionality:

---

### **Parameters**:

1. **`axis`** (int, default: `None`):

   - Axis for the function to be applied on. For a Series, this parameter is unused and defaults to `0`.
   - **Warning**: For DataFrames, the behavior of `axis=None` is deprecated. Use `axis=0` to retain the old behavior.

2. **`skipna`** (bool, default: `True`):

   - If `True`, excludes `NaN` (missing) values when computing the product.
   - If `False`, the result will be `NaN` if any value in the Series is `NaN`.

3. **`numeric_only`** (bool, default: `False`):

   - If `True`, includes only numeric columns (float, int, boolean). This parameter is not implemented for Series.

4. **`min_count`** (int, default: `0`):

   - The minimum number of valid (non-`NaN`) values required to perform the operation.
   - If fewer than `min_count` non-`NaN` values are present, the result will be `NaN`.

5. **`**kwargs`**:
   - Additional keyword arguments to be passed to the function.

---

### **Returns**:

- **Scalar**:
  - The product of the values in the Series. If the Series is empty or contains only `NaN` values, the result depends on the `min_count` parameter.

---

### **Key Notes**:

- **Default Behavior**:

  - By default, the product of an empty or all-`NaN` Series is `1.0`.
  - This behavior can be controlled using the `min_count` parameter.

- **Handling Missing Values**:

  - If `skipna=True`, `NaN` values are ignored.
  - If `skipna=False`, the presence of any `NaN` value will result in `NaN` for the product.

- **`min_count` Parameter**:
  - If `min_count` is set, the operation requires at least that many non-`NaN` values to return a valid result. Otherwise, the result is `NaN`.
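
The interaction between `skipna` and `min_count` is easiest to see side by side. A minimal sketch with a made-up Series `s_nan`:

```python
import numpy as np
import pandas as pd

s_nan = pd.Series([2.0, np.nan, 5.0])

default = s_nan.prod()                 # NaN skipped: 2.0 * 5.0
strict = s_nan.prod(skipna=False)      # NaN propagates into the result
enough = s_nan.prod(min_count=2)       # two valid values are present
too_few = s_nan.prod(min_count=3)      # only two valid values, so NaN

print(default, strict, enough, too_few)
```

Setting `min_count` to the expected number of observations is one way to avoid silently getting `1.0` or a partial product from incomplete data.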

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
print(s.prod())
```

**Output**:

```
24
```

Explanation: \(1 \times 2 \times 3 \times 4 = 24\).

---

#### **2. Handling Missing Values**:

```python
s = pd.Series([1, 2, None, 4])
print(s.prod(skipna=True))
```

**Output**:

```
8.0
```

Explanation: \(1 \times 2 \times 4 = 8\). The `NaN` value is ignored, and because `None` forces the Series to `float64`, the result prints as `8.0`.

---

#### **3. Using `min_count`**:

```python
s = pd.Series([1, 2, None, 4])
print(s.prod(min_count=4))
```

**Output**:

```
nan
```

Explanation: There are only 3 non-`NaN` values, which is fewer than `min_count=4`. Hence, the result is `NaN`.

---

#### **4. Empty Series**:

```python
s = pd.Series([], dtype="float64")
print(s.prod())
```

**Output**:

```
1.0
```

Explanation: By default, the product of an empty Series is `1.0`.

---

#### **5. All-`NaN` Series**:

```python
s = pd.Series([np.nan, np.nan])
print(s.prod())
```

**Output**:

```
1.0
```

Explanation: By default, the product of an all-`NaN` Series is `1.0`.

---

#### **6. Controlling Empty/All-`NaN` Behavior with `min_count`**:

```python
s = pd.Series([], dtype="float64")
print(s.prod(min_count=1))
```

**Output**:

```
nan
```

Explanation: Since `min_count=1` and there are no valid values, the result is `NaN`.

---

### **See Also**:

- **`Series.sum`**: Returns the sum of the values.
- **`Series.min`**: Returns the minimum value.
- **`Series.max`**: Returns the maximum value.
- **`Series.idxmin`**: Returns the index of the minimum value.
- **`Series.idxmax`**: Returns the index of the maximum value.
- **`DataFrame.prod`**: Similar functionality for DataFrames.

---

### **Summary**:

- Use `Series.prod` to compute the product of values in a Series.
- Control the handling of missing values with `skipna` and `min_count`.
- Be cautious with empty or all-`NaN` Series, as the default behavior returns `1.0` unless `min_count` is specified.


In [None]:
""" pandas.Series.quantile
Series.quantile(q=0.5, interpolation='linear')
Return value at the given quantile.

Parameters:
q : float or array-like, default 0.5 (50% quantile)
The quantile(s) to compute, which can lie in range: 0 <= q <= 1.

interpolation : {‘linear’, ‘lower’, ‘higher’, ‘midpoint’, ‘nearest’}
This optional parameter specifies the interpolation method to use, when the desired quantile lies between two data points i and j:

linear: i + (j - i) * (x-i)/(j-i), where (x-i)/(j-i) is the fractional part of the index surrounded by i and j (i < j).

lower: i.

higher: j.

nearest: i or j whichever is nearest.

midpoint: (i + j) / 2.

Returns:
float or Series
If q is an array, a Series will be returned where the index is q and the values are the quantiles, otherwise a float will be returned.



See also:
core.window.Rolling.quantile
Calculate the rolling quantile.

numpy.percentile
Returns the q-th percentile(s) of the array elements. """
s = pd.Series([1, 2, 3, 4])
s.quantile(.5)

s.quantile([.25, .5, .75])

The `pandas.Series.quantile` method calculates the value at a specified quantile (or quantiles) of the data in a Series. Quantiles are cut points that split the sorted data into groups holding equal shares of the observations, such as the median (50th percentile) or the quartiles (25th, 50th, and 75th percentiles). Here's a detailed explanation of its functionality:

---

### **Parameters**:

1. **`q`** (float or array-like, default: `0.5`):

   - The quantile(s) to compute. Must be between `0` and `1` (inclusive).
   - Examples:
     - `q=0.5` computes the median (50th percentile).
     - `q=[0.25, 0.5, 0.75]` computes the first quartile, median, and third quartile.

2. **`interpolation`** (str, default: `'linear'`):
   - Specifies the interpolation method to use when the desired quantile lies between two data points. Options:
     - `'linear'`: Computes \(i + (j - i) \times \text{fraction}\), where `i` and `j` are the surrounding data points.
     - `'lower'`: Uses the lower data point (`i`).
     - `'higher'`: Uses the higher data point (`j`).
     - `'nearest'`: Uses the nearest data point (`i` or `j`).
     - `'midpoint'`: Computes the average of `i` and `j`.
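
The interpolation options can be reproduced by hand. The helper below is a hypothetical sketch (`quantile_by_hand` is not part of pandas) that assumes the quantile sits at zero-based position `q * (n - 1)` in the sorted data, and it covers only four of the five options.

```python
import math
import pandas as pd

def quantile_by_hand(values, q, interpolation="linear"):
    """Hypothetical helper for illustration, not part of pandas."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)       # fractional index into the sorted data
    i = ordered[math.floor(position)]       # lower neighbour
    j = ordered[math.ceil(position)]        # upper neighbour
    fraction = position - math.floor(position)
    if interpolation == "linear":
        return i + (j - i) * fraction
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    raise ValueError(f"unsupported interpolation: {interpolation}")

s_q = pd.Series([1, 2, 3, 4])
for method in ("linear", "lower", "higher", "midpoint"):
    print(method, quantile_by_hand(s_q.tolist(), 0.25, method),
          s_q.quantile(0.25, interpolation=method))
```

For `q=0.25` the position is `0.75`, so the neighbours are `1` and `2` and the four methods give `1.75`, `1`, `2`, and `1.5`.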

---

### **Returns**:

- **float or Series**:
  - If `q` is a single value, returns a float representing the quantile.
  - If `q` is an array-like, returns a Series where the index is `q` and the values are the computed quantiles.

---

### **Key Notes**:

- **Quantile Calculation**:

  - The quantile is calculated based on the sorted values of the Series.
  - The interpolation method determines how to handle cases where the quantile lies between two data points.

- **Handling Edge Cases**:
  - If the Series is empty, the result will be `NaN`.
  - If `q` is outside the range `[0, 1]`, a `ValueError` will be raised.
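
Both edge cases can be checked directly. In this sketch the empty Series is given an explicit `float64` dtype, which is an assumption made here to keep the example independent of how pandas infers the dtype of an empty list.

```python
import math
import pandas as pd

empty = pd.Series([], dtype="float64")
empty_result = empty.quantile(0.5)          # no data, so NaN

try:
    pd.Series([1, 2, 3, 4]).quantile(1.5)   # q outside [0, 1]
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

print(empty_result, raised)
```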

---

### **Examples**:

#### **1. Basic Usage (Single Quantile)**:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
print(s.quantile(0.5))  # Median
```

**Output**:

```
2.5
```

Explanation: The median (50th percentile) of `[1, 2, 3, 4]` is `2.5`.

---

#### **2. Multiple Quantiles**:

```python
print(s.quantile([0.25, 0.5, 0.75]))
```

**Output**:

```
0.25    1.75
0.50    2.50
0.75    3.25
dtype: float64
```

Explanation:

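With \(n = 4\) sorted values, each quantile sits at zero-based fractional position \(q \times (n - 1)\):

- 25th percentile: position \(0.75\), so \(1 + (2 - 1) \times 0.75 = 1.75\)
- 50th percentile: position \(1.5\), so \(2 + (3 - 2) \times 0.5 = 2.5\)
- 75th percentile: position \(2.25\), so \(3 + (4 - 3) \times 0.25 = 3.25\)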

---

#### **3. Using Different Interpolation Methods**:

```python
print(s.quantile(0.5, interpolation='lower'))  # Uses the lower value
print(s.quantile(0.5, interpolation='higher'))  # Uses the higher value
print(s.quantile(0.5, interpolation='midpoint'))  # Averages the two values
```

**Output**:

```
2
3
2.5
```

---

#### **4. Edge Cases**:

- **Empty Series**:

  ```python
  s = pd.Series([], dtype="float64")  # explicit dtype, as in the `prod` examples
  print(s.quantile(0.5))
  ```

  **Output**:

  ```
  nan
  ```

- **Quantile Outside Range**:
  ```python
  s = pd.Series([1, 2, 3, 4])
  print(s.quantile(1.5))  # Raises ValueError
  ```

---

### **See Also**:

- **`core.window.Rolling.quantile`**: Calculates the rolling quantile for a Series or DataFrame.
- **`numpy.percentile`**: Computes the q-th percentile(s) of array elements.

---

### **Summary**:

- Use `Series.quantile` to compute quantiles for a Series.
- Specify the quantile(s) using the `q` parameter.
- Control interpolation behavior with the `interpolation` parameter.
- Returns a float for a single quantile or a Series for multiple quantiles.


In [None]:
""" pandas.Series.rank
Series.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)
Compute numerical data ranks (1 through n) along axis.

By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters:
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Index to direct ranking. For Series this parameter is unused and defaults to 0.

method : {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’
How to rank the group of records that have the same value (i.e. ties):

average: average rank of the group

min: lowest rank in the group

max: highest rank in the group

first: ranks assigned in order they appear in the array

dense: like ‘min’, but rank always increases by 1 between groups.

numeric_only : bool, default False
For DataFrame objects, rank only numeric columns if set to True.

Changed in version 2.0.0: The default value of numeric_only is now False.

na_option : {‘keep’, ‘top’, ‘bottom’}, default ‘keep’
How to rank NaN values:

keep: assign NaN rank to NaN values

top: assign lowest rank to NaN values

bottom: assign highest rank to NaN values

ascending : bool, default True
Whether or not the elements should be ranked in ascending order.

pct : bool, default False
Whether or not to display the returned rankings in percentile form.

Returns:
same type as caller
Return a Series or DataFrame with data ranks as values.


See also:
core.groupby.DataFrameGroupBy.rank
Rank of values within each group.

core.groupby.SeriesGroupBy.rank
Rank of values within each group. """
df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
                                   'spider', 'snake'],
                        'Number_legs': [4, 2, 4, 8, np.nan]})
df


# Ties are assigned the mean of the ranks (by default) for the group.

s = pd.Series(range(5), index=list("abcde"))
s["d"] = s["b"]
s.rank()

#  The following example shows how the method behaves with the above parameters:

#  default_rank: this is the default behaviour obtained without using any parameter.

# max_rank: setting method = 'max' the records that have the same values are ranked using the highest rank (e.g.: since ‘cat’ and ‘dog’ are both in the 2nd and 3rd position, rank 3 is assigned.)
# 
# NA_bottom: choosing na_option = 'bottom', if there are records with NaN values they are placed at the bottom of the ranking.

# pct_rank: when setting pct = True, the ranking is expressed as percentile rank.

df['default_rank'] = df['Number_legs'].rank()
df['max_rank'] = df['Number_legs'].rank(method='max')
df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
df['pct_rank'] = df['Number_legs'].rank(pct=True)
df

The `pandas.Series.rank` method assigns ranks to the values in a Series, where the smallest value is ranked 1 by default. It provides flexibility in handling ties (equal values) and missing values (`NaN`). Here's a detailed explanation of its functionality:

---

### **Parameters**:

1. **`axis`** (int or str, default: `0`):

   - Axis to rank along. For Series, this parameter is unused and defaults to `0`.

2. **`method`** (str, default: `'average'`):

   - Specifies how to rank tied values:
     - `'average'`: Assigns the average rank of the group (default).
     - `'min'`: Assigns the minimum rank of the group.
     - `'max'`: Assigns the maximum rank of the group.
     - `'first'`: Assigns ranks in the order they appear in the Series.
     - `'dense'`: Like `'min'`, but ranks increase by 1 between groups.

3. **`numeric_only`** (bool, default: `False`):

   - If `True`, ranks only numeric columns. Not applicable for Series.

4. **`na_option`** (str, default: `'keep'`):

   - Specifies how to handle `NaN` values:
     - `'keep'`: Assigns `NaN` rank to `NaN` values.
     - `'top'`: Assigns the lowest rank to `NaN` values.
     - `'bottom'`: Assigns the highest rank to `NaN` values.

5. **`ascending`** (bool, default: `True`):

   - If `True`, ranks values in ascending order (smallest value gets rank 1).
   - If `False`, ranks values in descending order (largest value gets rank 1).

6. **`pct`** (bool, default: `False`):
   - If `True`, returns ranks scaled to fractions in the interval (0, 1] instead of 1 through n.

---

### **Returns**:

- **Series**:
  - A Series with the same index as the original, containing the computed ranks.

---

### **Key Notes**:

- **Handling Ties**:

  - The `method` parameter determines how tied values are ranked.
  - Example: For values `[3, 3, 2]`, the ranks depend on the method:
    - `'average'`: `[2.5, 2.5, 1]`
    - `'min'`: `[2, 2, 1]`
    - `'max'`: `[3, 3, 1]`
    - `'first'`: `[2, 3, 1]`
    - `'dense'`: `[2, 2, 1]`

- **Handling Missing Values**:

  - The `na_option` parameter controls how `NaN` values are ranked.
  - Example: For `[1, 2, NaN]`:
    - `'keep'`: `[1, 2, NaN]`
    - `'top'`: `[2, 3, 1]`
    - `'bottom'`: `[1, 2, 3]`

- **Percentile Ranks**:
  - If `pct=True`, ranks are expressed as percentiles (e.g., `0.25` for the 25th percentile).
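
The small tables in the notes above can be reproduced in one pass by ranking the same Series once per option. A minimal sketch; the names `tied`, `with_nan`, `by_method`, and `by_na_option` are illustrative only.

```python
import pandas as pd

tied = pd.Series([3, 3, 2])
methods = ["average", "min", "max", "first", "dense"]

# One column per tie-breaking method.
by_method = pd.DataFrame({m: tied.rank(method=m) for m in methods})
print(by_method)

with_nan = pd.Series([1, 2, None])

# One column per NaN-handling option.
by_na_option = pd.DataFrame(
    {opt: with_nan.rank(na_option=opt) for opt in ["keep", "top", "bottom"]}
)
print(by_na_option)
```

Putting the options in adjacent columns makes it easy to see that `'min'` and `'dense'` only differ once a later group follows the tie.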

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([3, 2, 3, 1])
print(s.rank())
```

**Output**:

```
0    3.5
1    2.0
2    3.5
3    1.0
dtype: float64
```

Explanation:

- The smallest value (`1`) gets rank `1`.
- The tied values (`3` and `3`) get the average rank of `3.5`.

---

#### **2. Using Different Methods**:

```python
print(s.rank(method='min'))  # Assigns the minimum rank to ties
print(s.rank(method='max'))  # Assigns the maximum rank to ties
print(s.rank(method='first'))  # Assigns ranks in order of appearance
```

**Output**:

```
0    3.0
1    2.0
2    3.0
3    1.0
dtype: float64

0    4.0
1    2.0
2    4.0
3    1.0
dtype: float64

0    3.0
1    2.0
2    4.0
3    1.0
dtype: float64
```

---

#### **3. Handling Missing Values**:

```python
s = pd.Series([1, 2, None, 4])
print(s.rank(na_option='top'))  # Assigns the lowest rank to NaN
print(s.rank(na_option='bottom'))  # Assigns the highest rank to NaN
```

**Output**:

```
0    2.0
1    3.0
2    1.0
3    4.0
dtype: float64

0    1.0
1    2.0
2    4.0
3    3.0
dtype: float64
```

---

#### **4. Percentile Ranks**:

```python
print(s.rank(pct=True))
```

**Output**:

```
0    0.333333
1    0.666667
2         NaN
3    1.000000
dtype: float64
```

Explanation:

- Ranks are expressed as percentiles (e.g., `0.333333` for the 33rd percentile).
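
With the default `na_option='keep'`, the percentile ranks in this example match the ordinary ranks divided by the number of non-`NaN` values. The sketch below checks that for one made-up Series; it is an observation about this case, not a statement of how pandas computes `pct` for every option.

```python
import numpy as np
import pandas as pd

s_pct = pd.Series([1, 2, None, 4])

ranks = s_pct.rank()                    # 1.0, 2.0, NaN, 3.0
manual_pct = ranks / s_pct.count()      # divide by the 3 non-NaN values
builtin_pct = s_pct.rank(pct=True)

print(pd.DataFrame({"rank": ranks, "manual": manual_pct, "pct": builtin_pct}))
```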

---

#### **5. DataFrame Example**:

```python
df = pd.DataFrame({
    'Animal': ['cat', 'penguin', 'dog', 'spider', 'snake'],
    'Number_legs': [4, 2, 4, 8, None]
})
df['default_rank'] = df['Number_legs'].rank()
df['max_rank'] = df['Number_legs'].rank(method='max')
df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
df['pct_rank'] = df['Number_legs'].rank(pct=True)
print(df)
```

**Output**:

```
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
```

---

### **See Also**:

- **`core.groupby.DataFrameGroupBy.rank`**: Ranks values within each group in a DataFrame.
- **`core.groupby.SeriesGroupBy.rank`**: Ranks values within each group in a Series.

---

### **Summary**:

- Use `Series.rank` to assign ranks to values in a Series.
- Control tie-breaking behavior with the `method` parameter.
- Handle missing values using the `na_option` parameter.
- Use `pct=True` to express ranks as percentiles.


In [None]:
""" pandas.Series.sem
Series.sem(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)
Return unbiased standard error of the mean over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument

Parameters:
axis : {index (0)}
For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.sem with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

Returns:
scalar or Series (if level specified) """
s = pd.Series([1, 2, 3])
s.sem().round(6)

In [None]:
# With a DataFrame

df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
df

# Using axis=1

df.sem(axis=1)

# In this case, numeric_only should be set to True to avoid getting an error.

df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']},
                  index=['tiger', 'zebra'])
df.sem(numeric_only=True)

The `pandas.Series.sem` method calculates the **standard error of the mean (SEM)** for the values in a Series. The SEM measures the precision of the sample mean as an estimate of the population mean. It is calculated as the standard deviation (computed with the chosen `ddof`) divided by the square root of the number of non-missing values.

---

### **Parameters**:

1. **`axis`** (int or str, default: `0`):

   - Axis to compute the SEM along. For Series, this parameter is unused and defaults to `0`.

2. **`skipna`** (bool, default: `True`):

   - If `True`, excludes `NaN` (missing) values when computing the SEM.
   - If `False`, the result will be `NaN` if any value in the Series is `NaN`.

3. **`ddof`** (int, default: `1`):

   - Delta Degrees of Freedom. The divisor used for the underlying standard deviation is (N - ddof), where N is the number of non-`NaN` values.
   - Default is `1`, which corresponds to the unbiased estimate of the SEM.

4. **`numeric_only`** (bool, default: `False`):

   - If `True`, includes only numeric columns (float, int, boolean). Not implemented for Series.

5. **`**kwargs`**:
   - Additional keyword arguments to be passed to the function.

---

### **Returns**:

- **Scalar**:
  - The standard error of the mean for the Series.

---

### **Key Notes**:

- **Formula**:

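  \[
  \text{SEM} = \frac{s_{\text{ddof}}}{\sqrt{N}}, \qquad s_{\text{ddof}} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - \text{ddof}}}
  \]

  - \(s_{\text{ddof}}\): Standard deviation computed with divisor \(N - \text{ddof}\); it measures the spread of the data.
  - \(N\): Number of non-`NaN` values.
  - `ddof`: Adjusts the divisor of the standard deviation, not the \(\sqrt{N}\) term.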

- **Handling Missing Values**:

  - If `skipna=True`, `NaN` values are ignored.
  - If `skipna=False`, the presence of any `NaN` value will result in `NaN` for the SEM.

- **Degrees of Freedom (`ddof`)**:
  - The default `ddof=1` provides an unbiased estimate of the SEM.
  - Set `ddof=0` to use the population formula (no adjustment).
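
A quick numerical check of where `ddof` enters: in the sketch below `ddof` is passed to `Series.std`, and the result is divided by the square root of the number of non-`NaN` values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s_sem = pd.Series([1, 2, 3])
n = s_sem.count()                                   # number of non-NaN values

manual_sample = s_sem.std(ddof=1) / np.sqrt(n)      # compare with s_sem.sem()
manual_population = s_sem.std(ddof=0) / np.sqrt(n)  # compare with s_sem.sem(ddof=0)

print(manual_sample, s_sem.sem())
print(manual_population, s_sem.sem(ddof=0))
```

Each printed pair should agree up to floating-point rounding.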

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.sem().round(6))
```

**Output**:

```
0.57735
```

---

#### **2. Handling Missing Values**:

```python
s = pd.Series([1, 2, None, 4])
print(s.sem(skipna=True).round(6))
```

**Output**:

```
0.881917
```

---

#### **3. Using `ddof`**:

```python
s = pd.Series([1, 2, 3])
print(s.sem(ddof=0).round(6))
```

**Output**:

```
0.471405
```

---

#### **4. DataFrame Example**:

```python
df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
print(df.sem())
```

**Output**:

```
a   0.5
b   0.5
dtype: float64
```

---

#### **5. Handling Non-Numeric Columns**:

```python
df = pd.DataFrame({'a': [1, 2], 'b': ['T', 'Z']}, index=['tiger', 'zebra'])
print(df.sem(numeric_only=True))
```

**Output**:

```
a   0.5
dtype: float64
```

Explanation:

- Only numeric column `a` is included in the calculation.

---

### **See Also**:

- **`Series.std`**: Computes the standard deviation of the Series.
- **`Series.mean`**: Computes the mean of the Series.
- **`DataFrame.sem`**: Computes the SEM for each column in a DataFrame.

---

### **Summary**:

- Use `Series.sem` to compute the standard error of the mean for a Series.
- Adjust the degrees of freedom using the `ddof` parameter.
- Handle missing values with the `skipna` parameter.
- For DataFrames, use `numeric_only=True` to exclude non-numeric columns.


In [None]:
""" pandas.Series.skew
Series.skew(axis=0, skipna=True, numeric_only=False, **kwargs)
Return unbiased skew over requested axis.

Normalized by N-1.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar """
s=pd.Series([1, 2, 3])
s.skew()

# With a DataFrame 

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]},
                  index=['tiger', 'zebra', 'cow'])
df

In [None]:
# Using axis=1

df.skew(axis=1)

# In this case, numeric_only should be set to True to avoid getting an error. 

df = pd.DataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']},
                  index=['tiger', 'zebra', 'cow'])
df.skew(numeric_only=True)

The `pandas.Series.skew` method calculates the **skewness** of the data in a Series. Skewness measures the asymmetry of the distribution of values around the mean. A skewness of 0 indicates a symmetric distribution, while positive or negative skewness indicates a longer tail on the right or left side, respectively.

---

### **Parameters**:

1. **`axis`** (int or str, default: `0`):

   - Axis to compute skewness along. For Series, this parameter is unused and defaults to `0`.

2. **`skipna`** (bool, default: `True`):

   - If `True`, excludes `NaN` (missing) values when computing skewness.
   - If `False`, the result will be `NaN` if any value in the Series is `NaN`.

3. **`numeric_only`** (bool, default: `False`):

   - If `True`, includes only numeric columns (float, int, boolean). Not implemented for Series.

4. **`**kwargs`**:
   - Additional keyword arguments to be passed to the function.

---

### **Returns**:

- **Scalar**:
  - The skewness of the Series.

---

### **Key Notes**:

- **Skewness Formula**:

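  \[
  g_1 = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^3}{\left( \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2 \right)^{3/2}}
  \]

  pandas reports the bias-corrected value \(G_1 = \frac{\sqrt{N(N-1)}}{N-2} \, g_1\), which is why at least three non-`NaN` values are needed.

  - \(x_i\): Individual data points.
  - \(\bar{x}\): Mean of the data.
  - \(N\): Number of non-`NaN` values.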

- **Interpretation**:

  - **Skewness = 0**: Symmetric distribution.
  - **Skewness > 0**: Positive skew (longer tail on the right).
  - **Skewness < 0**: Negative skew (longer tail on the left).

- **Handling Missing Values**:
  - If `skipna=True`, `NaN` values are ignored.
  - If `skipna=False`, the presence of any `NaN` value will result in `NaN` for skewness.
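
The value pandas returns can be reproduced from the moment ratio together with a small-sample bias correction. The helper below is a hypothetical sketch (`adjusted_skew` is not a pandas function) that drops `NaN` values and applies the factor `sqrt(N(N-1)) / (N-2)`.

```python
import numpy as np
import pandas as pd

def adjusted_skew(values):
    """Hypothetical helper: bias-corrected sample skewness, NaN values dropped."""
    x = np.asarray(values, dtype=float)     # None becomes NaN
    x = x[~np.isnan(x)]
    n = x.size
    d = x - x.mean()
    m2 = np.mean(d**2)                      # second central moment
    m3 = np.mean(d**3)                      # third central moment
    g1 = m3 / m2**1.5                       # moment-ratio skewness
    return np.sqrt(n * (n - 1)) / (n - 2) * g1

for data in ([1, 2, 3, 4, 100], [1, 2, None, 4]):
    print(adjusted_skew(data), pd.Series(data, dtype="float64").skew())
```

Skewness depends only on the set of values, so reordering a Series leaves the result unchanged.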

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.skew())
```

**Output**:

```
0.0
```

Explanation:

- The distribution `[1, 2, 3]` is symmetric, so skewness is `0`.

---

#### **2. Positive Skew**:

```python
s = pd.Series([1, 2, 3, 4, 100])
print(round(s.skew(), 3))
```

**Output**:

```
2.232
```

Explanation:

- The distribution has a longer tail on the right, indicating positive skewness.

---

#### **3. Negative Skew**:

```python
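s = pd.Series([1, 97, 98, 99, 100])
print(round(s.skew(), 3))
```

**Output**:

```
-2.232
```

Explanation:

- Most values sit near the top and the single low value `1` stretches the tail to the left, so skewness is negative. Reordering the values would not change the result.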

---

#### **4. Handling Missing Values**:

```python
s = pd.Series([1, 2, None, 4])
print(round(s.skew(skipna=True), 3))
```

**Output**:

```
0.935
```

Explanation:

- Non-`NaN` values: `[1, 2, 4]`
- The gap between `2` and `4` is larger than the gap between `1` and `2`, so the distribution leans right and skewness is positive.

---

#### **5. DataFrame Example**:

```python
df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': [1, 3, 5]},
                  index=['tiger', 'zebra', 'cow'])
print(df.skew())
```

**Output**:

```
a   0.0
b   0.0
c   0.0
dtype: float64
```

Explanation:

- All columns have symmetric distributions, so skewness is `0`.

---

#### **6. Row-wise Skewness**:

```python
print(df.skew(axis=1))
```

**Output**:

```
tiger   1.732051
zebra  -1.732051
cow     0.000000
dtype: float64
```

Explanation:

- Skewness is computed row-wise for each row in the DataFrame.

---

#### **7. Handling Non-Numeric Columns**:

```python
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['T', 'Z', 'X']},
                  index=['tiger', 'zebra', 'cow'])
print(df.skew(numeric_only=True))
```

**Output**:

```
a   0.0
dtype: float64
```

Explanation:

- Only numeric column `a` is included in the calculation.

---

### **See Also**:

- **`Series.kurt`**: Computes the kurtosis of the Series.
- **`Series.mean`**: Computes the mean of the Series.
- **`Series.std`**: Computes the standard deviation of the Series.

---

### **Summary**:

- Use `Series.skew` to compute the skewness of a Series.
- Skewness measures the asymmetry of the data distribution.
- Handle missing values with the `skipna` parameter.
- For DataFrames, use `numeric_only=True` to exclude non-numeric columns.


In [None]:
""" pandas.Series.std
Series.std(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)
Return sample standard deviation over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis : {index (0)}
For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.std with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddof : int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

Returns:
scalar or Series (if level specified)
Notes

To have the same behaviour as numpy.std, use ddof=0 (instead of the default ddof=1) """

df = pd.DataFrame({'person_id': [0, 1, 2, 3],
                   'age': [21, 25, 62, 43],
                   'height': [1.61, 1.87, 1.49, 2.01]}
                  ).set_index('person_id')
df
df.std()

In [None]:
# Alternatively, ddof=0 can be set to normalize by N instead of N-1:

df.std(ddof=0)

The `pandas.Series.std` method calculates the **standard deviation** of the values in a Series. The standard deviation measures the amount of variation or dispersion in the data. By default, it uses \(N-1\) as the denominator (Bessel's correction), which makes the underlying variance estimate unbiased when the data are a sample from a larger population.

---

### **Parameters**:

1. **`axis`** (int or str, default: `0`):

   - Axis to compute the standard deviation along. For Series, this parameter is unused and defaults to `0`.

2. **`skipna`** (bool, default: `True`):

   - If `True`, excludes `NaN` (missing) values when computing the standard deviation.
   - If `False`, the result will be `NaN` if any value in the Series is `NaN`.

3. **`ddof`** (int, default: `1`):

   - Delta Degrees of Freedom. The divisor used in the calculation is \(N - \text{ddof}\), where \(N\) is the number of non-`NaN` values.
   - Default is `1`, which corresponds to the sample standard deviation.
   - Set `ddof=0` to compute the population standard deviation.

4. **`numeric_only`** (bool, default: `False`):

   - If `True`, includes only numeric columns (float, int, boolean). Not implemented for Series.

5. **`**kwargs`**:
   - Additional keyword arguments to be passed to the function.

---

### **Returns**:

- **Scalar**:
  - The standard deviation of the Series.

---

### **Key Notes**:

- **Formula**:

  \[
  \text{Standard Deviation} = \sqrt{\frac{1}{N - \text{ddof}} \sum_{i=1}^N (x_i - \bar{x})^2}
  \]

  - \(x_i\): Individual data points.
  - \(\bar{x}\): Mean of the data.
  - \(N\): Number of non-`NaN` values.
  - \(\text{ddof}\): Adjusts the degrees of freedom.

- **Handling Missing Values**:

  - If `skipna=True`, `NaN` values are ignored.
  - If `skipna=False`, the presence of any `NaN` value will result in `NaN` for the standard deviation.

- **Degrees of Freedom (`ddof`)**:
  - The default `ddof=1` gives the sample standard deviation, that is, the square root of the unbiased sample variance.
  - Set `ddof=0` to use the population formula (no adjustment).
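
To make the roles of `skipna` and `ddof` concrete, here is a minimal sketch that rebuilds both divisors by hand and compares them with pandas and NumPy. The intermediate names (`valid`, `ss`, `sample_std`, `population_std`) are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])

valid = s.dropna()                         # what skipna=True works with
n = len(valid)
ss = ((valid - valid.mean()) ** 2).sum()   # sum of squared deviations

sample_std = np.sqrt(ss / (n - 1))         # ddof=1, the pandas default
population_std = np.sqrt(ss / n)           # ddof=0, the NumPy default

print(sample_std, s.std())
print(population_std, s.std(ddof=0), np.std(valid.to_numpy()))
print(s.std(skipna=False))                 # NaN propagates
```

Note that `numpy.std` defaults to `ddof=0`, so the two libraries disagree unless `ddof` is set explicitly on one side.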

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.std())
```

**Output**:

```
1.581139
```

---

#### **2. Using `ddof=0`**:

```python
print(s.std(ddof=0))
```

**Output**:

```
1.414214
```

---

#### **3. Handling Missing Values**:

```python
s = pd.Series([1, 2, None, 4, 5])
print(s.std(skipna=True))
```

**Output**:

```
1.825742
```

---

#### **4. DataFrame Example**:

```python
df = pd.DataFrame({
    'age': [21, 25, 62, 43],
    'height': [1.61, 1.87, 1.49, 2.01]
})
print(df.std())
```

**Output**:

```
age       18.786076
height     0.237417
dtype: float64
```

Explanation:

- Standard deviation is computed for each column.

---

#### **5. Population Standard Deviation**:

```python
print(df.std(ddof=0))
```

**Output**:

```
age       16.269219
height     0.205609
dtype: float64
```

Explanation:

- Population standard deviation is computed for each column.

---

### **See Also**:

- **`Series.var`**: Computes the variance of the Series.
- **`Series.mean`**: Computes the mean of the Series.
- **`Series.sem`**: Computes the standard error of the mean.

---

### **Summary**:

- Use `Series.std` to compute the standard deviation of a Series.
- Adjust the degrees of freedom using the `ddof` parameter.
- Handle missing values with the `skipna` parameter.
- For DataFrames, the method computes the standard deviation for each column by default.


In [None]:
""" pandas.Series.sum
Series.sum(axis=None, skipna=True, numeric_only=False, min_count=0, **kwargs)
Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.sum with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

min_count : int, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar

Series.sum
Return the sum.

Series.min
Return the minimum.

Series.max
Return the maximum.

Series.idxmin
Return the index of the minimum.

Series.idxmax
Return the index of the maximum.

DataFrame.sum
Return the sum over the requested axis.

DataFrame.min
Return the minimum over the requested axis.

DataFrame.max
Return the maximum over the requested axis.

DataFrame.idxmin
Return the index of the minimum over the requested axis.

DataFrame.idxmax
Return the index of the maximum over the requested axis. """
idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
s
s.sum()
# By default, the sum of an empty or all-NA Series is 0.

pd.Series([], dtype="float64").sum()  # min_count=0 is the default
# This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

pd.Series([], dtype="float64").sum(min_count=1)
# Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

pd.Series([np.nan]).sum()
pd.Series([np.nan]).sum(min_count=1)

The `pandas.Series.sum` method calculates the **sum** of the values in a Series. It is a versatile method that allows you to handle missing values (`NaN`) and control the behavior when there are insufficient non-missing values using the `min_count` parameter.

---

### **Parameters**:

1. **`axis`** (int or str, default: `0`):

   - Axis to compute the sum along. For Series, this parameter is unused and defaults to `0`.

2. **`skipna`** (bool, default: `True`):

   - If `True`, excludes `NaN` (missing) values when computing the sum.
   - If `False`, the result will be `NaN` if any value in the Series is `NaN`.

3. **`numeric_only`** (bool, default: `False`):

   - If `True`, includes only numeric columns (float, int, boolean). Not implemented for Series.

4. **`min_count`** (int, default: `0`):

   - The minimum number of valid (non-`NaN`) values required to perform the operation.
   - If fewer than `min_count` non-`NaN` values are present, the result will be `NaN`.

5. **`**kwargs`**:
   - Additional keyword arguments to be passed to the function.

---

### **Returns**:

- **Scalar**:
  - The sum of the values in the Series. If the Series is empty or contains only `NaN` values, the result depends on the `min_count` parameter.

---

### **Key Notes**:

- **Default Behavior**:

  - By default, the sum of an empty or all-`NaN` Series is `0.0`.
  - This behavior can be controlled using the `min_count` parameter.

- **Handling Missing Values**:

  - If `skipna=True`, `NaN` values are ignored.
  - If `skipna=False`, the presence of any `NaN` value will result in `NaN` for the sum.

- **`min_count` Parameter**:
  - If `min_count` is set, the operation requires at least that many non-`NaN` values to return a valid result. Otherwise, the result is `NaN`.
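
The interaction between `skipna` and `min_count` can be sketched on a single Series with one missing value. The variable names below are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

default_sum = s.sum()               # NaN is skipped
strict_sum = s.sum(skipna=False)    # NaN propagates into the result
enough = s.sum(min_count=2)         # two valid values are available
too_few = s.sum(min_count=3)        # only two valid values, three required

print(default_sum, strict_sum, enough, too_few)
```

`min_count` counts only the valid (non-`NaN`) values, which is why the last call returns `NaN` even though the Series has three elements.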

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.sum())
```

**Output**:

```
15
```

Explanation: \(1 + 2 + 3 + 4 + 5 = 15\).

---

#### **2. Handling Missing Values**:

```python
s = pd.Series([1, 2, None, 4, 5])
print(s.sum(skipna=True))
```

**Output**:

```
12.0
```

Explanation: \(1 + 2 + 4 + 5 = 12\). The `NaN` value is ignored, and the result is a float because the missing value makes the Series `float64`.

---

#### **3. Using `min_count`**:

```python
s = pd.Series([1, 2, None, 4, 5])
print(s.sum(min_count=5))
```

**Output**:

```
nan
```

Explanation: There are only 4 non-`NaN` values, which is fewer than `min_count=5`. Hence, the result is `NaN`.

---

#### **4. Empty Series**:

```python
s = pd.Series([], dtype="float64")
print(s.sum())
```

**Output**:

```
0.0
```

Explanation: By default, the sum of an empty Series is `0.0`.

---

#### **5. All-`NaN` Series**:

```python
import numpy as np

s = pd.Series([np.nan, np.nan])
print(s.sum())
```

**Output**:

```
0.0
```

Explanation: By default, the sum of an all-`NaN` Series is `0.0`.

---

#### **6. Controlling Empty/All-`NaN` Behavior with `min_count`**:

```python
s = pd.Series([], dtype="float64")
print(s.sum(min_count=1))
```

**Output**:

```
nan
```

Explanation: Since `min_count=1` and there are no valid values, the result is `NaN`.

---

#### **7. MultiIndex Series**:

```python
idx = pd.MultiIndex.from_arrays([
    ['warm', 'warm', 'cold', 'cold'],
    ['dog', 'falcon', 'fish', 'spider']],
    names=['blooded', 'animal'])
s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
print(s.sum())
```

**Output**:

```
14
```

Explanation: \(4 + 2 + 0 + 8 = 14\).

---

### **See Also**:

- **`Series.min`**: Returns the minimum value.
- **`Series.max`**: Returns the maximum value.
- **`Series.mean`**: Returns the mean of the values.
- **`DataFrame.sum`**: Similar functionality for DataFrames.

---

### **Summary**:

- Use `Series.sum` to compute the sum of values in a Series.
- Control the handling of missing values with `skipna` and `min_count`.
- Be cautious with empty or all-`NaN` Series, as the default behavior returns `0.0` unless `min_count` is specified.


In [None]:
""" pandas.Series.var
Series.var(axis=None, skipna=True, ddof=1, numeric_only=False, **kwargs)
Return unbiased variance over requested axis.

Normalized by N-1 by default. This can be changed using the ddof argument.

Parameters:
axis: {index (0)}
For Series this parameter is unused and defaults to 0.

Warning

The behavior of DataFrame.var with axis=None is deprecated; in a future version this will reduce over both axes and return a scalar. To retain the old behavior, pass axis=0 (or do not pass axis).

skipna : bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

ddof: int, default 1
Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

Returns:
scalar or Series (if level specified) """

df = pd.DataFrame({'person_id': [0, 1, 2, 3],
                   'age': [21, 25, 62, 43],
                   'height': [1.61, 1.87, 1.49, 2.01]}
                  ).set_index('person_id')
df 
df.var()
# Alternatively, ddof=0 can be set to normalize by N instead of N-1:

df.var(ddof=0)

The `pandas.Series.var` method calculates the **variance** of the values in a Series. Variance measures the spread or dispersion of the data points around the mean. By default, it uses \(N-1\) as the denominator (Bessel's correction) to provide an unbiased estimate of the population variance when working with a sample.

---

### **Parameters**:

1. **`axis`** (int or str, default: `0`):

   - Axis to compute the variance along. For Series, this parameter is unused and defaults to `0`.

2. **`skipna`** (bool, default: `True`):

   - If `True`, excludes `NaN` (missing) values when computing the variance.
   - If `False`, the result will be `NaN` if any value in the Series is `NaN`.

3. **`ddof`** (int, default: `1`):

   - Delta Degrees of Freedom. The divisor used in the calculation is \(N - \text{ddof}\), where \(N\) is the number of non-`NaN` values.
   - Default is `1`, which corresponds to the sample variance.
   - Set `ddof=0` to compute the population variance.

4. **`numeric_only`** (bool, default: `False`):

   - If `True`, includes only numeric columns (float, int, boolean). Not implemented for Series.

5. **`**kwargs`**:
   - Additional keyword arguments to be passed to the function.

---

### **Returns**:

- **Scalar**:
  - The variance of the Series.

---

### **Key Notes**:

- **Formula**:
  \[
  \text{Variance} = \frac{1}{N - \text{ddof}} \sum_{i=1}^N (x_i - \bar{x})^2
  \]

  - \(x_i\): Individual data points.
  - \(\bar{x}\): Mean of the data.
  - \(N\): Number of non-`NaN` values.
  - \(\text{ddof}\): Adjusts the degrees of freedom.

- **Handling Missing Values**:

  - If `skipna=True`, `NaN` values are ignored.
  - If `skipna=False`, the presence of any `NaN` value will result in `NaN` for the variance.

- **Degrees of Freedom (`ddof`)**:
  - The default `ddof=1` provides an unbiased estimate of the variance for a sample.
  - Set `ddof=0` to use the population formula (no adjustment).
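
As a small illustration of the `ddof` note, the sketch below shows that the sample and population variances differ only by the factor \(N/(N-1)\), and that the standard deviation is the square root of the variance computed with the same `ddof`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
n = s.count()

sample_var = s.var()              # divides by N - 1
population_var = s.var(ddof=0)    # divides by N

# Rescaling the population variance recovers the sample variance.
rescaled = population_var * n / (n - 1)

# The standard deviation is the square root of the variance (same ddof).
std_from_var = np.sqrt(sample_var)

print(sample_var, population_var, rescaled, std_from_var)
```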

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s.var())
```

**Output**:

```
2.5
```

Explanation:

- Mean: \(\frac{1 + 2 + 3 + 4 + 5}{5} = 3\)
- Variance: \(\frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5 - 1} = 2.5\)

---

#### **2. Using `ddof=0`**:

```python
print(s.var(ddof=0))
```

**Output**:

```
2.0
```

Explanation:

- Variance: \(\frac{(1-3)^2 + (2-3)^2 + (3-3)^2 + (4-3)^2 + (5-3)^2}{5} = 2.0\)

---

#### **3. Handling Missing Values**:

```python
s = pd.Series([1, 2, None, 4, 5])
print(s.var(skipna=True))
```

**Output**:

```
3.333333
```

Explanation:

- Non-`NaN` values: `[1, 2, 4, 5]`
- Mean: \(\frac{1 + 2 + 4 + 5}{4} = 3\)
- Variance: \(\frac{(1-3)^2 + (2-3)^2 + (4-3)^2 + (5-3)^2}{4 - 1} = \frac{10}{3} \approx 3.333\)

---

#### **4. DataFrame Example**:

```python
df = pd.DataFrame({
    'age': [21, 25, 62, 43],
    'height': [1.61, 1.87, 1.49, 2.01]
})
print(df.var())
```

**Output**:

```
age       352.916667
height      0.056367
dtype: float64
```

Explanation:

- Variance is computed for each column.

---

#### **5. Population Variance**:

```python
print(df.var(ddof=0))
```

**Output**:

```
age       264.687500
height      0.042275
dtype: float64
```

Explanation:

- Population variance is computed for each column.

---

### **See Also**:

- **`Series.std`**: Computes the standard deviation of the Series.
- **`Series.mean`**: Computes the mean of the Series.
- **`Series.sem`**: Computes the standard error of the mean.

---

### **Summary**:

- Use `Series.var` to compute the variance of a Series.
- Adjust the degrees of freedom using the `ddof` parameter.
- Handle missing values with the `skipna` parameter.
- For DataFrames, the method computes the variance for each column by default.


In [None]:
""" pandas.Series.kurtosis
Series.kurtosis(axis=0, skipna=True, numeric_only=False, **kwargs)
Return unbiased kurtosis over requested axis.

Kurtosis obtained using Fisher’s definition of kurtosis (kurtosis of normal == 0.0). Normalized by N-1.

Parameters:
axis : {index (0)}
Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

Added in version 2.0.0.

skipna : bool, default True
Exclude NA/null values when computing the result.

numeric_only : bool, default False
Include only float, int, boolean columns. Not implemented for Series.

**kwargs
Additional keyword arguments to be passed to the function.

Returns:
scalar """
s = pd.Series([1, 2, 2, 3], index=['cat', 'dog', 'dog', 'mouse'])
s
s.kurt()
# With a DataFrame

df = pd.DataFrame({'a': [1, 2, 2, 3], 'b': [3, 4, 4, 4]},
                  index=['cat', 'dog', 'dog', 'mouse'])
df
df.kurt()
#  With axis=None

df.kurt(axis=None).round(6)
# Using axis=1

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [3, 4], 'd': [1, 2]},
                  index=['cat', 'dog'])
df.kurt(axis=1)

The `pandas.Series.kurtosis` method calculates the **kurtosis** of the values in a Series. Kurtosis measures the "tailedness" of the distribution of the data. It indicates whether the data has heavy tails (more outliers) or light tails (fewer outliers) compared to a normal distribution. By default, it uses Fisher's definition of kurtosis, where the kurtosis of a normal distribution is `0.0`.

---

### **Parameters**:

1. **`axis`** (int or str, default: `0`):

   - Axis to compute kurtosis along. For Series, this parameter is unused and defaults to `0`.

2. **`skipna`** (bool, default: `True`):

   - If `True`, excludes `NaN` (missing) values when computing kurtosis.
   - If `False`, the result will be `NaN` if any value in the Series is `NaN`.

3. **`numeric_only`** (bool, default: `False`):

   - If `True`, includes only numeric columns (float, int, boolean). Not implemented for Series.

4. **`**kwargs`**:
   - Additional keyword arguments to be passed to the function.

---

### **Returns**:

- **Scalar**:
  - The kurtosis of the Series.

---

### **Key Notes**:

- **Kurtosis Formula**:
  \[
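  \text{Kurtosis} = \frac{\frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^4}{\left(\frac{1}{N} \sum_{i=1}^N (x_i - \bar{x})^2\right)^2} - 3
  \]

  - \(x_i\): Individual data points.
  - \(\bar{x}\): Mean of the data.
  - \(N\): Number of non-`NaN` values.
  - The `-3` adjustment ensures that the kurtosis of a normal distribution is `0.0`.
  - pandas reports a bias-corrected sample estimate of this quantity, so small samples can give values that differ noticeably from the formula above; with fewer than four non-`NaN` values the result is `NaN`.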

- **Interpretation**:

  - **Kurtosis = 0**: The distribution has the same tailedness as a normal distribution (mesokurtic).
  - **Kurtosis > 0**: The distribution has heavier tails than a normal distribution (leptokurtic).
  - **Kurtosis < 0**: The distribution has lighter tails than a normal distribution (platykurtic).

- **Handling Missing Values**:
  - If `skipna=True`, `NaN` values are ignored.
  - If `skipna=False`, the presence of any `NaN` value will result in `NaN` for kurtosis.
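
The gap between the moment-based formula and what `Series.kurt` returns can be checked with a short sketch. It computes the population excess kurtosis by hand and then applies the usual small-sample correction \(\frac{N-1}{(N-2)(N-3)}\left((N+1)\,g_2 + 6\right)\). The names `g2`, `G2`, and `too_short` are illustrative and not part of the pandas API.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, 3], dtype="float64")

n = s.count()                  # number of non-NaN values
dev = s - s.mean()             # deviations from the mean
m2 = (dev ** 2).mean()         # second central moment (divides by N)
m4 = (dev ** 4).mean()         # fourth central moment (divides by N)

g2 = m4 / m2 ** 2 - 3                                     # population excess kurtosis
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)   # small-sample corrected

print(g2, G2, s.kurt())

# With fewer than four valid values, pandas returns NaN.
too_short = pd.Series([1.0, 2.0, 3.0]).kurt()
print(too_short)
```

For this four-point sample the uncorrected value is negative while the corrected one is positive, which is a reminder that kurtosis estimates from very small samples should be read with caution.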

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3])
print(s.kurtosis())
```

**Output**:

```
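1.5
```

Explanation:

- pandas applies a small-sample bias correction, so the result is positive here even though the uncorrected formula gives `-1.0` for these four values. Estimates from so few points are very noisy.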

---

#### **2. Handling Missing Values**:

```python
s = pd.Series([1, 2, None, 3])
print(s.kurtosis(skipna=True))
```

**Output**:

```
nan
```

Explanation:

- Non-`NaN` values: `[1, 2, 3]`
- Only three values remain after dropping `NaN`, and pandas needs at least four to compute kurtosis, so the result is `NaN`.

---

#### **3. DataFrame Example**:

```python
df = pd.DataFrame({
    'a': [1, 2, 2, 3],
    'b': [3, 4, 4, 4]
})
print(df.kurtosis())
```

**Output**:

```
a    1.5
b    4.0
dtype: float64
```

Explanation:

- Kurtosis is computed separately for each column.
- Both estimates are positive; column `b` has the larger value because a single `3` stands apart from three identical `4`s.

---

#### **4. Row-wise Kurtosis**:

```python
df = pd.DataFrame({
    'a': [1, 2],
    'b': [3, 4],
    'c': [3, 4],
    'd': [1, 2]
})
print(df.kurtosis(axis=1))
```

**Output**:

```
0   -6.0
1   -6.0
dtype: float64
```

Explanation:

- Kurtosis is computed row-wise for each row in the DataFrame.

---

#### **5. Handling Non-Numeric Columns**:

```python
df = pd.DataFrame({
    'a': [1, 2, 2, 3],
    'b': ['x', 'y', 'y', 'z']
})
print(df.kurtosis(numeric_only=True))
```

**Output**:

```
a    1.5
dtype: float64
```

Explanation:

- Only numeric column `a` is included in the calculation.

---

### **See Also**:

- **`Series.skew`**: Computes the skewness of the Series.
- **`Series.mean`**: Computes the mean of the Series.
- **`Series.std`**: Computes the standard deviation of the Series.

---

### **Summary**:

- Use `Series.kurtosis` to compute the kurtosis of a Series.
- Kurtosis measures the tailedness of the data distribution.
- Handle missing values with the `skipna` parameter.
- For DataFrames, the method computes kurtosis for each column by default.


In [None]:
""" 

pandas.Series.unique

Series.unique()
Return unique values of Series object.

Uniques are returned in order of appearance. Hash table-based unique, therefore does NOT sort.

Returns:
ndarray or ExtensionArray
The unique values returned as a NumPy array. See Notes.


Series.drop_duplicates
Return Series with duplicate values removed.

unique
Top-level unique method for any 1-d array-like object.

Index.unique
Return Index with unique values from an Index object.

Notes

Returns the unique values as a NumPy array. In case of an extension-array backed Series, a new ExtensionArray of that type with just the unique values is returned. This includes

Categorical

Period

Datetime with Timezone

Datetime without Timezone

Timedelta

Interval

Sparse

IntegerNA """
pd.Series([2, 1, 3, 3], name='A').unique()
pd.Series([pd.Timestamp('2016-01-01') for _ in range(3)]).unique()
pd.Series([pd.Timestamp('2016-01-01', tz='US/Eastern')
           for _ in range(3)]).unique()
pd.Series(pd.Categorical(list('baabc'))).unique()
pd.Series(pd.Categorical(list('baabc'), categories=list('abc'),
                         ordered=True)).unique()

The `pandas.Series.unique` method returns the **unique values** in a Series. The unique values are returned in the order of their first occurrence, and the method does not sort the values. It is useful for identifying distinct elements in a Series.

---

### **Returns**:

- **ndarray or ExtensionArray**:
  - The unique values are returned as a NumPy array (for standard data types) or as an ExtensionArray (for specialized data types like `Categorical`, `Datetime`, etc.).

---

### **Key Notes**:

- **Order of Appearance**:

  - Unique values are returned in the order they first appear in the Series.
  - The method does **not** sort the values.

- **Hash Table-Based**:

  - The method uses a hash table to identify unique values, making it efficient for large datasets.

- **Supported Data Types**:
  - Works with standard data types (e.g., integers, floats, strings) as well as specialized types like `Categorical`, `Datetime`, `Timedelta`, etc.
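
A short comparison may help place `Series.unique` next to its neighbours. The sketch below contrasts it with `numpy.unique`, which sorts, and with `Series.drop_duplicates`, which returns a Series rather than an array. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([2, 1, 3, 3], name="A")

in_order = s.unique()                    # ndarray, order of first appearance
sorted_vals = np.unique(s.to_numpy())    # NumPy's unique sorts its result
deduped = s.drop_duplicates()            # Series; keeps index labels and name

print(in_order)
print(sorted_vals)
print(deduped)
```

Use `drop_duplicates` when the original index labels matter, and `unique` when only the distinct values are needed.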

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([2, 1, 3, 3], name='A')
print(s.unique())
```

**Output**:

```
[2 1 3]
```

Explanation:

- The unique values are `[2, 1, 3]`, returned in the order of their first occurrence.

---

#### **2. Handling Duplicates**:

```python
s = pd.Series(['apple', 'banana', 'apple', 'orange'])
print(s.unique())
```

**Output**:

```
['apple' 'banana' 'orange']
```

Explanation:

- The unique values are `['apple', 'banana', 'orange']`, returned in the order of their first occurrence.

---

#### **3. Datetime Series**:

```python
s = pd.Series([pd.Timestamp('2016-01-01') for _ in range(3)])
print(s.unique())
```

**Output**:

```
<DatetimeArray>
['2016-01-01 00:00:00']
Length: 1, dtype: datetime64[ns]
```

Explanation:

- The unique value is `2016-01-01`, returned as a `DatetimeArray`.

---

#### **4. Categorical Series**:

```python
s = pd.Series(pd.Categorical(list('baabc')))
print(s.unique())
```

**Output**:

```
['b', 'a', 'c']
Categories (3, object): ['a', 'b', 'c']
```

Explanation:

- The unique values are `['b', 'a', 'c']`, returned in the order of their first occurrence. The categories are preserved.

---

#### **5. Ordered Categorical Series**:

```python
s = pd.Series(pd.Categorical(list('baabc'), categories=list('abc'), ordered=True))
print(s.unique())
```

**Output**:

```
['b', 'a', 'c']
Categories (3, object): ['a' < 'b' < 'c']
```

Explanation:

- The unique values are `['b', 'a', 'c']`, returned in the order of their first occurrence. The categories are ordered as `['a' < 'b' < 'c']`.

---

#### **6. Timezone-Aware Datetime Series**:

```python
s = pd.Series([pd.Timestamp('2016-01-01', tz='US/Eastern') for _ in range(3)])
print(s.unique())
```

**Output**:

```
<DatetimeArray>
['2016-01-01 00:00:00-05:00']
Length: 1, dtype: datetime64[ns, US/Eastern]
```

Explanation:

- The unique value is `2016-01-01 00:00:00-05:00`, returned as a `DatetimeArray` with timezone information.

---

### **See Also**:

- **`Series.drop_duplicates`**: Returns a Series with duplicate values removed.
- **`pandas.unique`**: A top-level function to find unique values in any 1-D array-like object.
- **`Index.unique`**: Returns unique values from an Index object.

---

### **Summary**:

- Use `Series.unique` to extract unique values from a Series.
- Unique values are returned in the order of their first occurrence.
- Works with standard and specialized data types (e.g., `Categorical`, `Datetime`).
- Does **not** sort the values.


In [None]:
""" pandas.Series.nunique
Series.nunique(dropna=True)
Return number of unique elements in the object.

Excludes NA values by default.

Parameters:
dropna : bool, default True
Don’t include NaN in the count.

Returns:
int


DataFrame.nunique
Method nunique for DataFrame.

Series.count
Count non-NA/null observations in the Series. """
s = pd.Series([1, 3, 5, 7, 7])
s
s.nunique()

The `pandas.Series.nunique` method calculates the **number of unique values** in a Series. It is useful for quickly determining the count of distinct elements in the data. By default, it excludes `NaN` (missing) values from the count.

---

### **Parameters**:

1. **`dropna`** (bool, default: `True`):
   - If `True`, excludes `NaN` values from the count of unique values.
   - If `False`, includes `NaN` as a unique value in the count.

---

### **Returns**:

- **int**:
  - The number of unique values in the Series.

---

### **Key Notes**:

- **Handling Missing Values**:

  - If `dropna=True`, `NaN` values are ignored.
  - If `dropna=False`, `NaN` is counted as a unique value.

- **Efficiency**:
  - The method is efficient and uses a hash table to count unique values.
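
To show how `nunique` relates to `unique` and `count`, here is a minimal sketch on a Series with one duplicate and one missing value. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, 7, 7, np.nan])

without_nan = s.nunique()                 # distinct values, NaN excluded
with_nan = s.nunique(dropna=False)        # NaN counted as one extra value
n_unique_array = len(s.unique())          # unique() keeps NaN in its result
n_observations = s.count()                # non-NaN observations, duplicates included

print(without_nan, with_nan, n_unique_array, n_observations)
```

`count` and `nunique` answer different questions: the first counts non-missing rows, the second counts distinct values.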

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7, 7])
print(s.nunique())
```

**Output**:

```
4
```

Explanation:

- The unique values are `[1, 3, 5, 7]`, so the count is `4`.

---

#### **2. Including `NaN` Values**:

```python
s = pd.Series([1, 3, 5, 7, 7, None])
print(s.nunique(dropna=False))
```

**Output**:

```
5
```

Explanation:

- The unique values are `[1, 3, 5, 7, NaN]`, so the count is `5`.

---

#### **3. Excluding `NaN` Values**:

```python
s = pd.Series([1, 3, 5, 7, 7, None])
print(s.nunique(dropna=True))
```

**Output**:

```
4
```

Explanation:

- The unique values are `[1, 3, 5, 7]`, so the count is `4`. The `NaN` value is excluded.

---

#### **4. String Series**:

```python
s = pd.Series(['apple', 'banana', 'apple', 'orange'])
print(s.nunique())
```

**Output**:

```
3
```

Explanation:

- The unique values are `['apple', 'banana', 'orange']`, so the count is `3`.

---

#### **5. Categorical Series**:

```python
s = pd.Series(pd.Categorical(['a', 'b', 'a', 'c']))
print(s.nunique())
```

**Output**:

```
3
```

Explanation:

- The unique values are `['a', 'b', 'c']`, so the count is `3`.

---

### **See Also**:

- **`DataFrame.nunique`**: Computes the number of unique values for each column in a DataFrame.
- **`Series.count`**: Counts the number of non-NA/null observations in the Series.
- **`Series.unique`**: Returns the unique values in the Series.

---

### **Summary**:

- Use `Series.nunique` to count the number of unique values in a Series.
- Control the inclusion of `NaN` values with the `dropna` parameter.
- Efficiently computes the count using a hash table.


In [None]:
""" pandas.Series.is_unique
property Series.is_unique
Return boolean if values in the object are unique.

Returns:
bool """
s = pd.Series([1, 2, 3])
s.is_unique

s = pd.Series([1, 2, 3, 1])
s.is_unique

The `pandas.Series.is_unique` property checks whether all values in a Series are **unique**. It returns a boolean value (`True` if all values are unique, `False` otherwise). This property is useful for quickly verifying the uniqueness of values in a Series.

---

### **Returns**:

- **bool**:
  - `True`: If all values in the Series are unique.
  - `False`: If there are duplicate values in the Series.

---

### **Key Notes**:

- **Efficiency**:
  - The property uses a hash table to check for uniqueness, making it efficient for large datasets.
- **Handling Missing Values**:
  - `NaN` is treated like any other value: a single `NaN` does not break uniqueness, but two or more `NaN` values count as duplicates and make the property return `False`.
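
When `is_unique` is `False`, the usual next step is to find the offending values. The sketch below uses `Series.duplicated(keep=False)` for that, and also checks the single-`NaN` case described above. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, np.nan, 2.0, np.nan])

unique_flag = s.is_unique                 # False: 2.0 and NaN both repeat
dupes = s[s.duplicated(keep=False)]       # every row whose value occurs more than once

single_nan_flag = pd.Series([1.0, np.nan]).is_unique   # one NaN is still unique

print(unique_flag, single_nan_flag)
print(dupes)
```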

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.is_unique)
```

**Output**:

```
True
```

Explanation:

- All values in the Series are unique, so the result is `True`.

---

#### **2. Series with Duplicates**:

```python
s = pd.Series([1, 2, 3, 1])
print(s.is_unique)
```

**Output**:

```
False
```

Explanation:

- The value `1` appears twice, so the result is `False`.

---

#### **3. Series with `NaN` Values**:

```python
s = pd.Series([1, 2, 3, None])
print(s.is_unique)
```

**Output**:

```
True
```

Explanation:

- All values, including `NaN`, are unique, so the result is `True`.

---

#### **4. Series with Duplicate `NaN` Values**:

```python
s = pd.Series([1, 2, 3, None, None])
print(s.is_unique)
```

**Output**:

```
False
```

Explanation:

- The `NaN` value appears twice, so the result is `False`.

---

#### **5. String Series**:

```python
s = pd.Series(['apple', 'banana', 'orange'])
print(s.is_unique)
```

**Output**:

```
True
```

Explanation:

- All values in the Series are unique, so the result is `True`.

---

#### **6. Categorical Series**:

```python
s = pd.Series(pd.Categorical(['a', 'b', 'c']))
print(s.is_unique)
```

**Output**:

```
True
```

Explanation:

- All values in the Series are unique, so the result is `True`.

---

### **See Also**:

- **`Series.unique`**: Returns the unique values in the Series.
- **`Series.nunique`**: Counts the number of unique values in the Series.
- **`Series.drop_duplicates`**: Returns a Series with duplicate values removed.

---

### **Summary**:

- Use `Series.is_unique` to check if all values in a Series are unique.
- Returns `True` if all values are unique, otherwise `False`.
- Efficiently checks uniqueness using a hash table.
- A single `NaN` counts as one distinct value; repeated `NaN` values count as duplicates.


In [None]:
""" pandas.Series.is_monotonic_increasing


property Series.is_monotonic_increasing
Return boolean if values in the object are monotonically increasing.

Returns
:
bool """
s = pd.Series([1, 2, 2])
s.is_monotonic_increasing

s = pd.Series([3, 2, 1])
s.is_monotonic_increasing

The `pandas.Series.is_monotonic_increasing` property checks whether the values in a Series are **monotonically increasing**. A Series is monotonically increasing if each value is greater than or equal to the previous value. This property is useful for verifying trends or ordered data.

---

### **Returns**:

- **bool**:
  - `True`: If the values in the Series are monotonically increasing.
  - `False`: If the values are not monotonically increasing.

---

### **Key Notes**:

- **Definition**:

  - A Series is monotonically increasing if for all \(i \leq j\), \(s_i \leq s_j\).
  - Equal values are allowed (e.g., `[1, 2, 2]` is considered monotonically increasing).

- **Handling Missing Values**:

  - If the Series contains `NaN` values, the property returns `False`, because every ordering comparison involving `NaN` evaluates to `False`.

- **Efficiency**:
  - The property is efficient and checks the condition in a single pass through the Series.
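
Because equal neighbours are allowed, the property alone does not tell you whether the values are strictly increasing. One common idiom, shown here as a sketch, combines it with `is_unique`; the helper name `is_strictly_increasing` is hypothetical.

```python
import pandas as pd


def is_strictly_increasing(s: pd.Series) -> bool:
    # Non-decreasing order plus no repeated values means every value
    # is greater than the one before it.
    return bool(s.is_monotonic_increasing and s.is_unique)


with_tie = pd.Series([1, 2, 2])
no_tie = pd.Series([1, 2, 3])

print(with_tie.is_monotonic_increasing, is_strictly_increasing(with_tie))
print(no_tie.is_monotonic_increasing, is_strictly_increasing(no_tie))
```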

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
print(s.is_monotonic_increasing)
```

**Output**:

```
True
```

Explanation:

- The values are strictly increasing, so the result is `True`.

---

#### **2. Series with Equal Values**:

```python
s = pd.Series([1, 2, 2])
print(s.is_monotonic_increasing)
```

**Output**:

```
True
```

Explanation:

- The values are non-decreasing (equal values are allowed), so the result is `True`.

---

#### **3. Series That Is Not Monotonic**:

```python
s = pd.Series([3, 2, 1])
print(s.is_monotonic_increasing)
```

**Output**:

```
False
```

Explanation:

- The values are decreasing, so the result is `False`.

---

#### **4. Series with `NaN` Values**:

```python
s = pd.Series([1, 2, None, 3])
print(s.is_monotonic_increasing)
```

**Output**:

```
False
```

Explanation:

- The presence of `NaN` makes the Series non-monotonic, so the result is `False`.

---

#### **5. String Series**:

```python
s = pd.Series(['a', 'b', 'c'])
print(s.is_monotonic_increasing)
```

**Output**:

```
True
```

Explanation:

- The values are in alphabetical order, so the result is `True`.

---

#### **6. Series with Mixed Order**:

```python
s = pd.Series([1, 3, 2])
print(s.is_monotonic_increasing)
```

**Output**:

```
False
```

Explanation:

- The values are not in increasing order, so the result is `False`.

---

### **See Also**:

- **`Series.is_monotonic_decreasing`**: Checks if the values in the Series are monotonically decreasing.
- **`Series.is_monotonic`**: Deprecated alias for `is_monotonic_increasing`, removed in pandas 2.0. It did not check for decreasing order.
- **`Series.sort_values`**: Sorts the values in the Series.

---

### **Summary**:

- Use `Series.is_monotonic_increasing` to check if the values in a Series are monotonically increasing.
- Returns `True` if the values are non-decreasing, otherwise `False`.
- Handles `NaN` values by returning `False`.
- Works with numeric, string, and other comparable data types.


In [None]:
""" pandas.Series.is_monotonic_decreasing

property Series.is_monotonic_decreasing
Return boolean if values in the object are monotonically decreasing.

Returns
:
bool """
s = pd.Series([3, 2, 2, 1])
s.is_monotonic_decreasing
s = pd.Series([1, 2, 3])
s.is_monotonic_decreasing


The `pandas.Series.is_monotonic_decreasing` property checks whether the values in a Series are **monotonically decreasing**. A Series is monotonically decreasing if each value is less than or equal to the previous value. This property is useful for verifying trends or ordered data.

---

### **Returns**:

- **bool**:
  - `True`: If the values in the Series are monotonically decreasing.
  - `False`: If the values are not monotonically decreasing.

---

### **Key Notes**:

- **Definition**:

  - A Series is monotonically decreasing if for all \(i \leq j\), \(s_i \geq s_j\).
  - Equal values are allowed (e.g., `[3, 2, 2]` is considered monotonically decreasing).

- **Handling Missing Values**:

  - If the Series contains `NaN` values, the property returns `False`, because every ordering comparison involving `NaN` evaluates to `False`.

- **Efficiency**:
  - The property is efficient and checks the condition in a single pass through the Series.
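
To make the definition concrete, the sketch below compares each value with its successor using NumPy and checks that the result agrees with the property. The helper `non_increasing_pairwise` is hypothetical and is only meant for numeric Series with at least two elements; it is not how pandas implements the property.

```python
import numpy as np
import pandas as pd


def non_increasing_pairwise(s: pd.Series) -> bool:
    # Hypothetical helper: compare each value with the one that follows it.
    # Any comparison involving NaN is False, so a NaN makes the check fail.
    values = s.to_numpy()
    return bool(np.all(values[:-1] >= values[1:]))


for data in ([3, 2, 2, 1], [3, 1, 2], [3, 2, None, 1]):
    s = pd.Series(data)
    print(data, s.is_monotonic_decreasing, non_increasing_pairwise(s))
```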

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([3, 2, 1])
print(s.is_monotonic_decreasing)
```

**Output**:

```
True
```

Explanation:

- The values are strictly decreasing, so the result is `True`.

---

#### **2. Series with Equal Values**:

```python
s = pd.Series([3, 2, 2, 1])
print(s.is_monotonic_decreasing)
```

**Output**:

```
True
```

Explanation:

- The values are non-increasing (equal values are allowed), so the result is `True`.

---

#### **3. Series That Is Not Monotonic**:

```python
s = pd.Series([1, 2, 3])
print(s.is_monotonic_decreasing)
```

**Output**:

```
False
```

Explanation:

- The values are increasing, so the result is `False`.

---

#### **4. Series with `NaN` Values**:

```python
s = pd.Series([3, 2, None, 1])
print(s.is_monotonic_decreasing)
```

**Output**:

```
False
```

Explanation:

- The presence of `NaN` makes the Series non-monotonic, so the result is `False`.

---

#### **5. String Series**:

```python
s = pd.Series(['c', 'b', 'a'])
print(s.is_monotonic_decreasing)
```

**Output**:

```
True
```

Explanation:

- The values are in reverse alphabetical order, so the result is `True`.

---

#### **6. Series with Mixed Order**:

```python
s = pd.Series([3, 1, 2])
print(s.is_monotonic_decreasing)
```

**Output**:

```
False
```

Explanation:

- The values are not in decreasing order, so the result is `False`.

---

### **See Also**:

- **`Series.is_monotonic_increasing`**: Checks if the values in the Series are monotonically increasing.
- **`Series.is_monotonic`**: Deprecated alias for `is_monotonic_increasing`, removed in pandas 2.0. It did not check for decreasing order.
- **`Series.sort_values`**: Sorts the values in the Series.

---

### **Summary**:

- Use `Series.is_monotonic_decreasing` to check if the values in a Series are monotonically decreasing.
- Returns `True` if the values are non-increasing, otherwise `False`.
- Handles `NaN` values by returning `False`.
- Works with numeric, string, and other comparable data types.


In [None]:
""" 
pandas.Series.value_counts

Series.value_counts(normalize=False, sort=True, ascending=False, bins=None, dropna=True)

Return a Series containing counts of unique values.

The resulting object will be in descending order so that the first element is the most frequently-occurring element. Excludes NA values by default.

Parameters
:
normalize
bool, default False
If True then the object returned will contain the relative frequencies of the unique values.

sort
bool, default True
Sort by frequencies when True. Preserve the order of the data when False.

ascending
bool, default False
Sort in ascending order.

bins
int, optional
Rather than count values, group them into half-open bins, a convenience for pd.cut, only works with numeric data.

dropna
bool, default True
Don’t include counts of NaN.

Returns
:
Series """
index = pd.Index([3, 1, 2, 3, 4, np.nan])
index.value_counts()

In [None]:
# With normalize set to True, returns the relative frequency by dividing all values by the sum of values.

s = pd.Series([3, 1, 2, 3, 4, np.nan])
s.value_counts(normalize=True)

In [None]:
# bins

# Bins can be useful for going from a continuous variable to a categorical variable; instead of counting occurrences of each unique value, divide the values into the specified number of half-open bins.

s.value_counts(bins=3)

In [None]:
# dropna

# With dropna set to False we can also see NaN index values.

s.value_counts(dropna=False)

The `pandas.Series.value_counts` method returns a **Series containing the counts of unique values** in the original Series. It is a powerful tool for summarizing categorical or discrete data. By default, it excludes `NaN` values and sorts the results in descending order of frequency.

---

### **Parameters**:

1. **`normalize`** (bool, default: `False`):

   - If `True`, returns the relative frequencies (proportions) of the unique values instead of counts.
   - If `False`, returns the absolute counts.

2. **`sort`** (bool, default: `True`):

   - If `True`, sorts the result by frequency (descending by default).
   - If `False`, preserves the order of the unique values as they appear in the data.

3. **`ascending`** (bool, default: `False`):

   - If `True`, sorts the result in ascending order of frequency.
   - If `False`, sorts the result in descending order of frequency.

4. **`bins`** (int, optional):

   - If specified, groups numeric data into the specified number of half-open bins (intervals).
   - Only works with numeric data.

5. **`dropna`** (bool, default: `True`):
   - If `True`, excludes `NaN` values from the count.
   - If `False`, includes `NaN` as a unique value in the count.

---

### **Returns**:

- **Series**:
  - A Series with unique values as the index and their counts (or proportions) as the values.

---

### **Key Notes**:

- **Default Behavior**:
  - Counts unique values, excludes `NaN`, and sorts the result in descending order of frequency.
- **Handling Missing Values**:
  - Use `dropna=False` to include `NaN` in the count.
- **Binning**:
  - The `bins` parameter is useful for converting continuous numeric data into categorical intervals.
- **Normalization**:
  - Use `normalize=True` to get relative frequencies (proportions) instead of absolute counts.
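
The interaction between `normalize` and `dropna` is easy to miss: the denominator changes depending on whether `NaN` is counted. A small sketch using the same data as the cells above:

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 2, 3, 4, np.nan])

# Proportions over the five non-missing values (NaN is excluded by default).
without_nan = s.value_counts(normalize=True)

# Proportions over all six entries, with NaN counted as its own category.
with_nan = s.value_counts(normalize=True, dropna=False)

# Share of the value 3.0 under each setting.
print(without_nan.loc[3.0], with_nan.loc[3.0])

# Share taken by NaN when it is included.
print(with_nan[with_nan.index.isna()].iloc[0])
```

In both cases the proportions sum to 1; only the set of values they are spread over differs.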

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series([3, 1, 2, 3, 4, None])
print(s.value_counts())
```

**Output**:

```
3.0    2
1.0    1
2.0    1
4.0    1
Name: count, dtype: int64
```

Explanation:

- The value `3.0` appears twice, and the other values appear once. `NaN` is excluded by default.

---

#### **2. Including `NaN` Values**:

```python
print(s.value_counts(dropna=False))
```

**Output**:

```
3.0    2
1.0    1
2.0    1
4.0    1
NaN    1
Name: count, dtype: int64
```

Explanation:

- `NaN` is included in the count as a unique value.

---

#### **3. Relative Frequencies**:

```python
print(s.value_counts(normalize=True))
```

**Output**:

```
3.0    0.4
1.0    0.2
2.0    0.2
4.0    0.2
Name: proportion, dtype: float64
```

Explanation:

- The relative frequencies are calculated by dividing each count by the total number of non-`NaN` values.

---

#### **4. Sorting in Ascending Order**:

```python
print(s.value_counts(ascending=True))
```

**Output**:

```
1.0    1
2.0    1
4.0    1
3.0    2
Name: count, dtype: int64
```

Explanation:

- The result is sorted in ascending order of frequency.

---

#### **5. Preserving Order of Appearance**:

```python
print(s.value_counts(sort=False))
```

**Output**:

```
3.0    2
1.0    1
2.0    1
4.0    1
Name: count, dtype: int64
```

Explanation:

- The order of unique values is preserved as they appear in the Series.

---

#### **6. Binning Numeric Data**:

```python
s = pd.Series([1.2, 2.5, 3.7, 4.1, 5.0])
print(s.value_counts(bins=3))
```

**Output**:

```
(2.467, 3.733]    2
(3.733, 5.0]      2
(1.195, 2.467]    1
Name: count, dtype: int64
```

Explanation:

- The numeric data is divided into 3 bins, and the counts for each bin are returned.
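
The parameter description calls `bins` a convenience for `pd.cut`. The sketch below checks that reading by binning explicitly and then counting. Treating `include_lowest=True` as the matching `pd.cut` setting is an assumption made for this comparison.

```python
import pandas as pd

s = pd.Series([1.2, 2.5, 3.7, 4.1, 5.0])

# value_counts(bins=3) bins the data, then counts the values in each bin.
via_value_counts = s.value_counts(bins=3).sort_index()

# Assumed equivalent: bin explicitly with pd.cut, then count the bins.
via_cut = pd.cut(s, bins=3, include_lowest=True).value_counts().sort_index()

# Counts per bin, ordered from the lowest interval to the highest.
print(via_value_counts.tolist())
print(via_cut.tolist())
```

Calling `pd.cut` directly is the better choice when you need custom bin edges or labels, which `value_counts` does not expose.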

---

#### **7. String Series**:

```python
s = pd.Series(['apple', 'banana', 'apple', 'orange'])
print(s.value_counts())
```

**Output**:

```
apple     2
banana    1
orange    1
Name: count, dtype: int64
```

Explanation:

- The value `'apple'` appears twice, and the other values appear once.

---

### **See Also**:

- **`Series.count`**: Counts the number of non-NA/null observations in the Series.
- **`DataFrame.value_counts`**: Equivalent method for DataFrames.
- **`pd.cut`**: Bins numeric data into intervals.

---

### **Summary**:

- Use `Series.value_counts` to count the occurrences of unique values in a Series.
- Control the inclusion of `NaN` values with the `dropna` parameter.
- Use `normalize=True` to get relative frequencies instead of counts.
- Use `bins` to group numeric data into intervals.
- Sort the results by frequency or preserve the order of appearance.


In [None]:
""" pandas.Series.align
Series.align(other, join='outer', axis=None, level=None, copy=None, fill_value=None, method=<no_default>, limit=<no_default>, fill_axis=<no_default>, broadcast_axis=<no_default>)



Align two objects on their axes with the specified join method.

Join method is specified for each axis Index.

Parameters:
other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
Type of alignment to be performed.

left: use only keys from left frame, preserve key order.

right: use only keys from right frame, preserve key order.

outer: use union of keys from both frames, sort keys lexicographically.

inner: use intersection of keys from both frames, preserve the order of the left keys.

axis : allowed axis of the other object, default None
Align on index (0), columns (1), or both (None).

level : int or level name, default None
Broadcast across a level, matching Index values on the passed MultiIndex level.

copy : bool, default True
Always returns new objects. If copy=False and no reindexing is required then original objects are returned.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

fill_value : scalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

method : {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None
Method to use for filling holes in reindexed Series:

pad / ffill: propagate last valid observation forward to next valid.

backfill / bfill: use NEXT valid observation to fill gap.

Deprecated since version 2.1.

limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

Deprecated since version 2.1.

fill_axis : {0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame, default 0
Filling axis, method and limit.

Deprecated since version 2.1.

broadcast_axis : {0 or ‘index’} for Series, {0 or ‘index’, 1 or ‘columns’} for DataFrame, default None
Broadcast values along this axis, if aligning two objects of different dimensions.

Deprecated since version 2.1.

Returns:
tuple of (Series/DataFrame, type of other)
Aligned objects. """
df = pd.DataFrame(
    [[1, 2, 3, 4], [6, 7, 8, 9]], columns=["D", "B", "E", "A"], index=[1, 2]
)
other = pd.DataFrame(
    [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
    columns=["A", "B", "C", "D"],
    index=[2, 3, 4],
)
df
# Align on columns:

left, right = df.align(other, join="outer", axis=1)
left

In [None]:
# We can also align on the index:

left, right = df.align(other, join="outer", axis=0)
left
right

In [None]:
# Finally, the default axis=None will align on both index and columns:

left, right = df.align(other, join="outer", axis=None)
left
right

The `pandas.Series.align` method aligns two objects (Series or DataFrames) along their axes using a specified join method. This is useful for ensuring that two datasets have the same index or columns before performing operations like arithmetic or merging. The method returns a tuple of aligned objects.

---

### **Parameters**:

1. **`other`** (DataFrame or Series):

   - The object to align with the current Series or DataFrame.

2. **`join`** ({‘outer’, ‘inner’, ‘left’, ‘right’}, default: `'outer'`):

   - Specifies the type of alignment:
     - `'outer'`: Use the union of keys from both objects.
     - `'inner'`: Use the intersection of keys from both objects.
     - `'left'`: Use only keys from the left object.
     - `'right'`: Use only keys from the right object.

3. **`axis`** (int or str, default: `None`):

   - Axis to align along:
     - `0` or `'index'`: Align on the index.
     - `1` or `'columns'`: Align on the columns.
     - `None`: Align on both index and columns.

4. **`level`** (int or level name, default: `None`):

   - If the objects are MultiIndex, align on the specified level.

5. **`copy`** (bool, default: `True`):

   - If `True`, always returns new objects. If `False` and no reindexing is required, the original objects are returned.
   - **Note**: The `copy` keyword changes behavior in pandas 3.0, where Copy-on-Write is enabled by default and methods ignore `copy` in favor of lazy copies. The keyword will be removed in a future version of pandas.

6. **`fill_value`** (scalar, default: `np.nan`):

   - Value to use for missing values after alignment.

7. **`method`** ({‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default: `None`):

   - Method to fill missing values:
     - `'pad'` or `'ffill'`: Propagate the last valid observation forward.
     - `'backfill'` or `'bfill'`: Use the next valid observation to fill gaps.
   - **Deprecated since version 2.1**.

8. **`limit`** (int, default: `None`):

   - Maximum number of consecutive `NaN` values to fill if `method` is specified.
   - **Deprecated since version 2.1**.

9. **`fill_axis`** ({0 or ‘index’, 1 or ‘columns’}, default: `0`):

   - Axis to fill missing values along.
   - **Deprecated since version 2.1**.

10. **`broadcast_axis`** ({0 or ‘index’, 1 or ‘columns’}, default: `None`):
    - Axis to broadcast values along if aligning objects of different dimensions.
    - **Deprecated since version 2.1**.

---

### **Returns**:

- **tuple**:
  - A tuple of two aligned objects (e.g., `(left, right)`), where `left` is the aligned version of the original object and `right` is the aligned version of `other`.

---

### **Key Notes**:

- **Alignment**:
  - The method ensures that the two objects have the same index, columns, or both, depending on the `axis` parameter.
- **Join Types**:
  - `'outer'`: Includes all keys from both objects.
  - `'inner'`: Includes only keys present in both objects.
  - `'left'`: Includes only keys from the left object.
  - `'right'`: Includes only keys from the right object.
- **Handling Missing Values**:
  - Missing values introduced during alignment are filled with `fill_value` (default: `NaN`).
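
The examples in this section call `align` on DataFrames. For completeness, here is a minimal sketch of the same idea with two Series; the variable names and data are made up for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20], index=["b", "d"])

# Outer join: both results share the union of the two indexes.
left, right = s1.align(s2, join="outer")
print(list(left.index), list(right.index))

# Inner join: only the labels present in both Series survive.
left_in, right_in = s1.align(s2, join="inner")
print(left_in.tolist(), right_in.tolist())

# fill_value replaces the gaps that alignment would otherwise leave as NaN.
left0, right0 = s1.align(s2, join="outer", fill_value=0)
print((left0 + right0).tolist())
```

Arithmetic between Series aligns automatically, so an explicit `align` is mostly useful when you want to inspect or fill the aligned operands before combining them.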

---

### **Examples**:

#### **1. Aligning on Columns**:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4], [6, 7, 8, 9]],
    columns=["D", "B", "E", "A"],
    index=[1, 2]
)
other = pd.DataFrame(
    [[10, 20, 30, 40], [60, 70, 80, 90], [600, 700, 800, 900]],
    columns=["A", "B", "C", "D"],
    index=[2, 3, 4]
)

left, right = df.align(other, join="outer", axis=1)
print(left)
print(right)
```

**Output**:

```
   A  B   C  D  E
1  4  2 NaN  1  3
2  9  7 NaN  6  8

    A    B    C    D   E
2   10   20   30   40 NaN
3   60   70   80   90 NaN
4  600  700  800  900 NaN
```

Explanation:

- The columns of both DataFrames are aligned using an outer join. Missing values are filled with `NaN`.

---

#### **2. Aligning on Index**:

```python
left, right = df.align(other, join="outer", axis=0)
print(left)
print(right)
```

**Output**:

```
    D    B    E    A
1  1.0  2.0  3.0  4.0
2  6.0  7.0  8.0  9.0
3  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN

    A      B      C      D
1    NaN    NaN    NaN    NaN
2   10.0   20.0   30.0   40.0
3   60.0   70.0   80.0   90.0
4  600.0  700.0  800.0  900.0
```

Explanation:

- The indices of both DataFrames are aligned using an outer join. Missing values are filled with `NaN`.

---

#### **3. Aligning on Both Index and Columns**:

```python
left, right = df.align(other, join="outer", axis=None)
print(left)
print(right)
```

**Output**:

```
     A    B   C    D    E
1  4.0  2.0 NaN  1.0  3.0
2  9.0  7.0 NaN  6.0  8.0
3  NaN  NaN NaN  NaN  NaN
4  NaN  NaN NaN  NaN  NaN

       A      B      C      D   E
1    NaN    NaN    NaN    NaN NaN
2   10.0   20.0   30.0   40.0 NaN
3   60.0   70.0   80.0   90.0 NaN
4  600.0  700.0  800.0  900.0 NaN
```

Explanation:

- Both the index and columns are aligned using an outer join. Missing values are filled with `NaN`.

---

#### **4. Using `fill_value`**:

```python
left, right = df.align(other, join="outer", axis=1, fill_value=0)
print(left)
print(right)
```

**Output**:

```
   A  B  C  D  E
1  4  2  0  1  3
2  9  7  0  6  8

    A    B    C    D  E
2   10   20   30   40  0
3   60   70   80   90  0
4  600  700  800  900  0
```

Explanation:

- Missing values are filled with `0` instead of `NaN`.

---

### **See Also**:

- **`DataFrame.align`**: Equivalent method for DataFrames.
- **`Series.reindex`**: Reindexes a Series to match a new index.
- **`DataFrame.reindex`**: Reindexes a DataFrame to match new indices or columns.

---

### **Summary**:

- Use `Series.align` to align two objects (Series or DataFrames) along their axes.
- Specify the join type (`'outer'`, `'inner'`, `'left'`, `'right'`) and the axis (`0`, `1`, or `None`).
- Handle missing values with the `fill_value` parameter.
- Returns a tuple of aligned objects.


In [None]:
""" pandas.Series.case_when
Series.case_when(caselist)[source]
Replace values where the conditions are True.

Parameters:

caselist : A list of tuples of conditions and expected replacements
Takes the form: (condition0, replacement0), (condition1, replacement1), … . condition should be a 1-D boolean array-like object or a callable. If condition is a callable, it is computed on the Series and should return a boolean Series or array. The callable must not change the input Series (though pandas doesn't check it). replacement should be a 1-D array-like object, a scalar or a callable. If replacement is a callable, it is computed on the Series and should return a scalar or Series. The callable must not change the input Series (though pandas doesn't check it).

Added in version 2.2.0.

Returns:
Series


See also:
Series.mask
Replace values where the condition is True. """
c = pd.Series([6, 7, 8, 9], name='c')
a = pd.Series([0, 0, 1, 2])
b = pd.Series([0, 3, 4, 5])
c.case_when(caselist=[(a.gt(0), a),  # condition, replacement
                      (b.gt(0), b)])

The `pandas.Series.case_when` method, introduced in **pandas 2.2.0**, allows you to replace values in a Series based on specified conditions. It is similar to SQL's `CASE WHEN` statement or Python's `if-elif-else` logic. The method takes a list of tuples, where each tuple contains a condition and a replacement value. The conditions are evaluated in order, and the first condition that evaluates to `True` determines the replacement value for that element.

---

### **Parameters**:

1. **`caselist`** (list of tuples):
   - A list of tuples of the form `(condition, replacement)`.
   - **`condition`**: A 1-D boolean array-like object or a callable. If it is a callable, it is applied to the Series and must return a boolean Series or array.
   - **`replacement`**: A 1-D array-like object, a scalar, or a callable. If it is a callable, it is applied to the Series and must return a scalar or Series.

---

### **Returns**:

- **Series**:
  - A new Series with values replaced according to the conditions specified in `caselist`.

---

### **Key Notes**:

- **Order of Evaluation**:
  - Conditions are evaluated in the order they are provided. The first condition that evaluates to `True` determines the replacement value.
- **Callable Conditions and Replacements**:
  - Both `condition` and `replacement` can be callable functions. These functions are applied to the Series and must not modify the input Series.
- **Default Behavior**:
  - If no condition evaluates to `True` for a particular element, the original value is retained.

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

c = pd.Series([6, 7, 8, 9], name='c')
a = pd.Series([0, 0, 1, 2])
b = pd.Series([0, 3, 4, 5])

result = c.case_when(caselist=[
    (a.gt(0), a),  # Replace with `a` where `a > 0`
    (b.gt(0), b)   # Replace with `b` where `b > 0`
])
print(result)
```

**Output**:

```
0    6
1    3
2    1
3    2
Name: c, dtype: int64
```

Explanation:

- For the first element (`6`), neither `a > 0` nor `b > 0` is `True`, so the original value `6` is retained.
- For the second element (`7`), `b > 0` is `True`, so it is replaced with `3`.
- For the third element (`8`), `a > 0` is `True`, so it is replaced with `1`.
- For the fourth element (`9`), `a > 0` is `True`, so it is replaced with `2`.

---

#### **2. Using Callable Conditions and Replacements**:

```python
c = pd.Series([10, 20, 30, 40])

result = c.case_when(caselist=[
    (lambda x: x > 25, lambda x: x * 2),  # Replace with `x * 2` where `x > 25`
    (lambda x: x < 15, lambda x: x + 100) # Replace with `x + 100` where `x < 15`
])
print(result)
```

**Output**:

```
0    110
1     20
2     60
3     80
dtype: int64
```

Explanation:

- For the first element (`10`), `x < 15` is `True`, so it is replaced with `10 + 100 = 110`.
- For the second element (`20`), neither condition is `True`, so the original value `20` is retained.
- For the third and fourth elements (`30` and `40`), `x > 25` is `True`, so they are replaced with `30 * 2 = 60` and `40 * 2 = 80`, respectively.

---

#### **3. Using Scalar Replacements**:

```python
c = pd.Series([1, 2, 3, 4])

result = c.case_when(caselist=[
    (c > 2, 100),  # Replace with `100` where `c > 2`
    (c < 2, 200)   # Replace with `200` where `c < 2`
])
print(result)
```

**Output**:

```
0    200
1      2
2    100
3    100
dtype: int64
```

Explanation:

- For the first element (`1`), `c < 2` is `True`, so it is replaced with `200`.
- For the second element (`2`), neither condition is `True`, so the original value `2` is retained.
- For the third and fourth elements (`3` and `4`), `c > 2` is `True`, so they are replaced with `100`.

---

#### **4. Retaining Original Values**:

```python
c = pd.Series([5, 10, 15, 20])

result = c.case_when(caselist=[
    (c > 20, 100)  # Replace with `100` where `c > 20`
])
print(result)
```

**Output**:

```
0     5
1    10
2    15
3    20
dtype: int64
```

Explanation:

- None of the elements satisfy `c > 20`, so all original values are retained.

---

### **See Also**:

- **`Series.mask`**: Replace values where a condition is `True`.
- **`Series.where`**: Replace values where a condition is `False`.
- **`numpy.select`**: Similar functionality for NumPy arrays.

---

### **Summary**:

- Use `Series.case_when` to replace values in a Series based on conditions.
- Conditions and replacements can be specified as arrays, scalars, or callable functions.
- The first condition that evaluates to `True` determines the replacement value.
- If no condition is `True`, the original value is retained.


In [None]:
s = pd.Series(data=np.arange(3), index=['A', 'B', 'C'])
s
# Drop labels B and C

s.drop(labels=['B', 'C'])
# Drop 2nd level label in MultiIndex Series

midx = pd.MultiIndex(levels=[['llama', 'cow', 'falcon'],
                             ['speed', 'weight', 'length']],
                     codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
                            [0, 1, 2, 0, 1, 2, 0, 1, 2]])
s = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3],
              index=midx)
s
s.drop(labels='weight', level=1)

The `pandas.Series.drop` method removes specified index labels from a Series. It is useful for filtering out unwanted data based on the index. The method returns a new Series with the specified labels removed, unless `inplace=True` is specified, in which case the operation is performed in-place.

---

### **Parameters**:

1. **`labels`** (single label or list-like):

   - Index labels to drop. Can be a single label or a list of labels.

2. **`axis`** ({0 or ‘index’}, default: `0`):

   - Unused for Series. Included for compatibility with DataFrame.

3. **`index`** (single label or list-like):

   - Alias for `labels`. Redundant for Series but can be used instead of `labels`.

4. **`columns`** (single label or list-like):

   - Unused for Series. Included for compatibility with DataFrame.

5. **`level`** (int or level name, optional):

   - For MultiIndex Series, specifies the level from which to drop labels.

6. **`inplace`** (bool, default: `False`):

   - If `True`, performs the operation in-place and returns `None`.
   - If `False`, returns a new Series with the specified labels removed.

7. **`errors`** ({‘ignore’, ‘raise’}, default: `'raise'`):
   - If `'raise'`, raises a `KeyError` if any of the labels are not found in the index.
   - If `'ignore'`, suppresses the error and only drops existing labels.

---

### **Returns**:

- **Series or None**:
  - If `inplace=False`, returns a new Series with the specified labels removed.
  - If `inplace=True`, returns `None` and modifies the original Series.

---

### **Key Notes**:

- **Handling Missing Labels**:
  - If `errors='raise'` (default), a `KeyError` is raised if any of the specified labels are not found in the index.
  - If `errors='ignore'`, missing labels are ignored, and only existing labels are dropped.
- **MultiIndex Series**:
  - Use the `level` parameter to specify the level from which to drop labels in a MultiIndex Series.
- **In-Place Operation**:
  - Use `inplace=True` to modify the original Series instead of returning a new one.
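
A short sketch of the two `errors` modes, and of the fact that `drop` leaves the original Series untouched unless `inplace=True` is passed (the data is illustrative):

```python
import pandas as pd

s = pd.Series([0, 1, 2], index=["A", "B", "C"])

# With the default errors='raise', an unknown label raises KeyError.
try:
    s.drop(labels=["B", "D"])
except KeyError as exc:
    print("KeyError:", exc)

# errors='ignore' drops the labels that exist and skips the rest.
trimmed = s.drop(labels=["B", "D"], errors="ignore")
print(list(trimmed.index))

# Without inplace=True the original Series is left unchanged.
print(list(s.index))
```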

---

### **Examples**:

#### **1. Basic Usage**:

```python
import pandas as pd

s = pd.Series(data=[0, 1, 2], index=['A', 'B', 'C'])
print(s.drop(labels=['B', 'C']))
```

**Output**:

```
A    0
dtype: int64
```

Explanation:

- The labels `'B'` and `'C'` are dropped, leaving only `'A'`.

---

#### **2. Using `index` Parameter**:

```python
print(s.drop(index=['B', 'C']))
```

**Output**:

```
A    0
dtype: int64
```

Explanation:

- The `index` parameter is an alias for `labels` and behaves the same way.

---

#### **3. Handling Missing Labels**:

```python
print(s.drop(labels=['B', 'D'], errors='ignore'))
```

**Output**:

```
A    0
C    2
dtype: int64
```

Explanation:

- The label `'D'` does not exist in the index, but since `errors='ignore'`, it is ignored, and only `'B'` is dropped.

---

#### **4. In-Place Operation**:

```python
s.drop(labels=['B', 'C'], inplace=True)
print(s)
```

**Output**:

```
A    0
dtype: int64
```

Explanation:

- The original Series is modified in-place, and the method returns `None`.

---

#### **5. MultiIndex Series**:

```python
midx = pd.MultiIndex(levels=[['llama', 'cow', 'falcon'],
                             ['speed', 'weight', 'length']],
                     codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
                            [0, 1, 2, 0, 1, 2, 0, 1, 2]])
s = pd.Series([45, 200, 1.2, 30, 250, 1.5, 320, 1, 0.3], index=midx)
print(s.drop(labels='weight', level=1))
```

**Output**:

```
llama   speed      45.0
        length      1.2
cow     speed      30.0
        length      1.5
falcon  speed     320.0
        length      0.3
dtype: float64
```

Explanation:

- The label `'weight'` is dropped from the second level of the MultiIndex.

---

### **See Also**:

- **`Series.reindex`**: Return a Series with specified index labels.
- **`Series.dropna`**: Remove missing values from a Series.
- **`Series.drop_duplicates`**: Remove duplicate values from a Series.
- **`DataFrame.drop`**: Drop specified labels from rows or columns in a DataFrame.

---

### **Summary**:

- Use `Series.drop` to remove specified index labels from a Series.
- Specify labels to drop using the `labels` or `index` parameter.
- Use `errors='ignore'` to suppress errors for missing labels.
- Use `inplace=True` to modify the original Series.
- For MultiIndex Series, use the `level` parameter to specify the level from which to drop labels.


In [None]:
""" pandas.Series.droplevel
Series.droplevel(level, axis=0)
Return Series/DataFrame with requested index / column level(s) removed.

Parameters:
level : int, str, or list-like
If a string is given, must be the name of a level. If list-like, elements must be names or positional indexes of levels.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the level(s) is removed:

0 or ‘index’: remove level(s) from the row index.

1 or ‘columns’: remove level(s) from the columns.

For Series this parameter is unused and defaults to 0.

Returns:
Series/DataFrame
Series/DataFrame with requested index / column level(s) removed. """
df = pd.DataFrame([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
]).set_index([0, 1]).rename_axis(['a', 'b'])

df.columns = pd.MultiIndex.from_tuples([
    ('c', 'e'), ('d', 'f')
], names=['level_1', 'level_2'])
df

df.droplevel('a')

df.droplevel('level_2', axis=1)

The `pandas.Series.droplevel` method is used to remove one or more levels from a **MultiIndex** in a Series or DataFrame. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.droplevel(level, axis=0)
```

---

### **Parameters**

1. **`level`** : `int`, `str`, or `list-like`

   - Specifies the level(s) to remove.
   - If a **string** is provided, it must be the name of a level.
   - If **list-like**, elements must be names or positional indexes of levels.

2. **`axis`** : `{0 or 'index', 1 or 'columns'}`, default `0`
   - Specifies the axis along which the level(s) is removed:
     - `0` or `'index'`: Remove level(s) from the **index**.
     - `1` or `'columns'`: Remove level(s) from the **columns**.
   - For **Series**, this parameter is unused and defaults to `0`.
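The three accepted forms of `level` can be sketched as follows. The three-level index and its names (`group`, `number`, `type`) are illustrative assumptions, chosen to match the later examples.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1, 'X'), ('A', 2, 'Y'), ('B', 1, 'Z')],
    names=['group', 'number', 'type'],
)
s = pd.Series([10, 20, 30], index=index)

by_name = s.droplevel('group')       # level given by name
by_position = s.droplevel(0)         # same level given by position
mixed = s.droplevel(['group', 2])    # list mixing a name and a position

print(by_name.equals(by_position))   # True
print(list(mixed.index.names))       # ['number']
```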

---

### **Returns**

- **Series or DataFrame**:
  - The object with the requested index or column level(s) removed.

---

### **Examples**

#### Example 1: Dropping a Level from a MultiIndex in a Series

```python
import pandas as pd

# Create a Series with a MultiIndex
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1)], names=['group', 'number'])
s = pd.Series([10, 20, 30], index=index)

print("Original Series:")
print(s)

# Drop the 'group' level
result = s.droplevel('group')
print("\nSeries after dropping 'group' level:")
print(result)
```

**Output:**

```
Original Series:
group  number
A      1         10
       2         20
B      1         30
dtype: int64

Series after dropping 'group' level:
number
1    10
2    20
1    30
dtype: int64
```

---

#### Example 2: Dropping a Level from a MultiIndex in a DataFrame (Index)

```python
# Create a DataFrame with a MultiIndex
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1)], names=['group', 'number'])
df = pd.DataFrame({'value': [10, 20, 30]}, index=index)

print("Original DataFrame:")
print(df)

# Drop the 'group' level from the index
result = df.droplevel('group')
print("\nDataFrame after dropping 'group' level:")
print(result)
```

**Output:**

```
Original DataFrame:
           value
group number
A     1        10
      2        20
B     1        30

DataFrame after dropping 'group' level:
         value
number
1          10
2          20
1          30
```

---

#### Example 3: Dropping a Level from a MultiIndex in a DataFrame (Columns)

```python
# Create a DataFrame with MultiIndex columns
columns = pd.MultiIndex.from_tuples([('c', 'e'), ('d', 'f')], names=['level_1', 'level_2'])
df = pd.DataFrame([[1, 2], [3, 4]], columns=columns)

print("Original DataFrame:")
print(df)

# Drop the 'level_2' level from the columns
result = df.droplevel('level_2', axis=1)
print("\nDataFrame after dropping 'level_2' level:")
print(result)
```

**Output:**

```
Original DataFrame:
level_1   c   d
level_2   e   f
0         1   2
1         3   4

DataFrame after dropping 'level_2' level:
level_1   c   d
0         1   2
1         3   4
```

---

#### Example 4: Dropping Multiple Levels

```python
# Create a DataFrame with a MultiIndex
index = pd.MultiIndex.from_tuples([('A', 1, 'X'), ('A', 2, 'Y'), ('B', 1, 'Z')], names=['group', 'number', 'type'])
df = pd.DataFrame({'value': [10, 20, 30]}, index=index)

print("Original DataFrame:")
print(df)

# Drop both 'group' and 'type' levels
result = df.droplevel(['group', 'type'])
print("\nDataFrame after dropping 'group' and 'type' levels:")
print(result)
```

**Output:**

```
Original DataFrame:
                value
group number type
A     1      X      10
      2      Y      20
B     1      Z      30

DataFrame after dropping 'group' and 'type' levels:
         value
number
1          10
2          20
1          30
```

---

### **Notes**

- The `droplevel` method is particularly useful when working with **MultiIndex** objects in pandas.
- At least one level must remain: dropping every level of a MultiIndex raises a `ValueError`. Dropping all but one level leaves a flat (single-level) index or columns.
- For **Series**, the `axis` parameter is ignored because Series only has one axis (the index).
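The limit case can be checked directly. This is a small sketch on an assumed two-level Series: removing one level leaves a flat index, while trying to remove both levels raises a `ValueError`.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1)], names=['group', 'number']
)
s = pd.Series([10, 20, 30], index=index)

# Dropping one of two levels leaves a flat (non-Multi) index
flat = s.droplevel('group')
print(isinstance(flat.index, pd.MultiIndex))  # False

# Dropping every level is not allowed
try:
    s.droplevel(['group', 'number'])
    error = None
except ValueError as exc:
    error = str(exc)

print(error is not None)  # True
```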


In [None]:
""" 
pandas.Series.drop_duplicates
Series.drop_duplicates(*, keep='first', inplace=False, ignore_index=False)
Return Series with duplicate values removed.

Parameters:
keep : {‘first’, ‘last’, False}, default ‘first’
Method to handle dropping duplicates:

‘first’ : Drop duplicates except for the first occurrence.

‘last’ : Drop duplicates except for the last occurrence.

False : Drop all duplicates.

inplace : bool, default False
If True, performs operation inplace and returns None.

ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.

Added in version 2.0.0.

Returns:
Series or None
Series with duplicates dropped or None if inplace=True.

See also

Index.drop_duplicates
Equivalent method on Index.

DataFrame.drop_duplicates
Equivalent method on DataFrame.

Series.duplicated
Related method on Series, indicating duplicate Series values.

Series.unique
Return unique values as an array.



"""
s = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama', 'hippo'],
              name='animal')
s
s.drop_duplicates()
s.drop_duplicates(keep='last')
s.drop_duplicates(keep=False)

The `pandas.Series.drop_duplicates` method is used to remove duplicate values from a pandas Series. Below is a detailed explanation of its parameters and usage:

---

### **Syntax**

```python
Series.drop_duplicates(*, keep='first', inplace=False, ignore_index=False)
```

---

### **Parameters**

1. **`keep`** : `{'first', 'last', False}`, default `'first'`

   - Determines which duplicates (if any) to keep.
   - Options:
     - `'first'` : Drop all duplicates except for the **first occurrence**.
     - `'last'` : Drop all duplicates except for the **last occurrence**.
     - `False` : Drop **all duplicates** (this will remove all occurrences of duplicate values).

2. **`inplace`** : `bool`, default `False`

   - If `True`, the operation is performed in place (i.e., the original Series is modified), and the method returns `None`.
   - If `False`, a new Series with duplicates removed is returned.

3. **`ignore_index`** : `bool`, default `False`
   - If `True`, the resulting Series will have its index reset to `0, 1, ..., n-1`.
   - Added in pandas version 2.0.0.

---

### **Returns**

- **Series or None**:
  - If `inplace=False` (default), a new Series with duplicates removed is returned.
  - If `inplace=True`, the method returns `None`, and the original Series is modified.

---

### **Examples**

#### Example 1: Basic Usage

```python
import pandas as pd

# Create a Series with duplicate values
s = pd.Series([1, 2, 2, 3, 3, 3, 4])

# Drop duplicates, keeping the first occurrence
result = s.drop_duplicates()
print(result)
```

**Output:**

```
0    1
1    2
3    3
6    4
dtype: int64
```

---

#### Example 2: Keep Last Occurrence

```python
# Drop duplicates, keeping the last occurrence
result = s.drop_duplicates(keep='last')
print(result)
```

**Output:**

```
0    1
2    2
5    3
6    4
dtype: int64
```

---

#### Example 3: Drop All Duplicates

```python
# Drop all duplicates (remove all occurrences of duplicate values)
result = s.drop_duplicates(keep=False)
print(result)
```

**Output:**

```
0    1
6    4
dtype: int64
```

---

#### Example 4: Inplace Modification

```python
# Drop duplicates in place
s.drop_duplicates(inplace=True)
print(s)
```

**Output:**

```
0    1
1    2
3    3
6    4
dtype: int64
```

---

#### Example 5: Reset Index

```python
# Drop duplicates and reset the index
result = s.drop_duplicates(ignore_index=True)
print(result)
```

**Output:**

```
0    1
1    2
2    3
3    4
dtype: int64
```

---

### **Notes**

- The `drop_duplicates` method is useful for cleaning data by removing redundant or repeated values.
- Use `ignore_index=True` if you want the resulting Series to have a clean, sequential index.
- Be cautious with `inplace=True`, as it modifies the original Series and does not return a new object.
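The link between `drop_duplicates` and `duplicated` can be sketched as below: with the default `keep='first'`, dropping duplicates gives the same result as filtering with the negated `duplicated()` mask. The sample data is an assumption, and it includes repeated `NaN` values to show that they are treated as duplicates of each other.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 2, np.nan, np.nan, 3])

deduped = s.drop_duplicates()     # keep='first' by default
via_mask = s[~s.duplicated()]     # same rows selected with a boolean mask

print(deduped.equals(via_mask))   # True
print(list(deduped.index))        # [0, 1, 3, 5]
```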


In [None]:
""" pandas.Series.duplicated
Series.duplicated(keep='first')
Indicate duplicate Series values.

Duplicated values are indicated as True values in the resulting Series. Either all duplicates, all except the first or all except the last occurrence of duplicates can be indicated.

Parameters:
keep : {‘first’, ‘last’, False}, default ‘first’
Method to handle dropping duplicates:

‘first’ : Mark duplicates as True except for the first occurrence.

‘last’ : Mark duplicates as True except for the last occurrence.

False : Mark all duplicates as True.

Returns:
Series[bool]
Series indicating whether each value has occurred in the preceding values.

See also

Index.duplicated
Equivalent method on pandas.Index.

DataFrame.duplicated
Equivalent method on pandas.DataFrame.

Series.drop_duplicates
Remove duplicate values from Series. """
animals = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama'])
animals.duplicated()
animals.duplicated(keep='first')

# With keep='last', the last occurrence of each set of duplicated values is marked False and all others are marked True:

animals.duplicated(keep='last')

In [None]:
# With keep=False, every occurrence of a duplicated value is marked True:

animals.duplicated(keep=False)

The `pandas.Series.duplicated` method identifies duplicate values in a pandas Series. It returns a boolean Series where `True` marks an occurrence flagged as a duplicate under the chosen `keep` rule, and `False` marks everything else. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.duplicated(keep='first')
```

---

### **Parameters**

1. **`keep`** : `{'first', 'last', False}`, default `'first'`
   - Determines which duplicates (if any) to mark as `True`.
   - Options:
     - `'first'` : Mark all duplicates as `True` **except for the first occurrence**.
     - `'last'` : Mark all duplicates as `True` **except for the last occurrence**.
     - `False` : Mark **all duplicates** as `True`.

---

### **Returns**

- **Series[bool]**:
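  - A boolean Series where `True` marks an occurrence flagged as a duplicate under the chosen `keep` rule, and `False` marks every other value (unique values and, for `'first'` or `'last'`, the one occurrence that is kept).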
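Because the result is an ordinary boolean Series, it can be summed, tested with `any()`, or used as a mask. A brief sketch using the same `animals` data as the examples in this section:

```python
import pandas as pd

animals = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama'])

# Number of occurrences beyond the first of each value
extra_count = int(animals.duplicated().sum())

# Quick yes/no check for any repeated value
has_duplicates = bool(animals.duplicated().any())

# All rows that belong to a repeated value (keep=False flags every occurrence)
repeated_rows = animals[animals.duplicated(keep=False)]

print(extra_count)                # 2
print(has_duplicates)             # True
print(list(repeated_rows.index))  # [0, 2, 4]
```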

---

### **Examples**

#### Example 1: Default Behavior (`keep='first'`)

```python
import pandas as pd

# Create a Series with duplicate values
animals = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama'])

# Identify duplicates (default: keep='first')
duplicates = animals.duplicated()
print(duplicates)
```

**Output:**

```
0    False
1    False
2     True
3    False
4     True
dtype: bool
```

- The first occurrence of `'llama'` (index `0`) is marked as `False`.
- Subsequent occurrences of `'llama'` (indices `2` and `4`) are marked as `True`.

---

#### Example 2: Keep Last Occurrence (`keep='last'`)

```python
# Identify duplicates, keeping the last occurrence
duplicates = animals.duplicated(keep='last')
print(duplicates)
```

**Output:**

```
0     True
1    False
2     True
3    False
4    False
dtype: bool
```

- The last occurrence of `'llama'` (index `4`) is marked as `False`.
- Previous occurrences of `'llama'` (indices `0` and `2`) are marked as `True`.

---

#### Example 3: Mark All Duplicates (`keep=False`)

```python
# Identify all duplicates
duplicates = animals.duplicated(keep=False)
print(duplicates)
```

**Output:**

```
0     True
1    False
2     True
3    False
4     True
dtype: bool
```

- All occurrences of `'llama'` (indices `0`, `2`, and `4`) are marked as `True`.
- Unique values (`'cow'` and `'beetle'`) are marked as `False`.

---

### **Notes**

- The `duplicated` method is useful for identifying duplicate values in a Series.
- It is often used in conjunction with other methods like `drop_duplicates` to clean data.
- The `keep` parameter allows flexibility in determining which occurrences of duplicates to mark.

---

### **Related Methods**

1. **`Series.drop_duplicates`**:

   - Removes duplicate values from a Series.
   - Example:
     ```python
     animals.drop_duplicates()
     ```

2. **`Index.duplicated`**:

   - Equivalent method for identifying duplicates in a pandas Index.

3. **`DataFrame.duplicated`**:
   - Equivalent method for identifying duplicates in a pandas DataFrame.

---

### **Practical Use Case**

Suppose you have a dataset with duplicate entries, and you want to identify and remove them:

```python
# Identify duplicates
duplicates = animals.duplicated()

# Filter out duplicates
unique_animals = animals[~duplicates]
print(unique_animals)
```

**Output:**

```
0     llama
1       cow
3    beetle
dtype: object
```

- The `~` operator negates the boolean Series, so the filter keeps the first occurrence of each value and drops the later repeats.

---

This method is a powerful tool for data cleaning and ensuring data integrity in pandas.


In [None]:
""" pandas.Series.equals
Series.equals(other)
Test whether two objects contain the same elements.

This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal.

The row/column index do not need to have the same type, as long as the values are considered equal. Corresponding columns and index must be of the same dtype.

Parameters:
other : Series or DataFrame
The other Series or DataFrame to be compared with the first.

Returns:
bool
True if all elements are the same in both objects, False otherwise.

See also

Series.eq
Compare two Series objects of the same length and return a Series where each element is True if the element in each Series is equal, False otherwise.

DataFrame.eq
Compare two DataFrame objects of the same shape and return a DataFrame where each element is True if the respective element in each DataFrame is equal, False otherwise.

testing.assert_series_equal
Raises an AssertionError if left and right are not equal. Provides an easy interface to ignore inequality in dtypes, indexes and precision among others.

testing.assert_frame_equal
Like assert_series_equal, but targets DataFrames.

numpy.array_equal
Return True if two arrays have the same shape and elements, False otherwise. """
df = pd.DataFrame({1: [10], 2: [20]})
df


# DataFrames df and exactly_equal have the same types and values for their elements and column labels, which will return True.

exactly_equal = pd.DataFrame({1: [10], 2: [20]})
exactly_equal
df.equals(exactly_equal)

In [None]:
# DataFrames df and different_column_type have the same element types and values, but have different types for the column labels, which will still return True.

different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})
different_column_type
df.equals(different_column_type)

# DataFrames df and different_data_type have different types for the same values for their elements, and will return False even though their column labels are the same values and types.

different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})
different_data_type
df.equals(different_data_type)

The `pandas.Series.equals` method is used to compare two pandas objects (Series or DataFrame) to determine if they have the **same shape and elements**. It is a strict comparison method that checks for equality in values, index, and data types. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.equals(other)
```

---

### **Parameters**

1. **`other`** : `Series` or `DataFrame`
   - The object to compare with the current Series or DataFrame.

---

### **Returns**

- **`bool`**:
  - `True` if the two objects have the **same shape, elements, and data types**.
  - `False` otherwise.

---

### **Key Features**

1. **Element-wise Comparison**:

   - Compares each element in the two objects for equality.
   - `NaN` values in the same location are considered equal.

2. **Index Comparison**:

   - The index labels of the two objects must have equal values, but the index dtypes can differ (e.g., integer labels `0, 1, 2` vs. float labels `0.0, 1.0, 2.0`). Labels that merely look alike, such as `0` and `'0'`, are not equal.

3. **Data Type Comparison**:

   - The data types of the elements must match. If the data types differ, the method returns `False`.

4. **Shape Comparison**:
   - The two objects must have the same shape (same number of rows and columns for DataFrames, or same length for Series).
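The `NaN` handling is the easiest of these features to see in code. The sketch below, on assumed sample data, contrasts an element-wise comparison, where `NaN` never equals `NaN`, with `equals`, which treats `NaN` values in the same position as equal.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])

# Element-wise comparison: the NaN position compares as False
elementwise = a.eq(b)
print(elementwise.tolist())      # [True, False, True]
print(bool(elementwise.all()))   # False

# equals: NaNs in the same location count as equal
print(a.equals(b))               # True
```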

---

### **Examples**

#### Example 1: Comparing Two Series

```python
import pandas as pd

# Create two Series
s1 = pd.Series([1, 2, 3])
s2 = pd.Series([1, 2, 3])
s3 = pd.Series([1, 2, 4])

# Compare s1 and s2
print(s1.equals(s2))  # True

# Compare s1 and s3
print(s1.equals(s3))  # False
```

**Output:**

```
True
False
```

---

#### Example 2: Comparing Series with Different Index Types

```python
# Create Series whose index labels have equal values but different dtypes
s4 = pd.Series([1, 2, 3], index=[0, 1, 2])        # integer labels
s5 = pd.Series([1, 2, 3], index=[0.0, 1.0, 2.0])  # float labels

# Compare s4 and s5
print(s4.equals(s5))  # True (index values are equal, even though the index dtypes differ)
```

**Output:**

```
True
```

---

#### Example 3: Comparing Series with NaN Values

```python
import numpy as np

# Create Series with NaN values
s6 = pd.Series([1, 2, np.nan])
s7 = pd.Series([1, 2, np.nan])

# Compare s6 and s7
print(s6.equals(s7))  # True (NaN values are considered equal)
```

**Output:**

```
True
```

---

#### Example 4: Comparing Series with Different Data Types

```python
# Create Series with different data types
s8 = pd.Series([1, 2, 3])
s9 = pd.Series([1.0, 2.0, 3.0])

# Compare s8 and s9
print(s8.equals(s9))  # False (data types differ)
```

**Output:**

```
False
```

---

#### Example 5: Comparing DataFrames

```python
# Create DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df3 = pd.DataFrame({'A': [1, 2], 'B': [3.0, 4.0]})  # Different data types

# Compare df1 and df2
print(df1.equals(df2))  # True

# Compare df1 and df3
print(df1.equals(df3))  # False (data types differ)
```

**Output:**

```
True
False
```

---

### **Notes**

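- Unlike the `==` operator, which compares element by element and returns a boolean Series or DataFrame, `equals` returns a single `bool`. It requires matching **shape, index values, and data types**, and it treats `NaN` values in the same location as equal.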
- For a more flexible comparison (e.g., ignoring data types or index differences), you can use `pandas.testing.assert_series_equal` or `pandas.testing.assert_frame_equal`.
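As a sketch of the more flexible route, `pandas.testing.assert_series_equal` accepts a `check_dtype` flag. The two Series here are assumed sample data holding the same values with different dtypes.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
floats = pd.Series([1.0, 2.0, 3.0])

# equals is strict about dtype
print(ints.equals(floats))  # False

# With check_dtype=False the assertion passes (returns None, raises nothing)
pd.testing.assert_series_equal(ints, floats, check_dtype=False)

# With the default check_dtype=True it raises AssertionError
try:
    pd.testing.assert_series_equal(ints, floats)
    strict_passed = True
except AssertionError:
    strict_passed = False

print(strict_passed)  # False
```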

---

### **Related Methods**

1. **`Series.eq`**:

   - Element-wise comparison of two Series. Returns a boolean Series.
   - Example:
     ```python
     s1.eq(s2)
     ```

2. **`DataFrame.eq`**:

   - Element-wise comparison of two DataFrames. Returns a boolean DataFrame.
   - Example:
     ```python
     df1.eq(df2)
     ```

3. **`pandas.testing.assert_series_equal`**:

   - Raises an `AssertionError` if two Series are not equal. Allows for flexible comparisons (e.g., ignoring data types or index differences).
   - Example:
     ```python
     pd.testing.assert_series_equal(s1, s2)
     ```

4. **`pandas.testing.assert_frame_equal`**:

   - Raises an `AssertionError` if two DataFrames are not equal. Similar to `assert_series_equal` but for DataFrames.
   - Example:
     ```python
     pd.testing.assert_frame_equal(df1, df2)
     ```

5. **`numpy.array_equal`**:
   - Compares two NumPy arrays for equality in shape and elements.
   - Example:
     ```python
     np.array_equal(arr1, arr2)
     ```

---

### **Practical Use Case**

Suppose you want to verify if two datasets are identical after some transformations:

```python
# After some data processing
processed_data = df1.copy()

# Compare with the original data
if df1.equals(processed_data):
    print("The datasets are identical.")
else:
    print("The datasets are different.")
```

---

The `equals` method is a powerful tool for ensuring data integrity and verifying the correctness of data transformations.


In [None]:
""" pandas.Series.first
Series.first(offset)
Select initial periods of time series data based on a date offset.

Deprecated since version 2.1: first() is deprecated and will be removed in a future version. Please create a mask and filter using .loc instead.

For a DataFrame with a sorted DatetimeIndex, this function can select the first few rows based on a date offset.

Parameters:
offset : str, DateOffset or dateutil.relativedelta
The offset length of the data that will be selected. For instance, ‘1ME’ will display all the rows having their index within the first month.

Returns:
Series or DataFrame
A subset of the caller.

Raises:
TypeError
If the index is not a DatetimeIndex.

See also

last
Select final periods of time series based on a date offset.

at_time
Select values at a particular time of the day.

between_time
Select values between particular times of the day.

"""
i = pd.date_range('2018-04-09', periods=4, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
ts
#  Get the rows for the first 3 days:

ts.first('3D')

The `pandas.Series.first` method is used to select the initial periods of time series data based on a date offset. It is particularly useful for filtering data from the beginning of a time series. However, **this method is deprecated starting from pandas version 2.1** and will be removed in a future version. The recommended approach is to use `.loc` with a mask for filtering.

Below is a detailed explanation of the method, its parameters, and examples.

---

### **Syntax**

```python
Series.first(offset)
```

---

### **Parameters**

1. **`offset`** : `str`, `DateOffset`, or `dateutil.relativedelta`
   - Specifies the offset length of the data to be selected.
   - Examples:
     - `'3D'` : Selects the first 3 calendar days.
     - `'1ME'` : Selects the first month of data.

---

### **Returns**

- **Series or DataFrame**:
  - A subset of the original object containing the initial periods of the time series based on the specified offset.

---

### **Raises**

- **`TypeError`**:
  - If the index of the Series or DataFrame is not a `DatetimeIndex`.

---

### **Deprecation Notice**

- The `first` method is deprecated starting from pandas version 2.1.
- The recommended alternative is to use `.loc` with a mask for filtering.

---

### **Examples**

#### Example 1: Using `first` with a Time Series

```python
import pandas as pd

# Create a time series with a DatetimeIndex
index = pd.date_range('2023-01-01', periods=6, freq='2D')
ts = pd.Series([1, 2, 3, 4, 5, 6], index=index)

print("Original Series:")
print(ts)

# Select the first 3 days of data
result = ts.first('3D')
print("\nFirst 3 days of data:")
print(result)
```

**Output:**

```
Original Series:
2023-01-01    1
2023-01-03    2
2023-01-05    3
2023-01-07    4
2023-01-09    5
2023-01-11    6
Freq: 2D, dtype: int64

First 3 days of data:
2023-01-01    1
2023-01-03    2
dtype: int64
```

- The `first('3D')` method selects data within the first 3 calendar days (`2023-01-01` to `2023-01-03`).

---

#### Example 2: Using `first` with a DataFrame

```python
# Create a DataFrame with a DatetimeIndex
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=index)

print("Original DataFrame:")
print(df)

# Select the first 5 days of data
result = df.first('5D')
print("\nFirst 5 days of data:")
print(result)
```

**Output:**

```
Original DataFrame:
            A
2023-01-01  1
2023-01-03  2
2023-01-05  3
2023-01-07  4
2023-01-09  5
2023-01-11  6

First 5 days of data:
            A
2023-01-01  1
2023-01-03  2
2023-01-05  3
```

- The `first('5D')` method selects data within the first 5 calendar days (`2023-01-01` to `2023-01-05`).

---

### **Recommended Alternative (Using `.loc`)**

Since `first` is deprecated, you can achieve the same result using `.loc` with a mask.

#### Example: Using `.loc` to Filter Data

```python
# Filter the first 3 days of data using .loc
# (strict "<": like first('3D'), this excludes a timestamp exactly 3 days after the start)
mask = ts.index < ts.index[0] + pd.Timedelta('3D')
result = ts.loc[mask]

print("First 3 days of data (using .loc):")
print(result)
```

**Output:**

```
First 3 days of data (using .loc):
2023-01-01    1
2023-01-03    2
dtype: int64
```

---

### **Related Methods**

1. **`last`**:

   - Selects the final periods of time series data based on a date offset.
   - Example:
     ```python
     ts.last('3D')
     ```

2. **`at_time`**:

   - Selects values at a specific time of the day.
   - Example:
     ```python
     ts.at_time('09:00')
     ```

3. **`between_time`**:
   - Selects values between specific times of the day.
   - Example:
     ```python
     ts.between_time('09:00', '12:00')
     ```

---

### **Notes**

- The `first` method is useful for quickly filtering time series data based on a date offset.
- However, due to its deprecation, it is recommended to use `.loc` with a mask for filtering.
- Ensure that the index of the Series or DataFrame is a `DatetimeIndex` before using this method.
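The points above can be folded into a small helper. This is a sketch under stated assumptions, not a pandas API: `first_by_offset` is a hypothetical name, it assumes a sorted `DatetimeIndex`, and it only handles fixed-length offsets that `pd.Timedelta` can parse (such as `'3D'`), not calendar offsets like `'1ME'`. The last lines show the analogous mask for the tail of the series.

```python
import pandas as pd

def first_by_offset(obj, offset):
    """Hypothetical helper: rows with timestamp < first timestamp + offset."""
    if not isinstance(obj.index, pd.DatetimeIndex):
        raise TypeError("index must be a DatetimeIndex")
    cutoff = obj.index[0] + pd.Timedelta(offset)
    return obj.loc[obj.index < cutoff]

index = pd.date_range('2023-01-01', periods=6, freq='2D')
ts = pd.Series([1, 2, 3, 4, 5, 6], index=index)

head_part = first_by_offset(ts, '3D')
print(head_part.tolist())  # [1, 2]

# Analogous mask for the final 3 days of the series
tail_part = ts.loc[ts.index > ts.index[-1] - pd.Timedelta('3D')]
print(tail_part.tolist())  # [5, 6]
```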

---

By using the recommended `.loc` approach, you can achieve the same functionality in a more future-proof way.


In [None]:
""" pandas.Series.head

Series.head(n=5)
Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n].

If n is larger than the number of rows, this function returns all rows.

Parameters:
n : int, default 5
Number of rows to select.

Returns:
same type as caller
The first n rows of the caller object.

See also

DataFrame.tail
Returns the last n rows. """
df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
                   'monkey', 'parrot', 'shark', 'whale', 'zebra']})
df
df.head()
df.head(3)
# For negative values of n

df.head(-3)

The `pandas.Series.head` method is used to return the first `n` rows of a Series or DataFrame. It is a convenient way to quickly inspect the beginning of your data. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.head(n=5)
```

---

### **Parameters**

1. **`n`** : `int`, default `5`
   - The number of rows to return.
   - If `n` is positive, it returns the first `n` rows.
   - If `n` is negative, it returns all rows **except the last `|n|` rows**.
   - If `n` is larger than the number of rows, it returns all rows.
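These rules line up with positional slicing, which the following sketch checks on assumed sample data. It also shows that `head` works by position, not by label order.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90])

print(s.head(3).equals(s.iloc[:3]))     # True
print(s.head(-3).equals(s.iloc[:-3]))   # True
print(len(s.head(20)))                  # 9 (n larger than the Series length)

# head is positional: labels are returned in stored order, not sorted
unordered = pd.Series([10, 20, 30], index=['z', 'a', 'm'])
print(list(unordered.head(2).index))    # ['z', 'a']
```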

---

### **Returns**

- **Same type as caller**:
  - The first `n` rows of the Series or DataFrame.

---

### **Examples**

#### Example 1: Default Behavior (`n=5`)

```python
import pandas as pd

# Create a Series
s = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90])

# Return the first 5 rows (default)
print(s.head())
```

**Output:**

```
0    10
1    20
2    30
3    40
4    50
dtype: int64
```

---

#### Example 2: Specifying `n`

```python
# Return the first 3 rows
print(s.head(3))
```

**Output:**

```
0    10
1    20
2    30
dtype: int64
```

---

#### Example 3: Negative `n`

```python
# Return all rows except the last 3
print(s.head(-3))
```

**Output:**

```
0    10
1    20
2    30
3    40
4    50
5    60
dtype: int64
```

---

#### Example 4: `n` Larger Than the Number of Rows

```python
# Return all rows (n is larger than the number of rows)
print(s.head(20))
```

**Output:**

```
0    10
1    20
2    30
3    40
4    50
5    60
6    70
7    80
8    90
dtype: int64
```

---

### **Examples with DataFrame**

#### Example 5: Default Behavior (`n=5`)

```python
# Create a DataFrame
df = pd.DataFrame({
    'animal': ['alligator', 'bee', 'falcon', 'lion', 'monkey', 'parrot', 'shark', 'whale', 'zebra']
})

# Return the first 5 rows (default)
print(df.head())
```

**Output:**

```
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
```

---

#### Example 6: Specifying `n`

```python
# Return the first 3 rows
print(df.head(3))
```

**Output:**

```
      animal
0  alligator
1        bee
2     falcon
```

---

#### Example 7: Negative `n`

```python
# Return all rows except the last 3
print(df.head(-3))
```

**Output:**

```
      animal
0  alligator
1        bee
2     falcon
3       lion
4     monkey
5     parrot
```

---

### **Related Methods**

1. **`tail`**:

   - Returns the last `n` rows of a Series or DataFrame.
   - Example:
     ```python
     df.tail(3)
     ```

2. **`iloc`**:
   - Allows positional indexing to select rows or columns.
   - Example:
     ```python
     df.iloc[:5]  # Equivalent to df.head(5)
     ```

---

### **Practical Use Case**

The `head` method is commonly used for quick data inspection, especially when working with large datasets. For example:

```python
# Load a large dataset
df = pd.read_csv('large_dataset.csv')

# Inspect the first 5 rows
print(df.head())
```

---

### **Notes**

- The `head` method is a simple and efficient way to preview the beginning of your data.
- For more advanced row selection, you can use `.iloc` or `.loc`.

---

By using `head`, you can quickly verify the structure and content of your data, making it an essential tool for data exploration and debugging.


In [None]:
""" pandas.Series.idxmax
Series.idxmax(axis=0, skipna=True, *args, **kwargs)
Return the row label of the maximum value.

If multiple values equal the maximum, the first row label with that value is returned.

Parameters:
axis : {0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

skipna : bool, default True
Exclude NA/null values. If the entire Series is NA, the result will be NA.

*args, **kwargs
Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
Index
Label of the maximum value.

Raises:
ValueError
If the Series is empty.

See also

numpy.argmax
Return indices of the maximum values along the given axis.

DataFrame.idxmax
Return index of first occurrence of maximum over requested axis.

Series.idxmin
Return index label of the first occurrence of minimum of values.

Notes

This method is the Series version of ndarray.argmax. This method returns the label of the maximum, while ndarray.argmax returns the position. To get the position, use series.values.argmax(). """
s = pd.Series(data=[1, None, 4, 3, 4],
              index=['A', 'B', 'C', 'D', 'E'])
s
s.idxmax()
# If skipna is False and there is an NA value in the data, the function returns nan.

s.idxmax(skipna=False)

The `pandas.Series.idxmax` method is used to return the **row label (index)** of the **maximum value** in a pandas Series. If multiple values are equal to the maximum, the label of the **first occurrence** is returned. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.idxmax(axis=0, skipna=True, *args, **kwargs)
```

---

### **Parameters**

1. **`axis`** : `{0 or 'index'}`

   - This parameter is **unused** for Series. It is included for compatibility with DataFrame.

2. **`skipna`** : `bool`, default `True`

   - If `True`, **exclude** `NaN` (missing) values when searching for the maximum.
   - If `False`, the result will be `NaN` if there are any `NaN` values in the Series.

3. **`*args, **kwargs`**:
   - Additional arguments and keywords are accepted for compatibility with NumPy but have no effect.

---

### **Returns**

- **Index**:
  - The label (index) of the **first occurrence** of the maximum value in the Series.
  - If all values are `NaN` and `skipna=True`, the result will be `NaN`.
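The difference between a label and a position is worth a quick sketch: `idxmax` returns the index label, while `Series.argmax` returns the integer position. The data below is assumed for illustration.

```python
import pandas as pd

s = pd.Series([1, 3, 2, 5, 4], index=['A', 'B', 'C', 'D', 'E'])

label = s.idxmax()            # index label of the maximum
position = int(s.argmax())    # integer position of the maximum

print(label)      # D
print(position)   # 3

# Both point at the same element
print(s.loc[label] == s.iloc[position])  # True
```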

---

### **Raises**

- **`ValueError`**:
  - If the Series is **empty**.

---

### **Examples**

#### Example 1: Basic Usage

```python
import pandas as pd

# Create a Series
s = pd.Series([1, 3, 2, 5, 4], index=['A', 'B', 'C', 'D', 'E'])

# Find the index of the maximum value
max_index = s.idxmax()
print(max_index)
```

**Output:**

```
D
```

- The maximum value is `5`, and its index is `'D'`.

---

#### Example 2: Multiple Maximum Values

```python
# Create a Series with multiple maximum values
s = pd.Series([1, 5, 2, 5, 4], index=['A', 'B', 'C', 'D', 'E'])

# Find the index of the first occurrence of the maximum value
max_index = s.idxmax()
print(max_index)
```

**Output:**

```
B
```

- The maximum value is `5`, and the first occurrence is at index `'B'`.

---

#### Example 3: Handling NaN Values

```python
# Create a Series with NaN values
s = pd.Series([1, None, 4, 3, 4], index=['A', 'B', 'C', 'D', 'E'])

# Find the index of the maximum value (skipna=True by default)
max_index = s.idxmax()
print(max_index)
```

**Output:**

```
C
```

- The maximum value is `4`, and the first occurrence is at index `'C'`.

---

#### Example 4: Handling NaN Values with `skipna=False`

```python
# Find the index of the maximum value (skipna=False)
max_index = s.idxmax(skipna=False)
print(max_index)
```

**Output:**

```
nan
```

- Since `skipna=False` and there is a `NaN` value in the Series, the result is `NaN`.

---

#### Example 5: Empty Series

```python
# Create an empty Series (an explicit dtype avoids relying on the default for empty data)
s = pd.Series([], dtype='float64')

# Attempt to find the index of the maximum value
try:
    max_index = s.idxmax()
except ValueError as e:
    print(e)
```

**Output:**

```
attempt to get argmax of an empty sequence
```

- A `ValueError` is raised because the Series is empty.
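
If an empty or all-missing Series is a realistic input, one option is to guard the call instead of catching the exception. The sketch below uses a hypothetical helper, `safe_idxmax` (not part of pandas), that drops missing values first and returns a caller-supplied default when nothing is left to compare.

```python
import pandas as pd


def safe_idxmax(series, default=None):
    """Hypothetical helper: label of the maximum, or `default` if no valid value exists."""
    valid = series.dropna()      # ignore missing values explicitly
    if valid.empty:              # empty or all-NaN input: nothing to compare
        return default
    return valid.idxmax()


empty_result = safe_idxmax(pd.Series([], dtype="float64"))
nan_result = safe_idxmax(pd.Series([float("nan"), float("nan")]))
normal_result = safe_idxmax(pd.Series([1.0, None, 4.0], index=["A", "B", "C"]))
print(empty_result, nan_result, normal_result)
```

Returning a default keeps the calling code free of `try`/`except`, at the cost of hiding the difference between "empty" and "all missing"; whether that trade-off is acceptable depends on the application.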

---

### **Related Methods**

1. **`Series.idxmin`**:

   - Returns the index of the **minimum value** in the Series.
   - Example:
     ```python
     s.idxmin()
     ```

2. **`DataFrame.idxmax`**:

   - Returns the index of the maximum value for each row or column in a DataFrame.
   - Example:
     ```python
     df.idxmax(axis=0)  # Row label of the maximum in each column
     ```

3. **`numpy.argmax`**:

   - Returns the **position** (integer index) of the maximum value in a NumPy array.
   - Example:
     ```python
     import numpy as np
     np.argmax(s.values)
     ```

4. **`Series.values.argmax()`**:
   - Returns the **position** (integer index) of the maximum value in the underlying NumPy array of the Series.
   - Example:
     ```python
     s.values.argmax()
     ```

---

### **Notes**

- The `idxmax` method is useful for finding the **label** of the maximum value in a Series.
- If you need the **position** (integer index) of the maximum value, use `s.values.argmax()`.
- By default, `NaN` values are ignored. If you want to include `NaN` values in the calculation, set `skipna=False`.
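
The difference between a label and a position can be seen side by side in a small sketch. The Series and its labels here are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 30, 20], index=["x", "y", "z"])

label = s.idxmax()              # index label of the maximum
position = s.values.argmax()    # integer position in the underlying array

# Both routes reach the same element
assert s.loc[label] == s.iloc[position] == 30
print(label, position)
```

Use the label with `.loc` and the position with `.iloc`; mixing the two is a common source of off-by-one or `KeyError` surprises when the index is not a simple `0..n-1` range.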

---

### **Practical Use Case**

Suppose you have a dataset of student scores and want to find the student with the highest score:

```python
# Create a Series of student scores
scores = pd.Series([85, 92, 78, 95, 88], index=['Alice', 'Bob', 'Charlie', 'David', 'Eve'])

# Find the student with the highest score
top_student = scores.idxmax()
print(f"The student with the highest score is: {top_student}")
```

**Output:**

```
The student with the highest score is: David
```

---

By using `idxmax`, you can easily identify the label (index) associated with the maximum value in a Series.


In [None]:
""" pandas.Series.idxmin
Series.idxmin(axis=0, skipna=True, *args, **kwargs)[source]
Return the row label of the minimum value.

If multiple values equal the minimum, the first row label with that value is returned.

Parameters
:
axis
{0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

skipna
bool, default True
Exclude NA/null values. If the entire Series is NA, the result will be NA.

*args, **kwargs
Additional arguments and keywords have no effect but might be accepted for compatibility with NumPy.

Returns
:
Index
Label of the minimum value.

Raises
:
ValueError
If the Series is empty.

See also

numpy.argmin
Return indices of the minimum values along the given axis.

DataFrame.idxmin
Return index of first occurrence of minimum over requested axis.

Series.idxmax
Return index label of the first occurrence of maximum of values.

Notes

This method is the Series version of ndarray.argmin. This method returns the label of the minimum, while ndarray.argmin returns the position. To get the position, use series.values.argmin(). """

s = pd.Series(data=[1, None, 4, 1],
              index=['A', 'B', 'C', 'D'])
s
s.idxmin()
# If skipna is False and there is an NA value in the data, the function returns nan.

s.idxmin(skipna=False)

The `pandas.Series.idxmin` method is used to return the **row label (index)** of the **minimum value** in a pandas Series. If multiple values are equal to the minimum, the label of the **first occurrence** is returned. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.idxmin(axis=0, skipna=True, *args, **kwargs)
```

---

### **Parameters**

1. **`axis`** : `{0 or 'index'}`

   - This parameter is **unused** for Series. It is included for compatibility with DataFrame.

2. **`skipna`** : `bool`, default `True`

   - If `True`, **exclude** `NaN` (missing) values when searching for the minimum.
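   - If `False`, missing values are not skipped: if the Series contains any `NaN`, the result is `NaN`. Recent pandas 2.x releases also emit a `FutureWarning` in this case, because a future version is expected to raise `ValueError` instead.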

3. **`*args, **kwargs`**:
   - Additional arguments and keywords are accepted for compatibility with NumPy but have no effect.

---

### **Returns**

- **Index**:
  - The label (index) of the **first occurrence** of the minimum value in the Series.
  - If all values are `NaN` and `skipna=True`, the result will be `NaN`.

---

### **Raises**

- **`ValueError`**:
  - If the Series is **empty**.

---

### **Examples**

#### Example 1: Basic Usage

```python
import pandas as pd

# Create a Series
s = pd.Series([5, 3, 2, 1, 4], index=['A', 'B', 'C', 'D', 'E'])

# Find the index of the minimum value
min_index = s.idxmin()
print(min_index)
```

**Output:**

```
D
```

- The minimum value is `1`, and its index is `'D'`.

---

#### Example 2: Multiple Minimum Values

```python
# Create a Series with multiple minimum values
s = pd.Series([1, 5, 2, 1, 4], index=['A', 'B', 'C', 'D', 'E'])

# Find the index of the first occurrence of the minimum value
min_index = s.idxmin()
print(min_index)
```

**Output:**

```
A
```

- The minimum value is `1`, and the first occurrence is at index `'A'`.
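
Because `idxmin` reports only the first occurrence, a different approach is needed when every tied label matters. One possible sketch, using a boolean comparison against `s.min()`:

```python
import pandas as pd

s = pd.Series([1, 5, 2, 1, 4], index=["A", "B", "C", "D", "E"])

first_label = s.idxmin()                      # first occurrence only
all_labels = s.index[s == s.min()].tolist()   # every label tied for the minimum
print(first_label, all_labels)
```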

---

#### Example 3: Handling NaN Values

```python
# Create a Series with NaN values
s = pd.Series([1, None, 4, 3, 1], index=['A', 'B', 'C', 'D', 'E'])

# Find the index of the minimum value (skipna=True by default)
min_index = s.idxmin()
print(min_index)
```

**Output:**

```
A
```

- The minimum value is `1`, and the first occurrence is at index `'A'`.

---

#### Example 4: Handling NaN Values with `skipna=False`

```python
# Find the index of the minimum value (skipna=False)
min_index = s.idxmin(skipna=False)
print(min_index)
```

**Output:**

```
nan
```

- Since `skipna=False` and there is a `NaN` value in the Series, the result is `NaN`.

---

#### Example 5: Empty Series

```python
# Create an empty Series (an explicit dtype avoids relying on the default for empty data)
s = pd.Series([], dtype='float64')

# Attempt to find the index of the minimum value
try:
    min_index = s.idxmin()
except ValueError as e:
    print(e)
```

**Output:**

```
attempt to get argmin of an empty sequence
```

- A `ValueError` is raised because the Series is empty.

---

### **Related Methods**

1. **`Series.idxmax`**:

   - Returns the index of the **maximum value** in the Series.
   - Example:
     ```python
     s.idxmax()
     ```

2. **`DataFrame.idxmin`**:

   - Returns the index of the minimum value for each row or column in a DataFrame.
   - Example:
     ```python
     df.idxmin(axis=0)  # Row label of the minimum in each column
     ```

3. **`numpy.argmin`**:

   - Returns the **position** (integer index) of the minimum value in a NumPy array.
   - Example:
     ```python
     import numpy as np
     np.argmin(s.values)
     ```

4. **`Series.values.argmin()`**:
   - Returns the **position** (integer index) of the minimum value in the underlying NumPy array of the Series.
   - Example:
     ```python
     s.values.argmin()
     ```
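
For the DataFrame variant, the `axis` argument decides which labels come back. The following sketch uses invented data (cities as rows, months as columns) to show both directions.

```python
import pandas as pd

# Hypothetical data: rows are cities, columns are months
df = pd.DataFrame(
    {"jan": [3, 9], "feb": [7, 2]},
    index=["oslo", "rome"],
)

per_column = df.idxmin(axis=0)   # for each column, the row label of its minimum
per_row = df.idxmin(axis=1)      # for each row, the column label of its minimum
print(per_column)
print(per_row)
```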

---

### **Notes**

- The `idxmin` method is useful for finding the **label** of the minimum value in a Series.
- If you need the **position** (integer index) of the minimum value, use `s.values.argmin()`.
- By default, `NaN` values are ignored. If you want to include `NaN` values in the calculation, set `skipna=False`.

---

### **Practical Use Case**

Suppose you have a dataset of student scores and want to find the student with the lowest score:

```python
# Create a Series of student scores
scores = pd.Series([85, 92, 78, 95, 88], index=['Alice', 'Bob', 'Charlie', 'David', 'Eve'])

# Find the student with the lowest score
lowest_student = scores.idxmin()
print(f"The student with the lowest score is: {lowest_student}")
```

**Output:**

```
The student with the lowest score is: Charlie
```

---

By using `idxmin`, you can easily identify the label (index) associated with the minimum value in a Series.


In [None]:
""" pandas.Series.isin
Series.isin(values)[source]
Whether elements in Series are contained in values.

Return a boolean Series showing whether each element in the Series matches an element in the passed sequence of values exactly.

Parameters
:
values
set or list-like
The sequence of values to test. Passing in a single string will raise a TypeError. Instead, turn a single string into a list of one element.

Returns
:
Series
Series of booleans indicating if each element is in values.

Raises
:
TypeError
If values is a string.

See also

DataFrame.isin
Equivalent method on DataFrame. """
s = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama',
               'hippo'], name='animal')
s.isin(['cow', 'llama'])

# To invert the boolean values, use the ~ operator:

~s.isin(['cow', 'llama'])

# Passing a single string as s.isin('llama') will raise an error. Use a list of one element instead:

s.isin(['llama'])
# Strings and integers are distinct and are therefore not comparable:

pd.Series([1]).isin(['1'])
pd.Series([1.1]).isin(['1.1'])

The `pandas.Series.isin` method is used to check whether elements in a Series are contained in a specified sequence of values. It returns a boolean Series where `True` indicates that the element is present in the sequence, and `False` indicates that it is not. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.isin(values)
```

---

### **Parameters**

1. **`values`** : `set` or `list-like`
   - The sequence of values to test against the Series.
   - If a single string is passed, it will raise a `TypeError`. Instead, pass a list containing the single string.

---

### **Returns**

- **Series**:
  - A boolean Series where each element indicates whether the corresponding element in the original Series is present in `values`.

---

### **Raises**

- **`TypeError`**:
  - If `values` is a single string (instead of a list or set).

---

### **Examples**

#### Example 1: Basic Usage

```python
import pandas as pd

# Create a Series
s = pd.Series(['llama', 'cow', 'llama', 'beetle', 'llama', 'hippo'], name='animal')

# Check if elements are in the list ['cow', 'llama']
result = s.isin(['cow', 'llama'])
print(result)
```

**Output:**

```
0     True
1     True
2     True
3    False
4     True
5    False
Name: animal, dtype: bool
```

- The elements `'llama'` and `'cow'` are present in the list, so their corresponding positions are marked as `True`.

---

#### Example 2: Inverting the Result

You can use the `~` operator to invert the boolean values:

```python
# Invert the boolean values
inverted_result = ~s.isin(['cow', 'llama'])
print(inverted_result)
```

**Output:**

```
0    False
1    False
2    False
3     True
4    False
5     True
Name: animal, dtype: bool
```

- The elements not present in the list (`'beetle'` and `'hippo'`) are now marked as `True`.

---

#### Example 3: Single String in `values`

Passing a single string directly will raise a `TypeError`. Instead, use a list with a single element:

```python
# Check if elements are equal to 'llama'
result = s.isin(['llama'])
print(result)
```

**Output:**

```
0     True
1    False
2     True
3    False
4     True
5    False
Name: animal, dtype: bool
```
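
To see the failure mode itself, the sketch below passes a bare string, catches the resulting `TypeError`, and then repeats the call with a one-element list. The three-element Series is a shortened stand-in for the animal data.

```python
import pandas as pd

s = pd.Series(["llama", "cow", "beetle"], name="animal")

try:
    s.isin("llama")          # a bare string is rejected
    raised = False
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")

mask = s.isin(["llama"])     # wrap the string in a list instead
print(mask.tolist())
```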

---

#### Example 4: Comparing Numbers and Strings

Strings and integers are distinct, so comparisons between them will return `False`:

```python
# Create a Series of integers (named `numbers` so the animal Series `s` stays available)
numbers = pd.Series([1, 2, 3])

# Check if elements are in the list ['1', '2']
result = numbers.isin(['1', '2'])
print(result)
```

**Output:**

```
0    False
1    False
2    False
dtype: bool
```

- The integers `1` and `2` are not equal to the strings `'1'` and `'2'`.
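
When the candidate values arrive as strings (for example from a CSV file or user input), the types have to be aligned before the membership test. Two possible approaches, sketched with made-up data:

```python
import pandas as pd

numbers = pd.Series([1, 2, 3])
candidates = ["1", "2"]

as_text = numbers.astype(str).isin(candidates)             # compare as strings
as_ints = numbers.isin([int(value) for value in candidates])  # or convert the candidates
print(as_text.tolist(), as_ints.tolist())
```

Converting the candidates is usually the cheaper option, since the list is typically much shorter than the Series.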

---

#### Example 5: Using a Set for `values`

You can also use a set for the `values` parameter:

```python
# Check if elements are in the set {'llama', 'cow'}
result = s.isin({'llama', 'cow'})
print(result)
```

**Output:**

```
0     True
1     True
2     True
3    False
4     True
5    False
Name: animal, dtype: bool
```

---

### **Related Methods**

1. **`DataFrame.isin`**:

   - Equivalent method for DataFrames. Tests each element of the DataFrame for membership in `values` and returns a boolean DataFrame of the same shape.
   - Example:
     ```python
     df.isin(['llama', 'cow'])
     ```

2. **`Series.str.contains`**:
   - Checks if elements contain a substring or match a regex pattern.
   - Example:
     ```python
     s.str.contains('ll')
     ```

---

### **Practical Use Case**

Suppose you have a dataset of animals and want to filter out only specific animals:

```python
# Filter the Series to include only 'llama' and 'cow'
filtered_series = s[s.isin(['llama', 'cow'])]
print(filtered_series)
```

**Output:**

```
0    llama
1      cow
2    llama
4    llama
Name: animal, dtype: object
```

---

### **Notes**

- The `isin` method is case-sensitive. For case-insensitive comparisons, you can use `str.lower()` or `str.upper()`.
- It is a powerful tool for filtering and subsetting data based on specific values.
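
The case-insensitive comparison mentioned above can be sketched by normalising the Series with `str.lower()` before calling `isin`. The mixed-case data and the `wanted` list are assumptions for illustration; the candidate values must already be lowercase.

```python
import pandas as pd

animals = pd.Series(["Llama", "COW", "beetle"], name="animal")
wanted = ["llama", "cow"]                  # already lowercase

mask = animals.str.lower().isin(wanted)    # normalise case, then test membership
print(animals[mask].tolist())
```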

---

By using `isin`, you can efficiently check for membership in a sequence and perform filtering operations on your data.


In [None]:
""" pandas.Series.last

Series.last(offset)

Select final periods of time series data based on a date offset.

Deprecated since version 2.1: last() is deprecated and will be removed in a future version. Please create a mask and filter using .loc instead.

For a DataFrame with a sorted DatetimeIndex, this function selects the last few rows based on a date offset.

Parameters
:
offset
str, DateOffset, dateutil.relativedelta
The offset length of the data that will be selected. For instance, ‘3D’ will display all the rows having their index within the last 3 days.

Returns
:
Series or DataFrame
A subset of the caller.

Raises
:
TypeError
If the index is not a DatetimeIndex

See also

first
Select initial periods of time series based on a date offset.

at_time
Select values at a particular time of the day.

between_time
Select values between particular times of the day.

Notes

Deprecated since version 2.1.0: Please create a mask and filter using .loc instead """

i = pd.date_range('2018-04-09', periods=4, freq='2D')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
ts
# Get the rows for the last 3 days:

ts.last('3D')  

The `pandas.Series.last` method is used to select the **final periods** of time series data based on a date offset. It is particularly useful for filtering data from the end of a time series. However, **this method is deprecated starting from pandas version 2.1** and will be removed in a future version. The recommended approach is to use `.loc` with a mask for filtering.

Below is a detailed explanation of the method, its parameters, and examples.

---

### **Syntax**

```python
Series.last(offset)
```

---

### **Parameters**

1. **`offset`** : `str`, `DateOffset`, or `dateutil.relativedelta`
   - Specifies the offset length of the data to be selected.
   - Examples:
     - `'3D'` : Selects the last 3 calendar days.
     - `'1ME'` : Selects the last month of data (the `ME` month-end alias requires pandas 2.2 or later; earlier versions use `'1M'`).

---

### **Returns**

- **Series or DataFrame**:
  - A subset of the original object containing the final periods of the time series based on the specified offset.

---

### **Raises**

- **`TypeError`**:
  - If the index of the Series or DataFrame is not a `DatetimeIndex`.

---

### **Deprecation Notice**

- The `last` method is deprecated starting from pandas version 2.1.
- The recommended alternative is to use `.loc` with a mask for filtering.

---

### **Examples**

#### Example 1: Using `last` with a Time Series

```python
import pandas as pd

# Create a time series with a DatetimeIndex
index = pd.date_range('2023-01-01', periods=6, freq='2D')
ts = pd.Series([1, 2, 3, 4, 5, 6], index=index)

print("Original Series:")
print(ts)

# Select the last 3 days of data
result = ts.last('3D')
print("\nLast 3 days of data:")
print(result)
```

**Output:**

```
Original Series:
2023-01-01    1
2023-01-03    2
2023-01-05    3
2023-01-07    4
2023-01-09    5
2023-01-11    6
Freq: 2D, dtype: int64

Last 3 days of data:
2023-01-09    5
2023-01-11    6
Freq: 2D, dtype: int64
```

- The `last('3D')` method selects data within the last 3 calendar days (`2023-01-09` to `2023-01-11`).

---

#### Example 2: Using `last` with a DataFrame

```python
# Create a DataFrame with a DatetimeIndex
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6]}, index=index)

print("Original DataFrame:")
print(df)

# Select the last 5 days of data
result = df.last('5D')
print("\nLast 5 days of data:")
print(result)
```

**Output:**

```
Original DataFrame:
            A
2023-01-01  1
2023-01-03  2
2023-01-05  3
2023-01-07  4
2023-01-09  5
2023-01-11  6

Last 5 days of data:
            A
2023-01-07  4
2023-01-09  5
2023-01-11  6
```

- The `last('5D')` method selects data within the last 5 calendar days (`2023-01-07` to `2023-01-11`).

---

### **Recommended Alternative (Using `.loc`)**

Since `last` is deprecated, you can achieve the same result using `.loc` with a mask.

#### Example: Using `.loc` to Filter Data

```python
# Filter the last 3 days of data using .loc
# A strict ">" excludes a row exactly 3 days before the last timestamp, matching last('3D')
mask = ts.index > ts.index[-1] - pd.Timedelta('3D')
result = ts.loc[mask]

print("Last 3 days of data (using .loc):")
print(result)
```

**Output:**

```
Last 3 days of data (using .loc):
2023-01-09    5
2023-01-11    6
dtype: int64
```

---

### **Related Methods**

1. **`first`**:

   - Selects the initial periods of time series data based on a date offset.
   - Example:
     ```python
     ts.first('3D')
     ```

2. **`at_time`**:

   - Selects values at a specific time of the day.
   - Example:
     ```python
     ts.at_time('09:00')
     ```

3. **`between_time`**:
   - Selects values between specific times of the day.
   - Example:
     ```python
     ts.between_time('09:00', '12:00')
     ```

---

### **Notes**

- The `last` method is useful for quickly filtering time series data based on a date offset.
- However, due to its deprecation, it is recommended to use `.loc` with a mask for filtering.
- Ensure that the index of the Series or DataFrame is a `DatetimeIndex` before using this method.
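
The mask-based replacement can be wrapped in a small function that works for both Series and DataFrame. This is a sketch only: `tail_by_offset` is a hypothetical name, it assumes a non-empty object with a sorted `DatetimeIndex`, and it handles fixed-length offsets such as `'3D'` that `pd.Timedelta` accepts (not calendar offsets like month ends).

```python
import pandas as pd


def tail_by_offset(obj, offset):
    """Hypothetical helper: rows later than (last timestamp - offset)."""
    if not isinstance(obj.index, pd.DatetimeIndex):
        raise TypeError("index must be a DatetimeIndex")
    cutoff = obj.index[-1] - pd.Timedelta(offset)
    return obj.loc[obj.index > cutoff]     # strict comparison excludes the cutoff itself


idx = pd.date_range("2018-04-09", periods=4, freq="2D")
ts = pd.DataFrame({"A": [1, 2, 3, 4]}, index=idx)

tail = tail_by_offset(ts, "3D")            # rows for 2018-04-13 and 2018-04-15
print(tail)
```

As with `last('3D')`, the result covers the last three calendar days, not the last three observations.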

---

By using the recommended `.loc` approach, you can achieve the same functionality in a more future-proof way.


In [None]:
""" pandas.Series.reindex
Series.reindex(index=None, *, axis=None, method=None, copy=None, level=None, fill_value=None, limit=None, tolerance=None)[source]
Conform Series to new index with optional filling logic.

Places NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:
index
array-like, optional
New labels for the index. Preferably an Index object to avoid duplicating data.

axis
int or str, optional
Unused.

method
{None, ‘backfill’/’bfill’, ‘pad’/’ffill’, ‘nearest’}
Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

None (default): don’t fill gaps

pad / ffill: Propagate last valid observation forward to next valid.

backfill / bfill: Use next valid observation to fill gap.

nearest: Use nearest valid observations to fill gap.

copy
bool, default True
Return a new object, even if the passed indexes are the same.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

level
int or name
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
scalar, default np.nan
Value to use for missing values. Defaults to NaN, but can be any “compatible” value.

limit
int, default None
Maximum number of consecutive elements to forward or backward fill.

tolerance
optional
Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Returns:
Series with changed index.
See also

DataFrame.set_index
Set row labels.

DataFrame.reset_index
Remove row labels or move them to new columns.

DataFrame.reindex_like
Change to same indices as other DataFrame.

 """
index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
df = pd.DataFrame({'http_status': [200, 200, 404, 404, 301],
                  'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
                  index=index)
df
new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
             'Chrome']
df.reindex(new_index)

In [None]:
# We can fill in the missing values by passing a value to the keyword fill_value. Because the index is not monotonically increasing or decreasing, we cannot use arguments to the keyword method to fill the NaN values.

df.reindex(new_index, fill_value=0)
df.reindex(new_index, fill_value='missing')
# We can also reindex the columns.

df.reindex(columns=['http_status', 'user_agent'])

# Or we can use “axis-style” keyword arguments

df.reindex(['http_status', 'user_agent'], axis="columns")
# To further illustrate the filling functionality in reindex, we will create a dataframe with a monotonically increasing index (for example, a sequence of dates).

date_index = pd.date_range('1/1/2010', periods=6, freq='D')
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)
df2
date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
df2.reindex(date_index2)
df2.reindex(date_index2, method='bfill')

The `pandas.Series.reindex` method is used to conform a Series to a new index. It allows you to change the index of a Series, optionally filling in missing values using various methods. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.reindex(index=None, *, axis=None, method=None, copy=None, level=None, fill_value=None, limit=None, tolerance=None)
```

---

### **Parameters**

1. **`index`** : `array-like`, optional

   - New labels for the index. Preferably an `Index` object to avoid duplicating data.

2. **`axis`** : `int` or `str`, optional

   - Unused for Series. Included for compatibility with DataFrame.

3. **`method`** : `{None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}`, optional

   - Method to use for filling holes in the reindexed Series:
     - `None` (default): Do not fill gaps.
     - `'pad'` or `'ffill'`: Propagate the last valid observation forward.
     - `'backfill'` or `'bfill'`: Use the next valid observation to fill the gap.
     - `'nearest'`: Use the nearest valid observation to fill the gap.

4. **`copy`** : `bool`, default `True`

   - If `True`, a new object is returned even if the new index is the same as the current one.
   - **Note**: The behavior of `copy` will change in pandas 3.0 with the introduction of Copy-on-Write.

5. **`level`** : `int` or `name`, optional

   - Broadcast across a level, matching Index values on the passed MultiIndex level.

6. **`fill_value`** : `scalar`, default `np.nan`

   - Value to use for missing values. Defaults to `NaN`.

7. **`limit`** : `int`, default `None`

   - Maximum number of consecutive elements to forward or backward fill.

8. **`tolerance`** : optional
   - Maximum distance between original and new labels for inexact matches.
   - Can be a scalar value or list-like (e.g., list, tuple, array, Series).
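
Of the fill methods listed above, `'nearest'` is the one not demonstrated elsewhere in this section. A minimal sketch with an integer index (values and labels invented for illustration, chosen so that no new label is equidistant from two original labels):

```python
import pandas as pd

s = pd.Series(["a", "b", "c"], index=[0, 10, 20])

# Each new label takes the value of the closest original label
nearest = s.reindex([1, 9, 14, 16], method="nearest")
print(nearest.tolist())
```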

---

### **Returns**

- **Series**:
  - A new Series with the specified index. Missing values are filled according to the specified method or `fill_value`.

---

### **Examples**

#### Example 1: Basic Reindexing

```python
import pandas as pd

# Create a Series
s = pd.Series([10, 20, 30], index=['A', 'B', 'C'])

# Reindex with new labels
new_index = ['A', 'B', 'C', 'D']
result = s.reindex(new_index)
print(result)
```

**Output:**

```
A    10.0
B    20.0
C    30.0
D     NaN
dtype: float64
```

- The new index `'D'` has no corresponding value, so it is filled with `NaN`.

---

#### Example 2: Filling Missing Values

```python
# Reindex with a fill value
result = s.reindex(new_index, fill_value=0)
print(result)
```

**Output:**

```
A    10
B    20
C    30
D     0
dtype: int64
```

- The missing value at `'D'` is filled with `0`.

---

#### Example 3: Forward Fill (`ffill`)

```python
# Create a Series with a monotonic index
s = pd.Series([1, 2, 3], index=pd.date_range('2023-01-01', periods=3))

# Reindex with a new index
new_index = pd.date_range('2023-01-01', periods=5)
result = s.reindex(new_index, method='ffill')
print(result)
```

**Output:**

```
2023-01-01    1
2023-01-02    2
2023-01-03    3
2023-01-04    3
2023-01-05    3
Freq: D, dtype: int64
```

- The last valid value (`3`) is propagated forward to fill the gaps.

---

#### Example 4: Backward Fill (`bfill`)

```python
# Reindex with backward fill
result = s.reindex(new_index, method='bfill')
print(result)
```

**Output:**

```
2023-01-01    1.0
2023-01-02    2.0
2023-01-03    3.0
2023-01-04    NaN
2023-01-05    NaN
Freq: D, dtype: float64
```

- The next valid value is used to fill gaps, but since there are no values after `2023-01-03`, the gaps remain `NaN`.

---

#### Example 5: Using `tolerance`

```python
# Reindex with tolerance
result = s.reindex(new_index, method='ffill', tolerance=pd.Timedelta('1D'))
print(result)
```

**Output:**

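```
2023-01-01    1.0
2023-01-02    2.0
2023-01-03    3.0
2023-01-04    3.0
2023-01-05    NaN
Freq: D, dtype: float64
```

- The `tolerance` parameter limits the distance between the original and new labels for filling. `2023-01-04` is exactly one day after the last original label, which is within the tolerance (`abs(index[indexer] - target) <= tolerance`), so it is forward-filled. `2023-01-05` is two days away and stays `NaN`.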

---

### **Related Methods**

1. **`DataFrame.reindex`**:

   - Reindexes both rows and columns of a DataFrame.
   - Example:
     ```python
     df.reindex(index=new_index, columns=new_columns)
     ```

2. **`DataFrame.reindex_like`**:

   - Reindexes a DataFrame to match the index and columns of another DataFrame.
   - Example:
     ```python
     df1.reindex_like(df2)
     ```

3. **`DataFrame.set_index`**:

   - Sets the DataFrame index using existing columns.
   - Example:
     ```python
     df.set_index('column_name')
     ```

4. **`DataFrame.reset_index`**:
   - Resets the index, moving it into a column.
   - Example:
     ```python
     df.reset_index()
     ```

---

### **Practical Use Case**

Suppose you have a time series with missing dates and want to reindex it to include all dates, filling missing values with `0`:

```python
# Create a time series with missing dates
s = pd.Series([10, 20, 30], index=pd.to_datetime(['2023-01-01', '2023-01-03', '2023-01-05']))

# Reindex to include all dates
new_index = pd.date_range('2023-01-01', '2023-01-05')
result = s.reindex(new_index, fill_value=0)
print(result)
```

**Output:**

```
2023-01-01    10
2023-01-02     0
2023-01-03    20
2023-01-04     0
2023-01-05    30
Freq: D, dtype: int64
```

---

### **Notes**

- The `reindex` method is useful for aligning data to a new index, especially when working with time series or hierarchical data.
- Use `fill_value` or `method` to handle missing values during reindexing.
- For advanced use cases, consider using `tolerance` to control the filling logic.
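
The `method` argument only applies to a monotonically increasing or decreasing index. The sketch below, using invented integer labels, shows the error pandas is expected to raise for an unsorted index and one way around it: sort first, then reindex.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=[30, 10, 20])   # index is not sorted

try:
    s.reindex([15, 25], method="ffill")
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

# Sorting the index first makes the forward fill well defined
filled = s.sort_index().reindex([15, 25], method="ffill")
print(filled.tolist())
```

When sorting is not meaningful for the data (as with the browser names in the DataFrame example), `fill_value` remains available because it does not depend on label order.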

---

By using `reindex`, you can easily align your data to a new index while handling missing values in a flexible way.


In [None]:
""" pandas.Series.reindex_like
Series.reindex_like(other, method=None, copy=None, limit=None, tolerance=None)[source]
Return an object with matching indices as other object.

Conform the object to the same index on all axes. Optional filling logic, placing NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False.

Parameters:
other
Object of the same data type
Its row and column indices are used to define the new indices of this object.

method
{None, ‘backfill’/’bfill’, ‘pad’/’ffill’, ‘nearest’}
Method to use for filling holes in reindexed DataFrame. Please note: this is only applicable to DataFrames/Series with a monotonically increasing/decreasing index.

None (default): don’t fill gaps

pad / ffill: propagate last valid observation forward to next valid

backfill / bfill: use next valid observation to fill gap

nearest: use nearest valid observations to fill gap.

copy
bool, default True
Return a new object, even if the passed indexes are the same.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

limit
int, default None
Maximum number of consecutive labels to fill for inexact matches.

tolerance
optional
Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

Tolerance may be a scalar value, which applies the same tolerance to all values, or list-like, which applies variable tolerance per element. List-like includes list, tuple, array, Series, and must be the same size as the index and its dtype must exactly match the index’s type.

Returns:
Series or DataFrame
Same type as caller, but with changed indices on each axis.

See also

DataFrame.set_index
Set row labels.

DataFrame.reset_index
Remove row labels or move them to new columns.

DataFrame.reindex
Change to new indices or expand indices.

Notes

Same as calling .reindex(index=other.index, columns=other.columns,...). """
df1 = pd.DataFrame([[24.3, 75.7, 'high'],
                    [31, 87.8, 'high'],
                    [22, 71.6, 'medium'],
                    [35, 95, 'medium']],
                   columns=['temp_celsius', 'temp_fahrenheit',
                            'windspeed'],
                   index=pd.date_range(start='2014-02-12',
                                       end='2014-02-15', freq='D'))
df1
df2 = pd.DataFrame([[28, 'low'],
                    [30, 'low'],
                    [35.1, 'medium']],
                   columns=['temp_celsius', 'windspeed'],
                   index=pd.DatetimeIndex(['2014-02-12', '2014-02-13',
                                           '2014-02-15']))
df2
df2.reindex_like(df1)

The `pandas.Series.reindex_like` method is used to conform a Series or DataFrame to the **same index** as another object. It is a convenient way to align the indices of two objects, optionally filling in missing values using various methods. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.reindex_like(other, method=None, copy=None, limit=None, tolerance=None)
```

---

### **Parameters**

1. **`other`** : `Series` or `DataFrame`

   - The object whose row and column indices define the new labels of the current object.

2. **`method`** : `{None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}`, optional

   - Method to use for filling holes in the reindexed object:
     - `None` (default): Do not fill gaps.
     - `'pad'` or `'ffill'`: Propagate the last valid observation forward.
     - `'backfill'` or `'bfill'`: Use the next valid observation to fill the gap.
     - `'nearest'`: Use the nearest valid observation to fill the gap.

3. **`copy`** : `bool`, default `True`

   - If `True`, a new object is returned even if the new index is the same as the current one.
   - **Note**: The behavior of `copy` will change in pandas 3.0 with the introduction of Copy-on-Write.

4. **`limit`** : `int`, default `None`

   - Maximum number of consecutive labels to fill for inexact matches.

5. **`tolerance`** : optional
   - Maximum distance between original and new labels for inexact matches.
   - Can be a scalar value or list-like (e.g., list, tuple, array, Series).

---

### **Returns**

- **Series or DataFrame**:
  - A new object with the same index as `other`. Missing values are filled according to the specified method.

---

### **Examples**

#### Example 1: Reindexing a Series to Match Another Series

```python
import pandas as pd

# Create two Series with different indices
s1 = pd.Series([10, 20, 30], index=['A', 'B', 'C'])
s2 = pd.Series([40, 50], index=['A', 'D'])

# Reindex s1 to match the index of s2
result = s1.reindex_like(s2)
print(result)
```

**Output:**

```
A    10.0
D     NaN
dtype: float64
```

- `s1` is conformed to the index of `s2`: labels `B` and `C` are dropped, and `D`, which does not exist in `s1`, is filled with `NaN` (which upcasts the values to `float64`).

---

#### Example 2: Reindexing a DataFrame to Match Another DataFrame

```python
# Create two DataFrames with different indices
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['X', 'Y', 'Z'])
df2 = pd.DataFrame({'A': [7, 8], 'B': [9, 10]}, index=['X', 'W'])

# Reindex df1 to match the index of df2
result = df1.reindex_like(df2)
print(result)
```

**Output:**

```
     A    B
X  1.0  4.0
W  NaN  NaN
```

- Both the index and the columns of `df1` are aligned to those of `df2`. Row `W`, which does not exist in `df1`, is filled with `NaN`.

---

#### Example 3: Using `method` to Fill Missing Values

```python
# Create a Series with a monotonic index and a target Series with a longer index
s1 = pd.Series([1, 2, 3], index=pd.date_range('2023-01-01', periods=3))
s2 = pd.Series(0.0, index=pd.date_range('2023-01-01', periods=5))

# Reindex s1 to match the index of s2, using forward fill
result = s1.reindex_like(s2, method='ffill')
print(result)
```

**Output:**

```
2023-01-01    1
2023-01-02    2
2023-01-03    3
2023-01-04    3
2023-01-05    3
Freq: D, dtype: int64
```

- The last valid value (`3`) is propagated forward to fill the gaps. Because no `NaN` remains, the integer dtype is preserved.

---

#### Example 4: Using `tolerance` for Inexact Matches

```python
# Reindex with tolerance
result = s1.reindex_like(s2, method='ffill', tolerance=pd.Timedelta('1D'))
print(result)
```

**Output:**

```
2023-01-01    1.0
2023-01-02    2.0
2023-01-03    3.0
2023-01-04    3.0
2023-01-05    NaN
Freq: D, dtype: float64
```

- The `tolerance` parameter limits the distance between the original and new labels for filling.
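
To make the tolerance rule concrete, here is a minimal sketch on the same kind of daily-indexed data. It assumes that calling `Series.reindex` with the target index behaves like `reindex_like`, as the pandas Notes describe; the variable names are illustrative only. A label exactly one day away from the last valid label satisfies `abs(index[indexer] - target) <= tolerance` and is filled, while a label two days away is not.

```python
import pandas as pd

# Source data: three consecutive days
s1 = pd.Series([1, 2, 3], index=pd.date_range('2023-01-01', periods=3))
# Target index: five consecutive days (two more than s1 has)
target = pd.date_range('2023-01-01', periods=5)

# Forward fill, but only from a label at most one day away
filled = s1.reindex(target, method='ffill', tolerance=pd.Timedelta('1D'))

# 2023-01-04 is exactly 1 day after the last label in s1, so it is filled
on_boundary = filled[pd.Timestamp('2023-01-04')]
# 2023-01-05 is 2 days away, which exceeds the tolerance, so it stays NaN
beyond = filled[pd.Timestamp('2023-01-05')]

print(on_boundary, beyond)
```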

---

### **Related Methods**

1. **`DataFrame.reindex`**:

   - Reindexes both rows and columns of a DataFrame.
   - Example:
     ```python
     df.reindex(index=new_index, columns=new_columns)
     ```

2. **`DataFrame.set_index`**:

   - Sets the DataFrame index using existing columns.
   - Example:
     ```python
     df.set_index('column_name')
     ```

3. **`DataFrame.reset_index`**:
   - Resets the index, moving it into a column.
   - Example:
     ```python
     df.reset_index()
     ```

---

### **Practical Use Case**

Suppose you have two datasets with different indices and want to align them for comparison or merging:

```python
# Create two DataFrames with different indices
df1 = pd.DataFrame({'A': [1, 2, 3]}, index=['X', 'Y', 'Z'])
df2 = pd.DataFrame({'A': [4, 5]}, index=['X', 'W'])

# Align df1 to match the index of df2
aligned_df1 = df1.reindex_like(df2)
print(aligned_df1)
```

**Output:**

```
     A
X  1.0
W  NaN
```

---

### **Notes**

- The `reindex_like` method is useful for aligning data to the same index as another object.
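- Use `method` to fill missing values during reindexing. `reindex_like` has no `fill_value` parameter; to fill with a constant, call `reindex(..., fill_value=...)` or chain `.fillna(...)` on the result.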
- For advanced use cases, consider using `tolerance` to control the filling logic.
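
As a minimal sketch of the equivalence stated in the pandas Notes (`reindex_like` behaves like `.reindex(index=other.index, columns=other.columns)`), the following compares the two calls. Since the signature shown above has no `fill_value` argument, the last step shows one possible way to fill with a constant by calling `reindex` directly. The data and variable names are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['X', 'Y', 'Z'])
df2 = pd.DataFrame({'A': [7, 8]}, index=['X', 'W'])

# Conform df1 to the rows and columns of df2 in two equivalent ways
via_like = df1.reindex_like(df2)
via_reindex = df1.reindex(index=df2.index, columns=df2.columns)
same = via_like.equals(via_reindex)

# reindex accepts fill_value, so missing labels can get a constant instead of NaN
filled = df1.reindex(index=df2.index, columns=df2.columns, fill_value=0)

print(same)
print(filled)
```

Column `B` is dropped in every result because `df2` has no such column.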

---

By using `reindex_like`, you can easily align your data to match the index of another object, making it a powerful tool for data alignment and comparison.


In [None]:
""" pandas.Series.rename
Series.rename(index=None, *, axis=None, copy=None, inplace=False, level=None, errors='ignore')
Alter Series index labels or name.

Function / dict values must be unique (1-to-1). Labels not contained in a dict / Series will be left as-is. Extra labels listed don’t throw an error.

Alternatively, change Series.name with a scalar value.

See the user guide for more.

Parameters:
index : scalar, hashable sequence, dict-like or function, optional
Functions or dict-like are transformations to apply to the index. Scalar or hashable sequence-like will alter the Series.name attribute.

axis : {0 or ‘index’}
Unused. Parameter needed for compatibility with DataFrame.

copy : bool, default True
Also copy underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

inplace : bool, default False
Whether to modify the Series in place rather than return a new one. If True the value of copy is ignored.

level : int or level name, default None
In case of MultiIndex, only rename labels in the specified level.

errors : {‘ignore’, ‘raise’}, default ‘ignore’
If ‘raise’, raise KeyError when a dict-like mapper or index contains labels that are not present in the index being transformed. If ‘ignore’, existing keys will be renamed and extra keys will be ignored.

Returns:
Series or None
Series with index labels or name altered or None if inplace=True.

See also

DataFrame.rename
Corresponding DataFrame method.

Series.rename_axis
Set the name of the axis. """
s = pd.Series([1, 2, 3])
s
s.rename("my_name")  # scalar, changes Series.name
s.rename(lambda x: x ** 2)  # function, changes labels
s.rename({1: 3, 2: 5})  # mapping, changes labels

The `pandas.Series.rename` method is used to alter the **index labels** or the **name** of a Series. It provides flexibility in renaming by allowing you to use a scalar, function, or dictionary-like object to transform the index labels or the Series name. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.rename(index=None, *, axis=None, copy=None, inplace=False, level=None, errors='ignore')
```

---

### **Parameters**

1. **`index`** : `scalar`, `hashable sequence`, `dict-like`, or `function`, optional

   - Specifies how to rename the index labels or the Series name:
     - **Scalar or hashable sequence**: Changes the `Series.name` attribute.
     - **Function**: Applies a transformation to the index labels.
     - **Dict-like**: Maps old labels to new labels.

2. **`axis`** : `{0 or 'index'}`

   - Unused for Series. Included for compatibility with DataFrame.

3. **`copy`** : `bool`, default `True`

   - If `True`, a copy of the underlying data is made.
   - **Note**: The behavior of `copy` will change in pandas 3.0 with the introduction of Copy-on-Write.

4. **`inplace`** : `bool`, default `False`

   - If `True`, the operation is performed in place (i.e., the original Series is modified), and the method returns `None`.

5. **`level`** : `int` or `level name`, default `None`

   - For MultiIndex, specifies the level to rename.

6. **`errors`** : `{'ignore', 'raise'}`, default `'ignore'`
   - If `'raise'`, a `KeyError` is raised when a dict-like mapper or index contains labels not present in the Series.
   - If `'ignore'`, extra keys are ignored.

---

### **Returns**

- **Series or None**:
  - A new Series with the renamed index labels or name, or `None` if `inplace=True`.

---

### **Examples**

#### Example 1: Renaming the Series Name

```python
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3])

# Rename the Series
renamed_s = s.rename("my_series")
print(renamed_s)
```

**Output:**

```
0    1
1    2
2    3
Name: my_series, dtype: int64
```

- The `Series.name` attribute is changed to `"my_series"`.

---

#### Example 2: Renaming Index Labels Using a Function

```python
# Rename index labels using a function
renamed_s = s.rename(lambda x: x ** 2)
print(renamed_s)
```

**Output:**

```
0    1
1    2
4    3
dtype: int64
```

- The index labels are transformed using the function `lambda x: x ** 2`.

---

#### Example 3: Renaming Index Labels Using a Dictionary

```python
# Rename index labels using a dictionary
renamed_s = s.rename({0: 'a', 1: 'b', 2: 'c'})
print(renamed_s)
```

**Output:**

```
a    1
b    2
c    3
dtype: int64
```

- The index labels are mapped using the dictionary `{0: 'a', 1: 'b', 2: 'c'}`.

---

#### Example 4: Renaming with `errors='raise'`

```python
# Attempt to rename with a dictionary containing invalid keys
try:
    renamed_s = s.rename({0: 'a', 1: 'b', 3: 'c'}, errors='raise')
except KeyError as e:
    print(e)
```

**Output:**

```
'[3] not found in axis'
```

- A `KeyError` is raised because the key `3` is not present in the Series.

---

#### Example 5: Renaming in Place

```python
# Rename the Series in place
s.rename("new_name", inplace=True)
print(s)
```

**Output:**

```
0    1
1    2
2    3
Name: new_name, dtype: int64
```

- The `Series.name` is updated in place.

---

### **Related Methods**

1. **`DataFrame.rename`**:

   - Renames the index, columns, or both of a DataFrame.
   - Example:
     ```python
     df.rename(columns={'old_name': 'new_name'})
     ```

2. **`Series.rename_axis`**:
   - Sets the name of the index or columns axis.
   - Example:
     ```python
     s.rename_axis("index_name")
     ```

---

### **Practical Use Case**

Suppose you have a Series representing sales data and want to rename the index labels for better readability:

```python
# Create a Series with default integer index
sales = pd.Series([100, 200, 300])

# Rename the index labels
sales = sales.rename({0: 'Q1', 1: 'Q2', 2: 'Q3'})
print(sales)
```

**Output:**

```
Q1    100
Q2    200
Q3    300
dtype: int64
```

---

### **Notes**

- The `rename` method is flexible and supports renaming using scalars, functions, or dictionary-like objects.
- Use `inplace=True` to modify the Series in place.
- For MultiIndex Series, use the `level` parameter to specify which level to rename.
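
The `level` parameter can be illustrated with a small sketch. The MultiIndex, its level names, and the mapping below are hypothetical; the point is only that the mapping is applied to the chosen level and leaves the other level alone.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('b', 1)], names=['letter', 'number']
)
s = pd.Series([10, 20, 30], index=idx)

# Apply the mapping to the 'letter' level only
renamed = s.rename({'a': 'A'}, level='letter')

# The same mapping applied to the 'number' level matches nothing,
# so the labels are left as they are (errors='ignore' is the default)
untouched = s.rename({'a': 'A'}, level='number')

print(list(renamed.index))
print(list(untouched.index))
```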

---

By using `rename`, you can easily modify the index labels or the name of a Series to make your data more readable and meaningful.


In [None]:
""" pandas.Series.rename_axis
Series.rename_axis(mapper=<no_default>, *, index=<no_default>, axis=0, copy=True, inplace=False)
Set the name of the axis for the index or columns.

Parameters:
mapper : scalar, list-like, optional
Value to set the axis name attribute.

index, columns : scalar, list-like, dict-like or function, optional
A scalar, list-like, dict-like or function transformation to apply to that axis’ values. Note that the columns parameter is not allowed if the object is a Series. This parameter only applies to DataFrame type objects.

Use either mapper and axis to specify the axis to target with mapper, or index and/or columns.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The axis to rename. For Series this parameter is unused and defaults to 0.

copy : bool, default None
Also copy underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

inplace : bool, default False
Modifies the object directly, instead of creating a new Series or DataFrame.

Returns:
Series, DataFrame, or None
The same type as the caller or None if inplace=True.


See also

Series.rename
Alter Series index labels or name.

DataFrame.rename
Alter DataFrame index labels or name.

Index.rename
Set new names on index.

Notes

DataFrame.rename_axis supports two calling conventions

(index=index_mapper, columns=columns_mapper, ...)

(mapper, axis={'index', 'columns'}, ...)

The first calling convention will only modify the names of the index and/or the names of the Index object that is the columns. In this case, the parameter copy is ignored.

The second calling convention will modify the names of the corresponding index if mapper is a list or a scalar. However, if mapper is dict-like or a function, it will use the deprecated behavior of modifying the axis labels.

We highly recommend using keyword arguments to clarify your intent.


"""
# Series

import pandas as pd
s = pd.Series(["dog", "cat", "monkey"])
s
s.rename_axis("animal")


In [None]:
# Data Frame
df = pd.DataFrame({"num_legs": [4, 4, 2],
                   "num_arms": [0, 0, 2]},
                  ["dog", "cat", "monkey"])
df
df = df.rename_axis("animal")
df
df = df.rename_axis("limbs", axis="columns")
df

In [None]:
# MultiIndex

df.index = pd.MultiIndex.from_product([['mammal'],
                                       ['dog', 'cat', 'monkey']],
                                      names=['type', 'name'])
df
df.rename_axis(index={'type': 'class'})
df.rename_axis(columns=str.upper)

The `pandas.Series.rename_axis` method is used to set or modify the **name of the axis** (index or columns) for a Series or DataFrame. It allows you to assign a name to the index or columns, making the data more readable and meaningful. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.rename_axis(mapper=<no_default>, *, index=<no_default>, axis=0, copy=True, inplace=False)
```

---

### **Parameters**

1. **`mapper`** : `scalar`, `list-like`, optional

   - Value to set as the name of the axis. This can be a string (for a single name) or a list-like object (for MultiIndex).

2. **`index`** : `scalar`, `list-like`, `dict-like`, or `function`, optional

   - Specifies the name or transformation for the index axis. Use this parameter to rename the index explicitly.

3. **`axis`** : `{0 or 'index', 1 or 'columns'}`, default `0`

   - The axis to rename. For Series, this parameter is unused and defaults to `0` (index).

4. **`copy`** : `bool`, default `True`

   - If `True`, a copy of the underlying data is made.
   - **Note**: The behavior of `copy` will change in pandas 3.0 with the introduction of Copy-on-Write.

5. **`inplace`** : `bool`, default `False`
   - If `True`, the operation is performed in place (i.e., the original object is modified), and the method returns `None`.

---

### **Returns**

- **Series, DataFrame, or None**:
  - The same type as the caller (Series or DataFrame) with the updated axis name, or `None` if `inplace=True`.

---

### **Examples**

#### Example 1: Renaming the Index Axis of a Series

```python
import pandas as pd

# Create a Series
s = pd.Series(["dog", "cat", "monkey"])

# Rename the index axis
renamed_s = s.rename_axis("animal")
print(renamed_s)
```

**Output:**

```
animal
0       dog
1       cat
2    monkey
dtype: object
```

- The index axis is renamed to `"animal"`.

---

#### Example 2: Renaming the Index Axis in Place

```python
# Rename the index axis in place
s.rename_axis("animal", inplace=True)
print(s)
```

**Output:**

```
animal
0       dog
1       cat
2    monkey
dtype: object
```

- The index axis is renamed in place.

---

#### Example 3: Renaming the Columns Axis of a DataFrame

```python
# Create a DataFrame
df = pd.DataFrame({"num_legs": [4, 4, 2], "num_arms": [0, 0, 2]}, index=["dog", "cat", "monkey"])

# Rename the columns axis
renamed_df = df.rename_axis("limbs", axis="columns")
print(renamed_df)
```

**Output:**

```
limbs   num_legs  num_arms
dog            4         0
cat            4         0
monkey         2         2
```

- The columns axis is renamed to `"limbs"`.

---

#### Example 4: Renaming MultiIndex Levels

```python
# Keep the "limbs" name on the columns axis, then give the DataFrame a MultiIndex
df = df.rename_axis("limbs", axis="columns")
df.index = pd.MultiIndex.from_product([['mammal'], ['dog', 'cat', 'monkey']], names=['type', 'name'])

# Rename a specific level of the MultiIndex
renamed_df = df.rename_axis(index={'type': 'class'})
print(renamed_df)
```

**Output:**

```
limbs          num_legs  num_arms
class  name
mammal dog            4         0
       cat            4         0
       monkey         2         2
```

- The `'type'` level of the MultiIndex is renamed to `'class'`.

---

#### Example 5: Transforming the Columns Axis Name with a Function

```python
# Transform the name of the columns axis using a function
renamed_df = df.rename_axis(columns=str.upper)
print(renamed_df)
```

**Output:**

```
LIMBS          num_legs  num_arms
type   name
mammal dog            4         0
       cat            4         0
       monkey         2         2
```

- The name of the columns axis (`limbs`) is transformed with `str.upper`. The column labels themselves (`num_legs`, `num_arms`) are unchanged.

---

### **Related Methods**

1. **`Series.rename`**:

   - Renames the index labels or the Series name.
   - Example:
     ```python
     s.rename("new_name")
     ```

2. **`DataFrame.rename`**:

   - Renames the index, columns, or both of a DataFrame.
   - Example:
     ```python
     df.rename(columns={'old_name': 'new_name'})
     ```

3. **`Index.rename`**:
   - Sets new names on an Index object.
   - Example:
     ```python
     df.index.rename("new_index_name")
     ```

---

### **Practical Use Case**

Suppose you have a DataFrame representing sales data and want to rename the index and columns for better readability:

```python
# Create a DataFrame
sales = pd.DataFrame({"Q1": [100, 200], "Q2": [150, 250]}, index=["Region A", "Region B"])

# Rename the index and columns
sales = sales.rename_axis(index="Region", columns="Quarter")
print(sales)
```

**Output:**

```
Quarter    Q1   Q2
Region
Region A  100  150
Region B  200  250
```

---

### **Notes**

- The `rename_axis` method is useful for setting or modifying the name of the index or columns axis.
- For MultiIndex objects, you can rename specific levels using a dictionary.
- Use `inplace=True` to modify the object directly.
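
Because `rename_axis` and `rename` are easy to confuse, the following sketch contrasts them on one small Series. The label and name values are arbitrary and chosen only for illustration.

```python
import pandas as pd

s = pd.Series(["dog", "cat", "monkey"])

# rename_axis sets the NAME of the index; the labels 0, 1, 2 stay the same
named_axis = s.rename_axis("animal")

# rename with a mapping changes the LABELS; the index name stays unset
new_labels = s.rename({0: "first"})

# rename with a scalar changes the Series name, not the index
new_name = s.rename("pets")

print(named_axis.index.name, list(named_axis.index))
print(new_labels.index.name, list(new_labels.index))
print(new_name.name)
```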

---

By using `rename_axis`, you can make your data more readable and organized by assigning meaningful names to the index and columns.


In [None]:
""" pandas.Series.reset_index
Series.reset_index(level=None, *, drop=False, name=<no_default>, inplace=False, allow_duplicates=False)
Generate a new DataFrame or Series with the index reset.

This is useful when the index needs to be treated as a column, or when the index is meaningless and needs to be reset to the default before another operation.

Parameters:
level : int, str, tuple, or list, optional
For a Series with a MultiIndex, only remove the specified levels from the index. Removes all levels by default.

drop : bool, default False
Just reset the index, without inserting it as a column in the new DataFrame.

name : object, optional
The name to use for the column containing the original Series values. Uses self.name by default. This argument is ignored when drop is True.

inplace : bool, default False
Modify the Series in place (do not create a new object).

allow_duplicates : bool, default False
Allow duplicate column labels to be created.

Added in version 1.5.0.

Returns:
Series or DataFrame or None
When drop is False (the default), a DataFrame is returned. The newly created columns will come first in the DataFrame, followed by the original Series values. When drop is True, a Series is returned. In either case, if inplace=True, no value is returned.

See also

DataFrame.reset_index
Analogous function for DataFrame. """

s = pd.Series([1, 2, 3, 4], name='foo',
              index=pd.Index(['a', 'b', 'c', 'd'], name='idx'))
# Generate a DataFrame with default index.

s.reset_index()
# To specify the name of the new column use name.

s.reset_index(name='values')
# To generate a new Series with the default set drop to True.

s.reset_index(drop=True)
# The level parameter is interesting for Series with a multi-level index.

arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two'])]
s2 = pd.Series(
    range(4), name='foo',
    index=pd.MultiIndex.from_arrays(arrays,
                                    names=['a', 'b']) )
# To remove a specific level from the Index, use level. 

s2.reset_index(level='a')

# If level is not set, all levels are removed from the Index.

s2.reset_index()

The `pandas.Series.reset_index` method is used to reset the index of a Series, optionally turning the index into a column. This is particularly useful when the index needs to be treated as a column or when the index is meaningless and needs to be reset to the default integer index. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.reset_index(level=None, *, drop=False, name=<no_default>, inplace=False, allow_duplicates=False)
```

---

### **Parameters**

1. **`level`** : `int`, `str`, `tuple`, or `list`, optional

   - For a Series with a MultiIndex, specifies which levels of the index to reset. By default, all levels are reset.

2. **`drop`** : `bool`, default `False`

   - If `True`, the index is reset without being added as a column in the resulting DataFrame.
   - If `False`, the index is added as a column in the resulting DataFrame.

3. **`name`** : `object`, optional

   - The name to use for the column containing the original Series values. If not provided, the Series name (`self.name`) is used.
   - This parameter is ignored if `drop=True`.

4. **`inplace`** : `bool`, default `False`

   - If `True`, the operation is performed in place (i.e., the original Series is modified), and the method returns `None`.

5. **`allow_duplicates`** : `bool`, default `False`
   - If `True`, allows duplicate column labels to be created in the resulting DataFrame.
   - Added in pandas version 1.5.0.
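
A short sketch of what `allow_duplicates` guards against, assuming a Series whose name happens to equal its index name (the name `'key'` is made up for this example). Resetting the index would then produce two columns with the same label, which pandas rejects unless duplicates are explicitly allowed.

```python
import pandas as pd

# The Series name and the index name collide on purpose
s = pd.Series([1, 2], name='key', index=pd.Index(['a', 'b'], name='key'))

# Default behaviour: inserting a second 'key' column is refused
try:
    s.reset_index()
    raised = False
except ValueError:
    raised = True

# Explicitly allow the duplicate column label
dup = s.reset_index(allow_duplicates=True)

print(raised)
print(list(dup.columns))
```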

---

### **Returns**

- **Series, DataFrame, or None**:
  - If `drop=False` (default), a **DataFrame** is returned with the index reset and the original index added as a column.
  - If `drop=True`, a **Series** is returned with the index reset to the default integer index.
  - If `inplace=True`, the method returns `None`.

---

### **Examples**

#### Example 1: Resetting the Index (Default Behavior)

```python
import pandas as pd

# Create a Series with a custom index
s = pd.Series([1, 2, 3, 4], name='foo', index=pd.Index(['a', 'b', 'c', 'd'], name='idx'))

# Reset the index
result = s.reset_index()
print(result)
```

**Output:**

```
  idx  foo
0   a    1
1   b    2
2   c    3
3   d    4
```

- The index is reset, and the original index is added as a column named `'idx'`.

---

#### Example 2: Resetting the Index with `drop=True`

```python
# Reset the index without adding it as a column
result = s.reset_index(drop=True)
print(result)
```

**Output:**

```
0    1
1    2
2    3
3    4
Name: foo, dtype: int64
```

- The index is reset to the default integer index, and the original index is not added as a column.

---

#### Example 3: Specifying a Name for the Value Column

```python
# Reset the index and specify a name for the value column
result = s.reset_index(name='values')
print(result)
```

**Output:**

```
  idx  values
0   a       1
1   b       2
2   c       3
3   d       4
```

- The value column is renamed to `'values'`.

---

#### Example 4: Resetting a MultiIndex

```python
# Create a Series with a MultiIndex
arrays = [['bar', 'bar', 'baz', 'baz'], ['one', 'two', 'one', 'two']]
s2 = pd.Series(range(4), name='foo', index=pd.MultiIndex.from_arrays(arrays, names=['a', 'b']))

# Reset a specific level of the MultiIndex
result = s2.reset_index(level='a')
print(result)
```

**Output:**

```
       a  foo
b
one  bar    0
two  bar    1
one  baz    2
two  baz    3
```

- Only the `'a'` level of the MultiIndex is reset and added as a column.
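
The `level` and `drop` parameters can also be combined. The sketch below rebuilds the same MultiIndex Series so that it runs on its own and shows how the return type depends on `drop`; the result variable names are illustrative.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz'], ['one', 'two', 'one', 'two']]
s2 = pd.Series(range(4), name='foo',
               index=pd.MultiIndex.from_arrays(arrays, names=['a', 'b']))

# drop=False (default): level 'a' becomes a column, so the result is a DataFrame
as_column = s2.reset_index(level='a')

# drop=True: level 'a' is discarded, so the result is still a Series indexed by 'b'
dropped = s2.reset_index(level='a', drop=True)

print(type(as_column).__name__, list(as_column.columns))
print(type(dropped).__name__, dropped.index.name)
```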

---

#### Example 5: Resetting All Levels of a MultiIndex

```python
# Reset all levels of the MultiIndex
result = s2.reset_index()
print(result)
```

**Output:**

```
     a    b  foo
0  bar  one    0
1  bar  two    1
2  baz  one    2
3  baz  two    3
```

- All levels of the MultiIndex are reset and added as columns.

---

#### Example 6: Inplace Reset

```python
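# Reset the index in place; this requires drop=True, because a Series
# cannot be turned into a DataFrame in place
s2.reset_index(drop=True, inplace=True)
print(s2)
```

**Output:**

```
0    0
1    1
2    2
3    3
Name: foo, dtype: int64
```

- The Series is modified in place and its index is reset to the default integer index. Calling `s2.reset_index(inplace=True)` without `drop=True` raises a `TypeError`.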

---

### **Related Methods**

1. **`DataFrame.reset_index`**:

   - Resets the index of a DataFrame, optionally turning the index into columns.
   - Example:
     ```python
     df.reset_index()
     ```

2. **`DataFrame.set_index`**:

   - Sets the DataFrame index using existing columns (a Series has no `set_index` method).
   - Example:
     ```python
     df.set_index('column_name')
     ```

3. **`Series.rename`**:
   - Renames the index labels or the Series name.
   - Example:
     ```python
     s.rename("new_name")
     ```

---

### **Practical Use Case**

Suppose you have a Series representing sales data with a custom index and want to reset the index for further analysis:

```python
# Create a Series with a custom index
sales = pd.Series([100, 200, 150, 250], index=['Q1', 'Q2', 'Q3', 'Q4'], name='sales')

# Reset the index and add it as a column
sales_df = sales.reset_index()
print(sales_df)
```

**Output:**

```
  index  sales
0    Q1    100
1    Q2    200
2    Q3    150
3    Q4    250
```

---

### **Notes**

- The `reset_index` method is useful for converting the index into a column or resetting it to the default integer index.
- Use `drop=True` if you want to discard the original index.
- For MultiIndex Series, use the `level` parameter to control which levels are reset.

---

By using `reset_index`, you can easily manipulate the index of a Series to suit your data analysis needs.


In [None]:
""" pandas.Series.sample
Series.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
Return a random sample of items from an axis of object.

You can use random_state for reproducibility.

Parameters:
n : int, optional
Number of items from axis to return. Cannot be used with frac. Default = 1 if frac = None.

frac : float, optional
Fraction of axis items to return. Cannot be used with n.

replace : bool, default False
Allow or disallow sampling of the same row more than once.

weights : str or ndarray-like, optional
Default ‘None’ results in equal probability weighting. If passed a Series, will align with target object on index. Index values in weights not found in sampled object will be ignored and index values in sampled object not in weights will be assigned weights of zero. If called on a DataFrame, will accept the name of a column when axis = 0. Unless weights are a Series, weights must be same length as axis being sampled. If weights do not sum to 1, they will be normalized to sum to 1. Missing values in the weights column will be treated as zero. Infinite values not allowed.

random_state : int, array-like, BitGenerator, np.random.RandomState, np.random.Generator, optional
If int, array-like, or BitGenerator, seed for random number generator. If np.random.RandomState or np.random.Generator, use as given.

Changed in version 1.4.0: np.random.Generator objects now accepted

axis : {0 or ‘index’, 1 or ‘columns’, None}, default None
Axis to sample. Accepts axis number or name. Default is stat axis for given data type. For Series this parameter is unused and defaults to None.

ignore_index : bool, default False
If True, the resulting index will be labeled 0, 1, …, n - 1.

Added in version 1.3.0.

Returns:
Series or DataFrame
A new object of same type as caller containing n items randomly sampled from the caller object.

See also

DataFrameGroupBy.sample
Generates random samples from each group of a DataFrame object.

SeriesGroupBy.sample
Generates random samples from each group of a Series object.

numpy.random.choice
Generates a random sample from a given 1-D numpy array.

Notes

If frac > 1, replacement should be set to True. """
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])
df
# Extract 3 random elements from the Series df['num_legs']
df['num_legs'].sample(n=3, random_state=1)
df.sample(frac=0.5, replace=True, random_state=1)
df.sample(frac=2, replace=True, random_state=1)
df.sample(n=2, weights='num_specimen_seen', random_state=1)

The `pandas.Series.sample` method is used to return a random sample of items from a Series. It is useful for tasks like random sampling, bootstrapping, or creating training/test datasets. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)
```

---

### **Parameters**

1. **`n`** : `int`, optional

   - Number of items to return. Cannot be used with `frac`.
   - Default is `1` if `frac` is `None`.

2. **`frac`** : `float`, optional

   - Fraction of items to return. For example, `frac=0.5` returns 50% of the items.
   - Cannot be used with `n`.

3. **`replace`** : `bool`, default `False`

   - If `True`, sampling is done with replacement (i.e., the same item can be sampled more than once).
   - If `False`, sampling is done without replacement.

4. **`weights`** : `str` or `ndarray-like`, optional

   - Weights for each item in the Series. Items with higher weights are more likely to be sampled.
   - If a string is passed, it is interpreted as a column name (for DataFrames).
   - Weights are automatically normalized to sum to 1.

5. **`random_state`** : `int`, `array-like`, `BitGenerator`, `np.random.RandomState`, or `np.random.Generator`, optional

   - Seed for the random number generator to ensure reproducibility.
   - If an integer or array-like is passed, it is used as a seed.
   - If a `RandomState` or `Generator` object is passed, it is used directly.

6. **`axis`** : `{0 or 'index', 1 or 'columns', None}`, default `None`

   - Axis to sample. For Series, this parameter is unused and defaults to `None`.

7. **`ignore_index`** : `bool`, default `False`
   - If `True`, the resulting index is labeled `0, 1, ..., n-1`.
   - Added in pandas version 1.3.0.
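
The interaction between `n`, `frac`, and `replace` described above can be sketched as follows. The Series is a toy example, and the checks look only at sample sizes and raised errors, not at which items are drawn.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# frac=0.5 of 5 items asks for 2.5 items; pandas rounds this to a whole number
half = s.sample(frac=0.5, random_state=1)

# frac > 1 (upsampling) only works when sampling with replacement
double = s.sample(frac=2, replace=True, random_state=1)

# n and frac are mutually exclusive; passing both raises ValueError
try:
    s.sample(n=2, frac=0.5)
    both_allowed = True
except ValueError:
    both_allowed = False

# Upsampling without replacement raises ValueError as well
try:
    s.sample(frac=2)
    upsample_allowed = True
except ValueError:
    upsample_allowed = False

print(len(half), len(double), both_allowed, upsample_allowed)
```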

---

### **Returns**

- **Series**:
  - A new Series containing the randomly sampled items.

---

### **Examples**

#### Example 1: Basic Random Sampling

```python
import pandas as pd

# Create a Series
s = pd.Series([10, 20, 30, 40, 50])

# Randomly sample 3 items
sampled_s = s.sample(n=3, random_state=1)
print(sampled_s)
```

**Output (illustrative; actual draws may differ):**

```
1    20
4    50
0    10
dtype: int64
```

- The method returns 3 randomly sampled items.

---

#### Example 2: Sampling with Replacement

```python
# Randomly sample 5 items with replacement
sampled_s = s.sample(n=5, replace=True, random_state=1)
print(sampled_s)
```

**Output (illustrative; actual draws may differ):**

```
1    20
4    50
0    10
3    40
1    20
dtype: int64
```

- With `replace=True`, the same item can be drawn more than once, as `20` is here.
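
One consequence of sampling without replacement is that the sample cannot be larger than the Series. The sketch below (variable names are illustrative, not from the pandas docs) shows the `ValueError` pandas raises in that case, and that `replace=True` lifts the restriction.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Without replacement, asking for more items than exist is an error
try:
    s.sample(n=8, random_state=1)
    oversample_failed = False
except ValueError:
    oversample_failed = True
print(oversample_failed)

# With replacement any sample size is allowed; 8 draws from 5 items must repeat
big_sample = s.sample(n=8, replace=True, random_state=1)
print(len(big_sample))
print(big_sample.index.has_duplicates)
```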

---

#### Example 3: Sampling a Fraction of Items

```python
# Randomly sample 50% of the items
sampled_s = s.sample(frac=0.5, random_state=1)
print(sampled_s)
```

**Output (illustrative; actual draws may differ):**

```
1    20
4    50
dtype: int64
```

- The sample size is `round(frac * len(s))`. Here `0.5 * 5 = 2.5`, which Python's `round` takes to `2`, so two items are returned.
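
How the sample size is derived from `frac` can be checked directly. This is a minimal sketch that assumes pandas computes the size as `round(frac * len(s))`, so ties go to the nearest even integer rather than always down; the two Series are made up for illustration.

```python
import pandas as pd

five = pd.Series(range(5))
seven = pd.Series(range(7))

# 0.5 * 5 = 2.5 and 0.5 * 7 = 3.5; round() sends ties to the even integer
size_from_five = len(five.sample(frac=0.5, random_state=1))
size_from_seven = len(seven.sample(frac=0.5, random_state=1))
print(size_from_five, size_from_seven)
```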

---

#### Example 4: Sampling with Weights

```python
# Randomly sample 2 items with weights
weights = [0.1, 0.2, 0.3, 0.2, 0.2]
sampled_s = s.sample(n=2, weights=weights, random_state=1)
print(sampled_s)
```

**Output (illustrative; actual draws may differ):**

```
2    30
4    50
dtype: int64
```

- Items with higher weights are more likely to be sampled.
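
A deterministic way to see weights at work is to give some items a weight of zero. In this sketch (the weight values are arbitrary) only two items have non-zero weight, so a sample of two without replacement must contain exactly those items. The weights do not sum to 1, which is fine because pandas normalizes them.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Items with weight 0 can never be drawn; the rest are rescaled to sum to 1
weights = [0, 0, 5, 5, 0]
picked = s.sample(n=2, weights=weights, random_state=1)
print(sorted(picked.index))
```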

---

#### Example 5: Sampling with `ignore_index`

```python
# Randomly sample 3 items and reset the index
sampled_s = s.sample(n=3, random_state=1, ignore_index=True)
print(sampled_s)
```

**Output (illustrative; actual draws may differ):**

```
0    20
1    50
2    10
dtype: int64
```

- The index is reset to `0, 1, 2`.

---

### **Related Methods**

1. **`DataFrame.sample`**:

   - Randomly samples rows or columns from a DataFrame.
   - Example:
     ```python
     df.sample(n=5)
     ```

2. **`DataFrameGroupBy.sample`**:

   - Randomly samples from each group of a DataFrame.
   - Example:
     ```python
     df.groupby('column').sample(n=2)
     ```

3. **`SeriesGroupBy.sample`**:

   - Randomly samples from each group of a Series.
   - Example:
     ```python
     s.groupby(level=0).sample(n=2)
     ```

4. **`numpy.random.choice`**:
   - Randomly samples from a given 1-D array.


In [None]:
""" pandas.Series.set_axis
Series.set_axis(labels, *, axis=0, copy=None)[source]
Assign desired index to given axis.

Indexes for row labels can be changed by assigning a list-like or Index.

Parameters:
labels : list-like, Index
The values for the new index.

axis : {0 or ‘index’}, default 0
The axis to update. The value 0 identifies the rows. For Series this parameter is unused and defaults to 0.

copy : bool, default True
Whether to make a copy of the underlying data.

Note

The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.

You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:
Series
An object of type Series.


Series.rename_axis
Alter the name of the index. """

s = pd.Series([1, 2, 3])
s
s.set_axis(['a', 'b', 'c'], axis=0)

The `pandas.Series.set_axis` method is used to assign a new index to a Series. It allows you to replace the existing index with a new set of labels. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.set_axis(labels, *, axis=0, copy=None)
```

---

### **Parameters**

1. **`labels`** : `list-like` or `Index`

   - The new index labels to assign to the Series.

2. **`axis`** : `{0 or 'index'}`, default `0`

   - The axis to update. For Series, this parameter is unused and defaults to `0`.

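3. **`copy`** : `bool`, optional
   - Whether to copy the underlying data. The signature default is `None`, which the pandas documentation describes as behaving like `True`.
   - **Note**: With Copy-on-Write enabled by default in pandas 3.0, methods defer the copy lazily and ignore the `copy` keyword, which will be removed in a future version.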

---

### **Returns**

- **Series**:
  - A new Series with the updated index.

---

### **Examples**

#### Example 1: Basic Usage

```python
import pandas as pd

# Create a Series
s = pd.Series([1, 2, 3])

# Set a new index
new_s = s.set_axis(['a', 'b', 'c'])
print(new_s)
```

**Output:**

```
a    1
b    2
c    3
dtype: int64
```

- The index is updated to `['a', 'b', 'c']`.
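
Two properties of `set_axis` are easy to verify: it returns a new Series rather than changing the original, and the number of labels has to match the length of the Series. A small sketch, with illustrative variable names:

```python
import pandas as pd

s = pd.Series([1, 2, 3])
relabeled = s.set_axis(['a', 'b', 'c'])

# The original Series keeps its default integer index
print(list(s.index))
print(list(relabeled.index))

# Passing the wrong number of labels raises ValueError
try:
    s.set_axis(['a', 'b'])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```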

---

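#### Example 2: Updating the Original Variable

```python
# set_axis has no inplace option in pandas 2.0+ (see the signature above),
# so reassign the result to update the variable
s = s.set_axis(['a', 'b', 'c'])
print(s)
```

**Output:**

```
a    1
b    2
c    3
dtype: int64
```

- `set_axis` always returns a new Series. The `inplace` keyword was removed in pandas 2.0, so reassign the result (or assign to `s.index` directly) to update `s`.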

---

#### Example 3: Using `copy=False`

```python
# Set a new index without copying the data
new_s = s.set_axis(['x', 'y', 'z'], copy=False)
print(new_s)
```

**Output:**

```
x    1
y    2
z    3
dtype: int64
```

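- The index is updated. With `copy=False` the result may share its data with `s` instead of copying it; once Copy-on-Write is the default (pandas 3.0), the keyword is ignored.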

---

### **Related Methods**

1. **`Series.rename_axis`**:

   - Sets the name of the index or columns axis.
   - Example:
     ```python
     s.rename_axis("index_name")
     ```

2. **`Series.rename`**:

   - Renames the index labels or the Series name.
   - Example:
     ```python
     s.rename({0: 'a', 1: 'b', 2: 'c'})
     ```

3. **`Series.reset_index`**:
   - Resets the index, optionally turning it into a column.
   - Example:
     ```python
     s.reset_index()
     ```
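
The related methods differ mainly in what they touch: the labels, only some labels, the name of the index, or the index as a whole. The sketch below puts them side by side on one made-up Series so the differences are visible.

```python
import pandas as pd

s = pd.Series([1, 2, 3], name='value')

replaced = s.set_axis(['a', 'b', 'c'])   # whole index swapped, by position
mapped = s.rename({0: 'a', 2: 'c'})      # only the labels found in the mapping change
named = s.rename_axis('id')              # labels unchanged, the index gets a name
flattened = replaced.reset_index()       # index becomes a column, giving a DataFrame

print(list(replaced.index))
print(list(mapped.index))
print(named.index.name)
print(list(flattened.columns))
```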

---

### **Practical Use Case**

Suppose you have a Series with a default integer index and want to assign meaningful labels:

```python
# Create a Series
sales = pd.Series([100, 200, 150, 250], index=[0, 1, 2, 3])

# Set a new index with meaningful labels
sales = sales.set_axis(['Q1', 'Q2', 'Q3', 'Q4'])
print(sales)
```

**Output:**

```
Q1    100
Q2    200
Q3    150
Q4    250
dtype: int64
```

---

### **Notes**

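- The `set_axis` method replaces the entire index of a Series with a new set of labels; the number of labels must equal the length of the Series.
- `set_axis` returns a new Series. It has no `inplace` option in pandas 2.0 and later, so assign the result back to a variable to keep it.
- For a MultiIndex Series, `set_axis` still replaces the whole index. It does not update individual levels; for that, see `MultiIndex.set_levels`.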

---

By using `set_axis`, you can easily update the index of a Series to make your data more meaningful and organized.


In [None]:
""" pandas.Series.take
Series.take(indices, axis=0, **kwargs)[source]
Return the elements in the given positional indices along an axis.

This means that we are not indexing according to actual values in the index attribute of the object. We are indexing according to the actual position of the element in the object.

Parameters:
indices : array-like
An array of ints indicating which positions to take.

axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
The axis on which to select elements. 0 means that we are selecting rows, 1 means that we are selecting columns. For Series this parameter is unused and defaults to 0.

**kwargs
For compatibility with numpy.take(). Has no effect on the output.

Returns:
same type as caller
An array-like containing the elements taken from the object.



DataFrame.loc
Select a subset of a DataFrame by labels.

DataFrame.iloc
Select a subset of a DataFrame by positions.

numpy.take
Take elements from an array along an axis. """
df = pd.DataFrame([('falcon', 'bird', 389.0),
                   ('parrot', 'bird', 24.0),
                   ('lion', 'mammal', 80.5),
                   ('monkey', 'mammal', np.nan)],
                  columns=['name', 'class', 'max_speed'],
                  index=[0, 2, 3, 1])
df
df.take([0, 3])
df.take([1, 2], axis=1)
df.take([-1, -2])

The `pandas.Series.take` method is used to return elements from a Series based on their **positional indices** (not the actual index labels). This is useful when you want to select elements by their position in the Series, regardless of their index values. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
Series.take(indices, axis=0, **kwargs)
```

---

### **Parameters**

1. **`indices`** : `array-like`

   - A list or array of integers indicating the positions of the elements to select.

2. **`axis`** : `{0 or 'index', 1 or 'columns', None}`, default `0`

   - The axis along which to select elements. For Series, this parameter is unused and defaults to `0`.

3. **`**kwargs`**:
   - For compatibility with `numpy.take()`. Has no effect on the output.

---

### **Returns**

- **Series**:
  - A new Series containing the elements at the specified positions.

---

### **Examples**

#### Example 1: Basic Usage

```python
import pandas as pd

# Create a Series
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Take elements at positions 0 and 2
result = s.take([0, 2])
print(result)
```

**Output:**

```
a    10
c    30
dtype: int64
```

- The elements at positions `0` and `2` are selected, regardless of their index labels.

---

#### Example 2: Using Negative Indices

```python
# Take elements at positions -1 and -2 (last and second-to-last)
result = s.take([-1, -2])
print(result)
```

**Output:**

```
d    40
c    30
dtype: int64
```

- Negative indices count from the end of the Series.
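
Positions passed to `take` may repeat or appear in any order, but they must exist. A brief sketch of both behaviours, using the same kind of Series as above:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Positions may repeat and need not be sorted
repeated = s.take([3, 0, 0])
print(repeated.tolist())

# A position beyond the end of the Series raises IndexError
try:
    s.take([10])
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)
```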

---

#### Example 3: Using `take` with a DataFrame

```python
# Create a DataFrame
df = pd.DataFrame({
    'name': ['falcon', 'parrot', 'lion', 'monkey'],
    'class': ['bird', 'bird', 'mammal', 'mammal'],
    'max_speed': [389.0, 24.0, 80.5, None]
})

# Take rows at positions 0 and 3
result = df.take([0, 3])
print(result)
```

**Output:**

```
     name   class  max_speed
0  falcon    bird      389.0
3  monkey  mammal        NaN
```

- The rows at positions `0` and `3` are selected.

---

#### Example 4: Taking Columns

```python
# Take columns at positions 1 and 2
result = df.take([1, 2], axis=1)
print(result)
```

**Output:**

```
    class  max_speed
0    bird      389.0
1    bird       24.0
2  mammal       80.5
3  mammal        NaN
```

- The columns at positions `1` and `2` are selected.

---

### **Related Methods**

1. **`Series.iloc`**:

   - Selects elements by their integer position.
   - Example:
     ```python
     s.iloc[[0, 2]]
     ```

2. **`Series.loc`**:

   - Selects elements by their index labels.
   - Example:
     ```python
     s.loc[['a', 'c']]
     ```

3. **`numpy.take`**:
   - Takes elements from an array along an axis.
   - Example:
     ```python
     import numpy as np
     np.take(s.values, [0, 2])
     ```
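
The difference between positional and label-based selection is clearest when the index holds integers that do not match the positions. In this sketch the labels are deliberately shuffled; the data is made up for illustration.

```python
import pandas as pd

# Integer labels that differ from the positions on purpose
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_position = s.take([0])     # first element, whatever its label
same_by_iloc = s.iloc[[0]]    # iloc is positional too
by_label = s.loc[[0]]         # the element whose label is 0

print(by_position.tolist())
print(same_by_iloc.tolist())
print(by_label.tolist())
```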

---

### **Practical Use Case**

Suppose you have a Series and want to select specific elements by their positions:

```python
# Create a Series
data = pd.Series([100, 200, 300, 400, 500], index=['A', 'B', 'C', 'D', 'E'])

# Select elements at positions 1, 3, and 4
selected_data = data.take([1, 3, 4])
print(selected_data)
```

**Output:**

```
B    200
D    400
E    500
dtype: int64
```

---

### **Notes**

- The `take` method is useful for selecting elements by their position, regardless of their index labels.
- Negative indices can be used to count positions from the end of the Series.
- For DataFrames, you can use `axis=1` to select columns by their position.

---

By using `take`, you can easily extract elements from a Series or DataFrame based on their positional indices.


In [None]:
""" pandas.Series.tail
Series.tail(n=5)[source]
Return the last n rows.

This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

For negative values of n, this function returns all rows except the first |n| rows, equivalent to df[|n|:].

If n is larger than the number of rows, this function returns all rows.

Parameters:
n : int, default 5
Number of rows to select.

Returns:
type of caller
The last n rows of the caller object.

DataFrame.head
The first n rows of the caller object.


"""
df = pd.DataFrame({'animal': ['alligator', 'bee', 'falcon', 'lion',
                   'monkey', 'parrot', 'shark', 'whale', 'zebra']})
df

# Viewing the last 5 lines

df.tail()

# Viewing the last n lines (three in this case)

df.tail(3)

# For negative values of n

df.tail(-3)
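
The cell above uses a DataFrame; the same rules apply to `Series.tail`. A short sketch on a made-up Series covers the three cases the docstring describes: positive `n`, negative `n`, and `n` larger than the length.

```python
import pandas as pd

s = pd.Series(['alligator', 'bee', 'falcon', 'lion', 'monkey'])

last_two = s.tail(2)          # the last two items
all_but_first = s.tail(-1)    # everything except the first item
everything = s.tail(100)      # n larger than the Series returns all items

print(last_two.tolist())
print(all_but_first.tolist())
print(all_but_first.equals(s.iloc[1:]))  # negative n matches positional slicing
print(len(everything))
```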



In [None]:
""" pandas.DataFrame.xs
DataFrame.xs(key, axis=0, level=None, drop_level=True)[source]
Return cross-section from the Series/DataFrame.

This method takes a key argument to select data at a particular level of a MultiIndex.

Parameters:
key : label or tuple of label
Label contained in the index, or partially in a MultiIndex.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis to retrieve cross-section on.

level : object, defaults to first n levels (n=1 or len(key))
In case of a key partially contained in a MultiIndex, indicate which levels are used. Levels can be referred by label or position.

drop_level : bool, default True
If False, returns object with same levels as self.

Returns:
Series or DataFrame
Cross-section from the original Series or DataFrame corresponding to the selected index levels.

See also

DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.

DataFrame.iloc
Purely integer-location based indexing for selection by position.

Notes

xs can not be used to set values.

MultiIndex Slicers is a generic way to get/set values on any level or levels. It is a superset of xs functionality, see MultiIndex Slicers. """

d = {'num_legs': [4, 4, 2, 2],
     'num_wings': [0, 0, 2, 2],
     'class': ['mammal', 'mammal', 'mammal', 'bird'],
     'animal': ['cat', 'dog', 'bat', 'penguin'],
     'locomotion': ['walks', 'walks', 'flies', 'walks']}
df = pd.DataFrame(data=d)
df = df.set_index(['class', 'animal', 'locomotion'])
df
# Get values at specified index

df.xs('mammal')

# Get values at several indexes

df.xs(('mammal', 'dog', 'walks'))
# Get values at specified index and level

df.xs('cat', level=1)
# Get values at several indexes and levels

df.xs(('bird', 'walks'),
      level=[0, 'locomotion'])
# Get values at specified column and axis

df.xs('num_wings', axis=1)

The `pandas.DataFrame.xs` method is used to retrieve a **cross-section** from a DataFrame or Series. It is particularly useful when working with **MultiIndex** objects, as it allows you to select data at a specific level or combination of levels. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.xs(key, axis=0, level=None, drop_level=True)
```

---

### **Parameters**

1. **`key`** : `label` or `tuple of labels`

   - The label(s) to select from the index. For a MultiIndex, you can specify a single label or a tuple of labels.

2. **`axis`** : `{0 or 'index', 1 or 'columns'}`, default `0`

   - The axis to retrieve the cross-section from:
     - `0` or `'index'`: Select from the rows (default).
     - `1` or `'columns'`: Select from the columns.

3. **`level`** : `object`, optional

   - For a MultiIndex, specifies the level(s) to use for selection. Levels can be referred to by label or position.
   - If not provided, the key is matched against the first `n` levels, where `n` is 1 for a single label or `len(key)` for a tuple.

4. **`drop_level`** : `bool`, default `True`
   - If `True`, the selected level(s) are dropped from the result.
   - If `False`, the selected level(s) are retained in the result.

---

### **Returns**

- **Series or DataFrame**:
  - A cross-section from the original object corresponding to the selected key and level(s).

---

### **Examples**

#### Example 1: Selecting a Cross-Section from a MultiIndex DataFrame

```python
import pandas as pd

# Create a DataFrame with a MultiIndex
data = {
    'num_legs': [4, 4, 2, 2],
    'num_wings': [0, 0, 2, 2],
    'class': ['mammal', 'mammal', 'mammal', 'bird'],
    'animal': ['cat', 'dog', 'bat', 'penguin'],
    'locomotion': ['walks', 'walks', 'flies', 'walks']
}
df = pd.DataFrame(data)
df = df.set_index(['class', 'animal', 'locomotion'])

print("Original DataFrame:")
print(df)

# Select rows where 'class' is 'mammal'
result = df.xs('mammal')
print("\nCross-section for 'mammal':")
print(result)
```

**Output:**

```
Original DataFrame:
                           num_legs  num_wings
class  animal  locomotion
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2

Cross-section for 'mammal':
                   num_legs  num_wings
animal locomotion
cat    walks              4          0
dog    walks              4          0
bat    flies              2          2
```

---

#### Example 2: Selecting a Specific Combination of Levels

```python
# Select rows where 'class' is 'mammal' and 'animal' is 'dog'
result = df.xs(('mammal', 'dog'))
print("\nCross-section for ('mammal', 'dog'):")
print(result)
```

**Output:**

```
Cross-section for ('mammal', 'dog'):
            num_legs  num_wings
locomotion
walks              4          0
```

---

#### Example 3: Selecting a Cross-Section from Columns

```python
# Select the column 'num_wings'
result = df.xs('num_wings', axis=1)
print("\nCross-section for 'num_wings':")
print(result)
```

**Output:**

```
Cross-section for 'num_wings':
class   animal   locomotion
mammal  cat      walks         0
        dog      walks         0
        bat      flies         2
bird    penguin  walks         2
Name: num_wings, dtype: int64
```

---

#### Example 4: Retaining Levels in the Result

```python
# Select rows where 'class' is 'mammal' and retain the 'class' level
result = df.xs('mammal', level='class', drop_level=False)
print("\nCross-section for 'mammal' (retaining levels):")
print(result)
```

**Output:**

```
Cross-section for 'mammal' (retaining levels):
                           num_legs  num_wings
class  animal  locomotion
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
```
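
The examples so far match keys against the leading levels. The `level` argument also reaches inner or non-adjacent levels, as in the calls from the code cell earlier. The sketch below rebuilds the same DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'num_legs': [4, 4, 2, 2],
    'num_wings': [0, 0, 2, 2],
    'class': ['mammal', 'mammal', 'mammal', 'bird'],
    'animal': ['cat', 'dog', 'bat', 'penguin'],
    'locomotion': ['walks', 'walks', 'flies', 'walks'],
}).set_index(['class', 'animal', 'locomotion'])

# Match a label on the second level, referring to the level by name
cats = df.xs('cat', level='animal')
print(list(cats.index.names))

# Match two non-adjacent levels at once (position 0 and name 'locomotion')
walking_birds = df.xs(('bird', 'walks'), level=[0, 'locomotion'])
print(list(walking_birds.index))
```

The matched levels are dropped from the result, so only the unmatched levels remain in the index.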

---

### **Related Methods**

1. **`DataFrame.loc`**:

   - Access a group of rows and columns by label(s) or a boolean array.
   - Example:
     ```python
     df.loc['mammal']
     ```

2. **`DataFrame.iloc`**:

   - Access a group of rows and columns by integer position.
   - Example:
     ```python
     df.iloc[0]
     ```

3. **`MultiIndex Slicers`**:
   - A generic way to get/set values on any level or levels of a MultiIndex.
   - Example:
     ```python
     df.loc[(slice(None), 'dog'), :]
     ```

---

### **Practical Use Case**

Suppose you have a MultiIndex DataFrame and want to extract data for a specific combination of levels:

```python
# Extract data for 'bird' class and 'penguin' animal
result = df.xs(('bird', 'penguin'))
print(result)
```

**Output:**

```
            num_legs  num_wings
locomotion
walks              2          2
```

---

### **Notes**

- The `xs` method is particularly useful for working with MultiIndex DataFrames.
- Use `drop_level=False` to retain the selected levels in the result.
- For more advanced slicing, consider using `DataFrame.loc` or MultiIndex slicers.

---

By using `xs`, you can easily retrieve cross-sections from a DataFrame or Series, making it a powerful tool for working with hierarchical data.


In [None]:
""" pandas.DataFrame.get
DataFrame.get(key, default=None)[source]
Get item from object for given key (ex: DataFrame column).

Returns default value if not found.

Parameters:
key : object

Returns:
same type as items contained in object
"""

df = pd.DataFrame(
    [
        [24.3, 75.7, "high"],
        [31, 87.8, "high"],
        [22, 71.6, "medium"],
        [35, 95, "medium"],
    ],
    columns=["temp_celsius", "temp_fahrenheit", "windspeed"],
    index=pd.date_range(start="2014-02-12", end="2014-02-15", freq="D"),
)
df 

df.get(["temp_celsius", "windspeed"])
ser = df['windspeed']
df.get(["temp_celsius", "temp_kelvin"], default="default_value")
ser.get('2014-02-10', '[unknown]')

The `pandas.DataFrame.get` method is used to retrieve an item (e.g., a column or a value) from a DataFrame or Series using a key. If the key is not found, it returns a default value instead of raising an error. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.get(key, default=None)
```

---

### **Parameters**

1. **`key`** : `object`

   - The key to retrieve from the object. For a DataFrame, this is typically a column name. For a Series, this is typically an index label.

2. **`default`** : `object`, optional
   - The value to return if the key is not found. Default is `None`.

---

### **Returns**

- **Same type as items contained in the object**:
  - If the key is found, the corresponding item (column, value, etc.) is returned.
  - If the key is not found, the `default` value is returned.

---

### **Examples**

#### Example 1: Retrieving a Column from a DataFrame

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'temp_celsius': [24.3, 31.0, 22.0, 35.0],
    'temp_fahrenheit': [75.7, 87.8, 71.6, 95.0],
    'windspeed': ['high', 'high', 'medium', 'medium']
}, index=pd.date_range(start="2014-02-12", end="2014-02-15", freq="D"))

# Retrieve a single column
column = df.get('temp_celsius')
print(column)
```

**Output:**

```
2014-02-12    24.3
2014-02-13    31.0
2014-02-14    22.0
2014-02-15    35.0
Freq: D, Name: temp_celsius, dtype: float64
```

---

#### Example 2: Retrieving Multiple Columns

```python
# Retrieve multiple columns
columns = df.get(['temp_celsius', 'windspeed'])
print(columns)
```

**Output:**

```
            temp_celsius windspeed
2014-02-12          24.3      high
2014-02-13          31.0      high
2014-02-14          22.0    medium
2014-02-15          35.0    medium
```

---

#### Example 3: Retrieving a Value from a Series

```python
# Create a Series
ser = df['windspeed']

# Retrieve a value using an index label
value = ser.get('2014-02-13')
print(value)
```

**Output:**

```
high
```

---

#### Example 4: Using a Default Value

```python
# Retrieve a non-existent column with a default value
result = df.get('temp_kelvin', default="default_value")
print(result)
```

**Output:**

```
default_value
```
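
The fallback is all-or-nothing when a list of keys is passed: if any requested column is missing, `get` returns the default instead of a partial DataFrame. This mirrors the `temp_kelvin` call in the code cell above; the small DataFrame here is made up so the sketch runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'temp_celsius': [24.3, 31.0],
    'windspeed': ['high', 'medium'],
})

# 'temp_kelvin' does not exist, so the whole lookup falls back to the default
partial = df.get(['temp_celsius', 'temp_kelvin'], default='default_value')
print(partial)

# When every requested column exists, a DataFrame is returned as usual
both = df.get(['temp_celsius', 'windspeed'])
print(list(both.columns))
```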

---

#### Example 5: Retrieving a Non-Existent Index Label

```python
# Retrieve a non-existent index label with a default value
value = ser.get('2014-02-10', default='[unknown]')
print(value)
```

**Output:**

```
[unknown]
```

---

### **Related Methods**

1. **`DataFrame.loc`**:

   - Access a group of rows and columns by label(s) or a boolean array.
   - Example:
     ```python
     df.loc[:, 'temp_celsius']
     ```

2. **`DataFrame.iloc`**:

   - Access a group of rows and columns by integer position.
   - Example:
     ```python
     df.iloc[:, 0]
     ```

3. **`Series.get`**:
   - Retrieve a value from a Series using a key.
   - Example:
     ```python
     ser.get('2014-02-13')
     ```

---

### **Practical Use Case**

Suppose you have a DataFrame and want to safely retrieve a column without worrying about whether it exists:

```python
# Safely retrieve a column
column = df.get('humidity', default='Column not found')
print(column)
```

**Output:**

```
Column not found
```
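
In code that branches on the result, it is often simpler to keep the default of `None` and test for it explicitly. The sketch below is one possible pattern, not a pandas recommendation. It compares with `is None` because truth-testing a Series or DataFrame raises `ValueError`.

```python
import pandas as pd

df = pd.DataFrame({'temp_celsius': [24.3, 31.0, 22.0]})

# Missing column: get returns None, so there is nothing to average
humidity = df.get('humidity')
mean_humidity = None if humidity is None else humidity.mean()

# Existing column: get returns the Series
temp = df.get('temp_celsius')
mean_temp = None if temp is None else temp.mean()

print(mean_humidity, mean_temp)
```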

---

### **Notes**

- The `get` method is useful for safely retrieving items without raising an error if the key is not found.
- Use the `default` parameter to specify a fallback value when the key is missing.
- For more advanced indexing, consider using `DataFrame.loc` or `DataFrame.iloc`.

---

By using `get`, you can handle missing keys gracefully and avoid errors in your code.


In [None]:
""" pandas.DataFrame.isin
DataFrame.isin(values)[source]
Whether each element in the DataFrame is contained in values.

Parameters:
values : iterable, Series, DataFrame or dict
The result will only be true at a location if all the labels match. If values is a Series, that’s the index. If values is a dict, the keys must be the column names, which must match. If values is a DataFrame, then both the index and column labels must match.

Returns:
DataFrame
DataFrame of booleans showing whether each element in the DataFrame is contained in values.

See also

DataFrame.eq
Equality test for DataFrame.

Series.isin
Equivalent method on Series.

Series.str.contains
Test if pattern or regex is contained within a string of a Series or Index. """

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])
df


# When values is a list check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings)

df.isin([0, 2])
# To check if values is not in the DataFrame, use the ~ operator:

~df.isin([0, 2])

# When values is a dict, we can pass values to check for each column separately:

df.isin({'num_wings': [0, 3]})
# When values is a Series or DataFrame the index and column must match. Note that ‘falcon’ does not match based on the number of legs in other.

other = pd.DataFrame({'num_legs': [8, 3], 'num_wings': [0, 2]},
                     index=['spider', 'falcon'])
df.isin(other)

The `pandas.DataFrame.isin` method is used to check whether each element in a DataFrame is contained in a specified set of values. It returns a DataFrame of boolean values (`True` or `False`) indicating whether each element matches any of the values in the provided iterable, Series, DataFrame, or dictionary. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.isin(values)
```

---

### **Parameters**

1. **`values`** : `iterable`, `Series`, `DataFrame`, or `dict`
   - The set of values to check against.
   - If `values` is a **list**, it checks whether each element in the DataFrame is present in the list.
   - If `values` is a **dict**, the keys must be column names, and the values are lists or sets to check against the corresponding columns.
   - If `values` is a **Series**, it is aligned on the DataFrame's index (row labels): each cell is compared with the Series value that has the same row label.
   - If `values` is a **DataFrame**, both the index and column labels must match.

---

### **Returns**

- **DataFrame**:
  - A DataFrame of boolean values (`True` or `False`) indicating whether each element in the original DataFrame is contained in `values`.

---

### **Examples**

#### Example 1: Checking Against a List

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'num_legs': [2, 4],
    'num_wings': [2, 0]
}, index=['falcon', 'dog'])

# Check if elements are in the list [0, 2]
result = df.isin([0, 2])
print(result)
```

**Output:**

```
        num_legs  num_wings
falcon      True       True
dog        False       True
```

- Each value in the DataFrame is tested for membership in the list `[0, 2]`.

---

#### Example 2: Inverting the Result

```python
# Invert the result using the ~ operator
result = ~df.isin([0, 2])
print(result)
```

**Output:**

```
        num_legs  num_wings
falcon     False      False
dog         True      False
```

- The `~` operator negates the boolean values.

---

#### Example 3: Checking Against a Dictionary

```python
# Check each column against a dictionary of values
result = df.isin({'num_wings': [0, 3]})
print(result)
```

**Output:**

```
        num_legs  num_wings
falcon     False      False
dog        False       True
```

- Only the `num_wings` column is checked against the list `[0, 3]`.

---

#### Example 4: Checking Against Another DataFrame

```python
# Create another DataFrame
other = pd.DataFrame({
    'num_legs': [8, 3],
    'num_wings': [0, 2]
}, index=['spider', 'falcon'])

# Check if elements match the other DataFrame
result = df.isin(other)
print(result)
```

**Output:**

```
        num_legs  num_wings
falcon     False       True
dog        False      False
```

- Only the `num_wings` value `2` in the `falcon` row matches the other DataFrame.
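
Passing a Series works in a similar way, except that alignment happens on the row labels only. In this sketch the Series (a made-up one) holds one expected value per animal, and every cell in a row is compared with that row's value.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])

# One expected value per row label: 'falcon' cells are compared with 2, 'dog' cells with 0
expected = pd.Series({'falcon': 2, 'dog': 0})
result = df.isin(expected)
print(result)
```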

---

### **Related Methods**

1. **`DataFrame.eq`**:

   - Performs element-wise equality check.
   - Example:
     ```python
     df.eq(2)
     ```

2. **`Series.isin`**:

   - Checks whether elements in a Series are contained in a set of values.
   - Example:
     ```python
     df['num_legs'].isin([2, 4])
     ```

3. **`Series.str.contains`**:
   - Checks whether a pattern or regex is contained within strings in a Series.
   - Example:
     ```python
     pd.Series(['cat', 'dog']).str.contains('cat')
     ```

---

### **Practical Use Case**

Suppose you have a DataFrame and want to filter rows where specific columns match certain values:

```python
# Filter rows where 'num_legs' is 2 or 4 and 'num_wings' is 0 or 2
filtered_df = df[df.isin({'num_legs': [2, 4], 'num_wings': [0, 2]}).all(axis=1)]
print(filtered_df)
```

**Output:**

```
        num_legs  num_wings
falcon         2          2
dog            4          0
```
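
Two other common filters follow the same idea: testing a single column with `Series.isin`, and keeping rows where at least one cell matches by using `any` instead of `all`. The sketch uses a slightly larger made-up DataFrame so the filters actually drop rows.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, 8], 'num_wings': [2, 0, 0]},
                  index=['falcon', 'dog', 'spider'])

# Rows whose num_legs is one of the listed values
four_or_eight = df[df['num_legs'].isin([4, 8])]
print(list(four_or_eight.index))

# Rows where at least one cell equals 2
has_a_two = df[df.isin([2]).any(axis=1)]
print(list(has_a_two.index))
```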

---

### **Notes**

- The `isin` method is useful for filtering or masking DataFrames based on specific values.
- Use the `~` operator to invert the boolean result.
- For more complex filtering, combine `isin` with other methods like `all` or `any`.

---

By using `isin`, you can efficiently check for membership of elements in a DataFrame and perform filtering or conditional operations.


In [None]:
""" pandas.DataFrame.where
DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)[source]
Replace values where the condition is False.

Parameters
:
cond
bool Series/DataFrame, array-like, or callable
Where cond is True, keep the original value. Where False, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

other
scalar, Series/DataFrame, or callable
Entries where cond is False are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes).

inplace
bool, default False
Whether to perform the operation in place on the data.

axis
int, default None
Alignment axis if needed. For Series this parameter is unused and defaults to 0.

level
int, default None
Alignment level if needed.

Returns
:
Same type as caller or None if inplace=True.

See also

DataFrame.mask()
Return an object of same shape as self.

Notes

The where method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is True the element is used; otherwise the corresponding element from the DataFrame other is used. If the axis of other does not align with axis of cond Series/DataFrame, the misaligned index positions will be filled with False.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the where documentation in indexing.

The dtype of the object takes precedence. The fill value is casted to the object’s dtype, if this can be done losslessly. """
s = pd.Series(range(5))
s.where(s > 0)
s.mask(s > 0)

s = pd.Series(range(5))
t = pd.Series([True, False])
s.where(t, 99)
s.mask(t, 99)
s.where(s > 1, 10)
s.mask(s > 1, 10)
df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
df


The `pandas.DataFrame.where` method is used to replace values in a DataFrame or Series where a specified condition is `False`. It is an application of the **if-then idiom**: for each element, if the condition is `True`, the original value is kept; otherwise, it is replaced with a value from `other`. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)
```

---

### **Parameters**

1. **`cond`** : `bool Series/DataFrame`, `array-like`, or `callable`

   - The condition to check. Where `cond` is `True`, the original value is kept. Where `cond` is `False`, the value is replaced with the corresponding value from `other`.
   - If `cond` is callable, it is computed on the DataFrame/Series and should return a boolean DataFrame/Series or array.

2. **`other`** : `scalar`, `Series/DataFrame`, or `callable`, optional

   - The value to replace elements where `cond` is `False`. If not provided, the default is `NaN`.
   - If `other` is callable, it is computed on the DataFrame/Series and should return a scalar or DataFrame/Series.

3. **`inplace`** : `bool`, default `False`

   - If `True`, the operation is performed in place (i.e., the original DataFrame/Series is modified), and the method returns `None`.

4. **`axis`** : `int`, optional

   - Alignment axis if needed. For Series, this parameter is unused.

5. **`level`** : `int`, optional
   - Alignment level if needed.
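
A minimal sketch of two less common argument forms, assuming small made-up data: a callable `other`, and a Series `other` that is aligned with the row index through `axis=0`. The name `row_fill` is hypothetical.

```python
import pandas as pd

s = pd.Series(range(5))

# Callable `other`: it receives the Series and returns the replacement values.
scaled = s.where(s > 1, lambda x: x * 10)

df = pd.DataFrame({'A': [1, -2, 3], 'B': [-4, 5, -6]})
row_fill = pd.Series([10, 20, 30])  # hypothetical per-row replacement values

# Series `other` aligned with the row index: each row takes its own fill value.
filled = df.where(df > 0, row_fill, axis=0)

print(scaled.tolist())
print(filled)
```

Without `axis`, pandas cannot tell whether a Series `other` should line up with the rows or the columns of a DataFrame, which is why the argument is given explicitly here.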

---

### **Returns**

- **Same type as caller**:
  - A new DataFrame/Series with values replaced where the condition is `False`.
  - If `inplace=True`, the method returns `None`.

---

### **Examples**

#### Example 1: Basic Usage with a Series

```python
import pandas as pd

# Create a Series
s = pd.Series(range(5))

# Replace values where the condition is False
result = s.where(s > 0)
print(result)
```

**Output:**

```
0    NaN
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
```

- Values where `s > 0` is `False` (i.e., `0`) are replaced with `NaN`.

---

#### Example 2: Using `other` to Replace Values

```python
# Replace values where the condition is False with 99
result = s.where(s > 0, 99)
print(result)
```

**Output:**

```
0    99
1     1
2     2
3     3
4     4
dtype: int64
```

- Values where `s > 0` is `False` (i.e., `0`) are replaced with `99`.

---

#### Example 3: Using a Callable for `cond`

```python
# Use a callable for the condition
result = s.where(lambda x: x > 1, 10)
print(result)
```

**Output:**

```
0    10
1    10
2     2
3     3
4     4
dtype: int64
```

- Values where `x > 1` is `False` (i.e., `0` and `1`) are replaced with `10`.

---

#### Example 4: Using `where` with a DataFrame

```python
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Replace values where the condition is False
result = df.where(df > 2, -df)
print(result)
```

**Output:**

```
   A  B
0 -1  4
1 -2  5
2  3  6
```

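- Values where `df > 2` is `False` (`1` and `2` in column `A`) are replaced with the negative of the original value. Every value in column `B` is greater than 2, so that column is unchanged.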

---

#### Example 5: Using `inplace=True`

```python
# Modify the DataFrame in place
df.where(df > 2, -df, inplace=True)
print(df)
```

**Output:**

```
   A  B
0 -1  4
1 -2  5
2  3  6
```

- The original DataFrame is modified in place.

---

### **Related Methods**

1. **`DataFrame.mask`**:

   - The inverse of `where`. It replaces values where the condition is `True`.
   - Example:
     ```python
     df.mask(df > 2, -df)
     ```

2. **`numpy.where`**:
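   - Similar functionality with a different signature: `np.where(cond, df, other)` takes the condition first and returns a NumPy array rather than a DataFrame.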
   - Example:
     ```python
     import numpy as np
     np.where(df > 2, df, -df)
     ```
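
As a rough check of how these three relate, the sketch below reuses the `np.arange(10)` frame and the condition `df % 3 == 0` from the code cell earlier in this section. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
m = df % 3 == 0  # boolean DataFrame used as the condition

via_where = df.where(m, -df)      # keep values where m is True, otherwise use -df
via_mask = df.mask(~m, -df)       # replace values where ~m is True
via_numpy = np.where(m, df, -df)  # plain ndarray, no index or column labels

print(type(via_numpy).__name__)
print(via_where.equals(via_mask))
print((via_where.to_numpy() == via_numpy).all())
```

The values agree in all three cases; only the pandas methods keep the index and column labels.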

---

### **Practical Use Case**

Suppose you have a DataFrame and want to replace all negative values with `NaN`:

```python
# Create a DataFrame with negative values
df = pd.DataFrame({'A': [1, -2, 3], 'B': [-4, 5, -6]})

# Replace negative values with NaN
result = df.where(df >= 0)
print(result)
```

**Output:**

```
     A    B
0  1.0  NaN
1  NaN  5.0
2  3.0  NaN
```

---

### **Notes**

- The `where` method is useful for conditional replacement of values in a DataFrame or Series.
- Use `inplace=True` to modify the original object directly.
- For more complex conditions, you can use callables for `cond` or `other`.

---

By using `where`, you can efficiently perform conditional replacements in your data, making it a powerful tool for data cleaning and transformation.


In [None]:
""" pandas.DataFrame.mask
DataFrame.mask(cond, other=<no_default>, *, inplace=False, axis=None, level=None)[source]
Replace values where the condition is True.

Parameters
:
cond
bool Series/DataFrame, array-like, or callable
Where cond is False, keep the original value. Where True, replace with corresponding value from other. If cond is callable, it is computed on the Series/DataFrame and should return boolean Series/DataFrame or array. The callable must not change input Series/DataFrame (though pandas doesn’t check it).

other
scalar, Series/DataFrame, or callable
Entries where cond is True are replaced with corresponding value from other. If other is callable, it is computed on the Series/DataFrame and should return scalar or Series/DataFrame. The callable must not change input Series/DataFrame (though pandas doesn’t check it). If not specified, entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes).

inplace
bool, default False
Whether to perform the operation in place on the data.

axis
int, default None
Alignment axis if needed. For Series this parameter is unused and defaults to 0.

level
int, default None
Alignment level if needed.

Returns
:
Same type as caller or None if inplace=True.

See also

DataFrame.where()
Return an object of same shape as self.

Notes

The mask method is an application of the if-then idiom. For each element in the calling DataFrame, if cond is False the element is used; otherwise the corresponding element from the DataFrame other is used. If the axis of other does not align with axis of cond Series/DataFrame, the misaligned index positions will be filled with True.

The signature for DataFrame.where() differs from numpy.where(). Roughly df1.where(m, df2) is equivalent to np.where(m, df1, df2).

For further details and examples see the mask documentation in indexing.

The dtype of the object takes precedence. The fill value is casted to the object’s dtype, if this can be done losslessly. """
s = pd.Series(range(5))
s.where(s > 0)
s.mask(s > 0)


s = pd.Series(range(5))
t = pd.Series([True, False])
s.where(t, 99)

s.mask(t, 99)
s.where(s > 1, 10)
s.mask(s > 1, 10)
df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
df
m = df % 3 == 0
df.where(m, -df)
df.where(m, -df) == np.where(m, df, -df)

df.where(m, -df) == df.mask(~m, -df)

The `pandas.DataFrame.mask` method is used to replace values in a DataFrame or Series where a specified condition is `True`. It is the **inverse** of the `where` method: for each element, if the condition is `True`, the value is replaced with a value from `other`; otherwise, the original value is kept. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.mask(cond, other=<no_default>, *, inplace=False, axis=None, level=None)
```

---

### **Parameters**

1. **`cond`** : `bool Series/DataFrame`, `array-like`, or `callable`

   - The condition to check. Where `cond` is `True`, the value is replaced with the corresponding value from `other`. Where `cond` is `False`, the original value is kept.
   - If `cond` is callable, it is computed on the DataFrame/Series and should return a boolean DataFrame/Series or array.

2. **`other`** : `scalar`, `Series/DataFrame`, or `callable`, optional

   - The value to replace elements where `cond` is `True`. If not provided, the default is `NaN`.
   - If `other` is callable, it is computed on the DataFrame/Series and should return a scalar or DataFrame/Series.

3. **`inplace`** : `bool`, default `False`

   - If `True`, the operation is performed in place (i.e., the original DataFrame/Series is modified), and the method returns `None`.

4. **`axis`** : `int`, optional

   - Alignment axis if needed. For Series, this parameter is unused.

5. **`level`** : `int`, optional
   - Alignment level if needed.
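
The docstring above notes that index positions missing from `cond` are filled with `True` for `mask` (and with `False` for `where`). A small sketch of that behaviour, using the same `s` and `t` as the code cell above:

```python
import pandas as pd

s = pd.Series(range(5))
t = pd.Series([True, False])  # only covers index labels 0 and 1

kept = s.where(t, 99)   # labels 2..4 are missing from t, treated as False, so replaced
masked = s.mask(t, 99)  # labels 2..4 are missing from t, treated as True, so replaced

print(kept.tolist())
print(masked.tolist())
```

In both cases the positions that `cond` does not cover end up replaced, so a condition that is shorter than the data never silently keeps the uncovered values.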

---

### **Returns**

- **Same type as caller**:
  - A new DataFrame/Series with values replaced where the condition is `True`.
  - If `inplace=True`, the method returns `None`.

---

### **Examples**

#### Example 1: Basic Usage with a Series

```python
import pandas as pd

# Create a Series
s = pd.Series(range(5))

# Replace values where the condition is True
result = s.mask(s > 0)
print(result)
```

**Output:**

```
0    0.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
```

- Values where `s > 0` is `True` (i.e., `1`, `2`, `3`, `4`) are replaced with `NaN`.

---

#### Example 2: Using `other` to Replace Values

```python
# Replace values where the condition is True with 99
result = s.mask(s > 0, 99)
print(result)
```

**Output:**

```
0     0
1    99
2    99
3    99
4    99
dtype: int64
```

- Values where `s > 0` is `True` (i.e., `1`, `2`, `3`, `4`) are replaced with `99`.

---

#### Example 3: Using a Callable for `cond`

```python
# Use a callable for the condition
result = s.mask(lambda x: x > 1, 10)
print(result)
```

**Output:**

```
0     0
1     1
2    10
3    10
4    10
dtype: int64
```

- Values where `x > 1` is `True` (i.e., `2`, `3`, `4`) are replaced with `10`.

---

#### Example 4: Using `mask` with a DataFrame

```python
# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Replace values where the condition is True
result = df.mask(df > 2, -df)
print(result)
```

**Output:**

```
   A  B
0  1 -4
1  2 -5
2 -3 -6
```

- Values where `df > 2` is `True` (`3` in column `A` and every value in column `B`) are replaced with the negative of the original value.

---

#### Example 5: Using `inplace=True`

```python
# Modify the DataFrame in place
df.mask(df > 2, -df, inplace=True)
print(df)
```

**Output:**

```
   A  B
0  1 -4
1  2 -5
2 -3 -6
```

- The original DataFrame is modified in place.

---

### **Related Methods**

1. **`DataFrame.where`**:

   - The inverse of `mask`. It replaces values where the condition is `False`.
   - Example:
     ```python
     df.where(df > 2, -df)
     ```

2. **`numpy.where`**:
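   - Similar functionality with a different syntax; it returns a NumPy array rather than a DataFrame. To mirror `mask`, the replacement goes in the second position and the original values in the third.
   - Example:
     ```python
     import numpy as np
     np.where(df > 2, -df, df)  # same values as df.mask(df > 2, -df)
     ```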

---

### **Practical Use Case**

Suppose you have a DataFrame and want to replace all positive values with `NaN`:

```python
# Create a DataFrame with positive and negative values
df = pd.DataFrame({'A': [1, -2, 3], 'B': [-4, 5, -6]})

# Replace positive values with NaN
result = df.mask(df > 0)
print(result)
```

**Output:**

```
     A    B
0  NaN -4.0
1 -2.0  NaN
2  NaN -6.0
```

---

### **Notes**

- The `mask` method is useful for conditional replacement of values in a DataFrame or Series.
- Use `inplace=True` to modify the original object directly.
- For more complex conditions, you can use callables for `cond` or `other`.
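
A short sketch of the callable form mentioned in the last note, assuming the same small frame as the practical use case. Both lambdas are evaluated on the DataFrame itself, which is convenient inside a method chain where the intermediate frame has no name.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3], 'B': [-4, 5, -6]})

# First callable builds the boolean condition, second builds the replacement values.
result = df.mask(lambda d: d < 0, lambda d: d.abs())

print(result)
```

Here every negative entry is replaced with its absolute value, so the result contains only non-negative numbers.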

---

By using `mask`, you can efficiently perform conditional replacements in your data, making it a powerful tool for data cleaning and transformation.


In [None]:
""" pandas.DataFrame.query
DataFrame.query(expr, *, inplace=False, **kwargs)[source]
Query the columns of a DataFrame with a boolean expression.

Parameters:
expr : str
The query string to evaluate.

You can refer to variables in the environment by prefixing them with an ‘@’ character like @a + b.

You can refer to column names that are not valid Python variable names by surrounding them in backticks. Thus, column names containing spaces or punctuations (besides underscores) or starting with digits must be surrounded by backticks. (For example, a column named “Area (cm^2)” would be referenced as `Area (cm^2)`). Column names which are Python keywords (like “list”, “for”, “import”, etc) cannot be used.

For example, if one of your columns is called a a and you want to sum it with b, your query should be `a a` + b.

inplace : bool
Whether to modify the DataFrame rather than creating a new one.

**kwargs
See the documentation for eval() for complete details on the keyword arguments accepted by DataFrame.query().

Returns:
DataFrame or None
DataFrame resulting from the provided query expression or None if inplace=True.

See also

eval
Evaluate a string describing operations on DataFrame columns.

DataFrame.eval
Evaluate a string describing operations on DataFrame columns.

Notes

The result of the evaluation of this expression is first passed to DataFrame.loc and if that fails because of a multidimensional key (e.g., a DataFrame) then the result will be passed to DataFrame.__getitem__().

This method uses the top-level eval() function to evaluate the passed query.

The query() method uses a slightly modified Python syntax by default. For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or. This is syntactically valid Python, however the semantics are different.

You can change the semantics of the expression by passing the keyword argument parser='python'. This enforces the same semantics as evaluation in Python space. Likewise, you can pass engine='python' to evaluate an expression using Python itself as a backend. This is not recommended as it is inefficient compared to using numexpr as the engine.

The DataFrame.index and DataFrame.columns attributes of the DataFrame instance are placed in the query namespace by default, which allows you to treat both the index and columns of the frame as a column in the frame. The identifier index is used for the frame index; you can also use the name of the index to identify it in a query. Please note that Python keywords may not be used as identifiers.

For further details and examples see the query documentation in indexing.

Backtick quoted variables

Backtick quoted variables are parsed as literal Python code and are converted internally to a Python valid identifier. This can lead to the following problems.

During parsing a number of disallowed characters inside the backtick quoted string are replaced by strings that are allowed as a Python identifier. These characters include all operators in Python, the space character, the question mark, the exclamation mark, the dollar sign, and the euro sign. For other characters that fall outside the ASCII range (U+0001..U+007F) and those that are not further specified in PEP 3131, the query parser will raise an error. This excludes whitespace different than the space character, but also the hashtag (as it is used for comments) and the backtick itself (backtick can also not be escaped).

In a special case, quotes that make a pair around a backtick can confuse the parser. For example, `it's` > `that's` will raise an error, as it forms a quoted string ('s > `that') with a backtick inside.

See also the Python documentation about lexical analysis (https://docs.python.org/3/reference/lexical_analysis.html) in combination with the source code in pandas.core.computation.parsing. """

df = pd.DataFrame({'A': range(1, 6),
                   'B': range(10, 0, -2),
                   'C C': range(10, 5, -1)})
df
df.query('A > B')
df[df.A > df.B]
# For columns with spaces in their name, you can use backtick quoting.

df.query('B == `C C`')
# The previous expression is equivalent to

df[df.B == df['C C']]

The `pandas.DataFrame.query` method is used to filter rows of a DataFrame using a **boolean expression** written as a string. It provides a concise and readable way to perform row selection based on column values. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.query(expr, *, inplace=False, **kwargs)
```

---

### **Parameters**

1. **`expr`** : `str`

   - The query string to evaluate. This string is a boolean expression that filters rows based on column values.
   - You can refer to:
     - Column names directly (e.g., `A > B`).
     - Variables in the environment by prefixing them with `@` (e.g., `A > @threshold`).
     - Column names with spaces or special characters by enclosing them in backticks (e.g., `` `C C` == 10``).

2. **`inplace`** : `bool`, default `False`

   - If `True`, the operation is performed in place (i.e., the original DataFrame is modified), and the method returns `None`.

3. **`**kwargs`**:
   - Additional keyword arguments passed to `eval()`. These include:
     - `engine`: The engine used for evaluation. By default pandas uses `'numexpr'` when it is installed and falls back to `'python'` otherwise.
     - `parser`: The parser to use (`'pandas'` by default or `'python'`).
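
A minimal sketch of passing one of these keyword arguments through `query`. It assumes a cut-down version of the example frame with only columns `A` and `B`.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': range(10, 0, -2)})

default_result = df.query('A > B')                  # engine chosen by pandas
python_result = df.query('A > B', engine='python')  # force evaluation in Python itself

print(default_result.equals(python_result))
```

Both calls select the same rows. The docstring above advises against `engine='python'` for performance reasons, so it is mainly useful for debugging or when `numexpr` is unavailable.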

---

### **Returns**

- **DataFrame or None**:
  - A new DataFrame containing the rows that satisfy the query expression.
  - If `inplace=True`, the method returns `None`.

---

### **Examples**

#### Example 1: Basic Query

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C C': range(10, 5, -1)
})

print("Original DataFrame:")
print(df)

# Query rows where column A is greater than column B
result = df.query('A > B')
print("\nFiltered DataFrame:")
print(result)
```

**Output:**

```
Original DataFrame:
   A   B  C C
0  1  10   10
1  2   8    9
2  3   6    8
3  4   4    7
4  5   2    6

Filtered DataFrame:
   A  B  C C
4  5  2    6
```

---

#### Example 2: Query with Variables

```python
# Define a variable
threshold = 4

# Query rows where column A is greater than the threshold
result = df.query('A > @threshold')
print(result)
```

**Output:**

```
   A  B  C C
4  5  2    6
```

---

#### Example 3: Query with Special Column Names

```python
# Query rows where column 'C C' equals 10
result = df.query('`C C` == 10')
print(result)
```

**Output:**

```
   A   B  C C
0  1  10   10
```

---

#### Example 4: Combining Conditions

```python
# Query rows where A > 2 and B < 8
result = df.query('A > 2 and B < 8')
print(result)
```

**Output:**

```
   A  B  C C
2  3  6    8
3  4  4    7
4  5  2    6
```

---

#### Example 5: Using `inplace=True`

```python
# Modify the DataFrame in place
df.query('A > 3', inplace=True)
print(df)
```

**Output:**

```
   A  B  C C
3  4  4    7
4  5  2    6
```

---

### **Related Methods**

1. **`DataFrame.eval`**:

   - Evaluates a string describing operations on DataFrame columns.
   - Example:
     ```python
     df.eval('A + B')
     ```

2. **`DataFrame.loc`**:

   - Filters rows using boolean indexing.
   - Example:
     ```python
     df.loc[df.A > df.B]
     ```

3. **`DataFrame.filter`**:
   - Selects rows or columns by their labels, not by their values.
   - Example:
     ```python
     df.filter(items=['A', 'B'])
     ```

---

### **Practical Use Case**

Suppose you have a DataFrame and want to filter rows based on multiple conditions:

```python
# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 70000, 80000]
})

# Query rows where Age is greater than 30 and Salary is less than 75000
result = df.query('Age > 30 and Salary < 75000')
print(result)
```

**Output:**

```
      Name  Age  Salary
2  Charlie   35   70000
```

---

### **Notes**

- The `query` method is useful for writing concise and readable filtering expressions.
- Use backticks (`` ` ``) to refer to column names with spaces or special characters.
- Use `@` to refer to variables in the environment.
- For complex queries, consider using `eval()` or `loc` for more flexibility.
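
Two points from the docstring above are easy to miss: inside `query`, `&` has the same precedence as `and`, and the identifier `index` refers to the frame's index. A small sketch, assuming a reduced version of the example frame:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
})

# Inside query(), `&` binds like `and`, so no parentheses are needed.
with_and = df.query('A > 2 and B < 8')
with_amp = df.query('A > 2 & B < 8')

# Plain boolean indexing needs explicit parentheses around each comparison.
with_loc = df.loc[(df.A > 2) & (df.B < 8)]

# The identifier `index` refers to the frame's index.
by_index = df.query('index >= 3')

print(with_and.equals(with_amp), with_and.equals(with_loc))
print(by_index.index.tolist())
```

All three filters return the same rows; the `query` forms simply avoid the parentheses that ordinary Python operator precedence requires.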

---

By using `query`, you can efficiently filter rows of a DataFrame based on column values, making it a powerful tool for data analysis and manipulation.


In [None]:

""" pandas.DataFrame.__add__
DataFrame.__add__(other)[source]
Get Addition of DataFrame and other, column-wise.

Equivalent to DataFrame.add(other).

Parameters
:
other
scalar, sequence, Series, dict or DataFrame
Object to be added to the DataFrame.

Returns
:
DataFrame
The result of adding other to DataFrame.


DataFrame.add
Add a DataFrame and another object, with option for index- or column-oriented addition. """



df = pd.DataFrame({'height': [1.5, 2.6], 'weight': [500, 800]},
                  index=['elk', 'moose'])
df
df[['height', 'weight']] + 1.5
df[['height', 'weight']] + [0.5, 1.5]
df[['height', 'weight']] + {'height': 0.5, 'weight': 1.5}
# When other is a Series, the index of other is aligned with the columns of the DataFrame.

s1 = pd.Series([0.5, 1.5], index=['weight', 'height'])
df[['height', 'weight']] + s1

# Even when the index of other is the same as the index of the DataFrame, the Series will not be reoriented.
# If index-wise alignment is desired, DataFrame.add() should be used with axis='index'.

s2 = pd.Series([0.5, 1.5], index=['elk', 'moose'])
df[['height', 'weight']] + s2
df[['height', 'weight']].add(s2, axis='index')
 
# When other is a DataFrame, both columns names and the index are aligned.

other = pd.DataFrame({'height': [0.2, 0.4, 0.6]},
                     index=['elk', 'moose', 'deer'])
df[['height', 'weight']] + other

The `pandas.DataFrame.__add__` method is used to perform **element-wise addition** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). It is equivalent to using the `+` operator or the `DataFrame.add()` method. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.__add__(other)
```

---

### **Parameters**

1. **`other`** : `scalar`, `sequence`, `Series`, `dict`, or `DataFrame`
   - The object to be added to the DataFrame.
   - If `other` is a:
     - **scalar**: The scalar is added to every element in the DataFrame.
     - **sequence**: Each element in the sequence is added to the corresponding column in the DataFrame.
     - **Series**: The Series is aligned with the DataFrame's columns, and element-wise addition is performed.
     - **dict**: The keys of the dict are aligned with the DataFrame's columns, and the corresponding values are added.
     - **DataFrame**: Both the columns and index are aligned, and element-wise addition is performed.

---

### **Returns**

- **DataFrame**:
  - A new DataFrame containing the result of the addition.

---

### **Examples**

#### Example 1: Adding a Scalar

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'height': [1.5, 2.6], 'weight': [500, 800]}, index=['elk', 'moose'])

# Add a scalar to the DataFrame
result = df + 1.5
print(result)
```

**Output:**

```
       height  weight
elk       3.0   501.5
moose     4.1   801.5
```

- The scalar `1.5` is added to every element in the DataFrame.

---

#### Example 2: Adding a Sequence

```python
# Add a sequence to the DataFrame
result = df + [0.5, 1.5]
print(result)
```

**Output:**

```
       height  weight
elk       2.0   501.5
moose     3.1   801.5
```

- The sequence `[0.5, 1.5]` is added to the corresponding columns (`height` and `weight`).

---

#### Example 3: Adding a Dictionary

```python
# Add a dictionary to the DataFrame
result = df + {'height': 0.5, 'weight': 1.5}
print(result)
```

**Output:**

```
       height  weight
elk       2.0   501.5
moose     3.1   801.5
```

- The dictionary values are added to the corresponding columns.

---

#### Example 4: Adding a Series

```python
# Create a Series
s1 = pd.Series([0.5, 1.5], index=['weight', 'height'])

# Add the Series to the DataFrame
result = df + s1
print(result)
```

**Output:**

```
       height  weight
elk       3.0   500.5
moose     4.1   800.5
```

- The Series is aligned with the DataFrame's columns, and element-wise addition is performed.

---

#### Example 5: Adding a DataFrame

```python
# Create another DataFrame
other = pd.DataFrame({'height': [0.2, 0.4, 0.6]}, index=['elk', 'moose', 'deer'])

# Add the other DataFrame to the original DataFrame
result = df + other
print(result)
```

**Output:**

```
       height  weight
deer      NaN     NaN
elk       1.7     NaN
moose     3.0     NaN
```

- The DataFrames are aligned by both columns and index, and element-wise addition is performed. Missing values (`NaN`) are introduced where there is no alignment.

---

### **Related Methods**

1. **`DataFrame.add`**:

   - Equivalent to `__add__` but provides additional options like `axis` and `fill_value`.
   - Example:
     ```python
     df.add(other, axis='index')
     ```

2. **`DataFrame.sub`**:

   - Performs element-wise subtraction.
   - Example:
     ```python
     df.sub(other)
     ```

3. **`DataFrame.mul`**:

   - Performs element-wise multiplication.
   - Example:
     ```python
     df.mul(other)
     ```

4. **`DataFrame.div`**:
   - Performs element-wise division.
   - Example:
     ```python
     df.div(other)
     ```

---

### **Practical Use Case**

Suppose you have a DataFrame representing measurements and want to add a constant offset to all values:

```python
# Create a DataFrame
df = pd.DataFrame({'temperature': [20.5, 22.3, 19.8], 'humidity': [45, 50, 48]})

# Add an offset to all measurements
offset = 2.5
result = df + offset
print(result)
```

**Output:**

```
   temperature  humidity
0         23.0      47.5
1         24.8      52.5
2         22.3      50.5
```

---

### **Notes**

- The `__add__` method performs element-wise addition and aligns objects by columns and index.
- Use `DataFrame.add` for more advanced operations, such as specifying the `axis` or handling missing values with `fill_value`.
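
The difference between `+` and `add(..., axis='index')` is easiest to see with a Series whose labels match the rows rather than the columns. The sketch below reuses `df` and `s2` from the code cell at the top of this section.

```python
import pandas as pd

df = pd.DataFrame({'height': [1.5, 2.6], 'weight': [500, 800]},
                  index=['elk', 'moose'])
s2 = pd.Series([0.5, 1.5], index=['elk', 'moose'])

# `+` matches the Series labels against the column names; nothing lines up,
# so the result has the union of both label sets as columns and is all NaN.
by_columns = df + s2

# add(axis='index') matches the Series labels against the row index instead.
by_index = df.add(s2, axis='index')

print(by_columns.columns.tolist())
print(by_index)
```

With `axis='index'`, `0.5` is added to every value in the `elk` row and `1.5` to every value in the `moose` row.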

---

By using `__add__` or the `+` operator, you can efficiently perform element-wise addition on DataFrames and other compatible objects.


In [None]:
""" pandas.DataFrame.add
DataFrame.add(other, axis='columns', level=None, fill_value=None)[source]
Get Addition of dataframe and other, element-wise (binary operator add).

Equivalent to dataframe + other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, radd.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
:
DataFrame
Result of the arithmetic operation.

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power. """

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
df + 1
df.add(1)
# Divide by constant with reverse version.

df.div(10)
df.rdiv(10)
# Subtract a list and Series by axis with operator version.

df - [1, 2]
df.sub([1, 2], axis='columns')

df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')

# Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
# Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
df * other
df.mul(other, fill_value=0)
# Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
df.div(df_multindex, level=1, fill_value=0)

The `pandas.DataFrame.add` method is used to perform **element-wise addition** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). It is equivalent to using the `+` operator but provides additional flexibility, such as handling missing values with `fill_value` and specifying the `axis` for alignment. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.add(other, axis='columns', level=None, fill_value=None)
```

---

### **Parameters**

1. **`other`** : `scalar`, `sequence`, `Series`, `dict`, or `DataFrame`

   - The object to be added to the DataFrame.
   - If `other` is a:
     - **scalar**: The scalar is added to every element in the DataFrame.
     - **sequence**: Each element in the sequence is added to the corresponding column in the DataFrame.
     - **Series**: The Series is aligned with the DataFrame's columns or index, depending on the `axis`.
     - **dict**: The keys of the dict are aligned with the DataFrame's columns (or with its index when `axis='index'`), and the corresponding values are added.
     - **DataFrame**: Both the columns and index are aligned, and element-wise addition is performed.

2. **`axis`** : `{0 or 'index', 1 or 'columns'}`, default `'columns'`

   - The axis to align the addition operation:
     - `0` or `'index'`: Align along the index (rows).
     - `1` or `'columns'`: Align along the columns (default).

3. **`level`** : `int` or `label`, optional

   - For MultiIndex DataFrames, specifies the level to align on.

4. **`fill_value`** : `float` or `None`, default `None`
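   - A value used in place of missing (`NaN`) entries, including entries created by alignment, in the DataFrame or `other` before the addition is performed.
   - If `None`, missing values remain as `NaN`. If both corresponding locations are missing, the result is missing regardless of `fill_value`.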
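
A sketch of how `fill_value` behaves when the two frames do not fully overlap. The first pair of frames comes from the code cell above; the small frames `a` and `b` are made up to show the case where both sides are missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])

plain = df + other                    # 'degrees' has no partner in `other`, so it becomes NaN
filled = df.add(other, fill_value=0)  # the missing 'degrees' in `other` is treated as 0

# When both sides are missing at a location, the result stays missing.
a = pd.DataFrame({'x': [1.0, np.nan]})
b = pd.DataFrame({'x': [np.nan, np.nan]})
both_missing = a.add(b, fill_value=0)

print(plain)
print(filled)
print(both_missing)
```

`fill_value` therefore only rescues locations where exactly one of the two operands has data.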

---

### **Returns**

- **DataFrame**:
  - A new DataFrame containing the result of the element-wise addition.

---

### **Examples**

#### Example 1: Adding a Scalar

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]}, index=['circle', 'triangle', 'rectangle'])

# Add a scalar to the DataFrame
result = df.add(1)
print(result)
```

**Output:**

```
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
```

- The scalar `1` is added to every element in the DataFrame.

---

#### Example 2: Adding a Sequence

```python
# Add a sequence to the DataFrame
result = df.add([1, 2], axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle          1      362
triangle        4      182
rectangle       5      362
```

- The sequence `[1, 2]` is added to the corresponding columns (`angles` and `degrees`).

---

#### Example 3: Adding a Series

```python
# Create a Series
s = pd.Series([1, 2], index=['angles', 'degrees'])

# Add the Series to the DataFrame
result = df.add(s, axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle          1      362
triangle        4      182
rectangle       5      362
```

- The Series is aligned with the DataFrame's columns, and element-wise addition is performed.

---

#### Example 4: Adding a DataFrame

```python
# Create another DataFrame
other = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [10, 20, 30]}, index=['circle', 'triangle', 'rectangle'])

# Add the other DataFrame to the original DataFrame
result = df.add(other)
print(result)
```

**Output:**

```
           angles  degrees
circle          0      370
triangle        6      200
rectangle       8      390
```

- The DataFrames are aligned by both columns and index, and element-wise addition is performed.

---

#### Example 5: Using `fill_value`

```python
# Create a DataFrame with missing values
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, None, 360]}, index=['circle', 'triangle', 'rectangle'])

# Add a scalar, filling missing values with 0
result = df.add(1, fill_value=0)
print(result)
```

**Output:**

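```
           angles  degrees
circle          1    361.0
triangle        4      1.0
rectangle       5    361.0
```

- Missing values (`NaN`) are filled with `0` before the addition, so the `triangle` entry becomes `0 + 1 = 1.0`. The `degrees` column prints as floats because the `None` made it a floating-point column.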
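
When `other` is a DataFrame, `fill_value` only replaces a value whose partner is present. A small sketch with two made-up frames, `left` and `right`, shows that a position missing in both inputs stays missing:

```python
import numpy as np
import pandas as pd

# Illustrative frames: 'a' is missing in both inputs at row 1,
# 'b' is missing only in `left` at row 0.
left = pd.DataFrame({'a': [1.0, np.nan], 'b': [np.nan, 4.0]})
right = pd.DataFrame({'a': [10.0, np.nan], 'b': [20.0, 40.0]})

plain = left + right                    # any NaN operand gives NaN
filled = left.add(right, fill_value=0)  # a lone NaN is treated as 0

print(plain)
print(filled)
```

In `filled`, row 0 of `b` becomes `0 + 20.0`, while row 1 of `a` is still `NaN` because neither input has a value there.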

---

### **Related Methods**

1. **`DataFrame.sub`**:

   - Performs element-wise subtraction.
   - Example:
     ```python
     df.sub(other)
     ```

2. **`DataFrame.mul`**:

   - Performs element-wise multiplication.
   - Example:
     ```python
     df.mul(other)
     ```

3. **`DataFrame.div`**:

   - Performs element-wise division.
   - Example:
     ```python
     df.div(other)
     ```

4. **`DataFrame.radd`**:
   - Reverse version of `add` (e.g., `other + df`).
   - Example:
     ```python
     df.radd(other)
     ```

---

### **Practical Use Case**

Suppose you have a DataFrame representing measurements and want to add an offset to specific columns:

```python
# Create a DataFrame
df = pd.DataFrame({'temperature': [20.5, 22.3, 19.8], 'humidity': [45, 50, 48]})

# Add an offset to the 'temperature' column only (0 leaves 'humidity' unchanged)
offset = pd.Series([2.5, 0], index=['temperature', 'humidity'])
result = df.add(offset, axis='columns')
print(result)
```

**Output:**

```
   temperature  humidity
0         23.0      45.0
1         24.8      50.0
2         22.3      48.0
```
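
A Series label that is not a column of the DataFrame does not get ignored: alignment adds it as a new column filled with `NaN`. The sketch below uses a hypothetical `pressure` label to show this, and one possible way to avoid it by reindexing the Series to the existing columns first.

```python
import pandas as pd

df = pd.DataFrame({'temperature': [20.5, 22.3, 19.8], 'humidity': [45, 50, 48]})

# 'pressure' is a hypothetical label that df does not have as a column.
offset = pd.Series([2.5, 0, 0], index=['temperature', 'humidity', 'pressure'])

result = df.add(offset, axis='columns')
print(sorted(result.columns))            # 'pressure' now appears as a column
print(result['pressure'].isna().all())   # and it holds only NaN

# Restricting the Series to the existing columns avoids the extra column.
trimmed = df.add(offset.reindex(df.columns), axis='columns')
print(list(trimmed.columns))
```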

---

### **Notes**

- The `add` method performs element-wise addition and aligns objects by columns and index.
- Use `fill_value` to handle missing values during the operation.
- For more advanced operations, consider using `DataFrame.apply` or `DataFrame.eval`.

---

By using `add`, you can efficiently perform element-wise addition on DataFrames and other compatible objects, with options for handling missing values and specifying alignment.


In [None]:
""" pandas.DataFrame.sub
DataFrame.sub(other, axis='columns', level=None, fill_value=None)[source]
Get Subtraction of dataframe and other, element-wise (binary operator sub).

Equivalent to dataframe - other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rsub.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power. """
df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df

# Add a scalar; the operator version returns the same result.
df + 1
df.add(1)

# Divide by a constant, then use the reverse version (10 / df).
df.div(10)
df.rdiv(10)

In [None]:
# Subtract a list and a Series by axis; the operator version handles the list.
df - [1, 2]

df.sub([1, 2], axis='columns')

df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')

# Multiply by a dictionary, matching its keys along the given axis.
df.mul({'angles': 0, 'degrees': 2})

df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')

# Multiply by a DataFrame of a different shape with the operator version.
other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other

df * other

df.mul(other, fill_value=0)

# Divide by a MultiIndex DataFrame, matching on one level.
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex

df.div(df_multindex, level=1, fill_value=0)





The `pandas.DataFrame.sub` method is used to perform **element-wise subtraction** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). It is equivalent to using the `-` operator but provides additional flexibility, such as handling missing values with `fill_value` and specifying the `axis` for alignment. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.sub(other, axis='columns', level=None, fill_value=None)
```

---

### **Parameters**

1. **`other`** : `scalar`, `sequence`, `Series`, `dict`, or `DataFrame`

   - The object to be subtracted from the DataFrame.
   - If `other` is a:
     - **scalar**: The scalar is subtracted from every element in the DataFrame.
     - **sequence**: Each element in the sequence is subtracted from the corresponding column (or from the corresponding row when `axis='index'`); its length must match the number of labels on that axis.
     - **Series**: The Series is aligned with the DataFrame's columns or index, depending on the `axis`.
     - **dict**: The keys of the dict are aligned with the DataFrame's columns (or with its index when `axis='index'`), and the corresponding values are subtracted.
     - **DataFrame**: Both the columns and index are aligned, and element-wise subtraction is performed.

2. **`axis`** : `{0 or 'index', 1 or 'columns'}`, default `'columns'`

   - The axis to align the subtraction operation:
     - `0` or `'index'`: Align along the index (rows).
     - `1` or `'columns'`: Align along the columns (default).

3. **`level`** : `int` or `label`, optional

   - For MultiIndex DataFrames, specifies the level to align on.

4. **`fill_value`** : `float` or `None`, default `None`
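   - A value that replaces missing (`NaN`) values in the DataFrame or `other`, including entries created by alignment, before the subtraction is performed.
   - If `None`, missing values remain as `NaN`. If both corresponding values are missing, the result is missing even when `fill_value` is set.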
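
For a plain list, the length has to match the axis it is applied along. The following sketch, using the same `df` as the examples in this section, subtracts one value per row and then shows the error raised when the length does not fit the default axis.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Three values, one per row, because axis='index' is requested.
per_row = df.sub([1, 2, 3], axis='index')
print(per_row)

# With the default axis='columns' the same list is too long (df has two
# columns), and pandas raises ValueError instead of guessing.
try:
    df.sub([1, 2, 3])
except ValueError as exc:
    print('ValueError:', exc)
```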

---

### **Returns**

- **DataFrame**:
  - A new DataFrame containing the result of the element-wise subtraction.

---

### **Examples**

#### Example 1: Subtracting a Scalar

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]}, index=['circle', 'triangle', 'rectangle'])

# Subtract a scalar from the DataFrame
result = df.sub(1)
print(result)
```

**Output:**

```
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
```

- The scalar `1` is subtracted from every element in the DataFrame.

---

#### Example 2: Subtracting a Sequence

```python
# Subtract a sequence from the DataFrame
result = df.sub([1, 2], axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
```

- The sequence `[1, 2]` is subtracted from the corresponding columns (`angles` and `degrees`).

---

#### Example 3: Subtracting a Series

```python
# Create a Series
s = pd.Series([1, 2], index=['angles', 'degrees'])

# Subtract the Series from the DataFrame
result = df.sub(s, axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
```

- The Series is aligned with the DataFrame's columns, and element-wise subtraction is performed.

---

#### Example 4: Subtracting a DataFrame

```python
# Create another DataFrame
other = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [10, 20, 30]}, index=['circle', 'triangle', 'rectangle'])

# Subtract the other DataFrame from the original DataFrame
result = df.sub(other)
print(result)
```

**Output:**

```
           angles  degrees
circle          0      350
triangle        0      160
rectangle       0      330
```

- The DataFrames are aligned by both columns and index, and element-wise subtraction is performed.

---

#### Example 5: Using `fill_value`

```python
# Create a DataFrame with missing values
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, None, 360]}, index=['circle', 'triangle', 'rectangle'])

# Subtract a scalar, filling missing values with 0
result = df.sub(1, fill_value=0)
print(result)
```

**Output:**

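```
           angles  degrees
circle         -1    359.0
triangle        2     -1.0
rectangle       3    359.0
```

- Missing values (`NaN`) are filled with `0` before the subtraction, so the `triangle` entry becomes `0 - 1 = -1.0`. The `degrees` column prints as floats because the `None` made it a floating-point column.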

---

### **Related Methods**

1. **`DataFrame.add`**:

   - Performs element-wise addition.
   - Example:
     ```python
     df.add(other)
     ```

2. **`DataFrame.mul`**:

   - Performs element-wise multiplication.
   - Example:
     ```python
     df.mul(other)
     ```

3. **`DataFrame.div`**:

   - Performs element-wise division.
   - Example:
     ```python
     df.div(other)
     ```

4. **`DataFrame.rsub`**:
   - Reverse version of `sub` (e.g., `other - df`).
   - Example:
     ```python
     df.rsub(other)
     ```
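
Because subtraction is not commutative, `sub` and `rsub` give different results. A short sketch with a scalar makes the operand order visible:

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

forward = df.sub(100)    # df - 100
reverse = df.rsub(100)   # 100 - df

print(forward)
print(reverse)
```

Here `reverse` is the negation of `forward`, which is what `100 - df` versus `df - 100` would suggest.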

---

### **Practical Use Case**

Suppose you have a DataFrame representing measurements and want to subtract an offset from specific columns:

```python
# Create a DataFrame
df = pd.DataFrame({'temperature': [20.5, 22.3, 19.8], 'humidity': [45, 50, 48]})

# Subtract an offset from the 'temperature' column only (0 leaves 'humidity' unchanged)
offset = pd.Series([2.5, 0], index=['temperature', 'humidity'])
result = df.sub(offset, axis='columns')
print(result)
```

**Output:**

```
   temperature  humidity
0         18.0      45.0
1         19.8      50.0
2         17.3      48.0
```

---

### **Notes**

- The `sub` method performs element-wise subtraction and aligns objects by columns and index.
- Use `fill_value` to handle missing values during the operation.
- For more advanced operations, consider using `DataFrame.apply` or `DataFrame.eval`.

---

By using `sub`, you can efficiently perform element-wise subtraction on DataFrames and other compatible objects, with options for handling missing values and specifying alignment.


In [None]:
""" pandas.DataFrame.mul
DataFrame.mul(other, axis='columns', level=None, fill_value=None)[source]
Get Multiplication of dataframe and other, element-wise (binary operator mul).

Equivalent to dataframe * other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.mul` method is used to perform **element-wise multiplication** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). It is equivalent to using the `*` operator but provides additional flexibility, such as handling missing values with `fill_value` and specifying the `axis` for alignment. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.mul(other, axis='columns', level=None, fill_value=None)
```

---

### **Parameters**

1. **`other`** : `scalar`, `sequence`, `Series`, `dict`, or `DataFrame`

   - The object to multiply with the DataFrame.
   - If `other` is a:
     - **scalar**: The scalar is multiplied with every element in the DataFrame.
     - **sequence**: Each element in the sequence is multiplied with the corresponding column (or with the corresponding row when `axis='index'`); its length must match the number of labels on that axis.
     - **Series**: The Series is aligned with the DataFrame's columns or index, depending on the `axis`.
     - **dict**: The keys of the dict are aligned with the DataFrame's columns (or with its index when `axis='index'`), and the corresponding values are multiplied.
     - **DataFrame**: Both the columns and index are aligned, and element-wise multiplication is performed.

2. **`axis`** : `{0 or 'index', 1 or 'columns'}`, default `'columns'`

   - The axis to align the multiplication operation:
     - `0` or `'index'`: Align along the index (rows).
     - `1` or `'columns'`: Align along the columns (default).

3. **`level`** : `int` or `label`, optional

   - For MultiIndex DataFrames, specifies the level to align on.

4. **`fill_value`** : `float` or `None`, default `None`
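   - A value that replaces missing (`NaN`) values in the DataFrame or `other`, including entries created by alignment, before the multiplication is performed.
   - If `None`, missing values remain as `NaN`. If both corresponding values are missing, the result is missing even when `fill_value` is set.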
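
A dict can also be matched against row labels. The sketch below follows the dictionary example from the pandas docstring quoted earlier; the variable name `row_factors` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Keys are row labels here, so axis='index' is required.
row_factors = {'circle': 0, 'triangle': 2, 'rectangle': 3}
scaled_rows = df.mul(row_factors, axis='index')
print(scaled_rows)
```

Each row is multiplied by its own factor, so the `circle` row becomes all zeros and the `rectangle` row is tripled.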

---

### **Returns**

- **DataFrame**:
  - A new DataFrame containing the result of the element-wise multiplication.

---

### **Examples**

#### Example 1: Multiplying by a Scalar

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]}, index=['circle', 'triangle', 'rectangle'])

# Multiply the DataFrame by a scalar
result = df.mul(2)
print(result)
```

**Output:**

```
           angles  degrees
circle          0      720
triangle        6      360
rectangle       8      720
```

- The scalar `2` is multiplied with every element in the DataFrame.

---

#### Example 2: Multiplying by a Sequence

```python
# Multiply the DataFrame by a sequence
result = df.mul([1, 2], axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle          0      720
triangle        3      360
rectangle       4      720
```

- The sequence `[1, 2]` is multiplied with the corresponding columns (`angles` and `degrees`).

---

#### Example 3: Multiplying by a Series

```python
# Create a Series
s = pd.Series([1, 2], index=['angles', 'degrees'])

# Multiply the DataFrame by the Series
result = df.mul(s, axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle          0      720
triangle        3      360
rectangle       4      720
```

- The Series is aligned with the DataFrame's columns, and element-wise multiplication is performed.

---

#### Example 4: Multiplying by a DataFrame

```python
# Create another DataFrame
other = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [10, 20, 30]}, index=['circle', 'triangle', 'rectangle'])

# Multiply the original DataFrame by the other DataFrame
result = df.mul(other)
print(result)
```

**Output:**

```
           angles  degrees
circle          0     3600
triangle        9     3600
rectangle      16    10800
```

- The DataFrames are aligned by both columns and index, and element-wise multiplication is performed.

---

#### Example 5: Using `fill_value`

```python
# Create a DataFrame with missing values
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, None, 360]}, index=['circle', 'triangle', 'rectangle'])

# Multiply by a scalar, filling missing values with 1
result = df.mul(2, fill_value=1)
print(result)
```

**Output:**

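```
           angles  degrees
circle          0    720.0
triangle        6      2.0
rectangle       8    720.0
```

- Missing values (`NaN`) are filled with `1` before the multiplication, so the `triangle` entry becomes `1 * 2 = 2.0`. The `degrees` column prints as floats because the `None` made it a floating-point column.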
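
The choice of `fill_value` depends on the operation. As a sketch, when `other` lacks a column, filling with `1` (the neutral element for multiplication) leaves that column's values unchanged, whereas the plain operator yields `NaN`:

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# `other` has no 'degrees' column.
other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])

plain = df * other                   # 'degrees' has no partner, so NaN
kept = df.mul(other, fill_value=1)   # the missing partner is treated as 1

print(plain)
print(kept)
```

Using `fill_value=0` instead would turn the whole `degrees` column into zeros, which is rarely what a scaling step intends.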

---

### **Related Methods**

1. **`DataFrame.add`**:

   - Performs element-wise addition.
   - Example:
     ```python
     df.add(other)
     ```

2. **`DataFrame.sub`**:

   - Performs element-wise subtraction.
   - Example:
     ```python
     df.sub(other)
     ```

3. **`DataFrame.div`**:

   - Performs element-wise division.
   - Example:
     ```python
     df.div(other)
     ```

4. **`DataFrame.rmul`**:
   - Reverse version of `mul` (e.g., `other * df`).
   - Example:
     ```python
     df.rmul(other)
     ```

---

### **Practical Use Case**

Suppose you have a DataFrame representing measurements and want to scale specific columns by a factor:

```python
# Create a DataFrame
df = pd.DataFrame({'temperature': [20.5, 22.3, 19.8], 'humidity': [45, 50, 48]})

# Scale the 'temperature' column by a factor
factor = pd.Series([2, 1], index=['temperature', 'humidity'])
result = df.mul(factor, axis='columns')
print(result)
```

**Output:**

```
   temperature  humidity
0         41.0        45
1         44.6        50
2         39.6        48
```

---

### **Notes**

- The `mul` method performs element-wise multiplication and aligns objects by columns and index.
- Use `fill_value` to handle missing values during the operation.
- For more advanced operations, consider using `DataFrame.apply` or `DataFrame.eval`.

---

By using `mul`, you can efficiently perform element-wise multiplication on DataFrames and other compatible objects, with options for handling missing values and specifying alignment.


In [None]:
""" pandas.DataFrame.div
DataFrame.div(other, axis='columns', level=None, fill_value=None)[source]
Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.div` method is used to perform **element-wise division** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). It is equivalent to using the `/` operator but provides additional flexibility, such as handling missing values with `fill_value` and specifying the `axis` for alignment. Below is a detailed explanation of its parameters, usage, and examples.

---

### **Syntax**

```python
DataFrame.div(other, axis='columns', level=None, fill_value=None)
```

---

### **Parameters**

1. **`other`** : `scalar`, `sequence`, `Series`, `dict`, or `DataFrame`

   - The object to divide the DataFrame by.
   - If `other` is a:
     - **scalar**: The scalar is used to divide every element in the DataFrame.
     - **sequence**: Each element in the sequence is used to divide the corresponding column (or the corresponding row when `axis='index'`); its length must match the number of labels on that axis.
     - **Series**: The Series is aligned with the DataFrame's columns or index, depending on the `axis`.
     - **dict**: The keys of the dict are aligned with the DataFrame's columns (or with its index when `axis='index'`), and the corresponding values are used for division.
     - **DataFrame**: Both the columns and index are aligned, and element-wise division is performed.

2. **`axis`** : `{0 or 'index', 1 or 'columns'}`, default `'columns'`

   - The axis to align the division operation:
     - `0` or `'index'`: Align along the index (rows).
     - `1` or `'columns'`: Align along the columns (default).

3. **`level`** : `int` or `label`, optional

   - For MultiIndex DataFrames, specifies the level to align on.

4. **`fill_value`** : `float` or `None`, default `None`
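   - A value that replaces missing (`NaN`) values in the DataFrame or `other`, including entries created by alignment, before the division is performed.
   - If `None`, missing values remain as `NaN`. If both corresponding values are missing, the result is missing even when `fill_value` is set.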
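
The `level` parameter is easiest to see with the MultiIndex example from the pandas docstring quoted above. This sketch reuses those frames; the comments reflect the output shown in that docstring.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# level=1 matches df's row labels against the inner level of the MultiIndex.
# Shapes that exist only in df_multindex are filled with 0 on the df side.
result = df.div(df_multindex, level=1, fill_value=0)
print(result)
```

Rows under `A` divide matching shapes (giving `1.0`, or `NaN` for `0 / 0`), while rows under `B` have no partner in `df` and become `0 / value = 0.0`.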

---

### **Returns**

- **DataFrame**:
  - A new DataFrame containing the result of the element-wise division.

---

### **Examples**

#### Example 1: Dividing by a Scalar

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]}, index=['circle', 'triangle', 'rectangle'])

# Divide the DataFrame by a scalar
result = df.div(2)
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0    180.0
triangle      1.5     90.0
rectangle     2.0    180.0
```

- The scalar `2` is used to divide every element in the DataFrame.

---

#### Example 2: Dividing by a Sequence

```python
# Divide the DataFrame by a sequence
result = df.div([1, 2], axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0    180.0
triangle      3.0     90.0
rectangle     4.0    180.0
```

- The sequence `[1, 2]` is used to divide the corresponding columns (`angles` and `degrees`).

---

#### Example 3: Dividing by a Series

```python
# Create a Series
s = pd.Series([1, 2], index=['angles', 'degrees'])

# Divide the DataFrame by the Series
result = df.div(s, axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0    180.0
triangle      3.0     90.0
rectangle     4.0    180.0
```

- The Series is aligned with the DataFrame's columns, and element-wise division is performed.

---

#### Example 4: Dividing by a DataFrame

```python
# Create another DataFrame
other = pd.DataFrame({'angles': [1, 3, 4], 'degrees': [10, 20, 30]}, index=['circle', 'triangle', 'rectangle'])

# Divide the original DataFrame by the other DataFrame
result = df.div(other)
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0     36.0
triangle      1.0      9.0
rectangle     1.0     12.0
```

- The DataFrames are aligned by both columns and index, and element-wise division is performed.

---

#### Example 5: Using `fill_value`

```python
# Create a DataFrame with missing values
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, None, 360]}, index=['circle', 'triangle', 'rectangle'])

# Divide by a scalar, filling missing values with 1
result = df.div(2, fill_value=1)
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0    180.0
triangle      1.5      0.5
rectangle     2.0    180.0
```

- Missing values (`NaN`) are filled with `1` before the division.

---

### **Related Methods**

1. **`DataFrame.add`**:

   - Performs element-wise addition.
   - Example:
     ```python
     df.add(other)
     ```

2. **`DataFrame.sub`**:

   - Performs element-wise subtraction.
   - Example:
     ```python
     df.sub(other)
     ```

3. **`DataFrame.mul`**:

   - Performs element-wise multiplication.
   - Example:
     ```python
     df.mul(other)
     ```

4. **`DataFrame.rtruediv`**:
   - Reverse version of `div` (e.g., `other / df`).
   - Example:
     ```python
     df.rtruediv(other)
     ```
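
The reverse version swaps the operands, which also changes where division by zero can occur. A brief sketch with a scalar, using the same `df` as the examples above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

forward = df.div(10)        # df / 10
reverse = df.rtruediv(10)   # 10 / df

print(forward)
print(reverse)

# Dividing by the zero in 'angles' yields inf rather than raising an error.
print(np.isinf(reverse.loc['circle', 'angles']))
```

`df.rdiv(10)` is another spelling of the same reverse division, as used in the docstring example quoted earlier.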

---

### **Practical Use Case**

Suppose you have a DataFrame representing measurements and want to normalize specific columns by a factor:

```python
# Create a DataFrame
df = pd.DataFrame({'temperature': [20.5, 22.3, 19.8], 'humidity': [45, 50, 48]})

# Normalize the 'temperature' column by a factor
factor = pd.Series([2, 1], index=['temperature', 'humidity'])
result = df.div(factor, axis='columns')
print(result)
```

**Output:**

```
   temperature  humidity
0        10.25      45.0
1        11.15      50.0
2         9.90      48.0
```
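
A related pattern is dividing along the other axis, for example turning each row into shares of its row total. The sketch below assumes a hypothetical `counts` frame; `axis='index'` tells `div` to match the Series labels against the row labels.

```python
import pandas as pd

# Hypothetical counts per row
counts = pd.DataFrame({'a': [1, 3], 'b': [3, 1]}, index=['r1', 'r2'])

row_totals = counts.sum(axis=1)                 # one total per row label
shares = counts.div(row_totals, axis='index')   # each row now sums to 1
print(shares)
```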

---

### **Notes**

- The `div` method performs element-wise division and aligns objects by columns and index.
- Use `fill_value` to handle missing values during the operation.
- For more advanced operations, consider using `DataFrame.apply` or `DataFrame.eval`.

---

By using `div`, you can efficiently perform element-wise division on DataFrames and other compatible objects, with options for handling missing values and specifying alignment.


In [None]:
""" pandas.DataFrame.truediv
DataFrame.truediv(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator truediv).

Equivalent to dataframe / other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rtruediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other : scalar, sequence, Series, dict or DataFrame
    Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
    Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
    Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
    Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """
  
  

The `pandas.DataFrame.truediv` method performs **element-wise floating-point division** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). It is equivalent to the `/` operator but adds a `fill_value` for missing data and an `axis` for alignment. `DataFrame.div` is an alias for the same operation, so the `div` examples earlier in this notebook behave identically. Parameters, usage, and examples follow.

---

### **Syntax**

```python
DataFrame.truediv(other, axis='columns', level=None, fill_value=None)
```
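
As a quick check of the equivalence with the `/` operator, the sketch below compares the method, the operator, and `div` on the same frame used in the docstring examples.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

via_method = df.truediv(2)
via_operator = df / 2
via_alias = df.div(2)

# All three spellings produce the same float result
print(via_method.equals(via_operator))
print(via_method.equals(via_alias))
```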

---

### **Parameters**

1. **`other`** : `scalar`, `sequence`, `Series`, `dict`, or `DataFrame`

   - The object to divide the DataFrame by.
   - If `other` is a:
     - **scalar**: The scalar is used to divide every element in the DataFrame.
     - **sequence**: With the default `axis='columns'`, each element divides the corresponding column, so the sequence length must match the number of columns.
     - **Series**: The Series index is aligned with the DataFrame's columns or index, depending on `axis`.
     - **dict**: The keys are matched against the DataFrame's columns (or against the index with `axis='index'`), and the corresponding values are used as divisors.
     - **DataFrame**: Both the columns and index are aligned, and element-wise division is performed.

2. **`axis`** : `{0 or 'index', 1 or 'columns'}`, default `'columns'`

   - The axis to align the division operation:
     - `0` or `'index'`: Align along the index (rows).
     - `1` or `'columns'`: Align along the columns (default).

3. **`level`** : `int` or `label`, optional

   - For MultiIndex DataFrames, specifies the level to align on.

4. **`fill_value`** : `float` or `None`, default `None`
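   - A value substituted for missing (`NaN`) entries in the DataFrame or `other`, and for entries created by alignment, before the division.
   - If `None`, missing values remain `NaN`. If both corresponding entries are missing, the result is missing even when `fill_value` is set.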
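
The effect of `axis` is easiest to see with a Series whose labels match the rows rather than the columns. In the sketch below, `row_divisors` is a hypothetical Series; the second call shows what happens when the default axis is left in place by mistake.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Hypothetical per-row divisors, labeled like the DataFrame's index
row_divisors = pd.Series([1, 2, 4], index=['circle', 'triangle', 'rectangle'])

# axis='index' matches the Series labels against the row labels
per_row = df.truediv(row_divisors, axis='index')

# With the default axis='columns' the same labels are matched against the
# column names; nothing overlaps, so the result is all NaN with extra columns
mismatched = df.truediv(row_divisors)

print(per_row)
print(mismatched.columns.tolist())
```

An all-`NaN` result with unexpected extra columns is usually a sign that the wrong `axis` was used.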

---

### **Returns**

- **DataFrame**:
  - A new DataFrame containing the result of the element-wise floating-point division.

---

### **Examples**

#### Example 1: Dividing by a Scalar

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]}, index=['circle', 'triangle', 'rectangle'])

# Divide the DataFrame by a scalar
result = df.truediv(2)
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0    180.0
triangle      1.5     90.0
rectangle     2.0    180.0
```

- The scalar `2` is used to divide every element in the DataFrame.

---

#### Example 2: Dividing by a Sequence

```python
# Divide the DataFrame by a sequence
result = df.truediv([1, 2], axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0    180.0
triangle      3.0     90.0
rectangle     4.0    180.0
```

- The sequence `[1, 2]` is used to divide the corresponding columns (`angles` and `degrees`).

---

#### Example 3: Dividing by a Series

```python
# Create a Series
s = pd.Series([1, 2], index=['angles', 'degrees'])

# Divide the DataFrame by the Series
result = df.truediv(s, axis='columns')
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0    180.0
triangle      3.0     90.0
rectangle     4.0    180.0
```

- The Series is aligned with the DataFrame's columns, and element-wise division is performed.

---

#### Example 4: Dividing by a DataFrame

```python
# Create another DataFrame
other = pd.DataFrame({'angles': [1, 3, 4], 'degrees': [10, 20, 30]}, index=['circle', 'triangle', 'rectangle'])

# Divide the original DataFrame by the other DataFrame
result = df.truediv(other)
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0     36.0
triangle      1.0      9.0
rectangle     1.0     12.0
```

- The DataFrames are aligned by both columns and index, and element-wise division is performed.

---

#### Example 5: Using `fill_value`

```python
# Create a DataFrame with missing values
df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, None, 360]}, index=['circle', 'triangle', 'rectangle'])

# Divide by a scalar, filling missing values with 1
result = df.truediv(2, fill_value=1)
print(result)
```

**Output:**

```
           angles  degrees
circle        0.0    180.0
triangle      1.5      0.5
rectangle     2.0    180.0
```

- Missing values (`NaN`) are filled with `1` before the division.

---

### **Related Methods**

1. **`DataFrame.add`**:

   - Performs element-wise addition.
   - Example:
     ```python
     df.add(other)
     ```

2. **`DataFrame.sub`**:

   - Performs element-wise subtraction.
   - Example:
     ```python
     df.sub(other)
     ```

3. **`DataFrame.mul`**:

   - Performs element-wise multiplication.
   - Example:
     ```python
     df.mul(other)
     ```

4. **`DataFrame.rtruediv`**:
   - Reverse version of `truediv` (e.g., `other / df`).
   - Example:
     ```python
     df.rtruediv(other)
     ```
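
The difference between the forward and reverse versions can be sketched with a scalar, mirroring the `df.rdiv(10)` output in the docstring. Note the zero in `angles`, which becomes `inf` when it ends up as the divisor.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4], 'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

forward = df.truediv(10)    # df / 10
reverse = df.rtruediv(10)   # 10 / df

print(reverse)
print(reverse.equals(10 / df))
```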

---

### **Practical Use Case**

Suppose you have a DataFrame representing measurements and want to normalize specific columns by a factor:

```python
# Create a DataFrame
df = pd.DataFrame({'temperature': [20.5, 22.3, 19.8], 'humidity': [45, 50, 48]})

# Per-column divisors: halve 'temperature', leave 'humidity' unchanged
factor = pd.Series([2, 1], index=['temperature', 'humidity'])
result = df.truediv(factor, axis='columns')
print(result)
```

**Output:**

```
   temperature  humidity
0        10.25      45.0
1        11.15      50.0
2         9.90      48.0
```

---

### **Notes**

- The `truediv` method performs element-wise floating-point division and aligns objects by columns and index.
- Use `fill_value` to handle missing values during the operation.
- For more advanced operations, consider using `DataFrame.apply` or `DataFrame.eval`.

---

By using `truediv`, you can efficiently perform element-wise floating-point division on DataFrames and other compatible objects, with options for handling missing values and specifying alignment.


In [None]:
""" pandas.DataFrame.floordiv
DataFrame.floordiv(other, axis='columns', level=None, fill_value=None)
Get Integer division of dataframe and other, element-wise (binary operator floordiv).

Equivalent to dataframe // other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rfloordiv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other : scalar, sequence, Series, dict or DataFrame
    Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
    Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
    Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
    Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.floordiv` method is used to perform element-wise integer division (also known as floor division) between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). The result is a DataFrame where each element is the result of the floor division operation.

### Key Points:

- **Floor Division**: The operation `//` is used, which divides the elements and returns the largest integer less than or equal to the division result.
- **Alignment**: The operands are aligned on their index and column labels before the operation is applied.
- **Fill Value**: You can specify a `fill_value` to replace missing values (`NaN`) in either of the inputs before performing the operation.
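
Because the result is floored rather than truncated, negative values round away from zero. A minimal sketch with a hypothetical `values` frame, which also checks that flooring the true quotient gives the same answer:

```python
import numpy as np
import pandas as pd

# Hypothetical values, including a negative one
values = pd.DataFrame({'v': [7.0, -7.0, 7.5]})

floored = values.floordiv(2)              # 3.0, -4.0, 3.0
via_floor = np.floor(values.truediv(2))   # floor of 3.5, -3.5, 3.75

print(floored)
print(floored.equals(via_floor))
```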

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be divided element-wise with the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
- **`level`**: If the DataFrame has a MultiIndex, this parameter specifies the level to broadcast across.
- **`fill_value`**: A value to replace missing values (`NaN`) in the DataFrame or `other` before performing the operation.

### Returns:

- **DataFrame**: The result of the floor division operation.

### Examples:

#### 1. Floor Division by a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Floor division by a scalar
result = df.floordiv(10)
print(result)
```

Output:

```
           angles  degrees
circle          0       36
triangle        0       18
rectangle       0       36
```

#### 2. Floor Division with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Floor division with another DataFrame
result = df.floordiv(other)
print(result)
```

Output:

```
           angles  degrees
circle          0       36
triangle        1        9
rectangle       1       12
```

#### 3. Floor Division with `fill_value`

```python
# df and other share the same labels and contain no NaN,
# so fill_value=1 changes nothing in this particular call
result = df.floordiv(other, fill_value=1)
print(result)
```

Output:

```
           angles  degrees
circle          0       36
triangle        1        9
rectangle       1       12
```

#### 4. Floor Division with a Series

```python
# Floor division with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.floordiv(series, axis='columns')
print(result)
```

Output:

```
           angles  degrees
circle          0      180
triangle        3       90
rectangle       4      180
```

#### 5. Floor Division with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Floor division by level
result = df.floordiv(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
```
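
The `level` argument is easier to follow on a smaller case where every inner label matches. The sketch below uses two hypothetical frames, `flat` and `multi`; the rows of `flat` are reused under each outer label of `multi`.

```python
import pandas as pd

# Hypothetical flat frame and a MultiIndex frame sharing the inner labels
flat = pd.DataFrame({'v': [10, 20]}, index=['x', 'y'])
multi = pd.DataFrame(
    {'v': [1, 2, 5, 10]},
    index=pd.MultiIndex.from_product([['A', 'B'], ['x', 'y']]),
)

# level=1 matches flat's index against the inner level of multi,
# so flat's rows are reused under both 'A' and 'B'
result = flat.floordiv(multi, level=1)
print(result)
```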

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: `rfloordiv` swaps the operands, so `df.rfloordiv(other)` computes `other // df`.
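
A short sketch of the forward and reverse versions side by side, using a hypothetical frame without zeros so that no division by zero is involved:

```python
import pandas as pd

# Hypothetical frame without zeros
df = pd.DataFrame({'angles': [3, 4], 'degrees': [180, 360]},
                  index=['triangle', 'rectangle'])

forward = df.floordiv(10)    # df // 10
reverse = df.rfloordiv(10)   # 10 // df

print(forward)
print(reverse)
```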

### Related Methods:

- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.mod`**: Element-wise modulo operation.
- **`DataFrame.pow`**: Element-wise exponential power.

These methods provide flexible wrappers around arithmetic operations, allowing for easy element-wise operations between DataFrames and other objects.


In [None]:
""" pandas.DataFrame.mod
DataFrame.mod(other, axis='columns', level=None, fill_value=None)
Get Modulo of dataframe and other, element-wise (binary operator mod).

Equivalent to dataframe % other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rmod.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other : scalar, sequence, Series, dict or DataFrame
    Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
    Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
    Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
    Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.mod` method is used to perform **element-wise modulo operation** (remainder after division) between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). The result is a DataFrame where each element is the remainder of the division operation.

### Key Points:

- **Modulo Operation**: The operation `%` is used, which returns the remainder after division.
- **Alignment**: The operands are aligned on their index and column labels before the operation is applied.
- **Fill Value**: You can specify a `fill_value` to replace missing values (`NaN`) in either of the inputs before performing the operation.
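
For integer data the remainder follows Python's `%` semantics, so it takes the sign of the divisor, and it pairs with `floordiv` so that `quotient * divisor + remainder` rebuilds the input. A minimal sketch with a hypothetical `values` frame:

```python
import pandas as pd

# Hypothetical values, one positive and one negative
values = pd.DataFrame({'v': [7, -7]})

remainder = values.mod(3)       # 1 and 2: the sign follows the divisor
quotient = values.floordiv(3)   # 2 and -3

# quotient * 3 + remainder reproduces the original values
rebuilt = quotient.mul(3).add(remainder)
print(remainder)
print(rebuilt.equals(values))
```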

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be used in the modulo operation with the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
- **`level`**: If the DataFrame has a MultiIndex, this parameter specifies the level to broadcast across.
- **`fill_value`**: A value to replace missing values (`NaN`) in the DataFrame or `other` before performing the operation.

### Returns:

- **DataFrame**: The result of the modulo operation.

### Examples:

#### 1. Modulo Operation with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Modulo operation with a scalar
result = df.mod(10)
print(result)
```

Output:

```
           angles  degrees
circle          0        0
triangle        3        0
rectangle       4        0
```

#### 2. Modulo Operation with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Modulo operation with another DataFrame
result = df.mod(other)
print(result)
```

Output:

```
           angles  degrees
circle          0        0
triangle        1        0
rectangle       1        0
```

#### 3. Modulo Operation with `fill_value`

```python
# df and other share the same labels and contain no NaN,
# so fill_value=1 changes nothing in this particular call
result = df.mod(other, fill_value=1)
print(result)
```

Output:

```
           angles  degrees
circle          0        0
triangle        1        0
rectangle       1        0
```

#### 4. Modulo Operation with a Series

```python
# Modulo operation with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.mod(series, axis='columns')
print(result)
```

Output:

```
           angles  degrees
circle          0        0
triangle        0        0
rectangle       0        0
```

#### 5. Modulo Operation with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Modulo operation by level; the 0 % 0 entry for ('A', 'circle') is NaN
result = df.mod(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        NaN      0.0
  triangle      0.0      0.0
  rectangle     0.0      0.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: `rmod` swaps the operands, so `df.rmod(other)` computes `other % df`.
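
A small sketch of the two directions, using a hypothetical `values` frame without zeros:

```python
import pandas as pd

# Hypothetical values without zeros
values = pd.DataFrame({'v': [3, 4, 7]})

forward = values.mod(10)    # values % 10
reverse = values.rmod(10)   # 10 % values

print(forward)
print(reverse)
```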

### Related Methods:

- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.pow`**: Element-wise exponential power.

These methods provide flexible wrappers around arithmetic operations, allowing for easy element-wise operations between DataFrames and other objects.


In [None]:
""" pandas.DataFrame.pow
DataFrame.pow(other, axis='columns', level=None, fill_value=None)
Get Exponential power of dataframe and other, element-wise (binary operator pow).

Equivalent to dataframe ** other, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, rpow.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other : scalar, sequence, Series, dict or DataFrame
    Any single or multiple element data structure, or list-like object.
axis : {0 or ‘index’, 1 or ‘columns’}
    Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’). For Series input, axis to match Series index on.
level : int or label
    Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
    Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.pow` method is used to perform **element-wise exponential power** (raising to a power) between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame). The result is a DataFrame where each element is the result of raising the corresponding element in the DataFrame to the power of the corresponding element in `other`.

### Key Points:

- **Exponential Power**: The operation `**` is used, which raises each element in the DataFrame to the power of the corresponding element in `other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: `fill_value` stands in for a missing value (`NaN`) on one side before the operation is performed; where both inputs are missing at the same position, the result stays `NaN`.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object used as the exponent in the power operation.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
- **`level`**: If the DataFrame has a MultiIndex, this parameter specifies the level to broadcast across.
- **`fill_value`**: A value to replace missing values (`NaN`) in the DataFrame or `other` before performing the operation.

### Returns:

- **DataFrame**: The result of the exponential power operation.

### Examples:

#### 1. Exponential Power with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Exponential power with a scalar
result = df.pow(2)
print(result)
```

Output:

```
           angles   degrees
circle          0    129600
triangle        9     32400
rectangle      16    129600
```

#### 2. Exponential Power with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [2, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])

# Exponential power with another DataFrame
result = df.pow(other)
print(result)
```

Output:

```
           angles      degrees
circle          0       129600
triangle        9      5832000
rectangle      64  16796160000
```

#### 3. Exponential Power with `fill_value`

```python
# Exponential power with fill_value
result = df.pow(other, fill_value=1)
print(result)
```

Output:

```
           angles      degrees
circle          0       129600
triangle        9      5832000
rectangle      64  16796160000
```
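In the example above both frames have the same rows and columns, so `fill_value` has nothing to fill and the result is unchanged. The following sketch shows a case where it does matter. The `partial` frame is a hypothetical exponent table, introduced only for illustration, that lacks the `degrees` column.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Hypothetical exponent frame with no 'degrees' column
partial = pd.DataFrame({'angles': [2, 2, 2]},
                       index=['circle', 'triangle', 'rectangle'])

# Without fill_value the unmatched 'degrees' column has no exponent -> NaN
no_fill = df.pow(partial)

# With fill_value=1 the missing exponents are treated as 1,
# so 'degrees' is raised to the power 1 and keeps its values (as floats)
with_fill = df.pow(partial, fill_value=1)

print(no_fill)
print(with_fill)
```

Choosing `1` as the fill value is a design choice for exponents: it leaves unmatched cells unchanged, whereas `0` would turn them all into `1`.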

#### 4. Exponential Power with a Series

```python
# Exponential power with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.pow(series, axis='columns')
print(result)
```

Output:

```
           angles   degrees
circle          0    129600
triangle        3     32400
rectangle       4    129600
```

#### 5. Exponential Power with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Exponential power by level
result = df.pow(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        1.0      inf
  triangle     27.0      inf
  rectangle   256.0      inf
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The reverse version of the operation (`rpow`) can be used to perform the operation with the operands reversed.
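Unlike addition or multiplication, exponentiation is not commutative, so `pow` and `rpow` generally give different results. A minimal sketch, using a one-column frame chosen here for brevity:

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4]},
                  index=['circle', 'triangle', 'rectangle'])

forward = df.pow(2)    # same as df ** 2: each value squared
reverse = df.rpow(2)   # same as 2 ** df: 2 raised to each value

print(forward)
print(reverse)
```

Here `forward` holds 0, 9, 16 while `reverse` holds 1, 8, 16.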

### Related Methods:

- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.mod`**: Element-wise modulo operation.

These methods provide flexible wrappers around arithmetic operations, allowing for easy element-wise operations between DataFrames and other objects.


In [None]:
""" pandas.DataFrame.dot
DataFrame.dot(other)[source]
Compute the matrix multiplication between the DataFrame and other.

This method computes the matrix product between the DataFrame and the values of an other Series, DataFrame or a numpy array.

It can also be called using self @ other.

Parameters
:
other
Series, DataFrame or array-like
The other object to compute the matrix product with.

Returns
:
Series or DataFrame
If other is a Series, return the matrix product between self and other as a Series. If other is a DataFrame or a numpy.array, return the matrix product of self and other in a DataFrame of a np.array.

See also

Series.dot
Similar method for Series.

Notes

The dimensions of DataFrame and other must be compatible in order to compute the matrix multiplication. In addition, the column names of DataFrame and the index of other must contain the same values, as they will be aligned prior to the multiplication.

The dot method for Series computes the inner product, instead of the matrix product here.

Examples

Here we multiply a DataFrame with a Series.

df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
s = pd.Series([1, 1, 2, 1])
df.dot(s)
0    -4
1     5
dtype: int64
Here we multiply a DataFrame with another DataFrame.

other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])
df.dot(other)
    0   1
0   1   4
1   2   2
Note that the dot method give the same result as @

df @ other
    0   1
0   1   4
1   2   2
The dot method works also if other is an np.array.

arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])
df.dot(arr)
    0   1
0   1   4
1   2   2
Note how shuffling of the objects does not change the result.

s2 = s.reindex([1, 0, 2, 3])
df.dot(s2)
0    -4
1     5
dtype: int64 """

The `pandas.DataFrame.dot` method is used to compute the **matrix multiplication** (also known as the dot product) between a DataFrame and another object (Series, DataFrame, or array-like). This method aligns the columns of the DataFrame with the index of the other object before performing the multiplication.

### Key Points:

- **Matrix Multiplication**: Each entry of the result is the sum of the products of one row of the DataFrame with one column of `other` (or with the whole Series when `other` is a Series).
- **Alignment**: The columns of the DataFrame and the index of `other` must align for the operation to work.
- **Flexibility**: It can be used with a Series, DataFrame, or numpy array.
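To make the "sum of products" idea concrete, the sketch below computes the same result two ways: once with `dot`, and once by multiplying each row element-wise by the Series and summing. The `manual` variable is introduced here only for comparison.

```python
import pandas as pd

df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
s = pd.Series([1, 1, 2, 1])

# Matrix product computed by pandas
result = df.dot(s)

# Same computation by hand: multiply each row by s, then sum across the row
# row 0: 0*1 + 1*1 + (-2)*2 + (-1)*1 = -4
# row 1: 1*1 + 1*1 +   1*2  +   1*1  =  5
manual = df.mul(s, axis='columns').sum(axis=1)

print(result)
print(manual)
```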

### Parameters:

- **`other`**: This can be a Series, DataFrame, or array-like object. It is the object to compute the matrix product with.

### Returns:

- **Series or DataFrame**:
  - If `other` is a **Series**, the result is a **Series**.
  - If `other` is a **DataFrame** or **numpy array**, the result is a **DataFrame**.

### Notes:

- The dimensions of the DataFrame and `other` must be compatible for matrix multiplication.
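- When `other` is a Series or DataFrame, the column names of the DataFrame and the index of `other` must contain the same labels; if they differ, pandas raises a `ValueError`. A numpy array carries no labels, so only its shape has to be compatible.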
- The `@` operator can also be used to perform the same operation (e.g., `df @ other`).

### Examples:

#### 1. Matrix Multiplication with a Series

```python
import pandas as pd

df = pd.DataFrame([[0, 1, -2, -1], [1, 1, 1, 1]])
s = pd.Series([1, 1, 2, 1])

# Matrix multiplication with a Series
result = df.dot(s)
print(result)
```

Output:

```
0   -4
1    5
dtype: int64
```

#### 2. Matrix Multiplication with Another DataFrame

```python
other = pd.DataFrame([[0, 1], [1, 2], [-1, -1], [2, 0]])

# Matrix multiplication with another DataFrame
result = df.dot(other)
print(result)
```

Output:

```
   0  1
0  1  4
1  2  2
```

#### 3. Matrix Multiplication with a Numpy Array

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [-1, -1], [2, 0]])

# Matrix multiplication with a numpy array
result = df.dot(arr)
print(result)
```

Output:

```
   0  1
0  1  4
1  2  2
```

#### 4. Using the `@` Operator

The `@` operator is equivalent to the `dot` method:

```python
# Using the @ operator
result = df @ other
print(result)
```

Output:

```
   0  1
0  1  4
1  2  2
```

#### 5. Alignment of Indices

Even if the Series is reordered, the result remains the same, because `dot` matches the DataFrame's column labels to the Series' index labels rather than relying on position:

```python
s2 = s.reindex([1, 0, 2, 3])

# Matrix multiplication with a shuffled Series
result = df.dot(s2)
print(result)
```

Output:

```
0   -4
1    5
dtype: int64
```

### Notes on Alignment:

- When `other` is a Series or DataFrame, the column labels of the DataFrame must match the index labels of `other`. The order may differ, because `dot` aligns by label.
- If the two sets of labels differ, pandas raises a `ValueError`. A numpy array has no labels, so only its shape is checked.
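The alignment behaviour can be checked with labelled columns. This is a small sketch; the frame and the two Series (`aligned`, `mismatched`) are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

# Same labels as df.columns, but in a different order
aligned = pd.Series([10, 20], index=['b', 'a'])

# Labels that do not appear in df.columns at all
mismatched = pd.Series([10, 20], index=['x', 'y'])

# Labels are matched before multiplying: a*20 + b*10 for each row
ok = df.dot(aligned)
print(ok)

# Different labels cannot be aligned, so pandas raises ValueError
try:
    df.dot(mismatched)
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```

Passing `mismatched.to_numpy()` instead would skip the label check, since an array is multiplied purely by position.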

### Related Methods:

- **`Series.dot`**: Computes the dot product for Series objects.
- **`numpy.dot`**: The underlying numpy function used for matrix multiplication.

### Summary:

- **`DataFrame.dot`** is used for matrix multiplication.
- It aligns the columns of the DataFrame with the index of `other`.
- It can be used with Series, DataFrames, or numpy arrays.
- The `@` operator is a shorthand for the same operation.


In [None]:
""" pandas.DataFrame.radd
DataFrame.radd(other, axis='columns', level=None, fill_value=None)[source]
Get Addition of dataframe and other, element-wise (binary operator radd).

Equivalent to other + dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, add.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.radd` method performs **element-wise addition** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame) with the operands reversed: it computes `other + dataframe` instead of `dataframe + other`. For numeric data both orders give the same result; the order only matters where `+` is not commutative, such as string concatenation. Like `add`, it accepts a `fill_value` to substitute for missing values (`NaN`).

### Key Points:

- **Reverse Addition**: The operation computes `other + dataframe` instead of `dataframe + other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: `fill_value` stands in for a missing value (`NaN`) on one side before the operation is performed; where both inputs are missing at the same position, the result stays `NaN`.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be added to the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
- **`level`**: If the DataFrame has a MultiIndex, this parameter specifies the level to broadcast across.
- **`fill_value`**: A value to replace missing values (`NaN`) in the DataFrame or `other` before performing the operation.

### Returns:

- **DataFrame**: The result of the element-wise addition.

### Examples:

#### 1. Reverse Addition with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse addition with a scalar
result = df.radd(1)
print(result)
```

Output:

```
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
```

#### 2. Reverse Addition with a Series

```python
# Reverse addition with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.radd(series, axis='columns')
print(result)
```

Output:

```
           angles  degrees
circle          1      362
triangle        4      182
rectangle       5      362
```

#### 3. Reverse Addition with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse addition with another DataFrame
result = df.radd(other)
print(result)
```

Output:

```
           angles  degrees
circle          1      370
triangle        5      200
rectangle       7      390
```

#### 4. Reverse Addition with `fill_value`

```python
# Reverse addition with fill_value
result = df.radd(other, fill_value=0)
print(result)
```

Output:

```
           angles  degrees
circle          1      370
triangle        5      200
rectangle       7      390
```
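Because `df` and `other` above contain no missing values, `fill_value` changes nothing in that example. The sketch below uses two small hypothetical frames, `left` and `right`, that do contain `NaN`, to show what `fill_value` actually does.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'x': [1.0, np.nan, np.nan]})
right = pd.DataFrame({'x': [10.0, 20.0, np.nan]})

# Computes right + left, treating a one-sided NaN as 0
result = left.radd(right, fill_value=0)

# row 0: 10 + 1 = 11
# row 1: 20 + 0 = 20   (NaN in `left` replaced by 0)
# row 2: both sides missing, so the result stays NaN
print(result)
```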

#### 5. Reverse Addition with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse addition by level
result = df.radd(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        0.0    720.0
  triangle      6.0    360.0
  rectangle     8.0    720.0
B square        4.0    360.0
  pentagon      5.0    540.0
  hexagon       6.0    720.0
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `radd` method is the reverse version of `add`. It is equivalent to `other + dataframe`.
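For numbers, `add` and `radd` return the same values. The difference becomes visible with strings, where `+` means concatenation. A minimal sketch, assuming a made-up frame of names:

```python
import pandas as pd

names = pd.DataFrame({'first': ['Ada', 'Alan']})

# add: dataframe + other, so the text is appended
suffixed = names.add('!')

# radd: other + dataframe, so the text is prepended
prefixed = names.radd('Dr. ')

print(suffixed)
print(prefixed)
```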

### Related Methods:

- **`DataFrame.add`**: Element-wise addition (`dataframe + other`).
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.mod`**: Element-wise modulo operation.
- **`DataFrame.pow`**: Element-wise exponential power.

### Summary:

- **`DataFrame.radd`** performs element-wise addition with the operands reversed (`other + dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.rsub
DataFrame.rsub(other, axis='columns', level=None, fill_value=None)[source]
Get Subtraction of dataframe and other, element-wise (binary operator rsub).

Equivalent to other - dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, sub.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.rmul` method performs **element-wise multiplication** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame) with the operands reversed: it computes `other * dataframe` instead of `dataframe * other`. For numeric data both orders give the same result, so `rmul` and `mul` are interchangeable there. Like `mul`, it accepts a `fill_value` to substitute for missing values (`NaN`).

### Key Points:

- **Reverse Multiplication**: The operation computes `other * dataframe` instead of `dataframe * other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: `fill_value` stands in for a missing value (`NaN`) on one side before the operation is performed; where both inputs are missing at the same position, the result stays `NaN`.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be multiplied with the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
- **`level`**: If the DataFrame has a MultiIndex, this parameter specifies the level to broadcast across.
- **`fill_value`**: A value to replace missing values (`NaN`) in the DataFrame or `other` before performing the operation.

### Returns:

- **DataFrame**: The result of the element-wise multiplication.

### Examples:

#### 1. Reverse Multiplication with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse multiplication with a scalar
result = df.rmul(2)
print(result)
```

Output:

```
           angles  degrees
circle          0      720
triangle        6      360
rectangle       8      720
```

#### 2. Reverse Multiplication with a Series

```python
# Reverse multiplication with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.rmul(series, axis='columns')
print(result)
```

Output:

```
           angles  degrees
circle          0      720
triangle        3      360
rectangle       4      720
```

#### 3. Reverse Multiplication with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse multiplication with another DataFrame
result = df.rmul(other)
print(result)
```

Output:

```
           angles  degrees
circle          0     3600
triangle        6     3600
rectangle      12    10800
```

#### 4. Reverse Multiplication with `fill_value`

```python
# Reverse multiplication with fill_value
result = df.rmul(other, fill_value=0)
print(result)
```

Output:

```
           angles  degrees
circle          0     3600
triangle        6     3600
rectangle      12    10800
```

#### 5. Reverse Multiplication with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse multiplication by level
result = df.rmul(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles   degrees
A circle        0.0  129600.0
  triangle      9.0   32400.0
  rectangle    16.0  129600.0
B square        0.0       0.0
  pentagon      0.0       0.0
  hexagon       0.0       0.0
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `rmul` method is the reverse version of `mul`. It is equivalent to `other * dataframe`.
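The two notes above can be seen together in a short sketch. The `factors` frame is hypothetical and deliberately lacks the `degrees` column, so the indices are mismatched.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Hypothetical multiplier frame with only one of the two columns
factors = pd.DataFrame({'angles': [10, 10, 10]},
                       index=['circle', 'triangle', 'rectangle'])

# Unmatched 'degrees' cells have no partner, so they become NaN
plain = df.rmul(factors)

# fill_value=1 acts as a neutral multiplier for the unmatched cells
kept = df.rmul(factors, fill_value=1)

# With a scalar, rmul matches the operator form other * dataframe
same_as_operator = df.rmul(2).equals(2 * df)

print(plain)
print(kept)
print(same_as_operator)
```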

### Related Methods:

- **`DataFrame.mul`**: Element-wise multiplication (`dataframe * other`).
- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.mod`**: Element-wise modulo operation.
- **`DataFrame.pow`**: Element-wise exponential power.

### Summary:

- **`DataFrame.rmul`** performs element-wise multiplication with the operands reversed (`other * dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.rmul
DataFrame.rmul(other, axis='columns', level=None, fill_value=None)[source]
Get Multiplication of dataframe and other, element-wise (binary operator rmul).

Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.rsub` method is used to perform **element-wise subtraction** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame), but with the operands reversed. This means it computes `other - dataframe` instead of `dataframe - other`. It is particularly useful when the order of operands matters, and it supports handling missing values (`NaN`) by allowing a `fill_value` to be specified.

### Key Points:

- **Reverse Subtraction**: The operation computes `other - dataframe` instead of `dataframe - other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: You can specify a `fill_value` to replace missing values (`NaN`) in either of the inputs before performing the operation.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to subtract the DataFrame from.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
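- **`level`**: Broadcasts across a level, matching index values on the given `MultiIndex` level.
- **`fill_value`**: A value substituted for missing values (`NaN`) in the DataFrame or `other`, including cells created by alignment, before the operation is performed. Where both inputs are missing at the same location, the result stays missing.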
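
The `axis` parameter is easiest to see with a Series whose labels match the rows rather than the columns. The following is a minimal sketch; `row_values` and its numbers are made up for illustration and do not come from the pandas documentation.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Hypothetical per-row values, labeled with the row index of df.
row_values = pd.Series([100, 200, 300],
                       index=['circle', 'triangle', 'rectangle'])

# axis='index' matches the Series labels against the row labels,
# so every cell in row r becomes row_values[r] - df.loc[r, column].
by_rows = df.rsub(row_values, axis='index')
print(by_rows)
```

With `axis='columns'` (the default) the same Series would be matched against the column labels instead, and none of its labels would line up with `angles` or `degrees`.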

### Returns:

- **DataFrame**: The result of the element-wise subtraction.

### Examples:

#### 1. Reverse Subtraction with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse subtraction with a scalar
result = df.rsub(10)
print(result)
```

Output:

```
           angles  degrees
circle         10     -350
triangle        7     -170
rectangle       6     -350
```

#### 2. Reverse Subtraction with a Series

```python
# Reverse subtraction with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.rsub(series, axis='columns')
print(result)
```

Output:

```
           angles  degrees
circle          1     -358
triangle       -2     -178
rectangle      -3     -358
```

#### 3. Reverse Subtraction with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse subtraction with another DataFrame
result = df.rsub(other)
print(result)
```

Output:

```
           angles  degrees
circle          1     -350
triangle       -1     -160
rectangle      -1     -330
```

#### 4. Reverse Subtraction with `fill_value`

```python
# Reverse subtraction with fill_value.
# df and other are fully aligned and contain no NaN, so fill_value has no effect here.
result = df.rsub(other, fill_value=0)
print(result)
```

Output:

```
           angles  degrees
circle          1     -350
triangle       -1     -160
rectangle      -1     -330
```
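
Because both frames above share the same labels, `fill_value` changes nothing there. A sketch of a case where it does matter, assuming a hypothetical frame `partial` that lacks the `degrees` column:

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Hypothetical frame with only one of the two columns.
partial = pd.DataFrame({'angles': [1, 2, 3]},
                       index=['circle', 'triangle', 'rectangle'])

# Without fill_value, the column missing from `partial` becomes NaN.
without_fill = df.rsub(partial)

# With fill_value=0, the missing 'degrees' cells are treated as 0,
# so the column holds 0 - df['degrees'].
with_fill = df.rsub(partial, fill_value=0)

print(without_fill)
print(with_fill)
```

The `angles` column is identical in both results; only the column that had to be created during alignment differs.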

#### 5. Reverse Subtraction with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse subtraction by level
result = df.rsub(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        0.0      0.0
  triangle      0.0      0.0
  rectangle     0.0      0.0
B square        4.0    360.0
  pentagon      5.0    540.0
  hexagon       6.0    720.0
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `rsub` method is the reverse version of `sub`. It is equivalent to `other - dataframe`.

### Related Methods:

- **`DataFrame.sub`**: Element-wise subtraction (`dataframe - other`).
- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.mod`**: Element-wise modulo operation.
- **`DataFrame.pow`**: Element-wise exponential power.

### Summary:

- **`DataFrame.rsub`** performs element-wise subtraction with the operands reversed (`other - dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.rmul
DataFrame.rmul(other, axis='columns', level=None, fill_value=None)
Get Multiplication of dataframe and other, element-wise (binary operator rmul).

Equivalent to other * dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mul.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other : scalar, sequence, Series, dict or DataFrame
    Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
    Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
    Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
    Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """


The `pandas.DataFrame.rmul` method performs **element-wise multiplication** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame) with the operands reversed: it computes `other * dataframe` instead of `dataframe * other`. For numeric data multiplication is commutative, so `rmul` returns the same values as `mul`; the operand order only matters for types whose multiplication is not commutative. Like the other flexible wrappers, it can substitute a `fill_value` for missing values (`NaN`).

### Key Points:

- **Reverse Multiplication**: The operation computes `other * dataframe` instead of `dataframe * other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: You can specify a `fill_value` to replace missing values (`NaN`) in either of the inputs before performing the operation.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be multiplied with the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
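- **`level`**: Broadcasts across a level, matching index values on the given `MultiIndex` level.
- **`fill_value`**: A value substituted for missing values (`NaN`) in the DataFrame or `other`, including cells created by alignment, before the operation is performed. Where both inputs are missing at the same location, the result stays missing.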
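
When `other` is a plain list, it is matched by position along the chosen `axis`, so its length has to equal the number of columns or rows. A small sketch with made-up factors:

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# One factor per column: 'angles' * 1, 'degrees' * 2.
per_column = df.rmul([1, 2], axis='columns')

# One factor per row: circle * 1, triangle * 10, rectangle * 100.
per_row = df.rmul([1, 10, 100], axis='index')

print(per_column)
print(per_row)
```

A Series or dict would instead be matched by label, which is usually the safer choice when column order may change.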

### Returns:

- **DataFrame**: The result of the element-wise multiplication.

### Examples:

#### 1. Reverse Multiplication with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse multiplication with a scalar
result = df.rmul(2)
print(result)
```

Output:

```
           angles  degrees
circle          0      720
triangle        6      360
rectangle       8      720
```

#### 2. Reverse Multiplication with a Series

```python
# Reverse multiplication with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.rmul(series, axis='columns')
print(result)
```

Output:

```
           angles  degrees
circle          0      720
triangle        3      360
rectangle       4      720
```

#### 3. Reverse Multiplication with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse multiplication with another DataFrame
result = df.rmul(other)
print(result)
```

Output:

```
           angles  degrees
circle          0     3600
triangle        6     3600
rectangle      12    10800
```

#### 4. Reverse Multiplication with `fill_value`

```python
# Reverse multiplication with fill_value.
# df and other are fully aligned and contain no NaN, so fill_value has no effect here.
result = df.rmul(other, fill_value=0)
print(result)
```

Output:

```
           angles  degrees
circle          0     3600
triangle        6     3600
rectangle      12    10800
```

#### 5. Reverse Multiplication with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse multiplication by level
result = df.rmul(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles   degrees
A circle        0.0  129600.0
  triangle      9.0   32400.0
  rectangle    16.0  129600.0
B square        0.0       0.0
  pentagon      0.0       0.0
  hexagon       0.0       0.0
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `rmul` method is the reverse version of `mul`. It is equivalent to `other * dataframe`.
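
For numeric frames the reverse and forward versions can be checked against each other directly. The sketch below reuses the frames from the examples above and compares the results with `DataFrame.equals`:

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

reverse = df.rmul(other)   # other * df
forward = df.mul(other)    # df * other

# Integer multiplication is commutative, so both frames hold the same values.
print(reverse.equals(forward))
print(reverse.equals(other * df))
```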

### Related Methods:

- **`DataFrame.mul`**: Element-wise multiplication (`dataframe * other`).
- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.mod`**: Element-wise modulo operation.
- **`DataFrame.pow`**: Element-wise exponential power.

### Summary:

- **`DataFrame.rmul`** performs element-wise multiplication with the operands reversed (`other * dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.rdiv
DataFrame.rdiv(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other : scalar, sequence, Series, dict or DataFrame
    Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
    Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
    Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
    Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.rdiv` method is used to perform **element-wise floating division** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame), but with the operands reversed. This means it computes `other / dataframe` instead of `dataframe / other`. It is particularly useful when the order of operands matters, and it supports handling missing values (`NaN`) by allowing a `fill_value` to be specified.

### Key Points:

- **Reverse Division**: The operation computes `other / dataframe` instead of `dataframe / other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: You can specify a `fill_value` to replace missing values (`NaN`) in either of the inputs before performing the operation.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be divided by the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
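- **`level`**: Broadcasts across a level, matching index values on the given `MultiIndex` level.
- **`fill_value`**: A value substituted for missing values (`NaN`) in the DataFrame or `other`, including cells created by alignment, before the operation is performed. Where both inputs are missing at the same location, the result stays missing.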

### Returns:

- **DataFrame**: The result of the element-wise floating division.

### Examples:

#### 1. Reverse Division with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse division with a scalar
result = df.rdiv(10)
print(result)
```

Output:

```
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
```
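
The `inf` in the first row comes from dividing 10 by the zero in `angles`; pandas returns infinity here rather than raising an error. If such cells should count as missing, one option (an assumption about what the caller wants, not something `rdiv` does itself) is to replace them afterwards:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

result = df.rdiv(10)

# Turn +inf / -inf produced by division by zero into NaN.
cleaned = result.replace([np.inf, -np.inf], np.nan)
print(cleaned)
```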

#### 2. Reverse Division with a Series

```python
# Reverse division with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.rdiv(series, axis='columns')
print(result)
```

Output:

```
             angles   degrees
circle          inf  0.005556
triangle   0.333333  0.011111
rectangle  0.250000  0.005556
```

#### 3. Reverse Division with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse division with another DataFrame
result = df.rdiv(other)
print(result)
```

Output:

```
             angles   degrees
circle          inf  0.027778
triangle   0.666667  0.111111
rectangle  0.750000  0.083333
```

#### 4. Reverse Division with `fill_value`

```python
# Reverse division with fill_value.
# df and other are fully aligned and contain no NaN, so fill_value has no effect here.
result = df.rdiv(other, fill_value=1)
print(result)
```

Output:

```
             angles   degrees
circle          inf  0.027778
triangle   0.666667  0.111111
rectangle  0.750000  0.083333
```

#### 5. Reverse Division with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse division by level
result = df.rdiv(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        inf      inf
  pentagon      inf      inf
  hexagon       inf      inf
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `rdiv` method is the reverse version of `truediv`. It is equivalent to `other / dataframe`.
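
Since `rdiv` and `rtruediv` both implement `other / dataframe`, their results can be compared with each other and with the plain operator. A minimal check using the frame from the examples:

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

via_rdiv = df.rdiv(10)
via_rtruediv = df.rtruediv(10)
via_operator = 10 / df

# All three spellings produce the same frame (including the inf cell).
print(via_rdiv.equals(via_rtruediv))
print(via_rtruediv.equals(via_operator))
```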

### Related Methods:

- **`DataFrame.truediv`**: Element-wise floating division (`dataframe / other`).
- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.mod`**: Element-wise modulo operation.
- **`DataFrame.pow`**: Element-wise exponential power.

### Summary:

- **`DataFrame.rdiv`** performs element-wise floating division with the operands reversed (`other / dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.rtruediv
DataFrame.rtruediv(other, axis='columns', level=None, fill_value=None)
Get Floating division of dataframe and other, element-wise (binary operator rtruediv).

Equivalent to other / dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, truediv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters:
other : scalar, sequence, Series, dict or DataFrame
    Any single or multiple element data structure, or list-like object.
axis : {0 or 'index', 1 or 'columns'}
    Whether to compare by the index (0 or 'index') or columns (1 or 'columns'). For Series input, axis to match Series index on.
level : int or label
    Broadcast across a level, matching Index values on the passed MultiIndex level.
fill_value : float or None, default None
    Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns:
DataFrame
    Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.rtruediv` method is used to perform **element-wise floating division** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame), but with the operands reversed. This means it computes `other / dataframe` instead of `dataframe / other`. It is particularly useful when the order of operands matters, and it supports handling missing values (`NaN`) by allowing a `fill_value` to be specified.

### Key Points:

- **Reverse Division**: The operation computes `other / dataframe` instead of `dataframe / other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: You can specify a `fill_value` to replace missing values (`NaN`) in either of the inputs before performing the operation.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be divided by the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`).
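- **`level`**: Broadcasts across a level, matching index values on the given `MultiIndex` level.
- **`fill_value`**: A value substituted for missing values (`NaN`) in the DataFrame or `other`, including cells created by alignment, before the operation is performed. Where both inputs are missing at the same location, the result stays missing.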
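
The rule that `fill_value` only applies when exactly one side is missing can be seen with two small frames that contain `NaN` themselves. The frames `left` and `right` below are hypothetical and exist only for this sketch:

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'x': [2.0, np.nan, np.nan]})
right = pd.DataFrame({'x': [10.0, 10.0, np.nan]})

# Computes right / left, substituting 1 where only one side is NaN.
result = left.rtruediv(right, fill_value=1)

# row 0: 10 / 2                      -> 5.0
# row 1: left is NaN, filled with 1  -> 10 / 1 = 10.0
# row 2: both sides are NaN          -> stays NaN
print(result)
```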

### Returns:

- **DataFrame**: The result of the element-wise floating division.

### Examples:

#### 1. Reverse Division with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse division with a scalar
result = df.rtruediv(10)
print(result)
```

Output:

```
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
```

#### 2. Reverse Division with a Series

```python
# Reverse division with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.rtruediv(series, axis='columns')
print(result)
```

Output:

```
             angles   degrees
circle          inf  0.005556
triangle   0.333333  0.011111
rectangle  0.250000  0.005556
```
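
The `axis` argument matters as soon as the Series is labeled by rows rather than by columns. The following sketch is an illustrative addition: the `per_row` Series and its values are made up for this example, and each cell in a row divides the Series entry that carries the same row label.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Hypothetical numerators, one per row label
per_row = pd.Series([10, 20, 30],
                    index=['circle', 'triangle', 'rectangle'])

# axis='index' matches the Series labels against df's row index,
# so every cell in row r becomes per_row[r] / df.loc[r, column]
by_row = df.rtruediv(per_row, axis='index')
print(by_row)
```

With `axis='columns'` the same Series would be matched against the column labels instead, and none of them would line up.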

#### 3. Reverse Division with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse division with another DataFrame
result = df.rtruediv(other)
print(result)
```

Output:

```
             angles   degrees
circle          inf  0.027778
triangle   0.666667  0.111111
rectangle  0.750000  0.083333
```

#### 4. Reverse Division with `fill_value`

```python
# Reverse division with fill_value
result = df.rtruediv(other, fill_value=1)
print(result)
```

Output (`df` and `other` share all labels and contain no `NaN`, so `fill_value` has no effect here):

```
             angles   degrees
circle          inf  0.027778
triangle   0.666667  0.111111
rectangle  0.750000  0.083333
```
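
To see `fill_value` change a result, one of the inputs has to be missing something. The sketch below is an illustrative addition that uses a hypothetical frame, `partial`, which lacks the `degrees` column entirely.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Hypothetical frame without a 'degrees' column
partial = pd.DataFrame({'angles': [1, 2, 3]},
                       index=['circle', 'triangle', 'rectangle'])

# Without fill_value the unmatched column comes back as all NaN
no_fill = df.rtruediv(partial)

# With fill_value=1 the missing 'degrees' entries of `partial` are
# treated as 1, so that column becomes 1 / df['degrees']
with_fill = df.rtruediv(partial, fill_value=1)

print(no_fill)
print(with_fill)
```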

#### 5. Reverse Division with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse division by level
result = df.rtruediv(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        inf      inf
  pentagon      inf      inf
  hexagon       inf      inf
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `rtruediv` method is the reverse version of `truediv`. It is equivalent to `other / dataframe`.
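
As a quick check of the equivalence stated above, the following sketch (an illustrative addition) compares the method with the plain operator and with `rdiv`, which pandas provides as an alias of `rtruediv`.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

via_method = df.rtruediv(10)
via_operator = 10 / df       # the operator form of the same computation
via_alias = df.rdiv(10)      # rdiv is an alias of rtruediv

print(via_method.equals(via_operator))
print(via_method.equals(via_alias))
```

The method form is only needed when `axis`, `level`, or `fill_value` has to be set.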

### Related Methods:

- **`DataFrame.truediv`**: Element-wise floating division (`dataframe / other`).
- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.mod`**: Element-wise modulo operation.
- **`DataFrame.pow`**: Element-wise exponential power.

### Summary:

- **`DataFrame.rtruediv`** performs element-wise floating division with the operands reversed (`other / dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.rfloordiv
DataFrame.rfloordiv(other, axis='columns', level=None, fill_value=None)[source]
Get Integer division of dataframe and other, element-wise (binary operator rfloordiv).

Equivalent to other // dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, floordiv.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.rfloordiv` method performs **element-wise floor division** (the quotient rounded down, often called integer division) between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame) with the operands reversed: it computes `other // dataframe` instead of `dataframe // other`. Use it when the DataFrame must be the divisor and you also need the `axis`, `level`, or `fill_value` controls that the plain `//` operator does not offer.

### Key Points:

- **Reverse Floor Division**: The operation computes `other // dataframe` instead of `dataframe // other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: `fill_value` substitutes for a value that is missing (`NaN`) in one of the two inputs, including positions created by alignment. If both inputs are missing at a position, the result stays `NaN`.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be divided by the DataFrame.
- **`axis`**: For a Series `other`, whether to match its labels against the DataFrame's index (`0` or `'index'`) or its columns (`1` or `'columns'`, the default).
- **`level`**: Broadcasts across a MultiIndex level, matching index values on the level passed (an int or label).
- **`fill_value`**: Value substituted for a missing (`NaN`) entry in the DataFrame or in `other` before the operation. Defaults to `None`, which means no filling.

### Returns:

- **DataFrame**: The result of the element-wise integer division.

### Examples:

#### 1. Reverse Floor Division with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse floor division with a scalar
result = df.rfloordiv(10)
print(result)
```

Output:

```
           angles  degrees
circle        inf      0.0
triangle      3.0      0.0
rectangle     2.0      0.0
```

#### 2. Reverse Floor Division with a Series

```python
# Reverse floor division with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.rfloordiv(series, axis='columns')
print(result)
```

Output:

```
           angles  degrees
circle        inf      0.0
triangle      0.0      0.0
rectangle     0.0      0.0
```

#### 3. Reverse Floor Division with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse floor division with another DataFrame
result = df.rfloordiv(other)
print(result)
```

Output:

```
           angles  degrees
circle        inf      0.0
triangle      0.0      0.0
rectangle     0.0      0.0
```

#### 4. Reverse Floor Division with `fill_value`

```python
# Reverse floor division with fill_value
result = df.rfloordiv(other, fill_value=1)
print(result)
```

Output (`df` and `other` share all labels and contain no `NaN`, so `fill_value` has no effect here):

```
           angles  degrees
circle        inf      0.0
triangle      0.0      0.0
rectangle     0.0      0.0
```

#### 5. Reverse Floor Division with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse floor division by level
result = df.rfloordiv(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        inf      inf
  pentagon      inf      inf
  hexagon       inf      inf
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `rfloordiv` method is the reverse version of `floordiv`. It is equivalent to `other // dataframe`.
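
Floor division rounds toward negative infinity rather than toward zero, which only becomes visible with negative operands. The sketch below is an illustrative addition; the single-column frame and the numerator `-7` are made up, and the frame contains no zeros so every quotient is finite.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with no zeros
df = pd.DataFrame({'x': [2, 3, 4]}, index=['a', 'b', 'c'])

floored = df.rfloordiv(-7)        # -7 // df
true_quotient = df.rtruediv(-7)   # -7 / df

# -7 / 2 = -3.5 floors to -4, not -3
print(floored['x'].tolist())      # [-4, -3, -2]

# Same values as flooring the true quotient, and same as the operator form
print((floored == np.floor(true_quotient)).all().all())
print(floored.equals(-7 // df))
```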

### Related Methods:

- **`DataFrame.floordiv`**: Element-wise integer division (`dataframe // other`).
- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.mod`**: Element-wise modulo operation.
- **`DataFrame.pow`**: Element-wise exponential power.

### Summary:

- **`DataFrame.rfloordiv`** performs element-wise integer division with the operands reversed (`other // dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.rmod
DataFrame.rmod(other, axis='columns', level=None, fill_value=None)[source]
Get Modulo of dataframe and other, element-wise (binary operator rmod).

Equivalent to other % dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, mod.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.rmod` method performs an **element-wise modulo operation** (remainder after division) between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame) with the operands reversed: it computes `other % dataframe` instead of `dataframe % other`. Use it when the DataFrame must be the divisor and you also need the `axis`, `level`, or `fill_value` controls that the plain `%` operator does not offer.

### Key Points:

- **Reverse Modulo Operation**: The operation computes `other % dataframe` instead of `dataframe % other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: `fill_value` substitutes for a value that is missing (`NaN`) in one of the two inputs, including positions created by alignment. If both inputs are missing at a position, the result stays `NaN`.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the left-hand operand (the dividend) of the modulo operation; the DataFrame supplies the divisor.
- **`axis`**: For a Series `other`, whether to match its labels against the DataFrame's index (`0` or `'index'`) or its columns (`1` or `'columns'`, the default).
- **`level`**: Broadcasts across a MultiIndex level, matching index values on the level passed (an int or label).
- **`fill_value`**: Value substituted for a missing (`NaN`) entry in the DataFrame or in `other` before the operation. Defaults to `None`, which means no filling.

### Returns:

- **DataFrame**: The result of the element-wise modulo operation.

### Examples:

#### 1. Reverse Modulo Operation with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse modulo operation with a scalar
result = df.rmod(10)
print(result)
```

Output:

```
           angles  degrees
circle        NaN     10.0
triangle      1.0     10.0
rectangle     2.0     10.0
```

#### 2. Reverse Modulo Operation with a Series

```python
# Reverse modulo operation with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.rmod(series, axis='columns')
print(result)
```

Output:

```
           angles  degrees
circle        NaN      2.0
triangle      1.0      2.0
rectangle     1.0      2.0
```

#### 3. Reverse Modulo Operation with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse modulo operation with another DataFrame
result = df.rmod(other)
print(result)
```

Output:

```
           angles  degrees
circle        NaN     10.0
triangle      2.0     20.0
rectangle     3.0     30.0
```

#### 4. Reverse Modulo Operation with `fill_value`

```python
# Reverse modulo operation with fill_value
result = df.rmod(other, fill_value=1)
print(result)
```

Output (`df` and `other` share all labels and contain no `NaN`, so `fill_value` has no effect here):

```
           angles  degrees
circle        NaN     10.0
triangle      2.0     20.0
rectangle     3.0     30.0
```

#### 5. Reverse Modulo Operation with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse modulo operation by level
result = df.rmod(df_multindex, level=1, fill_value=0)
print(result)
```

Output:

```
             angles  degrees
A circle        NaN      0.0
  triangle      0.0      0.0
  rectangle     0.0      0.0
B square        NaN      NaN
  pentagon      NaN      NaN
  hexagon       NaN      NaN
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `rmod` method is the reverse version of `mod`. It is equivalent to `other % dataframe`.
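
Two properties of the remainder are easy to verify in code: it takes the sign of the divisor, and together with floor division it reconstructs the dividend. The sketch below is an illustrative addition with a made-up single-column frame that contains no zeros.

```python
import pandas as pd

# Hypothetical divisors, including a negative one
df = pd.DataFrame({'x': [3, -3, 4]}, index=['a', 'b', 'c'])

remainder = df.rmod(10)       # 10 % df
quotient = df.rfloordiv(10)   # 10 // df

# The remainder takes the sign of the divisor: 10 % -3 is -2
print(remainder['x'].tolist())   # [1, -2, 2]

# quotient * divisor + remainder gives back the dividend
rebuilt = quotient * df + remainder
print(rebuilt['x'].tolist())     # [10, 10, 10]
```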

### Related Methods:

- **`DataFrame.mod`**: Element-wise modulo operation (`dataframe % other`).
- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.pow`**: Element-wise exponential power.

### Summary:

- **`DataFrame.rmod`** performs element-wise modulo operation with the operands reversed (`other % dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.rpow
DataFrame.rpow(other, axis='columns', level=None, fill_value=None)[source]
Get Exponential power of dataframe and other, element-wise (binary operator rpow).

Equivalent to other ** dataframe, but with support to substitute a fill_value for missing data in one of the inputs. With reverse version, pow.

Among flexible wrappers (add, sub, mul, div, floordiv, mod, pow) to arithmetic operators: +, -, *, /, //, %, **.

Parameters
:
other
scalar, sequence, Series, dict or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}
Whether to compare by the index (0 or ‘index’) or columns. (1 or ‘columns’). For Series input, axis to match Series index on.

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

fill_value
float or None, default None
Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

Returns
:
DataFrame
Result of the arithmetic operation.

See also

DataFrame.add
Add DataFrames.

DataFrame.sub
Subtract DataFrames.

DataFrame.mul
Multiply DataFrames.

DataFrame.div
Divide DataFrames (float division).

DataFrame.truediv
Divide DataFrames (float division).

DataFrame.floordiv
Divide DataFrames (integer division).

DataFrame.mod
Calculate modulo (remainder after division).

DataFrame.pow
Calculate exponential power.

Notes

Mismatched indices will be unioned together.

Examples

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])
df
           angles  degrees
circle          0      360
triangle        3      180
rectangle       4      360
Add a scalar with operator version which return the same results.

df + 1
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
df.add(1)
           angles  degrees
circle          1      361
triangle        4      181
rectangle       5      361
Divide by constant with reverse version.

df.div(10)
           angles  degrees
circle        0.0     36.0
triangle      0.3     18.0
rectangle     0.4     36.0
df.rdiv(10)
             angles   degrees
circle          inf  0.027778
triangle   3.333333  0.055556
rectangle  2.500000  0.027778
Subtract a list and Series by axis with operator version.

df - [1, 2]
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub([1, 2], axis='columns')
           angles  degrees
circle         -1      358
triangle        2      178
rectangle       3      358
df.sub(pd.Series([1, 1, 1], index=['circle', 'triangle', 'rectangle']),
       axis='index')
           angles  degrees
circle         -1      359
triangle        2      179
rectangle       3      359
Multiply a dictionary by axis.

df.mul({'angles': 0, 'degrees': 2})
            angles  degrees
circle           0      720
triangle         0      360
rectangle        0      720
df.mul({'circle': 0, 'triangle': 2, 'rectangle': 3}, axis='index')
            angles  degrees
circle           0        0
triangle         6      360
rectangle       12     1080
Multiply a DataFrame of different shape with operator version.

other = pd.DataFrame({'angles': [0, 3, 4]},
                     index=['circle', 'triangle', 'rectangle'])
other
           angles
circle          0
triangle        3
rectangle       4
df * other
           angles  degrees
circle          0      NaN
triangle        9      NaN
rectangle      16      NaN
df.mul(other, fill_value=0)
           angles  degrees
circle          0      0.0
triangle        9      0.0
rectangle      16      0.0
Divide by a MultiIndex by level.

df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])
df_multindex
             angles  degrees
A circle          0      360
  triangle        3      180
  rectangle       4      360
B square          4      360
  pentagon        5      540
  hexagon         6      720
df.div(df_multindex, level=1, fill_value=0)
             angles  degrees
A circle        NaN      1.0
  triangle      1.0      1.0
  rectangle     1.0      1.0
B square        0.0      0.0
  pentagon      0.0      0.0
  hexagon       0.0      0.0 """

The `pandas.DataFrame.rpow` method performs **element-wise exponentiation** between a DataFrame and another object (scalar, sequence, Series, dict, or DataFrame) with the operands reversed: it computes `other ** dataframe` instead of `dataframe ** other`, so `other` is the base and the DataFrame supplies the exponents. Use it when you also need the `axis`, `level`, or `fill_value` controls that the plain `**` operator does not offer.

### Key Points:

- **Reverse Exponential Power**: The operation computes `other ** dataframe` instead of `dataframe ** other`.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Fill Value**: `fill_value` substitutes for a value that is missing (`NaN`) in one of the two inputs, including positions created by alignment. If both inputs are missing at a position, the result stays `NaN`.

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, dict, or DataFrame. It is the object to be raised to the power of the DataFrame.
- **`axis`**: For a Series `other`, whether to match its labels against the DataFrame's index (`0` or `'index'`) or its columns (`1` or `'columns'`, the default).
- **`level`**: Broadcasts across a MultiIndex level, matching index values on the level passed (an int or label).
- **`fill_value`**: Value substituted for a missing (`NaN`) entry in the DataFrame or in `other` before the operation. Defaults to `None`, which means no filling.

### Returns:

- **DataFrame**: The result of the element-wise exponential power operation.

### Examples:

#### 1. Reverse Exponential Power with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# Reverse exponential power with a scalar
result = df.rpow(2)
print(result)
```

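Output (both columns are `int64`, so `2 ** 360` and `2 ** 180` overflow and wrap around to 0; cast to float first, for example `df.astype('float64').rpow(2)`, when exponents are this large):

```
           angles  degrees
circle          1        0
triangle        8        0
rectangle      16        0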
```

#### 2. Reverse Exponential Power with a Series

```python
# Reverse exponential power with a Series
series = pd.Series([1, 2], index=['angles', 'degrees'])
result = df.rpow(series, axis='columns')
print(result)
```

Output (`1 ** x` is 1 for every exponent, including 0; the `degrees` results overflow `int64` and wrap to 0):

```
           angles  degrees
circle          1        0
triangle        1        0
rectangle       1        0
```

#### 3. Reverse Exponential Power with Another DataFrame

```python
other = pd.DataFrame({'angles': [1, 2, 3],
                      'degrees': [10, 20, 30]},
                     index=['circle', 'triangle', 'rectangle'])

# Reverse exponential power with another DataFrame
result = df.rpow(other)
print(result)
```

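Output (`other` supplies the bases, so `angles` is `1 ** 0`, `2 ** 3`, `3 ** 4`; the `degrees` results overflow `int64` and wrap to 0):

```
           angles  degrees
circle          1        0
triangle        8        0
rectangle      81        0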
```

#### 4. Reverse Exponential Power with `fill_value`

```python
# Reverse exponential power with fill_value
result = df.rpow(other, fill_value=1)
print(result)
```

Output (`fill_value` has no effect because nothing is missing; the `degrees` results still overflow `int64` and wrap to 0):

```
           angles  degrees
circle          1        0
triangle        8        0
rectangle      81        0
```

#### 5. Reverse Exponential Power with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'angles': [0, 3, 4, 4, 5, 6],
                             'degrees': [360, 180, 360, 360, 540, 720]},
                            index=[['A', 'A', 'A', 'B', 'B', 'B'],
                                   ['circle', 'triangle', 'rectangle',
                                    'square', 'pentagon', 'hexagon']])

# Reverse exponential power by level
result = df.rpow(df_multindex, level=1, fill_value=0)
print(result)
```

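Output (alignment on the level introduces missing values, so the computation runs in `float64`: the huge powers in group `A` overflow to `inf`, and the rows in group `B` use the filled exponent 0, giving 1.0):

```
             angles  degrees
A circle        1.0      inf
  triangle     27.0      inf
  rectangle   256.0      inf
B square        1.0      1.0
  pentagon      1.0      1.0
  hexagon       1.0      1.0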
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together, and missing values will be filled with `NaN` unless a `fill_value` is specified.
- **Reverse Version**: The `rpow` method is the reverse version of `pow`. It is equivalent to `other ** dataframe`.
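
Because the DataFrame supplies the exponents, results grow quickly, and integer columns can overflow without any warning. The sketch below is an illustrative addition that contrasts `int64` arithmetic with an explicit cast to `float64`. It assumes the default 64-bit integer dtype that pandas infers for these columns.

```python
import pandas as pd

df = pd.DataFrame({'angles': [0, 3, 4],
                   'degrees': [360, 180, 360]},
                  index=['circle', 'triangle', 'rectangle'])

# int64 in, int64 out: 2 ** 360 does not fit and wraps around (to 0 here)
as_int = df.rpow(2)

# Casting the exponents to float64 keeps the magnitude
as_float = df.astype('float64').rpow(2)

print(as_int)
print(as_float)
```

Small exponents such as those in `angles` are unaffected, so the cast is only needed when the powers can exceed the `int64` range.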

### Related Methods:

- **`DataFrame.pow`**: Element-wise exponential power (`dataframe ** other`).
- **`DataFrame.add`**: Element-wise addition.
- **`DataFrame.sub`**: Element-wise subtraction.
- **`DataFrame.mul`**: Element-wise multiplication.
- **`DataFrame.div`**: Element-wise float division.
- **`DataFrame.truediv`**: Element-wise float division.
- **`DataFrame.floordiv`**: Element-wise integer division.
- **`DataFrame.mod`**: Element-wise modulo operation.

### Summary:

- **`DataFrame.rpow`** performs element-wise exponential power with the operands reversed (`other ** dataframe`).
- It supports alignment of indices and handling of missing values using `fill_value`.
- It is part of the flexible arithmetic operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.lt
DataFrame.lt(other, axis='columns', level=None)[source]
Get Less than of dataframe and other, element-wise (binary operator lt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
:
other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns
:
DataFrame of bool
Result of the comparison.

See also

DataFrame.eq
Compare DataFrames for equality elementwise.

DataFrame.ne
Compare DataFrames for inequality elementwise.

DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
df
   cost  revenue
A   250      100
B   150      250
C   100      300
Comparison with a scalar, using either the operator or method:

df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:

df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:

df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.

other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])
other
   revenue
A      300
B      250
C      100
D      150
df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.

df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
                             'revenue': [100, 250, 300, 200, 175, 225]},
                            index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
                                   ['A', 'B', 'C', 'A', 'B', 'C']])
df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False """

The `pandas.DataFrame.lt` method is used to perform **element-wise less than comparison** between a DataFrame and another object (scalar, sequence, Series, or DataFrame). It returns a DataFrame of boolean values indicating whether each element in the DataFrame is less than the corresponding element in `other`.

### Key Points:

- **Less Than Comparison**: The operation computes `dataframe < other` element-wise.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Boolean Result**: The result is a DataFrame of boolean values (`True` or `False`).
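
The alignment point is where the method and the bare `<` operator appear to differ most. The sketch below assumes a hypothetical frame `shuffled` that carries the same labels as `df` but lists its rows in a different order; the values are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# Same labels as df, rows listed in a different order (illustrative values).
shuffled = pd.DataFrame({'cost': [150, 200, 100],
                         'revenue': [250, 150, 200]},
                        index=['C', 'A', 'B'])

# The method matches rows by label, not by position.
aligned = df.lt(shuffled).sort_index()
print(aligned)

# The bare operator rejects DataFrames whose labels are not identical.
try:
    df < shuffled
    operator_error = None
except ValueError as exc:
    operator_error = exc
print(type(operator_error).__name__)
```

Row `C` of `df` is compared with row `C` of `shuffled` even though that row comes first in `shuffled`, which is why the method form is the safer choice when the two frames may be ordered differently.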

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, or DataFrame. It is the object to compare with the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`). Default is `'columns'`.
- **`level`**: When one of the operands has a MultiIndex, this parameter specifies the level whose labels are matched so that the other operand can be broadcast across it.

### Returns:

- **DataFrame of bool**: The result of the element-wise less than comparison.

### Examples:

#### 1. Less Than Comparison with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# Less than comparison with a scalar
result = df.lt(200)
print(result)
```

Output:

```
    cost  revenue
A  False     True
B   True    False
C   True    False
```

#### 2. Less Than Comparison with a Series

```python
# Less than comparison with a Series
series = pd.Series([200, 150], index=['cost', 'revenue'])
result = df.lt(series)
print(result)
```

Output:

```
    cost  revenue
A  False     True
B   True    False
C   True    False
```

#### 3. Less Than Comparison with Another DataFrame

```python
other = pd.DataFrame({'cost': [200, 100, 150],
                      'revenue': [150, 200, 250]},
                     index=['A', 'B', 'C'])

# Less than comparison with another DataFrame
result = df.lt(other)
print(result)
```

Output:

```
    cost  revenue
A  False     True
B  False    False
C   True    False
```

#### 4. Less Than Comparison with `axis` Parameter

```python
# With axis='index' the list must supply one value per row (A, B, C)
result = df.lt([200, 200, 150], axis='index')
print(result)
```

Output:

```
    cost  revenue
A  False     True
B   True    False
C   True    False
```

#### 5. Less Than Comparison with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
                             'revenue': [100, 250, 300, 200, 175, 225]},
                            index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
                                   ['A', 'B', 'C', 'A', 'B', 'C']])

# Less than comparison by level
result = df.lt(df_multindex, level=1)
print(result)
```

Output:

```
       cost  revenue
Q1 A  False    False
   B  False    False
   C  False    False
Q2 A  False     True
   B   True    False
   C   True    False
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together.
- **NaN Values**: NaN values are considered different (i.e., `NaN != NaN`).
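
Both notes can be seen together in one small case. The sketch below reuses the differently shaped `other` frame from the pandas documentation example (four rows, `revenue` column only); positions that exist in only one operand behave like NaN during the comparison and therefore come out as `False`:

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# 'other' has an extra row (D) and lacks the 'cost' column.
other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])

result = df.lt(other)
print(result)
```

The result covers rows `A` to `D` and both columns. The whole `cost` column and the whole `D` row are `False`, because one side of each of those comparisons is missing.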

### Related Methods:

- **`DataFrame.eq`**: Element-wise equality comparison.
- **`DataFrame.ne`**: Element-wise inequality comparison.
- **`DataFrame.le`**: Element-wise less than or equal to comparison.
- **`DataFrame.lt`**: Element-wise less than comparison.
- **`DataFrame.ge`**: Element-wise greater than or equal to comparison.
- **`DataFrame.gt`**: Element-wise greater than comparison.

### Summary:

- **`DataFrame.lt`** performs element-wise less than comparison (`dataframe < other`).
- It aligns the operands on index and column labels; positions left without a value after alignment compare as `False`.
- It is part of the flexible comparison operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.gt
DataFrame.gt(other, axis='columns', level=None)[source]
Get Greater than of dataframe and other, element-wise (binary operator gt).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
:
other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns
:
DataFrame of bool
Result of the comparison.

See also

DataFrame.eq
Compare DataFrames for equality elementwise.

DataFrame.ne
Compare DataFrames for inequality elementwise.

DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
df
   cost  revenue
A   250      100
B   150      250
C   100      300
Comparison with a scalar, using either the operator or method:

df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:

df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:

df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.

other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])
other
   revenue
A      300
B      250
C      100
D      150
df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.

df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
                             'revenue': [100, 250, 300, 200, 175, 225]},
                            index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
                                   ['A', 'B', 'C', 'A', 'B', 'C']])
df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False """

The `pandas.DataFrame.gt` method is used to perform **element-wise greater than comparison** between a DataFrame and another object (scalar, sequence, Series, or DataFrame). It returns a DataFrame of boolean values indicating whether each element in the DataFrame is greater than the corresponding element in `other`.

### Key Points:

- **Greater Than Comparison**: The operation computes `dataframe > other` element-wise.
- **Alignment**: The operation aligns the indices of the DataFrame and the `other` object.
- **Boolean Result**: The result is a DataFrame of boolean values (`True` or `False`).
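
Because the result is a boolean DataFrame, it can be passed straight to other pandas operations. A minimal sketch, assuming the same `df` used in the examples of this section and a threshold of 200 chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

mask = df.gt(200)

# True counts as 1, so summing gives the number of values above 200 per column.
counts = mask.sum()
print(counts)

# Keep the values that pass the test and blank out the rest with NaN.
masked = df.where(mask)
print(masked)
```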

### Parameters:

- **`other`**: This can be a scalar, sequence, Series, or DataFrame. It is the object to compare with the DataFrame.
- **`axis`**: Determines whether to align the operation by index (`0` or `'index'`) or columns (`1` or `'columns'`). Default is `'columns'`.
- **`level`**: When one of the operands has a MultiIndex, this parameter specifies the level whose labels are matched so that the other operand can be broadcast across it.

### Returns:

- **DataFrame of bool**: The result of the element-wise greater than comparison.

### Examples:

#### 1. Greater Than Comparison with a Scalar

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# Greater than comparison with a scalar
result = df.gt(200)
print(result)
```

Output:

```
    cost  revenue
A   True    False
B  False     True
C  False     True
```

#### 2. Greater Than Comparison with a Series

```python
# Greater than comparison with a Series
series = pd.Series([200, 150], index=['cost', 'revenue'])
result = df.gt(series)
print(result)
```

Output:

```
    cost  revenue
A   True    False
B  False     True
C  False     True
```

#### 3. Greater Than Comparison with Another DataFrame

```python
other = pd.DataFrame({'cost': [200, 100, 150],
                      'revenue': [150, 200, 250]},
                     index=['A', 'B', 'C'])

# Greater than comparison with another DataFrame
result = df.gt(other)
print(result)
```

Output:

```
    cost  revenue
A   True    False
B   True     True
C  False     True
```

#### 4. Greater Than Comparison with `axis` Parameter

```python
# With axis='index' the list must supply one value per row (A, B, C)
result = df.gt([200, 200, 150], axis='index')
print(result)
```

Output:

```
    cost  revenue
A   True    False
B  False     True
C  False     True
```

#### 5. Greater Than Comparison with a MultiIndex DataFrame

```python
df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
                             'revenue': [100, 250, 300, 200, 175, 225]},
                            index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
                                   ['A', 'B', 'C', 'A', 'B', 'C']])

# Greater than comparison by level
result = df.gt(df_multindex, level=1)
print(result)
```

Output:

```
       cost  revenue
Q1 A  False    False
   B  False    False
   C  False    False
Q2 A   True    False
   B  False     True
   C  False     True
```

### Notes:

- **Mismatched Indices**: If the indices of the DataFrame and `other` do not match, they will be unioned together.
- **NaN Values**: NaN values are considered different (i.e., `NaN != NaN`).

### Related Methods:

- **`DataFrame.eq`**: Element-wise equality comparison.
- **`DataFrame.ne`**: Element-wise inequality comparison.
- **`DataFrame.le`**: Element-wise less than or equal to comparison.
- **`DataFrame.lt`**: Element-wise less than comparison.
- **`DataFrame.ge`**: Element-wise greater than or equal to comparison.
- **`DataFrame.gt`**: Element-wise greater than comparison.

### Summary:

- **`DataFrame.gt`** performs element-wise greater than comparison (`dataframe > other`).
- It aligns the operands on index and column labels; positions left without a value after alignment compare as `False`.
- It is part of the flexible comparison operation wrappers in pandas.


In [None]:
""" pandas.DataFrame.le
DataFrame.le(other, axis='columns', level=None)[source]
Get Less than or equal to of dataframe and other, element-wise (binary operator le).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
:
other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns
:
DataFrame of bool
Result of the comparison.

See also

DataFrame.eq
Compare DataFrames for equality elementwise.

DataFrame.ne
Compare DataFrames for inequality elementwise.

DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
df
   cost  revenue
A   250      100
B   150      250
C   100      300
Comparison with a scalar, using either the operator or method:

df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:

df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:

df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.

other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])
other
   revenue
A      300
B      250
C      100
D      150
df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.

df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
                             'revenue': [100, 250, 300, 200, 175, 225]},
                            index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
                                   ['A', 'B', 'C', 'A', 'B', 'C']])
df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False """

The **`pandas.DataFrame.le`** method is used to perform element-wise **"less than or equal to"** comparison between a DataFrame and another object (scalar, sequence, Series, or DataFrame). It is one of the flexible comparison methods provided by Pandas, along with `eq`, `ne`, `lt`, `gt`, and `ge`. Below is a detailed explanation of the **`DataFrame.le`** method, including its syntax, parameters, return value, and examples.

---

### **1. Syntax**

```python
DataFrame.le(other, axis='columns', level=None)
```

---

### **2. Parameters**

| Parameter   | Description                                                                                                                              |
| ----------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| **`other`** | The object to compare with the DataFrame. Can be a scalar, sequence, Series, or DataFrame.                                               |
| **`axis`**  | The axis to align the comparison on. <br> - `0` or `'index'`: Compare by index. <br> - `1` or `'columns'`: Compare by columns (default). |
| **`level`** | If the DataFrame has a MultiIndex, specify the level to broadcast the comparison across.                                                 |

---

### **3. Return Value**

- Returns a **DataFrame of boolean values** (`True` or `False`) indicating the result of the element-wise comparison.

---

### **4. Key Notes**

- **Alignment**: The comparison aligns the indices of the DataFrame and the `other` object.
- **NaN Handling**: NaN values are considered unequal (i.e., `NaN != NaN`).
- **Broadcasting**: If `other` is a scalar, it is compared with every element. If it is a sequence, it is broadcast along the chosen `axis` and its length must match that axis.
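
The NaN note is easy to misread for `le`, since "or equal to" might suggest that NaN compared with itself passes. A minimal sketch, assuming a hypothetical one-column frame `with_gap` that contains a missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing cost.
with_gap = pd.DataFrame({'cost': [250.0, np.nan, 100.0]},
                        index=['A', 'B', 'C'])

against_scalar = with_gap.le(150)     # NaN <= 150 is False
against_self = with_gap.le(with_gap)  # NaN <= NaN is also False

print(against_scalar)
print(against_self)
```

Row `B` is `False` in both results, so a `False` from `le` does not by itself mean the value is larger; `isna()` is the way to tell a missing value apart.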

---

### **5. Examples**

#### **5.1 Comparison with a Scalar**

Compare each element of the DataFrame with a scalar value.

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'cost': [250, 150, 100], 'revenue': [100, 250, 300]}, index=['A', 'B', 'C'])
print(df)
```

**Output:**

```
   cost  revenue
A   250      100
B   150      250
C   100      300
```

```python
# Compare with a scalar
print(df.le(150))
```

**Output:**

```
    cost  revenue
A  False     True
B   True    False
C   True    False
```

---

#### **5.2 Comparison with a Series**

Compare each column of the DataFrame with a Series. The Series index aligns with the DataFrame columns.

```python
# Create a Series
s = pd.Series([100, 250], index=['cost', 'revenue'])

# Compare with the Series
print(df.le(s))
```

**Output:**

```
    cost  revenue
A  False     True
B  False     True
C   True    False
```

---

#### **5.3 Comparison with Another DataFrame**

Compare two DataFrames element-wise. Rows and columns are matched by label; labels present in only one of the two frames still appear in the result and compare as `False`.

```python
# Create another DataFrame
other_df = pd.DataFrame({'cost': [200, 150, 50], 'revenue': [100, 300, 250]}, index=['A', 'B', 'C'])

# Compare with the other DataFrame
print(df.le(other_df))
```

**Output:**

```
    cost  revenue
A  False     True
B   True     True
C  False    False
```

---

#### **5.4 Comparison Along Rows (`axis='index'`)**

Compare each row of the DataFrame with a sequence or Series along the row axis.

```python
# Compare with a list along rows
print(df.le([250, 250, 100], axis='index'))
```

**Output:**

```
   cost  revenue
A  True     True
B  True     True
C  True    False
```

---

#### **5.5 Comparison with MultiIndex**

Compare a DataFrame with a MultiIndex DataFrame using the `level` parameter.

```python
# Create a MultiIndex DataFrame
df_multiindex = pd.DataFrame(
    {'cost': [250, 150, 100, 150, 300, 220], 'revenue': [100, 250, 300, 200, 175, 225]},
    index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'], ['A', 'B', 'C', 'A', 'B', 'C']]
)
print(df_multiindex)
```

**Output:**

```
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
```

```python
# Compare with the original DataFrame at level 1
print(df.le(df_multiindex, level=1))
```

**Output:**

```
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
```

---

### **6. Comparison with Other Operators**

Pandas provides similar methods for other comparison operators:

| Method   | Operator | Description              |
| -------- | -------- | ------------------------ |
| **`eq`** | `==`     | Equal to                 |
| **`ne`** | `!=`     | Not equal to             |
| **`lt`** | `<`      | Less than                |
| **`le`** | `<=`     | Less than or equal to    |
| **`gt`** | `>`      | Greater than             |
| **`ge`** | `>=`     | Greater than or equal to |
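
The pairing in the table can be checked mechanically. This is a small sketch that uses the standard-library `operator` module to stand in for each symbol; the scalar `150` is an arbitrary choice for illustration:

```python
import operator
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# Method name -> function form of the matching operator.
pairs = {'eq': operator.eq, 'ne': operator.ne,
         'lt': operator.lt, 'le': operator.le,
         'gt': operator.gt, 'ge': operator.ge}

# For each pair, compare the method result with the operator result.
matches = {name: getattr(df, name)(150).equals(op(df, 150))
           for name, op in pairs.items()}
print(matches)
```

Every entry in `matches` should be `True` for a scalar operand. The method forms are still worth having because they accept `axis` and `level`, which the operators do not.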

---

### **7. Practical Use Cases**

- **Filtering Data**: Use comparison results to filter rows or columns.
- **Conditional Operations**: Perform operations based on comparison results.
- **Data Validation**: Check if values in a DataFrame meet certain conditions.
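
The three use cases above might look like the following. This is only a sketch: the thresholds (150, 250, 300) and the variable names are assumptions made for illustration, not values from the original text.

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# Filtering: keep the rows whose cost is at most 150.
cheap = df[df['cost'].le(150)]

# Conditional operation: keep revenue where it is at most 250, else use 250.
capped = df['revenue'].where(df['revenue'].le(250), 250)

# Validation: confirm that no value in the frame exceeds 300.
within_limit = bool(df.le(300).all().all())

print(cheap)
print(capped)
print(within_limit)
```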

---

### **8. Summary**

- The **`DataFrame.le`** method performs element-wise "less than or equal to" comparisons.
- It supports comparison with scalars, sequences, Series, and DataFrames.
- Use the `axis` parameter to control the alignment of the comparison.
- Use the `level` parameter for MultiIndex DataFrames.

By mastering this method, you can efficiently compare and analyze data in Pandas DataFrames.


In [None]:
""" pandas.DataFrame.ge
DataFrame.ge(other, axis='columns', level=None)[source]
Get Greater than or equal to of dataframe and other, element-wise (binary operator ge).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
:
other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns
:
DataFrame of bool
Result of the comparison.

See also

DataFrame.eq
Compare DataFrames for equality elementwise.

DataFrame.ne
Compare DataFrames for inequality elementwise.

DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
df
   cost  revenue
A   250      100
B   150      250
C   100      300
Comparison with a scalar, using either the operator or method:

df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:

df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number elements in other:

df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:

df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.

other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])
other
   revenue
A      300
B      250
C      100
D      150
df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.

df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
                             'revenue': [100, 250, 300, 200, 175, 225]},
                            index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
                                   ['A', 'B', 'C', 'A', 'B', 'C']])
df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False """

The `pandas.DataFrame.ge` method is used to perform element-wise comparison of a DataFrame with another object (scalar, sequence, Series, or DataFrame) to check if the elements in the DataFrame are **greater than or equal to** the corresponding elements in the other object. The method returns a DataFrame of boolean values indicating the result of the comparison.

Here’s a detailed explanation of the syntax, parameters, and usage:

---

### **Syntax**

```python
DataFrame.ge(other, axis='columns', level=None)
```

---

### **Parameters**

1. **`other`**:

   - The object to compare with the DataFrame.
   - Can be a scalar, sequence, Series, or DataFrame.
   - If `other` is a DataFrame, it does not need the same shape: both frames are aligned on index and column labels, and labels present in only one of them still appear in the result (comparing as `False`).

2. **`axis`**:

   - Determines the axis along which the comparison is performed.
   - Options:
     - `0` or `'index'`: Compare along the index (rows).
     - `1` or `'columns'`: Compare along the columns (default).
   - If `other` is a Series, the alignment is performed based on the `axis`.

3. **`level`**:
   - Used when comparing with a MultiIndex DataFrame.
   - Specifies the level of the MultiIndex to align the comparison.
   - If `None` (the default), no level-based broadcasting is applied and the operands are aligned on their full labels.

---

### **Returns**

- A DataFrame of boolean values (`True` or `False`) indicating whether each element in the original DataFrame is **greater than or equal to** the corresponding element in `other`.

---

### **Notes**

- Mismatched indices between the DataFrame and `other` will be unioned. Positions that exist in only one operand are treated as `NaN` during the comparison, so they come out as `False` in the result.
- `NaN` values are considered unequal (i.e., `NaN != NaN`).

---

### **Examples**

#### 1. **Comparison with a Scalar**

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100], 'revenue': [100, 250, 300]}, index=['A', 'B', 'C'])

# Compare if elements are greater than or equal to 150
result = df.ge(150)
print(result)
```

**Output:**

```
    cost  revenue
A   True    False
B   True     True
C  False     True
```

---

#### 2. **Comparison with a Series**

```python
# Compare with a Series (its index labels are matched to the DataFrame's columns)
other = pd.Series([100, 250], index=['cost', 'revenue'])
result = df.ge(other, axis='columns')
print(result)
```

**Output:**

```
    cost  revenue
A   True    False
B   True     True
C   True     True
```

---

#### 3. **Comparison with a DataFrame**

```python
# Compare with another DataFrame
other_df = pd.DataFrame({'cost': [200, 150, 50], 'revenue': [50, 300, 350]}, index=['A', 'B', 'C'])
result = df.ge(other_df)
print(result)
```

**Output:**

```
    cost  revenue
A   True     True
B   True    False
C   True    False
```

---

#### 4. **Comparison with a MultiIndex DataFrame**

```python
# Create a MultiIndex DataFrame
df_multiindex = pd.DataFrame(
    {'cost': [250, 150, 100, 150, 300, 220], 'revenue': [100, 250, 300, 200, 175, 225]},
    index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'], ['A', 'B', 'C', 'A', 'B', 'C']]
)

# Compare with another DataFrame at a specific MultiIndex level
result = df.ge(df_multiindex, level=1)
print(result)
```

**Output:**

```
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A   True    False
   B  False     True
   C  False     True
```

---

### **Related Methods**

- `DataFrame.eq`: Equality comparison.
- `DataFrame.ne`: Inequality comparison.
- `DataFrame.le`: Less than or equal to comparison.
- `DataFrame.lt`: Less than comparison.
- `DataFrame.gt`: Greater than comparison.

---

### **Key Points**

- Use `axis` to control the alignment of the comparison (rows or columns).
- Use `level` for MultiIndex DataFrames to specify the level of alignment.
- The result is a boolean DataFrame indicating the comparison result.

This method is particularly useful for filtering or conditional operations in data analysis.
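
The examples above do not show `axis='index'`, so here is a minimal sketch of it. The Series `minimums` is a hypothetical set of per-row thresholds introduced only for illustration; its index labels are matched to the DataFrame's row labels.

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

# Hypothetical per-row thresholds, one per row label.
minimums = pd.Series([200, 200, 150], index=['A', 'B', 'C'])

# Each row of df is compared with the threshold that shares its label.
result = df.ge(minimums, axis='index')
print(result)
```

Row `A` is tested against 200, so `cost` (250) passes and `revenue` (100) does not; rows `B` and `C` pass only on `revenue`.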


In [None]:
""" pandas.DataFrame.ne
DataFrame.ne(other, axis='columns', level=None)[source]
Get Not equal to of dataframe and other, element-wise (binary operator ne).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters
:
other
scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.

axis
{0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

level
int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns
:
DataFrame of bool
Result of the comparison.

See also

DataFrame.eq
Compare DataFrames for equality elementwise.

DataFrame.ne
Compare DataFrames for inequality elementwise.

DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
df
   cost  revenue
A   250      100
B   150      250
C   100      300
Comparison with a scalar, using either the operator or method:

df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:

df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:

df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.

other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])
other
   revenue
A      300
B      250
C      100
D      150
df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.

df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
                             'revenue': [100, 250, 300, 200, 175, 225]},
                            index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
                                   ['A', 'B', 'C', 'A', 'B', 'C']])
df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False """

The `pandas.DataFrame.ne` method is used to perform element-wise **inequality comparison** between a DataFrame and another object (scalar, sequence, Series, or DataFrame). It checks if the elements in the DataFrame are **not equal** to the corresponding elements in the other object. The method returns a DataFrame of boolean values (`True` or `False`) indicating the result of the comparison.

---

### **Syntax**

```python
DataFrame.ne(other, axis='columns', level=None)
```

---

### **Parameters**

1. **`other`**:

   - The object to compare with the DataFrame.
   - Can be a scalar, sequence, Series, or DataFrame.
   - If `other` is a DataFrame, it does not need to have the same shape: rows and columns are aligned by label, and labels found in only one of the two are unioned into the result.

2. **`axis`**:

   - Determines the axis along which the comparison is performed.
   - Options:
     - `0` or `'index'`: Compare along the index (rows).
     - `1` or `'columns'`: Compare along the columns (default).
   - If `other` is a Series, the alignment is performed based on the `axis`.

3. **`level`**:
   - Used when comparing with a MultiIndex DataFrame.
   - Specifies the level of the MultiIndex to align the comparison.
   - If `None`, the comparison is performed element-wise without considering the MultiIndex levels.

---

### **Returns**

- A DataFrame of boolean values (`True` or `False`) indicating whether each element in the original DataFrame is **not equal** to the corresponding element in `other`.

---

### **Notes**

- Mismatched indices between the DataFrame and `other` are unioned. Positions that exist in only one of the two are treated as `NaN`, so `ne` returns `True` there.
- `NaN` values are considered unequal (i.e., `NaN != NaN`).
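The two notes above can be checked with a small sketch. The frames `left` and `right` are hypothetical; `right` carries an extra row `'c'` so that the index union is visible in the result.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'x': [1.0, np.nan]}, index=['a', 'b'])
right = pd.DataFrame({'x': [1.0, np.nan, 3.0]}, index=['a', 'b', 'c'])

# Row 'c' exists only in `right`, so the result index is the union a, b, c
diff = left.ne(right)
print(diff)
```

Row `'b'` compares `NaN` with `NaN` and row `'c'` compares a missing value with `3.0`; both come out as `True` under `ne`.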

---

### **Examples**

#### 1. **Comparison with a Scalar**

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100], 'revenue': [100, 250, 300]}, index=['A', 'B', 'C'])

# Compare if elements are not equal to 150
result = df.ne(150)
print(result)
```

**Output:**

```
    cost  revenue
A   True     True
B  False     True
C   True     True
```

---

#### 2. **Comparison with a Series**

```python
# Compare with a Series (its index is aligned with the DataFrame's columns)
other = pd.Series([100, 250], index=['cost', 'revenue'])
result = df.ne(other, axis='columns')
print(result)
```

**Output:**

```
    cost  revenue
A   True     True
B   True    False
C  False     True
```

---

#### 3. **Comparison with a DataFrame**

```python
# Compare with another DataFrame
other_df = pd.DataFrame({'cost': [200, 150, 50], 'revenue': [50, 300, 350]}, index=['A', 'B', 'C'])
result = df.ne(other_df)
print(result)
```

**Output:**

```
    cost  revenue
A   True     True
B  False     True
C   True     True
```

---

#### 4. **Comparison with a MultiIndex DataFrame**

```python
# Create a MultiIndex DataFrame
df_multiindex = pd.DataFrame(
    {'cost': [250, 150, 100, 150, 300, 220], 'revenue': [100, 250, 300, 200, 175, 225]},
    index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'], ['A', 'B', 'C', 'A', 'B', 'C']]
)

# Compare with another DataFrame at a specific MultiIndex level
result = df.ne(df_multiindex, level=1)
print(result)
```

**Output:**

```
        cost  revenue
Q1 A  False    False
   B  False    False
   C  False    False
Q2 A   True     True
   B   True     True
   C   True     True
```

---

### **Related Methods**

- `DataFrame.eq`: Equality comparison.
- `DataFrame.le`: Less than or equal to comparison.
- `DataFrame.lt`: Less than comparison.
- `DataFrame.ge`: Greater than or equal to comparison.
- `DataFrame.gt`: Greater than comparison.

---

### **Key Points**

- Use `axis` to control the alignment of the comparison (rows or columns).
- Use `level` for MultiIndex DataFrames to specify the level of alignment.
- The result is a boolean DataFrame indicating the comparison result.

This method is particularly useful for filtering or conditional operations in data analysis. For example, you can use it to identify mismatched values between two datasets or filter out rows/columns that do not meet certain conditions.
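One way to find mismatched values between two datasets, sketched under the assumption that both frames share the same labels. The `old` and `new` frames and their product data are invented for illustration.

```python
import pandas as pd

old = pd.DataFrame({'price': [10, 20, 30], 'stock': [5, 0, 8]},
                   index=['p1', 'p2', 'p3'])
new = pd.DataFrame({'price': [10, 25, 30], 'stock': [5, 0, 9]},
                   index=['p1', 'p2', 'p3'])

# True wherever a value differs between the two snapshots
changed = old.ne(new)

# Rows of `new` in which at least one column changed
changed_rows = new[changed.any(axis=1)]
print(changed_rows)
```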


In [None]:
""" pandas.DataFrame.eq
DataFrame.eq(other, axis='columns', level=None)
Get Equal to of dataframe and other, element-wise (binary operator eq).

Among flexible wrappers (eq, ne, le, lt, ge, gt) to comparison operators.

Equivalent to ==, !=, <=, <, >=, > with support to choose axis (rows or columns) and level for comparison.

Parameters:

other : scalar, sequence, Series, or DataFrame
Any single or multiple element data structure, or list-like object.

axis : {0 or ‘index’, 1 or ‘columns’}, default ‘columns’
Whether to compare by the index (0 or ‘index’) or columns (1 or ‘columns’).

level : int or label
Broadcast across a level, matching Index values on the passed MultiIndex level.

Returns:

DataFrame of bool
Result of the comparison.

See also

DataFrame.eq
Compare DataFrames for equality elementwise.

DataFrame.ne
Compare DataFrames for inequality elementwise.

DataFrame.le
Compare DataFrames for less than inequality or equality elementwise.

DataFrame.lt
Compare DataFrames for strictly less than inequality elementwise.

DataFrame.ge
Compare DataFrames for greater than inequality or equality elementwise.

DataFrame.gt
Compare DataFrames for strictly greater than inequality elementwise.

Notes

Mismatched indices will be unioned together. NaN values are considered different (i.e. NaN != NaN).

Examples

df = pd.DataFrame({'cost': [250, 150, 100],
                   'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])
df
   cost  revenue
A   250      100
B   150      250
C   100      300
Comparison with a scalar, using either the operator or method:

df == 100
    cost  revenue
A  False     True
B  False    False
C   True    False
df.eq(100)
    cost  revenue
A  False     True
B  False    False
C   True    False
When other is a Series, the columns of a DataFrame are aligned with the index of other and broadcast:

df != pd.Series([100, 250], index=["cost", "revenue"])
    cost  revenue
A   True     True
B   True    False
C  False     True
Use the method to control the broadcast axis:

df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
When comparing to an arbitrary sequence, the number of columns must match the number of elements in other:

df == [250, 100]
    cost  revenue
A   True     True
B  False    False
C  False    False
Use the method to control the axis:

df.eq([250, 250, 100], axis='index')
    cost  revenue
A   True    False
B  False     True
C   True    False
Compare to a DataFrame of different shape.

other = pd.DataFrame({'revenue': [300, 250, 100, 150]},
                     index=['A', 'B', 'C', 'D'])
other
   revenue
A      300
B      250
C      100
D      150
df.gt(other)
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
Compare to a MultiIndex by level.

df_multindex = pd.DataFrame({'cost': [250, 150, 100, 150, 300, 220],
                             'revenue': [100, 250, 300, 200, 175, 225]},
                            index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'],
                                   ['A', 'B', 'C', 'A', 'B', 'C']])
df_multindex
      cost  revenue
Q1 A   250      100
   B   150      250
   C   100      300
Q2 A   150      200
   B   300      175
   C   220      225
df.le(df_multindex, level=1)
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False """

The `pandas.DataFrame.eq` method is used to perform element-wise **equality comparison** between a DataFrame and another object (scalar, sequence, Series, or DataFrame). It checks if the elements in the DataFrame are **equal** to the corresponding elements in the other object. The method returns a DataFrame of boolean values (`True` or `False`) indicating the result of the comparison.

---

### **Syntax**

```python
DataFrame.eq(other, axis='columns', level=None)
```

---

### **Parameters**

1. **`other`**:

   - The object to compare with the DataFrame.
   - Can be a scalar, sequence, Series, or DataFrame.
   - If `other` is a DataFrame, it does not need to have the same shape: rows and columns are aligned by label, and labels found in only one of the two are unioned into the result.

2. **`axis`**:

   - Determines the axis along which the comparison is performed.
   - Options:
     - `0` or `'index'`: Compare along the index (rows).
     - `1` or `'columns'`: Compare along the columns (default).
   - If `other` is a Series, the alignment is performed based on the `axis`.

3. **`level`**:
   - Used when comparing with a MultiIndex DataFrame.
   - Specifies the level of the MultiIndex to align the comparison.
   - If `None`, the comparison is performed element-wise without considering the MultiIndex levels.

---

### **Returns**

- A DataFrame of boolean values (`True` or `False`) indicating whether each element in the original DataFrame is **equal** to the corresponding element in `other`.

---

### **Notes**

- Mismatched indices between the DataFrame and `other` are unioned. Positions that exist in only one of the two are treated as `NaN`, so `eq` returns `False` there.
- `NaN` values are considered unequal (i.e., `NaN != NaN`).
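Because `NaN` never equals `NaN` element-wise, an equality check that should treat two missing values as a match needs an extra step. The sketch below shows one possible approach; the frames `a` and `b` and the name `same_or_both_nan` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'v': [1.0, np.nan]})
b = pd.DataFrame({'v': [1.0, np.nan]})

# Element-wise equality: the NaN position compares as False
elementwise = a.eq(b)

# Treat a position as matching when values are equal OR both are missing
same_or_both_nan = a.eq(b) | (a.isna() & b.isna())

# DataFrame.equals gives a single bool and counts NaNs in the same spot as equal
print(elementwise, same_or_both_nan, a.equals(b), sep='\n')
```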

---

### **Examples**

#### 1. **Comparison with a Scalar**

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100], 'revenue': [100, 250, 300]}, index=['A', 'B', 'C'])

# Compare if elements are equal to 100
result = df.eq(100)
print(result)
```

**Output:**

```
    cost  revenue
A  False     True
B  False    False
C   True    False
```

---

#### 2. **Comparison with a Series**

```python
# Compare with a Series (its index is aligned with the DataFrame's columns)
other = pd.Series([100, 250], index=['cost', 'revenue'])
result = df.eq(other, axis='columns')
print(result)
```

**Output:**

```
    cost  revenue
A  False    False
B  False     True
C   True    False
```

---

#### 3. **Comparison with a DataFrame**

```python
# Compare with another DataFrame
other_df = pd.DataFrame({'cost': [250, 150, 50], 'revenue': [100, 250, 350]}, index=['A', 'B', 'C'])
result = df.eq(other_df)
print(result)
```

**Output:**

```
    cost  revenue
A   True     True
B   True     True
C  False    False
```

---

#### 4. **Comparison with a MultiIndex DataFrame**

```python
# Create a MultiIndex DataFrame
df_multiindex = pd.DataFrame(
    {'cost': [250, 150, 100, 150, 300, 220], 'revenue': [100, 250, 300, 200, 175, 225]},
    index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'], ['A', 'B', 'C', 'A', 'B', 'C']]
)

# Compare with another DataFrame at a specific MultiIndex level
result = df.eq(df_multiindex, level=1)
print(result)
```

**Output:**

```
        cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False    False
   B  False    False
   C  False    False
```

---

### **Related Methods**

- `DataFrame.ne`: Inequality comparison.
- `DataFrame.le`: Less than or equal to comparison.
- `DataFrame.lt`: Less than comparison.
- `DataFrame.ge`: Greater than or equal to comparison.
- `DataFrame.gt`: Greater than comparison.

---

### **Key Points**

- Use `axis` to control the alignment of the comparison (rows or columns).
- Use `level` for MultiIndex DataFrames to specify the level of alignment.
- The result is a boolean DataFrame indicating the comparison result.

This method is particularly useful for filtering or conditional operations in data analysis. For example, you can use it to identify matching values between two datasets or filter rows/columns that meet specific conditions.
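A short sketch of the matching-values use case: count matches per column and keep the rows that agree in every column. The `expected` and `actual` frames are hypothetical and reuse the values from Example 3.

```python
import pandas as pd

expected = pd.DataFrame({'cost': [250, 150, 100], 'revenue': [100, 250, 300]},
                        index=['A', 'B', 'C'])
actual = pd.DataFrame({'cost': [250, 150, 50], 'revenue': [100, 250, 350]},
                      index=['A', 'B', 'C'])

matches = actual.eq(expected)

# True counts as 1, so summing gives the number of matches per column
matches_per_column = matches.sum()

# Rows that match in every column
fully_matching = actual[matches.all(axis=1)]
print(matches_per_column)
print(fully_matching)
```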


In [None]:
""" pandas.DataFrame.combine
DataFrame.combine(other, func, fill_value=None, overwrite=True)
Perform column-wise combine with another DataFrame.

Combines a DataFrame with other DataFrame using func to element-wise combine columns. The row and column indexes of the resulting DataFrame will be the union of the two.

Parameters:

other : DataFrame
The DataFrame to merge column-wise.

func : function
Function that takes two series as inputs and return a Series or a scalar. Used to merge the two dataframes column by columns.

fill_value : scalar value, default None
The value to fill NaNs with prior to passing any column to the merge func.

overwrite : bool, default True
If True, columns in self that do not exist in other will be overwritten with NaNs.

Returns:

DataFrame
Combination of the provided DataFrames.

See also

DataFrame.combine_first
Combine two DataFrame objects and default to non-null values in frame calling the method.

Examples

Combine using a simple function that chooses the smaller column.

df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
take_smaller = lambda s1, s2: s1 if s1.sum() < s2.sum() else s2
df1.combine(df2, take_smaller)
   A  B
0  0  3
1  0  3
Example using a true element-wise combine function.

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
df1.combine(df2, np.minimum)
   A  B
0  1  2
1  0  3
Using fill_value fills Nones prior to passing the column to the merge function.

df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
df1.combine(df2, take_smaller, fill_value=-5)
   A    B
0  0 -5.0
1  0  4.0
However, if the same element in both dataframes is None, that None is preserved

df1 = pd.DataFrame({'A': [0, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [None, 3]})
df1.combine(df2, take_smaller, fill_value=-5)
    A    B
0  0 -5.0
1  0  3.0
Example that demonstrates the use of overwrite and behavior when the axis differ between the dataframes.

df1 = pd.DataFrame({'A': [0, 0], 'B': [4, 4]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [-10, 1], }, index=[1, 2])
df1.combine(df2, take_smaller)
     A    B     C
0  NaN  NaN   NaN
1  NaN  3.0 -10.0
2  NaN  3.0   1.0
df1.combine(df2, take_smaller, overwrite=False)
     A    B     C
0  0.0  NaN   NaN
1  0.0  3.0 -10.0
2  NaN  3.0   1.0
Demonstrating the preference of the passed in dataframe.

df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1], }, index=[1, 2])
df2.combine(df1, take_smaller)
   A    B   C
0  0.0  NaN NaN
1  0.0  3.0 NaN
2  NaN  3.0 NaN
df2.combine(df1, take_smaller, overwrite=False)
     A    B   C
0  0.0  NaN NaN
1  0.0  3.0 1.0
2  NaN  3.0 1.0 """

**pandas.DataFrame.eq()**  
The `eq()` method performs **element-wise equality comparison** between a DataFrame and another object (scalar, sequence, Series, or DataFrame). It is equivalent to the `==` operator but provides additional flexibility, such as specifying the axis or level for comparison.

---

### **Syntax**

`DataFrame.eq(other, axis='columns', level=None)`

**Parameters**:

- `other`: Scalar, sequence, Series, or DataFrame to compare with.
- `axis`: `{0 or 'index', 1 or 'columns'}`, default `'columns'`.
  - `0` or `'index'`: Compare rows.
  - `1` or `'columns'`: Compare columns.
- `level`: `int` or `label`.
  - Used with MultiIndex to specify the level for comparison.

**Returns**:

- A DataFrame of **boolean values** (`True`/`False`) indicating the result of the comparison.

---

### **Key Behaviors**

1. **Element-wise Comparison**:

   - Compares each element of the DataFrame with the corresponding element in `other`.
   - If `other` is a scalar, it compares all elements in the DataFrame to that scalar.

2. **Alignment**:

   - If `other` is a Series or DataFrame, the indices/columns are aligned before comparison.
   - Mismatched indices/columns are **unioned**, and missing values result in `False`.

3. **NaN Handling**:

   - `NaN` values are considered **not equal** to any value, including other `NaN` values.

4. **Flexibility**:
   - Can compare along rows (`axis=0`) or columns (`axis=1`).
   - Supports MultiIndex comparisons using the `level` parameter.
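The effect of `axis` when `other` is a Series can be sketched as follows. The Series `per_column` and `per_row` are hypothetical names; their values are chosen so that the second result matches the `axis='index'` example from the official documentation.

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100], 'revenue': [100, 250, 300]},
                  index=['A', 'B', 'C'])

per_column = pd.Series({'cost': 250, 'revenue': 250})   # one value per column
per_row = pd.Series({'A': 250, 'B': 250, 'C': 100})     # one value per row

# Default axis='columns': the Series index is matched against df.columns
by_columns = df.eq(per_column)

# axis='index': the Series index is matched against df.index
by_rows = df.eq(per_row, axis='index')

print(by_columns)
print(by_rows)
```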

---

### **Examples**

#### **Example 1**: Comparison with a Scalar

```python
df = pd.DataFrame({'cost': [250, 150, 100], 'revenue': [100, 250, 300]}, index=['A', 'B', 'C'])
df.eq(100)
```

**Output**:

```
    cost  revenue
A  False     True
B  False    False
C   True    False
```

- Compares each element in `df` to the scalar `100`.

---

#### **Example 2**: Comparison with a Series

```python
df != pd.Series([100, 250], index=["cost", "revenue"])
```

**Output**:

```
    cost  revenue
A   True     True
B   True    False
C  False     True
```

- The Series is aligned with the columns of `df`.
- Each column is compared to the corresponding value in the Series.

---

#### **Example 3**: Comparison with a Sequence

```python
df.eq([250, 250, 100], axis='index')
```

**Output**:

```
    cost  revenue
A   True    False
B  False     True
C   True    False
```

- The sequence `[250, 250, 100]` is compared row-wise (`axis='index'`).
- Each row is compared to the corresponding value in the sequence.

---

#### **Example 4**: Comparison with a DataFrame

```python
other = pd.DataFrame({'revenue': [300, 250, 100, 150]}, index=['A', 'B', 'C', 'D'])
df.gt(other)
```

**Output**:

```
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
```

- Compares `df` with `other` element-wise.
- Missing indices/columns result in `False`.

---

#### **Example 5**: MultiIndex Comparison

```python
df_multindex = pd.DataFrame(
    {'cost': [250, 150, 100, 150, 300, 220], 'revenue': [100, 250, 300, 200, 175, 225]},
    index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'], ['A', 'B', 'C', 'A', 'B', 'C']]
)
df.le(df_multindex, level=1)
```

**Output**:

```
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
```

- Compares `df` with `df_multindex` at `level=1` of the MultiIndex.

---

### **Comparison with Other Methods**

| Method | Description                      |
| ------ | -------------------------------- |
| `eq()` | Equal to (`==`).                 |
| `ne()` | Not equal to (`!=`).             |
| `le()` | Less than or equal to (`<=`).    |
| `lt()` | Less than (`<`).                 |
| `ge()` | Greater than or equal to (`>=`). |
| `gt()` | Greater than (`>`).              |

---

### **When to Use**

- Perform **element-wise comparisons** with flexibility in axis/level.
- Compare DataFrames with mismatched indices/columns.
- Handle MultiIndex comparisons.
- Replace the `==` operator when additional control is needed.
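A minimal sketch of the last point, assuming two frames whose row labels differ. With differently labeled DataFrames the `==` operator raises a `ValueError`, while `eq` aligns the labels first. The frames and the flag `operator_raised` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'cost': [250, 150, 100]}, index=['A', 'B', 'C'])
other = pd.DataFrame({'cost': [250, 150, 100, 50]}, index=['A', 'B', 'C', 'D'])

# The operator requires identically labeled DataFrames
try:
    df == other
    operator_raised = False
except ValueError:
    operator_raised = True

# The method unions the labels; row 'D' has no partner in df, so it is False
aligned = df.eq(other)
print(operator_raised)
print(aligned)
```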


In [None]:
""" pandas.DataFrame.combine_first
DataFrame.combine_first(other)
Update null elements with value in the same location in other.

Combine two DataFrame objects by filling null values in one DataFrame with non-null values from other DataFrame. The row and column indexes of the resulting DataFrame will be the union of the two. The resulting dataframe contains the ‘first’ dataframe values and overrides the second one values where both first.loc[index, col] and second.loc[index, col] are not missing values, upon calling first.combine_first(second).

Parameters:

other : DataFrame
Provided DataFrame to use to fill null values.

Returns:

DataFrame
The result of combining the provided DataFrame with the other object.

See also

DataFrame.combine
Perform series-wise operation on two DataFrames using a given function.

Examples

df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
df1.combine_first(df2)
     A    B
0  1.0  3.0
1  0.0  4.0
Null values still persist if the location of that null value does not exist in other

df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
df1.combine_first(df2)
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0 """

The `combine_first()` method in `pandas` fills null values in a DataFrame with the corresponding values from another DataFrame. The sections below cover its syntax, parameters, return value, examples, and some typical use cases.

### Syntax

```python
DataFrame.combine_first(other)
```

### Parameters

- **other**: `DataFrame`
  - This is the provided DataFrame used to fill null values in the calling DataFrame. The operation compares the indices and columns of both DataFrames to combine them.

### Returns

- **DataFrame**
  - The result is a new DataFrame containing values from the calling DataFrame (`self`) and filling in its null values with corresponding non-null values from the `other` DataFrame.

### Working Principle

- The method aligns the two DataFrames by their indices and columns. Wherever a null value exists in the calling DataFrame, it will be filled with the value from the same location in `other` if it’s non-null.
- If both DataFrames have a non-null value at the same index and column, the value from the calling DataFrame takes precedence.
- The resulting DataFrame will have a union of all indices and columns from both DataFrames.
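The three steps above can be written out by hand as a rough sketch. This is not how pandas implements `combine_first` internally; it only mirrors the described behavior using `reindex` and `where`, and the names `first`, `second`, and `manual` are assumptions for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])

# Step 1: union of row and column labels
rows = df1.index.union(df2.index)
cols = df1.columns.union(df2.columns)

# Step 2: put both frames on the same grid (missing cells become NaN)
first = df1.reindex(index=rows, columns=cols)
second = df2.reindex(index=rows, columns=cols)

# Step 3: keep the caller's value where present, otherwise take the other's
manual = first.where(first.notna(), second)

builtin = df1.combine_first(df2)
print(manual)
print(builtin)
```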

### Examples

#### Example 1: Basic Usage

```python
import pandas as pd

df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})

result = df1.combine_first(df2)
print(result)
```

**Output:**

```
     A    B
0  1.0  3.0
1  0.0  4.0
```

#### Example 2: Handling Non-Overlapping Indices

```python
df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])

result = df1.combine_first(df2)
print(result)
```

**Output:**

```
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
```

### Key Observations

- **Nulls Persist**: In the second example, `A` in row 0 of `df1` stays null because `df2` has neither a row 0 nor a column `A` to supply a value.
- **Union of Indexes/Columns**: The output DataFrame includes rows and columns from both `df1` and `df2`.

### Use Cases

1. **Data Cleaning**: Quickly fill missing values in a DataFrame when you have another DataFrame with potentially relevant data.

2. **Merging Datasets**: Combine datasets from different sources where one might hold partial information, allowing you to consolidate data effectively.

3. **Time Series Alignment**: When working with time series data, you can fill missing timestamps in one DataFrame with values from another aligned DataFrame.
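A small sketch of the time series use case, assuming two readings that share a daily `DatetimeIndex`. The dates, temperatures, and the names `primary` and `backup` are invented for illustration.

```python
import pandas as pd

idx = pd.date_range('2024-01-01', periods=4, freq='D')

# Hypothetical primary sensor with two missing readings
primary = pd.DataFrame({'temp': [20.5, None, None, 22.0]}, index=idx)
# Hypothetical backup sensor with a complete series
backup = pd.DataFrame({'temp': [20.0, 21.0, 21.5, 23.0]}, index=idx)

# Primary values win; gaps are filled from the backup
filled = primary.combine_first(backup)
print(filled)
```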

### Related Functions

- **DataFrame.combine()**: Combines two DataFrames column by column using a function you supply, which receives one Series from each DataFrame. It allows custom merge logic rather than simple filling of missing values.
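For contrast with `combine_first()`, here is the element-wise minimum example from the `combine()` documentation, reproduced as a runnable snippet. The variable name `smallest` is the only addition.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})

# np.minimum receives one column from each frame and returns the smaller values
smallest = df1.combine(df2, np.minimum)
print(smallest)
```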

### Conclusion

The `DataFrame.combine_first()` method is a convenient way to manage missing data by leveraging another DataFrame to fill gaps. Its ability to handle different shapes and indices makes it versatile for various data manipulation tasks in `pandas`.


The `pandas.DataFrame.eq()` method compares each value in a DataFrame for equality against another object (a scalar, a Series, another DataFrame, or a list-like object) and returns a DataFrame of boolean values.

The sections below break down how `DataFrame.eq()` works:

### Syntax

```python
DataFrame.eq(other, axis='columns', level=None)
```

### Parameters

- **other**: `scalar`, `sequence`, `Series`, or `DataFrame`

  - The element(s) to compare against the DataFrame. This can be any value or structure such as a scalar, Series, or another DataFrame.

- **axis**: `{0 or 'index', 1 or 'columns'}`, default is `'columns'`

  - Determines the axis to compare along. Use:
    - `0` or `'index'` to compare by rows.
    - `1` or `'columns'` to compare by columns.

- **level**: `int` or `label`, optional
  - For DataFrames with a MultiIndex, this indicates which level of the index to use for the comparison.

### Returns

- **DataFrame of bool**
  - The result is a new DataFrame containing boolean values. Each value is `True` if the corresponding element in the original DataFrame is equal to the value in `other`, and `False` otherwise.

### Notes

- When comparing different DataFrames, if their indices or columns do not match, they are unioned together.
- NaN values are treated as different; thus, comparisons with NaN will yield `False`.

### Related Functions

- **DataFrame.ne()**: Checks for inequality (element-wise `!=`).
- **DataFrame.le()**: Checks for less than or equal to (element-wise `<=`).
- **DataFrame.lt()**: Checks for strictly less than (element-wise `<`).
- **DataFrame.ge()**: Checks for greater than or equal to (element-wise `>=`).
- **DataFrame.gt()**: Checks for strictly greater than (element-wise `>`).

### Examples

#### Example 1: Basic Scalar Comparison

```python
import pandas as pd

df = pd.DataFrame({
    'cost': [250, 150, 100],
    'revenue': [100, 250, 300]
}, index=['A', 'B', 'C'])

# Using eq method
result = df.eq(100)
print(result)
```

**Output:**

```
    cost  revenue
A  False     True
B  False    False
C   True    False
```

#### Example 2: Comparison with a Series

```python
result = df.ne(pd.Series([100, 250], index=["cost", "revenue"]))
print(result)
```

**Output:**

```
    cost  revenue
A   True     True
B   True    False
C  False     True
```

#### Example 3: Specifying the Axis

```python
result = df.ne(pd.Series([100, 300], index=["A", "D"]), axis='index')
print(result)
```

**Output:**

```
   cost  revenue
A  True    False
B  True     True
C  True     True
D  True     True
```

#### Example 4: Comparison with a List

```python
# The list has one value per column, so the default axis='columns' is used
result = df.eq([250, 100])
print(result)
```

**Output:**

```
    cost  revenue
A   True     True
B  False    False
C  False    False
```

#### Example 5: Comparing with another DataFrame

```python
other = pd.DataFrame({
    'revenue': [300, 250, 100, 150]
}, index=['A', 'B', 'C', 'D'])

result = df.gt(other)
print(result)
```

**Output:**

```
    cost  revenue
A  False    False
B  False    False
C  False     True
D  False    False
```

#### Example 6: MultiIndex DataFrame Comparison

```python
df_multindex = pd.DataFrame({
    'cost': [250, 150, 100, 150, 300, 220],
    'revenue': [100, 250, 300, 200, 175, 225]
}, index=[['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2'], ['A', 'B', 'C', 'A', 'B', 'C']])

result = df.le(df_multindex, level=1)
print(result)
```

**Output:**

```
       cost  revenue
Q1 A   True     True
   B   True     True
   C   True     True
Q2 A  False     True
   B   True    False
   C   True    False
```

### Conclusion

The `DataFrame.eq()` method is an essential tool in the `pandas` library for performing element-wise equality checks. Its flexibility in handling different types of inputs and alignment makes it particularly useful for data analysis and manipulation.



**pandas.DataFrame.combine_first()**  
This method merges two DataFrames by filling **null (NaN)** values in the first DataFrame with corresponding non-null values from the second. The result combines the indices and columns of both DataFrames, prioritizing non-null values from the first DataFrame where overlaps occur.

---

### **Syntax**

`DataFrame.combine_first(other)`

**Parameters**:

- `other`: The second DataFrame used to fill null values in the first.

**Returns**:

- A new DataFrame with merged data.

---

### **Key Behaviors**

1. **Union of Indices/Columns**:

   - The resulting DataFrame includes **all rows** (union of indices) and **all columns** (union of columns) from both DataFrames.
   - Missing indices/columns in one DataFrame are added from the other.

2. **Priority Rules**:

   - If a cell in the **first DataFrame** is **not null**, it is retained.
   - If a cell in the first DataFrame is **null**, it is filled with the corresponding value from `other` (if non-null).
   - If both DataFrames have **null** in the same cell, the result remains **null**.

3. **Non-Overlapping Data**:
   - Rows/columns present in `other` but not in the first DataFrame are added to the result.
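The priority rules can be seen on a single column. This is a minimal sketch with hypothetical frames `first` and `second`, arranged so that each row hits a different rule.

```python
import numpy as np
import pandas as pd

first = pd.DataFrame({'v': [1.0, np.nan, np.nan]})
second = pd.DataFrame({'v': [9.0, 2.0, np.nan]})

merged = first.combine_first(second)
print(merged)
# Row 0: first is non-null, so 1.0 is kept (9.0 is ignored)
# Row 1: first is null, so 2.0 is taken from second
# Row 2: both are null, so the result stays NaN
```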

---

### **Examples**

#### **Example 1**: Basic Usage

```python
df1 = pd.DataFrame({'A': [None, 0], 'B': [None, 4]})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3]})
result = df1.combine_first(df2)
```

**Output**:

```
     A    B
0  1.0  3.0
1  0.0  4.0
```

- `df1`’s nulls (row 0) are filled with `df2`’s values.
- Non-null values in `df1` (row 1) override `df2`.

---

#### **Example 2**: Non-Overlapping Indices/Columns

```python
df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])
result = df1.combine_first(df2)
```

**Output**:

```
     A    B    C
0  NaN  4.0  NaN
1  0.0  3.0  1.0
2  NaN  3.0  1.0
```

- **Row 0**: Only exists in `df1` → `A` remains `NaN`, `B=4.0` (from `df1`).
- **Row 1**: `df1.A=0.0` is kept; `df1.B` is `NaN` → filled with `df2.B=3.0`. Column `C` is added from `df2`.
- **Row 2**: Exists only in `df2` → `B=3.0` and `C=1.0` are retained; `A` is `NaN` (not present in `df2`).

---

### **Comparison with Similar Methods**

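- **`fillna()`**: Accepts a DataFrame as the fill value and aligns it by label, but only fills positions that already exist in the calling DataFrame. It never adds new indices/columns.
- **`update()`**: Modifies the first DataFrame in-place and does not add new indices/columns. By default, non-null values from `other` also overwrite non-null values in the caller.
- **`combine()`**: Uses a custom function to resolve overlaps column by column (e.g., `np.maximum`, `np.minimum`).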
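The difference in resulting shape can be checked with a short sketch that reuses the frames from Example 2. The names `via_fillna`, `via_combine_first`, and `updated` are assumptions for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [None, 0], 'B': [4, None]})
df2 = pd.DataFrame({'B': [3, 3], 'C': [1, 1]}, index=[1, 2])

# fillna: fills df1's gaps from df2 but keeps df1's rows and columns
via_fillna = df1.fillna(df2)

# combine_first: fills gaps AND adds row 2 and column C from df2
via_combine_first = df1.combine_first(df2)

# update: works in place (returns None), so operate on a copy here
updated = df1.copy()
updated.update(df2)

print(via_fillna.shape, via_combine_first.shape, updated.shape)
```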

---

### **When to Use**

- Merge two DataFrames with **partial overlaps** (indices/columns).
- Prioritize non-null values from the first DataFrame while filling gaps with the second.
- Example use cases: Merging sensor data from overlapping time periods, combining partial datasets.
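
As an illustration of the sensor use case, here is a minimal sketch with invented readings: a primary sensor with a gap and a backup sensor covering a later, overlapping window. The column name `temp` and all values are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: the primary sensor has a gap at 01:00,
# and the backup sensor covers a later, overlapping window.
primary = pd.DataFrame(
    {"temp": [20.1, np.nan, 20.7]},
    index=pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"]),
)
backup = pd.DataFrame(
    {"temp": [19.9, 20.4, 21.0]},
    index=pd.to_datetime(["2024-01-01 01:00", "2024-01-01 02:00", "2024-01-01 03:00"]),
)

# Primary readings are kept; the backup fills the gap and extends the range.
merged = primary.combine_first(backup)
print(merged)
```

The primary value at 02:00 is kept even though the backup also has a reading there, while the gap at 01:00 and the new timestamp 03:00 come from the backup.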


In [None]:
""" pandas.DataFrame.apply
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine='python', engine_kwargs=None, **kwargs)
Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

Parameters:
func : function
Function to apply to each column or row.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis along which the function is applied:

0 or ‘index’: apply function to each column.

1 or ‘columns’: apply function to each row.

raw : bool, default False
Determines if row or column is passed as a Series or ndarray object:

False : passes each row or column as a Series to the function.

True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.

result_type : {‘expand’, ‘reduce’, ‘broadcast’, None}, default None
These only act when axis=1 (columns):

‘expand’ : list-like results will be turned into columns.

‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.

‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However if the apply function returns a Series these are expanded to columns.

args : tuple
Positional arguments to pass to func in addition to the array/series.

by_row : False or “compat”, default “compat”
Only has an effect when func is a listlike or dictlike of funcs and the func isn’t a string. If “compat”, will if possible first translate the func into pandas methods (e.g. Series().apply(np.sum) will be translated to Series().sum()). If that doesn’t work, will try to call apply again with by_row=True and, if that fails, will call apply again with by_row=False (backward compatible). If False, the funcs will be passed the whole Series at once.

Added in version 2.1.0.

engine : {‘python’, ‘numba’}, default ‘python’
Choose between the python (default) engine or the numba engine in apply.

The numba engine will attempt to JIT compile the passed function, which may result in speedups for large DataFrames. It also supports the following engine_kwargs :

nopython (compile the function in nopython mode)

nogil (release the GIL inside the JIT compiled function)

parallel (try to apply the function in parallel over the DataFrame)

Note: Due to limitations within numba/how pandas interfaces with numba, you should only use this if raw=True

Note: The numba compiler only supports a subset of valid Python/numpy operations.

Please read more about the supported python features and supported numpy features in numba to learn what you can or cannot use in the passed function.

Added in version 2.2.0.

engine_kwargs : dict
Pass keyword arguments to the engine. This is currently only used by the numba engine, see the documentation for the engine argument for more information.

**kwargs
Additional keyword arguments to pass as keywords arguments to func.

Returns:
Series or DataFrame
Result of applying func along the given axis of the DataFrame.

See also

DataFrame.map
For elementwise operations.

DataFrame.aggregate
Only perform aggregating type operations.

DataFrame.transform
Only perform transforming type operations.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

Examples

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df
   A  B
0  4  9
1  4  9
2  4  9
Using a numpy universal function (in this case the same as np.sqrt(df)):

df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
Using a reducing function on either axis

df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64
df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64
Returning a list-like will result in a Series

df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object
Passing result_type='expand' will expand list-like results to columns of a Dataframe

df.apply(lambda x: [1, 2], axis=1, result_type='expand')
   0  1
0  1  2
1  1  2
2  1  2
Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2
Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
   A  B
0  1  2
1  1  2
2  1  2 """

The `pandas.DataFrame.apply()` method is a versatile function that allows you to apply a specified function along a particular axis of a DataFrame. This can be useful for performing operations on rows or columns, transforming data, or aggregating results. Below is a detailed overview of the method, including its syntax, parameters, return values, and examples.

### Syntax

```python
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine='python', engine_kwargs=None, **kwargs)
```

### Parameters

- **func**: `function`

  - This is the function to apply to each column or row of the DataFrame.

- **axis**: `{0 or 'index', 1 or 'columns'}`, default is `0`

  - Determines the axis along which the function is applied:
    - `0` or `'index'`: apply function to each column.
    - `1` or `'columns'`: apply function to each row.

- **raw**: `bool`, default is `False`

  - If `False`, each row or column is passed as a Series to the function. If `True`, the function receives ndarray objects instead, which can improve performance for NumPy operations.

- **result_type**: `{‘expand’, ‘reduce’, ‘broadcast’, None}`, default is `None`

  - This parameter only affects the output when `axis=1` (columns):
    - `'expand'`: list-like results will be turned into columns.
    - `'reduce'`: attempts to return a Series if possible rather than expanding list-like results.
    - `'broadcast'`: results will be broadcast to the original shape of the DataFrame.

- **args**: `tuple`

  - Positional arguments to pass to `func` in addition to the array/Series.

- **by_row**: `False` or `"compat"`, default is `"compat"`

  - Affects behavior when `func` is a list-like or dict-like of functions. If `"compat"`, it will attempt to translate the function into pandas methods first.

- **engine**: `{‘python’, ‘numba’}`, default is `'python'`

  - Specifies the engine to use for applying the function. The `numba` engine can provide speedups for large DataFrames when `raw=True`.

- **engine_kwargs**: `dict`

  - Keyword arguments for the engine, primarily used with the `numba` engine.

- **kwargs**: additional keyword arguments
  - Additional keyword arguments to pass to `func`.
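
To show how `args` and `**kwargs` reach `func`, here is a short sketch. The function `scale` and its parameters `factor` and `offset` are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

def scale(column, factor, offset=0):
    # `column` is one column of df (a Series) because axis=0 is the default.
    return column * factor + offset

# `factor` is supplied positionally through args, `offset` as a keyword.
result = df.apply(scale, args=(2,), offset=1)
print(result)
```

Each call is effectively `scale(column, 2, offset=1)`, so column `A` becomes `4 * 2 + 1` and column `B` becomes `9 * 2 + 1`.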

### Returns

- **Series or DataFrame**
  - The result of applying `func` along the specified axis of the DataFrame.

### Examples

#### Example 1: Basic Usage

```python
import pandas as pd
import numpy as np

df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
print(df)

# Applying a NumPy function
result = df.apply(np.sqrt)
print(result)
```

**Output:**

```
   A  B
0  4  9
1  4  9
2  4  9
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
```

#### Example 2: Reducing Function on Columns

```python
# Sum of each column
column_sum = df.apply(np.sum, axis=0)
print(column_sum)
```

**Output:**

```
A    12
B    27
dtype: int64
```

#### Example 3: Reducing Function on Rows

```python
# Sum of each row
row_sum = df.apply(np.sum, axis=1)
print(row_sum)
```

**Output:**

```
0    13
1    13
2    13
dtype: int64
```

#### Example 4: Returning List-like Results

```python
# Returning a list-like result
result = df.apply(lambda x: [1, 2], axis=1)
print(result)
```

**Output:**

```
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object
```

#### Example 5: Expanding List-like Results

```python
# Expanding list-like results into columns
result_expand = df.apply(lambda x: [1, 2], axis=1, result_type='expand')
print(result_expand)
```

**Output:**

```
   0  1
0  1  2
1  1  2
2  1  2
```

#### Example 6: Returning a Series

```python
# Returning a Series inside the function
result_series = df.apply(lambda x: pd.Series([1, 2], index=['foo', 'bar']), axis=1)
print(result_series)
```

**Output:**

```
   foo  bar
0    1    2
1    1    2
2    1    2
```

#### Example 7: Broadcasting Results

```python
# Broadcasting results to the original shape
result_broadcast = df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
print(result_broadcast)
```

**Output:**

```
   A  B
0  1  2
1  1  2
2  1  2
```

### Conclusion

The `DataFrame.apply()` method is a powerful tool in `pandas` for applying custom functions to rows or columns of a DataFrame. Its flexibility allows for a wide range of operations, from simple transformations to complex aggregations.


**pandas.DataFrame.apply()**  
The `apply()` method applies a function along an axis (rows or columns) of a DataFrame. It is highly flexible and can be used for transformations, aggregations, or custom operations.

---

### **Syntax**

`DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine='python', engine_kwargs=None, **kwargs)`

**Parameters**:

- `func`: The function to apply. Can be a Python function, lambda, or string (for built-in methods).
- `axis`: `{0 or 'index', 1 or 'columns'}`, default `0`.
  - `0` or `'index'`: Apply the function to each **column**.
  - `1` or `'columns'`: Apply the function to each **row**.
- `raw`: `bool`, default `False`.
  - `False`: Pass each row/column as a **Series**.
  - `True`: Pass each row/column as a **NumPy ndarray** (for better performance with NumPy functions).
- `result_type`: `{'expand', 'reduce', 'broadcast', None}`, default `None`.
  - `'expand'`: List-like results are turned into **columns**.
  - `'reduce'`: Returns a **Series** if possible (opposite of `'expand'`).
  - `'broadcast'`: Results are broadcast to the **original shape** of the DataFrame.
  - `None`: Infer the return type based on the function's output.
- `args`: `tuple`, default `()`.
  - Additional positional arguments to pass to `func`.
- `by_row`: `False` or `"compat"`, default `"compat"`.
  - Only applies when `func` is a list-like or dict-like of functions.
  - `"compat"`: Tries to translate `func` into pandas methods first.
  - `False`: Passes the whole Series at once.
- `engine`: `{'python', 'numba'}`, default `'python'`.
  - `'python'`: Uses the standard Python engine.
  - `'numba'`: Uses the Numba engine for JIT compilation (requires `raw=True`).
- `engine_kwargs`: `dict`, default `None`.
  - Keyword arguments for the engine (e.g., `nopython`, `nogil`, `parallel` for Numba).
- `**kwargs`: Additional keyword arguments to pass to `func`.

**Returns**:

- A **Series** or **DataFrame** depending on the function's output and `result_type`.

---

### **Key Behaviors**

1. **Flexible Function Application**:

   - Can apply any function to rows or columns.
   - Supports lambda functions, custom functions, and built-in methods.

2. **Output Shaping**:

   - If `func` returns a Series or list-like object, the result can be expanded into columns or broadcast to match the original shape.

3. **Performance**:

   - Using `raw=True` with NumPy functions can improve performance.
   - The Numba engine (`engine='numba'`) can further optimize performance for large DataFrames.

4. **Result Type Control**:
   - Use `result_type` to control how list-like results are handled (`'expand'`, `'reduce'`, `'broadcast'`).
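
The effect of `raw` can be observed directly by recording the type of object the function receives. The sketch below uses a hypothetical helper `value_range` that works on both a Series and an ndarray.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 3], "B": [10, 40, 20]})
seen_types = set()

def value_range(values):
    # Record what kind of object pandas handed to the function.
    seen_types.add(type(values).__name__)
    return values.max() - values.min()

as_series = df.apply(value_range)             # func receives a Series per column
types_default = set(seen_types)

seen_types.clear()
as_ndarray = df.apply(value_range, raw=True)  # func receives a NumPy ndarray per column
types_raw = set(seen_types)

print(types_default, types_raw)
print(as_ndarray)
```

The numeric result is the same either way; with `raw=True` the function simply skips the overhead of building a Series for each column, which is why it only suits functions that do not need index labels.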

---

### **Examples**

#### **Example 1**: Apply a NumPy Function

```python
df = pd.DataFrame([[4, 9]] * 3, columns=['A', 'B'])
df.apply(np.sqrt)
```

**Output**:

```
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0
```

- Applies `np.sqrt` element-wise to the DataFrame.

---

#### **Example 2**: Apply a Reducing Function

```python
df.apply(np.sum, axis=0)  # Sum columns
```

**Output**:

```
A    12
B    27
dtype: int64
```

```python
df.apply(np.sum, axis=1)  # Sum rows
```

**Output**:

```
0    13
1    13
2    13
dtype: int64
```

- Applies `np.sum` along the specified axis.

---

#### **Example 3**: Return List-like Results

```python
df.apply(lambda x: [1, 2], axis=1)
```

**Output**:

```
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object
```

- Returns a Series of lists.

---

#### **Example 4**: Expand List-like Results into Columns

```python
df.apply(lambda x: [1, 2], axis=1, result_type='expand')
```

**Output**:

```
   0  1
0  1  2
1  1  2
2  1  2
```

- Expands the list-like results into separate columns.

---

#### **Example 5**: Broadcast Results to Original Shape

```python
df.apply(lambda x: [1, 2], axis=1, result_type='broadcast')
```

**Output**:

```
   A  B
0  1  2
1  1  2
2  1  2
```

- Broadcasts the results to match the original DataFrame shape.

---

#### **Example 6**: Use Numba Engine for Performance

```python
df.apply(np.sum, axis=0, engine='numba', raw=True)
```

**Output**:

```
A    12
B    27
dtype: int64
```

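- Uses the Numba engine for JIT compilation (requires `raw=True`). This needs the optional `numba` package to be installed and pandas 2.2.0 or later, the version in which the `engine` parameter was added.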

---

### **Comparison with Similar Methods**

| Method        | Description                                                    |
| ------------- | -------------------------------------------------------------- |
| `apply()`     | General-purpose function application.                          |
| `map()`       | Element-wise operations (Series; also DataFrames since 2.1.0). |
| `applymap()`  | Element-wise operations on DataFrames; deprecated since 2.1.0. |
| `agg()`       | Aggregation operations.                                        |
| `transform()` | Transformations that return the same shape.                    |

---

### **When to Use**

- Perform **custom transformations** or **aggregations** on rows/columns.
- Handle **list-like results** and control their output format.
- Optimize performance with `raw=True` or the Numba engine.
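- Replace explicit Python loops over rows or columns with a single call. Note that `apply()` still calls `func` once per row or column, so prefer a built-in vectorized operation when one exists.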
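
As a small sketch of the trade-off between `apply()` and a built-in operation, the two expressions below compute the same row sums. The first calls the lambda once per row; the second is a single column-wise addition.

```python
import pandas as pd

df = pd.DataFrame({"A": [4, 4, 4], "B": [9, 9, 9]})

# Row-wise apply: the lambda runs once for every row.
row_wise = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Equivalent built-in operation on whole columns.
vectorized = df["A"] + df["B"]

print(row_wise.equals(vectorized))
```

When both forms are available, the column-wise version is usually the better choice on large frames; `apply()` earns its place when the per-row logic has no built-in equivalent.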


In [None]:
""" pandas.DataFrame.map
DataFrame.map(func, na_action=None, **kwargs)
Apply a function to a Dataframe elementwise.

Added in version 2.1.0: DataFrame.applymap was deprecated and renamed to DataFrame.map.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

Parameters:
func : callable
Python function, returns a single value from a single value.

na_action : {None, ‘ignore’}, default None
If ‘ignore’, propagate NaN values, without passing them to func.

**kwargs
Additional keyword arguments to pass as keywords arguments to func.

Returns:
DataFrame
Transformed DataFrame.

See also

DataFrame.apply
Apply a function along input axis of DataFrame.

DataFrame.replace
Replace values given in to_replace with value.

Series.map
Apply a function elementwise on a Series.

Examples

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
df
       0      1
0  1.000  2.120
1  3.356  4.567
df.map(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5
Like Series.map, NA values can be ignored:

df_copy = df.copy()
df_copy.iloc[0, 0] = pd.NA
df_copy.map(lambda x: len(str(x)), na_action='ignore')
     0  1
0  NaN  4
1  5.0  5
It is also possible to use map with functions that are not lambda functions:

df.map(round, ndigits=1)
     0    1
0  1.0  2.1
1  3.4  4.6
Note that a vectorized version of func often exists, which will be much faster. You could square each number elementwise.

df.map(lambda x: x**2)
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
But it’s better to avoid map in that case.

df ** 2
           0          1
0   1.000000   4.494400
1  11.262736  20.857489 """


The `pandas.DataFrame.map()` method applies a function element-wise to every cell of a DataFrame. It was added in pandas 2.1.0 as the new name for the deprecated `DataFrame.applymap()` method. Here’s a complete overview of the method, including its syntax, parameters, return values, and examples.

### Syntax

```python
DataFrame.map(func, na_action=None, **kwargs)
```

### Parameters

- **func**: `callable`

  - This is the function that will be applied to each element of the DataFrame. The function must accept a single value and return a single value.

- **na_action**: `{None, 'ignore'}`, default is `None`

  - If set to `'ignore'`, NaN values will be propagated, meaning they will not be passed to the `func`. Instead, they will be left as NaN in the resulting DataFrame.

- **kwargs**: additional keyword arguments
  - Additional arguments to pass to `func`.

### Returns

- **DataFrame**
  - A new DataFrame containing the transformed values after applying the function to each element.

### Examples

#### Example 1: Basic Usage

```python
import pandas as pd

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
print("Original DataFrame:")
print(df)

# Apply a function to count the length of elements when converted to strings
result = df.map(lambda x: len(str(x)))
print("\nMapped DataFrame:")
print(result)
```

**Output:**

```
Original DataFrame:
       0      1
0  1.000  2.120
1  3.356  4.567

Mapped DataFrame:
   0  1
0  3  4
1  5  5
```

#### Example 2: Handling NA Values

```python
# Create a new DataFrame with an NA value
df_copy = df.copy()
df_copy.iloc[0, 0] = pd.NA
print("\nDataFrame with NA value:")
print(df_copy)

# Apply function while ignoring NA values
result_na_ignore = df_copy.map(lambda x: len(str(x)), na_action='ignore')
print("\nMapped DataFrame with NA ignored:")
print(result_na_ignore)
```

**Output:**

```
DataFrame with NA value:
       0      1
0    NaN  2.120
1  3.356  4.567

Mapped DataFrame with NA ignored:
     0  1
0  NaN  4
1  5.0  5
```

#### Example 3: Using Built-in Functions

```python
# Using a built-in function to round the values
result_round = df.map(round, ndigits=1)
print("\nRounded DataFrame:")
print(result_round)
```

**Output:**

```
Rounded DataFrame:
     0    1
0  1.0  2.1
1  3.4  4.6
```

#### Example 4: Squaring Each Element

```python
# Squaring each number in the DataFrame
result_squared = df.map(lambda x: x ** 2)
print("\nSquared DataFrame:")
print(result_squared)
```

**Output:**

```
Squared DataFrame:
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
```

#### Example 5: Using Vectorized Operations

```python
# Alternative way using vectorized operation
result_squared_vectorized = df ** 2
print("\nSquared DataFrame using vectorized operation:")
print(result_squared_vectorized)
```

**Output:**

```
Squared DataFrame using vectorized operation:
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
```

### Conclusion

The `DataFrame.map()` method is a convenient way to transform DataFrame elements using scalar functions. With `na_action='ignore'`, missing values are propagated without being passed to the function. For operations that can be vectorized (i.e., applied to an entire DataFrame without looping), it is usually more efficient to use vectorized methods or operations directly rather than `map`.


In [None]:
""" pandas.DataFrame.applymap
DataFrame.applymap(func, na_action=None, **kwargs)
Apply a function to a Dataframe elementwise.

Deprecated since version 2.1.0: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

This method applies a function that accepts and returns a scalar to every element of a DataFrame.

Parameters:
func : callable
Python function, returns a single value from a single value.

na_action : {None, ‘ignore’}, default None
If ‘ignore’, propagate NaN values, without passing them to func.

**kwargs
Additional keyword arguments to pass as keywords arguments to func.

Returns:
DataFrame
Transformed DataFrame.

See also

DataFrame.apply
Apply a function along input axis of DataFrame.

DataFrame.map
Apply a function elementwise on a whole DataFrame.

DataFrame.replace
Replace values given in to_replace with value.

Examples

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
df
       0      1
0  1.000  2.120
1  3.356  4.567
df.map(lambda x: len(str(x)))
   0  1
0  3  4
1  5  5 """

The `pandas.DataFrame.applymap()` method applies a function element-wise to every cell in a DataFrame. It has been deprecated since version 2.1.0; use `DataFrame.map()` instead. Below is an overview of `applymap()`: its parameters, usage, and examples.

### Overview of `DataFrame.applymap()`

**Deprecation Notice:**

- Since version 2.1.0, `DataFrame.applymap()` has been deprecated. Instead, you should use `DataFrame.map()`.

### Syntax

```python
DataFrame.applymap(func, na_action=None, **kwargs)
```

### Parameters

- **func**: `callable`

  - A Python function that takes a single value as input and returns a single value as output.

- **na_action**: `{None, 'ignore'}`, default is `None`

  - If set to `'ignore'`, the method will propagate NaN values, meaning they will not be passed to the function. Instead, they will be preserved in the resulting DataFrame.

- **kwargs**: additional keyword arguments
  - Additional arguments that can be passed to `func`.

### Returns

- **DataFrame**
  - A new DataFrame with transformed values after applying the specified function element-wise.

### Example of `DataFrame.applymap()`

Here’s how you would have used `applymap()` before its deprecation:

```python
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
print("Original DataFrame:")
print(df)

# Applying a function to count the length of elements when converted to strings
result = df.applymap(lambda x: len(str(x)))
print("\nMapped DataFrame using applymap:")
print(result)
```

**Output:**

```
Original DataFrame:
       0      1
0  1.000  2.120
1  3.356  4.567

Mapped DataFrame using applymap:
   0  1
0  3  4
1  5  5
```

### Transition to `DataFrame.map()`

Since `applymap()` has been deprecated, you should use `map()` for similar functionality:

#### Example Using `DataFrame.map()`

```python
# Using DataFrame.map instead of deprecated applymap
result_map = df.map(lambda x: len(str(x)))
print("\nMapped DataFrame using map:")
print(result_map)
```

**Output:**

```
Mapped DataFrame using map:
   0  1
0  3  4
1  5  5
```
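
If the same code has to run on pandas versions both before and after 2.1.0, one possible approach is a small wrapper that picks whichever method is available. The helper name `elementwise` is hypothetical and not part of pandas; this is a sketch rather than a recommended pattern.

```python
import pandas as pd

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])

def elementwise(frame, func, **kwargs):
    # Hypothetical helper (not part of pandas): prefer DataFrame.map,
    # which exists from pandas 2.1.0, and fall back to applymap otherwise.
    if hasattr(pd.DataFrame, "map"):
        return frame.map(func, **kwargs)
    return frame.applymap(func, **kwargs)

lengths = elementwise(df, lambda x: len(str(x)))
print(lengths)
```

Because both methods share the signature `(func, na_action=None, **kwargs)`, the wrapper can forward its arguments unchanged.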

### Conclusion

`DataFrame.applymap()` was once the standard method for applying functions element-wise across a DataFrame; use `DataFrame.map()` instead. Both take the same arguments and produce the same result, but `map()` is the recommended name going forward.


The `pandas.DataFrame.map` method is used to apply a function **element-wise** to every element in a DataFrame. It is a versatile method that allows you to transform each element of the DataFrame using a custom function. This method was introduced in **pandas 2.1.0** as a replacement for the deprecated `DataFrame.applymap`.

---

### **Syntax**

```python
DataFrame.map(func, na_action=None, **kwargs)
```

---

### **Parameters**

1. **`func`**:

   - A Python function (or callable) that takes a single value as input and returns a single value.
   - This function is applied to each element of the DataFrame.

2. **`na_action`**:

   - Controls how `NaN` values are handled.
   - Options:
     - `None` (default): `NaN` values are passed to the function.
     - `'ignore'`: `NaN` values are propagated without being passed to the function.

3. **`**kwargs`**:
   - Additional keyword arguments to pass to the function `func`.

---

### **Returns**

- A **DataFrame** with the same shape as the original, where each element is the result of applying the function `func`.

---

### **Key Points**

- The `map` method is **element-wise**, meaning the function is applied to each individual element of the DataFrame.
- It is similar to `Series.map` but works on entire DataFrames.
- For operations that can be vectorized (e.g., mathematical operations), avoid using `map` and use direct DataFrame operations instead, as they are faster.

---

### **Examples**

#### 1. **Basic Usage with a Lambda Function**

```python
import pandas as pd

df = pd.DataFrame([[1, 2.12], [3.356, 4.567]])
print(df)
```

**Input DataFrame:**

```
       0      1
0  1.000  2.120
1  3.356  4.567
```

```python
# Apply a lambda function to calculate the length of the string representation of each element
result = df.map(lambda x: len(str(x)))
print(result)
```

**Output:**

```
   0  1
0  3  4
1  5  5
```

---

#### 2. **Handling NaN Values with `na_action='ignore'`**

```python
# Create a copy of the DataFrame and introduce a NaN value
df_copy = df.copy()
df_copy.iloc[0, 0] = pd.NA

# Apply the same lambda function but ignore NaN values
result = df_copy.map(lambda x: len(str(x)), na_action='ignore')
print(result)
```

**Output:**

```
     0  1
0  NaN  4
1  5.0  5
```

---

#### 3. **Using a Built-in Function with Additional Arguments**

```python
# Use the `round` function to round each element to 1 decimal place
result = df.map(round, ndigits=1)
print(result)
```

**Output:**

```
     0    1
0  1.0  2.1
1  3.4  4.6
```

---

#### 4. **Avoiding `map` for Vectorized Operations**

```python
# Square each element using a lambda function
result = df.map(lambda x: x**2)
print(result)
```

**Output:**

```
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
```

However, for such operations, it is better to use direct DataFrame operations for better performance:

```python
# Vectorized operation (faster and more efficient)
result = df ** 2
print(result)
```

**Output:**

```
           0          1
0   1.000000   4.494400
1  11.262736  20.857489
```

---

### **When to Use `DataFrame.map`**

- Use `map` when you need to apply a **custom function** to each element of the DataFrame.
- Use it for **non-vectorizable operations** (e.g., string manipulations, custom logic).
- Avoid using `map` for **vectorizable operations** (e.g., mathematical operations) and instead use direct DataFrame operations.

---

### **Related Methods**

- `DataFrame.apply`: Apply a function along the rows or columns of a DataFrame.
- `DataFrame.replace`: Replace values in a DataFrame.
- `Series.map`: Apply a function element-wise on a Series.

---

### **Summary**

- `DataFrame.map` is a powerful tool for **element-wise transformations** in a DataFrame.
- It is flexible and supports custom functions, handling of `NaN` values, and additional arguments.
- For performance-critical operations, prefer **vectorized operations** over `map`.

This method is particularly useful for data cleaning, transformation, and custom calculations in data analysis workflows.


In [None]:
""" 

pandas.DataFrame.pipe
DataFrame.pipe(func, *args, **kwargs)
Apply chainable functions that expect Series or DataFrames.

Parameters:
func : function
Function to apply to the Series/DataFrame. args, and kwargs are passed into func. Alternatively a (callable, data_keyword) tuple where data_keyword is a string indicating the keyword of callable that expects the Series/DataFrame.

*args : iterable, optional
Positional arguments passed into func.

**kwargs : mapping, optional
A dictionary of keyword arguments passed into func.

Returns:
The return type of func.

See also

DataFrame.apply
Apply a function along input axis of DataFrame.

DataFrame.map
Apply a function elementwise on a whole DataFrame.

Series.map
Apply a mapping correspondence on a Series.

Notes

Use .pipe when chaining together functions that expect Series, DataFrames or GroupBy objects.

Examples

Constructing an income DataFrame from a list of rows.

data = [[8000, 1000], [9500, np.nan], [5000, 2000]]
df = pd.DataFrame(data, columns=['Salary', 'Others'])
df
   Salary  Others
0    8000  1000.0
1    9500     NaN
2    5000  2000.0
Functions that perform tax reductions on an income DataFrame.

def subtract_federal_tax(df):
    return df * 0.9
def subtract_state_tax(df, rate):
    return df * (1 - rate)
def subtract_national_insurance(df, rate, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)
Instead of writing

subtract_national_insurance(
    subtract_state_tax(subtract_federal_tax(df), rate=0.12),
    rate=0.05,
    rate_increase=0.02)  
You can write

(
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
)
    Salary   Others
0  5892.48   736.56
1  6997.32      NaN
2  3682.80  1473.12
If you have a function that takes the data as (say) the second argument, pass a tuple indicating which keyword expects the data. For example, suppose national_insurance takes its data as df in the second argument:

def subtract_national_insurance(rate, df, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)
(
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(
        (subtract_national_insurance, 'df'),
        rate=0.05,
        rate_increase=0.02
    )
)
    Salary   Others
0  5892.48   736.56
1  6997.32      NaN
2  3682.80  1473.12 """

The `pandas.DataFrame.pipe()` method is a powerful utility that allows you to apply a series of chainable functions to a DataFrame or Series. This method enhances code readability and makes it easier to apply multiple transformations in a clear and organized manner.

### Overview of `DataFrame.pipe()`

The `pipe()` method provides a way to apply functions that expect a DataFrame or Series as their first argument, while also allowing for additional positional and keyword arguments.

### Syntax

```python
DataFrame.pipe(func, *args, **kwargs)
```

### Parameters

- **func**: `function`

  - The function to apply to the DataFrame. Alternatively, a tuple of the form `(callable, data_keyword)` can be provided where `data_keyword` is a string indicating the keyword of a callable that accepts the Series/DataFrame.

- **args**: `iterable, optional`

  - Positional arguments passed to `func`.

- **kwargs**: `mapping, optional`
  - A dictionary of keyword arguments passed to `func`.

### Returns

- The return type of `func`.
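
A quick sketch of what this means in practice: `df.pipe(func, *args)` gives the same result as `func(df, *args)`, and the result does not have to be a DataFrame. The function `total_above` and its threshold are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Salary": [8000, 9500, 5000]})

def total_above(frame, threshold):
    # Count rows whose salary exceeds the threshold; returns a plain int.
    return int((frame["Salary"] > threshold).sum())

via_pipe = df.pipe(total_above, 6000)
direct = total_above(df, 6000)

print(via_pipe, direct)
```

Because `pipe()` just forwards the call, a step that returns a scalar like this one has to come last in a chain.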

### When to Use

- Use `pipe()` when chaining together multiple functions that require Series, DataFrames, or GroupBy objects. It creates a more readable flow as compared to nested function calls.
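
GroupBy objects have a `pipe()` method as well, which lets a helper that expects a grouped object sit in a method chain. The sketch below uses made-up store data and a hypothetical helper `price_spread`.

```python
import pandas as pd

df = pd.DataFrame({"store": ["a", "a", "b", "b"], "price": [10, 14, 7, 8]})

def price_spread(grouped):
    # `grouped` is the GroupBy object, not a DataFrame.
    return grouped["price"].max() - grouped["price"].min()

spread = df.groupby("store").pipe(price_spread)
print(spread)
```

The helper reuses the same grouped object for both aggregations, and the chain still reads left to right.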

### Examples

#### Example 1: Basic Usage

```python
import pandas as pd
import numpy as np

# Constructing an income DataFrame from a list of rows
data = [[8000, 1000], [9500, np.nan], [5000, 2000]]
df = pd.DataFrame(data, columns=['Salary', 'Others'])

print("Original DataFrame:")
print(df)
```

**Output:**

```
Original DataFrame:
   Salary  Others
0    8000  1000.0
1    9500     NaN
2    5000  2000.0
```

#### Example 2: Chaining Functions

Here we’ll define several functions to perform tax deductions:

```python
def subtract_federal_tax(df):
    return df * 0.9

def subtract_state_tax(df, rate):
    return df * (1 - rate)

def subtract_national_insurance(df, rate, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)

# Applying multiple deductions using pipe
result = (
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(subtract_national_insurance, rate=0.05, rate_increase=0.02)
)

print("\nDataFrame after applying tax deductions:")
print(result)
```

**Output:**

```
     Salary   Others
0  5892.48   736.56
1  6997.32      NaN
2  3682.80  1473.12
```

#### Example 3: Using a Function with Data as Second Argument

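If a function expects the DataFrame somewhere other than the first argument, pass a tuple `(callable, data_keyword)`, where `data_keyword` is the name of the parameter that should receive the DataFrame. For example: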

```python
def subtract_national_insurance(rate, df, rate_increase):
    new_rate = rate + rate_increase
    return df * (1 - new_rate)

# Applying tax deductions again, but with the national insurance function taking df as second argument
result_with_second_arg = (
    df.pipe(subtract_federal_tax)
    .pipe(subtract_state_tax, rate=0.12)
    .pipe(
        (subtract_national_insurance, 'df'),  # Tuple indicating that 'df' is the keyword
        rate=0.05,
        rate_increase=0.02
    )
)

print("\nDataFrame after applying deductions with second argument:")
print(result_with_second_arg)
```

**Output:**

```
     Salary   Others
0  5892.48   736.56
1  6997.32      NaN
2  3682.80  1473.12
```

### Conclusion

The `DataFrame.pipe()` method is a convenient way to improve the clarity of your code by enabling a functional programming style. It's particularly useful for applying multiple transformations in a readable manner, avoiding the complexity associated with deeply nested function calls.



In [None]:
""" pandas.DataFrame.agg
DataFrame.agg(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.

Parameters:
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

function

string function name

list of functions and/or function names, e.g. [np.sum, 'mean']

dict of axis labels -> functions, function names or list of such.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:
scalar, Series or DataFrame
The return can be:

scalar : when Series.agg is called with single function

Series : when DataFrame.agg is called with a single function

DataFrame : when DataFrame.agg is called with several functions

See also

DataFrame.apply
Perform any type of operations.

DataFrame.transform
Perform transformation type operations.

pandas.DataFrame.groupby
Perform operations over groups.

pandas.DataFrame.resample
Perform operations over resampled bins.

pandas.DataFrame.rolling
Perform operations over rolling window.

pandas.DataFrame.expanding
Perform operations over expanding window.

pandas.core.window.ewm.ExponentialMovingWindow
Perform operation over exponential weighted window.

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [np.nan, np.nan, np.nan]],
                  columns=['A', 'B', 'C'])
Aggregate these functions over the rows.

df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
Different aggregations per column.

df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
Aggregate different functions over the columns and rename the index of the resulting DataFrame.

df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0
Aggregate over the columns.

df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64 """

The `pandas.DataFrame.agg()` method is used for aggregating data in a DataFrame using one or more operations along a specified axis (either rows or columns). It allows you to apply various aggregation functions conveniently and flexibly.

### Overview of `DataFrame.agg()`

### Syntax

```python
DataFrame.agg(func=None, axis=0, *args, **kwargs)
```

### Parameters

- **func**: `function`, `str`, `list`, or `dict`

  - Function to use for aggregating the data. Can be:
    - A single function (e.g., `np.sum`).
    - A string function name (e.g., `'mean'`).
    - A list of functions or function names (e.g., `[np.sum, 'mean']`).
    - A dictionary mapping DataFrame columns to functions.

- **axis**: `{0 or 'index', 1 or 'columns'}`, default is `0`

  - If `0` or `'index'`: apply the function to each column.
  - If `1` or `'columns'`: apply the function to each row.

- **args**: `iterable, optional`

  - Positional arguments to pass to the function.

- **kwargs**: `mapping, optional`
  - Keyword arguments to pass to the function.

### Returns

- **scalar**, **Series**, or **DataFrame**
  - The return type varies depending on how the method is called:
    - A scalar when called on a Series with a single function.
    - A Series when called on a DataFrame with a single function.
    - A DataFrame when called with several functions.
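
The three return types can be checked directly. This is a small sketch on made-up data; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["A", "B", "C"])

scalar_result = df["A"].agg("sum")     # Series + single function
series_result = df.agg("sum")          # DataFrame + single function
frame_result = df.agg(["sum", "min"])  # DataFrame + several functions

print(np.isscalar(scalar_result))               # True
print(isinstance(series_result, pd.Series))     # True
print(isinstance(frame_result, pd.DataFrame))   # True
```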

### Notes

- Aggregations are performed over a specified axis, which differs from NumPy's functions that flatten the array by default.
- User-defined functions that mutate the passed object may produce unexpected behavior or errors; it's advisable to avoid them.
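
The difference from NumPy's default can be seen on a small array. The sketch below assumes a 2x2 array invented for illustration:

```python
import numpy as np
import pandas as pd

arr_2d = np.array([[1.0, 2.0], [3.0, 4.0]])
df = pd.DataFrame(arr_2d, columns=["A", "B"])

# NumPy flattens by default: one mean over all four values
flat_mean = np.mean(arr_2d)

# pandas aggregates along an axis: one mean per column
column_means = df.agg("mean")

# NumPy gives the same per-column result only when axis=0 is passed
numpy_axis0 = np.mean(arr_2d, axis=0)

print(flat_mean)              # 2.5
print(column_means.tolist())  # [2.0, 3.0]
print(numpy_axis0.tolist())   # [2.0, 3.0]
```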

### Examples

#### Example 1: Basic Aggregation

```python
import pandas as pd
import numpy as np

# Creating a DataFrame
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [np.nan, np.nan, np.nan]],
                  columns=['A', 'B', 'C'])

# Aggregate functions over the rows
result = df.agg(['sum', 'min'])
print("Aggregate sum and min over rows:")
print(result)
```

**Output:**

```
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
```

#### Example 2: Different Aggregations per Column

```python
# Aggregate different functions for specific columns
result_col_specific = df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']})
print("\nDifferent aggregations per column:")
print(result_col_specific)
```

**Output:**

```
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
```

#### Example 3: Renaming Output with Column Aggregations

```python
# Aggregate different functions and rename the index
result_rename = df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
print("\nAggregate with renaming indexes:")
print(result_rename)
```

**Output:**

```
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0
```

#### Example 4: Aggregating Over Columns

```python
# Aggregate using mean over the columns
result_mean = df.agg("mean", axis="columns")
print("\nMean aggregated over columns:")
print(result_mean)
```

**Output:**

```
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
```

### Conclusion

The `DataFrame.agg()` method is a versatile function that allows you to apply various aggregation functions along different axes of a DataFrame. By utilizing it effectively, you can achieve complex summaries of your data with simple and readable code.



---

### **`pandas.DataFrame.agg()`**

Aggregates data using one or more operations over rows or columns.

#### **Parameters**

- **`func`**: Function(s) to apply. Can be:
  - Single function (e.g., `np.sum`, `'mean'`).
  - List of functions (e.g., `['sum', 'min']`).
  - Dict mapping columns to specific functions (e.g., `{'A': 'max', 'B': ['min', 'mean']}`).
- **`axis`**: `0` or `'index'` (column-wise, default) / `1` or `'columns'` (row-wise).
- **`*args`**/**`**kwargs`**: Additional arguments for `func`.

#### **Returns**

- **Scalar**: If a single function is applied to a `Series`.
- **Series**: If a single function is applied to a `DataFrame`.
- **DataFrame**: If multiple functions are applied.

---

### **Key Notes**

1. **Axis Behavior**:

   - `axis=0` (default): Apply to each **column**.
   - `axis=1`: Apply to each **row**.
   - Differs from NumPy’s default (aggregates over flattened array).

2. **Flexibility**:

   - Use **strings** (e.g., `'sum'`), **built-in functions**, or **custom functions**.
   - Rename results using keyword syntax (e.g., `x=('A', 'max')`).

3. **Handling NaNs**:
   - With the default `skipna=True`, both `sum` and `mean` skip NaN values. They differ when every value is NaN: `sum` returns `0.0`, while `mean` returns NaN.
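
The NaN behavior of `sum` and `mean` can be checked on a small frame. This is a sketch on invented data; the `min_count` argument shown at the end belongs to `DataFrame.sum()` and is one way to get NaN instead of `0.0` for an all-NaN column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],        # one missing value
    "B": [np.nan, np.nan, np.nan],  # all values missing
})

result = df.agg(["sum", "mean"])
print(result)
# Column A: NaN is skipped, so sum is 4.0 and mean is 2.0
# Column B: sum is 0.0, mean is NaN

# Require at least one valid value before summing
strict_sum = df.sum(min_count=1)
print(strict_sum)  # A is 4.0, B is NaN
```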

---

### **Examples**

#### 1. **Basic Aggregation**

```python
df.agg(['sum', 'min'])
```

Output:

```
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
```

Applies `sum` and `min` to every column.

#### 2. **Column-Specific Aggregation**

```python
df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']})
```

Output:

```
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
```

Only specified columns (`A`, `B`) are included.

#### 3. **Row-Wise Aggregation**

```python
df.agg("mean", axis="columns")
```

Output:

```
0    2.0  # (1+2+3)/3
1    5.0  # (4+5+6)/3
2    8.0  # (7+8+9)/3
3    NaN  # All NaNs
```

#### 4. **Renaming Results**

```python
df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
```

Output:

```
     A    B    C
x  7.0  NaN  NaN  # Max of A
y  NaN  2.0  NaN  # Min of B
z  NaN  NaN  6.0  # Mean of C
```

---

### **Related Functions**

- `apply()`: General-purpose operations.
- `transform()`: Broadcast results to original shape.
- `groupby()`: Group-based aggregation.

### **Alias**

`agg()` is interchangeable with `aggregate()`.

---

This method is ideal for summarizing data, computing statistics, and preparing reports. Use it to streamline data analysis workflows!


In [None]:
""" pandas.DataFrame.aggregate
DataFrame.aggregate(func=None, axis=0, *args, **kwargs)
Aggregate using one or more operations over the specified axis.

Parameters:
func : function, str, list or dict
Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

function

string function name

list of functions and/or function names, e.g. [np.sum, 'mean']

dict of axis labels -> functions, function names or list of such.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:
scalar, Series or DataFrame
The return can be:

scalar : when Series.agg is called with single function

Series : when DataFrame.agg is called with a single function

DataFrame : when DataFrame.agg is called with several functions

See also

DataFrame.apply
Perform any type of operations.

DataFrame.transform
Perform transformation type operations.

pandas.DataFrame.groupby
Perform operations over groups.

pandas.DataFrame.resample
Perform operations over resampled bins.

pandas.DataFrame.rolling
Perform operations over rolling window.

pandas.DataFrame.expanding
Perform operations over expanding window.

pandas.core.window.ewm.ExponentialMovingWindow
Perform operation over exponential weighted window.

Notes

The aggregation operations are always performed over an axis, either the index (default) or the column axis. This behavior is different from numpy aggregation functions (mean, median, prod, sum, std, var), where the default is to compute the aggregation of the flattened array, e.g., numpy.mean(arr_2d) as opposed to numpy.mean(arr_2d, axis=0).

agg is an alias for aggregate. Use the alias.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

A passed user-defined-function will be passed a Series for evaluation.

Examples

df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [np.nan, np.nan, np.nan]],
                  columns=['A', 'B', 'C'])
Aggregate these functions over the rows.

df.agg(['sum', 'min'])
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
Different aggregations per column.

df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']})
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
Aggregate different functions over the columns and rename the index of the resulting DataFrame.

df.agg(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0
Aggregate over the columns.

df.agg("mean", axis="columns")
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64 """

The `pandas.DataFrame.aggregate()` method (often abbreviated as `agg()`) is a versatile function used to apply one or more aggregation operations across specified axes of a DataFrame. This functionality is particularly useful for summarizing data in a concise manner.

### Overview of `DataFrame.aggregate()`

### Syntax

```python
DataFrame.aggregate(func=None, axis=0, *args, **kwargs)
```

### Parameters

- **func**: `function`, `str`, `list`, or `dict`

  - Specifies the function(s) to use for aggregation:
    - **function**: A single callable that is applicable to the DataFrame.
    - **string function name**: A string representing a built-in function (e.g., `'mean'`).
    - **list of functions**: A list containing one or more functions or function names (e.g., `[np.sum, 'mean']`).
    - **dict**: A mapping of column labels to functions, function names, or lists of such (e.g., `{'A': ['sum', 'min'], 'B': 'max'}`).

- **axis**: `{0 or 'index', 1 or 'columns'}`, default is `0`

  - Determines whether to apply the function to rows or columns:
    - `0` or `'index'`: Apply function to each column.
    - `1` or `'columns'`: Apply function to each row.

- **args**: `iterable, optional`

  - Positional arguments to pass to the aggregation function.

- **kwargs**: `mapping, optional`
  - Keyword arguments to pass to the aggregation function.

### Returns

- **scalar**, **Series**, or **DataFrame**
  - The return type varies depending on how the method is called:
    - A scalar when `Series.aggregate` is called with a single function.
    - A Series when `DataFrame.aggregate` is called with a single function.
    - A DataFrame when multiple functions are specified.

### Notes

- Aggregations are always performed along an axis, the index by default. NumPy's aggregation functions differ: by default they aggregate over the flattened array (for example, `numpy.mean(arr_2d)` rather than `numpy.mean(arr_2d, axis=0)`).
- `agg()` is an alias for `aggregate()`, and both can be used interchangeably.
- Be cautious with user-defined functions; those that mutate objects can lead to unexpected behaviors and errors.
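
As a quick check of the alias and of user-defined functions, the sketch below passes a hypothetical function, `value_range`, through both spellings. The function name and data are assumptions made for illustration; the function receives each column as a Series and does not mutate it:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

def value_range(column):
    # Hypothetical user-defined aggregation: receives one column as a Series
    return column.max() - column.min()

via_alias = df.agg(value_range)
via_full_name = df.aggregate(value_range)

print(via_alias.equals(via_full_name))  # True
print(via_alias.tolist())               # [6, 6]
```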

### Examples

#### Example 1: Basic Aggregation

```python
import pandas as pd
import numpy as np

# Creating a DataFrame
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9],
                   [np.nan, np.nan, np.nan]],
                  columns=['A', 'B', 'C'])

# Aggregate functions over the rows
result = df.aggregate(['sum', 'min'])
print("Aggregate sum and min over rows:")
print(result)
```

**Output:**

```
        A     B     C
sum  12.0  15.0  18.0
min   1.0   2.0   3.0
```

#### Example 2: Different Aggregations Per Column

```python
# Aggregate different functions for specific columns
result_col_specific = df.aggregate({'A': ['sum', 'min'], 'B': ['min', 'max']})
print("\nDifferent aggregations per column:")
print(result_col_specific)
```

**Output:**

```
        A    B
sum  12.0  NaN
min   1.0  2.0
max   NaN  8.0
```

#### Example 3: Renaming Output with Column Aggregations

```python
# Aggregate different functions and rename the index
result_rename = df.aggregate(x=('A', 'max'), y=('B', 'min'), z=('C', 'mean'))
print("\nAggregate with renaming indexes:")
print(result_rename)
```

**Output:**

```
     A    B    C
x  7.0  NaN  NaN
y  NaN  2.0  NaN
z  NaN  NaN  6.0
```

#### Example 4: Aggregating Over Columns

```python
# Aggregate using mean over the columns
result_mean = df.aggregate("mean", axis="columns")
print("\nMean aggregated over columns:")
print(result_mean)
```

**Output:**

```
0    2.0
1    5.0
2    8.0
3    NaN
dtype: float64
```

### Conclusion

The `DataFrame.aggregate()` method is an essential tool for performing aggregation operations over DataFrames in a flexible way. By utilizing various aggregation functions and specifying whether to aggregate along rows or columns, you can easily summarize and analyze your data.



In [None]:
""" pandas.DataFrame.transform
DataFrame.transform(func, axis=0, *args, **kwargs)
Call func on self producing a DataFrame with the same axis shape as self.

Parameters:
func : function, str, list-like or dict-like
Function to use for transforming the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.

Accepted combinations are:

function

string function name

list-like of functions and/or function names, e.g. [np.exp, 'sqrt']

dict-like of axis labels -> functions, function names or list-like of such.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args
Positional arguments to pass to func.

**kwargs
Keyword arguments to pass to func.

Returns:
DataFrame
A DataFrame that must have the same length as self.

Raises:
ValueError
If the returned DataFrame has a different length than self.
See also

DataFrame.agg
Only perform aggregating type operations.

DataFrame.apply
Invoke function on a DataFrame.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

Examples

df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
df
   A  B
0  0  1
1  1  2
2  2  3
df.transform(lambda x: x + 1)
   A  B
0  1  2
1  2  3
2  3  4
Even though the resulting DataFrame must have the same length as the input DataFrame, it is possible to provide several input functions:

s = pd.Series(range(3))
s
0    0
1    1
2    2
dtype: int64
s.transform([np.sqrt, np.exp])
       sqrt        exp
0  0.000000   1.000000
1  1.000000   2.718282
2  1.414214   7.389056
You can call transform on a GroupBy object:

df = pd.DataFrame({
    "Date": [
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})
df
         Date  Data
0  2015-05-08     5
1  2015-05-07     8
2  2015-05-06     6
3  2015-05-05     1
4  2015-05-08    50
5  2015-05-07   100
6  2015-05-06    60
7  2015-05-05   120
df.groupby('Date')['Data'].transform('sum')
0     55
1    108
2     66
3    121
4     55
5    108
6     66
7    121
Name: Data, dtype: int64
df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"]
})
df
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n
df['size'] = df.groupby('c')['type'].transform(len)
df
   c type size
0  1    m    3
1  1    n    3
2  1    o    3
3  2    m    4
4  2    m    4
5  2    n    4
6  2    n    4 """

The `pandas.DataFrame.transform()` method calls a function on the DataFrame (column by column by default, or group by group on a GroupBy object) and returns a result with the same axis length as the input. This makes it useful for transformations that keep every row of the original data, such as adding one to each value or broadcasting a group total back to each row.

### Overview of `DataFrame.transform()`

### Syntax

```python
DataFrame.transform(func, axis=0, *args, **kwargs)
```

### Parameters

- **func**: `function`, `str`, `list-like`, or `dict-like`

  - Indicates the function(s) to use for transformation:
    - **function**: A single callable that can be applied to the DataFrame.
    - **string function name**: A built-in function name as a string (e.g., `'sqrt'`).
    - **list-like of functions**: A list containing one or more functions (e.g., `[np.exp, 'log']`).
    - **dict-like**: A mapping of column labels to functions, function names, or lists of functions.

- **axis**: `{0 or 'index', 1 or 'columns'}`, default is `0`

  - Specifies whether to apply the function to columns or rows:
    - `0` or `'index'`: Apply function to each column.
    - `1` or `'columns'`: Apply function to each row.

- **args**: `iterable, optional`

  - Positional arguments to pass to the function.

- **kwargs**: `mapping, optional`
  - Keyword arguments to pass to the function.

### Returns

- **DataFrame**
  - A DataFrame that has the same length as the input DataFrame.

### Raises

- **ValueError**
  - Raised if the returned DataFrame has a different length than the input.
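
The error case is easy to trigger by passing an aggregating function, which collapses each column to a single value. A minimal sketch (the exact error message may vary between pandas versions, so it is printed rather than asserted):

```python
import pandas as pd

df = pd.DataFrame({"A": range(3), "B": range(1, 4)})

try:
    # 'sum' reduces each column to one value, so the result
    # no longer lines up with the input rows
    df.transform("sum")
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")

print(raised)  # True
```

For reductions like this, `agg()` is the method to reach for.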

### Notes

- Functions that mutate the passed object can produce unexpected behavior or errors and are not supported, so avoid them here.
- You can call `transform` on GroupBy objects, allowing for more complex operations based on grouping.
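
On a GroupBy object, the contrast with `agg()` is the main point: `agg()` returns one value per group, while `transform()` returns one value per input row. The sketch below uses invented `team` and `score` columns to show this:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [10, 20, 30, 40, 50],
})

# agg: one row per group
per_group = df.groupby("team")["score"].agg("mean")

# transform: the group mean repeated for every row of that group
per_row = df.groupby("team")["score"].transform("mean")

# Because per_row is aligned with df, it can be used in row-wise arithmetic
df["centered"] = df["score"] - per_row

print(len(per_group), len(per_row))  # 2 5
print(df["centered"].tolist())       # [-5.0, 5.0, -10.0, 0.0, 10.0]
```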

### Examples

#### Example 1: Basic Transformation

```python
import pandas as pd
import numpy as np

# Creating a DataFrame
df = pd.DataFrame({'A': range(3), 'B': range(1, 4)})
print("Original DataFrame:")
print(df)

# Apply a transformation
transformed_df = df.transform(lambda x: x + 1)
print("\nTransformed DataFrame:")
print(transformed_df)
```

**Output:**

```
Original DataFrame:
   A  B
0  0  1
1  1  2
2  2  3

Transformed DataFrame:
   A  B
0  1  2
1  2  3
2  3  4
```

#### Example 2: Using Multiple Functions

```python
# Transforming a Series with multiple functions
s = pd.Series(range(3))
print("\nOriginal Series:")
print(s)

transformed_series = s.transform([np.sqrt, np.exp])
print("\nTransformed Series with multiple functions:")
print(transformed_series)
```

**Output:**

```
Original Series:
0    0
1    1
2    2
dtype: int64

Transformed Series with multiple functions:
       sqrt        exp
0  0.000000   1.000000
1  1.000000   2.718282
2  1.414214   7.389056
```

#### Example 3: Transformation with GroupBy

```python
# Creating a DataFrame for group operations
df = pd.DataFrame({
    "Date": [
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05",
        "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"],
    "Data": [5, 8, 6, 1, 50, 100, 60, 120],
})
print("\nDataFrame with Date and Data:")
print(df)

# Total sum for each date
grouped_sum = df.groupby('Date')['Data'].transform('sum')
print("\nGrouped sum of Data by Date:")
print(grouped_sum)
```

**Output:**

```
DataFrame with Date and Data:
         Date  Data
0  2015-05-08     5
1  2015-05-07     8
2  2015-05-06     6
3  2015-05-05     1
4  2015-05-08    50
5  2015-05-07   100
6  2015-05-06    60
7  2015-05-05   120

Grouped sum of Data by Date:
0     55
1    108
2     66
3    121
4     55
5    108
6     66
7    121
Name: Data, dtype: int64
```

#### Example 4: Adding Column to DataFrame

```python
df = pd.DataFrame({
    "c": [1, 1, 1, 2, 2, 2, 2],
    "type": ["m", "n", "o", "m", "m", "n", "n"]
})
print("\nOriginal DataFrame:")
print(df)

# Adding a column for the size of groups
df['size'] = df.groupby('c')['type'].transform(len)
print("\nDataFrame with size of each group:")
print(df)
```

**Output:**

```
Original DataFrame:
   c type
0  1    m
1  1    n
2  1    o
3  2    m
4  2    m
5  2    n
6  2    n

DataFrame with size of each group:
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
```

### Conclusion

The `DataFrame.transform()` method is ideal when you want to apply a function while keeping the result the same length as the input. It works on both DataFrames and Series, and on GroupBy objects for grouped transformations.


In [None]:
# A DataFrame is a 2-dimensional data structure that can store data of different types
# (including characters, integers, floating point values, categorical data and more)
# in columns. It is similar to a spreadsheet, a SQL table or the data.frame in R.
import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
)

df
df["Age"]

In [None]:
# A pandas Series has no column labels, as it is just a single column of a DataFrame. A Series does have row labels.

ages = pd.Series([22, 35, 58], name="Age")
ages
df["Age"].max()  # Maximum value of the "Age" column
df.describe()  # Summary statistics of the numerical columns

In [None]:
""" pandas.DataFrame.groupby
DataFrame.groupby(by=None, axis=<no_default>, level=None, as_index=True, sort=True, group_keys=True, observed=<no_default>, dropna=True)
Group DataFrame using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.

Parameters:
by : mapping, function, label, pd.Grouper or list of such
Used to determine the groups for the groupby. If by is a function, it’s called on each value of the object’s index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series’ values are first aligned; see .align() method). If a list or ndarray of length equal to the selected axis is passed (see the groupby user guide), the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Split along rows (0) or columns (1). For Series this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: Will be removed and behave like axis=0 in a future version. For axis=1, do frame.T.groupby(...) instead.

level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels. Do not specify both by and level.

as_index : bool, default True
Return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

sort : bool, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group. If False, the groups will appear in the same order as they did in the original DataFrame. This argument has no effect on filtrations (see the filtrations in the user guide), such as head(), tail(), nth() and in transformations (see the transformations in the user guide).

Changed in version 2.0.0: Specifying sort=False with an ordered categorical grouper will no longer sort the values.

group_keys : bool, default True
When calling apply and the by argument produces a like-indexed (i.e. a transform) result, add group keys to index to identify pieces. By default group keys are not included when the result’s index (and column) labels match the inputs, and are included otherwise.

Changed in version 1.5.0: Warns that group_keys will no longer be ignored when the result from apply is a like-indexed Series or DataFrame. Specify group_keys explicitly to include the group keys or not.

Changed in version 2.0.0: group_keys now defaults to True.

observed : bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

Deprecated since version 2.1.0: The default value will change to True in a future version of pandas.

dropna : bool, default True
If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups.

Returns:
pandas.api.typing.DataFrameGroupBy
Returns a groupby object that contains information about the groups.

See also

resample
Convenience method for frequency conversion and resampling of time series.

Notes

See the user guide for more detailed usage and examples, including splitting an object into groups, iterating through groups, selecting a group, aggregation, and more.

Examples

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
df
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0
df.groupby(['Animal']).mean()
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
Hierarchical Indexes

We can groupby different levels of a hierarchical index using the level parameter:

arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
          ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]},
                  index=index)
df
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0
df.groupby(level=0).mean()
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
df.groupby(level="Type").mean()
         Max Speed
Type
Captive      210.0
Wild         185.0
We can also choose to include NA in group keys or not by setting dropna parameter, the default setting is True.

l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
df = pd.DataFrame(l, columns=["a", "b", "c"])
df.groupby(by=["b"]).sum()
    a   c
b
1.0 2   3
2.0 2   5
df.groupby(by=["b"], dropna=False).sum()
    a   c
b
1.0 2   3
2.0 2   5
NaN 1   4
l = [["a", 12, 12], [None, 12.3, 33.], ["b", 12.3, 123], ["a", 1, 1]]
df = pd.DataFrame(l, columns=["a", "b", "c"])
df.groupby(by="a").sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
df.groupby(by="a", dropna=False).sum()
    b     c
a
a   13.0   13.0
b   12.3  123.0
NaN 12.3   33.0
When using .apply(), use group_keys to include or exclude the group keys. The group_keys argument defaults to True (include).

df = pd.DataFrame({'Animal': ['Falcon', 'Falcon',
                              'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})
df.groupby("Animal", group_keys=True)[['Max Speed']].apply(lambda x: x)
          Max Speed
Animal
Falcon 0      380.0
       1      370.0
Parrot 2       24.0
       3       26.0
df.groupby("Animal", group_keys=False)[['Max Speed']].apply(lambda x: x)
   Max Speed
0      380.0
1      370.0
2       24.0
3       26.0
 """

The `pandas.DataFrame.groupby()` method is a powerful tool for grouping data in a DataFrame. It allows you to split your data into groups based on certain criteria, apply functions to these groups, and combine the results back into a DataFrame. This method is essential for data analysis, enabling operations like aggregation, transformation, and filtering.

### Overview of `DataFrame.groupby()`

### Syntax

```python
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
```

### Parameters

- **by**: `mapping`, `function`, `label`, `pd.Grouper`, or list of such

  - Determines the groups for the groupby operation. It can be a function called on each value of the object's index, a dict or Series to determine groups based on values, or a list of labels for grouping by columns.

- **axis**: `{0 or 'index', 1 or 'columns'}`, default is `0`

  - Specifies whether to group along rows (0) or columns (1). Note that for Series, this parameter is ignored.

- **level**: `int`, `level name`, or sequence of such, default is `None`

  - For MultiIndex DataFrames, specifies the level(s) to group by.

- **as_index**: `bool`, default is `True`

  - If `True`, group labels are returned as the index. If `False`, group labels are returned as ordinary columns, giving SQL-style grouped output.

- **sort**: `bool`, default is `True`

  - If `True`, group keys are sorted. Setting this to `False` can improve performance but does not affect the order of observations within each group.

- **group_keys**: `bool`, default is `True`

  - If `True`, adds group keys to the index when calling `apply`.

- **observed**: `bool`, default is `False`

  - Only relevant for categorical groupers. If `True`, only shows observed values.

- **dropna**: `bool`, default is `True`
  - If `True`, NA values are dropped from the group keys. If `False`, NA values are treated as a key in groups.
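
The effect of `as_index` and `sort` is easiest to see side by side. The following is a minimal sketch; the DataFrame and the variable names (`indexed`, `flat`, `unsorted`) are made up for illustration.

```python
import pandas as pd

# Rows are deliberately out of alphabetical order.
df = pd.DataFrame({
    "Animal": ["Parrot", "Falcon", "Parrot", "Falcon"],
    "Max Speed": [24.0, 380.0, 26.0, 370.0],
})

# Default: group labels become the index and keys are sorted.
indexed = df.groupby("Animal").mean()

# as_index=False keeps the group label as an ordinary column.
flat = df.groupby("Animal", as_index=False).mean()

# sort=False keeps groups in order of first appearance.
unsorted = df.groupby("Animal", sort=False).mean()

print(indexed)
print(flat)
print(unsorted)
```

`flat` has a default integer index with `Animal` as a regular column, while `unsorted` lists `Parrot` before `Falcon` because that is the order in which the keys first appear.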

### Returns

- **DataFrameGroupBy**
  - Returns a groupby object containing information about the groups.

### Notes

- The `groupby` operation involves splitting the DataFrame, applying a function, and combining the results.
- It can be used in conjunction with aggregation functions like `sum()`, `mean()`, etc.
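
To make the split, apply, combine steps concrete, here is a rough sketch that performs them by hand and compares the result with the built-in aggregation. The names `pieces`, `means`, and `manual` are illustrative only; this is not how pandas implements `groupby` internally.

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
    "Max Speed": [380.0, 370.0, 24.0, 26.0],
})

# Split: iterating over a groupby yields (key, sub-DataFrame) pairs.
pieces = {key: group for key, group in df.groupby("Animal")}

# Apply: compute one statistic for each piece.
means = {key: group["Max Speed"].mean() for key, group in pieces.items()}

# Combine: assemble the per-group results into a single Series.
manual = pd.Series(means, name="Max Speed")

# The built-in aggregation does all three steps in one call.
built_in = df.groupby("Animal")["Max Speed"].mean()

print(manual)
print(built_in)
```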

### Examples

#### Example 1: Basic Grouping

```python
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
                   'Max Speed': [380., 370., 24., 26.]})

print("Original DataFrame:")
print(df)

# Group by 'Animal' and calculate the mean Max Speed
mean_speed = df.groupby(['Animal']).mean()
print("\nMean Max Speed by Animal:")
print(mean_speed)
```

**Output:**

```
Original DataFrame:
   Animal  Max Speed
0  Falcon      380.0
1  Falcon      370.0
2  Parrot       24.0
3  Parrot       26.0

Mean Max Speed by Animal:
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
```

#### Example 2: Grouping with Hierarchical Indexes

```python
# Creating a MultiIndex DataFrame
arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
          ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]}, index=index)

print("\nMultiIndex DataFrame:")
print(df)

# Group by the first level of the index
mean_by_animal = df.groupby(level=0).mean()
print("\nMean Max Speed by Animal:")
print(mean_by_animal)

# Group by the second level of the index
mean_by_type = df.groupby(level="Type").mean()
print("\nMean Max Speed by Type:")
print(mean_by_type)
```

**Output:**

```
MultiIndex DataFrame:
                Max Speed
Animal Type
Falcon Captive      390.0
       Wild         350.0
Parrot Captive       30.0
       Wild          20.0

Mean Max Speed by Animal:
        Max Speed
Animal
Falcon      370.0
Parrot       25.0

Mean Max Speed by Type:
         Max Speed
Type
Captive      210.0
Wild         185.0
```

#### Example 3: Handling NA Values in Grouping

```python
# Creating a DataFrame with NA values
l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
df = pd.DataFrame(l, columns=["a", "b", "c"])

print("\nDataFrame with NA values:")
print(df)

# Group by column 'b' and sum the values
grouped_sum = df.groupby(by=["b"]).sum()
print("\nGrouped sum by column 'b' (dropping NA):")
print(grouped_sum)

# Group by column 'b' without dropping NA
grouped_sum_with_na = df.groupby(by=["b"], dropna=False).sum()
print("\nGrouped sum by column 'b' (including NA):")
print(grouped_sum_with_na)
```

**Output:**

```
DataFrame with NA values:
   a    b  c
0  1  2.0  3
1  1  NaN  4
2  2  1.0  3
3  1  2.0  2

Grouped sum by column 'b' (dropping NA):
     a  c
b
1.0  2  3
2.0  2  5

Grouped sum by column 'b' (including NA):
     a  c
b
1.0  2  3
2.0  2  5
NaN  1  4
```


Here’s a comprehensive explanation of **`pandas.DataFrame.groupby()`**, covering its syntax, parameters, behavior, and examples:

---

### **`pandas.DataFrame.groupby()`**

Groups a DataFrame using a mapper (e.g., column names, functions, or arrays) and allows you to perform operations (e.g., aggregation, transformation, filtration) on these groups.

---

### **Syntax**

```python
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
```

---

### **Parameters**

| Parameter | Description |
| --------- | ----------- |
| **`by`** | Specifies how to group the data. Can be a **column name** or **list of column names**, a **function** applied to the index, a **dict** or **Series** mapping values to groups, or a **list of arrays** of the same length as the axis. |
| **`axis`** | Axis to group along: `0` or `'index'` groups by rows (default); `1` or `'columns'` groups by columns. |
| **`level`** | For MultiIndex DataFrames, specifies the level(s) to group by. |
| **`as_index`** | If `True`, group labels become the index of the result. If `False`, group labels are returned as columns (SQL-style). |
| **`sort`** | If `True`, sort group keys. Disable for better performance. |
| **`group_keys`** | If `True`, include group keys in the index when using `apply`. |
| **`observed`** | If `True`, only show observed values for categorical groupers. If `False`, show all categories. |
| **`dropna`** | If `True`, exclude NA values in group keys. If `False`, include NA as a group. |
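
The `observed` parameter only matters when the grouping column is categorical. A small sketch, assuming a hypothetical `size` column with one category (`"medium"`) that never occurs in the data:

```python
import pandas as pd

sizes = pd.Categorical(
    ["small", "small", "large"],
    categories=["small", "medium", "large"],
)
df = pd.DataFrame({"size": sizes, "count": [1, 2, 3]})

# observed=False: every declared category gets a row, even unused ones.
all_categories = df.groupby("size", observed=False).sum()

# observed=True: only categories that actually appear in the data.
only_observed = df.groupby("size", observed=True).sum()

print(all_categories)
print(only_observed)
```

`observed` is passed explicitly in both calls so the example does not depend on the default of the installed pandas version.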

---

### **Returns**

- **`DataFrameGroupBy`**: A groupby object containing information about the groups.

---

### **Key Notes**

1. **Grouping Process**:

   - **Split**: Splits the DataFrame into groups based on the `by` parameter.
   - **Apply**: Applies a function (e.g., aggregation, transformation) to each group.
   - **Combine**: Combines the results into a new DataFrame or Series.

2. **Common Use Cases**:

   - **Aggregation**: Compute summary statistics (e.g., `mean`, `sum`, `count`).
   - **Transformation**: Perform group-wise operations while preserving the original shape.
   - **Filtration**: Filter groups based on a condition.

3. **Flexibility**:
   - Group by **single or multiple columns**.
   - Use **custom functions** or **built-in methods**.
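
Besides column names, `by` also accepts a function or a dict, both of which operate on the index labels. The sketch below uses an invented string index and an arbitrary mapping (`group_1`, `group_2`) to show both forms.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=["apple", "avocado", "banana", "blueberry"],
)

# A function is called on each index label; its return value is the group key.
by_first_letter = df.groupby(lambda label: label[0]).sum()

# A dict maps index labels to group keys.
mapping = {
    "apple": "group_1",
    "avocado": "group_2",
    "banana": "group_1",
    "blueberry": "group_2",
}
by_mapping = df.groupby(mapping).sum()

print(by_first_letter)
print(by_mapping)
```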

---

### **Examples**

#### 1. **Basic Grouping and Aggregation**

Group by a single column and compute the mean:

```python
df = pd.DataFrame({
    'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
    'Max Speed': [380., 370., 24., 26.]
})

df.groupby('Animal').mean()
```

Output:

```
        Max Speed
Animal
Falcon      375.0
Parrot       25.0
```

---

#### 2. **Grouping by Multiple Columns**

Group by multiple columns and compute the sum:

```python
df = pd.DataFrame({
    'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
    'Type': ['Wild', 'Captive', 'Wild', 'Captive'],
    'Max Speed': [350., 370., 20., 26.]
})

df.groupby(['Animal', 'Type']).sum()
```

Output:

```
                Max Speed
Animal Type
Falcon Captive      370.0
       Wild         350.0
Parrot Captive       26.0
       Wild          20.0
```

---

#### 3. **Grouping with Hierarchical Index**

Group by a level in a MultiIndex DataFrame:

```python
arrays = [['Falcon', 'Falcon', 'Parrot', 'Parrot'],
          ['Captive', 'Wild', 'Captive', 'Wild']]
index = pd.MultiIndex.from_arrays(arrays, names=('Animal', 'Type'))
df = pd.DataFrame({'Max Speed': [390., 350., 30., 20.]}, index=index)

df.groupby(level=0).mean()  # Group by the first level of the index
```

Output:

```
        Max Speed
Animal
Falcon      370.0
Parrot       25.0
```

---

#### 4. **Handling NA Values**

Include or exclude NA values in group keys:

```python
df = pd.DataFrame({
    'a': [1, 1, 2, 1],
    'b': [2, None, 1, 2],
    'c': [3, 4, 3, 2]
})

# Exclude NA (default)
df.groupby('b').sum()
```

Output:

```
     a  c
b
1.0  2  3
2.0  2  5
```

```python
# Include NA
df.groupby('b', dropna=False).sum()
```

Output:

```
     a  c
b
1.0  2  3
2.0  2  5
NaN  1  4
```

---

#### 5. **Using `apply` with Group Keys**

Include or exclude group keys in the result:

```python
df = pd.DataFrame({
    'Animal': ['Falcon', 'Falcon', 'Parrot', 'Parrot'],
    'Max Speed': [380., 370., 24., 26.]
})

# Include group keys (default)
df.groupby('Animal', group_keys=True)[['Max Speed']].apply(lambda x: x)
```

Output:

```
          Max Speed
Animal
Falcon 0      380.0
       1      370.0
Parrot 2       24.0
       3       26.0
```

```python
# Exclude group keys
df.groupby('Animal', group_keys=False)[['Max Speed']].apply(lambda x: x)
```

Output:

```
   Max Speed
0      380.0
1      370.0
2       24.0
3       26.0
```

---

### **Related Functions**

- **`agg()`**: Aggregate groups using one or more operations.
- **`transform()`**: Perform group-wise transformations while preserving the shape.
- **`filter()`**: Filter groups based on a condition.
- **`resample()`**: Group time-series data by frequency.
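
The first three of these differ mainly in the shape of what they return. A brief sketch on the animal data used above (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
    "Max Speed": [380.0, 370.0, 24.0, 26.0],
})
grouped = df.groupby("Animal")["Max Speed"]

# agg: one row per group, one column per statistic.
summary = grouped.agg(["mean", "max"])

# transform: same length as the input; each row receives its group's mean.
group_mean = grouped.transform("mean")

# filter: keep only the rows of groups that satisfy the condition.
fast = df.groupby("Animal").filter(lambda g: g["Max Speed"].mean() > 100)

print(summary)
print(group_mean)
print(fast)
```

`agg` reduces each group to one row, `transform` preserves the original index, and `filter` returns a subset of the original rows.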

---

### **When to Use `groupby()`**

- To **split data into groups** based on a condition.
- To **compute group-wise statistics** (e.g., mean, sum, count).
- To **transform data** within groups (e.g., normalize, rank).
- To **filter groups** based on aggregate conditions.

This method is essential for data analysis and manipulation in pandas!


In [None]:
""" pandas.DataFrame.rolling
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=<no_default>, closed=None, step=None, method='single')
Provide rolling window calculations.

Parameters:
window : int, timedelta, str, offset, or BaseIndexer subclass
Size of the moving window.

If an integer, the fixed number of observations used for each window.

If a timedelta, str, or offset, the time period of each window. Each window will be variable-sized, based on the observations that fall within the time period. This is only valid for datetimelike indexes. For details on offsets and frequency strings, see the pandas time series documentation.

If a BaseIndexer subclass, the window boundaries based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, closed and step will be passed to get_window_bounds.

min_periods : int, default None
Minimum number of observations in window required to have a value; otherwise, result is np.nan.

For a window that is specified by an offset, min_periods will default to 1.

For a window that is specified by an integer, min_periods will default to the size of the window.

center : bool, default False
If False, set the window labels as the right edge of the window index.

If True, set the window labels as the center of the window index.

win_type : str, default None
If None, all points are evenly weighted.

If a string, it must be a valid scipy.signal window function.

Certain Scipy window types require additional parameters to be passed in the aggregation function. The additional parameters must match the keywords specified in the Scipy window type method signature.

on : str, optional
For a DataFrame, a column label or Index level on which to calculate the rolling window, rather than the DataFrame’s index.

Provided integer column is ignored and excluded from result since an integer index is not used to calculate the rolling window.

axis : int or str, default 0
If 0 or 'index', roll across the rows.

If 1 or 'columns', roll across the columns.

For Series this parameter is unused and defaults to 0.

Deprecated since version 2.1.0: The axis keyword is deprecated. For axis=1, transpose the DataFrame first instead.

closed : str, default None
If 'right', the first point in the window is excluded from calculations.

If 'left', the last point in the window is excluded from calculations.

If 'both', no points in the window are excluded from calculations.

If 'neither', the first and last points in the window are excluded from calculations.

Default None ('right').

step : int, default None
Added in version 1.5.0.

Evaluate the window at every step result, equivalent to slicing as [::step]. window must be an integer. Using a step argument other than None or 1 will produce a result with a different shape than the input.

method : str {‘single’, ‘table’}, default ‘single’
Added in version 1.3.0.

Execute the rolling operation per single column or row ('single') or over the entire object ('table').

This argument is only implemented when specifying engine='numba' in the method call.

Returns:
pandas.api.typing.Window or pandas.api.typing.Rolling
An instance of Window is returned if win_type is passed. Otherwise, an instance of Rolling is returned.

See also

expanding
Provides expanding transformations.

ewm
Provides exponential weighted functions.

Notes

See Windowing Operations for further usage details and examples.

Examples

df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
window

Rolling sum with a window length of 2 observations.

df.rolling(2).sum()
     B
0  NaN
1  1.0
2  3.0
3  NaN
4  NaN
Rolling sum with a window span of 2 seconds.

df_time = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                       index=[pd.Timestamp('20130101 09:00:00'),
                              pd.Timestamp('20130101 09:00:02'),
                              pd.Timestamp('20130101 09:00:03'),
                              pd.Timestamp('20130101 09:00:05'),
                              pd.Timestamp('20130101 09:00:06')])
df_time
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
df_time.rolling('2s').sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
Rolling sum with forward looking windows with 2 observations.

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
df.rolling(window=indexer, min_periods=1).sum()
     B
0  1.0
1  3.0
2  2.0
3  4.0
4  4.0
min_periods

Rolling sum with a window length of 2 observations, but only needs a minimum of 1 observation to calculate a value.

df.rolling(2, min_periods=1).sum()
     B
0  0.0
1  1.0
2  3.0
3  2.0
4  4.0
center

Rolling sum with the result assigned to the center of the window index.

df.rolling(3, min_periods=1, center=True).sum()
     B
0  1.0
1  3.0
2  3.0
3  6.0
4  4.0
df.rolling(3, min_periods=1, center=False).sum()
     B
0  0.0
1  1.0
2  3.0
3  3.0
4  6.0
step

Rolling sum with a window length of 2 observations, minimum of 1 observation to calculate a value, and a step of 2.

df.rolling(2, min_periods=1, step=2).sum()
     B
0  0.0
2  3.0
4  4.0
win_type

Rolling sum with a window length of 2, using the Scipy 'gaussian' window type. std is required in the aggregation function.

df.rolling(2, win_type='gaussian').sum(std=3)
          B
0       NaN
1  0.986207
2  2.958621
3       NaN
4       NaN
on

Rolling sum with a window length of 2 days.

df = pd.DataFrame({
    'A': [pd.to_datetime('2020-01-01'),
          pd.to_datetime('2020-01-01'),
          pd.to_datetime('2020-01-02'),],
    'B': [1, 2, 3], },
    index=pd.date_range('2020', periods=3))
df
                    A  B
2020-01-01 2020-01-01  1
2020-01-02 2020-01-01  2
2020-01-03 2020-01-02  3
df.rolling('2D', on='A').sum()
                    A    B
2020-01-01 2020-01-01  1.0
2020-01-02 2020-01-01  3.0
2020-01-03 2020-01-02  6.0 """

Here’s a comprehensive explanation of **`pandas.DataFrame.rolling()`**, covering its syntax, parameters, behavior, and examples:

---

### **`pandas.DataFrame.rolling()`**

Provides rolling window calculations, which are useful for time-series or sequential data analysis. It allows you to compute metrics (e.g., mean, sum, standard deviation) over a sliding window of observations.

---

### **Syntax**

```python
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, step=None, method='single')
```

---

### **Parameters**

| Parameter | Description |
| --------- | ----------- |
| **`window`** | Size of the moving window. Can be an **integer** (fixed number of observations), a **timedelta/str/offset** (variable-sized window for datetime-like indexes), or a **BaseIndexer subclass** (custom window boundaries). |
| **`min_periods`** | Minimum number of observations in a window to compute a result. Defaults to `window` size for integer windows, or `1` for offset-based windows. |
| **`center`** | If `True`, label the window by its center. If `False`, label by the right edge. |
| **`win_type`** | Type of window weighting (e.g., `'gaussian'`, `'triang'`). Requires additional parameters for some types. |
| **`on`** | Column or index level to use for rolling calculations (instead of the DataFrame index). |
| **`axis`** | Axis to roll along: `0` (rows) or `1` (columns). Default is `0`. |
| **`closed`** | Specifies which end of the window is inclusive: `'right'` excludes the first point (default), `'left'` excludes the last point, `'both'` includes both ends, `'neither'` excludes both ends. |
| **`step`** | Evaluate the window at every `step` result (e.g., `step=2` skips every other window). |
| **`method`** | Execution method: `'single'` (per column/row) or `'table'` (entire object). Only works with `engine='numba'`. |

---

### **Returns**

- **`Rolling` object**: If `win_type` is not specified.
- **`Window` object**: If `win_type` is specified.

---

### **Key Notes**

1. **Window Types**:

   - **Fixed windows**: Use an integer for a fixed number of observations.
   - **Variable windows**: Use a timedelta or offset for time-based windows.
   - **Custom windows**: Use a `BaseIndexer` subclass for custom logic.

2. **Common Use Cases**:

   - **Moving averages**: Smooth time-series data.
   - **Rolling statistics**: Compute rolling sums, standard deviations, etc.
   - **Weighted calculations**: Apply window functions (e.g., Gaussian, exponential).

3. **Handling Missing Data**:
   - Use `min_periods` to control how many observations are required for a valid result.
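
As a quick illustration of the moving-average use case and of how `min_periods` changes the leading values, consider this sketch on a short made-up Series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default min_periods equals the window size, so the first two results are NaN.
strict = s.rolling(3).mean()

# With min_periods=1 a value is produced as soon as one observation exists.
lenient = s.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"strict": strict, "lenient": lenient}))
```

From index 2 onward both columns agree, because every window then holds three observations.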

---

### **Examples**

#### 1. **Basic Rolling Sum**

Compute a rolling sum with a window of 2 observations:

```python
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
df.rolling(2).sum()
```

Output:

```
     B
0  NaN
1  1.0
2  3.0
3  NaN
4  NaN
```

---

#### 2. **Time-Based Rolling Window**

Compute a rolling sum with a 2-second window:

```python
df_time = pd.DataFrame(
    {'B': [0, 1, 2, np.nan, 4]},
    index=pd.to_datetime([
        '2013-01-01 09:00:00',
        '2013-01-01 09:00:02',
        '2013-01-01 09:00:03',
        '2013-01-01 09:00:05',
        '2013-01-01 09:00:06'
    ])
)

df_time.rolling('2s').sum()
```

Output:

```
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
```

---

#### 3. **Custom Window with `min_periods`**

Compute a rolling sum with a window of 2 observations, requiring at least 1 observation:

```python
df.rolling(2, min_periods=1).sum()
```

Output:

```
     B
0  0.0
1  1.0
2  3.0
3  2.0
4  4.0
```

---

#### 4. **Centered Rolling Window**

Compute a rolling sum with the result centered in the window:

```python
df.rolling(3, min_periods=1, center=True).sum()
```

Output:

```
     B
0  1.0
1  3.0
2  3.0
3  6.0
4  4.0
```

---

#### 5. **Step-Based Rolling Window**

Compute a rolling sum with a step of 2:

```python
df.rolling(2, min_periods=1, step=2).sum()
```

Output:

```
     B
0  0.0
2  3.0
4  4.0
```

---

#### 6. **Weighted Rolling Window**

Compute a rolling sum using a Gaussian window:

```python
df.rolling(2, win_type='gaussian').sum(std=3)
```

Output:

```
          B
0       NaN
1  0.986207
2  2.958621
3       NaN
4       NaN
```

---

#### 7. **Rolling on a Specific Column**

Compute a rolling sum on a specific column:

```python
df = pd.DataFrame({
    'A': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-02']),
    'B': [1, 2, 3]
}, index=pd.date_range('2020', periods=3))

df.rolling('2D', on='A').sum()
```

Output:

```
                    A    B
2020-01-01 2020-01-01  1.0
2020-01-02 2020-01-01  3.0
2020-01-03 2020-01-02  6.0
```

---

### **Related Functions**

- **`expanding()`**: Cumulative window calculations.
- **`ewm()`**: Exponential weighted moving calculations.
- **`groupby()`**: Group-wise operations.
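
`rolling()` can also be chained after `groupby()` so that windows never cross group boundaries. A minimal sketch with invented `key` and `val` columns:

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "a", "a", "b", "b"],
    "val": [1, 2, 3, 10, 20],
})

# Each group is rolled independently; the result is indexed by
# (group key, original row label).
per_group = df.groupby("key")["val"].rolling(2).sum()

print(per_group)
```

The first row of each group is `NaN` because a window of 2 cannot be filled from a single observation within that group.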

---

### **When to Use `rolling()`**

- For **time-series analysis** (e.g., moving averages, trends).
- For **smoothing noisy data**.
- For **sequential data calculations** (e.g., rolling sums, standard deviations).

This method is essential for time-series and sequential data analysis in pandas!


The `pandas.DataFrame.rolling()` method provides rolling window calculations, which are useful for time series analysis and smoothing data. This method allows you to perform calculations over a specified window of data points, making it easier to analyze trends and patterns.

### Overview of `DataFrame.rolling()`

### Syntax

```python
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, step=None, method='single')
```

### Parameters

- **window**: `int`, `timedelta`, `str`, `offset`, or `BaseIndexer subclass`

  - Defines the size of the moving window. It can be:
    - An integer for a fixed number of observations.
    - A timedelta, string, or offset for time-based windows (only valid for datetime-like indexes).

- **min_periods**: `int`, default is `None`

  - Minimum number of observations in the window required to have a value; otherwise, the result is `np.nan`. Defaults to the window size for an integer window and to `1` for an offset-based window.

- **center**: `bool`, default is `False`

  - If `True`, the window labels are set to the center of the window. If `False`, the labels are set to the right edge.

- **win_type**: `str`, default is `None`

  - If specified, it must be a valid SciPy window function (e.g., 'hamming', 'blackman'). If `None`, all points are evenly weighted.

- **on**: `str`, optional

  - For a DataFrame, specifies a column label or index level on which to calculate the rolling window.

- **axis**: `int` or `str`, default is `0`

  - Specifies whether to roll across rows (0) or columns (1). This parameter is ignored for Series.

- **closed**: `str`, default is `None`

  - Defines which endpoints are included in the window:
    - 'right': the first point in the window is excluded and the last is included (default).
    - 'left': the first point is included and the last is excluded.
    - 'both': both endpoints are included.
    - 'neither': both endpoints are excluded.

- **step**: `int`, default is `None`

  - Added in version 1.5.0. Evaluates the window at every specified step. Must be an integer.

- **method**: `str`, default is `'single'`
  - Specifies how to execute the rolling operation. 'single' rolls per column/row, while 'table' rolls over the entire object (only available with `engine='numba'`).
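
The four `closed` options are easiest to compare on a time-based window. The sketch below uses a made-up one-second-resolution index and a constant column `x`, so each result simply counts how many points fall inside the 2-second window.

```python
import pandas as pd

index = pd.to_datetime([
    "2013-01-01 09:00:01",
    "2013-01-01 09:00:02",
    "2013-01-01 09:00:03",
    "2013-01-01 09:00:04",
    "2013-01-01 09:00:06",
])
df = pd.DataFrame({"x": 1}, index=index)

# One column per `closed` setting, all using the same 2-second window.
result = pd.DataFrame({
    closed: df["x"].rolling("2s", closed=closed).sum()
    for closed in ["right", "both", "left", "neither"]
})

print(result)
```

At 09:00:03, for example, `'both'` counts the points at :01, :02 and :03, `'right'` drops :01, `'left'` drops :03, and `'neither'` keeps only :02.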

### Returns

- **Window or Rolling**
  - Returns a `Window` instance if `win_type` is specified; otherwise, it returns a `Rolling` instance.

### Examples

#### Example 1: Basic Rolling Calculation

```python
import pandas as pd
import numpy as np

# Creating a DataFrame
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})

print("Original DataFrame:")
print(df)

# Rolling sum with a window length of 2 observations
rolling_sum = df.rolling(2).sum()
print("\nRolling sum with a window length of 2:")
print(rolling_sum)
```

**Output:**

```
Original DataFrame:
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0

Rolling sum with a window length of 2:
     B
0  NaN
1  1.0
2  3.0
3  NaN
4  NaN
```

#### Example 2: Rolling with Time-Based Index

```python
# Creating a DataFrame with a datetime index
df_time = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                       index=[pd.Timestamp('20130101 09:00:00'),
                              pd.Timestamp('20130101 09:00:02'),
                              pd.Timestamp('20130101 09:00:03'),
                              pd.Timestamp('20130101 09:00:05'),
                              pd.Timestamp('20130101 09:00:06')])

print("\nTime-based DataFrame:")
print(df_time)

# Rolling sum with a window span of 2 seconds
rolling_time_sum = df_time.rolling('2s').sum()
print("\nRolling sum with a window span of 2 seconds:")
print(rolling_time_sum)
```

**Output:**

```
Time-based DataFrame:
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0

Rolling sum with a window span of 2 seconds:
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:02  1.0
2013-01-01 09:00:03  3.0
2013-01-01 09:00:05  NaN
2013-01-01 09:00:06  4.0
```


In [None]:
""" pandas.DataFrame.expanding
DataFrame.expanding(min_periods=1, axis=<no_default>, method='single')
Provide expanding window calculations.

Parameters:
min_periods : int, default 1
Minimum number of observations in window required to have a value; otherwise, result is np.nan.

axis : int or str, default 0
If 0 or 'index', roll across the rows.

If 1 or 'columns', roll across the columns.

For Series this parameter is unused and defaults to 0.

method : str {‘single’, ‘table’}, default ‘single’
Execute the rolling operation per single column or row ('single') or over the entire object ('table').

This argument is only implemented when specifying engine='numba' in the method call.

Added in version 1.3.0.

Returns:
pandas.api.typing.Expanding
See also

rolling
Provides rolling window calculations.

ewm
Provides exponential weighted functions.

Notes

See Windowing Operations for further usage details and examples.

Examples

df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
min_periods

Expanding sum with 1 vs 3 observations needed to calculate a value.

df.expanding(1).sum()
     B
0  0.0
1  1.0
2  3.0
3  3.0
4  7.0
df.expanding(3).sum()
     B
0  NaN
1  NaN
2  3.0
3  3.0
4  7.0 """

The `pandas.DataFrame.expanding()` method provides expanding window calculations, which allow you to compute statistics over an expanding window of observations in your DataFrame. This method is particularly useful for cumulative calculations, where each result includes all previous data points up to the current observation.

### Overview of `DataFrame.expanding()`

### Syntax

```python
DataFrame.expanding(min_periods=1, axis=0, method='single')
```

### Parameters

- **min_periods**: `int`, default is `1`

  - The minimum number of observations in the window required to have a value; otherwise, the result is `np.nan`.

- **axis**: `int` or `str`, default is `0`

  - If `0` or `'index'`, the operation rolls across the rows. If `1` or `'columns'`, it rolls across the columns.

- **method**: `str`, {‘single’, ‘table’}, default is `'single'`
  - Specifies whether to execute the expanding operation per column or row ('single') or over the entire object ('table'). This argument is only relevant when using `engine='numba'`.

### Returns

- **Expanding**
  - An instance of the `Expanding` object is returned, which provides methods for calculating various statistics over the expanding window.

### Use Cases

The expanding window is particularly useful for cumulative sums, means, or any other statistics that aggregate data from the beginning of a dataset up to the current observation.
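
One way to see what "from the beginning up to the current observation" means is to rebuild an expanding mean by hand. This is only a sketch; `running_sum`, `running_count`, and `manual_mean` are illustrative names, and missing values are skipped just as `expanding()` skips them.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, np.nan, 4])

# Built-in expanding mean.
expanding_mean = s.expanding().mean()

# Manual equivalent: running sum of non-missing values divided by
# the running count of non-missing values.
running_sum = s.fillna(0).cumsum()
running_count = s.notna().cumsum()
manual_mean = running_sum / running_count

print(pd.DataFrame({"expanding": expanding_mean, "manual": manual_mean}))
```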

### Examples

#### Example 1: Basic Expanding Calculation

```python
import pandas as pd
import numpy as np

# Creating a DataFrame
df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})

print("Original DataFrame:")
print(df)

# Expanding sum with a minimum of 1 observation needed to calculate a value
expanding_sum_1 = df.expanding(1).sum()
print("\nExpanding sum with min_periods=1:")
print(expanding_sum_1)

# Expanding sum with a minimum of 3 observations needed to calculate a value
expanding_sum_3 = df.expanding(3).sum()
print("\nExpanding sum with min_periods=3:")
print(expanding_sum_3)
```

**Output:**

```
Original DataFrame:
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0

Expanding sum with min_periods=1:
     B
0  0.0
1  1.0
2  3.0
3  3.0
4  7.0

Expanding sum with min_periods=3:
     B
0  NaN
1  NaN
2  3.0
3  3.0
4  7.0
```

In this example:

- The first expanding sum with `min_periods=1` gives cumulative sums starting from the first row: each row's value is the sum of all rows up to and including itself. The `NaN` in row 3 is skipped, so the sum stays at `3.0` there.
- The second expanding sum with `min_periods=3` requires at least 3 observations, so the first two rows return `NaN`.

#### Example 2: Expanding Mean Calculation

```python
# Expanding mean with a minimum of 1 observation needed
expanding_mean = df.expanding().mean()
print("\nExpanding mean with min_periods=1:")
print(expanding_mean)
```

**Output:**

```
Expanding mean with min_periods=1:
      B
0  0.00
1  0.50
2  1.00
3  1.00
4  1.75
```

This calculation shows the average of all previous values (including the current value) for each row.

### Conclusion

The `pandas.DataFrame.expanding()` method is a powerful tool for conducting cumulative analyses on DataFrames. It allows for the calculation of various statistical measures while accommodating different minimum observation requirements.


Here’s a comprehensive explanation of **`pandas.DataFrame.expanding()`**, covering its syntax, parameters, behavior, and examples:

---

### **`pandas.DataFrame.expanding()`**

Provides expanding window calculations, which are cumulative and grow over time. It is useful for computing cumulative metrics (e.g., cumulative sum, mean, or standard deviation) over a dataset.

---

### **Syntax**

```python
DataFrame.expanding(min_periods=1, axis=0, method='single')
```

---

### **Parameters**

| Parameter         | Description |
| ----------------- | ----------- |
| **`min_periods`** | Minimum number of observations in the window required to compute a result. Default is `1`. |
| **`axis`**        | Axis to apply the expanding window: `0` or `'index'` applies to rows (default); `1` or `'columns'` applies to columns. |
| **`method`**      | Execution method: `'single'` performs calculations per column/row (default); `'table'` performs calculations over the entire object (requires `engine='numba'`). |

---

### **Returns**

- **`Expanding` object**: Used to compute cumulative metrics.

---

### **Key Notes**

1. **Expanding Window**:

   - The window starts at the first observation and grows to include every observation up to and including the current one.
   - Useful for cumulative calculations (e.g., cumulative sum, cumulative average).

2. **Handling Missing Data**:

   - Use `min_periods` to control how many observations are required for a valid result.

3. **Common Use Cases**:
   - Cumulative sums, averages, or other metrics.
   - Analyzing trends over time.

---

### **Examples**

#### 1. **Basic Expanding Sum**

Compute the cumulative sum with a minimum of 1 observation:

```python
df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})
df.expanding(1).sum()
```

Output:

```
     B
0  0.0
1  1.0
2  3.0
3  3.0
4  7.0
```

---

#### 2. **Expanding Sum with `min_periods`**

Compute the cumulative sum with a minimum of 3 observations:

```python
df.expanding(3).sum()
```

Output:

```
     B
0  NaN
1  NaN
2  3.0
3  3.0
4  7.0
```

---

#### 3. **Expanding Mean**

Compute the cumulative mean:

```python
df.expanding().mean()
```

Output:

```
      B
0  0.00
1  0.50
2  1.00
3  1.00
4  1.75
```

---

#### 4. **Expanding Standard Deviation**

Compute the cumulative standard deviation:

```python
df.expanding().std()
```

Output:

```
          B
0       NaN
1  0.707107
2  1.000000
3  1.000000
4  1.707825
```

---

#### 5. **Expanding on Columns**

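Compute the cumulative sum across columns. Note that the `axis` argument is deprecated since pandas 2.1; `df.T.expanding().sum().T` produces the same result without it:

```python
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df.expanding(axis=1).sum()
```

Output (window results are returned as floats):

```
     A    B
0  1.0  5.0
1  2.0  7.0
2  3.0  9.0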
```

---

### **Related Functions**

- **`rolling()`**: Rolling window calculations.
- **`ewm()`**: Exponentially weighted calculations.
- **`cumsum()`**: Cumulative sum (simpler alternative for specific cases).

---

### **When to Use `expanding()`**

- For **cumulative calculations** (e.g., cumulative sums, averages).
- For **trend analysis** over time.
- When you need **growing window metrics** instead of fixed-size windows.

This method is essential for cumulative data analysis in pandas!


In [None]:
""" pandas.DataFrame.ewm
DataFrame.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=<no_default>, times=None, method='single')
Provide exponentially weighted (EW) calculations.

Exactly one of com, span, halflife, or alpha must be provided if times is not provided. If times is provided, halflife and one of com, span or alpha may be provided.

Parameters:
com : float, optional
Specify decay in terms of center of mass

α = 1 / (1 + com), for com ≥ 0.

span : float, optional
Specify decay in terms of span

α = 2 / (span + 1), for span ≥ 1.

halflife : float, str, timedelta, optional
Specify decay in terms of half-life

α = 1 - exp(-ln(2) / halflife), for halflife > 0.

If times is specified, a timedelta convertible unit over which an observation decays to half its value. Only applicable to mean(), and halflife value will not apply to the other functions.

alpha : float, optional
Specify smoothing factor α directly

0 < α ≤ 1.

min_periods : int, default 0
Minimum number of observations in window required to have a value; otherwise, result is np.nan.

adjust : bool, default True
Divide by decaying adjustment factor in beginning periods to account for imbalance in relative weightings (viewing EWMA as a moving average).

When adjust=True (default), the EW function is calculated using weights w_i = (1 - α)^i. For example, the EW moving average of the series [x_0, x_1, ..., x_t] would be:

y_t = (x_t + (1 - α) x_{t-1} + (1 - α)^2 x_{t-2} + ... + (1 - α)^t x_0) / (1 + (1 - α) + (1 - α)^2 + ... + (1 - α)^t)

When adjust=False, the exponentially weighted function is calculated recursively:

y_0 = x_0
y_t = (1 - α) y_{t-1} + α x_t

ignore_na : bool, default False
Ignore missing values when calculating weights.

When ignore_na=False (default), weights are based on absolute positions. For example, the weights of x_0 and x_2 used in calculating the final weighted average of [x_0, None, x_2] are (1 - α)^2 and 1 if adjust=True, and (1 - α)^2 and α if adjust=False.

When ignore_na=True, weights are based on relative positions. For example, the weights of x_0 and x_2 used in calculating the final weighted average of [x_0, None, x_2] are 1 - α and 1 if adjust=True, and 1 - α and α if adjust=False.

axis : {0, 1}, default 0
If 0 or 'index', calculate across the rows.

If 1 or 'columns', calculate across the columns.

For Series this parameter is unused and defaults to 0.

times : np.ndarray, Series, default None
Only applicable to mean().

Times corresponding to the observations. Must be monotonically increasing and datetime64[ns] dtype.

If 1-D array like, a sequence with the same shape as the observations.

method : str {‘single’, ‘table’}, default ‘single’
Added in version 1.4.0.

Execute the rolling operation per single column or row ('single') or over the entire object ('table').

This argument is only implemented when specifying engine='numba' in the method call.

Only applicable to mean()

Returns:
pandas.api.typing.ExponentialMovingWindow
See also

rolling
Provides rolling window calculations.

expanding
Provides expanding transformations.

Notes

See Windowing Operations for further usage details and examples.

Examples

df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
df
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0
df.ewm(com=0.5).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
df.ewm(alpha=2 / 3).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
adjust

df.ewm(com=0.5, adjust=True).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
df.ewm(com=0.5, adjust=False).mean()
          B
0  0.000000
1  0.666667
2  1.555556
3  1.555556
4  3.650794
ignore_na

df.ewm(com=0.5, ignore_na=True).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.225000
df.ewm(com=0.5, ignore_na=False).mean()
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
times

Exponentially weighted mean with weights calculated with a timedelta halflife relative to times.

times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
          B
0  0.000000
1  0.585786
2  1.523889
3  1.523889
4  3.233686 """

The `pandas.DataFrame.ewm()` method provides exponentially weighted calculations, which are useful for smoothing time series data by giving more weight to recent observations. This method can be used to compute various statistics such as the exponentially weighted mean, variance, and standard deviation.

### Overview of `DataFrame.ewm()`

### Syntax

```python
DataFrame.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0, times=None, method='single')
```

### Parameters

- **com**: `float`, optional

  - Specifies decay in terms of center of mass. The relationship is given by:
    $$ \alpha = \frac{1}{1 + com} $$

- **span**: `float`, optional

  - Specifies decay in terms of span. The relationship is:
    $$ \alpha = \frac{2}{span + 1} $$

- **halflife**: `float`, `str`, or `timedelta`, optional

  - Specifies decay in terms of half-life. The relationship is:
    $$ \alpha = 1 - \exp\left(-\frac{\ln(2)}{halflife}\right) $$

- **alpha**: `float`, optional

  - Specifies the smoothing factor directly, where $$ 0 < \alpha \leq 1 $$.

- **min_periods**: `int`, default is `0`

  - Minimum number of observations in the window required to have a value; otherwise, the result is `np.nan`.

- **adjust**: `bool`, default is `True`

  - If `True`, the function divides by a decaying adjustment factor in beginning periods to account for imbalance in relative weightings.

- **ignore_na**: `bool`, default is `False`

  - If `True`, missing values are ignored when calculating weights.

- **axis**: `int` or `str`, default is `0`

  - If `0` or `'index'`, calculates across the rows; if `1` or `'columns'`, calculates across the columns.

- **times**: `np.ndarray` or `Series`, default is `None`

  - Only applicable to the `mean()` function. Specifies times corresponding to observations, which must be monotonically increasing and of `datetime64[ns]` dtype.

- **method**: `str`, {‘single’, ‘table’}, default is `'single'`
  - Specifies whether to execute the operation per single column or row (`'single'`) or over the entire object (`'table'`). This argument is only implemented when `engine='numba'` is specified in the method call, and it only applies to `mean()`.
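
Because `com`, `span`, and `halflife` are all converted to the same smoothing factor, they can be used interchangeably. The sketch below inverts the relationships listed above to derive equivalent values from a chosen `alpha`, then checks that all four calls agree (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"B": [0, 1, 2, np.nan, 4]})

alpha = 2 / 3
com = 1 / alpha - 1                         # from alpha = 1 / (1 + com)
span = 2 / alpha - 1                        # from alpha = 2 / (span + 1)
halflife = -np.log(2) / np.log(1 - alpha)   # from alpha = 1 - exp(-ln(2) / halflife)

by_alpha = df.ewm(alpha=alpha).mean()
by_com = df.ewm(com=com).mean()
by_span = df.ewm(span=span).mean()
by_halflife = df.ewm(halflife=halflife).mean()

print(com, span, halflife)
print(by_alpha)
```

Here `com` works out to `0.5` and `span` to roughly `2`, which is why `com=0.5` and `alpha=2/3` give identical output in the examples.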

### Returns

- **ExponentialMovingWindow**
  - Returns an instance of the `ExponentialMovingWindow` object, which provides methods for calculating exponentially weighted statistics.

### Examples

#### Example 1: Basic Exponentially Weighted Mean

```python
import pandas as pd
import numpy as np

# Creating a DataFrame
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})

print("Original DataFrame:")
print(df)

# Exponentially weighted mean with center of mass
ewm_mean_com = df.ewm(com=0.5).mean()
print("\nExponentially weighted mean (com=0.5):")
print(ewm_mean_com)

# Exponentially weighted mean with alpha
ewm_mean_alpha = df.ewm(alpha=2/3).mean()
print("\nExponentially weighted mean (alpha=2/3):")
print(ewm_mean_alpha)
```

**Output:**

```
Original DataFrame:
     B
0  0.0
1  1.0
2  2.0
3  NaN
4  4.0

Exponentially weighted mean (com=0.5):
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213

Exponentially weighted mean (alpha=2/3):
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
```

#### Example 2: Effect of the `adjust` Parameter

```python
# Exponentially weighted mean with adjust=True
ewm_adjust_true = df.ewm(com=0.5, adjust=True).mean()
print("\nExponentially weighted mean with adjust=True:")
print(ewm_adjust_true)

# Exponentially weighted mean with adjust=False
ewm_adjust_false = df.ewm(com=0.5, adjust=False).mean()
print("\nExponentially weighted mean with adjust=False:")
print(ewm_adjust_false)
```


Here’s a comprehensive explanation of **`pandas.DataFrame.ewm()`**, covering its syntax, parameters, behavior, and examples:

---

### **`pandas.DataFrame.ewm()`**

Provides exponentially weighted (EW) calculations, which are useful for smoothing time-series data or computing weighted metrics. It assigns exponentially decreasing weights to older observations, giving more importance to recent data.

---

### **Syntax**

```python
DataFrame.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0, times=None, method='single')
```

---

### **Parameters**

| Parameter         | Description                                                                                            |
| ----------------- | ------------------------------------------------------------------------------------------------------ |
| **`com`**         | Specify decay in terms of **center of mass** (e.g., `com=0.5`).                                        |
| **`span`**        | Specify decay in terms of **span** (e.g., `span=10`).                                                  |
| **`halflife`**    | Specify decay in terms of **half-life** (e.g., `halflife=5`). Can be a float, string, or timedelta.    |
| **`alpha`**       | Specify **smoothing factor** directly (e.g., `alpha=0.5`).                                             |
| **`min_periods`** | Minimum number of observations required to compute a result. Default is `0`.                           |
| **`adjust`**      | If `True`, adjust weights to account for imbalance in early periods. Default is `True`.                |
| **`ignore_na`**   | If `True`, ignore missing values when calculating weights. Default is `False`.                         |
| **`axis`**        | Axis to apply the calculation: `0` (rows) or `1` (columns). Default is `0`.                            |
| **`times`**       | Times corresponding to observations (for time-based decay). Must be monotonically increasing.          |
| **`method`**      | Execution method: `'single'` (per column/row) or `'table'` (entire object). Requires `engine='numba'`. |

---

### **Returns**

- **`ExponentialMovingWindow` object**: Used to compute exponentially weighted metrics.

---

### **Key Notes**

1. **Decay Parameters**:

   - Exactly one of `com`, `span`, `halflife`, or `alpha` must be provided (unless `times` is specified).
   - These parameters control how quickly weights decay:
     - `com`: Center of mass.
     - `span`: Span of the window.
     - `halflife`: Half-life of the decay.
     - `alpha`: Smoothing factor directly.

2. **Adjustment**:

   - If `adjust=True`, weights are adjusted to account for imbalance in early periods.
   - If `adjust=False`, weights are calculated recursively.

3. **Handling Missing Data**:

   - If `ignore_na=True`, missing values are ignored when calculating weights.
   - If `ignore_na=False`, missing values are treated as part of the sequence.

4. **Common Use Cases**:
   - Smoothing time-series data.
   - Computing exponentially weighted moving averages (EWMA).
   - Analyzing trends with more weight on recent observations.
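
To make the difference between `adjust=True` and `adjust=False` concrete, the following sketch recomputes both variants by hand on a short series without missing values. It assumes the standard definitions: weights `(1 - alpha)**i` for the adjusted form, and the recursion `y_t = (1 - alpha) * y_{t-1} + alpha * x_t` for the unadjusted form. All names are illustrative.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0])
alpha = 0.5

# adjust=False: y_0 = x_0, then y_t = (1 - alpha) * y_{t-1} + alpha * x_t
recursive = [x.iloc[0]]
for value in x.iloc[1:]:
    recursive.append((1 - alpha) * recursive[-1] + alpha * value)

# adjust=True: weighted average with weights (1 - alpha)**i,
# where i = 0 is the newest observation
adjusted = []
for t in range(len(x)):
    weights = (1 - alpha) ** np.arange(t + 1)     # 1, (1-a), (1-a)^2, ...
    window = x.iloc[: t + 1].to_numpy()[::-1]     # x_t, x_{t-1}, ..., x_0
    adjusted.append((weights * window).sum() / weights.sum())

print(recursive)   # hand-computed adjust=False values
print(adjusted)    # hand-computed adjust=True values
```

Both lists can be compared against `x.ewm(alpha=alpha, adjust=False).mean()` and `x.ewm(alpha=alpha, adjust=True).mean()` respectively.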

---

### **Examples**

#### 1. **Basic Exponentially Weighted Mean**

Compute the exponentially weighted mean using `com`:

```python
df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
df.ewm(com=0.5).mean()
```

Output:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
```

---

#### 2. **Exponentially Weighted Mean with `alpha`**

Compute the exponentially weighted mean using `alpha`:

```python
df.ewm(alpha=2/3).mean()
```

Output:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
```

---

#### 3. **Adjustment (`adjust` Parameter)**

Compare `adjust=True` (default) and `adjust=False`:

```python
# With adjust=True (default)
df.ewm(com=0.5, adjust=True).mean()
```

Output:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
```

```python
# With adjust=False
df.ewm(com=0.5, adjust=False).mean()
```

Output:

```
          B
0  0.000000
1  0.666667
2  1.555556
3  1.555556
4  3.650794
```

---

#### 4. **Handling Missing Data (`ignore_na` Parameter)**

Compare `ignore_na=True` and `ignore_na=False`:

```python
# With ignore_na=True
df.ewm(com=0.5, ignore_na=True).mean()
```

Output:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.225000
```

```python
# With ignore_na=False (default)
df.ewm(com=0.5, ignore_na=False).mean()
```

Output:

```
          B
0  0.000000
1  0.750000
2  1.615385
3  1.615385
4  3.670213
```
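
The two final values above (`3.225000` and `3.670213`) can be reproduced by hand. This sketch assumes `adjust=True` and uses `alpha = 2/3`, which corresponds to `com=0.5`; the only difference between the two cases is how the age of each observation is counted. Variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = np.array([0.0, 1.0, 2.0, np.nan, 4.0])
alpha = 2 / 3                      # equivalent to com=0.5
valid = ~np.isnan(values)

# ignore_na=False: the exponent is the absolute distance from the last row,
# so the gap left by the NaN still ages the older observations
abs_age = (len(values) - 1) - np.arange(len(values))   # 4, 3, 2, 1, 0
w_abs = (1 - alpha) ** abs_age[valid]

# ignore_na=True: the exponent counts only non-missing observations
rel_age = np.arange(valid.sum())[::-1]                  # 3, 2, 1, 0
w_rel = (1 - alpha) ** rel_age

last_abs = (w_abs * values[valid]).sum() / w_abs.sum()
last_rel = (w_rel * values[valid]).sum() / w_rel.sum()

print(round(last_abs, 6), round(last_rel, 6))
```

With `ignore_na=True` the older observations keep more weight, which pulls the last value further below 4.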

---

#### 5. **Time-Based Decay (`times` Parameter)**

Compute the exponentially weighted mean with time-based decay:

```python
times = ['2020-01-01', '2020-01-03', '2020-01-10', '2020-01-15', '2020-01-17']
df.ewm(halflife='4 days', times=pd.DatetimeIndex(times)).mean()
```

Output:

```
          B
0  0.000000
1  0.585786
2  1.523889
3  1.523889
4  3.233686
```

---

### **Related Functions**

- **`rolling()`**: Rolling window calculations.
- **`expanding()`**: Cumulative window calculations.
- **`cumsum()`**: Cumulative sum (simpler alternative for specific cases).

---

### **When to Use `ewm()`**

- For **smoothing time-series data** with more weight on recent observations.
- For **trend analysis** with exponential decay.
- When you need **weighted metrics** that prioritize recent data.

This method is essential for time-series analysis and smoothing in pandas!


In [None]:
""" pandas.DataFrame.abs
DataFrame.abs()
Return a Series/DataFrame with absolute numeric value of each element.

This function only applies to elements that are all numeric.

Returns:
abs : Series/DataFrame containing the absolute value of each element.

See also

numpy.absolute
Calculate the absolute value element-wise.

Notes

For complex inputs, 1.2 + 1j, the absolute value is sqrt(a^2 + b^2).

Examples

Absolute numeric values in a Series.

s = pd.Series([-1.10, 2, -3.33, 4])
s.abs()
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64
Absolute numeric values in a Series with complex numbers.

s = pd.Series([1.2 + 1j])
s.abs()
0    1.56205
dtype: float64
Absolute numeric values in a Series with a Timedelta element.

s = pd.Series([pd.Timedelta('1 days')])
s.abs()
0   1 days
dtype: timedelta64[ns]
Select rows with data closest to certain value using argsort (from StackOverflow).

df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})
df
     a    b    c
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
df.loc[(df.c - 43).abs().argsort()]
     a    b    c
1    5   20   50
0    4   10  100
2    6   30  -30
3    7   40  -50 """

The `pandas.DataFrame.abs()` method is used to compute the absolute numeric values of each element in a DataFrame or Series. This function is particularly useful when you want to ignore the sign of the numeric values, returning only their magnitudes.

### Overview of `DataFrame.abs()`

### Syntax

```python
DataFrame.abs()
```

### Returns

- **abs**: `Series` or `DataFrame`
  - A Series or DataFrame containing the absolute value of each element. This function only applies to numeric elements.

### Notes

- The `abs()` method is similar to `numpy.absolute()`, which also calculates the absolute value element-wise.
- For complex numbers, the absolute value is calculated as the magnitude of the complex number, given by the formula:
  $$ |z| = \sqrt{a^2 + b^2} $$
  where $z = a + bi$.
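
As a quick check of the magnitude formula, the sketch below compares `Series.abs()` with a manual computation from the real and imaginary parts. The second value, `3 - 4j`, is an extra illustrative input chosen because its magnitude is exactly 5.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.2 + 1j, 3 - 4j])

# abs() on complex values returns the magnitude as a float
magnitudes = s.abs()

# Manual computation: sqrt(a^2 + b^2) from the real and imaginary parts
arr = s.to_numpy()
manual = np.sqrt(arr.real ** 2 + arr.imag ** 2)

print(magnitudes)
print(manual)
```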

### Examples

#### Example 1: Absolute Numeric Values in a Series

```python
import pandas as pd

# Creating a Series with negative and positive values
s = pd.Series([-1.10, 2, -3.33, 4])
print("Absolute values in Series:")
print(s.abs())
```

**Output:**

```
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64
```

#### Example 2: Absolute Numeric Values in a Series with Complex Numbers

```python
# Creating a Series with a complex number
s_complex = pd.Series([1.2 + 1j])
print("\nAbsolute values in Series with complex numbers:")
print(s_complex.abs())
```

**Output:**

```
0    1.56205
dtype: float64
```

#### Example 3: Absolute Numeric Values in a Series with Timedelta

```python
# Creating a Series with a Timedelta element
s_timedelta = pd.Series([pd.Timedelta('1 days')])
print("\nAbsolute values in Series with Timedelta:")
print(s_timedelta.abs())
```

**Output:**

```
0   1 days
dtype: timedelta64[ns]
```

#### Example 4: Selecting Rows Based on Closest Values Using Absolute Difference

You can also use the `abs()` method to select rows with data closest to a certain value. Here’s how to do it using `argsort()`:

```python
# Creating a DataFrame
df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})

print("\nOriginal DataFrame:")
print(df)

# Selecting rows where column 'c' is closest to 43
closest_rows = df.loc[(df.c - 43).abs().argsort()]
print("\nRows with 'c' closest to 43:")
print(closest_rows)
```

**Output:**

```
Original DataFrame:
   a   b    c
0  4  10  100
1  5  20   50
2  6  30  -30
3  7  40  -50

Rows with 'c' closest to 43:
   a   b    c
1  5  20   50
0  4  10  100
2  6  30  -30
3  7  40  -50
```

In this example, we calculate the absolute difference between the values in column `c` and the target value (43), then use `argsort()` to sort the indices based on the smallest differences, allowing us to retrieve the rows in order of proximity to the target value.
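
One caveat: `argsort()` returns integer positions, not labels. The `df.loc[...]` form works in the example because the frame uses the default `0..n-1` index, where labels and positions coincide. A minimal sketch of a position-based variant, using a hypothetical string index to show the difference:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [4, 5, 6, 7], "c": [100, 50, -30, -50]},
    index=["w", "x", "y", "z"],   # hypothetical non-default labels
)

# argsort() yields positions, so pair it with iloc rather than loc
order = (df["c"] - 43).abs().argsort().to_numpy()
closest_first = df.iloc[order]

print(closest_first)
```

With this index, `df.loc[order]` would look up the integers as labels and fail, while `iloc` selects by position regardless of the index.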


Here’s a comprehensive explanation of **`pandas.DataFrame.abs()`**, covering its syntax, behavior, and examples:

---

### **`pandas.DataFrame.abs()`**

Returns a DataFrame or Series where each element is replaced by its **absolute value**. This function applies to numeric data (integers, floats, and complex numbers) and to timedeltas.

---

### **Syntax**

```python
DataFrame.abs()
```

---

### **Returns**

- **DataFrame/Series**: A new DataFrame or Series with the absolute value of each element.

---

### **Key Notes**

1. **Applicability**:

   - Works on numeric data (integers, floats, complex numbers) and on timedeltas.
   - For complex numbers, the absolute value is calculated as the magnitude:
     $$ |a + bj| = \sqrt{a^2 + b^2} $$
   - For timedelta objects, a negative duration becomes the positive duration of the same length; non-negative timedeltas are returned unchanged.

2. **Non-Numeric Data**:

   - Non-numeric columns (for example, strings) are not skipped: calling `abs()` on them typically raises a `TypeError`. Select the numeric columns first, e.g. with `select_dtypes(include='number')`.

3. **Common Use Cases**:
   - Converting negative values to positive.
   - Calculating distances or deviations from a reference value.
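
The following sketch illustrates two of the points above: a negative timedelta, and a frame that mixes string and numeric columns. The column names are hypothetical, and the exact exception type for the string column may depend on the pandas version, so the code records whatever is raised instead of assuming it.

```python
import pandas as pd

# Timedeltas can be negative; abs() turns them into positive durations
deltas = pd.Series([pd.Timedelta("-1 days"), pd.Timedelta("2 hours")])
abs_deltas = deltas.abs()

# A frame with a string column: abs() on the whole frame is expected to fail
df = pd.DataFrame({"name": ["x", "y"], "delta": [-1.5, 2.0], "count": [-3, 4]})
error_name = None
try:
    df.abs()
except Exception as exc:  # a TypeError for object-dtype strings, as far as I know
    error_name = type(exc).__name__

# Restricting to numeric columns first avoids the problem
numeric_abs = df.select_dtypes(include="number").abs()

print(abs_deltas)
print(error_name)
print(numeric_abs)
```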

---

### **Examples**

#### 1. **Absolute Values in a Series**

Compute the absolute values of a Series:

```python
s = pd.Series([-1.10, 2, -3.33, 4])
s.abs()
```

Output:

```
0    1.10
1    2.00
2    3.33
3    4.00
dtype: float64
```

---

#### 2. **Absolute Values in a Series with Complex Numbers**

Compute the absolute values of a Series containing complex numbers:

```python
s = pd.Series([1.2 + 1j])
s.abs()
```

Output:

```
0    1.56205
dtype: float64
```

Explanation:
$$ |1.2 + 1j| = \sqrt{1.2^2 + 1^2} \approx 1.56205 $$

---

#### 3. **Absolute Values in a DataFrame**

Compute the absolute values of all numeric columns in a DataFrame:

```python
df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})

df.abs()
```

Output:

```
   a   b    c
0  4  10  100
1  5  20   50
2  6  30   30
3  7  40   50
```

---

#### 4. **Absolute Values with Timedelta**

Compute the absolute values of a Series containing timedelta objects:

```python
s = pd.Series([pd.Timedelta('1 days')])
s.abs()
```

Output:

```
0   1 days
dtype: timedelta64[ns]
```

---

#### 5. **Select Rows Closest to a Value Using `abs()`**

Use `abs()` to find rows with values closest to a specific number:

```python
df = pd.DataFrame({
    'a': [4, 5, 6, 7],
    'b': [10, 20, 30, 40],
    'c': [100, 50, -30, -50]
})

# Find rows where column 'c' is closest to 43
df.loc[(df['c'] - 43).abs().argsort()]
```

Output:

```
   a   b    c
1  5  20   50
0  4  10  100
2  6  30  -30
3  7  40  -50
```

Explanation:

- `(df['c'] - 43).abs()` computes the absolute deviation from 43.
- `argsort()` sorts the deviations in ascending order.
- `df.loc[]` selects rows based on the sorted order.

---

### **Related Functions**

- **`numpy.absolute`**: Similar functionality in NumPy.
- **`pandas.Series.abs()`**: Absolute values for a Series.

---

### **When to Use `abs()`**

- To convert negative values to positive.
- To calculate distances, deviations, or magnitudes.
- For preprocessing data where only magnitudes matter (note that `abs()` flips the sign of negative values rather than removing them).

This method is simple yet powerful for numeric data manipulation in pandas!


In [None]:
""" pandas.DataFrame.all
DataFrame.all(axis=0, bool_only=False, skipna=True, **kwargs)
Return whether all elements are True, potentially over an axis.

Returns True unless there is at least one element within a Series or along a DataFrame axis that is False or equivalent (e.g. zero or empty).

Parameters:
axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

None : reduce all axes, return a scalar.

bool_only : bool, default False
Include only boolean columns. Not implemented for Series.

skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be True, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

**kwargs : any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
Series or DataFrame
If level is specified, then, DataFrame is returned; otherwise, Series is returned.

See also

Series.all
Return True if all elements are True.

DataFrame.any
Return True if one (or more) elements are True.

Examples

Series

pd.Series([True, True]).all()
True
pd.Series([True, False]).all()
False
pd.Series([], dtype="float64").all()
True
pd.Series([np.nan]).all()
True
pd.Series([np.nan]).all(skipna=False)
True
DataFrames

Create a dataframe from a dictionary.

df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
df
   col1   col2
0  True   True
1  True  False
Default behaviour checks if values in each column all return True.

df.all()
col1     True
col2    False
dtype: bool
Specify axis='columns' to check if values in each row all return True.

df.all(axis='columns')
0     True
1    False
dtype: bool
Or axis=None for whether every value is True.

df.all(axis=None)
False """

The `pandas.DataFrame.all()` method is used to determine whether all elements in a DataFrame (or Series) evaluate to `True` along a specified axis. It returns a boolean result indicating if every element is `True`, with options to customize its behavior regarding how it handles null values and which axes to check.

### Overview of `DataFrame.all()`

### Syntax

```python
DataFrame.all(axis=0, bool_only=False, skipna=True, **kwargs)
```

### Parameters

- **axis**: `{0 or 'index', 1 or 'columns', None}`, default `0`
  - Determines which axis to reduce:
    - `0` or `'index'` - Reduce the index (check each column).
    - `1` or `'columns'` - Reduce the columns (check each row).
    - `None` - Reduce all axes (check the entire DataFrame).
- **bool_only**: `bool`, default `False`

  - If `True`, includes only boolean columns. Note: This parameter is not implemented for Series.

- **skipna**: `bool`, default `True`

  - If `True`, excludes NA/null values. If an entire row or column is NA, the result will be `True` (for that row/column).
  - If `False`, NAs are treated as `True` (i.e., they contribute to a True outcome).

- **`**kwargs`**: Additional keyword arguments. They have no effect and are accepted only for compatibility with NumPy.
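
The interaction between `axis` and `skipna` is easier to see on a numeric frame, where `0` counts as `False` and `NaN` is either skipped or treated as truthy. A minimal sketch with hypothetical column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [0, 3], "c": [np.nan, np.nan]})

# Column-wise: "b" contains a 0; "c" is all-NaN, so nothing is left after skipping
per_column = df.all()

# Row-wise: row 0 contains the 0 from "b"; NaN in "c" is skipped in both rows
per_row = df.all(axis="columns")

# Whole frame reduced to a single boolean
overall = df.all(axis=None)

# skipna=False: NaN is not equal to zero, so it counts as truthy
per_column_keep_na = df.all(skipna=False)

print(per_column, per_row, overall, per_column_keep_na, sep="\n\n")
```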

### Returns

- **Series or scalar**
  - Returns a Series when reducing along one axis (`axis=0` or `axis=1`), or a single boolean when `axis=None`.

### Related Functions

- **pandas.Series.all()**: Checks if all elements in a Series are `True`.
- **DataFrame.any()**: Returns `True` if any element is `True`.

### Examples

#### Example 1: Using `all()` with a Series

```python
import pandas as pd
import numpy as np

# Series examples
print(pd.Series([True, True]).all())          # True
print(pd.Series([True, False]).all())         # False
print(pd.Series([], dtype="float64").all())  # True (empty Series)
print(pd.Series([np.nan]).all())              # True (NA ignored)
print(pd.Series([np.nan]).all(skipna=False)) # True (NA treated as True)
```

#### Example 2: Using `all()` with a DataFrame

```python
# Creating a DataFrame from a dictionary
df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
print("\nOriginal DataFrame:")
print(df)

# Default behavior: Check if all values in each column are True
print("\nCheck all values in each column:")
print(df.all())  # Check along index (columns)

# Specify axis='columns' to check if all values in each row are True
print("\nCheck all values in each row:")
print(df.all(axis='columns'))  # Check along columns (rows)

# Check if all values in the entire DataFrame are True
print("\nCheck all values in the entire DataFrame:")
print(df.all(axis=None))  # Check all elements
```

**Output:**

```
Original DataFrame:
   col1   col2
0  True   True
1  True  False

Check all values in each column:
col1     True
col2    False
dtype: bool

Check all values in each row:
0     True
1    False
dtype: bool

Check all values in the entire DataFrame:
False
```

### Explanation of the Output

1. **Default Behavior**: When calling `df.all()` without parameters, it checks each column to see if all values in that column evaluate to `True`. Here, `col1` returns `True` because all values are `True`, while `col2` returns `False` because one value is `False`.

2. **Axis Specified**: When using `df.all(axis='columns')`, it checks each row independently. The first row returns `True` (both values are `True`), while the second row returns `False` (because of the `False` in `col2`).

3. **Checking Entire DataFrame**: Calling `df.all(axis=None)` checks all elements in the DataFrame. Since not all values are `True`, it returns `False`.

### Summary

The `DataFrame.all()` method is a powerful way to evaluate conditions across columns or rows in a DataFrame, making it useful for data validation, conditional checks, and any scenario where the truthiness of entire datasets is important.


Here’s a comprehensive explanation of **`pandas.DataFrame.all()`**, covering its syntax, parameters, behavior, and examples:

---

### **`pandas.DataFrame.all()`**

Returns `True` if **all elements** in a DataFrame or Series are `True` (or equivalent, like non-zero numbers or non-empty strings). Otherwise, it returns `False`. This function can operate row-wise, column-wise, or across the entire DataFrame.

---

### **Syntax**

```python
DataFrame.all(axis=0, bool_only=False, skipna=True, **kwargs)
```

---

### **Parameters**

| Parameter       | Description |
| --------------- | ----------- |
| **`axis`**      | Axis to reduce: `0` or `'index'` checks whether all elements in each **column** are `True` (default); `1` or `'columns'` checks whether all elements in each **row** are `True`; `None` checks whether **all elements** in the entire DataFrame are `True`. |
| **`bool_only`** | If `True`, include only boolean columns. Default is `False`. |
| **`skipna`**    | If `True`, exclude `NaN`/`None` values. If `False`, treat `NaN`/`None` as `True`. Default is `True`. |
| **`**kwargs`**  | Additional arguments for compatibility (no effect). |

---

### **Returns**

- **Series**: If `axis=0` or `axis=1`, returns a Series with boolean values.
- **Scalar**: If `axis=None`, returns a single boolean value.

---

### **Key Notes**

1. **Behavior with `NaN`/`None`**:

   - If `skipna=True` (default), `NaN`/`None` values are ignored.
   - If `skipna=False`, `NaN`/`None` values are treated as `True`.

2. **Behavior with Empty Data**:

   - For an empty Series or DataFrame, `all()` returns `True`.

3. **Common Use Cases**:
   - Check if all values in a column/row meet a condition.
   - Validate data integrity (e.g., ensure no `False` or zero values).
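
For the validation use case, `all()` is usually applied to a boolean frame produced by a comparison rather than to the raw data. A short sketch, using a hypothetical `orders` table:

```python
import pandas as pd

# Hypothetical data: one row has a price of 0.0
orders = pd.DataFrame({"quantity": [3, 1, 5], "price": [9.99, 0.0, 4.50]})

positive = orders > 0                     # element-wise boolean frame
columns_ok = positive.all()               # does every value in the column pass?
rows_ok = positive.all(axis="columns")    # does every value in the row pass?

# Rows that fail at least one check
bad_rows = orders[~rows_ok]

print(columns_ok)
print(bad_rows)
```

Comparing first keeps the condition explicit, instead of relying on the truthiness of the raw numbers.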

---

### **Examples**

#### 1. **Series Examples**

Check if all elements in a Series are `True`:

```python
s = pd.Series([True, True])
s.all()
```

Output:

```
True
```

Check with a `False` value:

```python
s = pd.Series([True, False])
s.all()
```

Output:

```
False
```

Check with an empty Series:

```python
s = pd.Series([], dtype="float64")
s.all()
```

Output:

```
True
```

Check with `NaN` values:

```python
s = pd.Series([np.nan])
s.all()  # skipna=True by default
```

Output:

```
True
```

```python
s.all(skipna=False)
```

Output:

```
True
```

---

#### 2. **DataFrame Examples**

Check if all elements in each column are `True`:

```python
df = pd.DataFrame({'col1': [True, True], 'col2': [True, False]})
df.all()
```

Output:

```
col1     True
col2    False
dtype: bool
```

Check if all elements in each row are `True`:

```python
df.all(axis='columns')
```

Output:

```
0     True
1    False
dtype: bool
```

Check if all elements in the entire DataFrame are `True`:

```python
df.all(axis=None)
```

Output:

```
False
```

---

#### 3. **Handling `NaN` Values**

Check with `NaN` values in a DataFrame:

```python
df = pd.DataFrame({'col1': [True, np.nan], 'col2': [True, False]})

# Default behavior (skipna=True)
df.all()
```

Output:

```
col1     True
col2    False
dtype: bool
```

```python
# Treat NaN as True (skipna=False)
df.all(skipna=False)
```

Output:

```
col1     True
col2    False
dtype: bool
```

---

#### 4. **Using `bool_only`**

Check only boolean columns:

```python
df = pd.DataFrame({'col1': [True, True], 'col2': [1, 0]})

# Include only boolean columns
df.all(bool_only=True)
```

Output:

```
col1    True
dtype: bool
```

---

### **Related Functions**

- **`DataFrame.any()`**: Returns `True` if **any** element is `True`.
- **`Series.all()`**: Similar functionality for Series.

---

### **When to Use `all()`**

- To validate if **all values** in a dataset meet a condition.
- To check for the absence of `False`, `0`, or empty values.
- For data integrity checks in preprocessing pipelines.

This method is simple yet powerful for boolean evaluations in pandas!


In [None]:
""" pandas.DataFrame.any
DataFrame.any(*, axis=0, bool_only=False, skipna=True, **kwargs)
Return whether any element is True, potentially over an axis.

Returns False unless there is at least one element within a Series or along a DataFrame axis that is True or equivalent (e.g. non-zero or non-empty).

Parameters:
axis : {0 or ‘index’, 1 or ‘columns’, None}, default 0
Indicate which axis or axes should be reduced. For Series this parameter is unused and defaults to 0.

0 / ‘index’ : reduce the index, return a Series whose index is the original column labels.

1 / ‘columns’ : reduce the columns, return a Series whose index is the original index.

None : reduce all axes, return a scalar.

bool_only : bool, default False
Include only boolean columns. Not implemented for Series.

skipna : bool, default True
Exclude NA/null values. If the entire row/column is NA and skipna is True, then the result will be False, as for an empty row/column. If skipna is False, then NA are treated as True, because these are not equal to zero.

**kwargs : any, default None
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
Series or DataFrame
If level is specified, then, DataFrame is returned; otherwise, Series is returned.

See also

numpy.any
Numpy version of this method.

Series.any
Return whether any element is True.

Series.all
Return whether all elements are True.

DataFrame.any
Return whether any element is True over requested axis.

DataFrame.all
Return whether all elements are True over requested axis.

Examples

Series

For Series input, the output is a scalar indicating whether any element is True.

pd.Series([False, False]).any()
False
pd.Series([True, False]).any()
True
pd.Series([], dtype="float64").any()
False
pd.Series([np.nan]).any()
False
pd.Series([np.nan]).any(skipna=False)
True
DataFrame

Whether each column contains at least one True element (the default).

df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
df
   A  B  C
0  1  0  0
1  2  2  0
df.any()
A     True
B     True
C    False
dtype: bool
Aggregating over the columns.

df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
df
       A  B
0   True  1
1  False  2
df.any(axis='columns')
0    True
1    True
dtype: bool
df = pd.DataFrame({"A": [True, False], "B": [1, 0]})
df
       A  B
0   True  1
1  False  0
df.any(axis='columns')
0     True
1    False
dtype: bool
Aggregating over the entire DataFrame with axis=None.

df.any(axis=None)
True
any for an empty DataFrame is an empty Series.

pd.DataFrame([]).any()
Series([], dtype: bool) """

The `pandas.DataFrame.any()` method reports whether any element of a DataFrame (or Series) evaluates to `True`. It is typically used to check whether at least one value along the chosen axis meets a condition.

### Overview of `DataFrame.any()`

### Syntax

```python
DataFrame.any(*, axis=0, bool_only=False, skipna=True, **kwargs)
```

### Parameters

- **axis**: `{0 or 'index', 1 or 'columns', None}`, default `0`

  - Defines the axis to evaluate:
    - `0` or `'index'`: Reduce across the index (check each column).
    - `1` or `'columns'`: Reduce across the columns (check each row).
    - `None`: Reduce over all axes, returning a scalar result.

- **bool_only**: `bool`, default `False`

  - When set to `True`, it only includes columns that are of boolean type. Not applicable for Series.

- **skipna**: `bool`, default `True`
  - If `True`, NA/null values are excluded from the evaluation. If the entire row or column is NA, the result will be `False`.
  - If `False`, NA values are treated as `True`, because they are not equal to zero.

### Returns

- **Series or scalar**
  - A Series is returned when checking one axis, or a scalar Boolean value when reducing across all axes.
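
A short sketch of the three return shapes described above. The frame `df_demo` is a made-up example, not data from the pandas documentation.

```python
import pandas as pd

# Hypothetical two-column frame used only to show the return types.
df_demo = pd.DataFrame({"A": [1, 0], "B": [0, 0]})

per_column = df_demo.any()          # Series indexed by the column labels
per_row = df_demo.any(axis=1)       # Series indexed by the row labels
overall = df_demo.any(axis=None)    # single boolean for the whole frame

print(type(per_column).__name__, list(per_column.index))
print(type(per_row).__name__, list(per_row.index))
print(bool(overall))
```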

### Related Functions

- **numpy.any**: Similar function in NumPy for evaluating boolean conditions.
- **Series.any**: For evaluating whether any element in a Series is `True`.
- **DataFrame.all**: Checks if all elements are `True` across specified axes.

### Examples

#### Example 1: Using `any()` with a Series

```python
import pandas as pd
import numpy as np

# Series examples
print(pd.Series([False, False]).any())           # False
print(pd.Series([True, False]).any())            # True
print(pd.Series([], dtype="float64").any())     # False (empty Series)
print(pd.Series([np.nan]).any())                 # False (NA ignored)
print(pd.Series([np.nan]).any(skipna=False))    # True (NA treated as True when skipna=False)
```

#### Example 2: Using `any()` with a DataFrame

```python
# Creating a DataFrame
df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
print("\nOriginal DataFrame:")
print(df)

# Check if any values in each column are True (non-zero)
print("\nCheck if any values in each column are True:")
print(df.any())  # Default behavior

# Aggregating over rows (checking if any value in each row is True)
print("\nCheck if any values in each row are True:")
print(df.any(axis='columns'))

# DataFrame with mixed boolean and numerical values
df_mixed = pd.DataFrame({"A": [True, False], "B": [1, 2]})
print("\nDataFrame with mixed types:")
print(df_mixed)

# Check if any value in each row is True
print("\nCheck if any values in each row are True:")
print(df_mixed.any(axis='columns'))

# Aggregating over the entire DataFrame with axis=None
print("\nCheck whether any value in the entire DataFrame is True:")
print(df.any(axis=None))
```

**Output:**

```
Original DataFrame:
   A  B  C
0  1  0  0
1  2  2  0

Check if any values in each column are True:
A     True
B     True
C    False
dtype: bool

Check if any values in each row are True:
0    True
1    True
dtype: bool

DataFrame with mixed types:
       A  B
0   True  1
1  False  2

Check if any values in each row are True:
0    True
1    True
dtype: bool

Check whether any value in the entire DataFrame is True:
True
```

### Explanation of the Output

1. **Column Checks**: The `df.any()` call evaluates each column for any non-zero values. In the first DataFrame, columns `A` and `B` return `True`, indicating they contain at least one non-zero value, while column `C` returns `False`.

2. **Row Checks**: When using `any(axis='columns')`, it evaluates whether each row contains at least one non-zero value, returning `True` for all rows since both rows have at least one non-zero entry.


The `pandas.DataFrame.any()` method is used to evaluate whether any elements within a DataFrame (or Series) are `True` along a specified axis. This method returns a boolean value indicating if at least one element meets the condition, and it provides options for handling missing values and choosing the axis over which to aggregate.

### Overview of `DataFrame.any()`

### Syntax

```python
DataFrame.any(*, axis=0, bool_only=False, skipna=True, **kwargs)
```

### Parameters

- **axis**: `{0 or 'index', 1 or 'columns', None}`, default `0`

  - Specifies which axis to reduce:
    - `0` or `'index'` - Aggregates across the index (checks each column).
    - `1` or `'columns'` - Aggregates across the columns (checks each row).
    - `None` - Reduces across all axes (checks the entire DataFrame).

- **bool_only**: `bool`, default `False`

  - If `True`, includes only boolean columns. This parameter is not applicable to Series.

- **skipna**: `bool`, default `True`

  - If `True`, NA/null values are excluded. If an entire row or column is NA and `skipna` is `True`, the result will be `False`.
  - If `False`, NAs are treated as `True`, because they are not equal to zero. A row or column containing an NA therefore evaluates to `True`.

- **kwargs**: Additional keyword arguments that are not used but can be included for compatibility with NumPy.

### Returns

- **Series or scalar**:
  - A Series is returned when reducing along one axis; a scalar boolean is returned when `axis=None`.

### Related Functions

- **numpy.any**: The NumPy version of this method.
- **Series.any()**: Returns `True` if any element in a Series is `True`.
- **Series.all()**: Returns `True` if all elements in a Series are `True`.

### Examples

#### Example 1: Using `any()` with a Series

```python
import pandas as pd
import numpy as np

# Series examples
print(pd.Series([False, False]).any())          # False
print(pd.Series([True, False]).any())           # True
print(pd.Series([], dtype="float64").any())    # False (empty Series)
print(pd.Series([np.nan]).any())                 # False (NaN ignored)
print(pd.Series([np.nan]).any(skipna=False))    # True (NaN is treated as True)
```

#### Example 2: Using `any()` with a DataFrame

```python
# Creating a DataFrame with values
df = pd.DataFrame({"A": [1, 2], "B": [0, 2], "C": [0, 0]})
print("\nOriginal DataFrame:")
print(df)

# Default behavior: Check if any value in each column is True (non-zero)
print("\nCheck any value in each column:")
print(df.any())  # Checks along index (columns)

# Specifying axis='columns' to check if any value in each row is True
df2 = pd.DataFrame({"A": [True, False], "B": [1, 2]})
print("\nDataFrame for row-wise check:")
print(df2)
print("\nCheck any value in each row:")
print(df2.any(axis='columns'))  # Checks along columns (rows)

# Another example with different values
df3 = pd.DataFrame({"A": [True, False], "B": [1, 0]})
print("\nDataFrame with mixed values:")
print(df3)
print("\nCheck any value in each row:")
print(df3.any(axis='columns'))  # Checks along columns (rows)

# Check if any value in the entire DataFrame is True
print("\nCheck any value in the entire DataFrame:")
print(df.any(axis=None))  # Checks all elements

# Check for an empty DataFrame
print("\nAny for an empty DataFrame:")
print(pd.DataFrame([]).any())  # Should return an empty Series
```

### Explanation of the Output

1. **Series Checks**: The output from the Series checks validates if any elements are `True`. For example, an empty Series returns `False`.

2. **Column Checks**: The DataFrame `df` checks if any values in each column are non-zero (which evaluates to `True`). Columns A and B return `True`, while column C returns `False` since all its values are zero.

3. **Row Checks**: For DataFrame `df2`, the method checks each row for any `True` values. The output indicates every row contains at least one truthy value, so both rows return `True`.


Here’s a comprehensive explanation of **`pandas.DataFrame.any()`**, covering its syntax, parameters, behavior, and examples:

---

### **`pandas.DataFrame.any()`**

Returns `True` if **any element** in a DataFrame or Series is `True` (or equivalent, like non-zero numbers or non-empty strings). Otherwise, it returns `False`. This function can operate row-wise, column-wise, or across the entire DataFrame.

---

### **Syntax**

```python
DataFrame.any(*, axis=0, bool_only=False, skipna=True, **kwargs)
```

---

### **Parameters**

| Parameter | Description |
| --- | --- |
| **`axis`** | Axis to reduce. `0` or `'index'` (default): check if any element in each **column** is `True`. `1` or `'columns'`: check if any element in each **row** is `True`. `None`: check if **any element** in the entire DataFrame is `True`. |
| **`bool_only`** | If `True`, include only boolean columns. Default is `False`. |
| **`skipna`** | If `True`, exclude `NaN`/`None` values. If `False`, treat `NaN`/`None` as `True`. Default is `True`. |
| **`**kwargs`** | Additional arguments for compatibility (no effect). |

---

### **Returns**

- **Series**: If `axis=0` or `axis=1`, returns a Series with boolean values.
- **Scalar**: If `axis=None`, returns a single boolean value.

---

### **Key Notes**

1. **Behavior with `NaN`/`None`**:

   - If `skipna=True` (default), `NaN`/`None` values are ignored.
   - If `skipna=False`, `NaN`/`None` values are treated as `True`.

2. **Behavior with Empty Data**:

   - For an empty Series, `any()` returns `False`. For an empty DataFrame, `any()` returns an empty boolean Series.

3. **Common Use Cases**:
   - Check if **any value** in a column/row meets a condition.
   - Validate data integrity (e.g., ensure at least one `True` or non-zero value).
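
One common way to apply the "meets a condition" use case is to combine a boolean mask with `any()`. The sketch below flags rows that contain at least one missing value; the `readings` frame and its column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with a few gaps.
readings = pd.DataFrame({
    "temp": [21.5, np.nan, 19.0],
    "humidity": [40.0, 55.0, np.nan],
})

# isna() gives a boolean frame; any(axis="columns") reduces each row to one flag.
has_missing = readings.isna().any(axis="columns")

# Use the flags as a row filter.
incomplete_rows = readings[has_missing]

print(has_missing)
print(incomplete_rows)
```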

---

### **Examples**

#### 1. **Series Examples**

Check if any element in a Series is `True`:

```python
s = pd.Series([False, False])
s.any()
```

Output:

```
False
```

Check with a `True` value:

```python
s = pd.Series([True, False])
s.any()
```

Output:

```
True
```

Check with an empty Series:

```python
s = pd.Series([], dtype="float64")
s.any()
```

Output:

```
False
```

Check with `NaN` values:

```python
s = pd.Series([np.nan])
s.any()  # skipna=True by default
```

Output:

```
False
```

```python
s.any(skipna=False)
```

Output:

```
True
```

---

#### 2. **DataFrame Examples**

Check if any element in each column is `True`:

```python
df = pd.DataFrame({'A': [1, 0], 'B': [0, 0], 'C': [0, 2]})
df.any()
```

Output:

```
A     True
B    False
C     True
dtype: bool
```

Check if any element in each row is `True`:

```python
df.any(axis='columns')
```

Output:

```
0    True
1    True
dtype: bool
```

Check if any element in the entire DataFrame is `True`:

```python
df.any(axis=None)
```

Output:

```
True
```

---

#### 3. **Handling `NaN` Values**

Check with `NaN` values in a DataFrame:

```python
df = pd.DataFrame({'A': [True, np.nan], 'B': [False, False]})

# Default behavior (skipna=True)
df.any()
```

Output:

```
A     True
B    False
dtype: bool
```

```python
# Treat NaN as True (skipna=False)
df.any(skipna=False)
```

Output:

```
A     True
B    False
dtype: bool
```
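
In the example above both calls print the same result, because column `A` already contains a `True`. The following sketch uses an all-`NaN` column, where `skipna` does change the outcome. The frame `df_na` and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# One column holds only NaN, the other only zeros.
df_na = pd.DataFrame({"all_nan": [np.nan, np.nan], "zeros": [0.0, 0.0]})

# skipna=True (default): NaN values are dropped, leaving nothing truthy.
default_result = df_na.any()

# skipna=False: NaN is not equal to zero, so it counts as True.
strict_result = df_na.any(skipna=False)

print(default_result)
print(strict_result)
```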

---

#### 4. **Using `bool_only`**

Check only boolean columns:

```python
df = pd.DataFrame({'A': [True, False], 'B': [1, 0]})

# Include only boolean columns
df.any(bool_only=True)
```

Output:

```
A    True
dtype: bool
```

---

### **Related Functions**

- **`DataFrame.all()`**: Returns `True` if **all elements** are `True`.
- **`Series.any()`**: Similar functionality for Series.
- **`numpy.any()`**: NumPy version of this method.

---

### **When to Use `any()`**

- To validate if **any value** in a dataset meets a condition.
- To check for the presence of `True`, non-zero, or non-empty values.
- For data integrity checks in preprocessing pipelines.

This method is simple yet powerful for boolean evaluations in pandas!


In [None]:
""" pandas.DataFrame.clip
DataFrame.clip(lower=None, upper=None, *, axis=None, inplace=False, **kwargs)[source]
Trim values at input threshold(s).

Assigns values outside boundary to boundary values. Thresholds can be singular values or array like, and in the latter case the clipping is performed element-wise in the specified axis.

Parameters
:
lower
float or array-like, default None
Minimum threshold value. All values below this threshold will be set to it. A missing threshold (e.g NA) will not clip the value.

upper
float or array-like, default None
Maximum threshold value. All values above this threshold will be set to it. A missing threshold (e.g NA) will not clip the value.

axis
{0 or ‘index’, 1 or ‘columns’, None}, default None
Align object with lower and upper along the given axis. For Series this parameter is unused and defaults to None.

inplace
bool, default False
Whether to perform the operation in place on the data.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with numpy.

Returns
:
Series or DataFrame or None
Same type as calling object with the values outside the clip boundaries replaced or None if inplace=True.

See also

Series.clip
Trim values at input threshold in series.

DataFrame.clip
Trim values at input threshold in dataframe.

numpy.clip
Clip (limit) the values in an array.

Examples

data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)
df
   col_0  col_1
0      9     -2
1     -3     -7
2      0      6
3     -1      8
4      5     -5
Clips per column using lower and upper thresholds:

df.clip(-4, 6)
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4
Clips using specific lower and upper thresholds per column:

df.clip([-2, -1], [4, 5])
   col_0  col_1
0      4     -1
1     -2     -1
2      0      5
3     -1      5
4      4     -1
Clips using specific lower and upper thresholds per column element:

t = pd.Series([2, -4, -1, 6, 3])
t
0    2
1   -4
2   -1
3    6
4    3
dtype: int64
df.clip(t, t + 4, axis=0)
   col_0  col_1
0      6      2
1     -3     -4
2      0      3
3      6      8
4      5      3
Clips using specific lower threshold per column element, with missing values:

t = pd.Series([2, -4, np.nan, 6, 3])
t
0    2.0
1   -4.0
2    NaN
3    6.0
4    3.0
dtype: float64
df.clip(t, axis=0)
   col_0  col_1
0      9      2
1     -3     -4
2      0      6
3      6      8
4      5      3 """

The `pandas.DataFrame.clip()` method trims values in a DataFrame to specified thresholds. Any value below the `lower` threshold is set to `lower`, and any value above the `upper` threshold is set to `upper`. This is useful for bounding values to a known range and for capping outliers (the rows are kept; only the extreme values are replaced).

### Overview of `DataFrame.clip()`

### Syntax

```python
DataFrame.clip(lower=None, upper=None, *, axis=None, inplace=False, **kwargs)
```

### Parameters

- **lower**: `float` or `array-like`, default `None`

  - The minimum threshold value. Values below this threshold will be set to the threshold. If not provided, there is no lower limit.

- **upper**: `float` or `array-like`, default `None`

  - The maximum threshold value. Values above this threshold will be set to the threshold. If not provided, there is no upper limit.

- **axis**: `{0 or 'index', 1 or 'columns', None}`, default `None`

  - Aligns the DataFrame with `lower` and `upper` along the specified axis for element-wise clipping.

- **inplace**: `bool`, default `False`

  - If `True`, modifies the DataFrame in place; if `False`, returns a new DataFrame with clipped values.

- **kwargs**: Additional keyword arguments that are ignored but may be accepted for compatibility with NumPy.

### Returns

- **Series or DataFrame or None**:
  - Returns a DataFrame with values clipped to the thresholds. If `inplace=True`, it returns `None`.

### Related Functions

- **Series.clip()**: Similar method for Series objects.
- **numpy.clip()**: For clipping values in NumPy arrays.

### Examples

#### Example 1: Basic Clipping

```python
import numpy as np  # needed for np.nan in Example 4
import pandas as pd

# Create a DataFrame with sample data
data = {'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Clip values using a lower and upper threshold
clipped_df = df.clip(lower=-4, upper=6)
print("\nClipped DataFrame with thresholds (-4, 6):")
print(clipped_df)
```

**Output:**

```
Original DataFrame:
   col_0  col_1
0      9     -2
1     -3     -7
2      0      6
3     -1      8
4      5     -5

Clipped DataFrame with thresholds (-4, 6):
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4
```

#### Example 2: Specifying Thresholds Per Column

```python
# Clip using specific lower and upper thresholds for each column
clipped_df_columns = df.clip(lower=[-2, -1], upper=[4, 5])
print("\nClipped DataFrame with specific thresholds per column:")
print(clipped_df_columns)
```

**Output:**

```
Clipped DataFrame with specific thresholds per column:
   col_0  col_1
0      4     -1
1     -2     -1
2      0      5
3     -1      5
4      4     -1
```

#### Example 3: Clipping Using a Series

```python
# Clipping using a Series for thresholds
t = pd.Series([2, -4, -1, 6, 3])
print("\nSeries used for clipping:")
print(t)

clipped_with_series = df.clip(lower=t, upper=t + 4, axis=0)
print("\nClipped DataFrame using Series for thresholds:")
print(clipped_with_series)
```

**Output:**

```
Series used for clipping:
0    2
1   -4
2   -1
3    6
4    3
dtype: int64

Clipped DataFrame using Series for thresholds:
   col_0  col_1
0      6      2
1     -3     -4
2      0      3
3      6      8
4      5      3
```

#### Example 4: Clipping with Missing Values

```python
# Clipping with a Series containing NaN values
t_nan = pd.Series([2, -4, np.nan, 6, 3])
print("\nSeries with missing values:")
print(t_nan)

clipped_with_nan = df.clip(lower=t_nan, axis=0)
print("\nClipped DataFrame with Series containing NaN:")
print(clipped_with_nan)
```

**Output:**

```
Series with missing values:
0    2.0
1   -4.0
2    NaN
3    6.0
4    3.0
dtype: float64

Clipped DataFrame with Series containing NaN:
   col_0  col_1
0      9      2
1     -3     -4
2      0      6
3      6      8
4      5      3
```

### Summary

The `DataFrame.clip()` method provides an effective way to manage extreme values in your DataFrame by trimming them to specific ranges. This is useful for data preprocessing and ensuring that your dataset adheres to certain limits for analysis or modeling purposes.
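
As a sketch of the outlier-capping use mentioned above, per-column quantiles can serve as the thresholds. The 10th and 90th percentiles and the `values` frame are assumptions made for this illustration, not a recommendation.

```python
import pandas as pd

# Hypothetical data with one extreme value per column.
values = pd.DataFrame({"x": [1, 2, 3, 4, 100], "y": [-50, 10, 11, 12, 13]})

# quantile() returns one threshold per column (a Series indexed by column label).
lower = values.quantile(0.10)
upper = values.quantile(0.90)

# axis=1 aligns the threshold Series with the column labels.
capped = values.clip(lower=lower, upper=upper, axis=1)

print(capped)
```

Because the quantiles are floats, the result is upcast to a float dtype even though the input columns are integers.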


Here’s a comprehensive explanation of **`pandas.DataFrame.clip()`**, covering its syntax, parameters, behavior, and examples:

---

### **`pandas.DataFrame.clip()`**

Trims values at specified thresholds. Values below the `lower` threshold are set to the `lower` value, and values above the `upper` threshold are set to the `upper` value. This function can be applied element-wise, column-wise, or row-wise.

---

### **Syntax**

```python
DataFrame.clip(lower=None, upper=None, *, axis=None, inplace=False, **kwargs)
```

---

### **Parameters**

| Parameter | Description |
| --- | --- |
| **`lower`** | Minimum threshold value. All values below this are set to `lower`. Can be a scalar or array-like. |
| **`upper`** | Maximum threshold value. All values above this are set to `upper`. Can be a scalar or array-like. |
| **`axis`** | Axis along which array-like `lower` and `upper` thresholds are aligned. `0` or `'index'`: align with the row labels (one threshold per row). `1` or `'columns'`: align with the column labels (one threshold per column). `None` (default): no alignment axis is given, which is sufficient for scalar thresholds. |
| **`inplace`** | If `True`, performs the operation in place and returns `None`. Default is `False`. |
| **`**kwargs`** | Additional arguments for compatibility (no effect). |

---

### **Returns**

- **DataFrame/Series**: A new DataFrame or Series with clipped values.
- **None**: If `inplace=True`.

---

### **Key Notes**

1. **Thresholds**:

   - If `lower` or `upper` is `None`, no clipping is performed for that bound.
   - If `lower` or `upper` is array-like, it must match the shape of the DataFrame or Series along the specified axis.

2. **Behavior with `NaN`**:

   - If `lower` or `upper` contains `NaN`, those values are ignored (no clipping occurs for those elements).

3. **Common Use Cases**:
   - Limit values to a specific range (e.g., for normalization or outlier handling).
   - Apply element-wise or column-wise clipping.
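
The first note (a bound left as `None` is not applied) can be sketched with one-sided clipping. The frame `df_bounds` is a small made-up example.

```python
import pandas as pd

df_bounds = pd.DataFrame({"col_0": [9, -3, 0], "col_1": [-2, 6, 8]})

# Only a lower bound: negative values become 0, large values are untouched.
floor_only = df_bounds.clip(lower=0)

# Only an upper bound: values above 5 become 5, negative values are untouched.
ceiling_only = df_bounds.clip(upper=5)

print(floor_only)
print(ceiling_only)
```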

---

### **Examples**

#### 1. **Basic Clipping**

Clip all values in a DataFrame to a range:

```python
df = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
df.clip(-4, 6)
```

Output:

```
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4
```

---

#### 2. **Column-Wise Clipping**

Clip each column with different thresholds:

```python
df.clip(lower=[-2, -1], upper=[4, 5])
```

Output:

```
   col_0  col_1
0      4     -1
1     -2     -1
2      0      5
3     -1      5
4      4     -1
```

---

#### 3. **Row-Wise Clipping**

Clip each row with different thresholds:

```python
t = pd.Series([2, -4, -1, 6, 3])
df.clip(lower=t, upper=t + 4, axis=0)
```

Output:

```
   col_0  col_1
0      6      2
1     -3     -4
2      0      3
3      6      8
4      5      3
```

---

#### 4. **Handling `NaN` in Thresholds**

Clip with `NaN` in thresholds (ignores `NaN` values):

```python
t = pd.Series([2, -4, np.nan, 6, 3])
df.clip(lower=t, axis=0)
```

Output:

```
   col_0  col_1
0      9      2
1     -3     -4
2      0      6
3      6      8
4      5      3
```

---

#### 5. **In-Place Clipping**

Clip values in place (modifies the original DataFrame):

```python
df.clip(-4, 6, inplace=True)
df
```

Output:

```
   col_0  col_1
0      6     -2
1     -3     -4
2      0      6
3     -1      6
4      5     -4
```

---

### **Related Functions**

- **`Series.clip()`**: Clip values in a Series.
- **`numpy.clip()`**: Clip values in a NumPy array.

---

### **When to Use `clip()`**

- To limit values to a specific range (e.g., for normalization).
- To handle outliers by capping extreme values.
- For element-wise or column-wise data manipulation.

This method is powerful for data preprocessing and cleaning in pandas!


In [None]:
""" 

pandas.DataFrame.corr

DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)

Compute pairwise correlation of columns, excluding NA/null values.

Parameters:
method{‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:

pearson : standard correlation coefficient

kendall : Kendall Tau correlation coefficient

spearman : Spearman rank correlation

callable: callable with input two 1d ndarrays
and returning a float. Note that the returned matrix from corr will have 1 along the diagonals and will be symmetric regardless of the callable’s behavior.

min_periodsint, optional
Minimum number of observations required per pair of columns to have a valid result. Currently only available for Pearson and Spearman correlation.

numeric_onlybool, default False
Include only float, int or boolean data.

Added in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:
DataFrame
Correlation matrix.

See also

DataFrame.corrwith
Compute pairwise correlation with another DataFrame or Series.

Series.corr
Compute the correlation between two Series.

Notes

Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.

Pearson correlation coefficient

Kendall rank correlation coefficient

Spearman’s rank correlation coefficient

Examples

def histogram_intersection(a, b):
    v = np.minimum(a, b).sum().round(decimals=1)
    return v
df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
                  columns=['dogs', 'cats'])
df.corr(method=histogram_intersection)
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
df = pd.DataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],
                  columns=['dogs', 'cats'])
df.corr(min_periods=3)
      dogs  cats
dogs   1.0   NaN
cats   NaN   1.0 """

The `pandas.DataFrame.corr` method is used to compute pairwise correlation of columns in a DataFrame, excluding NA/null values. Below is a comprehensive overview of the syntax, parameters, and usage examples:

### Syntax

```python
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
```

### Parameters

1. **method** : `{'pearson', 'kendall', 'spearman'}` or callable, default 'pearson'

   - Method of correlation:
     - **pearson** : Standard correlation coefficient.
     - **kendall** : Kendall Tau correlation coefficient.
     - **spearman** : Spearman rank correlation.
     - **callable** : A callable function that takes two 1d ndarrays as input and returns a float. The returned matrix will have 1 along the diagonals and will be symmetric.

2. **min_periods** : int, optional

   - Minimum number of observations required per pair of columns to have a valid result. Currently, this is only available for Pearson and Spearman correlation.

3. **numeric_only** : bool, default False
   - If True, includes only float, int, or boolean data in the correlation computation. This parameter was added in version 1.5.0, and its default value changed to False in version 2.0.0.

### Returns

- **DataFrame** : Correlation matrix.

### See Also

- **DataFrame.corrwith** : Compute pairwise correlation with another DataFrame or Series.
- **Series.corr** : Compute the correlation between two Series.

### Notes

- Pearson, Kendall, and Spearman correlations are computed using pairwise complete observations.
- The Pearson correlation coefficient measures linear correlation.
- The Kendall rank correlation coefficient measures ordinal association.
- Spearman’s rank correlation coefficient assesses how well the relationship between two variables can be described using a monotonic function.
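
To make the relationship between the methods concrete: Spearman correlation is the Pearson correlation of the ranked data. The sketch below checks this on a small hypothetical frame (`scores` and its values are invented for the example).

```python
import numpy as np
import pandas as pd

# x and y are monotonically related but not linearly related.
scores = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [1, 4, 9, 16, 20],
    "z": [5, 3, 4, 1, 2],
})

spearman = scores.corr(method="spearman")

# Rank each column, then compute ordinary Pearson correlation on the ranks.
pearson_on_ranks = scores.rank().corr(method="pearson")

print(spearman)
print(np.allclose(spearman, pearson_on_ranks))
```

Because `y` increases whenever `x` does, their Spearman coefficient is 1 even though the relationship is not a straight line.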

### Examples

#### Example 1: Using Pearson Correlation

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [4, 3, 2, 1],
    'C': [1, 3, 2, 4]
})

# Compute Pearson correlation
corr_matrix = df.corr(method='pearson')
print(corr_matrix)
```

#### Example 2: Using Spearman Correlation

```python
# Compute Spearman correlation
corr_matrix = df.corr(method='spearman')
print(corr_matrix)
```

#### Example 3: Using a Custom Callable Function

```python
def histogram_intersection(a, b):
    v = np.minimum(a, b).sum().round(decimals=1)
    return v

df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
                  columns=['dogs', 'cats'])

# Compute correlation using custom function
corr_matrix = df.corr(method=histogram_intersection)
print(corr_matrix)
```

#### Example 4: Handling Missing Values with `min_periods`

```python
df = pd.DataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],
                  columns=['dogs', 'cats'])

# Compute correlation with minimum periods
corr_matrix = df.corr(min_periods=3)
print(corr_matrix)
```

### Output Examples

#### Example 1 Output:

```python
     A    B    C
A  1.0 -1.0  0.8
B -1.0  1.0 -0.8
C  0.8 -0.8  1.0
```

#### Example 2 Output:

```python
     A    B    C
A  1.0 -1.0  0.8
B -1.0  1.0 -0.8
C  0.8 -0.8  1.0
```

#### Example 3 Output:

```python
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
```

#### Example 4 Output:

```python
      dogs  cats
dogs   1.0   NaN
cats   NaN   1.0
```

### Additional Notes

- The correlation matrix will always have 1s on the diagonal, representing the correlation of each column with itself.
- If a pair of columns has fewer valid paired observations than `min_periods`, the result for that pair is `NaN`.
- Setting `numeric_only=True` restricts the computation to float, int, and boolean columns, which is useful when the DataFrame has mixed data types.

This method is highly versatile and can be adapted to various correlation computation needs, whether using built-in methods or custom functions.
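
A minimal sketch of `numeric_only=True` on a frame with a text column. It assumes pandas 1.5 or later, where the parameter exists; the `mixed` frame and its column names are hypothetical.

```python
import pandas as pd

# Two numeric columns plus a string column that cannot be correlated.
mixed = pd.DataFrame({
    "height": [150, 160, 170, 180],
    "weight": [50, 58, 66, 80],
    "label": ["a", "b", "c", "d"],
})

# The string column is dropped before the correlation is computed.
numeric_corr = mixed.corr(numeric_only=True)

print(numeric_corr.columns.tolist())
print(numeric_corr)
```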


The `pandas.DataFrame.corr()` method is used to compute pairwise correlation between the columns of a DataFrame, excluding any NA/null values. This can be particularly useful for understanding relationships between different variables in your dataset.

### Overview of `DataFrame.corr()`

### Syntax

```python
DataFrame.corr(method='pearson', min_periods=1, numeric_only=False)
```

### Parameters

- **method**: `{‘pearson’, ‘kendall’, ‘spearman’}` or callable, default `'pearson'`

  - Specifies the method to use for correlation:
    - `'pearson'`: Standard correlation coefficient.
    - `'kendall'`: Kendall Tau correlation coefficient.
    - `'spearman'`: Spearman rank correlation.
    - `callable`: A user-defined function that takes two 1D arrays as input and returns a float.

- **min_periods**: `int`, optional, default `1`

  - Minimum number of observations required per pair of columns to have a valid result. This parameter is currently only applicable for Pearson and Spearman correlation.

- **numeric_only**: `bool`, default `False`
  - If `True`, only includes float, int, or boolean data in the correlation calculation.

### Returns

- **DataFrame**:
  - A correlation matrix showing the correlation coefficients between the columns.

### Related Functions

- **DataFrame.corrwith()**: Computes pairwise correlation with another DataFrame or Series.
- **Series.corr()**: Computes the correlation between two Series.

### Notes

- The Pearson, Kendall, and Spearman correlations are computed using pairwise complete observations, meaning that only the rows with non-null values in both columns are considered.
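
The "pairwise complete observations" note can be checked directly: the coefficient for one pair should match what you get after dropping rows where either of those two columns is missing. The `obs` frame below is invented for this sketch.

```python
import numpy as np
import pandas as pd

# Each column is missing a value in a different row.
obs = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [2.0, np.nan, 5.0, 9.0, 1.0],
    "c": [5.0, 3.0, np.nan, 1.0, 0.0],
})

full = obs.corr()

# Keep only rows where both "a" and "b" are present, ignoring gaps in "c".
pair_ab = obs[["a", "b"]].dropna().corr().loc["a", "b"]

print(np.isclose(full.loc["a", "b"], pair_ab))
```

Note that a gap in `c` does not remove a row from the `a`/`b` calculation, which is what distinguishes pairwise deletion from calling `dropna()` on the whole frame first.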

### Examples

#### Example 1: Basic Correlation Calculation

```python
import pandas as pd
import numpy as np

# Create a DataFrame with sample data
df = pd.DataFrame({
    'dogs': [0.2, 0.0, 0.6, 0.2],
    'cats': [0.3, 0.6, 0.0, 0.1]
})

# Calculate the Pearson correlation (default)
correlation_matrix = df.corr()
print("Correlation matrix (Pearson):")
print(correlation_matrix)
```

**Output:**

```
Correlation matrix (Pearson):
          dogs      cats
dogs  1.000000 -0.851064
cats -0.851064  1.000000
```

#### Example 2: Using Different Correlation Methods

```python
# Calculate the Kendall correlation
kendall_corr = df.corr(method='kendall')
print("\nCorrelation matrix (Kendall):")
print(kendall_corr)

# Calculate the Spearman correlation
spearman_corr = df.corr(method='spearman')
print("\nCorrelation matrix (Spearman):")
print(spearman_corr)
```

**Output:**

```
Correlation matrix (Kendall):
          dogs      cats
dogs  1.000000 -0.912871
cats -0.912871  1.000000

Correlation matrix (Spearman):
          dogs      cats
dogs  1.000000 -0.948683
cats -0.948683  1.000000
```

#### Example 3: Using a Callable for Custom Correlation

```python
def histogram_intersection(a, b):
    v = np.minimum(a, b).sum().round(decimals=1)
    return v

# Calculate correlation using a custom function
custom_corr = df.corr(method=histogram_intersection)
print("\nCorrelation matrix using custom method:")
print(custom_corr)
```

**Output:**

```
Correlation matrix using custom method:
      dogs  cats
dogs   1.0   0.3
cats   0.3   1.0
```

#### Example 4: Handling Missing Values with `min_periods`

```python
# Create a DataFrame with some missing values
df_nan = pd.DataFrame({
    'dogs': [1, 2, np.nan, 4],
    'cats': [1, np.nan, 3, 4]
})

# Calculate correlation with a minimum number of periods
nan_corr = df_nan.corr(min_periods=3)
print("\nCorrelation matrix with min_periods=3:")
print(nan_corr)
```

**Output:**

```
Correlation matrix with min_periods=3:
      dogs  cats
dogs   1.0   NaN
cats   NaN   1.0
```

### Summary

The `DataFrame.corr()` method is a powerful tool for assessing the relationships between different columns in a DataFrame. By offering multiple correlation methods and the ability to handle missing data, it provides flexibility for various analytical needs. Whether you're exploring data relationships or preparing for predictive modeling, understanding these correlations can be crucial.


In [None]:
""" pandas.DataFrame.corrwith
DataFrame.corrwith(other, axis=0, drop=False, method='pearson', numeric_only=False)[source]
Compute pairwise correlation.

Pairwise correlation is computed between rows or columns of DataFrame with rows or columns of Series or DataFrame. DataFrames are first aligned along both axes before computing the correlations.

Parameters:
other : DataFrame, Series
Object with which to compute correlations.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0
The axis to use. 0 or ‘index’ to compute row-wise, 1 or ‘columns’ for column-wise.

drop : bool, default False
Drop missing indices from result.

method : {‘pearson’, ‘kendall’, ‘spearman’} or callable
Method of correlation:

pearson : standard correlation coefficient

kendall : Kendall Tau correlation coefficient

spearman : Spearman rank correlation

callable: callable with input two 1d ndarrays
and returning a float.

numeric_only : bool, default False
Include only float, int or boolean data.

Added in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:
Series
Pairwise correlations.

See also

DataFrame.corr
Compute pairwise correlation of columns.

Examples

index = ["a", "b", "c", "d", "e"]
columns = ["one", "two", "three", "four"]
df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns)
df2 = pd.DataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)
df1.corrwith(df2)
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64
df2.corrwith(df1, axis=1)
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64 """

The `pandas.DataFrame.corrwith()` method computes pairwise correlation between the rows or columns of a DataFrame with those of another DataFrame or Series. This method is especially useful for exploring relationships between datasets or verifying similar patterns among different groups of observations.

### Overview of `DataFrame.corrwith()`

### Syntax

```python
DataFrame.corrwith(other, axis=0, drop=False, method='pearson', numeric_only=False)
```

### Parameters

- **other**: `DataFrame`, `Series`

  - The object with which to compute correlations.

- **axis**: `{0 or 'index', 1 or 'columns'}`, default `0`

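  - The axis to compute over. With `0` or `'index'` (the default), each column is correlated with the matching column of `other` and the result is indexed by column label. With `1` or `'columns'`, each row is correlated with the matching row and the result is indexed by row label.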

- **drop**: `bool`, default `False`

  - If `True`, labels that are not present in both objects are dropped from the result. If `False`, they are kept with a value of `NaN`.

- **method**: `{'pearson', 'kendall', 'spearman'}` or callable

  - Method of correlation to use:
    - `'pearson'`: Standard correlation coefficient.
    - `'kendall'`: Kendall Tau correlation coefficient.
    - `'spearman'`: Spearman rank correlation.
    - `callable`: A user-defined function that accepts two 1D arrays and returns a float.

- **numeric_only**: `bool`, default `False`
  - If `True`, only consider float, int, or boolean data when computing the correlations.

### Returns

- **Series**:
  - A Series containing the pairwise correlations.

### Related Functions

- **DataFrame.corr()**: Compute pairwise correlation of the entire DataFrame.
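
To make the difference between the two methods concrete, here is a minimal sketch. The frames `left` and `right` are random, made-up data (an assumption for illustration only): `corr()` returns a square matrix for the columns inside one frame, while `corrwith()` returns one value per matching column across two frames.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
left = pd.DataFrame(rng.normal(size=(6, 2)), columns=["x", "y"])
right = pd.DataFrame(rng.normal(size=(6, 2)), columns=["x", "y"])

within = left.corr()             # 2x2 DataFrame: x vs y inside `left`
between = left.corrwith(right)   # Series: left.x vs right.x, left.y vs right.y

# Each corrwith entry matches the Series-level correlation of the paired columns.
by_hand = pd.Series({col: left[col].corr(right[col]) for col in left.columns})
print(within.shape)
print(np.allclose(between, by_hand))
```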

### Examples

#### Example 1: Basic Correlation Calculation

```python
import pandas as pd
import numpy as np

# Create two DataFrames with sample data
index = ["a", "b", "c", "d", "e"]
columns = ["one", "two", "three", "four"]
df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns)
df2 = pd.DataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)

# Compute pairwise correlation (df1 against df2)
correlation_result = df1.corrwith(df2)
print("Pairwise correlation (df1 with df2):")
print(correlation_result)
```

**Output:**

```
Pairwise correlation (df1 with df2):
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64
```

#### Example 2: Correlation with Row-wise Calculation

```python
# Compute pairwise correlation (df2 against df1, row-wise)
row_correlation = df2.corrwith(df1, axis=1)
print("\nRow-wise correlation (df2 with df1):")
print(row_correlation)
```

**Output:**

```
Row-wise correlation (df2 with df1):
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64
```

#### Example 3: Using Different Correlation Methods

```python
# Create another DataFrame with different data
df3 = pd.DataFrame({
    'one': [1, 2, 3],
    'two': [4, 5, 6],
    'three': [7, 8, 9]
}, index=["a", "b", "c"])

# Calculate correlation using the Kendall method
kendall_correlation = df3.corrwith(df1, method='kendall')
print("\nKendall correlation (df3 with df1):")
print(kendall_correlation)

# Calculate correlation using the Spearman method
spearman_correlation = df3.corrwith(df1, method='spearman')
print("\nSpearman correlation (df3 with df1):")
print(spearman_correlation)
```

**Output:**

```
Kendall correlation (df3 with df1):
one      1.0
two      1.0
three    1.0
four     NaN
dtype: float64

Spearman correlation (df3 with df1):
one      1.0
two      1.0
three    1.0
four     NaN
dtype: float64
```

### Summary

The `DataFrame.corrwith()` method is a straightforward way to compute the correlation between two datasets, allowing for either row-wise or column-wise analysis. It supports multiple correlation methods, giving flexibility to analyze relationships based on different statistical approaches. This can be especially useful in exploratory data analysis or when comparing different datasets for similarities.


The `pandas.DataFrame.corrwith` method is used to compute pairwise correlations between rows or columns of a DataFrame and another DataFrame or Series. Below is a detailed explanation of the syntax, parameters, and usage examples:

---

### Syntax

```python
DataFrame.corrwith(other, axis=0, drop=False, method='pearson', numeric_only=False)
```

---

### Parameters

1. **other** : DataFrame or Series

   - The object (DataFrame or Series) with which to compute correlations.

2. **axis** : `{0 or 'index', 1 or 'columns'}`, default `0`

   - The axis to use for computation:
     - `0` or `'index'` : Correlate each column of the DataFrame with the matching column of `other`. Values are paired along the index, and the result has one entry per column.
     - `1` or `'columns'` : Correlate each row of the DataFrame with the matching row of `other`. The result has one entry per row.

3. **drop** : bool, default `False`

   - If `True`, drop missing indices from the result. If `False`, include missing indices with `NaN` values.

4. **method** : `{'pearson', 'kendall', 'spearman'}` or callable, default `'pearson'`

   - Method of correlation:
     - **pearson** : Standard correlation coefficient.
     - **kendall** : Kendall Tau correlation coefficient.
     - **spearman** : Spearman rank correlation.
     - **callable** : A custom function that takes two 1d ndarrays as input and returns a float.

5. **numeric_only** : bool, default `False`
   - If `True`, include only float, int, or boolean data in the computation. This parameter was added in version 1.5.0, and its default value changed to `False` in version 2.0.0.

---

### Returns

- **Series** : Pairwise correlations between the DataFrame and `other`.

---

### See Also

- **DataFrame.corr** : Compute pairwise correlation of columns within a DataFrame.
- **Series.corr** : Compute the correlation between two Series.

---

### Notes

- The DataFrames are aligned along both axes before computing correlations.
- Missing values (`NaN`) are excluded from the computation.
- If `drop=False`, missing indices in the result will be filled with `NaN`.
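
The effect of `drop` is easiest to see when one object lacks a label that the other has. The following sketch uses two made-up frames (`wide` and `narrow` are hypothetical names, with `narrow` deliberately missing the column `"four"`).

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame(np.arange(20).reshape(5, 4),
                    columns=["one", "two", "three", "four"])
narrow = wide[["one", "two", "three"]] * 2.0   # hypothetical frame lacking "four"

kept = wide.corrwith(narrow)                   # drop=False: "four" is reported as NaN
dropped = wide.corrwith(narrow, drop=True)     # only labels present in both objects

print(kept)
print(dropped)
```

With `drop=False` the result still lists `"four"`, with a `NaN` value; with `drop=True` that label is left out entirely.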

---

### Examples

#### Example 1: Column-wise Correlation Between Two DataFrames

```python
import pandas as pd
import numpy as np

# Create two DataFrames
index = ["a", "b", "c", "d", "e"]
columns = ["one", "two", "three", "four"]
df1 = pd.DataFrame(np.arange(20).reshape(5, 4), index=index, columns=columns)
df2 = pd.DataFrame(np.arange(16).reshape(4, 4), index=index[:4], columns=columns)

# Compute column-wise correlation
corr_series = df1.corrwith(df2)
print(corr_series)
```

**Output:**

```
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64
```

---

#### Example 2: Row-wise Correlation Between Two DataFrames

```python
# Compute row-wise correlation
corr_series = df2.corrwith(df1, axis=1)
print(corr_series)
```

**Output:**

```
a    1.0
b    1.0
c    1.0
d    1.0
e    NaN
dtype: float64
```

---

#### Example 3: Correlation with a Series

```python
# Create a Series indexed by the row labels so that it aligns with each column of df1
series = pd.Series([1, 2, 3, 4, 5], index=index)

# Compute the correlation of every column of df1 with the Series
corr_series = df1.corrwith(series)
print(corr_series)
```

**Output:**

```
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64
```

---

#### Example 4: Using a Custom Correlation Method

```python
# Define a custom function: uncentered (cosine) similarity rather than a centered correlation
def custom_corr(a, b):
    return np.sum(a * b) / (np.sqrt(np.sum(a**2)) * np.sqrt(np.sum(b**2)))

# Compute correlation using the custom function
corr_series = df1.corrwith(df2, method=custom_corr)
print(corr_series)
```

---

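#### Example 5: Missing Values and the `drop` Parameter

```python
# Introduce a missing value (cast to float first so NaN fits the column dtype)
df2 = df2.astype(float)
df2.loc['a', 'one'] = np.nan

# NaN cells are excluded pairwise, so 'one' is computed from rows b, c and d
corr_series = df1.corrwith(df2, drop=False)
print(corr_series)
```

**Output:**

```
one      1.0
two      1.0
three    1.0
four     1.0
dtype: float64
```

Note that `drop` does not act on `NaN` cells. It only controls whether labels that are missing from one of the two objects appear in the result.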

---

### Key Points

- `corrwith` is useful for computing correlations between two DataFrames or a DataFrame and a Series.
- The `axis` parameter selects what is paired: matching columns (`axis=0`, the default) or matching rows (`axis=1`).
- Missing values are automatically excluded from the computation, but you can control whether to drop or keep missing indices in the result using the `drop` parameter.
- Custom correlation methods can be used by passing a callable function to the `method` parameter.

This method is particularly useful when comparing the relationships between two datasets or aligning correlations along specific axes.


In [None]:
""" pandas.DataFrame.count
DataFrame.count(axis=0, numeric_only=False)[source]
Count non-NA cells for each column or row.

The values None, NaN, NaT, pandas.NA are considered NA.

Parameters
:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
If 0 or ‘index’ counts are generated for each column. If 1 or ‘columns’ counts are generated for each row.

numeric_only
bool, default False
Include only float, int or boolean data.

Returns
:
Series
For each column/row the number of non-NA/null entries.

See also

Series.count
Number of non-NA elements in a Series.

DataFrame.value_counts
Count unique combinations of columns.

DataFrame.shape
Number of DataFrame rows and columns (including NA elements).

DataFrame.isna
Boolean same-sized DataFrame showing places of NA elements.

Examples

Constructing DataFrame from a dictionary:

df = pd.DataFrame({"Person":
                   ["John", "Myla", "Lewis", "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26],
                   "Single": [False, True, True, True, False]})
df
   Person   Age  Single
0    John  24.0   False
1    Myla   NaN    True
2   Lewis  21.0    True
3    John  33.0    True
4    Myla  26.0   False
Notice the uncounted NA values:

df.count()
Person    5
Age       4
Single    5
dtype: int64
Counts for each row:

df.count(axis='columns')
0    3
1    2
2    3
3    3
4    3
dtype: int64 """


This section covers **`pandas.DataFrame.count()`**: its syntax, parameters, behavior, and examples.

---

### **`pandas.DataFrame.count()`**

Counts the number of **non-NA/null** values for each column or row in a DataFrame. This function is useful for understanding the completeness of your data.

---

### **Syntax**

```python
DataFrame.count(axis=0, numeric_only=False)
```

---

### **Parameters**

| Parameter | Description |
| --- | --- |
| **`axis`** | Axis to count along. `0` or `'index'` (default): count non-NA values for each **column**. `1` or `'columns'`: count non-NA values for each **row**. |
| **`numeric_only`** | If `True`, include only float, int, or boolean columns. Default is `False`. |

---

### **Returns**

- **Series**: A Series containing the count of non-NA values for each column or row.

---

### **Key Notes**

1. **NA Values**:

   - The following are considered NA: `None`, `NaN`, `NaT`, and `pandas.NA`.
   - These values are excluded from the count.

2. **Behavior with Empty Data**:

   - If a column or row contains only NA values, the count will be `0`.

3. **Common Use Cases**:
   - Check for missing data in columns or rows.
   - Assess data completeness before analysis.
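
The NA kinds listed above can be placed side by side in one frame. This is a minimal sketch with made-up data (`na_demo` is a hypothetical name); each column holds exactly one missing entry of a different kind.

```python
import numpy as np
import pandas as pd

na_demo = pd.DataFrame({
    "obj": ["x", None, "z"],                                      # None in a text column
    "flt": [1.5, np.nan, 3.0],                                    # NaN in a float column
    "when": pd.to_datetime(["2024-01-01", None, "2024-01-03"]),   # NaT in a datetime column
    "nullable": pd.array([1, pd.NA, 3], dtype="Int64"),           # pd.NA in a nullable integer column
})

counts = na_demo.count()
print(counts)   # every column reports 2 non-NA values out of 3 rows
```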

---

### **Examples**

#### 1. **Count Non-NA Values in Columns**

Count non-NA values for each column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, True, False]
})

df.count()
```

Output:

```
Person    5
Age       4
Single    5
dtype: int64
```

---

#### 2. **Count Non-NA Values in Rows**

Count non-NA values for each row:

```python
df.count(axis='columns')
```

Output:

```
0    3
1    2
2    3
3    3
4    3
dtype: int64
```

---

#### 3. **Count Only Numeric Columns**

Count non-NA values for numeric columns only:

```python
df.count(numeric_only=True)
```

Output:

```
Age       4
Single    5
dtype: int64
```

---

#### 4. **Handling All-NA Columns**

Count non-NA values in a DataFrame with an all-NA column:

```python
df['New_Column'] = np.nan
df.count()
```

Output:

```
Person        5
Age           4
Single        5
New_Column    0
dtype: int64
```

---

### **Related Functions**

- **`Series.count()`**: Count non-NA values in a Series.
- **`DataFrame.value_counts()`**: Count unique combinations of columns.
- **`DataFrame.shape`**: Get the number of rows and columns (including NA values).
- **`DataFrame.isna()`**: Identify NA values in a DataFrame.

---

### **When to Use `count()`**

- To assess the completeness of your data.
- To identify columns or rows with missing values.
- For data cleaning and preprocessing.

This method is essential for understanding the structure and quality of your data in pandas!


In [None]:
""" pandas.DataFrame.cov
DataFrame.cov(min_periods=None, ddof=1, numeric_only=False)[source]
Compute pairwise covariance of columns, excluding NA/null values.

Compute the pairwise covariance among the series of a DataFrame. The returned data frame is the covariance matrix of the columns of the DataFrame.

Both NA and null values are automatically excluded from the calculation. (See the note below about bias from missing values.) A threshold can be set for the minimum number of observations for each value created. Comparisons with observations below this threshold will be returned as NaN.

This method is generally used for the analysis of time series data to understand the relationship between different measures across time.

Parameters:
min_periods : int, optional
Minimum number of observations required per pair of columns to have a valid result.

ddof : int, default 1
Delta degrees of freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. This argument is applicable only when no nan is in the dataframe.

numeric_only : bool, default False
Include only float, int or boolean data.

Added in version 1.5.0.

Changed in version 2.0.0: The default value of numeric_only is now False.

Returns:
DataFrame
The covariance matrix of the series of the DataFrame.

See also

Series.cov
Compute covariance with another Series.

core.window.ewm.ExponentialMovingWindow.cov
Exponential weighted sample covariance.

core.window.expanding.Expanding.cov
Expanding sample covariance.

core.window.rolling.Rolling.cov
Rolling sample covariance.

Notes

Returns the covariance matrix of the DataFrame’s time series. The covariance is normalized by N-ddof.

For DataFrames that have Series that are missing data (assuming that data is missing at random) the returned covariance matrix will be an unbiased estimate of the variance and covariance between the member Series.

However, for many applications this estimate may not be acceptable because the estimate covariance matrix is not guaranteed to be positive semi-definite. This could lead to estimate correlations having absolute values which are greater than one, and/or a non-invertible covariance matrix. See Estimation of covariance matrices for more details.

Examples

df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
                  columns=['dogs', 'cats'])
df.cov()
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
np.random.seed(42)
df = pd.DataFrame(np.random.randn(1000, 5),
                  columns=['a', 'b', 'c', 'd', 'e'])
df.cov()
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795
Minimum number of periods

This method also supports an optional min_periods keyword that specifies the required minimum number of non-NA observations for each column pair in order to have a valid result:

np.random.seed(42)
df = pd.DataFrame(np.random.randn(20, 3),
                  columns=['a', 'b', 'c'])
df.loc[df.index[:5], 'a'] = np.nan
df.loc[df.index[5:10], 'b'] = np.nan
df.cov(min_periods=12)
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202 """

The `pandas.DataFrame.cov()` method computes the pairwise covariance of the columns in a DataFrame, excluding any NA (Not Available) or null values. This method is particularly useful in statistical analysis to understand the relationships between different variables.

### Overview of `DataFrame.cov()`

### Syntax

```python
DataFrame.cov(min_periods=None, ddof=1, numeric_only=False)
```

### Parameters

- **min_periods**: `int`, optional

  - Minimum number of observations required per pair of columns to have a valid result. If the number of non-NA observations is less than this, the result will be `NaN`.

- **ddof**: `int`, default `1`

  - Delta degrees of freedom. The divisor used in calculations is `N - ddof`, where `N` is the number of elements. This argument takes effect only when the DataFrame contains no NaN values.

- **numeric_only**: `bool`, default `False`
  - If `True`, only considers float, int, or boolean data for the calculation.
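
As a quick illustration of `ddof`, the sketch below computes the covariance of a small NaN-free frame with the default divisor `N - 1` and with `ddof=0` (divisor `N`), then checks that the two differ only by that rescaling. The name `df_ddof` is an assumption made for this example.

```python
import pandas as pd

df_ddof = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
                       columns=["dogs", "cats"])
n = len(df_ddof)

sample_cov = df_ddof.cov()             # ddof=1, divisor N - 1
population_cov = df_ddof.cov(ddof=0)   # divisor N

# The two results differ only by the divisor, so rescaling one gives the other.
rescaled = sample_cov * (n - 1) / n
max_gap = (population_cov - rescaled).abs().max().max()
print(max_gap < 1e-12)
```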

### Returns

- **DataFrame**:
  - A covariance matrix representing the covariance between the columns of the DataFrame.

### Related Functions

- **Series.cov()**: Computes covariance with another Series.
- **core.window.ewm.ExponentialMovingWindow.cov**: Computes exponential weighted sample covariance.
- **core.window.expanding.Expanding.cov**: Computes expanding sample covariance.
- **core.window.rolling.Rolling.cov**: Computes rolling sample covariance.

### Notes

- The covariance matrix provides insights into how two variables change together. A positive covariance indicates that as one variable increases, the other tends to increase, while a negative covariance indicates the opposite.
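- When columns have missing data, each pairwise covariance is computed from a different subset of rows, so the resulting matrix is not guaranteed to be positive semi-definite. This can yield implied correlations with absolute values greater than one, or a non-invertible covariance matrix.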
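
The positive semi-definite caveat can be reproduced with a deliberately contrived frame. In this sketch (made-up data, with `df_gaps` as a hypothetical name) every pair of columns overlaps on a different block of three rows, so the pairwise estimates contradict each other.

```python
import numpy as np
import pandas as pd

nan = np.nan
df_gaps = pd.DataFrame({
    "a": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "b": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "c": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
})

cov_gaps = df_gaps.cov()

# A positive semi-definite matrix has no negative eigenvalues.
smallest_eigenvalue = np.linalg.eigvalsh(cov_gaps.to_numpy()).min()

# Correlation implied by the covariance matrix for the pair (a, b).
implied_corr_ab = cov_gaps.loc["a", "b"] / np.sqrt(
    cov_gaps.loc["a", "a"] * cov_gaps.loc["b", "b"]
)

print(smallest_eigenvalue)   # negative, so the matrix is not positive semi-definite
print(implied_corr_ab)       # larger than 1
```

Real data rarely looks this extreme, but the same effect can appear in milder form whenever missingness differs across columns.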

### Examples

#### Example 1: Basic Covariance Calculation

```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame([(1, 2), (0, 3), (2, 0), (1, 1)],
                  columns=['dogs', 'cats'])

# Compute the covariance matrix
cov_matrix = df.cov()
print("Covariance matrix:")
print(cov_matrix)
```

**Output:**

```
Covariance matrix:
          dogs      cats
dogs  0.666667 -1.000000
cats -1.000000  1.666667
```

#### Example 2: Covariance with Random Data

```python
import numpy as np

# Set random seed for reproducibility
np.random.seed(42)

# Create a DataFrame with random numbers
df_random = pd.DataFrame(np.random.randn(1000, 5),
                         columns=['a', 'b', 'c', 'd', 'e'])

# Compute the covariance matrix
cov_random = df_random.cov()
print("\nCovariance matrix of random data:")
print(cov_random)
```

**Output:**

```
Covariance matrix of random data:
          a         b         c         d         e
a  0.998438 -0.020161  0.059277 -0.008943  0.014144
b -0.020161  1.059352 -0.008543 -0.024738  0.009826
c  0.059277 -0.008543  1.010670 -0.001486 -0.000271
d -0.008943 -0.024738 -0.001486  0.921297 -0.013692
e  0.014144  0.009826 -0.000271 -0.013692  0.977795
```

#### Example 3: Covariance with Minimum Periods

```python
# Create a DataFrame with missing values
np.random.seed(42)
df_nan = pd.DataFrame(np.random.randn(20, 3),
                      columns=['a', 'b', 'c'])

# Introduce NaN values
df_nan.loc[df_nan.index[:5], 'a'] = np.nan
df_nan.loc[df_nan.index[5:10], 'b'] = np.nan

# Compute the covariance matrix with a minimum number of periods
cov_min_periods = df_nan.cov(min_periods=12)
print("\nCovariance matrix with min_periods=12:")
print(cov_min_periods)
```

**Output:**

```
Covariance matrix with min_periods=12:
          a         b         c
a  0.316741       NaN -0.150812
b       NaN  1.248003  0.191417
c -0.150812  0.191417  0.895202
```

### Summary

The `DataFrame.cov()` method is a powerful tool for calculating the covariance matrix of a DataFrame, allowing analysts to understand the relationships between different variables. By excluding NA values and providing options for minimum observation thresholds, it offers flexibility for various statistical analyses, particularly in time series data.


The `pandas.DataFrame.count` method is used to count the number of non-NA (non-missing) cells for each column or row in a DataFrame. Below is a detailed explanation of the syntax, parameters, and usage examples:

---

### Syntax

```python
DataFrame.count(axis=0, numeric_only=False)
```

---

### Parameters

1. **axis** : `{0 or 'index', 1 or 'columns'}`, default `0`

   - The axis along which to count non-NA cells:
     - `0` or `'index'` : Count non-NA cells for each **column**.
     - `1` or `'columns'` : Count non-NA cells for each **row**.

2. **numeric_only** : bool, default `False`
   - If `True`, include only float, int, or boolean data in the count. If `False`, include all data types.

---

### Returns

- **Series** : A Series containing the count of non-NA/null entries for each column or row.

---

### See Also

- **Series.count** : Number of non-NA elements in a Series.
- **DataFrame.value_counts** : Count unique combinations of columns.
- **DataFrame.shape** : Number of DataFrame rows and columns (including NA elements).
- **DataFrame.isna** : Boolean DataFrame showing the locations of NA elements.
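
The related functions above fit together in a simple way. As a minimal sketch on made-up data (`df_check` is a hypothetical name), the non-NA count of each column equals the number of rows from `shape` minus the number of NA cells reported by `isna`.

```python
import numpy as np
import pandas as pd

df_check = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis"],
    "Age": [24.0, np.nan, 21.0],
})

n_rows, n_cols = df_check.shape             # counts every cell, NA or not
missing_per_column = df_check.isna().sum()  # NA cells per column

# Non-NA count per column is the row count minus the NA count.
matches = (df_check.count() == n_rows - missing_per_column).all()
print(matches)
```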

---

### Notes

- The following values are considered NA (missing):
  - `None`
  - `NaN` (Not a Number)
  - `NaT` (Not a Time)
  - `pandas.NA` (pandas' missing value indicator).
- If `numeric_only=True`, only numeric columns (float, int, boolean) are included in the count.

---

### Examples

#### Example 1: Counting Non-NA Cells in Columns

```python
import pandas as pd
import numpy as np

# Create a DataFrame
df = pd.DataFrame({
    "Person": ["John", "Myla", "Lewis", "John", "Myla"],
    "Age": [24., np.nan, 21., 33, 26],
    "Single": [False, True, True, True, False]
})

# Count non-NA cells for each column
count_series = df.count()
print(count_series)
```

**Output:**

```
Person    5
Age       4
Single    5
dtype: int64
```

---

#### Example 2: Counting Non-NA Cells in Rows

```python
# Count non-NA cells for each row
count_series = df.count(axis='columns')
print(count_series)
```

**Output:**

```
0    3
1    2
2    3
3    3
4    3
dtype: int64
```

---

#### Example 3: Counting Only Numeric Columns

```python
# Count non-NA cells for numeric columns only
count_series = df.count(numeric_only=True)
print(count_series)
```

**Output:**

```
Age       4
Single    5
dtype: int64
```

---

#### Example 4: Handling Missing Values

```python
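# Introduce more missing values
# (cast "Single" to object first so that a boolean column can hold NaN)
df['Single'] = df['Single'].astype(object)
df.loc[2, 'Person'] = np.nan
df.loc[3, 'Single'] = np.nan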

# Count non-NA cells for each column
count_series = df.count()
print(count_series)
```

**Output:**

```
Person    4
Age       4
Single    4
dtype: int64
```

---

#### Example 5: Counting Non-NA Cells in Rows with Missing Values

```python
# Count non-NA cells for each row
count_series = df.count(axis=1)
print(count_series)
```

**Output:**

```
0    3
1    2
2    2
3    2
4    3
dtype: int64
```

---

### Key Points

- The `count` method is useful for identifying the number of valid (non-missing) entries in a DataFrame.
- By default, it counts non-NA cells for each column (`axis=0`). Use `axis=1` to count non-NA cells for each row.
- The `numeric_only` parameter allows you to restrict the count to numeric columns (float, int, boolean).
- Missing values (`None`, `NaN`, `NaT`, `pandas.NA`) are excluded from the count.

This method is particularly helpful for data cleaning and understanding the completeness of your dataset.


In [None]:
""" pandas.DataFrame.cummax
DataFrame.cummax(axis=None, skipna=True, *args, **kwargs)[source]
Return cumulative maximum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative maximum.

Parameters
:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

skipna
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns
:
Series or DataFrame
Return cumulative maximum of Series or DataFrame.

See also

core.window.expanding.Expanding.max
Similar functionality but ignores NaN values.

DataFrame.max
Return the maximum over DataFrame axis.

DataFrame.cummax
Return cumulative maximum over DataFrame axis.

DataFrame.cummin
Return cumulative minimum over DataFrame axis.

DataFrame.cumsum
Return cumulative sum over DataFrame axis.

DataFrame.cumprod
Return cumulative product over DataFrame axis.

Examples

Series

s = pd.Series([2, np.nan, 5, -1, 0])
s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64
By default, NA values are ignored.

s.cummax()
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64
To include NA values in the operation, use skipna=False

s.cummax(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
DataFrame

df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))
df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
By default, iterates over rows and finds the maximum in each column. This is equivalent to axis=None or axis='index'.

df.cummax()
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0
To iterate over columns and find the maximum in each row, use axis=1

df.cummax(axis=1)
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0  """

The `pandas.DataFrame.cummax()` method calculates the cumulative maximum value along a specified axis of a DataFrame or Series. This function is useful for analyzing time series data or any sequential data where you want to track the highest value seen so far.

### Overview of `DataFrame.cummax()`

### Syntax

```python
DataFrame.cummax(axis=None, skipna=True, *args, **kwargs)
```

### Parameters

- **axis**: `{0 or 'index', 1 or 'columns'}`, default `0`

  - Determines the axis for the operation:
    - `0` or `'index'`: Cumulative maximum is computed column-wise.
    - `1` or `'columns'`: Cumulative maximum is computed row-wise.

- **skipna**: `bool`, default `True`
  - Excludes NA/null values from the calculation. NA positions remain NA in the result, and the running maximum continues from the next valid value. If an entire row or column is NA, the result will be NA.
- **args, kwargs**:
  - Additional arguments are accepted for compatibility with NumPy but have no effect on this method.

### Returns

- **Series or DataFrame**:
  - Returns a Series or DataFrame containing the cumulative maximum values along the specified axis.

### Related Functions

- **DataFrame.max()**: Returns the maximum value over the specified DataFrame axis.
- **DataFrame.cummin()**: Returns cumulative minimum over DataFrame axis.
- **DataFrame.cumsum()**: Returns cumulative sum over DataFrame axis.
- **DataFrame.cumprod()**: Returns cumulative product over DataFrame axis.

### Examples

#### Example 1: Cumulative Maximum on a Series

```python
import pandas as pd
import numpy as np

# Create a Series
s = pd.Series([2, np.nan, 5, -1, 0])
print("Series:")
print(s)

# Compute cumulative maximum
cummax_series = s.cummax()
print("\nCumulative maximum:")
print(cummax_series)
```

**Output:**

```
Series:
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

Cumulative maximum:
0    2.0
1    NaN
2    5.0
3    5.0
4    5.0
dtype: float64
```

#### Example 2: Including NA Values in Cumulative Maximum

```python
# Cumulative maximum with skipna=False
cummax_series_na = s.cummax(skipna=False)
print("\nCumulative maximum including NA values:")
print(cummax_series_na)
```

**Output:**

```
Cumulative maximum including NA values:
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
```

#### Example 3: Cumulative Maximum on a DataFrame

```python
# Create a DataFrame
df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))
print("\nDataFrame:")
print(df)

# Cumulative maximum column-wise (default behavior)
cummax_df = df.cummax()
print("\nCumulative maximum (column-wise):")
print(cummax_df)
```

**Output:**

```
DataFrame:
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

Cumulative maximum (column-wise):
     A    B
0  2.0  1.0
1  3.0  NaN
2  3.0  1.0
```

#### Example 4: Cumulative Maximum Row-wise

```python
# Cumulative maximum row-wise
cummax_df_row = df.cummax(axis=1)
print("\nCumulative maximum (row-wise):")
print(cummax_df_row)
```

**Output:**

```
Cumulative maximum (row-wise):
     A    B
0  2.0  2.0
1  3.0  NaN
2  1.0  1.0
```

### Summary

The `DataFrame.cummax()` method calculates the running maximum along an axis of a DataFrame or Series, which is useful for tracking the highest value seen up to each observation. With `skipna=True` (the default), NA entries stay NA in the result without interrupting the running maximum; with `skipna=False`, every entry from the first NA onward is NA.
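
Because `cummax()` leaves NA positions as NA, a "highest value so far" column with no gaps needs one extra step. The sketch below shows two ways to get it, assuming that carrying the previous maximum across a gap is the behavior you want; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

s_demo = pd.Series([2, np.nan, 5, -1, 0])

running_max = s_demo.cummax()             # NaN stays at position 1
filled = running_max.ffill()              # carry the last maximum across the gap
expanding_max = s_demo.expanding().max()  # expanding window ignores NaN

print(running_max.tolist())
print(filled.tolist())
print(expanding_max.tolist())
```

Both `filled` and `expanding_max` report `2.0` at the position where the input is missing, whereas `running_max` keeps `NaN` there.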


In [None]:
""" pandas.DataFrame.cummin
DataFrame.cummin(axis=None, skipna=True, *args, **kwargs)
Return cumulative minimum over a DataFrame or Series axis.

Returns a DataFrame or Series of the same size containing the cumulative minimum.

Parameters:
axis
{0 or ‘index’, 1 or ‘columns’}, default 0
The index or the name of the axis. 0 is equivalent to None or ‘index’. For Series this parameter is unused and defaults to 0.

skipna
bool, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA.

*args, **kwargs
Additional keywords have no effect but might be accepted for compatibility with NumPy.

Returns:
Series or DataFrame
Return cumulative minimum of Series or DataFrame.

See also

core.window.expanding.Expanding.min
Similar functionality but ignores NaN values.

DataFrame.min
Return the minimum over DataFrame axis.

DataFrame.cummax
Return cumulative maximum over DataFrame axis.

DataFrame.cummin
Return cumulative minimum over DataFrame axis.

DataFrame.cumsum
Return cumulative sum over DataFrame axis.

DataFrame.cumprod
Return cumulative product over DataFrame axis.

Examples

Series

s = pd.Series([2, np.nan, 5, -1, 0])
s
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64
By default, NA values are ignored.

s.cummin()
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64
To include NA values in the operation, use skipna=False

s.cummin(skipna=False)
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
DataFrame

df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))
df
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
By default, iterates over rows and finds the minimum in each column. This is equivalent to axis=None or axis='index'.

df.cummin()
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0
To iterate over columns and find the minimum in each row, use axis=1

df.cummin(axis=1)
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0 """

The `pandas.DataFrame.cummin()` method calculates the cumulative minimum value along a specified axis of a DataFrame or Series. This function is useful in scenarios where you need to track the lowest value encountered up to each point in your data.

### Overview of `DataFrame.cummin()`

### Syntax

```python
DataFrame.cummin(axis=None, skipna=True, *args, **kwargs)
```

### Parameters

- **axis**: `{0 or 'index', 1 or 'columns'}`, default `0`

  - Defines the axis for the operation:
    - `0` or `'index'`: Cumulative minimum is computed column-wise (along each column).
    - `1` or `'columns'`: Cumulative minimum is computed row-wise (along each row).

- **skipna**: `bool`, default `True`

  - Excludes NA/null values from the calculation. NA positions remain NA in the result, and the running minimum continues from the next valid value. If an entire row or column is NA, the result will be NA.

- **args, kwargs**:
  - Additional arguments are accepted for compatibility with NumPy but have no effect on this method.

### Returns

- **Series or DataFrame**:
  - Returns a Series or DataFrame with the cumulative minimum values along the specified axis.

### Related Functions

- **DataFrame.min()**: Returns the minimum value over the specified DataFrame axis.
- **DataFrame.cummax()**: Returns cumulative maximum over DataFrame axis.
- **DataFrame.cumsum()**: Returns cumulative sum over DataFrame axis.
- **DataFrame.cumprod()**: Returns cumulative product over DataFrame axis.

### Examples

#### Example 1: Cumulative Minimum on a Series

```python
import pandas as pd
import numpy as np

# Create a Series
s = pd.Series([2, np.nan, 5, -1, 0])
print("Series:")
print(s)

# Compute cumulative minimum
cummin_series = s.cummin()
print("\nCumulative minimum:")
print(cummin_series)
```

**Output:**

```
Series:
0    2.0
1    NaN
2    5.0
3   -1.0
4    0.0
dtype: float64

Cumulative minimum:
0    2.0
1    NaN
2    2.0
3   -1.0
4   -1.0
dtype: float64
```

#### Example 2: Including NA Values in Cumulative Minimum

```python
# Cumulative minimum with skipna=False
cummin_series_na = s.cummin(skipna=False)
print("\nCumulative minimum including NA values:")
print(cummin_series_na)
```

**Output:**

```
Cumulative minimum including NA values:
0    2.0
1    NaN
2    NaN
3    NaN
4    NaN
dtype: float64
```

#### Example 3: Cumulative Minimum on a DataFrame

```python
# Create a DataFrame
df = pd.DataFrame([[2.0, 1.0],
                   [3.0, np.nan],
                   [1.0, 0.0]],
                  columns=list('AB'))
print("\nDataFrame:")
print(df)

# Cumulative minimum column-wise (default behavior)
cummin_df = df.cummin()
print("\nCumulative minimum (column-wise):")
print(cummin_df)
```

**Output:**

```
DataFrame:
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0

Cumulative minimum (column-wise):
     A    B
0  2.0  1.0
1  2.0  NaN
2  1.0  0.0
```

#### Example 4: Cumulative Minimum Row-wise

```python
# Cumulative minimum row-wise
cummin_df_row = df.cummin(axis=1)
print("\nCumulative minimum (row-wise):")
print(cummin_df_row)
```

**Output:**

```
Cumulative minimum (row-wise):
     A    B
0  2.0  1.0
1  3.0  NaN
2  1.0  0.0
```

### Summary

The `DataFrame.cummin()` method calculates the running minimum along an axis of a DataFrame or Series, which is useful for tracking the lowest value seen up to each observation. With `skipna=True` (the default), NA entries stay NA in the result without interrupting the running minimum; with `skipna=False`, every entry from the first NA onward is NA.
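
One common use of the running minimum is flagging the observations that set a new low. The sketch below uses made-up values with no missing data, and also checks that, for such data, `cummin()` agrees with NumPy's `np.minimum.accumulate`.

```python
import numpy as np
import pandas as pd

# Hypothetical values with no missing entries
values = pd.Series([4.0, 6.0, 3.0, 5.0, 1.0])

lowest_so_far = values.cummin()
is_new_low = values == lowest_so_far  # True where the value equals the running minimum

# For NaN-free data the result matches NumPy's accumulate
numpy_result = np.minimum.accumulate(values.to_numpy())

print(lowest_so_far.tolist())
print(is_new_low.tolist())
```

The comparison marks the first observation and every later one that ties or undercuts all earlier values.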


In [None]:
""" pandas.DataFrame.to_numpy
DataFrame.to_numpy(dtype=None, copy=False, na_value=<no_default>)
Convert the DataFrame to a NumPy array.

By default, the dtype of the returned array will be the common NumPy dtype of all types in the DataFrame. For example, if the dtypes are float16 and float32, the results dtype will be float32. This may require copying data and coercing values, which may be expensive.

Parameters:
dtype
str or numpy.dtype, optional
The dtype to pass to numpy.asarray().

copy
bool, default False
Whether to ensure that the returned value is not a view on another array. Note that copy=False does not ensure that to_numpy() is no-copy. Rather, copy=True ensure that a copy is made, even if not strictly necessary.

na_value
Any, optional
The value to use for missing values. The default value depends on dtype and the dtypes of the DataFrame columns.

Returns:
numpy.ndarray
See also

Series.to_numpy
Similar method for Series.

Examples

pd.DataFrame({"A": [1, 2], "B": [3, 4]}).to_numpy()
array([[1, 3],
       [2, 4]])
With heterogeneous data, the lowest common type will have to be used.

df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
df.to_numpy()
array([[1. , 3. ],
       [2. , 4.5]])
For a mix of numeric and non-numeric types, the output array will have object dtype.

df['C'] = pd.date_range('2000', periods=2)
df.to_numpy()
array([[1, 3.0, Timestamp('2000-01-01 00:00:00')],
       [2, 4.5, Timestamp('2000-01-02 00:00:00')]], dtype=object) """

The `pandas.DataFrame.to_numpy` method converts a DataFrame into a NumPy array. The sections below cover its syntax, parameters, return value, and usage examples.

### Syntax

```python
DataFrame.to_numpy(dtype=None, copy=False, na_value=<no_default>)
```

### Parameters

1. **dtype**: `str` or `numpy.dtype`, optional

   - This parameter allows you to specify the desired data type for the output NumPy array. If not provided, the method will infer the common data type based on the DataFrame's columns.

2. **copy**: `bool`, default `False`

   - This parameter controls whether to ensure that the returned array is not a view on another array.
   - If `copy=True`, a copy of the data will be made, even if it is not strictly necessary.
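   - If `copy=False`, the method may return a view of the data when possible, which avoids the cost of copying. This is not guaranteed: a copy is still made whenever the columns have to be coerced to a common dtype. When a view is returned, the array and the DataFrame share the same memory.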

3. **na_value**: `Any`, optional
   - This specifies the value to use for missing values (NaNs). The default behavior depends on the specified `dtype` and the dtypes of the DataFrame columns.

### Returns

- **numpy.ndarray**: The method returns a NumPy array representation of the DataFrame.

### Examples

#### Basic Example

```python
import pandas as pd

# Creating a simple DataFrame
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Converting to a NumPy array
array_result = df.to_numpy()
print(array_result)
```

**Output:**

```
[[1 3]
 [2 4]]
```

#### With Heterogeneous Data

When the DataFrame contains mixed types, the output will use the lowest common type.

```python
df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
array_result = df.to_numpy()
print(array_result)
```

**Output:**

```
[[1.  3. ]
 [2.  4.5]]
```

#### Mixed Numeric and Non-Numeric Types

If the DataFrame contains a mix of numeric and non-numeric types, the output array will have `object` dtype.

```python
df['C'] = pd.date_range('2000', periods=2)
array_result = df.to_numpy()
print(array_result)
```

**Output:**

```
[[1 3.0 Timestamp('2000-01-01 00:00:00')]
 [2 4.5 Timestamp('2000-01-02 00:00:00')]]
```

#### Specifying a Data Type

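You can specify a particular dtype for the output array. The datetime column `C` added above has no meaningful `float32` representation and is not coerced to `0.0`, so select the numeric columns before converting.

```python
# Convert only the numeric columns; the datetime column 'C' is left out
array_result = df[['A', 'B']].to_numpy(dtype='float32')
print(array_result)
```

**Output:**

```
[[1.  3. ]
 [2.  4.5]]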
```

#### Copy Parameter

If you want to ensure that a copy is made regardless of whether a view could be returned, set `copy=True`.

```python
array_result = df.to_numpy(copy=True)
print(array_result)
```
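
The effect of `copy=True` can be confirmed by modifying the returned array and checking that the DataFrame is unchanged. This is a minimal sketch; `df_copy` and `arr` are illustrative names.

```python
import pandas as pd

df_copy = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# copy=True guarantees the array owns its data
arr = df_copy.to_numpy(copy=True)
arr[0, 0] = 99  # modify the array only

print(arr[0, 0])            # value in the array
print(df_copy.loc[0, "A"])  # value in the DataFrame
```

The array holds the new value while the DataFrame still holds the original one, because the two no longer share memory.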

### Summary

- The `to_numpy()` method is a powerful tool for converting a DataFrame to a NumPy array, allowing for flexible data type handling and control over whether a copy is made.
- It is especially useful when integrating pandas with libraries that require NumPy arrays, such as machine learning libraries.



The `pandas.DataFrame.to_numpy()` method is used to convert a **Pandas DataFrame** into a **NumPy array**. This is particularly useful when you need to perform numerical operations or use libraries that require NumPy arrays as input. Below is a comprehensive explanation of the method, including its parameters, return value, and examples.

---

## **Syntax**

```python
DataFrame.to_numpy(dtype=None, copy=False, na_value=<no_default>)
```

---

## **Parameters**

| Parameter      | Description                                                                                                                                                           |
| -------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **`dtype`**    | `str` or `numpy.dtype`, optional<br>The data type of the resulting NumPy array. If not specified, the common dtype of all columns will be used.                       |
| **`copy`**     | `bool`, default `False`<br>If `True`, ensures that the returned array is a copy of the data. If `False`, the array may be a view on the original data.                |
| **`na_value`** | Any, optional<br>The value to use for missing (`NaN`) values in the DataFrame. If not specified, the default depends on the `dtype` and the DataFrame's column types. |

---

## **Returns**

- **`numpy.ndarray`**: A NumPy array representation of the DataFrame.

---

## **Key Points**

1. **Dtype Inference**:

   - If the DataFrame contains columns with different dtypes, the resulting array will have a **common dtype** that can accommodate all the data.
   - For example:
     - If the DataFrame has `float16` and `float32` columns, the resulting array will have `float32` dtype.
     - If the DataFrame has a mix of numeric and non-numeric types, the resulting array will have `object` dtype.

2. **Copy Behavior**:

   - By default (`copy=False`), the method may return a **view** of the data, meaning changes to the array could affect the original DataFrame.
   - If `copy=True`, a **copy** of the data is always returned, ensuring that changes to the array do not affect the DataFrame.

3. **Handling Missing Values**:
   - The `na_value` parameter allows you to specify a value to replace missing (`NaN`) values in the DataFrame.
   - If not specified, the default behavior depends on the `dtype` of the resulting array.

---

## **Examples**

### **Example 1: Basic Usage**

Convert a DataFrame with homogeneous data (all integers) to a NumPy array.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
array = df.to_numpy()
print(array)
```

**Output**:

```
[[1 3]
 [2 4]]
```

---

### **Example 2: Heterogeneous Data**

Convert a DataFrame with mixed data types (integers and floats) to a NumPy array.

```python
df = pd.DataFrame({"A": [1, 2], "B": [3.0, 4.5]})
array = df.to_numpy()
print(array)
```

**Output**:

```
[[1.  3. ]
 [2.  4.5]]
```

---

### **Example 3: Mixed Numeric and Non-Numeric Data**

Convert a DataFrame with numeric and non-numeric (e.g., datetime) columns to a NumPy array.

```python
df['C'] = pd.date_range('2000', periods=2)
array = df.to_numpy()
print(array)
```

**Output**:

```
[[1 3.0 Timestamp('2000-01-01 00:00:00')]
 [2 4.5 Timestamp('2000-01-02 00:00:00')]]
```

- The resulting array has `dtype=object` because it contains mixed types.

---

### **Example 4: Specifying `dtype`**

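Convert a DataFrame to a NumPy array with a specific data type. The datetime column `C` added in Example 3 has no meaningful `float32` representation, so select the numeric columns first.

```python
# Convert only the numeric columns; the datetime column 'C' is left out
array = df[['A', 'B']].to_numpy(dtype='float32')
print(array)
```

**Output**:

```
[[1.  3. ]
 [2.  4.5]]
```

- Non-numeric columns (e.g., datetime) are not coerced to `0.0`; exclude them before requesting a numeric `dtype`.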

---

### **Example 5: Handling Missing Values**

Convert a DataFrame with missing values to a NumPy array, replacing `NaN` with a specified value.

```python
df = pd.DataFrame({"A": [1, None], "B": [3.0, 4.5]})
array = df.to_numpy(na_value=-1)
print(array)
```

**Output**:

```
[[ 1.   3. ]
 [-1.   4.5]]
```

- The `NaN` value in column `A` is replaced with `-1`.
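
The `na_value` parameter also matters for columns that use pandas' nullable extension dtypes, where missing entries are `pandas.NA` rather than `NaN`. The following is a minimal sketch, assuming a single column with the nullable `Int64` dtype; the frame `df_nullable` is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a nullable integer column containing one missing value
df_nullable = pd.DataFrame({"A": pd.array([1, None], dtype="Int64")})

# Request a float array and state explicitly what missing entries become
as_float = df_nullable.to_numpy(dtype="float64", na_value=np.nan)

print(as_float.dtype)
print(as_float)
```

Passing both `dtype` and `na_value` makes the conversion explicit instead of relying on the dtype-dependent default.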

---

### **Example 6: Ensuring a Copy**

Convert a DataFrame to a NumPy array and ensure the result is a copy.

```python
array = df.to_numpy(copy=True)
```

- This ensures that the returned array is a copy of the data, not a view.

---

## **When to Use `to_numpy()`**

1. **Interoperability with NumPy**: Use `to_numpy()` when you need to pass DataFrame data to a NumPy-based library (e.g., SciPy, Scikit-learn).
2. **Performance**: Numerical code that works directly on a NumPy array avoids the index alignment and label handling that DataFrame operations carry.
3. **Memory Efficiency**: If you no longer need the DataFrame's index or column labels, the array drops that overhead. Note that the conversion itself may copy the data.
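
When a frame mixes numeric and non-numeric columns, one way to get a purely numeric array for a NumPy-based library is to filter the columns first. This sketch uses `select_dtypes(include="number")`; the frame `df_mixed` mirrors the columns used in the examples above but is rebuilt here so the block is self-contained.

```python
import pandas as pd

df_mixed = pd.DataFrame({
    "A": [1, 2],
    "B": [3.0, 4.5],
    "C": pd.date_range("2000", periods=2),
})

# Keep only numeric columns, then convert; int and float coerce to float64
numeric_array = df_mixed.select_dtypes(include="number").to_numpy()
col_means = numeric_array.mean(axis=0)  # plain NumPy from here on

print(numeric_array.dtype)
print(col_means)
```

Filtering first keeps the result from falling back to `object` dtype, which most numerical routines cannot use efficiently.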

---

## **See Also**

- **`Series.to_numpy()`**: Similar method for converting a Pandas Series to a NumPy array.
- **`numpy.asarray()`**: The NumPy function that the `dtype` argument of `to_numpy()` is passed to.

---
