# Pandas Objects
Pandas is one of the most widely used libraries in Python for data analysis. It provides powerful data structures like Series and DataFrames that make it easier to manipulate and analyze data. 

To see how these objects work together, imagine a scenario in which we collect information about several people: their height, age, and location.

In this tutorial, we’ll use three key objects from Pandas:

- Pandas Series: A one-dimensional array-like object with labeled indices.
- Pandas DataFrame: A two-dimensional table (like a spreadsheet or SQL table) where data is aligned into rows and columns, and both rows and columns have labels (indices).
- Pandas Index: A specialized object used for managing row and column labels efficiently, with support for operations like unions and intersections.
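
As a quick orientation, the sketch below builds one of each object and shows how they relate. The names and values (`scores`, `table`, "Ana", "Ben") are made up for illustration and are not part of the tutorial's dataset.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
scores = pd.Series([88, 92], index=["Ana", "Ben"], name="score")

# A DataFrame: a table whose columns share one row index.
# The row index is taken from the Series; the list must have the same length.
table = pd.DataFrame({"score": scores, "passed": [True, True]})

# An Index: the label object behind both the rows and the columns.
row_labels = table.index
column_labels = table.columns

print(type(scores).__name__)      # Series
print(type(table).__name__)       # DataFrame
print(type(row_labels).__name__)  # Index
print(list(row_labels))           # ['Ana', 'Ben']
print(list(column_labels))        # ['score', 'passed']
```

Both `table.index` and `table.columns` are Index objects, which is why the same set operations work on row labels and column labels.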

**Why learn about these pandas objects?** Knowing their properties and behaviors helps you work more effectively, much as knowing how a drill or a saw works helps you use it well.

## Step 1: Creating a Pandas Series
We begin with a Series, a one-dimensional, list-like object in which every value carries a label. For example, let's say we are collecting the heights of four people. Each person's height is stored along with an index (label) for easy access.

In [None]:
# Import Pandas
import pandas as pd

# Create a Series to store heights of people.
# pd.Series(data, index): data first, then the index labels.
heights = pd.Series([5.5, 6.0, 5.8, 6.2], index=['Alice', 'Bob', 'Charlie', 'David'])

# Display the Series
heights

Alice      5.5
Bob        6.0
Charlie    5.8
David      6.2
dtype: float64

In [2]:
# Accessing specific data by index
heights['Alice']

np.float64(5.5)
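
Beyond looking up a single label, a Series supports position-based access, multi-label selection, and element-wise arithmetic. The sketch below rebuilds the same `heights` Series so it runs on its own. The conversion to inches assumes the heights are in feet, which the tutorial does not state.

```python
import pandas as pd

heights = pd.Series([5.5, 6.0, 5.8, 6.2],
                    index=["Alice", "Bob", "Charlie", "David"])

# Label-based and position-based access reach the same element.
by_label = heights.loc["Bob"]
by_position = heights.iloc[1]

# Selecting several labels returns another Series.
subset = heights[["Alice", "David"]]

# Arithmetic is element-wise and keeps the labels.
# Assumption: values are in feet, so multiplying by 12 gives inches.
heights_in_inches = heights * 12

# A comparison yields a boolean mask that can filter the Series.
tall = heights[heights > 5.9]

print(by_label, by_position)  # 6.0 6.0
print(list(subset.index))     # ['Alice', 'David']
print(list(tall.index))       # ['Bob', 'David']
```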

## Step 2: Creating and Working with a DataFrame
Now let's take multiple Series (heights and ages) and combine them into a DataFrame. Each Series becomes a column, and the rows are matched up by their index labels.

In [3]:
# Create another Series (ages)
ages = pd.Series([25, 30, 28, 35], index=['Alice', 'Bob', 'Charlie', 'David'])

# Combine Series into a DataFrame
data = pd.DataFrame({'Height': heights, 'Age': ages})

# Display the DataFrame
data

         Height  Age
Alice       5.5   25
Bob         6.0   30
Charlie     5.8   28
David       6.2   35
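
Combining Series into a DataFrame matches rows by label, not by position. The sketch below illustrates this with two Series whose indexes only partly overlap. "Eve" and her age are hypothetical values added to show the effect.

```python
import pandas as pd

heights = pd.Series([5.5, 6.0, 5.8], index=["Alice", "Bob", "Charlie"])
# Hypothetical data: "Eve" has an age but no height; "Charlie" has no age.
ages = pd.Series([25, 30, 41], index=["Alice", "Bob", "Eve"])

people = pd.DataFrame({"Height": heights, "Age": ages})

# The rows are the union of both indexes; missing values become NaN.
print(people)

# Count the gaps in each column.
print(people.isna().sum())
```

Because NaN is a floating-point value, the Age column is stored as float64 here even though the input ages were integers.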


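## Step 3: Accessing Data Using Index
Use labels to refer to rows and columns in your DataFrame and Series.
- Row labels are stored in the row Index (data.index).
- Column labels (the column names) are stored in data.columns, which is also an Index.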

Tips for Efficient Work:
- Use .loc[] for label-based indexing when you want specific row or column labels.
- Use .iloc[] for position-based indexing when you want rows or columns by their integer position, starting at 0.
- Use Index set operations such as union, intersection, and difference to compare the labels of two datasets.

In [15]:
data.loc["Bob"]

Height     6.0
Age       30.0
Name: Bob, dtype: float64

In [16]:
data.iloc[1]

Height     6.0
Age       30.0
Name: Bob, dtype: float64
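
One difference between the two indexers is easy to miss: slices behave differently. A minimal sketch, rebuilding the same small `data` table so it runs on its own:

```python
import pandas as pd

data = pd.DataFrame(
    {"Height": [5.5, 6.0, 5.8, 6.2], "Age": [25, 30, 28, 35]},
    index=["Alice", "Bob", "Charlie", "David"],
)

# .loc slices by label and INCLUDES the end label.
by_label = data.loc["Alice":"Charlie"]

# .iloc slices by position and EXCLUDES the end position.
by_position = data.iloc[0:2]

# Both indexers accept a row selector and a column selector.
bob_age = data.loc["Bob", "Age"]
same_value = data.iloc[1, 1]

print(list(by_label.index))     # ['Alice', 'Bob', 'Charlie']
print(list(by_position.index))  # ['Alice', 'Bob']
print(bob_age == same_value)    # True
```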

In [8]:
# Accessing a row by label (index)
print(f"About Alice: {data.loc['Alice']}")  # Access by row label

# Accessing a column by its name (label)
print(f"All heights: {data['Height']}")


About Alice: Height     5.5
Age       25.0
Name: Alice, dtype: float64
All heights: Alice      5.5
Bob        6.0
Charlie    5.8
David      6.2
Name: Height, dtype: float64


In [14]:
import pandas as pd

# Sales data index (customer IDs)
sales_index = pd.Index([1, 2, 3, 4, 5])

# Feedback data index (customer IDs)
feedback_index = pd.Index([4, 5, 6, 7, 8])

# Find customers who both bought and gave feedback (intersection)
customers_with_feedback = sales_index.intersection(feedback_index)  # Common customers

# Find all customers who either bought something or left feedback (union)
all_customers = sales_index.union(feedback_index)  # All unique customers

# Find customers who bought something or left feedback, but not both (symmetric difference)
customers_difference = sales_index.symmetric_difference(feedback_index)  # In exactly one index

# Print results
print("Customers with both sales and feedback:", customers_with_feedback)
print("All unique customers:", all_customers)
print(f"Customers with only sales or only feedback: {customers_difference}")


Customers with both sales and feedback: Index([4, 5], dtype='int64')
All unique customers: Index([1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')
Customers with only sales or only feedback: Index([1, 2, 3, 6, 7, 8], dtype='int64')
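
Note that `symmetric_difference` and `difference` are not the same operation. The sketch below contrasts them on the same two indexes; the variable names are chosen for illustration.

```python
import pandas as pd

sales_index = pd.Index([1, 2, 3, 4, 5])
feedback_index = pd.Index([4, 5, 6, 7, 8])

# difference is one-sided: labels in the left index that are not in the right.
bought_no_feedback = sales_index.difference(feedback_index)
feedback_no_purchase = feedback_index.difference(sales_index)

# symmetric_difference is two-sided: labels in exactly one of the indexes.
in_exactly_one = sales_index.symmetric_difference(feedback_index)

print(list(bought_no_feedback))    # [1, 2, 3]
print(list(feedback_no_purchase))  # [6, 7, 8]
print(list(in_exactly_one))        # [1, 2, 3, 6, 7, 8]

# Membership tests work directly on an Index.
print(4 in sales_index)            # True
```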


In [17]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("sakshigoyal7/credit-card-customers")

print("Path to dataset files:", path)

# Load the dataset into a pandas DataFrame
df = pd.read_csv(path + "/BankChurners.csv")  

# Display the first few rows of the dataset
print(df.head(3))

Path to dataset files: /Users/vijaypatha/.cache/kagglehub/datasets/sakshigoyal7/credit-card-customers/versions/1
   CLIENTNUM     Attrition_Flag  Customer_Age Gender  Dependent_count  \
0  768805383  Existing Customer            45      M                3   
1  818770008  Existing Customer            49      F                5   
2  713982108  Existing Customer            51      M                3   

  Education_Level Marital_Status Income_Category Card_Category  \
0     High School        Married     $60K - $80K          Blue   
1        Graduate         Single  Less than $40K          Blue   
2        Graduate        Married    $80K - $120K          Blue   

   Months_on_book  ...  Credit_Limit  Total_Revolving_Bal  Avg_Open_To_Buy  \
0              39  ...       12691.0                  777          11914.0   
1              44  ...        8256.0                  864           7392.0   
2              36  ...        3418.0                    0           3418.0   

   Total_Amt_Chn
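
If you cannot download the dataset, the loading step can still be tried offline. The sketch below reads a small in-memory CSV instead of a file. The two rows reuse values shown above, but the three-column layout is a simplification, not the real file.

```python
import io
import pandas as pd

# A simplified two-row extract (assumption: only three of the real columns).
csv_text = """CLIENTNUM,Customer_Age,Gender
768805383,45,M
818770008,49,F
"""

# read_csv accepts a path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)    # (2, 3)
print(sample.head(1))  # first row only
print(sample.dtypes)   # inferred type of each column
```

Checking `shape` and `dtypes` right after loading is a cheap way to confirm that the file was parsed as expected.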

In [20]:
# Keep only the rows where the customer is older than 40
df[df['Customer_Age'] > 40]

Unnamed: 0,CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Months_on_book,...,Credit_Limit,Total_Revolving_Bal,Avg_Open_To_Buy,Total_Amt_Chng_Q4_Q1,Total_Trans_Amt,Total_Trans_Ct,Total_Ct_Chng_Q4_Q1,Avg_Utilization_Ratio,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2
0,768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,...,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,0.000093,0.999910
1,818770008,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue,44,...,8256.0,864,7392.0,1.541,1291,33,3.714,0.105,0.000057,0.999940
2,713982108,Existing Customer,51,M,3,Graduate,Married,$80K - $120K,Blue,36,...,3418.0,0,3418.0,2.594,1887,20,2.333,0.000,0.000021,0.999980
5,713061558,Existing Customer,44,M,2,Graduate,Married,$40K - $60K,Blue,36,...,4010.0,1247,2763.0,1.376,1088,24,0.846,0.311,0.000055,0.999940
6,810347208,Existing Customer,51,M,4,Unknown,Married,$120K +,Gold,46,...,34516.0,2264,32252.0,1.975,1330,31,0.722,0.066,0.000123,0.999880
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10121,713899383,Existing Customer,56,F,1,Graduate,Single,Less than $40K,Blue,50,...,3688.0,606,3082.0,0.570,14596,120,0.791,0.164,0.000148,0.999850
10122,772366833,Existing Customer,50,M,2,Graduate,Single,$40K - $60K,Blue,40,...,4003.0,1851,2152.0,0.703,15476,117,0.857,0.462,0.000191,0.999810
10123,710638233,Attrited Customer,41,M,2,Unknown,Divorced,$40K - $60K,Blue,25,...,4277.0,2186,2091.0,0.804,8764,69,0.683,0.511,0.995270,0.004729
10124,716506083,Attrited Customer,44,F,1,High School,Married,Less than $40K,Blue,36,...,5409.0,0,5409.0,0.819,10291,60,0.818,0.000,0.997880,0.002118
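
The expression inside the brackets is a boolean Series, one True or False per row, and the DataFrame keeps the rows marked True. A minimal sketch on a hypothetical four-row table, with values invented for illustration:

```python
import pandas as pd

# Hypothetical miniature of the customer table.
customers = pd.DataFrame(
    {"Customer_Age": [45, 38, 51, 40], "Gender": ["M", "F", "M", "F"]}
)

# Step 1: the comparison produces a boolean mask aligned with the rows.
mask = customers["Customer_Age"] > 40
print(mask.tolist())      # [True, False, True, False]

# Step 2: indexing with the mask keeps only the True rows.
older = customers[mask]
print(list(older.index))  # [0, 2]

# Summing a mask counts the matching rows (True counts as 1).
print(int(mask.sum()))    # 2
```

The row with age 40 is excluded because the comparison is strictly greater than.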


In [28]:
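# Combine two conditions with & (and); each condition needs its own parentheses.
# Note: this reassigns data, replacing the small DataFrame from Step 2.
data = df[(df["Income_Category"] == "$60K - $80K") & (df["Marital_Status"] == "Married")]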
data

Unnamed: 0,CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Months_on_book,...,Credit_Limit,Total_Revolving_Bal,Avg_Open_To_Buy,Total_Amt_Chng_Q4_Q1,Total_Trans_Amt,Total_Trans_Ct,Total_Ct_Chng_Q4_Q1,Avg_Utilization_Ratio,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2
0,768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,...,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,0.000093,0.999910
4,709106358,Existing Customer,40,M,3,Uneducated,Married,$60K - $80K,Blue,21,...,4716.0,0,4716.0,2.175,816,28,2.500,0.000,0.000022,0.999980
27,804424383,Existing Customer,63,M,1,Unknown,Married,$60K - $80K,Blue,56,...,10215.0,1010,9205.0,0.843,1904,40,1.000,0.099,0.000186,0.999810
31,712991808,Existing Customer,53,M,2,Uneducated,Married,$60K - $80K,Blue,48,...,2451.0,1690,761.0,1.323,1596,26,1.600,0.690,0.000125,0.999880
32,709029408,Existing Customer,41,M,4,Graduate,Married,$60K - $80K,Blue,36,...,8923.0,2517,6406.0,1.726,1589,24,1.667,0.282,0.000058,0.999940
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10003,719701383,Existing Customer,44,M,4,Unknown,Married,$60K - $80K,Blue,32,...,5175.0,0,5175.0,0.720,14304,99,0.768,0.000,0.000349,0.999650
10022,716832858,Attrited Customer,46,M,3,Graduate,Married,$60K - $80K,Blue,34,...,4930.0,159,4771.0,0.592,7412,60,0.579,0.032,0.997060,0.002941
10041,767348733,Existing Customer,56,M,2,Unknown,Married,$60K - $80K,Blue,49,...,4058.0,793,3265.0,0.758,15865,105,0.667,0.195,0.000333,0.999670
10045,755305683,Existing Customer,45,M,5,Graduate,Married,$60K - $80K,Blue,38,...,8983.0,0,8983.0,0.713,15163,124,0.746,0.000,0.000163,0.999840
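
Conditions can be combined with `&` (and), `|` (or), and `~` (not), and naming each mask often makes the filter easier to read. The sketch below uses a hypothetical four-row table with the same column names.

```python
import pandas as pd

# Hypothetical rows using the dataset's column names.
customers = pd.DataFrame({
    "Income_Category": ["$60K - $80K", "Less than $40K",
                        "$60K - $80K", "$80K - $120K"],
    "Marital_Status": ["Married", "Single", "Single", "Married"],
})

is_mid_income = customers["Income_Category"] == "$60K - $80K"
is_married = customers["Marital_Status"] == "Married"

both = customers[is_mid_income & is_married]    # and
either = customers[is_mid_income | is_married]  # or
not_married = customers[~is_married]            # not

# isin tests membership in a list of allowed values.
upper_bands = customers[
    customers["Income_Category"].isin(["$60K - $80K", "$80K - $120K"])
]

print(list(both.index))         # [0]
print(list(either.index))       # [0, 2, 3]
print(list(not_married.index))  # [1, 2]
print(list(upper_bands.index))  # [0, 2, 3]
```

Python's `and` and `or` keywords do not work element-wise on Series, which is why the operator forms are used here.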


In [33]:
data.iloc[0:3]

Unnamed: 0,CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Months_on_book,...,Credit_Limit,Total_Revolving_Bal,Avg_Open_To_Buy,Total_Amt_Chng_Q4_Q1,Total_Trans_Amt,Total_Trans_Ct,Total_Ct_Chng_Q4_Q1,Avg_Utilization_Ratio,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1,Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2
0,768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,39,...,12691.0,777,11914.0,1.335,1144,42,1.625,0.061,9.3e-05,0.99991
4,709106358,Existing Customer,40,M,3,Uneducated,Married,$60K - $80K,Blue,21,...,4716.0,0,4716.0,2.175,816,28,2.5,0.0,2.2e-05,0.99998
27,804424383,Existing Customer,63,M,1,Unknown,Married,$60K - $80K,Blue,56,...,10215.0,1010,9205.0,0.843,1904,40,1.0,0.099,0.000186,0.99981
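
Filtering keeps the original row labels, which is why the first three rows above are labeled 0, 4, and 27 rather than 0, 1, and 2. The sketch below shows the consequence for `.iloc` and `.loc` on a small hypothetical table.

```python
import pandas as pd

# Hypothetical ages; the default index is 0..4.
df_small = pd.DataFrame({"Customer_Age": [45, 49, 51, 40, 44]})

# Filtering keeps the original labels of the surviving rows.
subset = df_small[df_small["Customer_Age"] < 46]
print(list(subset.index))  # [0, 3, 4]

# .iloc counts positions in the filtered result: position 1 is label 3.
second_row_age = subset.iloc[1]["Customer_Age"]
print(int(second_row_age))  # 40

# Label 1 was filtered out, so subset.loc[1] would raise a KeyError.
print(1 in subset.index)    # False

# reset_index(drop=True) renumbers the rows from 0 if that is preferred.
renumbered = subset.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1, 2]
```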


In [34]:
df.select_dtypes(include=["number"]).columns

Index(['CLIENTNUM', 'Customer_Age', 'Dependent_count', 'Months_on_book',
       'Total_Relationship_Count', 'Months_Inactive_12_mon',
       'Contacts_Count_12_mon', 'Credit_Limit', 'Total_Revolving_Bal',
       'Avg_Open_To_Buy', 'Total_Amt_Chng_Q4_Q1', 'Total_Trans_Amt',
       'Total_Trans_Ct', 'Total_Ct_Chng_Q4_Q1', 'Avg_Utilization_Ratio',
       'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_1',
       'Naive_Bayes_Classifier_Attrition_Flag_Card_Category_Contacts_Count_12_mon_Dependent_count_Education_Level_Months_Inactive_12_mon_2'],
      dtype='object')
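
`select_dtypes` returns a DataFrame, and `.columns` then gives the matching labels as an Index. The sketch below shows both `include` and `exclude` on a hypothetical three-column table.

```python
import pandas as pd

# Hypothetical rows using column names from the dataset.
customers = pd.DataFrame({
    "Customer_Age": [45, 49],
    "Credit_Limit": [12691.0, 8256.0],
    "Gender": ["M", "F"],
})

# "number" matches integer and floating-point columns.
numeric_cols = customers.select_dtypes(include=["number"]).columns
other_cols = customers.select_dtypes(exclude=["number"]).columns

print(list(numeric_cols))  # ['Customer_Age', 'Credit_Limit']
print(list(other_cols))    # ['Gender']

# The resulting Index can select those columns for further work.
print(customers[numeric_cols].mean())
```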

In [35]:
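# filter(items=...) keeps only the labels that exist and silently skips the rest.
# This dataset has no "Age" column (it is named "Customer_Age"), so only Gender is returned.
data.filter(items=["Age", "Gender"])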

      Gender
0          M
4          M
27         M
31         M
32         M
...      ...
10003      M
10022      M
10041      M
10045      M
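
Only the Gender column comes back because `filter(items=...)` ignores labels that do not exist, and this dataset names the column "Customer_Age" rather than "Age". The sketch below shows that behavior, plus the `like` and `regex` options, on a hypothetical two-row table.

```python
import pandas as pd

# Hypothetical rows using column names from the dataset.
customers = pd.DataFrame({
    "Customer_Age": [45, 40],
    "Gender": ["M", "M"],
    "Total_Trans_Amt": [1144, 816],
    "Total_Trans_Ct": [42, 28],
})

# items: exact labels; labels that do not exist are skipped silently.
missing_label = customers.filter(items=["Age", "Gender"])
exact_labels = customers.filter(items=["Customer_Age", "Gender"])

# like: keep labels that contain a substring.
trans_cols = customers.filter(like="Trans")

# regex: keep labels that match a regular expression.
total_cols = customers.filter(regex="^Total_")

print(list(missing_label.columns))  # ['Gender']
print(list(exact_labels.columns))   # ['Customer_Age', 'Gender']
print(list(trans_cols.columns))     # ['Total_Trans_Amt', 'Total_Trans_Ct']
print(list(total_cols.columns))     # ['Total_Trans_Amt', 'Total_Trans_Ct']
```

By contrast, selecting with a list, as in `customers[["Age", "Gender"]]`, raises a KeyError, which can be the safer choice when a misspelled column name should not go unnoticed.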
