This guide will walk you through the essentials of using Pandas, the most popular data analysis library in Python — and take it a step further with more powerful filtering and CRUD-style operations.
Pandas is an open-source Python library designed for data manipulation and analysis. It makes working with structured data fast, flexible, and intuitive — especially if you're dealing with CSV files, Excel sheets, SQL tables, JSON, or APIs.
The two core data structures in Pandas are:
Series
: A 1D labeled array (like a column)DataFrame
: A 2D labeled data structure (like a full spreadsheet or SQL table)
A DataFrame is a two-dimensional, tabular data structure with labeled rows and columns. Think of it as a Python-native spreadsheet — but way more powerful and programmable.
- Label-based indexing
- Column-wise and row-wise operations
- Support for mixed data types
- Fast, vectorized operations (built on NumPy)
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'Country': ['USA', 'Canada', 'UK']
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age Country
0 Alice 25 USA
1 Bob 30 Canada
2 Charlie 35 UK
df = pd.read_csv('data.csv') # Load from CSV
df = pd.read_excel('data.xlsx') # Load from Excel
df = pd.DataFrame(data_dict) # Load from dictionary
df.head() # First 5 rows
df.tail() # Last 5 rows
df.info() # Column types and non-null values
df.describe() # Summary stats
df.columns # Column names
df.index # Row indices
df['Age'] # Single column
df[['Name', 'Age']] # Multiple columns
df.iloc[0] # First row by position
df.loc[0] # First row by label
df[df['Age'] > 30] # Age over 30
df[df['Country'] == 'USA'] # Country match
df[(df['Age'] > 25) & (df['Country'] == 'UK')] # Multiple conditions
df[df['Name'].str.startswith('A')] # Names starting with A
df[df['Country'].isin(['USA', 'UK'])] # Match multiple values
df[~df['Country'].isin(['Canada'])] # NOT condition
new_row = pd.DataFrame([{'Name': 'Diana', 'Age': 28, 'Country': 'Germany'}])
df = pd.concat([df, new_row], ignore_index=True)
df.loc[df['Name'] == 'Alice']
df.iloc[2]
df.loc[df['Name'] == 'Bob', 'Age'] = 31 # Update Bob's age
df['Country'] = df['Country'].str.upper() # Convert all country names to uppercase
df = df.drop(2, axis=0) # Drop row with index 2
df = df[df['Name'] != 'Alice'] # Drop rows where name == Alice
df = df.drop('Age', axis=1) # Drop 'Age' column
df.isnull().sum() # Count missing values
df.dropna(inplace=True) # Drop rows with missing values
df.fillna({'Age': 0}, inplace=True) # Fill missing Age values with 0
df.rename(columns={'Age': 'Years'}, inplace=True) # Rename column
df['Years'] = df['Years'].astype(float) # Convert column to float
df['Country'].value_counts() # Count occurrences
df.groupby('Country')['Age'].mean() # Average age per country
df.sort_values(by='Age', ascending=False) # Sort by Age descending
df.to_csv('cleaned_data.csv', index=False)
df.to_excel('output.xlsx', index=False)