# Basics of Pandas with CRUD operations


### 1. Importing pandas 
Pandas is conventionally imported as `pd` for easier usage.

In [None]:
import pandas as pd


### 2. Creating DataFrames
You can create a Series from a list or a dictionary, and a DataFrame from dictionaries or NumPy arrays with specified columns.



Creating a pandas Series from a list:

In [None]:
a = [1, 7, 5]
var = pd.Series(a)
print(var)

Creating your own labels:


In [None]:
myvar = pd.Series(a, index= ["x", "y", "z"])
myvar["y"]

Creating a pandas Series from a dictionary (key: value). The keys become the labels of the Series.

In [None]:
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)


``Data sets in Pandas are usually multi-dimensional tables, called DataFrames.
Series is like a column, a DataFrame is the whole table.``

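A DataFrame can be created from a dictionary in which each key is a column name and each value is a list (or a Series) holding that column's data: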

In [32]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
}
myvar = pd.DataFrame(data)
myvar


|   | Name    | Age |
|---|---------|-----|
| 0 | Alice   | 25  |
| 1 | Bob     | 30  |
| 2 | Charlie | 35  |
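The same table can also be assembled from two Series, since a DataFrame is essentially a collection of Series that share an index. The sketch below is a minimal illustration; the variable names `names`, `ages`, and `people` are chosen here for the example and are not part of the original notebook.

```python
import pandas as pd

# Two Series that share the same default integer index (0, 1, 2)
names = pd.Series(["Alice", "Bob", "Charlie"])
ages = pd.Series([25, 30, 35])

# Each dictionary key becomes a column name; rows are aligned on the index
people = pd.DataFrame({"Name": names, "Age": ages})
print(people)
```

Because the rows are aligned on the index, two Series with different labels would produce missing values wherever one of them has no entry.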


Another example, where we create a new DataFrame from an imported dataset:

In [None]:

from sklearn.datasets import load_iris  # assumption: iris comes from scikit-learn

iris = load_iris()
# Create a DataFrame named df with one column per feature name, filled with iris.data
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Add a new last column holding the class label of each row
df['target'] = iris.target
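If scikit-learn is not available, the same pattern can be tried with a small hand-written NumPy array. This is only a sketch: the numbers and the column names `sepal_length` and `sepal_width` are made up for illustration.

```python
import numpy as np
import pandas as pd

# A 3x2 array: one row per sample, one column per feature (illustrative values)
values = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 3.4],
])

# columns= gives each array column a name
features = pd.DataFrame(values, columns=["sepal_length", "sepal_width"])

# Assigning to a new column name appends that column at the end
features["target"] = [0, 0, 1]
print(features)
```

The list assigned to the new column must have one entry per row, otherwise pandas raises a `ValueError`.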


In [None]:
import matplotlib.pyplot as plt
# The Name/Age table is stored in myvar; df holds the iris data at this point
plt.scatter(myvar['Name'], myvar['Age'])

### 3. Viewing the data 


In [None]:
print(df.to_string())  # Display all rows and columns as plain text
df.head() # Display the first 5 rows of the DataFrame
df.head(15) # Display the first 15 rows of the DataFrame
df.tail() # Display the last 5 rows of the DataFrame
df.tail(15) # Display the last 15 rows of the DataFrame

# ---------------other ways related to viewing---------------------
df.info()      # Summary of the DataFrame: data types, non-null counts, and memory usage
df.describe()  # Count, mean, standard deviation, min, max, and quartiles of numeric columns
df.columns     # Column labels (an attribute, so no parentheses)

# Get dimensions (number of rows and columns) of the DataFrame
print(df.shape)  # Dimensions (rows, columns)

# Get data types of columns
print(df.dtypes)  # Data types of columns

# Get the number of unique values in each column
print(df.nunique())  # Number of unique values in each column

# Check for missing values in the DataFrame
print(df.isnull())  # Boolean table: True wherever a value is missing
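On a large table the boolean output of `isnull()` is hard to read, so it is common to chain `.sum()` to get a count of missing values per column. A minimal sketch, using a small made-up DataFrame named `sample`:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value in each column
sample = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [25, np.nan, 35],
})

# True counts as 1 when summed, so this gives missing values per column
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# shape is a (rows, columns) tuple
print(sample.shape)
```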

### 4. Reading Data From files 
Pandas allows you to read data from CSV, JSON, and Excel files easily.

In [None]:
# Reading data from a CSV file
df = pd.read_csv('data.csv')

# Reading data from a JSON file
df = pd.read_json('data.json')

# Reading data from an Excel file (needs an Excel engine such as openpyxl installed)
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')
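The calls above assume that files named `data.csv`, `data.json`, and `data.xlsx` exist in the working directory. To try `read_csv` without any file on disk, the CSV text can be wrapped in `io.StringIO`, which pandas treats like an open file. The sketch below uses made-up contents.

```python
import io

import pandas as pd

# CSV content held in a string instead of a file (illustrative data)
csv_text = "Name,Age\nAlice,25\nBob,30\nCharlie,35\n"

# read_csv accepts a path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))
print(people.head())
```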


### Update data
Update data using the following commands:

In [None]:
# Example: Updating a specific cell in the DataFrame
df.loc[0, 'Age'] = 26

# Example: Adding or updating an entire column (the list needs one value per row)
df['Salary'] = [50000, 60000, 55000, 62000]

# Example: Replacing values in the DataFrame
df.replace('Alice', 'Alicia', inplace=True)

# Example: Renaming columns in the DataFrame
df.rename(columns={'Name': 'Full Name', 'Age': 'Years'}, inplace=True)
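The update commands can be tried end to end on a small table. The sketch below rebuilds the three-row Name/Age example under the hypothetical name `people` and reassigns the results of `replace` and `rename` instead of using `inplace=True`; both styles give the same final table.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Update one cell: row label 0, column 'Age'
people.loc[0, "Age"] = 26

# Add a column; the list has one value per row (three rows here)
people["Salary"] = [50000, 60000, 55000]

# Replace a value everywhere it occurs, then rename two columns
people = people.replace("Alice", "Alicia")
people = people.rename(columns={"Name": "Full Name", "Age": "Years"})
print(people)
```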


### Delete operation in pandas


In [None]:
# Example: Removing rows by index df.drop(index)
df.drop(2, inplace=True)

# Example: Removing columns by name df.drop(columns)
df.drop('City', axis=1, inplace=True)

# Example: Removing duplicate rows df.drop_duplicates()
df.drop_duplicates(subset='Name', keep='first', inplace=True)

# Example: Removing rows with missing values  df.dropna()
df.dropna(inplace=True)

# Remove columns with missing values df.dropna(axis=1)
df.dropna(axis=1, inplace=True)


We use `inplace=True` to modify the original DataFrame directly instead of returning a new DataFrame with the changes applied.
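The difference is easiest to see side by side. In this small sketch (the `people` table and its `City` column are made up for illustration), the first call leaves the original untouched, while the second changes it and returns `None`.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob"],
    "City": ["Paris", "Oslo", "Oslo"],
})

# Without inplace, drop() returns a new DataFrame and leaves the original alone
without_city = people.drop(columns="City")
print(people.columns.tolist())        # the original still has both columns
print(without_city.columns.tolist())

# With inplace=True the original is modified and the call returns None
result = people.drop_duplicates(inplace=True)
print(result)
print(len(people))                    # the duplicate 'Bob' row is gone
```

Because an `inplace=True` call returns `None`, writing `df = df.drop(..., inplace=True)` would overwrite `df` with `None`, which is a common mistake.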

### 5. Data Selection and indexing
You can select and index data using column names, integer-based indexing (iloc), and label-based indexing (loc).

In [None]:
# Selecting a single column
col = df['ColumnName']

# Using iloc for integer-based indexing
row = df.iloc[0]
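The difference between `iloc` and `loc` only becomes visible when the index labels are not simply 0, 1, 2. The following sketch uses a made-up table with string labels `a`, `b`, `c` to show both side by side.

```python
import pandas as pd

people = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=["a", "b", "c"],
)

# iloc selects by integer position
first_row = people.iloc[0]
first_two = people.iloc[:2]          # positions 0 and 1 (end is exclusive)

# loc selects by label
row_b = people.loc["b"]
age_b = people.loc["b", "Age"]       # a single cell: row label, column label
subset = people.loc[["a", "c"], ["Name"]]

print(first_row)
print(age_b)
print(subset)
```

Note that label slices with `loc` (for example `people.loc["a":"b"]`) include the end label, unlike position slices with `iloc`.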


### 6. Filtering and Querying Data with conditions

In [None]:
# Filtering rows based on a condition
filtered_df = df[df['Age'] > 25]

# Querying data using the query() method
query_result = df.query('Age > 25')
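Conditions can be combined with `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch on the three-row example table, here named `people`:

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Single condition: the comparison yields a boolean Series used as a row mask
over_25 = people[people["Age"] > 25]

# Two conditions combined with & (both must hold)
between = people[(people["Age"] > 25) & (people["Age"] < 35)]

# The same filter written as a query string
between_query = people.query("Age > 25 and Age < 35")

print(over_25)
print(between)
```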


### 7. Handling Missing Data
You can handle missing data either by removing the rows that contain missing values or by filling them with specific values.

In [None]:
# Removing rows with missing values
df = df.dropna()

# Filling missing values with a specific value (value is a placeholder, e.g. 0)
df['Column'] = df['Column'].fillna(value)
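Both strategies can be compared on a small table. In this sketch the `scores` data is made up, and filling with the column mean is just one possible choice of fill value.

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [90.0, np.nan, 70.0],
})

# Option 1: drop every row that contains a missing value
dropped = scores.dropna()

# Option 2: keep all rows and fill the gap, here with the column mean
filled = scores.copy()
filled["Score"] = filled["Score"].fillna(filled["Score"].mean())

print(dropped)
print(filled)
```

Dropping is simple but loses Bob's row entirely; filling keeps the row at the cost of inserting an estimated value.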


### 8. Grouping and Aggregation 
Group data based on a column and perform aggregation operations. 

In [None]:
# Grouping data by a column
grouped = df.groupby('Category')

# Applying aggregation functions
agg_result = grouped['Value'].agg(['mean', 'sum', 'count'])
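To make the result concrete, here is a small runnable sketch. The `sales` table with its `Category` and `Value` columns is invented to match the column names used above.

```python
import pandas as pd

sales = pd.DataFrame({
    "Category": ["A", "B", "A", "B", "A"],
    "Value": [10, 20, 30, 40, 50],
})

# Split the rows by Category, then compute three statistics per group
agg_result = sales.groupby("Category")["Value"].agg(["mean", "sum", "count"])
print(agg_result)
```

The result has one row per category and one column per aggregation function, so `agg_result.loc["A", "sum"]` reads off the total for category A.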


### 9. Data Manipulation
Merge DataFrames, concatenate them, and pivot data for reshaping.

In [None]:
# Merging DataFrames
merged_df = pd.merge(df1, df2, on='KeyColumn')

# Concatenating DataFrames vertically
concatenated_df = pd.concat([df1, df2])

# Pivoting data
pivot_table = df.pivot_table(index='IndexColumn', columns='Column1', values='Column2', aggfunc='mean')
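The three operations behave quite differently, which is easier to see on tiny inputs. The sketch below defines illustrative `df1`, `df2`, and `long_df` tables; all of their contents are made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"KeyColumn": [1, 2, 3], "Name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"KeyColumn": [2, 3, 4], "City": ["Oslo", "Paris", "Rome"]})

# merge: an inner join by default, so only keys present in both tables survive
merged_df = pd.merge(df1, df2, on="KeyColumn")

# concat: stacks rows; columns missing from one table are filled with NaN
concatenated_df = pd.concat([df1, df2], ignore_index=True)

# pivot_table: turns the values of one column into new column headers
long_df = pd.DataFrame({
    "Day": ["Mon", "Mon", "Tue", "Tue"],
    "Item": ["tea", "coffee", "tea", "coffee"],
    "Sold": [3, 5, 4, 6],
})
wide_df = long_df.pivot_table(index="Day", columns="Item", values="Sold", aggfunc="mean")

print(merged_df)
print(concatenated_df)
print(wide_df)
```

Passing `how="left"`, `how="right"`, or `how="outer"` to `pd.merge` keeps the unmatched keys as well.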


### 10. Data visualization
We can use libraries such as Matplotlib and seaborn to plot or visualize the data.
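As a minimal sketch, assuming Matplotlib is installed, the Name/Age example can be drawn as a bar chart. The table is rebuilt here under the hypothetical name `people` so the cell runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# One bar per person: names on the x-axis, ages as bar heights
fig, ax = plt.subplots()
ax.bar(people["Name"], people["Age"])
ax.set_xlabel("Name")
ax.set_ylabel("Age")
ax.set_title("Age by person")
plt.show()
```

In a notebook the figure usually appears below the cell even without `plt.show()`; in a plain script the call is needed to open the plot window.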
