## DataFrame
The DataFrame is one of the central data structures in Pandas. It is a
two-dimensional table with rows and columns, similar to a spreadsheet or a SQL table.
Each column in a DataFrame can have a different data type, making it suitable for
heterogeneous and structured data.

## Series
A Series is a one-dimensional array-like object in Pandas. It can be thought of as
a single column of data within a DataFrame, with an associated index. Series are used
for representing and working with one-dimensional data.

In [1]:
import pandas as pd
print(pd.__version__)

2.2.2



In [2]:
import pandas as pd 
data =[10,20,30,40,50]
ser = pd.Series(data)
print(ser)
type(ser)

0    10
1    20
2    30
3    40
4    50
dtype: int64


pandas.core.series.Series

## Example: Monthly Sales Data
The `index` parameter assigns a label to each value, and the `name` parameter labels the Series as a whole.


In [3]:
import pandas as pd
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September',
'October', 'November', 'December']

sales_data = [12000, 13500, 14200, 12800, 14000, 15500, 16200, 15800, 16500,
17800, 18500, 17200]

sales_series = pd.Series(sales_data, index=months, name='Monthly Sales (USD)')
# Display the Series
print(sales_series)

January      12000
February     13500
March        14200
April        12800
May          14000
June         15500
July         16200
August       15800
September    16500
October      17800
November     18500
December     17200
Name: Monthly Sales (USD), dtype: int64
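
Because the Series carries month labels as its index, values can be looked up by label rather than by position. The sketch below illustrates this on a shortened, three-month version of the data; the variable names are illustrative and not part of the original example.

```python
import pandas as pd

# Shortened, illustrative version of the monthly sales data
sales = pd.Series(
    [12000, 13500, 14200],
    index=["January", "February", "March"],
    name="Monthly Sales (USD)",
)

march_sales = sales["March"]                  # label-based lookup of one value
first_two = sales.loc["January":"February"]   # label slices include both endpoints
best_month = sales.idxmax()                   # index label of the largest value

print(march_sales)
print(first_two)
print(best_month)
```

Note that, unlike positional slicing, a label slice with `loc` includes its end label.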


The `dtype` parameter sets the data type of the values stored in the Series.

In [4]:
import pandas as pd
months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September',
'October', 'November', 'December']

sales_data = [12000, 13500, 14200, 12800, 14000, 15500, 16200, 15800, 16500,
17800, 18500, 17200]

sales_series = pd.Series(sales_data, index=months, name='Monthly Sales (USD)',dtype=int )
# Display the Series
print(sales_series)

January      12000
February     13500
March        14200
April        12800
May          14000
June         15500
July         16200
August       15800
September    16500
October      17800
November     18500
December     17200
Name: Monthly Sales (USD), dtype: int32
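
The `int32` in the output above most likely comes from passing Python's `int`, which is mapped to the platform's default integer type; on other setups the same code may report `int64`. A sketch of requesting a fixed width explicitly, and of converting an existing Series with `astype`, might look like this (the sample values are illustrative):

```python
import pandas as pd

values = [12000, 13500, 14200]

as_int64 = pd.Series(values, dtype="int64")     # fixed-width integer, independent of platform
as_float = pd.Series(values, dtype="float64")   # the same numbers stored as floats
converted = as_int64.astype("int32")            # change the dtype of an existing Series

print(as_int64.dtype, as_float.dtype, converted.dtype)
```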


## DataFrame
A Pandas DataFrame is a two-dimensional, tabular data structure that can hold data of
various types, including numerical, string, boolean, or other data types. DataFrames are
one of the core data structures in the Pandas library for Python and are widely used for
data manipulation and analysis tasks.

### What is the difference between a Series and a DataFrame?
Series is designed for one-dimensional data, while DataFrame is designed for
two-dimensional tabular data.
DataFrames are more versatile and are commonly used for most data analysis tasks.
Series, on the other hand, are useful when you're primarily working with
single-variable data or when you need to extract a specific column from a DataFrame.
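
To make the relationship concrete, the sketch below selects one column from a small DataFrame: a single column name returns a Series, while a list of column names returns a DataFrame. The sample data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

age_series = df["Age"]     # one column name -> Series (one-dimensional)
age_frame = df[["Age"]]    # list of column names -> DataFrame (two-dimensional)

print(type(age_series).__name__, age_series.ndim)
print(type(age_frame).__name__, age_frame.ndim)
```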

Create, insert, update, and delete operations on a DataFrame:

In [5]:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}


df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   28      Houston


Adding a new column:

In [6]:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28]}
df = pd.DataFrame(data)
print("Before Adding Column")
print(df)
print()
# Add a new column "City" with values
df['City'] = ['New York', 'Los Angeles', 'Chicago', 'Houston']
print("After Adding Column")
print(df)

Before Adding Column
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
3    David   28

After Adding Column
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   28      Houston
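
A new column does not have to come from a hand-written list. As a sketch, the example below computes a hypothetical `AgeIn5Years` column from `Age` with `assign`, which returns a new DataFrame, and fills a hypothetical `Country` column by assigning a single value that is repeated for every row.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie", "David"],
                   "Age": [25, 30, 35, 28]})

# assign() returns a copy with the extra column; df itself is not modified
with_future_age = df.assign(AgeIn5Years=df["Age"] + 5)

# A single scalar is broadcast to every row (illustrative value)
df["Country"] = "USA"

print(with_future_age)
print(df)
```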


Adding multiple columns:

In [7]:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)

print("Initial DataFrame:")
print(df)
# Create data for the new columns
new_columns = {
'Gender': ['Female', 'Male', 'Male', 'Male'],
'Grade': ['A', 'B', 'C', 'B']
}
# Add the new columns to the DataFrame
df['Gender'] = new_columns['Gender']
df['Grade'] = new_columns['Grade']
# Display the DataFrame with the new columns
print("\nDataFrame after adding new columns:")
print(df)

Initial DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   28      Houston

DataFrame after adding new columns:
      Name  Age         City  Gender Grade
0    Alice   25     New York  Female     A
1      Bob   30  Los Angeles    Male     B
2  Charlie   35      Chicago    Male     C
3    David   28      Houston    Male     B


Adding a column at a specific index:

In [8]:
import pandas as pd
# Create an initial DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
# Display the initial DataFrame
print("Initial DataFrame:")
print(df)
# Create data for the new column
new_column = ['Female', 'Male', 'Male', 'Male']
# Define the name and index where you want to insert the new column
new_column_name = 'Gender'
insert_index = 1 # Insert as the second column (index 1)
# Insert the new column at the specified location
df.insert(insert_index, new_column_name, new_column)
# Display the DataFrame with the new column
print("\nDataFrame after adding the new column:")
print(df)

Initial DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   28      Houston

DataFrame after adding the new column:
      Name  Gender  Age         City
0    Alice  Female   25     New York
1      Bob    Male   30  Los Angeles
2  Charlie    Male   35      Chicago
3    David    Male   28      Houston
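
Instead of hard-coding the position, it can be derived from the name of a neighbouring column. The sketch below places `Gender` directly after `Age` and also shows that `insert` raises a `ValueError` when the column name already exists. The two-row data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"],
                   "Age": [25, 30],
                   "City": ["New York", "Los Angeles"]})

# Find the position of "Age" and insert right after it
position = df.columns.get_loc("Age") + 1
df.insert(position, "Gender", ["Female", "Male"])
print(list(df.columns))

# insert() refuses to add a column name that is already present
try:
    df.insert(0, "Gender", ["F", "M"])
    duplicate_rejected = False
except ValueError as exc:
    duplicate_rejected = True
    print("Insert failed:", exc)
```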


Combining column assignment with `insert`:

In [9]:
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)

print("Initial DataFrame:")
print(df)

new_columns = {
'Gender': ['Female', 'Male', 'Male', 'Male'],
'Grade': ['A', 'B', 'C', 'B']
}

df['Gender'] = new_columns['Gender']
df['Grade'] = new_columns['Grade']
 
df.insert(1,"marks",[88,99,67,88])

print("\nDataFrame after adding new columns:")
print(df)

Initial DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   28      Houston

DataFrame after adding new columns:
      Name  marks  Age         City  Gender Grade
0    Alice     88   25     New York  Female     A
1      Bob     99   30  Los Angeles    Male     B
2  Charlie     67   35      Chicago    Male     C
3    David     88   28      Houston    Male     B


Adding a New Row:

In [10]:
import pandas as pd

info = {'name': ['a', 'b', 'C', 'D'],
'age': [20, 30, 40, 50],
'city': ['noida', 'ghaziabad', 'delhi', 'cp']}
df = pd.DataFrame(info)
print("initial Dataframe:")
print(df)

# new row (keys must match the existing column names exactly, including case)
new_row_data = pd.DataFrame([{'name': 'E', 'age': 24, 'city': 'ujjain'}])
print()
print(new_row_data)

#adding a row at the end of the DataFrame
df = pd.concat([df, new_row_data], ignore_index=True)

print("\nDataframe after adding a new row:")
print(df)

initial Dataframe:
  name  age       city
0    a   20      noida
1    b   30  ghaziabad
2    C   40      delhi
3    D   50         cp

  name  age    city
0    E   24  ujjain

Dataframe after adding a new row:
  name  age       city
0    a   20      noida
1    b   30  ghaziabad
2    C   40      delhi
3    D   50         cp
4    E   24     ujjain
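
`pd.concat` matches data by column label, and the match is case-sensitive, so a row whose keys differ from the existing column names ends up in separate columns padded with `NaN`. A small sketch of checking the labels before concatenating follows; the check is an illustrative suggestion, not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "age": [20, 30]})

matching_row = pd.DataFrame([{"name": "E", "age": 24}])
mismatched_row = pd.DataFrame([{"Name": "E", "Age": 24}])  # capitalised keys

# Compare column labels before concatenating
labels_match = set(matching_row.columns) == set(df.columns)
labels_mismatch = set(mismatched_row.columns) != set(df.columns)
print(labels_match, labels_mismatch)

good = pd.concat([df, matching_row], ignore_index=True)
bad = pd.concat([df, mismatched_row], ignore_index=True)

print(good.shape)  # the row is appended under the existing columns
print(bad.shape)   # extra columns appear and the gaps are filled with NaN
```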


Adding multiple new rows:

In [11]:
import pandas as pd
# Create an initial DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 28],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
# Display the initial DataFrame
print("Initial DataFrame:")
print(df)
# Create data for the new rows
new_rows = pd.DataFrame([
{'Name': 'Eve', 'Age': 24, 'City': 'San Francisco'},
{'Name': 'Frank', 'Age': 32, 'City': 'Miami'},
{'Name': 'Grace', 'Age': 29, 'City': 'Seattle'}
])
df = pd.concat([df, new_rows], ignore_index=True)
# Display the DataFrame with the new rows
print("\nDataFrame after adding new rows:")
print(df)

Initial DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   28      Houston

DataFrame after adding new rows:
      Name  Age           City
0    Alice   25       New York
1      Bob   30    Los Angeles
2  Charlie   35        Chicago
3    David   28        Houston
4      Eve   24  San Francisco
5    Frank   32          Miami
6    Grace   29        Seattle
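
When rows arrive one at a time, a common pattern is to collect them in a plain Python list and build the extra DataFrame once, rather than calling `pd.concat` inside a loop. A minimal sketch, with illustrative data and variable names:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

incoming = [("Eve", 24), ("Frank", 32), ("Grace", 29)]

# Collect the rows as dictionaries first
rows = []
for name, age in incoming:
    rows.append({"Name": name, "Age": age})

# A single concat call appends all collected rows
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)
print(df)
```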


Adding a new row at a specific index:

In [12]:
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 28],
    "City": ["New York", "Los Angeles", "Chicago", "Houston"],
}
df = pd.DataFrame(data)

print("Initial DataFrame:")
print(df)

new_row = {"Name": "Eve", "Age": 24, "City": "San Francisco"}

# Position at which the new row should appear (0-based)
insert_index = 2

# Assigning with df.loc[2] would overwrite the existing row 2, so split the
# DataFrame at the target position and concatenate the pieces around the new row
df = pd.concat(
    [df.iloc[:insert_index], pd.DataFrame([new_row]), df.iloc[insert_index:]],
    ignore_index=True,
)

print("\nDataFrame after adding a new row:")
print(df)

Initial DataFrame:
      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3    David   28      Houston

DataFrame after adding a new row:
      Name  Age           City
0    Alice   25       New York
1      Bob   30    Los Angeles
2      Eve   24  San Francisco
3  Charlie   35        Chicago
4    David   28        Houston


Renaming a single column:

In [13]:
import pandas as pd
fruit_data = {"Fruit": ['Apple','Avacado','Banana','Strawberry','Grape'],"Color":
['Red','Green','Yellow','Pink','Green'],
"Price": [45, 90, 60, 37, 49]
}
#Dataframe
data = pd.DataFrame(fruit_data)
print("Before Updating Column")
print(data)
#update the column name
data=data.rename(columns = {'Fruit':'Fruit Name'})
print("After Updating Column")
print(data)

Before Updating Column
        Fruit   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49
After Updating Column
   Fruit Name   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49


Renaming multiple columns:

In [14]:
import pandas as pd
fruit_data = {"Fruit": ['Apple','Avacado','Banana','Strawberry','Grape'],"Color":
['Red','Green','Yellow','Pink','Green'],
"Price": [45, 90, 60, 37, 49]
}
#Dataframe
data = pd.DataFrame(fruit_data)
print("Before Updating Column")
print(data)
#update multiple column names (each key must match an existing column; unmatched keys are silently ignored)
data=data.rename(columns = {'Fruit':'Fruit Name','Price':'Cost'})
print("After Updating Column")
print(data)

Before Updating Column
        Fruit   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49
After Updating Column
   Fruit Name   Color  Cost
0       Apple     Red    45
1     Avacado   Green    90
2      Banana  Yellow    60
3  Strawberry    Pink    37
4       Grape   Green    49
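
`rename` silently skips keys that do not match any column, which can hide a misspelled name. Passing `errors="raise"` makes pandas report such keys instead. In the short sketch below, `Colour` is deliberately misspelled and `Shade` is a hypothetical target name.

```python
import pandas as pd

data = pd.DataFrame({"Fruit": ["Apple", "Banana"],
                     "Color": ["Red", "Yellow"],
                     "Price": [45, 60]})

# Default behaviour: the unknown key "Colour" is skipped without warning
renamed = data.rename(columns={"Fruit": "Fruit Name", "Colour": "Shade"})
print(list(renamed.columns))

# With errors="raise" the unknown key triggers a KeyError
try:
    data.rename(columns={"Colour": "Shade"}, errors="raise")
    raised = False
except KeyError as exc:
    raised = True
    print("Rename failed:", exc)
```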


Updating the case of the column names

In [15]:
import pandas as pd
fruit_data = {"Fruit": ['Apple','Avacado','Banana','Strawberry','Grape'],"Color":
['Red','Green','Yellow','Pink','Green'],
"Price": [45, 90, 60, 37, 49]
}
#Dataframe
data = pd.DataFrame(fruit_data)
print("Before Updating Column")
print(data)
print()
#lower case
data.columns=data.columns.str.lower()
print("After Updating Column")
print(data)

Before Updating Column
        Fruit   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49

After Updating Column
        fruit   color  price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49


## Updating Row Values

You can use the pandas `loc` indexer to select rows by their index label.

In [16]:
import pandas as pd
fruit_data = {"Fruit": ['Apple','Avacado','Banana','Strawberry','Grape'],"Color":
['Red','Green','Yellow','Pink','Green'],
"Price": [45, 90, 60, 37, 49]
}
#Dataframe
data = pd.DataFrame(fruit_data)
print("Original Dataset")
print(data)
print("Display 4th row Value")
print(data.loc[3]) # fetch the data of row at index 3

Original Dataset
        Fruit   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49
Display 4th row Value
Fruit    Strawberry
Color          Pink
Price            37
Name: 3, dtype: object


We have located the fourth row (index label 3), which holds the details of the fruit
Strawberry. Now we replace this row with a new fruit, PineApple, and its details.

In [17]:
import pandas as pd
fruit_data = {"Fruit": ['Apple','Avacado','Banana','Strawberry','Grape'],"Color":
['Red','Green','Yellow','Pink','Green'],
"Price": [45, 90, 60, 37, 49]
}
#Dataframe
data = pd.DataFrame(fruit_data)
print("Original Dataset")
print(data)
#update
data.loc[3] = ['PineApple','Yellow',48]
print("After Updating Values")
print(data)

Original Dataset
        Fruit   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49
After Updating Values
       Fruit   Color  Price
0      Apple     Red     45
1    Avacado   Green     90
2     Banana  Yellow     60
3  PineApple  Yellow     48
4      Grape   Green     49


You can also update only some of the values in a row rather than the entire row. Consider
the DataFrame of fruit details below.

In [18]:
import pandas as pd
fruit_data = {"Fruit": ['Apple','Avacado','Banana','Strawberry','Grape'],"Color":
['Red','Green','Yellow','Pink','Green'],
"Price": [45, 90, 60, 37, 49]
}
#Dataframe
data = pd.DataFrame(fruit_data)
print(data.loc[3, ['Price']])

Price    37
Name: 3, dtype: object


In [19]:
import pandas as pd
fruit_data = {"Fruit": ['Apple','Avacado','Banana','Strawberry','Grape'],"Color":
['Red','Green','Yellow','Pink','Green'],
"Price": [45, 90, 60, 37, 49]
}
#Dataframe
data = pd.DataFrame(fruit_data)
print("Original Dataset")
print(data)
#updating
data.loc[3, ['Price']] = [65]
print("After Updating Values")
print(data)

Original Dataset
        Fruit   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49
After Updating Values
        Fruit   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     65
4       Grape   Green     49
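
Passing the column as a list (`['Price']`) returns a one-element Series, which is why the earlier output shows `dtype: object`. Passing a plain column name returns the cell value itself. The sketch below contrasts the two and also uses `at`, which is intended for single-cell access. The two-row data is illustrative.

```python
import pandas as pd

data = pd.DataFrame({"Fruit": ["Apple", "Strawberry"],
                     "Price": [45, 37]})

as_series = data.loc[1, ["Price"]]   # list of columns -> Series
as_scalar = data.loc[1, "Price"]     # single column name -> scalar value

data.loc[1, "Price"] = 65            # update one cell by row and column label
data.at[0, "Price"] = 50             # .at reads or writes exactly one cell

print(type(as_series).__name__, as_scalar)
print(data)
```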


The same steps combined in one cell:


In [20]:
import pandas as pd
fruit_data = {"Fruit": ['Apple','Avacado','Banana','Strawberry','Grape'],"Color":
['Red','Green','Yellow','Pink','Green'],
"Price": [45, 90, 60, 37, 49]
}

#Dataframe
data = pd.DataFrame(fruit_data)
print("Original Dataset")
print(data)
print()

data.columns=data.columns.str.lower()
print("After Updating Column")
print(data)
print()

print("Display 4th row Value")
print(data.loc[3])
data.loc[3] = ['PineApple','Yellow',48]
print("After Updating Values")
print(data)
print()

print(data.loc[3, ['price']])

Original Dataset
        Fruit   Color  Price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49

After Updating Column
        fruit   color  price
0       Apple     Red     45
1     Avacado   Green     90
2      Banana  Yellow     60
3  Strawberry    Pink     37
4       Grape   Green     49

Display 4th row Value
fruit    Strawberry
color          Pink
price            37
Name: 3, dtype: object
After Updating Values
       fruit   color  price
0      Apple     Red     45
1    Avacado   Green     90
2     Banana  Yellow     60
3  PineApple  Yellow     48
4      Grape   Green     49

price    48
Name: 3, dtype: object


Updating rows and columns based on a condition:

In [21]:
import pandas as pd
student_data = {"name": ['A','B','C','D','E'],"section":
['a','b','a','a','b'],
"marks": [45, 90, 65, 37, 49]
}
#Dataframe
data = pd.DataFrame(student_data)
print("Original Dataset")
print(data)
#Updating: add a Remarks column; <= ensures a mark of exactly 60 is also labelled
data.loc[data['marks'] >60, 'Remarks'] = 'good'
data.loc[data['marks'] <=60, 'Remarks'] = 'Not good'
print("After Updating Values")
print(data)

Original Dataset
  name section  marks
0    A       a     45
1    B       b     90
2    C       a     65
3    D       a     37
4    E       b     49
After Updating Values
  name section  marks   Remarks
0    A       a     45  Not good
1    B       b     90      good
2    C       a     65      good
3    D       a     37  Not good
4    E       b     49  Not good
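
With separate `loc` assignments, any row matched by none of the conditions keeps `NaN` in the new column (for instance a mark of exactly 60 when the tests are `> 60` and `< 60`). One alternative, sketched here with NumPy's `where`, gives every row exactly one of two labels. The three-row data is illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"name": ["A", "B", "C"], "marks": [45, 90, 60]})

# Rows where the condition holds get "good"; all others get "Not good"
data["Remarks"] = np.where(data["marks"] > 60, "good", "Not good")
print(data)
```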


Delete a single row:

In [22]:
import pandas as pd

data = {
'EmployeeID': [101, 102, 103, 104, 105],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [28, 32, 29, 35, 26]
}
df = pd.DataFrame(data)

print("Initial DataFrame:")
print(df)

employee_to_delete = 'Charlie'

df = df.loc[df['Name'] != employee_to_delete]

print("\nDataFrame after deleting the row for 'Charlie':")
print(df)

Initial DataFrame:
   EmployeeID     Name  Age
0         101    Alice   28
1         102      Bob   32
2         103  Charlie   29
3         104    David   35
4         105      Eve   26

DataFrame after deleting the row for 'Charlie':
   EmployeeID   Name  Age
0         101  Alice   28
1         102    Bob   32
3         104  David   35
4         105    Eve   26
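
In the output above the index keeps its original labels (0, 1, 3, 4) after the row is removed. As a sketch, a row can also be removed by its index label with `drop`, and `reset_index(drop=True)` renumbers the remaining rows. The three-row data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"EmployeeID": [101, 102, 103],
                   "Name": ["Alice", "Bob", "Charlie"]})

# Drop the row whose index label is 1, then renumber the remaining rows
df = df.drop(index=1).reset_index(drop=True)
print(df)
```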


Delete a Column

In [23]:
import pandas as pd
# Create a sample employee DataFrame
data = {
'EmployeeID': [101, 102, 103, 104, 105],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [28, 32, 29, 35, 26],
'Photo': ['photo1.jpg', 'photo2.jpg', 'photo3.jpg', 'photo4.jpg', 'photo5.jpg']
}
df = pd.DataFrame(data)
# Display the initial DataFrame
print("Initial DataFrame:")
print(df)

df = df.drop('Photo', axis=1)
# Display the DataFrame after deleting the column
print("\nDataFrame after deleting the 'Photo' column:")
print(df)

Initial DataFrame:
   EmployeeID     Name  Age       Photo
0         101    Alice   28  photo1.jpg
1         102      Bob   32  photo2.jpg
2         103  Charlie   29  photo3.jpg
3         104    David   35  photo4.jpg
4         105      Eve   26  photo5.jpg

DataFrame after deleting the 'Photo' column:
   EmployeeID     Name  Age
0         101    Alice   28
1         102      Bob   32
2         103  Charlie   29
3         104    David   35
4         105      Eve   26
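
Two related options, sketched below on illustrative data: the `columns=` keyword of `drop` avoids the need for `axis=1` and accepts a list of names, and `pop` removes a column in place while returning it as a Series.

```python
import pandas as pd

df = pd.DataFrame({"EmployeeID": [101, 102],
                   "Name": ["Alice", "Bob"],
                   "Age": [28, 32],
                   "Photo": ["photo1.jpg", "photo2.jpg"]})

# columns= names the axis explicitly and accepts several names at once
trimmed = df.drop(columns=["Age", "Photo"])
print(list(trimmed.columns))

# pop() removes the column from df itself and returns it
photos = df.pop("Photo")
print(list(df.columns))
print(photos.tolist())
```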


Delete a row based on condition

In [24]:
import pandas as pd
# Create a sample sales DataFrame
data = {
'OrderID': [101, 102, 103, 104, 105],
'Product': ['Widget', 'Gadget', 'Widget', 'Doodad', 'Widget'],
'Quantity': [10, 5, 0, 7, 15],
'Status': ['Shipped', 'Canceled', 'Canceled', 'Shipped', 'Shipped']
}
df = pd.DataFrame(data)
# Display the initial DataFrame
print("Initial DataFrame:")
print(df)

# Equivalent without .loc: df = df[df['Status'] != 'Canceled']
df = df.loc[df['Status'] != 'Canceled']
# Display the DataFrame after deleting rows
print("\nDataFrame after deleting 'Canceled' orders:")
print(df)

Initial DataFrame:
   OrderID Product  Quantity    Status
0      101  Widget        10   Shipped
1      102  Gadget         5  Canceled
2      103  Widget         0  Canceled
3      104  Doodad         7   Shipped
4      105  Widget        15   Shipped

DataFrame after deleting 'Canceled' orders:
   OrderID Product  Quantity   Status
0      101  Widget        10  Shipped
3      104  Doodad         7  Shipped
4      105  Widget        15  Shipped
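
Several conditions can be combined with `&` (and), `|` (or), and `~` (not), as long as each comparison is wrapped in parentheses. The sketch below keeps only orders that are not canceled and have a positive quantity, and uses `isin` to exclude a list of statuses. The status `Returned` is hypothetical and the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"OrderID": [101, 102, 103, 104],
                   "Quantity": [10, 5, 0, 7],
                   "Status": ["Shipped", "Canceled", "Canceled", "Shipped"]})

# Keep rows that are not canceled AND have a positive quantity
keep = (df["Status"] != "Canceled") & (df["Quantity"] > 0)
kept = df.loc[keep].reset_index(drop=True)

# ~ negates the mask: drop rows whose status is in the list
not_excluded = df.loc[~df["Status"].isin(["Canceled", "Returned"])]

print(kept)
print(not_excluded)
```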
