# 📊 Pandas Series

## 🔹 What is Pandas?
Pandas is a powerful Python library used for **data analysis and data manipulation**.  
It helps us work easily with structured data like tables, columns, and rows.

In [None]:
# import major libraries
import pandas as pd
import numpy as np


## 🔹 Main Data Structures in Pandas
Pandas mainly provides **two core data types**:

- **Series** → One-dimensional data (single column)
- **DataFrame** → Two-dimensional data (rows + columns, like a table)


## 🔹 Creating a Series from a List
When a list is converted into a Series:
- Pandas automatically assigns **numeric indexes starting from 0**
- All values are stored in a single column-like structure  

This is useful when you want to convert simple data into an analyzable format.


In [None]:
# Series from a list: each value gets a default integer label (0, 1, 2, ...)
l1 = [10, 20, 30, 40]
pd.Series(l1)
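
As a minimal sketch of the default labelling described above (the list contents are illustrative only), the automatically assigned index can be inspected directly:

```python
import pandas as pd

values = [10, 20, 30, 40]      # illustrative data
s = pd.Series(values)

print(list(s.index))           # default labels: [0, 1, 2, 3]
print(s.loc[0], s.loc[3])      # first and last value, looked up by label
```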

## 🔹 Creating a Series from Strings
A Series can also store **text data** like country names.
- The dtype is reported as `object` (recent pandas versions may report a dedicated string dtype instead)
- Indexes are automatically generated
- Useful for categorical data

In [None]:
# countries
countries = ['India','China','USA','Japan','Russia']
countries = pd.Series(countries)
countries

## 🔹 Creating a Series from a Dictionary
When a dictionary is used:
- **Keys become indexes**
- **Values become data**
- This is called **labeled indexing**

This is very useful when data already has meaningful labels.

## 🔹 Custom Index in Series
You can assign your **own index names** instead of default numbers.
- Helpful in mark sheets, subject-wise data, or named records
- Makes data more readable and meaningful
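
A short sketch of a custom index, assuming hypothetical subject names and scores (they are not part of this notebook's data):

```python
import pandas as pd

subjects = ['Maths', 'Physics', 'Chemistry']   # hypothetical labels
scores = pd.Series([85, 90, 78], index=subjects, name='scores')

print(scores.loc['Physics'])   # look up a value by its label
print(list(scores.index))      # the labels replace 0, 1, 2
```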

In [None]:
# Series from a dictionary: keys become the index labels, values become the data
dict1 = {
    'Dunki': 'SRK',
    'Sultan': 'SK',
    'Sanju': 'Ranbir Kapoor',
    'PK': 'AK',
    'Holiday': 'Akshay Kumar'
}
movies = pd.Series(dict1)
movies

## 🔹 Handling Missing Values (NaN)
- `NaN` represents **missing or undefined data**
- `NaN` is itself a floating-point value, so numeric data containing it is stored as `float64` by default
- Very common in real-world datasets

In [None]:
# Marks per subject; np.nan stands for a missing score
sub = ['Hindi', 'English', 'SST', 'Science']
marks = [np.nan, 78, 56, np.nan]
std = pd.Series(marks, index=sub, name='Vipul_Marks')
std
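
The effect of `NaN` on the data type can be checked with a small sketch (the numbers are illustrative): the same values are stored as integers without a gap and as floats with one.

```python
import numpy as np
import pandas as pd

whole = pd.Series([78, 56])               # integers only
with_gap = pd.Series([78, np.nan, 56])    # one missing value

print(whole.dtype)                 # an integer dtype
print(with_gap.dtype)              # float64, because NaN is a float
print(with_gap.isna().tolist())    # [False, True, False]
```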

## 🔹 Saving Series to CSV
A Series can be saved as a **CSV file**.
- CSV means *Comma Separated Values*
- Used for data sharing and storage
- Can be opened in Excel or Google Sheets

In [None]:
# Writes std.csv to the current working directory (the Jupyter home folder by default)
std.to_csv('std.csv')
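
To see what the CSV looks like when it is read back, here is a hedged sketch of a round trip. It uses an in-memory `io.StringIO` buffer instead of a file on disk, and the Series name `marks` is an assumption made for the example:

```python
import io

import numpy as np
import pandas as pd

s = pd.Series([np.nan, 78, 56], index=['Hindi', 'English', 'SST'], name='marks')

buffer = io.StringIO()     # in-memory stand-in for a file on disk
s.to_csv(buffer)           # writes the index labels and a header row
buffer.seek(0)

# Read it back; the first column holds the index labels
restored = pd.read_csv(buffer, index_col=0)['marks']
print(restored)
```

The missing value is written as an empty field and comes back as `NaN`.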

## 🔹 Creating Series from NumPy Arrays
NumPy arrays can be converted into Series easily.
- Useful for numerical and statistical analysis
- Supports large datasets efficiently
- Indexes can be customized

In [None]:
# NumPy array --> Series: 100 random marks from 0 to 100, with custom labels 1..100
marks = pd.Series(np.random.randint(0, 101, 100), index=range(1, 101))
marks
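
Because the marks above are random, they change on every run. One possible way to make such data repeatable is NumPy's seeded generator; the seed value `42` is arbitrary and only an assumption for this sketch:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)       # arbitrary seed for repeatable output
scores = rng.integers(0, 101, size=100)    # upper bound 101 is exclusive
s = pd.Series(scores, index=range(1, 101))

print(s.head())
```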

## 🔹 Important Attributes of Series

### ▶ Index
- Shows labels of each value
- Can be numeric or text-based

In [None]:
# basic attributes: index
print(marks.index)
print(countries.index)
print(movies.index)

### ▶ Values
- Returns only the data stored in the Series
- Does not include indexes

In [None]:
# values (printed explicitly, since a cell only displays its last expression)
print(marks.values)
print(movies.values)

### ▶ Data Type (`dtype`)
- Tells the type of data stored
- Important for calculations and memory usage

In [None]:
# dtype
print(marks.dtype)
print(std.dtype)
print(countries.dtype)

### ▶ Name
- A Series can have a name
- Useful for identification

In [None]:
# name (marks was created without a name, so it prints None)
print(marks.name)
print(std.name)

### ▶ Shape
- Returns a tuple holding the number of elements, for example `(100,)`
- The tuple always has a single entry because a Series is one-dimensional

In [None]:
# shape
marks.shape

### ▶ Size
- Total number of elements including missing values

In [None]:
# size
print(marks.size)
print(std.size)

### ▶ Count
- Counts only **non-missing values**
- Ignores NaN values

In [None]:
# count function
print(marks.count())
print(std.count())
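
The difference between `size` and `count()` only shows up when values are missing. A minimal sketch with illustrative numbers:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 78, 56, np.nan])

print(s.size)                # 4 -> every slot, missing or not
print(s.count())             # 2 -> only the non-missing values
print(s.size - s.count())    # 2 -> number of missing values
```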

### ▶ Dimensions (`ndim`)
- Always `1` for Series

In [None]:
#ndim
marks.ndim

### ▶ Is Unique
- Checks whether all values are unique

In [None]:
# is_unique
print(countries.is_unique)
print(marks.is_unique)

### ▶ Empty
- Checks if the Series has no data

In [None]:
#empty
marks.empty
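
The attributes covered in this section can be compared side by side on one small Series. This is only a sketch; the values, labels, and the name `demo` are made up for illustration:

```python
import pandas as pd

s = pd.Series([10, 20, 20], index=['a', 'b', 'c'], name='demo')

print(s.shape)        # (3,)  -> one entry, since a Series is 1-D
print(s.ndim)         # 1
print(s.size)         # 3
print(s.name)         # demo
print(s.is_unique)    # False -> 20 appears twice
print(s.empty)        # False

nothing = pd.Series([], dtype='float64')
print(nothing.empty)  # True
```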

## 🔹 String Operations on Series
Series containing text data support **string methods**.
- Example: converting text to uppercase
- Very useful for text cleaning

In [None]:
#str
countries.str.upper()
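
Beyond `upper()`, the `.str` accessor exposes other string methods element by element. A brief sketch with a few of them, using a shortened, illustrative list of names:

```python
import pandas as pd

names = pd.Series(['India', 'China', 'USA'])

print(names.str.upper().tolist())           # ['INDIA', 'CHINA', 'USA']
print(names.str.len().tolist())             # [5, 5, 3]
print(names.str.startswith('C').tolist())   # [False, True, False]
```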

## 🔹 Viewing Data

### ▶ Head
- Shows the first few values (5 by default)
- Helps quickly inspect data

### ▶ Tail
- Shows the last few values (5 by default)

### ▶ Sample
- Returns randomly chosen values
- Useful for large datasets

In [None]:
# head, tail, sample
marks
# Optional (usually not needed): show all 100 rows instead of a truncated view
# pd.set_option('display.max_rows', None)
marks.head(10)   # first 10
marks.tail()     # last 5 (default)
marks.sample(5)  # 5 random values

In [None]:
marks.head(10) #top 10

In [None]:
marks.tail() #last 5
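
A small sketch of the three viewing methods on predictable data. The `random_state` argument is used here as an assumption about what is wanted: it makes `sample()` return the same rows on every run.

```python
import pandas as pd

s = pd.Series(range(1, 21))              # values 1..20

print(s.head(3).tolist())                # [1, 2, 3]
print(s.tail(2).tolist())                # [19, 20]

picked = s.sample(4, random_state=0)     # repeatable random selection
print(picked)
```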

## 🔹 Info Method
Provides:
- Total entries
- Data types
- Memory usage
- Non-null count  

Gives a **summary of Series structure**.

In [None]:
#info
marks.info()

## 🔹 Describe Method
Gives **statistical summary**:
- Count
- Mean
- Standard deviation
- Minimum and maximum
- Quartiles (25%, 50%, 75%)

Used mainly for numerical analysis.

In [None]:
#describe()
marks.describe()
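
The result of `describe()` is itself a Series, so individual statistics can be read by label. A minimal sketch on four illustrative values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])
summary = s.describe()

print(summary['count'])                                  # 4.0
print(summary['mean'])                                   # 25.0
print(summary['25%'], summary['50%'], summary['75%'])    # 17.5 25.0 32.5
print(summary['min'], summary['max'])                    # 10.0 40.0
```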

## 🔹 Selection and Filtering
Selection and filtering are among the most important Series operations.

### ▶ Indexing
Accessing a single value by its index label (`[]` or `.loc`) or by its position (`.iloc`).

### ▶ Slicing
Extracting a range of values.

### ▶ Label-based Selection
Accessing values using index labels.

### ▶ Condition-based Filtering
Selecting values based on conditions.

In [None]:
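# marks is labelled 1..100, so marks[3] looks up the label 3 (same as marks.loc[3]);
# use marks.iloc[3] to get the element at position 3 instead
marks[3]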

In [None]:
marks.iloc[3:5]  # slicing by position: positions 3 and 4 (end is excluded)

In [None]:
#loc --> labelled indexing
movies

In [None]:
movies.loc['Dunki':'PK']  # label slice: both endpoints are included

In [None]:
# iloc --> position-based indexing
countries.iloc[1:4:2]  # positions 1 and 3 (step size of 2)

In [None]:
#condition based
marks[marks<10]
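
Labels and positions are easy to confuse when the index is made of integers that do not start at 0, as with `marks`. The following sketch uses a tiny illustrative Series labelled 1 to 4 to contrast `.loc`, `.iloc`, and a boolean mask:

```python
import pandas as pd

s = pd.Series([50, 60, 70, 80], index=[1, 2, 3, 4])   # labels start at 1

print(s.loc[3])               # label 3    -> 70
print(s.iloc[3])              # position 3 -> 80
print(s.loc[2:3].tolist())    # label slice includes both ends  -> [60, 70]
print(s.iloc[2:3].tolist())   # position slice excludes the end -> [70]
print(s[s > 60].tolist())     # boolean mask -> [70, 80]
```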


## 🔹 Sorting Series

### ▶ Sort by Values
- Arranges data in ascending or descending order

### ▶ Sort by Index
- Orders data based on index labels

In [None]:
marks.sort_values()

In [None]:
marks=marks.sort_values(ascending=False)
marks

In [None]:
marks=marks.sort_index()
marks
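
Both sorting methods return a new Series and leave the original untouched, which is why the cells above assign the result back to `marks`. A small sketch with illustrative values and labels:

```python
import pandas as pd

s = pd.Series([30, 10, 20], index=['b', 'c', 'a'])

by_value = s.sort_values()                      # ascending by default
by_value_desc = s.sort_values(ascending=False)
by_label = s.sort_index()

print(by_value.index.tolist())        # ['c', 'a', 'b']
print(by_value_desc.index.tolist())   # ['b', 'a', 'c']
print(by_label.tolist())              # [20, 30, 10]
print(s.index.tolist())               # ['b', 'c', 'a'] -> original order kept
```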

## 🔹 Aggregate Functions
- **Sum** – Adds all values
- **Mean** – Average value
- **Median** – Middle value of the sorted data
- **Mode** – Most frequent value
- **Variance** – Average squared deviation from the mean (pandas uses the sample variance by default)
- **Standard Deviation** – Square root of the variance
- **Min / Max** – Lowest and highest values
- **Quantiles** – Cut points below which a given fraction of the data falls (for example 25%, 50%, 75%)

In [None]:
# aggregate functions
# sum (NaN values are skipped by default)
print(marks.sum())
print(std.sum())

In [None]:
#mean
marks.mean()

In [None]:
#median
marks.median()

In [None]:
#mode
marks.mode()

In [None]:
#value_counts()
marks.value_counts().head()

In [None]:
#Variance
marks.var()

In [None]:
#std
marks.std()

In [None]:
#min/max
print(marks.min())
print(marks.max())

In [None]:
#count
marks.count()

In [None]:
#quantile
print(marks.quantile(0.25))
print(marks.quantile(0.50))
print(marks.quantile(0.75))
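
Since `marks` is random, its statistics cannot be checked by hand. The sketch below repeats the same aggregate calls on five illustrative values whose results are easy to verify. Note that `var()` and `std()` use the sample formula (`ddof=1`) by default; passing `ddof=0` gives the population version.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 6, 9])

print(s.sum())              # 25
print(s.mean())             # 5.0
print(s.median())           # 4.0
print(s.mode().tolist())    # [4]
print(s.var())              # 7.0  (sample variance, ddof=1)
print(s.var(ddof=0))        # 5.6  (population variance)
print(s.std())              # square root of 7.0
print(s.quantile(0.25), s.quantile(0.75))   # 4.0 6.0
```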

## 🔹 Value Frequency Analysis
- Counts how many times each value appears
- Useful for categorical data analysis
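
A minimal sketch of frequency counting with `value_counts()` on categorical data. The result labels are hypothetical; `normalize=True` returns proportions instead of raw counts.

```python
import pandas as pd

results = pd.Series(['Pass', 'Fail', 'Pass', 'Pass', 'Fail', 'Absent'])

counts = results.value_counts()                 # sorted, most frequent first
shares = results.value_counts(normalize=True)   # fractions that sum to 1

print(counts)
print(shares['Pass'])   # 0.5
```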

---

## 🔹 Data Cleaning Operations

### ▶ Replace
- Replaces specific values

### ▶ Type Conversion
- Changes data type

### ▶ Round
- Rounds numeric values

### ▶ Clip
- Limits values within a range

In [None]:
# replace: swap one value for another (returns a new Series)
countries.replace('USA', 'South Korea')

In [None]:
#astype
marks.astype(float)

In [None]:
# round: no visible change here because marks holds integers; it matters for floats
marks.round(2)

In [None]:
# clip: values below 10 become 10, values above 60 become 60
marks.clip(10, 60).head(20)
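
The cleaning methods are easier to follow on float data, where rounding and type conversion have a visible effect. A short sketch with illustrative values:

```python
import pandas as pd

raw = pd.Series([5.26, 35.84, 72.0])

clipped = raw.clip(10, 60)    # below 10 -> 10, above 60 -> 60
rounded = raw.round(1)        # one decimal place
as_int = raw.astype(int)      # drops the fractional part

print(clipped.tolist())       # [10.0, 35.84, 60.0]
print(rounded.tolist())
print(as_int.tolist())        # [5, 35, 72]

names = pd.Series(['USA', 'India'])
print(names.replace('USA', 'United States').tolist())
```

Each call returns a new Series, so `raw` and `names` stay unchanged unless the result is assigned back.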

## 🔹 Unique and Duplicate Values

### ▶ Unique
- Returns unique values

### ▶ Duplicated
- Finds repeated values

### ▶ Drop Duplicates
- Removes duplicate values

In [None]:
# unique, duplicated, drop_duplicates, value_counts, to_dict

In [None]:
marks.unique()

In [None]:
marks[marks.duplicated()].head(15)

In [None]:
marks.drop_duplicates().head(15)

In [None]:
movies.value_counts()

In [None]:
movies.value_counts().to_dict()
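
On a small Series the behaviour of these methods can be traced by eye. In this sketch (illustrative values), `duplicated()` marks every occurrence after the first, and `drop_duplicates()` keeps the first occurrence along with its original label:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 3, 3])

print(s.unique().tolist())                  # [1, 2, 3]
print(s.duplicated().tolist())              # [False, False, True, False, True, True]
print(s.drop_duplicates().index.tolist())   # [0, 1, 3]
print(s.value_counts().to_dict())           # {3: 3, 2: 2, 1: 1}
```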

## 🔹 Handling Missing Data

### ▶ Is Null
- Detects missing values

### ▶ Drop NA
- Removes missing values

### ▶ Fill NA
- Replaces missing values

In [None]:
# isnull: count the missing values
std.isnull().sum()

In [None]:
# dropna: removes the missing (NA) values
std.dropna()

In [None]:
# fillna: replaces missing values with 10
std.fillna(10)

In [None]:
# show only the entries that are missing
std[std.isnull()]
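
Filling with a fixed number such as 10 is one option. Another common choice, shown here only as a sketch and not as a recommendation for this data, is to fill with the mean of the values that are present:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 78, 56, np.nan],
              index=['Hindi', 'English', 'SST', 'Science'])

print(s.isna().sum())            # 2 missing values
print(s.dropna().tolist())       # [78.0, 56.0]

average = s.mean()               # NaN is skipped, so this is (78 + 56) / 2
filled = s.fillna(average)
print(filled.tolist())           # [67.0, 78.0, 56.0, 67.0]
```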

## 🔹 Final Note
Pandas Series is the **foundation of data analysis**.  
Mastering Series makes DataFrames and advanced analysis much easier.