# <center>Pandas Basics🐼</center><hr style="border:4.5px solid #108999">
- Pandas is a powerful, open-source Python library specifically designed for data analysis and manipulation. It offers:

- Data Structures:
    * Series: One-dimensional array with labeled data
    * DataFrame: Two-dimensional data structure with rows and columns
    * Panel: Three-dimensional data structure with labeled axes (deprecated in pandas 0.20 and removed in 0.25; use a DataFrame with a MultiIndex instead)

- Capabilities:
    * Data Import/Export: Reads and writes data from various formats like CSV, Excel, SQL databases, etc.
    * Data Cleaning and Wrangling: Filters, sorts, merges, replaces missing values, etc.
    * Data Analysis: Descriptive statistics, correlations, time series analysis, etc.
    * Visualization: Plots charts and graphs through `Series.plot` and `DataFrame.plot` (Matplotlib by default) to explore and communicate data insights.

- Benefits:
    * Efficient: Optimizes data handling and analysis tasks.
    * Flexible: Supports various data types and operations.
    * Easy to Use: Provides intuitive API and extensive documentation.
    * Powerful: Enables complex data analysis and manipulation.

![Capture.PNG](attachment:Capture.PNG)

## <center>Installing and Running pandas</center>

In [1]:
import pandas as pd

## <center>Series in Python</center>
- A Series in Pandas is a one-dimensional array with labeled data. It is a fundamental data structure within the Pandas ecosystem and serves as the building block for DataFrames: each column of a DataFrame is a Series.

- Key characteristics:
    * One-dimensional array
    * Each element has a corresponding label (index)
    * Can hold data of any type (integers, floats, strings, booleans, etc.)
    * Supports efficient data manipulation and analysis

- Common operations:
    * Accessing elements using labels
    * Data selection and filtering
    * Arithmetic operations and comparisons
    * Data aggregation (sum, mean, max, etc.)
    * Missing value handling
    * Data sorting and ranking
    * Visualization

- Benefits of using Series:
    * Efficiently store and manipulate one-dimensional data
    * Easy to create and work with
    * Powerful data analysis capabilities
    * Integrates seamlessly with other Pandas data structures

- Applications of Series:
    * Representing time series data
    * Categorical data analysis
    * Feature engineering for machine learning
    * Analyzing individual variables in a dataset
    * Creating visualizations like bar charts and histograms
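
Several of the common operations listed above can be sketched in a few lines. The temperature readings and the variable names below are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical data: daily temperatures with one missing reading
temps = pd.Series([21.5, 23.0, None, 19.0], index=["mon", "tue", "wed", "thu"])

by_label = temps["tue"]                      # access an element by its label
warm = temps[temps > 20]                     # boolean filtering
mean_temp = temps.mean()                     # aggregation; NaN is skipped by default
filled = temps.fillna(mean_temp)             # replace the missing value with the mean
ranked = temps.sort_values(ascending=False)  # sort descending; NaN is placed last

print(warm)
print(filled)
```

Each of these calls returns a new object, so `temps` itself is left unchanged.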

In [2]:
a = [10.5,20,30,40,50,'hi',8j]
A = pd.Series(a)
A

0    10.5
1      20
2      30
3      40
4      50
5      hi
6      8j
dtype: object

#### Adding an element to a Series

In [3]:
A[7] = 'Hello'
A

0     10.5
1       20
2       30
3       40
4       50
5       hi
6       8j
7    Hello
dtype: object

In [4]:
data={'a':0,'b':1,'c':2,'d':3,'e':4}
B = pd.Series(data)
B

a    0
b    1
c    2
d    3
e    4
dtype: int64

#### Getting the index of a Series

In [5]:
B.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [6]:
print(type(B))
print(B.dtype)
print(B.size)

<class 'pandas.core.series.Series'>
int64
5


#### Dropping elements from a Series by label

In [7]:
C = B.drop(labels=['a','b'])
C

c    2
d    3
e    4
dtype: int64

#### Assigning a custom index range to a Series

In [8]:
A = pd.Series(a,index=range(2,9))
A

2    10.5
3      20
4      30
5      40
6      50
7      hi
8      8j
dtype: object
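
Once a Series has integer labels that do not start at 0, selecting a range by position and by label gives different results. The sketch below assumes a shortened, numeric-only version of the Series above; the names `s`, `by_pos`, and `by_lab` are illustrative.

```python
import pandas as pd

# Labels run from 2 to 6, so label and position no longer coincide
s = pd.Series([10.5, 20, 30, 40, 50], index=range(2, 7))

by_pos = s.iloc[1:3]  # positions 1 and 2 (end excluded) -> labels 3 and 4
by_lab = s.loc[3:5]   # labels 3 through 5 (end included)

print(by_pos)
print(by_lab)
```

Using `.iloc` or `.loc` explicitly avoids ambiguity about whether an integer means a position or a label.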

## <center>DataFrames in Pandas</center>
- DataFrames are two-dimensional, size-mutable, potentially heterogeneous tabular data structures in Pandas. They act like the backbone of data analysis in Python, offering efficient and flexible data manipulation and analysis capabilities.


- Key characteristics:
    * Two-dimensional table with rows and columns
    * Each row represents a data point
    * Each column represents a variable or feature
    * Can hold data of different types (mixed data types)
    * Supports labeled axes (`index` for rows, `columns` for columns)
    * Offers efficient data manipulation and analysis

- Common operations:
    * Accessing elements using row/column labels
    * Data selection and filtering (loc, iloc)
    * Data sorting and grouping
    * Applying functions to rows/columns
    * Merging and joining DataFrames
    * Data aggregation and statistics
    * Data visualization

- Benefits of using DataFrames:
    * Efficiently store and manipulate tabular data
    * Flexible data handling with mixed data types
    * Powerful data analysis capabilities
    * Easy integration with other Python libraries and tools (for example NumPy and Matplotlib)
    * Widely used in data science and machine learning

- Applications of DataFrames:
    * Representing and analyzing datasets
    * Feature engineering for machine learning
    * Exploratory data analysis (EDA)
    * Data cleaning and preparation
    * Data visualization and storytelling 
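
As a rough illustration of the sorting, grouping, merging, and function-application operations listed above, here is a minimal sketch. The `students` and `sections` tables, and the pass mark of 70, are invented for this example.

```python
import pandas as pd

# Hypothetical data for illustration only
students = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dana"],
    "section": ["A", "B", "A", "B"],
    "marks": [82, 67, 91, 74],
})
sections = pd.DataFrame({"section": ["A", "B"], "teacher": ["Ms. Rao", "Mr. Lee"]})

top = students.sort_values("marks", ascending=False)          # sorting
avg_by_section = students.groupby("section")["marks"].mean()  # grouping + aggregation
merged = students.merge(sections, on="section", how="left")   # joining two DataFrames
students["passed"] = students["marks"].apply(lambda m: m >= 70)  # apply a function per value

print(avg_by_section)
print(merged)
```

For a simple threshold like this, the vectorized form `students["marks"] >= 70` gives the same result as `apply` and is usually faster.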

#### Creating DataFrames

In [9]:
data = [['Zahid',23],['Salim',23],['Shaikh',23]]
A = pd.DataFrame(data,columns=['name','age'])
A

Unnamed: 0,name,age
0,Zahid,23
1,Salim,23
2,Shaikh,23


#### Renaming columns

In [10]:
A.columns=['firstname','age']
A

Unnamed: 0,firstname,age
0,Zahid,23
1,Salim,23
2,Shaikh,23


In [11]:
data={'Name':['Zahid','Salim','Shaikh'],'Marks':[100,75,50]}
df=pd.DataFrame(data,index=['rank1','rank2','rank3'])
print(df)

         Name  Marks
rank1   Zahid    100
rank2   Salim     75
rank3  Shaikh     50


## <center>Access using iloc and loc</center>
* `loc`: label-based selection, using row labels and column labels
* `iloc`: integer-position-based selection, using the positions of rows and columns
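
The difference is easiest to see side by side. The sketch below rebuilds the `Name`/`Marks` table from the previous cell under the hypothetical name `df_demo`, so it can run on its own.

```python
import pandas as pd

df_demo = pd.DataFrame(
    {"Name": ["Zahid", "Salim", "Shaikh"], "Marks": [100, 75, 50]},
    index=["rank1", "rank2", "rank3"],
)

by_label = df_demo.loc["rank1":"rank2", "Name"]  # label slice: end label is included
by_position = df_demo.iloc[0:2, 0]               # position slice: end position is excluded
single = df_demo.loc["rank3", "Marks"]           # one cell by row and column label

print(by_label)
print(single)
```

Both slices select the same two names here; note that the label slice includes its end point while the position slice does not.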

In [12]:
df.iloc[:2, :]  # first two rows, all columns

Unnamed: 0,Name,Marks
rank1,Zahid,100
rank2,Salim,75


#### Adding a new column to a DataFrame

In [13]:
df['address']= ['Mumbai','Bangalore','Delhi']
df

Unnamed: 0,Name,Marks,address
rank1,Zahid,100,Mumbai
rank2,Salim,75,Bangalore
rank3,Shaikh,50,Delhi


In [14]:
df['result'] = [20,30,50]
df

Unnamed: 0,Name,Marks,address,result
rank1,Zahid,100,Mumbai,20
rank2,Salim,75,Bangalore,30
rank3,Shaikh,50,Delhi,50


In [15]:
df['updated_result'] = df['result'] + 20
df

Unnamed: 0,Name,Marks,address,result,updated_result
rank1,Zahid,100,Mumbai,20,40
rank2,Salim,75,Bangalore,30,50
rank3,Shaikh,50,Delhi,50,70


#### Deleting a column from a DataFrame

In [16]:
del df['result']
df

Unnamed: 0,Name,Marks,address,updated_result
rank1,Zahid,100,Mumbai,40
rank2,Salim,75,Bangalore,50
rank3,Shaikh,50,Delhi,70


#### Dropping a particular row from a DataFrame
* `drop` returns a new DataFrame and leaves `df` unchanged unless the result is assigned back.

In [17]:
df.drop(df[(df['Name'] == 'Shaikh') & (df['Marks'] == 50)].index, axis = 0)

Unnamed: 0,Name,Marks,address,updated_result
rank1,Zahid,100,Mumbai,40
rank2,Salim,75,Bangalore,50


In [18]:
df.drop(['rank2'],axis=0)

Unnamed: 0,Name,Marks,address,updated_result
rank1,Zahid,100,Mumbai,40
rank3,Shaikh,50,Delhi,70


In [19]:
df

Unnamed: 0,Name,Marks,address,updated_result
rank1,Zahid,100,Mumbai,40
rank2,Salim,75,Bangalore,50
rank3,Shaikh,50,Delhi,70


In [20]:
df.iloc[1:3,:3]

Unnamed: 0,Name,Marks,address
rank2,Salim,75,Bangalore
rank3,Shaikh,50,Delhi


#### Various methods to store a DataFrame
    1. to_excel()
    2. to_csv()
    3. to_pickle()
    4. to_json()

In [21]:
df.to_excel(r'df.xlsx')

In [22]:
df.to_csv('df1.csv')

In [23]:
df.to_pickle('data.pkl')

In [24]:
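import json
# to_json() already returns a JSON string, so write it to the file directly;
# passing it through json.dump() would encode the string a second time.
data_json = df.to_json(orient='records')
with open('data.json', 'w') as f:
    f.write(data_json)
# Read the file back as a list of row dictionaries
with open('data.json', 'r') as f:
    data_json = json.load(f)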
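
JSON written with `to_json()` can also be loaded straight back into a DataFrame with `pd.read_json()`. The sketch below is a minimal in-memory round trip, using a small hypothetical `df_demo` and `io.StringIO` in place of a file on disk.

```python
import io
import pandas as pd

# Hypothetical two-row table with row labels
df_demo = pd.DataFrame(
    {"Name": ["Zahid", "Salim"], "Marks": [100, 75]},
    index=["rank1", "rank2"],
)

json_text = df_demo.to_json(orient="records")  # list of row objects; index is not written
restored = pd.read_json(io.StringIO(json_text), orient="records")

print(restored)
```

Because `orient="records"` does not store the index, `restored` comes back with a default 0-based index instead of the `rank` labels.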

#### Various methods to read stored data into a DataFrame
    1. read_excel()
    2. read_csv()
    3. read_pickle()
    4. read_html()

In [25]:
A = pd.read_excel(r'df.xlsx',index_col=0,header=0)
A

Unnamed: 0,Name,Marks,address,updated_result
rank1,Zahid,100,Mumbai,40
rank2,Salim,75,Bangalore,50
rank3,Shaikh,50,Delhi,70


In [26]:
B = pd.read_csv(r'df1.csv',index_col=0,header=0)
B

Unnamed: 0,Name,Marks,address,updated_result
rank1,Zahid,100,Mumbai,40
rank2,Salim,75,Bangalore,50
rank3,Shaikh,50,Delhi,70


In [27]:
C = pd.read_pickle('data.pkl')
C

Unnamed: 0,Name,Marks,address,updated_result
rank1,Zahid,100,Mumbai,40
rank2,Salim,75,Bangalore,50
rank3,Shaikh,50,Delhi,70


In [28]:
D = pd.read_html('https://en.wikipedia.org/wiki/Minnesota')
D

[                                            Minnesota  \
 0                                               State   
 1   .mw-parser-output .ib-settlement-cols{text-ali...   
 2   Nicknames: Land of 10,000 Lakes; North Star St...   
 3   Motto: L'Étoile du Nord (French: The Star of t...   
 4                           Anthem: "Hail! Minnesota"   
 5   Map of the United States with Minnesota highli...   
 6                                             Country   
 7                                    Before statehood   
 8                               Admitted to the Union   
 9                                             Capital   
 10                                       Largest city   
 11                       Largest county or equivalent   
 12                      Largest metro and urban areas   
 13                                         Government   
 14                                         • Governor   
 15                              • Lieutenant Governor   
 16           