## **Pandas**

Pandas is an open-source data analysis and manipulation library for Python. It provides the data structures and functions needed to work with structured data, and it is particularly well suited to tabular data such as SQL tables or Excel spreadsheets.

### Benefits of Using Pandas
- **Ease of Use**: Pandas provides a high-level interface for data manipulation, making it easy to perform complex operations with simple commands.
- **Data Cleaning**: It offers robust tools for cleaning and preparing data for analysis.
- **Data Analysis**: Pandas supports a wide range of data analysis tasks, including filtering, grouping, and aggregating data.
- **Integration**: It integrates well with other data science libraries like NumPy, Matplotlib, and SciPy.
- **Performance**: Many operations are vectorized on top of NumPy arrays, which keeps in-memory work efficient as long as the dataset fits in memory.
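
As a rough illustration of the filtering, grouping, and aggregating mentioned above, here is a minimal sketch. The `sales` table and its `region` and `units` columns are made up for this example and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration
sales = pd.DataFrame({
    'region': ['north', 'south', 'north', 'south', 'east'],
    'units': [10, 3, 7, 12, 5],
})

# Filtering: keep only the rows with at least 5 units
large_orders = sales[sales['units'] >= 5]

# Grouping and aggregating: total units per region among the kept rows
units_per_region = large_orders.groupby('region')['units'].sum()
print(units_per_region)
```

The boolean mask `sales['units'] >= 5` is itself a Series, which is why it can be used directly to select rows.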

### Installing Pandas
To install Pandas, you can use pip, the Python package installer. Run the following command in your terminal:
```sh
pip install pandas
```

### Importing Pandas
To use Pandas in your Python code, you first need to import it. By convention, it is imported under the alias `pd`:
```python
import pandas as pd
```

### DataFrame and Series
- **DataFrame**: A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. It is similar to a table in a database or an Excel spreadsheet.
- **Series**: A Series is a 1-dimensional labeled array capable of holding any data type. It is similar to a column in a table.
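
The relationship between the two structures can be sketched as follows: each column of a DataFrame has its own dtype, and selecting a single column gives back a Series. The `people` table and its values are assumptions made up for this example.

```python
import pandas as pd

# A small DataFrame whose columns hold different types (illustrative values)
people = pd.DataFrame({
    'name': ['Ann', 'Bob'],
    'age': [31, 45],
    'height_m': [1.62, 1.80],
})

# Each column carries its own dtype
print(people.dtypes)

# Selecting one column yields a Series that shares the DataFrame's index
ages = people['age']
print(type(ages))
```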

### Example
Here is a simple example of creating a DataFrame and a Series in a Jupyter Notebook:

```python
import pandas as pd

# Creating a Series
data_series = pd.Series([1, 2, 3, 4, 5])
print("Series:")
print(data_series)

# Creating a DataFrame
data_frame = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})
print("\nDataFrame:")
print(data_frame)
```


In [1]:
import pandas as pd

In [10]:
# Creating a Series
data_series = pd.Series([1, 2, 3, 4, 5])
print("Series:")
print(data_series)

# Creating a Series from a dictionary (the keys become the index)
data = pd.Series({'a':1,'b':2,'c':5})
print(data)

# Creating a Series from separate data and index lists
data = [3,4,5,6]
index = ['a','b','d','c']
data = pd.Series(data,index=index)
print(data)

print(type(data))


Series:
0    1
1    2
2    3
3    4
4    5
dtype: int64
a    1
b    2
c    5
dtype: int64
a    3
b    4
d    5
c    6
dtype: int64
<class 'pandas.core.series.Series'>
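
Once a Series has a custom index, its values can be read either by label or by integer position. The sketch below reuses the same data and index as the last Series above; the variable names `s` and `doubled` are introduced here only for illustration.

```python
import pandas as pd

s = pd.Series([3, 4, 5, 6], index=['a', 'b', 'd', 'c'])

# Label-based lookup: the value stored under the label 'd'
by_label = s.loc['d']

# Position-based lookup: the third element (positions start at 0)
by_position = s.iloc[2]

# Arithmetic applies element-wise and keeps the index
doubled = s * 2

print(by_label, by_position)
print(doubled)
```

Both lookups return the same element here because the label `'d'` happens to sit at position 2.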


In [None]:
# Creating a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})
print("\nDataFrame:")
print(df)
print(type(df))

# Creating a DataFrame from a CSV file ("your_csv_file.csv" is a placeholder path)
df = pd.read_csv("your_csv_file.csv")
df.head()  # returns the first 5 rows by default
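
Because `your_csv_file.csv` is only a placeholder, the cell above cannot run as written. A self-contained way to try `read_csv` is sketched below; it assumes a small, made-up CSV text and uses `io.StringIO` to stand in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO lets read_csv treat it like a file
csv_text = """name,score
Ann,90
Bob,85
Cid,77
"""

scores = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows; without an argument it returns 5
print(scores.head(2))
print(scores.shape)
```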


In [None]:
## Accessing columns and rows of a DataFrame

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}, index = ['A','B','C','D'])

# Access a column by its name; the result is a Series
print(df['A'])

# Use .loc to select rows by index label
print(df.loc['A'])  # the row labeled 'A', whose values are 1, 5, 9

# Use .iloc to select rows by integer position (counting from 0)
print(df.iloc[1])  # the second row, which is labeled 'B'

A    1
B    2
C    3
D    4
Name: A, dtype: int64
A    1
B    5
C    9
Name: A, dtype: int64
A     2
B     6
C    10
Name: B, dtype: int64
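
The cell above selects one row at a time. As a sketch of how `.loc` and `.iloc` select groups of rows, the example below rebuilds the same DataFrame; the variable names on the left-hand side are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}, index=['A', 'B', 'C', 'D'])

# .loc with a list of labels returns a DataFrame with just those rows
rows_by_label = df.loc[['A', 'C']]

# .loc slices by label and includes both endpoints
label_slice = df.loc['A':'C']

# .iloc slices by position and excludes the stop position, like a Python list
position_slice = df.iloc[0:2]

# Rows and columns can be selected in a single call
subset = df.loc[['B', 'D'], ['A', 'C']]

print(label_slice)
print(position_slice)
print(subset)
```

Note the difference in slicing: `df.loc['A':'C']` returns three rows, while `df.iloc[0:2]` returns two.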


In [None]:
## Accessing a specific element

# .at reads a single value by row label and column label (labels, not integer positions)
print(df.at['A','B'])

# .iat is like .at, but takes integer positions for both the row and the column
print(df.iat[0,1])

5
5
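
The same accessors can also be used on the left-hand side of an assignment to change a single cell. This is a minimal sketch on a fresh copy of the DataFrame; the replacement values 50 and 120 are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}, index=['A', 'B', 'C', 'D'])

# Set the cell at row label 'A', column label 'B'
df.at['A', 'B'] = 50

# Set the cell at row position 3, column position 2 (row 'D', column 'C')
df.iat[3, 2] = 120

print(df)
```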


In [None]:
## Data manipulation with dataframe

# Add a column
df['D'] = [1,2,3,4]
print(df)

# Remove a column
# drop() works on rows by default (axis=0), so removing a column requires axis=1.
# inplace is a boolean: False (the default) leaves df unchanged and returns a new DataFrame,
# while True modifies df itself and returns None.
df.drop('D',inplace=True) # axis defaults to 0, so this removes the row labeled 'D', not the column
print(df)
df.drop('D',axis=1,inplace=True) # axis=1 removes the column 'D'
print(df)
print(df)


   A  B   C  D
A  1  5   9  1
B  2  6  10  2
C  3  7  11  3
D  4  8  12  4
   A  B   C  D
A  1  5   9  1
B  2  6  10  2
C  3  7  11  3
   A  B   C
A  1  5   9
B  2  6  10
C  3  7  11
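
The row-versus-column ambiguity shown above can be avoided by naming the target explicitly with the `columns=` or `index=` keyword of `drop`. The sketch below also leaves `inplace` at its default, so the original DataFrame is untouched; `without_column` and `without_row` are names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
}, index=['A', 'B', 'C', 'D'])
df['D'] = [1, 2, 3, 4]

# columns= removes a column without needing axis=1
without_column = df.drop(columns='D')

# index= removes a row by its label
without_row = df.drop(index='D')

# df itself still has both the row 'D' and the column 'D'
print(without_column)
print(without_row)
print(df.shape)
```

Returning a new DataFrame instead of using `inplace=True` makes it easier to keep the original data around while experimenting.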
