In [1]:
%pip install pandas

Note: you may need to restart the kernel to use updated packages.


Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

In [1]:
import pandas as pd

# Create the Pandas Series
data = [4, 8, 15, 16, 23, 42]
series = pd.Series(data)

# Print the series
print(series)


0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64


Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to the variable, and print the result.


In [2]:
import pandas as pd

# Create a list with 10 elements
data_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Convert the list to a Pandas Series
series = pd.Series(data_list)

# Print the Series
print(series)


0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64


Q3. Create a Pandas DataFrame that contains the following data:

    Name    Age  Gender
    Alice   25   Female
    Bob     30   Male
    Claire  27   Female

Then, print the DataFrame.

In [3]:
import pandas as pd

# Creating a DataFrame with the given data
data = {
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
    'Gender': ['Female', 'Male', 'Female']
}

df = pd.DataFrame(data)
df


     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male
2  Claire   27  Female


Q4. What is a ‘DataFrame’ in pandas, and how is it different from a pandas.Series? Explain with an example.

In pandas, a DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be thought of as a table or a spreadsheet in which data is aligned in rows and columns. Each column in a DataFrame is a Series, which is a one-dimensional array-like object containing a sequence of values.
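A quick way to see that each column is a Series is to select one column and inspect it. The sketch below reuses the Name/Age values from Q3 purely as sample data:

```python
import pandas as pd

# A small DataFrame built from a dict of columns (sample values only)
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Claire'],
    'Age': [25, 30, 27],
})

# Selecting a single column with [] returns a Series
age = df['Age']
print(type(age))

# The column Series carries the same row index as the DataFrame
print(age.index.equals(df.index))  # True
```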

Key Differences between DataFrame and Series:

1. Dimensionality:
Series: One-dimensional. It has a single axis, labeled by its index.
DataFrame: Two-dimensional. It has both rows (the index) and columns.

2. Structure:
Series: A single column of data, consisting of an index (the labels) and the data values.
DataFrame: Multiple columns of data, where each column is a Series. It can be seen as a collection of Series objects that share the same index.

3. Usage:
Series: Used for handling and analyzing a single column (one variable) of data.
DataFrame: Used for handling and analyzing tabular data spread across multiple columns.

Example:
The following cell creates a Series and a DataFrame that share the same index labels:

In [4]:
import pandas as pd

# Creating a pandas Series
data_series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print("Pandas Series:")
print(data_series)

# Creating a pandas DataFrame
data_frame = pd.DataFrame({
    'Column1': [10, 20, 30, 40],
    'Column2': [50, 60, 70, 80]
}, index=['a', 'b', 'c', 'd'])
print("\nPandas DataFrame:")
print(data_frame)


Pandas Series:
a    10
b    20
c    30
d    40
dtype: int64

Pandas DataFrame:
   Column1  Column2
a       10       50
b       20       60
c       30       70
d       40       80


This shows that while a Series represents a single column of data, a DataFrame can represent multiple columns, making it the more versatile structure for data manipulation and analysis in pandas.
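The dimensionality difference can also be confirmed programmatically: pandas objects expose `ndim` (the number of axes) and `shape` (the length along each axis). A minimal check, using small illustrative values:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
df = pd.DataFrame({'Column1': [10, 20, 30], 'Column2': [50, 60, 70]})

# ndim counts the axes; shape gives the size along each axis
print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)
```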

Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide range of functions to manipulate and analyze data in a DataFrame. Here are some common functions and their use cases:

### 1. `head()` and `tail()`
- **Purpose**: View the first few or last few rows of the DataFrame.
- **Example**: Quickly inspect the beginning or end of a dataset.
  ```python
  df.head(5)  # Display the first 5 rows
  df.tail(5)  # Display the last 5 rows
  ```

### 2. `info()` and `describe()`
- **Purpose**: Get a summary of the DataFrame and statistical overview of numeric columns.
- **Example**: Check data types and basic statistics.
  ```python
  df.info()      # Get a concise summary of the DataFrame
  df.describe()  # Get descriptive statistics for numeric columns
  ```

### 3. `drop()`
- **Purpose**: Remove specified labels (rows or columns) from the DataFrame.
- **Example**: Remove columns that are not needed for analysis.
  ```python
  df.drop(columns=['Column1', 'Column2'], inplace=True)  # Remove specified columns
  ```
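The `drop()` entry above only removes columns, but the same method also removes rows by index label. A small sketch with a made-up DataFrame; it omits `inplace=True`, so each call returns a new object and leaves `df` unchanged:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Drop a column by name (returns a new DataFrame)
no_b = df.drop(columns=['B'])

# Drop a row by its index label
no_first_row = df.drop(index=[0])

print(no_b.columns.tolist())        # ['A', 'C']
print(no_first_row.index.tolist())  # [1, 2]
```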

### 4. `loc[]` and `iloc[]`
- **Purpose**: Access a group of rows and columns by labels or integer positions.
- **Example**: Select specific rows and columns for analysis.
  ```python
  df.loc[0:5, ['ColumnA', 'ColumnB']]  # Rows with labels 0 through 5 (end label included), two named columns
  df.iloc[0:5, 0:2]                    # Rows at positions 0-4 (end excluded), first 2 columns by position
  ```
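One detail worth knowing: `loc` label slices include the end label, whereas `iloc` position slices exclude the end position, so the two calls above return different numbers of rows even on a default integer index. A self-contained check, with hypothetical columns named ColumnA to ColumnC:

```python
import pandas as pd

df = pd.DataFrame({
    'ColumnA': range(10),
    'ColumnB': range(10, 20),
    'ColumnC': range(20, 30),
})

# loc slices by label and includes the end label: labels 0..5 -> 6 rows
by_label = df.loc[0:5, ['ColumnA', 'ColumnB']]

# iloc slices by position and excludes the end position: positions 0..4 -> 5 rows
by_position = df.iloc[0:5, 0:2]

print(len(by_label), len(by_position))  # 6 5
```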

### 5. `groupby()`
- **Purpose**: Group data by one or more columns and perform aggregate operations.
- **Example**: Calculate the average value of each group.
  ```python
  grouped = df.groupby('Category').mean(numeric_only=True)  # Group by 'Category' and average the numeric columns
  ```

### 6. `merge()`
- **Purpose**: Merge two DataFrames based on a key column.
- **Example**: Combine data from two sources based on a common identifier.
  ```python
  merged_df = pd.merge(df1, df2, on='KeyColumn')  # Merge df1 and df2 on 'KeyColumn'
  ```
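`pd.merge()` performs an inner join by default, so keys missing from either side are dropped; the `how` parameter changes that behavior. A minimal sketch with two hypothetical tables sharing `KeyColumn`:

```python
import pandas as pd

# Hypothetical tables: df1 has one row per key, df2 may repeat or lack keys
df1 = pd.DataFrame({'KeyColumn': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Claire']})
df2 = pd.DataFrame({'KeyColumn': [1, 1, 3, 4], 'Amount': [50, 20, 75, 10]})

# Default how='inner': only keys present in both frames are kept
inner = pd.merge(df1, df2, on='KeyColumn')

# how='left': every row of df1 is kept; unmatched rows get NaN in df2's columns
left = pd.merge(df1, df2, on='KeyColumn', how='left')

print(inner)
print(left)
```

Key 4 exists only in `df2`, so it appears in neither result; key 2 (Bob) survives only in the left join, with a missing `Amount`.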

### 7. `apply()`
- **Purpose**: Apply a function element-wise to a Series, or along an axis (rows or columns) of a DataFrame.
- **Example**: Apply a custom function to each value in a column.
  ```python
  df['NewColumn'] = df['ColumnA'].apply(lambda x: x * 2)  # Double the values in 'ColumnA'
  ```
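The snippet above uses `Series.apply`, which calls the function once per element. To combine values across columns, `DataFrame.apply` with `axis=1` passes each row to the function as a Series. A small sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'ColumnA': [1, 2, 3], 'ColumnB': [10, 20, 30]})

# Series.apply: the function receives one element at a time
df['Doubled'] = df['ColumnA'].apply(lambda x: x * 2)

# DataFrame.apply with axis=1: the function receives one row (a Series) at a time
df['RowTotal'] = df.apply(lambda row: row['ColumnA'] + row['ColumnB'], axis=1)

print(df)
```

For simple arithmetic like this, the vectorized form `df['ColumnA'] + df['ColumnB']` gives the same result and is usually much faster; `apply` is most useful when the per-row logic cannot be expressed with vectorized operations.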

### 8. `pivot_table()`
- **Purpose**: Create a pivot table based on data in the DataFrame.
- **Example**: Summarize data by creating a pivot table.
  ```python
  pivot = df.pivot_table(values='Sales', index='Region', columns='Product', aggfunc='sum')
  ```
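To see what `pivot_table()` produces, here is a runnable version on a small sales table (the same values used in the example scenario in this answer):

```python
import pandas as pd

# One row per sale
df = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'],
    'Product': ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 130, 120, 170, 180, 160],
})

# Rows = Region, columns = Product, cells = total Sales for that pair
pivot = df.pivot_table(values='Sales', index='Region', columns='Product', aggfunc='sum')
print(pivot)
```

Region/Product combinations with no sales (for example North/B) appear as NaN, which also turns the columns into floats; passing `fill_value=0` to `pivot_table` replaces those gaps with 0.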

### 9. `fillna()`
- **Purpose**: Fill missing values in the DataFrame.
- **Example**: Replace NaN values with a specified value.
  ```python
  df.fillna(0, inplace=True)  # Replace all NaN values with 0
  ```
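A single constant is not always appropriate for every column. `fillna()` also accepts a dict mapping column names to fill values; the sketch below fills a numeric column with its mean and a text column with a placeholder (column names and values are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Score': [90.0, np.nan, 75.0],
    'City': ['Paris', None, 'Oslo'],
})

# Each column gets its own replacement value; mean() skips NaN by default
filled = df.fillna({'Score': df['Score'].mean(), 'City': 'Unknown'})

print(filled)
```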

### 10. `sort_values()`
- **Purpose**: Sort the DataFrame by the values of one or more columns.
- **Example**: Sort data based on a specific column.
  ```python
  df.sort_values(by='ColumnA', ascending=False, inplace=True)  # Sort by 'ColumnA' in descending order
  ```
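`sort_values()` also accepts a list of columns, with a matching list for `ascending`, so ties in the first key are broken by the second. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'Sales': [100, 150, 120, 170],
})

# Sort by Region ascending, then by Sales descending within each Region
sorted_df = df.sort_values(by=['Region', 'Sales'], ascending=[True, False])
print(sorted_df)
```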

### Example Scenario:

Suppose you have a DataFrame containing sales data for different products across various regions, and you want to analyze the total sales by region. You can use the `groupby()` function for this purpose.

```python
import pandas as pd

# Sample DataFrame
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'South', 'East', 'West'],
    'Product': ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B'],
    'Sales': [100, 150, 200, 130, 120, 170, 180, 160]
}

df = pd.DataFrame(data)

# Group by 'Region' and calculate total sales
total_sales_by_region = df.groupby('Region')['Sales'].sum()

print(total_sales_by_region)
```

**Output:**

```
Region
East     380
North    220
South    320
West     290
Name: Sales, dtype: int64
```

In this example, the `groupby()` function groups the data by the 'Region' column and then sums the 'Sales' values for each region, providing a clear summary of total sales by region.

Q6. Which of the following is mutable in nature: Series, DataFrame, Panel?


In pandas, both Series and DataFrame are mutable, meaning you can modify their contents after they are created. Panel, however, was deprecated in pandas 0.20.0 and removed in 0.25.0, so it is no longer available.

In [6]:
# Series is mutable: you can modify existing values and add new elements.

import pandas as pd

s = pd.Series([1, 2, 3])
s[0] = 10  # Modify an existing value
s['new'] = 20  # Add a new element
print(s)


0      10
1       2
2       3
new    20
dtype: int64


In [7]:
# DataFrame is mutable: you can modify its values, add or remove rows and columns, and change its structure.

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df.at[0, 'A'] = 10  # Modify an existing value
df['C'] = [7, 8, 9]  # Add a new column
print(df)


    A  B  C
0  10  4  7
1   2  5  8
2   3  6  9


Panel:

Removed: Panel was a 3D data structure that has been deprecated and removed from pandas. The recommended alternatives are a DataFrame with a MultiIndex for 3D data, or the xarray library.
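As a rough sketch of the MultiIndex alternative, the example below stores what would have been a Panel (items, years, and variables) as a DataFrame with a two-level row index. The item names, years, and values are purely illustrative:

```python
import pandas as pd

# Two-level row index: every (item, year) combination
index = pd.MultiIndex.from_product(
    [['ItemA', 'ItemB'], [2023, 2024]], names=['item', 'year']
)
df = pd.DataFrame({'price': [10, 11, 20, 21], 'volume': [5, 6, 7, 8]}, index=index)

# One "slice" along the first axis: all years for ItemA
print(df.loc['ItemA'])

# Cross-section along the inner level: every item in 2024
print(df.xs(2024, level='year'))
```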

Q7. Create a DataFrame using multiple Series. Explain with an example.

Creating a DataFrame using multiple Series in pandas is straightforward. Each Series can represent a column in the DataFrame, and the indices of the Series will align to form the rows.

Example:
Suppose we have data about the names, ages, and cities of a few individuals. We can create separate Series for each of these attributes and then combine them into a DataFrame.

In [8]:
import pandas as pd

# Creating Series for names, ages, and cities
names = pd.Series(['Alice', 'Bob', 'Charlie'])
ages = pd.Series([25, 30, 35])
cities = pd.Series(['New York', 'Los Angeles', 'Chicago'])

# Combining the Series into a DataFrame
df = pd.DataFrame({
    'Name': names,
    'Age': ages,
    'City': cities
})

print(df)


      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago


By combining multiple Series, we can organize related columns into a single table and then use the full DataFrame functionality for further analysis.
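The example above works cleanly because all three Series share the default index 0, 1, 2. When the Series carry different index labels, pandas aligns them on the union of the labels and fills the gaps with NaN. A short sketch with illustrative labels:

```python
import pandas as pd

# Two Series whose index labels only partly overlap
ages = pd.Series([25, 30], index=['Alice', 'Bob'])
cities = pd.Series(['Chicago', 'Boston'], index=['Bob', 'Claire'])

# Rows are aligned by label; a label missing from one Series yields NaN in that column
df = pd.DataFrame({'Age': ages, 'City': cities})
print(df)
```

Because `Age` now contains a NaN, its dtype becomes float64 rather than int64.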