## Q1. Create a Pandas Series that contains the following data: 4, 8, 15, 16, 23, and 42. Then, print the series.

```python
import pandas as pd

series = pd.Series([4, 8, 15, 16, 23, 42])
print(series)
```

```text
0     4
1     8
2    15
3    16
4    23
5    42
dtype: int64
```

## Q2. Create a variable of list type containing 10 elements, apply the pandas.Series function to the variable, and print the result.

```python
import pandas as pd

my_list = list(range(1, 11))  # a list with 10 elements: 1, 2, ..., 10
series = pd.Series(my_list)
print(series)
```

```text
0     1
1     2
2     3
3     4
4     5
5     6
6     7
7     8
8     9
9    10
dtype: int64
```

## Q3. Create a Pandas DataFrame that contains the following data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Claire'],
        'Age': [25, 30, 27],
        'Gender': ['Female', 'Male', 'Female']}
df = pd.DataFrame(data)
df
```

|   | Name | Age | Gender |
|---|------|-----|--------|
| 0 | Alice | 25 | Female |
| 1 | Bob | 30 | Male |
| 2 | Claire | 27 | Female |


## Q4. What is a ‘DataFrame’ in pandas, and how is it different from pandas.Series? Explain with an example.

In pandas, a DataFrame is a two-dimensional labeled data structure, similar to a table in a relational database or an Excel spreadsheet. It is a fundamental data structure for data manipulation and analysis in Python. A DataFrame consists of rows and columns, where each column can be thought of as a pandas Series.

A pandas Series, on the other hand, is a one-dimensional labeled array-like data structure. It's similar to a single column of a DataFrame, and it can hold data of various types (numeric, string, etc.) with associated labels (index).
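The index labels are what distinguish a Series from a plain Python list. Here is a minimal sketch using made-up labels 'a' to 'c' (and an illustrative name 'example'). It shows that the same value can be reached either by label or by position:

```python
import pandas as pd

# A Series with explicit, illustrative index labels instead of the default 0..n-1
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'], name='example')

print(s.loc['b'])         # label-based lookup, prints 20
print(s.iloc[0])          # position-based lookup, prints 10
print(s.index.tolist())   # prints ['a', 'b', 'c']
```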

## DataFrame Example:
Suppose you have data about students, including their names, ages, and scores in two subjects, Math and English. You can create a DataFrame to store this data:

```python
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 23, 21],
    'Math_Score': [90, 85, 78, 92],
    'English_Score': [80, 75, 88, 95]
}

df = pd.DataFrame(data)
print(df)
```

```text
      Name  Age  Math_Score  English_Score
0    Alice   25          90             80
1      Bob   22          85             75
2  Charlie   23          78             88
3    David   21          92             95
```


## Series Example:
Now, let's consider a specific column from the DataFrame, for example, the "Age" column:

```python
age_series = df['Age']
print(age_series)
```

```text
0    25
1    22
2    23
3    21
Name: Age, dtype: int64
```
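One way to see the difference directly is to compare the dimensions of the objects. The sketch below rebuilds the same student data so that it runs on its own. Selecting a column with single brackets returns a Series, while passing a list of column names in double brackets returns a one-column DataFrame:

```python
import pandas as pd

# Same student data as in the DataFrame example above
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 22, 23, 21],
    'Math_Score': [90, 85, 78, 92],
    'English_Score': [80, 75, 88, 95],
})

age_series = df['Age']    # single brackets: a 1-D Series
age_frame = df[['Age']]   # list of column names: a 2-D DataFrame with one column

print(type(age_series).__name__, age_series.ndim, age_series.shape)  # Series 1 (4,)
print(type(age_frame).__name__, age_frame.ndim, age_frame.shape)     # DataFrame 2 (4, 1)
print(df.ndim, df.shape)                                              # 2 (4, 4)
```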


## Q5. What are some common functions you can use to manipulate data in a Pandas DataFrame? Can you give an example of when you might use one of these functions?

Pandas provides a wide range of functions for manipulating data in a DataFrame. The examples below run on a small illustrative DataFrame. Its 'category' and 'value' columns, and the missing value in 'value', exist only so that every call actually works.

```python
import numpy as np
import pandas as pd

# Illustrative data: 'category' and 'value' are example column names
df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'B', 'A'],
    'value': [10.0, 20.0, np.nan, 40.0, 50.0],
})

# head() and tail():
# head(n) returns the first n rows; tail(n) returns the last n rows.
df.head(5)  # First 5 rows
df.tail(3)  # Last 3 rows

# describe():
# Summary statistics (count, mean, std, min, quartiles, max) of the numeric columns.
df.describe()

# info():
# Prints the index, column names, dtypes, non-null counts and memory usage.
df.info()

# shape:
# An attribute (not a method) holding (number of rows, number of columns).
rows, cols = df.shape

# loc[] and iloc[]:
# loc[] selects by label (index and column names); iloc[] selects by integer position.
df.loc[0]      # Row with index label 0 (the first row, given the default RangeIndex)
df.iloc[:, 1]  # Second column ('value'), selected by position

# drop():
# Returns a new DataFrame without the specified rows or columns.
df.drop([0, 2])              # Drop rows with index labels 0 and 2
df.drop('category', axis=1)  # Drop the 'category' column

# groupby():
# Groups rows by one or more columns and aggregates each group.
df.groupby('category')['value'].mean()  # Mean of 'value' for each 'category'

# sort_values():
# Sorts rows by one or more columns.
df.sort_values('value', ascending=False)  # Sort by 'value' in descending order

# pivot_table():
# Creates a spreadsheet-style pivot table to summarize and aggregate data.
pivot = df.pivot_table(index='category', values='value', aggfunc='mean')

# fillna() and ffill():
# Fill missing values with a constant, or carry the previous valid value forward.
df.fillna(0)  # Replace missing values with 0
df.ffill()    # Forward fill; fillna(method='ffill') is deprecated in recent pandas
```


These are just a few of the many functions available in Pandas for data manipulation. You might use these functions in various scenarios, such as data preprocessing, exploratory data analysis, cleaning missing values, summarizing data, and creating derived features. For example, the groupby() function could be used to analyze sales data by different product categories, and the pivot_table() function could be used to create a summary table for further analysis.
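The sales use case for groupby() can be sketched with made-up records. The column names product_category and sales, and all the figures, are hypothetical. The sketch totals and averages sales per category, then orders the categories by total:

```python
import pandas as pd

# Hypothetical sales records (illustrative data only)
sales = pd.DataFrame({
    'product_category': ['Books', 'Toys', 'Books', 'Games', 'Toys', 'Books'],
    'sales': [120, 80, 200, 150, 60, 100],
})

# Total and average sales per category, largest total first
summary = (
    sales.groupby('product_category')['sales']
    .agg(['sum', 'mean'])
    .sort_values('sum', ascending=False)
)
print(summary)
```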

## Q6. Which of the following is mutable in nature: Series, DataFrame, or Panel?
All three are mutable in the sense that the values they contain can be changed after creation. The pandas documentation describes every pandas data structure as value-mutable, though not all of them are size-mutable. The length of a Series is treated as fixed, whereas columns can be inserted into or removed from a DataFrame.

Series: A pandas Series is value-mutable. After creating it, you can assign new values to existing labels or positions, for example s.iloc[0] = 100.

DataFrame: A pandas DataFrame is both value-mutable and size-mutable. You can modify individual cells, add or remove columns, and perform other in-place modifications.

Panel: Panel, the former three-dimensional container, was also value-mutable and size-mutable. However, it was deprecated in pandas 0.20.0 and removed in 0.25.0, so it is not available in current pandas. The recommended replacements for 3-D data are a DataFrame with a MultiIndex or the xarray package. While Panel still existed, Panel.to_frame() was the way to convert a Panel to such a DataFrame.
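Here is a short sketch of in-place modification in current pandas. Panel cannot be demonstrated because it no longer exists. The data and the column names Math, English and Total are illustrative:

```python
import pandas as pd

# Series: change an existing value in place
s = pd.Series([4, 8, 15])
s.iloc[0] = 100  # s now holds 100, 8, 15

# DataFrame: change a cell, add a column, remove a column
scores = pd.DataFrame({'Math': [90, 85], 'English': [80, 75]})
scores.loc[0, 'Math'] = 95                              # modify a single cell
scores['Total'] = scores['Math'] + scores['English']   # add a new column
del scores['English']                                  # remove a column in place

print(s.tolist())  # prints [100, 8, 15]
print(scores)
```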

## Q7. Create a DataFrame using multiple Series. Explain with an example.
You can create a DataFrame using multiple Series by combining those Series into a dictionary and then passing the dictionary to the pd.DataFrame() constructor. Each Series will correspond to a column in the DataFrame. Here's an example:

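```python
import pandas as pd

# One Series per column
names = pd.Series(['Alice', 'Bob', 'Charlie', 'David'])
ages = pd.Series([25, 22, 23, 21])
math_scores = pd.Series([90, 85, 78, 92])

# Dictionary keys become column names; the Series become the columns
data = {
    'Name': names,
    'Age': ages,
    'Math_Score': math_scores,
}

df = pd.DataFrame(data)
print(df)
```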

```text
      Name  Age  Math_Score
0    Alice   25          90
1      Bob   22          85
2  Charlie   23          78
3    David   21          92
```


In this example, the individual Series (names, ages, math_scores) hold the data for the different columns of the DataFrame. They are combined into a dictionary named data, whose keys are the column names and whose values are the corresponding Series. Finally, the pd.DataFrame(data) constructor builds a DataFrame with the columns "Name", "Age", and "Math_Score". Its rows are formed by aligning the Series on their shared index (0 to 3).
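Because the constructor aligns the Series on their index labels, the Series do not need to share the same index or length. This minimal sketch uses hypothetical labels. A label missing from one Series becomes NaN in that column. pd.concat(..., axis=1) is an alternative that builds a similar frame from a list of named Series:

```python
import pandas as pd

# Two Series with partially overlapping (illustrative) index labels
ages = pd.Series([25, 22, 23], index=['Alice', 'Bob', 'Charlie'], name='Age')
scores = pd.Series([90, 85], index=['Alice', 'Bob'], name='Math_Score')

# Rows are aligned on the union of the index labels; Charlie has no score, so NaN
df = pd.DataFrame({'Age': ages, 'Math_Score': scores})
print(df)

# Alternative: concatenate named Series side by side; column names come from .name
df_concat = pd.concat([ages, scores], axis=1)
```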