## **Theory Section**

**Q1. What is NumPy, and why is it widely used in Python?**  
**Answer:**  
NumPy (Numerical Python) is a library for numerical and scientific computing in Python. It provides powerful tools for handling multidimensional arrays, mathematical functions, linear algebra, and random number generation.  
It is widely used because:  
- It is faster than Python lists for numerical work because its core routines are implemented in C and operate on contiguous, typed memory.  
- It supports vectorized operations.  
- It integrates easily with other libraries like Pandas, Matplotlib, and Scikit-learn.  

---

**Q2. How does broadcasting work in NumPy?**  
**Answer:**  
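Broadcasting allows NumPy to perform element-wise arithmetic on arrays of different but compatible shapes. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. Instead of physically copying data, NumPy treats the smaller array as if it were stretched to match the larger one.  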
For example:  
```python
import numpy as np
a = np.array([1, 2, 3])
b = 5
print(a + b)   # Output: [6 7 8]
```
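
The same rule applies to arrays of higher dimension. The following is a minimal sketch with illustrative array names (`matrix`, `row`, `col`) showing a row and a column being broadcast against a 2D array, plus a shape mismatch that NumPy rejects:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])        # shape (2, 1)

print(matrix + row)   # row is added to every row of matrix
print(matrix + col)   # col is added to every column of matrix

# Incompatible shapes raise an error instead of being silently expanded.
try:
    matrix + np.array([1, 2])         # trailing dimensions 3 and 2 differ
except ValueError as exc:
    print("Broadcast failed:", exc)
```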

---

**Q3. What is a Pandas DataFrame?**  
**Answer:**  
A DataFrame is a 2D labeled data structure in Pandas, similar to a table in Excel or SQL. It consists of rows and columns where each column can hold different data types (int, float, string, etc.).  
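
As a small illustration, the sketch below builds a DataFrame from a dictionary of columns. The column names and values are made up for the example:

```python
import pandas as pd

# Hypothetical sample data: each column has its own data type.
df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "age": [28, 35, 41],
    "salary": [52000.0, 61000.5, 58000.0],
})

print(df)
print(df.dtypes)   # one type per column
print(df.shape)    # (number of rows, number of columns)
```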

---

**Q4. Explain the use of the groupby() method in Pandas.**  
**Answer:**  
The `groupby()` method is used to split data into groups based on some criteria (like categories) and then apply aggregation functions such as sum, mean, or count. It simplifies data analysis tasks.  
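
A minimal sketch of this split-then-aggregate pattern, using an invented `sales` table:

```python
import pandas as pd

# Hypothetical sales records.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "amount": [100, 150, 200, 50, 300],
})

# Split by region, then aggregate each group.
totals = sales.groupby("region")["amount"].sum()
print(totals)

# Several aggregations can be computed in one call.
summary = sales.groupby("region")["amount"].agg(["sum", "mean", "count"])
print(summary)
```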

---

**Q5. Why is Seaborn preferred for statistical visualizations?**  
**Answer:**  
Seaborn is preferred because:  
- It is built on top of Matplotlib but easier to use.  
- It provides built-in themes and color palettes.  
- It has high-level functions like heatmap, pairplot, and violinplot, which make statistical visualization simpler.  

---

**Q6. What are the differences between NumPy arrays and Python lists?**  
**Answer:**  
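- **Speed:** NumPy arrays are faster for numerical operations because the loops run in compiled code.  
- **Memory efficiency:** NumPy arrays use less memory because elements are stored as fixed-size values in one block rather than as separate Python objects.  
- **Functionality:** NumPy supports element-wise mathematical operations directly; on Python lists, `+` concatenates and `*` repeats.  
- **Data type:** NumPy arrays have a single fixed data type (`dtype`), while Python lists can store mixed data.  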
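
The behavioral differences can be seen in a short sketch (variable names are illustrative):

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

print(py_list * 2)   # list repetition: [1, 2, 3, 1, 2, 3]
print(np_arr * 2)    # element-wise arithmetic: [2 4 6]

# A list can mix types; an array coerces everything to one dtype.
mixed_list = [1, "two", 3.0]
mixed_arr = np.array([1, 2.5, 3])
print(mixed_arr.dtype)   # float64
```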

---

**Q7. What is a heatmap, and when should it be used?**  
**Answer:**  
A heatmap is a graphical representation of data where values are represented as colors. It is used to visualize correlation matrices, frequency distributions, and patterns in large datasets.  

---

**Q8. What does the term “vectorized operation” mean in NumPy?**  
**Answer:**  
Vectorized operations mean applying an operation directly on the whole array without using loops. This makes the code faster and more concise. Example:  
```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print(a + b)   # Output: [5 7 9]
```

---

**Q9. How does Matplotlib differ from Plotly?**  
**Answer:**  
- **Matplotlib:** Primarily produces static plots with fine-grained control over every element; it also offers basic 3D and animation support.  
- **Plotly:** Used for interactive and dynamic plots (zoom, hover info, 3D visualization).  

---

**Q10. What is the significance of hierarchical indexing in Pandas?**  
**Answer:**  
Hierarchical indexing allows multiple levels of indexing in rows or columns. It helps in working with higher-dimensional data and simplifies group-based analysis.  
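
A minimal sketch of a two-level index, using invented revenue figures, shows how the levels support partial selection, per-level grouping, and reshaping:

```python
import pandas as pd

# Hypothetical revenue per (year, quarter).
index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
revenue = pd.Series([10, 12, 15, 18], index=index, name="revenue")

print(revenue.loc["2024"])                  # all quarters of 2024
print(revenue.loc[("2023", "Q2")])          # single value: 12
print(revenue.groupby(level="year").sum())  # aggregate over the outer level
print(revenue.unstack("quarter"))           # inner level becomes columns
```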

---

**Q11. What is the role of Seaborn’s pairplot() function?**  
**Answer:**  
`pairplot()` creates a grid of scatter plots for each pair of numerical columns, with each column's own distribution (a histogram by default) on the diagonal. It is useful for identifying relationships and distributions in a dataset.  

---

**Q12. What is the purpose of the describe() function in Pandas?**  
**Answer:**  
The `describe()` function provides summary statistics for a DataFrame: count, mean, std, min, max, and quartiles (25%, 50%, 75%). By default it covers only the numeric columns.  
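
A short sketch on a made-up DataFrame shows the default output and the `include="all"` option, which also summarizes non-numeric columns:

```python
import pandas as pd

# Hypothetical marks table with one numeric and one text column.
df = pd.DataFrame({
    "marks": [85, 90, 78, 92],
    "grade": ["B", "A", "C", "A"],
})

stats = df.describe()               # numeric columns only, by default
print(stats)
print(df.describe(include="all"))   # also summarizes non-numeric columns
```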

---

**Q13. Why is handling missing data important in Pandas?**  
**Answer:**  
Missing data can lead to inaccurate results, errors, or biased analysis. Pandas provides methods such as `isna()`, `dropna()`, and `fillna()` to detect and handle missing values.  
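
The sketch below, on invented temperature data, counts missing values and then applies the two most common strategies: dropping rows and filling with the column mean.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing temperature.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Goa"],
    "temp": [31.0, np.nan, 29.0],
})

print(df.isna().sum())   # missing values per column

dropped = df.dropna()                              # drop rows with any missing value
filled = df.fillna({"temp": df["temp"].mean()})    # fill with the column mean
print(dropped)
print(filled)
```

Which strategy is appropriate depends on the data: dropping is simple but discards rows, while filling keeps them at the cost of introducing estimated values.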

---

**Q14. What are the benefits of using Plotly for data visualization?**  
**Answer:**  
- Interactive charts with zoom/hover features.  
- Supports 3D and real-time visualizations.  
- Easy to integrate with Dash for dashboards.  

---

**Q15. How does NumPy handle multidimensional arrays?**  
**Answer:**  
NumPy uses the `ndarray` object which can store arrays of any dimension (1D, 2D, 3D, etc.). It allows efficient storage and operations like slicing, reshaping, and broadcasting.  
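
As an illustration, the following sketch creates a 3D array and applies indexing, slicing, and an axis-wise reduction to it:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)   # 3D array: 2 blocks, 3 rows, 4 columns

print(arr.ndim)                # 3
print(arr.shape)               # (2, 3, 4)
print(arr[1, 2, 3])            # last element: 23
print(arr[0, :, 1])            # column 1 of the first block: [1 5 9]
print(arr.sum(axis=0).shape)   # reducing over axis 0 leaves shape (3, 4)
```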

---

**Q16. What is the role of Bokeh in data visualization?**  
**Answer:**  
Bokeh is a Python library for creating interactive and web-friendly visualizations. It is useful for dashboards and handling large datasets in browsers.  

---

**Q17. Explain the difference between apply() and map() in Pandas.**  
**Answer:**  
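- `map()`: On a Series, applies a function, dictionary, or Series to each element. Recent pandas versions (2.1 and later) also provide `DataFrame.map()` for element-wise operations, replacing `applymap()`.  
- `apply()`: Works on both DataFrame and Series. On a DataFrame, the function receives an entire column (`axis=0`) or row (`axis=1`) at a time.  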
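
A minimal sketch of the difference, using a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Series.map: transform each element of one column.
squared = df["a"].map(lambda x: x ** 2)

# map also accepts a dict for value substitution.
labels = df["a"].map({1: "low", 2: "mid", 3: "high"})

# DataFrame.apply: the function receives a whole column (axis=0) or row (axis=1).
col_range = df.apply(lambda col: col.max() - col.min())         # per column
row_total = df.apply(lambda row: row["a"] + row["b"], axis=1)   # per row

print(squared.tolist(), labels.tolist())
print(col_range.to_dict(), row_total.tolist())
```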

---

**Q18. What are some advanced features of NumPy?**  
**Answer:**  
- Linear algebra functions.  
- Fourier transforms.  
- Random number generation.  
- Masked arrays.  
- Broadcasting and vectorization.  
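
A few of these features can be sketched briefly. The inputs below are arbitrary values chosen for illustration:

```python
import numpy as np

# Linear algebra: solve the system A @ x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)
print(x)

# Fourier transform of a short signal.
spectrum = np.fft.fft(np.array([1.0, 0.0, -1.0, 0.0]))
print(spectrum)

# Reproducible random numbers via a seeded Generator.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=5)

# Masked array: ignore the sentinel value -999 when averaging.
readings = np.ma.masked_equal([10, -999, 20, 30], -999)
print(readings.mean())   # 20.0
```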

---

**Q19. How does Pandas simplify time series analysis?**  
**Answer:**  
- Date/time indexing.  
- Resampling and shifting.  
- Built-in functions for rolling averages and trends.  
- Easy handling of missing dates.  
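
These features can be sketched on a short daily series. The values are invented for the example:

```python
import pandas as pd

dates = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([10, 12, 11, 15, 14, 18], index=dates)

two_day_sum = ts.resample("2D").sum()        # downsample to 2-day bins
previous_day = ts.shift(1)                   # lag by one period
rolling_mean = ts.rolling(window=3).mean()   # 3-day moving average

# Reindexing onto a full calendar exposes missing dates as NaN.
gappy = ts.drop(dates[2])
restored = gappy.reindex(dates)

print(two_day_sum)
print(restored)
```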

---

**Q20. What is the role of a pivot table in Pandas?**  
**Answer:**  
Pivot tables summarize data by grouping on one or more keys and aggregating values into a grid of rows and columns. They help in analyzing large datasets by transforming them into meaningful summaries.  
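
A minimal sketch with a hypothetical `sales` table, where rows become regions, columns become products, and duplicate entries are summed:

```python
import pandas as pd

# Hypothetical sales records; North/Pen appears twice and is aggregated.
sales = pd.DataFrame({
    "region": ["North", "North", "North", "South", "South"],
    "product": ["Pen", "Pen", "Book", "Pen", "Book"],
    "units": [10, 5, 4, 7, 9],
})

table = pd.pivot_table(sales, values="units", index="region",
                       columns="product", aggfunc="sum")
print(table)
```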

---

**Q21. Why is NumPy’s array slicing faster than Python’s list slicing?**  
**Answer:**  
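Basic slicing of a NumPy array returns a view onto the same memory rather than a copy, so no elements are duplicated. Slicing a Python list builds a new list and copies a reference for every element in the slice. NumPy arrays are also stored in contiguous memory blocks and implemented in C, which keeps subsequent operations on the slice fast.  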
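
One point worth illustrating is that a basic NumPy slice shares memory with the original array, while a list slice is an independent copy. A short sketch:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]          # basic slice: a view, no data copied
view[0] = 99             # writes through to arr
print(arr[2])            # 99
print(np.shares_memory(arr, view))   # True

lst = list(range(10))
part = lst[2:5]          # list slice: a new list
part[0] = 99
print(lst[2])            # 2, the original list is unchanged

independent = arr[2:5].copy()   # explicit copy when isolation is needed
```

Because views write through to the original, call `.copy()` when the slice must be modified independently.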

---

**Q22. What are some common use cases for Seaborn?**  
**Answer:**  
- Visualizing correlations with heatmaps.  
- Comparing distributions with histograms and violin plots.  
- Relationship analysis with scatter plots.  
- Exploratory data analysis with pairplots.  


## **Practical Section**

**Q1. Create a 2D NumPy array and calculate the sum of each row**

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# axis=1 sums across the columns of each row
row_sum = arr.sum(axis=1)
print("Row sums:", row_sum)
```

**Q2. Write a Pandas script to find the mean of a specific column in a DataFrame**

```python
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'Marks': [85, 90, 78, 92]}
df = pd.DataFrame(data)

mean_marks = df['Marks'].mean()
print("Mean of Marks column:", mean_marks)
```

**Q3. Create a scatter plot using Matplotlib**

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 6, 9, 5, 6, 7, 8]
y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78]

plt.scatter(x, y)
plt.title("Scatter Plot Example")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
```

**Q4. Calculate the correlation matrix using Seaborn and visualize it with a heatmap**

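```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [5, 4, 3, 2, 1],
    'C': [2, 3, 4, 5, 6]
})

# pandas computes the correlation matrix; Seaborn draws it
corr = data.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()
```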

**Q5. Generate a bar plot using Plotly**

```python
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    'Fruits': ['Apple', 'Banana', 'Mango', 'Grapes'],
    'Quantity': [10, 15, 7, 12]
})

fig = px.bar(df, x='Fruits', y='Quantity', title="Fruit Quantity")
fig.show()
```

**Q6. Create a DataFrame and add a new column based on an existing column**

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Score': [50, 60, 70]})

# Boolean column derived from the existing 'Score' column
df['Passed'] = df['Score'] > 55
print(df)
```

**Q7. Element-wise multiplication of two NumPy arrays**

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

result = a * b
print("Element-wise multiplication:", result)
```

**Q8. Create a line plot with multiple lines using Matplotlib**

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [1, 4, 9, 16, 25]
y2 = [1, 2, 3, 4, 5]

plt.plot(x, y1, label="Square")
plt.plot(x, y2, label="Linear")
plt.legend()
plt.title("Multiple Line Plot")
plt.show()
```

**Q9. Generate a Pandas DataFrame and filter rows where a column value is greater than a threshold**

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                   'Age': [20, 25, 30, 18]})

# Keep only the rows where Age exceeds the threshold of 21
filtered = df[df['Age'] > 21]
print(filtered)
```

**Q10. Create a histogram using Seaborn to visualize a distribution**

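```python
import matplotlib.pyplot as plt
import seaborn as sns

data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 6]

# kde=True overlays a smoothed density estimate on the histogram
sns.histplot(data, bins=5, kde=True)
plt.show()
```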

**Q11. Perform matrix multiplication using NumPy**

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# For 2D arrays, np.dot(A, B) is the matrix product (same as A @ B)
result = np.dot(A, B)
print("Matrix multiplication:\n", result)
```

**Q12. Use Pandas to load a CSV file and display its first 5 rows**

```python
import pandas as pd

df = pd.read_csv("sample.csv")   # make sure 'sample.csv' is uploaded
print(df.head())                 # head() returns the first 5 rows by default
```

**Q13. Create a 3D scatter plot using Plotly**

```python
import plotly.express as px
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [10, 11, 12, 13, 14],
    'z': [5, 6, 7, 8, 9]
})

fig = px.scatter_3d(df, x='x', y='y', z='z', title="3D Scatter Plot")
fig.show()
```