
# **Data Toolkit Assignment**

# **Question 1: What is the difference between multithreading and multiprocessing?**
  -  **Multithreading vs Multiprocessing**

 **1. Meaning**

* **Multithreading** means dividing a single process (or program) into **multiple threads** that can run **concurrently**.
  Each thread shares the same memory and resources of that process.

* **Multiprocessing** means running **multiple processes** at the same time.
  Each process has its **own memory space** and works **independently**.

 **2. Example**

* **Multithreading:**
  Think of a web browser — one thread loads the page, another downloads images, and another plays music.
  All these threads belong to the **same application** and share memory.

* **Multiprocessing:**
  Imagine running several apps at once — one for browsing, one for video editing, and one for playing music.
  Each app runs in its **own process**, and the operating system can schedule these processes on different CPU cores.

 **3. Resource Sharing**

* **Multithreading:** Threads share the **same memory**, which makes data sharing easy but also risky because one thread can affect another if not handled carefully.
* **Multiprocessing:** Each process has **its own memory**, so data sharing is difficult, but the system is safer from crashes and memory corruption.

**4. Speed and Performance**

* **Multithreading:** Well suited to **I/O-bound tasks** like reading files or sending data over the network, because other threads can run while one thread waits.
  Threads are cheaper to create and switch between than processes, and they need less memory.

* **Multiprocessing:** Better for **CPU-bound tasks** like heavy calculations, data processing, or video rendering.
  It can use **multiple CPU cores** to truly run tasks in parallel.
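
The I/O-bound case can be sketched with a thread pool. This is a minimal illustration, not a benchmark: `fake_download` is a hypothetical stand-in that simulates waiting on I/O with `time.sleep`, and the 0.2-second delay is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(item_id):
    """Hypothetical stand-in for an I/O-bound call; sleeping simulates the wait."""
    time.sleep(0.2)
    return item_id * 2

item_ids = [1, 2, 3, 4]

start = time.perf_counter()
# Four worker threads wait at the same time instead of one after another.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, item_ids))
elapsed = time.perf_counter() - start

print(results)
print(f"Threaded run took {elapsed:.2f}s; a sequential loop would need about 0.8s")
```

For CPU-bound functions, the usual alternative is `ProcessPoolExecutor` from the same module, started under an `if __name__ == "__main__":` guard.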

 **5. Communication**

* **Multithreading:** Threads can easily communicate through shared memory, but it needs synchronization (using locks or semaphores) to avoid conflicts.
* **Multiprocessing:** Processes need special methods (like pipes or queues) to communicate, which is slower but safer.
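
A small sketch of thread synchronization with a lock. The names `counter`, `counter_lock`, and `add_many` are made up for this example; the point is that every thread updates the same variable, so the update is guarded.

```python
import threading

counter = 0                      # shared by every thread in this process
counter_lock = threading.Lock()  # guards updates to counter

def add_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4 threads x 10,000 increments = 40000
```

Separate processes would each get their own copy of `counter`, which is why they need a pipe, a queue, or explicitly shared memory to exchange data.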

 **6. Failure Impact**

* **Multithreading:** A fatal error in one thread (for example, a segmentation fault) brings down the **entire process**, because all threads live inside it.
* **Multiprocessing:** If one process crashes, **other processes keep running**.

 **7. Operating System Support**

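* **Multithreading:** Threads live inside a **single process** and are scheduled by the operating system or the language runtime.
* **Multiprocessing:** Each process is created and scheduled by the **operating system**, which can run different processes on different CPU cores.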

 **8. Real-life Analogy**

* **Multithreading:** Like having several people (threads) working together on the same desk (shared memory).
* **Multiprocessing:** Like each person having their own desk and working separately (separate memory).
----

# **Question 2: What are the challenges associated with memory management in Python?**
   -  **Challenges Associated with Memory Management in Python**

Memory management means how a programming language **allocates, uses, and frees memory** while a program is running.
Python handles memory automatically, mainly through **reference counting** backed by a cyclic **garbage collector**, but there are still some **challenges and limitations** developers face.

---
 **1. Automatic Garbage Collection**

* Python automatically removes unused objects from memory through **Garbage Collection (GC)**.
* However, sometimes GC may **not run at the right time**, causing **unnecessary memory usage**.
* This can make long-running programs **slow or memory-heavy**.

**2. Reference Counting Issues**

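* Python (CPython) mainly uses **reference counting**: every object keeps track of how many references point to it.
* When the reference count becomes zero, the object is deleted immediately.
* Objects that refer to each other (**circular references**) never reach a count of zero on their own, so they stay in memory until the cyclic garbage collector finds them.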
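
The difference between plain reference counting and cycle collection can be observed with a weak reference. This sketch assumes CPython; `Node` is a hypothetical class used only for the demonstration, and the cyclic collector is paused so that it cannot run at an unexpected moment.

```python
import gc
import weakref

class Node:
    """Hypothetical class used only for this demonstration."""
    def __init__(self):
        self.partner = None

gc.disable()  # pause the cyclic collector so it cannot interfere

# Case 1: no cycle. The object is freed as soon as its last reference is removed.
single = Node()
single_probe = weakref.ref(single)
del single
freed_without_cycle = single_probe() is None

# Case 2: two objects that refer to each other. Their counts never reach zero.
first, second = Node(), Node()
first.partner, second.partner = second, first
cycle_probe = weakref.ref(first)
del first, second
alive_before_gc = cycle_probe() is not None

gc.collect()  # the cyclic collector finds and frees the unreachable pair
alive_after_gc = cycle_probe() is not None
gc.enable()

print(freed_without_cycle, alive_before_gc, alive_after_gc)
```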

**3. Memory Leaks**

* A **memory leak** happens when memory is no longer used by the program but not released back to the system.
* It can occur due to **improper handling of objects**, circular references, or global variables.
* Memory leaks increase memory usage and can **slow down or crash** the program over time.

 **4. Fragmentation of Memory**

* When Python frequently allocates and frees memory blocks of different sizes, it can cause **fragmentation**.
* Fragmentation means small unused spaces are left scattered in memory.
* It leads to **inefficient memory use** and reduced performance, especially in large or long-running programs.

 **5. Large Object Handling**

* Python can face challenges while handling **large data structures or big objects** like large lists, dictionaries, or arrays.
* These consume a lot of memory and may cause **MemoryError** if system memory runs out.
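
One common way to reduce the footprint of large sequences is to produce values lazily with a generator instead of building the whole list. This technique is not named in the points above, and the size `n` is an arbitrary choice for the sketch.

```python
import sys

n = 100_000

squares_list = [i * i for i in range(n)]   # all values exist in memory at once
squares_gen = (i * i for i in range(n))    # values are produced one at a time

# getsizeof reports the container itself (for the list, its array of pointers),
# not the integers it refers to, so the real gap is even larger.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)          # same result without storing the list

print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")
```

A generator can be consumed only once, so this fits single-pass work such as summing or writing to a file.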

**6. Global Interpreter Lock (GIL)**

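* The standard CPython build uses a **Global Interpreter Lock (GIL)**, which allows only one thread to execute Python bytecode at a time.
* The GIL keeps reference counts consistent across threads, but it prevents **true parallel execution** of CPU-bound Python code in multi-threaded programs.
* As a result, CPU-bound work is often moved to multiple processes, each with its own interpreter and memory, which raises total memory usage.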

**7. Difficulty in Manual Control**

* Unlike C or C++, Python does not give direct control over memory allocation and deallocation.
* `del` only removes a reference to an object; developers **cannot directly free an object's memory**, which makes optimization difficult.
* This can be a challenge in programs that require **fine-tuned memory management**.

 **8. External Libraries and Extensions**

* When using **external C extensions** or libraries like NumPy or Pandas, memory management becomes more complex.
* These libraries may allocate memory outside Python’s garbage collector, making it **harder to track or release** properly.

 **9. Inefficient Use of Objects**

* Repeated creation and destruction of small objects can increase memory load.
* For example, using many temporary lists or strings can waste memory.
* Reusing or clearing variables properly can help reduce this problem.

 **10. Lack of Awareness by Developers**

* Many beginners rely completely on Python’s automatic management and ignore optimization.
* Poor coding practices like keeping unused references or using too many global variables can **waste memory unnecessarily**.

----

# **Question 3: Write a Python program that logs an error message to a log file when a division by zero exception occurs.**
   - Logging division by zero error

```python
import logging

# Configure logging to write to a file
logging.basicConfig(filename='error.log', level=logging.ERROR)

def safe_division(a, b):
  """
  Performs division of a by b and logs an error if division by zero occurs.
  """
  try:
    result = a / b
    return result
  except ZeroDivisionError as e:
    logging.error(f"Error: Division by zero occurred. {e}")
    return None

# Example usage
numerator = 10
denominator = 0

result = safe_division(numerator, denominator)

if result is not None:
  print(f"Result of division: {result}")
else:
  print("Could not perform division due to an error. Check 'error.log' for details.")
```
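
A variant worth knowing is `logging.exception`, which logs at ERROR level and appends the traceback. The sketch below is one possible setup, not the only one: the logger name `division_demo`, the temporary log path, and the format string are assumptions made for illustration.

```python
import logging
import os
import tempfile

# Hypothetical log location inside a fresh temporary directory
log_path = os.path.join(tempfile.mkdtemp(), "error_with_traceback.log")

logger = logging.getLogger("division_demo")
logger.setLevel(logging.ERROR)
handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)

def safe_division(a, b):
    """Return a / b, or None after logging the error with its traceback."""
    try:
        return a / b
    except ZeroDivisionError:
        # logger.exception must be called from inside an except block
        logger.exception("Division by zero: a=%s, b=%s", a, b)
        return None

result = safe_division(10, 0)

# Detach and close the handler, then read the log file back
logger.removeHandler(handler)
handler.close()
with open(log_path) as log_file:
    log_text = log_file.read()
print(log_text)
```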
----
# **Question 4: Write a Python program that reads from one file and writes its content to another file.**

```python
# Program to copy content from one file to another

# Open the source file in read mode and the destination file in write mode.
# The with statement closes both files automatically, even if an error occurs.
with open("source.txt", "r") as source_file, \
     open("destination.txt", "w") as destination_file:
    # Read content from source file
    content = source_file.read()

    # Write content to destination file
    destination_file.write(content)

print("File content copied successfully!")
```

---

### **Explanation:**

1. **`open("source.txt", "r")`** – Opens the source file in **read mode**.
2. **`open("destination.txt", "w")`** – Opens the destination file in **write mode** (creates the file if it doesn’t exist and overwrites it if it does).
3. **`read()`** – Reads the entire content of the source file.
4. **`write()`** – Writes that content to the destination file.
5. Finally, the **`with`** statement **closes** both files when the block ends, which flushes buffered data to disk and releases the file handles.

---

### **Output Example:**

If `source.txt` contains:

```
Hello, this is my Python file.
```

Then after running the program, `destination.txt` will also contain:

```
Hello, this is my Python file.
```
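
Because `read()` loads the whole file into memory, large files are usually copied in pieces or with the standard library's `shutil.copyfile`. The sketch below shows both under stated assumptions: the temporary directory, the file names, and the 64 KiB chunk size are choices made for this example.

```python
import os
import shutil
import tempfile

work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "source.txt")
chunked_path = os.path.join(work_dir, "destination_chunked.txt")
shutil_path = os.path.join(work_dir, "destination_shutil.txt")

# Create a sample source file so the example is self-contained
with open(source_path, "w", encoding="utf-8") as f:
    f.write("Hello, this is my Python file.\n" * 1000)

def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy src to dst while holding at most chunk_size bytes in memory."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:      # an empty bytes object means end of file
                break
            fout.write(chunk)

copy_in_chunks(source_path, chunked_path)
shutil.copyfile(source_path, shutil_path)  # standard-library one-liner

print("Both copies written to", work_dir)
```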

---
# **Question 5: Write a program that handles both IndexError and KeyError using a try-except block.**

```python
# Program to handle both IndexError and KeyError

# List and dictionary examples
my_list = [10, 20, 30]
my_dict = {"name": "Tanuj", "age": 22}

def safe_lookup(container, key):
    """Return container[key], or None if the index or key does not exist."""
    try:
        return container[key]
    except IndexError:
        print("Error: You tried to access an index that does not exist in the list.")
    except KeyError:
        print("Error: You tried to access a key that does not exist in the dictionary.")
    return None

# Each call runs the try block on its own, so both handlers are exercised
safe_lookup(my_list, 5)       # This raises IndexError inside the function
safe_lookup(my_dict, "city")  # This raises KeyError inside the function

print("Program continues after handling exceptions.")
```

### 🧠 Explanation:

* **`try` block:** Contains code that might cause an exception.
* **`except IndexError:`** Handles cases where a list index is out of range.
* **`except KeyError:`** Handles cases where a dictionary key does not exist.
* The program continues smoothly after handling the errors instead of crashing.
---------

# **Question 6: What are the differences between NumPy arrays and Python lists?**
 **Differences between NumPy Arrays and Python Lists**

**1. Data Type**

* NumPy arrays can store only one type of data (e.g., all integers or all floats).
* Python lists can store mixed data types (int, float, string, etc.).

**2. Speed**

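* NumPy arrays are much faster for numerical work because their operations run as compiled C loops over contiguous, typed memory.
* Python lists are slower for such work because every element is a separate Python object that the interpreter processes one at a time.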

**3. Memory Usage**

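* NumPy arrays use less memory because they store raw values in one contiguous block (for example, 8 bytes per `int64` value).
* Python lists take more memory because they store a pointer to a full Python object for each element.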

**4. Mathematical Operations**

* NumPy supports element-wise operations directly (like `a + b`, `a * 2`).
* Python lists do not support element-wise arithmetic: `a + b` concatenates and `a * 2` repeats the list, so you need a loop or list comprehension.

**5. Functionality**

* NumPy provides many built-in mathematical and statistical functions.
* Python lists have limited functions for numerical tasks.

**6. Multi-Dimensional Support**

* NumPy can easily handle multi-dimensional data (2D, 3D arrays).
* Python lists need nested lists for multiple dimensions, which is harder to manage.

**7. Type Conversion**

* NumPy automatically converts all elements to a common data type.
* Python lists keep the original type of each element.
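
Several of these differences can be checked directly. A short sketch, with variable names chosen only for illustration:

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# The + and * operators mean different things for the two types
list_plus = py_list + [4, 5, 6]               # concatenation: [1, 2, 3, 4, 5, 6]
array_plus = np_array + np.array([4, 5, 6])   # element-wise: [5, 7, 9]
list_times = py_list * 2                      # repetition: [1, 2, 3, 1, 2, 3]
array_times = np_array * 2                    # element-wise: [2, 4, 6]

# Mixed numeric input is converted to one common dtype
mixed = np.array([1, 2.5])
print(mixed.dtype)                            # float64

# A typed array stores raw values: 1000 int64 values take 8000 bytes
raw_bytes = np.arange(1000, dtype=np.int64).nbytes
print(raw_bytes)
```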

---

# **Question 7: Explain the difference between apply() and map() in Pandas.**
  
 **1. `map()`**

* Traditionally used **with a Series** (one column); pandas 2.1 and later also provide an element-wise `DataFrame.map()`.
* It applies a **function, dictionary, or mapping** to each element of that Series.
* Works **element-wise** (one value at a time).

**Example:**

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Using map() to square each element
print(s.map(lambda x: x**2))
```

**Output:**

```
0     1
1     4
2     9
3    16
dtype: int64
```

---

 **2. `apply()`**

* Can be used with **both Series and DataFrame**.
* It applies a **function along an axis** (rows or columns).
* More powerful — can work on multiple columns or rows together.

**Example:**

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Using apply() to find sum of each row
print(df.apply(lambda x: x.sum(), axis=1))
```

**Output:**

```
0    5
1    7
2    9
dtype: int64
```
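
Two further cases illustrate the contrast: `map()` with a dictionary, where values without a matching key become `NaN`, and `apply()` along each axis of a DataFrame. The sample values and names below are made up for this sketch.

```python
import pandas as pd

# map() with a dictionary: every value is looked up; missing keys give NaN
grades = pd.Series(["a", "b", "c"])
points = grades.map({"a": 4, "b": 3})
print(points)

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# apply() with axis=0 passes each column to the function
column_totals = df.apply(lambda col: col.sum(), axis=0)

# apply() with axis=1 passes each row, so several columns can be combined
row_range = df.apply(lambda row: row.max() - row.min(), axis=1)

print(column_totals)
print(row_range)
```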
-----
# **Question 8: Create a histogram using Seaborn to visualize a distribution.**

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = [12, 15, 17, 20, 22, 25, 25, 26, 28, 30, 32, 35, 37, 40, 42, 45]

# Create a histogram
sns.histplot(data, bins=8, kde=True, color='skyblue')

# Add labels and title
plt.title("Distribution of Sample Data")
plt.xlabel("Value")
plt.ylabel("Frequency")

# Show the plot
plt.show()
```

**Explanation:**

* **`sns.histplot()`** → Creates a histogram.
* **`bins=8`** → Divides data into 8 intervals (bars).
* **`kde=True`** → Adds a smooth curve (Kernel Density Estimate) showing the distribution shape.
* **`color='skyblue'`** → Sets the color of the bars.
* **`plt.xlabel()`, `plt.ylabel()`, `plt.title()`** → Used for labeling the plot.
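
To see what `bins=8` means numerically, the same data can be binned with NumPy. This is only an approximation of what the plot shows: it assumes equal-width bins spanning the data's minimum to maximum, which may not match every Seaborn setting.

```python
import numpy as np

data = [12, 15, 17, 20, 22, 25, 25, 26, 28, 30, 32, 35, 37, 40, 42, 45]

# counts holds the number of values per bin; edges holds the 9 bin boundaries
counts, edges = np.histogram(data, bins=8)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

Each printed row corresponds to one bar: the interval is the bar's width and the count is its height.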

---
# **Question 9: Use Pandas to load a CSV file and display its first 5 rows.**
  -  A simple example of how to **load a CSV file using Pandas** and display its **first 5 rows**:

---

### ✅ **Python Program:**

```python
import pandas as pd

# Load the CSV file
data = pd.read_csv("data.csv")   # Replace 'data.csv' with your file name

# Display the first 5 rows
print(data.head())
```

---

### 🧠 **Explanation:**

* **`import pandas as pd`** → Imports the Pandas library.
* **`pd.read_csv("data.csv")`** → Reads the CSV file and loads it into a DataFrame.
* **`data.head()`** → Returns the **first 5 rows** of the dataset by default; pass a number, such as `data.head(10)`, for a different count.

---

📌 **Example Output** (assuming `data.csv` contains records like these):

```
   ID   Name  Age      City
0   1  Tanuj   21  Dehradun
1   2  Aryan   22     Delhi
2   3   Riya   20    Mumbai
3   4  Karan   23      Pune
4   5  Meena   21   Lucknow
```
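
To try this without a file on disk, the CSV text can be supplied through `io.StringIO`. In this sketch the first five records repeat the example output above, and the sixth row is a hypothetical addition so that `head()` has something to leave out.

```python
import io
import pandas as pd

# In-memory CSV; the last row is made up for this example
csv_text = """ID,Name,Age,City
1,Tanuj,21,Dehradun
2,Aryan,22,Delhi
3,Riya,20,Mumbai
4,Karan,23,Pune
5,Meena,21,Lucknow
6,Asha,24,Jaipur
"""

data = pd.read_csv(io.StringIO(csv_text))

first_five = data.head()    # default: first 5 rows
first_two = data.head(2)    # explicit row count

print(data.shape)           # rows and columns in the full table
print(first_five)
```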

---
# **Question 10: Calculate the correlation matrix using Seaborn and visualize it with a heatmap.**

**Python Program:**

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Sample data
data = {
    'Age': [21, 22, 23, 24, 25],
    'Height': [160, 165, 170, 175, 180],
    'Weight': [55, 60, 65, 70, 75],
    'Score': [80, 82, 85, 87, 90]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Calculate the correlation matrix
corr = df.corr()

# Display the correlation matrix
print("Correlation Matrix:")
print(corr)

# Visualize using a heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm', linewidths=0.5)

# Add title
plt.title("Correlation Heatmap")

# Show the plot
plt.show()
```

---

 **Explanation:**

* **`df.corr()`** → A Pandas method that calculates the **correlation matrix** (Pearson by default) between numerical columns; Seaborn is used only to draw it.
* **`sns.heatmap()`** → Creates a colored grid showing correlation values.
* **`annot=True`** → Displays the correlation values inside the boxes.
* **`cmap='coolwarm'`** → Sets the color scheme for better visualization.
* **`linewidths=0.5`** → Adds thin lines between boxes for clarity.
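
Each cell of the matrix is a Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The sketch below recomputes one cell by hand to show what `df.corr()` returns; the helper `pearson` is a hypothetical function written for this check, not part of Pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Age': [21, 22, 23, 24, 25],
    'Height': [160, 165, 170, 175, 180],
    'Weight': [55, 60, 65, 70, 75],
    'Score': [80, 82, 85, 87, 90]
})

def pearson(x, y):
    """Pearson correlation computed directly from the formula."""
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = (x_centered * y_centered).sum()
    denominator = np.sqrt((x_centered ** 2).sum() * (y_centered ** 2).sum())
    return numerator / denominator

manual = pearson(df['Age'], df['Score'])
from_pandas = df.corr().loc['Age', 'Score']

print(f"manual: {manual:.6f}, pandas: {from_pandas:.6f}")
```

In this sample data Age, Height, and Weight increase in exact lockstep, so their pairwise correlations are 1 and the heatmap is nearly uniform; real datasets usually show more variation.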

---









