Q1) What is data cleaning, and why is it important in data analysis?
 What are the potential consequences of analyzing unclean or messy data?
 Explain the common steps involved in cleaning and organizing data.

 ans) Data cleaning (also called data cleansing or data scrubbing) is the process of detecting and correcting (or removing) inaccurate, incomplete, duplicated, or irrelevant data from a dataset. The goal is to ensure that the data is accurate, consistent, and usable for analysis.

a) Clean data is the foundation of reliable analysis. Without it, even the most advanced algorithms or statistical models can produce misleading results.

Accuracy: Ensures insights are based on correct information.

Efficiency: Reduces wasted time dealing with errors during analysis.

Consistency: Makes datasets comparable across sources or time periods.

Better decisions: Clean data leads to trustworthy conclusions and business strategies.



b) If data is not cleaned properly, several problems can arise:

Incorrect conclusions: Wrong insights can lead to poor business or policy decisions.

Bias in models: Machine learning algorithms may learn patterns from errors instead of reality.

Reduced credibility: Stakeholders may lose trust in the analysis.

Wasted resources: Time and money spent on flawed analysis.

Compliance risks: In regulated industries, inaccurate data can lead to legal or financial penalties.




c) Common Steps in Data Cleaning & Organizing:
 Data Inspection

Identify missing values, duplicates, outliers, or inconsistencies.

Example: Spotting “N/A” or “NULL” entries in a dataset.

Handling Missing Data

Options: remove rows, fill with averages/medians, or use predictive methods.

Removing Duplicates

Eliminate repeated records to avoid skewed results.

Correcting Inconsistencies

Standardize formats (e.g., dates as YYYY-MM-DD, and one unit of measure per column, such as meters rather than a mix of meters and feet).

Filtering Irrelevant Data

Remove unnecessary columns or records that don’t contribute to the analysis.

Outlier Detection & Treatment

Decide whether extreme values are errors or valid rare cases.

Data Transformation

Normalize or scale values, encode categorical variables, and ensure compatibility with analysis tools.
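The steps above can be sketched in pandas. This is a minimal illustration on a small hypothetical table (the column names Name, JoinDate and Salary and all values are made up), not a complete cleaning pipeline; outlier treatment and encoding are left out.

```python
import pandas as pd

# Hypothetical raw data; column names and values are illustrative only.
raw = pd.DataFrame({
    "Name": ["  Sonu ", "Pranav", "Pranav", "Rahul", "Asha"],
    "JoinDate": ["2024-01-15", "2024-02-01", "2024-02-01", "N/A", "2024-03-10"],
    "Salary": [4000, 5000, 5000, None, 2500],
})

# Inspection: turn placeholder strings into real missing values, then count problems.
df = raw.replace({"N/A": float("nan"), "NULL": float("nan")})
print(df.isna().sum())
print("Duplicate rows:", df.duplicated().sum())

# Correcting inconsistencies: trim names and parse dates in one explicit format.
df["Name"] = df["Name"].str.strip()
df["JoinDate"] = pd.to_datetime(df["JoinDate"], format="%Y-%m-%d")

# Removing duplicates: exact repeated rows are dropped.
df = df.drop_duplicates().reset_index(drop=True)

# Handling missing data: fill missing salaries with the median of the known salaries.
df["Salary"] = df["Salary"].fillna(df["Salary"].median())
print(df)
```

Trimming happens before de-duplication on purpose: values that differ only by stray spaces would otherwise survive as separate rows.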

Q2) How would you sort the following dataset first by "Department" (A-Z) and then by "Salary" (Largest to Smallest)? Write a step-by-step approach.

 Employee    Department    Salary
 Sonu        IT            4000
 Pranav      HR            5000
 Rahul       IT            2500

ans) Step-by-step approach (generic process)
Organize columns: Ensure the data is in three aligned columns: Employee, Department, Salary.

Set sort keys:

Primary key: Department (A–Z).

Secondary key: Salary (Largest to Smallest / Descending).

Apply the sort: Sort the entire range using the two keys in order: first by Department, then by Salary descending.

Verify alignment: Confirm each row’s Employee, Department, and Salary stayed together after sorting.
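The same two-key logic can be sketched in plain Python (no spreadsheet or pandas needed): each row is kept as one record so the columns stay aligned, and the sort key is a tuple of (primary, secondary). Negating Salary is one common way to get descending order for a numeric secondary key while the primary key stays ascending.

```python
# Rows from the Q2 table, kept together as records so columns stay aligned.
rows = [
    {"Employee": "Sonu", "Department": "IT", "Salary": 4000},
    {"Employee": "Pranav", "Department": "HR", "Salary": 5000},
    {"Employee": "Rahul", "Department": "IT", "Salary": 2500},
]

# Primary key: Department A-Z; secondary key: Salary largest first (negated).
sorted_rows = sorted(rows, key=lambda r: (r["Department"], -r["Salary"]))

for r in sorted_rows:
    print(r["Employee"], r["Department"], r["Salary"])
```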


In Excel or Google Sheets
Select range: Highlight all three columns and rows of data.

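Open sort dialog: in Excel, Data → Sort (tick “My data has headers” if the range includes headers); in Google Sheets, Data → Sort range → Advanced range sorting options (tick “Data has header row”).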

Add levels:

Sort by Department → A to Z.

Then by Salary → Largest to Smallest (Excel) or Z → A (Google Sheets).

Apply: Click OK/Sort.



import pandas as pd

df = pd.DataFrame({
    'Employee': ['Sonu', 'Pranav', 'Rahul'],
    'Department': ['IT', 'HR', 'IT'],
    'Salary': [4000, 5000, 2500]
})

df_sorted = df.sort_values(by=['Department', 'Salary'], ascending=[True, False])
print(df_sorted)

FINAL RESULT:


 Employee               Department             Salary

Pranav                    HR                     5000

Sonu                      IT                     4000

Rahul                     IT                     2500



Q3) Explain the use of text functions such as TRIM, LEFT, RIGHT, MID, and CONCAT in data cleaning.


ans) TRIM:

Purpose: Removes extra spaces from text, leaving only single spaces between words.

Use in Data Cleaning:

Eliminates leading, trailing, and multiple spaces that can cause mismatches.

Example: " Sonu Kumar " → TRIM(" Sonu Kumar ") → "Sonu Kumar"
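For comparison, here is a small Python approximation of TRIM. It is a sketch, not Excel's code: like TRIM it only touches the ordinary space character, removing it from both ends and collapsing internal runs to one space, while tabs and non-breaking spaces are left alone. The helper name `excel_trim` is hypothetical.

```python
import re

def excel_trim(text: str) -> str:
    # Strip ordinary spaces from both ends, then collapse runs of 2+ spaces to one.
    return re.sub(r" {2,}", " ", text.strip(" "))

print(repr(excel_trim("  Sonu   Kumar  ")))  # 'Sonu Kumar'
```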



LEFT:

Purpose: Extracts a specified number of characters from the left side of a text string.

Use in Data Cleaning:

Useful for splitting codes, IDs, or prefixes.

Example: "IT4000" → LEFT("IT4000",2) → "IT"

RIGHT:

Purpose: Extracts a specified number of characters from the right side of a text string.

Use in Data Cleaning:

Helps isolate suffixes, numeric codes, or last few digits.

Example: "IT4000" → RIGHT("IT4000",4) → "4000"


MID:

Purpose: Extracts characters from the middle of a text string, starting at a specific position.

Use in Data Cleaning:

Useful when data is embedded in the middle of a string.

Example: "EMP12345HR" → MID("EMP12345HR",4,5) → "12345"
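LEFT, RIGHT and MID map closely onto Python string slicing. The sketch below uses hypothetical helper names `left`, `right` and `mid`; the main difference to keep in mind is that Excel counts positions from 1 while Python slices count from 0.

```python
def left(text: str, n: int) -> str:
    # LEFT(text, n): first n characters.
    return text[:n]

def right(text: str, n: int) -> str:
    # RIGHT(text, n): last n characters; max(...) stops a too-large n from
    # becoming a negative index, so the whole string is returned instead.
    return text[max(len(text) - n, 0):]

def mid(text: str, start: int, n: int) -> str:
    # MID(text, start, n): convert Excel's 1-based start to a 0-based slice.
    # Assumes start >= 1 (Excel returns #VALUE! otherwise).
    return text[start - 1:start - 1 + n]

print(left("IT4000", 2), right("IT4000", 4), mid("EMP12345HR", 4, 5))
```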


CONCAT (or CONCATENATE):

Purpose: Joins two or more text strings into one.

Use in Data Cleaning:

Combines fields like first name + last name, or merges codes.

Example: CONCAT("Sonu"," Kumar") → "Sonu Kumar"


Why These Functions Matter in Data Cleaning
Standardization: Ensures consistent formatting of text fields.

Parsing & Extraction: Breaks down complex strings into usable parts.

Error Reduction: Prevents mismatches caused by hidden spaces or inconsistent text.

Integration: Helps merge datasets by aligning text values properly.
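To illustrate the standardization and integration points together, here is a small pandas sketch that applies a TRIM-like clean-up and a CONCAT-like join before merging two tables. The tables, the column names (First, Last, FullName, Dept, Salary) and the surname "Singh" are hypothetical.

```python
import pandas as pd

hr = pd.DataFrame({
    "First": [" Sonu", "Rahul "],
    "Last": ["Kumar", " Singh"],
    "Dept": ["IT", "IT"],
})
payroll = pd.DataFrame({
    "FullName": ["Sonu  Kumar", "Rahul Singh"],
    "Salary": [4000, 2500],
})

def clean(s: pd.Series) -> pd.Series:
    # TRIM-like: strip the ends, then collapse runs of spaces to one space.
    return s.str.strip().str.replace(r" {2,}", " ", regex=True)

# CONCAT-like: build one key from the trimmed first and last names.
hr["FullName"] = clean(hr["First"]) + " " + clean(hr["Last"])
payroll["FullName"] = clean(payroll["FullName"])

# Without the clean-up, "Sonu  Kumar" (two spaces) would not match "Sonu Kumar".
merged = hr.merge(payroll, on="FullName", how="inner")
print(merged[["FullName", "Dept", "Salary"]])
```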



Q4)  What is the role of date functions like TODAY in managing datasets?

ans)  Role of Date Functions (like TODAY)

1. Dynamic Current Date
TODAY automatically returns the current system date.

This ensures your dataset always reflects “today’s” date without manual entry.

Example: In Excel, =TODAY() entered on 14 Nov 2025 returns that date, displayed as 14-11-2025 or 11/14/2025 depending on the cell's date format.

2. Tracking Deadlines & Expiry
Compare due dates or expiry dates with TODAY to identify overdue items.

Example: =IF(DueDate<TODAY(),"Overdue","On Time") → flags overdue tasks.
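The same overdue check can be applied to a whole column in pandas. This is a sketch on a hypothetical task list; "today" is fixed to 14 Nov 2025 so the result is reproducible, whereas a live version would use `pd.Timestamp.today().normalize()`.

```python
import pandas as pd

# Hypothetical task list with due dates.
tasks = pd.DataFrame({
    "Task": ["Report", "Audit", "Invoice"],
    "DueDate": pd.to_datetime(["2025-11-10", "2025-11-14", "2025-12-01"]),
})
today = pd.Timestamp("2025-11-14")  # fixed for reproducibility

# Row-by-row equivalent of =IF(DueDate<TODAY(),"Overdue","On Time").
tasks["Status"] = (tasks["DueDate"] < today).map({True: "Overdue", False: "On Time"})
print(tasks)
```

A task due exactly today is "On Time", because the comparison is strictly less-than, as in the Excel formula.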

3. Calculating Durations
Measure time differences between dates and today.
Example: =DATEDIF(HireDate, TODAY(), "y") → calculates years of service for employees.
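A Python sketch of the "complete years" idea behind DATEDIF(..., "y"): take the difference in years, then subtract one if the anniversary has not been reached yet in the end year. The function name `complete_years` is hypothetical, and unusual edge cases (such as a 29 February start date) are not guaranteed to match Excel exactly.

```python
from datetime import date

def complete_years(start: date, end: date) -> int:
    years = end.year - start.year
    # Anniversary not reached yet this year: the last year is incomplete.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years

print(complete_years(date(2020, 11, 15), date(2025, 11, 14)))  # 4
```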

4. Automating Reports
Dashboards and reports can show “as of today” values automatically.

 Example: Sales reports update daily without needing manual date changes.

5. Forecasting & Scheduling
Combine TODAY with arithmetic to project future dates.

Example: =TODAY()+30 → gives the date 30 days from now (useful for reminders).

6. Consistency Across Records
Ensures all calculations reference the same current date, avoiding errors from manual input.



Q5)  Apply Data Validation to restrict Quantity values to only whole numbers between 1 and 10.

a)  Configure an input message that appears when a user selects a cell in the "Quantity" column explaining:
 "Please enter a whole number between 1 and 10."

b)  Set up an error alert message that triggers if the user enters a number less than 1 or greater than 10, showing:
 "Invalid input! The quantity must be a whole number between 1 and 10."


 ans) 1. Select the "Quantity" Column

Highlight the cells in the Quantity column where you want to restrict input.

2. Open Data Validation

In Excel: Go to the Data tab → click Data Validation (in the Data Tools group).

In Google Sheets: Go to Data → Data validation.

3. Set Validation Criteria
Allow: Whole number

Data: Between

Minimum: 1

Maximum: 10

This ensures only whole numbers from 1 to 10 are accepted.

4. Configure Input Message

Go to the Input Message tab.

Check Show input message when cell is selected.

Enter:

Title: Quantity Entry

Message: Please enter a whole number between 1 and 10.

This message will appear when a user clicks on a cell in the Quantity column.

5. Configure Error Alert

Go to the Error Alert tab.

Check Show error alert after invalid data is entered.

Choose Style: Stop (to prevent invalid entry).

Enter:

Title: Invalid input!

Message: The quantity must be a whole number between 1 and 10.

This alert will pop up if someone enters a number less than 1 or greater than 10, or a value that is not a whole number (such as 2.5 or text).


Result:

Users will see a helpful input message when selecting the cell.

If they enter something outside the allowed range, they’ll get a clear error alert and won’t be able to proceed until they correct it.
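Outside a spreadsheet, the same rule can be written as a small check function, for example when validating rows before import. This is a hypothetical sketch of the rule, not how Excel implements validation: it accepts whole numbers from 1 to 10 (treating 5.0 as the whole number 5) and returns the error text otherwise.

```python
ERROR_MESSAGE = "Invalid input! The quantity must be a whole number between 1 and 10."

def is_valid_quantity(value) -> bool:
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(value, bool):
        return False
    # A float with no fractional part (e.g. 5.0) counts as the whole number 5.
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    return isinstance(value, int) and 1 <= value <= 10

def check(value):
    # None means the entry is accepted; otherwise return the error text.
    return None if is_valid_quantity(value) else ERROR_MESSAGE
```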





Q6) Write a step-by-step approach for this question
 Dataset columns: Customer Name, Product Name, Category, Quantity, Unit Price ($)
 Customer Name: Jane Smith, Isabella Moore, Daniel Davis, Alex Moore, Michael Johnson, Daniel Johnson, Isabella Davis, Jane Davis, Alex Wilson
 Product Name: Shoes, Laptop, Sofa, Shoes, Table Lamp, Backpack, Headphones, Headphones, T-shirt
 Category: Electronics, Electronics, Clothing, Electronics, Home Decor, Electronics, Electronics, Electronics, Home Decor
 Numeric values: 81, 121, 239, 500, 423, 160, 348, 152, 369

 Understand and apply fundamental text functions like LEFT, RIGHT, MID, and LEN.

 Extract the first 5 characters from the string "ExcelTipsAreGreat" using the LEFT function.

 Extract the last 4 characters from "DataAnalysis.xlsx" using the RIGHT function.

 Extract the substring "Tips" from "ExcelTipsAreGreat" using the MID function.

 Count the total number of characters in the string "Hello World!" using the LEN function.

 Create a formula to extract the middle 6 characters from "12345-67890-ABCDE"


 ans)
  Step-by-step approach to apply LEFT, RIGHT, MID, and LEN:

Organize data columns for clarity
Create headers: Customer Name, Product Name, Category, Quantity, Unit Price ($).

Align rows: Place each value under its correct header so each row represents a single order.

Confirm data types: Ensure Quantity and Unit Price are numbers; others are text.

Understanding the text functions:

LEFT(text, num_chars): Returns the first n characters from the start.

RIGHT(text, num_chars): Returns the last n characters from the end.

MID(text, start_num, num_chars): Returns n characters starting at a position.

LEN(text): Returns the total count of characters including spaces and punctuation.


* Apply each function with exact formulas and results

1. Extract the first 5 characters from "ExcelTipsAreGreat" using LEFT
Formula:

=LEFT("ExcelTipsAreGreat", 5)

Result: Excel

2. Extract the last 4 characters from "DataAnalysis.xlsx" using RIGHT
Formula:

=RIGHT("DataAnalysis.xlsx", 4)

Result: xlsx


3. Extract the substring "Tips" from "ExcelTipsAreGreat" using MID
Find position: T in “ExcelTipsAreGreat” is the 6th character.

Formula:

=MID("ExcelTipsAreGreat", 6, 4)

Result: Tips

4. Count the total number of characters in "Hello World!" using LEN
Formula:

=LEN("Hello World!")

Result: 12

5. Extract the middle 6 characters from "12345-67890-ABCDE"
Identify the target: the string has 17 characters, so no 6-character block sits exactly in the centre; the block just after the first dash is "67890-".

Start position: "12345-" is 6 characters, so start at position 7.

Formula:

=MID("12345-67890-ABCDE", 7, 6)

Result: 67890-
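The five answers can be cross-checked with Python slicing, which is 0-based and end-exclusive, so Excel's MID(text, start, n) corresponds to text[start - 1 : start - 1 + n]. For the "middle 6 characters", rounding the start position up when the leftover characters cannot be split evenly is an assumption chosen to match the start position 7 used above.

```python
s1 = "ExcelTipsAreGreat"
s2 = "DataAnalysis.xlsx"
s3 = "12345-67890-ABCDE"

left5 = s1[:5]                 # =LEFT(s1, 5)
right4 = s2[-4:]               # =RIGHT(s2, 4)
tips = s1[6 - 1:6 - 1 + 4]     # =MID(s1, 6, 4)
length = len("Hello World!")   # =LEN("Hello World!")

# Middle 6 of a 17-character string: 11 leftover characters, start rounded up.
n = 6
start0 = (len(s3) - n + 1) // 2    # 0-based start 6, i.e. Excel position 7
middle = s3[start0:start0 + n]     # =MID(s3, 7, 6)

print(left5, right4, tips, length, middle)
```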



Q7) Understand how to combine text using CONCAT, TEXTJOIN, and the & operator.

 Use CONCAT to combine "Hello" and "World" with a space in between.

 Combine "Apple", "Banana", and "Cherry" into a single string separated by commas using TEXTJOIN.

 Use the & operator to create the string "2025: Excel Functions" by combining "2025", ": ", and "Excel Functions".

 Create a comma-separated list from the range A1:A5 using TEXTJOIN
 Combine first names in column A with last names in column B to create full names in column C.



 ans)
 1. Use CONCAT to combine "Hello" and "World" with a space
Formula:



=CONCAT("Hello"," ","World")
Result: Hello World

2. Combine "Apple", "Banana", and "Cherry" into a single string separated by commas using TEXTJOIN

Formula:


=TEXTJOIN(",", TRUE, "Apple", "Banana", "Cherry")

Result:

 Apple,Banana,Cherry

  (TRUE ignores empty cells if any are included.)
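A rough Python model of TEXTJOIN's two main arguments, the delimiter and the ignore-empty flag, is below. The function name `textjoin` is hypothetical, and treating `None` as an empty cell is an assumption of this sketch.

```python
def textjoin(delimiter: str, ignore_empty: bool, *values) -> str:
    # Convert each value to text; None stands in for an empty cell.
    parts = ["" if v is None else str(v) for v in values]
    if ignore_empty:
        # Like TEXTJOIN(..., TRUE, ...): drop empty entries before joining.
        parts = [p for p in parts if p != ""]
    return delimiter.join(parts)

print(textjoin(",", True, "Apple", "Banana", "Cherry"))  # Apple,Banana,Cherry
```

With the flag set to False, empty cells are kept, which shows up as two delimiters in a row.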


 3. Use the & operator to create "2025: Excel Functions"
Formula:



="2025" & ": " & "Excel Functions"
Result:

 2025: Excel Functions

 4. Create a comma-separated list from the range A1:A5 using TEXTJOIN
Formula:



=TEXTJOIN(",", TRUE, A1:A5)
Result:

If A1:A5 contains Red, Blue, Green, Yellow, Orange

 Red,Blue,Green,Yellow,Orange


 5. Combine first names in column A with last names in column B to create full names in column C

Formula in C2 (then copy down):


=A2 & " " & B2
Result:

 If A2 = Jane and B2 = Smith

  Jane Smith
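The same fill-down can be done on a whole table in pandas. In this hypothetical sketch, the columns "First" and "Last" stand in for columns A and B. Unlike the plain =A2 & " " & B2 formula, it also strips the trailing space that a blank last name would leave behind.

```python
import pandas as pd

# Hypothetical sheet: "First" plays the role of column A, "Last" of column B.
people = pd.DataFrame({
    "First": ["Jane", "Alex", "Isabella"],
    "Last": ["Smith", "Wilson", None],
})

# fillna("") treats a blank last name as empty text; str.strip() removes
# the trailing space that joining with " " would otherwise leave.
people["Full Name"] = (people["First"].fillna("") + " " + people["Last"].fillna("")).str.strip()
print(people)
```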

Q8) Understanding TODAY() and NOW()

 a. What is the difference between TODAY() and NOW() in Excel? Provide an example of when you would use each function.

 b. If cell A1 contains the date 2025-06-10, write a formula using TODAY() to determine how many days are left until that date.

 c. Write an Excel formula using NOW() to display the current date and time in the format MM/DD/YYYY HH:MM AM/PM.

 d. If a cell contains =TODAY(), what will happen when the worksheet is reopened the next day? Explain.

 e. You want to store a static date (today’s date) in a cell without it changing every day. What keyboard shortcut should you use?


 ans)

 a. Difference between TODAY() and NOW()

TODAY() → Returns the current date only (no time).

Example use: Tracking deadlines or calculating days until an event.

Formula: =TODAY() → 11/14/2025 (if today is Nov 14, 2025).

NOW() → Returns the current date and time.

Example use: Timestamping when a record was updated or logging activity.

Formula: =NOW() → 11/14/2025 7:26 PM.


b. Days left until 2025-06-10 using TODAY()

If cell A1 contains 2025-06-10:



=A1 - TODAY()
This subtracts today’s date from the target date.

Result:

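 Number of days remaining until June 10, 2025. A negative result means the date has already passed. If Excel displays the result as a date, change the cell format to Number or General.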
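The same subtraction in Python: subtracting two dates gives a timedelta whose `.days` is the day count. "Today" is fixed here to 14 Nov 2025 (the example date used in part a) so the output is reproducible; a live version would call `date.today()`.

```python
from datetime import date

target = date(2025, 6, 10)   # value of A1
today = date(2025, 11, 14)   # fixed for reproducibility

days_left = (target - today).days   # equivalent of =A1 - TODAY()
print(days_left)  # -157, i.e. the date passed 157 days earlier
```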

 c. Display current date and time in format MM/DD/YYYY HH:MM AM/PM

Formula:


=TEXT(NOW(), "MM/DD/YYYY hh:mm AM/PM")

Example output: 11/14/2025 07:26 PM.
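For comparison, Python's `strftime` produces the same layout, but its format codes differ from Excel's TEXT codes: `%m/%d/%Y` for MM/DD/YYYY, `%I` for the 12-hour hour, `%M` for minutes and `%p` for AM/PM (its exact text depends on the locale; it is "AM"/"PM" in the default C locale). The timestamp is fixed to the example above; a live version would use `datetime.now()`.

```python
from datetime import datetime

now = datetime(2025, 11, 14, 19, 26)    # fixed example timestamp
stamp = now.strftime("%m/%d/%Y %I:%M %p")
print(stamp)  # 11/14/2025 07:26 PM
```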

d. Behavior of =TODAY() when reopening worksheet next day

Explanation: The function recalculates automatically.

If you open the file tomorrow, =TODAY() will update to tomorrow’s date.

It always reflects the current system date.

e. Store a static date (today’s date) without it changing
Keyboard shortcut in Excel:



Ctrl + ;
This inserts the current date as a fixed value (not a formula).

It will not update when the worksheet is reopened.