## 🧭 Overview: Python `print()` Function

The `print()` function is a core Python tool used to **display data or messages** to the screen.  
In data analysis, it helps confirm variable values, inspect data frames, and verify that transformations have occurred correctly.  

This lecture covers several use cases:
- Printing simple variables and strings  
- Printing multiple variables in one call  
- Printing lists and tuples  
- Printing arithmetic expressions  
- Printing **pandas** data subsets to verify filtering logic  


In [1]:
# import relevant libraries
import pandas as pd

In [2]:
# initialize greeting
greeting = 'Hi, nice to meet you!'

In [3]:
print(greeting)

Hi, nice to meet you!


In [4]:
# initialize greeting1
greeting1 = "Hey,"

# initialize name1
name1 = 'Christine'

# initialize greeting2
greeting2 = 'Hello,'

# initialize name2
name2 = 'Lav'

# initialize punctuation
punctuation = '!'

In [5]:
print(greeting1, name1, punctuation) # notice the extra space before the punctuation

Hey, Christine !


In [6]:
print(f"{greeting1} {name1}{punctuation}")  # an f-string is clean and readable: you control every space

Hey, Christine!
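The same sentence can also be built without an f-string. Here is a minimal sketch that compares the f-string with `str.format()` and plain `+` concatenation, reusing the values from the cells above. The names `with_fstring`, `with_format`, and `with_concat` are only illustrative.

```python
greeting1 = "Hey,"
name1 = "Christine"
punctuation = "!"

# f-string: expressions inside {} are evaluated and inserted in place
with_fstring = f"{greeting1} {name1}{punctuation}"

# str.format(): the {} placeholders are filled in order
with_format = "{} {}{}".format(greeting1, name1, punctuation)

# concatenation: every space has to be added by hand
with_concat = greeting1 + " " + name1 + punctuation

print(with_fstring)
print(with_format)
print(with_concat)
```

All three produce the same text. The f-string is usually the easiest of the three to read.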


In [7]:
print(greeting1, name1, punctuation, sep="")  # notice there are no spaces between the words with sep=""


Hey,Christine!


In [8]:
greeting1 = "Hey, "          #  add a trailing space INSIDE the quotes
print(greeting1, name1, punctuation, sep="")  # By default, print() uses sep=" ". You can change it with sep=""


Hey, Christine!
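`sep` is one of several keyword arguments of the built-in `print()`. `end` (default `"\n"`) controls what is written after the last argument, and `file` sends the output somewhere other than the screen. The sketch below captures everything in an `io.StringIO` buffer so you can inspect the exact characters. The buffer is just a convenience for this demonstration.

```python
import io

buffer = io.StringIO()

# default sep=" " puts one space between arguments
print("Hey,", "Christine", "!", file=buffer)
# sep="" joins the arguments with nothing in between
print("Hey, ", "Christine", "!", sep="", file=buffer)
# sep can be any string, for example a comma plus a space
print(-5, -3, -1, sep=", ", file=buffer)
# end=" " replaces the newline, so the next print continues the same line
print("slope:", end=" ", file=buffer)
print(1.17, file=buffer)

captured = buffer.getvalue()
print(captured, end="")
```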


In [9]:
# initialize odd_nums as a list containing some odd integers
odd_nums = [-5, -3, -1, 1, 3, 5]

In [10]:
print(odd_nums)

[-5, -3, -1, 1, 3, 5]
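The overview also mentions tuples, and those print with parentheses. The `*` operator unpacks a list so that each element becomes a separate argument to `print()`, which means `sep` goes between the elements. Here is a short sketch that reuses `odd_nums` and the point `(5, 6)` from the next cell. The name `point` is illustrative.

```python
odd_nums = [-5, -3, -1, 1, 3, 5]
point = (5, 6)

print(odd_nums)              # [-5, -3, -1, 1, 3, 5]  (the list's own representation)
print(point)                 # (5, 6)  (tuples print with parentheses)
print(*odd_nums)             # -5 -3 -1 1 3 5  (each element is a separate argument)
print(*odd_nums, sep=", ")   # -5, -3, -1, 1, 3, 5
```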


In [11]:
# initialize a point on a line
(x1, y1) = (5, 6)

# initialize another point on the same line
(x2, y2) = (-7, -8)

In [12]:
# compute slope of line using the two points & display with label
print('slope:', (y2 - y1) / (x2 - x1))

slope: 1.1666666666666667
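You can confirm the printed value by working the slope formula by hand with the two points defined above:

$$
m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{-8 - 6}{-7 - 5} = \frac{-14}{-12} = \frac{7}{6} \approx 1.1667
$$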


In [30]:
slope = (y2 - y1) / (x2 - x1)
print(f"{slope:.2f}")  # the format spec :.2f displays slope as a fixed-point number with 2 decimal places
print(f"slope: {slope:.2f}")



1.17
slope: 1.17
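If you need the slope for more than one pair of points, one option is to wrap the computation in a small function. `slope_between` below is a hypothetical helper, not part of the lecture. The format spec after the colon only changes how the number is displayed; the stored value keeps full precision.

```python
def slope_between(p1, p2):
    """Return the slope of the line through points p1 and p2 (hypothetical helper)."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)


slope = slope_between((5, 6), (-7, -8))

print(slope)                  # full precision
print(f"{slope:.2f}")         # 2 decimal places: 1.17
print(f"{slope:.4f}")         # 4 decimal places: 1.1667
print(f"slope: {slope:.2f}")  # label plus formatted value
```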


## 🧭 Overview: Using `print()` with a Pandas DataFrame

In this section of the *Python `print()` Function* lecture, the instructor demonstrates how `print()` is used with **pandas** DataFrames. This is common when previewing, filtering, and verifying data subsets during analysis.

**We will:**
- Load the `grades.csv` file into a pandas DataFrame.
- Check the `grade` column for missing values and fill them before filtering.
- Filter students who scored **at least 70%** and **below 70%**.
- Display both subsets using `print()` with clear labels.
- Explain each step with detailed inline notes in the code cells that follow.


In [14]:
# ============================================
# 🧩 Handling Missing Values Before Filtering
# File: grades.csv
# ============================================

import pandas as pd

# Step 1: Load the dataset
grades = pd.read_csv("grades.csv")

In [15]:
# Step 2: Preview the first few rows of the dataset
# This helps confirm that the file loaded correctly and shows what columns exist.
print("Preview of the dataset:")
print(grades.head())   # Displays the top 5 rows in the DataFrame

Preview of the dataset:
   exam  student_id  grade
0     1           1   86.0
1     1           2   65.0
2     1           3   70.0
3     1           4   98.0
4     1           5   89.0
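`grades.csv` may not be at hand, so this sketch rebuilds the first ten rows that the `iloc[:10]` cell prints further down as an in-memory DataFrame. The values are copied from that output, with `NaN` for student 6's missing grade, so you can try the calls directly. `head()` takes an optional row count (default 5), and `tail()` is its counterpart for the last rows.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory stand-in for grades.csv: the first ten rows shown
# in the notebook output, with NaN for student 6's missing grade
grades = pd.DataFrame({
    "exam": [1] * 10,
    "student_id": list(range(1, 11)),
    "grade": [86.0, 65.0, 70.0, 98.0, 89.0, np.nan, 75.0, 56.0, 90.0, 81.0],
})

print(grades.head())    # default n=5: first 5 rows
print(grades.head(3))   # first 3 rows
print(grades.tail(2))   # last 2 rows
```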


### 🧩 Concept: `.iloc[:10]` in pandas

- `.iloc` stands for **integer-location based indexer**.  
- It selects rows or columns **by position number** instead of label names.  
- `[:10]` means "take rows starting at position 0 and stop before position 10", i.e., the first 10 rows.  
- Example: `grades.iloc[:10]` displays the first 10 rows of the DataFrame.
- `grades.iloc[0]` returns the first row (as a Series).
- `grades.iloc[5:15]` returns the rows at positions 5 through 14.
- `grades.iloc[:, :2]` returns all rows, but only the first two columns.
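Here are the three selections from the list, tried on a small stand-in for `grades`. The rows are copied from the notebook output, and the in-memory DataFrame is an assumption since the CSV may not be available. This frame has only ten rows, so `iloc[5:15]` is clipped at the end and returns positions 5 through 9 without raising an error.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for grades.csv (first ten rows from the output above)
grades = pd.DataFrame({
    "exam": [1] * 10,
    "student_id": list(range(1, 11)),
    "grade": [86.0, 65.0, 70.0, 98.0, 89.0, np.nan, 75.0, 56.0, 90.0, 81.0],
})

first_row = grades.iloc[0]            # a Series holding row 0's values
rows_5_to_14 = grades.iloc[5:15]      # stop is exclusive; past-the-end slices are clipped
first_two_cols = grades.iloc[:, :2]   # all rows, columns 'exam' and 'student_id'

print(first_row)
print(rows_5_to_14)
print(first_two_cols.head(3))
```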



In [16]:
# Step 2 (continued): Display the first 10 rows to review the structure and spot nulls
print("First 10 rows of grades data:")
print(grades.iloc[:10])  # integer-location based indexing: "start at row 0 and stop before row 10"
# This returns the first 10 rows (positions 0 through 9).

First 10 rows of grades data:
   exam  student_id  grade
0     1           1   86.0
1     1           2   65.0
2     1           3   70.0
3     1           4   98.0
4     1           5   89.0
5     1           6    NaN
6     1           7   75.0
7     1           8   56.0
8     1           9   90.0
9     1          10   81.0


In [17]:
# Step 3: Check for missing values in the 'grade' column
# .isnull() returns True for missing entries, False otherwise.
# .values.any() checks whether any value in that column is True (i.e., missing).

if grades['grade'].isnull().values.any():
    print("There are missing values in the 'grade' column.")
else:
    print("There are no missing values in the 'grade' column.")

There are missing values in the 'grade' column.


In [18]:
# Step 3 (shorter version): the same check with a terser message

if grades['grade'].isnull().values.any():
    print('There are missing values')
else:
    print('There are no missing values')

There are missing values
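Both checks only answer yes or no. This sketch goes a step further: it counts the missing grades and shows which rows they are in. It uses the same hypothetical in-memory stand-in for `grades` as the earlier examples. `Series.any()` and `Series.sum()` work on the Boolean mask directly, so `.values` is optional here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for grades.csv (first ten rows from the output above)
grades = pd.DataFrame({
    "exam": [1] * 10,
    "student_id": list(range(1, 11)),
    "grade": [86.0, 65.0, 70.0, 98.0, 89.0, np.nan, 75.0, 56.0, 90.0, 81.0],
})

missing_mask = grades["grade"].isnull()        # True where the grade is NaN

print("Any missing:", missing_mask.any())      # same answer as .values.any()
print("Count missing:", missing_mask.sum())    # each True counts as 1
print(grades[missing_mask])                    # the rows with a missing grade
```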


In [19]:
# Step 4: Fill missing grades with zeros (or another value if specified)
# .fillna(0) replaces all NaN entries with 0.
grades['grade'] = grades['grade'].fillna(0)
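`fillna()` returns a new Series rather than changing the original, which is why the cell assigns the result back to `grades['grade']`. Filling with 0 means a missing grade is later counted as "below 70". The sketch below contrasts that with one alternative, filling with the column mean. The mean fill is an illustration of "another value", not something the lecture uses, and the data is the same hypothetical stand-in as before.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for grades.csv (first ten rows from the output above)
grades = pd.DataFrame({
    "exam": [1] * 10,
    "student_id": list(range(1, 11)),
    "grade": [86.0, 65.0, 70.0, 98.0, 89.0, np.nan, 75.0, 56.0, 90.0, 81.0],
})

filled_zero = grades["grade"].fillna(0)                        # the lecture's choice
filled_mean = grades["grade"].fillna(grades["grade"].mean())   # alternative: mean of non-missing grades

print("Missing after fillna(0):", filled_zero.isnull().sum())
print("Student 6 with fillna(0):", filled_zero.iloc[5])
print(f"Student 6 with mean fill: {filled_mean.iloc[5]:.2f}")
print("Missing in the original column:", grades["grade"].isnull().sum())  # unchanged until reassigned
```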


In [20]:
# Step 5: Verify the update
print("After filling missing values:")
print(grades.iloc[:10])  # integer-location based indexing: "start at row 0 and stop before row 10"
# This returns the first 10 rows (positions 0 through 9).

After filling missing values:
   exam  student_id  grade
0     1           1   86.0
1     1           2   65.0
2     1           3   70.0
3     1           4   98.0
4     1           5   89.0
5     1           6    0.0
6     1           7   75.0
7     1           8   56.0
8     1           9   90.0
9     1          10   81.0


In [21]:
# Step 5 (alternative): without print(), Jupyter displays the value of the last
# expression in a cell as a formatted table
grades.iloc[:10]

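|   | exam | student_id | grade |
|--:|-----:|-----------:|------:|
| 0 | 1 | 1 | 86.0 |
| 1 | 1 | 2 | 65.0 |
| 2 | 1 | 3 | 70.0 |
| 3 | 1 | 4 | 98.0 |
| 4 | 1 | 5 | 89.0 |
| 5 | 1 | 6 | 0.0 |
| 6 | 1 | 7 | 75.0 |
| 7 | 1 | 8 | 56.0 |
| 8 | 1 | 9 | 90.0 |
| 9 | 1 | 10 | 81.0 |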
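The two cells above show the same rows in two forms. `print()` writes a plain-text table, while the bare expression is rendered by Jupyter as a formatted table. If you want the plain-text form without the row index, `DataFrame.to_string(index=False)` returns it as a string you can pass to `print()`. This is a minimal sketch on three rows copied from the output.

```python
import pandas as pd

# Three rows copied from the output above (hypothetical stand-in for grades)
grades = pd.DataFrame({
    "exam": [1, 1, 1],
    "student_id": [1, 2, 3],
    "grade": [86.0, 65.0, 70.0],
})

table_text = grades.to_string(index=False)   # plain-text table without the index column
print(table_text)
```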


### 🧩 Concept: Using `.loc[]` with Conditions and `.values`

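- `.loc[]` is pandas' **label-based selector**; it also accepts a Boolean mask, which is how it filters rows by a condition.  
- Inside `.loc[]`, `grades['grade'] >= 70` creates a Boolean mask (True/False for each row).  
- Adding `['grade']` selects only that column from the filtered rows.  
- Adding `.values` converts the filtered Series into a **NumPy array**, which prints as a bracketed, space-separated sequence of numbers (no commas).  
- Example:  
  ```python
  grades.loc[grades['grade'] >= 70]['grade'].values
  ```
- Equivalent single-step form: `grades.loc[grades['grade'] >= 70, 'grade'].values` selects the rows and the column in one `.loc` call.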

In [22]:
grades_atleast_70 = grades.loc[grades['grade'] >= 70]['grade'].values
grades_below_70 = grades.loc[grades['grade'] < 70]['grade'].values

In [23]:
print('grades at least 70%:', grades_atleast_70)

print('grades below 70%:', grades_below_70)

grades at least 70%: [ 86.  70.  98.  89.  75.  90.  81.  79.  78.  75.  80.  87.  82.  95.
  96.  78.  80.  87.  89.  90. 100.  72.  73.  75.  80.  81.  82.  83.
  84.  85.  86.  87.  88.  90.  91.  92.  93.  94.  95.  96.  97.  98.]
grades below 70%: [65.  0. 56. 60.  0.  0.  0.  0.]
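NumPy arrays print without commas. If you prefer the familiar Python list format, `.tolist()` converts the array first. This sketch reuses the below-70 values printed above and also counts how many of them are zero. The array literal is copied from that output.

```python
import numpy as np

# Values copied from the printed grades_below_70 array above
grades_below_70 = np.array([65.0, 0.0, 56.0, 60.0, 0.0, 0.0, 0.0, 0.0])

print(grades_below_70)              # NumPy style: no commas
print(grades_below_70.tolist())     # Python list style, with commas
print("Zeros among grades below 70:", int((grades_below_70 == 0).sum()))
```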


In [24]:
# Step 6: Create two filtered DataFrames
# (This reassigns the names from In [22], which held NumPy arrays, to DataFrames.)
# One keeps rows with 'grade' >= 70; the other keeps rows with 'grade' < 70.
grades_atleast_70 = grades[grades["grade"] >= 70]
grades_below_70 = grades[grades["grade"] < 70]

# Step 7: Print both subsets with descriptive labels
print("\n🎓 Students with grades ≥ 70:")
print(grades_atleast_70)

print("\n📉 Students with grades below 70:")
print(grades_below_70)

# Step 8 (optional): Verify counts for context (each row is one exam result)
print("\n✅ Summary Check:")
print(f"Number of students scoring ≥70: {len(grades_atleast_70)}")
print(f"Number of students scoring <70: {len(grades_below_70)}")



🎓 Students with grades ≥ 70:
    exam  student_id  grade
0      1           1   86.0
2      1           3   70.0
3      1           4   98.0
4      1           5   89.0
6      1           7   75.0
8      1           9   90.0
9      1          10   81.0
10     2           1   79.0
12     2           3   78.0
13     2           4   75.0
15     2           6   80.0
16     2           7   87.0
17     2           8   82.0
18     2           9   95.0
19     2          10   96.0
20     3           1   78.0
21     3           2   80.0
22     3           3   87.0
24     3           5   89.0
25     3           6   90.0
26     3           7  100.0
27     3           8   72.0
28     3           9   73.0
29     3          10   75.0
31     4           2   80.0
32     4           3   81.0
33     4           4   82.0
34     4           5   83.0
35     4           6   84.0
36     4           7   85.0
37     4           8   86.0
38     4           9   87.0
39     4          10   88.0
40     5     
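Each row in `grades` is one exam result, so `len()` on a filtered DataFrame counts exam records rather than distinct students. This sketch counts directly from the Boolean mask and uses `nunique()` for distinct students. It runs on a hypothetical ten-row stand-in copied from the filled output, all from exam 1.

```python
import pandas as pd

# Hypothetical stand-in: first ten rows after fillna(0), all from exam 1
grades = pd.DataFrame({
    "exam": [1] * 10,
    "student_id": list(range(1, 11)),
    "grade": [86.0, 65.0, 70.0, 98.0, 89.0, 0.0, 75.0, 56.0, 90.0, 81.0],
})

at_least_70 = grades["grade"] >= 70       # Boolean mask
n_at_least = int(at_least_70.sum())       # True counts as 1
n_below = int((~at_least_70).sum())       # ~ inverts the mask; equals < 70 only because no NaN remain
n_students = grades.loc[at_least_70, "student_id"].nunique()  # distinct students, not rows

print(f"Rows scoring >=70: {n_at_least}")
print(f"Rows scoring <70: {n_below}")
print(f"Distinct students with a grade >=70: {n_students}")
print(f"Share of rows >=70: {at_least_70.mean():.0%}")
```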

## 🧠 Key Takeaways

- **`df`** is short for *DataFrame*, the core pandas data structure similar to an Excel table.  
- **`df[df["column"] >= value]`** filters rows by a condition ‚Äî this method is called **Boolean indexing**.  
- **`.isnull()`** checks for missing values; **`.fillna()`** replaces them.  
- It's best practice to address nulls before filtering or performing calculations.  
- Always preview your dataset (`head()` or `iloc`) before and after cleaning to verify results.  
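The takeaways fit together as one short pipeline: load, check for nulls, fill, filter with Boolean indexing, and print before and after. This sketch reads a few rows copied from the lecture output through `io.StringIO` as a stand-in for `grades.csv`. The blank grade for student 6 is read as `NaN`, and `csv_text` is an illustrative name.

```python
import io

import pandas as pd

# Hypothetical stand-in for grades.csv: a few rows from the output above,
# with student 6's grade left blank
csv_text = """exam,student_id,grade
1,1,86
1,2,65
1,3,70
1,6,
1,8,56
"""

grades = pd.read_csv(io.StringIO(csv_text))   # the blank field becomes NaN

print("Before cleaning:")
print(grades)

if grades["grade"].isnull().any():
    grades["grade"] = grades["grade"].fillna(0)   # same choice as the lecture

grades_atleast_70 = grades[grades["grade"] >= 70]   # Boolean indexing
grades_below_70 = grades[grades["grade"] < 70]

print("After cleaning:")
print(grades)
print(f"At least 70: {len(grades_atleast_70)} rows; below 70: {len(grades_below_70)} rows")
```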
