# CAP 379

# Python Fundamentals

- `Introduction to inbuilt datasets`  
- `Exploring datasets using Pandas`  
- `Basic operations on datasets` 


In [1]:
print("Hello, World! Welcome to Python for Data Analysis!")

Hello, World! Welcome to Python for Data Analysis!


In [2]:
# Basic arithmetic operations
print(10 + 5)  # Addition
print(10 - 5)  # Subtraction
print(10 * 5)  # Multiplication
print(10 / 5)  # Division


15
5
50
2.0


# Variables and Data Types

In Python, variables are used to store data values. Unlike some other languages, Python does not require you to declare the variable type explicitly – it is inferred automatically based on the assigned value.
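The inference described above can be seen by rebinding one name to values of different types. This is a minimal sketch; the names `value`, `first_type`, and `second_type` are illustrative and not part of the original notebook.

```python
# The type belongs to the value, not to the variable name.
value = 42
first_type = type(value).__name__    # type inferred from the integer literal

value = "forty-two"                  # the same name is rebound to a string
second_type = type(value).__name__   # the inferred type follows the new value

print(first_type, second_type)       # int str
```

Because no type is declared, nothing stops a name from being reused this way, so descriptive variable names help keep code readable.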

In [23]:
name = "Alice"  # String
age = 25        # Integer
height = 5.6    # Float
is_student = True  # Boolean

print(f"Name: {name}, Age: {age}, Height: {height}, Student: {is_student}")

Name: Alice, Age: 25, Height: 5.6, Student: True


In [30]:
# Printing values
print("Name:", name)
print("Age:", age)
print("Height:", height)
print("Is a Student:", is_student)


Name: Alice
Age: 25
Height: 5.6
Is a Student: True


In [33]:
# Checking Data Type of a Variable
print(type(name))    # Output: <class 'str'>
print(type(age))     # Output: <class 'int'>
print(type(height))  # Output: <class 'float'>
print(type(is_student)) # Output: <class 'bool'>

<class 'str'>
<class 'int'>
<class 'float'>
<class 'bool'>


In [34]:
type(is_student)

bool

In [37]:
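# Variable Naming Rules
# Names may contain letters, digits, and underscores, but cannot start with a digit.
# Names are case-sensitive and cannot be Python keywords (e.g. class, for).
my_name = "Alice"     # snake_case (the PEP 8 convention)
_age = 25             # A leading underscore is allowed
total_score = 90
firstName = "Bob"     # camelCase is valid, though snake_case is preferred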


In [36]:
# Assigning Multiple Variables in One Line
x, y, z = 10, 20, 30
print(x, y, z)  # Output: 10 20 30


10 20 30


In [39]:
# Example: Swapping Variables Without Using a Temporary Variable
a, b = 5, 10
a, b = b, a
print(a, b)


10 5


In [40]:
# Type Conversion (Casting)
# Converting integer to string
age = 25
age_str = str(age)  # Now "25" is a string
print(type(age_str))  # Output: <class 'str'>

# Converting float to integer
pi = 3.14159
pi_int = int(pi)  # Converts to 3 (truncates decimal part)
print(pi_int)

# Converting string to integer
score = "90"
score_int = int(score)  # Converts string "90" to integer 90
print(score_int + 10)  # Output: 100


<class 'str'>
3
100
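Casting only succeeds when the value has a valid form for the target type. The sketch below shows one way to handle a failed conversion; the helper `to_int` and its `default` parameter are hypothetical names introduced for illustration.

```python
def to_int(text, default=None):
    """Return int(text), or `default` if text is not a valid integer literal."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as "ninety" or "3.7"
        return default

print(to_int("90"))        # 90
print(to_int("ninety"))    # None
print(to_int("3.7"))       # None, because int() does not parse decimal strings
print(int(float("3.7")))   # 3, converting to float first and then truncating
```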


In [41]:
# Special Data Types – NoneType
x = None
print(type(x))  # Output: <class 'NoneType'>


<class 'NoneType'>


In [42]:
# Common String Methods
text = "Python is awesome!"
print(len(text))  # Length of string
print(text.upper())  # Convert to uppercase
print(text.lower())  # Convert to lowercase
print(text.replace("awesome", "powerful"))  # Replace words


18
PYTHON IS AWESOME!
python is awesome!
Python is powerful!


In [43]:
# String Indexing and Slicing
word = "Hello"
print(word[0])    # Output: H  (First character)
print(word[-1])   # Output: o  (Last character)
print(word[1:4])  # Output: ell (Substring from index 1 to 3)


H
o
ell


In [44]:
# Boolean Operations
is_raining = True
is_sunny = False

# Boolean expressions
print(is_raining and is_sunny)  # False
print(is_raining or is_sunny)   # True
print(not is_raining)           # False (negates True)


False
True
False


## **Lists vs Tuples vs Sets vs Dictionaries (Comparison)**

### **Comparison Table**
| Feature         | List (`[1, 2]`)   | Tuple (`(1, 2)`)  | Set (`{1, 2}`)   | Dictionary (`{"k": 1}`)  |
|---------------|-----------------|----------------|----------------|------------------|
| **Mutable?**  | ✅ Yes         | ❌ No         | ✅ Yes (Add/Remove) | ✅ Yes (Key-Value) |
| **Ordered?**  | ✅ Yes         | ✅ Yes        | ❌ No (Unordered) | ✅ Yes (insertion order, Python 3.7+) |
| **Duplicates?** | ✅ Yes        | ✅ Yes       | ❌ No (Unique values) | ❌ No (Unique keys) |
| **Example**   | `fruits = ["Apple", "Banana"]` | `coordinates = (10, 20)` | `numbers = {1, 2, 3}` | `student = {"Name": "Alice", "Age": 25}` |

---
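The properties in the comparison table can be checked directly. The following is a small sketch that builds all four collections from the same items; the variable names are illustrative only.

```python
items = ["Apple", "Banana", "Apple"]

as_list = list(items)
as_tuple = tuple(items)
as_set = set(items)                  # duplicates are dropped
as_dict = dict.fromkeys(items, 0)    # keys must be unique, so "Apple" appears once

print(len(as_list), len(as_tuple), len(as_set), len(as_dict))  # 3 3 2 2

# Lists are mutable: item assignment works.
as_list[0] = "Mango"

# Tuples are immutable: item assignment raises TypeError.
try:
    as_tuple[0] = "Mango"
except TypeError as err:
    tuple_error = str(err)
print(tuple_error)

# Dictionaries keep insertion order.
print(list(as_dict))                 # ['Apple', 'Banana']

# Empty braces create a dictionary; an empty set needs set().
print(type({}).__name__, type(set()).__name__)  # dict set
```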

### **Lists (Ordered, Mutable, Allows Duplicates)**

In [50]:
fruits = ["Apple", "Banana", "Cherry", "Apple"]  # Duplicates allowed

In [47]:
print(fruits[0])  # Accessing elements

Apple


In [51]:
fruits.append("Mango")  # Adding elements
fruits.remove("Banana")  # Removing elements
print(fruits)

['Apple', 'Cherry', 'Apple', 'Mango']


### **Tuples (Ordered, Immutable, Allows Duplicates)**

In [52]:
coordinates = (10, 20, 30)
print(coordinates[1])  # Accessing elements

20


In [54]:
# Attempting to modify a tuple (This will cause an error)
coordinates[0] = 100  #  TypeError: 'tuple' object does not support item assignment

TypeError: 'tuple' object does not support item assignment

### **Sets (Unordered, Mutable, No Duplicates)**

In [55]:
unique_numbers = {1, 2, 3, 3, 4}  # Duplicates automatically removed

In [56]:
print(unique_numbers)  # Output: {1, 2, 3, 4}

{1, 2, 3, 4}


In [57]:
unique_numbers.add(5)  # Adding elements
unique_numbers.remove(2)  # Removing elements

In [58]:
print(unique_numbers)

{1, 3, 4, 5}


### **Dictionaries (Key-Value Pairs, Ordered, Mutable, No Duplicate Keys)**

In [59]:
student = {"Name": "Alice", "Age": 25, "Grade": "A"}
print(student["Name"])  # Accessing values using keys

Alice


In [60]:
# Adding a new key-value pair
student["Score"] = 90
print(student)

{'Name': 'Alice', 'Age': 25, 'Grade': 'A', 'Score': 90}


In [61]:
# Removing a key
del student["Age"]
print(student)

{'Name': 'Alice', 'Grade': 'A', 'Score': 90}


# Working with Data using Pandas

## What is Pandas?

**Pandas** is a Python library used for handling, manipulating, and analyzing structured data.  
It provides two main data structures:

- **Series**: 1D data, like a single column.
- **DataFrame**: 2D tabular data, like an Excel sheet.
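As a quick illustration of the two structures, the sketch below builds a Series and a DataFrame and shows that a single DataFrame column is itself a Series. The names `ages` and `df_demo` are made up for this example.

```python
import pandas as pd

# A Series is one-dimensional: a single labelled column of values.
ages = pd.Series([25, 30, 35], name="Age")

# A DataFrame is two-dimensional: rows and named columns.
df_demo = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

print(ages.ndim, ages.shape)          # 1 (3,)
print(df_demo.ndim, df_demo.shape)    # 2 (3, 2)

# Selecting one column from a DataFrame returns a Series.
age_column = df_demo["Age"]
print(type(age_column).__name__)      # Series
```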

## Installing and Importing Pandas

Before using Pandas, install it if it is not already installed. This only needs to be done once per environment.

In [None]:
!pip install pandas

In [None]:
# Now, import Pandas into your script:
import pandas as pd  

### **Method 1: Creating a DataFrame from a Dictionary**

In [None]:
# Creating a dataset using a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 89],
    'Score': [85, 90, 78, 88]
}

# Converting the dictionary into a DataFrame
df = pd.DataFrame(data)

# Displaying the DataFrame
print(df)

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     78
3    David   89     88


In [22]:
df

|   | Name | Age | Score |
|---|------|-----|-------|
| 0 | Alice | 25 | 85 |
| 1 | Bob | 30 | 90 |
| 2 | Charlie | 35 | 78 |
| 3 | David | 89 | 88 |


### **Method 2: Creating a DataFrame from a List of Lists**

In [63]:
# Data using a list of lists
data = [
    ["Alice", 25, 85],
    ["Bob", 30, 90],
    ["Charlie", 35, 78],
    ["David", 40, 88]
]

# Creating a DataFrame and specifying column names
df = pd.DataFrame(data, columns=["Name", "Age", "Score"])

# Displaying the DataFrame
print(df)


      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     78
3    David   40     88


### **Reading Data from Files**  

Pandas supports multiple file formats:  

- CSV files  
- Excel files  
- JSON files  

### **Reading from a CSV File**  
`df = pd.read_csv("data.csv")`  

### **Reading from an Excel File**  
`df = pd.read_excel("data.xlsx", sheet_name="Sheet1")`  

### **Reading from a JSON File**  
`df = pd.read_json("data.json")`  
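Since no data file ships with this notebook, the following sketch writes a small CSV to a temporary folder and reads it back, so the `read_csv` call can be tried without any external file. The file name `data.csv` and the sample rows are placeholders.

```python
import os
import tempfile

import pandas as pd

original = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [85, 90]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")
    # index=False keeps the row index out of the file; otherwise it comes
    # back from read_csv as an extra column named "Unnamed: 0".
    original.to_csv(csv_path, index=False)
    loaded = pd.read_csv(csv_path)

print(loaded)
```

`read_excel` and `read_json` follow the same pattern, although `read_excel` typically needs an additional engine package such as `openpyxl` to be installed.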


# Exploring Data in Pandas
After loading data, we can explore its structure.

### **Checking the First Few Rows**


In [28]:
df.head()  # Show the first 5 rows

|   | Name | Age | Score | Result |
|---|------|-----|-------|--------|
| 0 | Alice | 25 | 85 | Pass |
| 1 | Bob | 30 | 90 | Pass |
| 2 | Charlie | 35 | 78 | Fail |
| 3 | David | 89 | 88 | Pass |


In [29]:
df.tail()  # Show the last 5 rows

|   | Name | Age | Score | Result |
|---|------|-----|-------|--------|
| 0 | Alice | 25 | 85 | Pass |
| 1 | Bob | 30 | 90 | Pass |
| 2 | Charlie | 35 | 78 | Fail |
| 3 | David | 89 | 88 | Pass |


### **Getting Information about the DataFrame**

In [64]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    4 non-null      object
 1   Age     4 non-null      int64 
 2   Score   4 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 224.0+ bytes


In [26]:
df.describe()  # Summary of numerical columns

|       | Age | Score |
|-------|-----|-------|
| count | 4.0 | 4.0 |
| mean | 44.75 | 85.25 |
| std | 29.781146 | 5.251984 |
| min | 25.0 | 78.0 |
| 25% | 28.75 | 83.25 |
| 50% | 32.5 | 86.5 |
| 75% | 48.5 | 88.5 |
| max | 89.0 | 90.0 |


In [27]:
df.isnull().sum()  # Checks for missing values in each column

Name      0
Age       0
Score     0
Result    0
dtype: int64

### **Selecting and Filtering Data**

In [None]:
# Selecting Specific Columns
print(df["Name"])  # Selecting one column
print(df[["Name", "Score"]])  # Selecting multiple columns


Selecting Rows using `loc[]` and `iloc[]`

`loc[]` → Selects rows (and optionally columns) by label, such as an index label or a column name

`iloc[]` → Selects rows (and optionally columns) by integer position, starting at 0
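With the default index (0, 1, 2, ...) labels and positions coincide, which hides the difference between the two selectors. The sketch below uses a hypothetical non-default index (10, 20, 30) to make it visible; the DataFrame `scores` is invented for this example.

```python
import pandas as pd

scores = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Score": [85, 90, 78]},
    index=[10, 20, 30],   # hypothetical non-default row labels
)

by_label = scores.loc[20]      # row whose index label is 20
by_position = scores.iloc[0]   # first row, regardless of its label
print(by_label["Name"], by_position["Name"])   # Bob Alice

# loc slices include the end label; iloc slices exclude the end position.
print(len(scores.loc[10:20]))   # 2
print(len(scores.iloc[0:1]))    # 1

# Both accept a second selector for the column.
print(scores.loc[30, "Score"])  # 78
print(scores.iloc[2, 1])        # 78
```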

In [None]:
print(df.loc[0])  # First row (label-based)
print(df.iloc[1]) # Second row (position-based)


In [None]:
# Filtering Rows Based on Conditions
# Selecting rows where Score > 80
high_scorers = df[df["Score"] > 80]
print(high_scorers)
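Conditions can also be combined. A short sketch, assuming the same columns as the DataFrame above (rebuilt here as `people` so the example runs on its own): use `&` for "and", `|` for "or", and `~` for "not", with each condition wrapped in parentheses.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Score": [85, 90, 78, 88],
})

# Both conditions must hold (parentheses are required around each one).
both = people[(people["Score"] > 80) & (people["Age"] >= 30)]
print(both["Name"].tolist())       # ['Bob', 'David']

# At least one condition must hold.
either = people[(people["Score"] > 88) | (people["Age"] < 30)]
print(either["Name"].tolist())     # ['Alice', 'Bob']

# Negating a condition.
not_high = people[~(people["Score"] > 80)]
print(not_high["Name"].tolist())   # ['Charlie']
```

Python's `and`, `or`, and `not` keywords do not work element-wise on a Series, which is why the operator forms are used here.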


### **Modifying DataFrames**

In [None]:
df["Pass"] = df["Score"] > 80  # Creates a Boolean column
print(df)


In [None]:
df["Score"] = df["Score"] + 5  # Adds 5 to all scores
print(df)


In [None]:
df.drop(columns=["Pass"], inplace=True) # Removing a Column
print(df)


In [None]:
# Renaming Columns (returns a renamed copy, so later cells can still use "Score")
renamed_df = df.rename(columns={"Score": "Marks"})
print(renamed_df)


### **Handling Missing Data**

In [None]:
print(df.isnull().sum())  # Checks missing values per column


In [None]:
df.fillna(value=0, inplace=True)  # Replaces NaN with 0


In [None]:
df.dropna(inplace=True)  # Alternatively, drops rows that contain NaN (use instead of fillna, not after it)
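The DataFrame used so far has no missing values, so the calls above leave it unchanged. The sketch below uses a small made-up DataFrame (`raw`) with one missing score to show what each method does; it avoids `inplace=True` so the original stays available for comparison.

```python
import pandas as pd

raw = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [85.0, None, 78.0],   # None is stored as NaN in a numeric column
})

missing_per_column = raw.isnull().sum()
print(missing_per_column)

# Option 1: replace missing scores with a chosen value (0 here).
filled = raw.fillna({"Score": 0})
print(filled)

# Option 2: drop every row that contains a missing value.
dropped = raw.dropna()
print(dropped)
```

Which option is appropriate depends on the data: filling with 0 changes averages, while dropping rows loses the other values in those rows.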


### **Sorting and Grouping Data**

In [None]:
df.sort_values(by="Score", ascending=False, inplace=True)  # Sorting in descending order
print(df)


In [None]:
# Grouping by Age and getting the mean Score
grouped = df.groupby("Age")["Score"].mean()
print(grouped)
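Grouping is most informative when several rows share the same key. In the DataFrame above every age is unique, so each group holds a single row. The sketch below adds a hypothetical `Class` column (not in the original data) to show an aggregation over real groups.

```python
import pandas as pd

results = pd.DataFrame({
    "Class": ["A", "B", "A", "B"],   # hypothetical grouping column
    "Score": [85, 90, 78, 88],
})

# Mean score per class: A -> (85 + 78) / 2, B -> (90 + 88) / 2
mean_by_class = results.groupby("Class")["Score"].mean()
print(mean_by_class)

# Several aggregations at once.
summary = results.groupby("Class")["Score"].agg(["mean", "max", "count"])
print(summary)
```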


### **Applying Functions to Columns**

In [65]:
# Define a function
def grade(score):
    if score >= 85:
        return "A"
    elif score >= 75:
        return "B"
    else:
        return "C"

# Apply function to Score column
df["Grade"] = df["Score"].apply(grade)
print(df)


      Name  Age  Score Grade
0    Alice   25     85     A
1      Bob   30     90     A
2  Charlie   35     78     B
3    David   40     88     A


# Practice Exercise
## **Task**  

- Create a Pandas DataFrame with columns **"Student"**, **"Math Score"**, **"English Score"**, **"Science Score"**.  
- Add a **"Total Score"** column (sum of all scores).  
- Sort students by **"Total Score"** in descending order.  
- Filter students who scored more than **80** in **English**.  
- Create a **"Grade"** column using a function.  


- The output should look like this:

  ![Pandas DataFrame Example Output](images/practice_1.png)
