# Data Science and Visualization Course
## Module 2: Data Handling with Pandas (Part 1)

### Introducing Pandas Series and DataFrames
Pandas is a Python library for data manipulation and analysis. Its two core data structures are the Series, a one-dimensional labeled array, and the DataFrame, a two-dimensional labeled table whose columns can each hold a different data type.

In [None]:
import pandas as pd

names = ['Ali', 'Alina', 'Aleem']

# How to create a Series from a list
series = pd.Series(names)
print(series, "\n")

data = {
    'Name': ['Ali', 'Alina', 'Aleem'],
    'Age': [25, 30, 35],
    'City': ['Karachi', 'Lahore', 'Pindi']
}

# How to create a DataFrame
df = pd.DataFrame(data)
print(df)
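
To make the word "labeled" more concrete, the following minimal sketch gives a Series custom labels and inspects the data type of each DataFrame column. The variable names `ages` and `people` are illustrative only and are not used elsewhere in this module.

```python
import pandas as pd

# A Series with custom string labels instead of the default 0, 1, 2
ages = pd.Series([25, 30, 35], index=['Ali', 'Alina', 'Aleem'])

# Look up a value by its label
print(ages['Alina'])

# Each DataFrame column keeps its own data type
people = pd.DataFrame({
    'Name': ['Ali', 'Alina', 'Aleem'],
    'Age': [25, 30, 35],
})
print(people.dtypes)
```

Here the `Name` column holds text while the `Age` column holds integers, which is what "columns of different types" means in practice.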

**Exercise:** Create a DataFrame with data about your favorite movies. Include columns for the movie title, director, and release year, then print the DataFrame.

In [None]:
# Your code here
# Create a dictionary with data about your favorite movies
# Use the dictionary to create a DataFrame
# Print the DataFrame

### Loading Data from CSV Files
CSV (Comma-Separated Values) files are a common format for storing data. Pandas makes it easy to load data from a CSV file into a DataFrame.

In [42]:
# Example of loading data from a CSV file
song_df = pd.read_csv('songs.csv')

# The head() method returns the first rows of a DataFrame (five by default)
song_df.head()

Unnamed: 0,Artist Name,Song Name,Genre
0,Billie Eilish,Bad Guy,Pop
1,Taylor Swift,Cardigan,Indie
2,Drake,In My Feelings,
3,Post Malone,,Hip Hop
4,Bruno Mars,Uptown Funk,Funk
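
If `songs.csv` is not at hand, the same loading step can be tried on CSV text held in memory. This is a sketch under assumptions: `csv_text` and `sample_df` are illustrative names, and the three rows are copied from the preview above rather than read from the real file.

```python
import io

import pandas as pd

# CSV text standing in for a file on disk (assumed sample, not the real songs.csv)
csv_text = """Artist Name,Song Name,Genre
Billie Eilish,Bad Guy,Pop
Drake,In My Feelings,
Post Malone,,Hip Hop
"""

# read_csv accepts a file path or any file-like object such as StringIO
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)  # (number of rows, number of columns)
print(sample_df)
```

Empty fields in the CSV text are read as missing values and appear as `NaN` in the DataFrame.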


### Cleaning and Transforming Data
Data often needs to be cleaned and transformed before analysis. This includes dealing with missing values and formatting data.

#### Dealing with Missing Values
Missing values are common in real-world data. Pandas provides several ways to handle missing values, such as filling them with a specific value or dropping rows/columns with missing values.

In [None]:
# Fill missing values with a specific value
song_df_filled = song_df.fillna({'Song Name': 'Unknown Song'})  
print("\nDataFrame with missing values filled:")
print(song_df_filled.head())

# Drop rows with missing values
song_df_dropped = song_df.dropna()
print('\nDataFrame with rows with missing values dropped:')
print(song_df_dropped.head())
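
Before filling or dropping anything, it usually helps to see where the gaps are. The sketch below counts missing values per column and then drops rows only when one chosen column is empty. The `tracks` DataFrame is a small stand-in built for this example, not the course dataset.

```python
import pandas as pd

# Small stand-in DataFrame with gaps (assumed data for illustration)
tracks = pd.DataFrame({
    'Artist': ['Drake', 'Post Malone', 'Bruno Mars'],
    'Song': ['In My Feelings', None, 'Uptown Funk'],
    'Genre': [None, 'Hip Hop', 'Funk'],
})

# Count the missing values in each column
missing_counts = tracks.isna().sum()
print(missing_counts)

# Drop a row only if its 'Song' value is missing; gaps in other columns are kept
with_song = tracks.dropna(subset=['Song'])
print(with_song)
```

Using `subset` keeps the Drake row even though its genre is missing, whereas a plain `dropna()` would remove it.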

**Exercise:** Create a DataFrame with some missing values. Fill the missing values with appropriate values and print the DataFrame. Then, create another DataFrame by dropping rows with missing values and print it.

In [None]:
# This is a DataFrame with missing values.
data = {
    'Name': ['Ali', 'Alina', 'Aleem'],
    'Age': [25, 30, None],
    'City': [None, 'Lahore', 'Pindi']
}

df = pd.DataFrame(data)

# Fill the missing values with appropriate values and print the result
# Drop rows with missing values, save the result in fixed_df, and print it

# Your code here

#### Formatting Data
Data often needs to be formatted for consistency, for example by renaming columns so that their names are short and uniform.

In [None]:
# Example of formatting data

# Rename columns in the song DataFrame from 'Song Name' to 'Song' and from 'Artist Name' to 'Artist'
song_df = song_df.rename(columns={'Song Name': 'Song', 'Artist Name': 'Artist'})
print('\nDataFrame with renamed columns:')
song_df.head()
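
Formatting can also apply to the values themselves. As a sketch, assuming a hypothetical `messy` DataFrame with inconsistent spacing and capitalization, the pandas string methods under `.str` can tidy both the values and the column names.

```python
import pandas as pd

# Hypothetical data with inconsistent spacing and capitalization
messy = pd.DataFrame({
    'Artist Name': ['  drake', 'BRUNO MARS ', 'Taylor swift'],
    'Genre': ['pop', 'Funk', 'INDIE'],
})

# Trim surrounding whitespace and capitalize each word
messy['Artist Name'] = messy['Artist Name'].str.strip().str.title()
messy['Genre'] = messy['Genre'].str.strip().str.title()

# Make column names lowercase and replace spaces with underscores
messy.columns = messy.columns.str.lower().str.replace(' ', '_')

print(messy)
```

Note that `str.title()` capitalizes every word, which may not suit every artist or song name, so treat it as one option rather than a general rule.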

#### Selecting and Adding Data

Selecting and adding data lets you work with individual rows and columns of the DataFrame.

In [47]:
# Example of selecting a column from the DataFrame
column = song_df['Artist']

# Print the column
print("Column \n", column.head(), "\n")

# Example of selecting a row from the DataFrame by index
row = song_df.iloc[0]
print("Row \n", row, "\n")

# Example of selecting multiple rows from the DataFrame by index
rows = song_df.iloc[0:3]
print("Rows \n", rows, "\n")

# Example of adding a row to the DataFrame
new_row = {"Artist": "Abdul Hannan", "Song": "Iraaday", "Genre": "Pop"}

# Convert the dictionary to a DataFrame
new_row_df = pd.DataFrame([new_row])

# Concat the new row to the DataFrame
song_df = pd.concat([song_df, new_row_df], ignore_index=True)

song_df.tail()

Column 
 0    Billie Eilish
1     Taylor Swift
2            Drake
3      Post Malone
4       Bruno Mars
Name: Artist, dtype: object 

Row 
 Artist    Billie Eilish
Song            Bad Guy
Genre               Pop
Name: 0, dtype: object 

Rows 
           Artist            Song  Genre
0  Billie Eilish         Bad Guy    Pop
1   Taylor Swift        Cardigan  Indie
2          Drake  In My Feelings    NaN 



Unnamed: 0,Artist,Song,Genre
56,Bruno Mars,That's What I Like,Pop
57,Kendrick Lamar,Money Trees,Hip Hop
58,Abdul Hannan,Iraaday,Pop
59,Abdul Hannan,Iraaday,Pop
60,Abdul Hannan,Iraaday,Pop
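
Beyond `iloc`, a few other selection patterns come up often. The sketch below uses a small stand-in DataFrame named `songs` (an assumption for this example, separate from `song_df`) to select several columns, filter rows by a condition, read a single cell with `loc`, and add a new column.

```python
import pandas as pd

# Small stand-in DataFrame (assumed data for illustration)
songs = pd.DataFrame({
    'Artist': ['Billie Eilish', 'Taylor Swift', 'Bruno Mars'],
    'Song': ['Bad Guy', 'Cardigan', 'Uptown Funk'],
    'Genre': ['Pop', 'Indie', 'Funk'],
})

# Select several columns at once by passing a list of names
subset = songs[['Artist', 'Song']]

# Keep only the rows where a condition is true
pop_songs = songs[songs['Genre'] == 'Pop']

# loc selects by row label and column name
first_song = songs.loc[0, 'Song']

# Add a new column computed from an existing one
songs['Title Length'] = songs['Song'].str.len()

print(subset, "\n")
print(pop_songs, "\n")
print(first_song, "\n")
print(songs)
```

`iloc` selects by position while `loc` selects by label. With the default index the two look alike, but they differ once rows are filtered or reordered.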


## Bonus Question

Write a Python script that selects the "Song" column (renamed from "Song Name" earlier) from the DataFrame and iterates over it with a for loop. Count how many song names contain the word "Money" and output the count.

### Steps:
1. Select the "Song" column from the DataFrame.
2. Remove missing values from the "Song" column by using the `dropna` function.
3. Use a for loop to iterate over each song name in the selected column.
4. Using an if statement, check if the word "Money" is present in the song name.
5. Count the number of song names containing the word "Money".
6. Output the count.
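
The same pattern can be sketched on unrelated data, so the bonus question itself stays open. The example below assumes a hypothetical Series of city names and counts how many contain the substring "abad".

```python
import pandas as pd

# Hypothetical column of city names with one missing value
cities = pd.Series(['Islamabad', 'Lahore', None, 'Faisalabad', 'Karachi'])

count = 0

# dropna() removes the missing value so every item in the loop is a string
for city in cities.dropna():
    # 'in' checks whether the substring appears anywhere in the string
    if 'abad' in city:
        count += 1

print(count)
```

The `in` check is case-sensitive, so "Abad" and "abad" would be treated as different substrings.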

In [None]:
# Hints: Ask ChatGPT how to iterate over the values in a DataFrame column, or how to use an if statement
# to check whether a substring ("Money") is present in a string (the name of the song)

# Your code here