## Data Filtering and Selection

In [1]:
#pip install pandas

In [11]:
# Let's get this party started with the two heavy hitters.

# NumPy (imported under the alias 'np') is the Swiss Army knife of numerical computing:
# fast arrays plus the math functions to go with them.

import numpy as np

# pandas (imported under the alias 'pd') is the backbone of our data adventures:
# labeled tables with tools for selecting, filtering, and reshaping.

import pandas as pd

# Importing the 'DataFrame' class directly lets us write DataFrame(...) instead of pd.DataFrame(...).

from pandas import DataFrame


In [3]:
# Now let's use the 'DataFrame' class to build our table.
# 'np.arange(0, 90, 3)' generates 30 numbers (0, 3, ..., 87), and the array's 'reshape' method
# arranges them into 10 rows and 3 columns before they reach the DataFrame constructor.

numbers_df = DataFrame(
    np.arange(0, 90, 3).reshape(10, 3),  # 0 up to (but not including) 90 in steps of 3, as a 10x3 array
    index=['row 1', 'row 2', 'row 3', 'row 4', 'row 5', 'row 6', 'row 7', 'row 8', 'row 9', 'row 10'],  # Row labels
    columns=['column 1', 'column 2', 'column 3']  # Column labels
)

numbers_df  # The last expression in a cell is displayed as the cell's output


,column 1,column 2,column 3
row 1,0,3,6
row 2,9,12,15
row 3,18,21,24
row 4,27,30,33
row 5,36,39,42
row 6,45,48,51
row 7,54,57,60
row 8,63,66,69
row 9,72,75,78
row 10,81,84,87
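
As a quick sanity check on how that grid comes together, here's a minimal sketch that splits the construction into steps and builds the labels in a loop instead of typing them out. The names `values`, `grid`, and `check_df` are made up for this illustration and aren't part of the notebook.

```python
import numpy as np
import pandas as pd

# np.arange(0, 90, 3) yields 30 values: 0, 3, ..., 87 (the stop value 90 is excluded).
values = np.arange(0, 90, 3)

# reshape is a NumPy array method; rows * columns must equal the number of values (10 * 3 == 30).
grid = values.reshape(10, 3)

# Hypothetical helper names: generate the same labels the notebook types by hand.
check_df = pd.DataFrame(
    grid,
    index=[f"row {i}" for i in range(1, 11)],
    columns=[f"column {j}" for j in range(1, 4)],
)

print(values.size)     # 30
print(check_df.shape)  # (10, 3)
```

If the shape passed to `reshape` doesn't multiply out to the number of values, NumPy raises a `ValueError`, which is a handy early warning.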


In [4]:
# Let's break this one down:

# numbers_df: our DataFrame holding the numerical data.

# .iloc: an indexer for integer-location based indexing. It's used with square brackets
# (not called like a method) and selects rows and columns by position, ignoring their labels.

# [0, 1]: the row position first, then the column position.
# Positions start at 0, so 0 is the first row ('row 1') and 1 is the second column ('column 2').

# So numbers_df.iloc[0, 1] pinpoints a single cell in the table and fetches the value stored there.

numbers_df.iloc[0, 1]


3
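
For comparison, here's a minimal sketch of three ways to reach that same cell. `demo_df` is a stand-in rebuilt here so the snippet runs on its own; it isn't a name from the notebook.

```python
import numpy as np
import pandas as pd

# Stand-in copy of the notebook's table (hypothetical name).
demo_df = pd.DataFrame(
    np.arange(0, 90, 3).reshape(10, 3),
    index=[f"row {i}" for i in range(1, 11)],
    columns=[f"column {j}" for j in range(1, 4)],
)

by_position = demo_df.iloc[0, 1]             # integer positions
by_label = demo_df.loc["row 1", "column 2"]  # index and column labels
by_iat = demo_df.iat[0, 1]                   # positional access for a single scalar only

print(by_position, by_label, by_iat)  # 3 3 3
```

Roughly speaking, `.iloc` is the pick when you know where a value sits, and `.loc` is the pick when you know what it's called.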

In [5]:
# Same cell as before, but this time we're writing instead of reading:

# .iloc[0, 1] points at the first row and second column (positions start at 0),
# so this assigns 20 to the cell at 'row 1', 'column 2', replacing the original 3.

numbers_df.iloc[0, 1] = 20

# Display the DataFrame to confirm the change.

numbers_df


,column 1,column 2,column 3
row 1,0,20,6
row 2,9,12,15
row 3,18,21,24
row 4,27,30,33
row 5,36,39,42
row 6,45,48,51
row 7,54,57,60
row 8,63,66,69
row 9,72,75,78
row 10,81,84,87
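
Assignments like this change the DataFrame in place. If you'd rather keep the original values around, one option is to edit a copy. The sketch below assumes a stand-in table called `demo_df` and a copy called `edited_df`; both names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in copy of the notebook's starting table (hypothetical name).
demo_df = pd.DataFrame(
    np.arange(0, 90, 3).reshape(10, 3),
    index=[f"row {i}" for i in range(1, 11)],
    columns=[f"column {j}" for j in range(1, 4)],
)

# Work on a copy so the original values survive the edit.
edited_df = demo_df.copy()
edited_df.iloc[0, 1] = 20               # single cell, by position
edited_df.at["row 2", "column 1"] = -1  # single cell, by label

print(demo_df.iloc[0, 1])    # 3 (original untouched)
print(edited_df.iloc[0, 1])  # 20
```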


In [12]:
# Check it out, time to cherry-pick:

# Passing lists to .iloc selects specific rows and columns by position.
# Positions 1, 3, and 5 are 'row 2', 'row 4', and 'row 6';
# positions 0 and 2 are 'column 1' and 'column 3'.

numbers_df.iloc[[1, 3, 5], [0, 2]]


,column 1,column 3
row 2,9,15
row 4,27,33
row 6,45,51
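
The same cherry-picking works with labels through `.loc`. Here's a minimal sketch, assuming a freshly built stand-in table named `demo_df` (not a name from the notebook), that checks the two spellings select the same cells.

```python
import numpy as np
import pandas as pd

# Stand-in copy of the notebook's starting table (hypothetical name).
demo_df = pd.DataFrame(
    np.arange(0, 90, 3).reshape(10, 3),
    index=[f"row {i}" for i in range(1, 11)],
    columns=[f"column {j}" for j in range(1, 4)],
)

# Lists of positions with .iloc ...
by_position = demo_df.iloc[[1, 3, 5], [0, 2]]

# ... and the matching lists of labels with .loc.
by_label = demo_df.loc[["row 2", "row 4", "row 6"], ["column 1", "column 3"]]

print(by_position.equals(by_label))  # True
```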


#### Comparison operators (>, <, >=, <=, ==, !=) and masking

In [7]:
# Alright, let's break it down:

# Comparing the whole DataFrame against 30 cooks up a boolean mask.
# The result is a DataFrame of the same shape, where True means the corresponding value in 'numbers_df'
# is greater than 30 and False means it isn't (exactly 30 counts as False).

mask = numbers_df > 30

# Let's peep the mask to see which values meet the condition.

mask


,column 1,column 2,column 3
row 1,False,False,False
row 2,False,False,False
row 3,False,False,False
row 4,False,False,True
row 5,True,True,True
row 6,True,True,True
row 7,True,True,True
row 8,True,True,True
row 9,True,True,True
row 10,True,True,True
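
Masks can also be combined. The sketch below is an illustration on a freshly built stand-in table (`demo_df`, `in_range`, `outside`, and `flipped` are invented names), using `&` for "and", `|` for "or", and `~` for "not". Each comparison needs its own parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np
import pandas as pd

# Stand-in copy of the notebook's starting table (hypothetical name).
demo_df = pd.DataFrame(
    np.arange(0, 90, 3).reshape(10, 3),
    index=[f"row {i}" for i in range(1, 11)],
    columns=[f"column {j}" for j in range(1, 4)],
)

# Values strictly between 30 and 60.
in_range = (demo_df > 30) & (demo_df < 60)

# Values at or below 30, or at or above 60.
outside = (demo_df <= 30) | (demo_df >= 60)

# ~ flips every True to False and vice versa.
flipped = ~in_range

print(int(in_range.sum().sum()))  # 9 values fall between 30 and 60
print(flipped.equals(outside))    # True
```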


In [8]:
# Now let's put that mask to work:

# Indexing 'numbers_df' with the boolean mask keeps the values where 'mask' is True
# and replaces everything else with NaN. The shape stays the same, and because NaN is a float,
# the surviving values show up as floats (33.0 instead of 33).

numbers_df[mask]


,column 1,column 2,column 3
row 1,,,
row 2,,,
row 3,,,
row 4,,,33.0
row 5,36.0,39.0,42.0
row 6,45.0,48.0,51.0
row 7,54.0,57.0,60.0
row 8,63.0,66.0,69.0
row 9,72.0,75.0,78.0
row 10,81.0,84.0,87.0
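
If the NaN padding gets in the way, there are a couple of ways around it. This is a minimal sketch on a stand-in table (`demo_df` and the other variable names are made up here): it trims the all-NaN rows, then contrasts cell-wise masking with filtering whole rows on a single column.

```python
import numpy as np
import pandas as pd

# Stand-in copy of the notebook's starting table (hypothetical name).
demo_df = pd.DataFrame(
    np.arange(0, 90, 3).reshape(10, 3),
    index=[f"row {i}" for i in range(1, 11)],
    columns=[f"column {j}" for j in range(1, 4)],
)

# Cell-wise mask: same shape as the original, NaN wherever the condition fails.
cell_filtered = demo_df[demo_df > 30]
print(int(cell_filtered.count().sum()))  # 19 values survive

# Drop the rows where nothing survived.
trimmed = cell_filtered.dropna(how="all")
print(trimmed.shape)  # (7, 3)

# Row-wise filter: a boolean Series keeps or drops whole rows, so no NaN appears
# and the integer columns stay integers.
row_filtered = demo_df[demo_df["column 3"] > 30]
print(row_filtered.shape)  # (7, 3)
```

Note that the two `(7, 3)` results hold different things: `trimmed` still has a NaN or two in 'row 4', while `row_filtered` keeps every original value in the rows it selects.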


In [13]:
# Time to flip the script on these numbers:

# The comparison 'numbers_df > 30' builds a boolean mask on the fly,
# and assigning through it sets every flagged value to 0.
# Heads up: this modifies 'numbers_df' in place, so the original values are gone.

numbers_df[numbers_df > 30] = 0

# Let's peep the updated 'numbers_df' with those high flyers dropped to zero.

numbers_df


,column 1,column 2,column 3
row 1,0,20,6
row 2,9,12,15
row 3,18,21,24
row 4,27,30,0
row 5,0,0,0
row 6,0,0,0
row 7,0,0,0
row 8,0,0,0
row 9,0,0,0
row 10,0,0,0
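
For a version that leaves the original untouched, pandas offers `DataFrame.mask` and `DataFrame.where`, which return a new DataFrame instead of editing in place. A minimal sketch, assuming a stand-in table named `demo_df` (the names `capped` and `same` are also invented for illustration):

```python
import numpy as np
import pandas as pd

# Stand-in copy of the notebook's starting table (hypothetical name).
demo_df = pd.DataFrame(
    np.arange(0, 90, 3).reshape(10, 3),
    index=[f"row {i}" for i in range(1, 11)],
    columns=[f"column {j}" for j in range(1, 4)],
)

# mask: replace values WHERE the condition is True.
capped = demo_df.mask(demo_df > 30, 0)

# where: keep values where the condition is True, replace the rest.
same = demo_df.where(demo_df <= 30, 0)

print(capped.equals(same))   # True
print(demo_df.iloc[9, 2])    # 87 (original untouched)
```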


In [14]:
# Peep this, we're about to slice and dice:

# With .iloc, a slice includes the start position and excludes the stop position.
# 2:6 covers row positions 2 through 5 ('row 3' to 'row 6'),
# and 1:3 covers column positions 1 and 2 ('column 2' and 'column 3').

numbers_df.iloc[2:6, 1:3]


,column 2,column 3
row 3,21,24
row 4,30,0
row 5,0,0
row 6,0,0
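
One gotcha worth a quick sketch: label slices with `.loc` include both endpoints, while position slices with `.iloc` stop before the end. The snippet below uses a freshly built stand-in table (`demo_df` is a made-up name) to show the two spellings picking out the same block.

```python
import numpy as np
import pandas as pd

# Stand-in copy of the notebook's starting table (hypothetical name).
demo_df = pd.DataFrame(
    np.arange(0, 90, 3).reshape(10, 3),
    index=[f"row {i}" for i in range(1, 11)],
    columns=[f"column {j}" for j in range(1, 4)],
)

# Position slice: start included, stop excluded.
by_position = demo_df.iloc[2:6, 1:3]

# Label slice: both endpoints included.
by_label = demo_df.loc["row 3":"row 6", "column 2":"column 3"]

print(by_position.shape)             # (4, 2)
print(by_position.equals(by_label))  # True
```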
