# Python Refresher
**Objective:** To refresh your Pythonic brain before jumping into building AI/ML-based applications.

**Note:** This notebook is based on the following [Python Cheat Sheet for Beginners](https://www.datacamp.com/cheat-sheet/getting-started-with-python-cheat-sheet) on [datacamp.com](https://app.datacamp.com/).

# Basic object types
We can use the built-in function `type` to check an object's data type.

In [5]:
# Integer
type(1)

int

In [2]:
# Float
type(1.99)

float

In [10]:
# String
type("Hello my friend")

str

In [9]:
# Boolean
type(True), type(1==1)

(bool, bool)

In [12]:
# Iterables
type([1, 2, 3, 4, 5]), type((1, 2, 3, 4, 5)), type({1, 2, 3, 4, 5}), type({"a":1, "b":2, "c":3})

(list, tuple, set, dict)
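As a small complement to `type`, the built-in `isinstance` checks whether a value belongs to a given type (or any type in a tuple of types). The sketch below is only illustrative; the variable names are made up for this example.

```python
# isinstance returns True when the value is an instance of the given type
is_int = isinstance(1, int)

# A tuple of types means "any of these"
is_number = isinstance(1.99, (int, float))

# bool is a subclass of int, so True also counts as an int
bool_is_int = isinstance(True, int)

print(is_int, is_number, bool_is_int)
```

Use `type(x) is bool` when you need an exact type match, and `isinstance` when subclasses should also be accepted.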

# Install and import package
In Jupyter Notebook, you can use the following command within a code cell to install a Python package using `pip`.
```
!pip install PYTHON-PACKAGE-OF-YOUR-CHOICE
```

In [13]:
!pip install pandas


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [14]:
import pandas # Import a package without an alias
import pandas as pd # Import a package with an alias
from pandas import DataFrame # Import an object from a package

In [15]:
# access package's class via import without an alias
pandas.DataFrame([1, 2, 3, 4, 5])

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5


In [17]:
# access package's class via import with an alias
pd.DataFrame([1, 2, 3, 4, 5])

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5


In [19]:
# access package's class directly
DataFrame([1, 2, 3, 4, 5])

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5


# Operators

In [None]:
# Make Jupyter Notebook display all outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Arithmetic operators

In [24]:
102 + 37 # Add two numbers with +
102 - 37 # Subtract a number with -
4 * 6 # Multiply two numbers with *
22 / 7 # Divide a number by another with /
22 // 7 # Integer divide a number with //
3 ** 4 # Raise to the power with **
22 % 7 # Get the remainder after division with %

139

65

24

3.142857142857143

3

81

1
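The `//` and `%` operators are linked: for integers `a` and `b`, `a == (a // b) * b + a % b`. The following sketch checks that relationship with the built-in `divmod`; the variable names are chosen just for this illustration.

```python
a, b = 22, 7

# divmod returns the pair (a // b, a % b) in one call
quotient, remainder = divmod(a, b)

# Rebuild the original number from the quotient and remainder
rebuilt = quotient * b + remainder
print(quotient, remainder, rebuilt)

# With a negative operand, // rounds down (toward negative infinity)
# and % takes the sign of the divisor
neg_quotient = -22 // 7
neg_remainder = -22 % 7
print(neg_quotient, neg_remainder)
```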

## Assignment operators

In [28]:
a = 5 # Assign a value to a
a

x = [1, 2, 3, 4, 5]
x
x[0] = -1 # Change the value of an item in a list
x

5

[1, 2, 3, 4, 5]

[-1, 2, 3, 4, 5]
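One detail worth keeping in mind with assignment: assigning a list to a second name does not copy it, so changes made through either name are visible through both. A minimal sketch, with hypothetical variable names:

```python
x = [1, 2, 3]

alias = x            # alias and x refer to the same list object
snapshot = x.copy()  # snapshot is a separate (shallow) copy

alias[0] = -1        # this change is visible through x as well
print(x, snapshot)

# Augmented assignment updates a variable in terms of its current value
total = 5
total += 2
print(total)
```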

## Numeric comparison operators

In [31]:
3 == 3 # Test for equality with ==
3 != 3 # Test for inequality with !=
3 > 1 # Test greater than with >
3 >= 3 # Test greater than or equal to with >=
3 < 4 # Test less than with <
3 <= 4 # Test less than or equal to with <=
1 < 2 < 3 # Test for chained comparisons

True

False

True

True

True

True

True

## Logical operators

In [41]:
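~(2 == 2) # Bitwise NOT with ~: True is treated as 1, so the result is -2, not False
not 2 == 2 # Logical NOT with not
(1 != 1) & (1 < 1) # Bitwise AND with &; on booleans it behaves like logical AND
(1 >= 1) | (1 < 1) # Bitwise OR with |; on booleans it behaves like logical OR
(1 != 1) ^ (1 < 1) # Bitwise XOR with ^; on booleans it behaves like logical XOR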

-2

False

False

True

False
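The `-2` in the output above comes from `~` being a bitwise operator. For plain Python booleans, the keywords `not`, `and`, and `or` are the logical operators; `and` and `or` also short-circuit. The sketch below contrasts the two families, using variable names invented for this example.

```python
a, b = True, False

bitwise_not = ~a     # bitwise NOT on the integer value 1
logical_not = not a  # logical NOT

# `and` stops at the first falsy operand, so the division is never evaluated
short_circuit = b and (1 / 0 == 0)

# `or` returns the first truthy operand, which is handy for defaults
label = "" or "fallback"

print(bitwise_not, logical_not, short_circuit, label)
```

The `&`, `|`, and `~` operators become important later with pandas, where they combine boolean columns element by element.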

# List
A list is an ordered, changeable (mutable) sequence of elements. It can hold values of any type, such as integers, floats, strings, and other objects, and a single list can mix types.

## Selecting list elements

In [45]:
# Define the list 
x = ['a', 'b', 'c', 'd', 'e']

# Select the element at index 0 (the first element)
x[0] # 'a'

# Select the last element in the list
x[-1] # 'e'

# Select from index 1 (inclusive) to index 3 (exclusive)
x[1:3] # ['b', 'c']

# Select from index 2 to the end
x[2:] # ['c', 'd', 'e']

# Select from the start to index 3 (exclusive)
x[:3] # ['a', 'b', 'c']

# Reverse the list
x[::-1]

'a'

'e'

['b', 'c']

['c', 'd', 'e']

['a', 'b', 'c']

['e', 'd', 'c', 'b', 'a']
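The general slice form is `x[start:stop:step]`, and a slice always returns a new list rather than a view of the original. A short sketch of both points, with illustrative variable names:

```python
x = ['a', 'b', 'c', 'd', 'e']

every_other = x[::2]  # step of 2: indices 0, 2, 4
last_two = x[-2:]     # negative start counts from the end

# A full slice makes a shallow copy, so editing it leaves x untouched
x_copy = x[:]
x_copy[0] = 'z'

print(every_other, last_two)
print(x, x_copy)
```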

## Concatenating lists

In [46]:
# Define the lists x and y
x = [1, 3, 6]
y = [10, 15, 21]

# Concatenate lists with +
x + y # [1, 3, 6, 10, 15, 21]

# Repeat list n times with *
3 * x # [1, 3, 6, 1, 3, 6, 1, 3, 6]

[1, 3, 6, 10, 15, 21]

[1, 3, 6, 1, 3, 6, 1, 3, 6]
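Note that `+` and `*` build a new list and leave their operands unchanged. To grow an existing list in place, the list methods `extend` and `append` can be used instead, as in this small sketch:

```python
x = [1, 3, 6]
y = [10, 15, 21]

combined = x + y  # new list; x and y are unchanged
print(x, combined)

x.extend(y)       # adds every element of y to x in place
x.append(28)      # adds a single element to the end of x
print(x)
```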

# Dictionary
A dictionary stores data as key-value pairs. Unlike lists, which are indexed by position, dictionaries are indexed by their keys, and each key must be unique.

In [51]:
# Define the dictionary
a = {'a': 1, 'b': 2, 'c': 3}

# Get the keys
a.keys() # dict_keys(['a', 'b', 'c'])

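# Get the values
a.values() # dict_values([1, 2, 3])

# Get a value from a dictionary by specifying the key
a['a'] # 1

# Get a value using the get method
a.get("a") # 1

# A missing key returns None, which the notebook does not display
a.get("d")

# Supply a default to return when the key is missing
a.get("d", "default value") # 'default value'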

dict_keys(['a', 'b', 'c'])

dict_values([1, 2, 3])

1

1

'default value'
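Because keys are unique, assigning to an existing key overwrites its value, while assigning to a new key adds an entry. The sketch below also shows a membership test with `in` and iteration over `items()`; the variable names are illustrative.

```python
a = {'a': 1, 'b': 2, 'c': 3}

has_d = 'd' in a  # membership tests look at keys, not values

a['d'] = 4        # new key: adds an entry
a['a'] = 100      # existing key: overwrites the old value

# items() yields (key, value) pairs in insertion order (Python 3.7+)
pairs = [(key, value) for key, value in a.items()]
print(has_d, pairs)
```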

# String

## Combining and splitting strings

In [52]:
# Concatenate strings with +
"Data" + "Framed" # 'DataFramed'

# Repeat strings with *
3 * "data " # 'data data data '

# Split a string on a delimiter
"beekeepers".split("e") # ['b', '', 'k', '', 'p', 'rs']

'DataFramed'

'data data data '

['b', '', 'k', '', 'p', 'rs']
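The counterpart of `split` is `join`, which glues a list of strings back together with a separator. A brief sketch, with variable names chosen for illustration:

```python
parts = "beekeepers".split("e")

# Joining with the same delimiter restores the original string
rejoined = "e".join(parts)

# With no argument, split breaks on runs of whitespace
words = "Jack and Jill".split()

csv_line = ",".join(["a", "b", "c"])
print(rejoined, words, csv_line)
```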

## Transforming strings

In [53]:
# Create a string named name (naming it str would shadow the built-in str type)
name = "Jack and Jill"

# Return an uppercase copy of the string
name.upper() # 'JACK AND JILL'

# Return a lowercase copy of the string
name.lower() # 'jack and jill'

# Return a title-case copy of the string
name.title() # 'Jack And Jill'

# Return a copy with every match of a substring replaced by another
name.replace("J", "P") # 'Pack and Pill'

'JACK AND JILL'

'jack and jill'

'Jack And Jill'

'Pack and Pill'
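Strings are immutable, so the methods above return new strings and never change the original. The following sketch demonstrates this; the variable names are made up for the example.

```python
name = "Jack and Jill"

shouted = name.upper()  # a new string; name itself is unchanged
print(name, shouted)

# Item assignment is not supported on strings and raises TypeError
raised_type_error = False
try:
    name[0] = "P"
except TypeError:
    raised_type_error = True
print(raised_type_error)

# To "change" a string, rebind the name to the new value
name = name.replace("J", "P")
print(name)
```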

# Pandas DataFrame
pandas is a fast and powerful package for data analysis and manipulation in Python. To import the package, you can use `import pandas as pd`. A pandas DataFrame is a structure that contains two-dimensional data stored as rows and columns. A pandas Series is a structure that contains one-dimensional data.
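To make the DataFrame/Series distinction concrete, the sketch below builds a small DataFrame and pulls out one column, which comes back as a Series. The data and variable names are invented for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 4, 6]})
column = df['a']  # selecting a single column returns a Series

# ndim is the number of dimensions; shape is (rows, columns) for a DataFrame
print(type(df).__name__, df.ndim, df.shape)
print(type(column).__name__, column.ndim, column.shape)
```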

## Creating DataFrames

In [55]:
# Create a dataframe from a dictionary
pd.DataFrame({
    'a': [1, 2, 3],
    'b': [4, 4, 6],
    'c': ['x', 'x', 'y']
})

# Create a dataframe from a list of dictionaries
pd.DataFrame([
    {'a': 1, 'b': 4, 'c': 'x'},
    {'a': 1, 'b': 4, 'c': 'x'},
    {'a': 3, 'b': 6, 'c': 'y'}
])

Unnamed: 0,a,b,c
0,1,4,x
1,2,4,x
2,3,6,y


Unnamed: 0,a,b,c
0,1,4,x
1,1,4,x
2,3,6,y


## Selecting DataFrame Elements
Here are different ways to select a row, a column, or a single element from a DataFrame.

In [58]:
df = pd.DataFrame({
    'col': [1, 2, 3, 4, 5],
    'col1': [6, 7, 8, 9, 10],
    'col2': ['x', 'x', 'y', 'y', 'z']
})

# Select the 4th row
df.iloc[3]

# Select one column by name
df['col']

# Select multiple columns by names
df[['col1', 'col2']]

# Select 3rd column
df.iloc[:, 2]

# Select the element in the 4th row, 3rd column
df.iloc[3, 2]

col     4
col1    9
col2    y
Name: 3, dtype: object

0    1
1    2
2    3
3    4
4    5
Name: col, dtype: int64

Unnamed: 0,col1,col2
0,6,x
1,7,x
2,8,y
3,9,y
4,10,z


0    x
1    x
2    y
3    y
4    z
Name: col2, dtype: object

'y'
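`iloc` selects by integer position, while its sibling `loc` selects by label. With the default index the two look alike, so the sketch below uses a hypothetical string index (`r0` to `r4`, not part of the original example) to make the difference visible.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'col': [1, 2, 3, 4, 5],
        'col1': [6, 7, 8, 9, 10],
        'col2': ['x', 'x', 'y', 'y', 'z'],
    },
    index=['r0', 'r1', 'r2', 'r3', 'r4'],  # assumed row labels for illustration
)

by_position = df.iloc[3, 2]      # 4th row, 3rd column by position
by_label = df.loc['r3', 'col2']  # same cell, addressed by row and column labels
print(by_position, by_label)
```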

## Manipulating DataFrames

In [59]:
# Concatenate DataFrames vertically
pd.concat([df, df])

# Concatenate DataFrames horizontally
pd.concat([df, df], axis="columns")

Unnamed: 0,col,col1,col2
0,1,6,x
1,2,7,x
2,3,8,y
3,4,9,y
4,5,10,z
0,1,6,x
1,2,7,x
2,3,8,y
3,4,9,y
4,5,10,z


Unnamed: 0,col,col1,col2,col.1,col1.1,col2.1
0,1,6,x,1,6,x
1,2,7,x,2,7,x
2,3,8,y,3,8,y
3,4,9,y,4,9,y
4,5,10,z,5,10,z


In [60]:
# Get rows matching a condition
df.query('col2 == "x"')

Unnamed: 0,col,col1,col2
0,1,6,x
1,2,7,x
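An alternative to `query` is boolean indexing: compare a column to a value to get a boolean Series, then use it to filter the rows. The sketch below rebuilds the same `df` so it runs on its own, and combines two conditions with `&` (each condition needs its own parentheses).

```python
import pandas as pd

df = pd.DataFrame({
    'col': [1, 2, 3, 4, 5],
    'col1': [6, 7, 8, 9, 10],
    'col2': ['x', 'x', 'y', 'y', 'z']
})

mask = df['col2'] == 'x'  # boolean Series, one value per row
by_mask = df[mask]
by_query = df.query('col2 == "x"')
print(by_mask.equals(by_query))

# Combine conditions element-wise with & (and), | (or), ~ (not)
both = df[(df['col2'] == 'y') & (df['col1'] > 8)]
print(both)
```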


In [62]:
# Drop columns by name
df.drop(columns=['col'])

Unnamed: 0,col1,col2
0,6,x
1,7,x
2,8,y
3,9,y
4,10,z


In [63]:

# Rename columns
df.rename(columns={"col": "col0"})

Unnamed: 0,col0,col1,col2
0,1,6,x
1,2,7,x
2,3,8,y
3,4,9,y
4,5,10,z


In [65]:
# Add a new column
df.assign(col3=df['col'] + df['col1'])

Unnamed: 0,col,col1,col2,col3
0,1,6,x,7
1,2,7,x,9
2,3,8,y,11
3,4,9,y,13
4,5,10,z,15


In [67]:
# Calculate the mean of each column
df[["col", "col1"]].mean()

col     3.0
col1    8.0
dtype: float64

In [68]:
# Get unique rows
_df = pd.concat([df, df])
_df
_df.drop_duplicates()

Unnamed: 0,col,col1,col2
0,1,6,x
1,2,7,x
2,3,8,y
3,4,9,y
4,5,10,z
0,1,6,x
1,2,7,x
2,3,8,y
3,4,9,y
4,5,10,z


Unnamed: 0,col,col1,col2
0,1,6,x
1,2,7,x
2,3,8,y
3,4,9,y
4,5,10,z


In [69]:
# Sort by values in a column in ascending order
df.sort_values(by='col2')

Unnamed: 0,col,col1,col2
0,1,6,x
1,2,7,x
2,3,8,y
3,4,9,y
4,5,10,z


In [70]:
# Get the rows with the n largest values of a column
df.nlargest(3, 'col')

Unnamed: 0,col,col1,col2
4,5,10,z
3,4,9,y
2,3,8,y
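For comparison, the same top rows can be obtained by sorting in descending order and taking the first few rows, and `nsmallest` is the mirror image of `nlargest`. A minimal sketch that rebuilds `df` so it is self-contained:

```python
import pandas as pd

df = pd.DataFrame({
    'col': [1, 2, 3, 4, 5],
    'col1': [6, 7, 8, 9, 10],
    'col2': ['x', 'x', 'y', 'y', 'z']
})

top3 = df.nlargest(3, 'col')

# Sorting in descending order and keeping the first 3 rows gives the same rows here
top3_sorted = df.sort_values(by='col', ascending=False).head(3)

bottom2 = df.nsmallest(2, 'col')
print(top3['col'].tolist(), top3_sorted['col'].tolist(), bottom2['col'].tolist())
```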
