# Assessments: Python

### Results:
Take 1: 67th percentile (intermediate)

Take 2: 87th percentile (advanced)

Take 3: 97th percentile (advanced)


### Take 1

#### Conditionals & Control Structures
- `enumerate(iterable, start)` yields each element of a collection together with a running index that begins at `start` (default 0)
  - 1 variable in the for loop receives the tuple `(index, element)`
  - 2 variables in the for loop receive the index and the element separately
- `iter(iterable)` returns an iterator over a list, string, or other iterable
  - Each call to `next(iterator)` returns the next element
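
A minimal sketch of `iter()` and `next()`; the list `letters` is made-up sample data, and the `None` default passed to `next()` is just one way to handle an exhausted iterator.

```python
letters = ["a", "b", "c"]   # made-up sample data

it = iter(letters)          # build an iterator over the list
first = next(it)            # 'a'
second = next(it)           # 'b'
rest = list(it)             # ['c']: only elements not yet consumed remain
done = next(it, None)       # iterator is exhausted, so the default is returned
```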
  
#### Data Structures
Data Collections
- Casting any data collection to a **set** will only keep distinct values
  - The length of `set([1,1,2,3,3,3,4])` will be 4

#### Functions
Special Keywords
- `*args` collects any number of positional arguments into a tuple
- `**kwargs` collects any number of user-defined keyword arguments into a dictionary
  - Use Cases: Creating your own dictionary
- `*`: unpacks data collection
- `range(n)` returns a sequence of numbers starting from 0 and ending at n - 1. `range(4)` gives 0, 1, 2, 3.
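
A short sketch of `*args`, unpacking with `*`, and `range()`; the function name `total` and the sample list are hypothetical.

```python
def total(*args):
    # args arrives as a tuple holding every positional argument
    return sum(args)

nums = [1, 2, 3]
unpacked_sum = total(*nums)   # same call as total(1, 2, 3)
first, *others = range(4)     # range(4) yields 0, 1, 2, 3
```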

#### OOP
Classes
- A class is like an object factory which has methods and constructors
- `__init__(self, ...)` is an initializer rather than a constructor: it runs automatically on the newly created object and sets up its attributes
  - Parameters given a default value here keep that default unless an argument is passed in when the object is created
Inheritance
- `class MyClass(ParentClass)` is used to inherit the parent's attributes and methods
  - Can use `pass` keyword in child class if you do not wish to add any more attributes
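
The class notes above can be sketched as follows. `Animal` and `Dog` are hypothetical names chosen for illustration only.

```python
class Animal:
    def __init__(self, name, sound="..."):
        # runs right after the instance is created and sets its attributes
        self.name = name
        self.sound = sound      # keeps the default unless an argument is passed

    def speak(self):
        return f"{self.name} says {self.sound}"


class Dog(Animal):
    pass  # adds nothing new; inherits __init__ and speak from Animal


quiet = Dog("Rex")
loud = Dog("Rex", "woof")
```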

In [None]:
#enumerate example
l1 = ["eat", "sleep", "repeat"]
for ele in enumerate(l1, 5):
    print(ele)

for count, ele in enumerate(l1, 100):
    print(count, ele)

In [1]:
#making a dictionary using **kwargs
#kwargs are key/value arguments, so all you are doing is passing in a dictionary of values
def make_dict(**kwargs):
  return kwargs

make_dict(a=1,b=2,c=3)

{'a': 1, 'b': 2, 'c': 3}

### Take 2

#### Conditionals and Control Structures
- Enumerate
   - Don't forget to name both loop variables in the for clause when the body uses both:
   - `for ii, x in enumerate(list):`
    - `print(ii, ":", x)`
- List Comprehensions
  - study conditionals in this section
- Generator
  - Made from a list (or any iterable) with a generator expression: comprehension syntax inside () brackets instead of []
  - A generator is an iterator which produces its elements lazily, one per call to next()

#### Functions
- Lambda
  - assign the lambda to a variable and call it like a function: `x = lambda a,b: a+b` is called as `x(a,b)`
  - `x = lambda: "str"` is just a function with no parameters, and is called using `x()` to fetch result
- Classes and Docstrings
  - Docstring: a string (usually in triple quotes) written as the first statement of a function or class body
    - accessed using `func.__doc__`
- Function
  - Name accessed using `func.__name__`
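
A small sketch of the lambda and docstring notes; `add`, `greet`, and `square` are hypothetical names.

```python
add = lambda a, b: a + b      # called like a normal function: add(2, 3)
greet = lambda: "hello"       # no parameters, called as greet()

def square(x):
    """Return x multiplied by itself."""
    return x * x

doc = square.__doc__          # the docstring text
name = square.__name__        # 'square'
```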

In [3]:
#create generator from list
l1 = ['a','b','c']

list_gen = (val for val in l1)

print(type(list_gen))

#next(list_gen) prints 'a' and so on

<class 'generator'>


### Take 3

#### Data Structures
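- Combining 2 lists without duplicates: `set(l1).union(l2)`
- Finding matching elements in 2 lists: `set(l1) & set(l2)` (`&` works on sets, not on lists)
- Padding a string equally on both sides: `str.center(width)`, where `width` is the total length of the result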
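
A quick sketch of these three operations on made-up data. The lists are converted to sets first, since `union` and `&` are set operations.

```python
l1 = [1, 2, 3]
l2 = [3, 4]

combined = set(l1).union(l2)       # {1, 2, 3, 4}
shared = set(l1) & set(l2)         # {3}
padded = "hi".center(6)            # total width 6, spaces on both sides
starred = "hi".center(6, "*")      # optional fill character
```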
  
#### Functions
- annotating parameter data type and output data type (type hints, which Python does not enforce at runtime)
  - `def add_two(x: int) -> int:`
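
A minimal sketch of what the annotation does and does not do. The body of `add_two` is an assumption, since the notes only give its signature.

```python
def add_two(x: int) -> int:
    return x + 2    # assumed body; the notes only show the signature

hints = add_two.__annotations__   # annotations are stored as metadata
unchecked = add_two(1.5)          # still runs: hints are not enforced at runtime
```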

# Assessments - SQL

# Rapid Foundations: Python

### Day 1 - Intro

#### Notes

- **For Loops**: iterates over a sequence (list/tuple/dict/set/string)
  - Does not need a manually updated counter variable, unlike a while loop
  - **Control statements**: directs flow of execution
    - `break`: stops loop based on condition in body
    - `continue`: skip an element of the sequence
    - `pass`: placeholder for when a block is required by the syntax but should do nothing
  - `range(start, stop, step)`: starts at `start`, ends at `stop - 1` (for a positive step), advances by `step`
  - `else`: block which executes once after the loop finishes
    - Does not execute if `break` is used in loop body
  - **Nested Loops**: inner loop executes 1x for each iteration of outer loop (cartesian product)
    - If only inner loop is in `print()`, then it will print for each iteration of outer loop
- **Conditions/If Statements**: uses logical conditions from math
  - `if`
    - nested if: an `if` inside another `if`; the inner condition is checked only when the outer one is true
  - `elif`
  - `else`
  - `and` / `or` / `not` operators within if
  - **Shorthands (Conditional Expressions)**: for one-liner statements
    - if: `if condition: print()`
    - if-else: `print(a) if condition else print(b)`
    - Multiple Else: `print(a) if condition1 else print(b) if condition2 else print(c)`
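
The loop and shorthand notes above can be sketched like this; `first_even` and `label` are hypothetical helper names.

```python
def first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = n
            break            # leaving through break skips the else block
    else:
        result = None        # runs only when the loop ended without break
    return result

def label(n):
    # chained conditional expression, read left to right
    return "negative" if n < 0 else "zero" if n == 0 else "positive"

stepped = list(range(1, 10, 3))   # start 1, stop before 10, step 3
```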

In [None]:
#draw basic shape (triangle)
print("   /|")
print("  / |")
print(" /  |")
print("/___|")

#### HackerRank

Notes:
- Python files are called modules (.py) which define functions, classes, variables
- `__name__`: special variable which holds the module's name; it is set to `"__main__"` when the file is run directly rather than imported
  - If you would like to use code from a different module, you can import it
  - HackerRank uses the main module for its exercises
- STDIN (standard input): the stream a program reads its input from
  - accepts input from user, file, and data streams; `input()` reads one line from it

In [None]:
#1: print
if __name__ == '__main__':
    print("Hello, World!")

In [None]:
#2: if-else
import math
import os
import random
import re
import sys

if __name__ == '__main__':
    n = int(input().strip())
    if n % 2 != 0 and n >= 1:
        print("Weird")
    elif n >= 2 and n <= 5:
        print("Not Weird")
    elif n >= 6 and n <= 20:
        print("Weird")
    elif n % 2 == 0 and n > 20:
        print("Not Weird")

#can also use range(2,6) etc

In [None]:
#3: arithmetic operators
if __name__ == '__main__':
    a = int(input())
    b = int(input())
    print(a+b)
    print(a-b)
    print(a*b)

In [None]:
#4: division
if __name__ == '__main__':
    a = int(input())
    b = int(input())
    print(a//b)
    print(a/b)

#output: division result int, division result float

In [None]:
#5: loops
if __name__ == '__main__':
    n = int(input())
    for i in range(0,n):
        print(i*i)

#output: square of non-negative numbers less than input

### Day 2 - Variables and Data Types

#### Notes

- **Variables:**
  - **Text**: 
    - str (immutable)
  - **Numeric**:
    - int: `12`
    - float: `1.5` or scientific notation (power of 10) `10E3`
    - complex: imaginary number `1j`
      - cannot be converted to int or float
    - random: not a data type, but the `random` module generates numbers with functions such as `randrange()`
  - **Collection**:
    - list: `[1,2]`
    - tuple (immutable): `(1,2)`
    - range: `range(start,stop)`
    - **Mapping**:
      - dictionary: `{"a": 1, "b": 2}`
  - **Set**:
    - set (distinct): `{1,2}`
    - frozenset (immutable): `frozenset({1,2})`
  - **Bool**: 
    - `True,False`
  - **Binary**:
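    - bytes (immutable, each element is an int from 0 to 255): `b"Hello"`
    - bytearray (mutable, each element is an int from 0 to 255): `bytearray(5)` returns five zero bytes which can be changed
    - memoryview: exposes the bytes of an object as stored in memory, without copying them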
  - **None**
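
A quick way to check the types listed above is `type()`. This is only a sketch; the sample values are arbitrary.

```python
samples = [12, 1.5, 1j, "hi", [1, 2], (1, 2), range(3), {"a": 1},
           {1, 2}, frozenset({1, 2}), True, b"Hi", bytearray(2), None]
type_names = [type(v).__name__ for v in samples]

buf = bytearray(b"Hi")
buf[0] = 74                      # bytearray is mutable; 74 is the code for 'J'
first_byte = memoryview(b"Hi")[0]  # bytes are stored as ints; 72 is 'H'
```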

#### HackerRank

List Comprehension: creates new list based on values of existing list
- Inputs: 4 ints
  - 1,2,3: these set the inclusive end of their ranges
  - 4: n, the sum to exclude; any triple whose values add up to n is left out
- Syntax: `new_list = [x for x in fruits if "a" in x]`
- Permutations: all possible orderings of a list of elements (this exercise instead builds every combination of one value per range, a cartesian product)

In [21]:
def cartesian_product(x,y,z,n):
    perms = [[i,j,k] for i in range(x+1) for j in range(y+1) for k in range(z+1) if i+j+k != n]
    print(perms)

cartesian_product(1,2,3,5)

#output: all combinations of the numbers between each range which do not add up to n
#this is achieved using 3 nested for loops, and each loop generates a list which includes a number from each range specified by user

[[0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 0, 3], [0, 1, 0], [0, 1, 1], [0, 1, 2], [0, 1, 3], [0, 2, 0], [0, 2, 1], [0, 2, 2], [1, 0, 0], [1, 0, 1], [1, 0, 2], [1, 0, 3], [1, 1, 0], [1, 1, 1], [1, 1, 2], [1, 2, 0], [1, 2, 1], [1, 2, 3]]


Find the Runner-Up: print second highest number from given list
- Inputs:
  - 1: int
  - 2: str (numbers spaced out) --> split by space --> map int() to each number --> map object
- Steps:
  - sort input into asc order
  - `sorted()` accepts any iterable, including a set, and returns a new list; `.sort()` only exists on lists
  - can also pass `reverse=True` to sort descending and use a positive index
  - convert arr to set to avoid fetching a duplicate max
  - fetch second to last number using index

In [23]:
def runner_up(num, arr):
    num = int(num)
    arr = map(int, arr.split())
    arr = sorted(set(arr))              #convert into set and sort asc
    print(arr[-2])                      #print second to last number

runner_up(5, "2 3 6 6 5")

5


Nested Lists: Print name of student from list who scored second-lowest. List tied students alphabetically
- Inputs: 
  - 1: # of students(int) --> range()
  - 2: name (str)
  - 3: score --> float
- Steps:
    - Make 3 empty lists: names, scores, and records (nested list)
    - Loop 1 (once per student):
      - Append names to names list
      - Append scores to scores list
      - Append names + scores to records list
    - After loop 1:
      - Remove duplicate scores from list and sort asc
      - Store target score (2nd Lowest)
      - Make a list of students who match target score using records list
      - Sort this list alphabetically
    - Loop 2: Print names

In [None]:
if __name__ == '__main__':
    names = []
    scores = []
    records = []
    for _ in range(int(input())):
        name = input()
        score = float(input())
        
        names.append(name)
        scores.append(score)
        records.append([name,score])
        
    scores = sorted(set(scores))
    target_score = scores[1]
    target_students = sorted([i[0] for i in records if i[1] == target_score])
    
    for student in target_students:
        print(student)

Dictionaries: Find average score of students whose grades are listed in dictionary.
- Input 1: # of records
- Input 2: name and scores
   - name is split from scores (can be multiple)
   - `*`: splat operator, collects the remaining input values into a list
   - scores --> floats --> list
   - dictionary: key = name, value = scores
- Input 3: name of desired student
  
- Steps:
   - Pull scores from dictionary by name key
   - sum scores and divide by length
   - format to 2 dec places

In [None]:
if __name__ == '__main__':
    n = int(input())
    student_marks = {}
    for _ in range(n):
        name, *line = input().split()
        scores = list(map(float, line))
        student_marks[name] = scores
    query_name = input()
    
    scores = student_marks[query_name]
    avg = sum(scores)/len(scores)
    print(format(avg, ".2F"))

Tuples: Create tuple t from ints and find hash
- Inputs: 
  - 1: int
  - 2: list of nums --> split --> map int()
- Steps:
  - Convert map object --> tuple
  - hash tuple
  - `hash()`: returns the hash value (int) of the tuple; dictionaries use hash values to look up keys quickly

In [None]:
if __name__ == '__main__':
    n = int(input())
    integer_list = map(int, input().split())
    
    myTuple = tuple(integer_list)
    print(hash(myTuple))

List: Update list based on functions passed in
   - Input 1: int (# of commands)
   - Input *N: lines containing commands
   - Steps:
     - For Loop:
       - Fetch Commands: split input --> list of `str` tokens (command name first, then its arguments)
       - each if statement corresponds to command, and body executes command which matches user string
       - numbers inserted/removed/appended must be of `int` data type

In [None]:
if __name__ == '__main__':
    N = int(input())
    myList = []
    for _ in range(N):
        commands = input().split()          #split() already returns a list of str
        if commands[0] == 'insert':
            myList.insert(int(commands[1]), int(commands[2]))
        elif commands[0] == 'print':
            print(myList)
        elif commands[0] == 'remove':
            myList.remove(int(commands[1]))
        elif commands[0] == 'append':
            myList.append(int(commands[1]))
        elif commands[0] == 'sort':
            myList.sort()
        elif commands[0] == 'pop':
            myList.pop()
        elif commands[0] == 'reverse':
            myList.reverse()

### Day 3 - Strings

#### Notes

- **strings**: an array/sequence of chars
  - can use single or double quote enclosing
  - **nested string**
    - `"im 'bored'"`, `'im "bored"'`
  - **multiline**
    - `"""str"""`
  - **loop**
    - `for x in "yasin":`
  - **word/char check**
    - `print("yasin" in text)` or `print("yasin" not in text)` = True/False
    - `if "yasin" in/not in text:`
  - **slicing/indexing**
    - `str[2:5]`: does not include end position
    - `str[:5]`: start to 5
    - `str[2:]`: 2 to end
      - **neg indexing**
        - `str[-5:-2]`: counts from end: between 5th char from end to 2nd char from end
  - **concat**
    - `a + " " + b`: combine these strings
  - **format**: 
    - f-string: `str = f"str"` or `print(f"str")`
      - variables: `f"hi {name}"`
      - modifier: `f"price: ${price:.2F}"`
      - operations: `f"i am {today - birth} years old"`
- **escape sequences/operators**
  - `\n`: newline
  - `\`: escape (next character is printed, not used as a delim)
  - `+`: concatenate 2 strings
  - `\b`: backspace character
  - `\ooo`: character with the given octal value (up to 3 digits), e.g. `\110` is `H`
  - `\xhh`: character with the given 2-digit hex value, e.g. `\x48` is `H`
- **methods**
  - **modification**
    - `str.strip()`: remove whitespace
    - `str.replace("a","b")`: replace part of str
    - `str.split(",")`: split by delimiter --> output is list
    - `"-".join(list)`: list --> string
    - `str.lower(), str.upper()` = case change
    - `str.title()` = capitalize words
  - **validation**
    - `str.islower(), str.isupper()` = true/false
    - `str.isalnum(), str.isalpha(), str.isdigit()`: checks if str is alphanumeric, letters only, or digits only
  - **combinations**
    - `str.upper().isupper()`
- **functions**
  - **counting**
    - `len(str)`: returns length as int
- **index**
  - `str[0]`: starts w 0
  - `str.index("a")`: index of character/word/part of word
- **unpack**: `[*str]` --> list of characters
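
A compact sketch of several of the string operations above, using made-up sample values.

```python
s = "  Hello, World  "
clean = s.strip()                # remove surrounding whitespace
parts = clean.split(", ")        # split on a delimiter into a list
joined = "-".join(parts)         # list back into one string
middle = clean[2:5]              # end position is excluded
tail = clean[-5:-2]              # negative indexes count from the end
position = clean.index("World")  # index where the substring starts

price = 3.14159
line = f"price: ${price:.2f}"    # .2f rounds to two decimal places
chars = [*"abc"]                 # unpack a string into a list of characters
```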

#### HackerRank

- **Swap Case:** switch upper chars to lower and lower to upper
  - Input: str
  - Steps:
    - make new empty string
    - for loop
      - swap case based on whether its upper/lower
      - append modified char into string
    - return string
  
    *note: don't use == "True" for bool return function in if statement. use function itself as condition.

In [20]:
def swap_case(s):
    swapped = ""
    for char in s:
        if char.isupper():  #dont use == true
            swapped += char.lower()
        else:
            swapped += char.upper()
    
    return swapped

swap_case("PyTest")

'pYtEST'

- **String Split and Join**: split string on space delim and join using hyphen
  - Input: str
  - Steps:
      - split string --> returns list
      - join items using -

In [None]:
def split_and_join(line):
    modified = line.split()
    modified = "-".join(modified)
    return modified

split_and_join("file name")

'file-name'

- **What's Your Name?:** read two lines and print into given sentence
  - Inputs: 
    - 1: fName str
    - 2: lName str
  - Steps:
    - format using f string and print


In [None]:

def print_full_name(first, last):
    welcome = f"Hello {first} {last}! You just delved into python."
    print(welcome)

print_full_name("Yasin", "Sharaf")

Hello Yasin Sharaf! You just delved into python.


- **Mutations:** change character in a string, an immutable data type
  - Inputs: 
    - 1: str
    - 2: "index" space "char" --> splits into list(index, char)
  - Steps:
    - make vars:
      - index to insert at
      - index after insertion
      - character parameter
      - new string
    - slice the string into two parts:
      - before change index
      - after change index (index + 1)
    - add in new char between both slices
    - print

In [None]:
def mutate_string(string, position, character):
    i = position
    j = i + 1
    char = character
    mutated_str = string[:i] + char + string[j:]
    
    return mutated_str

mutate_string("yasin",3, "ee")

'yaseen'

- **Find a string**: count substring occurrences in given string
  - Inputs: 
    - 1: str
    - 2: substr
  - Steps:
    - you need to store and use two values:
      - substring length: size of the sliding window; the loop counter is the window's start position as it moves over the string
      - remainder (string length - substring length): the last valid start position, so the range must include it (add 1)
    - in each iteration, compare sliding window (size of substring, starts from iterator position) to substring
    - tally number of matches

In [None]:
def count_substring(string, sub_string):
    traverse_over =len(string) - len(sub_string)
    count = 0
    
    for i in range(0,traverse_over+1):
        block_end = i+len(sub_string)

        if(string[i:block_end] == sub_string):
            count += 1
    return count

count_substring("xiexie!", "xie")

2

- **String Validators**: validate whether each char type exists in the given string
  - Input: str
  - Steps:
    - generator expression inside `any()`: applies the validation method to each char
    - `any()` returns true if at least one char in the string matches

In [None]:
def str_validation(string):
    print(any(char.isalnum() for char in string))
    print(any(char.isalpha() for char in string))
    print(any(char.isdigit() for char in string))
    print(any(char.islower() for char in string))
    print(any(char.isupper() for char in string))

str_validation("Cub4n$$")

True
True
True
True
True


- **Text Wrap:** wrap given str into paragraph of given width
  - Inputs: 
    - 1: string
    - 2: width int
  - Modules:
    - textwrap: uses `fill()` which wraps text based on max width for each line
      - `wrap()` returns list


In [None]:
import textwrap

def wrap(string, max_width):
    mywidth = int(max_width)                #make sure the width is an int
    return textwrap.fill(string, mywidth)

wrap("I went to the moon",4)

'I\nwent\nto\nthe\nmoon'

### Day 4 - Sets

#### Notes

Sets `{}`
- **Key Facts**
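  - Initialize using `set()` or a literal such as `{1, 2}` (empty `{}` creates a dict, not a set)
  - distinct elements, regardless of input
  - No index, so cannot fetch elements using `set[i]`
  - Mutable: items can be added/removed, but an existing element cannot be changed in place
  - No order
  - `False` and `0` considered equal values
  - `True` and `1` considered equal values
  - Elements can be of any hashable (immutable) data type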
- **Functions**
  - `len(set)`
  - `sorted(set)`
- **Constructor**
  - `set(list)`: converts list to set
- **Methods**
  - **Comparisons**
    - `set.symmetric_difference(set2)`: Values only existing in one set
      - or `print(set1^set2)`
    - `set.union(set2)`: Union distinct (combine both)
      - or `print(set1 | set2)`
    - `set.intersection(set2)`: Inner join (matching elements) 
      - or `print(set1 & set2)`
    - `set.difference(set2)`: Remove set 2 elements from set 1
      - or `print(set1 - set2)`
  - **Adding**
     - `set.add(i)`: adds element
     - `set.update([collection])`: add a collection of elements (iterable) to set
  - **Removing**
    - `set.discard(i)`: removes value
    - `set.remove(i)`: removes value and raises KeyError if that value does not exist
    - `set.pop()`: removes and returns an arbitrary element (sets have no index); raises KeyError if the set is empty
  
  
  *note: calling the constructor `set(...)` creates and returns a new set object; in-place methods such as `add()` and `update()` return `None`
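
A short sketch of the set operations and removal methods above, on made-up sets.

```python
a = {1, 2, 3}
b = {3, 4}

sym = a ^ b        # values in exactly one of the sets
uni = a | b        # union
inter = a & b      # intersection
diff = a - b       # values of a that are not in b

a.discard(99)      # missing value: nothing happens
try:
    a.remove(99)   # missing value: raises KeyError
    remove_raised = False
except KeyError:
    remove_raised = True

empty = set()                  # {} alone would create an empty dict
mixed = {0, False, 1, True}    # 0/False and 1/True collapse to one element each
```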

#### HackerRank

Intro to Sets: Compute avg of all plants w distinct heights
- Inputs:
  - 1: size of arr (int)
  - 2: space-separated ints (str) --> arr
- Steps:
  - convert arr --> set
  - use for loop to add all elements and find avg

In [2]:
def average(array):
    array = list(map(int,array.split()))
    mySet = set(array)
    
    total = 0
    for i in mySet:
        total += i
    avg = total/len(mySet)      #compute the average once, after the loop
    return avg

average("1 2 3 4 5 6")

3.5

Apply Set Methods
- `_` is a throwaway variable name for an `input()` whose value will not be used
- Use an integer input to allow user to set range of a list of inputs
- Use `sorted()` to sort set
- print elements of set using loop

In [None]:
_, eng = input(), set(input().split())
_, french = input(), set(input().split())

symmetric_difference = eng.symmetric_difference(french)
print(len(symmetric_difference))

union = eng.union(french)
print(len(union))

intersection = eng.intersection(french)
print(len(intersection))

difference = eng.difference(french)
print(len(difference))

In [None]:
n = int(input())     #number of country inputs

set1 = set()
for country in range(n):
    set1.add(input())  

print(len(set1))

In [None]:
#sort and print values unique to each set
input()
set1 = set(map(int,input().split()))
input()
set2 = set(map(int,input().split()))

symmetric_difference = sorted(set1^set2)
for i in symmetric_difference:
    print(i)

In [None]:
# discard, remove, pop
n = int(input())                          #num of elements in set
s = set(map(int, input().split()))        #spaced out chars in str --> int(chars) --> set
N = int(input())                          #num of commands

for item in range(0, N):
    cmd = input().split()                 #str "command #" --> list with index command = 0 and # = 1
    if cmd[0] == "pop":
        s.pop()
    elif cmd[0] == "remove":
        try:
            s.remove(int(cmd[1]))
        except KeyError:                  #remove() raises KeyError when the value is missing
            pass
    elif cmd[0] == "discard":
        s.discard(int(cmd[1]))            #discard() never raises, so no try block is needed
            
print(sum(s))

### Day 5 - Math & Itertools

#### Notes

- **Facts**
  - Itertools: module for working with iterable containers (lists, sets, tuples, strings, etc)
  - Makes iterator operations more efficient vs hand-written loops
- **Functions**
  - **Combinators**: either combines iterable with itself or with others
    - `product(a,b)`: cartesian product of iters. returns an iterator (convert to list)
      - kwargs: `repeat=n` uses the inputs n times, so `product(a, repeat=2)` equals `product(a, a)`
      - args: one or more iterables
    - `permutations(a, r)`: all possible orderings of iter
      - args: length `r` (int, optional; defaults to the full length)
    - `combinations(a, r)`: all possible combinations of elements in iter, ignoring order
      - args: length `r` (int, required)
    - `combinations_with_replacement(a, r)`: combines element with itself as well
  - **Terminating**: iterators which transform or filter a finite input and stop when it is exhausted
      - `accumulate(a)`: running totals for each element
          - kwargs: `func=`, e.g. `operator.mul` or `max`, replaces addition
      - `groupby(a)`: groups consecutive elements which share a key. yields `(key, group)` pairs rather than a dict, so sort by the same key first
        - kwargs: `key= [func/lambda]`
      - `chain(a,b,c)`: joins multiple iterables
        - `chain.from_iterable(n)`: takes in one iterable of iterables `n = [a,b,c]`
      - `compress(a,[1,0,0,1])`: keep elements based on t/f for each index and skip the rest
      - `dropwhile(func, li)`: skips elements while func returns true, then yields the first failing element and everything after it
      - `takewhile(func, li)`: yields elements until func first returns false
      - `filterfalse(func, li)`: yields all elements for which func returns false
      - `islice(li, start, stop, step)`: keeps the elements in the given index slice
        - `islice(li, stop)` also works
      - `starmap(func, tuple list)`: calls func once per tuple, unpacking the tuple into its arguments
      - `zip_longest(l1,l2)`: pairs up elements position by position until the longer iterable ends
        - kwargs: `fillvalue` fills part of the pair when a list is shorter than the other
  - **Infinite**: used with a loop and continues until stop condition
      - `count(a)`: counts upward from a without end, so the loop needs its own stop condition
      - `cycle(a)`: cycles over iterable until stop condition
      - `repeat(a, [times])`: repeats an element forever, or `times` times when given
      - `next(a)`: built-in (not part of itertools); can be used instead of increment variable to move to next element in loop.
  - **Constructor**:
    - `iter(iterable)`: creates iterator object (can use next()) from iterable container
      - `reversed(iterable)`: creates an iterator over a sequence in reverse order.
#### HackerRank

Notes:
- when combinations of every size up to k are needed, place the combinator for loop inside a for loop over `range(1, k+1)`
- assigning to 2 variables at once: use `a,b =`  syntax
- to join the elements of a tuple/list and print them together, use `''.join(elements)` method in for loop

In [None]:
#permutation: string into permutations
from itertools import permutations
S,k = input().split()

k = int(k)

perms = list(permutations(sorted(S), k))  #k is already an int here

for perm in perms:
    print(''.join(perm))

In [None]:
#product()
from itertools import product

A = list(map(int, input().split()))
B = list(map(int, input().split()))

pairs = product(A, B)                     #a new name avoids shadowing the imported product()

for i in pairs:
    print(i, end=" ")

In [None]:
#combinations()
from itertools import combinations

S, k = input().split()
S = sorted(S)                             #lexicographic order
k = int(k)     

for i in range(1, k+1):                   #1 to size k
    for combo in combinations(S, i):      #repeat combos until size k is reached
        print(''.join(combo))

In [None]:
#combinations with replacement
from itertools import combinations_with_replacement

S, k = input().split()
S = sorted(S)
k = int(k)

combos = list(combinations_with_replacement(S,k))

for i in combos:
    print(''.join(i))

### Day 6 - Collections, date/time

#### Notes

Collections
- Lists
   - starts from index[0], but from end it starts from [-1], similar to string
   - changing element: `list[1] = 'yasin'`
   - **Methods**
   - **Adding Elements**
      - `extend(list2)`: appends another list to end
      - `append(a)`: appends another element to end
      - `insert(index, value)`: inserts value at position
   - **Deleting Elements**
      - `remove(a)`: removes value
      - `clear()`: removes all elements from the list
      - `pop()`: removes the last element unless index specified
      - `popleft()`: removes first element; available on a `collections.deque`, not on a list
   - **Indexing**
      - `index(a)`: find index of value
      - `count(a)`: counts value occurrences
   - **Sorting**
      - `sort()`: sorts in asc order
      - `reverse()`: reverses the current order in place (use `sort(reverse=True)` for desc order)
   - **Duplicating**
      - `copy()`: returns a shallow copy of the list
- Tuples
  - initialize with `()`
  - immutable
  - can hold any number of values (a pair is a common case)
  - Use cases: coordinates, etc. which store data that you do not want to change
- Dictionaries
  - Initialize with `{}`
  - Populate using `d[key] = "value"`
  - **Methods**
    - `values()/keys()`: fetch all values or keys
- Collections Module
  - **Special Collections**
    - **Dictionaries**
      - `Counter()`: type of dictionary which stores count of each element
         - can be initialized using list of items, dictionary (item: count), or kwargs (a=,b=,c=)
         - Methods: `most_common(n)`
         - Functions: sorted()
      - `OrderedDict()`: remembers the order in which keys were inserted
      - `defaultdict(default_factory)`: gives missing keys a default value so KeyError is not raised. Useful when counting or grouping in a loop, since a key does not have to be initialized first.
      - `ChainMap(d1,d2,...)`: groups several dicts into one view without copying them; lookups search d1 first, then d2, and so on
        - uses original keys/values for accessing values
    - **Tuples**
      - `namedtuple()`: creates a tuple type with names for positions which can be accessed using name attribute
    - **List**
      - `deque()`: double ended queue, optimized for append/pop from both sides.
        - removing from the front is O(n) linear for a list but O(1) constant for a deque, aka size does not affect speed.
      - `UserList()/UserDict()/UserString()`: wrapper classes meant to be subclassed when customizing list/dict/str behavior.
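
A sketch of the special collections above. The sample words, the `Point` type, and the dict contents are made up for illustration.

```python
from collections import ChainMap, Counter, defaultdict, deque, namedtuple

counts = Counter("banana")            # counts each character
top = counts.most_common(1)           # most frequent element with its count

groups = defaultdict(list)
for word in ["apple", "avocado", "beet"]:
    groups[word[0]].append(word)      # a missing key starts as an empty list

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)                       # fields readable as p.x and p.y

queue = deque([1, 2, 3])
queue.appendleft(0)                   # O(1) insert at the front
left = queue.popleft()                # O(1) removal from the front

settings = ChainMap({"a": 1}, {"a": 9, "b": 2})   # first dict wins on lookup
```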

Date/Time
- `datetime.now()`: current date/time
  - **Attributes**
    - `year`
  - **Methods**
    - `strftime("format")`: formats date object --> str
  - **Constructor**
    - `datetime(year, month, day, [hour, minute, second, microsecond, tzinfo])`
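
A minimal sketch of the constructor and `strftime()`. The date is an arbitrary fixed value so the result is reproducible, unlike `datetime.now()`.

```python
from datetime import datetime

moment = datetime(2024, 3, 15, 14, 30)        # arbitrary example date
stamp = moment.strftime("%Y-%m-%d %H:%M")     # datetime --> str
year = moment.year                            # attribute access
```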



#### HackerRank

In [None]:
#counter from list
from collections import Counter

X = int(input()) #shoes
sizes = Counter(map(int, input().split())) #shoe sizes
N = int(input()) #customers

total = 0       #init counter
for i in range(N):
    c_size, price = map(int,input().split())       #space-separated str --> int vars
    if sizes[c_size]:       #if customer size is in the size list
        sizes[c_size] -=1   #update inventory for that size after sale
        total += price      #add sale price to total

print(total)


### Day 7 - Errors, Exceptions, and Classes

#### Notes

- Errors
  - **Syntax Error**
    - SyntaxError: invalid code caught before the program runs (e.g. a missing colon)
  - **Common Runtime Errors**
    - TypeError: adding str and int
    - NameError: using undefined variable
    - ValueError: removing a nonexistent element from list
    - IndexError: calling on index nonexistent in list
    - Import Errors
      - ModuleNotFoundError: Importing module which does not exist
    - FileNotFoundError: Opening a file which does not exist (a subclass of `OSError`, not an import error)
  - **Exceptions**
    - **Assertion**
      - AssertionError: when the expression in an `assert` statement is false
    - **Raise**
      - `raise` throws an exception on purpose, typically inside an if statement which checks for an invalid state
    - **Try/Except Block**
      - Used to catch exceptions raised by the statements in the `try` body
        - Can have multiple `except` blocks to handle various errors
      - Used to allow program to continue with message vs ending at error
      - `else:` runs only if no exception was raised
      - `finally:` always runs
      - Common errors to test for:
        - ZeroDivisionError: when 0 is in denominator
        - TypeError
  - **Exception Class**
    - Define a class for a specific error and use `Exception` as base class
      - Here, you can use pass clause in body and it will be valid, but the benefit is it will describe the specific error
      - You can also add attributes (parameters, etc)
- Classes
  - Class: object constructor, or blueprint
  - A class contains special methods (which customize its behavior), regular methods, and attributes
    - Attributes: These are properties stored as data in the class, which can be called like this: 
      - `x = 5` in MyClass
      - Printing `MyClass.x` will return 5
      - These can also be set per instance from the parameters `(self.message = message)`
    - Special methods:
      - `__init__(self, param, ...)`: Gives values to object properties
      - `__str__(self)`: returns the string shown by `print()` and `str()`, formatted the way you want
    - Parameters:
      - self: the first parameter of every instance method; it references the current instance
  - Can modify properties (`MyClass.x = 10`) or delete them (`del MyClass.x`)
  - Pass: a filler for class body if you just want to inherit properties from another class or leave it empty
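
The exception notes above can be sketched as follows. `NegativeValueError` and `safe_divide` are hypothetical names, and the `log` list exists only to show which blocks ran.

```python
class NegativeValueError(Exception):
    """Hypothetical custom error with an extra attribute."""
    def __init__(self, value):
        super().__init__(f"negative value: {value}")
        self.value = value


def safe_divide(a, b):
    log = []
    try:
        if a < 0:
            raise NegativeValueError(a)   # raised on purpose
        result = a / b
    except ZeroDivisionError:
        result = None
        log.append("except")
    except NegativeValueError as err:
        result = err.value
        log.append("custom")
    else:
        log.append("else")       # only when no exception was raised
    finally:
        log.append("finally")    # always runs
    return result, log
```

Calling `safe_divide(6, 3)`, `safe_divide(1, 0)`, and `safe_divide(-1, 2)` exercises the `else`, `except`, and custom-exception paths in turn.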

#### HackerRank

Notes:
- to test for zero division and value error based on user input, do int conversions in the try clause, not beforehand, or else error will be raised outside of try clause
- `eval()` evaluates a string as a Python expression and returns the result, so any names inside it are resolved as variables, functions, etc.

In [None]:
#exceptions
T = int(input())              #number of test cases
for case in range(T):
    a, b = input().split()
    try:
        print(int(a) // int(b))
    except ZeroDivisionError as e1:
        print(f"Error Code: {e1}")
    except ValueError as e2:
        print(f"Error Code: {e2}")

In [None]:
#classes
import math


class Points(object):
    def __init__(self, x, y, z):    #params = set 1 (current)
        self.x = x
        self.y = y
        self.z = z

    def __sub__(self, no):    #returns a new Points holding the coordinate-wise differences
        return Points(
            (self.x-no.x),
            (self.y-no.y),
            (self.z-no.z))

    def dot(self, no):        #this function returns the dot product (sum of products of matching sets aka sumproduct)
        return (self.x*no.x)+(self.y*no.y)+(self.z*no.z)

    def cross(self, no):         #cross product: y1*z2 - z1*y2 and so on
        return Points(
            (self.y*no.z-self.z*no.y),
            (self.z*no.x-self.x*no.z),
            (self.x*no.y-self.y*no.x))
        
    def absolute(self):
        return pow((self.x ** 2 + self.y ** 2 + self.z ** 2), 0.5)

In [None]:
#eval user input expression

eval(input())

### Day 8 - Functionals, RegEx, Parsing

#### Notes

Functionals
- **Args vs Params**
  - Arguments: used when calling a function (argument that is passed to function)
  - Parameters: used when defining a function (the arguments it takes in)
- **Positional and Kwargs**
  - Positional (arg): does not specify names of params, but needs to be in order `(1, 2)`
  - Keyword Argument (kwarg): specifies parameters so order does not matter `(a=1, b=2)`
    - Can mix positional and keyword args, but positional args must come before kwargs
    - Can predefine a default in the function `def func(a,b,c=1)`, so the caller only has to pass a and b
- **Variable Length Args**
  - `*args`: any number of args can be passed in
  - `**kwargs`: any number of kwargs can be passed in
  - Use a for loop in the function body to print variable length args and kwargs: `for arg in args: print(arg)`
  - To fetch a single arg from `*args`, index the tuple to pick the nth value passed in
- **Container Unpacking**
  - Can use list/tuple and unpack into function arguments: `func(*list1)`
  - can unpack dict into kwargs: `func(**dict1)`
- **Local vs Global**
  - Assigning to a variable inside a function body creates a local variable; a global variable with the same name is not changed
  - Declare the name with `global` inside the function body if the function should rebind the global variable
  - Immutable objects (int, str) passed into a function cannot be changed by it, but mutable objects (list, dict) can be modified in place through the local parameter name
    - For example, appending to a list parameter changes the caller's list, but reassigning an int or str parameter does not change the caller's variable
    - Immutable elements (str, int) inside a mutable object such as a list can be replaced by assigning to their index
    - However, be careful not to rebind the parameter to a new list inside the function body, or else the changes apply to that new local list and not to your variable
- **Yield Keyword**: turns a function into a generator, which produces a sequence of values one at a time; each value can be fetched with `next()` or a for loop
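
The argument rules, unpacking, scoping, and `yield` notes above can be sketched in one short script. All function and variable names here (`describe`, `bump_global`, `append_item`, `countdown`, and so on) are made up for illustration.

```python
# Hypothetical helper names, chosen only to illustrate the notes above.

def describe(a, b, c=1, *args, **kwargs):
    """a and b are required, c has a default, extras land in args/kwargs."""
    return a, b, c, args, kwargs

# Positional args first, keyword args after
basic = describe(1, 2)                    # c falls back to its default
mixed = describe(1, b=2, c=3)             # keyword args are matched by name
extra = describe(1, 2, 3, 4, 5, key="v")  # 4, 5 -> args; key -> kwargs

# Container unpacking: * for list/tuple, ** for dict
values = [10, 20]
options = {"c": 30}
unpacked = describe(*values, **options)

counter = 0

def bump_local():
    counter = 99          # new local variable; the global is untouched

def bump_global():
    global counter        # rebinding the global requires the global keyword
    counter += 1

def append_item(items):
    items.append("new")   # mutates the caller's list in place

def rebind_items(items):
    items = ["replaced"]  # rebinds the local name only

def countdown(n):
    while n > 0:
        yield n           # hands back one value at a time
        n -= 1

bump_local()
bump_global()
shared = ["old"]
append_item(shared)
rebind_items(shared)
gen = countdown(3)
first = next(gen)         # fetch a single value
rest = list(gen)          # consume whatever is left
print(basic, mixed, extra, unpacked, counter, shared, first, rest)
```

Only `bump_global` and `append_item` have a visible effect on the caller's data, which is the distinction the Local vs Global notes are drawing.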

RegEx
   - **Module**: `re`
   - **Functions**
     - `findall(pattern, string)` --> list(matches)
     - `search(pattern, string)` --> match object (first match anywhere in string), or `None` if there is no match
     - `split(pattern, string)` --> list(string split @ matches)
     - `sub(pattern, replacement, string)` --> string with matches replaced
   - **Characters**
     - Metacharacters: characters with a special meaning (e.g. `.`, `^`, `$`, `*`, `+`)
     - Special Sequences: starts with \ and followed by a letter (e.g. `\d`, `\s`, `\w`)
     - Sets []: characters inside square brackets, matching any one of them (e.g. `[a-z]`)
   - **Match Object**
     - **Methods**
       - `span()`: tuple of start and end positions of the match
       - `string`: original string passed into regex function
       - `group()`: part of the string where there was a match
       - `start()/end()`: start and end index of the match
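
A minimal sketch of the `re` functions and match object members listed above, run against a made-up sample string:

```python
import re

text = "Order 66 shipped on day 12"             # made-up sample string

numbers = re.findall(r"\d+", text)              # list of all matches
match = re.search(r"\d+", text)                 # first match only, or None
parts = re.split(r"\s", text, maxsplit=2)       # split at the first 2 whitespace chars
masked = re.sub(r"\d", "#", text)               # replace every digit

if match:                                       # search() can return None
    print(match.span())                         # (start, end) of the match
    print(match.group())                        # matched text
    print(match.start(), match.end())           # same positions, separately
    print(match.string)                         # original string

print(numbers, parts, masked)
```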

Parsing
   - Split/Extract Substrings
   - **Methods (String)**
     - `split([delim], [max splits])`
       - max splits limits the number of splits: the string is split at the first x delimiters, and the rest stays as one string
     - `strip([chars])`: strips leading and trailing whitespace by default, or can pass in set of chars as string
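
A short sketch of `split()` and `strip()` on made-up strings, including the max splits argument:

```python
line = "  name,age,city,country  "       # made-up sample

clean = line.strip()                      # drop leading/trailing whitespace
fields = clean.split(",")                 # split at every comma
head = clean.split(",", 2)                # at most 2 splits; the rest stays whole
trimmed = "xxhelloxy".strip("xy")         # strip any of the given chars from both ends

print(fields, head, trimmed)
```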

Mapping
   - `map(func, iterable, ...)` applies a function to each element of an iterable; pass in one iterable per parameter the function takes
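
As a small illustration of mapping over one or several iterables (the `add` helper is hypothetical):

```python
def add(x, y):                                          # hypothetical two-parameter function
    return x + y

squares = list(map(lambda x: x ** 2, [1, 2, 3]))        # one iterable, one parameter
sums = list(map(add, [1, 2, 3], [10, 20, 30]))          # two iterables, two parameters
as_ints = list(map(int, "4 5 6".split()))               # common pattern for parsing input

print(squares, sums, as_ints)
```

`map()` returns a lazy iterator, which is why each result is wrapped in `list()` before printing.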

#### HackerRank

In [1]:
#map and lambda
cube = lambda x: x**3   #x cubed

def fibonacci(n):                      #fibonacci: a[i] = a[i-1] + a[i-2]
    n = int(n)                      
    l1 = [0, 1]                        #starting 2 numbers
    for i in range(2, n):              #start fibonacci at 3rd number: need 2 numbers before it
       l1.append(l1[i-1] + l1[i-2])    #add sum of previous numbers to list
    return l1[0:n]                     #return first n numbers (also covers n < 2)

fibonacci_sequence = (list(map(cube, fibonacci(5))))     #map cube lambda function to each element
print(fibonacci_sequence)

[0, 1, 1, 8, 27]


In [4]:
#regex split: re.split()
import re
regex_pattern = r"[,.]"                #split on dot and comma
num = "100,000,000.000"

parsed = re.split(regex_pattern, num)
print("\n".join(parsed))               #print each part on new line


100
000
000
000


### Day 9: XML, Closures & Decorators

#### Notes

XML
- eXtensible Markup Language
- Designed to store/transport data, unlike HTML which displays data
- has similar style to HTML, can set own tags
- usually used with HTML which presents the data, since XML does not define presentation
- CSS then holds the styling information for the presented data
- Benefits: can change the data shown in an HTML page without editing the HTML file; you can just read and update the XML data with JS.
- Has specifications for use cases including news, weather service, etc.
- Tree Structure: root --> child --> subchild --> text
- **Standards**: AJAX (Asynchronous JavaScript and XML), DOM, XPath, XSLT (stylesheet language)
- **Parsing**: use ElementTree library (`xml.etree.ElementTree`)
  - `parse("XML File")`: entry point, parses entire doc at once --> ElementTree object
  - `getroot()`: gets root element, which can be indexed or looped over like a list of its child elements
    - `attrib`: dict of attributes of a specific element, e.g. a child fetched by its index
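
A minimal sketch of the parsing notes above. It assumes a made-up bookstore document held in a string, so it uses `fromstring()` (which returns the root element directly) instead of `parse()` plus `getroot()` on a file.

```python
import xml.etree.ElementTree as ET

# Made-up document; fromstring() parses a string and returns the root element
xml_text = """
<bookstore>
    <book category="children">
        <title lang="en">Harry Potter</title>
    </book>
    <book category="web">
        <title lang="en">Learning XML</title>
    </book>
</bookstore>
"""

root = ET.fromstring(xml_text)
print(root.tag)                         # root element name

for book in root:                       # iterating an element yields its children
    title = book.find("title")          # first matching child element
    print(book.attrib["category"], title.attrib["lang"], title.text)

first_category = root[0].attrib         # children can also be fetched by index
print(first_category)
```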

In [5]:
#XML Tree
#each element is a node
# <?xml version = "" encoding = ""?>
# <root element>: bookstore
#     <element [attribute]=>: book [category]
#         <element [attribute]=> title [lang] Harry Potter </element>
#     </element>
# </root element>

Closures
- Outer function that returns its inner function; the inner function remembers the variables of the outer scope (hides complex code)
- Nested Function: in a nested function, the inner function can only be accessed within the scope of the outer function.
- Closures expose the inner function by returning it from the outer function body
  - Use return without parentheses (return the function itself, not a call to it)
- Benefits:
  - As it calls an inner function, it reduces use of global variables and hides data (outer function is clean)
  - Kind of like using a core sql model for complex queries, then wrapping it in a final query
  - May be used instead of class if you have 1 method in it - good for code readability
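
The closure pattern described above can be sketched as follows. The names `make_multiplier` and `make_counter` are illustrative only, and `nonlocal` is an addition beyond the notes: it is needed when the inner function rebinds a variable of the outer function.

```python
def make_multiplier(factor):          # outer function
    def multiply(value):              # inner function, remembers factor
        return value * factor
    return multiply                   # no parentheses: return the function itself

double = make_multiplier(2)
triple = make_multiplier(3)
print(double(5), triple(5))

def make_counter():
    count = 0                         # state kept without a global variable or a class
    def increment():
        nonlocal count                # needed to rebind the enclosing variable
        count += 1
        return count
    return increment

counter = make_counter()
counter()
second = counter()
print(second)
```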

Decorators
- Applied with the @ symbol above a function definition
- A decorator is a function that does the following:
  - Takes in a func as an arg
  - Passes it to inner func --> inner func wraps/modifies the behavior of func arg
  - Returns the inner func
- aka a closure with func arg
- if the function passed in has a return, return its result in the inner func body
- Can also chain decorators
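
A minimal sketch of a decorator that passes along the return value, plus chaining. `shout`, `exclaim`, and `greet` are made-up names; `functools.wraps` is an optional standard-library helper that keeps the wrapped function's name intact.

```python
import functools

def shout(func):
    @functools.wraps(func)            # keeps func.__name__ and docstring intact
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        return result.upper()         # return the (modified) result of func
    return wrapper                    # the decorator returns the inner function

def exclaim(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs) + "!"
    return wrapper

@shout          # applied second (outermost)
@exclaim        # applied first (closest to the function)
def greet(name):
    return f"hello {name}"

greeting = greet("ada")
print(greeting, greet.__name__)
```

Chained decorators are applied bottom-up, so here `exclaim` wraps `greet` first and `shout` wraps the result.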

#### HackerRank

In [3]:
#decorator to calculate execution time
import time
import math
import re

def calculate_time(func):                 #decorator (closure) takes in func
    def inner1(*args, **kwargs):          #we want it to work w any func, so args and kwargs are variable
        begin = time.time()               #start time
        result = func(*args, **kwargs)    #run function with its args/keyword args
        end = time.time()                 #end time
        print("Total time taken in : ", func.__name__, end - begin)
        return result                     #pass along the return value of func (None here)
    return inner1

@calculate_time
def thousands_split(num):
   time.sleep(2)
   regex_pattern = r"[,.]"                
   num = str(num)

   parsed = re.split(regex_pattern, num)
   print("\n".join(parsed))

thousands_split("100,000,000.00")


100
000
000
00
Total time taken in :  thousands_split 689.6331927776337


In [4]:
#XML Score: Sum of attributes
def get_attr_number(node):                                                    
    score_root = len(node.attrib)                                            #number of attributes on the current element
    score_childs = sum(get_attr_number(child) for child in node)             #generator expression: recursively applies the function to each child element
    return score_root + score_childs                                         #current element attrs + all descendant attrs

### Day 10-11 NumPy

#### Notes

Numpy = Numerical Python (library that works with arrays)
- A NumPy array stores values as a single data type
- NumPy is often used as a MATLAB replacement for math-heavy work
- Backend for Pandas
- Helps understand ML libraries like TensorFlow
- Written partly in Python, with the performance-critical parts in C or C++
- Check version: `np.__version__`

Advantages vs Lists
- A list stores each value as a separate Python object with additional metadata, plus a pointer to it (8 bytes each on a 64-bit build), so a NumPy array uses less memory
- No type checking (all the same dtype)
- Values are stored contiguously in memory rather than scattered, so computations are quicker
- Faster cache loading
- Up to 50x faster than lists, so it's ideal for data science

Difference vs Lists
- Can perform elementwise operations between arrays (`+` on lists concatenates instead)
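
A small sketch of the difference noted above, assuming NumPy is installed and imported as `np`:

```python
import numpy as np

nums = [1, 2, 3]
arr = np.array(nums)

list_sum = nums + nums        # lists concatenate
arr_sum = arr + arr           # arrays add element by element
arr_scaled = arr * 10         # a scalar is applied to every element

print(list_sum, arr_sum, arr_scaled)
print(arr.dtype, arr.itemsize, arr.nbytes)   # one dtype for the whole array
```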

Functions
**Making Array**
- `array([list], dtype="")`: numpy array constructor (can pass in list/tuple/etc)
  - 2-D array: `([l1],[l2])` (has arrays as its elements)
  - **kwargs**
    - `ndmin=`: minimum number of dimensions
    - `dtype=`: sets data type (S, i, i4, i8, m, M, O, U, b, f)
  - **Attributes**
    - `ndim`: dimensions (level of array depth)
    - `shape`: tuple with the number of elements along each dimension
    - `dtype`: datatype
    - `itemsize`: size of each element in bytes
    - `nbytes`: total bytes in array
  - **Indexing**: starts from 0, neg indexing starts from -1
    - 1-D: `array[i]`
    - 2-D: `array[row,col]`
      - `[:,2]`: this will be the 3rd column of all rows
      - `[0,1:-1:2]`: first row(middle numbers skip by 2)
      - think of 2d arrays as stacked rows
    - 3-D:`array([[[l1],[l2]],[[l3],[l4]]])`: Elements are 2D arrays
      - `array[array,array,element]`
    - **Advanced**
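      - `arr > 5` (array expression): checks if the expression is true for each element --> boolean array
        - can also check whether any or all elements match with `any()` / `all()`
      - can index with a list of indices or with a boolean array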
    - **Slicing**
      - `arr[start:end:step]`
        - start inclusive, end exclusive
        - can also do `[::2]` for skipping numbers from entire array
        - 2-D Arrays: `arr[row_start:row_end, start:end:step]`
  - **Changing Elements**:
    - `array[row,col]` = new number
    - `array[:,2]` = [1,2]: changes all rows of 3rd column
  - **Methods**
    - **Shaping Array**
      - `zeros((shape))`
      - `ones((shape))`
      - `full((shape),num)`: creates an array of that number
      - `full_like(array,num)`: creates arr in shape of other array
      - `reshape(dim1, dim2, ..., dimN)`: reshapes current array
        - number of elements must be equal in both shapes (cannot shape 8 elements into 3x3)
        - returns a view when possible
        - can have one unknown dimension (-1)
        - Flatten: `reshape(-1)`
        - or use `flatten()` (always returns a copy)
    - `np.random.rand(d0, d1, ...)`: random floats in [0, 1), one argument per dimension
    - `astype('type' or type)`: changes data type of array
    - **Creating Copies**
      - `copy()`: creates copy of original, which is not affected by changes to original
      - `view()`: view of original (is not new dataset), so changes affect view, and view changes affect original
      - `base`: attribute that shows whether the array owns its data (copy or view). Returns `None` if the array owns its data, otherwise the original array
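
The indexing, reshaping, and copy/view notes above can be sketched on a small made-up 2-D array (assuming NumPy is imported as `np`):

```python
import numpy as np

grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])   # 2-D array: 2 rows, 4 columns
print(grid.ndim, grid.shape)

third_col = grid[:, 2]            # 3rd column of every row
middle = grid[0, 1:-1:2]          # first row, index 1 up to (not incl.) the last, step 2
big = grid[grid > 4]              # boolean mask keeps matching elements

cube = grid.reshape(2, 2, 2)      # 8 elements -> 2x2x2
flat = cube.reshape(-1)           # -1 lets NumPy work out the length

zeros = np.zeros((2, 3))
sevens = np.full_like(grid, 7)    # same shape and dtype as grid

snapshot = grid.copy()            # owns its data
window = grid.view()              # shares data with grid
grid[0, 0] = 100                  # shows up in window, not in snapshot
print(snapshot[0, 0], window[0, 0], snapshot.base is None, window.base is grid)
```
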
**Math**
- **Operations**
  - simple math operations on arr and between arrays
- **Linear Alg**
  - `matmul()`: matrix mult
  - determinants etc
- **Stats**
  - min, max, sum
- **Loading Data**
  - `np.genfromtxt(file, delimiter=)`
  - `data.astype()`
- **Iterating**
  - Use a for loop to go thru all elements. In 2d arrays, each iteration returns a subdimension (row)
  - To return elements from subdimensions in a 2d array, use a nested for loop (to retrieve scalars/values)
**Joining Arrays**
- `concatenate((arr1,arr2))`: arrays are passed in together as one tuple/list
  - **kwargs**: axis=1 (2d arrays are joined side by side, so each row gets longer)
- `stack((arr1,arr2))`: joins along a new axis (stacks on top of each other by default)
  - **kwargs**: axis = 1 (lays out each array vertically and combines side by side)
  - `hstack(), vstack()`: horiz, vertical stacking
**Splitting Arrays**
- `array_split(arr, parts)`: if parts cannot be equal, the parts at the end are made smaller
  - `split()` does not adjust for unequal elements (raises an error instead)
  - **kwargs**: axis = 1 (split by columns)
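
A brief sketch of joining and splitting on made-up arrays (assuming NumPy is imported as `np`):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

joined = np.concatenate((a, b))           # 1-D: end to end
stacked = np.stack((a, b))                # new axis: a and b become rows
columns = np.stack((a, b), axis=1)        # a and b become columns
same_as_rows = np.vstack((a, b))
same_as_joined = np.hstack((a, b))

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
wide = np.concatenate((m1, m2), axis=1)   # rows are joined side by side

parts = np.array_split(np.arange(7), 3)   # uneven split is allowed
print([p.tolist() for p in parts])
```
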
**Searching Array**
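- `where(expression)`: returns a tuple of index arrays for the matched values
- `searchsorted(arr, value)`: returns index of where value should be inserted in order (arr must already be sorted)
  - **kwargs**: `side='right'` returns the right-most insertion index (after any equal values) instead of the left-most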
  - can also pass in an array for value, which would return an array of indices
**Sorting Arrays**
- `sort(arr)`: returns a sorted copy; works on 2d arrays as well (sorts each row separately)
**Filtering Arrays**
- must use new array of boolean values and pass array into index
- `arr[bool arr]`
- can also use a for loop with boolean values based on conditions and append to empty array
- can also make array out of the result of an array expression `newarr = arr%2 == 0`
  - this returns an array of t/f, which can be used to index the original array
- Notes
  - assigning a second name to the 1st arr (`arr2 = arr1`) makes both names refer to the same array in memory, so use `arr1.copy()` when you need an independent array
  - can perform arithmetics between 2 arrays, simple math operations, sin/cos, etc
  - Array breakdown
    - 5th dim: elements
    - 4th dim: vector
    - 3rd dim: matrix with vector
    - 2nd dim: 1 element (3d array)
    - 1st dim: 1 element (4d array)
    - etc
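
Searching, sorting, filtering, and the aliasing note can be sketched on one made-up array (assuming NumPy is imported as `np`):

```python
import numpy as np

arr = np.array([4, 1, 6, 3, 6])

hits = np.where(arr == 6)                            # tuple of index arrays, one per dimension
ordered = np.sort(arr)                               # sorted copy; arr itself is unchanged
left = np.searchsorted(ordered, 6)                   # first index where 6 could be inserted
right = np.searchsorted(ordered, 6, side="right")    # index after the last 6

mask = arr % 2 == 0                                  # boolean array
evens = arr[mask]                                    # keeps elements where mask is True

alias = arr                                          # same array, second name
independent = arr.copy()                             # separate data
arr[0] = 40
print(alias[0], independent[0])
```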

In [2]:
#reshape array
import numpy as np



myList = list(map(int, input().split()))

arr = np.array(myList).reshape(3,3)
print(arr)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


### Day 12 - Debugging

#### Notes

Debugging (VSCode)
- Run and Debug button in side bar
  - **Custom JSON file**: if you create a `launch.json` file here, set the current directory to the file's directory rather than the workspace root; otherwise the debugger may not find your code
  - configs to make in file:
    - `"cwd": "${fileDirname}"`: change dir to the directory of the file being debugged
    - `"justMyCode": false`: also debugs code under the hood of libraries you used, as the bug might be there
- **Breakpoint**: bookmark for where to debug
  - See which command the issue starts at, and place there
- **Sidebar View**
  - Variables
  - Watches: expressions that keep getting evaluated at pause
  - Call Stack: records every time a script/function calls a function, starting from top (most current), then moves down to previous steps
  - Greyed out = unused var
- Debug Console
  - can print out vars to see output (check var list in sidebar)
- Control Menu (top)
  - use to step over/step into line and hover over var to see output as well
  - restart after making changes
- **In a Professional Setting**
  - Write tests in big project for each function, then debug each function
  - aka, break down project and debug right after a test fails

**Notes**: make sure you are in the correct folder in the explorer section for the json config to work
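
The advice to write a test per function and debug as soon as one fails can be sketched with plain `assert` statements. `count_vowels` is a made-up function under test, and the commented-out `breakpoint()` call (a Python built-in) is shown only as one possible way to pause execution.

```python
def count_vowels(word):
    """Hypothetical function under test."""
    return sum(1 for letter in word.lower() if letter in "aeiou")

def test_count_vowels():
    # One small check per behaviour; a failing assert points at the function to debug
    assert count_vowels("awesome") == 4
    assert count_vowels("rhythm") == 0
    assert count_vowels("") == 0

# breakpoint()   # uncomment to pause here and inspect variables in the debugger
test_count_vowels()
print("all tests passed")
```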

#### HackerRank

In [None]:
#debug this code: adds total score of vowels in each word in user input (list)
#should score 2x for words with even vowels
#'programming is awesome' should = 4, but prints out score of 2
#issue is most likely in incrementation, since score should at least be 3
#for loop counted 3 vowels in 1st word: passed
#but did not add 1 to score, and problem was in the incrementor else clause

def is_vowel(letter):
    return letter in ['a', 'e', 'i', 'o', 'u', 'y']

def score_words(words):                 #input = list of words <-- split str
    score = 0                           #score counter (total score of list)
    for word in words:                  #loops over words in list
        num_vowels = 0                  #vowel counter in words
        for letter in word:             #loops over each letter in user word
            if is_vowel(letter):        #add 1 point if matches vowel list
                num_vowels += 1
        if num_vowels % 2 == 0:         #if even num of vowels in the word
            score += 2                  #add 2x points
        else:
            score += 1                  #else add 1 point for the word and move to next word
    return score


n = int(input())
words = input().split()
print(score_words(words))

### Day 13-14 Pandas

#### Notes

Pandas
- library for working with datasets, analyzing, and making conclusions
- clean messy datasets
- make data readable and relevant
- EDA

Data Cleaning
- Nulls:
  - `dropna()`: drop rows with nulls from dataset (ideal for very large datasets)
    - **kwargs:** `inplace=True` changes the original df, `subset=` checks for nulls only in the given cols
  - `fillna()`: replace nulls with a value
    - use `df[col].fillna()` to apply to certain column
    - can also replace with a computed value (such as the column mean) by passing it in as a variable
- Formatting:
  - `to_datetime(df[date_col])`
- Replace Values
  - `loc[row,col]`: fetch by index and set value
    - Can replace using conditions with for loop and if
    - can use `iloc` to fetch by integer position instead of by label
- Duplicates
  - `duplicated()`: fetches t/f for each row
  - `drop_duplicates(inplace=)`: drops duplicate rows
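
A minimal sketch of the cleaning steps above on a tiny made-up DataFrame (the column names `day` and `minutes` are assumptions for illustration, not columns from the telco dataset):

```python
import pandas as pd

# Made-up data with a duplicate row, a null, and string dates
raw = pd.DataFrame({
    "day": ["2023-01-01", "2023-01-02", "2023-01-02", "2023-01-03"],
    "minutes": [10.0, 20.0, 20.0, None],
})

dupe_flags = raw.duplicated()                   # True for repeated rows
deduped = raw.drop_duplicates()                 # returns a new df by default

filled = deduped.copy()
filled["minutes"] = filled["minutes"].fillna(filled["minutes"].mean())  # mean of 10 and 20

dropped = deduped.dropna(subset=["minutes"])    # alternative: drop rows with nulls

filled["day"] = pd.to_datetime(filled["day"])   # strings -> datetime
filled.loc[0, "minutes"] = 12.5                 # set one value by row label and column name
print(filled)
```

Filling and dropping are shown side by side here; in practice you would pick one depending on how much data you can afford to lose.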

EDA
- `corr()`: correlation matrix for columns
  
Series
- `Series(data)`: creates a 1-D labeled array (like a single column) with customizable index
  - **args/kwargs:**
    - data: list, dict
    - index= pass in custom index. can filter keys this way for dict
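
A short sketch of building a Series from a list and from a dict, using made-up values:

```python
import pandas as pd

plain = pd.Series([1, 7, 2])                            # default index 0, 1, 2
labeled = pd.Series([1, 7, 2], index=["x", "y", "z"])   # custom labels
calories = {"day1": 420, "day2": 380, "day3": 390}      # made-up dict
from_dict = pd.Series(calories)                         # keys become the index
subset = pd.Series(calories, index=["day1", "day3"])    # index= keeps only these keys

print(labeled["y"], subset.to_dict())
```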

DataFrames
- `DataFrame(data)`
  - **kwargs:**
    - index= use custom index
  - `to_string()`: prints entire df
  - json can be read in as a dict
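
A minimal sketch of the DataFrame notes, together with `corr()` from the EDA note, on made-up values:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}   # made-up values
frame = pd.DataFrame(data, index=["day1", "day2", "day3"])       # custom index

row = frame.loc["day2"]          # one row by label --> Series
text = frame.to_string()         # full df as a string, no truncation
corr = frame.corr()              # correlation matrix between numeric columns
print(text)
print(corr)
```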

In [4]:
import pandas as pd
print(pd.__version__)

1.4.4


##### Creating DF

In [8]:
#from csv
df = pd.read_csv("Pandasin20Minutes/telco_churn.csv")

In [6]:
#from dict
tempdict = {'col1':[1,2,3], 'col2':[4,5,6]}
dictdf = pd.DataFrame.from_dict(tempdict)

In [10]:
#check system max rows for any df
#can change this
print(pd.options.display.max_rows)

60


##### Reading DF

###### Show Columns and Data Types

In [5]:
df.head(10)

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
0,KS,128,415,No,Yes,25,265.1,110.0,45.07,197.4,99.0,16.78,244.7,91.0,11.01,10.0,3,2.7,1.0,False
1,OH,107,415,No,Yes,26,161.6,123.0,27.47,195.5,103.0,16.62,254.4,103.0,11.45,13.7,3,3.7,1.0,False
2,NJ,137,415,No,No,0,243.4,114.0,41.38,121.2,110.0,10.3,162.6,104.0,7.32,12.2,5,3.29,0.0,False
3,OH,84,408,Yes,No,0,299.4,71.0,50.9,61.9,88.0,5.26,196.9,89.0,8.86,6.6,7,1.78,2.0,False
4,OK,75,415,Yes,No,0,166.7,113.0,28.34,148.3,122.0,12.61,186.9,121.0,8.41,10.1,3,2.73,3.0,False
5,AL,118,510,Yes,No,0,223.4,98.0,37.98,220.6,101.0,18.75,203.9,118.0,9.18,6.3,6,1.7,0.0,False
6,MA,121,510,No,Yes,24,218.2,88.0,37.09,348.5,108.0,29.62,212.6,118.0,9.57,7.5,7,2.03,3.0,
7,MO,147,415,Yes,No,0,157.0,79.0,26.69,103.1,94.0,8.76,211.8,96.0,9.53,7.1,6,1.92,0.0,False
8,LA,117,408,No,No,0,184.5,97.0,31.37,351.6,80.0,29.89,215.8,90.0,9.71,8.7,4,2.35,1.0,False
9,WV,141,415,Yes,Yes,37,258.6,84.0,43.96,222.0,111.0,18.87,326.4,97.0,14.69,11.2,5,3.02,0.0,False


In [11]:
dictdf.head()

Unnamed: 0,col1,col2
0,1,4
1,2,5
2,3,6


In [13]:
dictdf.tail()

Unnamed: 0,col1,col2
0,1,4
1,2,5
2,3,6


In [11]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3333 entries, 0 to 3332
Data columns (total 20 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   State                   3333 non-null   object 
 1   Account length          3333 non-null   int64  
 2   Area code               3333 non-null   int64  
 3   International plan      3333 non-null   object 
 4   Voice mail plan         3333 non-null   object 
 5   Number vmail messages   3333 non-null   int64  
 6   Total day minutes       3323 non-null   float64
 7   Total day calls         3323 non-null   float64
 8   Total day charge        3315 non-null   float64
 9   Total eve minutes       3324 non-null   float64
 10  Total eve calls         3325 non-null   float64
 11  Total eve charge        3333 non-null   float64
 12  Total night minutes     3333 non-null   float64
 13  Total night calls       3332 non-null   float64
 14  Total night charge      3333 non-null   

In [28]:
#all columns
df.columns

Index(['State', 'Account length', 'Area code', 'International plan',
       'Voice mail plan', 'Number vmail messages', 'Total day minutes',
       'Total day calls', 'Total day charge', 'Total eve minutes',
       'Total eve calls', 'Total eve charge', 'Total night minutes',
       'Total night calls', 'Total night charge', 'Total intl minutes',
       'Total intl calls', 'Total intl charge', 'Customer service calls',
       'Churn'],
      dtype='object')

In [27]:
#data types
df.dtypes

State                      object
Account length              int64
Area code                   int64
International plan         object
Voice mail plan            object
Number vmail messages       int64
Total day minutes         float64
Total day calls           float64
Total day charge          float64
Total eve minutes         float64
Total eve calls           float64
Total eve charge          float64
Total night minutes       float64
Total night calls         float64
Total night charge        float64
Total intl minutes        float64
Total intl calls            int64
Total intl charge         float64
Customer service calls    float64
Churn                      object
dtype: object

###### Summary Statistics

In [17]:
df.describe()
#kwargs:
#include= summary on specific datatype

Unnamed: 0,Account length,Area code,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls
count,3333.0,3333.0,3333.0,3323.0,3323.0,3315.0,3324.0,3325.0,3333.0,3333.0,3332.0,3333.0,3333.0,3333.0,3328.0,3328.0
mean,101.064806,437.182418,8.09901,179.78715,100.456214,30.557831,201.033935,100.110677,17.08354,200.872037,100.115246,9.039325,10.237294,4.479448,2.764588,1.563101
std,39.822106,42.37129,13.688365,54.419625,20.057356,9.255987,50.676652,19.932115,4.310668,50.573847,19.56671,2.275873,2.79184,2.461214,0.754086,1.31587
min,1.0,408.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23.2,33.0,1.04,0.0,0.0,0.0,0.0
25%,74.0,408.0,0.0,143.7,87.0,24.42,166.6,87.0,14.16,167.0,87.0,7.52,8.5,3.0,2.3,1.0
50%,101.0,415.0,0.0,179.4,101.0,30.5,201.4,100.0,17.12,201.2,100.0,9.05,10.3,4.0,2.78,1.0
75%,127.0,510.0,20.0,216.5,114.0,36.78,235.325,114.0,20.0,235.3,113.0,10.59,12.1,6.0,3.27,2.0
max,243.0,510.0,51.0,350.8,165.0,59.64,363.7,170.0,30.91,395.0,175.0,17.77,20.0,20.0,5.4,9.0


In [30]:
#unique values in a col
df.State.unique()

array(['KS', 'OH', 'NJ', 'OK', 'AL', 'MA', 'MO', 'LA', 'WV', 'IN', 'RI',
       'IA', 'MT', 'NY', 'ID', 'VT', 'VA', 'TX', 'FL', 'CO', 'AZ', 'SC',
       'NE', 'WY', 'HI', 'IL', 'NH', 'GA', 'AK', 'MD', 'AR', 'WI', 'OR',
       'MI', 'DE', 'UT', 'CA', 'MN', 'SD', 'NC', 'WA', 'NM', 'NV', 'DC',
       'KY', 'ME', 'MS', 'TN', 'PA', 'CT', 'ND'], dtype=object)

In [25]:
df.State
#spaced name df['col']
#multiple columns df[['col1','col2']]


0       KS
1       OH
2       NJ
3       OH
4       OK
        ..
3328    AZ
3329    WV
3330    RI
3331    CT
3332    TN
Name: State, Length: 3333, dtype: object

###### Filtering

In [33]:
#row filtering based on value
df[(df['International plan']=='No') & (df['Churn']==True)]

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
10,IN,65,415,No,No,0,129.1,137.0,21.95,228.5,83.0,19.42,208.8,111.0,9.40,12.7,6,3.43,4.0,True
15,NY,161,415,No,No,0,,,,317.8,97.0,27.01,160.6,128.0,7.23,5.4,9,1.46,4.0,True
21,CO,77,408,No,No,0,,,,169.9,121.0,14.44,209.6,64.0,9.43,5.7,6,1.54,5.0,True
33,AZ,12,408,No,No,0,249.6,118.0,,,,21.45,280.2,90.0,12.61,11.8,3,3.19,1.0,True
48,ID,119,415,No,No,0,159.1,114.0,27.05,231.3,117.0,19.66,143.2,91.0,6.44,8.8,3,2.38,5.0,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3280,AR,76,408,No,No,0,107.3,140.0,18.24,238.2,133.0,20.25,271.8,116.0,12.23,10.0,3,2.70,4.0,True
3287,KS,170,415,No,Yes,42,199.5,119.0,33.92,135.0,90.0,11.48,184.6,49.0,8.31,10.9,3,2.94,4.0,True
3301,CA,84,415,No,No,0,280.0,113.0,47.60,202.2,90.0,17.19,156.8,103.0,7.06,10.4,4,2.81,0.0,True
3322,MD,62,408,No,No,0,321.1,105.0,54.59,265.5,122.0,22.57,180.5,72.0,8.12,11.5,2,3.11,4.0,True


###### Indexing

In [7]:
#fetch data with iloc
#df.iloc[row,col]
#list of rows iloc[[row,row,row]]
df.iloc[14] #index 14 = row 15

State                        IA
Account length               62
Area code                   415
International plan           No
Voice mail plan              No
Number vmail messages         0
Total day minutes           NaN
Total day calls             NaN
Total day charge            NaN
Total eve minutes         307.2
Total eve calls            76.0
Total eve charge          26.11
Total night minutes       203.0
Total night calls          99.0
Total night charge         9.14
Total intl minutes         13.1
Total intl calls              6
Total intl charge          3.54
Customer service calls      4.0
Churn                     False
Name: 14, dtype: object

In [40]:
#slicing data
df.iloc[15:23] #range ends at n-1

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
15,NY,161,415,No,No,0,,,,317.8,97.0,27.01,160.6,128.0,7.23,5.4,9,1.46,4.0,True
16,ID,85,408,No,Yes,27,,,,280.9,90.0,23.88,89.3,,4.02,13.8,4,3.73,1.0,False
17,VT,93,510,No,No,0,,,,218.2,111.0,18.55,129.6,121.0,5.83,8.1,3,2.19,3.0,False
18,VA,76,510,No,Yes,33,,,,212.8,65.0,18.09,165.7,108.0,7.46,10.0,5,2.7,1.0,False
19,TX,73,415,No,No,0,,,,,88.0,13.56,192.8,74.0,8.68,13.0,2,3.51,1.0,False
20,FL,147,415,No,No,0,,,,239.7,93.0,20.37,208.8,133.0,9.4,10.6,4,2.86,0.0,False
21,CO,77,408,No,No,0,,,,169.9,121.0,14.44,209.6,64.0,9.43,5.7,6,1.54,5.0,True
22,AZ,130,415,No,No,0,183.0,112.0,31.11,72.9,99.0,6.2,181.8,78.0,8.18,9.5,19,2.57,0.0,


In [44]:
#save a filtered df using loc
#first set index on desired filter col
state = df.copy()
state.set_index('State',inplace=True)
#kwargs: inplace=True sets the index on this df itself instead of returning a new df
#state.head()
state.loc['OR']

State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
OR,59,408,No,Yes,28,120.9,97.0,20.55,213.0,92.0,18.11,163.1,116.0,7.34,8.5,5,2.30,2.0,False
OR,116,415,Yes,No,0,215.4,104.0,36.62,204.8,79.0,17.41,278.5,109.0,12.53,12.6,5,3.40,3.0,False
OR,65,415,No,No,0,116.8,87.0,19.86,178.9,93.0,15.21,182.4,150.0,8.21,14.1,2,3.81,1.0,False
OR,38,415,No,No,0,194.4,94.0,33.05,186.7,95.0,15.87,223.3,90.0,10.05,10.8,5,2.92,3.0,False
OR,33,415,No,Yes,29,157.4,99.0,26.76,117.9,80.0,10.02,279.2,79.0,12.56,13.9,11,3.75,4.0,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
OR,124,510,No,No,0,169.3,108.0,28.78,178.6,91.0,15.18,242.3,82.0,10.90,12.2,3,3.29,1.0,False
OR,27,510,No,No,0,232.1,81.0,39.46,210.8,101.0,17.92,165.4,87.0,7.44,15.0,6,4.05,5.0,False
OR,89,415,No,No,0,111.2,101.0,18.90,122.1,94.0,10.38,180.8,85.0,8.14,12.6,2,3.40,3.0,False
OR,61,415,No,No,0,234.2,76.0,39.81,216.7,108.0,18.42,130.6,122.0,5.88,13.9,2,3.75,1.0,False


##### Update

In [49]:
#finding missing values
df.isnull().sum()

State                     0
Account length            0
Area code                 0
International plan        0
Voice mail plan           0
Number vmail messages     0
Total day minutes         0
Total day calls           0
Total day charge          0
Total eve minutes         0
Total eve calls           0
Total eve charge          0
Total night minutes       0
Total night calls         0
Total night charge        0
Total intl minutes        0
Total intl calls          0
Total intl charge         0
Customer service calls    0
Churn                     0
dtype: int64

In [50]:
#dropping missing values
df.dropna(inplace=True)

In [53]:
#drop a column
df.drop('Area code', axis=1)
#kwargs: axis= 0 for rows, 1 for columns (returns a new df unless inplace=True)

Unnamed: 0,State,Account length,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn
0,KS,128,No,Yes,25,265.1,110.0,45.07,197.4,99.0,16.78,244.7,91.0,11.01,10.0,3,2.70,1.0,False
1,OH,107,No,Yes,26,161.6,123.0,27.47,195.5,103.0,16.62,254.4,103.0,11.45,13.7,3,3.70,1.0,False
2,NJ,137,No,No,0,243.4,114.0,41.38,121.2,110.0,10.30,162.6,104.0,7.32,12.2,5,3.29,0.0,False
3,OH,84,Yes,No,0,299.4,71.0,50.90,61.9,88.0,5.26,196.9,89.0,8.86,6.6,7,1.78,2.0,False
4,OK,75,Yes,No,0,166.7,113.0,28.34,148.3,122.0,12.61,186.9,121.0,8.41,10.1,3,2.73,3.0,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3328,AZ,192,No,Yes,36,156.2,77.0,26.55,215.5,126.0,18.32,279.1,83.0,12.56,9.9,6,2.67,2.0,False
3329,WV,68,No,No,0,231.1,57.0,39.29,153.4,55.0,13.04,191.3,123.0,8.61,9.6,4,2.59,3.0,False
3330,RI,28,No,No,0,180.8,109.0,30.74,288.8,58.0,24.55,191.9,91.0,8.64,14.1,6,3.81,2.0,False
3331,CT,184,Yes,No,0,213.8,105.0,36.35,159.6,84.0,13.57,139.2,137.0,6.26,5.0,10,1.35,2.0,False


In [54]:
#create calculated column
df['New Column'] = df['Total night minutes'] + df['Total intl minutes']
df.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,...,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn,New Column
0,KS,128,415,No,Yes,25,265.1,110.0,45.07,197.4,...,16.78,244.7,91.0,11.01,10.0,3,2.7,1.0,False,254.7
1,OH,107,415,No,Yes,26,161.6,123.0,27.47,195.5,...,16.62,254.4,103.0,11.45,13.7,3,3.7,1.0,False,268.1
2,NJ,137,415,No,No,0,243.4,114.0,41.38,121.2,...,10.3,162.6,104.0,7.32,12.2,5,3.29,0.0,False,174.8
3,OH,84,408,Yes,No,0,299.4,71.0,50.9,61.9,...,5.26,196.9,89.0,8.86,6.6,7,1.78,2.0,False,203.5
4,OK,75,415,Yes,No,0,166.7,113.0,28.34,148.3,...,12.61,186.9,121.0,8.41,10.1,3,2.73,3.0,False,197.0


In [56]:
#update entire column
df['New Column'] = 100
df.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,...,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn,New Column
0,KS,128,415,No,Yes,25,265.1,110.0,45.07,197.4,...,16.78,244.7,91.0,11.01,10.0,3,2.7,1.0,False,100
1,OH,107,415,No,Yes,26,161.6,123.0,27.47,195.5,...,16.62,254.4,103.0,11.45,13.7,3,3.7,1.0,False,100
2,NJ,137,415,No,No,0,243.4,114.0,41.38,121.2,...,10.3,162.6,104.0,7.32,12.2,5,3.29,0.0,False,100
3,OH,84,408,Yes,No,0,299.4,71.0,50.9,61.9,...,5.26,196.9,89.0,8.86,6.6,7,1.78,2.0,False,100
4,OK,75,415,Yes,No,0,166.7,113.0,28.34,148.3,...,12.61,186.9,121.0,8.41,10.1,3,2.73,3.0,False,100


In [57]:
#update single value using iloc indexing
df.iloc[0,-1] = 10 #first row, last col
df.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,...,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn,New Column
0,KS,128,415,No,Yes,25,265.1,110.0,45.07,197.4,...,16.78,244.7,91.0,11.01,10.0,3,2.7,1.0,False,10
1,OH,107,415,No,Yes,26,161.6,123.0,27.47,195.5,...,16.62,254.4,103.0,11.45,13.7,3,3.7,1.0,False,100
2,NJ,137,415,No,No,0,243.4,114.0,41.38,121.2,...,10.3,162.6,104.0,7.32,12.2,5,3.29,0.0,False,100
3,OH,84,408,Yes,No,0,299.4,71.0,50.9,61.9,...,5.26,196.9,89.0,8.86,6.6,7,1.78,2.0,False,100
4,OK,75,415,Yes,No,0,166.7,113.0,28.34,148.3,...,12.61,186.9,121.0,8.41,10.1,3,2.73,3.0,False,100


In [59]:
#updating using condition (lambda expression) aka map
df['Churn binary'] = df['Churn'].apply(lambda x: 1 if x==True else 0)
df.head()

Unnamed: 0,State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,...,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn,New Column,Churn binary
0,KS,128,415,No,Yes,25,265.1,110.0,45.07,197.4,...,244.7,91.0,11.01,10.0,3,2.7,1.0,False,10,0
1,OH,107,415,No,Yes,26,161.6,123.0,27.47,195.5,...,254.4,103.0,11.45,13.7,3,3.7,1.0,False,100,0
2,NJ,137,415,No,No,0,243.4,114.0,41.38,121.2,...,162.6,104.0,7.32,12.2,5,3.29,0.0,False,100,0
3,OH,84,408,Yes,No,0,299.4,71.0,50.9,61.9,...,196.9,89.0,8.86,6.6,7,1.78,2.0,False,100,0
4,OK,75,415,Yes,No,0,166.7,113.0,28.34,148.3,...,186.9,121.0,8.41,10.1,3,2.73,3.0,False,100,0


##### Delete/Output DF

In [60]:
#df.to_csv('output.csv')  #writes to a file; called with no path, to_csv() returns the CSV text as a string
#df.to_json()
#df.to_html()

',State,Account length,Area code,International plan,Voice mail plan,Number vmail messages,Total day minutes,Total day calls,Total day charge,Total eve minutes,Total eve calls,Total eve charge,Total night minutes,Total night calls,Total night charge,Total intl minutes,Total intl calls,Total intl charge,Customer service calls,Churn,New Column,Churn binary\r\n0,KS,128,415,No,Yes,25,265.1,110.0,45.07,197.4,99.0,16.78,244.7,91.0,11.01,10.0,3,2.7,1.0,False,10,0\r\n1,OH,107,415,No,Yes,26,161.6,123.0,27.47,195.5,103.0,16.62,254.4,103.0,11.45,13.7,3,3.7,1.0,False,100,0\r\n2,NJ,137,415,No,No,0,243.4,114.0,41.38,121.2,110.0,10.3,162.6,104.0,7.32,12.2,5,3.29,0.0,False,100,0\r\n3,OH,84,408,Yes,No,0,299.4,71.0,50.9,61.9,88.0,5.26,196.9,89.0,8.86,6.6,7,1.78,2.0,False,100,0\r\n4,OK,75,415,Yes,No,0,166.7,113.0,28.34,148.3,122.0,12.61,186.9,121.0,8.41,10.1,3,2.73,3.0,False,100,0\r\n5,AL,118,510,Yes,No,0,223.4,98.0,37.98,220.6,101.0,18.75,203.9,118.0,9.18,6.3,6,1.7,0.0,False,100,0\r\n7,MO,147,415,Yes,No,

In [61]:
#delete df
del df

#### Exercises

In [21]:
#make series from list
l1 = [1,2,3]
s1 = pd.Series(l1)
s2 = pd.Series([10,10,10])
print(type(s1), type(s2))

<class 'pandas.core.series.Series'> <class 'pandas.core.series.Series'>


In [20]:
#convert back to list
print(type(list(s1)))

<class 'list'>


In [22]:
type(s1)

pandas.core.series.Series

In [44]:
a = s1 + s2  #element-wise sum; built-in sum(s1, s2) would instead add sum(s1) = 6 to every element of s2
b = s2-s1
c = s1*s2
d = round(s2/s1, 0)

series = (a,b,c,d)
df = pd.DataFrame(series, index=["sum", "difference", "product", "divide"])
print(df)

               0     1     2
sum         11.0  12.0  13.0
difference   9.0   8.0   7.0
product     10.0  20.0  30.0
divide      10.0   5.0   3.0


In [47]:
#convert dict to series
seriesfromdict = pd.Series({'a': 100, 'b': 200, 'c': 300, 'd': 400, 'e': 800})
print(seriesfromdict)

a    100
b    200
c    300
d    400
e    800
dtype: int64


# Rapid Foundations - SQL

### Day 1-4

#### Notes

**Basic Queries**

SELECT
- TOP `N` [PERCENT] (SQL Server)
- DISTINCT
- **Functions**
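  - **Aggregation:** collapse many rows into one value; usually paired with GROUP BY
    - COUNT()
    - AVG()
    - SUM()
    - MAX/MIN()
  - **Scalar:** applied to each row, no GROUP BY needed
    - LEN() (SQL Server; `LENGTH()` in MySQL/Oracle)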

FROM

WHERE
- **Operators:** 
  - `<>`, `!=`: not equal
  - `%`: modulus
  - NOT (`!` in MySQL): negates the condition that follows it
  - BETWEEN a AND b: within a range, both ends included
  - IN(a,b,c): matches these values
    - IN(SELECT): subquery
  - OR/AND
  - LIKE
    - `%`: any number of characters placeholder
    - `_`: one character placeholder
    - `[]`: one character from list [a,b,c] or range [a-c]
      - `[!A-Z]` (MS Access), `[^A-Z]` (SQL Server): one character not in the list or range

GROUP BY

ORDER BY
- ASC/DESC: can use several columns in order, and use both asc/desc for each column

LIMIT `n` (MySQL) / FETCH FIRST `n` ROWS ONLY (Oracle)
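
One way to try the clauses above without a database server is Python's built-in `sqlite3` module. This is only a sketch: the `city` table and its rows are made up for illustration, and it runs on SQLite's dialect, which uses `LIMIT` like MySQL, has no `TOP`, and does not support `[]` lists in `LIKE`.

```python
import sqlite3

# In-memory database with a small, made-up table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, countrycode TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [
        ("Austin", "USA", 900000),
        ("Boston", "USA", 650000),
        ("Osaka", "JPN", 2700000),
        ("Kyoto", "JPN", 1400000),
        ("Lyon", "FRA", 500000),
    ],
)


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# LIKE: % matches any number of characters, _ matches exactly one
ends_in_on = run("SELECT name FROM city WHERE name LIKE '%on' ORDER BY name")
yo_in_middle = run("SELECT name FROM city WHERE name LIKE '_yo_'")

# BETWEEN includes both ends of the range
mid_sized = run(
    "SELECT name FROM city WHERE population BETWEEN 600000 AND 1400000 "
    "ORDER BY population DESC"
)

# NOT negates the IN condition that follows it
other_countries = run("SELECT name FROM city WHERE countrycode NOT IN ('USA', 'JPN')")

# Aggregate functions with GROUP BY: one result row per country
per_country = run(
    "SELECT countrycode, COUNT(*), MAX(population) FROM city "
    "GROUP BY countrycode ORDER BY countrycode"
)

# ORDER BY several columns with mixed directions, then keep the first 3 rows
first_three = run(
    "SELECT name FROM city ORDER BY countrycode ASC, population DESC LIMIT 3"
)

print(ends_in_on, yo_in_middle, mid_sized, other_countries, per_country, first_three)
```

Note that SQLite's `LIKE` is case-insensitive for ASCII letters by default, which may differ from other databases.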

#### HackerRank

##### Basic Select

In [None]:
--cities in usa with population > 100000
SELECT * FROM CITY WHERE COUNTRYCODE = 'USA' AND POPULATION > 100000;

--time 4 min

In [None]:
--name of cities in usa with population > 120000
SELECT NAME FROM CITY WHERE POPULATION > 120000 AND COUNTRYCODE = 'USA';

--time 1 min

In [None]:
--cities with id 1661
SELECT * FROM CITY WHERE ID = 1661;

--time 20 sec

In [None]:
--all records
SELECT * FROM CITY;

--time 15 sec

In [None]:
-- Query all attributes of every Japanese city in the CITY table. The COUNTRYCODE for Japan is JPN.
SELECT * FROM CITY WHERE COUNTRYCODE = 'JPN';

--time 20 sec

In [None]:
-- Query the names of all the Japanese cities in the CITY table. The COUNTRYCODE for Japan is JPN.
SELECT NAME FROM CITY WHERE COUNTRYCODE = 'JPN';

--time 25 seconds

In [None]:
-- difference between the total number of CITY entries and the number of distinct CITY entries
SELECT COUNT(CITY) - COUNT(DISTINCT CITY) FROM STATION;

--time 9 min
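
The same count-minus-distinct-count query can be checked locally with Python's built-in `sqlite3` module. This is a sketch under assumptions: the `station` rows are invented for illustration, and one `NULL` row is added to show that `COUNT(column)` skips `NULL` values while `COUNT(*)` does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE station (city TEXT)")
# Hypothetical data: 3 distinct names, 6 non-NULL entries, 1 NULL entry.
conn.executemany(
    "INSERT INTO station VALUES (?)",
    [("Rome",), ("Rome",), ("Oslo",), ("Lima",), ("Lima",), ("Lima",), (None,)],
)

total, distinct, diff = conn.execute(
    "SELECT COUNT(city), COUNT(DISTINCT city), "
    "COUNT(city) - COUNT(DISTINCT city) FROM station"
).fetchone()

# COUNT(*) counts rows, including the one where city is NULL
all_rows = conn.execute("SELECT COUNT(*) FROM station").fetchone()[0]

print(total, distinct, diff, all_rows)
```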

In [None]:
-- longest and shortest city names (SQL Server); ties broken alphabetically by CITY
SELECT TOP 1
    CITY, LEN(CITY) as l
FROM STATION
ORDER BY l desc, CITY;
SELECT TOP 1
    CITY, LEN(CITY) as l
FROM STATION
ORDER BY l asc, CITY;

# 1 Month Prep - Python

## Week 1

Notes:
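- `Counter(list)` (from `collections`): returns a dict subclass mapping each element to its count
  - Access counts using `counter[element]`; a missing element returns 0 instead of raising a KeyError
- XOR `^`: `x ^ x = 0` and `x ^ 0 = x`, so XOR-ing all elements cancels every value that appears twice
  - Useful for finding the single unpaired element in a list where every other element appears exactly twice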
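
A short sketch of the `Counter` behaviour described in the notes, using a made-up list of strings:

```python
from collections import Counter

words = ["ab", "abc", "ab", "bc"]  # hypothetical sample data
counts = Counter(words)

ab_count = counts["ab"]       # element looked up as a key
missing = counts["zzz"]       # absent keys give 0, no KeyError
top = counts.most_common(1)   # list of (element, count), highest count first

print(ab_count, missing, top)
```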

In [3]:
# ratio of pos/neg/zeroes
def plusMinus(arr):
    pos = 0
    neg = 0
    zero = 0
    for i in arr:
        if i > 0:
            pos += 1
        elif i < 0:
            neg += 1
        else:
            zero += 1
        
    print(pos/len(arr))
    print(neg/len(arr))
    print(zero/len(arr))

#time: 10 min

In [4]:
# mini max sum
def miniMaxSum(arr):
    print(sum(arr)-max(arr), sum(arr)-min(arr))
    
#largest total of 4 numbers is total sum - smallest number, and vice versa
#time 15 min

In [None]:
def timeConversion(s):
    if s[8:] == "AM":
        if s[:2] == "12":
            return "00"+s[2:-2]
        else:
            return s[:-2]
    else:
        h = int(s[:2])
        if h<12:
            h+=12
        return str(h)+s[2:-2]
        
# first block deals with AM: convert 12 AM to 00, otherwise just drop the suffix
# second block deals with PM:
# convert hr to int, add 12 unless it is already 12 (12 PM stays 12), and concat back as str
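
As a cross-check for the manual slicing above, the same conversion can be sketched with the standard library's `datetime` parsing. The function name `time_conversion_strptime` is made up here, and the input is assumed to always look like `hh:mm:ssAM` or `hh:mm:ssPM`.

```python
from datetime import datetime


def time_conversion_strptime(s):
    """Convert 'hh:mm:ssAM/PM' to 24-hour 'HH:MM:SS'."""
    # %I = 12-hour clock hour, %p = AM/PM marker, %H = 24-hour clock hour
    return datetime.strptime(s, "%I:%M:%S%p").strftime("%H:%M:%S")


print(time_conversion_strptime("12:00:00AM"))
print(time_conversion_strptime("07:05:45PM"))
```

`strptime` raises a `ValueError` on malformed input, whereas the slicing version would silently return a wrong string.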

In [None]:
#leap year function
def is_leap(year):
    leap = False
    
    if year % 4 == 0:               #if year div/4
        if year % 100 == 0:         #and if year div/100
            if year % 400 == 0:     #and if year div/400
                leap = True
            else:
                leap = False
        else:                       #if year div/4 and year ! div/100
            leap = True
    
    
    return leap

#time 9 min
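
The nested checks collapse into one boolean expression. The sketch below uses a made-up name, `is_leap_compact`, and compares it against the standard library's `calendar.isleap` for a few sample years.

```python
import calendar


def is_leap_compact(year):
    """Leap if divisible by 4, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


sample_years = (1900, 2000, 2023, 2024)
results = {y: is_leap_compact(y) for y in sample_years}
matches_stdlib = all(is_leap_compact(y) == calendar.isleap(y) for y in sample_years)

print(results, matches_stdlib)
```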

In [9]:
#takes in str list and query list
from collections import Counter
def matchingStrings(strings, queries):
    stringdict = Counter(strings)  # dict {str:count} where str is key. remember, dict[key] --> value
    
    result = []
    for query in queries:
        result.append(stringdict[query])   #for each query in query list, fetch count by passing in query as key of stringdict
        
    return result

2

In [11]:
#lonely integer: find unique element in list
#inputs: array of ints

#xor operation ^ cancels pairs: x ^ x == 0 and x ^ 0 == x
#so xor-ing every element leaves only the value that has no duplicate
#loop over the array and fold each element into result with ^=

def lonelyinteger(a):
    result = 0
    for num in a:
        result ^= num
    return result
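
The same fold can be written with `functools.reduce`, and `itertools.accumulate` shows the running XOR value at each step. This is a sketch on a made-up input; `lonely_integer_reduce` is a hypothetical name, and it assumes every value except one appears exactly twice.

```python
from functools import reduce
from itertools import accumulate
from operator import xor


def lonely_integer_reduce(a):
    """XOR all elements together; paired values cancel to 0."""
    return reduce(xor, a, 0)


nums = [1, 2, 3, 4, 3, 2, 1]  # 4 is the only unpaired value
lonely = lonely_integer_reduce(nums)

# Running XOR after each element, to see the pairs cancel out
trace = list(accumulate(nums, xor))

print(lonely, trace)
```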