# 20-07-13: Daily Practice

---
---

## Daily practices

* [ ] [Practice & learn](#Practice-&-learn)
  * [x] Coding, algorithms & data structures
  * [x] Data science: access, manipulation, analysis, visualization
  * [ ] Engineering: SQL, PySpark, APIs, TDD, OOP
  * [ ] Machine learning: Scikit-learn, TensorFlow, PyTorch
  * [x] Interview questions (out loud)
* [ ] [Meta-data: reading & writing](#Meta-data:-reading-&-writing)
  * [ ] Blog
* [ ] [2-Hour Job Search](#2-Hour-Job-Search)
  * [ ] LAMP List
  * [x] Networking
  * [ ] Social media

---
---

## Practice & learn

---

### Coding, algorithms & data structures

#### [2D Array - DS](https://www.hackerrank.com/challenges/2d-array/problem)

Given a 2D Array, arr:

    1 1 1 0 0 0
    0 1 0 0 0 0
    1 1 1 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0

We define an hourglass in arr to be a subset of values with indices falling in this pattern in arr's graphical representation:

    a b c
      d
    e f g

There are 16 hourglasses in arr, and an hourglass sum is the sum of an hourglass' values. Calculate the hourglass sum for every hourglass in arr, then print the maximum hourglass sum.

For example, given the 2D array:

    -9 -9 -9  1 1 1 
     0 -9  0  4 3 2
    -9 -9 -9  1 2 3
     0  0  8  6 6 0
     0  0  0 -2 0 0
     0  0  1  2 4 0

We calculate the following 16 hourglass values:

    -63, -34, -9, 12, 
    -10, 0, 28, 23, 
    -27, -11, -2, 10, 
    9, 17, 25, 18

Our highest hourglass value is 28 from the hourglass:

    0 4 3
      1
    8 6 6

In [1]:
# Complete the hourglassSum function below.
def hourglassSum(arr):
    pass

In [None]:
# === Version 1 === #
# Set up hourglass / "window" index structure
    # [0][:3]
    # [1][1]
    # [2][:3]
# Calculate the sum of the hourglass values, insert into separate array
# Iterate to move the hourglass "window" by 1 along horizontal axis
# Once hourglass edge hits the other edge, move to next index on vertical axis
# Iterate until all 16 sums are collected, then take the maximum
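
The Version 1 plan above can be sketched as follows. This is one possible implementation, not a finished submission: it assumes `arr` is a list of lists with at least 3 rows and 3 columns, and it slides the 3x3 "window" with two nested loops instead of tracking the edges by hand.

```python
def hourglassSum(arr):
    """Return the largest hourglass sum in a 2D grid (list of lists)."""
    sums = []
    # The top-left corner of the 3x3 window can sit at rows 0-3 and columns 0-3
    # of a 6x6 grid, which gives the 16 hourglasses.
    for r in range(len(arr) - 2):
        for c in range(len(arr[0]) - 2):
            top = sum(arr[r][c:c + 3])         # a b c
            middle = arr[r + 1][c + 1]         #   d
            bottom = sum(arr[r + 2][c:c + 3])  # e f g
            sums.append(top + middle + bottom)
    return max(sums)


example = [
    [-9, -9, -9, 1, 1, 1],
    [0, -9, 0, 4, 3, 2],
    [-9, -9, -9, 1, 2, 3],
    [0, 0, 8, 6, 6, 0],
    [0, 0, 0, -2, 0, 0],
    [0, 0, 1, 2, 4, 0],
]
print(hourglassSum(example))  # 28, matching the worked example
```

Collecting every sum in a list mirrors the plan; keeping a running maximum would also work and avoids the extra list.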

---

### Interview questions

> Answer common interview questions **_out loud_**.

* Technical: DS, ML, SE
* Behavioral

#### Technical

##### [Reverse a singly-linked list](https://www.hackerrank.com/challenges/reverse-a-linked-list/problem)

You’re given the pointer to the head node of a linked list. Change the next pointers of the nodes so that their order is reversed. The head pointer given may be null, meaning that the initial list is empty.

Sample Input

    1
    5
    1
    2
    3
    4
    5

Sample Output

    5 4 3 2 1 

Explanation

* The initial linked list is: 1 -> 2 -> 3 -> 4 -> 5 -> NULL
* The reversed linked list is: 5 -> 4 -> 3 -> 2 -> 1 -> NULL

In [3]:
"""
HackerRank :: Reverse a singly-linked list

Complete the reverse function below.

For your reference:

SinglyLinkedListNode:
    int data
    SinglyLinkedListNode next
"""


def reverse(head):
    # head may be None (empty list); the loop is then skipped and None is returned
    # Keep track of previous node
    prev_node = None
    cur_node = head
    # Loop until cur_node runs past the end of the list
    while cur_node:
        # Save node for overwriting cur_node
        next_node = cur_node.next
        # Set current node's next to prev_node
        cur_node.next = prev_node
        # Pass previous node to next iteration
        prev_node = cur_node
        cur_node = next_node

    return prev_node
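
To try the function outside HackerRank, a small harness like the one below may help. The node class follows the `data` / `next` fields from the docstring above, but its constructor and the `from_list` / `to_list` helpers are hypothetical names added only for this sketch.

```python
class SinglyLinkedListNode:
    """Minimal stand-in for HackerRank's node type (assumed shape)."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def reverse(head):
    """Same iterative reversal as in the cell above."""
    prev_node = None
    cur_node = head
    while cur_node:
        next_node = cur_node.next
        cur_node.next = prev_node
        prev_node = cur_node
        cur_node = next_node
    return prev_node


def from_list(values):
    """Hypothetical helper: build a linked list from a Python list."""
    head = None
    for value in reversed(values):
        head = SinglyLinkedListNode(value, head)
    return head


def to_list(head):
    """Hypothetical helper: collect node values into a Python list."""
    values = []
    while head:
        values.append(head.data)
        head = head.next
    return values


print(to_list(reverse(from_list([1, 2, 3, 4, 5]))))  # [5, 4, 3, 2, 1]
print(reverse(None))  # None (empty list stays empty)
```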

#### Data Science Practice Interview

* What is survivorship bias and why is it important?
* What is regularization?
* What layers make up a CNN?
* Explain how you would use experiment design to determine user behavior.
* What type of model would be used to create the Amazon "customers also bought" feature?
* What do extrapolation and interpolation mean?

Key takeaways:

* Nail down stories that walk through the various problem frameworks
* Spend more time learning how things work, rather than only how to apply them (e.g., CNNs)

---

### Data science

#### Statistical Thinking in Python, Part 2

##### Chapter 1: Parameter estimation by optimization

In [4]:
# === Imports === #
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [5]:
# === Load data === #
nohitters = pd.read_csv("assets/data/mlb_nohitters.csv")
print(nohitters.shape)
nohitters.head()

(294, 5)


Unnamed: 0,date,game_number,winning_team,losing_team,winning_pitcher
0,18760715,140,,,
1,18800612,1035,,,
2,18800617,1046,,,
3,18800819,1177,,,
4,18800820,1179,,,


In [7]:
# date holds YYYYMMDD integers; without format= pandas reads them as nanoseconds since 1970
nohitters["date"] = pd.to_datetime(nohitters["date"].astype(str), format="%Y%m%d")

In [8]:
nohitters.head()

Unnamed: 0,date,game_number,winning_team,losing_team,winning_pitcher
0,1876-07-15,140,,,
1,1880-06-12,1035,,,
2,1880-06-17,1046,,,
3,1880-08-19,1177,,,
4,1880-08-20,1179,,,


In [None]:
# === Get times between nohitters === #
# ... transformations and such here
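
The transformation itself is not recorded in this notebook. One plausible sketch, under the loud assumption that `game_number` is a running count of games played, is to take the difference between consecutive no-hitters and subtract one. The variable name `nohitter_times_sketch` and the five-row stand-in frame are made up for illustration; the real `nohitter_times` may have been built differently (for example, from a restricted date range).

```python
import numpy as np
import pandas as pd

# Stand-in for the full CSV: only the five rows shown in the head() output
sample = pd.DataFrame({
    "date": [18760715, 18800612, 18800617, 18800819, 18800820],
    "game_number": [140, 1035, 1046, 1177, 1179],
})

# Assumption: game_number counts games cumulatively, so consecutive
# differences minus one give the number of games played between no-hitters.
nohitter_times_sketch = np.diff(sample["game_number"].to_numpy()) - 1

print(nohitter_times_sketch.tolist())  # [894, 10, 130, 1]
```

Whether to subtract one depends on whether the no-hitter game itself should count toward the gap.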

In [None]:
# Seed random number generator
np.random.seed(42)

# Compute mean no-hitter time: tau
tau = np.mean(nohitter_times)

# Draw out of an exponential distribution with parameter tau: inter_nohitter_time
inter_nohitter_time = np.random.exponential(tau, 100000)

# Plot the PDF and label axes
_ = plt.hist(inter_nohitter_time,
             bins=50, density=True, histtype="step")
_ = plt.xlabel('Games between no-hitters')
_ = plt.ylabel('PDF')

# Show the plot
plt.show()
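
The following cells call `ecdf`, which is never defined in this notebook. A minimal sketch of such a helper, assuming it takes a 1-D collection and returns sorted values with their cumulative fractions, could look like this:

```python
import numpy as np


def ecdf(data):
    """Return x, y arrays for the empirical CDF of 1-D data."""
    x = np.sort(data)                      # observed values in ascending order
    y = np.arange(1, len(x) + 1) / len(x)  # fraction of points <= each x
    return x, y


x_demo, y_demo = ecdf([3, 1, 2, 2])
print(x_demo.tolist(), y_demo.tolist())  # [1, 2, 2, 3] [0.25, 0.5, 0.75, 1.0]
```

Plotting `y` against `x` as points (no connecting line) gives the usual ECDF picture used in the overlay cells.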

In [None]:
# Create an ECDF from real data: x, y
x, y = ecdf(nohitter_times)

# Create a CDF from theoretical samples: x_theor, y_theor
x_theor, y_theor = ecdf(inter_nohitter_time)

# Overlay the plots
plt.plot(x_theor, y_theor)
plt.plot(x, y, marker=".", linestyle="none")

# Margins and axis labels
plt.margins(.02)
plt.xlabel('Games between no-hitters')
plt.ylabel('CDF')

# Show the plot
plt.show()

In [None]:
# Plot the theoretical CDFs
plt.plot(x_theor, y_theor)
plt.plot(x, y, marker='.', linestyle='none')
plt.margins(0.02)
plt.xlabel('Games between no-hitters')
plt.ylabel('CDF')

# Take samples with half tau: samples_half
samples_half = np.random.exponential(tau/2, 10000)

# Take samples with double tau: samples_double
samples_double = np.random.exponential(tau*2, 10000)

# Generate CDFs from these samples
x_half, y_half = ecdf(samples_half)
x_double, y_double = ecdf(samples_double)

# Plot these CDFs as lines
_ = plt.plot(x_half, y_half)
_ = plt.plot(x_double, y_double)

# Show the plot
plt.show()

---

### Engineering

* SQL
* PySpark
* APIs
* Test-driven development
* Object-oriented practice

---

### Machine learning

* Scikit-learn
* TensorFlow / PyTorch

---
---

## Meta-data: reading & writing

* Blog post
* Social media discussion

---
---

## 2-Hour Job Search

* LAMP List
* Networking

I had one informational interview today with a colleague at a company that currently has an open position I am interested in applying for.