# 20-06-24: Daily Data Practice

### 176/366

---

## Daily Practices

* Practice with DS/ML tools and processes
  * [fast.ai course](https://course.fast.ai/)
  * Hands-on ML | NLP In Action | Dive Into Deep Learning | Coursera / guided projects
    * Read, code along, take notes
    * _test yourself on the concepts_ — i.e. do all the chapter exercises
  * Try to hit benchmark accuracies with [UCI ML datasets](https://archive.ics.uci.edu/ml/index.php) or Kaggle
* Coding & Problem Solving Practice
  * HackerRank SQL or Packt SQL Data Analytics
  * Python on HackerRank or similar platform
* Meta Data: Review and write
  * Focus on a topic, review notes and resources, write a blog post about it
* 2-Hour Job Search
* Interviewing
  * Behavioral questions and scenarios
  * Business case walk-throughs
  * Hot-seat DS-related topics for recall practice (under pressure)

---

## Coding & Problem Solving Practice

> Work through practice problems on HackerRank or similar

### Python

#### [Time Conversion](https://www.hackerrank.com/challenges/time-conversion/problem)

Given a time in 12-hour AM/PM format, convert it to military (24-hour) time.

Note: Midnight is 12:00:00AM on a 12-hour clock, and 00:00:00 on a 24-hour clock. Noon is 12:00:00PM on a 12-hour clock, and 12:00:00 on a 24-hour clock. 

```python
# === Sample input test === #
time_string = "07:05:45PM"  # Output should be: "19:05:45"
```

```python
# === Parse the string === #
am_pm = time_string[-2:]  # Get AM/PM
time = time_string[:-2].split(":")  # Split into components
time = [int(x) for x in time]  # Convert to integers
hour, minute, sec = time  # `minute` avoids shadowing the built-in `min`
print(hour, minute, sec)
```

```
7 5 45
```

```python
# === Reconstruct as military time === #
if am_pm == "PM":
    hour += 12

hour = str(hour).zfill(2)
minute = str(minute).zfill(2)
sec = str(sec).zfill(2)

time_24 = f"{hour}:{minute}:{sec}"
time_24
```

```
'19:05:45'
```

```python
# === Bring it together (first attempt) === #
def timeConversion(s):
    """Converts time string from 12-hour to 24-hour time format."""
    # === Parse the string === #
    am_pm = s[-2:]  # Get AM/PM
    time = s[:-2].split(":")  # Split into components
    hour = int(time[0])  # Only will be changing the hour
    # === Reconstruct as military time === #
    if am_pm == "PM":  # Only change if PM (12 AM / 12 PM not handled yet)
        hour += 12
    # Use the parsed minute and second strings, not outer-scope variables
    return f"{str(hour).zfill(2)}:{time[1]}:{time[2]}"
```

```python
# === Bring it together === #
def timeConversion(s):
    """Converts time string from 12-hour to 24-hour time format."""
    # === Parse the string === #
    am_pm = s[-2:]  # Get AM/PM
    time = s[:-2].split(":")  # Split into components
    hour = int(time[0])  # Only will be changing the hour
    # === Reconstruct as military time === #
    if am_pm == "PM" and hour < 12:  # 12 PM (noon) stays 12
        hour += 12
    elif am_pm == "AM" and hour == 12:  # 12 AM (midnight) becomes 00
        hour = 0
    # No f-string version
    time[0] = str(hour).zfill(2)
    return ":".join(time)
```

```python
# === Test cases === #
print(timeConversion("10:10:38AM"))  # Normal case
print(timeConversion("12:00:00AM"))  # Special case 1: midnight
print(timeConversion("12:00:00PM"))  # Special case 2: noon
```

```
10:10:38
00:00:00
12:00:00
```
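
As a cross-check on the manual parsing above, the same conversion can be sketched with the standard library's `datetime` module. This is an alternative, not the intended HackerRank solution. The function name `time_conversion_strptime` is made up for this note, and parsing `%p` assumes an English-style AM/PM locale.

```python
from datetime import datetime


def time_conversion_strptime(s):
    """Convert a 12-hour time string like '07:05:45PM' to 24-hour format."""
    # %I = hour on a 12-hour clock, %p = AM/PM marker (locale-dependent)
    parsed = datetime.strptime(s, "%I:%M:%S%p")
    # %H = zero-padded hour on a 24-hour clock
    return parsed.strftime("%H:%M:%S")


for sample in ["07:05:45PM", "12:00:00AM", "12:00:00PM"]:
    print(sample, "->", time_conversion_strptime(sample))
```

Letting `strptime` do the work means the midnight and noon special cases need no explicit branches.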


#### [Designer PDF Viewer](https://www.hackerrank.com/challenges/designer-pdf-viewer/problem)

For next time.

### SQL

Applied SQL Analytics workshop on Packt

#### Common Table Expressions

* Similar to subqueries
* Create temporary tables using the `WITH` clause
* One advantage is that common table expressions can be recursive
  * Recursive common table expressions can reference themselves
  
```SQL
WITH d AS (
  SELECT *
  FROM dealerships
  WHERE dealerships.state = 'CA'
)
SELECT *
FROM salespeople
INNER JOIN d ON d.dealership_id = salespeople.dealership_id
ORDER BY 1;
```
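
To see the self-reference in action, here is a minimal sketch of a recursive CTE run through Python's built-in `sqlite3` module. SQLite is an assumption made only to keep the example self-contained (the workshop's database may differ), and the `counter` CTE name is hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")

query = """
WITH RECURSIVE counter(n) AS (
  SELECT 1                      -- anchor row
  UNION ALL
  SELECT n + 1 FROM counter     -- the CTE references itself
  WHERE n < 5                   -- stop condition
)
SELECT n FROM counter;
"""

numbers = [row[0] for row in conn.execute(query)]
print(numbers)
conn.close()
```

The anchor `SELECT` seeds the result, and the self-referencing `SELECT` keeps adding rows until the `WHERE` condition stops it.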

#### Transforming Data

Process the query output data, such as removing or substituting values, or mapping values to other values.

##### The `CASE WHEN` Function

`CASE WHEN` is a conditional expression (often loosely called a function) that maps values in a column to other values. Here's the general form:

```SQL
CASE WHEN condition1 THEN value1
WHEN condition2 THEN value2
…
WHEN conditionX THEN valueX
ELSE else_value END;
```

Example:

* Return all rows for customers from the customers table
* Add a column `customer_type` that labels a user as:
  * Elite Customer type if they live in postal code 33111
  * Premium Customer type if they live in postal code 33124
  * Otherwise, it will mark the customer as a Standard Customer type.

```SQL
SELECT *,
  CASE WHEN postal_code='33111' THEN 'Elite Customer'
  WHEN postal_code='33124' THEN 'Premium Customer'
  ELSE 'Standard Customer' END
 AS customer_type
FROM customers;
```

Exercise 2.03:

* Customers from the states of MA, NH, VT, ME, CT, or RI
  * Label as New England
* Customers from the states of GA, FL, MS, AL, LA, KY, VA, NC, SC, TN, VI, WV, or AR
  * Label as Southeast
* Customers from any other state should be labeled as Other

```SQL
SELECT c.customer_id,
  CASE WHEN c.state IN ('MA', 'NH', 'VT', 'ME', 'CT', 'RI') THEN 'New England'
       WHEN c.state IN ('GA', 'FL', 'MS', 'AL', 'LA', 'KY', 'VA', 'NC', 'SC', 'TN', 'VI', 'WV', 'AR') THEN 'Southeast'
       ELSE 'Other' END AS region
FROM customers c
ORDER BY 1;
```
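
A quick way to sanity-check the exercise query is to run it against a toy table. The sketch below uses Python's built-in `sqlite3` module. The four sample rows are invented for illustration, and SQLite stands in for whatever database the workshop uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, state TEXT)")
# Hypothetical sample rows, including one customer with no state recorded
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "MA"), (2, "FL"), (3, "CA"), (4, None)],
)

query = """
SELECT c.customer_id,
  CASE WHEN c.state IN ('MA', 'NH', 'VT', 'ME', 'CT', 'RI') THEN 'New England'
       WHEN c.state IN ('GA', 'FL', 'MS', 'AL', 'LA', 'KY', 'VA', 'NC', 'SC',
                        'TN', 'VI', 'WV', 'AR') THEN 'Southeast'
       ELSE 'Other' END AS region
FROM customers c
ORDER BY 1;
"""

# Each row is (customer_id, region), so dict() maps id -> region
regions = dict(conn.execute(query).fetchall())
print(regions)
conn.close()
```

Note that a `NULL` state matches neither `IN` list, so it falls through to the `ELSE` branch and is labeled `Other`.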

---

## 2-Hour Job Search

### Executive Summary

#### Extras

As a dedicated and passionate life-long learner, I take pride in my ability to learn both quickly and deeply as needed to accomplish specific goals.

I adapt readily, thriving in interdisciplinary environments where I can apply my wide range of expertise in harnessing data to uncover truths, tell stories and generate value.

I hold a BS in Economics from Cal Poly, SLO, where I was a Div I student-athlete, entrepreneur, musician, and writer.

developing and deploying production-grade software systems with cutting-edge tools and processes.

#### Edit 3

As a dedicated and passionate life-long learner, I take pride in my ability to learn both quickly and deeply as needed to accomplish specific goals. Collaborative, with a flexible, broad mind and a persistent growth mindset, I thrive in both independent and interdisciplinary team environments. I hope to bring expertise in solving complex problems across all aspects of developing and deploying production-grade machine learning systems using cutting-edge tools and processes.

#### Edit 4

As a dedicated and passionate life-long learner, I take pride in my ability to learn both quickly and deeply as needed to accomplish specific goals. Collaborative yet independent, with a flexible, broad mind and a persistent growth mindset, I adapt readily and thrive in interdisciplinary environments where I can apply my wide range of expertise in harnessing data to tell stories and generate value.

#### Edit 5

As a dedicated and passionate life-long learner, I take pride in my ability to learn both quickly and deeply as needed. I hold a BS in Economics from Cal Poly, SLO, where I was a Div I student-athlete, entrepreneur, musician, and writer. Collaborative yet independent, with a flexible, broad mind and a persistent growth mindset, I adapt readily and thrive in interdisciplinary environments where I can apply my wide range of expertise in harnessing data to tell stories and generate value.

#### Edit 6

Data professional with a BS in Economics and years of experience building complex, data-driven systems. Collaborative yet independent, with a flexible, broad mind and a persistent growth mindset, I adapt readily and thrive in interdisciplinary environments where I can apply my wide range of expertise in harnessing data to uncover truths, tell stories and generate value.

#### Edit 7

Collaborative yet independent data professional with a BS in Economics and years of experience working on interdisciplinary teams building complex, data-driven software systems. Applied a wide range of technical and interpersonal expertise to successfully implement large manufacturing ERP systems and develop and deploy production-grade Python web apps and APIs serving recommender systems and computer vision models.

---

## Recommender Systems (book)

Three key methods that form the fundamental pillars of research:

* Collaborative filtering
* Content-based
* Knowledge-based

The basic idea of recommender systems is to use various sources of data to infer customer (user) interests.

* _user_: the entity to which the rec is provided
* _item_: the product being recommended

Recommendation analysis is often based on previous interactions between users and items, except in the case of knowledge-based recommender systems, which consider explicit user requirements.

> The basic principle of recommendations: significant dependencies exist between user- and item-centric activity.

### Goals of Recommender Systems

Two primary models for defining the recommendation problem:

* Prediction
  * Predict the rating value for a user-item combination
* Ranking
  * Recommend top-k items for a user, or less commonly, top-k users for an item
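
The two formulations connect naturally: once ratings are predicted, ranking is a sort. A minimal sketch, assuming predicted ratings for one user are already available (the item names, scores, and the `top_k` helper are all hypothetical):

```python
# Hypothetical predicted ratings for one user (item -> predicted rating)
predicted = {"item_a": 3.2, "item_b": 4.7, "item_c": 4.1, "item_d": 2.5}
already_rated = {"item_c"}  # items the user has already interacted with


def top_k(predictions, k, exclude=()):
    """Return the k highest-scoring items that are not in `exclude`."""
    candidates = [
        (item, score) for item, score in predictions.items() if item not in exclude
    ]
    # Sort by predicted rating, highest first
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in candidates[:k]]


recommendations = top_k(predicted, k=2, exclude=already_rated)
print(recommendations)
```

Only the relative order matters for the ranking version, so a model can rank well even if its absolute rating predictions are off.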
  
The common operational and technical goals of a recommender system:

* Relevance
* Novelty: new; something the user didn't know about
* Serendipity: somewhat unexpected recommendations - element of luck - surprising to the user
* Increasing recommendation diversity
  * When all recs are similar, increases risk of user not liking any of the items

### Basic Models of Recommender Systems

The basic models work with two kinds of data:

* User-item interactions (collaborative filtering)
* Attribute information about the users and items (content-based)

Hybrid systems can combine the strengths of various types of systems.

#### Collaborative Filtering Models

* Use the collaborative power of ratings provided by multiple users to make recommendations.
* Main challenge is that the ratings matrices are sparse
* Most models leverage inter-item correlations and/or inter-user correlations for prediction

Two types of methods commonly used in collaborative filtering:

* Memory-based methods
  * AKA neighborhood-based algorithms
  * Neighborhoods defined by:
    * User-based: similarity between rows to find similar users
    * Item-based: similarity between columns to find similar items
  * Pros
    * Simple
    * Easy to explain
  * Cons
    * Do not work well with sparse matrices
* Model-based methods
  * Machine learning and data mining
  * Pros
    * High level of coverage even with sparse matrices
  * Cons
    * More complex
    * Require more data (?)
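
To make the memory-based idea concrete, here is a rough sketch of a user-based neighborhood prediction. The ratings are invented, and plain cosine similarity over co-rated items is a simplification chosen for brevity. Practical systems usually mean-center ratings first and keep only the most similar neighbors.

```python
from math import sqrt

# Hypothetical ratings matrix stored sparsely: user -> {item: rating}
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    numerator = denominator = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine(ratings[user], their_ratings)
        numerator += sim * their_ratings[item]
        denominator += abs(sim)
    # None signals that no neighbor has rated the item
    return numerator / denominator if denominator else None


estimate = predict("alice", "m4")
print(round(estimate, 2))
```

Alice's tastes are closer to Bob's than Carol's, so the estimate lands nearer Bob's rating of 4 than Carol's 2. The sparsity problem is also visible here: with few co-rated items, the similarities rest on very little evidence.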

##### Unary Ratings

Unary ratings are ratings with only one direction, i.e. only a "like" with no "dislike". This applies naturally to implicit feedback such as purchasing an item: just because the user did not purchase another item does not mean they dislike it.

##### Missing Value Analysis

Collaborative filtering can be thought of as a difficult/special case of missing value analysis, which is concerned with imputing values in an incomplete data matrix.

##### Generalized Classification and Regression

Collaborative filtering can also be viewed as a generalization of classification and regression modeling. In classification and regression, the target is a single attribute (column) with missing values. In collaborative filtering, any column can have missing values, and there is no clear line between class and feature variables. There is also no distinction between training and test rows, because any row might have missing values: the split is between training and test _entries_ rather than rows.

#### Content-Based Recommender Systems

The descriptive attributes (content) of items (in combination with the ratings and buying behavior) are used to make recommendations. For each user, the training data is the item content of the items she has bought/rated, with those ratings or buying behavior being the target.

One key advantage of content-based is the ability to make recommendations for new items that don't have rating data.
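
A small sketch of why new items are not a problem here. It uses a simple rating-weighted profile as the per-user model rather than a trained classifier or regressor, and the genre vectors, ratings, and helper names are all invented for illustration.

```python
# Hypothetical item attributes: indicator vectors for (action, comedy, drama)
item_features = {
    "m1": [1, 0, 0],
    "m2": [0, 1, 0],
    "m3": [1, 0, 1],
    "new_movie": [1, 0, 1],  # brand-new item with no ratings from anyone
}
# One user's ratings: the only rating data this approach needs
user_ratings = {"m1": 5, "m2": 1, "m3": 4}


def build_profile(rated, features):
    """Rating-weighted average of the feature vectors of the rated items."""
    dims = len(next(iter(features.values())))
    total = sum(rated.values())
    return [
        sum(rating * features[item][d] for item, rating in rated.items()) / total
        for d in range(dims)
    ]


def score(profile, vector):
    """Dot product between the user profile and an item's feature vector."""
    return sum(p * x for p, x in zip(profile, vector))


profile = build_profile(user_ratings, item_features)
new_score = score(profile, item_features["new_movie"])
comedy_score = score(profile, [0, 1, 0])  # a hypothetical pure-comedy item
print(profile, new_score, comedy_score)
```

The new movie gets a score purely from its attributes, while the same setup has nothing to offer a user with an empty `user_ratings`.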

Disadvantages:

* More likely to provide obvious recommendations (not serendipitous; reduces rec diversity)
* Not good at providing recommendations to new users

For the latter (the cold-start problem for new users), knowledge-based systems are a better fit.

#### Knowledge-Based Recommender Systems

Recommendations are based on similarities between user requirements and item descriptions/content.

Useful with items that are not purchased often: houses, cars, etc. or in the case of the cold-start problem, when ratings data is not available.

Types of knowledge-based recsys:

* Constraint-based
  * Users specify requirements or constraints on item attributes
  * Ways to guide the search
    * Conversational: preferences determined iteratively with a feedback loop
    * Search-based: use a preset sequence of questions
    * Navigation-based: user makes iterative change requests to the recommendation
* Case-based
  * Specific cases are chosen by the user as targets/anchors, then similarity to those is calculated
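
Both flavors can be sketched without any ratings data. The catalog, attribute names, and the range-normalized distance below are assumptions chosen for illustration, not the book's formulation.

```python
# Hypothetical catalog of infrequently purchased items
cars = [
    {"name": "hatchback", "price": 18000, "seats": 5, "mpg": 34},
    {"name": "sedan", "price": 24000, "seats": 5, "mpg": 30},
    {"name": "minivan", "price": 32000, "seats": 7, "mpg": 22},
    {"name": "coupe", "price": 28000, "seats": 2, "mpg": 27},
]


def constraint_based(items, max_price, min_seats):
    """Keep only the items that satisfy the user's stated requirements."""
    return [
        item["name"]
        for item in items
        if item["price"] <= max_price and item["seats"] >= min_seats
    ]


def case_based(items, anchor_name, attributes=("price", "seats", "mpg")):
    """Rank the other items by closeness to the anchor item chosen by the user."""
    anchor = next(item for item in items if item["name"] == anchor_name)
    # Scale each attribute by its range so price does not dominate the distance
    spans = {
        a: (max(i[a] for i in items) - min(i[a] for i in items)) or 1
        for a in attributes
    }

    def distance(item):
        return sum(abs(item[a] - anchor[a]) / spans[a] for a in attributes)

    others = [item for item in items if item["name"] != anchor_name]
    return [item["name"] for item in sorted(others, key=distance)]


print(constraint_based(cars, max_price=25000, min_seats=4))
print(case_based(cars, "sedan"))
```

In a conversational or navigation-based interface, the user would adjust `max_price` or pick a new anchor and the same functions would simply be rerun.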
  
Knowledge-based systems, because they also depend heavily on the item attributes, also fall prey to the issue of providing "obvious" recommendations.

##### Utility-Based Recommender Systems

A utility function is used to compute the probability of a user liking the item. The main challenge is defining an appropriate utility function for the active user.

A quick note: all recommender systems implicitly rank recommendations on their utility for the target user.
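
One simple form a utility function can take is a weighted sum of item attributes. In the sketch below the attribute scores and weights are invented, and the result is an unnormalized score rather than a calibrated probability.

```python
# Hypothetical attribute scores, already scaled to the range [0, 1]
items = {
    "hatchback": {"affordability": 0.9, "efficiency": 0.8, "space": 0.4},
    "minivan": {"affordability": 0.3, "efficiency": 0.3, "space": 1.0},
    "sedan": {"affordability": 0.6, "efficiency": 0.6, "space": 0.6},
}
# Hypothetical weights describing what this particular user cares about
weights = {"affordability": 0.5, "efficiency": 0.3, "space": 0.2}


def utility(attributes, user_weights):
    """Weighted sum of attribute scores for one item."""
    return sum(user_weights[name] * attributes[name] for name in user_weights)


ranked = sorted(items, key=lambda name: utility(items[name], weights), reverse=True)
print(ranked)
```

Choosing the weights is the hard part: a user who prioritizes `space` would see the minivan move up the list.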

#### Hybrid and Ensemble-Based Recommender Systems

All three of the above methods use different sources of input, work best in different situations, and have different sets of strengths and weaknesses. With a variety of input sources comes the possibility, and practicality, of combining multiple methods to get the best of all worlds.
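
One of the simplest ways to combine methods is a weighted blend of their scores. This is a sketch under assumptions: the scores are invented, both lists are taken to be on the same 0 to 1 scale, and `alpha` is a hypothetical tuning parameter.

```python
# Hypothetical normalized scores from two separate recommenders
collab_scores = {"m1": 0.9, "m2": 0.4}  # m3 is new, so it has no ratings signal
content_scores = {"m1": 0.5, "m2": 0.6, "m3": 0.8}


def weighted_hybrid(a, b, alpha=0.5):
    """Blend two score dictionaries; an item missing from one side counts as 0."""
    all_items = set(a) | set(b)
    return {
        item: alpha * a.get(item, 0.0) + (1 - alpha) * b.get(item, 0.0)
        for item in all_items
    }


combined = weighted_hybrid(collab_scores, content_scores, alpha=0.6)
best = max(combined, key=combined.get)
print(best, combined)
```

The new item `m3` still receives a score through the content-based side, which is the kind of weakness-covering a hybrid is meant to provide.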

---

## DS + ML Practice

### Some options

* Pick a dataset and try to do X with it
  * Try to hit benchmark accuracies with [UCI ML datasets](https://archive.ics.uci.edu/ml/index.php) or Kaggle
* Practice with the common DS/ML tools and processes
  * Hands-on ML | NLP In Action | Dive Into Deep Learning | Coursera / guided projects
  * Machine learning flashcards

#### _The goal is to be comfortable explaining the entire process._

* Data access / sourcing, cleaning
  * Exploratory data analysis
  * Data wrangling techniques and processes
* Inference
  * Statistics
  * Probability
  * Visualization
* Modeling
  * Implement + justify choice of model / algorithm
  * Track performance + justify choice of metrics
    * Communicate results as relevant to the goal

---

## Interviewing

> Practice answering the most common behavioral and technical interview questions

### Technical

* Business case walk-throughs
* Hot-seat DS-related topics for recall practice (under pressure)

### Behavioral

* "Tell me a bit about yourself"
* "Tell me about a project you've worked on and are proud of"
* "What do you know about our company?"
* "Where do you see yourself in 3-5 years?"
* "Why do you want to work here / want this job?"
* "What makes you most qualified for this role?"
* "What is your greatest strength/weakness?"
  * "What is your greatest technical strength?"
* "Tell me about a time when you had conflict with someone and how you handled it"
* "Tell me about a mistake you made and how you handled it"
* Scenario questions (STAR: situation, task, action, result)
  * Success story / biggest accomplishment
  * Greatest challenge (overcome)
  * Persuaded someone who did not agree with you
  * Dealt with and resolved a conflict (among team members)
  * Led a team / showed leadership skills or aptitude
  * How you've dealt with stress / stressful situations
  * Most difficult problem encountered in previous job; how you solved it
  * Solved a problem creatively
  * Exceeded expectations to get a job done
  * Showed initiative
  * Something that's not on your resume
  * Example of important goal you set and how you reached it
  * A time you failed
* "Do you have any questions for me?"
  * What is your favorite aspect of working here?
  * What has your journey looked like at the company?
  * What are some challenges you face in your position?