# Technical Takehome for ID.Me

**Instructions**: The narrative below presents a hypothetical situation where you are an analytics engineer at a SAAS startup and poses some scenarios in a chronological order. Use your previous work experience and judgment to augment the details provided. Build your own narrative around the fictional SAAS company and your role there, and then detail how you would build an analytics foundation that solves their data needs.

Answers do not need to be isolated to the immediate preceding prompt; you may use previously given information as well as context from your previous answers.

DogDB is a California-based company that has developed a novel NoCATS database and offers a managed, hosted solution as a monthly SAAS subscription with free, medium (`$50/mo`), and enterprise (`$500/mo`) tiers. DogDB is seeing its number of customer accounts skyrocket (“up and to the right”) and has hired you as the first dedicated analytics engineer to help them understand and scale their data capabilities, in anticipation of an incipient funding round.

DogDB sells their service through a web-facing rails application. Here, a DogDB customer can sign up for an account, choose a pricing tier, and configure their NoCATS deployment. The accounting settings and configurations are stored in a PostgreSQL database.

In [41]:
# Python packages for SQLAlchemy, PostgreSQL and ipython-sql
import sqlalchemy
import psycopg2
%load_ext sql

engine = sqlalchemy.create_engine('postgresql://postgres:t!iv^8G@localhost:5432/dogdb')
%sql $engine.url



'Connected: postgres@dogdb'

In [39]:
%sql select * from customer_accounts

 * postgresql://postgres:***@localhost:5432/dogdb
3 rows affected.


account_id,email,current_tier,created_at,updated_at
1,abc@123.com,Free,2019-05-01 21:13:05.156042,2019-05-01 21:13:11.804514
2,123@abc.com,Medium,2019-07-12 16:05:02.414454,2020-01-04 17:23:05.594305
3,hello@world.com,Enterprise,2019-07-23 12:26:47.571431,2019-09-17 04:32:32.493065


In [40]:
%sql select * from customer_interactions

 * postgresql://postgres:***@localhost:5432/dogdb
4 rows affected.


account_id,channel,category,service_rep,status,created_at,completed_at
1,web,Tech Support,Andy,resolved,2021-01-25 19:11:35.295813,2021-01-25 19:13:52.812371
1,email,Billing,Jillian,open,2021-04-06 22:23:09.581234,
3,web,Billing,Monica,resolved,2021-11-13 06:25:54.821374,2021-11-15 12:19:33.882136
7,phone,Account Change,Derek,canceled,2022-02-14 15:02:47.219352,2022-02-20 09:22:48.145523


In [35]:
%sql select * from customer_licenses

 * postgresql://postgres:***@localhost:5432/dogdb
4 rows affected.


account_id,license_data,created_at,updated_at
1,"{“license_id”:“d17cb11cda9ba249c22f67e4aed65d0f65f1a80c”,“role”: “analyst”,“status”: “active”}",2022-03-12 02:56:37.652093,2022-03-12 02:56:37.652093
6,"{“license_id”:“be49ad8f4a68fbbdd1674b41da20759f54b0e930”,“role”: “developer”,“status”: “active”}",2021-05-28 04:42:58.955093,2021-05-28 04:42:58.955093
6,"{“license_id”:“8541866bb3a4c4ecf070b2c1b2f7bb9c0934d287”,“role”: “admin”,“status”: “active”}",2022-10-30 21:33:46.353060,2022-10-30 21:33:46.353060
35,"{“license_id”:“60831f59a531eef325e525ad58bae0e5e8c2d75a”,“role”: “developer”,“status”: “disabled”}",2021-03-26 02:38:02.136033,2022-07-21 23:03:29.862040


## Question 1:

Based on the table design above, what are your initial thoughts about DogDB’s data tracking? What are some of the advantages (if any) of their data models, and what are the shortcomings (if any) you foresee in DogDB’s future?

### Answer:

The current DogDB model adequately captures the basics the organization needs: who the customers are, how their interactions are handled, and which accounts hold a license. Each table carries an `account_id`, which makes joins across the data straightforward.

However, I see several deficiencies as the analytics engineer responsible for the health of this ecosystem:

1. **Issues with Primary Keys** The `customer_interactions` and `customer_licenses` tables lack primary keys, which are integral to defining the grain of each table. Without a unique identifier, duplicate rows can slip in and throw off analytic aggregations and counts. Most relational databases, PostgreSQL included, also index primary keys automatically, which can speed up joins and lookups on these tables.


2. **Issues with Data Types** The key information in the `customer_licenses` table is stored as a JSON blob, which is awkward to query and validate in SQL. The blob holds key indicators, such as each license's `status`, along with a `license_id` that could make each row in the table unique. The business needs this information to understand _which customers have active or multiple licenses_ for functionality, billing and access to services. The `status` field, combined with the differing `role` values, implies that licenses gate functionality, so mistakes here could create security, privacy or access vulnerabilities in the application.


3. **Issues with `account_id`** Given the company's scaling ambitions, the key identifier `account_id` is a plain integer that presumably increases with each new row. For a fast-scaling organization, this is a short-sighted choice. Key identifiers such as `account_id` should either use an alphanumeric format or be declared as an identity column so that the database generates the values itself, as in [this PostgreSQL example](https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-identity-column/).


4. **Tables with Conflicting Purposes** In schema design, it's easy to end up with tables that try to solve several problems at once. Here the `customer_accounts` table manages both customer uniqueness _and_ that customer's pricing tier. Pricing plans are an important differentiator of customer status and deserve their own table. What if pricing needs to change in the future? What if a customer wishes to upgrade or purchase new services? These questions show why payment terms should live in their own table rather than being tied to `customer_accounts`. Similarly, the `customer_interactions` table should focus solely on interactions with customers, with service representatives moved to their own table. This allows more information to be collected per ticket, while details about each service representative live elsewhere.
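
To illustrate the fourth point, here is a minimal sketch of pricing kept in its own table. It runs on an in-memory SQLite database purely as a stand-in for PostgreSQL (an assumption made so the cell is self-contained). The table and column names are placeholders for whatever the final schema uses, and the new price of `60` is hypothetical.

```python
import sqlite3

# In-memory SQLite is a stand-in for PostgreSQL here (assumption for portability).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_pricing_plans (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        monthly_charge INTEGER NOT NULL
    );
    CREATE TABLE customer_accounts_new (
        id INTEGER PRIMARY KEY,
        email TEXT NOT NULL,
        pricing_plan_id INTEGER NOT NULL REFERENCES customer_pricing_plans (id)
    );
    INSERT INTO customer_pricing_plans VALUES
        (1, 'Free', 0), (2, 'Medium', 50), (3, 'Enterprise', 500);
    INSERT INTO customer_accounts_new VALUES
        (1, 'abc@123.com', 1), (2, '123@abc.com', 2), (3, 'hello@world.com', 3);
    """
)

# Monthly recurring revenue comes from joining accounts to their plan.
MRR_QUERY = """
    SELECT SUM(p.monthly_charge)
    FROM customer_accounts_new AS a
    JOIN customer_pricing_plans AS p ON p.id = a.pricing_plan_id
"""
mrr_before = conn.execute(MRR_QUERY).fetchone()[0]

# A price change touches one row in the plans table, not every account row.
conn.execute("UPDATE customer_pricing_plans SET monthly_charge = 60 WHERE name = 'Medium'")
mrr_after = conn.execute(MRR_QUERY).fetchone()[0]

print(mrr_before, mrr_after)  # 550 560
```

With the tier stored as free text on the account row, the same change would mean hard-coding prices in every query that needs them.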

Overall, I believe the current structure of these tables could lead to inaccurate counts and analytics while also overloading tables with unrelated information. This goes hand in hand with not being equipped to scale. In this scenario, scaling is an important factor for the data model, so the key tables should have room to grow while maintaining a clear delineation of responsibilities.
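
The risk of inaccurate counts can be shown with a small sketch. It again uses in-memory SQLite as a stand-in for PostgreSQL (an assumption), with hypothetical table names `interactions_no_pk` and `interactions_with_pk`. The retried write is an invented scenario, and the key only rejects the second write because the writer supplies the same `id` both times.

```python
import sqlite3

# In-memory SQLite is a stand-in for PostgreSQL here (assumption).
conn = sqlite3.connect(":memory:")

# Shaped like customer_interactions: no primary key, so nothing stops duplicates.
conn.execute(
    "CREATE TABLE interactions_no_pk "
    "(account_id INTEGER, channel TEXT, category TEXT, created_at TEXT)"
)
ticket = (1, "web", "Tech Support", "2021-01-25 19:11:35.295813")

# The same ticket written twice, e.g. by a retried insert (hypothetical scenario).
conn.execute("INSERT INTO interactions_no_pk VALUES (?, ?, ?, ?)", ticket)
conn.execute("INSERT INTO interactions_no_pk VALUES (?, ?, ?, ?)", ticket)
count_without_pk = conn.execute("SELECT COUNT(*) FROM interactions_no_pk").fetchone()[0]

# Same columns plus a primary key supplied by the writer.
conn.execute(
    "CREATE TABLE interactions_with_pk "
    "(id INTEGER PRIMARY KEY, account_id INTEGER, channel TEXT, category TEXT, created_at TEXT)"
)
conn.execute("INSERT INTO interactions_with_pk VALUES (1, ?, ?, ?, ?)", ticket)
try:
    # The retry reuses id 1, so the database refuses it.
    conn.execute("INSERT INTO interactions_with_pk VALUES (1, ?, ?, ?, ?)", ticket)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
count_with_pk = conn.execute("SELECT COUNT(*) FROM interactions_with_pk").fetchone()[0]

print(count_without_pk, count_with_pk, duplicate_rejected)  # 2 1 True
```

An auto-generated key would not block the retry on its own, but it would still give each row an identity that a deduplication step can work with.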

Below, I've used PostgreSQL to create new tables that address these needs.

* `customer_accounts_new` now is entirely focused on Customer data
* `customer_pricing_plans` houses the Customer pricing plans with a numerical `monthly_charge` column to reflect prices in the prompt
* `customer_interactions_new` has its own Primary Key and allows Service reps to be in their own table
* `service_representatives` exists to track support agents that could leave the org or need more detail
* `customer_licenses_new` gains the most from breaking out the JSON data into their own columns, restoring its uniqueness with the `license_id` now becoming its primary key
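
As a rough sketch of how the JSON in `license_data` could be broken out into columns, the snippet below flattens the four sample rows in plain Python. It assumes the stored values are valid JSON with straight double quotes, and `flatten_license` is a hypothetical helper name, not something taken from the DogDB codebase.

```python
import json

# (account_id, license_data) pairs copied from customer_licenses.
# Assumption: the stored blobs are valid JSON with straight double quotes.
raw_rows = [
    (1, '{"license_id": "d17cb11cda9ba249c22f67e4aed65d0f65f1a80c", "role": "analyst", "status": "active"}'),
    (6, '{"license_id": "be49ad8f4a68fbbdd1674b41da20759f54b0e930", "role": "developer", "status": "active"}'),
    (6, '{"license_id": "8541866bb3a4c4ecf070b2c1b2f7bb9c0934d287", "role": "admin", "status": "active"}'),
    (35, '{"license_id": "60831f59a531eef325e525ad58bae0e5e8c2d75a", "role": "developer", "status": "disabled"}'),
]


def flatten_license(account_id, license_data):
    """Turn one JSON blob into a flat row keyed by the license id (hypothetical helper)."""
    payload = json.loads(license_data)
    return {
        "id": payload["license_id"],
        "account_id": account_id,
        "role": payload["role"],
        "status": payload["status"],
    }


flattened = [flatten_license(account_id, blob) for account_id, blob in raw_rows]

# The license id can only serve as a primary key if it never repeats.
license_ids = [row["id"] for row in flattened]
ids_are_unique = len(set(license_ids)) == len(license_ids)

print(len(flattened), ids_are_unique)  # 4 True
```

A missing key raises a `KeyError` here, which is one way malformed blobs would surface during a backfill rather than silently producing empty columns.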

In [42]:
%sql select * from customer_accounts_new

 * postgresql://postgres:***@localhost:5432/dogdb
6 rows affected.


id,email,pricing_plan_id,created_at,updated_at
1,abc@123.com,1,2019-05-01 21:13:05.156042,2019-05-01 21:13:11.804514
2,123@abc.com,2,2019-07-12 16:05:02.414454,2020-01-04 17:23:05.594305
3,hello@world.com,3,2019-07-23 12:26:47.571431,2019-09-17 04:32:32.493065
6,ilovedogs@dogs.com,1,2024-01-01 21:13:11.804514,2024-01-01 21:13:11.804514
7,fakedatarules@gophermail.com,2,2024-02-16 17:23:05.594305,2024-02-16 17:23:05.594305
35,skidboot@heelermail.com,3,2024-03-08 04:32:32.493065,2024-03-08 04:32:32.493065


In [46]:
%sql select * from customer_pricing_plans

 * postgresql://postgres:***@localhost:5432/dogdb
3 rows affected.


id,name,monthly_charge,created_date,updated_date
1,Free,0,2019-07-12,2019-07-12
2,Medium,50,2019-07-12,2019-07-12
3,Enterprise,500,2019-07-23,2019-07-23


In [43]:
%sql select * from customer_interactions_new

 * postgresql://postgres:***@localhost:5432/dogdb
4 rows affected.


id,account_id,channel,category,service_rep_id,status,created_at,updated_at
1,1,web,Tech Support,1,resolved,2021-01-25 19:11:35.295813,2021-01-25 19:13:52.812371
2,1,email,Billing,2,open,2021-04-06 22:23:09.581234,
3,3,web,Billing,3,resolved,2021-11-13 06:25:54.821374,2021-11-15 12:19:33.882136
4,7,phone,Account Change,4,canceled,2022-02-14 15:02:47.219352,2022-02-20 09:22:48.145523


In [45]:
%sql select * from service_representatives

 * postgresql://postgres:***@localhost:5432/dogdb
4 rows affected.


id,email,first_name,last_name,start_date,end_date
1,andybotwin@dogdb.com,Andy,Botwin,2021-01-01,
2,jillianbelk@dogdb.com,Jillian,Belk,2021-01-01,
3,monicageller@dogdb.com,Monica,Geller,2021-02-01,
4,derek.hostetler@dogdb.com,Derek,Hostetler,2020-03-08,2023-12-25


In [47]:
%sql select * from customer_licenses_new

 * postgresql://postgres:***@localhost:5432/dogdb
4 rows affected.


id,account_id,role,status,created_at,updated_at
d17cb11cda9ba249c22f67e4aed65d0f65f1a80c,1,analyst,active,2022-03-12 02:56:37.652093,2022-03-12 02:56:37.652093
be49ad8f4a68fbbdd1674b41da20759f54b0e930,6,developer,active,2021-05-28 04:42:58.955093,2021-05-28 04:42:58.955093
8541866bb3a4c4ecf070b2c1b2f7bb9c0934d287,6,admin,active,2022-10-30 21:33:46.353060,2022-10-30 21:33:46.353060
60831f59a531eef325e525ad58bae0e5e8c2d75a,35,developer,disabled,2021-03-26 02:38:02.136033,2022-07-21 23:03:29.862040
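
With the license fields in their own columns, the question of which customers have active or multiple licenses becomes a simple aggregate. The sketch below loads the four rows above into in-memory SQLite, used as a stand-in for PostgreSQL (an assumption), and sticks to standard SQL constructs in the query.

```python
import sqlite3

# In-memory SQLite is a stand-in for PostgreSQL here (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_licenses_new "
    "(id TEXT PRIMARY KEY, account_id INTEGER NOT NULL, role TEXT NOT NULL, status TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer_licenses_new VALUES (?, ?, ?, ?)",
    [
        ("d17cb11cda9ba249c22f67e4aed65d0f65f1a80c", 1, "analyst", "active"),
        ("be49ad8f4a68fbbdd1674b41da20759f54b0e930", 6, "developer", "active"),
        ("8541866bb3a4c4ecf070b2c1b2f7bb9c0934d287", 6, "admin", "active"),
        ("60831f59a531eef325e525ad58bae0e5e8c2d75a", 35, "developer", "disabled"),
    ],
)

# One row per account: total licenses and how many of them are active.
rows = conn.execute(
    """
    SELECT
        account_id,
        COUNT(*) AS total_licenses,
        SUM(CASE WHEN status = 'active' THEN 1 ELSE 0 END) AS active_licenses
    FROM customer_licenses_new
    GROUP BY account_id
    ORDER BY account_id
    """
).fetchall()

accounts_with_multiple = [account_id for account_id, total, _ in rows if total > 1]

print(rows)                    # [(1, 1, 1), (6, 2, 2), (35, 1, 0)]
print(accounts_with_multiple)  # [6]
```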


## Question 2:

Describe what the engineering team is most likely doing currently to support accounting in terms of process. Include the queries they are running if you think you can take a guess at what they are. What are some of the shortcomings of the current process?


## Question 3:


You have been tasked with designing a model to provide the data for the analyst. How would you structure the output? Include the query you would use to create the model.


## Question 4:

What are the troubleshooting steps you would take to identify the problem? Based on experience you’ve had in the past, develop a hypothetical narrative for identifying what the problem is and then the steps to address fixing it.


## Question 5:


Based on what you understand of DogDB so far, what would your first week at DogDB look like? What is your number one priority to try and change?

# Personal Review of Assignment

