# Synthetic Dataset Creation for Mid Semester Project

Here we create the synthetic datasets which students will use for their mid-semester project.

**Goals of the Dataset**

* Require joins across multiple tables and multiple dimensions
* Rows are timestamped, so students will need to do out-of-time joins (using only data available before each prediction date) to create training data
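
As a rough illustration of what an out-of-time join could look like, here is a minimal sketch on made-up toy rows. The column names are loosely modeled on the tables in this notebook, and the feature names `n_prior_sales` and `prior_revenue` are hypothetical. Each call only sees sales that happened strictly before the call date.

```python
import pandas as pd

# Toy data (hypothetical rows, not taken from the generated dataset)
calls = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "date": pd.to_datetime(["2020-01-10", "2020-03-01", "2020-02-01"]),
})
sales = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "date": pd.to_datetime(["2020-01-10", "2020-02-15", "2020-03-05"]),
    "amount": [100.0, 250.0, 80.0],
})

# Pair every call with every sale to the same customer ...
pairs = calls.reset_index().merge(sales, on="customer_id", suffixes=("_call", "_sale"))
# ... then keep only sales strictly before the call date (no leakage from the future)
prior = pairs[pairs["date_sale"] < pairs["date_call"]]

# Aggregate back to one row per call; calls with no prior sales get zeros
features = (
    prior.groupby("index")["amount"]
    .agg(n_prior_sales="count", prior_revenue="sum")
    .reindex(calls.index, fill_value=0)
)
training = calls.join(features)
```

Only the second call (customer 1 on 2020-03-01) has any sales before its date, so it is the only row with non-zero features.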

## Description of the Problem (What students will see)

You work in the data science department at _Dr. D's Direct-Sales Paperclip Company_.
Dr. D believes in selling paperclips the old fashioned way: over the phone.
In the paperclip business, knowing your customer is key.
Matching each customer with the correct salesperson is very important to ensure a successful sale.
However, historically Dr. D has simply let the process be completely random.

Recently, Dr. D decided to try to use machine learning to improve the company.
He has assigned you and your team to build a model, trained on historical sales data, that helps decide which calls each salesperson should make on a given day.
Each day you get a list of potential customers and a list of salespeople working that day, and you must match each salesperson to the calls they are best suited for in order to maximize that day's revenue.
Salespeople have six hours to make calls each day.
Calls last between 1 and 60 minutes (selling paperclips is complicated work!).

**Facts**

1. A certain number of salespeople work at Dr. D's, and their job is to make sales calls.
2. Salespeople have enough time to make between 20 and 30 calls per day.
3. Each day, there is a list of customers which we would like to call.

**Tables**

You have access to a historical dataset which contains the following information:

1. The salesperson table, which gives information about the sales team members
2. The customers table, which gives information about specific customers
3. The calls table, which details the sales calls that have taken place, including who made the call, to whom, and how long the call took
4. The sales table, which details actual purchases

### Data Dictionary

* `salesperson`

Column Name | Datatype | Desc
------------|----------|------
`id`|INT (unique)|The salesmember ID
`start_date`|DATE|The date this salesmember started working at the company
`end_date`|DATE|The date this salesmember ended working at the company. NULL if still employed
`title`|STRING|One of: "junior", "midlevel", "senior"

* `customer`

Column Name | Datatype | Desc
------------|----------|------
`id`|INT (unique)|The customer ID
`customer_type`|STRING|One of: "business", "personal", "other"
`industry`|STRING|Which industry this customer is in. NULL for "personal" customers.
`sub_industry`|STRING|Subcategory of industry.
`n_employees`|INT|Total number of people who work at the company.
`company_value`|FLOAT|Value of the company, if publicly known.

* `calls`

Column Name | Datatype | Desc
------------|----------|------
`date`|DATE|Date of the call
`salesperson_id`|INT|ID number of the sales person who made the call
`customer_id`|INT|ID number of the customer the salesperson reached out to
`call_time_min`|INT|Duration of the call in minutes

* `sales`

Column Name | Datatype | Desc
------------|----------|------
`customer_id`|INT|ID number of the customer the salesperson reached out to
`date`|DATE|Date of sale
`amount`|FLOAT|Total amount of the sale

## Generating the Data

Here's the plan.

We randomly generate about 20 salespeople and around 1000 companies.

### Generating Customers

The goal here is to create a collection of potential customers.

For each customer, we generate a unit vector in $\mathbb{R}^3$ (maybe 4 or 5?) which represents their _true personality._
This vector is hidden from the student.

1. Select $N$ customers, $N \ll 1000$ (around 30 or so). These are the prototype customers.
2. For each prototype customer, we generate features randomly.
3. For all other customers, we assign each customer to the prototype it is nearest to (via the personality vector).
4. Generate all the features of each non-prototype customer from distributions centered on the features of the prototype it was assigned to.

As a sanity check, train a classification algorithm which predicts which prototype a given customer was assigned to.
If accuracy is < 90% or > 98%, then retune the parameters of the above algorithm.
We want the companies to be pretty similar to their prototype, but not _exactly_ the same.

Lastly, generate two more hidden parameters:

* Average Order Size
* Refill time
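
Step 3 above (assigning each customer to its nearest prototype) can be sketched as follows. This is only an illustration with hypothetical sizes (5 prototypes, 200 other customers, 3 dimensions) and a helper name, `random_unit_vectors`, invented for the example. Because all personality vectors are unit vectors, "largest dot product" and "smallest Euclidean distance" pick the same prototype, since $\lVert a - b \rVert^2 = 2 - 2\, a \cdot b$.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_unit_vectors(n: int, dim: int) -> np.ndarray:
    """Sample n directions uniformly on the unit sphere in R^dim."""
    v = rng.normal(size=(n, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


prototypes = random_unit_vectors(5, 3)   # hypothetical sizes
others = random_unit_vectors(200, 3)

# Nearest prototype by dot product (cosine similarity for unit vectors)
by_dot = np.argmax(others @ prototypes.T, axis=1)

# Nearest prototype by Euclidean distance
dists = np.linalg.norm(others[:, None, :] - prototypes[None, :, :], axis=2)
by_dist = np.argmin(dists, axis=1)
```

Both `by_dot` and `by_dist` hold, for each non-prototype customer, the index of the prototype it would be assigned to.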

### Generating Salespeople

Create around 20 salespeople: 25% "junior", 50% "mid-level", and 25% "senior".
For each salesperson, create a list of vectors in $\mathbb{R}^3$.

* Junior -> 1-3 vectors
* Mid-level -> 2-5 vectors
* Senior -> 4-7 vectors

In addition, we generate two more random parameters:

* _charisma_ between 0.0 and 0.1.
* _efficiency_ between 15 and 45 (the salesperson's typical call length in minutes).


### Determining a Sale

Suppose salesperson $S$ calls customer $C$. 
Let $\vec{c}$ be $C$'s personality vector, $\{\vec{s}_i\}$ be $S$'s set of vectors, and $c_S$ be $S$'s charisma.
We calculate the probability of a successful sale, $P$, by taking the dot product of $\vec{c}$ with each of $S$'s vectors and taking the max.
If this is < 0, set it to zero.
Lastly, add $S$'s charisma $c_S$, but do not let $P$ exceed 0.95.

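```python
# Code which does the above

from typing import List

import numpy as np


def sale_proba(customer_vec: np.ndarray, sales_vecs: List[np.ndarray], charisma: float) -> float:
    # Compatibility of the customer with each of the salesperson's vectors
    ps = [float(customer_vec @ s) for s in sales_vecs]
    # Keep the best match, floored at zero
    max_p = max(ps + [0.0])
    # Add charisma, capped at 0.95
    final_p = min(max_p + charisma, .95)

    return final_p
```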
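
To make the three cases concrete, here is a small worked example. It restates the function so the snippet runs on its own, and the customer and salesperson vectors are hand-picked for illustration rather than generated.

```python
import numpy as np


def sale_proba(customer_vec, sales_vecs, charisma):
    # Best dot-product match, floored at zero, plus charisma, capped at 0.95
    best_match = max([float(customer_vec @ s) for s in sales_vecs] + [0.0])
    return min(best_match + charisma, 0.95)


customer = np.array([1.0, 0.0, 0.0])

aligned = [np.array([1.0, 0.0, 0.0])]      # same direction as the customer
orthogonal = [np.array([0.0, 1.0, 0.0])]   # unrelated direction
opposed = [np.array([-1.0, 0.0, 0.0])]     # opposite direction

p_aligned = sale_proba(customer, aligned, charisma=0.05)        # hits the cap
p_orthogonal = sale_proba(customer, orthogonal, charisma=0.05)  # charisma only
p_opposed = sale_proba(customer, opposed, charisma=0.05)        # negative match floored at 0
```

A perfectly aligned salesperson reaches the cap, while an orthogonal or opposed one is left with only their charisma.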

### Generating Historical Calls

Historically, Dr. D. has just assigned customers to salespeople totally randomly.

* For each date, we first determine which salespeople were working at Dr. D's that day.
* Shuffle the list of customers, removing all customers who had a sale fewer than 5 days ago.
* For each salesperson:
  - Generate a call time using a distribution centered at _efficiency_
  - If the generated time > remaining time, end this salesperson's day
  - Pull the next customer off the shuffled list
  - Determine the probability of sale
  - If a sale occurs, then generate a sale (explained below)
  - Subtract the call time from the remaining time for making calls
  - Repeat

### Generating Sales

When generating a sale, assign the sale date to the date of the call that generated the sale.
Companies will not make a purchase if not enough time has passed since their last one. If time since last sale < refill time, omit this sale.
Sale amount is sampled from a log-normal distribution, centered at average order size.
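
A minimal sketch of the log-normal sampling described above, assuming "centered at" means the median of the distribution. The function name `sample_sale_amount` and the spread `sigma=0.25` are hypothetical choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)


def sample_sale_amount(average_order: float, sigma: float = 0.25, size: int = 1) -> np.ndarray:
    """Draw sale amounts from a log-normal whose median is average_order."""
    # For a log-normal distribution, exp(mean of the underlying normal) is the median
    amounts = rng.lognormal(mean=np.log(average_order), sigma=sigma, size=size)
    return np.round(amounts, 2)


amounts = sample_sale_amount(500.0, size=10_000)
sample_median = np.median(amounts)
```

Log-normal draws are always positive, which avoids having to clip negative amounts.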

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Define parameters:


* `personality_vector_dim` - Number of elements in the personality vector
* `n_customers` - Total number of unique customers
* `n_prototype_customers` - Total number of prototype customers
* `ratio_personal` - Ratio of customers which are people (not businesses)
* `p_match_sub_industry` - Probability that a derived customer will match the prototype's sub-industry
* `mean_value` - Typical value of companies (values are log-normally distributed)
* `std` - Standard deviation of the log of company value (the `sigma` of the log-normal)
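
One subtlety worth checking: when `np.log(mean_value)` is used as the `mean` argument of a log-normal sampler, `mean_value` becomes the median of the samples, while their arithmetic mean is larger by a factor of $e^{\sigma^2/2}$. The sketch below illustrates this with the same parameter names; the sample size is an arbitrary choice for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

mean_value = 1_000_000
std = 1

values = rng.lognormal(mean=np.log(mean_value), sigma=std, size=200_000)

sample_median = np.median(values)                    # close to mean_value
sample_mean = values.mean()                          # noticeably larger
theoretical_mean = mean_value * np.exp(std**2 / 2)   # about 1.65 * mean_value when std = 1
```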

In [145]:
## Parameters for the generation

# Customer Parameters

personality_vector_dim = 12
n_customers = 10_000
n_prototype_customers = 100
ratio_personal = .05
p_match_sub_industry = .9
mean_value = 1_000_000
std = 1

# Sales Person Parameters
n_senior = 6
n_mid_level = 24
n_junior = 5

# Historical Data Parameters
n_days_to_generate = 365*5
start_date = pd.Timestamp(2020, 1, 1)

# List of industries

industries = {
    "Agriculture": ["Farming", "Crop Production", "Livestock"],
    "Forestry": ["Timber", "Pulp", "Conservation"],
    "Fishing and Aquaculture": ["Commercial Fishing", "Fish Farming"],
    "Mining and Quarrying": ["Coal", "Metals", "Minerals"],
    "Oil and Gas Extraction": ["Exploration", "Refining", "Distribution"],
    "Utilities": ["Electricity", "Water", "Gas", "Sewage"],
    "Construction": ["Residential", "Commercial", "Infrastructure"],
    "Real Estate": ["Property Development", "Brokerage", "Appraisal"],
    "Manufacturing - Automotive": ["Vehicles", "Parts", "Heavy Equipment"],
    "Manufacturing - Electronics": ["Consumer Devices", "Semiconductors", "Components"],
    "Manufacturing - Textiles and Apparel": ["Clothing", "Footwear", "Accessories"],
    "Manufacturing - Chemicals and Pharmaceuticals": ["Drugs", "Industrial Chemicals", "Cosmetics"],
    "Manufacturing - Food and Beverage": ["Packaged Foods", "Beverages", "Agriculture Processing"],
    "Manufacturing - Metals and Machinery": ["Steel", "Tools", "Industrial Equipment"],
    "Transportation and Logistics": ["Trucking", "Rail", "Shipping", "Delivery"],
    "Warehousing and Storage": ["Distribution Centers", "Cold Storage", "Fulfillment"],
    "Retail Trade": ["Brick-and-Mortar", "Online Retail", "Wholesale Clubs"],
    "Wholesale Trade": ["Durable Goods", "Non-Durable Goods"],
    "Information Technology": ["Software", "Cloud Services", "Hardware"],
    "Telecommunications": ["Mobile Networks", "Internet Providers", "Satellites"],
    "Internet and E-Commerce": ["Online Marketplaces", "Digital Platforms", "SaaS"],
    "Financial Services - Banking": ["Commercial", "Retail", "Investment"],
    "Financial Services - Fintech": ["Payments", "Digital Lending", "Blockchain"],
    "Financial Services - Insurance": ["Health", "Property", "Life", "Reinsurance"],
    "Accounting and Auditing": ["Bookkeeping", "Tax Services", "Compliance"],
    "Legal Services": ["Law Firms", "Arbitration", "Legal Tech"],
    "Consulting and Professional Services": ["Management", "Strategy", "HR", "Engineering"],
    "Healthcare and Medical Services": ["Hospitals", "Clinics", "Nursing", "Diagnostics"],
    "Biotechnology": ["Genomics", "Medical Devices", "Life Sciences"],
    "Education and Training": ["K-12", "Higher Education", "EdTech"],
    "Media and Entertainment": ["Film", "Television", "Music", "Streaming"],
    "Arts and Culture": ["Museums", "Performing Arts", "Design", "Publishing"],
    "Hospitality and Tourism": ["Hotels", "Travel Agencies", "Events"],
    "Restaurants and Food Service": ["Catering", "Fast Food", "Fine Dining"],
    "Sports and Recreation": ["Professional Sports", "Fitness", "Outdoor Recreation"],
    "Public Administration and Government": ["Local", "State", "Federal"],
    "Defense and Security": ["Military", "Private Security", "Cybersecurity"],
    "Environmental Services and Waste Management": ["Recycling", "Pollution Control", "Sustainability"],
    "Energy and Renewables": ["Solar", "Wind", "Nuclear", "Fossil Fuels"],
    "Transportation Equipment and Aerospace": ["Aircraft", "Spacecraft", "Defense Manufacturing"],
    "Nonprofit and Charitable Organizations": ["Foundations", "NGOs", "Advocacy"],
    "Research and Development": ["Scientific Research", "Industrial Innovation", "Labs"]
}

n_industries = len(industries)

In [3]:
# Code to generate random unit vector

def generate_random_unit_vector(dim: int, n_vectors: int) -> np.ndarray:
    result = np.random.normal(size=(n_vectors, dim))
    result /= np.linalg.norm(result, axis=1, keepdims=True)

    return result
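
A quick sanity check of this helper, restated here so the snippet runs on its own. It confirms that every row has unit length and shows that the dot product of two independent random unit vectors in $d$ dimensions has standard deviation $1/\sqrt{d}$, which matters because sale probabilities are built from such dot products. The sizes used are illustrative.

```python
import numpy as np


def generate_random_unit_vector(dim: int, n_vectors: int) -> np.ndarray:
    result = np.random.normal(size=(n_vectors, dim))
    result /= np.linalg.norm(result, axis=1, keepdims=True)
    return result


np.random.seed(0)  # seeded only to make this check repeatable
vecs = generate_random_unit_vector(dim=12, n_vectors=1000)

# Every row should have length 1
norms = np.linalg.norm(vecs, axis=1)

# Dot products between 500 independent pairs of vectors
pair_dots = np.sum(vecs[:500] * vecs[500:], axis=1)
expected_std = 1 / np.sqrt(12)   # roughly 0.29
```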

In [4]:
# Step 1: Generate personality vectors for customers

customer_personalities = generate_random_unit_vector(personality_vector_dim, n_customers)

prototype_customer_personalities = customer_personalities[:n_prototype_customers]
derived_customer_personnalities = customer_personalities[n_prototype_customers:]

prototype_customer_personalities.shape, derived_customer_personnalities.shape

((100, 12), (9900, 12))

In [5]:
from dataclasses import dataclass
import uuid

@dataclass
class Customer:
    customer_id: uuid.UUID
    personality: np.ndarray
    customer_type: str
    industry: str
    sub_industry: str
    n_employees: int
    company_value: float
    average_order: float
    reorder_time: float
    is_prototype: bool
    prototype_idx: int

In [6]:
# Step 2: Randomly create each prototype customer

prototypes = []

for p in prototype_customer_personalities:
    # Generate random features
    industry = np.random.choice(list(industries))
    sub_industry = np.random.choice(industries[industry])
    value = np.round(
        np.random.lognormal(mean=np.log(mean_value), sigma=std)
    )
    n_employees = int(value // 500 + np.random.normal(100, 100)) + 20
    average_order = np.abs(np.random.normal(value / 10000, 1000)) + 100
    reorder_time = np.random.randint(14, 50)

    # Create a new customer object
    c = Customer(
        customer_id=uuid.uuid4(),
        personality=p,
        customer_type="business",
        industry=industry,
        sub_industry=sub_industry,
        company_value=value,
        n_employees=n_employees,
        average_order=average_order,
        reorder_time=reorder_time,
        is_prototype=True,
        prototype_idx=-1,
    )

    prototypes.append(c)

In [7]:
# Step 3: Create derived customers using prototypes

derived = []

for d in derived_customer_personnalities:
    is_business = np.random.uniform() >= ratio_personal 

    # Find the closest prototype
    my_prototype_idx = np.argmax(prototype_customer_personalities @ d)
    my_prototype = prototypes[my_prototype_idx]

    if is_business:
        customer_type = "business"

        industry = my_prototype.industry
        if np.random.uniform() < p_match_sub_industry:
            sub_industry = my_prototype.sub_industry
        else:
            sub_industry = np.random.choice(industries[industry])
        value = np.round(
            np.random.lognormal(mean=np.log(my_prototype.company_value), sigma=std / 10)
        )
        n_employees = int(np.random.normal(my_prototype.n_employees, my_prototype.n_employees // 5))
        
    else:
        # Let the first component of the personality vector determine the type
        customer_type = "personal" if d[0] < 0 else "other"
        
        industry = None
        sub_industry = None
    
        value = np.random.normal(100, 25)
        n_employees = None

    # These are unique to each customer
    average_order = np.abs(np.random.normal(value / 10000, 1000)) + 100
    reorder_time = np.random.randint(14, 50)

    # Create a new customer object
    c = Customer(
        customer_id=uuid.uuid4(),
        personality=d,
        customer_type=customer_type,
        industry=industry,
        sub_industry=sub_industry,
        company_value=value,
        n_employees=n_employees,
        average_order=average_order,
        reorder_time=reorder_time,
        is_prototype=False,
        prototype_idx=my_prototype_idx,
    )

    derived.append(c)

In [40]:
customers = prototypes + derived

In [174]:
# Step 4: Create database table that will be included in the project

import pandas as pd

customer_df = pd.DataFrame(customers)
customer_table = customer_df[['customer_id', 'customer_type', 'industry', 'sub_industry', 'n_employees', 'company_value',]].copy()
customer_table

index | customer_id | customer_type | industry | sub_industry | n_employees | company_value
------|-------------|---------------|----------|--------------|-------------|--------------
0|f500b080-3627-451d-9fb7-02d47ac58089|business|Mining and Quarrying|Minerals|2633.0|1013530.0
1|41cc3b47-3444-4538-898c-6de7f11c7da6|business|Telecommunications|Mobile Networks|442.0|334769.0
2|6901bb2c-cc3b-4642-91f0-7aec02d0a22b|business|Arts and Culture|Museums|160.0|193327.0
3|f31bb757-8c27-4749-8ac5-ea7eee28575c|business|Financial Services - Insurance|Reinsurance|6909.0|2757589.0
4|c0cb65e2-756b-467d-a6a0-ca1b3bf43ef3|business|Arts and Culture|Museums|2127.0|1000922.0
...|...|...|...|...|...|...
9995|b04a12e0-9e2f-4932-ba5d-45c596511842|business|Accounting and Auditing|Compliance|1744.0|536592.0
9996|ebfe82c8-0aeb-40a7-a5ae-c8b43a1ea762|business|Information Technology|Software|1464.0|597275.0
9997|2dd9594e-2533-481d-9239-5f277a1f3373|business|Construction|Infrastructure|1322.0|583963.0
9998|09b3c0e9-2af7-4afa-8410-539c65830ea6|business|Media and Entertainment|Streaming|3062.0|1234156.0


## Testing out the customer table

This next part is just a test.
I want to see if the generated customers are similar enough to their prototypes.
Ideally, a model will be able to predict a customer's prototype about 90% of the time (similar, but not too similar).

In [10]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

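# Drop the ID column and one-hot encode the categorical features
X = customer_table.iloc[:, 1:]
X = pd.get_dummies(X).astype(float)

# Copy so that relabelling below does not modify customer_df
y = customer_df['prototype_idx'].values.copy()
# Prototypes come first and are stored with prototype_idx == -1; label each as its own class
y[:n_prototype_customers] = np.arange(n_prototype_customers)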

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2)

In [11]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rf = RandomForestClassifier()
rf.fit(X_train, y_train)

y_hat = rf.predict(X_test)

accuracy_score(y_test, y_hat)

0.8805

I think this is okay for now.

## Generating Sales People

Salespeople have a collection of personality vectors which determine how closely their sales style fits the customer.
The number of vectors depends on how junior or senior they are.

The vectors themselves are not correlated with each other.

In [26]:
@dataclass
class SalesPerson:
    sales_id: uuid.UUID
    title: str
    personalities: np.ndarray
    charisma: float
    efficiency: float


In [None]:
sales_people = []

titles = ["junior"] * n_junior + ["midlevel"] * n_mid_level + ["senior"] * n_senior

for t in titles:
    # Note: np.random.randint excludes the upper bound
    if t == "junior":
        n_vecs = np.random.randint(1, 4)  # 1-3 vectors
    elif t == "midlevel":
        n_vecs = np.random.randint(3, 6)  # 3-5 vectors
    else:
        n_vecs = np.random.randint(4, 9)  # 4-8 vectors
    
    personalities = generate_random_unit_vector(dim=personality_vector_dim, n_vectors=n_vecs)
    charisma = np.random.uniform(0, .1)
    efficiency = np.random.uniform(15, 45)

    s = SalesPerson(
        sales_id=uuid.uuid4(),
        title=t,
        personalities=personalities,
        charisma=charisma,
        efficiency=efficiency,
    )
    sales_people.append(s)

In [177]:
sales_people_df = pd.DataFrame(sales_people)
sales_people_table = sales_people_df[['sales_id', 'title']].copy()
sales_people_table

index | sales_id | title
------|----------|------
0|b923cfa1-9cb1-41e9-9a51-aa8fbdc1b6dd|junior
1|46302332-5fae-47c4-8df7-b9fa260d9679|junior
2|7de678e3-6a3e-4913-84e9-30d891fd9e63|junior
3|e7cebfe6-2876-4c51-bd64-16776df68ed7|junior
4|96162858-ec84-4d98-a9bb-b6528fa95fab|junior
5|402d1863-424b-4cfc-b686-c50dc874fdfc|midlevel
6|3a41b15c-ae64-4eee-8d7e-fe2119c53e40|midlevel
7|3d7eaab9-f835-4e1b-bcf9-a0adf1163fcd|midlevel
8|a6a6f60c-1966-4424-bf38-4f80a45fc7c5|midlevel
9|c77b5c6c-6538-46a7-99d7-5499eb997256|midlevel


## Generating Historical Calls and Sales Data

Up until now, Dr. D has been having his sales team make calls randomly.
We start out by creating a day's worth of new data.

1. Shuffle the list of customers
2. Shuffle the list of salespeople; the first 5 get the day off.
3. For each remaining salesperson, repeat the following while they have time left in the day:

* Call the next customer on the list
* Determine the probability of a sale, taking into account the time since the last order and the personalities of the salesperson and customer
* Randomly generate a sale with that probability
* If a sale happens, add it to the sales table
* Generate a call time and subtract it from the salesperson's remaining time for the day
* Add a new row to the calls table
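
The time-budget part of this loop can be sketched in isolation. This is a simplified illustration, not the notebook's generator: the function name `simulate_call_times` is hypothetical, it ignores customers and sales entirely, and it assumes a call length of the salesperson's efficiency plus or minus up to 5 minutes within a 360-minute day.

```python
from typing import List

import numpy as np

rng = np.random.default_rng(7)


def simulate_call_times(efficiency: float, minutes_available: int = 360) -> List[int]:
    """Return the durations of the calls that fit into one working day."""
    call_times = []
    remaining = minutes_available
    while True:
        # Typical call length +/- 5 minutes (upper bound of integers() is exclusive)
        call_time = int(efficiency + rng.integers(-5, 6))
        if call_time > remaining:
            break  # the next call would not fit, so the day is over
        call_times.append(call_time)
        remaining -= call_time
    return call_times


day = simulate_call_times(efficiency=30.0)
```

With an efficiency of 30 minutes, every call lasts 25 to 35 minutes, so a six-hour day holds between 10 and 14 calls.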

In [80]:
def calculate_sale_probablity(
    sales_person: SalesPerson,
    customer: Customer,
    days_since_last_sale: int,
) -> float:
    if days_since_last_sale < customer.reorder_time :
        # Customers won't make orders if they don't need more paperclips
        return 0.0
    
    # Calculate the compatibility between the salesperson and the customer
    p = (sales_person.personalities @ customer.personality).max()

    # Add the salesperson charisma
    p = p + sales_person.charisma

    # Clamp to a valid probability in [0, 1]
    return float(np.clip(p, 0.0, 1.0))

In [109]:
@dataclass
class Call:
    date: pd.Timestamp
    sales_id: uuid.UUID
    customer_id: uuid.UUID
    call_time: int

@dataclass
class Sale:
    customer_id: uuid.UUID
    date: pd.Timestamp
    amount: float

In [116]:
customers[0]

Customer(customer_id=UUID('cba71cbb-f5e5-445f-a33e-b2cfa73ff898'), personality=array([-0.10290786,  0.263382  ,  0.0827084 , -0.03165898, -0.14221877,
        0.66480392, -0.40236302,  0.17915105, -0.23858219,  0.4257064 ,
       -0.11548808, -0.06730891]), customer_type='business', industry='Manufacturing - Electronics', sub_industry='Consumer Devices', n_employees=8674, company_value=3777379.0, average_order=824.7550512948935, reorder_time=18, is_prototype=False, prototype_idx=3)

In [120]:
def get_order_amount(customer: Customer) -> float:
    """
    Calculate the order amount. Sampled from a normal distribution
    centered at customer's average order (avg) with standard dev = avg / 10
    (never below 1.0). The result is floored at a minimum order of 10.0.
    """
    min_order = 10.0
    
    std = np.maximum(customer.average_order / 10.0, 1.0)
    result = np.random.normal(customer.average_order, std)

    result = np.maximum(result, min_order)

    return np.round(result, 2)

In [146]:
# Import the submodule explicitly so that tqdm.notebook.tqdm is available
import tqdm.notebook

In [207]:
historical_dates = pd.date_range(start=start_date, freq='D', periods=n_days_to_generate)

last_sale_lookup = {}

calls = []
sales = []

for date in tqdm.notebook.tqdm(historical_dates):
    # Shuffle the customers
    np.random.shuffle(customers)

    customer_idx = 0

    np.random.shuffle(sales_people)

    # The first 5 salespeople have the day off!
    for sp in sales_people[5:]:
        work_day_minutes = 360 # 6 hours of time per person
        while work_day_minutes > 0:
            customer = customers[customer_idx]
            customer_idx += 1
            # Note: This ^^ is technically a potential IndexError but
            #       it won't happen with the numbers we chose.
            
            # Call time is the salesperson efficiency +/- 5 minutes
            call_time = int(sp.efficiency + np.random.randint(-5, 6))
            work_day_minutes -= call_time

            # Only make this call if there is remaining time
            if work_day_minutes > 0:
                call = Call(
                    date=date,
                    sales_id=sp.sales_id,
                    customer_id=customer.customer_id,
                    call_time=call_time,
                )
    
                calls.append(call)
                
                # Get the last sale date. If none, use Jan 1, 1900
                last_sale_date = last_sale_lookup.get(customer.customer_id, pd.Timestamp(1900, 1, 1))
                days_since_last_sale = (date - last_sale_date).days
                                                      
                sale_proba = calculate_sale_probablity(sp, customer, days_since_last_sale)
    
                p = np.random.uniform(0.0, 1.0)
                if p < sale_proba:
                    sale = Sale(
                        customer_id=customer.customer_id,
                        date=date,
                        amount=get_order_amount(customer),
                    )
                    sales.append(sale)
                    last_sale_lookup[customer.customer_id] = date

print("Total calls:", len(calls))
print("Sales volume:", len(sales))

Total calls: 634842
Sales volume: 158670


In [152]:
calls_table = pd.DataFrame(calls)

In [153]:
sales_table = pd.DataFrame(sales)

## A Little Chaos

To make the students' lives a little harder, we will now randomly delete around 0.2% of the values from the customers table.

In [208]:
mask = np.random.rand(*customer_table.shape) < 0.002
customer_table_nulled = customer_table.copy()
customer_table_nulled[mask] = np.nan

# Except we want to preserve the id column, so restore that
customer_table_nulled['customer_id'] = customer_table['customer_id'].copy()

customer_table_nulled.isna().sum()

customer_id        0
customer_type     22
industry         499
sub_industry     501
n_employees      494
company_value     17
dtype: int64
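
An alternative way to get the same effect is to draw the mask only over the non-ID columns, so the ID column never has to be restored. The sketch below uses a toy table with made-up values and a deliberately larger null rate (2%) so that the effect is visible on 1,000 rows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy table (hypothetical values, not the generated customers)
table = pd.DataFrame({
    "customer_id": [f"c{i}" for i in range(1000)],
    "n_employees": rng.integers(20, 5000, size=1000).astype(float),
    "company_value": rng.lognormal(mean=13.8, sigma=1.0, size=1000),
})

# Only these columns are eligible for deletion
value_cols = ["n_employees", "company_value"]
mask = rng.random((len(table), len(value_cols))) < 0.02

nulled = table.copy()
# DataFrame.mask replaces entries with NaN wherever the condition is True
nulled[value_cols] = nulled[value_cols].mask(mask)

null_rate = nulled[value_cols].isna().to_numpy().mean()
```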

## Load the Data into a SQLite3 DB

In [186]:
calls_table.groupby(['date', 'sales_id']).size().value_counts().sort_index()

7      8945
8     12074
9     13264
10     6256
11     5360
12     1553
13      886
14     2067
15     2588
16     2350
17     1879
18     2264
19     2045
20     1364
21      710
22      233
23       35
24        2
Name: count, dtype: int64

In [197]:
# Convert UUIDs to str so that this is compatible with SQLite3

customer_table_nulled['customer_id'] = customer_table_nulled['customer_id'].astype(str)
sales_people_table['sales_id'] = sales_people_table['sales_id'].astype(str)
calls_table['customer_id'] = calls_table['customer_id'].astype(str)
calls_table['sales_id'] = calls_table['sales_id'].astype(str)
sales_table['customer_id'] = sales_table['customer_id'].astype(str)


# Rename ID columns

customer_table_nulled = customer_table_nulled.rename({'customer_id': 'id'}, axis=1)
sales_people_table = sales_people_table.rename({'sales_id': 'id'}, axis=1)

In [198]:
import sqlite3


conn = sqlite3.connect('paperclips.db')


# NOTE: I called it 'sales_people' in the code, but the table will be 'salespeople' in the assignment. 
tables = {
    'salespeople': sales_people_table,
    'customers': customer_table_nulled,
    'calls': calls_table,
    'sales': sales_table,
}

try:
    for table_name, df in tables.items():
        print(f"Loading table: {table_name} ...")
        df.to_sql(table_name, conn, if_exists="replace", index=False)
    print("All dataframes loaded successfully.")
finally:
    conn.close()

Loading table: salespeople ...
Loading table: customers ...
Loading table: calls ...
Loading table: sales ...
All dataframes loaded successfully.
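
Once the tables are loaded, a query along the following lines could be used to check that calls and sales join as expected. This is a sketch on an in-memory database with a few hand-written rows rather than `paperclips.db`. It assumes the `calls` and `sales` column names used by the dataframes in this notebook, and that a sale shares its customer and date with the call that produced it.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real database file

# Toy rows (hypothetical values)
calls = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "sales_id": ["s1", "s2", "s1"],
    "customer_id": ["c1", "c2", "c3"],
    "call_time": [30, 22, 41],
})
sales = pd.DataFrame({
    "customer_id": ["c1", "c3"],
    "date": ["2020-01-01", "2020-01-02"],
    "amount": [120.50, 75.00],
})
calls.to_sql("calls", conn, index=False)
sales.to_sql("sales", conn, index=False)

# A call counts as successful if a sale exists for the same customer and date
query = """
SELECT c.sales_id,
       COUNT(*)                   AS n_calls,
       COUNT(s.amount)            AS n_sales,
       COALESCE(SUM(s.amount), 0) AS revenue
FROM calls AS c
LEFT JOIN sales AS s
       ON s.customer_id = c.customer_id AND s.date = c.date
GROUP BY c.sales_id
ORDER BY c.sales_id
"""
summary = pd.read_sql_query(query, conn)
conn.close()
```

The `LEFT JOIN` keeps unsuccessful calls in the result, so salespeople with no sales still appear with zero revenue.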
