# ES Order Data Quality Analysis

This notebook pulls ALL active ES Orders (Status of `Activated`, `Suspended (Late Payment)`, or `Disconnect in Progress`) and categorizes them by data issue for remediation.

## Output: Excel file with tabs:
1. **Summary** - Overall counts and issues
2. **Ready_To_Migrate** - Orders with both a BAN (Billing_Invoice__c) and an A location (Address_A__c) populated
3. **Missing_BAN_CRITICAL** - Orders missing Billing_Invoice__c (CRITICAL - cannot migrate)
4. **Missing_A_Location** - Orders missing Address_A__c
5. **Missing_Node** - Orders missing Node__c (can be fixed post-migration)
6. **Missing_Service_Start** - Orders missing Service_Start_Date__c
7. **All_Active_Orders** - Complete list of all active orders

---
**Created:** December 11, 2024

In [14]:
# === SETUP & IMPORTS ===

import sys
import pandas as pd
from simple_salesforce import Salesforce
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils.dataframe import dataframe_to_rows
from datetime import datetime
import os

print(f"Python: {sys.executable}")
print(f"Pandas: {pd.__version__}")
print("✅ Imports successful")

Python: C:\Users\vjero\AppData\Local\Microsoft\WindowsApps\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\python.exe
Pandas: 2.2.3
✅ Imports successful


In [15]:
# === CONFIGURATION ===

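# ES (Source) Credentials
# Keep secrets out of the notebook: read them from environment variables
ES_USERNAME = os.environ.get("ES_USERNAME", "sfdcapi@everstream.net")
ES_PASSWORD = os.environ["ES_PASSWORD"]
ES_TOKEN = os.environ["ES_TOKEN"]
ES_DOMAIN = "login"  # 'login' for production, 'test' for a sandbox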


# Active Status Filter
ACTIVE_STATUSES = ["Activated", "Suspended (Late Payment)", "Disconnect in Progress"]

# Output settings
TIMESTAMP = datetime.now().strftime("%Y%m%d_%H%M%S")
OUTPUT_FILE = f"es_orders_data_quality_{TIMESTAMP}.xlsx"

print("üìã Configuration loaded")
print(f"   Output: {OUTPUT_FILE}")

üìã Configuration loaded
   Output: es_orders_data_quality_20251211_185836.xlsx
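Storing the password and security token in the configuration cell exposes them to anyone who can read the notebook. A minimal sketch of loading them from the environment instead, failing fast with a readable message when something is missing. The variable names `ES_USERNAME`, `ES_PASSWORD`, `ES_TOKEN`, and `ES_DOMAIN` are hypothetical choices, not part of the original notebook:

```python
import os


def load_es_credentials(prefix: str = "ES") -> dict:
    """Collect simple_salesforce login kwargs from environment variables.

    Variable names (ES_USERNAME, ES_PASSWORD, ES_TOKEN, ES_DOMAIN) are
    hypothetical; rename them to match your environment.
    """
    required = [f"{prefix}_USERNAME", f"{prefix}_PASSWORD", f"{prefix}_TOKEN"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        # Fail before attempting a login with blank credentials
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "username": os.environ[f"{prefix}_USERNAME"],
        "password": os.environ[f"{prefix}_PASSWORD"],
        "security_token": os.environ[f"{prefix}_TOKEN"],
        # 'login' for production, 'test' for a sandbox
        "domain": os.environ.get(f"{prefix}_DOMAIN", "login"),
    }
```

The returned keys match the keyword arguments already passed to `Salesforce(...)` in the connection cell, so the call could become `Salesforce(**load_es_credentials())`.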


In [16]:
# === CONNECT TO ES SALESFORCE ===

print("üîå Connecting to ES Salesforce...")
es_sf = Salesforce(
    username=ES_USERNAME,
    password=ES_PASSWORD,
    security_token=ES_TOKEN,
    domain=ES_DOMAIN,
)
print(f"✅ Connected to ES: {es_sf.sf_instance}")

🔌 Connecting to ES Salesforce...
✅ Connected to ES: everstream.my.salesforce.com


In [17]:
# === QUERY ALL ACTIVE ORDERS ===

print("üìä Querying ALL active Orders (this may take a moment)...")

# Build status filter
status_filter = "','".join(ACTIVE_STATUSES)

# Query all fields we need for analysis and migration
orders_query = f"""
SELECT 
    Id, 
    Name,
    Service_ID__c,
    Status,
    AccountId,
    Account.Name,
    Billing_Invoice__c,
    Address_A__c,
    Address_Z__c,
    Node__c,
    OpportunityId,
    Service_Start_Date__c,
    Service_End_Date__c,
    Service_Provided__c,
    SOF_MRC__c,
    OSS_Service_ID__c,
    Vendor_Circuit_ID__c,
    Primary_Product_Family__c,
    Primary_Product_Name__c,
    CreatedDate,
    LastModifiedDate
FROM Order
WHERE Status IN ('{status_filter}')
ORDER BY Service_ID__c
"""

try:
    result = es_sf.query_all(orders_query)
    orders_df = pd.DataFrame(result["records"])

    # Flatten Account.Name
    if "Account" in orders_df.columns:
        orders_df["Account_Name"] = orders_df["Account"].apply(
            lambda x: x.get("Name") if isinstance(x, dict) else None
        )
        orders_df = orders_df.drop(columns=["Account", "attributes"], errors="ignore")
    else:
        orders_df = orders_df.drop(columns=["attributes"], errors="ignore")

    print(f"\n✅ Retrieved {len(orders_df):,} active Orders")
    print("\n=== Status Breakdown ===")
    print(orders_df["Status"].value_counts().to_string())

except Exception as e:
    print(f"❌ Query error: {e}")
    raise

📊 Querying ALL active Orders (this may take a moment)...

✅ Retrieved 18,055 active Orders

=== Status Breakdown ===
Status
Activated                   17702
Disconnect in Progress        348
Suspended (Late Payment)        5
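The status filter is built by joining `ACTIVE_STATUSES` with `','` and wrapping the result in quotes. That works for the three current values, but a status containing a single quote or backslash would produce invalid SOQL. A minimal sketch of a safer builder, assuming SOQL's backslash escapes (`\'` and `\\`) inside string literals; the helper names `soql_quote` and `soql_in_list` are hypothetical:

```python
def soql_quote(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes first, then quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def soql_in_list(values) -> str:
    """Render an iterable of strings as a SOQL IN (...) list."""
    values = list(values)
    if not values:
        # SOQL rejects an empty IN (), so fail early with a clear message
        raise ValueError("IN list must contain at least one value")
    return "(" + ", ".join(soql_quote(v) for v in values) + ")"


statuses = ["Activated", "Suspended (Late Payment)", "Disconnect in Progress"]
where_clause = f"WHERE Status IN {soql_in_list(statuses)}"
print(where_clause)
# WHERE Status IN ('Activated', 'Suspended (Late Payment)', 'Disconnect in Progress')
```

Escaping backslashes before quotes matters: doing it in the other order would double the backslash that was just added in front of each quote.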


In [18]:
# === CATEGORIZE ORDERS BY DATA ISSUES ===

print("üîç Categorizing Orders by data issues...")

# Create issue flags
orders_df["Missing_BAN"] = orders_df["Billing_Invoice__c"].isna()
orders_df["Missing_A_Location"] = orders_df["Address_A__c"].isna()
orders_df["Missing_Z_Location"] = orders_df["Address_Z__c"].isna()
orders_df["Missing_Node"] = orders_df["Node__c"].isna()
orders_df["Missing_Service_Start"] = orders_df["Service_Start_Date__c"].isna()
orders_df["Missing_Opportunity"] = orders_df["OpportunityId"].isna()

# Calculate issue count per order
issue_cols = [
    "Missing_BAN",
    "Missing_A_Location",
    "Missing_Z_Location",
    "Missing_Node",
    "Missing_Service_Start",
]
orders_df["Issue_Count"] = orders_df[issue_cols].sum(axis=1)

# Determine if ready to migrate (has BAN and A_Location at minimum)
orders_df["Ready_To_Migrate"] = (
    ~orders_df["Missing_BAN"] & ~orders_df["Missing_A_Location"]
)

# Summary
print("\n=== Data Quality Summary ===")
print(f"   Total Active Orders:        {len(orders_df):,}")
print(f"   ✅ Ready to Migrate:        {orders_df['Ready_To_Migrate'].sum():,}")
print(f"   ❌ NOT Ready (has issues):  {(~orders_df['Ready_To_Migrate']).sum():,}")

print("\n=== Issue Breakdown ===")
print(f"   ⚠️  Missing BAN (CRITICAL):   {orders_df['Missing_BAN'].sum():,}")
print(f"   ⚠️  Missing A Location:       {orders_df['Missing_A_Location'].sum():,}")
print(f"   ⚠️  Missing Z Location:       {orders_df['Missing_Z_Location'].sum():,}")
print(f"   ℹ️  Missing Node:             {orders_df['Missing_Node'].sum():,}")
print(f"   ℹ️  Missing Service Start:    {orders_df['Missing_Service_Start'].sum():,}")

🔍 Categorizing Orders by data issues...

=== Data Quality Summary ===
   Total Active Orders:        18,055
   ✅ Ready to Migrate:        14,433
   ❌ NOT Ready (has issues):  3,622

=== Issue Breakdown ===
   ⚠️  Missing BAN (CRITICAL):   3,595
   ⚠️  Missing A Location:       222
   ⚠️  Missing Z Location:       0
   ℹ️  Missing Node:             15,360
   ℹ️  Missing Service Start:    9,105
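The readiness rule marks an order as not ready when it is missing a BAN *or* an A location, so those two issue counts overlap. Working backward from the printed counts with inclusion-exclusion shows how the 3,622 not-ready orders split. This is arithmetic on the output above, not an additional query:

```python
# Counts copied from the Data Quality Summary output
total = 18_055
ready = 14_433
missing_ban = 3_595
missing_a_location = 222

not_ready = total - ready                                    # |BAN or A missing|
missing_both = missing_ban + missing_a_location - not_ready  # |BAN and A missing|
ban_only = missing_ban - missing_both
a_location_only = missing_a_location - missing_both

print(not_ready, missing_both, ban_only, a_location_only)
# 3622 195 3400 27
```

One consequence: fixing every BAN would still leave all 222 orders without an A location (195 that also lack a BAN, plus 27 that do not) outside Ready_To_Migrate.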


In [19]:
# === CREATE CATEGORY DATAFRAMES ===

print("üìÅ Creating category dataframes...")

# Columns to include in output (excluding flags)
output_cols = [
    "Id",
    "Service_ID__c",
    "Status",
    "Account_Name",
    "AccountId",
    "Billing_Invoice__c",
    "Address_A__c",
    "Address_Z__c",
    "Node__c",
    "OpportunityId",
    "Service_Start_Date__c",
    "Service_End_Date__c",
    "Service_Provided__c",
    "SOF_MRC__c",
    "OSS_Service_ID__c",
    "Primary_Product_Family__c",
    "Primary_Product_Name__c",
    "Issue_Count",
    "Ready_To_Migrate",
]

# Filter columns that exist
output_cols = [c for c in output_cols if c in orders_df.columns]

# Ready to migrate
ready_df = orders_df[orders_df["Ready_To_Migrate"]][output_cols].copy()
print(f"   ✅ Ready_To_Migrate: {len(ready_df):,} orders")

# Missing BAN (CRITICAL)
missing_ban_df = orders_df[orders_df["Missing_BAN"]][output_cols].copy()
print(f"   ⚠️  Missing_BAN: {len(missing_ban_df):,} orders")

# Missing A Location
missing_aloc_df = orders_df[orders_df["Missing_A_Location"]][output_cols].copy()
print(f"   ⚠️  Missing_A_Location: {len(missing_aloc_df):,} orders")

# Missing Node (for reference - can be fixed post-migration)
missing_node_df = orders_df[orders_df["Missing_Node"]][output_cols].copy()
print(f"   ℹ️  Missing_Node: {len(missing_node_df):,} orders")

# Missing Service Start Date
missing_start_df = orders_df[orders_df["Missing_Service_Start"]][output_cols].copy()
print(f"   ℹ️  Missing_Service_Start: {len(missing_start_df):,} orders")

📁 Creating category dataframes...
   ✅ Ready_To_Migrate: 14,433 orders
   ⚠️  Missing_BAN: 3,595 orders
   ⚠️  Missing_A_Location: 222 orders
   ℹ️  Missing_Node: 15,360 orders
   ℹ️  Missing_Service_Start: 9,105 orders
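These category tabs overlap: an order missing both a BAN and a Node appears in Missing_BAN and in Missing_Node, so the tab row counts add up to more than the total. If each order should land in exactly one remediation bucket, one option is a mutually exclusive column chosen by priority. A sketch on toy flags, assuming the priority order of the Next Steps section; the `Primary_Issue` column and the `No_Issue` label are hypothetical:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the boolean flag columns on orders_df
flags = pd.DataFrame(
    {
        "Missing_BAN": [True, False, False, False, True, False],
        "Missing_A_Location": [True, True, False, False, False, False],
        "Missing_Node": [True, False, True, False, False, False],
        "Missing_Service_Start": [False, False, True, True, False, False],
    }
)

# Highest priority first, following the Next Steps ordering
priority = ["Missing_BAN", "Missing_A_Location", "Missing_Node", "Missing_Service_Start"]

# np.select takes, per row, the label of the first condition that is True
flags["Primary_Issue"] = np.select(
    [flags[col] for col in priority], priority, default="No_Issue"
)
print(flags["Primary_Issue"].tolist())
# ['Missing_BAN', 'Missing_A_Location', 'Missing_Node', 'Missing_Service_Start', 'Missing_BAN', 'No_Issue']
```

Because every order gets exactly one label, `value_counts()` on such a column sums to the total order count, which the overlapping tabs cannot do.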


In [20]:
# === ANALYZE MISSING BAN ORDERS ===

print("\nüîç Analyzing Orders missing BAN (Billing_Invoice__c)...")

if len(missing_ban_df) > 0:
    # Check if they have Account names that might give us clues
    print("\n=== Account Distribution (Missing BAN) ===")
    account_dist = missing_ban_df["Account_Name"].value_counts().head(20)
    print(account_dist.to_string())

    # Check Product Family distribution
    if "Primary_Product_Family__c" in missing_ban_df.columns:
        print("\n=== Product Family Distribution (Missing BAN) ===")
        prod_dist = missing_ban_df["Primary_Product_Family__c"].value_counts().head(10)
        print(prod_dist.to_string())

    # Check if there's a pattern (internal accounts, etc.)
    internal_keywords = ["INTERNAL", "TEST", "DEMO", "SANDBOX"]
    internal_mask = (
        missing_ban_df["Account_Name"]
        .str.upper()
        .str.contains("|".join(internal_keywords), na=False)
    )
    internal_count = internal_mask.sum()
    print(
        f"\n📌 Possibly Internal/Test accounts: {internal_count:,} ({internal_count/len(missing_ban_df)*100:.1f}%)"
    )
else:
    print("✅ No orders missing BAN!")


🔍 Analyzing Orders missing BAN (Billing_Invoice__c)...

=== Account Distribution (Missing BAN) ===
Account_Name
INTERNAL EVERSTREAM (Network Expansion Build-Out)        1467
INTERNAL EVERSTREAM (RELO & EMERGENCY REPAIR)             505
INTERNAL EVERSTREAM NETWORK RESOURCES (DO NOT DELETE)     500
INTERNAL EVS (Vendor Orders)                              198
T-Mobile Small Cell                                       111
INTERNAL EVERSTREAM (MTU BUILDINGS)                       110
Internal Everstream Network Resources Michigan             92
T-Mobile                                                   76
The Lincoln Electric Company                               35
Sherwin Williams                                           33
City of Lakewood                                           28
US Cellular - Network Engineering                          27
AT&T Mobility (PEG) (UNC)                                  27
ACD                                                        21
DISH Wireless   
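The internal-account check joins the keywords into a plain substring pattern, so `TEST` would also flag names such as "Contest Corp" and `DEMO` would match "Demolition Partners". A sketch that matches whole words only, case-insensitively. The sample account names below are illustrative, not pulled from the ES org:

```python
import re

import pandas as pd

internal_keywords = ["INTERNAL", "TEST", "DEMO", "SANDBOX"]
# \b anchors each keyword at word boundaries; (?:...) avoids a capture-group warning
pattern = r"\b(?:" + "|".join(re.escape(k) for k in internal_keywords) + r")\b"

account_names = pd.Series(
    [
        "INTERNAL EVERSTREAM (MTU BUILDINGS)",
        "Internal Everstream Network Resources Michigan",
        "Contest Corp",
        "Demolition Partners LLC",
        "QA Test Account",
        None,
    ]
)

# na=False treats orders with no Account name as not internal
internal_mask = account_names.str.contains(pattern, case=False, regex=True, na=False)
print(internal_mask.tolist())
# [True, True, False, False, True, False]
```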

In [21]:
# === CREATE SUMMARY DATAFRAME ===

print("üìä Creating summary...")

summary_data = [
    {
        "Category": "Total Active Orders",
        "Count": len(orders_df),
        "Percentage": "100%",
        "Action": "N/A",
    },
    {
        "Category": "✅ Ready to Migrate",
        "Count": len(ready_df),
        "Percentage": f"{len(ready_df)/len(orders_df)*100:.1f}%",
        "Action": "Proceed with migration",
    },
    {
        "Category": "❌ Missing BAN (CRITICAL)",
        "Count": len(missing_ban_df),
        "Percentage": f"{len(missing_ban_df)/len(orders_df)*100:.1f}%",
        "Action": "MUST FIX - Cannot migrate without Billing_Invoice__c",
    },
    {
        "Category": "⚠️ Missing A Location",
        "Count": len(missing_aloc_df),
        "Percentage": f"{len(missing_aloc_df)/len(orders_df)*100:.1f}%",
        "Action": "Should fix - Required for A_Location__c mapping",
    },
    {
        "Category": "ℹ️ Missing Node",
        "Count": len(missing_node_df),
        "Percentage": f"{len(missing_node_df)/len(orders_df)*100:.1f}%",
        "Action": "Can fix post-migration - A_Node__c is not critical",
    },
    {
        "Category": "ℹ️ Missing Service Start",
        "Count": len(missing_start_df),
        "Percentage": f"{len(missing_start_df)/len(orders_df)*100:.1f}%",
        "Action": "Review - May need for Active_Date__c",
    },
]

summary_df = pd.DataFrame(summary_data)
print(summary_df.to_string(index=False))

📊 Creating summary...
                Category  Count Percentage                                               Action
     Total Active Orders  18055       100%                                                  N/A
      ✅ Ready to Migrate  14433      79.9%                               Proceed with migration
❌ Missing BAN (CRITICAL)   3595      19.9% MUST FIX - Cannot migrate without Billing_Invoice__c
   ⚠️ Missing A Location    222       1.2%      Should fix - Required for A_Location__c mapping
         ℹ️ Missing Node  15360      85.1%   Can fix post-migration - A_Node__c is not critical
ℹ️ Missing Service Start   9105      50.4%                 Review - May need for Active_Date__c
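The Summary rows are written out by hand, one dict per category, so a new flag needs a matching edit here. An alternative sketch derives counts and percentages directly from the boolean flag columns. It runs on a toy DataFrame and omits the Action text, which would still need to be supplied per category:

```python
import pandas as pd

# Toy stand-in for orders_df with the notebook's flag columns
orders = pd.DataFrame(
    {
        "Ready_To_Migrate": [True, True, False, True],
        "Missing_BAN": [False, False, True, False],
        "Missing_A_Location": [False, False, True, False],
        "Missing_Node": [True, False, True, True],
    }
)

flag_cols = ["Ready_To_Migrate", "Missing_BAN", "Missing_A_Location", "Missing_Node"]
total = len(orders)

# Summing a boolean column counts its True values
summary = orders[flag_cols].sum().rename("Count").to_frame()
summary["Percentage"] = (summary["Count"] / total * 100).map(lambda p: f"{p:.1f}%")
summary = summary.rename_axis("Category").reset_index()
print(summary.to_string(index=False))
```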


In [22]:
# === EXPORT TO EXCEL ===

print(f"\nüìÅ Exporting to {OUTPUT_FILE}...")

with pd.ExcelWriter(OUTPUT_FILE, engine="openpyxl") as writer:

    # Summary
    summary_df.to_excel(writer, sheet_name="Summary", index=False)

    # Ready to Migrate
    ready_df.to_excel(writer, sheet_name="Ready_To_Migrate", index=False)

    # Missing BAN (CRITICAL)
    missing_ban_df.to_excel(writer, sheet_name="Missing_BAN_CRITICAL", index=False)

    # Missing A Location
    missing_aloc_df.to_excel(writer, sheet_name="Missing_A_Location", index=False)

    # Missing Node (reference)
    missing_node_df.to_excel(writer, sheet_name="Missing_Node", index=False)

    # Missing Service Start
    missing_start_df.to_excel(writer, sheet_name="Missing_Service_Start", index=False)

    # All Active Orders (complete list)
    orders_df[output_cols].to_excel(writer, sheet_name="All_Active_Orders", index=False)

print(f"\n✅ Export complete: {OUTPUT_FILE}")
print("\n=== Sheets Created ===")
print("   1. Summary - Overview of data quality")
print(f"   2. Ready_To_Migrate - {len(ready_df):,} orders ready")
print(f"   3. Missing_BAN_CRITICAL - {len(missing_ban_df):,} orders (MUST FIX)")
print(f"   4. Missing_A_Location - {len(missing_aloc_df):,} orders")
print(f"   5. Missing_Node - {len(missing_node_df):,} orders (post-migration)")
print(f"   6. Missing_Service_Start - {len(missing_start_df):,} orders")
print(f"   7. All_Active_Orders - {len(orders_df):,} orders (complete list)")


📁 Exporting to es_orders_data_quality_20251211_185836.xlsx...

✅ Export complete: es_orders_data_quality_20251211_185836.xlsx

=== Sheets Created ===
   1. Summary - Overview of data quality
   2. Ready_To_Migrate - 14,433 orders ready
   3. Missing_BAN_CRITICAL - 3,595 orders (MUST FIX)
   4. Missing_A_Location - 222 orders
   5. Missing_Node - 15,360 orders (post-migration)
   6. Missing_Service_Start - 9,105 orders
   7. All_Active_Orders - 18,055 orders (complete list)
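The setup cell imports `Font`, `PatternFill`, and other openpyxl helpers, but the export cell never applies them. A sketch of one way to make the sheets easier to review: freeze and bold the header row and widen columns to fit their content. It assumes pandas' openpyxl engine, where `writer.sheets` maps sheet names to openpyxl worksheets; it writes to an in-memory buffer here, and `format_sheet` is a hypothetical helper:

```python
from io import BytesIO

import pandas as pd
from openpyxl import load_workbook
from openpyxl.styles import Font
from openpyxl.utils import get_column_letter


def format_sheet(ws, max_width: int = 60) -> None:
    """Freeze and bold the header row, then size columns to their longest value."""
    ws.freeze_panes = "A2"
    for cell in ws[1]:
        cell.font = Font(bold=True)
    for idx, column in enumerate(ws.iter_cols(), start=1):
        longest = max(len(str(c.value)) if c.value is not None else 0 for c in column)
        # Small padding, capped so long addresses do not create huge columns
        ws.column_dimensions[get_column_letter(idx)].width = min(longest + 2, max_width)


df = pd.DataFrame(
    {"Service_ID__c": ["SVC-1", "SVC-22"], "Status": ["Activated", "Disconnect in Progress"]}
)

buffer = BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Ready_To_Migrate", index=False)
    format_sheet(writer.sheets["Ready_To_Migrate"])

# Reload to confirm the formatting survived the save
buffer.seek(0)
ws = load_workbook(buffer)["Ready_To_Migrate"]
print(ws.freeze_panes, ws.column_dimensions["B"].width)
```

In the notebook, `format_sheet(writer.sheets[name])` would be called after each `to_excel` inside the existing `with pd.ExcelWriter(OUTPUT_FILE, ...)` block.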


---
## Next Steps

### Priority 1: Fix Missing BAN (3,595 orders)
These orders CANNOT be migrated without `Billing_Invoice__c`.

**Options:**
1. **Find/Create BANs** - Identify correct Billing_Invoice__c for each order
2. **Exclude from migration** - If these are internal/test records
3. **Create default BAN** - If business approves a catch-all BAN
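Option 2 may cover most of this backlog. In the Missing BAN account distribution printed earlier, the six INTERNAL-named accounts visible in the (truncated) list already hold 2,872 of the 3,595 missing-BAN orders. A worked check using only those printed counts; the true internal share is at least this large, since the list is cut off:

```python
import pandas as pd

# Leading rows of the "Account Distribution (Missing BAN)" output
account_dist = pd.Series(
    {
        "INTERNAL EVERSTREAM (Network Expansion Build-Out)": 1467,
        "INTERNAL EVERSTREAM (RELO & EMERGENCY REPAIR)": 505,
        "INTERNAL EVERSTREAM NETWORK RESOURCES (DO NOT DELETE)": 500,
        "INTERNAL EVS (Vendor Orders)": 198,
        "T-Mobile Small Cell": 111,
        "INTERNAL EVERSTREAM (MTU BUILDINGS)": 110,
        "Internal Everstream Network Resources Michigan": 92,
        "T-Mobile": 76,
    }
)
missing_ban_total = 3_595

# Keep accounts whose name starts with "internal" (case-insensitive)
internal = account_dist.filter(regex=r"(?i)^internal")
internal_orders = internal.sum()
print(internal_orders, f"{internal_orders / missing_ban_total:.1%}")
# 2872 79.9%
```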

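### Priority 2: Fix Missing A Location (222 orders)
These need `Address_A__c` populated for the `A_Location__c` mapping. Because the readiness rule requires both a BAN and an A location, these orders are also excluded from Ready_To_Migrate.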

### Priority 3: Review Missing Node (15,360 orders)
Can be fixed post-migration since `A_Node__c` is not critical for initial load.

### Priority 4: Review Missing Service Start (9,105 orders)
`Service_Start_Date__c` may be needed for `Active_Date__c`; review whether these are truly activated services.

In [23]:
# === PLACEHOLDER FOR ADDITIONAL ANALYSIS ===

# Add custom queries here as needed
pass