#### Goal: Customer Data Migration Analysis using Pandas
* In this solution, I use a Python script to clean the customer data with pandas and upload it to Google Cloud Platform

*  First assumption: We are migrating to Cloud Firestore
*  Second assumption: Firebase Admin SDK has been installed with Python
*  Third assumption: Service Key has been generated from the Firebase Project Console
*  Fourth assumption: prior_call and prior_email_call count each customer's previous contacts with the call center by phone and by email, respectively
*  Fifth assumption: gmail_email_i is the column that indicates whether a customer has been migrated: 0 means migration has not taken place, and 1 means it has.
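
The fourth and fifth assumptions together define which customers are prioritized. A minimal sketch on a made-up table shows the idea; the rows and the `threshold` name are illustrative assumptions, not values from the real dataset:

```python
import pandas as pd

# Toy data: column names follow the assumptions above; the values are invented.
customers = pd.DataFrame({
    "cust_id": [1, 2, 3, 4],
    "prior_call": [0, 2, 1, 0],
    "prior_email_call": [0, 1, 0, 3],
    "gmail_email_i": [0, 0, 1, 0],
})

threshold = 1  # hypothetical minimum number of contacts

# Total contacts per customer: prior calls plus prior emails.
total_contacts = customers["prior_call"] + customers["prior_email_call"]

# Keep customers who reached the threshold and are not yet migrated.
priority = customers[(total_contacts >= threshold) & (customers["gmail_email_i"] == 0)]

print(priority["cust_id"].tolist())
```

Customer 1 never contacted the centre and customer 3 is already migrated, so only customers 2 and 4 remain in the priority subset.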

#### Packages

In [None]:
import csv
import pandas as pd
import firebase_admin
import google.cloud
from firebase_admin import credentials, firestore

#### Set credentials
* Initialize the Firestore client on the local machine with the service account key

In [None]:
cred = credentials.Certificate("accountkey.json") # path to the service account key generated from the Firebase Project Console
app = firebase_admin.initialize_app(cred) 

db = firestore.client() # initialize client

file_path = "CLEANED_CSV_FILE_PATH"         # path to the cleaned CSV file
collection_name = "COLLECTION_TO_ADD_TO"    # Firestore collection to migrate data into

Define helper functions for batching and cleaning, then read the cleaned CSV and upload it to the Firestore NoSQL database

In [None]:
def batch_data(iterable, n=1):
    """ 
        create batch function to limit size of uploads
    """
    for ndx in range(0, len(iterable), n):
        yield iterable[ndx:min(ndx + n, len(iterable))]

def clean_data(_csv):
    """
        read in the data and clean it.
        create a subset of customers who can be prioritized for immediate migration:
        keep customers whose total contacts with the call centre reach the threshold (at least 1)
        and who are not yet migrated, based on the fifth assumption.
        write the subset to cleaned_data.csv and return it.
    """
    data = pd.read_csv(_csv)

    # keep one row per customer when a customer id column is present
    if 'cust_id' in data.columns:
        data = data.drop_duplicates(subset='cust_id', keep='first')

    # total number of contacts into the call centre: prior calls plus prior emails
    data['total_contact_mode'] = data['prior_call'] + data['prior_email_call']

    # put the customers with the most contacts first
    data = data.sort_values('total_contact_mode', ascending=False)

    # keep customers at or above the contact threshold who are not yet migrated
    cleaned_data = data[(data['total_contact_mode'] >= 1) & (data['gmail_email_i'] == 0)]

    cleaned_data.to_csv('cleaned_data.csv', index=False)
    return cleaned_data

data = []
headers = []
with open(file_path) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            for header in row:
                headers.append(header)
            line_count += 1
        else:
            obj = {}
            for idx, item in enumerate(row):
                obj[headers[idx]] = item
            data.append(obj)
            line_count += 1
    print(f'Processed {line_count} lines.')

"""
    Loop through data and add it to the Firestore NoSQL database in batches of 499, just under the Firestore batch upload limit of 500
"""
for batched_data in batch_data(data, 499):
    batch = db.batch()
    for data_item in batched_data:
        doc_ref = db.collection(collection_name).document()
        batch.set(doc_ref, data_item)
    batch.commit()

print('Done')
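
To see how `batch_data` splits the upload, here is a small sketch that runs the same slicing logic on placeholder records instead of Firestore documents. The record contents and the count of 1,200 are made up for illustration:

```python
def batch_data(iterable, n=1):
    """Yield successive slices of at most n items."""
    for ndx in range(0, len(iterable), n):
        yield iterable[ndx:ndx + n]

# 1,200 placeholder records stand in for the rows read from the CSV.
records = [{"cust_id": i} for i in range(1200)]

# Size of each chunk that would be sent as one batch.
batch_sizes = [len(chunk) for chunk in batch_data(records, 499)]
print(batch_sizes)  # [499, 499, 202]
```

Each chunk corresponds to one `batch.commit()` call in the upload loop, so 1,200 rows would need three commits.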

#### In conclusion, under the five assumptions, non-Gmail customers (gmail_email_i = 0) whose total calls and emails to the call centre meet the set threshold form the subset of customers to prioritize, and this subset can be migrated in batches of up to 500.