# Notebook Documentation

## Goal
This notebook loads and processes background data from the survey providers Cint Access, Syno Distribution, Lucid, and PureSpectrum. It also imports the .sav files containing the exported raw data, including non-completes, so that the two sources can be compared.

## Process Overview
1. **Data Upload**: Users will upload background data from the specified providers. This data typically includes respondent IDs, survey completion status, and other metadata.
2. **Data Processing**: The notebook will process the uploaded data to prepare it for analysis. This may involve data cleaning, transformation, and merging of datasets from different sources.
3. **Analysis of .sav Files**: The notebook will import and analyze .sav files, which are data files from SPSS (Statistical Package for the Social Sciences). These files contain survey responses, including those from respondents who did not complete the survey.
4. **Reconciliation**: The final step is to generate an Excel file containing IDs that will be used to reconcile responses with the data collected by the Syno Survey tool.
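
The reconciliation step can be pictured as a comparison of two ID sets. The sketch below is only an illustration: the actual rules live in `utils.utils`, which is not shown in this notebook, so `reconcile_ids` and the meaning it gives to "add" and "remove" are assumptions made for the example.

```python
def reconcile_ids(provider_ids, survey_ids):
    """Compare IDs a provider reports with IDs found in the survey export.

    Hypothetical helper: the real add/remove rules are in utils.utils.
    """
    provider = {str(i).strip() for i in provider_ids}
    survey = {str(i).strip() for i in survey_ids}
    return {
        # Present in the survey export but unknown to the provider (assumed "add")
        "add": sorted(survey - provider),
        # Reported by the provider but missing from the survey export (assumed "remove")
        "remove": sorted(provider - survey),
    }


result = reconcile_ids(["a1", "b2", "c3"], ["b2", "c3", "d4"])
print(result)  # {'add': ['d4'], 'remove': ['a1']}
```

Normalising IDs to stripped strings avoids false mismatches when one source stores them as numbers and the other as text.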

## Expected Outputs
- An Excel file with reconciled IDs for matching survey responses with the collected data.

## User Instructions
- Users should upload all background data files from the specified providers to the notebook.
- Users should also provide all .sav files from the exported raw data for analysis.
- The notebook will process the data and generate the required Excel file for reconciliation.
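
Before running the cells, it can help to confirm that each input folder actually contains files. The following is a small sketch: the folder names are the ones printed in the "Reading from directory" output of the cells below (`data/`, `distribution/`, `purespectrum/`, `lucid/`), while `count_input_files` is a hypothetical helper and not part of `utils.utils`.

```python
from pathlib import Path

# Folder names as they appear in the notebook's "Reading from directory" output
INPUT_DIRS = ("data", "distribution", "purespectrum", "lucid")


def count_input_files(base_dir="."):
    """Return {folder: number of files}; a missing folder counts as 0."""
    counts = {}
    for name in INPUT_DIRS:
        folder = Path(base_dir) / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    for folder, n in count_input_files().items():
        status = "ok" if n else "empty or missing"
        print(f"{folder}/: {n} file(s) ({status})")
```

An empty provider folder is not an error in this notebook: the export cell skips any provider whose dataset has no rows.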

In [10]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

import pandas as pd 
import numpy as np 
import pyreadstat

from datetime import datetime
import os

# Send emails through Gmail's SMTP server
from email.message import EmailMessage
import smtplib
import ssl

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.base import MIMEBase
from email import encoders

from utils.utils import read_data
from utils.utils import read_distribution
from utils.utils import read_purespectrum
from utils.utils import read_lucid
from utils.utils import lucidRemoves
from utils.utils import lucidAdds
from utils.utils import pureRemoves
from utils.utils import pureAdds
from utils.utils import distributionRemoves
from utils.utils import distributionAdds
from utils.utils import format_sheet

# Load secrets (EMAIL_SENDER, EMAIL_PASSWORD) from a .env file
from dotenv import load_dotenv
load_dotenv()

True

1. Read the data in SPSS format

In [11]:
data = read_data()

Reading from directory: data/
Reading file: data/Raw Data 797946 2024-03-13.sav


2. Read the distribution files and merge them all

In [17]:
distribution = read_distribution()

Reading from directory: distribution/
Reading file: distribution/4744_7201 Capio brand tracking - Sweden 2024 January_2024-03-13.csv
Reading file: distribution/4744_7201 Capio brand tracking - Sweden 2024 March_2024-03-13.csv
Reading file: distribution/4744_7201 Capio brand tracking - Sweden 2024 February_2024-03-13.csv


3. Read the PureSpectrum files and merge them all

In [13]:
purespectrum = read_purespectrum()

Reading from directory: purespectrum/


4. Read the Lucid files and merge them all

In [14]:
lucid = read_lucid()

Reading from directory: lucid/


In [15]:
project_id = 4744

In [18]:
# Get the current date in the format MM_DD_YYYY
current_date = datetime.now().strftime("%m_%d_%Y")

# Use the current date in the filename
filename = f"Reconciliation - {current_date}.xlsx"

with pd.ExcelWriter(filename, engine='xlsxwriter') as writer:  # Specify the engine
    print("Exporting the data to an Excel file")

    # Distribution reconciliations
    if len(distribution) > 0:
        print("Generating Distribution Additions")
        distribution_additions = distributionAdds(distribution, data)
        distribution_additions.to_excel(writer, sheet_name="Distribution - Add", index=False)
        format_sheet(writer, "Distribution - Add", distribution_additions)
        
        print("Generating Distribution Removes")
        distribution_removes = distributionRemoves(distribution, data)
        distribution_removes.to_excel(writer, sheet_name="Distribution - Remove", index=False)
        format_sheet(writer, "Distribution - Remove", distribution_removes)

    # PureSpectrum reconciliations
    if len(purespectrum) > 0:
        print("Generating PureSpectrum Additions")
        pure_additions = pureAdds(purespectrum, data)
        pure_additions.to_excel(writer, sheet_name="PureSpectrum - Add", index=False)
        format_sheet(writer, "PureSpectrum - Add", pure_additions)
        
        print("Generating PureSpectrum Removes")
        pure_removes = pureRemoves(purespectrum, data)
        pure_removes.to_excel(writer, sheet_name="PureSpectrum - Remove", index=False)
        format_sheet(writer, "PureSpectrum - Remove", pure_removes)

    # Lucid reconciliations
    if len(lucid) > 0:
        print("Generating Lucid Additions")
        lucid_additions = lucidAdds(lucid, data)
        lucid_additions.to_excel(writer, sheet_name="Lucid - Add", index=False)
        format_sheet(writer, "Lucid - Add", lucid_additions)
        
        print("Generating Lucid Removes")
        lucid_removes = lucidRemoves(lucid, data)
        lucid_removes.to_excel(writer, sheet_name="Lucid - Remove", index=False)
        format_sheet(writer, "Lucid - Remove", lucid_removes)
    
    print(f"File exported as {filename}")

Exporting the data to an Excel file
Generating Distribution Additions
This operation will affect 48 cells, please be patient
Generating Distribution Removes
This operation will affect 8 cells, please be patient
File exported as Reconciliation - 03_13_2024.xlsx


In [19]:
email_sender = os.environ["EMAIL_SENDER"]
email_password = os.environ["EMAIL_PASSWORD"]
email_receiver = "val@synoint.com"

em = MIMEMultipart()
em["From"] = email_sender
em["To"] = email_receiver
# em["cc"] = "gna@synoint.com"
em["Subject"] = f"Reconciliations for P{project_id} file generated - {current_date}"

body = "Please see the attached Excel file with the requested reconciliations for the providers."

em.attach(MIMEText(body, "plain"))

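p = MIMEBase("application", "octet-stream")
# Read the attachment inside a context manager so the file handle is closed
with open(filename, "rb") as attachment:
    p.set_payload(attachment.read())
encoders.encode_base64(p)
# Pass the filename as a parameter so it is quoted correctly (it contains spaces)
p.add_header("Content-Disposition", "attachment", filename=filename)
em.attach(p)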

context = ssl.create_default_context()

with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as smtp:
    smtp.login(email_sender, email_password)
    smtp.sendmail(email_sender, email_receiver, em.as_string())
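
As an alternative to assembling the MIME parts by hand, the standard library's `EmailMessage` class (already imported at the top of the notebook) can attach the file in one call. This is a sketch and not the notebook's own method: `build_reconciliation_email` is a hypothetical helper, and it only builds the message without sending anything.

```python
from email.message import EmailMessage
from pathlib import Path


def build_reconciliation_email(sender, receiver, subject, body, attachment_path):
    """Build (but do not send) a message with one binary attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.set_content(body)

    path = Path(attachment_path)
    # add_attachment base64-encodes the bytes and sets Content-Disposition,
    # including a correctly quoted filename
    msg.add_attachment(
        path.read_bytes(),
        maintype="application",
        subtype="octet-stream",
        filename=path.name,
    )
    return msg
```

The returned object could then be passed to `smtp.send_message(msg)` inside an `smtplib.SMTP_SSL` block like the one above.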