# Extracting Contacts from Google Calendar Export

*Created by Andrew Therriault, February 4, 2023*

This script takes an export of meetings from Google Calendar, extracts all the meeting participants with email addresses, deduplicates them, and saves them to a CSV. I'm using this to help create a "friends and mentors" list for our company (almost everyone I'd want on that list is someone I've had a video call with over the past couple of years), but it could also be useful for things like extracting lists of attendees for events organized on Google Calendar. A more sophisticated version could also do things like note the most recent event with each guest, but I'll leave that to others to implement for now.

Feel free to adapt and reuse this code as you see fit, but I'd strongly recommend not uploading the results directly into a CRM. I'll be hand-screening these first, and I suggest you do the same. Otherwise, not only is it a spammy thing to do, but you're likely to get poor deliverability if a lot of the emails bounce.

Good luck, and hope it's helpful.

## Setting parameters & loading dependencies

#### Parameters

In [1]:
working_directory = 'c:/working/' #where the calendar file is located and export will be saved
calendar_file = 'gmail_cal.ics' #the name of the calendar file (should be .ics)
export_name = 'gcal_export.csv' #name of the csv file to output

#### Dependencies

In [2]:
import pandas as pd

#changing pandas display settings
pd.options.display.max_rows = 1000
pd.options.display.max_colwidth = 1000

import os
#changing working directory
os.chdir(working_directory)

import re

## Removing extra line breaks from the calendar file
ICS files cap line length (the iCalendar spec, RFC 5545, recommends at most 75 bytes per line), so longer fields are split across multiple lines. Continued lines can be identified by an added whitespace character at the beginning, which is inserted regardless of whether there's a space in the actual text (so you'll get spaces in the middle of email addresses, for example). This code loops over the lines in the import file, combines those broken fields into single lines, and writes the results to a new file.

In [3]:
with open(calendar_file, 'r', encoding='utf8') as f1, open('temp.txt', 'w', encoding='utf8') as f2:
    # Initialize the current line
    current_line = ''

    # Loop through the lines of the input file
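    for line in f1:
        # Drop the line ending but keep any other whitespace, which may be real content
        line = line.rstrip('\r\n')
        # If the line starts with a space or tab, it continues the previous line:
        # drop that one folding character and append the rest
        if line.startswith((' ', '\t')):
            current_line += line[1:]
        else:
            # Write the current line to the output file
            if current_line:
                f2.write(current_line + '\n')

            # Reset the current line
            current_line = line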

    # Write the final current line to the output file
    if current_line:
        f2.write(current_line + '\n')
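
To make the unfolding step concrete, here is a minimal sketch that applies the same idea to a list of strings instead of a file. The function name `unfold_lines` and the sample data are made up for illustration. The sketch treats a leading space or tab as the continuation marker and drops only that one character, which is how RFC 5545 describes line folding.

```python
def unfold_lines(lines):
    """Join folded iCalendar lines back into logical lines."""
    unfolded = []
    for raw in lines:
        line = raw.rstrip('\r\n')
        if line.startswith((' ', '\t')) and unfolded:
            # Continuation: drop the one folding character and append the rest
            unfolded[-1] += line[1:]
        elif line:
            unfolded.append(line)
    return unfolded

# Made-up sample: the ATTENDEE line is folded in the middle of an email address
sample = [
    'BEGIN:VEVENT\n',
    'ATTENDEE;CUTYPE=INDIVIDUAL;CN=jane.doe@exam\n',
    ' ple.com;X-NUM-GUESTS=0:mailto:jane.doe@example.com\n',
    'END:VEVENT\n',
]
for line in unfold_lines(sample):
    print(line)
```

The four physical lines come back as three logical lines, with the address rejoined as `jane.doe@example.com`.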

## Extracting email addresses
This loops over the lines in the prepped file, and when a line is for an attendee, it pulls out the value of the `CN` parameter and, if that value looks like an email address, adds it to a set (which also takes care of deduplication).

In [4]:
# Create an empty set to store the emails
emails = set()

# Defining the regular expression pattern for an email address
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

# Open the input file
with open('temp.txt', 'r', encoding='utf8') as f:
    # Loop through the lines of the input file
    for line in f:
        # Check if the line starts with 'ATTENDEE'
        if line.startswith('ATTENDEE'):
            # Find the start and end indices of the CN (common name) value, where the email is expected
            start = line.find(';CN=') + 4
            end = line.find(';', start)
            # Extract the contact's information from the line
            contact = line[start:end]
            # Checks if the contact is an email address, and if so, adds it to the set
            if re.match(pattern, contact):
                emails.add(contact)
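
The cell above picks up a guest only when the `CN` parameter holds an email address, so it would skip any guest whose `CN` is a display name. As a possible alternative, the sketch below reads the address from the `mailto:` value at the end of the ATTENDEE line instead. The helper name `email_from_attendee`, the pattern, and the sample lines are hypothetical, and the sketch assumes the lines have already been unfolded. It also lowercases addresses so that differently cased copies deduplicate, which the cell above doesn't do.

```python
import re

# Hypothetical pattern: whatever follows ":mailto:" up to the end of the line
MAILTO_RE = re.compile(r':mailto:([^\s;:]+)$', re.IGNORECASE)

def email_from_attendee(line):
    """Return the lowercased mailto address from an unfolded ATTENDEE line, or None."""
    if not line.startswith('ATTENDEE'):
        return None
    match = MAILTO_RE.search(line.strip())
    return match.group(1).lower() if match else None

# Made-up sample lines for illustration
sample_lines = [
    'ATTENDEE;CUTYPE=INDIVIDUAL;ROLE=REQ-PARTICIPANT;CN=Jane Doe;X-NUM-GUESTS=0:mailto:Jane.Doe@example.com',
    'ATTENDEE;CN=sam@example.org:mailto:sam@example.org',
    'SUMMARY:Coffee chat',
]
found = {e for e in map(email_from_attendee, sample_lines) if e}
print(sorted(found))
```

Whether this catches more guests than the `CN` approach depends on how your export was generated, so it's worth comparing the two sets on your own file.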

## Creating a dataframe of emails, extracting names and domains, and saving
This attempts to parse emails that have a "first.last@domain" structure to get first and last names. For other addresses, the name fields are left empty and only the username is given.

In [5]:
contacts = pd.DataFrame()
contacts['email'] = sorted(emails)
print(len(contacts))

612


In [6]:
contacts['domain'] = contacts.email.map(lambda x: x.split('@')[1])
contacts['username'] = contacts.email.map(lambda x: x.split('@')[0])

In [7]:
#functions to extract first and last names from first.last@domain format addresses, and otherwise return empty strings, then use those to create full name fields

def getfirst(name):
    if '.' in name[1:-1] and name.count('.') == 1:
        first = name.split('.')[0].capitalize()
    else:
        first = ''
    return first

def getlast(name):
    if '.' in name[1:-1] and name.count('.') == 1:
        last = name.split('.')[1].capitalize()
    else:
        last = ''
    return last

contacts['firstname'] = contacts.username.map(getfirst)
contacts['lastname'] = contacts.username.map(getlast)
contacts['fullname'] = contacts.firstname.str.cat(contacts.lastname, sep=' ').str.strip()
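
To illustrate which usernames get split and which are left alone, here is a small stand-alone sketch of the same rule (exactly one dot, and not at either end). The combined helper `split_first_last` and the sample usernames are made up for illustration and aren't part of the notebook.

```python
def split_first_last(username):
    """Return (first, last) for 'first.last' usernames, else ('', '')."""
    # Same test as getfirst/getlast: exactly one dot, not in first or last position
    if '.' in username[1:-1] and username.count('.') == 1:
        first, last = username.split('.')
        return first.capitalize(), last.capitalize()
    return '', ''

# Made-up usernames covering the main cases
for username in ['jane.doe', 'jdoe', 'j.r.doe', '.jane', 'mary-ann.o_neil']:
    print(username, '->', split_first_last(username))
```

Note that `str.capitalize` lowercases everything after the first letter, so a username like `ann.mcdonald` comes out as `Ann Mcdonald`, which is one more reason to hand-screen the results.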

In [8]:
contacts.to_csv(export_name, index=False)