# Proof of Concept for the CTFtime API Integration

This program retrieves a list of upcoming CTF events from the CTFtime API, cleans and processes the data, and saves it to an Excel file. The program is written in Python and uses the http.client, datetime, json, pandas, and os libraries.

## Imports

In [None]:
# Importing the necessary libraries
import http.client
import datetime
import json
import pandas as pd
import os

## Variables

Define the variables required for this program in the cell below. In the final project, these variables should be passed as parameters to the program or defined as environment variables, depending on the runtime environment.
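As a rough sketch of the environment-variable approach, the settings could be read with fallbacks to the notebook defaults. The variable names `CTF_DAYS`, `CTF_RESTRICTIONS`, and `CTF_ARTIFACT_PATH` and the helper `load_settings` are hypothetical and chosen only for illustration.

```python
import os

def load_settings(env=os.environ):
    """Read the notebook settings from environment variables, with fallbacks.

    The variable names below are hypothetical, not part of the original program.
    """
    days = int(env.get("CTF_DAYS", "30"))
    raw = env.get("CTF_RESTRICTIONS", "Open,Individual,High-school")
    # Split the comma-separated list and ignore empty entries
    restrictions = [item.strip() for item in raw.split(",") if item.strip()]
    artifact = env.get("CTF_ARTIFACT_PATH", os.path.join("artifacts", "events.xlsx"))
    return {
        "days": days,
        "restriction_list": restrictions,
        "file_path": os.path.abspath(artifact),
    }

# A plain dict stands in for os.environ so the example does not depend on the shell
settings = load_settings({"CTF_DAYS": "14", "CTF_RESTRICTIONS": "Open, Individual"})
print(settings["days"], settings["restriction_list"])
```

Passing the mapping as an argument keeps the function easy to exercise without touching the real environment.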

In [None]:
# Variables
days = 30 # Day limit: the code will browse for CTFs from today until the given limit 
restriction_list = ['Open', 'Individual', 'High-school'] # Filter the CTF you're interested in depending on the restriction types
useless_columns = ['ctf_id','weight', 'logo', 'live_feed', 'format', 'participants', 'public_votable', 'is_votable_now', 'prizes', 'organizers', 'format_id', 'duration', 'onsite', 'location', 'ctftime_url', 'restrictions'] # Columns you don't need for your integration
artifact_location = os.path.join("artifacts", "events.xlsx") # Relative path of the Excel file, portable across operating systems

# Get current working directory
cwd = os.getcwd()
# Build the final file path
file_path = os.path.join(cwd, artifact_location)

## Retrieving Future Events

This function takes a single argument day_limit, which specifies the number of days in the future to retrieve events for. The function sends a GET request to the CTFtime API with the appropriate parameters and retrieves the JSON response. The response is then parsed and returned as a list of dictionaries containing the event data.
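The request path described above can be sketched in isolation, without any network access. The helper name `build_events_path` and its `now` parameter are assumptions added for illustration. It uses `urllib.parse.urlencode` rather than manual string joining, which also escapes any special characters in the values.

```python
import datetime
from urllib.parse import urlencode

def build_events_path(day_limit, now=None):
    """Return the request path and query string for the events endpoint."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    finish = now + datetime.timedelta(days=day_limit)
    params = {
        "limit": 500,
        "start": int(now.timestamp()),
        "finish": int(finish.timestamp()),
    }
    return "/api/v1/events/?" + urlencode(params)

# A fixed, timezone-aware date makes the result reproducible
fixed_now = datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc)
print(build_events_path(30, now=fixed_now))
# → /api/v1/events/?limit=500&start=1704067200&finish=1706659200
```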

In [None]:
def get_future_events(day_limit):
    # Get the current date and time in UTC (timezone-aware)
    # datetime.timezone.utc is used because datetime.UTC only exists in Python 3.11+
    now = datetime.datetime.now(datetime.timezone.utc)

    # Calculate the start and finish timestamps
    start_timestamp = int(now.timestamp())
    finish_timestamp = int((now + datetime.timedelta(days=day_limit)).timestamp())

    # Set the API endpoint and parameters
    url = '/api/v1/events/'
    params = {
        'limit': 500,
        'start': start_timestamp,
        'finish': finish_timestamp
    }

    # Build the query string
    query_string = '?' + '&'.join([f'{key}={value}' for key, value in params.items()])

    # Create an HTTPS connection to the CTFtime API
    conn = http.client.HTTPSConnection('ctftime.org')

    # Send a GET request to the API endpoint
    conn.request('GET', url + query_string)

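    # Get the response from the API
    response = conn.getresponse()

    # Read the response body, then close the connection
    data = response.read().decode('utf-8')
    conn.close()

    # Fail early with a clear message if the API did not return 200 OK
    if response.status != 200:
        raise RuntimeError(f"CTFtime API returned HTTP {response.status} {response.reason}")

    # Parse the JSON data
    return json.loads(data)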

events = get_future_events(day_limit=days)
print(f"Retrieved {len(events)} CTFs for the next {days} days")



## Cleaning The Response

This function takes three arguments:

- `events`: The list of event dictionaries returned by the API.
- `restriction_list`: The list of allowed restrictions for the events.
- `useless_columns`: The list of columns to drop from the DataFrame.

The function first converts the list of events to a DataFrame. It then extracts the name of the first organizer from the `organizers` list and stores it in a new `organizer_name` column, and computes the total duration of the event in hours in a new `duration_hours` column. Next, it removes all onsite events and keeps only the events whose restriction is in `restriction_list`. It then adds a boolean column for each restriction in `restriction_list`, indicating whether the event has that restriction. Finally, it drops the columns listed in `useless_columns`, renames the `id` column to `ctftime_id`, and returns the cleaned DataFrame.
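To make the per-event logic concrete, here is a minimal sketch in plain Python applied to a single hand-written record. The `sample_event` values and the `summarize` helper are hypothetical. The record only mirrors the fields the cleaning step reads (`organizers`, `duration`, `onsite`, `restrictions`) and is not real API output.

```python
# Hypothetical record with the same field shapes the cleaning step relies on
sample_event = {
    "id": 1,
    "organizers": [{"id": 10, "name": "Example Team"}],
    "duration": {"days": 2, "hours": 6},
    "onsite": False,
    "restrictions": "Open",
}

def summarize(event, restriction_list):
    """Apply the cleaning rules to one event and return the derived values."""
    organizers = event.get("organizers") or []
    # Name of the first organizer, or None when the list is empty
    organizer_name = organizers[0]["name"] if organizers else None
    # Total duration in hours: whole days converted to hours plus the remainder
    duration_hours = event["duration"]["days"] * 24 + event["duration"]["hours"]
    # An event is kept when it is online and its restriction is allowed
    keep = (not event["onsite"]) and event["restrictions"] in restriction_list
    # One boolean flag per allowed restriction, keyed by the lowercase name
    flags = {r.lower(): event["restrictions"] == r for r in restriction_list}
    return {
        "organizer_name": organizer_name,
        "duration_hours": duration_hours,
        "keep": keep,
        **flags,
    }

row = summarize(sample_event, ["Open", "Individual"])
print(row)
```

For this record the duration works out to 2 × 24 + 6 = 54 hours, and the event is kept because it is online and open.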

In [None]:
def clean_events(events, restriction_list, useless_columns):
    # Convert the list of events to a DataFrame
    events_df = pd.DataFrame(events)

    # Extract the first organizer's name from the organizers list (None if the list is empty)
    events_df['organizer_name'] = events_df['organizers'].apply(lambda x: x[0]['name'] if x else None)

    # Compute the total duration of the event in hours and store it in a new column
    events_df['duration_hours'] = events_df['duration'].apply(lambda x: x['hours'] + x['days']*24)

    # Remove all onsite events
    events_df = events_df[~events_df['onsite'].astype(bool)]

    # Keep only CTFs whose restriction is in restriction_list; copy to avoid modifying a view
    events_df = events_df[events_df['restrictions'].isin(restriction_list)].copy()

    for restriction in restriction_list:
        # Add a boolean column named after the restriction: True if the event has that restriction
        events_df[restriction.lower()] = events_df['restrictions'] == restriction

    # Drop useless columns
    events_df = events_df.drop(useless_columns, axis=1)

    # Rename column id into ctftime_id
    events_df = events_df.rename(columns={'id': 'ctftime_id'})

    return events_df

events_df = clean_events(events=events, restriction_list=restriction_list, useless_columns=useless_columns)    
print(f"After data cleaning we found {events_df.shape[0]} interesting CTFs for you.")

In [None]:
# Print all interesting events
events_df

## Find New Events

This function takes two arguments:

- `events_df`: The DataFrame containing the event data.
- `file_path`: The path of the Excel file containing the previously stored events.

The goal is to determine which of the retrieved events are not yet stored in the Excel file. This function can be used to alert users about upcoming CTFs they do not already know about.
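The comparison amounts to an anti-join on `ctftime_id`. A small sketch with two hand-made DataFrames (the values are made up for illustration) shows the idea:

```python
import pandas as pd

# Events already stored in the Excel file (hypothetical values)
stored = pd.DataFrame({"ctftime_id": [1, 2]})

# Events just retrieved from the API (hypothetical values)
fetched = pd.DataFrame({"ctftime_id": [2, 3], "title": ["B", "C"]})

# Keep only the fetched rows whose id is absent from the stored ids
new_only = fetched[~fetched["ctftime_id"].isin(stored["ctftime_id"])]
print(new_only["ctftime_id"].tolist())
# → [3]
```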


In [None]:
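def find_new_ctfs(events_df, file_path):
    # On the first run there is no Excel file yet, so every event is new
    if not os.path.exists(file_path):
        return events_df
    existing_events = pd.read_excel(file_path)
    # Find the unregistered events
    new_events = events_df[~events_df['ctftime_id'].isin(existing_events['ctftime_id'])]
    return new_events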

new_events = find_new_ctfs(events_df=events_df, file_path=file_path)
if new_events.shape[0] == 0:
    print("You have already found all the interesting CTFs for the given time period.")
new_events

## Store The Future Events 

This function takes two arguments:

- `events_df`: The cleaned and processed DataFrame containing the event data.
- `file_path`: The file path of the Excel file to save the data to.

The function saves the DataFrame to the specified Excel file using the to_excel() method.

In [None]:
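def save_2_xl(events_df, file_path):
    # Create the target directory if it does not exist yet
    directory = os.path.dirname(file_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    # Save DataFrame to Excel file
    events_df.to_excel(file_path, index=False)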

save_2_xl(events_df=events_df, file_path=file_path)
print(f"Future events stored in Excel file {file_path}")