# Proof of Concept for the CTFtime API Integration

This program retrieves a list of upcoming CTF events from the CTFtime API, cleans and processes the data, and saves it to an Excel file. It also retrieves a team's yearly rating and the current top 10 teams. The program is written in Python and uses the `http.client`, `datetime`, `json`, `pandas`, and `os` libraries.

## Imports

In [1]:
# Importing the necessary libraries
import http.client
import datetime
import json
import pandas as pd
import os

## Variables

Define the variables required by this program in the cell below. In the final project, these variables should be parametrized within the program or defined as environment variables, depending on the running environment.
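As a minimal sketch of the environment-variable option, the settings could be read with `os.environ.get()`, falling back to the values used in this notebook. The variable names `CTF_DAYS`, `CTF_RESTRICTIONS` and `CTFTIME_TEAM_ID`, and the helper `load_settings()`, are hypothetical; nothing in the notebook defines them.

```python
import os


def load_settings():
    """Read the notebook settings from (hypothetical) environment variables.

    Each variable falls back to the default used in the Variables cell.
    """
    # Number of days ahead to search for CTFs
    days = int(os.environ.get("CTF_DAYS", "30"))
    # Comma-separated restriction types, e.g. "Open,Individual"
    raw = os.environ.get("CTF_RESTRICTIONS", "Open,Individual,High-school")
    restriction_list = [item.strip() for item in raw.split(",") if item.strip()]
    # CTFtime team ID, kept as a string like in the Variables cell
    team_id = os.environ.get("CTFTIME_TEAM_ID", "216659")
    return days, restriction_list, team_id
```

In a deployment, you would export for example `CTF_DAYS=7` before starting the program and call `days, restriction_list, team_id = load_settings()` instead of hard-coding the values.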

In [2]:
# Variables
days = 30 # Day limit: the code looks for CTFs from today up to this many days ahead
restriction_list = ['Open', 'Individual', 'High-school'] # Restriction types of the CTFs you're interested in
useless_columns = ['ctf_id','weight', 'logo', 'live_feed', 'format', 'participants', 'public_votable', 'is_votable_now', 'prizes', 'organizers', 'format_id', 'duration', 'onsite', 'location', 'ctftime_url', 'restrictions'] # Columns you don't need for your integration
artifact_location = os.path.join("artifacts", "events.xlsx") # Output file, relative to the working directory (portable across operating systems)
team_id = "216659" # CTFtime ID of the team whose ranking you want to follow

# Get current working directory
cwd = os.getcwd()
# Build the final file path
file_path = os.path.join(cwd, artifact_location)

## Retrieving Future Events

This function takes a single argument, `day_limit`, the number of days ahead to search for events. It sends a GET request to the CTFtime `/api/v1/events/` endpoint with `start` and `finish` UNIX timestamps and a `limit` of 500 events, parses the JSON response and returns it as a list of dictionaries, one per event.
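To make the time window concrete, the sketch below builds the same request path for a fixed date. It uses `urllib.parse.urlencode()` in place of the manual `'&'.join(...)`; both give the same result here because every value is an integer, but `urlencode()` would also escape special characters. The helper name `build_events_path` and its injectable `now` argument are illustrative additions, not part of the notebook.

```python
import datetime
from urllib.parse import urlencode


def build_events_path(day_limit, now=None, limit=500):
    """Build the request path for the CTFtime events endpoint.

    `now` can be injected for testing; it defaults to the current UTC time.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # Window start and end as UNIX timestamps (seconds)
    start = int(now.timestamp())
    finish = int((now + datetime.timedelta(days=day_limit)).timestamp())
    # urlencode keeps the dict order and escapes values if needed
    query = urlencode({"limit": limit, "start": start, "finish": finish})
    return f"/api/v1/events/?{query}"


fixed_now = datetime.datetime(2024, 7, 1, tzinfo=datetime.timezone.utc)
print(build_events_path(30, now=fixed_now))
# /api/v1/events/?limit=500&start=1719792000&finish=1722384000
```

For 1 July 2024 at 00:00 UTC and a 30-day limit, `start` and `finish` are 30 × 86,400 = 2,592,000 seconds apart.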

In [3]:
def get_future_events(day_limit):
    # Get the current date and time in UTC
    now = datetime.datetime.now(datetime.timezone.utc)

    # Calculate the start and finish UNIX timestamps (now is already timezone-aware)
    start_timestamp = int(now.timestamp())
    finish_timestamp = int((now + datetime.timedelta(days=day_limit)).timestamp())

    # Set the API endpoint and parameters
    url = '/api/v1/events/'
    params = {
        'limit': 500,
        'start': start_timestamp,
        'finish': finish_timestamp
    }

    # Build the query string
    query_string = '?' + '&'.join([f'{key}={value}' for key, value in params.items()])

    # Create an HTTPS connection to the CTFtime API
    conn = http.client.HTTPSConnection('ctftime.org')
    try:
        # Send a GET request to the API endpoint
        conn.request('GET', url + query_string)

        # Get the response from the API and fail loudly on HTTP errors
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"CTFtime API returned HTTP {response.status}")

        # Read the response body
        data = response.read().decode('utf-8')
    finally:
        # Always close the connection
        conn.close()

    # Parse the JSON data
    return json.loads(data)

events = get_future_events(day_limit=days)
print(f"Retrieved {len(events)} CTFs for the next {days} days")



Retrieved 12 CTFs for the next 30 days


## Cleaning The Response

This function takes three arguments:

- `events`: The list of event dictionaries returned by `get_future_events()`.
- `restriction_list`: The list of allowed restrictions for the events.
- `useless_columns`: The list of columns to drop from the DataFrame.

The function first converts the list of events to a DataFrame. It extracts the name of the first organizer from the `organizers` list and stores it in a new `organizer_name` column, then computes the total duration of each event in hours (`hours + 24 * days`) and stores it in a new `duration_hours` column. It removes all onsite events and keeps only the events whose `restrictions` value is in `restriction_list`. It then adds one boolean column per restriction in `restriction_list`, named after the restriction in lowercase, indicating whether the event has that restriction. Finally, it drops the columns listed in `useless_columns`, renames the `id` column to `ctftime_id`, and returns the cleaned DataFrame.
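The cleaning steps can be checked offline on a couple of hand-made events. The two events below are invented and only carry the fields the steps read; the logic mirrors `clean_events()`, with the onsite and restriction filters combined into a single boolean mask.

```python
import pandas as pd

# Two made-up events containing only the fields used by the cleaning steps
sample_events = [
    {"id": 1, "title": "Online CTF", "onsite": False, "restrictions": "Open",
     "organizers": [{"id": 10, "name": "Team A"}], "duration": {"hours": 12, "days": 1}},
    {"id": 2, "title": "Onsite Finals", "onsite": True, "restrictions": "Open",
     "organizers": [{"id": 20, "name": "Team B"}], "duration": {"hours": 0, "days": 2}},
]
restriction_list = ["Open", "Individual"]

df = pd.DataFrame(sample_events)
# Name of the first organizer, or None if the list is empty
df["organizer_name"] = df["organizers"].apply(lambda orgs: orgs[0]["name"] if orgs else None)
# Total duration in hours = hours + 24 * days
df["duration_hours"] = df["duration"].apply(lambda d: d["hours"] + d["days"] * 24)
# Keep online events whose restriction type is in the allow-list
df = df[~df["onsite"] & df["restrictions"].isin(restriction_list)].copy()
# One boolean column per restriction type
for restriction in restriction_list:
    df[restriction.lower()] = df["restrictions"] == restriction

print(df[["id", "organizer_name", "duration_hours", "open", "individual"]])
```

Only the online event survives, with a duration of 12 + 24 = 36 hours.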

In [4]:
def clean_events(events, restriction_list, useless_columns):
    # Convert the list of events to a DataFrame
    events_df = pd.DataFrame(events)

    # Extract the first organizer's name from the organizers list (None if the list is empty)
    events_df['organizer_name'] = events_df['organizers'].apply(lambda x: x[0]['name'] if x else None)

    # Compute the total duration of the event in hours and store it in a new column
    events_df['duration_hours'] = events_df['duration'].apply(lambda x: x['hours'] + x['days']*24)

    # Remove all onsite events
    events_df = events_df[~events_df['onsite'].astype(bool)]

    # Keep only CTFs whose restrictions are in restriction_list
    # (.copy() avoids pandas' SettingWithCopyWarning when adding columns below)
    events_df = events_df[events_df['restrictions'].isin(restriction_list)].copy()

    for restriction in restriction_list:
        # Add a boolean column named after the restriction: True if the event has this restriction
        events_df[restriction.lower()] = events_df['restrictions'] == restriction

    # Drop useless columns
    events_df = events_df.drop(columns=useless_columns)

    # Rename column id to ctftime_id
    events_df = events_df.rename(columns={'id': 'ctftime_id'})

    return events_df

events_df = clean_events(events=events, restriction_list=restriction_list, useless_columns=useless_columns)    
print(f"After data cleaning we found {events_df.shape[0]} interesting CTFs for you.")

After data cleaning we found 12 interesting CTFs for you.


In [5]:
# Print all interesting events
events_df

| | ctftime_id | title | start | finish | description | url | organizer_name | duration_hours | open | individual | high-school |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2259 | Junior.Crypt.2024 CTF | 2024-07-03T15:00:00+00:00 | 2024-07-05T15:00:00+00:00 | Junior.Crypt.2024 CTF is an open competition i... | http://ctf-spcs.mf.grsu.by/ | Beavers0 | 48 | True | False | False |
| 1 | 2284 | DownUnderCTF 2024 | 2024-07-05T09:30:00+00:00 | 2024-07-07T09:30:00+00:00 | DownUnderCTF is the largest online Australian-... | https://play.duc.tf/ | DownUnderCTF | 48 | True | False | False |
| 2 | 2301 | Interlogica CTF2024 - Wastelands | 2024-07-05T12:37:00+00:00 | 2024-07-07T22:59:59+00:00 | Welcome to wastelands, where the faint echoes ... | https://ctf.interlogica.ninja/ | Interlogica | 58 | True | False | False |
| 3 | 2345 | HITCON CTF 2024 Quals | 2024-07-12T14:00:00+00:00 | 2024-07-14T14:00:00+00:00 | TBA | https://ctf2024.hitcon.org/ | HITCON | 48 | True | False | False |
| 4 | 2416 | OSCTF | 2024-07-13T00:30:00+00:00 | 2024-07-13T16:30:00+00:00 | This exciting Capture the Flag competition wil... | https://ctf.os.ftp.sh/ | OSCTF | 16 | True | False | False |
| 5 | 2414 | CatTheQuest | 2024-07-15T00:00:00+00:00 | 2024-07-21T00:00:00+00:00 | Prepare yourself for CatTheQuest 2024: Registr... | https://catthequest.com/ | CatTheFlag | 144 | True | False | False |
| 6 | 2396 | ImaginaryCTF 2024 | 2024-07-19T19:00:00+00:00 | 2024-07-21T19:00:00+00:00 | ImaginaryCTF 2024 is a cybersecurity CTF compe... | https://2024.imaginaryctf.org/ | [sqrt (-1) + 1] | 48 | True | False | False |
| 7 | 2293 | MOCA CTF - Quals | 2024-07-20T09:00:00+00:00 | 2024-07-21T09:00:00+00:00 | Official CTF competition of the Metro Olografi... | https://play.pwnx.io/#/event/fb765f39-bc6f-46b... | Metro Olografix | 24 | True | False | False |
| 8 | 2412 | ENOWARS 8 | 2024-07-20T12:00:00+00:00 | 2024-07-20T21:00:00+00:00 | The 8th installation of the epic ENOWARS trilo... | https://8.enowars.com/ | ENOFLAG | 9 | True | False | False |
| 9 | 2381 | pbctf 2024 | 2024-07-20T14:00:00+00:00 | 2024-07-21T14:00:00+00:00 | Fourth edition of pbctf!\r\n\r\nCTF lasts for ... | https://ctf.perfect.blue/ | perfect blue | 24 | True | False | False |


## Find New Events

This function takes two arguments:

- `events_df`: The DataFrame containing the event data.
- `file_path`: The path of the Excel file containing the previously saved events.

The goal is to determine which of the retrieved events are not yet stored in the Excel file. This function can be used to alert the user about upcoming CTFs they don't already know about.
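The same "rows not in the other table" selection (an anti-join) can also be written as a left `merge()` with `indicator=True`, keeping only the rows marked `left_only`. This is an alternative sketch, not the notebook's method (which uses `isin`); the helper name `find_new_rows` is hypothetical, and the sample IDs and titles are taken from the output above.

```python
import pandas as pd


def find_new_rows(retrieved, existing, key="ctftime_id"):
    """Return the rows of `retrieved` whose `key` does not appear in `existing`."""
    # indicator=True adds a "_merge" column: "both" or "left_only"
    merged = retrieved.merge(existing[[key]].drop_duplicates(), on=key,
                             how="left", indicator=True)
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


retrieved = pd.DataFrame({
    "ctftime_id": [2259, 2284, 2301],
    "title": ["Junior.Crypt.2024 CTF", "DownUnderCTF 2024",
              "Interlogica CTF2024 - Wastelands"],
})
existing = pd.DataFrame({"ctftime_id": [2259, 2284]})
result = find_new_rows(retrieved, existing)
print(result)
```

The `isin` filter is shorter; the merge form becomes handy if you later want to compare other columns between the stored and retrieved versions of an event.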


In [6]:
def find_new_ctfs(events_df, file_path):
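    # On the first run there is no Excel file yet, so every retrieved event is new
    if not os.path.exists(file_path):
        return events_df
    existing_events = pd.read_excel(file_path)
    # Keep the events whose ctftime_id is not already stored in the Excel file
    new_events = events_df[~events_df['ctftime_id'].isin(existing_events['ctftime_id'])]
    return new_events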

new_events = find_new_ctfs(events_df=events_df, file_path=file_path)
if new_events.shape[0] == 0:
    print("You have already found all the interesting CTFs for the given time period.")
new_events

You have already found all the interesting CTFs for the given time period.


| | ctftime_id | title | start | finish | description | url | organizer_name | duration_hours | open | individual | high-school |
|---|---|---|---|---|---|---|---|---|---|---|---|


## Store The Future Events 

This function takes two arguments:

- `events_df`: The cleaned and processed DataFrame containing the event data.
- `file_path`: The file path of the Excel file to save the data to.

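The function saves the DataFrame to the specified Excel file with pandas' `DataFrame.to_excel()` method, overwriting any existing file. Writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.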
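Because `save_2_xl()` overwrites the file on each run, events that no longer fall inside the requested time window disappear from the spreadsheet. If you would rather keep a history, one option, sketched here under the assumption that the existing file can be loaded with `pd.read_excel()`, is to merge the old and new rows before saving. `merge_with_existing` is a hypothetical helper; when an event appears in both tables, the freshly retrieved row wins.

```python
import pandas as pd


def merge_with_existing(existing, fresh, key="ctftime_id"):
    """Combine previously saved events with freshly retrieved ones.

    Duplicates on `key` keep the last occurrence, i.e. the fresh row,
    so updated dates or descriptions replace the stored copy.
    """
    combined = pd.concat([existing, fresh], ignore_index=True)
    combined = combined.drop_duplicates(subset=key, keep="last")
    return combined.sort_values(key).reset_index(drop=True)


existing = pd.DataFrame({"ctftime_id": [1, 2], "title": ["old A", "old B"]})
fresh = pd.DataFrame({"ctftime_id": [2, 3], "title": ["new B", "new C"]})
print(merge_with_existing(existing, fresh))
```

Once the file exists, the call would look like `save_2_xl(merge_with_existing(pd.read_excel(file_path), events_df), file_path)`.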

In [7]:
def save_2_xl(events_df, file_path):
    # Make sure the target folder exists, then save the DataFrame to the Excel file
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    events_df.to_excel(file_path, index=False)

save_2_xl(events_df=events_df, file_path=file_path)
print(f"Future events stored in Excel file {file_path}")

Future events stored in Excel file c:\Users\Tristan Querton\dev\GitHub\CTFTime-Integration\artifacts\events.xlsx


# Obtain Team Ranking

We also want to retrieve a team's information based on its team ID. To make the data easier to manipulate, we create a `Team` class that holds the team's information and provides a method to retrieve the team's rating and ranking for a given year.

## Team Object

This class represents a team and contains the following attributes:
- `team_id`: The ID of the team.
- `primary_alias`: The primary alias of the team.
- `name`: The name of the team.
- `country`: The country of the team.
- `rating`: The team's rating through the years, as a dictionary keyed by year.
- `logo`: The logo of the team.
- `academic`: A boolean indicating whether the team is an academic team.
- `aliases`: A list of aliases of the team.
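Since `Team` is essentially a record of fields, the same structure could be written more compactly with the standard-library `dataclasses` module, which generates `__init__`, `__repr__` and `__eq__` automatically. This is a sketch: `TeamRecord` is a hypothetical name, and apart from the team ID and the 2024 rating (taken from this notebook's output), the field values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class TeamRecord:
    """Hypothetical dataclass equivalent of the Team class."""
    team_id: int
    primary_alias: str
    name: str
    country: str
    logo: str
    academic: bool
    # Mutable defaults must go through default_factory
    rating: dict = field(default_factory=dict)
    aliases: list = field(default_factory=list)

    def get_team_rank_given_year(self, year):
        # JSON object keys are strings, so look the year up as a string
        return self.rating.get(str(year))


team_record = TeamRecord(
    team_id=216659, primary_alias="example-team", name="example-team",
    country="", logo="", academic=False,
    rating={"2024": {"organizer_points": 0,
                     "rating_points": 54.0889279963,
                     "rating_place": 527}},
)
print(team_record.get_team_rank_given_year(2024)["rating_place"])  # 527
```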


In [8]:
class Team:
    """
    A class to represent a CTF Time team.
    
    Attributes:     
        - `team_id`: The ID of the team.
        - `primary_alias`: The primary alias of the team.
        - `name`: The name of the team.
        - `country`: The country of the team.
        - `rating`: The team's rating through the years, as a dictionary keyed by year.
        - `logo`: The logo of the team.
        - `academic`: A boolean indicating whether the team is an academic team.
        - `aliases`: A list of aliases of the team.
    
    Methods: 
        - `display()`: Print the team's name, ID and rating.
        - `get_team_rank_given_year(year)`: Get the rank of the team for a given year.
    """
        
    def __init__(self, team_id, primary_alias, name, rating, country, logo, aliases, academic):
        self.team_id = team_id
        self.primary_alias = primary_alias
        self.name = name
        self.rating = rating
        self.country = country
        self.logo = logo
        self.aliases = aliases
        self.academic = academic
        
    def display(self):
        print(f"Team {self.name} with id {self.team_id} and rating {self.rating}")
        
    def get_team_rank_given_year(self, year):
        year = str(year)
        year_ratings = self.rating.get(year, None) 
        return year_ratings

## Get Team Info

This function takes a single argument, `team_id`, the ID of the team to retrieve information for. It sends a GET request to the CTFtime `/api/v1/teams/<team_id>/` endpoint and returns the raw JSON response body as a string; parsing it is left to `parse_team_info()`.
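`get_future_events()`, `get_team_info()` and `get_top_teams()` repeat the same connection code. One possible refactor, sketched under the assumption that every endpoint used here returns JSON, is a single hypothetical `fetch_json()` helper that builds the query string, checks the HTTP status and always closes the connection. The `connection_factory` parameter exists only so the helper can be tested without network access.

```python
import http.client
import json
from urllib.parse import urlencode


def fetch_json(path, params=None, host="ctftime.org",
               connection_factory=http.client.HTTPSConnection):
    """GET `path` on `host` and return the decoded JSON body."""
    if params:
        path = f"{path}?{urlencode(params)}"
    conn = connection_factory(host)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read().decode("utf-8")
        if response.status != 200:
            raise RuntimeError(f"GET {path} failed with HTTP {response.status}")
        return json.loads(body)
    finally:
        # Release the socket even if the request or the status check failed
        conn.close()
```

With it, the team lookup becomes `fetch_json(f"/api/v1/teams/{team_id}/")` and the top teams `fetch_json("/api/v1/top/", {"limit": 10})`. Note that it returns the parsed dictionary, whereas `get_team_info()` returns the raw string.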

In [9]:
def get_team_info(team_id):
    """
    Get the information of a team given its ID.
    
    Parameters:
        - `team_id`: The ID of the team.
        
    Returns:
        - A JSON string containing the information of the team.
    """
    
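    # Set the API endpoint (the f-string also accepts integer IDs)
    url = f'/api/v1/teams/{team_id}/'

    # Create an HTTPS connection to the CTFtime API
    conn = http.client.HTTPSConnection('ctftime.org')
    try:
        # Send a GET request to the API endpoint
        conn.request('GET', url)

        # Get the response from the API and fail loudly on HTTP errors
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"CTFtime API returned HTTP {response.status}")

        # Read the response body
        data = response.read().decode('utf-8')
    finally:
        # Always close the connection
        conn.close()

    return data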

## Parsing Team Info

This function takes a single argument, `team`, a string containing the JSON team data returned by `get_team_info()`. It parses the JSON into a dictionary, extracts the relevant fields, and returns a `Team` object built from them.
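The snippet below illustrates the parsing step on a payload containing only the keys `parse_team_info()` reads. The ID and the 2024 rating come from this notebook's output; the name, alias, country and logo are placeholders, not real CTFtime data. It also shows why `get_team_rank_given_year()` converts the year with `str()`: JSON object keys are always strings.

```python
import json

# Illustrative payload: placeholder name/alias/country/logo values
sample = """{
  "id": 216659,
  "primary_alias": "example-team",
  "name": "example-team",
  "country": "",
  "logo": "",
  "academic": false,
  "aliases": [],
  "rating": {"2024": {"organizer_points": 0,
                       "rating_points": 54.0889279963,
                       "rating_place": 527}}
}"""

team_dict = json.loads(sample)
print(list(team_dict["rating"]))                     # ['2024']
print(team_dict["rating"][str(2024)]["rating_place"])  # 527
print(team_dict["academic"])                         # False
```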

In [10]:
def parse_team_info(team):
    """
    Parse the JSON string containing the information of a team into a Team object.
    
    Parameters:
        - `team`: A JSON string containing the information of the team.
    
    Returns:
        - A Team object.
    """
    
    # Creates a dictionary from the JSON string
    team_dict = json.loads(team)
    
    # Create a Team object
    team = Team(team_id = team_dict['id'],
                primary_alias = team_dict['primary_alias'],
                name = team_dict['name'],
                rating = team_dict['rating'],
                country = team_dict['country'],
                logo = team_dict['logo'],
                aliases = team_dict['aliases'],
                academic = team_dict['academic'])
    return team

## Get Team Object

This function takes a single argument, `team_id`, the ID of the team to retrieve information for (by default, the `team_id` defined in the Variables cell). It calls `get_team_info()` to retrieve the raw team data, then `parse_team_info()` to build a `Team` object, which it returns.
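Each call to `get_team_object()` sends a new request to CTFtime. If the same team is looked up repeatedly, one option is to memoize the raw JSON with `functools.lru_cache`. In this sketch, `fake_get_team_info()` is a hypothetical stand-in for `get_team_info()` that counts calls instead of hitting the API, and `get_team_info_cached()` is likewise a hypothetical name.

```python
from functools import lru_cache

# Hypothetical stand-in for get_team_info(): counts calls instead of using the network
calls = {"count": 0}


def fake_get_team_info(team_id):
    calls["count"] += 1
    return '{"id": %s, "name": "example-team"}' % team_id


@lru_cache(maxsize=128)
def get_team_info_cached(team_id):
    """Memoize the raw JSON per team ID so repeated lookups skip the request."""
    return fake_get_team_info(str(team_id))


get_team_info_cached("216659")
get_team_info_cached("216659")
print(calls["count"])  # 1: the second call was served from the cache
```

Caching the immutable JSON string rather than the `Team` object means every caller still gets its own fresh `Team` from `parse_team_info()`.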

In [11]:
def get_team_object(team_id=team_id):
    """
    Get the Team object given the ID of the team. By default, it uses the ID defined in the variables.
    
    Parameters:
        - `team_id`: The ID of the team.
        
    Returns:
        - A Team object.
    """
    team = get_team_info(team_id)
    return parse_team_info(team)

In [12]:
team = get_team_object()
team.get_team_rank_given_year(2024)

{'organizer_points': 0, 'rating_points': 54.0889279963, 'rating_place': 527}

# Get Top Teams Ranking

The `get_top_teams()` function takes no argument. It sends a GET request to the CTFtime `/api/v1/top/` endpoint with `limit=10`, parses the JSON response and returns it as a dictionary keyed by year, whose values are lists of the top teams (`team_name`, `points`, `team_id`). The `parse_top_teams()` function then selects the entry for the current year and returns it as a DataFrame.
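The year selection done by `parse_top_teams()` can be made explicit with an optional `year` argument. This sketch (the name `top_teams_for_year` is hypothetical) assumes the response shape used by the notebook: a dictionary keyed by year strings whose values are lists of `team_name`/`points`/`team_id` records. The sample rows are copied from the output below and wrapped under a `"2024"` key as an assumption.

```python
import datetime

import pandas as pd


def top_teams_for_year(top_teams, year=None):
    """Return the top teams for `year` (default: current UTC year) as a DataFrame."""
    if year is None:
        year = datetime.datetime.now(datetime.timezone.utc).year
    rows = top_teams.get(str(year), [])
    # Explicit columns keep the expected layout even when the year is missing
    return pd.DataFrame(rows, columns=["team_name", "points", "team_id"])


sample = {"2024": [
    {"team_name": "kalmarunionen", "points": 1381.309456, "team_id": 114856},
    {"team_name": "thehackerscrew", "points": 875.720068, "team_id": 85618},
]}
print(top_teams_for_year(sample, 2024))
print(top_teams_for_year(sample, 2023).empty)  # True
```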

In [15]:
def get_top_teams():
    """
    Get the information of the top teams.
    
    Returns:
        - A dictionary, keyed by year, containing the top teams for each year.
    """

    # Set the API endpoint and parameters
    url = '/api/v1/top/'
    params = {
        'limit': 10
    }

    # Build the query string
    query_string = '?' + '&'.join([f'{key}={value}' for key, value in params.items()])

    # Create an HTTPS connection to the CTFtime API
    conn = http.client.HTTPSConnection('ctftime.org')
    try:
        # Send a GET request to the API endpoint
        conn.request('GET', url + query_string)

        # Get the response from the API and fail loudly on HTTP errors
        response = conn.getresponse()
        if response.status != 200:
            raise RuntimeError(f"CTFtime API returned HTTP {response.status}")

        # Read the response body
        data = response.read().decode('utf-8')
    finally:
        # Always close the connection
        conn.close()

    # Parse the JSON data
    return json.loads(data)

def parse_top_teams(top_teams):
    """
    Select the top teams of the current year and convert them into a DataFrame.
    
    Parameters:
        - `top_teams`: The parsed JSON response returned by get_top_teams(), keyed by year.
    
    Returns:
        - A DataFrame containing the top teams of the current year (empty if the year is missing).
    """
    
    
    # Obtain the current year
    now = datetime.datetime.now()
    current_year = now.year
    
    # Select the list of teams for the current year (JSON keys are strings)
    content = top_teams.get(str(current_year), None)
    
    # Transform to df and return it
    return pd.DataFrame(content)
    
    

In [16]:
top_team_data = get_top_teams()
top_teams_df = parse_top_teams(top_team_data)
top_teams_df

| | team_name | points | team_id |
|---|---|---|---|
| 0 | kalmarunionen | 1381.309456 | 114856 |
| 1 | thehackerscrew | 875.720068 | 85618 |
| 2 | The Flat Network Society | 802.124432 | 87434 |
| 3 | Project Sekai | 668.790971 | 169557 |
| 4 | r3kapig | 662.89007 | 58979 |
| 5 | Blue Water | 603.228393 | 205897 |
| 6 | organizers | 567.499872 | 42934 |
| 7 | justCatTheFish | 554.282199 | 33893 |
| 8 | bi0s | 533.210099 | 662 |
| 9 | Never Stop Exploiting | 498.371509 | 13575 |
