# Introduction

As South Africa's 2024 general elections approach, the country faces decisions that will shape its political landscape for years to come. This project examines the dynamics, trends, and factors influencing these elections, with the aim of providing insight into voter behavior, political strategies, and the broader socio-political context. The notebook begins by collecting the results of the 1994 to 2019 general elections from Wikipedia.

In [80]:
import os
import requests
from bs4 import BeautifulSoup

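# Define a function to parse HTML content and extract data
def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.title.text if soup.title else ''

    # The results are in the first sortable wikitable, except on the 2014
    # page, where they are in the second one
    tables = soup.find_all('table', class_='wikitable sortable')
    table_index = 1 if '2014' in title else 0
    if len(tables) <= table_index:
        raise ValueError(f'Expected results table not found in page: {title!r}')
    table = tables[table_index]

    rows = table.find_all('tr')

    # The column headings are in the first row on the 2009 page and in the
    # second row on the other pages
    headings_row = rows[0] if '2009' in title else rows[1]
    headings = [th.text.strip() for th in headings_row.find_all('th')]

    # The same slice is applied to every page: skip the first two rows and
    # the last seven rows (the earlier per-year slices were always overwritten)
    data_rows = rows[2:-7]
    data = []
    for row in data_rows:
        columns = row.find_all('td')
        row_data = [column.text.strip() for column in columns]
        data.append(row_data)
    return headings, data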

# Define a function to write data to a CSV file with UTF-8 encoding
def write_to_csv(headings, data, filename):
    import csv
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(headings)
        writer.writerows(data)
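
Scraped cells can contain commas (for example, vote totals written with thousands separators) and non-ASCII characters, so it may be worth confirming that the CSV settings used above round-trip them unchanged. The following is a minimal sketch and not part of the original notebook: `roundtrip_csv` is a hypothetical helper that mirrors the `open` and `csv.writer` settings of `write_to_csv`, and the sample row is a made-up placeholder, not real election data.

```python
import csv
import os
import tempfile

def roundtrip_csv(headings, data):
    """Write rows with the same settings as write_to_csv, then read them back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, 'roundtrip.csv')
        with open(path, 'w', newline='', encoding='utf-8') as csvfile:
            writer = csv.writer(csvfile)
            writer.writerow(headings)
            writer.writerows(data)
        # Read the file back before the temporary directory is removed
        with open(path, newline='', encoding='utf-8') as csvfile:
            rows = list(csv.reader(csvfile))
    return rows[0], rows[1:]

# Placeholder values only: a non-ASCII name and a number with commas
sample_headings = ['Party', 'Votes', '%']
sample_data = [['Exämple Party', '1,234,567', '12.34']]

read_headings, read_data = roundtrip_csv(sample_headings, sample_data)
print(read_headings == sample_headings and read_data == sample_data)
```

Because `csv.writer` quotes any field that contains the delimiter, a value such as `1,234,567` stays in one column when the file is read back with `csv.reader`.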


In [81]:
# Define the list of years
years = [1994, 1999, 2004, 2009, 2014, 2019]

# Create the 'data/raw' directory if it doesn't exist
os.makedirs('data/raw', exist_ok=True)

# Loop through each year
for year in years:
    # Construct the URL for the Wikipedia page for the current year
    url = f'https://en.wikipedia.org/wiki/{year}_South_African_general_election'
    
    # Fetch HTML content from the URL and fail early on HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    html = response.content
    
    # Parse HTML content and extract data
    headings, data = parse_html(html)
    
    # Specify the filename for the CSV file
    csv_filename = f'data/raw/election_results_{year}.csv'
    
    # Write data to CSV file
    write_to_csv(headings, data, csv_filename)
    print(f"Data for the year {year} has been written to {csv_filename}")


Data for the year 1994 has been written to data/raw/election_results_1994.csv
Data for the year 1999 has been written to data/raw/election_results_1999.csv
Data for the year 2004 has been written to data/raw/election_results_2004.csv
Data for the year 2009 has been written to data/raw/election_results_2009.csv
Data for the year 2014 has been written to data/raw/election_results_2014.csv
Data for the year 2019 has been written to data/raw/election_results_2019.csv
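
The messages above only confirm that the files were written, not what they contain. One way to check the contents is to read each file back and compare every row's width against the header. This is a minimal sketch under the assumption that the files sit in `data/raw` as above; `summarize_csv` is a hypothetical helper, not part of the original notebook. Rows whose cell count differs from the header may indicate sub-header, footer, or `colspan` rows that the fixed `rows[2:-7]` slice let through.

```python
import csv
import os

def summarize_csv(path):
    """Return (header, row_count, ragged_row_numbers) for one CSV file."""
    with open(path, newline='', encoding='utf-8') as csvfile:
        rows = list(csv.reader(csvfile))
    if not rows:
        return [], 0, []
    header, body = rows[0], rows[1:]
    # 1-based positions of data rows whose width differs from the header
    ragged = [i for i, row in enumerate(body, start=1) if len(row) != len(header)]
    return header, len(body), ragged

for year in [1994, 1999, 2004, 2009, 2014, 2019]:
    path = f'data/raw/election_results_{year}.csv'
    if not os.path.exists(path):
        print(f'{year}: file not found, skipping')
        continue
    header, n_rows, ragged = summarize_csv(path)
    print(f'{year}: {len(header)} columns, {n_rows} rows, {len(ragged)} ragged')
```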
