# **Predictive Analysis of BNK48 & CGM48's 16th Single General Election**

![GE4 General Election Banner](GE4banner.jpeg)

*Image Source: [BNK48 Official YouTube Video](https://www.youtube.com/watch?v=4EGrHyXIvf0)*

# Table of Contents
1. [Introduction to the 48 Group](#1)
2. [Understanding the General Election](#2)
3. [Data Collection Methodology](#3)
4. [Dataset Overview](#4)
5. [Exploratory Data Analysis](#5)
6. [Predictive Model Building](#6)
   - [Regression Analysis](#6-1)
   - [XGBoost](#6-2)
   - [LightGBM](#6-3)
   - [Hyperparameter Tuning (Optuna)](#6-4)
7. [Feature Importance Analysis](#7)
8. [Conclusion and Insights](#8)

<a id='1'></a>

## 1. Introduction to the 48 Group

Welcome to the predictive analysis project for the 16th Single General Election of BNK48 and CGM48, the Thai sister groups of the international 48 Group. These idol groups have revolutionized pop culture in Asia, capturing the hearts of fans through music, performances, and interactive fan events.

The 48 Group, which originated in Japan with AKB48, is known for its concept of "idols you can meet." The franchise has localized sister groups in several countries, each with its own teams of performers. BNK48 and CGM48 are based in Thailand and have built a significant following through their performances and public appearances.

<a id='2'></a>

## 2. Understanding the General Election

### General Election

The General Election is a hallmark event for 48 Group fans, in which supporters vote for their favorite members. The outcome determines the lineup for the next single and often influences the group's direction. As in the 3rd election, votes in this 4th election (held for the 16th single) were cast through a blockchain-based token voting system within the iAM48 application. Preliminary results are announced to fans at two separate events before the final results, which adds to the speculation about the final rankings.

### Key Terms

- **Senbatsu**: The selected members who will perform the A-side of a single.
- **Coupling Song**: Additional tracks featured on a single, often performed by non-Senbatsu members.
- **Oshi**: A fan's favorite member, akin to a "most supported" idol.
- **Kami**: Derived from "Kamisama" meaning "God," referring to top-ranked idols.
- **Team**: Subgroups within BNK48 and CGM48, each with distinct identities and songs.
- **Center**: The lead position in a group's formation, often front and center during performances.

### Rank
- **Kami7**: The elite top seven idols as determined by fan votes in the general election. These members are considered the most popular and influential within the group.
- **Senbatsu**: The top 16 members selected to perform on the main track of a single, often considered the face of the group during promotions.
- **Under Girls**: Members who rank 17th to 32nd in the general election. They typically perform on the B-side of a single and are featured in secondary promotions.
- **Next Girls**: Members who rank 33rd to 48th in the general election. They are featured in additional songs and are recognized for their potential and growing popularity.
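
The rank bands above can be expressed as a small lookup. This is a minimal sketch, not code from the original notebook: the function names `election_tier` and `is_kami7` are hypothetical, and the `"Unranked"` label for ranks below 48th is taken from the `GE4_Position` column shown in the dataset preview. Kami7 is treated as a flag on top of Senbatsu, since the top seven are also part of the top 16.

```python
def election_tier(rank: int) -> str:
    """Map a final election rank to its tier name (hypothetical helper)."""
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    if rank <= 16:
        return "Senbatsu"      # ranks 1-16
    if rank <= 32:
        return "Under Girls"   # ranks 17-32
    if rank <= 48:
        return "Next Girls"    # ranks 33-48
    return "Unranked"          # label assumed from the GE4_Position column


def is_kami7(rank: int) -> bool:
    """Return True for the top seven ranks, which are a subset of Senbatsu."""
    return 1 <= rank <= 7


if __name__ == "__main__":
    for r in (1, 7, 8, 16, 17, 32, 33, 48, 60):
        print(r, election_tier(r), is_kami7(r))
```

Keeping Kami7 as a separate boolean avoids forcing a member into only one of two overlapping categories.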

<a id='3'></a>

## 3. Data Collection Methodology

### Data Sources
The data for the 3rd and 4th General Elections of BNK48 & CGM48 was gathered from several online sources on 17 December 2023. The primary sources were:

- **Twitter Account [@Stats48TH](https://twitter.com/Stats48TH)**: An unofficial but valuable source providing comprehensive statistics and information on the members of BNK48 and CGM48.
- **Wikipedia Pages**: Detailed historical data and election results were extracted from Wikipedia for the:
  - [3rd General Election for the 12th Single](https://th.wikipedia.org/wiki/การเลือกตั้งทั่วไปเซ็มบัตสึบีเอ็นเคโฟร์ตีเอต_ประจำซิงเกิลที่_12)
  - [4th General Election for the 16th Single](https://th.wikipedia.org/wiki/การเลือกตั้งทั่วไปเซ็มบัตสึบีเอ็นเคโฟร์ตีเอต_ประจำซิงเกิลที่_16)
  
### Data Extraction
Data extraction involved several steps:

- **Reviewing Twitter Data**: Data summaries from the Twitter account were reviewed to understand the election results and member statistics.
- **Wikipedia Research**: The Wikipedia pages for the 3rd and 4th General Elections were used to validate and enrich the data obtained from Twitter.
- **Database Analysis**: The `Database members BNK48 & CGM48 (10_12_2023 21.45).xlsx` file, provided by @Stats48TH, offered a granular view of the election outcomes and member profiles.
- **Power Query (Get Data)**: Power Query in Microsoft Excel was used to extract, shape, and transform the data, and to integrate the various data sources.

### Data Processing and Model Creation
The raw data was then processed and combined into a single data model:

- **OCR (Optical Character Recognition)**: OCR technology was utilized to digitize images and scanned documents, rendering them into an analyzable format.
- **Data Scraping**: Structured data was scraped from the web, transforming unstructured online information into a usable dataset.
- **Data Cleaning**: The dataset was cleaned to correct inaccuracies and standardize formats.
- **Data Model Creation in PowerPivot**: The data from the various sources was combined using PowerPivot in Excel to create a data model that captures the relationships between the tables. The ER diagram below shows this model: the `GE4rank` table serves as the base table, and the other tables are left-joined to it on the `Name_Band` attribute.

![ER_Diagram](ER_diagram.png)
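
The left-join pattern described above was built in PowerPivot, but the same idea can be sketched in pandas. The snippet below is an illustration under stated assumptions, not the author's workflow: the `iam48` table name and its contents are hypothetical (only the column names and the few values shown come from the dataset preview in this document), and the `Name_Band` key is assumed to be `Name` and `Band` joined by an underscore, as the preview suggests.

```python
import pandas as pd

# Base table, standing in for GE4rank (values copied from the dataset preview).
ge4rank = pd.DataFrame({
    "Name": ["Pim", "Paeyah", "Kaning"],
    "Band": ["CGM48", "BNK48", "CGM48"],
    "GE4_Rank": [1, 2, 3],
})
# Assumed key construction: Name + "_" + Band, e.g. "Pim_CGM48".
ge4rank["Name_Band"] = ge4rank["Name"] + "_" + ge4rank["Band"]

# Hypothetical secondary table that is deliberately missing one member.
iam48 = pd.DataFrame({
    "Name_Band": ["Pim_CGM48", "Kaning_CGM48"],
    "iAM48_Kami": [918, 3101],
})

# A left join keeps every row of the base table; unmatched rows get NaN.
merged = ge4rank.merge(
    iam48,
    on="Name_Band",
    how="left",
    validate="one_to_one",  # raise if the key is duplicated on either side
    indicator=True,         # adds a "_merge" column showing the match status
)
print(merged[["Name_Band", "GE4_Rank", "iAM48_Kami", "_merge"]])
```

Using `validate` and `indicator` makes duplicated keys and unmatched members visible right after the join, which is useful when the key is built from free-text names.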

### Final Dataset
These steps produced the final dataset, which combines the election results with member data. It is the basis for the exploratory data analysis and predictive modeling in this project.


<a id='4'></a>

## 4. Dataset Overview

This dataset contains statistics from the 4th General Election, including token votes recorded via blockchain technology, member rankings, and other metrics that could influence the election outcome.

In [2]:
import pandas as pd

In [4]:
df = pd.read_excel('BNK48_CGM48_df.xlsx')

In [5]:
df

| Unnamed: 0 | GE4_Rank | Name | Name_Band | GE4_token | GE4_Transaction | GE4_Wallet | GE4_Prelim1 | GE4_Token_Prelim1 | GE4_Prelim2 | Band | ... | Request_Hour | Game_Caster | Theater_Stage | iAM48_Kami | iAM48_Oshi | iAM48_Cookies | iAM48_Likes | Setbatsu_Total | Center_Main_Total | GE4_Position |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Pim | Pim_CGM48 | 133891.5231 | 1023 | 570 | 4.0 | 5590.69 | 1.0 | CGM48 | ... | 2 | 6 | 0 | 918 | 12670 | 17831388 | 94093 | 9 | 1 | Senbatsu |
| 1 | 2 | Paeyah | Paeyah_BNK48 | 110223.6325 | 841 | 459 | 5.0 | 4159.38 | 5.0 | BNK48 | ... | 3 | 0 | 56 | 1657 | 16731 | 8486508 | 60637 | 4 | 1 | Senbatsu |
| 2 | 3 | Kaning | Kaning_CGM48 | 99316.8750 | 967 | 526 | 2.0 | 10419.31 | 2.0 | CGM48 | ... | 3 | 27 | 0 | 3101 | 31554 | 27935669 | 183438 | 11 | 3 | Senbatsu |
| 3 | 4 | Minmin | Minmin_BNK48 | 72812.5690 | 557 | 281 | 3.0 | 6017.28 | 8.0 | BNK48 | ... | 5 | 2 | 148 | 1404 | 45844 | 26256835 | 182763 | 9 | 0 | Senbatsu |
| 4 | 5 | Pancake | Pancake_BNK48 | 52589.7200 | 484 | 266 | 28.0 | 1069.53 | 21.0 | BNK48 | ... | 3 | 5 | 43 | 714 | 13133 | 10090840 | 63289 | 3 | 1 | Senbatsu |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 59 | 60 | Berry | Berry_BNK48 | 920.9499 | 139 | 104 | | | | BNK48 | ... | 0 | 0 | 0 | 83 | 2901 | 1945863 | 16496 | 0 | 0 | Unranked |
| 60 | 61 | Wawa | Wawa_BNK48 | 669.7650 | 136 | 112 | | | | BNK48 | ... | 0 | 0 | 0 | 126 | 2870 | 592968 | 25511 | 0 | 0 | Unranked |
| 61 | 62 | Papang | Papang_CGM48 | 494.7850 | 215 | 158 | | | | CGM48 | ... | 0 | 0 | 0 | 149 | 3385 | 1436223 | 10568 | 0 | 0 | Unranked |
| 62 | 63 | Emma | Emma_CGM48 | 275.6710 | 182 | 147 | | | | CGM48 | ... | 0 | 0 | 0 | 62 | 2421 | 581191 | 11824 | 0 | 0 | Unranked |
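
In the preview above, the preliminary-result fields are blank for the lowest-ranked rows, which pandas would normally load as `NaN`. A quick missing-value check is a reasonable first step before modeling. The snippet below is a minimal sketch on a four-row sample copied from the preview; the `sample` frame is a stand-in for the full `df`, and the reading that a blank means the member was absent from the preliminary announcements is an assumption, not something the source states.

```python
import numpy as np
import pandas as pd

# Four rows copied from the preview; blank preliminary fields become NaN.
sample = pd.DataFrame({
    "Name_Band": ["Pim_CGM48", "Paeyah_BNK48", "Berry_BNK48", "Wawa_BNK48"],
    "GE4_Prelim1": [4.0, 5.0, np.nan, np.nan],
    "GE4_Prelim2": [1.0, 5.0, np.nan, np.nan],
    "GE4_Position": ["Senbatsu", "Senbatsu", "Unranked", "Unranked"],
})

# Missing values per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Missing first-preliminary ranks, broken down by final position.
missing_by_position = (
    sample["GE4_Prelim1"].isna().groupby(sample["GE4_Position"]).sum()
)
print(missing_by_position)
```

On the full dataset the same two calls can be applied to `df` directly; how the gaps are then handled (imputation, an indicator column, or dropping the features) depends on the model used.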


In [6]:
#df.to_csv('BNK48_CGM48_df.csv', index=False)

<a id='5'></a>

## 5. Exploratory Data Analysis

<a id='6'></a>

## 6. Predictive Model Building

<a id='6-1'></a>

### 6.1 Regression Analysis

<a id='6-2'></a>

### 6.2 XGBoost


<a id='6-3'></a>

### 6.3 LightGBM

<a id='6-4'></a>

### 6.4 Hyperparameter Tuning (Optuna)

<a id='7'></a>

## 7. Feature Importance Analysis

<a id='8'></a>

## 8. Conclusion and Insights