# OpenAI-Assisted Social Media Page Analysis Using Python

This project focuses on analyzing social media page data using **Python**
to understand engagement patterns, page categories, and content themes.

The dataset contains information such as follower counts, posting activity,
page type, and bio descriptions for a wide range of social media pages.
The goal is to extract meaningful insights from this raw data through
systematic cleaning, filtering, and exploratory analysis.

This is an **analysis-focused project**, not a recommendation system or
machine learning model. All data handling and analysis logic is implemented
manually using Python data structures.

OpenAI is used only as a **supporting tool** to assist with interpretation
and documentation, while the core analytical work is performed independently.

This notebook covers:
- Loading and understanding raw social media data
- Cleaning and structuring the dataset

It answers these questions:
- Who has the most posts?
- Who has the most followers?
- Who follows the most people?
- How many categories (Digital creators, Non-profit foundation, etc.) do we have? How many people do we have?

In [10]:
with open("finaldata.txt", encoding = 'utf-8') as f:
    data = f.read()

In [11]:
# Records are separated by blank lines; drop fragments too short to be a record.
chunks = data.split("\n\n")
chunks = [c for c in chunks if len(c) > 5]
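
To illustrate what this split produces, here is a minimal sketch on a made-up two-record string. The variable `sample`, the usernames, and the counts are hypothetical; they only mimic the layout the parser expects (username, posts, followers, following, name, then an optional page type and bio) and are not taken from `finaldata.txt`.

```python
# Hypothetical sample mimicking the assumed layout of finaldata.txt.
sample = (
    "example_user\n"
    "120 posts\n"
    "1.5K followers\n"
    "300 following\n"
    "Example User\n"
    "Digital creator\n"
    "Sharing daily notes\n"
    "\n"
    "another_page\n"
    "8 posts\n"
    "42 followers\n"
    "10 following\n"
    "Another Page\n"
    "\n"
    "\n"
)

# Records are separated by a blank line, i.e. two consecutive newlines.
records = sample.split("\n\n")

# Drop empty or near-empty fragments left behind by trailing blank lines.
records = [r for r in records if len(r) > 5]

print(len(records))                # number of usable records
print(records[1].split("\n")[0])   # username of the second record
```

The length filter is a simple heuristic: it discards leftover whitespace fragments, but it would also discard a genuine record shorter than six characters, which seems unlikely given the number of fields per record.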

In [12]:
def parse_count(text, label):
    """Convert a string such as "1,234 followers" or "12.5K followers" to an int."""
    raw = text.split(label)[0].replace(",", "").strip()
    multiplier = 1
    if raw.endswith("K"):
        multiplier, raw = 1_000, raw[:-1]
    elif raw.endswith("M"):
        multiplier, raw = 1_000_000, raw[:-1]
    # round() guards against float truncation before converting to int
    return int(round(float(raw) * multiplier))


def parse_chunk(chunk):
    """Parse one blank-line-separated record into a dictionary."""
    lines = chunk.strip().split("\n")
    username = lines[0]
    no_of_post = parse_count(lines[1], " post")
    no_of_followers = parse_count(lines[2], " follower")
    no_of_following = parse_count(lines[3], " following")
    name = lines[4]

    # The page type and bio are optional; fall back to placeholders when absent.
    if len(lines) > 5:
        type_of_page = lines[5]
        bio = "\n".join(lines[6:])
    else:
        type_of_page = "Unknown"
        bio = ""

    return {
        "username": username,
        "no_of_post": no_of_post,
        "no_of_followers": no_of_followers,
        "no_of_following": no_of_following,
        "name": name,
        "type_of_page": type_of_page,
        "bio": bio,
    }

In [13]:
all_chunks = []
for chunk in chunks:
    parsed_chunk = parse_chunk(chunk)
    all_chunks.append(parsed_chunk)

In [14]:
import json

# Save the parsed records as JSON so they can be reused without re-parsing.
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(all_chunks, f, indent=4)
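
A quick round-trip check can confirm that the saved JSON loads back unchanged. The sketch below is an illustration only: it uses hypothetical records in place of `all_chunks` and writes to a temporary file (the name `roundtrip.json` is an assumption) so that the real `data.json` is left untouched.

```python
import json
import os
import tempfile

# Hypothetical records standing in for all_chunks.
records = [
    {"username": "example_user", "no_of_followers": 1500, "bio": "Café notes"},
    {"username": "another_page", "no_of_followers": 42, "bio": ""},
]

# Write to a temporary location rather than overwriting data.json.
path = os.path.join(tempfile.mkdtemp(), "roundtrip.json")
with open(path, "w", encoding="utf-8") as f:
    # ensure_ascii=False keeps non-ASCII bio text readable in the file.
    json.dump(records, f, indent=4, ensure_ascii=False)

# Read the file back and compare with the in-memory records.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == records)
```

Passing an explicit `encoding="utf-8"` on both the write and the read keeps the behavior consistent across platforms whose default encoding differs.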

# Who has the maximum posts?
Let's write code to find the page with the most posts.

In [15]:
max_posts = 0
for chunk in all_chunks:
    if max_posts < chunk["no_of_post"]:
        max_posts = chunk["no_of_post"]
        chunk_with_max_post = chunk
print(chunk_with_max_post["username"])

startuphub_blr


# Who has the maximum followers?
Let's write code to find the page with the most followers.

In [16]:
max_followers = 0
for chunk in all_chunks:
    if max_followers < chunk["no_of_followers"]:
        max_followers = chunk["no_of_followers"]
        chunk_with_max_followers = chunk

print(chunk_with_max_followers["username"])

_anujsinghal


# Who follows the maximum people?
Let's write code to find the page that follows the most accounts.

In [17]:
max_follows = 0
for chunk in all_chunks:
    if chunk["no_of_following"] > max_follows:
        max_follows = chunk["no_of_following"]
        chunk_with_max_following = chunk

print(chunk_with_max_following["username"])

bangalore_tech_bro
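
The three loops above follow the same pattern, so they can also be written with Python's built-in `max()` and a `key` function. This is a sketch of an alternative, not the approach used in the notebook; the helper name `top_page` and the records in `pages` are hypothetical stand-ins for `all_chunks`.

```python
# Hypothetical records using the same keys that parse_chunk returns.
pages = [
    {"username": "page_a", "no_of_post": 10, "no_of_followers": 500, "no_of_following": 30},
    {"username": "page_b", "no_of_post": 75, "no_of_followers": 200, "no_of_following": 90},
    {"username": "page_c", "no_of_post": 40, "no_of_followers": 900, "no_of_following": 15},
]


def top_page(records, field):
    """Return the record with the largest value for `field`."""
    # On ties, max() returns the first record with the highest value,
    # which matches the strict comparisons used in the manual loops.
    return max(records, key=lambda record: record[field])


for field in ("no_of_post", "no_of_followers", "no_of_following"):
    print(field, "->", top_page(pages, field)["username"])
```

One difference worth noting: `max()` raises a `ValueError` on an empty list, whereas the manual loops fail with a `NameError` at the `print` call when no record has a value above zero.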


# How many categories?
Let's write code to count the distinct categories.

In [18]:
categories = set()

for chunk in all_chunks:
    categories.add(chunk["type_of_page"])

print(f"There are {len(categories)} categories")

There are 34 categories
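
The remaining question, how many people there are, and a per-category breakdown can be sketched with `collections.Counter`. The records in `pages` below are hypothetical stand-ins for `all_chunks`, so the printed numbers do not reflect the real dataset.

```python
from collections import Counter

# Hypothetical records; only the keys mirror what parse_chunk returns.
pages = [
    {"username": "page_a", "type_of_page": "Digital creator"},
    {"username": "page_b", "type_of_page": "Unknown"},
    {"username": "page_c", "type_of_page": "Digital creator"},
    {"username": "page_d", "type_of_page": "Non-profit organization"},
]

# Count how many pages fall into each category.
category_counts = Counter(page["type_of_page"] for page in pages)

print(f"There are {len(pages)} pages in {len(category_counts)} categories")

# most_common() lists categories from most to least frequent.
for category, count in category_counts.most_common():
    print(f"{category}: {count}")
```

Because `parse_chunk` assigns `"Unknown"` to records without a page type, that placeholder is counted as a category here, just as it is in the set-based count above.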
