# Project Overview
The goal of this project is to build an engagement signal board that tracks key user-interaction metrics on Instagram, such as an Engagement Score and a Content Virality Index. In addition, an A/B testing framework will be implemented to evaluate the impact of new features on user engagement, supporting data-driven decisions about product targeting and retention strategies.
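As one possible illustration of an Engagement Score, assume it is a photo's like count plus a weighted comment count. The weight of 2.0 and the helper name `engagement_score` are hypothetical choices, not definitions taken from this project. The sketch uses the `photos` (`id`), `likes` (`photo_id`) and `comments` (`photo_id`) columns that appear in the missing-value output of Step 3:

```python
import pandas as pd

def engagement_score(photos: pd.DataFrame, likes: pd.DataFrame,
                     comments: pd.DataFrame, comment_weight: float = 2.0) -> pd.DataFrame:
    """Hypothetical Engagement Score per photo: likes + comment_weight * comments."""
    scores = pd.DataFrame({"photo_id": photos["id"].to_numpy()})
    # value_counts() counts interactions per photo_id; map() aligns those counts to
    # each photo, and fillna(0) keeps photos that received no likes or comments.
    scores["likes"] = scores["photo_id"].map(likes["photo_id"].value_counts()).fillna(0).astype(int)
    scores["comments"] = scores["photo_id"].map(comments["photo_id"].value_counts()).fillna(0).astype(int)
    # Assumed weighting: a comment counts as much as comment_weight likes.
    scores["engagement_score"] = scores["likes"] + comment_weight * scores["comments"]
    return scores
```

Once the tables are loaded, `engagement_score(photos, likes, comments).nlargest(10, "engagement_score")` would list the ten highest-scoring photos under this assumed definition.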

## Step 1: Importing Libraries

- **Pandas and NumPy**: for data manipulation and numerical operations.
- **Matplotlib and Seaborn**: for visualizing the engagement metrics.
- **`scipy.stats.ttest_ind`**: for the two-sample t-test that checks whether A/B test differences are statistically significant.
- **TensorFlow**: for setting up a simple model to predict engagement.
- **scikit-learn's `mean_squared_error`**: for evaluating the model's performance.
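The A/B test described above can be sketched with `ttest_ind`. This is a minimal example on synthetic, randomly generated data (not project data); the helper name `ab_test`, the 0.05 significance level, and the use of Welch's variant (`equal_var=False`) are assumptions rather than choices made in this notebook.

```python
import numpy as np
from scipy.stats import ttest_ind

def ab_test(control, treatment, alpha=0.05):
    """Compare mean engagement of two groups with Welch's two-sample t-test."""
    # equal_var=False selects Welch's t-test, which does not assume equal variances.
    result = ttest_ind(treatment, control, equal_var=False)
    return {
        "lift": float(np.mean(treatment) - np.mean(control)),
        "t_stat": float(result.statistic),
        "p_value": float(result.pvalue),
        "significant": bool(result.pvalue < alpha),
    }

# Synthetic per-user engagement for each arm, for illustration only.
rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=3.0, size=500)
treatment = rng.normal(loc=10.8, scale=3.0, size=500)
print(ab_test(control, treatment))
```

Welch's test is a reasonable default when the two groups may differ in variance or size; a real experiment would also fix its sample size and significance level before looking at the results.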

In [10]:
%%capture
# %%capture hides the long pip log; %pip installs into the environment of the running kernel
%pip install tensorflow
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind
import tensorflow as tf
from sklearn.metrics import mean_squared_error
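On its own, a `mean_squared_error` value is hard to interpret. A common sanity check is to compare the model's error with a baseline that always predicts the training-set mean. In this sketch, `y_pred` stands in for the TensorFlow model's test-set predictions, and the helper `mse_vs_baseline` is hypothetical:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def mse_vs_baseline(y_train, y_test, y_pred):
    """Return the model's test MSE next to the MSE of a predict-the-training-mean baseline."""
    y_test = np.asarray(y_test, dtype=float)
    # The baseline ignores all features and always predicts the mean training target.
    baseline_pred = np.full_like(y_test, np.mean(y_train))
    return {
        "model_mse": float(mean_squared_error(y_test, y_pred)),
        "baseline_mse": float(mean_squared_error(y_test, baseline_pred)),
    }

# Made-up numbers: a useful model should have a clearly lower MSE than the baseline.
print(mse_vs_baseline(y_train=[1, 2, 3], y_test=[2, 4], y_pred=[2.1, 3.8]))
```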


## Step 2: Loading Data into DataFrames

In [17]:
# Directory containing the exported CSV tables
DATA_DIR = '~/github/instagram-engagement'

comments = pd.read_csv(f'{DATA_DIR}/comments.csv')
follows = pd.read_csv(f'{DATA_DIR}/follows.csv')
likes = pd.read_csv(f'{DATA_DIR}/likes.csv')
photo_tags = pd.read_csv(f'{DATA_DIR}/photo_tags.csv')
photos = pd.read_csv(f'{DATA_DIR}/photos.csv')
tags = pd.read_csv(f'{DATA_DIR}/tags.csv')
users = pd.read_csv(f'{DATA_DIR}/users.csv')
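The seven `read_csv` calls can also be written as a loop over table names. This sketch assumes the CSVs share one directory and are named `<table>.csv`, as in the cell above; `load_tables` and `TABLE_NAMES` are hypothetical names, and failing early when a file is missing is a design choice the original cell does not make.

```python
from pathlib import Path

import pandas as pd

# Table names matching the CSV files read above (assumed layout: <data_dir>/<name>.csv).
TABLE_NAMES = ("comments", "follows", "likes", "photo_tags", "photos", "tags", "users")

def load_tables(data_dir, names=TABLE_NAMES):
    """Read <data_dir>/<name>.csv for each table name into a dict of DataFrames."""
    data_dir = Path(data_dir).expanduser()
    # Report every missing file at once instead of failing on the first one.
    missing = [name for name in names if not (data_dir / f"{name}.csv").exists()]
    if missing:
        raise FileNotFoundError(f"Missing CSV files in {data_dir}: {missing}")
    return {name: pd.read_csv(data_dir / f"{name}.csv") for name in names}

# Usage, with the directory from the cell above:
# tables = load_tables("~/github/instagram-engagement")
# comments, photos = tables["comments"], tables["photos"]
```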

## Step 3: Data Exploration and Cleaning

In [39]:
# Count missing values per column in each table
na_tables = {
    "Comments": comments,
    "follows": follows,
    "likes": likes,
    "photo_tags": photo_tags,
    "photos": photos,
    "Tags": tags,
    "users": users,
}
for name, df in na_tables.items():
    print("=" * 30)
    print(f"{name} table total na count :\n{'=' * 30}\n{df.isna().sum()}")

Comments table total na count :
id              0
comment_text    0
user_id         0
photo_id        0
created_at      0
dtype: int64
follows table total na count :
follower_id    0
followee_id    0
created_at     0
dtype: int64
likes table total na count :
user_id       0
photo_id      0
created_at    0
dtype: int64
photo_tags table total na count :
photo_id    0
tag_id      0
dtype: int64
photos table total na count :
id             0
image_url      0
user_id        0
created_dat    0
dtype: int64
Tags table total na count :
id            0
tag_name      0
created_at    0
dtype: int64
users table total na count :
id            0
username      0
created_at    0
dtype: int64
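None of the tables contain missing values, but the output above shows that `photos` names its timestamp column `created_dat` while the other tables use `created_at`, and the `info()` output below shows `created_at` being read as `object` (plain strings). As a minimal cleaning sketch, assuming `created_dat` is a truncated `created_at` and that exact duplicate rows are data errors, one could harmonize the name, parse the timestamps, and drop duplicates. `clean_timestamps` is a hypothetical helper:

```python
import pandas as pd

def clean_timestamps(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a parsed `created_at` column and exact duplicate rows dropped."""
    out = df.copy()
    # Assumption: `created_dat` in photos.csv is the same field as `created_at` elsewhere.
    if "created_dat" in out.columns and "created_at" not in out.columns:
        out = out.rename(columns={"created_dat": "created_at"})
    if "created_at" in out.columns:
        # errors="coerce" turns unparseable strings into NaT instead of raising.
        out["created_at"] = pd.to_datetime(out["created_at"], errors="coerce")
    return out.drop_duplicates().reset_index(drop=True)

# Usage: photos = clean_timestamps(photos); likes = clean_timestamps(likes)
```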


In [40]:
# Display basic info for each DataFrame.
# DataFrame.info() prints its summary itself and returns None,
# so call it directly instead of wrapping it in print().
for name, df in [("Comments", comments), ("Follows", follows), ("Likes", likes),
                 ("Photo Tags", photo_tags), ("Photos", photos), ("Tags", tags),
                 ("Users", users)]:
    print(f"{name} Data:")
    df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   id            1000 non-null   int64 
 1   comment_text  1000 non-null   object
 2   user_id       1000 non-null   int64 
 3   photo_id      1000 non-null   int64 
 4   created_at    1000 non-null   object
dtypes: int64(3), object(2)
memory usage: 39.2+ KB
Comments Data: None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   follower_id  1000 non-null   int64 
 1   followee_id  1000 non-null   int64 
 2   created_at   1000 non-null   object
dtypes: int64(2), object(1)
memory usage: 23.6+ KB
Follows Data: None
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  