## Task 8: Applicant Behavior Analysis

In [2]:
#Load libraries

In [4]:
import pandas as pd

In [6]:
#Step 1: Load dataset

In [8]:
df = pd.read_csv("applicant_behavior_dataset_large.csv", parse_dates=["Timestamp"])




In [10]:
print(" Dataset Loaded Successfully!")
print(" Shape:", df.shape)
print("\n Sample 5 rows:")
print(df.head(5).to_string(index=False))

 Dataset Loaded Successfully!
 Shape: (96328, 15)

 Sample 5 rows:
  Event_ID           Timestamp   User_ID Session_ID Device Traffic_Source  Country     City              Page Event_Type  Funnel_Stage  Time_On_Page_Sec  Is_Exit       Date  Hour
EV00000001 2025-12-20 15:00:00 USR003596 SES0001706 Mobile         Direct Pakistan New York              Home  page_view             0                22        0 2025-12-20    15
EV00000002 2025-12-20 15:03:00 USR003596 SES0001706 Mobile         Direct Pakistan New York       Internships  page_view             1                17        0 2025-12-20    15
EV00000003 2025-12-20 15:04:00 USR003596 SES0001706 Mobile         Direct Pakistan New York Internship Detail  page_view             2                40        0 2025-12-20    15
EV00000004 2025-12-20 15:06:00 USR003596 SES0001706 Mobile         Direct Pakistan New York           Profile  page_view             4                62        0 2025-12-20    15
EV00000005 2025-12-20 15:12:00 USR0035

In [12]:
#Step 2: Top page visits

In [14]:
# Count page_view events per page, most viewed first
page_visits = (
    df[df["Event_Type"] == "page_view"].groupby("Page").size()
    .sort_values(ascending=False).reset_index(name="Page_Views")
)

print("\nTop 10 Pages by Page Views:")
print(page_visits.head(10).to_string(index=False))



Top 10 Pages by Page Views:
             Page  Page_Views
      Internships       26338
             Home       17324
              FAQ        7859
Internship Detail        5907
      Apply Start        5526
          Profile        4659
       Apply Form        4436
          Contact        3505
        Upload CV        3343
           Review        2545


In [16]:
#Step 3: Funnel Reach (Sessions reaching each stage)

In [18]:
session_stage = df.groupby("Session_ID")["Funnel_Stage"].max().reset_index(name="Max_Stage_Reached")
total_sessions = session_stage["Session_ID"].nunique()

reach = []
for stage in range(0, 11):
    count = (session_stage["Max_Stage_Reached"] >= stage).sum()
    reach.append([stage, count, count/total_sessions])

reach_df = pd.DataFrame(reach, columns=["Funnel_Stage","Sessions_Reaching_Stage","Reach_Rate"])

print("\nFunnel Reach Summary:")
print(reach_df.to_string(index=False))



Funnel Reach Summary:
 Funnel_Stage  Sessions_Reaching_Stage  Reach_Rate
            0                     8000    1.000000
            1                     8000    1.000000
            2                     7878    0.984750
            3                     7472    0.934000
            4                     7064    0.883000
            5                     5650    0.706250
            6                     4613    0.576625
            7                     3562    0.445250
            8                     2637    0.329625
            9                     2338    0.292250
           10                     2168    0.271000
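
The reach table can also be read as a reverse cumulative sum: a session counts toward stage `s` if its deepest stage is `s` or later. The sketch below illustrates that relationship using only the numbers printed above. The `ended_at` list is not in the dataset; it is derived here by differencing consecutive reach counts, and all variable names are illustrative.

```python
from itertools import accumulate

# Sessions whose deepest stage is exactly s (index = stage 0..10).
# Derived from the reach table: reach[s] - reach[s + 1]; stage 10 keeps its full count.
ended_at = [0, 122, 406, 408, 1414, 1037, 1051, 925, 299, 170, 2168]

# Reach at stage s = sessions that ended at stage s or later,
# i.e. a cumulative sum taken from the last stage backwards.
reach = list(accumulate(reversed(ended_at)))[::-1]
total_sessions = reach[0]
reach_rate = [count / total_sessions for count in reach]

for stage, (count, rate) in enumerate(zip(reach, reach_rate)):
    print(f"{stage:>2}  {count:>5}  {rate:.6f}")
```

This reproduces the `Sessions_Reaching_Stage` and `Reach_Rate` columns in one pass instead of one comparison per stage.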


In [20]:
#Step 4: Funnel Drop-offs (Bottlenecks)

In [22]:
drop = []
for s in range(0, 10):
    reached_s = (session_stage["Max_Stage_Reached"] >= s).sum()
    reached_next = (session_stage["Max_Stage_Reached"] >= (s+1)).sum()
    drop_sessions = reached_s - reached_next
    drop_rate = drop_sessions / reached_s if reached_s else 0
    drop.append([s, s+1, reached_s, reached_next, drop_sessions, drop_rate])

drop_df = pd.DataFrame(drop, columns=[
    "From_Stage","To_Stage","Sessions_Reached_From","Sessions_Reached_To",
    "Dropoff_Sessions","Dropoff_Rate"
])

print("\nFunnel Drop-offs:")
print(drop_df.to_string(index=False))

print("\n Biggest Bottleneck Stage:")
print(drop_df.sort_values("Dropoff_Rate", ascending=False).head(1).to_string(index=False))



Funnel Drop-offs:
 From_Stage  To_Stage  Sessions_Reached_From  Sessions_Reached_To  Dropoff_Sessions  Dropoff_Rate
          0         1                   8000                 8000                 0      0.000000
          1         2                   8000                 7878               122      0.015250
          2         3                   7878                 7472               406      0.051536
          3         4                   7472                 7064               408      0.054604
          4         5                   7064                 5650              1414      0.200170
          5         6                   5650                 4613              1037      0.183540
          6         7                   4613                 3562              1051      0.227834
          7         8                   3562                 2637               925      0.259686
          8         9                   2637                 2338               299      0.113386
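
As a cross-check on the drop-off logic, the same rates can be recomputed directly from the Funnel Reach Summary by pairing each stage with the next. This is a minimal sketch on the printed counts, not the notebook's own cell; the variable names are illustrative.

```python
# Sessions reaching each funnel stage 0..10, copied from the Funnel Reach Summary.
reach = [8000, 8000, 7878, 7472, 7064, 5650, 4613, 3562, 2637, 2338, 2168]

# Pair each stage with the next one and compute the share of sessions lost.
drop_offs = []
for stage, (reached, reached_next) in enumerate(zip(reach, reach[1:])):
    lost = reached - reached_next
    rate = lost / reached if reached else 0.0
    drop_offs.append((stage, stage + 1, lost, rate))

# Largest relative loss vs. largest absolute loss can point at different stages.
by_rate = max(drop_offs, key=lambda row: row[3])
by_count = max(drop_offs, key=lambda row: row[2])
print(f"Highest drop-off rate: stage {by_rate[0]} -> {by_rate[1]} ({by_rate[3]:.1%})")
print(f"Most sessions lost:    stage {by_count[0]} -> {by_count[1]} ({by_count[2]} sessions)")
```

On these counts the highest drop-off rate falls between stages 7 and 8 (925 of 3,562 sessions), while the largest absolute loss falls between stages 4 and 5 (1,414 sessions), so it is worth looking at both measures before naming a single bottleneck.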
 

In [24]:
#Step 5: Top Exit Pages

In [26]:
# Count exit events per page, most exits first
exit_points = (
    df[df["Is_Exit"] == 1].groupby("Page").size()
    .sort_values(ascending=False).reset_index(name="Exits")
)

print("\n Top 10 Exit Pages:")
print(exit_points.head(10).to_string(index=False))



 Top 10 Exit Pages:
       Page  Exits
Internships   2888
       Home   1779
        FAQ   1157
  Thank You    873
    Contact    478
  Upload CV    219
 Apply Form    177
Apply Start    124
    Profile    117
     Review     92
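
Raw exit counts favor high-traffic pages, so it can help to relate exits to page views. The sketch below divides the Step 5 exit counts by the Step 2 page-view counts for pages that appear in both top-10 lists. Treat the result as approximate: the denominators count only `page_view` events, and it is an assumption that exits are recorded on comparable events.

```python
# Page-view counts (Step 2) and exit counts (Step 5), copied from the outputs above.
page_views = {
    "Internships": 26338, "Home": 17324, "FAQ": 7859, "Internship Detail": 5907,
    "Apply Start": 5526, "Profile": 4659, "Apply Form": 4436, "Contact": 3505,
    "Upload CV": 3343, "Review": 2545,
}
exits = {
    "Internships": 2888, "Home": 1779, "FAQ": 1157, "Thank You": 873,
    "Contact": 478, "Upload CV": 219, "Apply Form": 177, "Apply Start": 124,
    "Profile": 117, "Review": 92,
}

# Only pages present in both tables can be compared.
exit_rates = {
    page: exits[page] / views for page, views in page_views.items() if page in exits
}

# Print pages from highest to lowest approximate exit rate.
for page, rate in sorted(exit_rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:<12} {rate:.1%}")
```

By this measure FAQ (about 14.7%) and Contact (about 13.6%) lose a larger share of their visitors than Internships (about 11.0%), even though Internships has the most exits in absolute terms.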


In [28]:
#Step 6: Device Conversion (Submit & Thank You)

In [30]:
session_stage["Converted_Submit"] = (session_stage["Max_Stage_Reached"] >= 9).astype(int)
session_stage["Converted_ThankYou"] = (session_stage["Max_Stage_Reached"] >= 10).astype(int)

session_device = df.groupby("Session_ID")["Device"].first().reset_index()
merged = session_stage.merge(session_device, on="Session_ID")

device_conv = merged.groupby("Device").agg(
    Total_Sessions=("Session_ID","count"),
    Submit_Conversions=("Converted_Submit","sum"),
    ThankYou_Conversions=("Converted_ThankYou","sum")
).reset_index()

device_conv["Submit_Conversion_Rate"] = device_conv["Submit_Conversions"] / device_conv["Total_Sessions"]
device_conv["ThankYou_Conversion_Rate"] = device_conv["ThankYou_Conversions"] / device_conv["Total_Sessions"]

print("\n Device Conversion Rates:")
print(device_conv.to_string(index=False))



 Device Conversion Rates:
 Device  Total_Sessions  Submit_Conversions  ThankYou_Conversions  Submit_Conversion_Rate  ThankYou_Conversion_Rate
Desktop            2206                 624                   582                0.282865                  0.263826
 Mobile            5477                1613                  1496                0.294504                  0.273142
 Tablet             317                 101                    90                0.318612                  0.283912
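
A quick consistency check is to confirm that the per-device figures add up to the funnel totals from Step 3 (stage 9 for Submit and stage 10 for Thank You, matching the thresholds used in the cell above). This sketch works on the printed numbers only; the dictionary names are illustrative.

```python
# Per-device figures copied from the Device Conversion Rates output:
# device -> (total sessions, submit conversions, thank-you conversions)
device_conv = {
    "Desktop": (2206, 624, 582),
    "Mobile": (5477, 1613, 1496),
    "Tablet": (317, 101, 90),
}
# Funnel reach counts for stage 0 (all sessions), 9 (Submit) and 10 (Thank You).
funnel_reach = {0: 8000, 9: 2338, 10: 2168}

# Sum each column across devices and compare with the funnel totals.
totals = [sum(column) for column in zip(*device_conv.values())]
consistent = totals == [funnel_reach[0], funnel_reach[9], funnel_reach[10]]
print("Device totals:", totals, "match funnel:", consistent)

# Share of all sessions contributed by each device.
session_share = {
    device: sessions / funnel_reach[0]
    for device, (sessions, _, _) in device_conv.items()
}
for device, share in session_share.items():
    print(f"{device:<8} {share:.1%} of sessions")
```

The totals match, and the shares show that Tablet accounts for only about 4% of sessions (317), so its slightly higher conversion rate rests on a much smaller sample than Mobile or Desktop.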


## Dataset Summary

Dataset Name: Applicant Behavior Dataset 

Total Events: 96,328

Total Sessions: 8,000

Total Users: 4,000

Time Period: July 2025 – Dec 2025

Columns: 15 (page, session, device, traffic source, funnel stage, exit flag, etc.)

## THANK YOU