# 3. Analytics Dashboard

**Goal**: Analyze the flattened data to generate insights for the report.

---

In [1]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, desc, hour

spark = SparkSession.builder \
    .appName("ProjectSpark-Analyze") \
    .getOrCreate()

In [2]:
# Load Processed Data
input_parquet = "../data/processed/github_commits_flat.parquet"
df = spark.read.parquet(input_parquet)
df.createOrReplaceTempView("commits")

print(f"Total Commits Analyzed: {df.count()}")

Total Commits Analyzed: 10109


## Insight 1: Top 10 Active Contributors
Which accounts pushed the most commits? Count rows per `actor_login` and keep the ten largest groups.

In [3]:
top_contributors = df.groupBy("actor_login") \
    .count() \
    .orderBy(desc("count")) \
    .limit(10)

top_contributors.show()

+--------------+-----+
|   actor_login|count|
+--------------+-----+
|mirror-updates|  413|
| KenanSulayman|   80|
|         alama|   59|
|    willholley|   53|
|   fluffyfreak|   51|
|       PLMbugz|   45|
|        hex7c0|   43|
|       amaduki|   41|
|      rnelson0|   41|
|        geekzy|   40|
+--------------+-----+
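
To put these counts in context, it can help to express them as a share of the 10109 commits analyzed. The following is a minimal sketch in plain Python that copies the numbers from the output above by hand; the variable names are illustrative and not part of the notebook.

```python
# Counts copied from the top_contributors.show() output above.
total_commits = 10109
top_contributors = {
    "mirror-updates": 413,
    "KenanSulayman": 80,
    "alama": 59,
    "willholley": 53,
    "fluffyfreak": 51,
    "PLMbugz": 45,
    "hex7c0": 43,
    "amaduki": 41,
    "rnelson0": 41,
    "geekzy": 40,
}

# Share of all analyzed commits attributed to each listed account.
shares = {login: n / total_commits for login, n in top_contributors.items()}
for login, share in shares.items():
    print(f"{login:<15}{share:7.2%}")

# Combined weight of the ten accounts.
top10_total = sum(top_contributors.values())
top10_share = top10_total / total_commits
print(f"top 10 combined: {top10_total} commits ({top10_share:.2%})")
```

On these numbers the ten accounts together account for 866 commits, under a tenth of the total, so the activity is spread across many contributors.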



## Insight 2: Top 10 Repositories by Activity

In [4]:
top_repos = df.groupBy("repo_name") \
    .count() \
    .orderBy(desc("count")) \
    .limit(10)

top_repos.show(truncate=False)

+-----------------------+-----+
|repo_name              |count|
+-----------------------+-----+
|sakai-mirror/melete    |235  |
|sakai-mirror/mneme     |80   |
|KenanSulayman/heartbeat|80   |
|sakai-mirror/ambrosia  |80   |
|alama/PSO2Proxy        |59   |
|willholley/pouchdb     |53   |
|mquinson/PLM-data      |45   |
|amaduki/bbCompass      |41   |
|dumbbell/freebsd       |40   |
|jakesyl/munbot         |37   |
+-----------------------+-----+
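
Three of the top four repositories share the `sakai-mirror` owner, so rolling the counts up by owner may give a clearer picture. The sketch below does this in plain Python on the ten rows shown above. It assumes that the owner is the part of `repo_name` before the first `/`; the variable names are illustrative.

```python
from collections import Counter

# Rows copied from the top_repos.show() output above.
top_repos = [
    ("sakai-mirror/melete", 235),
    ("sakai-mirror/mneme", 80),
    ("KenanSulayman/heartbeat", 80),
    ("sakai-mirror/ambrosia", 80),
    ("alama/PSO2Proxy", 59),
    ("willholley/pouchdb", 53),
    ("mquinson/PLM-data", 45),
    ("amaduki/bbCompass", 41),
    ("dumbbell/freebsd", 40),
    ("jakesyl/munbot", 37),
]

# Sum commit counts per owner (assumed to be the text before the first "/").
by_owner = Counter()
for repo_name, n in top_repos:
    owner = repo_name.split("/", 1)[0]
    by_owner[owner] += n

for owner, n in by_owner.most_common():
    print(f"{owner:<15}{n:>5}")
```

This only aggregates the ten rows displayed here. A full per-owner ranking would need the same grouping applied to the whole DataFrame.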



## Insight 3: SQL Analysis Example
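Use Spark SQL to find commits whose message contains 'fix', ignoring case. Because the query has `LIMIT 5` without an `ORDER BY`, it returns an arbitrary five matches rather than a ranked list.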
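
A pattern like `'%fix%'` is a substring match, so it also catches words such as "prefix" or "suffix". The plain-Python sketch below contrasts a substring check with a word-boundary regular expression. The first two messages come from the query output; the last two are hypothetical and added only to show the difference. The regular expression is one possible definition of a "fix" commit, not the notebook's.

```python
import re

messages = [
    "Fix main header height on mobile",  # from the query output
    "fixed spelling errors",             # from the query output
    "Add prefix option to CLI",          # hypothetical example
    "Update suffix handling",            # hypothetical example
]

# Substring match: mirrors lower(commit_message) LIKE '%fix%'.
substring_hits = [m for m in messages if "fix" in m.lower()]

# Whole-word match: fix, fixes, fixed, fixing, but not prefix or suffix.
fix_word = re.compile(r"\bfix(e[sd]|ing)?\b", re.IGNORECASE)
word_hits = [m for m in messages if fix_word.search(m)]

print(len(substring_hits), "substring matches")
print(len(word_hits), "whole-word matches")
```

The substring check keeps all four messages, while the word-boundary pattern keeps only the two that actually describe a fix.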

In [5]:
fixes = spark.sql("""
    SELECT actor_login, repo_name, commit_message
    FROM commits
    WHERE lower(commit_message) LIKE '%fix%'
    LIMIT 5
""")
fixes.show(truncate=False)

+-----------+-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
|actor_login|repo_name                |commit_message                                                                                                                                     |
+-----------+-------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+
|rspt       |rspt/rspt-theme          |Fix main header height on mobile                                                                                                                   |
|suneg      |suneg/dojo_rules         |fixed spelling errors                                                                                                                              |
|walmik     |walmik/timer.jquery      |Merge pull request #2