# **Result Management System**

### The Result Management System is designed to efficiently process, analyze, and manage student academic records using Apache Spark, MongoDB, and Kafka. This system handles large datasets, performs statistical analysis, and provides insights into student performance.

### Key Features:
### 1: Data Generation & Storage – Generates student records and stores them in MongoDB.
### 2: Data Processing – Uses PySpark to clean, filter, and analyze student results.
### 3: Statistical Analysis – Computes performance metrics like average, highest, and lowest scores.
### 4: Visualization – Displays trends in student marks using histograms and summary statistics.
### 5: Real-time Streaming (Optional) – Uses Kafka for real-time updates and notifications.

### This project demonstrates how Spark, Kafka, and MongoDB can work together to manage large volumes of academic records.

### Installing Libraries

In [134]:
!pip install pyspark faker kafka-python pymongo



### Importing prerequisites

In [135]:
import json
import random

from faker import Faker
from kafka import KafkaProducer
from pymongo import MongoClient
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
# Note: these max/min shadow Python's built-ins for the rest of the notebook
from pyspark.sql.functions import avg, col, max, min

### Initialising Spark Session




In [136]:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[*]") \
    .appName("ResultManagement") \
    .config("spark.driver.memory", "2g") \
    .getOrCreate()

print(spark)


<pyspark.sql.session.SparkSession object at 0x7d9c0f313c10>


### Generating Data using Faker

In [137]:
from faker import Faker
import random

fake = Faker()

def generate_students(num_students=10000):
    """Return (ID, Name, Marks) tuples with unique IDs in shuffled order."""
    student_ids = list(range(1000, 1000 + num_students))
    random.shuffle(student_ids)

    # One random mark between 35 and 100 (inclusive) per student
    return [(sid, fake.name(), random.randint(35, 100)) for sid in student_ids]

data = generate_students()
columns = ["ID", "Name", "Marks"]
df = spark.createDataFrame(data, columns)
df.show(10)


+-----+-----------------+-----+
|   ID|             Name|Marks|
+-----+-----------------+-----+
| 4431|   Jerry Hamilton|   42|
| 7168|   Melanie Spears|   51|
| 7684|      James Young|   96|
| 9092|   Garrett Butler|   61|
| 9961|   Kimberly Yates|   94|
| 4460|        Kim Davis|   41|
| 1469|Jennifer Martinez|   56|
| 4250|      Mark Fisher|   86|
|10307|    Tony Richards|   35|
| 8510|  Monica Gonzalez|   37|
+-----+-----------------+-----+
only showing top 10 rows
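
The next cell builds its DataFrame from `students_data`, which is not defined in any cell shown here. Judging by the column names and the 40 to 100 mark range in the later output, it was presumably a list of dictionaries with one key per subject. The following is a minimal sketch under that assumption, not the original cell; `SUBJECTS`, `generate_student_records`, and the `name_factory` parameter are hypothetical names, and the placeholder names stand in for `fake.name`.

```python
import random

# Subject names taken from the column headers in the later output.
SUBJECTS = ["Electronics", "Programming", "Database",
            "Data Science", "Mathematics", "DSA"]

def generate_student_records(num_students=10000, name_factory=None, seed=None):
    """Return a list of dicts, one per student, with a mark for every subject.

    Assumption: marks are uniform integers in [40, 100], matching the
    min/max values reported in the statistics table.
    """
    rng = random.Random(seed)
    # Unique IDs in shuffled order, as in the generator above.
    student_ids = list(range(1000, 1000 + num_students))
    rng.shuffle(student_ids)

    records = []
    for sid in student_ids:
        # Fall back to a placeholder name when no factory is supplied.
        name = name_factory() if name_factory else f"Student {sid}"
        record = {"StudentID": sid, "Name": name}
        for subject in SUBJECTS:
            record[subject] = rng.randint(40, 100)
        records.append(record)
    return records

# In the notebook, pass name_factory=fake.name to get Faker-generated names.
students_data = generate_student_records()
```

Keeping the subject list in one place means the same names can drive both data generation and any later per-subject aggregation.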



### Converting to Spark DataFrame

In [138]:
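# Rebuild df from per-subject records (replaces the single-Marks df above);
# students_data must already hold one record per student with a mark per subject
df = spark.createDataFrame(students_data)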

### Computing Stats (Average, Max, Min)

In [139]:

subjects = ["Electronics", "Programming", "Database",
            "Data Science", "Mathematics", "DSA"]

# Build one single-row aggregate per subject
per_subject = [
    df.select(
        F.lit(subject).alias("Subject"),
        F.avg(F.col(subject)).alias("Avg_Marks"),
        F.max(F.col(subject)).alias("Max_Marks"),
        F.min(F.col(subject)).alias("Min_Marks"),
    )
    for subject in subjects
]

# Stack the single-row results into one table
stats_df = per_subject[0]
for part in per_subject[1:]:
    stats_df = stats_df.union(part)

# Show the formatted table
stats_df.show()


+------------+---------+---------+---------+
|     Subject|Avg_Marks|Max_Marks|Min_Marks|
+------------+---------+---------+---------+
| Electronics|  69.8027|      100|       40|
| Programming|  70.3255|      100|       40|
|    Database|  69.6903|      100|       40|
|Data Science|  69.9387|      100|       40|
| Mathematics|  70.1751|      100|       40|
|         DSA|  70.1088|      100|       40|
+------------+---------+---------+---------+
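
To make the aggregation concrete, the same average, maximum, and minimum can be worked out in plain Python on a small sample. The sketch below uses the DSA and Programming marks from the eight student records printed later in the notebook; `summarize` is a hypothetical helper, not part of the pipeline. It calls the built-ins through the `builtins` module because the notebook's PySpark import shadows `max` and `min`.

```python
import builtins
from statistics import mean

# DSA and Programming marks copied from the eight records printed by df.show(10).
sample = {
    "DSA": [52, 47, 51, 82, 66, 84, 65, 68],
    "Programming": [41, 77, 97, 43, 89, 49, 93, 97],
}

def summarize(marks_by_subject):
    """Return one dict per subject with the same fields as stats_df."""
    return [
        {
            "Subject": subject,
            "Avg_Marks": mean(marks),
            # builtins.max/min avoid the PySpark functions of the same name.
            "Max_Marks": builtins.max(marks),
            "Min_Marks": builtins.min(marks),
        }
        for subject, marks in marks_by_subject.items()
    ]

summary = summarize(sample)
for row in summary:
    print(row)
```

On this sample the DSA row comes out as an average of 64.375 with a maximum of 84 and a minimum of 47.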



### Collecting stats

In [140]:
# Collect every subject's row, not just the first one
stats = [row.asDict() for row in stats_df.collect()]

### Initialising Kafka producer to send stats

In [141]:
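def send_to_kafka(topic, data):
    # Serialize each payload as UTF-8 encoded JSON
    producer = KafkaProducer(
        bootstrap_servers='localhost:9092',
        value_serializer=lambda v: json.dumps(v).encode('utf-8')
    )
    producer.send(topic, data)
    # Block until the message is delivered, then release the connection
    producer.flush()
    producer.close()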
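
The `value_serializer` above turns each payload into UTF-8 JSON bytes, so whatever reads the topic has to reverse that step. The sketch below shows the round trip without a broker; `serialize` mirrors the lambda in the producer, while `deserialize` is a hypothetical consumer-side counterpart that does not appear in the notebook. The payload values come from the Electronics row of the statistics table.

```python
import json

def serialize(value):
    """Same transformation as the producer's value_serializer."""
    return json.dumps(value).encode("utf-8")

def deserialize(raw):
    """Hypothetical consumer-side inverse of serialize."""
    return json.loads(raw.decode("utf-8"))

# Values taken from the Electronics row of the statistics table.
payload = {"Subject": "Electronics", "Avg_Marks": 69.8027,
           "Max_Marks": 100, "Min_Marks": 40}

raw = serialize(payload)       # bytes, as they would travel over Kafka
restored = deserialize(raw)    # back to a plain dict
print(raw)
print(restored == payload)
```

With kafka-python, a function like `deserialize` could be passed to `KafkaConsumer` as its `value_deserializer`, assuming a consumer is added to the project.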

### Sending Computed stats to Kafka topic

In [142]:
# Left disabled: requires a Kafka broker listening on localhost:9092
#send_to_kafka("student_statistics", stats)

### Connecting to MongoDB

In [143]:
client = MongoClient("mongodb://localhost:27017/")
db = client["ResultManagement"]
collection = db["Feedback"]

### Storing Feedback in MongoDB

In [144]:
def store_feedback(student_id, feedback):
    collection.insert_one({"StudentID": student_id, "Feedback": feedback})
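
`store_feedback` is defined but never called in the notebook. The sketch below shows one way to try the same insert logic without a running MongoDB server. Everything in it is illustrative: `InMemoryCollection` is a hypothetical stand-in that only mimics `insert_one`, `store_feedback_to` is a variant of the helper that takes the collection as an argument, and the feedback text is made up for the example.

```python
class InMemoryCollection:
    """Hypothetical stand-in exposing only insert_one, for offline experiments."""

    def __init__(self):
        self.documents = []

    def insert_one(self, document):
        # Store a copy so later changes to the caller's dict do not leak in.
        self.documents.append(dict(document))

def store_feedback_to(target, student_id, feedback):
    """Same document shape as store_feedback, with the collection passed in."""
    target.insert_one({"StudentID": student_id, "Feedback": feedback})

demo_collection = InMemoryCollection()
# Student 5820 appears in the records table; the feedback text is illustrative.
store_feedback_to(demo_collection, 5820, "Strong in Mathematics; Electronics needs work.")
print(demo_collection.documents)
```

Passing the collection in explicitly makes the helper easier to exercise in isolation; the notebook's version relies on the module-level `collection` instead.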

### Showing records and computed stats

In [145]:
df.show(10)
stats_df.show()

+---+------------+--------+-----------+-----------+----------------+-----------+---------+
|DSA|Data Science|Database|Electronics|Mathematics|            Name|Programming|StudentID|
+---+------------+--------+-----------+-----------+----------------+-----------+---------+
| 52|          87|      77|         44|         96| Kimberly Garcia|         41|     5820|
| 47|          60|      72|         72|         79|      Ryan Smith|         77|     7583|
| 51|          72|      52|         75|         81|  Barbara Glover|         97|     4630|
| 82|          89|      48|         88|         68|Tamara Hernandez|         43|     6777|
| 66|          76|      58|         79|         46|   Carrie Martin|         89|     4018|
| 84|          56|      55|         74|         63|    Amanda Klein|         49|     1485|
| 65|          88|      86|         90|         69| Christina White|         93|     9688|
| 68|          57|      82|         64|         80|      Mark Smith|         97|     9149|
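
The feature list mentions histograms of student marks, but no plotting cell appears in the notebook. As a rough sketch of the idea, the code below bins marks into ranges of ten and prints a text histogram using only the standard library. `mark_histogram` and `render` are hypothetical helpers, and the sample is the Mathematics column of the eight records shown above.

```python
from collections import Counter

def mark_histogram(marks, bin_width=10):
    """Count marks per bin; each key is the inclusive lower edge of its bin."""
    return Counter((m // bin_width) * bin_width for m in marks)

def render(histogram, bin_width=10):
    """Return one text line per non-empty bin, lowest bin first."""
    lines = []
    for lower in sorted(histogram):
        upper = lower + bin_width - 1
        lines.append(f"{lower:>3}-{upper:<3} {'#' * histogram[lower]}")
    return lines

# Mathematics marks copied from the eight records shown above.
maths = [96, 79, 81, 68, 46, 63, 69, 80]
hist = mark_histogram(maths)
print("\n".join(render(hist)))
```

For the full dataset, the marks could be pulled to the driver first, for example with `[row["Mathematics"] for row in df.select("Mathematics").collect()]`, and a plotting library could replace the text rendering.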

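### Writing records and stats to CSV

In [146]:
# Spark writes each path as a directory of part files, not a single CSV file
df.write.csv("students_data.csv", header=True, mode="overwrite")
stats_df.write.csv("statistics.csv", header=True, mode="overwrite")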
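
Because Spark's CSV writer produces a directory of `part-*.csv` files at each path, reading the results back outside Spark means combining those parts. The sketch below is one way to do that with the standard library; `read_spark_csv_dir` is a hypothetical helper, and it assumes the directory was written with `header=True` as in the cell above.

```python
import csv
import glob
import os

def read_spark_csv_dir(path):
    """Read every part file in a Spark CSV output directory into a list of dicts.

    Assumes header=True was used, so each part file starts with its own
    header row. Marker files such as _SUCCESS are skipped by the pattern.
    """
    rows = []
    for part in sorted(glob.glob(os.path.join(path, "part-*.csv"))):
        with open(part, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows
```

Calling `read_spark_csv_dir("statistics.csv")` after the write would return one dictionary per subject, with all values as strings since CSV carries no type information.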

### Terminating Spark Session

In [147]:
spark.stop()