This assignment demonstrates the use of linear and parallel processing techniques to analyze student fee submission patterns. The goal is to identify the most common fee submission dates and compare execution times between the two approaches.
This assignment focuses on:
- Linear Processing: A step-by-step sequential approach for processing data, suitable for smaller datasets.
- Parallel Processing: A faster alternative that distributes the work across multiple worker processes using Python’s `multiprocessing` module, suited to handling large datasets efficiently.
The assignment analyzes two CSV files:
- `students.csv`: Contains student details:
  - `StudentID`: Unique identifier for each student.
  - `Name`: Name of the student.
  - `Gender`: Gender of the student.
  - `EnrollmentYear`: The year the student enrolled.
- `fees.csv`: Contains fee payment records:
  - `FeeID`: Unique identifier for each fee payment record.
  - `StudentID`: Reference to the student making the payment.
  - `Semester`: Semester for which the fee is paid.
  - `Amount`: Fee amount paid.
  - `PaymentDate`: Date the payment was made.
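Loading the two files with pandas might look like the minimal sketch below. The sample rows, the `load_data` helper, and the left merge are hypothetical illustrations built only on the column names listed above; they are not taken from the assignment's scripts. The merge shows how `StudentID` links each payment back to a student.

```python
import io

import pandas as pd

# Hypothetical rows that follow the column layout listed above. With the
# real files you would pass the paths instead, e.g. pd.read_csv("fees.csv").
STUDENTS_CSV = """StudentID,Name,Gender,EnrollmentYear
1,Alice,Female,2022
2,Bilal,Male,2023
"""

FEES_CSV = """FeeID,StudentID,Semester,Amount,PaymentDate
101,1,1,50000,2024-02-23
102,2,1,50000,2024-02-16
103,1,2,52000,2024-02-23
"""


def load_data(students_src, fees_src):
    """Read both CSVs; PaymentDate is parsed into pandas datetime values."""
    students = pd.read_csv(students_src)
    fees = pd.read_csv(fees_src, parse_dates=["PaymentDate"])
    return students, fees


students, fees = load_data(io.StringIO(STUDENTS_CSV), io.StringIO(FEES_CSV))

# StudentID links each payment to the student who made it.
merged = fees.merge(students, on="StudentID", how="left")
print(merged[["FeeID", "Name", "PaymentDate"]])
```

Parsing `PaymentDate` up front means later grouping and counting work on real dates rather than strings.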
Both approaches calculate the frequency of fee submission dates and list the most common dates along with their frequencies.
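A single-process version of this frequency count could be as short as the sketch below, which uses pandas' `value_counts`. The function name `top_payment_dates` and the sample data are hypothetical, and the actual `linear_processing.py` may iterate over rows explicitly instead.

```python
import pandas as pd


def top_payment_dates(fees: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Count every PaymentDate in one process and return the n most frequent."""
    counts = fees["PaymentDate"].value_counts()  # sorted by count, descending
    return counts.head(n).rename_axis("Date").reset_index(name="Frequency")


# Hypothetical sample data standing in for the PaymentDate column of fees.csv
fees = pd.DataFrame(
    {"PaymentDate": ["2024-02-23", "2024-02-16", "2024-02-23",
                     "2024-02-29", "2024-02-23", "2024-02-16"]}
)
print(top_payment_dates(fees, n=2))
```

If `PaymentDate` was parsed with `parse_dates`, the `Date` column will hold timestamps rather than strings; the counting logic is the same.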
- Create a virtual environment: `python -m venv env`
- Activate the virtual environment:
  - Windows: `env\Scripts\activate`
  - Mac/Linux: `source env/bin/activate`
- Install the required library: `pip install pandas`
- Run the linear processing script: `python linear_processing.py`
- Run the parallel processing script: `python parallel_processing.py`
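The parallel script's source is not reproduced here, so the following is only a sketch of one common `multiprocessing` pattern for this task: split the payment dates into chunks, count each chunk in a worker process with `collections.Counter`, then add the partial counts together. All function names are hypothetical. The `if __name__ == "__main__":` guard is required on platforms that start workers with the `spawn` method (Windows, and macOS by default).

```python
from collections import Counter
from multiprocessing import Pool, cpu_count


def count_chunk(dates):
    """Count payment dates in one chunk; runs inside a worker process."""
    return Counter(dates)


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous slices of similar size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def merge_counts(partials):
    """Add the per-chunk Counters into one overall Counter."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def count_dates_parallel(dates, workers=None):
    """Count date frequencies by fanning chunks out to a process pool."""
    workers = workers or cpu_count()
    chunks = split_into_chunks(dates, workers)
    with Pool(processes=workers) as pool:
        partials = pool.map(count_chunk, chunks)
    return merge_counts(partials)


if __name__ == "__main__":
    # Hypothetical sample data; a real script would pass the PaymentDate
    # column of fees.csv, e.g. fees["PaymentDate"].astype(str).tolist()
    dates = ["2024-02-23", "2024-02-16", "2024-02-23", "2024-02-29"] * 1000
    counts = count_dates_parallel(dates, workers=4)
    for date, freq in counts.most_common(10):
        print(date, freq)
```

Because counting is additive, merging the per-chunk counts gives exactly the same totals as a single linear pass. Each chunk is pickled and sent to a worker, so the speedup depends on how much work each chunk does compared with that transfer cost.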
Here is a comparison of the execution times of the linear and parallel processing approaches.
- Linear processing execution time: 100.98 seconds
- Parallel processing execution time: 82.57 seconds
- Top 10 most common fee submission dates:

| Date | Frequency |
|------------|-----------|
| 2024-02-23 | 3609 |
| 2024-02-16 | 3571 |
| 2024-02-29 | 3540 |
| 2024-02-13 | 3526 |
| 2024-02-17 | 3526 |
| 2024-02-28 | 3520 |
| 2024-02-12 | 3500 |
| 2024-02-11 | 3493 |
| 2024-02-07 | 3464 |
| 2024-02-10 | 3452 |
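Wall-clock times like the ones above are typically measured with `time.perf_counter`, and the two results can be compared as a speedup ratio. The helper names below are hypothetical and the assignment's own timing code may differ; the final two lines simply apply the arithmetic to the reported 100.98 s and 82.57 s.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(linear_seconds, parallel_seconds):
    """Ratio above 1 means the parallel run was faster."""
    return linear_seconds / parallel_seconds


def percent_saved(linear_seconds, parallel_seconds):
    """Share of the linear run time saved by the parallel run, in percent."""
    return (1 - parallel_seconds / linear_seconds) * 100


print(f"Speedup: {speedup(100.98, 82.57):.2f}x")           # prints "Speedup: 1.22x"
print(f"Time saved: {percent_saved(100.98, 82.57):.1f}%")  # prints "Time saved: 18.2%"
```

On the reported figures, the parallel run took roughly 18% less time than the linear run.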
- Linear approach to process fee submission data step by step.
- Parallel approach using Python’s `multiprocessing` module for faster execution.
- Analysis of the most frequent fee submission dates.
- Comparison of execution times to demonstrate performance improvements.
This assignment uses:
- pandas: For efficient data manipulation and analysis.
- multiprocessing: To implement parallel processing.