# Summative Assessment
---
This assessment combines everything I've learned up to the end of this module. Over the past weeks, I've been creating Productions that together aim to cover each of the main functions this prototype application is supposed to include. Those main functions are:

1. A means to load the initial dataset (in CSV files) and translate it into a suitable format such as JSON, XML or an entity-relationship structure.
2. A means to back up the current state of the data to a file or database. This should preserve the current state of the data and make it available when the program is reopened.
3. A process for cleaning and preparing the data, managing inconsistencies, errors and missing values. Cleaning can be done either at the CSV stage or after translating the dataset into a new format. It must be done before applying any of the data manipulations and outputs detailed below.
4. A Graphical User Interface (GUI) that allows a user to:
    - Load the initial dataset (CSV files)
    - Apply cleaning, transformation, REMOVE and RESHAPE to produce a prepared dataset
    - Load the prepared dataset from its translated format
    - Manipulate the **range** of data used to produce STATISTICS, GRAPHS and CORRELATION analysis
    - Use this range of data to produce STATISTICS, GRAPHS and CORRELATION analysis

It is not expected that this prototype be *generic*, but it is expected to handle more data in the given format.
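Main function 2 (backing up the current state so it is available when the program is reopened) can be sketched with the core `pickle` module. This is only a minimal sketch: the function names `save_state` and `load_state` and the file name `state.pkl` are hypothetical, and a database backend would replace the file handling.

```python
import pickle
import tempfile
from pathlib import Path


def save_state(obj, path):
    """Write the current state of the data to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle)


def load_state(path, default=None):
    """Return the saved state, or `default` if no backup exists yet."""
    path = Path(path)
    if not path.exists():
        return default
    with path.open("rb") as handle:
        return pickle.load(handle)


# Usage: a temporary folder stands in for the project directory.
with tempfile.TemporaryDirectory() as folder:
    backup_file = Path(folder) / "state.pkl"  # hypothetical file name
    save_state({"cleaned": True, "rows": 150834}, backup_file)
    restored = load_state(backup_file)
    print(restored)
```

Pickle files should only be loaded when they come from a trusted source, because unpickling can execute arbitrary code.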

## Required Functions

The application should be capable of producing the following results:

1. REMOVE: No outputs should include data where Component is either 'System' or 'Folder'
2. RENAME: The column “User Full Name *Anonymized” should be renamed to User_ID in both the ACTIVITY_LOG and USER_LOG CSVs.
3. MERGE: Merge suitable CSVs to analyse the interaction between users and components
4. RESHAPE: Reshape the data using a pivot operation
5. COUNT: Count the interactions of each user with each Component for each month. Add this new field to the new structure.
    - HINT: This will be helpful when deciding which CSVs to merge
    - HINT: You may also want to think about this when reshaping - hierarchical indexing?
6. OUTPUT STATISTICS: Produce the mean, mode and median for the components: Quiz, Lecture, Assignment, Attendance, and Survey
    - For each month
    - For the entire 13-week academic semester
7. OUTPUT CORRELATION: Produce a suitable graph that displays the following information from user interactions with the following components: Assignment, Quiz, Lecture, Book, Project, and Course. Determine if there is any significant correlation between the ‘User_ID’ and ‘Component’. You will need to select an appropriate visualisation to demonstrate this.
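Requirements 1 to 6 above could be chained in pandas roughly as follows. This is a sketch on toy data, not the final implementation: the ACTIVITY_LOG columns follow the preview in this notebook, but the `Date` column in USER_LOG, its day/month/year format, and the idea that both logs line up row for row are assumptions made for illustration. `first_mode` is a hypothetical helper.

```python
import pandas as pd

# Toy stand-ins for the two CSVs (USER_LOG layout is an ASSUMPTION).
activity_log = pd.DataFrame({
    "User Full Name *Anonymized": [26, 26, 86, 86, 129, 86],
    "Component": ["Quiz", "Quiz", "Assignment", "Quiz", "Course", "System"],
    "Action": ["Viewed"] * 6,
    "Target": ["Attempt"] * 6,
})
user_log = pd.DataFrame({
    "User Full Name *Anonymized": [26, 26, 86, 86, 129, 86],
    "Date": ["01/09/2023", "15/09/2023", "02/10/2023",
             "03/10/2023", "04/10/2023", "05/10/2023"],
})

# RENAME: same new column name in both logs.
rename_map = {"User Full Name *Anonymized": "User_ID"}
activity_log = activity_log.rename(columns=rename_map)
user_log = user_log.rename(columns=rename_map)

# MERGE: index join, ASSUMING row i describes the same event in both logs.
merged = activity_log.join(user_log[["Date"]])

# REMOVE: drop 'System' and 'Folder' before producing any output.
merged = merged[~merged["Component"].isin(["System", "Folder"])].copy()

# COUNT: interactions per user, month and component.
merged["Month"] = pd.to_datetime(merged["Date"], format="%d/%m/%Y").dt.strftime("%Y-%m")
counts = (
    merged.groupby(["User_ID", "Month", "Component"])
    .size()
    .reset_index(name="Interactions")
)

# RESHAPE: hierarchical (User_ID, Month) index, one column per component.
pivot = counts.pivot_table(
    index=["User_ID", "Month"],
    columns="Component",
    values="Interactions",
    aggfunc="sum",
    fill_value=0,
)


def first_mode(series):
    """Return the smallest mode when several values tie."""
    return series.mode().iloc[0]


# OUTPUT STATISTICS: mean, median and mode per month for each component.
monthly_stats = pivot.groupby(level="Month").agg(["mean", "median", first_mode])

print(pivot)
print(monthly_stats)
```

Applying the same three aggregations to `pivot` without the `groupby` step would give figures for the whole period rather than per month.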

## Non-Functional Requirements

- The GUI must provide appropriate feedback to confirm or deny a user’s actions.
- The application must be able to handle internal and user-generated errors.
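One way to meet both points is to route every user action through a single wrapper that never raises and always returns a message the GUI can display. The sketch below is toolkit-independent; `run_with_feedback` is a hypothetical name, and the choice of which exceptions to single out is an assumption.

```python
def run_with_feedback(action, *args, **kwargs):
    """Run `action` and return (succeeded, message, result) instead of raising."""
    try:
        result = action(*args, **kwargs)
    except FileNotFoundError as exc:
        # Typical user-generated error: a wrong or missing file name.
        return False, f"File not found: {exc.filename}", None
    except (ValueError, KeyError) as exc:
        # Typical data errors: bad values or missing columns.
        return False, f"Could not complete the action: {exc}", None
    except Exception as exc:
        # Anything unexpected is still reported rather than crashing the app.
        return False, f"Unexpected error ({type(exc).__name__}): {exc}", None
    return True, f"{action.__name__} completed successfully.", result


# Usage: the message would be shown in a status label or dialog.
ok, message, _ = run_with_feedback(open, "missing_file.csv")
print(ok, message)
```

The boolean lets the GUI choose between a confirmation and an error style, while the message text explains what happened.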

## Technical requirements 
- The application is built using core Python, versions 3.7 to 3.10.
    - This has been negotiated: it makes no sense to use outdated versions, so I use Python 3.12.7.
- The application uses one or more of the advanced Application Programming Interfaces (APIs) introduced on this module, such as NumPy, Pandas, Seaborn and Matplotlib. It should NOT use alternative APIs for this functionality; however, appropriate Python core libraries can be used to access/query a database.
    - So I should be allowed to use `pymongo`
- The application MUST run within the Anaconda environment using a Jupyter notebook.
    - Place all functions in the top-level code block - categorise them by classes
- The application and its parts must not run concurrently, and must NOT use Python threads.


# The Code

## Classes Containing Functions + Imports

In [1]:
%run analysis_funcs.ipynb
%run gui_funcs.ipynb
%run cleaning_funcs.ipynb
%run pickling_db_funcs.ipynb

In [2]:
import pandas as pd
import numpy as np

## Loading the Database

In [None]:
csv_to_json('ACTIVITY_LOG')
data = json_to_df('ACTIVITY_LOG')


| Unnamed: 0 | User Full Name *Anonymized | Component | Action | Target |
|---|---|---|---|---|
| 0 | 129 | Course | Viewed | Content |
| 1 | 26 | Quiz | Updated | Response |
| 2 | 26 | Quiz | Viewed | Attempt |
| 3 | 86 | Assignment | Viewed | Assignment |
| 4 | 86 | Assignment | Viewed | Submission_state |
| ... | ... | ... | ... | ... |
| 150830 | 125 | Course | Viewed | Content |
| 150831 | 86 | System | Viewed | Content |
| 150832 | 129 | Course | Viewed | Course |
| 150833 | 26 | Quiz | Viewed | Attempt |
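The definitions of `csv_to_json` and `json_to_df` live in the notebooks loaded with `%run` and are not shown in this section. The following is only one possible shape for such helpers, assuming each dataset is stored as `<name>.csv` and `<name>.json`; the `folder` parameter and the `records` orientation are assumptions made for illustration.

```python
from pathlib import Path

import pandas as pd


def csv_to_json(name, folder="."):
    """Read <name>.csv and write its contents to <name>.json."""
    csv_path = Path(folder) / f"{name}.csv"
    json_path = Path(folder) / f"{name}.json"
    df = pd.read_csv(csv_path)
    # One JSON object per row keeps the file easy to inspect by hand.
    df.to_json(json_path, orient="records", indent=2)
    return json_path


def json_to_df(name, folder="."):
    """Load <name>.json back into a DataFrame."""
    return pd.read_json(Path(folder) / f"{name}.json", orient="records")
```

If the source CSV was saved with its index, `pd.read_csv` will surface that index as an `Unnamed: 0` column unless `index_col=0` is passed.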
