# Level 1 Software Quality Notebook


### General Introduction
---
#### Purpose of the Notebooks
These notebooks form a framework for measuring software quality metrics and implementing corresponding controls in Jira. They integrate with Jira, GitHub, and other internal systems to collect, clean, and process the data those metrics are built on.

The data products generated through these notebooks follow a multi-level structure, ranging from Level 0 to Level 4:

- **Level 0**: Raw data as collected from the sources.
- **Level 1**: Cleaned and calibrated data.
- **Level 2**: Data joined with additional sources.
- **Level 3**: Aggregated and summarized data.
- **Level 4**: Analyzed data with insights and visualizations.

The notebooks follow standards and definitions inspired by organizations such as NASA, NEON, and NIST, so that each data product is consistent and reliable from one level to the next.

### Specific Introduction
---
#### Purpose of the Level 1 Notebook
The Level 1 Notebook focuses on cleaning and calibrating the raw data collected in Level 0. This includes handling missing values, correcting errors, standardizing formats, and calibrating measurements.

**Key Objectives**:

- **Import Raw Data**: Retrieve raw data from Level 0, including Jira issues, GitHub Smart Commits, defects, QA failures, team structures, individual contributors, and customer feedback.
- **Identify Primary Key**: Determine the primary key to be used throughout the notebooks for consistent data linking and integrity.
- **Normalize Date Formats**: Ensure clean time-series data by standardizing date formats across all data sources.
- **Clean and Calibrate Data**: Perform cleaning and calibration tasks to prepare the data for further processing in subsequent levels.

Once these objectives are met, the data is cleaned, calibrated, and ready to be joined with additional sources in Level 2.
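
As a minimal sketch of the date-normalization objective, the helper below parses date columns into pandas datetimes and turns unparseable or missing entries into `NaT` so they can be counted rather than silently kept as strings. The function name `normalize_dates` and the sample rows are hypothetical and only serve the illustration.

```python
import pandas as pd

def normalize_dates(df, columns):
    """Return a copy of df with the given columns parsed to datetimes.

    Values that cannot be parsed (or are missing) become NaT instead of raising.
    """
    cleaned = df.copy()
    for col in columns:
        cleaned[col] = pd.to_datetime(cleaned[col], errors="coerce")
    return cleaned

# Hypothetical raw rows, not taken from any real system.
raw = pd.DataFrame({
    "issue_id": ["ISSUE-101", "ISSUE-102", "ISSUE-103"],
    "creation_date": ["2022-01-01", "not a date", None],
})

clean = normalize_dates(raw, ["creation_date"])

# Count the rows that need follow-up before time-series work.
unparsed = int(clean["creation_date"].isna().sum())
```

If the sources export dates in different formats, each source may need its own explicit `format` argument to `pd.to_datetime` rather than relying on inference.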

In [None]:
import pandas as pd

# Simulating raw data for Jira Issues
jira_issues_data = {
    'issue_id': ['ISSUE-101', 'ISSUE-102', 'ISSUE-103'],
    'title': ['Feature Request', 'Bug Fix', 'Documentation Update'],
    'status': ['Open', 'In Progress', 'Closed'],
    'assignee': ['Alice', 'Bob', 'Charlie'],
    'creation_date': ['2022-01-01', '2022-02-15', '2022-03-10']
}

# Creating a DataFrame for Jira Issues
jira_issues_df = pd.DataFrame(jira_issues_data)

# Displaying the DataFrame
jira_issues_df

In [None]:
# Simulating raw data for GitHub Smart Commits
smart_commits_data = {
    'commit_id': ['COMMIT-001', 'COMMIT-002', 'COMMIT-003'],
    'author': ['Alice', 'Bob', 'Charlie'],
    'date': ['2022-01-05', '2022-02-20', '2022-03-15'],
    'jira_issue': ['ISSUE-101', 'ISSUE-102', 'ISSUE-103']
}

# Creating a DataFrame for GitHub Smart Commits
smart_commits_df = pd.DataFrame(smart_commits_data)

# Displaying the DataFrame
smart_commits_df
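
The `jira_issue` column in the commit data points back to `issue_id` in the Jira issues, which makes the issue identifier a natural candidate for the primary key. The sketch below shows one way to test that candidate: the key column should be complete and unique, and every commit should reference an issue that exists. Treating `issue_id` as the primary key is an assumption here, and the helper names `is_valid_key` and `orphan_rows` are hypothetical.

```python
import pandas as pd

# Small illustrative frames; the last commit deliberately references a missing issue.
issues = pd.DataFrame({"issue_id": ["ISSUE-101", "ISSUE-102", "ISSUE-103"]})
commits = pd.DataFrame({
    "commit_id": ["COMMIT-001", "COMMIT-002", "COMMIT-003"],
    "jira_issue": ["ISSUE-101", "ISSUE-102", "ISSUE-999"],
})

def is_valid_key(df, column):
    """A usable key column has no missing values and no duplicates."""
    return bool(df[column].notna().all() and df[column].is_unique)

def orphan_rows(child, child_col, parent, parent_col):
    """Return rows of child whose key has no matching row in parent."""
    return child[~child[child_col].isin(parent[parent_col])]

key_ok = is_valid_key(issues, "issue_id")
orphans = orphan_rows(commits, "jira_issue", issues, "issue_id")
```

Orphaned rows are worth reporting rather than dropping at this level, since they often indicate a typo in a Smart Commit message or an issue missing from the Level 0 export.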

In [None]:
# Simulating raw data for Defects and QA Failures
defects_data = {
    'defect_id': ['DEF-001', 'DEF-002', 'DEF-003'],
    'description': ['Null Pointer Exception', 'UI Misalignment', 'Security Vulnerability'],
    'severity': ['High', 'Medium', 'Critical'],
    'status': ['Open', 'Resolved', 'Closed'],
    'discovery_date': ['2022-01-10', '2022-02-25', '2022-03-20']
}

# Creating a DataFrame for Defects and QA Failures
defects_df = pd.DataFrame(defects_data)

# Displaying the DataFrame
defects_df
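
The simulated defects are already tidy, but real exports rarely are. The sketch below illustrates the cleaning step on a deliberately messy, hypothetical sample: it drops exact duplicate rows, trims and re-cases the text labels, and stores severity as an ordered category. The severity scale in `SEVERITY_ORDER` and the function name `clean_defects` are assumptions, not definitions taken from the notebooks.

```python
import pandas as pd

# Hypothetical messy export; the values are illustrative only.
raw_defects = pd.DataFrame({
    "defect_id": ["DEF-001", "DEF-002", "DEF-002", "DEF-003"],
    "severity": [" high", "MEDIUM", "MEDIUM", None],
    "status": ["Open", "resolved ", "resolved ", "Closed"],
})

# Assumed severity scale, lowest to highest.
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]

def clean_defects(df):
    """Drop exact duplicates and standardize the label columns."""
    cleaned = df.drop_duplicates().copy()
    for col in ["severity", "status"]:
        # Trim stray whitespace and use one capitalization style.
        cleaned[col] = cleaned[col].str.strip().str.title()
    # An ordered category keeps missing severities as NaN and allows comparisons.
    cleaned["severity"] = pd.Categorical(
        cleaned["severity"], categories=SEVERITY_ORDER, ordered=True
    )
    return cleaned.reset_index(drop=True)

clean_defects_df = clean_defects(raw_defects)
missing_severity = int(clean_defects_df["severity"].isna().sum())
```

Missing severities are left as missing rather than filled with a default, so that later levels can decide how to treat them.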

In [None]:
# Simulating raw data for Team Structures (individual contributors are not simulated in this example)
team_data = {
    'team_id': ['TEAM-001', 'TEAM-002', 'TEAM-003'],
    'team_name': ['Development', 'QA', 'Security'],
    'git_repo': ['Repo-Dev', 'Repo-QA', 'Repo-Sec'],
    'jira_project': ['PROJECT-DEV', 'PROJECT-QA', 'PROJECT-SEC']
}

# Creating a DataFrame for Team Structures
team_df = pd.DataFrame(team_data)

# Simulating raw data for Customer Feedback (Amplitude, Qualtrics, Call Miner)
customer_feedback_data = {
    'customer_id': ['CUST-001', 'CUST-002', 'CUST-003'],
    'source': ['Amplitude', 'Qualtrics', 'Call Miner'],
    'feedback_type': ['Survey', 'Survey', 'Call Transcript'],
    'feedback_content': ['Positive', 'Neutral', 'Negative'],
    'date': ['2022-01-15', '2022-02-28', '2022-03-25']
}

# Creating a DataFrame for Customer Feedback
customer_feedback_df = pd.DataFrame(customer_feedback_data)

# Displaying the DataFrames; display() renders each one as its own table
from IPython.display import display
display(team_df)
display(customer_feedback_df)

In [None]:
# Displaying all simulated datasets to analyze common elements and relationships
from IPython.display import display
for df in (jira_issues_df, smart_commits_df, defects_df, team_df, customer_feedback_df):
    display(df)
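
Looking for common elements can also be done programmatically. The sketch below compares column names across datasets and reports which pairs share a name. It uses empty frames that carry only the column names of the simulated datasets, and the helper name `shared_columns` is hypothetical.

```python
from itertools import combinations

import pandas as pd

def shared_columns(frames):
    """Map each pair of dataset names to the column names they have in common."""
    result = {}
    for (name_a, df_a), (name_b, df_b) in combinations(frames.items(), 2):
        common = sorted(set(df_a.columns) & set(df_b.columns))
        if common:
            result[(name_a, name_b)] = common
    return result

# Empty frames carrying only the column names of the simulated datasets.
frames = {
    "jira_issues": pd.DataFrame(
        columns=["issue_id", "title", "status", "assignee", "creation_date"]),
    "smart_commits": pd.DataFrame(
        columns=["commit_id", "author", "date", "jira_issue"]),
    "defects": pd.DataFrame(
        columns=["defect_id", "description", "severity", "status", "discovery_date"]),
    "customer_feedback": pd.DataFrame(
        columns=["customer_id", "source", "feedback_type", "feedback_content", "date"]),
}

overlaps = shared_columns(frames)
```

Name overlap is only a hint. Here `status` and `date` appear in more than one dataset without being keys, while the real link between commits and issues (`jira_issue` and `issue_id`) uses two different names, so candidate keys still need to be checked against the values themselves.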