# Level 2 Software Quality Notebook


## General Introduction
This series of notebooks develops a framework for measuring software quality metrics and their corresponding controls in Jira. By combining data from sources such as Jira, GitHub, and customer feedback platforms, we aim to build multi-level data products (Levels 0 through 4) that provide insight into software development processes and quality assurance.

## Level 2 Notebook Introduction
The Level 2 Notebook builds upon the cleaned data from Level 1 by incorporating customer feedback from signal sources such as Amplitude, Qualtrics, and Call Miner. By joining this feedback data with the existing datasets on GitHub commits, Jira issues, defects, and team structures, we aim to create a multi-dimensional view of software quality. Tasks include normalizing the feedback data, running time-series analysis, visualizing the results, and preparing the data for the next levels.

## Data Alignment with Primary Key
Once the customer feedback data is collected from Amplitude, Qualtrics, and Call Miner, the next step is to align this data with the previously identified primary key. This alignment ensures consistency across datasets and enables seamless integration with existing data from Level 1.

### Steps for Data Alignment
1. **Identify Common Elements**: Determine which fields in the collected data correspond to the primary key (e.g., Customer ID or Feature ID).
2. **Map to Primary Key**: Map the identified common elements to the primary key used in previous levels. This may involve data transformations or conversions.
3. **Handle Missing or Inconsistent Data**: Implement strategies to handle missing or inconsistent data in the alignment process. This may include imputation, exclusion, or other data cleaning techniques.
4. **Verify Alignment**: Perform checks to verify that the alignment is correct and that the data is consistent across sources.
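
The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's actual implementation: the field names in `KEY_FIELDS`, the shared key name `customer_id`, and the trim-and-uppercase normalization rule are all hypothetical and should be replaced with whatever the real exports and the Level 1 primary key use.

```python
from typing import Optional

# Hypothetical mapping from each source's identifier field to the shared key.
# These field names are illustrative only; check your own exports.
KEY_FIELDS = {
    "amplitude": "user_id",
    "qualtrics": "customer_ref",
    "callminer": "CustomerId",
}


def normalize_key(value: object) -> Optional[str]:
    """Return a trimmed, upper-cased key, or None when the value is unusable."""
    if value is None:
        return None
    text = str(value).strip().upper()
    return text or None


def align_records(source: str, records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Steps 1-3: map each record to the shared key and set aside unkeyed rows."""
    key_field = KEY_FIELDS[source]
    aligned, rejected = [], []
    for record in records:
        key = normalize_key(record.get(key_field))
        if key is None:
            rejected.append(record)  # exclusion; imputation is an alternative
        else:
            aligned.append({**record, "customer_id": key, "source": source})
    return aligned, rejected


def verify_alignment(aligned: list[dict], known_keys: set[str]) -> set[str]:
    """Step 4: return aligned keys that are absent from the Level 1 data."""
    return {row["customer_id"] for row in aligned} - known_keys


rows = [{"user_id": " c-001 "}, {"user_id": ""}, {"user_id": "c-999"}]
aligned, rejected = align_records("amplitude", rows)
unknown = verify_alignment(aligned, known_keys={"C-001"})
```

Keeping rejected rows and unknown keys as explicit outputs, rather than silently dropping them, makes it easier to report how much feedback could not be tied back to Level 1.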

By aligning the collected data with the primary key, we ensure that the customer feedback data can be accurately integrated with existing datasets, providing a comprehensive view of software quality metrics and user feedback.

## Data Preprocessing and Normalization
After aligning the collected data with the primary key, the next step is to preprocess and normalize the data. This ensures that the data is in a suitable format for analysis and integration with existing datasets.

### Steps for Data Preprocessing and Normalization
1. **Handle Missing Values**: Implement strategies to handle missing values in the collected data. This may include imputation, exclusion, or other techniques.
2. **Categorize Feedback**: Categorize customer feedback into relevant groups or themes for analysis. This may involve text analysis, sentiment analysis, or other methods.
3. **Standardize Date Formats**: Ensure that date formats are consistent across datasets, allowing for accurate time-series analysis.
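4. **Normalize Data**: Where numerical variables use different scales (for example, survey scores on different ranges), apply a normalization technique such as min-max scaling or z-score standardization so they can be compared.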
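
As a rough sketch of the first three steps, the snippet below drops records with missing required fields, assigns a category with simple keyword matching, and converts dates to ISO 8601 in UTC. The field names (`comment`, `timestamp`), the list of date formats, and the keyword rules are assumptions made for illustration. Keyword matching stands in for the text or sentiment analysis mentioned above and is not a substitute for it.

```python
from datetime import datetime, timezone
from typing import Optional

# Date formats assumed to appear in the exports; extend this list as needed.
DATE_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%d %H:%M:%S", "%m/%d/%Y")

# Hypothetical keyword-to-category rules for free-text feedback.
CATEGORY_KEYWORDS = {
    "performance": ("slow", "lag", "timeout"),
    "reliability": ("crash", "error", "bug"),
    "usability": ("confusing", "hard to find", "unclear"),
}


def to_iso_utc(raw: str) -> Optional[str]:
    """Parse a date string in any known format and return ISO 8601 in UTC."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: timestamps without an offset are already in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    return None


def categorize(text: str) -> str:
    """Return the first category whose keywords appear in the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"


def preprocess(record: dict) -> Optional[dict]:
    """Return a cleaned record, or None when a required field is missing."""
    comment = (record.get("comment") or "").strip()
    timestamp = to_iso_utc(record.get("timestamp") or "")
    if not comment or timestamp is None:
        return None  # exclusion strategy; imputation is an alternative
    return {**record, "comment": comment, "timestamp": timestamp,
            "category": categorize(comment)}


cleaned = preprocess({"comment": " Page is slow ", "timestamp": "2024-03-05 14:30:00"})
```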

These preprocessing and normalization steps prepare the data for integration with Level 1 data and subsequent analysis, ensuring that the data is clean, consistent, and ready for exploration.
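
For the normalization step, two common options are min-max scaling and z-score standardization. The sketch below shows both using only the standard library. The sample values are made up, and which technique fits depends on the metric and on the analysis planned for later levels.

```python
from statistics import mean, pstdev


def min_max_scale(values: list[float]) -> list[float]:
    """Rescale values to the range [0, 1]; constant input maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def z_score(values: list[float]) -> list[float]:
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical satisfaction scores collected on a 1-5 scale.
scores = [1, 3, 5]
scaled = min_max_scale(scores)   # [0.0, 0.5, 1.0]
standardized = z_score(scores)
```

Min-max scaling keeps values within a fixed range but is sensitive to outliers, while z-scores express each value as a distance from the mean in standard deviations.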

## Data Integration with Level 1 Data
With the collected customer feedback data aligned, preprocessed, and normalized, the next step is to integrate this data with the cleaned data from Level 1. This integration creates a comprehensive dataset that includes information from GitHub commits, Jira issues, defects, team structures, and customer feedback.

### Steps for Data Integration
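1. **Join Datasets**: Use the primary key to join the collected data with the Level 1 data. The join type determines what happens to unmatched records: an inner join keeps only rows present in both datasets, while a left or full outer join retains rows that have no match on the other side.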
2. **Resolve Conflicts**: Identify and resolve any conflicts or inconsistencies that may arise during the join process.
3. **Verify Integration**: Perform checks to verify that the integration is successful and that the data is consistent across sources.
4. **Prepare for Analysis**: Structure the integrated data for further analysis, including potential machine learning models, predictions, and advanced analytics.
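
The join and verification steps could look roughly like the following. This is a sketch of a left join keyed on a hypothetical `customer_id` column, with made-up `jira_issue` and `comment` fields; the real key and columns come from the Level 1 data.

```python
from collections import defaultdict


def left_join(level1_rows, feedback_rows, key="customer_id"):
    """Attach matching feedback to each Level 1 row; unmatched rows get None."""
    feedback_by_key = defaultdict(list)
    for row in feedback_rows:
        feedback_by_key[row[key]].append(row)

    joined = []
    for base in level1_rows:
        matches = feedback_by_key.get(base[key])
        if not matches:
            joined.append({**base, "feedback": None})
            continue
        for match in matches:
            joined.append({**base, "feedback": match["comment"]})
    return joined


def verify_join(level1_rows, feedback_rows, joined, key="customer_id"):
    """Report whether every Level 1 key survived and which feedback keys did not match."""
    level1_keys = {row[key] for row in level1_rows}
    feedback_keys = {row[key] for row in feedback_rows}
    return {
        "all_level1_keys_kept": level1_keys == {row[key] for row in joined},
        "orphan_feedback_keys": feedback_keys - level1_keys,
    }


level1 = [
    {"customer_id": "C-001", "jira_issue": "QA-12"},
    {"customer_id": "C-002", "jira_issue": "QA-15"},
]
feedback = [
    {"customer_id": "C-001", "comment": "slow"},
    {"customer_id": "C-001", "comment": "crash"},
    {"customer_id": "C-404", "comment": "unclear"},
]
joined = left_join(level1, feedback)
report = verify_join(level1, feedback, joined)
```

A left join is used here so that Level 1 rows without feedback are kept. Note that a key with several feedback entries produces several joined rows, which matters for any later counts or averages.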

The successful integration of customer feedback data with existing datasets provides a rich and multi-dimensional view of software quality metrics, enabling deeper insights and more informed decision-making.

## Vendor Data Sources Documentation
In this section, we provide an overview of the available data and recommendations for Amplitude, Call Miner, and Qualtrics. Understanding the API endpoints, schemas, and data formats will simplify the process of collecting and applying the data.


### Amplitude
- **API Documentation**: [Amplitude API Documentation](https://developers.amplitude.com/docs)
- **Data Schema**: The data schema typically includes user behavior data, such as events, user properties, and device information.
- **Data Formats**: JSON is commonly used for data exchange with Amplitude's API.

### Call Miner
- **API Documentation**: [Call Miner API Documentation](https://callminer.com/api-documentation/)
- **Data Schema**: Call Miner's schema may include call transcripts, sentiment analysis, and other speech analytics data.
- **Data Formats**: The data format may vary, and specific details can be found in the API documentation.

### Qualtrics
- **API Documentation**: [Qualtrics API Documentation](https://api.qualtrics.com/)
- **Data Schema**: Qualtrics provides survey data, including responses, questions, and metadata.
- **Data Formats**: JSON and CSV are common data formats used with Qualtrics' API.

These summaries are a starting point for the collection and integration process. Consult each vendor's official API documentation and developer guides for the current endpoints, schemas, and formats that apply to your specific needs.
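
Since the vendors deliver data as JSON or CSV, a small loader can bring both formats into one record shape before alignment. The sketch below uses inline sample payloads with hypothetical field names (`user_id`, `event_type`, `customer_ref`, `score`). It does not call any vendor API, and real exports may be structured differently, for instance with extra header rows or nested properties.

```python
import csv
import io
import json

# Made-up payloads standing in for downloaded exports.
JSON_SAMPLE = (
    '{"user_id": "C-001", "event_type": "checkout_error"}\n'
    '{"user_id": "C-002", "event_type": "search"}\n'
)
CSV_SAMPLE = "customer_ref,score\nC-001,3\nC-002,9\n"


def read_json_lines(text: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_csv(text: str) -> list[dict]:
    """Parse CSV text with a single header row; all values stay strings."""
    return list(csv.DictReader(io.StringIO(text)))


def to_common(source: str, record: dict, key_field: str, value_field: str) -> dict:
    """Project a source-specific record onto a shared three-field shape."""
    return {
        "source": source,
        "customer_id": record[key_field],
        "value": record[value_field],
    }


combined = [
    to_common("amplitude", r, "user_id", "event_type")
    for r in read_json_lines(JSON_SAMPLE)
] + [
    to_common("qualtrics", r, "customer_ref", "score")
    for r in read_csv(CSV_SAMPLE)
]
```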