# A9 Jira Dependency Analysis Notebook

This project analyzes Common Vulnerabilities and Exposures (CVEs) that affect the dependencies of Jira Data Center. It helps identify and mitigate potential security risks. It also supports performance optimization, cloud migration planning, and general exploration.

## Security Perspective

By analyzing CVEs, we can identify potential vulnerabilities in the packages that Jira Data Center depends on. We can then address these vulnerabilities proactively and improve the security posture of our Jira instance.

## Performance Optimization

Understanding the dependencies of our Jira instance can help us identify potential bottlenecks or performance issues. For example, if a particular package that Jira depends on is known to have performance issues, we can look for alternatives or ways to optimize its usage.

## Cloud Migration Planning

When planning a migration to the cloud, it's important to understand the dependencies of the existing system. This analysis can show which packages are actively developed and maintained, and which might pose a risk in a cloud environment. That knowledge informs the migration strategy and helps ensure a smooth transition. It also clarifies which known complexities we would be trading for the unknown complexities of SaaS services.

## Curiosity and Learning

Beyond the practical applications, this analysis can also satisfy our curiosity and desire to learn more about the inner workings of our Jira instance. By exploring the dependencies and associated CVEs, we can gain a deeper understanding of the system and how it operates.


![A9 logo](https://a9group.net/a9logo.png)

In the case of Jira, the large number of dependencies means that changes to one component can potentially impact many others. This can make updates and bug fixes more challenging and time-consuming, leading to slower development cycles and increased maintenance costs.

In our analysis, we used Python (requests, BeautifulSoup, NetworkX, Plotly, and pandas) to build and visualize Jira's dependency graph. The graph gave insight into the software's complexity and highlighted areas for potential optimization.
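The fan-out argument can be made concrete with a graph query. Suppose edges point from an artifact to what it depends on. Then the artifacts that a change to one dependency could affect are that dependency's ancestors. The minimal sketch below assumes that edge direction. The toy graph and its artifact names are hypothetical, not taken from Jira's POM.

```python
import networkx as nx

# Hypothetical toy graph: edges point from an artifact to the artifact it depends on,
# matching the direction used by build_dependency_graph
graph = nx.DiGraph()
graph.add_edges_from([
    ('app', 'api'),
    ('app', 'util'),
    ('api', 'util'),
    ('api', 'json-lib'),
    ('plugin', 'api'),
])

def impacted_by(graph, artifact):
    """Return every artifact that depends on `artifact`, directly or transitively."""
    return nx.ancestors(graph, artifact)

print(sorted(impacted_by(graph, 'util')))
```

On the real graph, pass a `(groupId, artifactId, version)` tuple node instead of a plain name.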



```python
# Define the base URL for the Maven repository
base_url = 'https://packages.atlassian.com/repository/public/'
```

```python
import requests
from bs4 import BeautifulSoup
import networkx as nx
```



## Define POM location

```python
# Define the URL of the Jira Core POM file
jira_core_pom_url = 'https://packages.atlassian.com/repository/public/com/atlassian/jira/jira-core/9.9.0-QR-20230330081425/jira-core-9.9.0-QR-20230330081425.pom'
```


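```python
def parse_pom(url):
    print(f'Parsing POM file at {url}')
    empty = {'group_id': None, 'artifact_id': None, 'version': None, 'dependencies': []}

    # Send a GET request to the URL; a missing POM (e.g. HTTP 404) yields empty data
    response = requests.get(url, timeout=30)
    if not response.ok:
        print(f'Could not fetch POM (HTTP {response.status_code})')
        return empty

    # Parse the response content with BeautifulSoup (the 'xml' parser requires lxml)
    soup = BeautifulSoup(response.content, 'xml')
    project = soup.find('project')
    if project is None:
        print('No <project> element found')
        return empty
    parent = project.find('parent', recursive=False)

    def child_text(element, name):
        # Look only at direct children, so values inside <parent>, <dependencies>,
        # or <exclusions> are not mistaken for this element's own values
        if element is None:
            return None
        tag = element.find(name, recursive=False)
        return tag.text.strip() if tag is not None else None

    # Extract the group ID, artifact ID, and version; Maven inherits groupId and
    # version from <parent> when the POM omits them
    group_id = child_text(project, 'groupId') or child_text(parent, 'groupId')
    artifact_id = child_text(project, 'artifactId')
    version = child_text(project, 'version') or child_text(parent, 'version')

    for label, value in (('Group ID', group_id), ('Artifact ID', artifact_id), ('Version', version)):
        if value is None:
            print(f'{label} not found')

    # Extract the dependencies. find_all also returns entries under <dependencyManagement>
    # and plugin <dependencies>; entries without an explicit version are skipped.
    dependencies = []
    for dependency in soup.find_all('dependency'):
        coords = tuple(child_text(dependency, name) for name in ('groupId', 'artifactId', 'version'))
        if None not in coords:
            dependencies.append(coords)

    # Return the parsed data
    data = {
        'group_id': group_id,
        'artifact_id': artifact_id,
        'version': version,
        'dependencies': dependencies
    }
    print(f'Parsed data: {data}')
    return data
```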
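`parse_pom` needs network access, so it can help to check POM parsing rules offline. The sketch below swaps in the standard library's `xml.etree.ElementTree` for BeautifulSoup; it is not the parser used above. It runs on a small hypothetical POM and illustrates two Maven rules. First, POM elements live in the `http://maven.apache.org/POM/4.0.0` namespace. Second, a project's `groupId` and `version` fall back to its `<parent>` when omitted. The function name `parse_pom_text` is illustrative.

```python
import xml.etree.ElementTree as ET

# Maven 4.0.0 POM namespace; ElementTree requires it in tag lookups
NS = {'m': 'http://maven.apache.org/POM/4.0.0'}

# Hypothetical POM used only for illustration
SAMPLE_POM = """<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.example</groupId>
    <artifactId>example-parent</artifactId>
    <version>1.0</version>
  </parent>
  <artifactId>example-core</artifactId>
  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <version>3.12.0</version>
    </dependency>
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>example-util</artifactId>
    </dependency>
  </dependencies>
</project>"""

def parse_pom_text(xml_text):
    """Return coordinates and versioned direct dependencies from POM XML text."""
    root = ET.fromstring(xml_text)

    def text(element, name):
        # find() with a plain tag path only searches direct children
        child = element.find(f'm:{name}', NS) if element is not None else None
        return child.text.strip() if child is not None and child.text else None

    parent = root.find('m:parent', NS)
    coords = (
        text(root, 'groupId') or text(parent, 'groupId'),  # inherited from <parent> if omitted
        text(root, 'artifactId'),
        text(root, 'version') or text(parent, 'version'),
    )
    dependencies = []
    for dep in root.findall('m:dependencies/m:dependency', NS):
        dep_coords = (text(dep, 'groupId'), text(dep, 'artifactId'), text(dep, 'version'))
        if None not in dep_coords:  # skip version-less (parent-managed) dependencies
            dependencies.append(dep_coords)
    return coords, dependencies

print(parse_pom_text(SAMPLE_POM))
```

The second dependency has no `<version>`, so it is skipped. This mirrors how version-less entries are dropped in the notebook.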

```python
def construct_pom_url(group_id, artifact_id, version):
    # Convert the group ID to a path (Maven repository layout: dots become slashes)
    group_path = group_id.replace('.', '/')

    # Construct the URL; strip the trailing slash from base_url to avoid a double slash
    url = f"{base_url.rstrip('/')}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom"
    print(url)
    return url
```



## Build a depth-limited dependency graph (default max_depth=3)

```python
def build_dependency_graph(url, graph=None, processed=None, depth=0, max_depth=3, artifact=None):
    # Initialize the graph and processed set if not provided
    if graph is None:
        graph = nx.DiGraph()
    if processed is None:
        processed = set()

    # Parse the POM file
    data = parse_pom(url)

    # Identify the node by the coordinates that were requested (when known) rather than the
    # parsed ones, so the node matches the edge added by the caller and the `processed` check works
    if artifact is None:
        artifact = (data['group_id'], data['artifact_id'], data['version'])
    graph.add_node(artifact)
    processed.add(artifact)

    # Recursively process the dependencies
    if depth < max_depth:
        for dep_group_id, dep_artifact_id, dep_version in data['dependencies']:
            # Skip dependencies that contain ${...} placeholders in their group ID, artifact ID, or version
            if '${' in dep_group_id or '${' in dep_artifact_id or '${' in dep_version:
                continue

            # Add the dependency to the graph
            dependency = (dep_group_id, dep_artifact_id, dep_version)
            graph.add_edge(artifact, dependency)

            # Recursively process the dependency if it hasn't been processed yet
            if dependency not in processed:
                dep_url = construct_pom_url(dep_group_id, dep_artifact_id, dep_version)
                build_dependency_graph(dep_url, graph, processed, depth + 1, max_depth, artifact=dependency)

    return graph
```
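`build_dependency_graph` skips any dependency whose coordinates still contain `${...}` placeholders, which can drop many edges. One possible extension is to substitute known property values before building URLs. The sketch below assumes those values have already been collected from the POM's own `<properties>` block and `${project.version}`. The `resolve_placeholders` helper and the property names and values are hypothetical. Real Maven also resolves properties from parent POMs, profiles, and the command line, and this sketch does not attempt that.

```python
import re

# Matches ${property.name} placeholders as used in Maven POMs
PLACEHOLDER = re.compile(r'\$\{([^}]+)\}')

def resolve_placeholders(value, properties, max_passes=10):
    """Substitute ${name} placeholders from `properties`; unknown names are left in place."""
    for _ in range(max_passes):  # properties may reference other properties
        resolved = PLACEHOLDER.sub(lambda m: properties.get(m.group(1), m.group(0)), value)
        if resolved == value:
            break
        value = resolved
    return value

# Hypothetical properties, as they might be collected from <properties> and the project version
properties = {
    'project.version': '9.9.0',
    'widget.version': '4.2.0',
    'lib.major': '3',
    'lib.version': '${lib.major}.12.0',
}

print(resolve_placeholders('${widget.version}', properties))
print(resolve_placeholders('${lib.version}', properties))
print(resolve_placeholders('${unknown.version}', properties))
```

A value that still contains `${` after resolution can continue to be skipped, as the crawler already does.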

```python
# Build the dependency graph for the Jira Core artifact, following dependencies up to 5 levels deep
jira_core_graph = build_dependency_graph(jira_core_pom_url, max_depth=5)

# Print the number of nodes and edges in the graph
print(f'Number of nodes: {jira_core_graph.number_of_nodes()}')
print(f'Number of edges: {jira_core_graph.number_of_edges()}')
```
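Node and edge counts say little about where dependency risk concentrates. A simple follow-up ranks artifacts by in-degree, meaning how many other artifacts in the crawl declare them as a dependency. The sketch below uses a hypothetical toy graph. On the real graph the nodes are also `(groupId, artifactId, version)` tuples, so the same call applies. The helper name `most_depended_upon` is illustrative.

```python
import networkx as nx

# Hypothetical (groupId, artifactId, version) nodes; edges point from dependent to dependency
graph = nx.DiGraph()
graph.add_edges_from([
    (('com.example', 'app', '1.0'), ('com.example', 'util', '1.0')),
    (('com.example', 'app', '1.0'), ('com.example', 'log', '1.0')),
    (('com.example', 'lib', '1.0'), ('com.example', 'log', '1.0')),
    (('com.example', 'lib', '1.0'), ('com.example', 'util', '1.0')),
    (('com.example', 'util', '1.0'), ('com.example', 'log', '1.0')),
])

def most_depended_upon(graph, top_n=10):
    """Return the top_n (node, in_degree) pairs, highest in-degree first."""
    return sorted(graph.in_degree(), key=lambda item: item[1], reverse=True)[:top_n]

for node, count in most_depended_upon(graph, top_n=2):
    print(':'.join(node), count)
```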

## Output the full GraphML file

The full graph is written to `jira_core_graph.graphml`.


```python
# Export the graph to a GraphML file
nx.write_graphml(jira_core_graph, 'jira_core_graph.graphml')
```
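GraphML node IDs are plain strings, so the tuple nodes used in this notebook do not come back as tuples from `nx.read_graphml`. One option is to relabel nodes before exporting, as sketched below. Each node becomes a readable `groupId:artifactId:version` string, and the individual fields are kept as node attributes. Values are converted with `str()` because GraphML attributes cannot hold `None`. The helper `with_coordinate_labels` and the two-node graph are hypothetical.

```python
import networkx as nx

def with_coordinate_labels(graph):
    """Return a copy with 'groupId:artifactId:version' node IDs and per-field string attributes."""
    mapping = {node: ':'.join(str(part) for part in node) for node in graph.nodes}
    relabeled = nx.relabel_nodes(graph, mapping, copy=True)
    for node, label in mapping.items():
        group_id, artifact_id, version = node
        relabeled.nodes[label].update(group_id=str(group_id), artifact_id=str(artifact_id), version=str(version))
    return relabeled

# Hypothetical two-node graph for illustration
graph = nx.DiGraph()
graph.add_edge(('com.example', 'app', '1.0'), ('com.example', 'util', '2.1'))

labeled = with_coordinate_labels(graph)
print(list(labeled.edges()))
# On the real graph: nx.write_graphml(with_coordinate_labels(jira_core_graph), 'jira_core_graph.graphml')
```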

```python
# Redefine construct_pom_url without the debug print, so the deeper crawl below produces less output
def construct_pom_url(group_id, artifact_id, version):
    # Replace dots in the group ID with slashes
    group_id_path = group_id.replace('.', '/')

    # Construct the URL
    url = f'https://packages.atlassian.com/repository/public/{group_id_path}/{artifact_id}/{version}/{artifact_id}-{version}.pom'

    return url
```

```python
import matplotlib.colors as mcolors
import numpy as np
import plotly.graph_objects as go
from plotly.offline import iplot

def draw_graph_with_plotly(graph):
    # Get 3D positions of the nodes using the spring layout algorithm (fixed seed for reproducibility)
    pos = nx.spring_layout(graph, dim=3, seed=42)
    nodes = list(graph.nodes())

    # Create a trace for the nodes
    node_trace = go.Scatter3d(
        x=[pos[n][0] for n in nodes],
        y=[pos[n][1] for n in nodes],
        z=[pos[n][2] for n in nodes],
        mode='markers',
        marker=dict(size=6)
    )

    # Create one line trace per edge, each with a random color
    edge_traces = []
    for source, target in graph.edges():
        edge_traces.append(go.Scatter3d(
            x=[pos[source][0], pos[target][0]],
            y=[pos[source][1], pos[target][1]],
            z=[pos[source][2], pos[target][2]],
            mode='lines',
            line=dict(color=mcolors.to_hex(np.random.rand(3)), width=2)
        ))

    # Create the layout
    layout = go.Layout(showlegend=False)

    # Create the figure with all traces and show it
    fig = go.Figure(data=[node_trace] + edge_traces, layout=layout)
    iplot(fig)
```
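`draw_graph_with_plotly` creates one `Scatter3d` trace per edge, which can make the figure slow to render when the graph has thousands of edges. A common alternative puts every edge in a single trace and separates segments with `None`, so Plotly does not connect them. The hypothetical helper below builds only the coordinate lists, using the standard library. Pass the result as `x`, `y`, `z` to one `go.Scatter3d(mode='lines')` trace. The trade-off is that all edges then share a single color.

```python
def edge_coordinates(edges, pos):
    """Flatten edges into x, y, z lists with None between segments (for one line trace)."""
    xs, ys, zs = [], [], []
    for source, target in edges:
        for axis_values, axis in ((xs, 0), (ys, 1), (zs, 2)):
            axis_values.extend([pos[source][axis], pos[target][axis], None])
    return xs, ys, zs

# Hypothetical 3D positions, like those returned by nx.spring_layout(graph, dim=3)
pos = {'a': (0.0, 0.0, 0.0), 'b': (1.0, 0.0, 0.0), 'c': (0.0, 1.0, 1.0)}
x, y, z = edge_coordinates([('a', 'b'), ('a', 'c')], pos)
print(x)
# Usage with Plotly: go.Scatter3d(x=x, y=y, z=z, mode='lines', line=dict(width=2))
```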



## Visualize the shallower graph in 3D

```python
from plotly.offline import iplot
import plotly.graph_objects as go

draw_graph_with_plotly(jira_core_graph)
```

## Write a [gpickle file](https://networkx.org/documentation/networkx-2.5/reference/readwrite/generated/networkx.readwrite.gpickle.read_gpickle.html)

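```python
import pickle

# Save the full graph to a file. nx.write_gpickle was removed in NetworkX 3.0,
# so use the standard pickle module instead (load it back with pickle.load).
with open('jira_core_graph.gpickle', 'wb') as f:
    pickle.dump(jira_core_graph, f, pickle.HIGHEST_PROTOCOL)
```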

```python
# Build the dependency graph for the Jira Core artifact with a higher recursion depth.
# Despite its name, jira_core_graph_small is the deeper (and larger) of the two graphs.
jira_core_graph_small = build_dependency_graph(jira_core_pom_url, max_depth=10)

# Print the number of nodes and edges in the graph
print(f'Number of nodes: {jira_core_graph_small.number_of_nodes()}')
print(f'Number of edges: {jira_core_graph_small.number_of_edges()}')
```

```python
import matplotlib.colors as mcolors
import numpy as np
import plotly.graph_objects as go
from plotly.offline import iplot

# Redefine draw_graph_with_plotly with hover labels and a fixed figure size
def draw_graph_with_plotly(graph):
    # Get 3D positions of the nodes using the spring layout algorithm (fixed seed for reproducibility)
    pos = nx.spring_layout(graph, dim=3, seed=42)
    nodes = list(graph.nodes())

    # Create a trace for the nodes, with 'groupId:artifactId:version' hover text
    node_trace = go.Scatter3d(
        x=[pos[n][0] for n in nodes],
        y=[pos[n][1] for n in nodes],
        z=[pos[n][2] for n in nodes],
        mode='markers',
        marker=dict(size=6),
        text=[f'{n[0]}:{n[1]}:{n[2]}' for n in nodes],
        hoverinfo='text'
    )

    # Create one line trace per edge, each with a random color
    edge_traces = []
    for source, target in graph.edges():
        edge_traces.append(go.Scatter3d(
            x=[pos[source][0], pos[target][0]],
            y=[pos[source][1], pos[target][1]],
            z=[pos[source][2], pos[target][2]],
            mode='lines',
            line=dict(color=mcolors.to_hex(np.random.rand(3)), width=2)
        ))

    # Create the layout with a fixed figure size
    layout = go.Layout(showlegend=False, autosize=False, width=1000, height=800)

    # Create the figure with all traces and show it
    fig = go.Figure(data=[node_trace] + edge_traces, layout=layout)
    iplot(fig)

draw_graph_with_plotly(jira_core_graph_small)
```

# CVE Analysis


```python
import gzip
import json
import os

import pandas as pd
import requests

# URLs for the recent and modified NVD CVE data feeds (JSON 1.1).
# NVD has announced the retirement of these legacy feeds in favor of its CVE API 2.0,
# so these URLs may no longer be available.
recent_feed_url = 'https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-recent.json.gz'
modified_feed_url = 'https://nvd.nist.gov/feeds/json/cve/1.1/nvdcve-1.1-modified.json.gz'

def download(url, filename):
    # Download a feed and fail loudly on HTTP errors
    print(f'Downloading {url}...')
    response = requests.get(url, timeout=60)
    response.raise_for_status()
    with open(filename, 'wb') as f:
        f.write(response.content)

download(recent_feed_url, 'recent_cve_feed.json.gz')
download(modified_feed_url, 'modified_cve_feed.json.gz')

# Check that the files were downloaded
print([name for name in os.listdir() if name.endswith('.json.gz')])

# Function to load CVE data from a gzipped JSON file
def load_cve_data(filename):
    with gzip.open(filename, 'rb') as f:
        return json.load(f)

# Load the recent and modified CVE data
recent_cve_data = load_cve_data('recent_cve_feed.json.gz')
modified_cve_data = load_cve_data('modified_cve_feed.json.gz')

# Check the number of CVEs in each dataset
print(len(recent_cve_data['CVE_Items']), len(modified_cve_data['CVE_Items']))

def collect_cpe_uris(nodes):
    # Configuration nodes can nest further nodes under 'children'; gather every cpe23Uri
    uris = []
    for node in nodes:
        uris.extend(match['cpe23Uri'] for match in node.get('cpe_match', []))
        uris.extend(collect_cpe_uris(node.get('children', [])))
    return uris

# Function to extract CVE information from the data
def extract_cve_info(cve_data):
    cve_info = []
    for item in cve_data['CVE_Items']:
        cve_id = item['cve']['CVE_data_meta']['ID']
        description = item['cve']['description']['description_data'][0]['value']
        impact = item.get('impact', {})
        severity = impact['baseMetricV3']['cvssV3']['baseSeverity'] if 'baseMetricV3' in impact else None
        affected_packages = collect_cpe_uris(item['configurations']['nodes'])
        cve_info.append((cve_id, description, severity, affected_packages))
    return pd.DataFrame(cve_info, columns=['CVE_ID', 'Description', 'Severity', 'Affected_Packages'])
```
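The `Affected_Packages` column holds CPE 2.3 formatted strings of the form `cpe:2.3:part:vendor:product:version:update:...`. Comparing them with Maven coordinates is easier once vendor, product, and version are split out. The CPE 2.3 format escapes literal colons inside a field with a backslash, so a plain `split(':')` is not quite enough. Below is a minimal sketch. The `parse_cpe` name, the returned field subset, and the sample strings are illustrative. This sketch does not handle a literal backslash that comes right before a colon.

```python
import re

# Split on colons that are not escaped with a backslash (CPE 2.3 formatted string binding)
_UNESCAPED_COLON = re.compile(r'(?<!\\):')

def parse_cpe(cpe_uri):
    """Return (part, vendor, product, version) from a CPE 2.3 formatted string."""
    fields = _UNESCAPED_COLON.split(cpe_uri)
    if len(fields) < 6 or fields[:2] != ['cpe', '2.3']:
        raise ValueError(f'Not a CPE 2.3 formatted string: {cpe_uri!r}')
    part, vendor, product, version = fields[2:6]
    return part, vendor, product, version

# Hypothetical CPE strings; the first contains an escaped colon in the product field
print(parse_cpe('cpe:2.3:a:example:widget\\:core:1.2.3:*:*:*:*:*:*:*'))
print(parse_cpe('cpe:2.3:a:example:widget:2.0:*:*:*:*:*:*:*'))
```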



## CVEs matched to the reference POM


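```python
from IPython.display import display

# Extract CVE information from the recent and modified CVE data
recent_cve_info = extract_cve_info(recent_cve_data)
modified_cve_info = extract_cve_info(modified_cve_data)

# Combine the two dataframes; the feeds overlap, so keep one row per CVE ID
cve_info = pd.concat([recent_cve_info, modified_cve_info], ignore_index=True).drop_duplicates(subset='CVE_ID')

# Display the first few rows of the dataframe
display(cve_info.head())

# Extract the packages from the dependency graph; nodes are (groupId, artifactId, version) tuples
packages = list(jira_core_graph.nodes)
artifact_ids = {pkg[1].lower() for pkg in packages if pkg[1]}

def cpe_product(cpe_uri):
    # CPE 2.3 strings look like cpe:2.3:part:vendor:product:version:...; return the product field
    parts = cpe_uri.split(':')
    return parts[4].lower() if len(parts) > 4 else ''

# Heuristic filter: keep CVEs whose affected CPE product name equals a dependency's artifact ID.
# Name matching ignores versions and naming differences, so treat these as candidates to review.
filtered_cve_info = cve_info[cve_info['Affected_Packages'].apply(
    lambda cpes: any(cpe_product(cpe) in artifact_ids for cpe in cpes))]

# Sort by severity (CRITICAL first, via an explicit order rather than alphabetical),
# then by CVE ID in descending string order (not the disclosure date)
severity_order = pd.CategoricalDtype(['NONE', 'LOW', 'MEDIUM', 'HIGH', 'CRITICAL'], ordered=True)
sorted_cve_info = (
    filtered_cve_info
    .assign(Severity=filtered_cve_info['Severity'].astype(severity_order))
    .sort_values(by=['Severity', 'CVE_ID'], ascending=[False, False])
)

# Display the sorted data
sorted_cve_info
```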
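The CPE strings kept in `Affected_Packages` say little about which versions are affected. In the NVD JSON 1.1 feeds, version ranges live in separate `versionStartIncluding`, `versionStartExcluding`, `versionEndIncluding`, and `versionEndExcluding` fields on each `cpe_match` entry, and `extract_cve_info` does not keep them. If those fields were retained, a dependency version could be checked against them roughly as sketched below. The helper names and the sample entry are hypothetical. The naive numeric comparison works for simple dotted versions like `2.17.1`, but it ignores qualifiers such as `-jre` or `-RC1`. A faithful check would need Maven's version-ordering rules.

```python
import re

def version_key(version):
    """Naive key: leading dotted numbers, trailing zeros dropped, e.g. '2.17.1-jre' -> (2, 17, 1)."""
    match = re.match(r'\d+(?:\.\d+)*', version)
    if not match:
        return ()
    parts = [int(piece) for piece in match.group(0).split('.')]
    while parts and parts[-1] == 0:  # treat 2.0 and 2.0.0 as equal
        parts.pop()
    return tuple(parts)

def in_affected_range(version, cpe_match):
    """Check a version against the optional NVD range bounds on a cpe_match entry."""
    v = version_key(version)
    bounds = [
        ('versionStartIncluding', lambda b: v >= b),
        ('versionStartExcluding', lambda b: v > b),
        ('versionEndIncluding', lambda b: v <= b),
        ('versionEndExcluding', lambda b: v < b),
    ]
    for field, check in bounds:
        if field in cpe_match and not check(version_key(cpe_match[field])):
            return False
    return True

# Hypothetical cpe_match entry describing the range [2.0, 2.17.1)
entry = {'cpe23Uri': 'cpe:2.3:a:example:widget:*:*:*:*:*:*:*:*',
         'versionStartIncluding': '2.0', 'versionEndExcluding': '2.17.1'}
print(in_affected_range('2.14.0', entry), in_affected_range('2.17.1', entry))
```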

## Unaffiliated CVEs (not matched to any dependency)


```python
# Keep the CVEs that were not matched to any dependency above
unaffiliated_cve_info = cve_info[~cve_info['CVE_ID'].isin(filtered_cve_info['CVE_ID'])]

# Sort by severity (CRITICAL first) and then by CVE ID in descending string order
severity_order = pd.CategoricalDtype(['NONE', 'LOW', 'MEDIUM', 'HIGH', 'CRITICAL'], ordered=True)
sorted_unaffiliated_cve_info = (
    unaffiliated_cve_info
    .assign(Severity=unaffiliated_cve_info['Severity'].astype(severity_order))
    .sort_values(by=['Severity', 'CVE_ID'], ascending=[False, False])
)

# Display the sorted data
sorted_unaffiliated_cve_info
```