<a href="https://colab.research.google.com/github/wjtopp3/CSIT-2033/blob/main/CSIT_2033_Week_13.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Secure Programming with Python

# Secure Software Processes
1. Threat modeling  
2. Secure workflows  
3. Automated Security Testing




# 1. Threat Modeling

![ThreatZine](https://raw.githubusercontent.com/FSCJ-FacultyDev/SWC-Columbus-2025/main/images/day4-threatmodeling.png)

[7] Shostack Ch. 1
- **Threat modeling** is a way to think ahead about security, before your system is built or changed.
- Considers system design, components, and data flows to figure out vulnerabilities; ask questions like
  - What could go wrong?
  - Who might attack this system, and how?
  - What damage could they do?
  - What can we do to prevent or minimize that damage?
- The goal is to spot and fix security issues early, when it is cheaper and easier to do so.
- It's a key part of building secure, resilient systems.

[1] Kohnfelder Ch. 2
- A security mindset means shifting from a builder's perspective to an attacker's view.
- Threats include intentional attacks, accidents, bugs, hardware failures, and human error.
- A security mindset helps guide secure decision-making and can be adapted to fit available time and resources.
- Incremental improvements in threat identification and mitigation significantly strengthen security even if all vulnerabilities aren't found.
  - Can also reveal opportunities for non-security-related improvements e.g., system efficiencies and new features.


# Identifying Assets and Attack Surfaces
- Identify and prioritize software assets based on their value and sensitivity, e.g., applications and APIs (both internal and external), source code and configuration files, data stores (databases, user credentials)
- Avoid complex risk calculations; instead, use a simple strategy like the Agile "T-shirt size" system (Large, Medium, Small) to prioritize asset protection efforts
  - https://www.easyagile.com/blog/agile-estimation-techniques
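
As a rough illustration of the T-shirt approach, the sketch below orders a small inventory by size label instead of computing a risk score. The asset names and the sizes assigned to them are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: rank assets by T-shirt size rather than a numeric risk score.
SIZE_ORDER = {"Large": 0, "Medium": 1, "Small": 2}  # lower number = protect first

# Illustrative inventory; names and sizes are assumptions, not a real assessment.
assets = [
    ("Public marketing site content", "Small"),
    ("User credential store", "Large"),
    ("Internal API configuration files", "Medium"),
]


def prioritize(inventory):
    """Return assets ordered from highest to lowest protection priority."""
    return sorted(inventory, key=lambda item: SIZE_ORDER[item[1]])


for name, size in prioritize(assets):
    print(f"{size:<6} {name}")
```

Keeping the scale to a few labels makes it easy to agree on a size quickly; finer distinctions can be added later if the team needs them.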





# 🛠️ Hands-On: Asset Prioritization using T-Shirt Sizing
### Prioritize the Following Assets Using T-Shirt Sizing
Match each asset to its appropriate T-shirt size priority:  
(Choices: **Extra-Large**, **Large**, **Medium**, **Small**)

| Asset | Priority (Match) |
|:------------------------|:------------------|
| 1. Financial transaction records | ___ |
| 2. Internal system logs containing harmless details | ___ |
| 3. Customer personal information (e.g., location, identifiers) | ___ |
| 4. Client-side application code accessible to all users | ___ |
| 5. Private encryption keys used for secure communications | ___ |
| 6. Advertising data collected by a social media platform | ___ |

<details>
<summary>Click to reveal the Answer Key</summary>
<br>

| Asset | Correct Priority |
|:------|:-----------------|
| 1. Financial transaction records | Extra-Large |
| 2. Internal system logs containing harmless details | Small |
| 3. Customer personal information (e.g., location, identifiers) | Large |
| 4. Client-side application code accessible to all users | Small |
| 5. Private encryption keys used for secure communications | Extra-Large |
| 6. Advertising data collected by a social media platform | Medium |
</details>


## Group Similar Assets When Appropriate
- Group similar assets when appropriate for easier management, but separate them if risk profiles or usage contexts differ significantly.
  - Consider an organization which maintains the following assets:
    - Two internal HR web applications hosted on the same internal network.
    - A public-facing customer support portal accessible via the internet.
    - An internal payroll system that processes sensitive financial data.
<details>
<summary><strong>How would you group them? (click for suggestions)</strong></summary>
<ul>
  <li><strong>Group the two internal HR web applications:</strong>
    <ul>
      <li>Same network.</li>
      <li>Accessed by the same group of employees.</li>
      <li>Similar security controls and data sensitivity levels.</li>
    </ul>
  </li>
  <li><strong>Do not group the customer support portal with internal applications:</strong>
    <ul>
      <li>Exposed to the internet and has a larger attack surface.</li>
      <li>Subject to different risks, such as DDoS or credential stuffing.</li>
      <li>May have a separate set of compliance or logging requirements.</li>
    </ul>
  </li>
  <li><strong>Do not group the payroll system with the HR applications:</strong>
    <ul>
      <li>Handles more sensitive data (e.g., salaries, bank details).</li>
      <li>Requires stricter access controls and different audit requirements.</li>
    </ul>
  </li>
</ul>
</details>

- Always consider asset value from multiple perspectives (including customers, attackers, and the organization itself) to avoid underestimating potential risks.
- Minimize attack surfaces wherever possible, since they are the first points of entry for attackers; early blocking reduces the spread of attacks.
- Recognize that attack surfaces include both digital and physical exposures such as public network connections and device interfaces.
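
The grouping exercise above can be sketched as a small data-structure problem: assets that share the same risk profile land in the same group. The `exposure` and `sensitivity` labels below are assumptions made for this sketch, not values from a real assessment.

```python
from collections import defaultdict

# Exposure and sensitivity labels are illustrative assumptions for this sketch.
assets = [
    {"name": "HR web app A", "exposure": "internal", "sensitivity": "medium"},
    {"name": "HR web app B", "exposure": "internal", "sensitivity": "medium"},
    {"name": "Customer support portal", "exposure": "public", "sensitivity": "medium"},
    {"name": "Payroll system", "exposure": "internal", "sensitivity": "high"},
]


def group_by_risk_profile(items):
    """Group assets that share the same (exposure, sensitivity) profile."""
    groups = defaultdict(list)
    for asset in items:
        groups[(asset["exposure"], asset["sensitivity"])].append(asset["name"])
    return dict(groups)


groups = group_by_risk_profile(assets)
for profile, names in groups.items():
    print(profile, names)
```

Only the two HR applications share a profile here, so they are the only assets managed as one group.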

# Using Threat Modeling Frameworks
- [5] Olmstead Ch. 6
- A **threat modeling framework** provides a structured approach to identifying, evaluating, and addressing potential security threats in a system or application. These frameworks help teams anticipate how attackers might exploit vulnerabilities and guide the design of appropriate defenses before issues occur.
- Two common frameworks include **STRIDE** and **DREAD**
- The STRIDE model is a Microsoft framework for identifying and categorizing different security threats affecting a system.
  - Spoofing - an attacker pretending to be someone/something else, e.g., gaining unauthorized access with a valid user's credentials
  - Tampering - unauthorized modification of data, code, or system components, e.g., altering database contents, disrupting a system's regular operation
  - Repudiation - denying actions or events by a user or system entity making it hard to attribute responsibility, e.g., manipulating a log file to make it appear someone else did something
  - Information disclosure - exposing sensitive information to unauthorized individuals or systems, e.g., sharing confidential data, such as student grade or financial information, or personal health information
  - Denial of Service (DoS) - disrupting or degrading the availability of a system or its components, making them inaccessible to legitimate users, e.g., flooding a web server with requests so legitimate users can't access information
  - Elevation of privilege - an attacker gains a higher level of access or permission than authorized, e.g., a vulnerability allows a user to escalate from regular user to administrator
- The DREAD model rates each of its five components on a scale from 0 to 10, with a lower score being better
  - Damage - assesses the potential impact of a security vulnerability if it were to be exploited; 0 indicates no damage, 10 indicates catastrophic damage
  - Reproducibility - how easily an attacker can reproduce the conditions necessary to exploit a vulnerability; 0 means the vulnerability is difficult to impossible to reproduce, and 10 means it is effortless to reproduce
  - Exploitability - how easily an attacker can exploit a vulnerability, considering the complexity of the attack and skills required to carry it out
  - Affected users - the number of users or systems that could be impacted if a vulnerability is exploited
  - Discoverability - how easily an attacker can discover a vulnerability; 0 means difficult and 10 means straightforward
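
The five DREAD ratings are often combined into a single number. A common convention, which the notes above do not prescribe, is to average them; the sketch below assumes that convention, and the example ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DreadRating:
    """One 0-10 rating per DREAD component (lower is better)."""
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def score(self) -> float:
        """Average the five ratings (an assumed convention, not the only one)."""
        values = [
            self.damage,
            self.reproducibility,
            self.exploitability,
            self.affected_users,
            self.discoverability,
        ]
        for value in values:
            if not 0 <= value <= 10:
                raise ValueError(f"rating {value} is outside the 0-10 scale")
        return sum(values) / len(values)


# Hypothetical ratings for an injection flaw in a login form.
login_injection = DreadRating(8, 9, 7, 8, 6)
print(login_injection.score())
```

Comparing averaged scores across threats gives a rough ranking; teams that weight some components more heavily would replace the plain average with their own formula.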

Threat modeling tools:
- [Microsoft Threat Modeling Tool](https://learn.microsoft.com/en-us/azure/security/develop/threat-modeling-tool)
- [IBM Guardium Vulnerability Assessment](https://www.ibm.com/products/guardium-vulnerability-assessment)
- [OWASP Threat Dragon](https://owasp.org/www-project-threat-dragon/)
- [OWASP pytm](https://owasp.org/www-project-pytm/), a Pythonic framework for threat modeling


# 🛠️ Hands-On: Use pytm to Create a Threat Model
- The following code defines a minimal system architecture with **actors** (entities that interact with the system, such as users or external services), **boundaries** (logical or physical zones that separate trust levels, such as internal networks or the internet), and **dataflows** (paths through which data moves between components, indicating protocols and direction).
- The script then generates and renders a threat report using a custom Jinja2 template.
- [Jinja2](https://pypi.org/project/Jinja2/) is a Python templating engine that allows dynamic generation of text files (e.g., HTML, Markdown, reports) using placeholders and control structures.
  - Templates can include variables, loops, and conditionals, making it easy to separate presentation logic from application logic.
  - Commonly used in web development and report generation, Jinja2 integrates seamlessly with tools like Flask, Django, and custom CLI applications like pytm.

In [None]:
!pip install pytm Jinja2

In [None]:
%%writefile minimal_model.py
from pytm import TM, Server, Dataflow, Datastore, Actor, Boundary
from jinja2 import Environment, FileSystemLoader
import os

# STRIDE category inference is used below because built-in pytm threats
# often lack an explicit category. Keyword matching is used to approximate
# for reporting purposes. Some threats may still be unlabeled.

STRIDE_CATEGORIES = {
    # Spoofing
    "spoof": "Spoofing",
    "forging": "Spoofing",
    "impersonation": "Spoofing",
    "credential falsification": "Spoofing",
    "session hijacking": "Spoofing",
    "replay": "Spoofing",

    # Tampering
    "tamper": "Tampering",
    "manipulation": "Tampering",
    "injection": "Tampering",
    "sql": "Tampering",
    "command": "Tampering",
    "format string": "Tampering",
    "api manipulation": "Tampering",
    "overwriting": "Tampering",
    "overwrite": "Tampering",

    # Repudiation
    "repudiation": "Repudiation",
    "audit log manipulation": "Repudiation",
    "log tampering": "Repudiation",

    # Information Disclosure
    "leak": "Information Disclosure",
    "exfiltration": "Information Disclosure",
    "exposure": "Information Disclosure",
    "unprotected": "Information Disclosure",
    "disclosure": "Information Disclosure",
    "data leak": "Information Disclosure",
    "sensitive": "Information Disclosure",
    "sniffing": "Information Disclosure",

    # Denial of Service
    "flood": "Denial of Service",
    "dos": "Denial of Service",
    "denial": "Denial of Service",
    "overflow": "Denial of Service",
    "crash": "Denial of Service",
    "allocation": "Denial of Service",
    "ping of the death": "Denial of Service",
    "smuggling": "Denial of Service",
    "excessive": "Denial of Service",

    # Elevation of Privilege
    "privilege": "Elevation of Privilege",
    "escalation": "Elevation of Privilege",
    "bypass": "Elevation of Privilege",
    "unauthorized": "Elevation of Privilege",
    "elevation": "Elevation of Privilege",
    "root": "Elevation of Privilege",
    "admin": "Elevation of Privilege",
}

def infer_stride_category(threat):
    description = threat.description.lower()
    name = threat.__class__.__name__.lower()

    for keyword, stride in STRIDE_CATEGORIES.items():
        if keyword in description or keyword in name:
            return stride
    return "Uncategorized"

# Create a new threat model instance
tm = TM("Minimal Threat Model")

# define a basic system architecture

# trust boundaries
internet = Boundary("Internet")
internal = Boundary("Internal Network")

# external actor
user = Actor("User")

# key components (pytm's attribute for an element's trust boundary is inBoundary)
web_server = Server("Web Server", inBoundary=internet)
db = Datastore("Database", inBoundary=internal)

# data flows
Dataflow(user, web_server, "User sends credentials", protocol="HTTPS")
Dataflow(web_server, db, "Web server queries user info", protocol="SQL")

# Process the model to populate threats
tm.process()

# Access the elements
all_elements = list(tm._elements)
dataflows = [e for e in all_elements if isinstance(e, Dataflow)]
components = [e for e in all_elements if not isinstance(e, Dataflow)]
threats = list(tm._threats)

# Infer STRIDE category for each threat
for threat in threats:
    threat.category = infer_stride_category(threat)

# Set up Jinja2 templating
env = Environment(loader=FileSystemLoader(searchpath=os.path.dirname(__file__)))
template = env.get_template("custom_report.jinja2")
output = template.render(tm=tm, elements=components, dataflows=dataflows, threats=threats)

print(output)



In [None]:
%%writefile custom_report.jinja2
{# create custom template #}
Threat Model: {{ tm.name }}
=========================

Components:
{% for element in elements %}
- {{ element.name }} ({{ element.__class__.__name__ }})
{% endfor %}

Data Flows:
{% for df in dataflows %}
- {{ df.name }}: {{ df.source.name }} → {{ df.sink.name }} via {{ df.protocol }}
{% endfor %}

Threats:
{% for threat in threats %}
- **{{ threat.target.name if threat.target else "General Threat" }}**:
  - **Type:** {{ threat.__class__.__name__.replace('_', ' ') }}
  - **STRIDE:** {{ threat.category if threat.category else "Uncategorized" }}
  - **Description:** {{ threat.description }}
{% endfor %}


In [None]:
# The threat list covers common attack types like injection, spoofing, and
# session hijacking, as identified by pytm's built-in threat modeling logic.
!python3 minimal_model.py

# 2. Secure Workflows


# Secure Development Workflows
- A **Secure Development Workflow** integrates security practices throughout the entire software development lifecycle, from initial planning to deployment and maintenance.
- These workflows ensure that security is not treated as an afterthought, but as a core component of every phase of development.
- Key practices include threat modeling during design, secure coding standards during implementation, static and dynamic analysis during testing, and secure configuration management during deployment.
- Secure workflows also emphasize the use of version control, peer reviews, and automated CI/CD pipelines to reduce the risk of introducing vulnerabilities and to detect issues early.


# Integration in the SDLC
![SecureAgile](https://raw.githubusercontent.com/FSCJ-FacultyDev/SWC-Columbus-2025/main/images/day4-secureagilesdlc.png)
- By embedding security controls directly into development processes, teams can more effectively manage risks without slowing down delivery.
- Secure development workflows promote collaboration between developers, security professionals, and operations teams, following methodologies such as DevSecOps.
- These workflows often include automated tools for code scanning, dependency checking, and infrastructure validation, allowing security to scale with development **velocity** (the speed and efficiency of delivering code changes to production).
- Secure development workflows lead to more resilient applications, reduced remediation costs, and greater compliance with regulatory and industry standards.

# Integrating Security into CI/CD Pipelines
- **CI/CD** (Continuous Integration and Continuous Deployment) refers to the practice of regularly merging code changes with automated builds and tests to catch issues early (CI) and automatically releasing validated changes to production or staging environments (CD).
- Integrating automated security tests into a CI/CD pipeline ensures that vulnerabilities are detected and addressed early (during code commits, builds, and deployments) rather than after release.
- Common integrations include static application security testing (SAST), dependency scanning, secret detection, and configuration validation tools that run automatically as part of the pipeline.
- By **shifting security left** and making it a routine part of development workflows, teams can reduce risk without compromising development velocity and maintain a consistent security baseline across all code changes.  
![ShiftLeft](https://raw.githubusercontent.com/FSCJ-FacultyDev/SWC-Columbus-2025/main/images/day4-devsecshiftleft.png)
- As a practical example, developers can integrate tools like Bandit into their GitHub CI workflows to automatically detect common Python security issues during each code push.
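
To make the idea of secret detection concrete, here is a minimal sketch of a pattern-based check that a pipeline step could run against changed files. The two patterns and the sample text are simplified assumptions for illustration; dedicated secret scanners use far larger rule sets and additional heuristics.

```python
import re

# Simplified patterns for illustration only.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Hardcoded password": re.compile(
        r"password\s*=\s*['\"][^'\"]+['\"]", re.IGNORECASE
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name) pairs for lines that match a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


# Fake values made up for this example.
sample = 'timeout = 5\npassword = "hunter2"\nkey = "AKIAABCDEFGHIJKLMNOP"\n'
for lineno, rule in find_secrets(sample):
    print(f"line {lineno}: possible {rule}")
```

In a pipeline, a step like this would typically exit with a non-zero status when findings exist so that the commit is flagged before it is merged.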

# 🛠️ Hands-On: Integrate Bandit into a GitHub Actions Workflow

- In this hands-on, we will add a GitHub Actions workflow to our python-demo-project repository from Day 1 to demonstrate a secure development workflow.

1. Add the following file to your repository as **.github/workflows/bandit.yml**
---

```
name: Security Scan

on: [push]

jobs:
  bandit-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - name: Install Bandit
        run: pip install bandit
      - name: Run Bandit
        run: bandit -r .
```

- After committing the file, open the **Actions** tab on your repository's home page to view the workflow run.
- By default, Bandit returns a non-zero exit status when it reports any issue, including medium-severity ones, so the workflow run fails.
- Our initial Python code called **requests.get** with no timeout, which Bandit reports as a medium-severity issue (you may recall the timeout guideline from the Day 2 discussion of securing APIs).
- We can modify the command to suppress the lower-severity issues as follows:

```
    bandit -r . --severity-level high
```
  - or we can fix the problem; add a timeout to the call.
  - This is a good way to retest our action and verify our security test passes; the script will run when we commit the following change:

- Edit the main.py file and change

```
    response = requests.get("https://www.example.com")
```
- to

```
    response = requests.get("https://www.example.com", timeout=5)
```

- then commit your change and view the Actions results again (the run may stay in a pending status for a while, since jobs on free GitHub accounts aren't usually first in line).
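
To give a feel for how a static analyzer can spot this issue without running the code, the sketch below walks a module's syntax tree and flags `requests.get(...)` calls that pass no `timeout` keyword. It is a simplified illustration written for these notes, not Bandit's actual implementation, and it only recognizes the literal `requests.get` form.

```python
import ast


def calls_without_timeout(source):
    """Return line numbers of requests.get(...) calls with no timeout keyword."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "requests"
            and not any(kw.arg == "timeout" for kw in node.keywords)
        ):
            flagged.append(node.lineno)
    return flagged


before = 'import requests\nresponse = requests.get("https://www.example.com")\n'
after = 'import requests\nresponse = requests.get("https://www.example.com", timeout=5)\n'

print(calls_without_timeout(before))  # [2]
print(calls_without_timeout(after))   # []
```

Because the check inspects source text only, it runs quickly on every push, which is what makes this kind of rule practical inside a CI workflow.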

# Secure Design Reviews and Approval Processes
![CodeReview](https://raw.githubusercontent.com/FSCJ-FacultyDev/SWC-Columbus-2025/main/images/day4-codereviews.png)
- Vulnerabilities and insecure code should be identified before deployment.
- This requires systematic inspection of source code by one or more qualified reviewers looking for issues such as
  - improper input validation
  - insecure cryptographic use
  - injection flaws
  - logic errors
- Code reviews not only help find bugs but also encourage developers to follow secure coding standards and best practices (e.g., [OWASP](https://owasp.org/www-project-top-ten/) and [CERT](https://wiki.sei.cmu.edu/confluence/display/seccode/SEI+CERT+Coding+Standards)).
  - OWASP guidelines for code reviews can be found [here](https://owasp.org/www-project-code-review-guide/assets/OWASP_Code_Review_Guide_v2.pdf).

## Effective Security Design Reviews
- References
  - [Microsoft](https://www.microsoft.com/en-us/securityengineering/sdl/practices)
  - [OWASP](https://owasp.org/www-project-application-security-verification-standard/)
  - [NIST](https://csrc.nist.gov/publications/detail/sp/800-218/final)
  - [Google Engineering Practices](https://github.com/google/eng-practices/blob/master/review/index.md)
- An effective security review combines manual inspection with automated tools
  - Manual reviews allow human reviewers to spot complex logic flaws and subtle security issues that scanners might miss
  - Automated static analysis tools can efficiently catch repetitive patterns, outdated libraries, or known vulnerabilities across large codebases.
    - Integrating these tools into a CI/CD pipeline ensures that each pull request or code commit is scanned early, preventing security regressions and helping teams maintain a strong security posture throughout the development cycle.

## Secure Design Review Checklist
  1. Understand the System Context
    - Have all components, data flows, and trust boundaries been identified?
    - Has a threat model (e.g., STRIDE or DREAD) been developed for the system?
    - Are all third-party services, libraries, and APIs documented?
  2. Authentication & Authorization
    - Does the system enforce strong, secure user authentication?
    - Are authentication credentials securely stored (e.g., hashed and salted passwords)?
    - Is access control enforced at all critical entry points?
    - Are role-based or attribute-based access control models clearly defined?
  3. Data Protection & Privacy
    - Is sensitive data (PII, credentials, tokens) encrypted in transit (TLS) and at rest?
    - Are proper cryptographic algorithms and key lengths selected?
    - Is key management handled securely and separately from application logic?
    - Are data retention and deletion policies aligned with privacy requirements?
  4. Input Validation & Output Encoding
    - Is all user input validated, sanitized, and length-limited?
    - Are appropriate output encoding mechanisms in place to prevent injection attacks (e.g., XSS, SQLi)?
    - Are dangerous file uploads, redirects, or deserialization scenarios accounted for?
  5. Error Handling & Logging
    - Are errors logged in a secure, centralized location without exposing sensitive details?
    - Do error messages avoid revealing internal implementation details to users?
    - Are logs protected from tampering and accessible only to authorized users?
  6. Secure Communications
    - Is TLS enforced for all client-server and service-to-service communication?
    - Are certificates validated, and is certificate pinning considered for critical systems?
    - Are insecure protocols (e.g., HTTP, FTP) avoided?
  7. Dependency & Environment Security
    - Are third-party libraries and dependencies tracked and regularly scanned for vulnerabilities (e.g., via SBOM or SCA tools)?
    - Is the build and deployment environment hardened against supply chain attacks?
    - Are secrets managed securely (e.g., not hardcoded or in source control)?
  8. Secure Defaults & Fail-Safe Design
    - Does the system follow the principle of least privilege by default?
    - Are security controls opt-out rather than opt-in?
    - Does the system fail securely (e.g., deny access by default when uncertain)?
  9. Resilience & Threat Mitigation
    - Are rate limiting, CAPTCHA, or other bot defenses implemented where needed?
    - Is the system protected against common attacks (e.g., replay attacks, CSRF, DoS)?
    - Are security headers (e.g., CSP, HSTS, X-Frame-Options) considered for web apps?
  10. Review & Documentation
    - Has the design been reviewed by at least one independent security reviewer?
    - Are security assumptions, decisions, and mitigations documented?
    - Are plans in place for ongoing threat monitoring and incident response?
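
As one concrete instance of the checklist item on storing credentials as hashed and salted passwords, the sketch below uses only the Python standard library. The iteration count is an assumption made for this example and should be tuned to current guidance and available hardware.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumption for this sketch; tune for your environment


def hash_password(password, salt=None):
    """Return (salt, derived_key) using PBKDF2-HMAC-SHA256 and a random salt."""
    if salt is None:
        salt = os.urandom(16)  # a fresh salt for each stored password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected_key):
    """Recompute the hash with the stored salt and compare in constant time."""
    _, key = hash_password(password, salt)
    return hmac.compare_digest(key, expected_key)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

Only the salt and the derived key are stored; the plaintext password is never written anywhere, and `hmac.compare_digest` avoids leaking information through comparison timing.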

- Approval processes further reinforce security by requiring that code cannot be merged into the main branch without passing defined **security gates**.
- These gates include
  - successful automated tests
  - static analysis results
  - formal sign-off from security-trained reviewers.

- Role-based access control (RBAC) within source control systems ensures that only authorized individuals can approve or deploy changes.
- Because a failed security gate can block a merge, the gate should always provide feedback that helps the developer(s) resolve the issue(s) and learn from the experience.
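
The gate logic described above can be sketched as a function that either allows a merge or returns the reasons it is blocked. The field names and the rule that any high-severity finding blocks the merge are assumptions made for this illustration; real gates are usually configured in the source control platform.

```python
from dataclasses import dataclass


@dataclass
class GateInput:
    """Hypothetical summary of the checks run against one pull request."""
    tests_passed: bool
    high_severity_findings: int
    security_signoff: bool


def evaluate_gates(result):
    """Return (allowed, reasons); the merge is allowed only if every gate passes."""
    reasons = []
    if not result.tests_passed:
        reasons.append("Automated tests failed")
    if result.high_severity_findings > 0:
        reasons.append(
            f"{result.high_severity_findings} high-severity static analysis finding(s)"
        )
    if not result.security_signoff:
        reasons.append("Missing sign-off from a security-trained reviewer")
    return (not reasons, reasons)


allowed, reasons = evaluate_gates(GateInput(True, 2, False))
print("merge allowed" if allowed else "merge blocked:")
for reason in reasons:
    print(" -", reason)
```

Returning the list of reasons, rather than a bare pass or fail, is what gives the developer actionable feedback when a gate blocks the merge.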

# Complete Day 4 Exercise 1 Here
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FSCJ-FacultyDev/SWC-Columbus-2025/blob/main/exercises/Day4Exercise1_SecureDesignReview.ipynb)


# Incident Response
- Operational processes must be able to withstand, respond to, and recover from security incidents without compromising data integrity or business continuity
- Secure workflows are designed not only to prevent unauthorized actions but also to remain resilient under attack or failure.
- By integrating incident response into the lifecycle, organizations ensure that even if a breach or disruption occurs, there are predefined procedures in place to contain the threat, minimize impact, and restore secure operations.
- This reinforces both trust and continuity in systems that handle sensitive or mission-critical activities.

# Workflow Recovery
- ***Workflow recovery*** is an integral part of a ***Business Continuity Plan (BCP)***: it restores affected business functions and digital services following an incident.
- This includes the recovery of applications, user access, and dependent systems in accordance with defined recovery time objectives (RTOs) and recovery point objectives (RPOs).
- Workflow recovery plans may involve failover systems, backups, and automated deployment scripts to rebuild environments efficiently.
- Coordination between IT, development, and security teams is essential to ensure continuity and reduce downtime.
- Integrating workflow recovery with incident response ensures not only that threats are neutralized, but that services are brought back online in a secure and controlled manner.
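
A recovery can be checked against its objectives with simple date arithmetic: downtime is compared with the RTO, and the gap between the last good backup and the incident is compared with the RPO. The timestamps and objectives below are hypothetical values for illustration.

```python
from datetime import datetime, timedelta


def check_recovery(incident_start, service_restored, last_backup, rto, rpo):
    """Compare actual downtime and data-loss window with the RTO and RPO."""
    downtime = service_restored - incident_start
    data_loss_window = incident_start - last_backup
    return {
        "downtime": downtime,
        "rto_met": downtime <= rto,
        "data_loss_window": data_loss_window,
        "rpo_met": data_loss_window <= rpo,
    }


# Hypothetical incident timeline and objectives.
report = check_recovery(
    incident_start=datetime(2025, 1, 10, 9, 0),
    service_restored=datetime(2025, 1, 10, 12, 30),
    last_backup=datetime(2025, 1, 10, 8, 45),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
for key, value in report.items():
    print(f"{key}: {value}")
```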

# 3. Automating Security Testing
- Integrating automated security tests into CI/CD pipelines ensures that vulnerabilities are caught early in the development lifecycle, reducing the risk of deploying insecure code.
- Using automated tools for static application security testing (SAST), dependency scanning, and secret detection within build and deployment workflows helps enforce security policies without delaying delivery.
- This also helps developers receive immediate feedback when insecure code or libraries are introduced, allowing issues to be resolved before reaching production.
- Security gates in CI/CD pipelines can also be configured to block deployments if critical findings are detected, reinforcing a shift-left security strategy.

# Security Test Coverage and Prioritization
- The most critical parts of an application (such as authentication logic, data processing, and external interfaces) must be thoroughly tested for vulnerabilities.
- Since testing every line of code equally is often impractical, prioritization helps focus security efforts on high-risk areas that handle sensitive data or have a history of exploitation.
- Effective coverage (the extent to which your security tests examine critical code paths, inputs, and features) includes a mix of static and dynamic analysis, dependency checks, and manual reviews for complex logic.
- Mapping tests to known threat models or CWE categories can help guide where deeper scrutiny is needed, making security testing more efficient and impactful across the development lifecycle.

# Managing False Positives
- False positives in security test findings must be managed to distinguish real vulnerabilities from incorrect alerts.
- This is essential to maintaining trust in automated security tools and avoiding wasted developer effort.
- When tools produce too many irrelevant warnings, teams may start ignoring results altogether, missing real threats in the process.
- Prioritizing findings based on severity, exploitability, and impact helps filter meaningful issues from noise.
- Integrating results into developer workflows with clear remediation guidance also improves response time and reduces frustration.
- Regular tuning of security tools and rulesets is necessary to adapt to evolving codebases and reduce alert fatigue.
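
One way to keep reviewed false positives from resurfacing is a baseline: findings that a reviewer has already triaged are recorded and filtered out of later reports. The sketch below assumes findings are dictionaries with `rule_id` and `path` keys; the rule IDs, paths, and baseline entry are illustrative.

```python
def filter_findings(findings, baseline):
    """Drop findings whose (rule_id, path) pair was recorded as a false positive."""
    return [f for f in findings if (f["rule_id"], f["path"]) not in baseline]


# Illustrative scanner output.
findings = [
    {"rule_id": "B113", "path": "main.py", "message": "Request without timeout"},
    {"rule_id": "B105", "path": "tests/test_config.py", "message": "Hardcoded password string"},
]

# Pairs a reviewer has triaged and accepted (hypothetical).
baseline = {("B105", "tests/test_config.py")}

for finding in filter_findings(findings, baseline):
    print(finding["rule_id"], finding["path"], finding["message"])
```

Keeping the baseline in version control makes each suppression reviewable, which helps prevent real issues from being silenced by accident.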










# Prioritizing Findings
- By considering not just severity but also **exploitability** (how easy it is to take advantage of the issue) and **impact** (what harm it can cause), developers can focus on what truly needs immediate action and avoid wasting time on theoretical or low-risk findings.
## Examples
### CVE-2022-12345 - SQL Injection in Login Endpoint
Severity: High  
Exploitability: Easy (public exploit available)  
Impact: Allows account takeover  
Priority: Critical - Fix Immediately
### Hardcoded test credentials found in test_config.py
Severity: Medium  
Exploitability: Low (file not deployed in production)  
Impact: No direct production risk  
Priority: Low - Address later or exclude from scan scope
### Outdated jQuery version detected
Severity: Medium  
Exploitability: Medium (theoretical exploit)  
Impact: Potential XSS on legacy admin tools  
Priority: Medium - Plan patch in next sprint  
### Missing HttpOnly flag on session cookie
Severity: High  
Exploitability: Moderate  
Impact: Increases XSS impact  
Priority: High - Patch in current release
### Unused dependency xmltodict with known DoS vulnerability
Severity: High  
Exploitability: Low (not imported anywhere)  
Impact: Minimal unless activated  
Priority: Low - Remove when cleaning dependencies
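
The examples above can be approximated with a small triage rule. The sketch below is one possible rule, not a standard: it maps each factor to a three-level scale, treats findings that are both hard to exploit and low impact as low priority, and otherwise ranks by the combined level. The impact levels assigned to each example are assumptions made for this sketch.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}


def triage(severity, exploitability, impact):
    """Combine three low/medium/high ratings into a priority label."""
    s, e, i = LEVELS[severity], LEVELS[exploitability], LEVELS[impact]
    if e == 1 and i == 1:
        return "Low"  # hard to exploit and little harm, whatever the raw severity
    total = s + e + i
    if total == 9:
        return "Critical"
    if total >= 7:
        return "High"
    return "Medium"


# (finding, severity, exploitability, impact); impact levels are assumed.
examples = [
    ("SQL injection in login endpoint", "high", "high", "high"),
    ("Hardcoded test credentials", "medium", "low", "low"),
    ("Outdated jQuery version", "medium", "medium", "medium"),
    ("Missing HttpOnly flag", "high", "medium", "medium"),
    ("Unused xmltodict dependency", "high", "low", "low"),
]

for name, severity, exploitability, impact in examples:
    print(f"{triage(severity, exploitability, impact):<8} {name}")
```

Note how the two high-severity findings end up at opposite ends of the list once exploitability and impact are taken into account.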

# SBOMs
- A Software Bill of Materials (SBOM) is a detailed inventory of all components, libraries, and dependencies used by a software application.  
- It provides a comprehensive record which lists open-source, proprietary, and third-party components.  
- It contains component metadata, including version numbers, licenses, and source information.  
- SBOMs promote visibility into the software supply chain and are used in conjunction with scanning tools to identify components with known security issues
- Popular SBOM tools and standards include [Trivy](https://trivy.dev/latest/), [Syft](https://www.cisa.gov/resources-tools/services/syft), [Anchore](https://anchore.com/), and [FOSSA](https://fossa.com/) for generating or scanning SBOMs, [OWASP Dependency-Track](https://dependencytrack.org/) for analyzing them, and the [CycloneDX](https://cyclonedx.org/) and [SPDX](https://spdx.dev/) SBOM formats.
- SBOM scans are typically run as part of automated CI/CD workflows to verify:
  - Known vulnerabilities in dependencies
  - License compliance and component provenance
  - Tampering or unauthorized components in build artifacts
- SBOM data is cross-referenced with vulnerability databases (e.g., CVE, National Vulnerability Database, Aqua Vulnerability Database, OSS Index, GitHub Advisory Database, Snyk Vulnerability Database) to identify known issues
- Dependency vulnerabilities are not limited to Python; other ecosystems are affected too, e.g., JavaScript/Node.js (npm) and Java (Maven Central)
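
At its core, an SBOM is an inventory of component names and versions. The sketch below builds such an inventory for the current Python environment using only the standard library. The output is a simplified structure chosen for this example, not a spec-compliant CycloneDX or SPDX document.

```python
import json
from importlib import metadata


def build_inventory():
    """Collect the name and version of every installed distribution."""
    components = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip installs with missing metadata
        components.append({"type": "library", "name": name, "version": dist.version})
    components.sort(key=lambda c: c["name"].lower())
    return {"components": components}


bom = build_inventory()
print(f"{len(bom['components'])} components found")
print(json.dumps(bom["components"][:3], indent=2))
```

A real SBOM generator adds the metadata described above (licenses, sources, package URLs) so that scanners can match components against vulnerability databases more precisely.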



# 🛠️ Hands-On: Run an SBOM check

In [None]:
!pip freeze >requirements.txt
!echo 'showing line count for dependencies:'
!wc -l requirements.txt

In [None]:
!sudo apt-get install -y wget apt-transport-https gnupg lsb-release
!wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | sudo apt-key add -
!echo deb https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main | sudo tee -a /etc/apt/sources.list.d/trivy.list
!sudo apt-get update
!sudo apt-get install -y trivy

In [None]:
!pip install cyclonedx-bom
!python3 -m cyclonedx_py requirements requirements.txt -o sbom.json
!trivy sbom sbom.json

## Results
- Environment scanned: Python packages (via SBOM from requirements.txt)
- Total Vulnerabilities Found: 10
  - High severity: 5
  - Medium: 3
  - Low: 2
  - Critical: 0
- Warnings: Trivy warns that SBOMs generated by third-party tools (like cyclonedx-bom) may lead to incomplete or imprecise matching, but this report still picked up valid CVEs based on package name and version, so the findings are informative and should not be ignored.
- Each row in the table tells you:
  - Library: The affected package
  - Vulnerability: CVE ID with severity (e.g. CVE-2022-40023)
  - Installed Version: The version in your Colab environment
  - Fixed Version: The version where the issue is patched
  - Title + Link: A brief vulnerability description and a link for more info
- Examples:
  - High Severity
    - Mako 1.1.3 → vulnerable to CVE-2022-40023 (Regular Expression DoS)
      - Fixed in 1.2.2
    - keras 3.8.0 → vulnerable to CVE-2025-1550
      - Fixed in 3.9.0
    - jupyter-server has multiple high and medium CVEs
  - Low Severity
    - cryptography 43.0.3 has a known LOW severity issue, fixed in 44.0.1
- Should You Be Concerned?
  - Yes, especially for HIGH severity vulnerabilities in actively used libraries like keras (RCE risk), jupyter-server (user hash disclosure, redirection, etc.), and Mako (ReDoS).
  - These could impact the confidentiality, integrity, or availability of systems if exposed to malicious input, particularly in multi-user/shared environments like Jupyter notebooks or APIs.
- What You Should Do
  - Upgrade the packages: use `pip install --upgrade <package>` or pin higher versions in requirements.txt
  - Avoid vulnerable versions when building distributable apps or APIs.
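
The upgrade step can be partly automated by comparing pinned versions with the fixed versions from the scan. The sketch below uses the fixed versions listed in the results above; the sample requirements text is hypothetical, and the version parser is deliberately naive (plain dotted integers only). For real projects, the `packaging` library's `Version` class handles the full range of Python version strings.

```python
# Fixed versions taken from the scan results above.
FIXED_VERSIONS = {"mako": "1.2.2", "keras": "3.9.0", "cryptography": "44.0.1"}


def parse_version(text):
    """Naive parser: handles plain dotted integers such as '1.2.3' only."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(requirements_text):
    """Return (name, pinned, fixed) for pins older than the fixed version."""
    outdated = []
    for line in requirements_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "==" not in line:
            continue
        name, version = (part.strip() for part in line.split("==", 1))
        fixed = FIXED_VERSIONS.get(name.lower())
        if fixed and parse_version(version) < parse_version(fixed):
            outdated.append((name, version, fixed))
    return outdated


# Hypothetical pins for illustration.
sample = "Mako==1.1.3\nkeras==3.8.0\ncryptography==44.0.1\nrequests==2.32.3\n"
for name, pinned, fixed in needs_upgrade(sample):
    print(f"{name}: {pinned} -> upgrade to at least {fixed}")
```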

# Integrating a Scan into a GitHub Action
### Sample YAML file; store in .github/workflows

```
name: Trivy Dependency Scan

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

jobs:
  trivy-scan:
    name: Scan Python dependencies with Trivy
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Install Trivy
        run: |
          sudo apt-get update
          sudo apt-get install -y wget
          # The release tag in the URL must match the version in the file name
          wget https://github.com/aquasecurity/trivy/releases/download/v0.48.4/trivy_0.48.4_Linux-64bit.deb
          sudo dpkg -i trivy_0.48.4_Linux-64bit.deb

      - name: Scan project directory for vulnerabilities
        run: trivy fs --exit-code 1 --severity CRITICAL,HIGH .

      # Optional: Save Trivy scan report as an artifact
      # "if: always()" lets the report steps run even when the scan step fails
      - name: Save Trivy scan report
        if: always()
        run: trivy fs --severity CRITICAL,HIGH --format table --output trivy-report.txt .

      - name: Upload report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: trivy-report
          path: trivy-report.txt
```