### Week 15: Threat Modelling

#### Attack Vector
An attack vector is the path or method used by a threat actor to gain unauthorized access to a system or perform a malicious action. Applying STRIDE helps you identify potential attack vectors.

|System Component|Common Attack Vectors|
|:--|:--|
|Web Application|Cross-Site Scripting (XSS), SQL Injection, Parameter Tampering, Session Hijacking (often leading to Spoofing/Tampering).|
|Network|Man-in-the-Middle (MITM), Port Scanning, Denial of Service (DoS).|
|System/OS|Buffer Overflows, Privilege Escalation Exploits (leading to Elevation of Privilege).|
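
SQL Injection, listed under Web Application above, works when user input is concatenated directly into a query string. The sketch below uses Python's built-in `sqlite3` module to contrast a vulnerable query with a parameterized one. The `users` table, its columns, and the payload are hypothetical values chosen for illustration.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "viewer")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: user input is pasted directly into the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver binds `name` as a value, never as SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = build_db()
payload = "nobody' OR '1'='1"
print(find_user_unsafe(conn, payload))  # both rows leak
print(find_user_safe(conn, payload))    # []
```

The injected `OR '1'='1'` turns the unsafe `WHERE` clause into a condition that is always true, while the parameterized version simply looks for a user literally named `nobody' OR '1'='1`.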

### Threat Modeling (STRIDE)
Threat Modeling is a structured, proactive process used to identify and prioritize security threats to a system and determine countermeasures. STRIDE is a mnemonic framework for categorizing these threats.

**The STRIDE Framework**

|Threat Category|Security Property Violated|Mitigation Focus|Definition|
|:--|:--|:--|:--|
|Spoofing|Authentication|MFA, Strong Passwords, Identity Verification|Impersonating a user, system, or process.|
|Tampering|Integrity|Digital Signatures, Immutable Logging, Access Controls|Maliciously modifying data, code, or configuration.|
|Repudiation|Non-Repudiation|Audit Trails, Digital Signatures on Actions|Denying that a specific action took place due to lack of proof.|
|Information Disclosure|Confidentiality|Encryption (in transit and at rest), Authorization|Exposure of sensitive data to unauthorized individuals.|
|Denial of Service (DoS)|Availability|Throttling, Load Balancing, Resource Quotas|Preventing legitimate users from accessing the system.|
|Elevation of Privilege|Authorization|Least Privilege Principle, Role-Based Access Control (RBAC)|Gaining capabilities or access beyond what is intended.|
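
Throttling, the DoS mitigation in the table, is often implemented with a token bucket. Each client gets a bucket that refills at a fixed rate, and a request is served only if a token is available. This is a minimal in-memory sketch. The `TokenBucket` class, its parameters, and the injectable clock are illustrative assumptions, not a specific library's API.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client key (e.g. a user ID or IP address).
buckets: Dict[str, TokenBucket] = {}


def is_allowed(client: str) -> bool:
    if client not in buckets:
        buckets[client] = TokenBucket(capacity=3, rate=1.0)
    return buckets[client].allow()


# Usage with a fake clock so the result is deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=3, rate=1.0, clock=lambda: fake_now[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
fake_now[0] = 1.0
print(bucket.allow())  # True: one token refilled after one second
```

A real deployment would usually keep the buckets in shared storage or enforce limits at the API gateway, so that every server instance sees the same counts.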


**The Threat Modeling Process**
1. Decompose the Application: Visualize the system, often using a Data Flow Diagram (DFD), to identify components (processes, data stores, external entities) and the flows between them.

2. Identify Trust Boundaries: Note where the level of trust changes (e.g., between the web browser and the web server, or the web server and the database). Threats are often found when crossing these boundaries.

3. Apply STRIDE: Systematically examine each component and data flow, asking how each of the six STRIDE threats could apply.

4. Determine Mitigations: Propose security controls to reduce the risk posed by the identified threats.

5. Review and Iterate: Threat modeling should be a continuous process, especially when the system's architecture changes.
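
Step 3 can be made systematic by walking each DFD element and listing the STRIDE categories that typically apply to its type. The mapping below follows the commonly used STRIDE-per-element convention from Microsoft's threat modeling guidance: external entities get S and R, processes get all six, data flows get T, I, and D, and data stores get T, I, and D, plus R when they hold logs. Treat it as a starting checklist rather than an exhaustive rule. The element names are hypothetical placeholders.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Set


class Stride(Enum):
    S = "Spoofing"
    T = "Tampering"
    R = "Repudiation"
    I = "Information Disclosure"
    D = "Denial of Service"
    E = "Elevation of Privilege"


# STRIDE-per-element: threat categories that typically apply to each DFD element type.
APPLICABLE: Dict[str, Set[Stride]] = {
    "external_entity": {Stride.S, Stride.R},
    "process": set(Stride),
    "data_store": {Stride.T, Stride.I, Stride.D},
    "data_flow": {Stride.T, Stride.I, Stride.D},
}


@dataclass
class Element:
    name: str
    kind: str                 # one of the APPLICABLE keys
    stores_logs: bool = False


def threats_for(element: Element) -> List[Stride]:
    threats = set(APPLICABLE[element.kind])
    # Data stores that hold audit logs are also repudiation targets.
    if element.kind == "data_store" and element.stores_logs:
        threats.add(Stride.R)
    # Return in S-T-R-I-D-E order for a readable checklist.
    return [t for t in Stride if t in threats]


def checklist(elements: List[Element]) -> Dict[str, List[str]]:
    return {e.name: [t.value for t in threats_for(e)] for e in elements}


dfd = [
    Element("User", "external_entity"),
    Element("Web Server", "process"),
    Element("Audit Log", "data_store", stores_logs=True),
    Element("Browser -> Web Server", "data_flow"),
]
for name, threats in checklist(dfd).items():
    print(f"{name}: {', '.join(threats)}")
```

Each printed line is a prompt for step 4: every listed category needs either a mitigation or a recorded reason why it does not apply.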

---

## Application of the STRIDE Model to My Recent Project: Data Agnostic CNN

#### Defining the basic architecture and identifying trust boundaries of each component

This analysis assumes the project will be deployed as a web app or a cloud-hosted ML application. The following labels are used:
- P = Process
- D = Data Store

#### Defining the components:
1. Application: The client-side tool or program that sends an image to the neural network and receives the classification results.
2. Web/API Server: Handles requests and authentication, and manages the data flow between the client and the ML model service.
3. ML Model Service: The trained model running as an endpoint, hosted in a cloud environment such as AWS SageMaker or Azure Machine Learning.
4. Feature/Data Store: Storage that holds the sensitive information the app needs, such as logs and submitted images.

| Component | Description | Trust Boundary |
|:--|:--|:--|
| App (P1) | Used by the client, so it runs outside our control | The internet/API gateway boundary: everything arriving from the app is treated as untrusted until the gateway authenticates it |
| API Server (P2) | Handles routing, authentication, encryption, and decryption | Separates the service logic from the model code, and isolates the user-facing network from the internal network so a security issue on one side does not spread to the other |
| ML Model Service (P3) | Executes the PyTorch model and returns predictions based on the client's input | Separates the feature data from the model service so that neither can be reverse-engineered or poisoned through the other |
| Feature / Data Store (D1)| Stores logs and images | | 
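
The component table can also be encoded as a small data-flow model to check which flows cross a trust boundary, since step 2 notes that threats tend to cluster there. In this sketch the zone names (`internet`, `dmz`, `model_zone`, `storage_zone`) and the assignment of each component to a zone are assumptions for illustration, including a separate zone for D1, whose boundary the table leaves open. The flows are the ones analysed in the STRIDE table.

```python
from typing import Dict, List, Tuple

# Hypothetical trust zones for each component (assumed deployment).
ZONES: Dict[str, str] = {
    "P1 App": "internet",
    "P2 API Server": "dmz",
    "P3 ML Model Service": "model_zone",
    "D1 Feature/Data Store": "storage_zone",
}

# Data flows from the STRIDE analysis: (source, destination, label).
FLOWS: List[Tuple[str, str, str]] = [
    ("P1 App", "P2 API Server", "API call"),
    ("P2 API Server", "P3 ML Model Service", "internal inference call"),
    ("P3 ML Model Service", "D1 Feature/Data Store", "logging"),
]


def boundary_crossings(zones: Dict[str, str],
                       flows: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str, str]]:
    """Return every flow whose endpoints sit in different trust zones."""
    return [
        (src, dst, label, f"{zones[src]} -> {zones[dst]}")
        for src, dst, label in flows
        if zones[src] != zones[dst]
    ]


for src, dst, label, crossing in boundary_crossings(ZONES, FLOWS):
    print(f"{label}: {src} -> {dst} crosses {crossing}")
```

Under these assumed zones every flow crosses a boundary, which is why each one gets its own row in the STRIDE analysis.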


#### STRIDE framework application on the components:

|Component / Flow|STRIDE Threat|Violated Property|Mitigation Strategy (For Portfolio)|
|:--|:--|:--|:--|
|P1 to P2 (API Call)|Information Disclosure (I)|Confidentiality|Encrypt all traffic between the client app and the API server in transit (e.g., TLS/HTTPS) so that data crossing the internet cannot be read if intercepted.|
|P2 (API Server)|Denial of Service (D)|Availability|Apply rate limiting to cap the number of requests per user or IP address, preventing resource exhaustion and DoS attacks.|
|P2 to P3 (Internal Call from API Server to ML Model Service)|Elevation of Privilege (E)|Authorization|Implement token-based service authentication so that only the API server can call the model service, and the model service responds to no other caller.|
|P3 (ML Model Service)|Tampering (T) (Model)|Integrity|Model hashing and integrity checking: verify the model file's cryptographic hash against a known-good value before inference to ensure the fine-tuned weights have not been altered.|
|P3 to D1 Flow (Logging)|Information Disclosure (I)|Confidentiality|Apply data masking or encryption at rest so that images and sensitive metadata stored in the database sit on an encrypted volume, reducing the risk of a leak if the storage is compromised.|
|P1 (Client App)|Spoofing (S)|Authentication|Require authentication, ideally multi-factor, so that users must log in with valid credentials before accessing the app and, subsequently, the API server.|
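
For the model tampering row, one way to check integrity is to compute a SHA-256 digest of the saved weights file and compare it with a known-good digest recorded at release time, refusing to load the model (for example with `torch.load`) on a mismatch. This sketch uses only Python's `hashlib` and `hmac` modules; the file name and where the expected digest is kept are assumptions. A plain hash only helps if the reference digest is stored somewhere an attacker cannot modify; signing the file with a private key is the stronger variant.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, expected_hex: str) -> None:
    """Raise before inference if the weights differ from the known-good digest."""
    actual = sha256_of(path)
    # Constant-time string comparison from the standard library.
    if not hmac.compare_digest(actual, expected_hex):
        raise RuntimeError(f"Model integrity check failed for {path}")


with tempfile.TemporaryDirectory() as d:
    weights = Path(d) / "model.pt"               # hypothetical weights file
    weights.write_bytes(b"fine-tuned weights")
    known_good = sha256_of(weights)              # recorded at release time
    verify_model(weights, known_good)            # passes silently
    weights.write_bytes(b"tampered weights")
    try:
        verify_model(weights, known_good)
    except RuntimeError as err:
        print(err)                               # integrity check fails
```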

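For the P2-to-P3 row, one lightweight form of token-based service authentication is for the API server to sign each request with an HMAC over a shared secret and a timestamp, and for the model service to reject anything unsigned, forged, or stale. The sketch below uses Python's standard `hmac`, `hashlib`, and `time` modules. The function names, the 60-second window, and the secret value are assumptions for illustration. In practice this role is often filled by mutual TLS or short-lived signed tokens issued by an identity provider.

```python
import hashlib
import hmac
import time
from typing import Optional, Tuple

MAX_AGE_SECONDS = 60  # assumed freshness window; limits (but does not eliminate) replay


def _mac(secret: bytes, ts: int, body: bytes) -> str:
    return hmac.new(secret, str(ts).encode() + b"." + body, hashlib.sha256).hexdigest()


def sign_request(secret: bytes, body: bytes, timestamp: Optional[int] = None) -> Tuple[int, str]:
    """API server side: return (timestamp, signature) to send along with the body."""
    ts = int(time.time()) if timestamp is None else timestamp
    return ts, _mac(secret, ts, body)


def verify_request(secret: bytes, body: bytes, ts: int, sig: str,
                   now: Optional[int] = None) -> bool:
    """Model service side: accept only fresh requests signed with the shared secret."""
    current = int(time.time()) if now is None else now
    if abs(current - ts) > MAX_AGE_SECONDS:
        return False
    return hmac.compare_digest(_mac(secret, ts, body), sig)


secret = b"shared-secret-from-a-vault"  # hypothetical; never hard-code real secrets
ts, sig = sign_request(secret, b'{"image_id": 42}')
print(verify_request(secret, b'{"image_id": 42}', ts, sig))    # True
print(verify_request(secret, b'{"image_id": 43}', ts, sig))    # False: body altered
print(verify_request(b"wrong", b'{"image_id": 42}', ts, sig))  # False: unknown caller
```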
---

### Reflection: 

What are the benefits and challenges of using deep learning for computer vision tasks in a security context?

Applying deep learning in a security context offers significant benefits, such as easier classification, faster anomaly detection, and better visualization, resulting in faster triage and response to security concerns.

Benefits of deep learning in security:
1. Automation and scale: Deep learning models excel at automating the classification of high-volume, high-dimensional visual data, which is more efficient than manual analysis. This lets teams handle massive data streams and spot the patterns needed to assign a threat to its family. It also speeds up response, because mitigation plans may already exist for threats in that family.
2. Feature extraction power: Deep learning models learn features directly from the data and can identify non-obvious patterns that humans might miss.
3. Adaptability: Through transfer learning, a pretrained model can be adapted to a new task quickly, which cuts training time and reduces the need for large datasets.

Challenges: 
1. Adversarial attacks: Small, deliberately crafted changes to an input can cause the model to misclassify it, making its predictions unreliable. A malicious sample can then slip through undetected because it is no longer classified as a threat.
2. Data poisoning and integrity: If the training data contains poisoned samples, these can introduce vulnerabilities or backdoors into the model.
3. Explainability and trust: Because deep learning models are often treated as black boxes, it can be hard to understand why a model did or did not classify something as a threat. This lack of interpretability makes the system less accountable and harder to debug, and an undetected failure can affect every downstream decision.
4. Model theft and reverse engineering: A highly accurate model is a valuable asset. If attackers steal the model weights, or anything else that lets them reverse-engineer the model, they can craft attacks specifically designed to evade it.
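
The adversarial-attack challenge can be illustrated with the Fast Gradient Sign Method (FGSM), which shifts every input feature by a small step `eps` in the direction that increases the model's loss. To keep the sketch dependency-free, it attacks a toy logistic-regression "detector" in pure Python rather than the project's PyTorch CNN; the weights, input, and `eps` are made-up values. For this model, the gradient of the binary cross-entropy loss with respect to the input is `(sigmoid(w·x + b) - y) * w`, so no autograd is needed.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability that x is malicious under a toy logistic-regression detector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def _sign(g: float) -> int:
    return (g > 0) - (g < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Fast Gradient Sign Method: x_adv = x + eps * sign(dLoss/dx)."""
    p = predict(w, b, x)
    # Gradient of binary cross-entropy w.r.t. the input for logistic regression.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]


# Made-up detector weights and a sample it correctly flags as malicious (y = 1).
w, b = [2.0, -1.0, 1.5], -0.5
x = [0.6, 0.2, 0.4]
x_adv = fgsm(w, b, x, y=1, eps=0.3)
print(round(predict(w, b, x), 3), round(predict(w, b, x_adv), 3))
```

With these values the malicious score drops from about 0.75 to about 0.44, so a 0.5 threshold would now label the sample benign even though each feature moved by at most 0.3.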