# Week 7

1. Objectives: Learn how statistical baselines are used to detect unusual network activity, a core principle in intrusion detection systems.

2. Read up on: Articles on "statistical anomaly detection" for network security.

3. Find a simple dataset of network traffic (e.g., number of login attempts per hour) and plot it to identify any unusual spikes.

4. Reflection:
How can a linear regression model's residuals (the difference between predicted and actual values) be seen as a form of anomaly detection? How does this relate to finding malicious activity in a system?


### Statistical Baselines in Intrusion Detection Systems


Statistical baselines detect unusual network activity in Intrusion Detection Systems (IDS) by establishing "normal" network behavior through historical data analysis. 

This baseline profile, representing typical bandwidth, protocols, and traffic patterns, serves as a reference point. When real-time network data significantly deviates from this established norm, the IDS flags the deviation as an anomaly, triggering alerts for security teams to investigate potential intrusions or malicious activity. 

How it works:

1. Establish Normal Behavior (Baseline Creation):
 - The IDS collects and analyzes large amounts of historical network traffic data over a significant period.
 -  Statistical methods are used to identify patterns, characteristics, and typical ranges of network activity, such as bandwidth usage, protocols, and connection frequencies.
 - This analysis creates a statistical profile or "baseline" that defines what constitutes normal network behavior under various conditions.
2. Monitor Real-time Activity:
 - The IDS continuously monitors incoming network traffic and system activity.
3. Detect Deviations (Anomaly Detection):
 - Real-time data is compared against the established statistical baseline.
 - Algorithms like Z-score or time series analysis (e.g., ARIMA models) are used to identify outliers or significant deviations from the expected patterns.
4. Generate Alerts:
 - If a significant deviation is detected, indicating that current activity is outside the bounds of normal behavior, the IDS classifies it as a potential anomaly or intrusion.
 - The system then generates an alert, notifying security personnel to investigate the flagged activity. 
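The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular IDS is implemented: the login counts, the function names `build_baseline` and `is_anomalous`, and the threshold of 3 standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Step 1: summarise historical counts as (mean, standard deviation)."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, threshold=3.0):
    """Step 3: True when value is more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical history: login attempts per hour during normal operation
history = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
baseline = build_baseline(history)

# Steps 2 and 4: check each new observation and collect the ones to alert on
incoming = [14, 13, 95, 12]
alerts = [value for value in incoming if is_anomalous(value, baseline)]
print(alerts)  # [95]
```

A fixed mean and standard deviation is the simplest possible baseline. Traffic with daily or weekly cycles would likely need a baseline per time slot or a time series model instead.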

Example:
If a network baseline shows consistent, low traffic on a quiet Friday evening, and then a sudden, massive spike in traffic is observed, the statistical IDS would flag this as an unusual deviation, prompting an investigation for potential malicious activity. 

Articles on using statistical anomaly detection for network security.

- [Network Anomaly Detection: A Complete Guide](https://searchinform.com/articles/cybersecurity/measures/security-monitoring/network-anomaly-detection/#:~:text=The%20most%20foundational%20approach%20to,security%20team%20to%20investigate%20further.)

####  Notes:

An anomaly could simply be an unusual, but harmless, pattern, like a legitimate employee accessing data at an unusual time. An intrusion, on the other hand, usually involves a security breach that threatens the integrity of the system, often driven by malicious intent.

Understanding this distinction is crucial because not every anomaly signifies an intrusion. By focusing on detecting genuine threats, teams can ensure they are responding to actionable security issues without being overwhelmed by benign activities.

Use Cases for Network Anomaly Detection

 - Fraud detection: Network anomaly detection can pinpoint irregular activities, such as unusual login times, data transfers, or system access patterns that could indicate fraudulent actions, particularly in industries like banking or e-commerce.
 - Insider threat detection: If an employee’s behavior suddenly changes—for example, accessing files they do not typically use—network anomaly detection systems can flag this and trigger an investigation.
- DDoS attack prevention: A sudden surge in network traffic might indicate a Distributed Denial of Service (DDoS) attack. Anomaly detection tools can identify these spikes and initiate a response to mitigate the attack before it overwhelms the system.

Types of Network Anomalies

1. Point Anomalies
     - A point anomaly occurs when a single data point or event deviates significantly from the expected range. These anomalies are often the most straightforward to detect because they stand out clearly from normal patterns. Such an outlier might be flagged immediately as a sign of potential misuse of access or even an external attacker using stolen credentials. But point anomalies aren’t always a sign of malicious activity.
2. Contextual Anomalies
     - Contextual anomalies are more nuanced. Rather than simply identifying isolated deviations from the norm, these anomalies consider the surrounding context of a behavior.
     - This type of anomaly detection can be a game-changer for spotting insider threats. A network administrator may typically have access to all files, but if they begin making unusual changes to user permissions or accessing highly sensitive data outside of their normal scope, this could signal an internal security breach. 
          - By examining the context of these actions—who is performing them, when they occur, and why—they can be flagged as suspicious, helping organizations identify threats that might otherwise go unnoticed.
     - The challenge here lies in defining what is "normal" in various contexts. Contextual anomaly detection requires deep integration into the network to understand the full range of user activities and detect patterns that truly signal potential risks. 
          - For instance, a financial analyst accessing payroll data at the end of a quarter might seem like a standard action, but the same behavior from a marketing intern would be highly suspicious.
3. Collective Anomalies
     - The third type of anomaly, collective anomalies, focuses on a group of data points or activities that together deviate from the expected pattern, even though each individual action might not raise suspicion. Collective anomalies are particularly important when identifying larger, more coordinated threats like Distributed Denial of Service (DDoS) attacks or data exfiltration campaigns. 
          -  For example, a sudden surge in network traffic might not look abnormal if viewed in isolation, but when analyzed across several devices or users in real time, it could reveal the early stages of a DDoS attack.
          - These types of anomalies are harder to detect because they often occur gradually or in bursts, making them less visible to traditional security tools that rely on static, point-by-point analysis. 
          - However, the advantage of detecting collective anomalies is that they often signal more severe, coordinated actions. A series of unusual logins to different systems within minutes of each other could indicate an attacker is trying to move laterally through the network to escalate their privileges.
     - One of the most significant benefits of collective anomaly detection is its ability to identify evolving threats that might have started as isolated events but are now building up into larger-scale attacks.
          - For example, a string of seemingly benign behavior—such as different users accessing the same file types at odd hours—could be part of a broader phishing scam or ransomware attack. 
     - By identifying these anomalies as a collective pattern, organizations can spot these evolving threats early and respond before damage is done.
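To make the difference between contextual and collective anomalies concrete, here is a small sketch. Everything in it is hypothetical: the access counts, the per-hour history, the window size, and the limits are invented for illustration.

```python
from statistics import mean, stdev

def contextual_outlier(value, hour, history_by_hour, threshold=3.0):
    """Contextual check: judge a value only against history for the same hour of day."""
    samples = history_by_hour[hour]
    mu, sigma = mean(samples), stdev(samples)
    return sigma > 0 and abs(value - mu) / sigma > threshold

def collective_outlier(values, window, limit):
    """Collective check: start indices of windows whose total exceeds `limit`."""
    return [i for i in range(len(values) - window + 1)
            if sum(values[i:i + window]) > limit]

# Hypothetical file-access counts: busy at 14:00, nearly idle at 03:00
history_by_hour = {3: [0, 1, 0, 2, 1], 14: [40, 45, 38, 42, 44]}
print(contextual_outlier(40, 14, history_by_hour))  # False: normal in the afternoon
print(contextual_outlier(40, 3, history_by_hour))   # True: same value, unusual at night

# Hypothetical failed logins per minute: no single minute exceeds 5
failed = [1, 2, 1, 4, 5, 4, 5, 4, 1, 2]
print(collective_outlier(failed, window=4, limit=15))  # [3, 4]
```

The same value of 40 is normal in one context and anomalous in another. In the failed-login series no single minute stands out, but two four-minute windows do.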

Note: these notes focus only on statistical-based methods, but I'll read up on the other methods as well, just in case.

#### Statistical-based Methods

The most foundational approach to detecting network anomalies involves statistical-based methods. These methods rely on the collection and analysis of historical data to establish what is considered "normal" network behavior. 

Once a baseline is established, any significant deviation from that baseline is flagged as an anomaly.

Examples: 
- For instance, if a company’s typical network traffic is steady throughout the week, but a sudden spike is observed on a quiet Friday evening, the system will alert the security team to investigate further. 
- For example, consider an e-commerce platform that usually experiences 50 to 100 customer logins per minute. If the platform suddenly experiences 1,000 logins in a minute, this unusual surge would likely be flagged as an anomaly.

Statistical methods could identify this increase by comparing the login count with the historical mean and standard deviation of login events: a count many standard deviations above the mean suggests something out of the ordinary.
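As a rough worked illustration of the 1,000-login example, the z-score measures how many standard deviations an observation $x$ lies from the mean. The mean $\mu = 75$ and standard deviation $\sigma = 15$ below are assumed values chosen to fit the 50 to 100 logins-per-minute range. They are not figures from the article.

$$
z = \frac{x - \mu}{\sigma} = \frac{1000 - 75}{15} \approx 61.7
$$

A common rule of thumb treats $|z| > 3$ as unusual, so under these assumptions the surge is far outside normal behavior.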

However, while these methods are effective, they often come with limitations. 
- False positives can be common, especially in environments with fluctuating traffic or when dealing with rapidly evolving systems. 
- Adjusting thresholds and refining data models is a key part of overcoming this challenge and ensuring the system remains accurate.
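The effect of threshold tuning on false positives can be seen with a small experiment. This is only a sketch: the traffic numbers are made up, `count_alerts` is a hypothetical helper, and the single value of 1000 stands in for the one real incident.

```python
from statistics import mean, stdev

def count_alerts(values, baseline, threshold):
    """Count the values lying more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return sum(1 for v in values if abs(v - mu) / sigma > threshold)

# Hypothetical logins per minute under normal, fluctuating load
history = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
baseline = (mean(history), stdev(history))

# New traffic: ordinary busy minutes plus one real surge (1000)
incoming = [72, 98, 104, 110, 66, 1000, 101]

for threshold in (1.0, 2.0, 3.0):
    print(threshold, count_alerts(incoming, baseline, threshold))
# 1.0 5
# 2.0 2
# 3.0 1
```

With a threshold of 1, four busy but harmless minutes are flagged alongside the real surge. With a threshold of 3, only the surge remains. Raising the threshold too far would start to hide real incidents, which is the trade-off that tuning has to balance.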

Other Methods:
- Machine Learning-Based Methods - learn patterns from data and adapt over time.
- Rule-Based Methods - designed around predefined rules or conditions that indicate abnormal network behavior. These rules are often created by security experts who identify patterns known to be indicative of suspicious or malicious activity.
- Signature-Based Methods - compare network traffic or behavior to a database of known threats, much like an antivirus system checks files against a list of known viruses.
    - When a match is found, the system flags it as malicious. This method is highly effective at detecting known attacks—such as specific malware strains, phishing attempts, or other well-documented threats.

### Tools and Technologies for Network Anomaly Detection

##### Intrusion Detection and Prevention Systems (IDS/IPS)

At the heart of network anomaly detection are Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS). These technologies are essential for identifying and stopping malicious activity before it can escalate into a full-blown breach.
- Intrusion Detection Systems (IDS) are designed to monitor network traffic for signs of malicious activity or policy violations. They analyze network traffic in real-time and compare it to known attack signatures or anomalous behavior patterns. When a potential threat is detected, IDS alerts security teams to investigate further.
- Intrusion Prevention Systems (IPS) go a step further by not just detecting threats but actively blocking them. If an IPS identifies suspicious activity, such as a DDoS attack or an unauthorized data transfer, it automatically takes action to mitigate the threat, such as blocking the source IP address or limiting access to specific parts of the network.

Both IDS and IPS are effective at detecting well-known threats, but the true power of these tools lies in how they work together to create an adaptive defense system capable of identifying a range of network anomalies.

##### Security Information and Event Management (SIEM)
These systems collect and aggregate security-related data from various sources—such as firewalls, servers, and IDS/IPS devices—into a central location. By correlating and analyzing this data, SIEM systems provide a comprehensive view of network activity, making it easier to identify anomalies that might otherwise go unnoticed. 

SIEM solutions excel at spotting unusual patterns that might indicate the early stages of a cyberattack. For instance, if a user accesses a large number of files in a short period—something not typical for their role—SIEM could flag the activity and raise an alert. With their advanced reporting and analysis capabilities, SIEM systems can help security teams pinpoint potential threats in real time, reducing the response time to incidents. 

A financial services company, for example, might use a SIEM system to analyze login patterns across its network. If an employee logs in from an unrecognized location at an unusual hour, the system can quickly flag this activity and trigger an automated response, such as requiring additional authentication or locking the account until further investigation is completed.


##### Network Traffic Analysis (NTA) Tools

These tools specialize in monitoring and analyzing network traffic, providing deep visibility into data flows and helping to detect anomalies that might be missed by other security systems. 

NTA tools can identify strange patterns of traffic, such as sudden increases in data transfer or unusual communication between devices, that may indicate a network breach. 

NTA is particularly useful in detecting advanced threats like lateral movement (where an attacker moves within the network) or data exfiltration (where sensitive data is transferred out of the network). 

For example, if an employee’s computer starts sending large amounts of sensitive data to an unfamiliar external IP address, an NTA system will flag this activity as suspicious. This gives security teams the ability to respond quickly, either by isolating the device or blocking external connections, before data is lost or stolen.
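The exfiltration example could be approximated with a simple per-destination byte count. This is a sketch under stated assumptions, not how any real NTA product works: the flow-record format, the `flag_exfiltration` helper, the known-destination set, and the 1 GB limit are all hypothetical, and the external addresses come from ranges reserved for documentation.

```python
from collections import defaultdict

def flag_exfiltration(flows, known_destinations, byte_limit):
    """Return (source, destination) pairs that sent more than `byte_limit`
    bytes in total to a destination outside `known_destinations`."""
    totals = defaultdict(int)
    for src, dst, nbytes in flows:
        totals[(src, dst)] += nbytes
    return sorted(pair for pair, total in totals.items()
                  if pair[1] not in known_destinations and total > byte_limit)

# Hypothetical flow records: (source IP, destination IP, bytes sent)
flows = [
    ("10.0.0.5", "10.0.0.20", 4_000_000),
    ("10.0.0.5", "203.0.113.9", 600_000_000),
    ("10.0.0.5", "203.0.113.9", 700_000_000),
    ("10.0.0.7", "198.51.100.4", 20_000),
]
known = {"10.0.0.20"}

print(flag_exfiltration(flows, known, byte_limit=1_000_000_000))
# [('10.0.0.5', '203.0.113.9')]
```

Neither transfer to `203.0.113.9` exceeds the limit on its own. The pair is flagged only once the transfers are summed.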

##### Open-Source vs. Commercial Solutions

Open-source tools are often free to use and can be highly customizable, making them an attractive option for smaller organizations or those with specialized needs. However, they may require more time and effort to configure, and lack the dedicated support that comes with commercial solutions.

Commercial solutions typically offer more robust features, support, and integration options, making them ideal for larger organizations with complex networks. These tools are often easier to deploy and come with built-in features for handling specific threats, making them a good choice for businesses that need a more turnkey solution.

For instance, an e-commerce business might choose an open-source IDS solution for basic monitoring if they have a small network, but opt for a commercial SIEM solution to handle complex security event correlation and incident response across a large-scale infrastructure.



##### Integrating Network Anomaly Detection Tools 

The effectiveness of network anomaly detection tools often depends on how well they integrate with one another. Combining IDS/IPS, SIEM, and NTA solutions can create a more comprehensive security infrastructure. 

For example, a SIEM system could correlate data from an NTA tool to flag anomalous traffic, and then pass the information to an IDS/IPS for deeper analysis or to block the threat. This integrated approach enhances threat detection capabilities, providing security teams with a more complete view of network activity. 

Moreover, by leveraging machine learning and other advanced techniques, these tools can become more proactive, detecting not just known threats but also novel attack patterns that might otherwise go unnoticed. 

While network anomaly detection tools continue to improve, there are still significant challenges that organizations must address. From handling false positives to managing large volumes of data, these obstacles can impact the effectiveness of security systems.

##### Challenges of Network Anomaly Detection

- **High False Positive Rates**: Too many false positive alerts can lead to alert fatigue, where security teams begin to ignore or rush through alerts, increasing the risk of a serious breach going unnoticed.
- **Evolving Attack Techniques**: Attackers constantly change their tactics, so network anomaly detection systems must become increasingly adaptive to stay ahead. This requires not just improving detection algorithms but also incorporating continuous learning and updates into the system to keep pace with new and emerging threats. It’s crucial for organizations to invest in solutions that can quickly integrate with threat intelligence feeds and other sources of real-time data, ensuring they can respond to new attack vectors as soon as they emerge.
- **Data Volume and Velocity**: The sheer volume and speed of data generated by modern networks make it increasingly difficult to monitor all network activity in real time. Today’s networks are more complex than ever, with a multitude of devices, users, and systems communicating at any given moment. The influx of data that needs to be analyzed for potential threats is staggering, and traditional methods of network anomaly detection often struggle to keep up.
- **Real-Time Processing Requirements**: Cyberattacks often escalate quickly, so detecting and responding to threats in real time is critical. The challenge, however, lies in the need for low-latency processing. As organizations implement more complex and comprehensive detection systems, the delay between detecting an anomaly and responding to it can increase. Even a few minutes of lag can allow an attacker to gain a foothold in the system or exfiltrate sensitive data.
- **Integrating with Existing Systems**: Integrating anomaly detection with existing security tools can be particularly difficult when dealing with legacy systems that were not designed to work together. Without proper integration, security teams may find themselves overwhelmed by disjointed data sources or siloed alerts, which can create confusion and delays in response times.

##### Benefits of Implementing Network Anomaly Detection

1. **Improved Security Posture** - Early detection allows security teams to respond quickly, containing threats before they escalate. The ability to continuously monitor and detect unusual activity provides an added layer of defense. The result is a more robust security posture, one that can swiftly respond to both internal and external threats before they have the opportunity to cause significant harm.

2. **Reduced Risk of Data Breaches** - Data breaches are one of the most serious and costly security incidents a business can face. The financial and reputational damage caused by a breach can take years to recover from. By implementing network anomaly detection, organizations can dramatically reduce the risk of such breaches. This early intervention is key to reducing the potential impact of a breach. By identifying and responding to suspicious activity quickly, organizations can prevent large-scale data theft, minimize financial losses, and maintain customer trust.
3. **Faster Incident Response** - The faster an incident is addressed, the less damage it will cause to the organization. This ability to respond quickly and decisively minimizes the attack’s impact on operations, customer experience, and revenue. Moreover, it reduces the likelihood of additional attacks, as attackers may abandon efforts when they realize their attempts are being actively thwarted.
4. **Enhanced Compliance** - Many industries, including healthcare, finance, and retail, are subject to strict regulatory standards that require organizations to maintain robust security measures. Non-compliance can result in severe penalties, including fines, lawsuits, and loss of business.
    - Network anomaly detection plays a crucial role in helping organizations meet compliance requirements. By monitoring for unauthorized access and data transfers, these systems help ensure that sensitive information is being handled properly.
    - Beyond compliance, these tools also provide an audit trail, documenting when and how suspicious activity was detected and how it was addressed. This is invaluable during compliance audits, as it demonstrates a proactive approach to network security and helps ensure the organization meets all necessary requirements.
5. **Protecting Brand Reputation** - In today’s digital world, a company’s reputation is one of its most valuable assets. A single security breach can damage that reputation beyond repair, especially if it involves the loss of customer data or disruption to services. As customers become more aware of cybersecurity risks, they are increasingly selective about the businesses they trust with their personal information.
    - Additionally, the transparency of early threat detection helps build trust with customers. By addressing security issues proactively and communicating that the company is actively working to prevent breaches, businesses can enhance their reputation as a secure and trustworthy provider.

6. **Continuous Improvement with Adaptive Learning** - One of the most exciting benefits of network anomaly detection is the ability of these systems to continuously improve over time. With advanced algorithms and adaptive learning models, the system learns from past events and fine-tunes its detection capabilities. This means the system becomes more accurate and less prone to false positives, while simultaneously improving its ability to detect novel threats. This continuous learning process ensures that the detection system remains relevant and effective, even as cyber threats evolve.

### Reflection
How can a linear regression model's residuals (the difference between predicted and actual values) be seen as a form of anomaly detection? How does this relate to finding malicious activity in a system?


A regression model's predictions can serve as the baseline for anomaly detection: the predicted value defines what is deemed normal, and the residual measures how far an actual observation falls from it. When an event is well above or below the baseline, the IDS or IPS can raise an alert for the team to act on, which speeds up both detection and response.

This can be seen as a form of anomaly detection because residuals are the distance between what is expected and what is actually observed. For example:
- The baseline is a flat 6 ms, meaning latency is normally negligible. The team can also add a threshold around it, based on what is acceptable given the history they know, in order to reduce false positives.
- Suddenly, latency jumps to 440 ms. This might mean that there is a significant load right now (either a DDoS or a mix of other attacks), and it is easy to spot because it is so far from the baseline. The team can use IDS and IPS tools to check this, along with a SIEM tool to fully understand the cause.
- The cybersecurity team can then block the traffic and run a post-mortem to determine whether it was an attack.

There are times when an attack is already under way at the moment it is detected, which is why it is important to have a baseline: it helps determine whether the IPS can handle the event or whether it needs further technical support.

This relates to finding malicious activity in a system because, once a baseline is set, anything that deviates from it is easy to see, which allows issues to be flagged faster and attacks to be defended against sooner.

To further expound on how it is related to a linear regression model:
- The model's prediction is the statistical baseline for risk: it uses the learned relationships between factors to forecast the target variable, so the prediction is the definition of normal risk. R-squared does not set the baseline itself; it measures how much of the variation the model explains, that is, how well the baseline fits.
- Residuals, the distance between the actual and predicted values, are the statistical alert. The further an observation is from the baseline, the higher the risk it may indicate, and vice versa.
- Residuals can also help with unknown attacks such as zero-day attacks or highly evasive intrusions. If the actual value is high while the model's features are low or unchanged, the activity is statistically inconsistent, which is a red flag hinting that the event is an anomaly.

A model, therefore, acts as an anomaly detection system by mathematically isolating events that violate the learned patterns.
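A minimal sketch of this idea, using only the standard library, is below. The data is hypothetical (active users as the single feature, requests per minute as the target), and `fit_line` and `residual_alert` are illustrative names. The threshold of 3 and the use of the plain sample standard deviation of the training residuals are simplifying assumptions.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical "normal" history: active users vs. requests per minute
users = [10, 20, 30, 40, 50, 60]
req_per_min = [52, 98, 151, 199, 252, 298]

slope, intercept = fit_line(users, req_per_min)

# Residuals on normal data show how far "normal" usually strays from the line
train_residuals = [y - (slope * x + intercept) for x, y in zip(users, req_per_min)]
spread = stdev(train_residuals)

def residual_alert(x, actual, threshold=3.0):
    """Flag an observation whose residual is much larger than normal residuals."""
    residual = actual - (slope * x + intercept)
    return abs(residual) / spread > threshold

print(residual_alert(35, 176))  # False: close to the predicted value
print(residual_alert(35, 900))  # True: far above what 35 users would explain
```

The second observation has an ordinary feature value (35 users) but an actual value far above the prediction, which is the "statistically inconsistent" case described above.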