# Lesson 3: Information Extraction from Legal Documents Using spaCy

### Lesson Overview

Welcome! In this lesson, we will explore the application of spaCy's Natural Language Processing (NLP) capabilities for extracting information from legal documents through Named Entity Recognition (NER). By the end of this lesson, you will gain the skills to identify and classify various entities within legal texts, transforming them into structured and actionable data. This information is essential in real-world applications like contract management, compliance monitoring, and automated legal analysis.

### NER Applications to Legal Documents

Named Entity Recognition (NER) identifies and categorizes key information within text, which makes it well suited to legal documents. Here's how NER improves the processing of legal documents:

- **Structuring Unstructured Legal Data**: Legal documents contain crucial information often scattered in unstructured formats. NER can pinpoint and organize essential data points, such as party names and significant dates, streamlining legal processes.

- **Enhancing Automation and Efficiency**: By automating the extraction of relevant information, NER reduces manual review work and speeds up legal document handling.

- **Supporting Compliance and Risk Management**: NER surfaces the entities that compliance and risk reviews depend on, such as parties, dates, and jurisdictions, which helps reviewers locate the relevant clauses during due diligence.

In practical terms, NER in legal contexts identifies the involved parties, isolates critical dates, and flags the names and places that point to specific clauses. These capabilities are valuable in the legal tech, compliance, and financial services sectors.

### Loading the English Model

To get started, let's load the `en_core_web_sm` model, a popular choice for basic NER tasks. It is spaCy's small English pipeline and ships with a pretrained NER component.

```python
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")
```
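`spacy.load` raises an `OSError` when the requested model is not installed. The following is a minimal sketch of a pre-flight check, assuming the model was installed as a regular Python package (which is what `python -m spacy download en_core_web_sm` does). The helper name `missing_model_hint` is hypothetical and not part of spaCy.

```python
import importlib.util


def missing_model_hint(model_name="en_core_web_sm"):
    """Return an install hint if the model package is absent, else None.

    Hypothetical helper for illustration; it is not a spaCy function.
    """
    # Downloaded spaCy models are ordinary Python packages, so the standard
    # import machinery can report whether one is available.
    if importlib.util.find_spec(model_name) is None:
        return f"Model not found. Try: python -m spacy download {model_name}"
    return None


hint = missing_model_hint()
if hint:
    print(hint)
```

If the helper prints a hint, run the suggested command before calling `spacy.load`.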

The `en_core_web_sm` model includes components for tokenization and NER, enabling us to identify various entities such as PERSON, ORG, DATE, and more. Now that we have our model ready, let's apply it to process a sample legal document and see NER in action.

### Implementing Information Extraction With spaCy

We'll commence by processing a sample legal document using spaCy. This will involve extracting entities from a piece of text representative of typical legal documents, which often include elements like dates and addresses.

```python
# Sample legal document text
legal_text = """
    This Agreement is made on the 5th day of April, 2021, between John Doe, residing at 123 Elm Street, Springfield,
    and Acme Corporation, having an office at 456 Maple Avenue, Springfield. Both parties agree as follows.
    In the event of a dispute between the parties, the dispute shall be resolved in the Springfield Court.
"""
```

By processing this text with spaCy, we can create a doc object that will allow us to perform Named Entity Recognition on the content.

### Processing the Text

Let's pass the `legal_text` through the spaCy pipeline to obtain a doc object. This object contains the annotations generated by the NLP pipeline, including tokens and recognized entities.

```python
# Process the text with spaCy
doc = nlp(legal_text)
```

Now that we have our doc object, it's time to take a closer look at the entities it contains.

### Extracting and Analyzing Entities

We can extract and analyze the entities detected in the doc object by iterating through them. For each entity, we'll print out its text and label, which indicates its type:

```python
# Extract entities
for ent in doc.ents:
    print(f"{ent.text} - {ent.label_}")
```

Running this code yields output similar to the following (exact results can vary with the spaCy and model versions):

```
the 5th day of April, 2021 - DATE
John Doe - PERSON
123 - CARDINAL
Elm Street - FAC
Springfield - GPE
Acme Corporation - ORG
456 Maple Avenue - FAC
Springfield - GPE
the Springfield Court - ORG
```

This output shows spaCy recognizing several entity types in the text, a critical first step for legal document analysis. "John Doe" and "Acme Corporation" are correctly identified as a person and an organization, and "456 Maple Avenue" is detected as a facility (FAC). The first address is handled less well: the street number "123" is split off and labeled CARDINAL, while only "Elm Street" is tagged FAC. The model's label set has no dedicated address type, so results like this usually need post-processing before they can be relied on.
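One way to handle the split street number in the output above is to rejoin it in a post-processing step. The sketch below assumes each entity has already been reduced to a dictionary holding its text, label, and character offsets (available on spaCy entities as `ent.start_char` and `ent.end_char`). The function name `merge_street_numbers` and the `ADDRESS` label are hypothetical, and the rule is a simple heuristic rather than a general address parser.

```python
def merge_street_numbers(records, text):
    """Join a CARDINAL directly followed by a FAC into one ADDRESS record.

    `records` is a list of dicts with "text", "label", "start", and "end"
    keys (character offsets into `text`), sorted by position.
    """
    merged = []
    i = 0
    while i < len(records):
        current = records[i]
        nxt = records[i + 1] if i + 1 < len(records) else None
        is_pair = (
            nxt is not None
            and current["label"] == "CARDINAL"
            and nxt["label"] == "FAC"
            # Only whitespace (or nothing) may separate the two entities
            and text[current["end"]:nxt["start"]].strip() == ""
        )
        if is_pair:
            merged.append({
                "text": text[current["start"]:nxt["end"]],
                "label": "ADDRESS",  # hypothetical label, not a spaCy built-in
                "start": current["start"],
                "end": nxt["end"],
            })
            i += 2  # skip the FAC record that was just absorbed
        else:
            merged.append(current)
            i += 1
    return merged
```

Entities that were already recognized as a whole, such as "456 Maple Avenue", pass through unchanged.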

### Real-Life Applications

The techniques demonstrated here find strong applications beyond this lesson:

- **Contract Analysis**: Efficiently assess agreements for key terms and parties involved.
- **Compliance Monitoring**: Automatically identify compliance-related clauses or deadlines.
- **Risk Management**: Detect potential risks, such as dispute resolution clauses, without reading entire contracts.

As you can see, these applications have the potential to save time and resources, which can significantly benefit legal and business operations.
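As a rough illustration of the contract analysis use case, the sketch below groups entity texts by label so that parties, dates, and places can be read off separately. It accepts any iterable of objects exposing `text` and `label_`, such as `doc.ents`. The function name `group_by_label` is hypothetical.

```python
from collections import defaultdict


def group_by_label(ents):
    """Map each entity label to the distinct entity texts found for it."""
    grouped = defaultdict(list)
    for ent in ents:
        # Skip repeated mentions so each text is listed once per label
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)
```

With a processed document, `group_by_label(doc.ents).get("DATE", [])` would return the detected dates, which could feed a deadline check. Note that the model may place courts under ORG alongside the contracting parties, so the groups still need review.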

### Lesson Summary and Practice

In this lesson, you have learned how to use spaCy for extracting entities from legal documents through NER. We reviewed the setup of spaCy, processed legal texts using the spaCy pipeline, and extracted structured information. This practical knowledge equips you with tools for handling various NLP tasks. Engaging in upcoming practice exercises will strengthen your skills in extracting and utilizing key entities in real-world scenarios, enhancing both efficiency and decision-making in legal domains.

Let's proceed to reinforce your understanding with some hands-on applications!

## Extract Key Entities from Text

To extract and print the detected entities, iterate over `doc.ents` and print each entity's text and label. Here's the complete code:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

legal_text = """
    This Agreement is made on the 5th day of April, 2021, between John Doe, residing at 123 Elm Street, Springfield,
    and Acme Corporation, having an office at 456 Maple Avenue, Springfield. Both parties agree as follows.
    In the event of a dispute between the parties, the dispute shall be resolved in the Springfield Court.
    """

doc = nlp(legal_text)

# Iterate through the entities detected in the doc object and print each entity's text and associated label
for ent in doc.ents:
    print(f"{ent.text} - {ent.label_}")
```

### Explanation:
- The loop `for ent in doc.ents:` iterates through each entity detected in the `doc` object.
- The `print(f"{ent.text} - {ent.label_}")` statement outputs the text of the entity along with its associated label, indicating the type of entity (e.g., PERSON, DATE, GPE, etc.).

When you run this code, it will display the recognized entities from the legal text along with their labels.
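Printing is useful for inspection, but downstream tools usually need structured data. The sketch below converts entities into plain dictionaries, assuming each entity exposes `text`, `label_`, `start_char`, and `end_char` as spaCy entity spans do. The function name `entities_to_records` is hypothetical.

```python
def entities_to_records(ents):
    """Convert entity spans into plain dictionaries.

    `ents` is any iterable of objects exposing `text`, `label_`,
    `start_char`, and `end_char`, such as `doc.ents`.
    """
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,  # character offset where the entity begins
            "end": ent.end_char,      # character offset just past the entity
        }
        for ent in ents
    ]
```

Calling `entities_to_records(doc.ents)` returns a list that can be serialized with `json.dumps` or loaded into a table.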


## Filter Entities With spaCy

Now, let's slightly alter the existing code!

Your task is to print only the entities labeled as ORG (organizations) or PERSON (people). Implement a conditional statement in the entity iteration to achieve this.

After adjusting the code, run it, and observe the changes in the printed entities.

```python
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Sample legal document text
legal_text = """
    This Contract is entered into on the 10th of July, 2022, by and between Jane Smith, a resident of 789 Oak Lane, Metropolis,
    and WidgetCo LLC, with principal offices at 101 Pine Street, Metropolis. The parties hereby agree to the following terms.
    Should any conflict arise, it will be settled in the Metropolis District Court.
    """

# Process the text with spaCy
doc = nlp(legal_text)

# TODO: Change the existing code to print only entities labeled as ORG or PERSON
for ent in doc.ents:
    print(f"{ent.text} - {ent.label_}")

```

To modify the existing code so that it only prints entities labeled as ORG (organizations) or PERSON (people), you can add a conditional statement within the entity iteration. Here’s the updated code:

```python
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Sample legal document text
legal_text = """
    This Contract is entered into on the 10th of July, 2022, by and between Jane Smith, a resident of 789 Oak Lane, Metropolis,
    and WidgetCo LLC, with principal offices at 101 Pine Street, Metropolis. The parties hereby agree to the following terms.
    Should any conflict arise, it will be settled in the Metropolis District Court.
    """

# Process the text with spaCy
doc = nlp(legal_text)

# Print only entities labeled as ORG or PERSON
for ent in doc.ents:
    if ent.label_ in ["ORG", "PERSON"]:  # Check if the entity is an organization or a person
        print(f"{ent.text} - {ent.label_}")
```

### Explanation:
- The line `if ent.label_ in ["ORG", "PERSON"]:` checks if the label of the entity is either "ORG" or "PERSON".
- Only entities that meet this condition will be printed.

When you run this code, it will display only the recognized organizations and people from the legal text.
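If the same filter is needed in several places, the condition can be wrapped in a small reusable function. This is a sketch only: the name `filter_entities` is hypothetical, and it accepts any iterable of objects exposing `text` and `label_`, such as `doc.ents`.

```python
def filter_entities(ents, wanted_labels):
    """Return (text, label) pairs for entities whose label is wanted."""
    wanted = set(wanted_labels)  # set membership checks are fast
    return [(ent.text, ent.label_) for ent in ents if ent.label_ in wanted]
```

For example, `filter_entities(doc.ents, ["ORG", "PERSON"])` returns the same entities the loop above prints, as a list that can be reused later.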

## Count Entity Occurrences

Time to put everything you've learned to use by counting the occurrences of different entity types in a legal document!

Your mission is to complete the following steps:

1. Create an empty dictionary named `entity_counts` to store the count of each entity type.
2. Within the iteration over each entity detected in the document, check whether the entity label (`ent.label_`) is already a key in the `entity_counts` dictionary.
   - If the label is not present, add it as a new key with a starting value of 1.
   - If the label is already in the dictionary, increase its value by 1.
3. Finally, loop over the items in `entity_counts` and print each entity type (label) along with its corresponding count.

Let's conclude this session with your best effort!

```python
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Sample legal document text
legal_text = """
    On January 5, 2024, a partnership agreement was formalized between Alice Johnson, residing at 321 Cedar Boulevard, Star City,
    and Tech Innovators Inc., with headquarters at 789 Birch Road, Star City. It is agreed that any legal matters will be adjudicated 
    within the jurisdiction of the Star City Superior Court.
    """

# Process the text with spaCy
doc = nlp(legal_text)

# TODO: Create an empty dictionary named 'entity_counts' that will store the count of each entity type

# Iterate through the entities detected in the doc object
for ent in doc.ents:
    # TODO: Check if the entity label (ent.label_) is already a key in the 'entity_counts' dictionary
    # - If the label is not yet in the dictionary, add it as a new key with a starting value of 1
    # - If the label is already in the dictionary (it is a key), increase its value by 1
    pass  # Placeholder so the loop is valid Python; replace it with your code

# TODO: Loop over the items in 'entity_counts' and print out each entity type (label) and its corresponding count
```

Here is one way to count the occurrences of each entity type in the legal document, with every TODO filled in:

```python
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Sample legal document text
legal_text = """
    On January 5, 2024, a partnership agreement was formalized between Alice Johnson, residing at 321 Cedar Boulevard, Star City,
    and Tech Innovators Inc., with headquarters at 789 Birch Road, Star City. It is agreed that any legal matters will be adjudicated
    within the jurisdiction of the Star City Superior Court.
    """

# Process the text with spaCy
doc = nlp(legal_text)

# Create an empty dictionary named 'entity_counts' that will store the count of each entity type
entity_counts = {}

# Iterate through the entities detected in the doc object
for ent in doc.ents:
    # Check if the entity label (ent.label_) is already a key in the 'entity_counts' dictionary
    if ent.label_ not in entity_counts:
        # If the label is not yet in the dictionary, add it as a new key with a starting value of 1
        entity_counts[ent.label_] = 1
    else:
        # If the label is already in the dictionary, increase its value by 1
        entity_counts[ent.label_] += 1

# Loop over the items in 'entity_counts' and print out each entity type (label) and its corresponding count
for label, count in entity_counts.items():
    print(f"{label}: {count}")
```

### Explanation of the Code:

1. **Importing spaCy**: The code starts by importing the spaCy library and loading the English language model.

2. **Processing the Text**: The sample legal text is processed to extract entities.

3. **Creating the Dictionary**: An empty dictionary named `entity_counts` is created to store the counts of each entity type.

4. **Iterating Over Entities**: The code iterates through the detected entities in the document. For each entity:
   - It checks if the entity label is already a key in the `entity_counts` dictionary.
   - If not, it adds the label with a count of 1.
   - If it is already present, it increments the count.

5. **Printing the Results**: Finally, the code loops through the `entity_counts` dictionary and prints each entity type along with its count.

You can run this code in your Python environment to see the counts of different entities in the provided legal document.
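The manual dictionary bookkeeping above is good practice, but Python's standard library offers a shorter route. The sketch below uses `collections.Counter` to produce the same counts. The function name `count_entity_labels` is hypothetical, and it accepts any iterable of objects exposing `label_`, such as `doc.ents`.

```python
from collections import Counter


def count_entity_labels(ents):
    """Count how many entities were found for each label."""
    return Counter(ent.label_ for ent in ents)
```

With a processed document, `count_entity_labels(doc.ents).most_common()` returns `(label, count)` pairs sorted from the most to the least frequent label.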