# Lesson 2: Named Entity Recognition in User Reviews Using spaCy

## Topic Overview

Hello and welcome to our lesson on Named Entity Recognition (NER) using spaCy. In today's lesson, you'll learn how to use spaCy to perform NER on user reviews, enabling us to extract valuable business insights. By the end of this lesson, you will be able to set up spaCy, process text data for NER, and interpret the results to enhance business decision-making.

## Understanding Named Entity Recognition (NER)

Named Entity Recognition (NER) is a sub-task of information extraction that locates and classifies named entities in text into predefined categories like person names, organizations, locations, and dates. This process enables businesses to extract structured information from unstructured data, which can facilitate analytics and decision-making, particularly in the analysis of user reviews.

In the context of user reviews, common NER labels may be leveraged as follows:

- **PERSON**: This label can help identify influential figures or frequently mentioned staff members, allowing businesses to potentially highlight exceptional service in their marketing strategies.
- **ORG**: It may be used to monitor and respond to mentions of the brand or competitors, thereby enhancing competitive strategy.
- **GPE**: Recognizing where customers commonly use products or services can aid in geographical marketing efforts and logistical planning.
- **DATE**: Detecting trends over time could offer insights into seasonal variations or the impact of specific events on customer sentiment.

By applying these labels, businesses may transform user reviews into a valuable resource for informed decision-making.

## Loading Pretrained Models

spaCy provides different models for various languages and tasks. For NER, we'll use the `en_core_web_sm` model, a small English model that includes vocabulary, syntax, and entities.

To load the model (after installing it once with `python -m spacy download en_core_web_sm`), use:

```python
import spacy

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")
```

## Creating Sample Data

Now that we have spaCy set up, let's process some text data. We'll start by creating sample user reviews and then use the spaCy pipeline to process these reviews.

Here's a list of sample user reviews we'll use:

```python
# Sample user reviews
reviews = [
    "The new iPhone 13 is amazing! I bought it from the Apple store in New York.",
    "I recently dined at Noma in Copenhagen and the food was out of this world.",
    "Had a great stay at the Marriott Hotel in San Francisco. The staff was very friendly.",
    "Ordered a Samsung Galaxy S21 from Amazon and it got delivered in just two days!"
]
```

## Processing Each Review

For each review, we pass the text through spaCy's NLP pipeline, which creates a doc object that holds linguistic annotations.

```python
# Process each review
docs = [nlp(review) for review in reviews]
```

## Extracting and Interpreting Named Entities

Next, we will extract and interpret the named entities from our processed text.

We can access named entities by iterating over `doc.ents` and printing each entity's text along with its label. This helps us understand which parts of the text correspond to named entities and their types.

```python
# Extract entities and labels
for doc in docs:
    print(f"Review: {doc.text}")
    print("Entities and their labels:")
    for ent in doc.ents:
        print(f"{ent.text} - {ent.label_}")
    print()  # blank line between reviews
```

The output of the above code will be:

```sh
Review: The new iPhone 13 is amazing! I bought it from the Apple store in New York.
Entities and their labels:
13 - CARDINAL
Apple - ORG
New York - GPE

Review: I recently dined at Noma in Copenhagen and the food was out of this world.
Entities and their labels:
Noma - GPE
Copenhagen - ORG

Review: Had a great stay at the Marriott Hotel in San Francisco. The staff was very friendly.
Entities and their labels:
the Marriott Hotel - ORG
San Francisco - GPE

Review: Ordered a Samsung Galaxy S21 from Amazon and it got delivered in just two days!
Entities and their labels:
Amazon - ORG
just two days - DATE
```

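This output demonstrates how spaCy identifies and labels named entities in user reviews. Organizations and locations such as Apple, Amazon, New York, and San Francisco are picked up, but the small model is not perfect: in the output shown, "Noma" is tagged as GPE and "Copenhagen" as ORG, and product names are missed (only the "13" in "iPhone 13" is tagged, as CARDINAL). Exact results can vary with the spaCy and model version.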
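One way to make the printed pairs easier to analyze is to tally them by label. The sketch below is illustrative only: the `extracted` list is hard-coded from the output shown above (in a real pipeline each pair would be `(ent.text, ent.label_)` for every `ent` in `doc.ents`), and the helper name `group_by_label` is hypothetical.

```python
from collections import Counter, defaultdict

# (text, label) pairs copied from the output shown above; in practice
# they would be built from `doc.ents` rather than typed by hand.
extracted = [
    ("13", "CARDINAL"), ("Apple", "ORG"), ("New York", "GPE"),
    ("Noma", "GPE"), ("Copenhagen", "ORG"),
    ("the Marriott Hotel", "ORG"), ("San Francisco", "GPE"),
    ("Amazon", "ORG"), ("just two days", "DATE"),
]

def group_by_label(pairs):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)

# How often each label occurs across all reviews
label_counts = Counter(label for _, label in extracted)
# Which texts were tagged with each label
by_label = group_by_label(extracted)

print(label_counts.most_common())
print(by_label["GPE"])
```

With the pairs above, ORG appears four times and GPE three times. Listing the texts under each label also makes mislabels such as "Noma" under GPE easy to spot.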

## Use Case Application

By examining named entities in user reviews, businesses can:

- **Understand Customer Experiences**: Recognize key locations and service points, enabling better service mapping.
- **Track Organizational Mentions**: Monitor mentions of the business or competitors.
- **Identify Trends and Patterns**: Detect common locations, dates, and organizations to gain insights into customer behavior and market trends.

For example, in the following review:

"The new iPhone 13 is amazing! I bought it from the Apple store in New York."

We can identify "Apple" as an organization (ORG) and "New York" as a location (GPE). Although product names like "iPhone 13" are not identified by spaCy's pretrained model here, businesses can still gain insights into where the product is being purchased and which company is involved. To recognize product names, additional custom training of the model may be necessary.
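Short of retraining, one lighter-weight option is spaCy's rule-based `entity_ruler` component, which tags exact phrases that you list yourself. The sketch below is an illustration under stated assumptions rather than the lesson's method: it uses a blank English pipeline (`spacy.blank("en")`) so that only the rules produce entities, and the two product patterns are hand-picked from the sample reviews.

```python
import spacy

# A blank English pipeline: tokenizer only, no statistical NER.
nlp_rules = spacy.blank("en")

# Add the rule-based entity ruler and register exact-phrase patterns.
# The pattern list is illustrative; a real one might come from a product catalog.
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PRODUCT", "pattern": "iPhone 13"},
    {"label": "PRODUCT", "pattern": "Samsung Galaxy S21"},
])

doc = nlp_rules(
    "The new iPhone 13 is amazing! I bought it from the Apple store in New York."
)
product_entities = [(ent.text, ent.label_) for ent in doc.ents]
print(product_entities)
```

To combine rules with the pretrained model instead, the ruler can be added to the loaded pipeline ahead of the statistical component, for example with `nlp.add_pipe("entity_ruler", before="ner")`. Rules only catch the phrases you enumerate, so they complement custom training rather than replace it.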

## Lesson Summary

In this lesson, you learned how to use spaCy for Named Entity Recognition (NER) to extract meaningful business insights from user reviews. We covered:

- **Understanding NER**: What NER is and its real-life applications.
- **Setting Up spaCy**: How to import spaCy and load a pretrained model.
- **Processing Text Data**: How to process user reviews using spaCy.
- **Extracting Named Entities**: How to extract and interpret named entities.
- **Use Case Applications**: How to derive business insights from the extracted entities.

To solidify your understanding, process a new set of user reviews and extract relevant entities. Analyze how these entities provide insights into customer feedback and business strategies. This hands-on practice will improve your problem-solving abilities and familiarity with spaCy for NER tasks.

Keep coding and exploring the power of NLP to drive business decisions!

## Filter Reviews Discussing Prices

Well done, Space Explorer! Now, it's time to find reviews that talk about prices. Inside the `for` loop of the `find_reviews_with_money` function, check each processed review for entities of type `MONEY` and collect the reviews that have one. This will pick out the reviews that discuss the cost of Apple products. Coding time!

```python
import spacy

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Sample reviews about Apple products
reviews = [
    "The iPhone 13 costs $799. It's a bit pricey but worth it.",
    "I got the MacBook Air for $999 during the sale. Great deal!",
    "Bought the Apple Watch for only $199. Loving it so far.",
    "The iPad Pro is expensive at $1099 but it's a fantastic device.",
    "Got the AirPods for $159. A bit expensive for earbuds.",
    "I purchased the iMac for $1299 last week."
]

# Process each review with spaCy to extract entities
docs = [nlp(review) for review in reviews]

def find_reviews_with_money(docs):
    # List to store reviews containing monetary values
    reviews_with_money = []

    for doc in docs:
        # TODO: Check if any entity in the review is of type MONEY
        pass  # placeholder so the starter code runs; replace with your check
    
    # Print the filtered reviews
    for review in reviews_with_money:
        print(f"Review containing MONEY: {review}")

# Run the function
find_reviews_with_money(docs)
```


Here is the completed solution, with the check for entities of type `MONEY` added inside the loop of the `find_reviews_with_money` function:

```python
import spacy

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Sample reviews about Apple products
reviews = [
    "The iPhone 13 costs $799. It's a bit pricey but worth it.",
    "I got the MacBook Air for $999 during the sale. Great deal!",
    "Bought the Apple Watch for only $199. Loving it so far.",
    "The iPad Pro is expensive at $1099 but it's a fantastic device.",
    "Got the AirPods for $159. A bit expensive for earbuds.",
    "I purchased the iMac for $1299 last week."
]

# Process each review with spaCy to extract entities
docs = [nlp(review) for review in reviews]

def find_reviews_with_money(docs):
    # List to store reviews containing monetary values
    reviews_with_money = []

    for doc in docs:
        # Check if any entity in the review is of type MONEY
        for ent in doc.ents:
            if ent.label_ == "MONEY":
                reviews_with_money.append(doc.text)  # Add the review to the list if it contains MONEY
                break  # Exit the loop once we find a MONEY entity

    # Print the filtered reviews
    for review in reviews_with_money:
        print(f"Review containing MONEY: {review}")

# Run the function
find_reviews_with_money(docs)
```

### Key Changes Made:

1. **Entity Check Logic**: The code now iterates through each entity in `doc.ents` and checks if the entity's label is `MONEY`.

2. **Appending Reviews**: If a `MONEY` entity is found, the review text is appended to the `reviews_with_money` list, and the loop breaks to avoid unnecessary checks.

Now, when you run the function, it will filter and print the reviews that mention prices for Apple products.
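The same check can also be written more compactly with `any()`, which stops at the first matching entity just as the `break` does. This is a sketch, not part of the exercise: the function name `reviews_mentioning` and its `label` parameter are hypothetical, and the stand-in objects below mimic only the `text`, `ents`, and `label_` attributes so the example runs without loading a model. Their labels are assumed for illustration, not real model output.

```python
from types import SimpleNamespace

def reviews_mentioning(docs, label="MONEY"):
    """Return the text of every doc that has at least one entity with `label`."""
    return [
        doc.text
        for doc in docs
        if any(ent.label_ == label for ent in doc.ents)
    ]

# Stand-ins for spaCy Doc and Span objects (assumed labels, for illustration).
# With spaCy loaded, pass the `docs` list built earlier instead.
fake_docs = [
    SimpleNamespace(
        text="The iPhone 13 costs $799.",
        ents=[SimpleNamespace(text="799", label_="MONEY")],
    ),
    SimpleNamespace(text="Loving it so far.", ents=[]),
]

money_reviews = reviews_mentioning(fake_docs)
print(money_reviews)
```

Returning the list instead of printing inside the function makes the result reusable, and the `label` parameter lets the same helper filter for other labels such as `DATE` or `GPE`.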

## Sorting Reviews by Dates

## Identifying Locations and Averaging Scores